HANA News Blog

SAP HANA Optimizer - the heart of the performance

Jens Gleichmann • 25. Mai 2025

The HANA Optimizer is one of the core components of the HANA database. It determines how SQL is executed.

A database optimizer behaves similarly to the navigation system in your car. You use various parameters to determine which route you want to take to reach a particular destination. Potential additional costs for tolls and ferries also play a role, as do other limitations such as the vehicle's height, length, and width. From these input parameters, the navigation system determines the optimal route based on current traffic information (traffic volume, construction sites, congestion, accidents, closures, etc.), weather conditions, and the length of the route. This means that with exactly identical input parameters, different routes, costs, and thus different travel times can arise depending on the given conditions.

The HANA optimizer behaves similarly. It receives its input variables from the user's selection criteria. The given conditions here are the available resources, the dimensions of the objects involved (number of entries, field lengths, column size, different column weights, etc.), index information, existing hints, existing execution plans, optimizer statistics, etc. Finally, potential costs are calculated. As with a navigation device, the goal is to determine the fastest route. This may often not be the shortest or the most resource-efficient.

If you purchase a more powerful car with modified driving characteristics, the same route can now be covered somewhat faster. If you discover a shortcut with your old car, an improvement is also possible, but without the cost of purchasing a completely new vehicle. In HANA, this corresponds to measures such as adapting the application logic or using DB hints with other engines. With this analogy, we want to demonstrate that more system resources or a new database revision are not necessarily the direct path to the actual goal. It is much more important to understand how and why the optimizer handles certain queries differently.

Sometimes there may be a traffic jam along the route, which can be quickly resolved or even prevented in the first place by a clever traffic control system (parameter and workload management). Therefore, it is important to know the system's workload and be able to react to significant deviations. We recommend defining KPIs for this purpose.

To understand exactly how the optimizer works, let's first look at the flow of an SQL processing operation. This will make it clear when it is used.

You have to know that SQL is declaritive language. This means it only describes "what" to be processed. The "how" is determined by the SQL processor. SQL Processor is smart, but needs to make decisions in very short time => the plan might be sub-optimal if too many tables are involved.

Important: It cannot fix data model or logical issue within the SQL.

Normally, the SAP HANA database consistently picks an optimal execution plan to process a database request. In some cases, though, the database can switch to a worse execution plan, resulting in higher runtimes and increased CPU or memory consumption.

Typical reasons for execution plan regressions are:

Bind values: The selectivity of the bind values / parameters used during preparation are considered to work out an optimal execution plan. Subsequent executions to a certain keep the same execution plan even if the selectivity are significantly different. This is particularly critical in context of HEX engine (SAP Note 2570371) and SAP HANA <= 2.0 SPS 07 because the generated HEX execution plan is highly static. See SAP Note 3513868 for more details.
Sampling: Working out a good execution plan to a certain extent depends on sampled data. For example, only a subset of partitions is sampled, so that many empty partitions should be avoided.
Complexity and design: The more complex the design of a database request and the underlying accessed objects is, the higher is the probability of misconceptions of the query optimizer. Thus, keep the design always as light-weight and simple as possible, also in context of concepts like CDS views and AMDP procedures. Also, make sure that modelling best practices are considered.
Data: The actual data in the underlying tables also impacts optimizer decisions. In rare cases, even small changes to the underlying table data can impact the optimizer decisions.

There are two ways to determine the statistics:

The columns are loaded into memory and data statistics are calculated as usual
The columns are not loaded and values are estimated

The data statistics are stored within monitoring view M_SYSTEM_DATA_STATISTICS. You can use HANA_SQL_Statistics_DataStatistics_2.00.040+ to read the most recent data. Which data you will get?

- TYPE: Data statistics type

- SCHEMA_NAME: Schema name

- TABLE_NAME: Table name

- COLUMNS: Columns (individual columns or column groups listed via comma-separated column list)

- SOURCE: SYSTEM in case of system generated, USER in case of user defined

- LAST_REFRESH: Last data statistics refresh time

- COUNT: Record count

- NUM_DISTINCT: Number of distinct values

- NUM_NULLS: Number of NULL values

- MIN_VALUE: Minimum column value

- MAX_VALUE: Maximum column value

- BUCKETS: Histogram buckets

- INDIV_COSTS: Cost factor for leading column of inverted individual indexes (SAP Note 2600076)

- TOP_1: Top 1 histogram bucket (e.g. frequency and value of most frequent column value in case of frequency histograms)

- TOP_2: Top 2 histogram bucket

- TOP_3: Top 3 histogram bucket

- DATA_STATISTICS_CONTENT: Raw data statistics details in JSON format for SYSTEM statistics, statistics name for USER statistics

This data statistics are fundamental for the decisions of the optimizers. Like our example with the navigation system, if you are using old or incorrect information about the route, the system will decide possible wrong based on this wrong input variables.

But there are also other statistics. The so called join statistics available through monitoring view M_JOIN_DATA_STATISTICS. Join statistics are required to optimally understand cardinalities when joining different tables. You can use HANA_SQL_Statistics_JoinStatistics_1.00.120+ to select the details.

- REFRESH_TIME: Last statistics refresh time

- HOST: Host

- PORT: Port

- SCHEMA_NAME_1: Schema name of table 1

- TABLE_NAME_1: Table name of table 1

- COLUMN_NAME_1: Column name of table 1

- SCHEMA_NAME_2: Schema name of table 2

- TABLE_NAME_2: Table name of table 2

- COLUMN_NAME_2: Column name of table 2

- NUM_ROWS_1: Number of rows in table 1

- DISTINCT_1: Distinct values in column of table 1

- MEMORY_KB: Used memory (KB)

- VALUE_COUNTS_1: Unique value counts (or a single average value)

- VALUE_COUNTS_FREQUENCIES_1: Frequency of different value counts (or a single average value)

- MATCHING_VALUES_COUNTS: Number of value counts in table 1 with corresponding entries in table 2

indexserver.ini -> [joins] -> statistics_method (default: cs_statistics).

Setting statistics_method to "traditional" can result in increased resource consumption and additional database trace messages.

What can take influence? Which parameter can be changed to influence the behaviour?

indexserver.ini -> [joins] -> cs_statistics_cache_invalidation_threshold_factor (default: 2.0)

Table content has significantly changed since the last execution.

Definition of significance: More than 25 rows, change happens more than 2 times.

indexserver.ini -> [joins] -> cs_statistics_cache_invalidation_by_shrinking_tables (default: false)

With SAP HANA 2.0 SPS 07 (>= 2.00.073) SAP recommends to set this parameter explicitly (SAP Note 2600030). With SAP HANA >= 2.00.080 the default for this parameter is changed to false.

indexserver.ini -> [sql] -> large_predicate_sampling_threshold = <elements> (default: 100)

Predicates like long OR concatenations are only sampled when the number of elements doesn't exceed 100.

Otherwise a potentially less accurate estimation will be used and the risk of slower execution plans increases. Using larger values can result in better optimizer decisions at the cost of some parsing overhead.

You can control the limit on statement level via hint LARGE_PREDICATE_SAMPLING_THRESHOLD(<elements>)

indexserver.ini -> [hex] -> min_table_increase_factor_to_recompile = <factor> (default: 4)

The default value of 4 triggers a recompilation as soon as the table size is factor 4 larger or smaller than during the last parsing.

indexserver.ini -> [sql] -> compile_time_sampling_size (default: 1000; HANA 2.0 SPS08)

The size of sample to estimate selectivity of a filter by sampling: compile_time_sampling_size specifies the number of sample records to be examined on SQL query compilation to get statistics of a table or a partition of a table in case of partitioned table. If the value is 0, records in tables will not be examined on SQL query compilation. If the value is larger than the number of records in a table or a partition, all rows in the table or the partition will be examined.

indexserver.ini -> [sql] -> compile_time_sampling_partitions (default: 8)

Maximum number of partitions to estimate selectivity of a filter by sampling: compile_time_sampling_partitions specifies the maximum number of partitions to be examined on SQL query compilation to get statistics of a partitioned table. If the value is 0 or larger than the number of partitions in a table, all partitions in the table will be examined.

When cost-based enumeration ends, the final query execution plan is generated (called "code generation"). This is the plan that will be executed by the execution engines. At this point, the role of the SQL optimizer also ends.

After code generation, the query plan can still be changed but only by the execution engines. This is called "engine-specific optimization" because the changes are determined and made by the execution engines.

Engine-specific decisions vary from engine to engine. An example of an engine-specific decision is the parallel thread decision made by the join engine. During the optimization process, the SQL optimizer does not consider aspects related to memory management and CPU. Its primary task is to make the plan run faster and to do so it uses all available resources to a maximum. Whether the query should run on multiple threads is not part of its decision-making. The join engine is responsible for making this decision. That is why you need to use an engine-specific hint and not an SQL optimizer hint to enforce parallel execution for the query. SAP recommends to implement statement hints in order to avoid performance issues or high resource consumption due to wrong optimizer decisions for some defined standard SQLs (2700051 - Delivery of SAP HANA Statement Hints). This hints will be updated from time to time. Our recommendation is to add this note to your favorite notes and track changes (recent change of the note: 05/25/2025).

< Älterer Beitrag

Neuerer Beitrag >

E-Mail

SAP HANA News by XLC

Why Databases Need Optimizer Statistics – With a Focus on SAP HANA

SAP HANA data statistics for the optimizer

von Jens Gleichmann • 28. Mai 2025

In the world of modern database management systems, query performance is not just a matter of hardware—it’s about smart execution plans. At the core of generating these plans lies a critical component: optimizer statistics. This article explores why databases need optimizer statistics, with particular emphasis on SAP HANA, while drawing parallels with Oracle, Microsoft SQL Server (MSSQL), and IBM DB2.

HANA OS maintenance

von Jens Gleichmann • 27. Mai 2025

Please notice that when you want to run HANA 2.0 SPS07, you need defined OS levels. As you can see RHEL7 and SLES12 are not certified for SPS07. The SPS07 release of HANA is the basis for the S/4HANA release 2023 which is my recommended go-to release for the next years. Keep in mind that you have to go to SPS07 when you are running SPS06 because it will run out of maintenance end of 2023.

HANA performance degradation after upgrade to SPS07+SPS08

Ongoing HANA HEX performance issues with SPS07+SPS08

von Jens Gleichmann • 27. Mai 2025

With SPS06 and even stronger in SPS07 the HEX engine was pushed to be used more often. This results on the one hand side in easy scenario to perfect results with lower memory and CPU consumption ending up in faster response times. But in scenarios with FAE (for all entries) together with FDA (fast data access), it can result in bad performance. After some customers upgraded their first systems to SPS07 I recommended to wait for Rev. 73/74. But some started early with Rev. 71/72 and we had to troubleshoot many statement. If you have similar performance issues after the upgrade to SPS07 feel free to contact us! Our current recommendation is to use Rev. 74 with some workarounds. The performance degradation is extreme in systems like EWM and BW with high analytical workload.

Is NSE worth the effort or is the better question: Do you know your cost per GB of RAM?

Is the effort for NSE worth it?

von Jens Gleichmann • 18. April 2025

Most of our presentations on data tiering projects end with these typical questions: How much we will save? How fast can it be implemented? Is the effort worth it over time? My counter question: "Do you know how much 1 GB of memory costs your company per month or year?" => how much memory we have to save to be beneficial?

Buchtipp: SAP HANA Deepdive: Optimierung und Stabilität im Betrieb

von Jens Gleichmann und Matthias Sander • 30. März 2025

Unser erster Buch mit dem Titel "SAP HANA Deepdive: Optimierung und Stabilität im Betrieb" ist erschienen.

More time to switch from BSoH to S/4HANA

SAP ERP Extended Maintenance until 2033 - but with conditions

von Jens Gleichmann • 7. Februar 2025

Recently handelsblatt released an article with a new SAP RISE option called SAP ERP, private edition, transition option. This option includes a extended maintenance until the end 2033. This means 3 years more compared to the original on-prem extended maintenance. This statement was confirmed by SAP on request of handelsblatt, but customers receive more details, such as the price, in the first half of the year. This is a quite unusual move of SAP without any official statement on the news page. Just to raise more doubts? Strategy? However a good move against the critics and the ever shorter timeline. Perhaps it is also a consequence of the growing shortage of experts for operating and migrating the large number of systems.

Smart SAP HANA Optimization: Applying NSE for Better Efficiency

von Matthias Sander • 15. Januar 2025

When it comes to optimizing SAP HANA, the balance between performance and cost efficiency is critical. I am happy to share a success story where we used the Native Storage Extension (NSE) to significantly optimize memory usage while being able to adjust the sizing at the end. The Challenge: Our client was operating on a 4 TB memory SAP HANA system, where increasing data loads were driving up costs and memory usage. They needed a solution to right-size their system without compromising performance or scalability. The client wanted to use less hardware in the future. The Solution: We implemented NSE to offload less frequently accessed data from memory. The activation was customized based on table usage patterns: 6 tables fully transitioned to NSE 1 table partially transitioned (single partition) 1 table transitioned by specific columns

SAP HANA NSE - a technical deepdive with Q&A

HANA NSE - technical details Q&A

von Jens Gleichmann • 6. Januar 2025

SAP NSE was introduced with HANA 2.0 SPS04 and based on a similar approach like data aging. Data aging based on a application level approach which has a side effect if you are using a lot of Z-coding. You have to use special BADI's to access the correct data. This means you have to adapt your coding if you are using it for Z-tables or using not SAP standard functions for accessing the data in your Z-coding. In this blog we will talk about the technical aspects in more detail.

The SAP Enterprise Cloud Services Private Cloud Customer Center (PC3) - a new digital delivery

3 years of RISE with SAP: ECS realignment

von Jens Gleichmann • 5. Januar 2025

The SAP Enterprise Cloud Services Private Cloud Customer Center (PC3) - a new digital delivery engagement model dedicated to manage service delivery for RISE with SAP S/4HANA Cloud, private edition customers.

Proactive maintenance for SAP RISE will start now in 2025

SAP RISE: Proactive Maintenance