HANA News Blog

SAP HANA data statistics for the optimizer

Jens Gleichmann • 28. Mai 2025

Popular misconception: HANA needs no statistics

In the world of modern database management systems, query performance is not just a matter of hardware: it is about smart execution plans. At the core of generating these plans lies a critical component: optimizer statistics. Every DBA is aware of optimizer data statistics; without these details the optimizer is likely to generate poor execution plans. Depending on the database, there are multiple kinds of optimizer statistics:

  • system statistics (CPU, I/O, memory)
  • workload statistics
  • noworkload statistics
  • table statistics
  • column statistics
  • index statistics


Besides the different object types, the collection method is also important. There are dynamic (runtime) statistics and histograms. As always, up-to-date and accurate statistics lead to better execution plans via more reliable cost estimates. If a table changes too frequently, accurate and up-to-date statistics are not feasible and would generate too much overhead. It has to be a balance, and the right balance depends on the workload.

Accurate statistics help the optimizer decide on:

  • Join order
  • Join method (e.g., hash join vs. nested loop)
  • Index usage
  • Degree of parallelism

Poor decisions can lead to excessive memory usage and CPU spikes.
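To see which join methods and join orders the optimizer actually picked, you can inspect the plan with EXPLAIN PLAN. A minimal sketch (the statement name and the query are placeholders):

```sql
-- Generate an execution plan for a sample query under a chosen statement name
EXPLAIN PLAN SET STATEMENT_NAME = 'stats_demo' FOR
  SELECT * FROM MY_SCHEMA.SALES_ORDERS WHERE CUSTOMER_ID = 4711;

-- Inspect the operators (join methods, engines) chosen by the optimizer
SELECT OPERATOR_NAME, OPERATOR_DETAILS, TABLE_NAME
  FROM EXPLAIN_PLAN_TABLE
 WHERE STATEMENT_NAME = 'stats_demo';
```

Comparing such plans before and after creating statistics is a simple way to verify that the optimizer's estimates (and hence its decisions) have improved.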


SAP Note 2100010 - SAP HANA: Popular Misconceptions deserves an update regarding the optimizer statistics. Many "experts" talk about HANA as a magical database that tunes itself and needs no optimizer statistics. The truth is that HANA, too, relies on different statistic data to generate an optimal execution plan. For virtual/remote tables, e.g. in SDA (Smart Data Access) scenarios, manual statistics have been used since HANA 1.0 SPS 06. There are different types of data statistics:

HISTOGRAM (source: explicit; created manually)
Creates a data statistics object that helps the query optimizer estimate the data distribution in a single-column data source. If you specify multiple columns in <data_sources>, multiple HISTOGRAM data statistics objects are created, one per column.

SAMPLE (source: explicit, runtime; created automatically on column and row store tables)
When beneficial, the SQL optimizer generates system SAMPLE data statistics objects automatically on column and row store tables. However, this behavior can incur a cost to performance. You can avoid this cost by creating SAMPLE data statistics objects explicitly (in advance), which is especially useful where sampling live table data is expensive (for example, on very large tables). Default: 1000 records.

SIMPLE (source: explicit, runtime; created automatically on column and row store tables)
Simple column statistics like min, max, null count, count, and distinct count for a single-column data source. When beneficial, the SQL optimizer maintains system SIMPLE data statistics objects automatically on column and row store tables only. Superset of RECORD_COUNT statistics.

SKETCH (source: explicit; created manually)
Column group statistics, e.g. the number of value combinations across multiple columns. Useful for selectivity estimations when more than one column is evaluated.

TOPK (source: explicit, runtime; created automatically on column store tables)
Column statistics including the most frequent column values. Mainly used for join-size estimation. Superset of SIMPLE statistics.

RECORD_COUNT (source: explicit, runtime; created automatically on column and row store tables)
Record count on table level.
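As an illustration, SKETCH statistics are the manual option for column groups, i.e. for columns that are frequently filtered together. A sketch, with schema, table, and column names as placeholders:

```sql
-- Column-group statistics for combined selectivity estimation:
-- helps the optimizer when both columns appear together in predicates
CREATE STATISTICS ON MY_SCHEMA.SALES_ORDERS (CUSTOMER_ID, REGION) TYPE SKETCH;
```

Without such a column-group object, the optimizer has to combine the selectivities of the individual columns, which typically assumes independence and can misestimate correlated columns badly.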

runtime: automatic ad-hoc calculation

explicit: manual creation via the CREATE STATISTICS syntax

For SAMPLE statistics, explicit statistics take precedence over runtime statistics.

For other statistics types, runtime statistics take precedence over explicit statistics. If you want to change this behavior, you can force the use of explicit statistics instead of runtime statistics using the following parameter: indexserver.ini -> [datastatistics] -> dev_force_use_non_runtime_datastatistics = true
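On a running system this parameter can be set via SQL; a sketch assuming system-wide scope:

```sql
-- Prefer explicit (manually created) statistics over runtime statistics
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
  SET ('datastatistics', 'dev_force_use_non_runtime_datastatistics') = 'true'
  WITH RECONFIGURE;
```

As with every dev_* parameter, change it only after testing and ideally in alignment with SAP support.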


Although HANA is an in-memory database, not all data is always loaded into memory. In some cases only a partition or some columns are loaded. Runtime statistics can only be created once all data is loaded; this happens during the prepare and compile phase. In some cases, e.g. in the context of NSE (Native Storage Extension), this can take significantly longer.

If the needed column is not fully loaded, the data statistics are estimated. This approach eliminates the column load overhead, but the estimates may be imprecise, with the risk that a bad execution plan is picked. The distinct count of the column is in this case set to the row count, i.e. the column is treated as unique. This can be far away from the real distinct count, and unfavorable join orders and engine choices can be the consequence. In this case it can make sense to use manual statistics, which then also have to be refreshed regularly.
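In such a case a manual SIMPLE statistics object supplies an accurate distinct count even when the column is not fully loaded. A sketch (object names are placeholders; the DATA_STATISTICS system view lists the created objects):

```sql
-- Accurate min/max/null count/distinct count for a partially loaded
-- (e.g. NSE page-loadable) column
CREATE STATISTICS ON MY_SCHEMA.SALES_ORDERS (CUSTOMER_ID) TYPE SIMPLE;

-- Check which data statistics objects exist
SELECT DATA_STATISTICS_NAME, DATA_SOURCE_OBJECT_NAME, DATA_STATISTICS_TYPE
  FROM DATA_STATISTICS;
```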


If you create data statistics manually, you can choose between different refresh types:

AUTO: the data statistics object is refreshed automatically when the underlying data changes. AUTO is only supported on column store, extended store, and multistore tables.

MANUAL: the data statistics object is not refreshed until a rebuild is explicitly requested by a REFRESH STATISTICS statement.

DEFAULT: the database server decides the best refresh strategy based on the data source. For example, for data statistics objects on column store data sources, the database server applies AUTO as the default.

Examples:

CREATE STATISTICS ON TABLE_NAME (COLUMN_NAME) TYPE SAMPLE SAMPLE SIZE 10000 REFRESH TYPE MANUAL;

CREATE STATISTICS ON TABLE_NAME (COLUMN_NAME) TYPE TOPK BUCKETS 1500 REFRESH TYPE AUTO;
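Statistics created with REFRESH TYPE MANUAL must be rebuilt explicitly, and obsolete objects should be dropped so the optimizer does not work with stale data. A sketch with placeholder names:

```sql
-- Rebuild all statistics objects on the column after larger data changes
REFRESH STATISTICS ON TABLE_NAME (COLUMN_NAME) ALL;

-- Remove a statistics object that is no longer needed
DROP STATISTICS ON TABLE_NAME (COLUMN_NAME);
```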


Details:

2800028 - FAQ: SAP HANA Optimizer Statistics

2000002 - FAQ: SAP HANA SQL Optimization
