HANA News Blog

HANA systems: Linux swappiness

Jens Gleichmann • March 24, 2024

Page Priority


Since SLES15 SP4 and RHEL8 (and in other Linux distributions as well), the reclaim behavior has changed (see Details).

Before: vm.swappiness Default 60 [Range: 0 - 100]

After: vm.swappiness Default 60 [Range: 0 - 200]


Two types of pages:

  • anonymous page: dynamic runtime data such as stack and heap
  • FS (filesystem) page: file-backed payload such as application data, shared libs, etc.



With a value of 100, anonymous pages and FS pages are weighted equally. With 0, as before, swapping is effectively deactivated: the kernel only swaps anonymous pages as a last resort. The higher the value, the more weight reclaim puts on anonymous pages, so they are swapped out sooner. There will always be "cold" pages that were loaded into memory once and not touched again, and it makes sense to swap those out early, before memory actually runs short.
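To make the weighting tangible, here is a minimal sketch in Python. It assumes a simplified model in which anonymous pages get the weight `swappiness` and FS pages get `200 - swappiness`, which matches the statement that 100 means equal weighting. The function name `reclaim_weights` is made up for this illustration, and the kernel's real decision also takes measured reclaim cost into account, which is not modeled here.

```python
def reclaim_weights(swappiness: int) -> tuple[float, float]:
    """Return the (anonymous, FS) share of reclaim weight for a swappiness value.

    Simplified model (assumption): anonymous pages are weighted with
    `swappiness`, FS pages with `200 - swappiness`. The kernel additionally
    adjusts for observed reclaim cost, which this sketch ignores.
    """
    if not 0 <= swappiness <= 200:
        raise ValueError("swappiness must be in the range 0-200")
    return swappiness / 200, (200 - swappiness) / 200


if __name__ == "__main__":
    for value in (0, 10, 60, 100, 200):
        anon, fs = reclaim_weights(value)
        print(f"vm.swappiness={value:3d}  anonymous={anon:.0%}  FS={fs:.0%}")
```

In this model the default of 60 still favors reclaiming FS pages, and only values above 100 shift the balance towards anonymous pages.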


Details


Previously (before SLES11 SP3), anonymous pages were not taken into account at all. After that, a 1:1 cost weighting (file-cached pages : anonymous pages) was introduced. The assumption was always that file pages should be scanned and evicted more often than anonymous pages, because swapping anonymous pages was considered quite expensive and therefore disruptive. Over time it became clear that in some situations it makes sense to change this behavior, so the mechanism was made more granular.


Summary


At the end of the day, who would have thought, the new algorithm is more effective and more sensible, but it does lead to more paging activity. That is not a problem, because the pages being swapped out really are more than "cold". So nobody needs to worry if they notice increased swap activity with the same workload and sizing. There are also new metrics in vmstat that can be used to monitor this behavior.
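As a starting point for watching these metrics, the following Python sketch takes two snapshots of /proc/vmstat and prints per-second rates. The post does not name the new counters, so the list in `COUNTERS` is an assumption based on names found in recent kernels; counters missing on a given system are skipped. The helper names and the 10 second interval are made up for this illustration.

```python
import time

# Assumption: these counter names exist in /proc/vmstat on recent kernels.
# Names that are missing on a given system are simply skipped.
COUNTERS = (
    "pswpin", "pswpout",
    "pgscan_anon", "pgscan_file",
    "pgsteal_anon", "pgsteal_file",
    "workingset_refault_anon", "workingset_refault_file",
)


def parse_vmstat(text: str) -> dict[str, int]:
    """Parse 'name value' lines and keep only the counters of interest."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in COUNTERS:
            stats[parts[0]] = int(parts[1])
    return stats


def rates(before: dict[str, int], after: dict[str, int], seconds: float) -> dict[str, float]:
    """Turn two snapshots of cumulative counters into per-second rates."""
    return {k: (after[k] - before[k]) / seconds for k in after if k in before}


def sample(path: str = "/proc/vmstat") -> dict[str, int]:
    """Read one snapshot from the given file."""
    with open(path) as fh:
        return parse_vmstat(fh.read())


if __name__ == "__main__":
    interval = 10.0  # seconds between snapshots, chosen arbitrarily
    first = sample()
    time.sleep(interval)
    for name, rate in sorted(rates(first, sample(), interval).items()):
        print(f"{name:28s} {rate:10.1f} per second")
```

Because the counters are cumulative since boot, only the difference between two snapshots says anything about current behavior.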


Recommendation


So if the system has a lot of I/O and the share of FS pages is low, increasing the vm.swappiness parameter can have a positive effect on performance. I therefore recommend _not_ setting the parameter to 0 or 1 on SLES15 SP4+. Tests have shown that a value of 10 works best with HANA and its associated workload, although a higher value can also be useful (10 - 45 in my tests), since it depends on which third-party tools also run on the system (AV, backup, monitoring, etc.). The right value can only be determined with tests and longer analyses.
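A quick way to compare a host against this recommendation is sketched below in Python. It reads the current value from /proc/sys/vm/swappiness and sorts it into the ranges mentioned above. The function names and the wording of the labels are made up for this illustration; values between 2 and 9 are not discussed in the post, so they are only reported as below the tested range.

```python
def assess_swappiness(value: int) -> str:
    """Classify a vm.swappiness value against the ranges from the post."""
    if not 0 <= value <= 200:
        raise ValueError("swappiness must be in the range 0-200")
    if value <= 1:
        return "avoid: 0 or 1 is not recommended on SLES15 SP4+"
    if value < 10:
        return "below tested range"
    if value <= 45:
        return "within tested range (10 - 45)"
    return "above tested range"


def read_swappiness(path: str = "/proc/sys/vm/swappiness") -> int:
    """Read the currently active value from procfs."""
    with open(path) as fh:
        return int(fh.read().strip())


if __name__ == "__main__":
    current = read_swappiness()
    print(f"vm.swappiness = {current}: {assess_swappiness(current)}")
```

The sketch only reports; changing the value (for example via sysctl) and verifying the effect under real workload is left to the tests mentioned above.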


Conclusion


However, the question of the right monitoring remains open. In the past, an alarm was raised as soon as swap space was used, because that was assumed to indicate a memory bottleneck. Now everyone has to revisit this and answer it for their own landscape. Which metrics have been used for alerting so far? At what threshold do I alert based on the new metrics? Can my monitoring tool read these new metrics? Do I have to build a custom solution? All of this depends on the monitoring solution currently in place.

