Wednesday, August 2, 2017

NetApp Clustered Data ONTAP oddities

Roasted Spindles

Recently I've been working on a poor-performance issue with a CDOT pair of FAS3240s supporting a VMware environment and running NetApp Release 8.3.2P5.  VMware commentary aside, investigation revealed that the current master node in the pair had a root volume seeing disk utilization of >80% at most hours of the day.  While not directly a security issue, this cluster hosts VMs handling ~1TB of daily log aggregation, forwarding, notifications, feeding data to Splunk, shipping data offsite, and a host of other things.  Logs matter, and you want them handled appropriately.

The load mix did not seem to impact the disk utilization numbers, and vol0 was your typical small-installation 3-disk setup.  Wondering why the utilization was so high, I resorted to poking my head into the systemshell on the problematic node and gathering some data with vmstat -s:

sadnode-01% vmstat -s
2384195110 cpu context switches
2722466877 device interrupts
3104969435 software interrupts
1867337226 traps
4090014302 system calls
     63344 kernel threads created
   3827554 fork() calls
   1088783 vfork() calls
         0 rfork() calls
    111503 swap pager pageins
    268359 swap pager pages paged in
     90417 swap pager pageouts
    271672 swap pager pages paged out
    744212 vnode pager pageins
   1955211 vnode pager pages paged in
         0 vnode pager pageouts
         0 vnode pager pages paged out
    469164 page daemon wakeups
 458176247 pages examined by the page daemon

Compared to another cluster, the pagein and pageout counts certainly seemed excessive, as did the amount of work the page daemon was doing:

happynode-01% vmstat -s
2061309073 cpu context switches
3391879346 device interrupts
2611757802 software interrupts
3300814929 traps
3599776707 system calls
    343228 kernel threads created
  21972759 fork() calls
   9041120 vfork() calls
         0 rfork() calls
      2712 swap pager pageins
      9542 swap pager pages paged in
      2968 swap pager pageouts
     13830 swap pager pages paged out
     55276 vnode pager pageins
    322421 vnode pager pages paged in
         0 vnode pager pageouts
         0 vnode pager pages paged out
     17243 page daemon wakeups
 458176247 pages examined by the page daemon


Since ONTAP is a highly specialized BSD variant, and since I know a little something about Unix, I started to suspect a memory shortfall on sadnode-01, leading to excessive page scanning and paging activity, which in turn would tend to push up the disk utilization numbers.  In other words, a classic Unix memory shortfall.
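
If you'd rather watch the paging behavior live than squint at cumulative counters, the systemshell on 8.3.x looks like a reasonably stock FreeBSD environment to me, so the usual tools apply.  This is just a sketch of the checks I'd run there; the sysctl names are standard FreeBSD VM counters, not anything ONTAP-specific, and you'll need the diag user unlocked to get into the systemshell at all:

cluster::> set -priv diag
cluster::*> systemshell -node sadnode-01
sadnode-01% vmstat 5 5
sadnode-01% sysctl vm.stats.vm.v_swappgsin vm.stats.vm.v_swappgsout
sadnode-01% sysctl vm.stats.vm.v_pdwakeups vm.stats.vm.v_pdpages
sadnode-01% exit

Swap pager counters that keep climbing between samples, along with frequent page daemon wakeups, point toward genuine memory pressure; if they barely move, the problem is probably elsewhere.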

However, NetApp offers no method (that I know of) for tuning the VM subsystem, either from the systemshell or from ONTAP, and any modifications you might make will cause NetApp support to at least raise an eyebrow.

Seemingly unrelated at first, perfstat and autosupport logs also showed that vol0 was suffering from a moderate amount of block fragmentation.  Latency on vol0 was not extreme, but at >28ms it was notably higher than it should be on a typical FAS3240 root volume.  So even if there was a memory shortfall, there was also a structural inefficiency hampering the regular paging mechanisms that would normally cope with it.
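
If you want to put a number on the fragmentation before touching anything, the nodeshell has a measure-only mode for reallocate.  Treat this as a sketch from memory and check the syntax against your release:

cluster::> set -priv diag
cluster::*> system node run -node sadnode-01
sadnode-01> reallocate measure -o /vol/vol0
sadnode-01> reallocate status -v /vol/vol0

Once the measure-only scan finishes, reallocate status reports an optimization rating for the volume; the exact scale varies by release, so read it against the man page rather than my memory.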

By default, ONTAP runs reallocation scans on vol0, meaning it attempts to optimize the layout of blocks on the volume to maximize performance.  As a background process, ONTAP can do this on the fly on an automated schedule.  Sometimes, though, the reallocation never finishes within its allotted window, or simply gets preempted.  The solution is to run the reallocation manually, preferably during off-peak hours.  It is non-disruptive, but it does add some overhead while it runs.  On the misbehaving node, run:

cluster::> set -priv diag
cluster::*> system node run -node sadnode-01
sadnode-01> reallocate start -o -p /vol/vol0

This will perform the reallocation and should take care of the hot spindle problem.  In my real-world example, latency dropped to <10ms and vol0 utilization returned to a typical 5-15%, depending on cluster workload.  I still suspect there is a memory shortfall, and perhaps a problem in the underlying swap/paging configuration of ONTAP.  Further investigation is warranted, but for the time being, remember this if you ever run into similar issues.
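
Two read-only checks I find handy while the scan runs, both from the same nodeshell session (again, syntax from memory, so verify on your release):

sadnode-01> reallocate status -v /vol/vol0
sadnode-01> sysstat -x 1

reallocate status shows whether the scan is still running, and sysstat -x prints a disk utilization column every second, so you can watch the hot vol0 spindles cool off as the job completes.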
