Friday, August 4, 2017

Simple Disaster Recovery with NetApp & ONTAP

So, The Building Fell Down

 

Planning Means Practicing To Fail

First off, no buildings were harmed in the production of this blog post.  But fires, earthquakes, floods, and security breaches can and do bring down businesses.  When the unthinkable happens, you had better have a solid DRP (disaster recovery plan) or BCP (business continuity plan) in place.  More than that, you'd better have a playbook for this plan, and you must know that it works.  All the plans in the world are potentially useless if they are flawed, untested, or unverified.

Moving A Lot of Bytes Around

A very smart person I know views the world as data.  To him, everything is data in some way, and I don't think that's an unreasonable outlook.  The point is, businesses need people and processes, but they also need their data to allow the people to interact with the processes.  If a building falls down, say after hours, you may know all the processes and you may save all the people.  What good is that to the company if there is no data to work with?  You need copies offsite, and more than just something on tertiary storage (tape).  For rapid recovery, you need hot copies of data on secondary storage (disk), ready to go at a moment's notice.  One way to do this is with ONTAP and Volume SnapMirror (VSM).

Because usage varies so much from shop to shop, I won't go into exhaustive examples here.  Suffice it to say that using VSM to mirror critical data to another site is the heart of your recovery strategy as a storage/security person.

Often this data is in a database, and VSM'ing a hot database without quiescing it can leave you with either 1) an unrecoverable dataset, or 2) the need to replay transaction logs to fix the database.  The first one means you're automatically going to tape for a long and painful recovery (bad).  The second means you might or might not be going to tape, but recovery is going to take longer than anticipated.  Executives like milestones, and each DRP or BCP has them.  The sooner you reach the milestones and get running again, the sooner E-staff can take the straw out of the Pepto-Bismol bottle.


Some Like It Hot


Because VSM relies on automated snapshots, you want the process to get a good, quiesced view of the dataset.  When VSM'ing something like an Oracle 12 dataset, be sure your scripts first put the database into hot backup mode.  Your DBA can help you with this, if necessary.  Then, kick off your VSM job, place the DB back into normal or online mode, and let VSM do the rest in the background.  
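As a rough sketch (not a drop-in script), the sequence might look like the following.  The cluster name, SVM, and volume here are hypothetical, and I'm assuming the database runs in ARCHIVELOG mode, with local sysdba access on the host and key-based SSH to the destination cluster:

#!/bin/sh
# Quiesce the database so the SnapMirror reference snapshot
# captures a consistent image (requires ARCHIVELOG mode).
sqlplus -S / as sysdba <<EOF
ALTER DATABASE BEGIN BACKUP;
EOF

# Kick off the mirror update from the destination cluster.
# A production script should confirm the transfer (and its
# reference snapshot) has actually started before moving on.
ssh admin@drcluster "snapmirror update -destination-path drsvm:oradata_dr"

# Return the database to normal operation; the transfer
# finishes in the background.
sqlplus -S / as sysdba <<EOF
ALTER DATABASE END BACKUP;
EOF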

This will let you get reasonably up-to-date copies of your dataset to an offsite location, with the primary facility acting as the data source and the offsite facility as the destination.  So now you're about a third of the way there.  What next?


Attach of the Clones

Assume the worst has happened and now you are going to commence operating out of a backup facility.  Your data is there, your VSMs look good, what do you do?  Most would say "break the VSM relationship from the destination side, map any LUNs to the appropriate FCP (or iSCSI) initiators, present them to the server, and go."

Not so fast.  Remember that the VSM you have on the destination side may represent the very last known good copy of your data.  Breaking the VSM relationship will make those volumes read-write, meaning they will never be in that state again (assume snapshots can get deleted for any number of reasons).  In other words, that last VSM update is a point in time that you may need to preserve for legal, financial, or regulatory reasons.  Instead, clone the VSM destination to a NEW volume AND split the clone off from its parent.  In other words, make a copy of your data and work off that.  If anything goes wrong in the DR plan and you corrupt some data, you only corrupted a copy, not the master.

Sure, the split operation will add time to the DR plan, so factor that in ahead of time.  Smart executives (a few of them do exist) will realize the benefit of taking the time to clone the data, and the folks in legal and HR will almost always insist upon it.  
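In CDOT terms, the clone-and-split looks something like this (a FlexClone license is assumed, and the vserver, volume, and snapshot names are hypothetical stand-ins for your own):

volume clone create -vserver drsvm -flexclone crit_data_work -parent-volume crit_data_dr -parent-snapshot <last_vsm_snapshot>
volume clone split start -vserver drsvm -flexclone crit_data_work

Once the split completes, crit_data_work is a fully independent read-write volume you can map and serve, while crit_data_dr remains your untouched point-in-time copy.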


So How Do I...?

I'll make this a basic hit list for your reference.

1.  Create a failover group for each node that will host an intercluster LIF (these LIFs can only fail over to ports on their home node):

network interface failover-groups create -failover-group dr-icl-01 -node cluster1-1 -port e0c
network interface failover-groups create -failover-group dr-icl-01 -node cluster1-1 -port e0d
network interface failover-groups create -failover-group dr-icl-02 -node cluster1-2 -port e0c
network interface failover-groups create -failover-group dr-icl-02 -node cluster1-2 -port e0d
 
2.  Create intercluster LIFs on each node:

network interface create -vserver cluster1-1 -lif cluster1-1_icl1 -role intercluster -home-node cluster1-1 -home-port e0c -address 10.1.1.90 -netmask 255.255.255.0 -failover-group dr-icl-01 -failover-policy nextavail

network interface create -vserver cluster1-2 -lif cluster1-2_icl1 -role intercluster -home-node cluster1-2 -home-port e0c -address 10.1.1.91 -netmask 255.255.255.0 -failover-group dr-icl-02 -failover-policy nextavail

3.  Create the peer relationship between the production cluster and the DR cluster:

cluster1::> cluster peer create -peer-addrs <remote_ip_of_peer_intercluster_LIF> -username <UID>

cluster2::> cluster peer create -peer-addrs <remote_ip_of_peer_intercluster_LIF> -username <UID> 


4.  Create your SnapMirror relationships like you normally would on the destination:

snapmirror create -S <src_path> -destination-path <dest_path> -type DP -vserver <managing_vserver>

snapmirror initialize -S <src_path> -destination-path <dest_path>
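
One more suggestion: once the baseline transfer completes, attach a schedule so updates run without your help.  Assuming an 'hourly' cron-style schedule already exists on the cluster, something like:

snapmirror modify -destination-path <dest_path> -schedule hourly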


I generally use the 'admin' user as the remote authenticating ID, though any user with the proper role configuration can be used.  Because I care about data integrity, I take this a step further with a secure cluster peer policy.  To view the existing policy:

cluster1::> cluster peer policy show
Is Unauthenticated Cluster Peer Communication Permitted:  false
                        Minimum Length for a Passphrase:  8

cluster1::> 

 
If unauthenticated communication is permitted, use cluster peer policy modify to change this to 'false'.  In step 3 you will then be prompted for a passphrase.  Use this on the destination side of the configuration to peer with the source, and then do the same on the source side to peer with the destination. 
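
The modify looks like this - double-check the parameter name against your release:

cluster1::> cluster peer policy modify -is-unauthenticated-access-permitted false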

All this assumes that routing, routing-groups, and name resolution are set up and in place.  This also assumes we're going from CDOT to CDOT.  For those of you who may have 7-Mode in production, and access to a remote CDOT environment, you can still pull this off.  The 'type' will be TDP (transitional data protection), and a big caveat is that you cannot reverse the SnapMirror/VSM relationship to return data from a CDOT environment to a 7-Mode environment.  You have the benefit of DR capability, but it is truly a one-way street.

Wednesday, August 2, 2017

Some Basic Red Hat 7 Password Hardening

Build It Right The First Time

Today we'll look briefly at some strategies and considerations for hardening RHEL7 instances, be they physical or virtual.  A general security strategy focuses on two primary areas - the physical and the technical.  If we deploy systems with good security in the first place, we can avoid 'fire drill' exercises and reactive behavior.  Put another way, an ounce of prevention is worth a pound of cure.

Lock It Up

Most IT professionals will not have much say over many aspects of physical security.  Usually facilities staff handles card locks, access keys, power into the datacenter (though power inside the datacenter is another matter!), doors, windows, and other access controls.  This will be the focus of another article, but for the time being, keep your equipment locked up.  Closets, insecure offices, and cubicles are no place for your critical server infrastructure!

From The Top

Security can be viewed as a bottom-to-top or top-to-bottom process.  Whatever it is for you, the goal is the same - minimize risk to the business, employees, vendors, and customers.  That said, here is a brief overview of areas to consider.

A Dumb Thing To Do?

A month ago, I was installing some RHEL 7 instances as VMs - pretty routine and boring stuff - and noticed there was a way to disable shadow passwords.  I haven't seen a sane Unix or Unix-alike OS since about 1989 that 1) did not mandate the use of /etc/shadow, or 2) gave you an install-time option to circumvent /etc/shadow.  Maybe this has been an industry-wide option for all these years, but it certainly is not an option I would ever have used.

If you disable shadowing, your one-way password hashes will be stored in /etc/passwd, a world-readable file.  All the extra cozy, fuzzy security you get from SHA-512 hashed passwords is diminished by making those hashes available to any user on the system.   That would be a dumb thing to do, so don't do it.  
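
If you ever inherit a box where someone did exactly that, the fix is quick - pwconv moves any exposed hashes out of /etc/passwd and back into /etc/shadow:

# Spot any non-shadowed entries (the second field should be 'x'):
grep -v '^[^:]*:x:' /etc/passwd

# Move exposed hashes back into /etc/shadow:
pwconv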

Complex is Hard, So Make It Hard

A system administrator should also enforce the use of quality passwords for user accounts.  My investigation of fresh RHEL7 installs shows that quality checks are not necessarily enabled by default.  So, scurry off and check /etc/pam.d/passwd for the following line, and if missing, add it:

password required pam_pwquality.so retry=3

Then, in /etc/security/pwquality.conf, have the following at a minimum:

minlen = 8
minclass = 4
maxsequence = 3
maxrepeat = 2

This requires a password of at least 8 characters, inclusive of all 4 character classes, rejects monotonic sequences longer than 3 characters, i.e. '1234' or 'abcd', and rejects more than 2 identical consecutive characters, i.e. 'lll' or 'mmm'.  Note that for maxsequence and maxrepeat a value of 0 disables the check entirely rather than forbidding everything, so 'stricter' means a lower non-zero number.  Leaving maxrepeat at 2 gives users some leeway to build memorable passwords around double-consonant words.  I prefer to keep minlen at 10 and maxrepeat at 1, but your constraints may lead to different needs.
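
To sanity-check your settings, libpwquality ships a small utility called pwscore that reads a candidate password on stdin and either prints a quality score from 0 to 100 or explains why the password was rejected:

echo 'abcd1234' | pwscore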

Like a Fine Wine

This part is easy - make sure password aging is enabled.  Since I sometimes need to have different ages for different account types, each is handled (via scripts) on a case-by-case basis.  For the simplest aging setup, simply run:

chage -M 90 <user> 

This will force the user to pick a new password every 90 days, which is generally considered 'secure enough' in most enterprises.  Of course it would be even more secure to perform authentication from a central store like RHEL IdM or a flavor of LDAP, but that is a post for another time!
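
One caveat: chage only touches existing accounts.  For accounts created from here on out, set the defaults in /etc/login.defs as well.  A minimal sketch:

# /etc/login.defs
PASS_MAX_DAYS   90
PASS_MIN_DAYS   1
PASS_WARN_AGE   7

You can verify any individual account afterward with chage -l <user>.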

Lock Them Out

Intruders are a persistent bunch, and sometimes they'll resort to brute force password guessing.  To intercept and deal with this behavior, make sure you configure the system to lock out user accounts after a given number of failed authentication attempts.  

Add the following lines to the auth sections of /etc/pam.d/system-auth and /etc/pam.d/password-auth:

auth required pam_faillock.so preauth silent audit deny=3 unlock_time=1200

auth [default=die] pam_faillock.so authfail audit deny=3 unlock_time=1200

and add

account required pam_faillock.so

to the account section of the above files.  
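
Note that order matters with pam_faillock - the preauth line must come before pam_unix.so and the authfail line after it.  On a stock RHEL7 system-auth, the finished auth section ends up looking roughly like this (your file may carry additional modules):

auth        required      pam_env.so
auth        required      pam_faillock.so preauth silent audit deny=3 unlock_time=1200
auth        sufficient    pam_unix.so nullok try_first_pass
auth        [default=die] pam_faillock.so authfail audit deny=3 unlock_time=1200
auth        required      pam_deny.so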
 
This will give a user 3 tries at their password; after the third consecutive failure, the account is locked for 20 minutes (1200 seconds).  I have found this time is long enough for a legitimate user to call in for assistance, which then provides an opportunity to verify the user's identity and perhaps have a little educational chat with them about system security.

To see who is locked out, the access method, and when the lockout timer started, simply run:

faillock 

I like to poll this data via cron on busy systems and generate reports to find problem users or to identify targeted users.  Knowing who the bad guy is targeting can help you take additional steps to mitigate threats going forward.   
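
Once you've verified a locked-out user's identity (and had that chat), you can clear the lock by hand.  And for the cron polling I mentioned, a simple entry does the trick - the user name and report path here are just examples:

faillock --user jsmith --reset

0 * * * * /usr/sbin/faillock >> /var/log/faillock_report 2>&1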

There are many more things you can do to harden just the password subsystem of RHEL7, but I'll end here.  Consider this an introduction to some tips, strategies, configurations, and techniques you may want to use in your own physical or virtual RHEL 7 environment.



NetApp Clustered Data ONTAP Oddities

Roasted Spindles

Recently I've been working on an issue with poor performance on a CDOT pair of FAS3240s supporting a VMware environment and running NetApp Release 8.3.2P5.  VMware commentary aside, investigation revealed that the current master node in the pair had a root volume seeing disk utilization of >80% at most hours of the day.  While not directly a security issue, this cluster has VMs dealing with ~1TB of daily log aggregation, notifications, feeding data to Splunk, forwarding data offsite, and a host of other things.  Logs matter, and you want them handled appropriately.

The load mix did not seem to impact the disk utilization numbers, and vol0 was your typical small-installation 3-disk setup.  Wondering why the utilization was so high, I resorted to poking my head into the systemshell on the problematic node and gathering some data with vmstat -s:

sadnode-01% vmstat -s
2384195110 cpu context switches
2722466877 device interrupts
3104969435 software interrupts
1867337226 traps
4090014302 system calls
     63344 kernel threads created
   3827554 fork() calls
   1088783 vfork() calls
         0 rfork() calls
    111503 swap pager pageins
    268359 swap pager pages paged in
     90417 swap pager pageouts
    271672 swap pager pages paged out
    744212 vnode pager pageins
   1955211 vnode pager pages paged in
         0 vnode pager pageouts
         0 vnode pager pages paged out
    469164 page daemon wakeups
458176247 pages examined by the page daemon
 
Compared to another cluster, the pageins and pageouts certainly seemed excessive, as well as the work the page daemon was doing:

happynode-01% vmstat -s
2061309073 cpu context switches
3391879346 device interrupts
2611757802 software interrupts
3300814929 traps
3599776707 system calls
   343228 kernel threads created
  21972759 fork() calls
  9041120 vfork() calls
        0 rfork() calls
     2712 swap pager pageins
     9542 swap pager pages paged in
     2968 swap pager pageouts
    13830 swap pager pages paged out
    55276 vnode pager pageins
   322421 vnode pager pages paged in
        0 vnode pager pageouts
        0 vnode pager pages paged out
    17243 page daemon wakeups
458176247 pages examined by the page daemon


Since ONTAP is a highly specialized BSD variant, and since I know a little something about Unix, I started to suspect a memory shortfall on sadnode-01, leading to excessive page scanning and paging activity, which in turn would tend to bump up the utilization numbers.  In other words, a classic case of Unix memory pressure.

However, NetApp offers no method (that I know of) for tuning the VM subsystem either from the systemshell or from ONTAP, and any modifications you might make will cause NetApp support to at least raise an eyebrow.

Seemingly unrelated at first, I also noticed from perfstat and autosupport logs that vol0 was suffering from a moderate amount of block fragmentation.  Latency on vol0 was not excessive, but it was notably higher (>28ms) than it should have been on a typical FAS3240 root volume.  So while there may have been a memory shortfall, there was also a structural inefficiency hampering the regular paging mechanisms' ability to cope with that shortfall.

By default, ONTAP runs reallocation scans on vol0, which means it attempts to optimize the layout of blocks on vol0 to maximize performance.  As a background process, ONTAP is able to do this on the fly, by automated scheduling.  Sometimes, reallocation never finishes within the allotted time, or simply gets preempted.  The solution is to run the reallocate manually, preferably during off-peak hours.  It is non-disruptive, but it does add some overhead.  On a misbehaving node, run:

cluster::> set -priv diag
cluster::*> system node run -node sadnode-01
sadnode-01> reallocate start -o -p /vol/vol0

 This will perform the reallocation, and should take care of the hot spindle problem.  In my real-world example, latency dropped to <10ms and vol0 utilization returned to a typical 5-15%, depending on cluster workload.  I still suspect there is a memory shortfall issue, and perhaps a problem in the underlying swap/paging configuration of ONTAP.  Further investigation is warranted, but for the time being, remember this if you ever run into similar issues.
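
If you want to watch the scan while it runs, there is a status subcommand available from the same nodeshell; -v gives verbose, per-volume detail:

sadnode-01> reallocate status -v /vol/vol0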