During my time here at Ning I have had the opportunity to work on many different projects, both open and closed source. The one constant has been our ability to provide stable and useful open source code. Thanks to the likes of Thomas Dudziak, Brian McCallister, and Gerir, Ning has proven able to remain on the cutting edge of technology without losing customer focus.
Once again Ning is continuing this trend by releasing Theia, our most recent open source project. Theia provides a free alternative for monitoring NetApp utilization, including CPU load, disk usage, IOPS, interface utilization, and more. Theia can also send passive health checks to Nagios via the NSCA add-on.
So where do I come in? Well, that’s an easy question to answer with a long explanation. Gerir wrote Theia as an internal tool for gathering statistical data from our production NetApps. He is an architecty kinda guy and we keep him very busy, so unfortunately he doesn’t have as much time to work on Theia as we would like. This is where I come in. After some conversations with Gerir about Theia, we decided that she needed some attention, and that after a few revisions it would be time to introduce Theia to the open source community through http://github.com/ning
Getting ready for open source!
The main thing holding us back from open sourcing Theia was the lack of interface autodiscovery. This was crucial because without it you would have to create a copy of Theia for every NetApp you planned to monitor. As part of the metrics collection we parse network interface statistics, and previously this required you to hard-code the name of the NetApp interface into Theia. From an operations perspective this would be a nightmare to manage if you had more than one NetApp, or if the interfaces ever changed. On top of the management nightmare, Theia uses the keys of a Python dictionary for text formatting, and dictionary keys must be unique, so a single key per metric would prevent Theia from ever displaying statistics for multiple network interfaces.
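The key-uniqueness problem is easy to reproduce in isolation. The sketch below is illustrative only: the key name `NetRE` and the interface names are placeholders chosen for the example, not Theia's actual pre-autodiscovery code. Assigning two interfaces to the same key silently leaves only the last one in the dictionary.

```python
# Hypothetical metrics dictionary keyed only by metric name.
metrics = {}

# Register the receive-error counter for two interfaces under one key.
metrics["NetRE"] = "ifnet:e0b:recv_errors"
metrics["NetRE"] = "ifnet:e0d:recv_errors"  # replaces the e0b entry

# Only one entry survives, so e0b would never be reported.
surviving = list(metrics.items())
```

No error is raised when the second assignment happens, which is why the collision would be easy to miss in practice.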
I overcame these obstacles using only modules that Theia was already importing, to limit overhead and keep things tidy. So first things first: we need to “discover” the active interfaces on the NetApp. There are a few built-in tools that provide this functionality, but they don’t appear to have been written with automation in mind.
By using the Python subprocess library we can log in to the NetApp and collect some data, as shown below:
cmd = "/usr/bin/ssh -o BatchMode=yes -2 -ax -i /root/.ssh/perfstat_dsa %s 'stats show ifnet'|sed 's/:/ /g'|sort|awk '{print $2}'|uniq|tr -d '\r'" % (hostname)
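If we break down this command, we can see that we are establishing an SSH connection to the NetApp and running ‘stats show ifnet’, a command exposed by ONTAP 7.3. The rest of the pipeline does the text formatting: sed replaces the colons with spaces, sort orders the lines, awk prints the second field (the interface name), uniq collapses the repeats, and tr strips carriage returns. That leaves one interface name per line, which we can load into a Python list. The raw output of ‘stats show ifnet’ looks like this: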
NetApp> stats show ifnet
ifnet:svif0:recv_packets:22510/s
ifnet:svif0:recv_errors:0/s
ifnet:svif0:send_packets:41798/s
ifnet:svif0:send_errors:0/s
ifnet:svif0:collisions:0/s
ifnet:svif0:recv_data:1969198b/s
ifnet:svif0:send_data:61504464b/s
ifnet:svif0:recv_mcasts:0/s
ifnet:svif0:send_mcasts:0/s
ifnet:svif0:recv_drop_packets:0/s
NetApp*>
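For readers who find the shell pipeline hard to follow, the same extraction can be sketched in plain Python. This is not how Theia does it (Theia uses the sed/sort/awk/uniq pipeline shown earlier). The function name `interfaces_from_stats` is made up for this illustration, the `e0b` line in the sample is invented to show de-duplication across interfaces, and the code assumes every counter line has the `ifnet:<interface>:<counter>:<value>` shape seen in the output above.

```python
def interfaces_from_stats(output):
    """Return the sorted, unique interface names found in 'stats show ifnet' output."""
    names = set()
    for line in output.splitlines():
        fields = line.strip().split(":")
        # Counter lines look like ifnet:<interface>:<counter>:<value>;
        # prompts and blank lines do not, so they are skipped.
        if len(fields) >= 4 and fields[0] == "ifnet":
            names.add(fields[1])
    return sorted(names)


# Sample input: two svif0 counters, one invented e0b counter, and a prompt.
sample = (
    "ifnet:svif0:recv_packets:22510/s\r\n"
    "ifnet:svif0:recv_errors:0/s\r\n"
    "ifnet:e0b:recv_packets:12/s\r\n"
    "NetApp*>\r\n"
)
interfaces = interfaces_from_stats(sample)  # ['e0b', 'svif0']
```

Using a set takes the place of `sort | uniq`, and `splitlines()` handles the carriage returns that `tr -d '\r'` removes in the shell version.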
Code:
cmd = "/usr/bin/ssh -o BatchMode=yes -2 -ax -i /root/.ssh/perfstat_dsa %s 'stats show ifnet'|sed 's/:/ /g'|sort|awk '{print $2}'|uniq|tr -d '\r'" % (hostname)
p = subprocess.Popen(cmd,shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
so = p.communicate()[0]
Note that p.communicate() returns a tuple of (stdout, stderr), so we keep only stdout by adding [0] at the end of our p.communicate() call.
Now that we have the names of the interfaces, we can collect data for each individual interface by iterating over our Python list. This is done with a basic for loop, as shown below.
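To see the tuple that `communicate()` returns without needing a NetApp, here is a small sketch that runs a child Python process in place of the SSH command. The child script and the interface names it prints are stand-ins invented for the demonstration. It is written for Python 3, where `universal_newlines=True` makes the output arrive as text rather than bytes.

```python
import subprocess
import sys

# Stand-in for the SSH command: a child process that writes two interface
# names to stdout and a warning to stderr.
child = (
    "import sys; "
    "sys.stdout.write('e0b\\ne0d\\n'); "
    "sys.stderr.write('warning: demo\\n')"
)
p = subprocess.Popen(
    [sys.executable, "-c", child],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,  # decode output to str instead of bytes
)

# communicate() waits for the process and returns a (stdout, stderr) tuple.
so, se = p.communicate()
```

Unpacking both values, rather than indexing with `[0]`, keeps stderr available in case the remote command fails and you want to log the reason.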
if_list = []
if_list.append(so.rstrip('\n'))
for interface in if_list:
    vifs = []
    vifs = interface.splitlines()
As you can see in the code above, we are calling the built-in Python string methods splitlines() and rstrip(). To format our list properly we first strip the trailing newline from the command output, and then split the remaining text into the “vifs” list, one interface per entry. Here is the before and after, using print statements to compare.
if_list.append(so.rstrip('\n'))
['e0b\ne0d\ne0e\ne0f\nvif0']
for interface in if_list:
    vifs = []
    vifs = interface.splitlines()
['e0b', 'e0d', 'e0e', 'e0f', 'vif0']
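Because `so` is a single string, the list-and-loop step can also be collapsed into one expression. The helper below is a sketch, not Theia's code: the name `split_interfaces` is hypothetical, and dropping blank lines is an extra safeguard that the original does not have.

```python
def split_interfaces(raw):
    """Turn newline-separated command output into a list of interface names."""
    # splitlines() already ignores the trailing newline; the filter drops
    # any blank lines left in the output.
    return [name.strip() for name in raw.splitlines() if name.strip()]


vifs = split_interfaces("e0b\ne0d\ne0e\ne0f\nvif0\n")
# vifs == ['e0b', 'e0d', 'e0e', 'e0f', 'vif0']
```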
Up to this point we have been doing prep work leading in to the working portion of the code. For Theia to iterate over our active interfaces and collect the statistics we are looking for, we need to present these values to Theia in the form of a Python dictionary. Dictionaries are pretty simple and consist of keys and values. The key specifies the name of our value. For example, if you had a key called “Food”, pizza could be your value, and it would be represented as:
mealtime = { 'Food' : "pizza" }
print mealtime
Theia uses a dictionary called xnmetrics to collect the specific data from the NetApp. Since a Python dictionary requires unique keys, we must also include the interface name as part of the key. We do this by writing another simple for loop, this time adding the data from our list to the xnmetrics dictionary.
for vif in vifs:
    xnmetrics["NetRE %s" % (vif)] = "ifnet:%s:recv_errors" % (vif)
    xnmetrics["NetSE %s" % (vif)] = "ifnet:%s:send_errors" % (vif)
Here we are doing a few things. First and most obvious is our for loop, but what’s important is the action we are performing. While iterating through our list of interfaces, we add a key and value for each interface (the vif) to our xnmetrics dictionary. By including the interface name in the key we ensure that every key is unique, and as a bonus we can also easily identify the interface by name.
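The same idea can be packaged as a function that returns only the per-interface entries, which makes the key scheme easy to check on its own. This is a sketch: `interface_metrics` is a hypothetical name, the interface names are examples, and the key and value formats are copied from the loop above.

```python
def interface_metrics(vifs):
    """Build the per-interface error counters, one unique key per interface."""
    metrics = {}
    for vif in vifs:
        metrics["NetRE %s" % vif] = "ifnet:%s:recv_errors" % vif
        metrics["NetSE %s" % vif] = "ifnet:%s:send_errors" % vif
    return metrics


# Merge the discovered entries into a (shortened) xnmetrics dictionary.
xnmetrics = {"CPU": "system:system:cpu_busy"}
xnmetrics.update(interface_metrics(["e0b", "vif0"]))
# xnmetrics now has 5 keys: CPU plus NetRE/NetSE for each interface.
```

Returning a new dictionary instead of mutating a global keeps the discovery step testable without touching the rest of the metrics table.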
I hope you find this article useful. As with all open source projects, we are looking for contributors who would like to use and expand upon Theia, so if you are interested, head over to the Ning GitHub at https://github.com/ning/Theia to get started!
autodiscover.py – code integrated into Theia for interface autodiscovery
#!/usr/bin/env python
import subprocess
import sys

# xnmetrics dictionary (Original Theia code)
xnmetrics = {
    'APB' : "system:system:avg_processor_busy",
    'CIFSO' : None,
    'CPU' : "system:system:cpu_busy",
    'CPtime' : "system:system:cp_time",
    'CacheAge' : "system:system:cache_age",
    'CacheHit' : "system:system:cache_hit",
    'DiskRead' : "system:system:disk_data_read",
    'DiskUtil' : "system:system:disk_busy",
    'DiskWrite' : "system:system:disk_data_written",
    'HTTPO' : None,
    'NFSO' : "nfsv3:nfs:nfsv3_ops",
    'NFSRL' : "nfsv3:nfs:nfsv3_read_latency",
    'NFSRO' : "nfsv3:nfs:nfsv3_read_ops",
    'NFSWL' : "nfsv3:nfs:nfsv3_write_latency",
    'NFSWO' : "nfsv3:nfs:nfsv3_write_ops",
    'NetIn' : "system:system:net_data_recv",
    'NetOut' : "system:system:net_data_sent",
    'TPB' : "system:system:total_processor_busy",
    'TapeRead' : None,
    'TapeWrite' : None,
    'TotalO' : "system:system:total_ops",
    'cpu0' : "processor:processor0:processor_busy",
    'cpu1' : "processor:processor1:processor_busy",
    'cpu2' : "processor:processor2:processor_busy",
    'cpu3' : "processor:processor3:processor_busy"
}

# interface autodiscovery code (Open Source Initiative)
def vifautodiscover():
    hostname = sys.argv[1]
    cmd = "/usr/bin/ssh -o BatchMode=yes -2 -ax -i /root/.ssh/perfstat_dsa %s 'stats show ifnet'|sed 's/:/ /g'|sort|awk '{print $2}'|uniq|tr -d '\r'" % (hostname)
    p = subprocess.Popen(cmd,shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
    so = p.communicate()[0]
    if_list = []
    if_list.append(so.rstrip('\n'))
    for interface in if_list:
        vifs = []
        vifs = interface.splitlines()
    for vif in vifs:
        xnmetrics["NetRE %s" % (vif)] = "ifnet:%s:recv_errors" % (vif)
        xnmetrics["NetSE %s" % (vif)] = "ifnet:%s:send_errors" % (vif)
    print xnmetrics

vifautodiscover()