Thursday 5 July 2012

GPFS Highly Available (HA) SNMP monitoring configuration

The documented GPFS SNMP implementation (by IBM) is not designed to be highly available.
There is only one SNMP collector node at a time. If that node fails, SNMP monitoring does not fail over to another node, which results in a complete loss of cluster monitoring/reporting.

This blog post offers a simple implementation of GPFS SNMP monitoring failover. The failover scheme uses a callback triggered by a quorumNodeLeave event, with eventNode as its only parameter.

First, create a folder to contain your callbacks (if you already have a location for your callbacks, use that instead).
Download/copy the following script into the callbacks location and make it executable.


Modify the script to indicate your available collector nodes by substituting "quorum" and "quorum_node_2" with the hostnames of your GPFS quorum nodes.

Note that in this case the callbacks location is /callback, so you may have to modify the script accordingly.
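
For orientation, here is a minimal sketch of what a failover callback of this kind might look like. It is not the script referenced above. The hostnames quorum_node_1 and quorum_node_2 are placeholders, the function names pick_collector and failover are made up for illustration, and it assumes the collector role is assigned with mmchnode --snmp-agent from the usual /usr/lpp/mmfs/bin location. If mmchnode is not found, it only prints the command it would run.

```shell
#!/bin/sh
# Sketch only: NOT the script referenced in this post.
# Candidate collector nodes in order of preference (placeholder hostnames).
COLLECTORS="quorum_node_1 quorum_node_2"
# Usual GPFS binary location; adjust if yours differs.
MMCHNODE=/usr/lpp/mmfs/bin/mmchnode

# Print the first candidate that is not the node named in $1.
pick_collector() {
    for candidate in $COLLECTORS; do
        if [ "$candidate" != "$1" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Assign the collector role to the chosen node, or print the command
# when mmchnode is not installed (dry run).
# Assumption: depending on your GPFS level, the old collector may first
# need to be unassigned with "mmchnode --nosnmp-agent -N <old node>".
failover() {
    target=$(pick_collector "$1") || return 1
    if [ -x "$MMCHNODE" ]; then
        "$MMCHNODE" --snmp-agent -N "$target"
    else
        echo "would run: $MMCHNODE --snmp-agent -N $target"
    fi
}

# mmaddcallback passes %eventNode as the first argument.
if [ -n "${1:-}" ]; then
    failover "$1"
fi
```

This sketch only covers the node-leave case: it moves the collector to the first listed node that is not the one that left. Reverting to the preferred node on a join event would need additional logic (for example, checking which candidates are currently active), which is left out here.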

Copy the modified script to all quorum nodes.

Add the callback (run once from any quorum node):
mmaddcallback NodeDownCallback --command  /callback/snmp_collector_failover.sh --event quorumNodeLeave --parms %eventNode

If you want monitoring to revert to the default preferred collector node after it comes back online, you may consider adding a node join callback:
mmaddcallback NodeJoinCallback --command  /callback/snmp_collector_failover.sh --event quorumNodeJoin --parms %eventNode
