A GPFS cluster has only one SNMP collector node at a time. If that node fails, SNMP monitoring does not fail over to any other node, and cluster monitoring and reporting are lost entirely.
This blog post offers a simple implementation of GPFS SNMP monitoring failover. The failover scheme uses a GPFS callback triggered by the quorumNodeLeave event, with the eventNode as its only parameter.
First, create a directory to hold your callbacks (if you already have a location for your callbacks, use that instead).
Download or copy the failover script (snmp_collector_failover.sh) into the callbacks location and make it executable.
Edit the script to list your available collector nodes, replacing "quorum" and "quorum_node_2" with the hostnames of your GPFS quorum nodes.
Note that this example uses /callback as the callbacks location; if yours differs, adjust the script and the mmaddcallback commands accordingly.
Copy the modified script to the same location on all quorum nodes.
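The following is a minimal sketch of what snmp_collector_failover.sh could look like; it is not the original script. It assumes the GPFS commands live in /usr/lpp/mmfs/bin, that `mmgetstate -N <node>` prints the word `active` for a running node, and that `mmchnode --snmp-agent -N <node>` moves the collector role to that node. The helper names (node_is_active, pick_collector, main) and the MMGETSTATE/MMCHNODE override variables are hypothetical. Rather than acting on the event node directly, it picks the first node in COLLECTOR_NODES that GPFS reports as active, so the same script can serve both the quorumNodeLeave and the quorumNodeJoin callbacks.

```shell
#!/bin/bash
# snmp_collector_failover.sh: minimal sketch of a GPFS SNMP collector failover callback.
# Called by the GPFS callback with one argument (%eventNode).
# ASSUMPTIONS (hypothetical, not from the original post):
#   - GPFS binaries are in /usr/lpp/mmfs/bin
#   - "mmgetstate -N <node>" prints the word "active" for a running node
#   - "mmchnode --snmp-agent -N <node>" moves the SNMP collector role to <node>

# Candidate collector nodes, most preferred first.
# Replace these placeholders with the hostnames of your GPFS quorum nodes.
COLLECTOR_NODES=(quorum quorum_node_2)

# Command paths; overridable from the environment (e.g. for dry runs or tests).
MMGETSTATE="${MMGETSTATE:-/usr/lpp/mmfs/bin/mmgetstate}"
MMCHNODE="${MMCHNODE:-/usr/lpp/mmfs/bin/mmchnode}"

# Succeeds if GPFS reports the given node as "active".
node_is_active() {
    "$MMGETSTATE" -N "$1" 2>/dev/null | grep -qw active
}

# Prints the first candidate node that GPFS reports as active; fails if none is.
pick_collector() {
    local node
    for node in "${COLLECTOR_NODES[@]}"; do
        if node_is_active "$node"; then
            printf '%s\n' "$node"
            return 0
        fi
    done
    return 1
}

main() {
    local event_node="$1" target
    if ! target="$(pick_collector)"; then
        echo "snmp_collector_failover: no active collector candidate (event node: $event_node)" >&2
        return 1
    fi
    echo "snmp_collector_failover: event on $event_node, SNMP collector -> $target" >&2
    "$MMCHNODE" --snmp-agent -N "$target"
}

if [ "$#" -ge 1 ]; then
    main "$@"
    exit $?
else
    echo "usage: $0 <eventNode>" >&2
fi
```

Running `/callback/snmp_collector_failover.sh quorum` by hand mimics what the callback does after an event on that node. One limitation of this design: if GPFS still reports a just-departed node as active when the callback fires, the collector may not move; excluding the event node on leave events would be one way to tighten it.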
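Creating the callbacks directory, copying the script, and making it executable on every quorum node can be scripted with a loop like the one below. This is a sketch, assuming root has passwordless ssh/scp to each node (as GPFS administration is commonly set up) and the /callback location used in this post; the distribute_callback function, the CALLBACK_DIR/CALLBACK_SCRIPT/RUN variables, and the node names are hypothetical.

```shell
#!/bin/bash
# Sketch: push the callback script to each quorum node and make it executable.
# ASSUMPTION: passwordless root ssh/scp to every node listed on the command line.
CALLBACK_DIR="${CALLBACK_DIR:-/callback}"
CALLBACK_SCRIPT="${CALLBACK_SCRIPT:-snmp_collector_failover.sh}"
# Set RUN=echo to print the commands instead of executing them (dry run).
RUN="${RUN:-}"

distribute_callback() {
    local node
    for node in "$@"; do
        # Create the callbacks directory, copy the script, make it executable.
        $RUN ssh "$node" mkdir -p "$CALLBACK_DIR"
        $RUN scp "$CALLBACK_SCRIPT" "$node:$CALLBACK_DIR/"
        $RUN ssh "$node" chmod +x "$CALLBACK_DIR/$CALLBACK_SCRIPT"
    done
}

if [ "$#" -ge 1 ]; then
    distribute_callback "$@"
else
    echo "usage: $0 <quorum-node>..." >&2
fi
```

Saved as, say, distribute_callback.sh, `RUN=echo ./distribute_callback.sh quorum quorum_node_2` prints the commands for a dry run before executing them for real.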
Add the callback (run once from any quorum node):
mmaddcallback NodeDownCallback --command /callback/snmp_collector_failover.sh --event quorumNodeLeave --parms %eventNode
If you want monitoring to revert to the default preferred collector node after it comes back online, consider also adding a node join callback:
mmaddcallback NodeJoinCallback --command /callback/snmp_collector_failover.sh --event quorumNodeJoin --parms %eventNode