Friday 20 November 2009

Zenoss: Simple deduplication of events for clustered filesystems

Imagine you have a huge cluster filesystem (e.g. gpfs or gfs) which is mounted on a thousand nodes.
All nodes (hosts) are monitored in Zenoss.
You want to monitor this filesystem (which appears the same on all nodes).
By default, every node will create its own events/alerts for this filesystem.
You might end up with a thousand emails about the same gpfs filesystem mounted on a thousand nodes.

If, however, you choose to monitor this filesystem on only one node, what happens when that node goes down, even though the filesystem is OK?... No monitoring.

To mitigate this, here is a simple solution (not very smart, but it works).

In the /Events/Perf/Filesystem event class, go to More--->Transform
and add this code:

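# Give every node's event for a clustered (gpfs) filesystem the same dedupid
comp = evt.component or ""   # guard against events with no component set
sev = str(evt.severity)
if "gpfs" in comp:
    # "cluster" takes the place of the device name, so all nodes share one key
    evt.dedupid = "cluster|"+comp+"|/Perf/Filesystem|usedBlocks_usedBlocks|high disk usage|"+sev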

This assumes that your filesystems follow a particular naming convention, e.g. in my case gpfs_data, gpfs_finance, gpfs_images etc.
The approach is to use a single unique dedupid for each clustered filesystem. Only the first node to notice the event will alert.
However, it would need a CLEAR from this particular node for the event to be cleared.
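
To see the effect outside Zenoss, here is a rough sketch that applies the same rule as the transform above to simulated events. The Event class, the apply_transform and count_distinct_alerts helpers, the node names, and the simplified per-device default dedupid are all made up for illustration; they are not part of the Zenoss API, and real Zenoss builds its default dedupid from more fields than shown here.

```python
class Event(object):
    """Hypothetical stand-in for the evt object Zenoss hands to a transform."""

    def __init__(self, device, component, severity):
        self.device = device
        self.component = component
        self.severity = severity
        # Assumed, simplified default: the device name is part of the key,
        # so every node gets its own dedupid for the same filesystem.
        self.dedupid = "|".join(
            [device, component, "/Perf/Filesystem", str(severity)])


def apply_transform(evt):
    """Same rule as the transform, wrapped in a function so it can be tested."""
    comp = evt.component or ""
    sev = str(evt.severity)
    if "gpfs" in comp:
        evt.dedupid = ("cluster|" + comp +
                       "|/Perf/Filesystem|usedBlocks_usedBlocks|high disk usage|" + sev)
    return evt


def count_distinct_alerts(events):
    """Events sharing a dedupid collapse into one, so count the distinct ids."""
    return len(set(apply_transform(e).dedupid for e in events))


# A thousand hypothetical nodes, each reporting the same two filesystems.
nodes = ["node%04d" % i for i in range(1000)]
gpfs_events = [Event(n, "gpfs_data", 4) for n in nodes]
local_events = [Event(n, "/var", 4) for n in nodes]

print(count_distinct_alerts(gpfs_events))   # 1
print(count_distinct_alerts(local_events))  # 1000
```

Note that the severity is part of the dedupid, so the same filesystem at a different severity still produces a separate event, and filesystems that do not match the gpfs naming convention keep their per-node events.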
