One of the other talks at the meetup, Stop Using Nagios by Andy Sykes from Forward3D (@supersheep), got me thinking about using Salt as the core component of a distributed monitoring system. I believe it fits the mould very well:
- It has an established, secure and, most importantly, fast master-minion setup
- It has built-in scheduling capability, both for the master and separately for the minions
- Its Returners already have built-in support for piping whatever comes back from your minion status checks into Graphite for graphing, and into MySQL/Postgres/SQLite or Cassandra/Mongo/Redis/Couch for storage, trending etc.
- It can act on events on the minions using its Events and Reactor systems
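To make the idea a bit more concrete, here is a minimal sketch of what a check result could look like before a returner ships it off to Graphite. None of this is Salt's own API: the function names, the `monitoring` prefix and the result dictionary layout are all made up for illustration. The only outside conventions it leans on are the Nagios-style status codes (0 OK, 1 WARNING, 2 CRITICAL) and Graphite's plaintext line format of `path value timestamp`.

```python
import time

# Nagios-style status codes, borrowed here only as a familiar convention.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(value, warn, crit):
    """Map a measured value onto a status code using simple upper thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_result(name, value, warn, crit, now=None):
    """Build a hypothetical result dictionary for one check on one minion."""
    timestamp = int(now if now is not None else time.time())
    return {
        "check": name,
        "value": value,
        "status": evaluate(value, warn, crit),
        "timestamp": timestamp,
    }


def to_graphite_line(minion_id, result, prefix="monitoring"):
    """Format a result as one line of Graphite's plaintext protocol.

    Dots in the minion id are replaced so that a hostname does not get
    split into several levels of the metric path.
    """
    safe_id = minion_id.replace(".", "_")
    return "{}.{}.{} {} {}\n".format(
        prefix, safe_id, result["check"], result["value"], result["timestamp"]
    )


if __name__ == "__main__":
    result = check_result("load1", 3.5, warn=2.0, crit=4.0)
    print(result)
    print(to_graphite_line("web1.example.com", result), end="")
```

In a real setup the functions would presumably live in a custom execution module on the minions, be triggered by the scheduler, and have their return value handed to whichever returner is configured, but I have not wired that part up here.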
Some other interesting work has already been done on prototype Salt modules that run NRPE checks on minions.
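I have not looked closely at how those prototypes are put together, so the following is not their code. It is only a rough sketch of the general technique: run a Nagios-style plugin as a subprocess, map its exit code onto a status, and split the first line of output into the human-readable text and the performance data after the `|`. The `run_plugin` name and the shape of the returned dictionary are my own assumptions.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, timeout=10):
    """Run a Nagios-style plugin and return its result as a dictionary.

    argv is the command as a list, e.g. a plugin path plus its arguments.
    Anything that stops the plugin from running is reported as UNKNOWN.
    """
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return {"code": 3, "status": "UNKNOWN", "output": str(exc), "perfdata": ""}

    # Only the first line of output is considered in this sketch.
    lines = proc.stdout.strip().splitlines()
    first_line = lines[0] if lines else ""
    text, _, perfdata = first_line.partition("|")

    # Any exit code outside 0-3 is treated as UNKNOWN.
    code = proc.returncode if proc.returncode in STATUS_NAMES else 3
    return {
        "code": code,
        "status": STATUS_NAMES[code],
        "output": text.strip(),
        "perfdata": perfdata.strip(),
    }


if __name__ == "__main__":
    # Stand-in for a real plugin: a tiny Python one-liner that behaves like one.
    fake_plugin = [
        sys.executable,
        "-c",
        "import sys; print('DISK WARNING - 85% used | used=85%'); sys.exit(1)",
    ]
    print(run_plugin(fake_plugin))
```

Returning a plain dictionary keeps the result easy to pass on to a returner or to fire as an event for the Reactor to pick up.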
And there are plenty of Graphite Dashboards that could be co-opted and built upon to provide other views of check data, not to mention Salt’s experimental Halite which may have possibilities as another UI facet.
I’ve started doing some testing of my own on this, but I’d be very interested in feedback.