Tuesday, April 28, 2009

Distributed monitoring with Zabbix - Part 1

After Red Hat announced their plans to discontinue their monitoring product, Red Hat Command Center, I was on the hunt for a good replacement. Our goals for a monitoring product were quite straightforward:
  1. Free and Open Source. Most commercial products charge per IP, monitor, agent etc. and that could quickly become very expensive for us since we are planning to monitor many customers down the road.
  2. Agent-less monitoring. Our primary focus is on SNMP-based monitoring, with agent-based monitoring used only as a last resort. SNMP is the least intrusive way to monitor a system, and it is a mature technology supported by many development platforms.
  3. Scalable solution. We want the product to be able to perform distributed monitoring (master-slaves).
  4. Web-based interface. Dealing with thick clients is a hassle, whether for the management software or the agent software.
  5. More than up-down monitoring. We want to collect historical usage data for reporting and convenient analysis of trends.
  6. Custom reports.
With these goals in mind, I started to scour the web for various monitoring solutions. Fortunately, my search was made easy by a wonderful comparison matrix of monitoring products on Wikipedia.

We tried Zenoss, Groundwork community, and ManageEngine OpManager, and we already had experience with Nagios, OpenNMS, WhatsUp Gold, and ProactiveNet. For the sake of brevity, I am not going to explain why we did not choose any of the above-mentioned products. I should mention, though, that we really liked ManageEngine's OpManager 8; however, their per-monitor license fee was a bit hard to swallow. Finally, we hesitantly settled on Zabbix because it met all our goals and offered a few other goodies, such as:
  1. Server, proxy and agent daemons run unprivileged.
  2. Ease of custom template creation.
  3. Ability to create nice custom graphs.
  4. Ability to graph just about any monitored numerical value.
  5. LDAP authentication for web interface.
Zabbix does have plenty of flaws, and it is by no means a perfect product.
  1. Although the Zabbix Manual is over 300 pages long, it is not very helpful. There is a lot of room for improvement when it comes to documentation pertaining to doing more advanced monitoring with Zabbix.
  2. The PHP web interface is not very intuitive. Considering this is the primary interface for interacting with the monitoring solution, it is also the primary source of much of my frustration. Hopefully, I will touch on many of these frustrations in my future posts.
  3. The Zabbix forum is not very helpful. I am not sure if this is due to the size of the community, my phrasing of the questions or something else.
My goal with this and the following blog entries is to document how to set up distributed monitoring with Zabbix. I will try to point out what I consider to be flaws or unintuitive features, and how the Zabbix developers could possibly improve them.

