Building an App to Monitor Your App
Keeping a web application up and running usually requires lots of moving parts. Since these parts all depend on each other, when the whole app goes down, the immediate question is: what broke?
I want to share with you the solution we came up with at Sqoot. Let’s dive right into the code and then talk about how we got here.
Diving Right In
We start with a simple Sinatra web app:
Then we just need a bunch of _up? methods to do the checking, like this:
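As a rough idea of what such _up? methods might look like, here is a minimal sketch using only Ruby's standard library. The URL, hosts, ports, timeout, and the port_open? helper are assumptions for illustration, not Sqoot's actual code. A bare TCP connect only proves that something is listening, so a ping through each service's own client library would be a deeper check.

```ruby
require 'net/http'
require 'socket'
require 'uri'

# Hypothetical defaults; point these at your own services.
SOLR_PING_URL  = 'http://localhost:8983/solr/admin/ping'
MEMCACHED_ADDR = ['localhost', 11211].freeze
MONGO_ADDR     = ['localhost', 27017].freeze
CHECK_TIMEOUT  = 2 # seconds

# True if something accepts a TCP connection on host:port within the timeout.
# Any error (refused, unreachable, timed out) counts as "down".
def port_open?(host, port, timeout = CHECK_TIMEOUT)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue StandardError
  false
end

# True if the (assumed) Solr ping URL answers with a 2xx response.
def solr_up?(url = SOLR_PING_URL)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             open_timeout: CHECK_TIMEOUT,
                             read_timeout: CHECK_TIMEOUT) do |http|
    http.get(uri.request_uri)
  end
  response.is_a?(Net::HTTPSuccess)
rescue StandardError
  false
end

def memcached_up?(addr = MEMCACHED_ADDR)
  port_open?(*addr)
end

def mongo_up?(addr = MONGO_ADDR)
  port_open?(*addr)
end
```

Every check swallows its own exceptions and returns a plain boolean, so a dead service shows up as false rather than as another Timeout::Error in the exception tracker.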
In addition to Solr, we also make sure that Mongo and Memcached are running.
Finally, we need a couple of endpoints for Pingdom to hit (e.g., /solr) so that we can check each service independently:
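One way to back those endpoints, sketched under a few assumptions: the monitor treats any non-2xx response as "down", and each service maps to one boolean check. The CHECKS registry, its placeholder lambdas, and the status_for helper are hypothetical names. The Sinatra wiring is shown only in a comment so the core logic stays plain Ruby.

```ruby
# Hypothetical registry: path segment => check. The lambdas are stand-ins
# for real checks such as a solr_up? method.
CHECKS = {
  'solr'      => -> { true },
  'mongo'     => -> { true },
  'memcached' => -> { false }
}.freeze

# Returns [http_status, body] for a service name.
# 200 when the check passes, 503 when it fails or raises, 404 when unknown.
def status_for(service, checks = CHECKS)
  check = checks[service]
  return [404, "unknown service: #{service}"] unless check

  check.call ? [200, 'OK'] : [503, 'DOWN']
rescue StandardError => e
  [503, "DOWN (#{e.class})"]
end

# In a Sinatra app this could be wired up roughly as:
#
#   get '/:service' do
#     code, body = status_for(params[:service])
#     status code
#     body
#   end
```

Because each path runs exactly one check, a failing Memcached turns only /memcached red and leaves /solr and /mongo alone.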
Add a couple trivial views, some CSS3 awesome, and voilà, you’ve got health.sqoot.com!
Taking a Step Back
The first thing we did to monitor uptime was point Pingdom at Sqoot’s homepage. It performed a trivial check, and when the site went down we were hit with a deluge of errors we couldn’t do anything about (e.g., Timeout::Error). Since we use Hoptoad to stay on top of application exceptions, all these errors got really noisy. We also still didn’t know which dependency had gone down. We just knew the site was hosed.
We had to be able to monitor each service independently. So we created a StatusesController in the app that had an action for every service we wanted to monitor. The good thing was that we could now really introspect each service (like above) to make sure it was OK. For example, we might want to know not just that search is running but also that a certain number of documents are indexed. Unfortunately, if the status of one service changed, they usually all changed (due to requests queuing up, etc.). So we still didn’t have great insight into what broke.
By creating a separate application that performed the checks, we were able to get detailed insight into which services were running (or not). We can use Ruby to make each check meaningful, and since it lives outside our app, it reports on each service exclusively.
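As a sketch of what a "meaningful" check might be, take the document-count idea from earlier: search is only healthy if enough documents are indexed. This assumes the body of a Solr select response requested as JSON, where the hit count lives under response.numFound. The enough_documents? helper and the MIN_DOCS threshold are hypothetical.

```ruby
require 'json'

MIN_DOCS = 1_000 # hypothetical threshold; pick one that fits your index

# True if the Solr JSON response reports at least `minimum` documents.
# Malformed or unexpected bodies count as unhealthy.
def enough_documents?(select_json, minimum = MIN_DOCS)
  found = JSON.parse(select_json).dig('response', 'numFound')
  found.is_a?(Integer) && found >= minimum
rescue StandardError
  false
end
```

A check like this can fail even while Solr itself is answering requests, which is exactly the kind of detail a homepage ping never shows.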
Boom!