Building an App to Monitor Your App

Keeping a web application up and running usually requires lots of moving parts. Since these parts are all codependent, when the whole app goes down, the immediate question is: what broke?

I want to share with you the solution we came up with at Sqoot. Let’s dive right into the code and then talk about how we got here.

Diving Right In

We start with a simple Sinatra web app:

require 'erb'
require 'sinatra'
require 'active_support'

PROCESSES = [:solr, :memcached, :mongo] # ...can be anything

get '/' do
  # Ask each process for its status (e.g., { :solr => true, ... })
  @process_statuses = PROCESSES.each_with_object({}) do |process, statuses|
    statuses[process] = send("#{process}_up?")
  end

  # Respond with 200 if every process is up, 503 otherwise
  status(@process_statuses.values.all? ? 200 : 503)

  erb :index
end

Then we just need a bunch of _up? methods to do the checking, like this:

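require 'httparty'

def solr_up?
  solr_host = '...' # replace with your Solr server
  # Ping Solr; a 2xx response means it is up
  HTTParty.get("http://#{solr_host}/admin/ping", timeout: 5).success?
rescue StandardError
  # When Solr is down, HTTParty raises (connection refused, timeout, ...)
  # instead of returning a response, so treat any error as "down"
  false
end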

In addition to Solr, we also make sure that Mongo and Memcached are running.
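The post only shows the Solr check. As a rough sketch of the other two, here is one way to write memcached_up? and mongo_up? using nothing but Ruby's standard library; these bodies are assumptions, not Sqoot's code. The Memcached check sends the text protocol's "version" command to the default port 11211 and expects a reply starting with "VERSION". The Mongo check is weaker: the hypothetical port_open? helper only proves that something accepts connections on 27017, MongoDB's default port. A more meaningful check would ask the server to run its ping command through your MongoDB driver.

```ruby
require 'socket'
require 'timeout'

# Hypothetical Memcached check: send the text-protocol "version" command
# and expect a reply that starts with "VERSION".
def memcached_up?(host = 'localhost', port = 11211, timeout = 2)
  Timeout.timeout(timeout) do
    socket = TCPSocket.new(host, port)
    begin
      socket.write("version\r\n")
      reply = socket.gets
      !reply.nil? && reply.start_with?('VERSION')
    ensure
      socket.close
    end
  end
rescue StandardError
  # Refused connections, timeouts, and resets all count as "down"
  false
end

# Hypothetical generic fallback: only proves that something accepts TCP connections
def port_open?(host, port, timeout = 2)
  Timeout.timeout(timeout) { TCPSocket.new(host, port).close }
  true
rescue StandardError
  false
end

# Hypothetical Mongo check built on the port probe (27017 is MongoDB's default port)
def mongo_up?
  port_open?('localhost', 27017)
end
```

Because every argument has a default, the route's send("#{process}_up?") calls still work unchanged.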

Finally, we need a couple of endpoints for Pingdom to hit (e.g., /solr) so that we can check each service independently:

PROCESSES.each do |process|
  get "/#{process}" do
    @process = process
    @status  = self.send("#{process}_up?")

    status @status ? 200 : 503

    erb :process
  end
end
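To see the endpoints the way Pingdom does, here is a minimal sketch of a poller that requests each per-service URL and treats a 200 as "up" and anything else (including no answer at all) as "down". The poll_services name and its timeout parameter are hypothetical; it uses only Net::HTTP from the standard library and deliberately ignores proxy environment variables so it talks to the health app directly.

```ruby
require 'net/http'
require 'uri'

# Hypothetical poller that mimics an external monitor such as Pingdom:
# GET each per-service endpoint and treat HTTP 200 as "up", anything else as "down".
def poll_services(base_url, services, timeout: 5)
  services.each_with_object({}) do |service, results|
    uri = URI.join(base_url, "/#{service}")
    begin
      http = Net::HTTP.new(uri.host, uri.port, nil) # nil proxy: connect directly
      http.open_timeout = timeout
      http.read_timeout = timeout
      response = http.start { |conn| conn.get(uri.request_uri) }
      results[service] = response.code == '200'
    rescue StandardError
      # An unreachable health app counts as "down" rather than crashing the poller
      results[service] = false
    end
  end
end
```

For example, pointed at a local copy of the health app on Sinatra's default port, poll_services('http://localhost:4567', PROCESSES) would return a hash keyed by process name with true or false values, mirroring the 200/503 codes the routes set.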

Add a couple of trivial views and some CSS3 awesomeness, and voilà, you’ve got health.sqoot.com!
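The post doesn't show its views. As a minimal guess at what views/index.erb might contain, the template below lists each process with an "up" or "down" class for the CSS to hook into. The template, the class names, and the render_index helper (which renders it with Ruby's standard ERB library so it can be previewed outside Sinatra) are all assumptions.

```ruby
require 'erb'

# Hypothetical views/index.erb; the original templates aren't shown in the post.
INDEX_TEMPLATE = <<~'ERB'
  <ul class="statuses">
  <% @process_statuses.each do |process, up| %>
    <li class="<%= up ? 'up' : 'down' %>"><%= process %>: <%= up ? 'up' : 'down' %></li>
  <% end %>
  </ul>
ERB

# Renders the template outside Sinatra, e.g. to preview the markup
def render_index(process_statuses)
  @process_statuses = process_statuses
  ERB.new(INDEX_TEMPLATE).result(binding)
end
```

Inside the app, erb :index would render the same template from views/index.erb, reading the @process_statuses set in the route.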

Taking a Step Back

The first thing we did to monitor uptime was point Pingdom at Sqoot’s homepage. We would perform some trivial check, and when the site went down, we dealt with a deluge of errors we couldn’t do anything about (e.g., Timeout::Error). Since we use Hoptoad to stay on top of application exceptions, all these errors got really noisy. We also still didn’t know which dependency had gone down; we just knew the site was hosed.

We had to be able to monitor each service independently, so we created a StatusesController in the app with an action for every service we wanted to monitor. The upside was that we could now introspect each process in depth (as in the checks above) to make sure it was OK. For example, we might want to know not only that search is running but also that a certain number of documents are indexed. Unfortunately, when the status of one service changed, they usually all changed (because requests queued up, etc.), so we still didn’t have great insight into what broke.
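As an illustration of that kind of deeper check, here is a sketch of a Solr probe that confirms search answers and that the index holds at least a minimum number of documents. It assumes Solr's select handler sits at /select on the same host used for the ping URL (adjust the path for your core), and the MIN_INDEXED_DOCS threshold and the solr_healthy? and indexed_count names are made up. The query q=*:* with rows=0 asks Solr for the total match count (numFound) without returning any documents.

```ruby
require 'json'
require 'net/http'
require 'uri'

MIN_INDEXED_DOCS = 1_000 # hypothetical threshold; choose one that fits your index

# Extract the total match count (numFound) from a Solr JSON search response
def indexed_count(json_body)
  JSON.parse(json_body).fetch('response').fetch('numFound')
end

# Hypothetical deeper check: Solr answers a match-all query AND the index
# holds at least `minimum` documents. rows=0 returns the count without any docs.
def solr_healthy?(solr_host, minimum = MIN_INDEXED_DOCS)
  uri = URI("http://#{solr_host}/select?q=*:*&rows=0&wt=json")
  http = Net::HTTP.new(uri.host, uri.port, nil) # nil proxy: connect directly
  http.open_timeout = 5
  http.read_timeout = 5
  response = http.start { |conn| conn.get(uri.request_uri) }
  response.is_a?(Net::HTTPSuccess) && indexed_count(response.body) >= minimum
rescue StandardError
  # Connection errors and unparseable responses both count as unhealthy
  false
end
```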

By creating a separate application to perform the checks, we got detailed insight into which services were running (or not). We can use Ruby to make each check meaningful, and since the checker lives outside our app, it reports on each service in isolation.

Boom!

Thanks for reading! I'm Avand.
