Building an App to Monitor Your App

Keeping a web application up and running usually requires lots of moving parts. Since these parts all depend on one another, when the whole app goes down, the immediate question is: what broke?

I want to share with you the solution we came up with at Sqoot. Let’s dive right into the code and then talk about how we got here.

Diving Right In

We start with a simple Sinatra web app:

require 'erb'
require 'sinatra'
require 'active_support'

PROCESSES = [:solr, :memcached, :mongo] # ...can be anything

get '/' do
  # Ask each process what its status is (e.g., { :solr => true, ... })
  @process_statuses = PROCESSES.inject({}) do |memo, process|
    memo[process] = self.send("#{process}_up?")
    memo
  end

  # Respond with 200 only if every process is up; otherwise respond with 503
  status @process_statuses.all? { |k, v| v } ? 200 : 503

  erb :index
end

Then we just need a bunch of _up? methods to do the checking, like this:

require 'httparty'

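def solr_up?
  solr_host = '...' # replace with your Solr server
  HTTParty.get("http://#{solr_host}/admin/ping").success?
rescue StandardError
  # A refused connection or a timeout means Solr is down, so report false
  # instead of letting the exception turn into a 500 from this app
  false
end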

In addition to Solr, we also make sure that Mongo and Memcached are running.
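
The Mongo and Memcached checks aren't shown here. One minimal way to approximate them, assuming it's enough to know that each server accepts TCP connections, is a port probe with Ruby's standard library. The port_open? helper, the hosts, and the ports (each server's default) are assumptions for illustration, not the checks Sqoot actually runs:

```ruby
require 'socket'

# Hypothetical helper: true if something accepts a TCP connection on host:port.
# Any connection error (refused, unreachable, timed out) counts as "down".
def port_open?(host, port, timeout = 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue StandardError
  false
end

# Assumed hosts and default ports; replace with your own servers
def mongo_up?
  port_open?('127.0.0.1', 27017)
end

def memcached_up?
  port_open?('127.0.0.1', 11211)
end
```

A port probe only proves the process is listening. A stricter check would talk to the service itself, the way the Solr check hits its ping URL.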

Finally, we need a couple of endpoints for Pingdom to hit (e.g., /solr) so that we can check each service independently:

PROCESSES.each do |process|
  get "/#{process}" do
    @process = process
    @status  = self.send("#{process}_up?")

    status @status ? 200 : 503

    erb :process
  end
end

Add a couple trivial views, some CSS3 awesome, and voilà, you’ve got health.sqoot.com!
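
The views aren't shown in this post. As a rough sketch, the index view might loop over @process_statuses and print each status. The markup and the up/down class names below are made up for illustration, and the template is rendered with plain ERB so it can be tried outside Sinatra:

```ruby
require 'erb'

# Hypothetical stand-in for views/index.erb
INDEX_TEMPLATE = <<~HTML
  <ul>
  <% @process_statuses.each do |process, up| %>
    <li class="<%= up ? 'up' : 'down' %>"><%= process %>: <%= up ? 'up' : 'down' %></li>
  <% end %>
  </ul>
HTML

# Render the template with the statuses in an instance variable,
# which is how the route hands data to its view
def render_index(process_statuses)
  @process_statuses = process_statuses
  ERB.new(INDEX_TEMPLATE).result(binding)
end

puts render_index(solr: true, memcached: true, mongo: false)
```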

Taking a Step Back

The first thing we did to monitor uptime was point Pingdom at Sqoot’s homepage. We would perform some trivial check, and when the site went down, we dealt with a deluge of errors we couldn’t do anything about (e.g., Timeout::Error). Since we use Hoptoad to stay on top of application exceptions, all these errors got really noisy. We also still didn’t know which dependency had gone down. We just knew the site was hosed.

We had to be able to monitor each service independently. So we created a StatusesController in the app that had an action for every service we wanted to monitor. The good thing was that we could now really introspect the process (like above) to make sure it was OK. For example, we might want to know that search is running and there are a certain number of documents indexed too. Unfortunately, if the status of one service changed, they usually all changed (due to requests queuing up, etc.). So we still didn’t have great insight as to what broke.
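
To make the document-count idea concrete, here is a minimal sketch of such a check. It assumes the body comes from a Solr select query along the lines of q=*:*&rows=0&wt=json, whose JSON includes response.numFound. The helper name and the thresholds are made up for illustration:

```ruby
require 'json'

# Hypothetical helper: true if Solr's JSON response reports at least min_docs documents.
def enough_documents?(solr_response_body, min_docs)
  num_found = JSON.parse(solr_response_body).dig('response', 'numFound')
  num_found.is_a?(Integer) && num_found >= min_docs
rescue JSON::ParserError, TypeError
  false # a response we can't read is treated as unhealthy
end

body = '{"response":{"numFound":1200,"start":0,"docs":[]}}'
enough_documents?(body, 1000)   # true
enough_documents?(body, 5000)   # false
enough_documents?('<html>', 1)  # false
```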

By creating a separate application that performed the checks, we were able to get detailed insight as to what services were running (or not). We can use Ruby to make each check meaningful and since it lives outside our app, it reports on each service exclusively.

Boom!

Thanks for reading! I'm Avand.

I’ve been working on the web for over a decade and am passionate about building great products.

My last job was with Airbnb, where I focused on internal products that helped teams measure the quality of the software they were building. I also built internal tools for employees to stay more connected, especially after the COVID-19 pandemic. Before that, I was lead engineer at Mystery Science, the #1 way in which science is taught in U.S. elementary school classrooms. For a while, I also taught with General Assembly, teaching aspiring developers the basics of front-end web development.

I was born in Boston, grew up in Salt Lake City, and spent many years living in Chicago. Now, I call San Francisco my home and Mariposa my home away from home.

I enjoy the great outdoors and absolutely love music and dance. Cars have been a lifelong obsession of mine, especially vintage BMWs and Volkswagens. I’m the proud owner of a 2002 E-250 Sportsmobile van, and he and I have enjoyed many trips to beautiful and remote parts of the West Coast to create good vibes.
