Monitoring server-side processes with God

Published on January 4, 2012 by Nicolas Cavigneaux | tools

This article is published under the CC BY-NC-SA license

Have you ever had to put a critical website online and make sure it stays up and running 24/7? If so, you know what a pain it can be to check that everything is OK: that all services are running and that no process is eating too many resources (CPU or memory).

Here at Synbioz we have to guarantee service reliability for our customers. There are many ways to do that: you can write your own shell scripts, play with crontabs, send emails on failure, and so on. But it is difficult to write effective scripts and to make sure they work well. Moreover, most of the time your homemade scripts will not be portable and will only work with one specific application. So what is the best way to handle this?

God is upon you!

Here comes God, a monitoring framework you can rely on to keep your processes and tasks running well. God is written in Ruby and aims to be a simple, powerful and flexible way to write monitoring tasks.

Before going deeper into God, you must know that it only works on Unix-like systems. Sorry, Windows users, but hey, I know you would never want to deploy a production app on a Windows server…

God config files are written in Ruby, so you can do basically everything Ruby allows you to do, and it’s a lot of stuff.

God features are:

  • Config files are written in Ruby
  • Easily write your own custom conditions in Ruby
  • Supports both poll and event based conditions
  • Different poll conditions can have different intervals
  • Integrated notification system (e.g. XMPP notifier)
  • Easily control non-daemonizing scripts

God basics

Install

Start by installing God on the system you want to monitor, that is, the production server:

$ sudo gem install god

or add it to your Gemfile like so:

gem "god"

You can now create a God configuration file for the daemon you want to monitor:

$ touch config/unicorn.god

Naming the config file with a .god extension is a convention, but the file is in fact plain Ruby.

Handling daemon start and stop

RAILS_ROOT = File.dirname(File.dirname(__FILE__))

God.watch do |w|
  pid_file = File.join(RAILS_ROOT, "tmp/pids/unicorn.pid")

  w.name = "unicorn"
  w.dir = RAILS_ROOT
  w.interval = 60.seconds
  w.start = "unicorn -c #{RAILS_ROOT}/config/unicorn.rb -D"
  w.stop = "kill -s QUIT $(cat #{pid_file})"
  w.restart = "kill -s HUP $(cat #{pid_file})"
  w.start_grace = 20.seconds
  w.restart_grace = 20.seconds
  w.pid_file = pid_file

  w.uid = 'nico'
  w.gid = 'team'

  w.env = { 'RAILS_ENV' => "production" }

  w.behavior(:clean_pid_file)
end

God monitoring

We’re now going to enhance our config file to add real process monitoring. Monitoring lets us check the CPU and memory usage of each process:

RAILS_ROOT = File.dirname(File.dirname(__FILE__))

God.watch do |w|
  pid_file = File.join(RAILS_ROOT, "tmp/pids/unicorn.pid")

  w.name = "unicorn"
  w.interval = 60.seconds
  w.start = "unicorn -c #{RAILS_ROOT}/config/unicorn.rb -D"
  w.stop = "kill -s QUIT $(cat #{pid_file})"
  w.restart = "kill -s HUP $(cat #{pid_file})"
  w.start_grace = 20.seconds
  w.restart_grace = 20.seconds
  w.pid_file = pid_file

  w.behavior(:clean_pid_file)

  # When to start?
  w.start_if do |start|
    start.condition(:process_running) do |c|
      # We want to check every ten seconds whether the daemon
      # is running, and start it if it isn't
      c.interval = 10.seconds
      c.running = false
    end
  end

  # When to restart a running daemon?
  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      # Keep the last five memory usage samples;
      # if three of them are above the memory limit (100 MB)
      # then we restart the daemon
      c.above = 100.megabytes
      c.times = [3, 5]
    end

    restart.condition(:cpu_usage) do |c|
      # Restart the daemon if CPU usage goes
      # above 90% five times in a row
      c.above = 90.percent
      c.times = 5
    end
  end

  w.lifecycle do |on|
    # Handle edge cases where the daemon
    # can't start for some reason
    on.condition(:flapping) do |c|
      c.to_state = [:start, :restart] # If God tries to start or restart
      c.times = 5                     # five times
      c.within = 5.minutes            # within five minutes
      c.transition = :unmonitored     # we want to stop monitoring
      c.retry_in = 10.minutes         # for 10 minutes and monitor again
      c.retry_times = 5               # we'll loop over this five times
      c.retry_within = 2.hours        # and give up if flapping occurred five times in two hours
    end
  end
end
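
The c.times = [3, 5] setting can be hard to picture. Here is a minimal plain-Ruby sketch of the idea, not God's actual implementation: it keeps the last five samples and fires once at least three of them are above the limit. The SampleWindow class, its method names and the readings are made up for this illustration.

```ruby
# Illustration only: a sliding window mimicking the idea behind `c.times = [3, 5]`.
# SampleWindow is a hypothetical helper, not part of God.
class SampleWindow
  def initialize(needed, window)
    @needed  = needed # how many samples must exceed the limit
    @window  = window # how many recent samples are kept
    @samples = []
  end

  # Record a sample and tell whether the condition is now met.
  def exceeded?(value, limit)
    @samples << value
    @samples.shift while @samples.size > @window
    @samples.count { |sample| sample > limit } >= @needed
  end
end

limit_kb = 100 * 1024 # 100 MB expressed in kilobytes (unit chosen for this sketch)
window   = SampleWindow.new(3, 5)

# Memory readings in kilobytes, oldest first (made-up values)
readings = [90_000, 110_000, 95_000, 120_000, 130_000]
results  = readings.map { |kb| window.exceeded?(kb, limit_kb) }
puts results.inspect
```

Only the last reading triggers a restart here, because it is the third sample above the limit among the five most recent ones. A single spike is ignored, which is the point of using a pair instead of a plain number.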

You can repeat the God watch block as many times as you need to handle the other daemons your application makes use of.
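
Since the config file is plain Ruby, repeating the block can also be done with a loop. The sketch below assumes every daemon writes its pid file to tmp/pids/<name>.pid; the "my_worker" entry and its command line are placeholders, not a real program, and the watch_settings helper is introduced only for this example.

```ruby
# Sketch: one watch per daemon, generated from a plain Ruby hash.
RAILS_ROOT = File.dirname(File.dirname(File.expand_path(__FILE__)))

# name => command used to start the daemon.
# "my_worker" is a placeholder: replace it with a real daemon
# that detaches and writes its own pid file.
DAEMONS = {
  "unicorn"   => "unicorn -c #{RAILS_ROOT}/config/unicorn.rb -D",
  "my_worker" => "my_worker --daemonize --pid #{RAILS_ROOT}/tmp/pids/my_worker.pid"
}

# Build the settings shared by every watch (hypothetical helper).
def watch_settings(name, start_command)
  {
    :name     => name,
    :start    => start_command,
    :pid_file => File.join(RAILS_ROOT, "tmp/pids/#{name}.pid"),
    :interval => 60 # seconds
  }
end

# God is only defined when this file is loaded by god itself,
# so the file can also be run with plain ruby as a quick sanity check.
if defined?(God)
  DAEMONS.each do |name, start_command|
    settings = watch_settings(name, start_command)

    God.watch do |w|
      w.name     = settings[:name]
      w.start    = settings[:start]
      w.pid_file = settings[:pid_file]
      w.interval = settings[:interval]
      w.behavior(:clean_pid_file)
    end
  end
end
```

The start_if, restart_if and lifecycle blocks shown earlier would go inside the God.watch block in the same way.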

Usage

Now that your config file is ready, you can check the current God status:

$ god status

which will tell you that unicorn is down. So let’s start God with our config file, and it will start unicorn for us:

$ god -c config/unicorn.god

The same, but not daemonized (God stays in the foreground):

$ god -c config/unicorn.god -D

God status should now tell you that unicorn is up and running.

$ god log unicorn

will show you what God did with the daemon, along with monitoring results such as the latest memory and CPU usage, in real time.

If you like to play, you can now kill the unicorn process from another shell and watch what happens in the God logs:

$ kill $(cat tmp/pids/unicorn.pid)

You should see that God detected that unicorn is no longer running, deleted the pid file if it existed and started the unicorn daemon again.

Now, if you need to stop God along with everything it monitors:

$ god terminate

or stop a single watch:

$ god stop unicorn

Server init process

Great, we’re happy with our monitoring system, but how do we start it when the server boots or reboots? You have to write an init script! But relax, I have one for you:

Init script

#!/bin/bash
#
# God
#
# chkconfig: - 85 15
# description: start, stop, restart, status for God
#

RETVAL=0

case "$1" in
    start)
      god -P /var/run/god.pid -l /var/log/god.log
      god load /etc/god.conf
      RETVAL=$?
      ;;
    stop)
      kill `cat /var/run/god.pid`
      RETVAL=$?
      ;;
    restart)
      kill `cat /var/run/god.pid`
      god -P /var/run/god.pid -l /var/log/god.log
      god load /etc/god.conf
      RETVAL=$?
      ;;
    status)
      /usr/bin/god status
      RETVAL=$?
      ;;
    *)
      echo "Usage: god {start|stop|restart|status}"
      exit 1
      ;;
esac

exit $RETVAL

Global God config file

As you can see, the above script makes use of a file named /etc/god.conf. This file has one simple purpose: loading a bunch of God config files at once:

God.load "/etc/god/*.god"

This trick allows you to symlink your app’s God config files into the /etc/god/ directory to ensure they are loaded on server boot. This is very similar to the technique used for Mongrel.
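
The symlink step can be scripted too. This is a small sketch using only Ruby's standard library; the link_god_config and god_configs helpers are names invented for this example, and the default /etc/god directory simply mirrors the glob used above.

```ruby
require "fileutils"

# Link an application's God config into the directory loaded at boot.
# Hypothetical helper: adapt both arguments to your own layout.
def link_god_config(app_config, god_dir = "/etc/god")
  FileUtils.mkdir_p(god_dir)
  target = File.join(god_dir, File.basename(app_config))
  # -s creates a symlink, -f replaces an existing link with the same name
  FileUtils.ln_sf(File.expand_path(app_config), target)
  target
end

# List the files a glob such as "/etc/god/*.god" would pick up.
def god_configs(god_dir = "/etc/god")
  Dir.glob(File.join(god_dir, "*.god")).sort
end
```

Calling link_god_config with the path to your app's config/unicorn.god (as a user allowed to write to /etc/god) creates the link, and god_configs then lists what will be loaded on boot. Keep in mind that two apps each shipping a unicorn.god would collide, so you may want to give the files app-specific names.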

Now you can do:

$ /etc/init.d/god start

$ /etc/init.d/god status

$ /etc/init.d/god stop

Notification on failures

Let’s say you want to be notified every time a process exits. You can add this inside the watch block of your God configuration file:

w.transition(:up, :start) do |on|
  on.condition(:process_exits) do |c|
    c.notify = 'devteam'
  end
end

Now God knows that whenever our process exits while it is up, it should move back to the start state and send a notification to “devteam”. You can use notify in any condition block.

But what is “devteam” and how the hell are notifications sent?!
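
As a sketch of that last point, the same notify attribute can be attached to the memory condition from the monitoring section, so the team also hears about restarts. This fragment is meant to live inside the God.watch block (w is the watch object) and assumes a contact or group named “devteam” is defined elsewhere in the config.

```ruby
# Fragment to place inside the God.watch block
w.restart_if do |restart|
  restart.condition(:memory_usage) do |c|
    c.above  = 100.megabytes
    c.times  = [3, 5]
    c.notify = 'devteam' # contact or group name, assumed to be defined
  end
end
```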

Sending emails

The first solution is to send emails.

We’ll start by defining some defaults for email in our God config file:

God::Contacts::Email.defaults do |d|
  d.from_email = 'god@synbioz.com'
  d.from_name = 'God'
  d.delivery_method = :sendmail
end

Then we need to define a contact:

God.contact(:email) do |c|
  c.name = 'Dev Team'
  c.group = 'devteam'
  c.to_email = 'team@synbioz.com'
end

You can define as many contacts as you need, but be sure each “name” attribute is unique! Now our dev team will receive an email notification whenever such a problem occurs.
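
When the list of contacts grows, it can be convenient to declare them as data and check name uniqueness up front. The following is only a sketch: the CONTACTS list, the second address and the duplicate_names helper are placeholders invented for this example.

```ruby
# Hypothetical contact list; the "On-call" entry is a placeholder.
CONTACTS = [
  { :name => "Dev Team", :group => "devteam", :to_email => "team@synbioz.com" },
  { :name => "On-call",  :group => "devteam", :to_email => "oncall@example.com" }
]

# Return the names that appear more than once (hypothetical helper).
def duplicate_names(contacts)
  contacts.group_by { |contact| contact[:name] }
          .select { |_name, entries| entries.size > 1 }
          .keys
end

duplicates = duplicate_names(CONTACTS)
abort "Duplicate contact names: #{duplicates.join(', ')}" unless duplicates.empty?

# God is only defined when this file is loaded by god itself.
if defined?(God)
  CONTACTS.each do |attrs|
    God.contact(:email) do |c|
      c.name     = attrs[:name]
      c.group    = attrs[:group]
      c.to_email = attrs[:to_email]
    end
  end
end
```

Both contacts share the “devteam” group, so a condition with notify set to 'devteam' would reach each of them.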

Jabber notifications

You don’t like emails and want XMPP notifications? No problem:

God::Contacts::Jabber.defaults do |d|
  d.host = "jabber.synbioz.com"
  d.from_jid = "foo@synbioz.com"
  d.password = "bar"
end

God.contact(:jabber) do |c|
  c.name = 'Dev Team (Jabber)' # every contact needs a unique name
  c.to_jid = "baz@synbioz.com"
end

Other notification systems

You can also use Campfire, Prowl, Scout, Twitter and WebHook to send notifications. They are part of God core.

You can easily extend notifications if you need to plug in your own system, maybe an internal tracking system.

God is good for you!

I hope this quick introduction to God will be helpful for those of you who want to monitor their applications. Don’t think God is only for Rails apps or even Ruby apps. You can use God for anything you want to monitor, Rails projects or not!

Synbioz Team.