Posted on Fri 24 January 2014

Deploying Python Apps

A feature in our software goes through many stages before it goes live. Each feature begins with an idea, then a discussion, wireframes, designs, a feature branch, tickets in swim lanes on Pivotal, unit and acceptance tests, a staging try-out, bug fixes, sign-off and then, after all that, deployment to live. I've spent a lot of energy making deployments from the build server to live as reliable as possible.

A key element of almost every single application we've developed is that only one installation runs on any one machine. Queue process names have only recently been namespaced, and isolation of dependencies outside of Python modules is still ongoing.

We use Fabric to handle the actual deployment. Here's the code that will deploy any given application at Stickyworld:

import datetime
from fabric.api import cd, env, local, run, settings, task
from fabric.decorators import runs_once
from os.path import abspath, dirname


@task
@runs_once
def tag_release():
    utc_str = datetime.datetime.utcnow().strftime("%Y-%m-%d-%H-%M-%S")
    local('git tag deployment-%s' % utc_str)
    local('git push origin master --tags')


@task
def install_packages():
    """
    Install system packages
    """
    run("sudo /usr/bin/apt-get -y -qq update")
    packages_file = '%s/../packages.txt' % abspath(dirname(__file__))
    with open(packages_file, 'r') as open_file:
        items = open_file.read()
        run("sudo /usr/bin/apt-get install -y -qq --force-yes %s" %
            ' '.join(items.split('\n')))
    run("sudo /usr/bin/apt-get clean")
    run("sudo /sbin/ldconfig")


@task
def deploy():
    find_it = 'ps aux | grep "app_name.*python" | grep -v grep | wc -l'
    kill_it = "ps auxww | grep \"app_name.*python\" | grep -v grep " +\
        " | awk '{print $2}' | xargs kill"

    install_packages()

    with cd(env.base_src_dir):
        run('git pull origin master')

        run('source %s/activate && grep -v distribute %s/requirements.txt | '
            ' xargs pip install -q ' % (env.venv_bin_dir, env.base_src_dir))
        run('sudo /usr/bin/supervisorctl -c '
            '/etc/supervisor/supervisord.conf stop %s %s' % (env.task_name,
            ' '.join(env.worker_names)))

        if int(run(find_it)):
            with settings(warn_only=True):
                run(kill_it)

        with cd("%s/src" % env.base_src_dir):
            for command in ('syncdb', 'migrate', 'loaddata'):
                run('%s/python manage.py %s' % (env.venv_bin_dir, command))

        workers_in_rev = list(env.worker_names)
        workers_in_rev.reverse()
        run('sudo /usr/bin/supervisorctl -c '
            '/etc/supervisor/supervisord.conf start %s %s' % (
            ' '.join(workers_in_rev), env.task_name,))
        run('sudo /usr/bin/supervisorctl -c '
            '/etc/supervisor/supervisord.conf update')
        run('crontab -r')
        run('crontab %s/etc/crontab.txt' % env.base_src_dir)

    tag_release()

    for url in env.health_check_urls:
        local('curl --insecure --silent -I %s | grep "HTTP/1.1"' % url)

To trigger a deployment off our build machine we only need to run the following command:

➫ fab deploy

Fabric will collect various configuration items from its environment and will populate the env object.

Here's an example env object:

env.hosts = ['web1', 'web2']
env.user = 'app_name'
env.base_src_dir = '/home/app_name/app_name'
env.venv_bin_dir = '/home/app_name/.virtualenvs/app_name/bin'
env.task_name = 'app_name_www' # Our primary application server
env.worker_names = ('app_name_beat', 'app_name_convert_media',
    'app_name_create_multiple_avatar_sizes',) # Worker queues
env.health_check_urls = ('https://app_name.stickyworld.com/',
    'https://app_name.stickyworld.com/another-area/')
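
These values live at the top of the fabfile rather than being passed on the command line. For setups with more than one target, a common Fabric pattern (shown here as a sketch, not necessarily what we run) is to wrap each target's settings in its own task so the same deploy task can be pointed at staging or live:

from fabric.api import env, task


@task
def staging():
    # Hypothetical staging target; host names and paths are illustrative.
    env.hosts = ['staging1']
    env.user = 'app_name'
    env.base_src_dir = '/home/app_name/app_name'
    env.venv_bin_dir = '/home/app_name/.virtualenvs/app_name/bin'


@task
def production():
    # Mirrors the example env object above.
    env.hosts = ['web1', 'web2']
    env.user = 'app_name'
    env.base_src_dir = '/home/app_name/app_name'
    env.venv_bin_dir = '/home/app_name/.virtualenvs/app_name/bin'

With tasks like these a deployment is triggered as fab production deploy rather than plain fab deploy.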

Installing System Dependencies

One of the first tasks invoked by deploy installs all system dependencies onto the live boxes:

run("sudo /usr/bin/apt-get -y -qq update")
packages_file = '%s/../packages.txt' % abspath(dirname(__file__))
with open(packages_file, 'r') as open_file:
    items = open_file.read()
    run("sudo /usr/bin/apt-get install -y -qq --force-yes %s" %
        ' '.join(items.split('\n')))
run("sudo /usr/bin/apt-get clean")
run("sudo /sbin/ldconfig")

We make sure apt-get's package lists are up-to-date, then install all packages listed in packages.txt. Having this file full of our system dependencies is very helpful for explaining what is needed to get our application running and for keeping track of dependency changes. A typical packages.txt file can look like the following:

➫ cat packages.txt
autoconf
bc
build-essential
checkinstall
jpegoptim
libjpeg-dev
libmagic-dev
...

After that we run apt-get clean to clear out the local cache of downloaded package files. We usually remove packages by hand (as it's a rare thing to do) and run this command afterwards. Since that's a manual action I wanted to make sure there is something ensuring the system is tidy in case one of us forgets.

Finally we run ldconfig to make sure the links to shared libraries and the shared library cache are in order.

You might notice how some commands pass sudo inside the run method rather than using the sudo method provided by Fabric. This is because it's easier to guarantee the form in which the sudo command will run on live. We've set up sudoers to allow these privileged commands to run without a password prompt. Fabric would execute sudo calls as sudo su -c "/sbin/ldconfig", whereas if we've shelled into the server we'd generally run sudo /sbin/ldconfig, so this saves us needing two separate config lines in sudoers.
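
To make the difference concrete, here is a minimal sketch of the two styles side by side using Fabric's API:

from fabric.api import run, sudo

# Fabric's own helper: Fabric wraps the command in its own shell invocation,
# so the command string that reaches sudoers isn't the one you'd type by hand.
sudo('/sbin/ldconfig')

# Explicit sudo inside run(): the command arrives exactly as written, which
# matches what we'd run interactively and is simple to whitelist with NOPASSWD.
run('sudo /sbin/ldconfig')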

Updating the Python code itself

This step has a lot of pieces to it and they need to run in a certain order. We need to get the latest repository checked out onto the server, install any new or updated Python dependencies, stop all the application servers and workers, sync (create) new database tables, run database schema migrations and load in all initial fixture data.

with cd(env.base_src_dir):
    run('git pull origin master')

    run('source %s/activate && grep -v distribute %s/requirements.txt | '
        ' xargs pip install -q ' % (env.venv_bin_dir, env.base_src_dir))
    run('sudo /usr/bin/supervisorctl -c '
        '/etc/supervisor/supervisord.conf stop %s %s' % (env.task_name,
        ' '.join(env.worker_names)))

    if int(run(find_it)):
        with settings(warn_only=True):
            run(kill_it)

    with cd("%s/src" % env.base_src_dir):
        for command in ('syncdb', 'migrate', 'loaddata'):
            run('%s/python manage.py %s' % (env.venv_bin_dir, command))

The piece around find_it and kill_it is interesting. All our tasks run via Celery and its workers have warm and cold shutdowns. If you want to wait for a task to finish nicely, let the worker do a warm shutdown; that's the shutdown we get with supervisorctl stop worker_name. If a process is frozen or in an unresponsive state then we kill it with fire:

Find it:

➫ ps aux | grep "app_name.*python" | grep -v grep | wc -l

Kill it:

➫ ps auxww | grep "app_name.*python" | grep -v grep | awk '{print $2}' | xargs kill

We always run our applications under an unprivileged account so elevated permissions aren't needed for this operation.

Only once the dependencies are in and the database is up-to-date can we start everything back up again.

Turn everything back on again

We list our workers in the order they should be turned off (not that it normally matters), so we reverse that list when telling supervisor to turn them back on.

workers_in_rev = list(env.worker_names)
workers_in_rev.reverse()
run('sudo /usr/bin/supervisorctl -c '
    '/etc/supervisor/supervisord.conf start %s %s' % (
    ' '.join(workers_in_rev), env.task_name,))

We then update supervisor, which makes it re-read its config files and include any new workers we've created. We have a supervisord.conf file in every application repository, which is symbolically linked to /etc/supervisor/conf.d/app_name.conf.

run('sudo /usr/bin/supervisorctl -c '
    '/etc/supervisor/supervisord.conf update')
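
Putting that symlink in place is a one-off provisioning step rather than part of deploy. Here is a minimal sketch of how it could be done with Fabric; the etc/supervisord.conf location inside the repository is an assumption, mirroring where we keep crontab.txt:

from fabric.api import env, run, task


@task
def link_supervisor_conf():
    # Hypothetical provisioning task: symlink the repository's supervisor
    # config into conf.d so that `supervisorctl update` picks up new workers.
    run('sudo /bin/ln -sf %s/etc/supervisord.conf '
        '/etc/supervisor/conf.d/app_name.conf' % env.base_src_dir)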

Update the Crontab

We occasionally use crontab to run periodic jobs that are more maintenance-related. We keep a crontab.txt file in each application's repository so we can keep track of what is running via the crontab. If there is a change to the crontab, at least it will be logged in git.

run('crontab -r')
run('crontab %s/etc/crontab.txt' % env.base_src_dir)
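
One caveat with this pair of commands: crontab -r exits non-zero when the user has no crontab installed yet (for example on a freshly provisioned box), which would halt the deploy. A defensive variant, as a sketch rather than our exact code, tolerates that case:

from fabric.api import env, run, settings

# Removing a non-existent crontab fails, so allow only that command to fail.
with settings(warn_only=True):
    run('crontab -r')
run('crontab %s/etc/crontab.txt' % env.base_src_dir)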

The only thing I don't like about this approach is that it makes the code base installation-specific. In an ideal world there would be a separate build repository where all configuration and settings are stored, secured and where changes are recorded. This would be broken down by target as well, so we could manage the application on varying types of servers and environments.

Tag the release

At this point we're not yet sure everything works, but the code has been deployed and installed, and we want to record that we have made it this far.

utc_str = datetime.datetime.utcnow().strftime("%Y-%m-%d-%H-%M-%S")
local('git tag deployment-%s' % utc_str)
local('git push origin master --tags')

Again, this ties our git repository to a single installation of our code base, which isn't great. But it does allow us to see what has changed between releases should we need to investigate anything:

➫ git diff deployment-2013-11-06-15-15-43...deployment-2013-11-06-15-43-32
diff --git a/assets/styles/global.css b/assets/styles/global.css
index 3fe116b..f6d7450 100644
--- a/assets/styles/global.css
+++ b/assets/styles/global.css
@@ -209,6 +209,31 @@ video {
   width: 128px;
 }

+.square-for-video-in-list {
+  padding: 10px;
+  width: 100%;
+  margin-bottom: 10px;
+  border: #eee solid 1px;
+}
+
+.square-for-video {
+  padding: 10px;
+  float: left;
+  mid-width: 300px;
+  margin-right: 10px;
+  margin-bottom: 10px;
+  border: #eee solid 1px;
+}
+
+.square-for-video-in-list img,
+.square-for-video img {
+  margin-right: 10px;
+}
+
+.auto-width {
+  width: auto !important;
+}
+
 footer {
   border-top: 4px solid #0088cc;
 }

This Fabric task runs with the @runs_once decorator: if we deploy to multiple servers we don't want to tag the same commit twice.

Does any of this work?

The last thing we do is call a list of public URLs to see if we're getting HTTP 200 responses. Errors will bubble up, so it's good to see right away if something is amiss. We grep curl's output for the HTTP/1.1 string in the response headers and check which HTTP response code we're seeing.

for url in env.health_check_urls:
    local('curl --insecure --silent -I %s | grep "HTTP/1.1"' % url)

The reason --insecure is passed is so SSL issues don't stop us from seeing whether the application server itself is in trouble or running fine. SSL issues on their own rarely come up, and when they do they require a lot more investigation. We have yet to automate any SSL debugging beyond checking that externally-hosted frontend assets and third-party REST services are working and that their certificates aren't expiring any time soon.
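
Grepping for the status line only prints it. If we wanted the deploy to fail loudly on anything other than a 200, a slightly stricter variant (a sketch, not what we currently run) could capture curl's output and abort:

from fabric.api import abort, env, local, task


@task
def health_check():
    # Capture the status line for each URL and stop if it isn't a 200.
    for url in env.health_check_urls:
        status = local('curl --insecure --silent -I %s | grep "HTTP/1.1"' % url,
                       capture=True)
        if '200' not in status:
            abort('Health check failed for %s: %s' % (url, status))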

Conclusion

Writing this blog post was a good exercise in reviewing our deployment code. If the material here was of interest and you are skilled in this area, please do drop me a line.
