DevOps in small companies – part I – configuration management

So you are a team of 3–5 people running a small company. You are happy with that, and so are we. Since we are committed to our deliverables, we need to do our job smoothly, with the right tools at the right time, to keep the business running (and making money, right?). Although our teams are small and resources are limited, we can still improve our velocity. In fact, it’s inevitable if we want to stay on the market. Every such investment comes at a non-zero cost, because of the learning curve and so on, which is why it’s essential to invest in something genuinely valuable – something that keeps us ahead and improves our throughput.

This series of posts aims to be a rough guideline on how to improve deliverables by applying DevOps culture, and automation in particular, in a small company.

Overview of current state

Have you heard of the Joel Test? It’s quite old by IT standards, but still valid. It doesn’t really matter if you haven’t, because it’s essentially a quality measurement – and a very valuable one, since it gives an overview of a company’s current state. So, how many of its points do you comply with? Those twelve questions are a validator that helps your business win, so go and make use of them. There are various aspects related to those questions and I’m going to touch on some of them, starting with managing configuration.

Where configuration meets automation

Well, automated environment provisioning is not a new topic; people have been doing it for years, perhaps even decades. Bash, Perl or Python were the predecessors, but in the last few years the field has evolved enormously. Actually, you’re already at the gates of the Kingdom of Happiness even if you do it with a simple Bash script, e.g. to install Nginx, configure the firewall or whatever else is needed to deliver your app. That’s because you have a configuration process that lets you provision the environment (or part of it) reliably at any point in time.

While that process remains valid, today we have nicer toys to play with for configuration, e.g. Chef, Puppet, Ansible, Salt or even Packer (which differs slightly from the others). These will help your company, because they push orchestration to a completely new level of abstraction. OK, you’d say:

– but I only need a few tools to run my app – why should I care?

– read below.

The Kittens’ world

Kittens are pets. Each cute little kitten has a name, gets stroked every day, has special food and needs, including “cuddles”. Without constant attention your kittens will die. Common types of “kittens” are MSSQL databases, SharePoint, legacy apps and all Unix systems. Kitten-class computing is expensive, stressful and time-consuming.

Unfortunately, these Kittens are often our production environments, where any failure results in a huge blow-up. To give an example, imagine you’re doing a release upgrade of your Ubuntu LTS, or just a PostgreSQL version upgrade. Sure, you can put your app into maintenance mode and turn away all your users for half a day, but that’s not acceptable these days. A better way is not to upgrade in place at all: instead of doing an Ubuntu release upgrade, throw the machine away and provision a new VM with the latest release. Some call this approach the Phoenix Server pattern and some Immutable Deployments; the point is to profit from immutability.

Human failure

It’s in our nature to make mistakes, but we can minimize them. Any process that brings some automation also lowers the probability of failure. Even though it’s an investment, it pays off.

In the Rubyist world, there’s a tool called Bundler that manages dependencies. Bundler ensures that dependencies are consistent with what the app needs. The OSS world changes often, and migrating from version X to Y is not always smooth, so you need to manage those dependencies, e.g. to pin version 1.2.3 of one library and 2.1.1 of another. Bundler gives you an extremely powerful engine to manage them, and CM tools give you the same power over your environments: you always get the desired state.
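To give a flavour of what “desired state” looks like in practice, here is a minimal, hypothetical Ansible task that pins a package version, much like Bundler pins a gem; the package name and version string are made up for illustration and assume an apt-based system:

- name: Ensure a specific nginx version is installed
  apt:
    name: nginx=1.4.*
    state: present

Run it against ten nodes or one, today or in a year, and you end up with the same state.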

Build your environment

CM tools are somewhat like build tools, e.g. Maven or Gradle, but instead of getting the result as a file or a set of files, you get a freshly baked environment – baked according to the rules from Cookbooks (Chef), Manifests (Puppet) or Playbooks (Ansible).

All of these tools also offer an extra level of abstraction to ensure maximum flexibility, yet organized in some manner. Given a set of VMs, you can tell them to first configure some common context, e.g. a firewall or SSH, then a web server, database, proxy or whatever else is needed. For any given set of VMs you get the desired state, with ports 22 and 5432 open and everything else closed; then, for any subset of those VMs, an installed web server or database. Any defined rule is applied where it’s desired – to a node (VM), a set of nodes or even a set of subsets of nodes. How you organize it is up to you, but there are some common patterns, e.g. defining roles (assigned to nodes) which include profiles (sets of rules that configure a given tool, e.g. nginx). In Puppet this is known as the roles–profiles pattern, whereas Ansible more or less enforces it by default.
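To make this layering concrete, here is a minimal Ansible sketch; the group names (webservers, databases) and role names (common, nginx, postgresql) are assumptions for illustration, not taken from any particular setup:

# site.yml – apply the common baseline everywhere, then specialise per group
- hosts: all
  become: yes
  roles:
    - common        # e.g. SSH hardening, firewall allowing only ports 22 and 5432

- hosts: webservers
  become: yes
  roles:
    - nginx         # web server profile, applied only to the web subset

- hosts: databases
  become: yes
  roles:
    - postgresql    # database profile, applied only to the db subset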

It’s also worth noting that whatever rule you apply with your CM tool of choice, the rule is idempotent. This means it will not add firewall rules twice and mess up your setup, no matter how many times you apply it.
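As a small illustrative Ansible task (the file path and setting are just an example, not a recommendation), the rule below can be run any number of times and will only ever change the file once:

- name: Ensure root login over SSH is disabled
  lineinfile:
    dest: /etc/ssh/sshd_config
    regexp: '^#?PermitRootLogin'
    line: 'PermitRootLogin no'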

Keep calm and scale

To some extent it’s fine to scale vertically, but the downsides are that it requires an extra machine reboot and can simply waste resources. To scale horizontally, on the other hand, it’s essential to have the new environment(s) brought to the desired state. Sure, you could use the golden-image approach and scale just fine, but those days have passed: just imagine installing a new library under the golden-image approach and you’re quickly put off the idea, since every change means baking and distributing a whole new image. CM tools give us much more flexibility to handle such cases.

Where shall I start?

Before you start with anything, there are a few key points.

Gather requirements first. See how the business works and understand it, deeply. Now, blame me, but for me a simple validation is just fine, even if it’s only a peer review, as the underlying aim is not to overload ourselves. Then, finally, start playing with the tool of your choice. If you don’t have one yet, go and find whatever works for you.

I used Puppet for some time, but then switched to Ansible because of its simplicity. Puppet has its own Ruby-based DSL for writing manifests and is built on a master–agent pattern. This implies that each node needs the Puppet agent installed and SSL certificates set up so that the master and agents can talk to each other. For better node management, Puppet relies on third-party tools, e.g. Hiera to manage global environment config (for instance, to apply Ruby version 2.1 on a subset of nodes), or r10k to deal with different environments (e.g. dev or production). There’s one more caveat to Puppet, quite a common one actually: by design, if there is no explicit hierarchy between rules (resources), Puppet applies them in an unspecified order, which may cause unexpected results. To prevent this, the Puppet DSL provides dedicated ordering by setting relationships between resources.

Ansible Playbooks, on the other hand, are YAML-based and their rules are applied top to bottom: the first task in a Playbook runs first, then the second, then the third and so on. Besides, Ansible doesn’t use a master–agent architecture; all you need on the nodes is Python with the python-simplejson library. I’d also claim Ansible has a shorter learning curve than Puppet, more modules supported by the core team and simply better docs. I’ve prepared a simple Puppet vs. Ansible comparison (it needs Vagrant and VirtualBox) that configures SSH and a firewall, so you can play with both.
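A minimal sketch of such a Playbook is shown below; it is not the actual comparison mentioned above, just an illustration assuming an Ubuntu node with ufw available. The tasks run strictly in the order they are listed:

- hosts: all
  become: yes
  tasks:
    - name: Install the OpenSSH server
      apt:
        name: openssh-server
        state: present

    - name: Allow incoming SSH
      ufw:
        rule: allow
        port: "22"
        proto: tcp

    - name: Enable the firewall with a default-deny policy
      ufw:
        state: enabled
        default: deny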

Kill your Kitten and see what happens

The idea behind this post was to show that CM matters. Even if you’re a tiny player on the market, spinning up a new Apache installation twice a year or doing some library upgrade even less often, it can be a valuable investment. After just a few years, maintaining such a Kitten becomes a pain, because no one remembers what was installed there and what for. Keep your environments lean and auto-configurable and you’ll notice the profit.

Yet another data migration problem

TL;DR: Ensure data consistency while copying data between databases managed by an RDBMS (PostgreSQL in this case).

The problem

Imagine you have two databases that shared the same parent in the past. As time goes by, some of the data may change in either of them. Now you’d like to copy object A between the databases, under the assumption that a copy is created only if there’s no equal object in the destination database. The object may contain foreign keys, and such associations are also taken into account when checking equality.

Considerations

The easiest solution you’d think of is to dump the data you want and then restore it in the destination database. Such an approach, however, requires a tool that takes only the data you want to copy: not the whole database or table, just object A with its associations. PostgreSQL provides pg_dump and COPY for data migrations, but neither lets you deal with associations easily. You’d then reach for higher-level tools, e.g. any ORM you like, and handle the deep object copy yourself.
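For instance, a hypothetical attempt with psql’s \copy (the table names and the id 42 are made up for illustration) quickly shows the limitation – every association needs its own export, and conflicting ids in the destination are still your problem:

\copy (SELECT * FROM posts WHERE id = 42) TO 'post_42.csv' CSV
\copy (SELECT * FROM post_tags WHERE post_id = 42) TO 'post_42_tags.csv' CSV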

To check for equality, you need some data to compare. The obvious candidate would be the record id and its foreign keys. In this case, however, there is no guarantee that a given id in database X and database Y points to the same record. They may differ and result in a mess.

Check for hash(database_X(A)) == hash(database_Y(A))

Another approach is to calculate a hash of the data you’d like to compare and then use hashes instead of ids. If the hashes match, you don’t need to make a copy, and for further operations you can simply use the record id.

Build a hash of record

To build such a hash, you’d add a trigger to your database with an appropriate function, e.g.:

CREATE OR REPLACE FUNCTION update_post_footprint_func()
RETURNS trigger AS $$
DECLARE
  raw_footprint text;
BEGIN
  -- concatenate the fields that identify the record
  raw_footprint := concat(NEW.title, NEW.content, NEW.owner_id);
  -- store its md5 as the record's footprint
  NEW.footprint := md5(raw_footprint);

  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_post_footprint
BEFORE INSERT OR UPDATE ON posts
FOR EACH ROW EXECUTE PROCEDURE update_post_footprint_func();

Such a function builds a new hash for the given record on each insert or update. As you may notice, this use case covers at most a 1:1 relationship and doesn’t handle 1:N. For instance, a post record may have many tags. In that case you have two choices: either select the footprints of the dependencies (note this implies every dependency has its own footprint), e.g.:

raw_footprint := concat(...,
(select array_to_string(array(select footprint from tags where post_id = NEW.id order by id ASC), '|')));

or build the parent footprint from the dependency data itself, e.g.:

raw_footprint := concat(...,
(select array_to_string(array(select name from tags inner join post_tags on tags.id = post_tags.tag_id where post_tags.post_id = NEW.id order by tags.id ASC), '|')));

The footprint build process is somewhat similar to the Russian Doll caching pattern, though you need to be aware that the dependencies’ footprints must be built before the record’s footprint. This, however, only applies when referencing the dependency footprints directly.
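As a sketch of how the footprint is then used (the posts table follows the example above; :source_footprint stands for the value read from the source database and is just a placeholder), the destination database can be queried like this:

-- Is there already an equal post in the destination database?
SELECT id FROM posts WHERE footprint = :source_footprint;

-- No row returned: copy the object (and its associations).
-- A row returned: reuse that id for any further operations.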

Possible issues

  1. Depending on the record’s dependencies, you might need to build several triggers, each generating a sub-footprint that is finally assembled into the main footprint.
  2. Speed. Since each trigger execution takes a non-zero amount of time, the need for it should be weighed against the use case. If the comparison is rarely needed while inserts/updates are heavy, it might be better to compute the footprint in the app itself.

Yet another Phoenix failure

Like many of you, some time ago I finished reading The Phoenix Project, and no, I won’t write yet another review of how good or bad this book is. It seems, however, that there are two camps: one loves the novel, the other hates it. If you’re not in either camp yet, come and join us. Perhaps you’ll learn something, or just waste yet another several hours, not for the first time. Come and be a camper!

I won’t write yet another review, but it seems there are Phoenix Projects everywhere, or at least things that look like them. Today is Monday and I wanted to make a bank transfer. No chance, it didn’t work. A crucial banking service has been inaccessible all day and they still haven’t fixed it. Guess what: they migrated customers to a brand new platform with a completely new UI, perhaps even better than the previous one. There’s just one thing: it doesn’t work. So I tried to send a message through the system to report all the issues, but that failed too, again and again.

They probably spent thousands of hours working on the new platform, invested time and money, and when delivery time came, it just failed. Of course they say they’re aware of the issues and the whole IT department is working on them, but that’s little comfort while everything is burning. I mean, this must never happen, especially at a bank, where money is involved.

We all want to be IT professionals, yet such things still happen, and I started pondering why. Is it simple math and probability: the internet has grown to a scale it has never reached before, and among thousands of online services some must simply fail? Is it the pace of change in IT, so fast that no one can understand it all? Is it the IT people, because they just don’t care? Or, finally, is it management pressure, because no matter what happens, the product must be delivered on time?

Such an app failure is not just a problem to solve. The point is that the whole migration process failed, and from the customer’s point of view the new product is completely unusable, no matter how it looks or how well it follows UX best practices. The business can’t operate with such a product.

If this situation sounds familiar to you, go and waste those several hours reading The Phoenix Project.