Devops

We (the company I work for) recently set up a mysql galera cluster and haproxy to load balance connections between the nodes. Haproxy has a mysql health check, but it only logs into the server, and we wanted a bit more than that (galera's rsync option puts the server that is being synced from into read_only mode). What I didn't want to do was install apache or similar, because I wanted to leave as much of the system's resources as possible available to mysql. I solved the problem with a perl script.

Before I move on, I should mention that I don't like perl. Other languages, such as Go, provide just as easy a solution, but perl is installed on pretty much all linux distros and therefore requires less setup. The backbone of the script is the HTTP::Server::Simple::CGI package. My version of the script weighs in at a whole 26 lines of code. Here it, mostly, is.

#!/usr/bin/perl

use File::Pid;

{
package MyWebServer;

use HTTP::Server::Simple::CGI;
use base qw(HTTP::Server::Simple::CGI);

    sub handle_request {
        my ($self, $cgi) = @_;
        my $isFine = 0;
        #-----
        # do your checking logic here
        #-----
        if($isFine) {
            print "HTTP/1.0 200 OK\r\n";
            print $cgi->header;
        } else {
            print "HTTP/1.0 503 Service Unavailable\r\n";
            print $cgi->header;
        }
    }
}

my $pidfile = File::Pid->new();
if(!$pidfile->running) {
    my $server = MyWebServer->new(12345);
    $server->host('YOUR SERVERS IP GOES HERE');
    my $pid = $server->background();
    $pidfile->pid($pid);
    $pidfile->write;
}

The above code checks to see if a running PID for the script exists and exits if it does (the if block towards the bottom). It then sets up the server to listen on port 12345 (use whatever port you want). The next line tells it to listen on a specific ip address; I set that from chef as part of the .erb that builds this script, but you could pass it as a parameter to the script instead ($ARGV[0]). It then creates the server in the background and writes the PID file.

Of course, the real action is in the handle_request function in the package. That function gets called every time the script receives an http request. All mine does (you could do a lot more here) is collect some information about the state of the server, a bit more on that in a second, and return a status of either 200 or 503, which is all haproxy cares about. If your load balancer checks for actual content in the response, then you would add some prints after the $cgi->header calls.

As I mentioned in the first paragraph, the reason we set this up was to discover if the server happens to be in read_only mode. Thus, all my check does is shell out to mysql with a -e option to show the global variables, and then run a regex over the output to see if read_only is set to off.
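
For reference, the guts of my check boil down to something like this (a sketch; it assumes the mysql client can log in without a password prompt, via ~/.my.cnf or similar, so adjust the command to match your setup):

# shell out to mysql and look for read_only being OFF
my $vars = `mysql -e "SHOW GLOBAL VARIABLES LIKE 'read_only'"`;
$isFine = ($vars =~ /read_only\s+OFF/i) ? 1 : 0;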

I've also set cron up to run the script every minute, which is why the PID stuff is in there. Pretty simple really.
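
The cron entry itself is nothing special, something along these lines (the path is an example, not where the script actually lives):

* * * * * /usr/local/bin/galera_health_check.pl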

With the reliance most web apps place on databases, making sure your database is always available is important for your overall reliability. I consider MySQL to be finicky (a manual master-to-slave fail-over at 5am is not my idea of fun), but it is what I'm stuck with. Here is my understanding of the options for keeping your app up when mysql isn't.

Master – Slave

This is a pretty basic and common pattern. You have one server that gets all of the writes (the master) and another server that is replicating those writes (the slave). The replication is asynchronous and, therefore, can fall behind. Generally, if you keep the load under control on the slave, it should keep up. You can send all of your traffic to the master, or you can send reads to the slave to distribute your read load. Writing to the slave will probably break replication (it will definitely break it if you insert into an auto incrementing column).

Cleaning up a broken slave can be difficult. You either need to hunt down and undo any changes made to the slave or pull a dump from the master and import it. Care must be taken when importing to make sure that you know what position in the binlog to resume replication from.

If the master fails you can fail over to the slave, though this is a manual process. You'll need to stop anything being written to the master (if it isn't completely dead), stop the slave process on the slave, tell your application to write to the slave, get the master back up and running, get the master up to date (you're probably not going to know the binlog position, which means you'll likely need to do a full import), set up what was the master to be the slave, and start it replicating.

The benefit here is that it’s easy to set up, mysql is pretty stable so you’ll rarely have to fix it, and as long as 1 server can give you enough write throughput, you can be reasonably happy. If you ever need more read throughput you can add additional slaves. The replication overhead on the master is low.

Master – Master

Similar to master-slave, but now both servers are configured to be both slave and master for each other. This allows you to read from and write to either server, because the writes will be replicated to the other. Of course, all of the same problems regarding repairing broken slaves, knowing binlog positions, etc. still apply. Both servers can't be handed the same id from an auto increment, so you'll need to do something along the lines of configuring one server to only use even numbers and the other to only use odd. If the servers come under load they might start to fall behind in replication. The order updates and inserts are applied in might also be different for each server, which can leave them with different data.
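
The even/odd trick is just two server variables, set differently in each server's my.cnf; a quick sketch:

# server 1: generates ids 1, 3, 5, ...
auto_increment_increment = 2
auto_increment_offset    = 1

# server 2: generates ids 2, 4, 6, ...
auto_increment_increment = 2
auto_increment_offset    = 2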

One way to resolve some of these problems is to only send traffic to one server.

With a VIP

If you are running on linux then you can use a virtual IP (VIP). This requires a bit of network wizardry. What you end up with is a system where the passive server (the one not in use) is polling the active server (the one in use) to make sure it's alive. If it discovers that the active server is down, it steals the VIP and, in doing so, promotes itself to being active. Your application won't need to know about any of this, as the ip it's connecting to never changes, just the machine behind it. You'll still need to figure out what was wrong with the broken server and get it working again, but in theory there can be no downtime for the user.
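
One common way to pull this off is keepalived, which moves the VIP between the boxes using VRRP. A rough sketch of what the active node's config might look like (the interface, router id, priority, and the VIP itself are all placeholders):

vrrp_instance VI_MYSQL {
    state MASTER
    interface eth0
    virtual_router_id 51
    # the passive node runs the same block with a lower priority
    priority 101
    virtual_ipaddress {
        192.168.0.50
    }
}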

With a distributed file system

The idea here is that the file system the servers write to is shared between them (NAS or SAN or the like). This is really more an active-passive solution than master-master, as one of your servers will need to be turned off or you risk corrupting data. You can also combine this with the VIP method, though you'll need something to mount the drive (depending on the sharing method) and start mysql. What you get with this is the knowledge that the data on the active and passive node will be the same (it's the same mount) at the expense of a little downtime while the passive mysql starts.

NDB

This is mysql's cluster offering. It's its own engine, so you can't use innodb or myisam, it has many moving parts, and it requires at least 3 servers, but it gives you a system where you can read from and write to any node without any of the data integrity complications inherent in the previous patterns. The system is composed of API nodes (generally mysql), data nodes, and management nodes. These processes can live on distinct machines or all on the same machine. Unlike the previous examples, all of your data does not live on all of your servers but is distributed across the cluster. This has the benefit of increasing your throughput as the number of nodes increases, though individual query performance can be impacted.

Data can be mirrored between the data nodes, meaning the loss of any individual node will not result in the loss of data. Nodes can be added and dropped without fuss or harm. For example, NDB updates itself through a rolling update where each node, one at a time, is dropped out of the pool, updated, and entered back into the pool.

You will probably need to make some application changes in order to use NDB. One set of concerns relates to security, as NDB is innately insecure and requires proper use of DMZs; please take a look at mysql's documentation for more information. Performance of large joins, sorts, etc. can also be bad, as the rows involved will likely be spread across the data nodes.

Galera

Galera is a solution for MariaDB or Percona, which are forks of mysql. Galera is also a clustered solution, one that replaces mysql's asynchronous replication with synchronous replication. Galera combines recent research with Percona's XtraDB fork of innodb (myisam support is in beta, I believe, but isn't production ready) to provide solid performance for synchronous replication. As with NDB, Galera allows you to read from and write to any node, and to add or remove nodes with ease. When you add a node, the cluster will automatically get it synced with the rest of the cluster.
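
To give you a feel for the setup, the cluster is driven by a handful of wsrep settings in my.cnf; a rough sketch (the provider path, node list, and SST method here are examples, not a copy of my config):

binlog_format            = ROW
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2
wsrep_provider           = /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name       = my_cluster
wsrep_cluster_address    = gcomm://192.168.0.11,192.168.0.12,192.168.0.13
wsrep_sst_method         = rsync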

Unlike NDB, all data lives on all nodes. This has benefits and drawbacks for performance. Read performance is fast, and joins, sorts, etc. are fast because everything is on the node getting the request. Insert and update speed will depend on the speed of the slowest node in your cluster. This is important to consider given that you will likely run a Galera cluster on commodity hardware. You can find benchmark data at this webpage.

I favor this solution but suspect that it isn’t suitable for solutions that require a lot of mysql servers to meet throughput demands. Using NDB with a caching layer to speed up frequent reads might be a better solution in that scenario.

Tungsten

Like Galera, Tungsten is a cluster solution that replaces mysql's built-in replication. It allows for complex replication strategies and replication between different versions of mysql, or between different databases altogether. Replication is handled by the Tungsten Replicator, which is a separate process from mysql, so the solution is not as simple as Galera but probably makes up for it in its flexibility.

RDS

Amazon's RDS (relational database service) is part of its AWS offerings. You define what type of throughput you need and they handle the rest. The only drawback I know of here is that they don't yet support encryption at rest, so if you have PHI or other data you need to encrypt, you are SOL. If you are in AWS and don't require encryption at rest, this is probably the right place to start.

I just spent an embarrassing amount of time trying to figure out why some resources in one of my puppet classes were not being executed. All of these were inside of an if branch that was checking whether another class was defined. It was defined (wouldn't be much of a blog post if that wasn't the case), there were no errors, it just wasn't happening. Turns out that defined does not work the way I thought it did. From Puppet's documentation (https://docs.puppetlabs.com/references/latest/function.html#defined):

Checking whether a given resource has been declared is, unfortunately, dependent on the parse order of the configuration

Because defined is evaluated during parsing and not during a later step, the order in which your resources are declared matters. To give you a concrete example, I have two classes, php and imagemagick. The imagemagick class installs the imagick php extension for you if php is defined. How nice of it, right? This has always worked perfectly. Until my most recent manifest, where I had something like:

class{'imagemagick': }
class{'php':
  before => Class['imagemagick'],
}

Which looks good. Php has to happen before imagemagick, so it should be defined. But, because imagemagick is parsed before php, php isn't actually defined when imagemagick is evaluated, so nothing inside of my if was run during the apply. To make this actually work it needed to look like:

class{'php':
  before => Class['imagemagick'],
}
class{'imagemagick': }

And now it works. Ridiculous. Hope this saves someone some time.

Recently, I needed to add https support to our dev installs of our web app. The app itself needed to know it was using https, to generate proper urls and the like, so terminating the ssl connection at the proxy was not a viable solution for me. HAProxy added support for SSL in 1.5 but this article isn’t about that because I’m using CentOS and therefore am stuck with HAProxy 1.4.

First up, how not to solve this problem. My first thought was that if I put HAProxy in tcp mode it shouldn't need to know anything about whether the connection was SSL or not. This did not work. Unfortunately my notes don't say why, but I assume either HAProxy was spitting out BADREQ with PR-- in the logs or the payload was getting messed up and causing errors in negotiations.

Enter stunnel. Stunnel is an SSL tunnel and is what I used to handle the https requests. Stunnel can be configured in either server mode, which terminates SSL connections, or client mode, which initiates SSL connections. This solution uses both. The general approach, which I found here, is to have the https connection received by a stunnel server, which forwards the now-http connection to haproxy, which forwards the http connection to a stunnel client, which changes it back to https and forwards it along to the web server.

--https--> stunnel server --http--> haproxy --http--> stunnel client --https--> web server

Not pretty, but effective, and because all of the traffic between stunnel and haproxy is on localhost, it’s relatively fast.

The first thing needed is to get stunnel installed. It’s in yum.

Now to set up stunnel. I made a folder at /etc/stunnel to hold my configs and the .pem file, which is the .key and .crt files concatenated together. I placed the .pem file in that folder. Next you will need a config file for the stunnel server and one for the client; I named mine server.conf and client.conf. You might be able to do this with one config file, but I'm not that familiar with stunnel. In both config files you will need/want 5 global settings defined:

cert=<path to your .pem file>
pid=/var/run/stunnel_(server|client).pid or something similar
output=/var/log/stunnel_(server|client).log or something similar
socket=l:TCP_NODELAY=1
socket=r:TCP_NODELAY=1

Basically: define where the pem lives because we need that for the SSL handshake, define a pid file and a log file because those are handy, and add two socket lines that are basically gibberish to me but seem important (all of the google results seemed to have them). I would also suggest you add foreground=yes while you test the config files so that you can easily see what is happening and kill (ctrl-c) the process to make changes. The next bits define what each stunnel instance is actually going to be doing. For the server:

[https]
accept=443
connect=8081

That basically says to listen on port 443 and, if you get something there, forward the connection to 8081. Port 443 is important, but you could change 8081 to something different. The client config looks a lot the same:

[https]
client=yes
accept=8082
connect=<server ip address>:443

There we tell stunnel that it's to operate in client mode (client=no is the default, which is why it wasn't in the server config), to listen on 8082 (which you could change to something else), and to connect to our webserver on 443. If you had multiple web servers you could put multiple connect lines in and it will round robin the connections.
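
Putting the global settings and the service section together, the whole client.conf ends up looking something like this (the paths and the web server ip are examples):

cert=/etc/stunnel/mysite.pem
pid=/var/run/stunnel_client.pid
output=/var/log/stunnel_client.log
socket=l:TCP_NODELAY=1
socket=r:TCP_NODELAY=1

[https]
client=yes
accept=8082
connect=192.168.0.20:443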

The last thing we need to change is our HAProxy config. This, at its most basic, would look something like:

frontend my_ssl_webpage
bind :8081
default_backend my_ssl_webpage_backend

backend my_ssl_webpage_backend
reqadd X-Forwarded-Proto:\ https
server stunnel1 127.0.0.1:8082

Now, if you restart HAProxy, and start stunnel for both config files (sudo stunnel <path to config file>), you should have https requests arriving on your webserver.

 

Bonus:
Found a great manual for HAProxy 1.4. Here is the link:
http://cbonte.github.io/haproxy-dconv/configuration-1.4.html

An hour before I was going to leave work for the long holiday weekend I broke replication to a MySQL slave database. Go see fireworks? No thanks, I’ve made my own.

The first thing I've learned is that changing the master_host field in a CHANGE MASTER TO command will reset your master log and position values. This will destroy your replication or corrupt your data or both! Probably both. I had recently changed a LAN IP on the master and wanted to point the slave at that, and figured that if I didn't change the master log and position everything would be fine. Nope. What I should have done, and feel foolish for not doing, was write down the current master log file and position after issuing a STOP SLAVE. That's a really good idea whenever you do something with slave data, just in case something goes wrong. Specifically here, after seeing that the file and position were different, I could have put the correct numbers back in before restarting the slave. So, there is the first lesson:

Before a CHANGE MASTER TO, run a SHOW SLAVE STATUS\G and record the Master_Log_File and Exec_Master_Log_Pos.
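
In other words, the safer sequence looks something like this (the host, log file, and position are placeholders for whatever SHOW SLAVE STATUS gave you):

STOP SLAVE;
SHOW SLAVE STATUS\G
-- write down Master_Log_File and Exec_Master_Log_Pos, then:
CHANGE MASTER TO
    MASTER_HOST='192.168.0.10',
    MASTER_LOG_FILE='mysql-bin.000123',
    MASTER_LOG_POS=456789;
START SLAVE;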

The second thing I've learned is that the --master-data option on mysqldump does not do what I thought it would. It records the master data, which is what it says it does, but that's the SHOW MASTER STATUS data, not the SHOW SLAVE STATUS data. It's the numbers you need if you want to scp a snapshot to a potential slave and get it running. It is not the numbers you need if you do a backup on the slave and want to recover the slave in the case of a failure. I figured this out after uncompressing, importing, and attempting to START SLAVE. This did not make me happy. To recover from this, I ended up running a backup off the master, something I would rather not do for performance reasons, but holiday weekend, and importing that backup. I haven't looked into a long term fix for this yet. It can wait for Monday. So, there is the second lesson:

The --master-data option on mysqldump is the SHOW MASTER STATUS equivalent and not the SHOW SLAVE STATUS equivalent.
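
Concretely, a dump taken with --master-data drops a line like the following near the top of the file, and those coordinates come from SHOW MASTER STATUS on the box being dumped (values here are placeholders), which is not the position that box was replicating from:

CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000045', MASTER_LOG_POS=107;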

The third thing I've learned, rediscovered really, is that database imports from mysqldump take a long time. A really long time. Seriously. For this I wrote a quick script that makes things a little bit faster. The time required to import a db from mysqldump is the sum of the time required to import each individual table. My script parses the dump file and splits it into a bunch of individual table files, which reduces the total time to that of the longest table import. The script is hacked together, written in perl (which I'm not the best at), and missing prompts, help, or safety measures, but here it is in case you want it. You'll need to edit the mysql command in the function in order to connect to mysql and use the correct database.

#!/usr/bin/perl

use strict;
use warnings;
use threads;

# requires the path to the gzipped db dump
my $gzipped = $ARGV[0];

`mkdir ./dbbackupimport`;
`cp $gzipped ./dbbackupimport/backup.sql.gz`;
`gzip -d ./dbbackupimport/backup.sql.gz`;

# break the dump file up into one file per table
open(my $dumpfh, "<", "./dbbackupimport/backup.sql");
open(my $currentfh, ">", "./dbbackupimport/backupheader.sql");
my @tables = ();
while (<$dumpfh>) {
    my $line = $_;
    if ($line =~ /^DROP TABLE IF EXISTS \`([\w]+)\`.*/) {
        my $table = $1;
        print "Found table $table\n";
        push @tables, $table;
        if ($currentfh) {
            close $currentfh;
        }
        # include the header so the imports can disable keys
        `cat ./dbbackupimport/backupheader.sql > ./dbbackupimport/$table.sql`;
        open($currentfh, ">>", "./dbbackupimport/$table.sql");
        print $currentfh "\n";
    }
    print $currentfh $line;
}
close $currentfh;

# each import thread executes this function
sub import_thread {
    my ($table) = @_;
    print "mysql < ./dbbackupimport/$table.sql\n";
    `mysql < ./dbbackupimport/$table.sql`;
    return $table;
}

# spawn a thread per table to import the data
my @threads = ();
for my $table (@tables) {
    push @threads, threads->create('import_thread', $table);
}

# collect the threads
for my $thread (@threads) {
    my $table = $thread->join();
    print "Finished importing $table\n";
}

[Figure: monitoring graph "Hourly Drop" showing drops at 14:00 and 15:00]

For a long time, puppet apply lines were added to server crontab files, and we were content with the trade-off between timely updates and server resources used. More recently, while looking at some of our monitoring graphs, we noticed that one of our core business activities took a nosedive every hour, on the hour. When did puppet run? Every hour, on the hour. Time for a change.

My goal was to make it so that there was a big red button to press whenever we wanted to run puppet, which is largely whenever we want to push code. Automatic deploys would be really nice, but FDA regulations pretty much force something similar to waterfall. Anyway, Jenkins' play button is close enough to a big red button, so I went with that (I also already had it installed). That left the following problems to overcome: running commands in production, getting around the DMZ, running puppet with sudo privileges, and doing it all in a timely manner. Turns out, this actually wasn't terribly difficult. Here's what I had to do.

  • Set up Puppet. I’m assuming you’ve already got this part.
  • Set up Jenkins. Again, assuming you’ve done this, but if not, it’s probably in your package manager.
    • Bonus, secure Jenkins so random web traffic can't access it. I used HAProxy rules to only forward the request to Jenkins if it originated inside of the office (there's a sketch of that after this list).
  • Install the Publish Over SSH plugin. This will allow you to ssh into whatever server you already have punched through the DMZ. I’ll refer to this server as Smuggler here for short.
  • Install pdsh on Smuggler.
  • Run ssh-keygen on Smuggler if you haven’t already.
  • In Jenkins, add Smuggler (Manage Jenkins -> Configure System) with its public key.
  • Distribute Smuggler’s public key to every DMZed server you want to run puppet on. You can do this with ssh-copy-id or just copy the key into .ssh/authorized_keys on the servers you want to log into.
  • On every server you want Smuggler to log into, run visudo (sudo visudo is a weird sort of command) and change the following:
    • Comment out “Defaults requiretty”. This makes it so you can run sudo from ssh in one command.
    • Add a line with, “user    ALL = (ALL)    NOPASSWD: /usr/bin/puppet”. Replace user with the correct username. That gives that user the ability to run puppet as sudo without a password prompt.
  • Create or edit a project in Jenkins and add a new ssh build step.
    • Select your server to connect to.
    • In the execute command part of the ssh step you want to set up a pdsh command. You’ll need two parameters for this,
      • -R exec. This basically tells pdsh to execute a command for every server. If you don’t have any atypical ssh options, you could say -R ssh.
      • -w <targets>. This is where you specify what to log into. You'll need a comma separated list of hosts or IPs. Thankfully you can use a range in the form of [01-16]. To ssh into IPs 192.168.0.1 to 192.168.0.5 and 192.168.0.100 to 192.168.0.115 you would say, “-w 192.168.0.[1-5],192.168.0.1[00-15]”.
    • The last part of the pdsh command you need is the command to run. If you used exec you'll end up with something like, “ssh %h sudo puppet apply myManifest.pp”, or “ssh %h sudo puppet agent --no-daemonize --onetime”, or something like that. The %h substitutes the server's hostname into the command.
    • All together, you end up with something in your execute field that looks something like:
      • pdsh -R exec -w 192.168.0.[1-5],192.168.0.1[00-15] ssh %h sudo puppet apply myManifest.pp
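
For the Jenkins bonus item above, the HAProxy side can be as simple as a src ACL; a rough sketch (the office subnet and the backend names are made up):

frontend http_in
    bind :80
    acl from_office src 192.168.0.0/24
    acl is_jenkins  hdr_beg(host) -i jenkins.
    use_backend jenkins_backend if is_jenkins from_office
    default_backend web_backend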

With that in place, we are no longer wasting cycles having puppet accomplish nothing. As a bonus we can also modify puppet modules, stage code (we build rpms and use yum to deploy our code), or whatever else we need to do without fear of puppet sending it out before we’re ready. Double bonus, if we screw up we don’t have to wait for the next puppet run to deploy our fix.

Yesterday I wrote an article that mostly explained how to install and configure haproxy. Today I want to describe the specific solution I've come up with for handling a development environment with multiple services running on multiple servers. My goal is to simplify things, specifically our networking and configuration. Complicating factors include:

  • not wanting to change some domains/urls (though I do want to remove the ports)
  • minimize ip usage
    • A proper dev or qa install includes several VMs load balanced together.
    • DHCP works but messes up load balancing when renews happen.
    • Hand picking IPs is a bit burdensome.
  • Performance would be nice.

The solution I've come up with is to create a tree topology out of haproxyed servers. Basically, one server sits at the top and all port 80 traffic gets forwarded to it from the router. We'll call it Lancelot. Lancelot's haproxy rules are configured to match on hdr_beg for domains like wiki. and jira. and forward those along to the appropriate servers. Say we have two additional servers, Arthur and Galahad, where we set up virtual environments. Lancelot also has hdr_end lines for arthur.domain.com and galahad.domain.com which forward those requests on to those servers. Galahad has virtual environments purity and sword. Arthur has virtual environments excalibur, lwr (large wooden rabbit), and hhg (holy hand grenade). Galahad's haproxy is configured with hdr_beg lines for purity. and sword. which forward requests onto VirtualBox private networks. Arthur's haproxy is configured with hdr_beg lines for excalibur., lwr., and hhg. which forward requests onto VirtualBox private networks.
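
To make that concrete, Lancelot's frontend would carry a set of ACLs along these lines (a sketch; the names and backends are illustrative, not the actual config):

frontend http_in
    bind :80
    acl is_wiki    hdr_beg(host) -i wiki.
    acl is_jira    hdr_beg(host) -i jira.
    acl to_arthur  hdr_end(host) -i arthur.domain.com
    acl to_galahad hdr_end(host) -i galahad.domain.com
    use_backend wiki_backend    if is_wiki
    use_backend jira_backend    if is_jira
    use_backend arthur_backend  if to_arthur
    use_backend galahad_backend if to_galahad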

Set up like the above, a request for excalibur.arthur.domain.com would:

  1. Get sent to Lancelot by the router (port 80 forwarding rule)
  2. Trigger Lancelot's hdr_end rule for arthur.domain.com and get forwarded to Arthur
  3. Trigger Arthur's hdr_beg rule for excalibur and get forwarded to a 192.168 address that corresponds to excalibur's load balancer
  4. The request gets handled and winds its way back through the proxies to your web browser.

This satisfies most of my goals. Domains for things like wiki.domain.com remain the same because haproxy is forwarding the request directly to its appropriate server. Because installs like excalibur and purity use only private VM networking, IPs from the office as a whole aren't used, and I have a static IP to load balance with. Performance could be better but screw it, it isn't production.