
We (the company I work for) recently set up a MySQL Galera cluster with HAProxy load balancing connections between the nodes. HAProxy has a MySQL health check, but it only logs into the server, and we wanted a bit more than that (Galera's rsync option puts the server being synced from into read_only mode). What I didn't want to do was install Apache or similar, because I wanted to leave as much of the system's resources as possible available to MySQL. I solved the problem with a Perl script.

Before I move on, I should mention that I don't like Perl. Other languages, such as Go, provide just as easy a solution, but Perl is installed on pretty much every Linux distro and therefore required less setup. The backbone of the script is the HTTP::Server::Simple::CGI package. My version of the script weighs in at a whole 26 lines of code. Here it is, mostly.

#!/usr/bin/perl

use File::Pid;

{
package MyWebServer;

use HTTP::Server::Simple::CGI;
use base qw(HTTP::Server::Simple::CGI);

    sub handle_request {
        my ($self, $cgi) = @_;
        my $isFine = 0;
        #-----
        #do your checking logic here and set $isFine accordingly
        #-----
        if($isFine) {
            print "HTTP/1.0 200 OK\r\n";
            print $cgi->header;
        } else {
            print "HTTP/1.0 503 Service Unavailable\r\n";
            print $cgi->header;
        }
    }
}

my $pidfile = File::Pid->new();
if(!$pidfile->running) {
    my $server = MyWebServer->new(12345);
    $server->host('YOUR SERVERS IP GOES HERE');
    my $pid = $server->background();
    $pidfile->pid($pid);
    $pidfile->write;
}

The above code checks whether a PID file for a running copy of the script already exists and exits if it does (the if block towards the bottom). It then sets up the server to listen on port 12345; use whatever port you want. The next line tells it to listen on a specific IP address. I set that from Chef as part of the .erb template that builds this script, but you could pass the address as an argument instead ($ARGV[0]). Finally, it creates the server in the background and writes the PID file.

Of course, the real action is in the handle_request function in the package. That function gets called every time the script receives an HTTP request. All mine does (and you could do a lot more here) is collect some information about the state of the server, a bit more on that in a second, and return a status of 200 or 503, which is all HAProxy cares about. If your load balancer checks for actual content in the response, you would add some prints after the $cgi->header calls.

As I mentioned in the first paragraph, the reason we set this up was to discover whether the server happens to be in read_only mode. Thus, all my check does is shell out to mysql with a -e option to show the global variables, then run a regex over the output to confirm that read_only is set to OFF.
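
Here is roughly what that check looks like; treat it as a sketch (it assumes the mysql client can log in without prompting, for example via a ~/.my.cnf), standing in for the $isFine setup and placeholder comments inside handle_request.

    my $isFine = 0;
    #ask mysql for the read_only variable; the node is healthy only if it is OFF
    my $output = `mysql -e "SHOW GLOBAL VARIABLES LIKE 'read_only'" 2>/dev/null`;
    if($? == 0 && $output =~ /read_only\s+OFF/i) {
        $isFine = 1;
    }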

I've also set cron up to run the script every minute, which is why the PID stuff is in there. Pretty simple really.


With the reliance most web apps place on databases, making sure they are always available is important for your reliability. I consider MySQL to be finicky (a manual master-to-slave failover at 5am is not my idea of fun), but it is what I'm stuck with. Here is my understanding of the options for keeping your app up when MySQL isn't.

Master – Slave

This is a pretty basic and common pattern. You have one server that gets all of the writes (the master) and another server that replicates those writes (the slave). The replication is asynchronous and, therefore, can fall behind. Generally, if you keep the load under control on the slave it should keep up. You can send all of your traffic to the master, or you can send reads to the slave to distribute your read load. Writing to the slave will probably break replication (and will definitely break it if you insert into an auto-incrementing column).
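
If you want to keep an eye on how far behind a slave is, a quick way (sketched here in the same spirit as the health-check script above, and again assuming the mysql client can log in without prompting) is to parse Seconds_Behind_Master out of SHOW SLAVE STATUS:

#!/usr/bin/perl

#report how far behind the master this slave currently is
my $status = `mysql -e 'SHOW SLAVE STATUS\\G'`;
if($status =~ /Seconds_Behind_Master:\s+(\d+)/) {
    print "Slave is $1 seconds behind the master\n";
} else {
    print "Replication is not running (Seconds_Behind_Master is NULL)\n";
}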

Cleaning up a broken slave can be difficult. You either need to hunt down and undo any changes made to the slave or pull a dump from the master and import it. Care must be taken when importing to make sure that you know what position in the binlog to resume replication from.

If the master fails you can fail over to the slave, though this is a manual process. You'll need to stop anything being written to the master (if it isn't completely dead), stop the slave process on the slave, tell your application to write to the slave, get the old master back up and running, get it up to date (you're probably not going to know the binlog position, which means you'll likely need to do a full import), set what was the master up to be the slave, and start it replicating.

The benefit here is that it's easy to set up, MySQL replication is pretty stable so you'll rarely have to fix it, and as long as one server can give you enough write throughput, you can be reasonably happy. If you ever need more read throughput you can add additional slaves. The replication overhead on the master is low.

Master – Master

Similar to master-slave, but now both servers are configured to be slave and master for each other. This allows you to read and write to either server because the change will be replicated to the other. Of course, all of the same problems regarding repairing broken slaves, knowing binlog positions, etc. still apply. Both servers can't be handing out the same id from an auto increment, so you'll need to do something along the lines of configuring one server to only use even numbers and the other to only use odd. If the servers come under load they might start to fall behind in replication. The order updates and inserts are applied in might also differ between the servers, which can leave them with different data.
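
MySQL has server variables for exactly that even/odd scheme: auto_increment_increment and auto_increment_offset. A rough sketch, shelling out to mysql the way the scripts above do (you would normally make this permanent in my.cnf instead):

#server A hands out odd ids (1, 3, 5, ...)
`mysql -e "SET GLOBAL auto_increment_increment = 2; SET GLOBAL auto_increment_offset = 1"`;

#server B hands out even ids (2, 4, 6, ...)
`mysql -e "SET GLOBAL auto_increment_increment = 2; SET GLOBAL auto_increment_offset = 2"`;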

One way to resolve some of these problems is to only send traffic to one server.

With a VIP

If you are running on Linux then you can use a virtual IP (VIP). This requires a bit of network wizardry. What you end up with is a system where the passive server (the one not in use) is polling the active server (the one in use) to make sure it's alive. If it discovers that the active server is down, it steals the VIP and in doing so promotes itself to being active. Your application won't need to know about it, as the IP it's connecting to never changes, just the destination. You'll still need to figure out what was wrong with the broken server and get it working again, but in theory there can be no downtime for the user.
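
In practice you would normally let something like keepalived or heartbeat manage the VIP, but stripped down to its essentials the passive server's job looks something like this sketch (the addresses and interface name are placeholders, and it assumes the mysqladmin, ip, and arping tools are available):

#!/usr/bin/perl

#minimal sketch of the passive server's polling loop
my $active_ip = '192.0.2.10';   #the currently active server
my $vip       = '192.0.2.100';  #the virtual IP the application connects to

while(1) {
    #mysqladmin ping exits non-zero if the active server is unreachable
    system("mysqladmin -h $active_ip ping > /dev/null 2>&1");
    if($? != 0) {
        #the active server looks dead: claim the VIP and announce it with gratuitous ARP
        system("ip addr add $vip/32 dev eth0");
        system("arping -c 3 -A -I eth0 $vip");
        last;
    }
    sleep 5;
}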

With a distributed file system

The idea here is that the file system the servers write to is shared between them (NAS or SAN or the like). This is really more an active-passive solution than master-master, as one of your servers will need to be turned off or you risk corrupting data. You can also combine this with the VIP method, though you'll need something to mount the drive (depending on the sharing method) and start mysql. What you get with this is the knowledge that the data on the active and passive node will be the same (it's the same mount), at the expense of a little downtime while the passive mysql starts.

NDB

This is MySQL's NDB Cluster. It's its own storage engine, so you can't use InnoDB or MyISAM; it has many moving parts; and it requires at least 3 servers. But it gives you a system where you can read from and write to any node without any of the data integrity complications inherent in the previous patterns. The system is composed of API nodes (generally mysqld), data nodes, and management nodes. These processes can live on distinct machines or all on the same machine. Unlike the previous examples, your data does not all live on every server but is distributed across the cluster. This has the benefit of increasing your throughput as the number of nodes increases, though individual query performance can suffer.

Data can be mirrored between the data nodes, meaning the loss of any individual node will not result in the loss of data. Nodes can be added and dropped without fuss or harm. For example, NDB updates itself through a rolling update where each node, one at a time, is dropped out of the pool, updated, and brought back into the pool.

You will probably need to make some application changes in order to use NDB. One set of concerns relates to security, as NDB is innately insecure and requires proper use of DMZs; please take a look at MySQL's documentation for more information. Performance of large joins, sorts, etc. can also be poor, as the rows involved will likely be spread across the data nodes.

Galera

Galera is a solution for MariaDB or Percona Server, which are forks of MySQL. Galera is a clustering solution that replaces MySQL's asynchronous replication with synchronous replication. It combines recent research with Percona's XtraDB fork of InnoDB (MyISAM support is in beta, I believe, but isn't production ready) to provide solid performance for synchronous replication. As with NDB, Galera allows you to read and write to any node, and to add or remove nodes with ease. When you add a node, the cluster will automatically get it synced with the rest of the cluster.

Unlike NDB, all data lives on all nodes. This has benefits and drawbacks for performance. Reads are fast, and joins, sorts, etc. are fast, as everything is on the node getting the request. Insert and update speed, however, will depend on the speed of the slowest node in your cluster. This is important to consider given that you will likely run a Galera cluster on commodity hardware. You can find benchmark data at this webpage.

I favor this option but suspect that it isn't suitable for deployments that require a lot of MySQL servers to meet throughput demands. Using NDB with a caching layer to speed up frequent reads might be a better fit in that scenario.

Tungsten

Like Galera, Tungsten is a cluster solution that replaces MySQL's built-in replication. It allows for complex replication strategies and replication between different versions of MySQL or different databases altogether. Replication is handled by Tungsten Replicator, which is a separate process from mysql, so the solution is not as simple as Galera but probably makes up for it in its flexibility.

RDS

Amazon's RDS (Relational Database Service) is part of its AWS offerings. You define what type of throughput you need and they handle the rest. The only drawback I know of here is that they don't yet support encryption at rest, so if you have PHI or other data you need to encrypt you are SOL. If you are in AWS and don't require encryption at rest, this is probably the right place to start.

An hour before I was going to leave work for the long holiday weekend I broke replication to a MySQL slave database. Go see fireworks? No thanks, I’ve made my own.

The first thing I've learned is that changing the MASTER_HOST value in a CHANGE MASTER TO command will reset your master log file and position values. This will destroy your replication or corrupt your data or both! Probably both. I had recently changed the LAN IP of the master and wanted to point the slave at it, and figured that if I didn't change the master log and position everything would be fine. Nope. What I should have done, and feel foolish for not doing, was write down the current master log file and position after issuing a STOP SLAVE. That's a really good idea whenever you do something with slave data, just in case something goes wrong. Specifically here, after seeing that the file and position were different, I could have put the correct numbers back in before restarting the slave. So, there is the first lesson:

Before a CHANGE MASTER TO, run a SHOW SLAVE STATUS\G and record the Master_Log_File and Exec_Master_Log_Pos.
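
In the spirit of the scripts above, here is a rough sketch of how to capture those two values before touching anything (again assuming the mysql client can log in without prompting). If things go sideways, you can put them back with CHANGE MASTER TO using MASTER_LOG_FILE and MASTER_LOG_POS.

#!/usr/bin/perl

#record the slave's replication position before running CHANGE MASTER TO
my $status = `mysql -e 'SHOW SLAVE STATUS\\G'`;
my ($file) = $status =~ /Master_Log_File:\s+(\S+)/;
my ($pos)  = $status =~ /Exec_Master_Log_Pos:\s+(\d+)/;
print "Master_Log_File: $file\n";
print "Exec_Master_Log_Pos: $pos\n";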

The second thing I've learned is that the --master-data option on a mysqldump does not do what I thought it would. It records the master data, which is what it says it does, but that's the SHOW MASTER STATUS data, not the SHOW SLAVE STATUS data. Those are the numbers you need if you want to scp a snapshot to a potential slave and get it running. They are not the numbers you need if you do a backup on the slave and want to recover replication in the case of a failure. I figured this out after uncompressing, importing, and attempting to START SLAVE. This did not make me happy. To recover from this, I ended up running a backup off the master, something I would rather not do for performance reasons, but holiday weekend, and importing that backup. I haven't looked into a long-term fix for this yet. It can wait for Monday. So, there is the second lesson:

The --master-data option on mysqldump is the SHOW MASTER STATUS equivalent and not the SHOW SLAVE STATUS equivalent.
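
Which is exactly what you want when seeding a brand new slave from the master. A hedged sketch of that workflow (the host names and database name are placeholders):

#seed a new slave from the master; --master-data writes the matching
#CHANGE MASTER TO statement into the dump itself (=2 leaves it commented out)
`mysqldump -h master.example.com --single-transaction --master-data=2 mydb | gzip > seed.sql.gz`;
`scp seed.sql.gz slave.example.com:/tmp/`;
#on the slave: gunzip and import, run the CHANGE MASTER TO recorded at the top
#of the dump, and then START SLAVE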

The third thing I've learned, rediscovered really, is that imports from mysqldump take a long time. A really long time. Seriously. For this I wrote a quick script that makes things a little bit faster. The time required to import a db from mysqldump is the sum of the time required to import each individual table. My script parses the dump file, splits it into individual per-table files, and imports them in parallel, which reduces the time required to roughly that of the longest single table import. The script is hacked together, written in Perl (which I'm not the best at), and missing prompts, help, and safety measures, but here it is in case you want it. You'll need to edit the mysql command in the import function in order to connect to mysql and use the correct database.

#!/usr/bin/perl

use threads;

#requires the path to the gzipped db dump as its only argument
my $gzipped = $ARGV[0];

`mkdir ./dbbackupimport`;
`cp $gzipped ./dbbackupimport/backup.sql.gz`;
`gzip -d ./dbbackupimport/backup.sql.gz`;

#break the dump file up into files per table
open(my $dumpfh, "<", "./dbbackupimport/backup.sql");
open(my $currentfh, ">", "./dbbackupimport/backupheader.sql");
my @tables = ();
while(<$dumpfh>) {
    my $line = $_;
    if($line =~ /^DROP TABLE IF EXISTS \`([\w]+)\`.*/) {
        my $table = $1;
        print "Found table $table\n";
        push @tables, $table;
        if($currentfh) {
            close $currentfh;
        }
        #include the header so the imports can disable keys
        `cat ./dbbackupimport/backupheader.sql > ./dbbackupimport/$table.sql`;
        open($currentfh, ">>", "./dbbackupimport/$table.sql");
        print $currentfh "\n";
    }
    print $currentfh $line;
}
close $currentfh;

#spawn threads to import the data
#each thread will execute this function
sub import_thread {
    my ($table) = @_;
    print "mysql < ./dbbackupimport/$table.sql\n";
    `mysql < ./dbbackupimport/$table.sql`;
    return $table;
}
#spawn the threads
my @threads = ();
for my $table (@tables) {
    push @threads, threads->create('import_thread', $table);
}
#collect the threads
for my $thread (@threads) {
    my $table = $thread->join();
    print "Finished importing $table\n";
}