We use Elasticsearch as part of a centralized logging system (Logstash, Elasticsearch, Kibana). Unfortunately we didn’t give the ES machine much disk space, and thus it ran out of space. After cleaning up some space and starting ES, it began writing lots of warnings and stack traces like:

  • sending failed shard for
  • received shard failed for
  • failed to parse
  • failed to start shard
  • failed to recover shard

The disks were filling up again with error logs and the CPU was pegged. Thankfully I found this forum post: https://groups.google.com/forum/#!topic/elasticsearch/HtgNeUJ5uao. A few posts in, Igor Motov suggests deleting the corrupted translog files. The idea is that because the server ran out of disk space it couldn’t finish writing to the translogs, and because the translogs were incomplete files, ES couldn’t read them to bring the indices back into a correct state. If you delete those files you may lose a few writes that had yet to make it into the indices, but at least the indices will work again.

To fix this you need to look in the ES logs (/var/log/elasticsearch/elasticsearch.log on CentOS) and find the error lines above. On those lines you’ll see something like

[<timestamp>][WARN ][cluster.action.shard] [<weird name>] [logstash-2014.05.13][X]

where X is the shard number, likely 0 through 4, and the block before it (logstash-<date> for me, and for you if you’re doing centralized logging like we are) is the index name. You then need to go to the index location, /var/lib/elasticsearch/elasticsearch/nodes/0/indices/ on CentOS. In that directory you’ll find the following structure: logstash-<date>/X/translog/translog-<really big number>. That’s the file you’ll need to delete, so:

  1. sudo service elasticsearch stop
  2. sudo rm /var/lib/elasticsearch/elasticsearch/nodes/0/indices/logstash-date/X/translog/translog-blalblabla
  3. repeat step 2 for all indices and shards in the error log
  4. sudo service elasticsearch start

Watch the logs and repeat that process as needed until the ES logs stop spitting out stack traces.
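
If a lot of shards are affected, grepping the log can save some squinting. This is just a minimal sketch, assuming the default CentOS log location and the logstash-<date> index names shown above; adjust the patterns to match the errors you’re actually seeing:

grep -E "failed (shard|to start shard|to recover shard)" /var/log/elasticsearch/elasticsearch.log \
    | grep -oE "\[logstash-[0-9.]+\]\[[0-9]+\]" \
    | sort -u

Each unique [index][shard] pair it prints corresponds to one translog directory to clean out while ES is stopped.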

On Friday I ran into a problem that very much looked like a bug in PHP. From what I can tell a function was not called that should have been. Here is an oversimplification of the code:

public function foo() {
    // ... some lines of code ...
    $this->bar();
    // ... a couple lines of code ...
    if ($someCondition) {
        error_log('log line from foo()'); // this log line did show up
    }
    // ... some more lines of code ...
}

public function bar() {
    // ... some lines of code ...
    error_log('log line from bar()'); // this log line did not show up
    $this->someState = $newState;     // and the state did not change
    // ... some lines of code ...
}

I ran this code once and the state of the class didn’t change, the error log line from bar() was not in the logs, but the error log from foo() was. The error log in bar() isn’t inside any kind of if or other flow control, nor is there any place to return from bar() before the error log line. In foo() the call to bar() is also not in any if or similar block. Basically, if foo() was called, which I know it was because of the log line (and I know it was that log line because it’s unique), then bar() must have been called, and if bar() was called the log from bar() must have been generated.

I saw that, scratched my head for a while, and then ran the code again. This time I saw both log lines and the object’s state changed as it’s supposed to when bar() is run. The code was not changed between executions. The same data was provided for each execution. The log line in bar() was new, but I had run the code with that change a few times before with different data and bar() properly wrote to the error log.

Yesterday I wrote an article that mostly explained how to install and configure haproxy. Today I want to describe the specific solution I’ve come up with for handling a development environment with multiple services running on multiple servers. My goal is to simplify things, specifically our networking and configuration. Complicating factors include:

  • not wanting to change some domains/URLs (though I do want to remove the ports)
  • minimizing IP usage
    • A proper dev or qa install includes several VMs load balanced together.
    • DHCP works but messes up load balancing when renews happen.
    • Hand picking IPs is a bit burdensome.
  • Performance would be nice.

The solution I’ve come up with is to create a tree topology out of haproxied servers. Basically, one server sits at the top and all port 80 traffic gets forwarded to it from the router. We’ll call it Lancelot. Lancelot’s haproxy rules are configured to match hdr_beg for domains like wiki. and jira. and forward those along to the appropriate servers. Say we have two additional servers, Arthur and Galahad, where we set up virtual environments. Lancelot also has hdr_end lines for arthur.domain.com and galahad.domain.com which forward requests on to those servers. Galahad has virtual environments purity and sword. Arthur has virtual environments excalibur, lwr (large wooden rabbit), and hhg (holy hand grenade). Galahad’s haproxy is configured with hdr_beg lines for purity. and sword. which forward requests onto VirtualBox private networks. Arthur’s haproxy is configured with hdr_beg lines for excalibur., lwr., and hhg. which do the same. A sketch of what Lancelot’s rules might look like is below.
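
This is purely illustrative (the backend IPs are placeholders I’ve made up, not our real addresses):

frontend domain
    bind *:80
    acl wiki hdr_beg(host) -i wiki.
    acl arthur hdr_end(host) -i arthur.domain.com
    acl galahad hdr_end(host) -i galahad.domain.com
    use_backend wiki_back if wiki
    use_backend arthur_back if arthur
    use_backend galahad_back if galahad

backend wiki_back
    server wiki1 192.168.0.50

backend arthur_back
    server arthur1 192.168.0.60

backend galahad_back
    server galahad1 192.168.0.61

Arthur and Galahad then repeat the pattern with hdr_beg acls for their own virtual environments (excalibur., lwr., hhg. on Arthur; purity. and sword. on Galahad), pointing at their VirtualBox private network IPs.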

Set up like the above, a request for excalibur.arthur.domain.com would:

  1. Get sent to Lancelot by the router (port 80 forwarding rule)
  2. Trigger Lancelot’s hdr_end rule for arthur.domain.com and get forwarded to Arthur
  3. Trigger Arthur’s hdr_beg rule for excalibur. and get forwarded to a 192.168 address that corresponds to excalibur’s load balancer
  4. The request gets handled and winds its way back through the proxies to your web browser.

This satisfies most of my goals. Domains for things like wiki.domain.com remain the same because haproxy forwards the request directly to its appropriate server. Because installs like excalibur and purity use only private VM networking, IPs from the office pool aren’t used and I have a static IP to load balance with. Performance could be better, but screw it, it isn’t production.

At work we have one IP address and many web services to run (ticket management, a wiki, a small army of dev and QA installs of our application, etc.). Our solution has been to use DNS names when the thing is running on the same server (wiki.domain.com, dev1.domain.com, qa1.domain.com, etc.) and different ports to send requests to different servers. This works but requires us to remember the magic port number for everything.

Enter haproxy. Haproxy is a layer 4 and layer 7 (the important bit) load balancer. It’s really easy to install, really easy to set up, and allows you to load balance or proxy requests based on domain name or URL. To install haproxy all that needed to be done was “yum install haproxy.” I’m sure you can substitute apt-get or your distro’s package manager of choice just as easily. All of the configuration happens in /etc/haproxy/haproxy.cfg. This file came pre-loaded with global and defaults sections, which I kept, and a frontend and two backend sections, which I deleted. The basic setup is that you create a frontend section that corresponds to requests, and backend sections that correspond to the servers which handle those requests. For example:

frontend domain
    bind *:80
    acl wiki hdr_beg(host) -i wiki.
    acl dev1 hdr_beg(host) -i dev1.
    use_backend wiki_back if wiki
    use_backend dev1_back if dev1

backend wiki_back
    server wiki1 192.168.0.50

backend dev1_back
    server dev1-1 192.168.0.51

The above says that a request for wiki.domain.com (assuming domain.com was being forwarded to the server we set this up on) will be sent to 192.168.0.50. Specifically it’s saying:

  • frontend domain
    • I have a frontend named domain.
  • bind *:80
    • the frontend is listening for requests on any IP on port 80
  • acl wiki hdr_beg(host) -i wiki.
    • I’m matching hostnames that begin (hdr_beg(host)) with wiki., case-insensitively (-i wiki.)
  • use_backend wiki_back if wiki
    • If the request is for wiki then I’m going to use some backend named wiki_back
  • backend wiki_back
    • I have some backend named wiki_back
  • server wiki1 192.168.0.50
    • The backend has a server named wiki1 with IP 192.168.0.50

This isn’t really using a lot of the power of haproxy. For one, we could set up load balancing on the backends with a line like “balance roundrobin” in a backend block. This is also only looking at the beginning of hostnames. We could set up an acl that listens for domain.com with “acl domainDefault hdr_end(host) -i domain.com”. We could set up an acl that matches based on the URL, such as “acl logs path_beg /kibana”. And that covers my knowledge of haproxy, but haproxy has many more features such as the ability to inspect and change headers. Putting those pieces together might look something like the example below.
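
This is only a sketch; the backend names, server IPs, and the /kibana routing are made-up placeholders:

frontend domain
    bind *:80
    acl logs path_beg /kibana
    acl domainDefault hdr_end(host) -i domain.com
    use_backend logs_back if logs
    use_backend default_back if domainDefault

backend logs_back
    balance roundrobin
    server logs1 192.168.0.70
    server logs2 192.168.0.71

backend default_back
    server web1 192.168.0.52

Here requests whose path begins with /kibana are balanced round-robin across two servers, and anything else for domain.com falls through to the default backend.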

This error, “Invalid Apache directory - unable to find httpd.h under /usr/include/httpd/”, has been a thorn in my side while building a PHP RPM with the --with-apache PHP configure argument. I’m running CentOS 6 and have the httpd-devel package installed, which places the Apache header files (including httpd.h) in /usr/include/httpd. Perms on the /usr/include/httpd directory are 755 and the files inside are 644. Everything looked good.

Turns out that --with-apache builds the static Apache 1.x module, which isn’t really what you want if you have installed Apache in the last 10 years. What you want instead is --with-apxs2, which will build the Apache 2.x shared module.
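
For reference, the relevant part of the configure line ends up looking something like this; the apxs path assumes the stock CentOS 6 httpd-devel package, and the rest of your configure options stay as they were:

./configure --with-apxs2=/usr/sbin/apxs

In the RPM spec this simply replaces the --with-apache argument.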

One of the problems I recently needed to solve at work was moving (and transforming) a lot (TBs) of data from an old system to a new system. Solved it, and things were good. Unfortunately we (co-workers and I) have noticed that the script just stops working. It doesn’t crash; it just stops doing anything. Through a bit of luck I discovered this was due to network IO (or lack thereof).

The problem I faced over the last few days is that the script stops transferring data over the network but doesn’t hit a network timeout. The transfer of data grinds to a halt and doesn’t trigger any of our monitoring. I’ve solved this with an inotifywait loop.

While the script is busy copying data it spits information out into a log file. It writes to this file a few times per second, with occasional pauses of up to 25ish seconds to collect a new batch of work to transfer. It turns out that inotifywait has a -t option, which is a timeout in seconds. If inotifywait gets a notify event it exits with code 0, and if it times out it exits with code 2. With that bit of knowledge in hand, I wrote a wrapper script that launches the aforementioned script in the background and then goes into an infinite loop. It sets up an inotifywait with -t 60 and -e modify on the log file. If the exit code is 2, it runs a ps aux | grep to get the pid, kills the script, and relaunches it in the background. Roughly, as a shell script:

#!/bin/bash
./myScript.sh &

while true; do
    # exit code 2 means no write to the log for 60 seconds
    inotifywait -t 60 -e modify /path/to/my/logfile
    if [ $? -eq 2 ]; then
        # the copy script has gone quiet: kill it and relaunch it
        # ([m]yScript keeps the grep from matching itself)
        pid=$(ps aux | grep '[m]yScript' | awk '{print $2}')
        kill "$pid"
        ./myScript.sh &
    fi
done

With that in place, the script runs until it stops writing to the log file for 60 seconds, which the wrapper interprets as a failure. In that event the wrapper kills the script, restarts it, and we’re back up and running again. Not something I would consider a long-term solution, but this isn’t a long-term problem.

The most painfully obvious reason to keep a development journal came to me at work today. One of the major reasons to keep a lab journal is to log the experiments you run. This is so future you and others can read about how you did things and what results there were, so that standards can be upheld and tests can be rerun if needed. When I wrote my last post (and this has been baffling me all day) it did not occur to me that experiments = tests. When you are testing code, write down what you did and what the results were. If you do that you won’t have to worry as much about forgetting what you tested, forgetting the results of your tests, convincing a coworker that you tested something, or convincing a coworker of the results of your tests. Future you will thank you.

Inspired by scientific lab journals, I have decided to start a development journal. This may be considered a waste of time given that programs are fairly easy to reproduce, comments act as documentation within code, and source control tools provide a history which can be used to prove originality. I believe it can have other uses that will make it a valuable use of time.

Why?

One straightforward benefit is taking the extra time to consider decisions. Both notes describing problems and notes describing solutions can help here. When bug fixing, the act of listing all of the known conditions and factors may uncover the problem. Writing down what was done and the justification for doing it creates an internal dialogue which opens the door for self-evaluation. This evaluation gives time to analyze whether a solution solves the problem and whether another solution may solve it in a way that could be considered better. In the few days I have been keeping a development journal I have noticed a change in the design of the projects I have been working on, towards a more structured use of design patterns. This is not enough to say the code I am writing is better, but I consider it promising enough to continue with the journal.

Another potential benefit is the switch from keyboard, mouse, and screen to pen and paper providing a boost to creativity. This mental shift allows the problem to be looked at differently. Writing down the problem gives time to restate it, which can also lead to being more creative. Sadly I can’t seem to find any evidence for this point, so I’m not sure how strong of a benefit it is.

There are other benefits as well. Paper is a particularly good medium for drawing out UML, flow, state, and other forms of diagrams. It provides a place to leave more justification for why something was done than what is typically put into comments.

Even though writing out a journal is likely to slow down the development process some, I believe the potential for better code outweighs this risk. I will aim to write another post when I have more anecdotal information. Also, I’ll edit this post if I can think of any other potential benefits.

How?

While entertaining the idea of writing a development journal I read several posts about lab journals to determine how to go about things. Of particular aid was Maintaining a laboratory notebook by Colin Purrington. Unfortunately the worlds of science and computer science are different enough to make much of the advice inapplicable. There will be much to learn about what belongs and what does not, but I believe a few things carry over very well or seem useful enough to act as a starting point.

If you’re going to be spending a decent amount of time writing you might as well enjoy the process. Therefore, buying a good quality notebook and pen seem like appropriate first steps.

For a notebook I decided on a Kokuyo A5 B (the one with 28 lines) notebook with one of the A5 covers. The notebook+cover was more expensive than other nice notebooks such as a Leuchtturm (my second choice), but the cost of replacement notebooks is much cheaper. I personally wanted a notebook with faint or no lines. My Kokuyo notebook has faint blue lines, the Leuchtturm notebook I was looking at had dots instead of being boxed or lined, and notebooks such as Whitelines may be worth looking into. I would advise against notebooks with spiral binding or perforated pages, since you really shouldn’t be removing pages from the journal, and against a 3-ring binder, since you shouldn’t be adding pages either. Get a good quality notebook with glued or threaded binding. You want your notebook to last and be a potential reference.

Pens are probably much more about personal preference, though I would say you should definitely be using a pen. Knowing you are about to write something down that you can’t erase, in a notebook that you won’t remove pages from, that someone else might read, gives an added bit of deliberation before you commit thought to paper. A nice pen also writes better and lasts a lot longer on the paper. I am currently using a fine-tipped fountain pen because I enjoy writing with it. I would also be happy with a 0.38mm G2 gel rollerball or a 0.3mm Micron. The point here is to write with what you like to write with. You’re going to be doing a lot of it, so you should enjoy the process.

To prepare your notebook you should leave a few pages in the front for a table of contents. If your pages are not already numbered you should number the remaining pages of the book. Put your name on the cover or the first page as well as the start of the date range the notebook will cover.

I am not completely sure of the best way to add entries into the book. It makes sense to title projects, as that gives a name for the table of contents and ties together multiple entries on the same topic. A date should be given to each topic. If a topic continues for multiple days without another topic interrupting it, then you should probably put a date in the notebook at the start of each day’s notes. I would also think it generally better not to put multiple topics on a page, though leaving sufficient space between topics on a page may make it clear enough that they are different topics.

This is all that I currently know and speculate about a development journal. I will provide further posts in the future as I discover more.

Here are some articles I found interesting from this week.

Automatic Deploys At Etsy by rlerdorf
In which a deployment strategy designed for zero interruption time is discussed.

Design Patterns after Design is Done by Jim Bird
Frames refactoring and code legibility in terms of design patterns discussing what works and what does not work.

6 Warning Signs That Your Technology is Headed South by Christopher Taylor
Discusses technological and personal costs of using old technology.

The Date Question by George Dinwiddie
Discusses software deadlines trying to get to the root of the question, “When will this software be done?”

Ambient occlusion for Minecraft-like worlds by Mikola Lysenko
Discusses Ambient Occlusion in voxel based games using Minecraft as a specific example.

Erlang at Basho, Five Years Later by Justin Sheehy
Sheehy talks about the challenges that were expected using Erlang and the challenges that were actually encountered.

15 workplace barriers to better code by Peter Wayner
A list of things that annoy programmers and get us out of “the zone.”

Why Javascript by Alex Russell
Russell defends JavaScript on the web and confronts frequent arguments against its use.

How Clutter Affects Your Brain (and What You Can Do About It) by Mikael Cho
A not exactly minimal article on minimalism and what effect clutter can have on your ability to focus and be creative.

Recently I needed to install Apache Thrift (version 0.9.0). This did not go as I hoped. Make failed with a message along the lines of “No rule to make target…” and something about ParentService.cpp. After trying several combinations of settings I managed to get past the issue. There isn’t really any active development on Thrift, so I did a git clone from their repo. I manually installed autoconf because CentOS 6’s version was outdated (my distro from 2012 has a less up-to-date version than required by a product last updated in 2010). Then I ran ./bootstrap.sh, ./configure, make, and sudo make install. Done. Hope that helps someone. The rough sequence is sketched below.
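
A minimal sketch of those steps; the GitHub mirror URL is an assumption (use whatever clone of the Thrift repo you prefer), and the newer autoconf needs to be on your PATH before bootstrapping:

# install a recent autoconf from source first if your distro's version is too old
git clone https://github.com/apache/thrift.git
cd thrift
./bootstrap.sh
./configure
make
sudo make install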

tl;dr: If running into a “No rule to make target” error during the make of Apache Thrift 0.9.0, then grab head from git and build that.