Code Kata: Prefix search (Part 3: more fun with tries)

In my last post, I demonstrated that crit-bit trees are really fast. This time, I want to talk a little bit about my specific implementation of these tries, and some interesting applications you can use them for.

I also want to acknowledge some of the prior art that I looked at before deciding to roll my own implementation. Professor Bernstein has written an implementation himself for his portable qhasm assembler, but since it’s part of a larger software product, I didn’t want to disentangle it. Adam Langley has an implementation on github with some outstanding documentation that was very useful, but it only stores zero-terminated strings, not arbitrary data. While I didn’t read his code too closely, to keep my own implementation challenging, I did steal the clever bit-mask trick from it.

More Things To Do With Tries

While they are really good at string lookup, what crit-bit trees are best at is finding strings with a common prefix. I implemented two separate functions for this:

const void * cb_find(critbit_tree * cb, const void * key, size_t keylen);

int cb_foreach(critbit_tree * cb, const void * key, size_t keylen, int (*match_cb)(const void * match, const void * key, size_t keylen, void *), void *data);

The first function is straightforward: it looks for an exact match of the first keylen bytes of key in the given tree. The second function calls the match_cb callback on every match, which is a much easier pattern to implement than trying to return all matches. Where would you store them? Dynamic memory allocation is right out, of course. I compromised a bit by writing the cb_find_prefix function, which takes a buffer supplied by the caller, but beware that paging through the results by calling it repeatedly is quadratic in complexity, so you shouldn’t do that. It’s really just a hack.
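To illustrate the callback pattern, here is a small usage sketch of my own (not code from the repository); the callback counts every entry that starts with a given prefix, and I am assuming that returning 0 from the callback continues the iteration:

/* example callback for cb_foreach: count every entry matching the prefix */
static int count_match(const void * match, const void * key, size_t keylen, void * udata)
{
  (void)match; (void)key; (void)keylen;
  ++*(int *)udata;
  return 0; /* assumption: 0 means "keep iterating" */
}

/* usage (inside some function), with cb an already-populated critbit_tree: */
int count = 0;
cb_foreach(&cb, "config_", 7, count_match, &count);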

Use As A Fast Key-Value Store

One really nifty thing that I have been using this code for is as a fast key-value store. You could insert a couple of strings like “config_value=42” in the tree and look for the first match with the prefix “config_value=”. There are two wrapper macros in critbit.h to simplify this, and being able to store data other than zero-terminated strings means I am not limited to storing string values. Here’s some example code that shows their use:

int i = 42;
char buffer[20];
const char * key = "herpderp";
void * match;
size_t len;

len = cb_new_kv(key, strlen(key), &i, sizeof(int), buffer);
cb_insert(&cb, buffer, len);
if (cb_find_prefix(&cb, key, strlen(key)+1, &match, 1, 0)) {
  cb_get_kv(match, &i, sizeof(int));
}

What these macros essentially do is this: cb_new_kv creates a key|0|value memory block (written into buffer). Once you have inserted it, you can find it again by doing a prefix search for the key plus its zero terminator, and then use cb_get_kv to extract the value from the first match.
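Conceptually (this is a sketch of the idea only, not the actual macro definitions from critbit.h), the two helpers boil down to something like this:

#include <string.h>

/* sketch: lay out key, a zero terminator, and the raw value bytes in one block */
static size_t make_kv(const char * key, size_t keylen, const void * value, size_t vallen, char * out)
{
  memcpy(out, key, keylen);
  out[keylen] = '\0';
  memcpy(out + keylen + 1, value, vallen);
  return keylen + 1 + vallen; /* total length to hand to cb_insert */
}

/* sketch: read the value back from a match by skipping the key and its terminator */
static void read_kv(const void * match, void * value, size_t vallen)
{
  const char * block = (const char *)match;
  memcpy(value, block + strlen(block) + 1, vallen);
}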

Testing

I approached this project with a test-first attitude, writing tests before writing the actual implementation, and fixing bugs until the tests passed. I wrote almost as many lines of tests for this as I wrote actual code, and it still wasn’t enough: one or two small issues slipped through. Code with lots of pointers and memory-pokery is always tricky, and I think this is one of those cases where it should be undisputed that unit tests are just incredibly useful.

Code Kata: Prefix search (Part 2: crit-bit trees)

The trouble with the prefix search problem I described in my previous post is that your language most likely has no good data structure to solve it with. There certainly isn’t one in C. But the good news is that we get to build our own! How often do we still get to write data structures from scratch in this day and age? And when is it ever a good idea?

Of course, you should always try the simplest thing that works first: in my case, that means building an array of the million strings and, to search for a prefix, just calling strncmp on each entry. We’ll benchmark that, see how well it performs, and use it as our baseline for a good implementation.
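For reference, the core of that naive baseline might look roughly like this (a sketch, not the exact code I benchmarked):

#include <string.h>

/* naive baseline (sketch): scan every entry, count the ones starting with the prefix */
static int find_prefix_naive(char ** strings, int count, const char * prefix, size_t prefixlen)
{
  int i, matches = 0;
  for (i = 0; i != count; ++i) {
    if (strncmp(strings[i], prefix, prefixlen) == 0) {
      ++matches;
    }
  }
  return matches;
}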

Here’s a link to the naive implementation that I wrote. Finding a needle in a haystack of a million strings takes almost no time at all, but building the array takes a while, so I’ll just do 100 searches, each for a single needle, in a set of a million strings, followed by 1000 searches for a prefix that occurs exactly 11 times:

$ bin/naive -x 100 1m 111111
init: 0.30s
find: 0.82s
$ bin/naive -x 1000 1m 111111
init: 0.26s
find: 7.51s

8 ms per lookup may not sound like much, but it’s actually not a very good time at all. Excellent! We like this in a naive solution that we want to beat 🙂

Introducing crit-bit trees

The data structure we want for this sort of thing is conventionally known as a radix-tree or trie. I think that’s either pronounced like try or like tree, but either way it’s really confusing. Also, it really helps to think of the data we are operating on as a string of bits, not characters. The specific type of trie that I decided to implement is from this paper by the always amazing djb.

The general idea of this approach is that we have a binary tree, and if some of our bit-strings have a common prefix x such that one string has a prefix of x0 and the other x1, then the tree shall contain a node where all the strings beginning with x0 are found to the left of the node, and all strings starting with x1 are found to the right.

What’s neat is that we don’t need a node for every possible bit position either, but only for those where actual strings in our dataset diverge. To give an example, if we had only two strings, 0001001 and 0100100 (these start to differ at the second bit), then we create an internal node that stores this bit position, plus two child pointers which point at external, or leaf, nodes containing our two strings. To find a string, we follow a trail of internal nodes from the root, check the indicated bit position in our needle string, and recurse down to the respective child node. Once we reach a leaf, we compare our needle to the string stored there, because it’s possible that our needle and the stored string differ at bits that we did not check on our way down.
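To make that walk concrete, here is a simplified sketch of the two node types and the descent (my illustration, not the actual code from the repository; the real implementation tags leaves more compactly and uses the bit-mask trick mentioned in Part 3):

/* sketch: internal nodes store the critical bit as byte offset + mask */
struct node {
  int is_leaf;
  union {
    struct {
      struct node * child[2];
      size_t byte;          /* byte offset of the critical bit */
      unsigned char mask;   /* mask selecting the bit within that byte */
    } internal;
    struct {
      const unsigned char * key;
      size_t keylen;
    } leaf;
  } u;
};

static const struct node * walk(const struct node * n, const unsigned char * key, size_t keylen)
{
  while (n && !n->is_leaf) {
    unsigned char c = 0;
    if (n->u.internal.byte < keylen) {
      c = key[n->u.internal.byte];
    }
    /* go right if the critical bit is set in the needle, left otherwise */
    n = n->u.internal.child[(c & n->u.internal.mask) ? 1 : 0];
  }
  /* the caller must still compare the needle against the leaf's key, because
     only the critical bits were inspected on the way down */
  return n;
}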

Apart from being a data structure for key lookups, an interesting property of tries is that all entries beginning with a particular substring are children of the same node. That makes this data structure the perfect candidate for our problem of prefix search. So, how fast is it? Actually, it’s way too fast to measure with just 100 searches, but we can do simple divisions, so let’s do a million lookups and prefix searches instead:

$ bin/benchmark -x 1m 1m 111111
init: 0.62s
find: 0.25s
$ bin/benchmark -x 1m 1m 11111
init: 0.66s
find: 1.09s

250 nanoseconds per lookup, and a microsecond per prefix-search. That’s a pretty neat improvement of four orders of magnitude! In my next post, I am going to talk a little bit about the actual implementation and interface. I have put the source code on github.

Some Thoughts

  1. My dataset is pretty small. A million keys like this can probably fit into a modern CPU’s L2 cache if they are small enough. But trust me, the numbers only get better as the dataset grows!
  2. Each internal node is really small: I need two pointers, a word to store the bit position (my implementation is excessive and uses size_t), plus a byte to store a bit mask. You could go a little smaller at the cost of performance and/or size limitations, but it’s on the order of one or two dozen bytes. All internal nodes are the same size, so you would want to pair this with a clever memory allocator.
  3. Initialization is somewhat slower, because creating the tree is more complicated than just doing a couple of allocations. However, you could conceivably store the final tree in a compact format on disk once it’s generated and load it as one big chunk of memory, replacing each pointer to an internal node with an integer index into your array of internal nodes (a rough sketch of such a layout follows this list).
  4. The crit-bit tree also allows fast deletion, which I haven’t benchmarked, mostly because I have nothing to compare it to.
  5. All times are taken with clock() on a JoyentCloud VM. I have no idea what kind of CPU that translates to.
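Regarding point 3, such a compact representation could look roughly like this; a sketch under my own assumptions, not something the current code does:

#include <stdint.h>

/* sketch: a flat, pointer-free internal node for a serialized tree; non-negative
   child values index into the array of internal nodes, negative values (offset
   by one) could index a separate table of leaf keys */
typedef struct packed_node {
  int32_t child[2];    /* index of the left/right child */
  uint32_t byte;       /* byte offset of the critical bit */
  unsigned char mask;  /* bit mask within that byte */
} packed_node;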

Atlantis TDD (4) Working with Legacy Code

After all this prep-work, it’s time to actually work on the game. Which game? I’m going to use Atlantis 1, the original game that started it all. The movement code in it is really gross, and the code only compiles under DOS, so the first order of business is to clean it up a bit. The result compiles under Linux without warnings, and even if nothing else comes out of this project of mine, at least there is now an Atlantis 1 code base you can just use.

This is classic legacy code: It’s not very modular at all, and there are absolutely no tests. Before I make any changes, I want to make sure that I won’t break anything, so I am going to put the movement code under test. For this, I’m extracting the movement code into a process_movement function. Next, I’m going to make sure all code paths in that function are tested at least once.

static void test_movement(CuTest * tc)
{
  unit * u;
  region *r1, *r2;

  resetgame();
  r1 = createregion(0, 0);
  r1->terrain = T_PLAIN;
  r2 = createregion(1, 0);
  r2->terrain = T_PLAIN;
  createregion(2, 0)->terrain = T_PLAIN;
  u = createunit(r1);
  strcpy(u->thisorder, "move east");
  process_movement();
  CuAssertPtrEquals(tc, u, r2->units);
}

Since this test only protects the original Atlantis game against regressions, I can write it in terms of the Atlantis code base and its objects (and use r2->units directly, etc.). As long as this test passes, I can be certain that regular unit movement works. This kind of protection gives enormous peace of mind to a programmer who is working with a code base that isn’t his own (or is so old that he doesn’t remember writing it). A couple more tests around modifications in the Atlantis source needed to be written; you can find them in the atlantis_test.c source on github.

Next up, I’m going to slap together an interface to stitch the Atlantis 1 implementation into my interfaces! And maybe then I can write my movement logic?

Atlantis TDD (3) CMake is awesome

I started this project with a simple Makefile. Eventually, I’d gotten to 30+ source files, and the Makefiles were getting cumbersome to update. I mean, judge for yourself: How easy is this?


cmake_minimum_required(VERSION 2.4)
project (logic C)

include_directories (..)

set (MOCK_SRCS ../mock/region.c ../mock/unit.c
  ../mock/keyvalue.c ../mock/mock_svc.c)

add_executable (movement_test movement.c movement_test.c
  ../cutest/CuTest.c ${MOCK_SRCS})

add_test (movement_test movement_test)

This is the rule to build the tests for my movement code and link them with the mock implementation of the game structures. It’s platform-agnostic, too: On Windows, I can use this to build Visual Studio project files. On Linux, cmake generates Makefiles for me that are far cleverer than any I ever wrote by hand.

This isn’t my first project to use CMake. I’ve looked at a lot of build systems over the years, and this is the one I’ve stuck with. There are still things that are hard to do in it, but they are the kind of things that are huge pains in every other build system, too. But the easy stuff, the 95% of the work, is actually easy.

This is the first time I’ve used CTest, though, which ships with CMake. You can add all your tests with the ADD_TEST command, and CMake will generate the input for CTest from that, so you can run all the different tests in your project with a single command.

Atlantis TDD (2) Unit Testing

Before I get to write any more implementation, it’s time to write some tests for my interfaces. I like to use CuTest for this, because it’s small, pure C, doesn’t try to be fancy, and there’s only a single file to link to. And now I’m writing tests for every function in the interface and its edge cases:

static void test_get_regions(CuTest * tc)
{
  struct region * results[4];
  void * cur;
  icursor * icur;
  int i, n = 0;

  svc.reset();

  for (i=0;i!=3;++i) {
    svc.regions->create(0, i);
  }
  cur = svc.get_regions(&icur);
  CuAssertPtrNotNull(tc, cur);
  CuAssertPtrNotNull(tc, icur);
  CuAssertIntEquals(tc, 3, icur->get(cur, 4, (void**)results));
  for (i=0;i!=3;++i) {
    int x, y;
    svc.regions->get_xy(results[i], &x, &y);
    CuAssertIntEquals(tc, 0, x);
    n |= (1<<y);
  }
  CuAssertIntEquals(tc, 7, n);

  CuAssertIntEquals(tc, 3, icur->advance(&cur, 4));
  CuAssertIntEquals(tc, 0, icur->advance(&cur, 4));
  if (icur->destroy) icur->destroy(cur);
}

This function tests the cursor returned by the get_regions function that I talked about in my previous post. As tests go, it’s pretty big: after creating 3 regions, it asserts that the cursor returns those regions, that it returns only three even when asked for four, and that it can be advanced until it reaches the end. Notice how, once again, the code is entirely written in terms of the new interface. I tend to write “struct region” instead of “region” for the type here, to remind myself that I don’t have access to the members of my objects: apart from knowing it’s a struct, I really don’t know anything about it.

Now, I can’t run this test yet, because to do that, I would need an actual implementation of the interface to test. Since I haven’t broken out the Atlantis source yet, I’m going to write my own implementation of the data structures that does the bare minimum and hook it up to the interfaces. You can see the result of that in the mock directory of my github project. This reference implementation will also allow me to test my game logic later without having to link against Atlantis or Eressea.
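To give an idea of what such a bare-minimum implementation can look like, here is a sketch of a cursor over a plain next-pointer linked list (my illustration with made-up names, not the actual code in the mock directory; icursor is the interface described in the service provider post):

typedef struct mock_region {
  struct mock_region * next;
  int x, y;
} mock_region;

static mock_region * all_regions; /* head of the global region list */

static void list_destroy(void * cursor)
{
  (void)cursor; /* nothing to free: the cursor is just a pointer into the list */
}

static int list_get(void * cursor, int n, void * results[])
{
  mock_region * r = (mock_region *)cursor;
  int i;
  for (i = 0; i != n && r; ++i, r = r->next) {
    results[i] = r;
  }
  return i; /* how many results were actually written */
}

static int list_advance(void ** cursor, int n)
{
  mock_region * r = (mock_region *)*cursor;
  int i;
  for (i = 0; i != n && r; ++i) {
    r = r->next;
  }
  *cursor = r;
  return i; /* how many elements were actually skipped */
}

static icursor list_cursor = { list_destroy, list_get, list_advance };

static void * mock_get_regions(icursor ** icur)
{
  *icur = &list_cursor;
  return all_regions; /* the cursor starts at the first region */
}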

Atlantis TDD (1) Service Provider and Interfaces

The first thing I’m going to need is a service provider that serves the interface to the game. I’m a little bit spoiled by testing in PHP, where it’s easy to use an associative array to build an interface on the fly, but I obviously won’t get this here. Here’s what my interface roughly looks like:

typedef struct iunit {
  struct unit * (*create)(void);
  void (*destroy)(struct unit *);
  struct unit * (*get)(int);

  int (*get_uid)(const struct unit *);
  struct region * (*get_region)(const struct unit *);
  void (*set_region)(struct unit *, struct region *);
  ... /* more of the same */
} iunit;

/* similar struct for iregion, iship, ibuilding TBD */

typedef struct igame {
  struct iunit * units;
  struct iregion * regions;

  int max_directions;

  void (*reset)(void);
  void * (*get_regions)(struct icursor **);
  ... /* more of the same */
} igame;

That’s a lot of function pointers! As I said in my last post, the idea is to write code that is independent of implementation, so at no point can I use the actual unit structure that’s in Atlantis, or even access u->no directly. Some other game might have an entirely different way of storing ids, after all. Instead, I will be using svc.units->get_uid(u), which is more to type, but easy to plug into with an implementation-dependent function. You can see the rest of the interface classes on github, if you are curious.
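As a toy illustration of that plugging-in, a hypothetical Atlantis-backed adapter for the get_uid slot might look like this (a sketch, not code from the repository):

/* hypothetical adapter: Atlantis stores the unit id in the "no" member */
static int atlantis_get_uid(const struct unit * u)
{
  return u->no;
}

/* create, destroy and get are left out (0) in this sketch */
static iunit atlantis_units = { 0, 0, 0, atlantis_get_uid };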

I’m pretty pleased with the icursor interface. The global list of the game’s regions in Atlantis is a next-pointer chained linked list. In Eressea, it is an unrolled linked list, and I assume that in A5, it is a std::list. Or maybe it’s a hashtable? I will be iterating over all regions a lot in the game logic, and similar issues will arise for all units in a region, etc. So here’s the icursor interface:

typedef struct icursor {
  void (*destroy)(void * cursor);
  int (*get)(void * cursor, int n, void * results[]);
  int (*advance)(void ** cursor, int n);
} icursor;

A function like svc.get_regions returns a void * typed data pointer (the cursor), and an icursor interface that can be used to iterate the particular type of container it represents. You can either get one or more elements from a cursor, or advance it. When you’re done, release the cursor by calling destroy. Here’s an example:

  icursor *icur;
  void *cur = svc.get_regions(&icur);
  region *r;
  while (icur->get(cur, 1, (void **)&r)) {
    printf("region %d,%d\n", r->x, r->y);
    icur->advance(&cur, 1);
  }
  icur->destroy(cur);

That’s the first game logic I’ve written since I started! If you think I’m off to a good start and will be writing that movement code next, think again: It’s time to write tests first!

Code Kata: Unrolled Linked List (in C)

The other day I needed a linked list for the umpteenth time, and instead of going with the old (data, next) pairs, I decided I wanted something a bit more efficient, like an unrolled linked list. This also provided a good opportunity to use the CuTest unit testing framework and do some test-first development.

The result is pretty nice: testing actually found a small bug, despite the fact that I was sure I could do linked lists in my sleep, and I was so pleased with the performance characteristics (better cache efficiency and far fewer allocations) that I replaced all the lists in Eressea with it.
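The core idea, sketched below with made-up names (this is not the actual quicklist layout), is that each list node carries a small array of elements instead of a single one, which is where the better cache behaviour and the reduced allocation count come from:

/* sketch of the unrolled-linked-list idea, not the actual quicklist internals:
   a list of n elements needs only about n/QL_MAXSIZE allocations, and
   neighbouring elements sit next to each other in memory */
#define QL_MAXSIZE 14 /* arbitrary capacity chosen for this sketch */

typedef struct ql_node {
  struct ql_node * next;
  int num_elements;
  void * elements[QL_MAXSIZE];
} ql_node;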

Code sample:

quicklist *q, *ql = 0;
int i;
// insert element at index:
ql_insert(&ql, 0, foo);
ql_insert(&ql, 0, bar);
assert(ql_get(ql, 1)==foo);
assert(ql_length(ql)==2);
// push element at end, get indexed element:
ql_push(&ql, baz);
assert(ql_get(ql, 2)==baz);
// iterate over list:
for (q=ql,i=0;q;ql_advance(&q, &i, 1)) {
  printf("%p ", ql_get(q, i));
}

Code is on github. Use as you please.

Adventures in Optimization

This weekend I thought it would be a good idea to run the Eressea server both with and without optimizations enabled and compare the output. In theory, I thought, optimization should not change the results, and different results would hint at bugs like uninitialized variables or illegal memory access.

Needless to say, the output wasn’t the same. It was slightly different, and it looked like a small error snowballing towards the end. I’ll spare you the tale of a day trying to narrow down the exact location, and cut right to the chase:

Eressea: text vs. binary data files

I’m posting about this because I like talking about code and nobody is on IRC. If you don’t like code, you may want to skip this. Also, none of this is rocket science, but it was a nice exercise.

Traditionally, the data files in Eressea have been text files. You know the kind where you’re doing a lot of fprintf and fscanf everywhere in the code. There are usually two advantages associated with that: small integers take up less space (2-3 bytes instead of 4), and when something goes terribly wrong, you can edit the data with vi.

The latter had long been a mixed blessing, with edits occasionally doing more harm than good, and since the introduction of a shell and script access to all the game data, it’s not really been necessary. Which left three reasons not to switch: Slight space improvements, the amount of work to change over to something else, and backward compatibility. However, I had some time on my hands recently and decided to tackle all of those issues.