Testing a filesystem with the ping_pong tool
The ping_pong tool is a tiny piece of C code that can be used to tell you some very useful things about a cluster filesystem. If you are interested in seeing if your favourite cluster filesystem might be used for CTDB/Samba then I highly recommend starting by running ping_pong and making sure it passes.
Download it from http://junkcode.samba.org/ftp/unpacked/junkcode/ping_pong.c
Compile it like this:
cc -o ping_pong ping_pong.c
What it tests
The ping_pong tool can test the following aspects of your cluster filesystem:
- If it supports coherent byte range locks between cluster nodes
- How fast it handles lock contention
- If it supports coherent read/write IO between nodes
- How fast it handles contended IO between nodes
- If it supports coherent mmap between nodes
- How fast the mmap coherence works
All this in 176 lines of C! What a bargain.
I was also rather surprised to find that it isn't uncommon for this test to crash (or lockup) cluster filesystems that haven't tried it before. I guess that just shows how much filesystem developers tend to neglect locking.
Testing lock coherence
Log in to several nodes of your cluster. Start by running ping_pong on just one of the nodes like this:
ping_pong test.dat N
where N is at least 1 more than the number of nodes you will be testing on. The filename (test.dat in the above) should point at the same shared file on all cluster nodes.
You'll see ping_pong print out a lock rate once per second. As you are running only on one node, you should expect to get a very high rate, as you have no contention. So for a typical server style CPU you should expect to get a rate of perhaps 500k to 1M locks/second. If ping_pong doesn't print a locking rate once per second then you have a bug. Talk to your filesystem vendor.
Now start a second copy of ping_pong on another node in your cluster. Use exactly the same parameters. You should see that the locking rate drops dramatically. That is because the cluster filesystem now has to handle the contended case for every lock it grants. On a gigabit network you should hope to now get a locking rate of between 1k/sec and 10k/sec depending on how fast the lock coherence algorithms of your cluster filesystem are.
Again, if you don't see a lock rate printed once per second, or if the locking rates shown in the two instances are not almost equal, or if the locking rate did not drop when you ran the second copy, then you almost certainly have a buggy cluster filesystem. Talk to your vendor.
Now start a third copy of ping_pong, and keep going up one at a time, noting how the locking rate changes as you add nodes. That shows you how well the lock coherence algorithms scale with the number of nodes.
Finally, kill off the ping_pong test one node at a time. As you kill them, you should see the locking rate increase until you get back to the single node case. If it doesn't increase as expected, then you have a filesystem bug. Contact your friendly vendor.
Testing IO coherence
OK, so you managed to pass the lock coherence test. Great! Now let's look at IO coherence.
Kill all your copies of ping_pong, and start the whole process again (adding one at a time) but this time add the command line switch -rw. So you'll do this:
ping_pong -rw test.dat N
You'll probably see a much lower locking rate. This is because ping_pong is now doing a one byte read and a one byte write after each lock. It also prints a "data increment" value, which should be equal to the number of nodes that are running the ping_pong test (I'm afraid it only supports up to 256 nodes with this test).
If the "data increment" value doesn't equal the number of nodes currently running the ping_pong test, or if it doesn't print a lock rate once per second, or if the lock rate starts to approach zero, then you have a bug. Talk to your vendor.
The locking rate this prints is a simple measure of your IO contention rate. Bigger numbers are better.
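To see why the "data increment" should match the node count, here is a minimal sketch of one read/write step. It is not the ping_pong source itself. It assumes each instance remembers the value it last saw at every byte, reports how far that byte has moved since its previous visit, and then writes back the value plus one. The names open_sized, rw_step and last_seen are made up for this illustration. Because the counter is a single unsigned char, the difference wraps modulo 256, which fits the 256 node limit mentioned above.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Open the shared file and size it to `nbytes` bytes, so that every
 * one-byte pread() in rw_step() has data to return.
 * Returns the file descriptor, or -1 on failure. */
static int open_sized(const char *path, off_t nbytes)
{
	int fd;

	assert(nbytes > 0);
	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd >= 0 && ftruncate(fd, nbytes) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* One read/write step at byte `i`, meant to run while byte `i` is locked.
 * `last_seen[i]` is the value this instance saw on its previous visit.
 * Returns the increment since that visit (0..255), or -1 on an IO error. */
static int rw_step(int fd, unsigned char *last_seen, int i)
{
	unsigned char c, incr;

	if (pread(fd, &c, 1, i) != 1)
		return -1;
	incr = (unsigned char)(c - last_seen[i]);	/* wraps modulo 256 */
	last_seen[i] = c;
	c = (unsigned char)(c + 1);
	if (pwrite(fd, &c, 1, i) != 1)
		return -1;
	return incr;
}
```

If n instances each add one to a byte between two of your own visits (counting your own write), you see that byte move by exactly n. A lost or stale write shows up as a different increment.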
Testing mmap coherence
If you add the -m switch to ping_pong along with -rw then it will do the IO coherence test via mmap. It isn't absolutely essential that a cluster filesystem supports coherent mmap for CTDB/Samba, but it's nice for bragging points over other cluster filesystems. If your cluster filesystem doesn't pass this test then just use the "use mmap = no" option in smb.conf. Even if it does pass this test that option may be a good idea on most cluster filesystems.
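As a rough sketch of what "IO via mmap" means here (again, not the ping_pong source), the one byte read and one byte write can go through a shared mapping of the file instead of through read and write calls. The helper names map_shared and mmap_step, and the last_seen array used to work out the increment, are assumptions made for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Open `path`, size it to `nbytes` bytes and map it read/write and shared.
 * The descriptor is handed back in `fd_out` so it can still be used for
 * byte range locks. Returns NULL on failure. */
static unsigned char *map_shared(const char *path, size_t nbytes, int *fd_out)
{
	void *p;
	int fd;

	assert(nbytes > 0);
	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)nbytes) != 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, nbytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return NULL;
	}
	*fd_out = fd;
	return p;
}

/* One read/write step at byte `i` through the mapping. Returns how far
 * the byte has moved since this instance last looked at it (modulo 256). */
static int mmap_step(unsigned char *map, unsigned char *last_seen, int i)
{
	unsigned char c = map[i];
	unsigned char incr = (unsigned char)(c - last_seen[i]);

	last_seen[i] = c;
	map[i] = (unsigned char)(c + 1);
	return incr;
}
```

On a single node two mappings of the same file share the same pages, so this always looks coherent. The interesting case is two nodes of the cluster, where the filesystem has to keep their mappings in sync itself.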
How it works
Well, you could just read the code. Did I mention it's just 176 lines long?
Anyway, for those of you too lazy to read C or (gasp!) unable to read C, what ping_pong does is a "one foot on the ground" test, aiming at defeating any possible optimisations or shortcuts that cluster filesystems might use to prevent you measuring their coherence times.
So the test does this locking pattern:
lock byte 0
lock byte 1
unlock byte 0
lock byte 2
unlock byte 1
lock byte 3
...
...
lock byte N
unlock byte N-1
lock byte 0
unlock byte N
... etc etc
all done in a tight loop. If the filesystem is behaving correctly then two nodes can't lock the same byte at the same time. As each instance of the ping_pong program always has "one foot on the ground", meaning one byte locked, this means that two instances of ping_pong cannot overtake one another.
This means a lot of contention. The filesystem can't optimise away this contention with cache mechanisms, so we end up measuring the real contention times that the filesystem achieves.
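For those who want something between the prose and the real source, the locking pattern can be sketched with POSIX fcntl byte range locks roughly like this. This is an illustration, not a copy of ping_pong.c: the function names byte_lock and lock_walk are invented here, and the loop runs for a fixed number of steps rather than forever so that it can be tried out easily.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Take (F_WRLCK) or release (F_UNLCK) a one-byte lock at `offset`,
 * blocking until the request is granted. Returns 0 on success. */
static int byte_lock(int fd, off_t offset, short type)
{
	struct flock lock = {0};

	lock.l_type = type;
	lock.l_whence = SEEK_SET;
	lock.l_start = offset;
	lock.l_len = 1;
	return fcntl(fd, F_SETLKW, &lock);
}

/* Walk the lock around `nbytes` bytes for `steps` steps. The next byte
 * is always locked before the current one is released, so one byte is
 * held at all times ("one foot on the ground"). Returns the number of
 * completed steps, or -1 as soon as a lock call fails. */
static long lock_walk(int fd, int nbytes, long steps)
{
	long done;
	int i = 0;

	assert(nbytes > 1);
	if (byte_lock(fd, 0, F_WRLCK) != 0)
		return -1;
	for (done = 0; done < steps; done++) {
		int next = (i + 1) % nbytes;

		if (byte_lock(fd, next, F_WRLCK) != 0)
			return -1;
		if (byte_lock(fd, i, F_UNLCK) != 0)
			return -1;
		i = next;
	}
	/* Lift the last foot before returning. */
	return byte_lock(fd, i, F_UNLCK) == 0 ? done : -1;
}
```

Run by a single process this never blocks. Run by two processes on the same file, each F_SETLKW call can stall until the other instance moves on, which is exactly the contention the test is after.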