Nando's blog

Installing Tokyo Cabinet and its Python driver

Tokyo Cabinet is one of the fastest key-value databases available, and it comes in several flavors. Here is how to install it on Ubuntu and use it from Python.

After realizing that CouchDB is not appropriate when you know you will need ad hoc queries (which doesn’t mean CouchDB isn’t very cool), I am trying out Tokyo Cabinet.

Here is a post about it, using Ruby.

Tokyo Cabinet offers several types of database, but the version packaged with Ubuntu 8.10 lacks the fixed-length and table databases, so I had to compile it from source:

# Install the build tools and dependencies:
sudo apt-get install checkinstall build-essential libbz2-dev
# Now compile (run these inside the unpacked Tokyo Cabinet source directory):
./configure --prefix=/usr
make clean
make
# checkinstall -R    # would create an RPM package instead
sudo checkinstall -D # creates and installs a Debian package
# You now have a package that you can install AND uninstall,
# which you would not get from a plain "sudo make install".

Now that the library, programs and headers are available, we can install the Python driver. The best-known one is the older pytc, which also does not support every kind of database. The driver we need is tc. Here is the author’s blog.

I downloaded the latest code from GitHub, uncompressed it, entered its directory, then ran:

./setup.py install

That will succeed in Ubuntu 8.10 as long as the header files are found.
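A quick way to confirm that the driver ended up somewhere Python can find it is to try importing it. This is only a minimal sketch; the helper name driver_available is made up for illustration and is not part of tc:

```python
import importlib


def driver_available(name='tc'):
    """Return True if the module called `name` can be imported.

    An ImportError covers both "module not installed" and "the C
    extension could not load the shared library it was built against".
    """
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True
```

Calling driver_available() with no arguments checks for the tc module; if it returns False, the install step above is the first thing to look at again.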

Unfortunately, this driver is not quite complete. As of 2008-04-03, the table database lacks a close() method, and the query API is still being finished (you cannot execute queries yet).

Anyway, what software are we going to write using Tokyo Cabinet? How about an ambitious object-oriented database for Python apps? Something to replace ZODB? I am talking about pykesto and I wish you would help me write it.

However, it will take time to get there, and we might have to start by creating a higher-level interface to Tokyo Cabinet, because right now instantiating the table database, for instance, involves passing ugly flags like this:

import tc
t = tc.TDB('test.tdb', tc.TDBOWRITER | tc.TDBOCREAT)
# The above means open the file 'test.tdb' for writing and
# create it if it does not exist.
t.put('row id', {'col id': 'value'})
t.get('row id')
# Out: {'col id': 'value'}
t.close()

Another reason to create a higher-level API is that the query API for the table database also involves lots of flags. I am sure this can easily be made Pythonic. I will put the code on pykesto as soon as I have it.
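To make the idea concrete, here is a rough sketch of how the open flags could be hidden behind short mode strings, in the spirit of Python's dbm modules. This is not the pykesto code. The names mode_to_flags and open_table are hypothetical, and so is the assumption that the driver exposes TDBOREADER and TDBOTRUNC under the same names as the C API (only TDBOWRITER and TDBOCREAT appear in the example above). The driver module is passed in as an argument so nothing here depends on how tc is installed.

```python
from contextlib import contextmanager

# Hypothetical mode strings mapped to the names of the flag constants.
# 'r' read-only, 'w' write, 'c' write and create if missing,
# 'n' write, create, and truncate any existing file.
_MODE_FLAGS = {
    'r': ('TDBOREADER',),
    'w': ('TDBOWRITER',),
    'c': ('TDBOWRITER', 'TDBOCREAT'),
    'n': ('TDBOWRITER', 'TDBOCREAT', 'TDBOTRUNC'),
}


def mode_to_flags(driver, mode):
    """Combine the driver's flag constants for a mode string."""
    if mode not in _MODE_FLAGS:
        raise ValueError('unknown mode: %r' % (mode,))
    flags = 0
    for name in _MODE_FLAGS[mode]:
        # Look the constant up on the driver instead of hardcoding values.
        flags |= getattr(driver, name)
    return flags


@contextmanager
def open_table(driver, path, mode='c'):
    """Open a table database and close it when the block ends."""
    table = driver.TDB(path, mode_to_flags(driver, mode))
    try:
        yield table
    finally:
        # Some driver versions may not provide close() on the table
        # database, so only call it when it exists.
        close = getattr(table, 'close', None)
        if close is not None:
            close()
```

With something like this, the earlier example would shrink to "with open_table(tc, 'test.tdb') as t:" followed by the put and get calls, and the flags would never appear in application code.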