Here at AboutUs, we have hundreds of GBs of data. Most of it is static, immutable data, keyed by a specific primary key.
With such a use case, key-value databases make perfect sense. After looking at various options, we chose to experiment with Tokyo Cabinet & Tokyo Tyrant first, because Tokyo has such a low barrier to entry.
A couple of benefits that we saw immediately from using Tokyo:
- It is fast. It is not as fast as an in-memory store, but if you have more disk space than memory for storing data (cache data, for example), Tokyo is an attractive choice.
- Tokyo’s performance is affected by the number of keys, but it took far more keys to slow it down than we expected. Tokyo didn’t show any significant slowdown even when we pushed 10 million keys into a single database. We were pleasantly surprised by its performance.
- It can handle the memcached protocol. This lowers the barrier to entry, making Tokyo attractive for experimentation.
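To make the memcached point concrete, here is a minimal Python sketch of the text-protocol commands a client would send to a ttserver. The helper names (`build_set`, `build_get`, `parse_get`, `roundtrip`) are made up for illustration, and the host and port are assumptions (1978 is ttserver's default port). In practice an existing memcached client library would do this work.

```python
import socket


def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode a memcached text-protocol 'set' command."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Encode a memcached text-protocol 'get' command for one key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get(response: bytes):
    """Return the value from a single-key 'get' response, or None on a miss."""
    if response.startswith(b"END"):
        return None
    header, _, rest = response.partition(b"\r\n")
    parts = header.split()  # expected: VALUE <key> <flags> <bytes>
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError(f"unexpected response header: {header!r}")
    return rest[: int(parts[3])]


def roundtrip(payload: bytes, host: str = "localhost", port: int = 1978) -> bytes:
    """Send one command and read one reply (assumed host/port).

    A real client would keep reading until the terminating line arrives;
    a single recv() is only good enough for small values in a sketch.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
        return sock.recv(65536)
```

With a ttserver running locally, `roundtrip(build_set("aboutus.org", b"<html>...</html>"))` followed by `parse_get(roundtrip(build_get("aboutus.org")))` would store and read back one page.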
Installation
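Installing Tokyo isn’t trivial, but it’s not hard, either. It requires the bzip2 development headers to be installed first (bzip2-devel on RPM-based systems, libbz2-dev on Debian-based ones):
# For RPM-based systems
sudo yum install bzip2-devel
# For Debian-based systems
sudo apt-get install libbz2-dev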
If you want to use Tokyo’s binary protocol — via rufus-tokyo for example — you need to have both Tokyo Cabinet and Tokyo Tyrant installed. We guessed that we might need just the header files, but decided we did not want to manage those files manually.
First Try — Success!
In our first non-experimental run, we let Tokyo Tyrant handle our page cache. We have been interested in alternative databases for a while, but page cache was the first project that allowed us to try one out.
Each AboutUs domain page is built from at least three individual widgets. Because we intend to create more widgets, building pages on the fly on every render is clearly inefficient. That’s why we came up with the page cache solution.
Deploying the page cache was a smooth process. The size of our casket.tch file has grown to 51 GB, but so far, we’ve had to restart Tokyo Tyrant only once.
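The page cache idea can be sketched as a simple read-through cache. This is an illustration under assumptions, not the actual AboutUs code: `get_page`, the `page:` key prefix, and the `render` callback are hypothetical names, and a plain dict stands in for a Tokyo Tyrant client.

```python
from typing import Callable, MutableMapping


def get_page(
    cache: MutableMapping[str, str],
    domain: str,
    render: Callable[[str], str],
) -> str:
    """Return the cached page for a domain, rendering it only on a miss."""
    key = f"page:{domain}"  # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        return cached
    html = render(domain)  # the expensive step: build the page from its widgets
    cache[key] = html
    return html


def render_from_widgets(domain: str) -> str:
    """Stand-in renderer that joins three made-up widgets."""
    widgets = ["summary", "traffic", "contact"]
    return "".join(f"<div class='{w}'>{domain}</div>" for w in widgets)
```

Calling `get_page(cache, "aboutus.org", render_from_widgets)` twice renders the page only once; the second call is served from the cache.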
Second Try — Fail!
Our next opportunity to try out Tokyo was in making our domain information easily available to our web application. Our total domain information is about 160 GB. It’s not a trivial size to handle, and we still want quick access to it, especially since each piece of information is keyed by domain name.
Since our first attempt was a success, we thought Tokyo Tyrant would have no problem with handling our 160 GB opportunity.
Filled with hope, we naively pushed the entire 160 GB to Tokyo Tyrant without throttling, using a simple script. As Tokyo grew, approaching 40 GB, it began to slow down significantly. After 60 GB, it died. We could neither restart the ttserver nor recover the casket.tch.
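For contrast, a throttled load might look roughly like the Python sketch below. Everything in it is an assumption for illustration: the `put` callback stands in for whatever client call writes one record, and the batch size and pause are arbitrary. Nothing here shows that throttling alone would have prevented the failure.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[str, bytes]


def batches(items: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield successive lists of at most `size` records."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def throttled_load(
    records: Iterable[Record],
    put: Callable[[str, bytes], None],
    batch_size: int = 1000,       # arbitrary, tune against real measurements
    pause_seconds: float = 0.5,   # arbitrary, gives the server time to flush
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Write records in batches, pausing after each batch. Returns the count written."""
    written = 0
    for chunk in batches(records, batch_size):
        for key, value in chunk:
            put(key, value)
        written += len(chunk)
        sleep(pause_seconds)
    return written
```

The `sleep` parameter is injectable so the pacing can be tested without actually waiting.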
In hindsight, we could have:
- Used more than one database.
- Sharded multiple instances of Tokyo based on domain names.
But we didn’t. Instead, we recovered quickly and used Amazon’s SimpleDB.
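Sharding by domain name, as listed among the hindsight options, could be sketched like this in Python. The shard list, ports, and function names are hypothetical, and MD5 is used only as a stable hash, not for security.

```python
import hashlib
from typing import List, Sequence, Tuple

# Hypothetical ttserver instances, one database file each.
SHARDS: List[Tuple[str, int]] = [
    ("localhost", 1978),
    ("localhost", 1979),
    ("localhost", 1980),
    ("localhost", 1981),
]


def shard_for(domain: str, shard_count: int) -> int:
    """Map a domain name to a shard index that is stable across processes."""
    digest = hashlib.md5(domain.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def server_for(domain: str, shards: Sequence[Tuple[str, int]] = SHARDS) -> Tuple[str, int]:
    """Pick the (host, port) that should hold this domain's record."""
    return shards[shard_for(domain, len(shards))]
```

With four shards, 160 GB would average out to roughly 40 GB per database. The trade-off of plain modulo sharding is that changing the number of shards remaps most keys, so the shard count needs to be chosen with growth in mind.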
Conclusion
We are still very happy with Tokyo’s performance in handling our page cache.
With better design, we could have used it to serve our domain information. But Tokyo is competing against other databases we are currently evaluating, including Amazon Web Services. For us, ease of use is a big deciding factor. But we’re still evaluating.
Comments
Thanks for this interesting article. What do you mean by “we’ve had to restart Tokyo Tyrant only once”? Did the server crash?
Yes, the ttserver crashed and we weren’t able to restart it.
Glad you’re happy with it!
Are you using any replication with TT?
We don’t at the moment because we haven’t had to.