At Fasterize, we mainly use Zookeeper to store our customers' configurations.
We use its notification system to broadcast every change to all the processes of our optimization engine, since each of them keeps all customer configurations in memory for instant access.
Every process that needs to read customer configurations subscribes to changes using Zookeeper watchers.
In detail, a watcher triggers an event every time a change occurs (a configuration is added, deleted or updated), and every registered process receives the updated configuration.
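To make the in-memory side concrete, here is a minimal TypeScript sketch of how one engine process might apply watcher notifications to its local copy of the configurations. The `ConfigEvent` shape, the `CustomerConfigCache` class and the customer names are hypothetical; the post does not describe the actual payload.

```typescript
// Hypothetical event shape: one event per added, updated or deleted
// customer configuration (not the real payload format).
type ConfigEvent =
  | { kind: "added" | "updated"; customerId: string; config: string }
  | { kind: "deleted"; customerId: string };

// In-memory copy of all customer configurations held by one engine process,
// so that reading a configuration on the request path is a plain Map lookup.
class CustomerConfigCache {
  private readonly configs = new Map<string, string>();

  // Apply one change notification received through a watcher.
  apply(event: ConfigEvent): void {
    if (event.kind === "deleted") {
      this.configs.delete(event.customerId);
    } else {
      this.configs.set(event.customerId, event.config);
    }
  }

  get(customerId: string): string | undefined {
    return this.configs.get(customerId);
  }

  get size(): number {
    return this.configs.size;
  }
}

// Example: the same sequence of events reaches every registered process.
const cache = new CustomerConfigCache();
cache.apply({ kind: "added", customerId: "acme", config: '{"minify":true}' });
cache.apply({ kind: "added", customerId: "globex", config: '{"minify":true}' });
cache.apply({ kind: "updated", customerId: "acme", config: '{"minify":false}' });
cache.apply({ kind: "deleted", customerId: "globex" });
```

Since every process applies the same events, all of them converge on the same configuration set, as long as no notification is lost.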
It worked great until we ran into real-world events: irregular network latency, network partitions, unavailable servers, …
We came across situations where some processes stopped receiving updates, could not connect to Zookeeper, flooded the network after a server became unavailable, …
In fact, Zookeeper itself was not directly responsible for the failures we encountered: we did not notice any unexpected behavior on the Zookeeper cluster side. This is rather reassuring, as Zookeeper is used by many popular projects.
In our case, most of these failures were attributable to the Zookeeper client driver and the rest of our stack:
– our optimization engine is written in NodeJS, so we were using the C driver rather than the Java driver. Unfortunately, the C driver does not seem to be as stable as the Java driver (which everyone with solid Zookeeper experience seems to use).
– Zookeeper's native API, and its watcher API in particular, is low level (for example, watchers are one-shot: every time a watcher is triggered, the client has to register a new one). We were therefore using a higher-level abstraction provided by the node-zkplus NodeJS module. When we started using these NodeJS modules (node-zookeeper, the binding to the low-level driver, and node-zkplus), both had corner cases that forced us to use unmaintained forks and/or fix them ourselves. We ended up with something that mostly worked but could not be (easily) updated.
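The one-shot behavior of watchers is what a higher-level module has to hide. The sketch below uses a hypothetical in-memory `FakeZookeeper` instead of a real client, so it runs without a server; like real Zookeeper data watches, a watcher set by `getData` fires at most once. The hypothetical `watchData` helper shows the re-registration loop that an abstraction such as node-zkplus performs on your behalf; it is not node-zkplus's actual API.

```typescript
// Watcher callback: receives the path whose data changed.
type Watcher = (path: string) => void;

// Hypothetical in-memory stand-in for a Zookeeper data store. A watcher
// registered through getData fires at most once, then is discarded.
class FakeZookeeper {
  private readonly data = new Map<string, string>();
  private readonly watchers = new Map<string, Watcher[]>();

  getData(path: string, watcher?: Watcher): string | undefined {
    if (watcher) {
      const list = this.watchers.get(path) ?? [];
      list.push(watcher);
      this.watchers.set(path, list);
    }
    return this.data.get(path);
  }

  setData(path: string, value: string): void {
    this.data.set(path, value);
    // Detach the current watchers before firing them (one-shot semantics).
    const fired = this.watchers.get(path) ?? [];
    this.watchers.delete(path);
    for (const w of fired) w(path);
  }
}

// "Watch forever" helper: each time the watcher fires, read the node again
// and register a fresh watcher in the same call.
function watchData(
  zk: FakeZookeeper,
  path: string,
  onChange: (value: string | undefined) => void,
): void {
  const rearm: Watcher = () => onChange(zk.getData(path, rearm));
  onChange(zk.getData(path, rearm));
}

// Example: compare the re-arming helper with a plain one-shot watcher.
const zk = new FakeZookeeper();
zk.setData("/customers/acme", "v1");

const seen: (string | undefined)[] = [];
watchData(zk, "/customers/acme", (value) => seen.push(value));

let plainFired = 0;
zk.getData("/customers/acme", () => {
  plainFired++;
});

zk.setData("/customers/acme", "v2");
zk.setData("/customers/acme", "v3");
// seen is ["v1", "v2", "v3"]; the plain watcher fired only once.
```

Because the helper re-reads the node each time it re-arms, the callback always gets the current value. Conversely, if a re-registration is ever skipped, no further notification arrives for that node.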
After spending some time trying to stabilize our Zookeeper stack, we decided to take a different approach.
As others have done before us, we decided to develop our own tool: Geonosis.
Geonosis is a small Zookeeper daemon that uses the Java Zookeeper driver to synchronize any Zookeeper subtree to local files.
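A minimal sketch of the idea of mirroring a subtree to files follows. It is written in TypeScript for consistency with the other examples, even though Geonosis itself is a Java daemon, and it is not Geonosis's code. The layout (each leaf znode becomes a file, its ancestors become directories), the hidden temporary file and the `mirrorZnode` helper are assumptions made for illustration.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Map a znode path such as "/customers/acme" to a file under baseDir.
// Assumption: only leaf znodes carry data, so each one becomes a regular
// file and its ancestors become directories.
function znodeToFile(baseDir: string, znodePath: string): string {
  const segments = znodePath.split("/").filter((s) => s.length > 0);
  return path.join(baseDir, ...segments);
}

// Apply one change observed on the watched subtree: write the new data,
// or remove the file when the znode was deleted (data === null).
function mirrorZnode(baseDir: string, znodePath: string, data: string | null): void {
  const target = znodeToFile(baseDir, znodePath);
  if (data === null) {
    fs.rmSync(target, { force: true });
    return;
  }
  const dir = path.dirname(target);
  fs.mkdirSync(dir, { recursive: true });
  // Write a hidden temporary sibling first, then rename it over the target.
  // On POSIX systems rename() atomically replaces the target on the same
  // file system, so a reader never sees a half-written file.
  const tmp = path.join(dir, `.${path.basename(target)}.tmp`);
  fs.writeFileSync(tmp, data);
  fs.renameSync(tmp, target);
}

// Example: replay three changes into a fresh temporary directory.
const baseDir = fs.mkdtempSync(path.join(os.tmpdir(), "mirror-sketch-"));
mirrorZnode(baseDir, "/customers/acme", '{"minify":true}');
mirrorZnode(baseDir, "/customers/globex", '{"minify":false}');
mirrorZnode(baseDir, "/customers/globex", null);
```

Writing through a temporary file and a rename is one common way to keep readers from observing partially written content; Geonosis may handle this differently.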
Our optimization engine now simply listens for changes to local files and no longer depends on any Zookeeper driver.
Moreover, with Geonosis, our optimization engine is decoupled from Zookeeper: it can start with (possibly stale) customer configurations even when Zookeeper is unavailable.
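On the engine side, a minimal NodeJS sketch could look like the following, assuming one file per customer in a flat directory, with hidden files used for writes in progress. It loads every configuration at startup without touching Zookeeper and reloads the directory when it changes. `loadConfigs` and `watchConfigs` are hypothetical names, not part of Geonosis.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Read every configuration file of a flat directory into memory, one entry
// per customer. Hidden files (such as in-progress temporary files) are
// skipped. No Zookeeper client is involved, so this works at startup even
// when Zookeeper is down; the data may then be stale.
function loadConfigs(dir: string): Map<string, string> {
  const configs = new Map<string, string>();
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (!entry.isFile() || entry.name.startsWith(".")) continue;
    try {
      configs.set(entry.name, fs.readFileSync(path.join(dir, entry.name), "utf8"));
    } catch {
      // Skip files that cannot be read, e.g. one deleted between readdir and read.
    }
  }
  return configs;
}

// Re-read the whole directory on every change notification. fs.watch events
// may be duplicated or coalesced depending on the platform, so the handler
// ignores the event details and reloads everything.
function watchConfigs(
  dir: string,
  onReload: (configs: Map<string, string>) => void,
): fs.FSWatcher {
  return fs.watch(dir, () => onReload(loadConfigs(dir)));
}

// Example: a directory left behind by a previous run of the mirror daemon.
const configDir = fs.mkdtempSync(path.join(os.tmpdir(), "configs-"));
fs.writeFileSync(path.join(configDir, "acme"), '{"minify":true}');
fs.writeFileSync(path.join(configDir, ".acme.tmp"), "partial write");
const configs = loadConfigs(configDir);
```

A long-running process would then call `watchConfigs(configDir, (fresh) => { current = fresh; })` and keep the returned watcher open. Startup only needs the files, so a Zookeeper outage no longer prevents the engine from starting.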
Geonosis is now open source and available on our GitHub account. If you are in the same situation as we were, don't hesitate to try it and give us some feedback. Pull requests are welcome 😉