Bitcask Capacity Calculator

These calculators will help you size your cluster if you plan to use the default Bitcask storage backend.

This page gives a rough estimate only. The calculations are a best guess and tend to be on the conservative side. Include some headroom, plus room for unexpected growth, so that if demand exceeds expectations you can add more nodes to the cluster and stay ahead of your requirements.

Recommendations

To manage your estimated 183.9 million key/bucket pairs, where bucket names are ~10 bytes, keys are ~36 bytes, and values are ~36 bytes, with 16.0 GiB of RAM set aside per node for in-memory data management, in a cluster configured to maintain 3 replicas per key (N = 3), Riak using the Bitcask storage engine will require at least:

  • 5 nodes
  • 11.5 GiB of RAM per node (57.3 GiB total across all nodes)
  • 33.2 GiB of storage space per node (166.0 GiB total storage space used across all nodes)
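
The per-node figures above are consistent with dividing the cluster-wide totals evenly across the nodes. The Python sketch below shows one plausible way these numbers relate; it is not the calculator's actual logic. The function name `nodes_needed` and the five-node floor (`min_nodes`) are assumptions made for illustration.

```python
import math

def nodes_needed(total_ram_gib, ram_per_node_gib, min_nodes=5):
    """Smallest node count whose combined RAM budget covers total_ram_gib.

    min_nodes is an assumed floor for this sketch; it is not taken
    from the calculator itself.
    """
    return max(min_nodes, math.ceil(total_ram_gib / ram_per_node_gib))

# Cluster-wide totals and per-node RAM budget from the recommendation above.
total_ram_gib = 57.3
total_storage_gib = 166.0
ram_budget_per_node_gib = 16.0

nodes = nodes_needed(total_ram_gib, ram_budget_per_node_gib)
ram_per_node = total_ram_gib / nodes          # even split across nodes
storage_per_node = total_storage_gib / nodes  # even split across nodes

print(f"{nodes} nodes, {ram_per_node:.1f} GiB RAM and "
      f"{storage_per_node:.1f} GiB storage per node")
```

Without the assumed floor, the RAM budget alone would call for only 4 nodes (57.3 / 16.0 rounded up), so treat the floor as a guess at why the recommendation is 5.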

Details on Bitcask RAM Calculation

With the above information in mind, the following variables will factor into your RAM calculation:

Variable | Description
Static Bitcask per-key overhead | 44.5 bytes per key
Estimated average bucket-plus-key length | The combined number of characters your bucket and key names will require (on average). We'll assume 1 byte per character.
Estimated total objects | The total number of key/value pairs your cluster will have when it goes live
Replication value (n_val) | The number of times each key will be replicated when written to Riak (the default is 3)

The actual equation

Approximate RAM needed for Bitcask = (static Bitcask per-key overhead + estimated average bucket+key length in bytes) * estimated total number of keys * n_val

Example:

  • 50,000,000 keys in your cluster to start
  • approximately 30 bytes for each bucket+key name
  • default n_val of 3

Plugging these into the equation gives (44.5 + 30) * 50,000,000 * 3 = 11,175,000,000 bytes, so the amount of RAM you would need for Bitcask is about 10.4 GiB across your entire cluster.
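
The equation is easy to script if you want to try other inputs. The Python sketch below is illustrative only: the function and constant names are made up for this example, and it assumes the 44.5-byte per-key overhead from the table and binary gigabytes (1 GiB = 2^30 bytes).

```python
# Static per-key overhead in bytes, as listed in the variable table.
BITCASK_OVERHEAD_BYTES = 44.5

def bitcask_ram_bytes(total_keys, avg_bucket_key_len, n_val=3,
                      overhead=BITCASK_OVERHEAD_BYTES):
    """Approximate cluster-wide RAM (in bytes) Bitcask needs for its keys."""
    return (overhead + avg_bucket_key_len) * total_keys * n_val

def to_gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30

# Example inputs: 50,000,000 keys, ~30 bytes per bucket+key name, n_val = 3.
ram = bitcask_ram_bytes(50_000_000, 30)
print(f"{ram:,.0f} bytes = {to_gib(ram):.2f} GiB")

# The result is sensitive to the overhead value: with a 40-byte overhead
# the same inputs come to about 9.78 GiB.
ram_40 = bitcask_ram_bytes(50_000_000, 30, overhead=40)
print(f"{ram_40:,.0f} bytes = {to_gib(ram_40):.2f} GiB")
```

Keeping the overhead as a parameter makes it simple to rerun the estimate if the per-key overhead differs for your Riak version.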

Additionally, Bitcask relies on your operating system's filesystem cache to deliver high-performance reads. When sizing your cluster, plan on having several more gigabytes of RAM available per node for the filesystem cache, on top of the key memory estimated above.