Pinterest's Wide Column Database in Python with RocksDB

In a recent article on Pinterest Engineering Blog, they desribed in details how they implemented in C++ a RocksDB-based distributed wide column database called Rockstorewidecolumn. While their system tackles petabytes and millions of requests per second with a distributed architecture, the core concepts of mapping a wide column data model onto a key-value store like RocksDB are fascinating.

This article explore how to implement a simpler, single-instance version of Pinterest’s Rockstorewidecolumn in Python using the power and efficiency of RocksDB.

What’s a Wide Column Database, Anyway?

Think beyond traditional relational tables with fixed schemas. A wide column database offers:

This model is great for use cases like user profiles where users have varying attributes which can be available for some users and not for others, time-series data, or, as Pinterest showed, storing user event sequences.

From Wide Columns to Simple Keys & Values

RocksDB is an incredibly fast embedded key-value store. However, It doesn’t inherently understand “rows,” “columns,” or “versions.” It just knows keys and values, and both of which are byte strings. Our main task is to cleverly design a key structure that lets us represent our wide column model.

From Pinterest’s article, the Data Model Mapping (or Logical View) from Wide Columns to Key-Value looks like this

To store a specific cell (a value for a given dataset, row, column, and time), we can simply concatenate these elements into a single RocksDB key and use a separator like the null byte \x00. The choice of the good separator is crutial as so what we don’t confuse it with characters from the other attributes. This Storage View is visually explained with the following diagram.

+----------------------+-----------------+-------------------+-----------------+-----------------------+-----------------+-----------------------------------------+
| dataset_name_bytes   | KEY_SEPARATOR   | row_key_bytes     | KEY_SEPARATOR   | column_name_bytes     | KEY_SEPARATOR   | timestamp_bytes                         |
+----------------------+-----------------+-------------------+-----------------+-----------------------+-----------------+-----------------------------------------+
| (String as UTF-8)    | (Null Byte `\0`)| (String as UTF-8) | (Null Byte `\0`)| (String as UTF-8)     | (Null Byte `\0`)| (8-byte uint64, Big-Endian, Inverted)   |
+----------------------+-----------------+-------------------+-----------------+-----------------------+-----------------+-----------------------------------------+

One other thing to consider in the implementation is the versioning, and the ability to retrieve the latest versions of a column first.

For this, we can use a Timestamp trick that leverages the fact that RocksDB sorts keys lexicographically in ascending order. In fact, we can get a descending order for timestamps as follows:

This way, newer (smaller inverted) timestamps will sort before older ones.

Here is a complete Python snippet that demostrates how a Key is constructed:

import struct
SEPARATOR = b"\x00"
dataset = b"user_profile"
row_key = b"user123"
column_name = b"email"
timestamp_ms = 1678886400000
MAX_UINT64 = 2**64 - 1
inverted_ts_bytes = struct.pack('>Q', MAX_UINT64 - timestamp_ms)
# The RocksDB key might look like
dataset + SEPARATOR + row_key + SEPARATOR + column_name + SEPARATOR + inverted_ts_bytes

Which results in a Key that looks like this:

b'user_profile\x00user123\x00email\x00\xff\xff\xfey\x1a\x92\x8f\xff'

Python Implementation

The full implementation of this Datastore can be found at this GitHub KVWC project, specifically in the WideColumnDB class.

Here are some key points from this implementation:

Unlocking Wide Column Features

With the chosen key structure in our implementation, several wide column features become quite natural:

What’s Next?

The Trade-offs made in our Python implementation, makes it the datastore surprisingly useful for smaller-scale applications:

Here are few things to consider if we were to expand this implementation:

That’s all folks

This article walkthrough the implementation of a simplified version of Pinterest’s Rockstorewidecolumn. We demonstrated that by carefully designing a key structure, we can map complex data models onto a high-performance key-value store like RocksDB.

I hope you enjoyed this article, feel free to leave a comment or reach out on twitter @bachiirc.