Vamos is a database persistence layer developed by Dan Hughes. It is designed to outperform traditional databases such as LevelDB, RocksDB, and BerkeleyDB under the random-access workloads typical of distributed ledger networks.
Background
Distributed ledgers rely on persistent storage to track elements such as UTXOs, balances, and smart-contract variables. Traditional databases struggle with the high-entropy, randomized data of blockchain applications, primarily because their indexing structures (such as B-Trees and LSM trees) are inefficient under such workloads.
Existing databases generally use indexes to track where data is located on disk. With high-entropy identifiers such as the hashes common in blockchain applications, these indexes perform poorly: hashes of even very similar data look completely different, so the database cannot exploit key locality to optimize reads or writes. As the database grows, which it does continually in blockchain contexts, performance degrades significantly because each query requires more disk reads to traverse the growing index.
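Both effects can be illustrated with a small Python sketch. It is a simplified model, not a benchmark of any of the databases named above: the fanout of 100 keys per index node is an assumed figure, real engines cache the upper index levels in memory, and `differing_bits` and `tree_levels` are hypothetical helper names introduced here.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")


def tree_levels(record_count: int, fanout: int) -> int:
    """Levels a balanced tree index needs to address record_count keys.

    Each level multiplies the number of addressable keys by `fanout`,
    so the level count grows with the logarithm of the record count.
    """
    levels, capacity = 0, 1
    while capacity < record_count:
        capacity *= fanout
        levels += 1
    return levels


# Two inputs that differ in one character still hash to unrelated digests,
# so neighbouring records end up far apart in a hash-keyed index.
print(differing_bits(b"balance:alice:100", b"balance:alice:101"), "of 256 bits differ")

# Assumed fanout of 100: the index gains a level as the dataset grows.
for n in (1_000_000, 50_000_000, 250_000_000):
    print(f"{n:>11,} records -> {tree_levels(n, fanout=100)} index levels")
```

In this model a lookup that misses the cache touches one node per level, so the worst-case number of disk reads rises as the index deepens.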
Concept and Design
Vamos was conceptualized by Dan Hughes to overcome these limitations. The core idea is an indexing mechanism suited to the high-entropy, randomized nature of blockchain data. Vamos uses a ‘fixed addressable shard-space’ for its indexes: each index slot is pre-allocated and has a maximum capacity, so querying any key always costs a single disk read, regardless of the total number of items in the database. Because the read count does not grow with the index, lookup cost stays flat as the database grows.
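The idea of a fixed, pre-allocated slot space can be sketched in Python as follows. This is a toy model under stated assumptions, not Vamos's actual on-disk format: the slot count, slot capacity, entry layout, and the `FixedSlotIndex` class are all hypothetical, and an in-memory `io.BytesIO` buffer stands in for the index file.

```python
import hashlib
import io
import struct

SLOT_COUNT = 1024         # assumed number of pre-allocated slots
ENTRIES_PER_SLOT = 8      # assumed maximum capacity of one slot
ENTRY = struct.Struct("<QQ")   # 8-byte key fingerprint + 8-byte data offset
SLOT_SIZE = ENTRIES_PER_SLOT * ENTRY.size


class FixedSlotIndex:
    """Toy index: a key's hash picks one fixed-size slot at a computable offset."""

    def __init__(self, storage=None):
        if storage is None:
            # Pre-allocate every slot up front; an all-zero entry means "empty".
            storage = io.BytesIO(bytes(SLOT_COUNT * SLOT_SIZE))
        self.storage = storage
        self.reads = 0  # number of slot reads issued so far

    @staticmethod
    def _locate(key: bytes):
        digest = hashlib.sha256(key).digest()
        slot = int.from_bytes(digest[:8], "little") % SLOT_COUNT
        # Force the fingerprint to be non-zero so it never looks like an empty entry.
        fingerprint = int.from_bytes(digest[8:16], "little") | 1
        return slot, fingerprint

    def _read_slot(self, slot: int) -> bytearray:
        self.storage.seek(slot * SLOT_SIZE)   # offset is computed, never searched for
        self.reads += 1
        return bytearray(self.storage.read(SLOT_SIZE))

    def put(self, key: bytes, data_offset: int) -> None:
        slot, fingerprint = self._locate(key)
        buf = self._read_slot(slot)
        for i in range(ENTRIES_PER_SLOT):
            stored, _ = ENTRY.unpack_from(buf, i * ENTRY.size)
            if stored in (0, fingerprint):    # free entry, or update of the same key
                ENTRY.pack_into(buf, i * ENTRY.size, fingerprint, data_offset)
                self.storage.seek(slot * SLOT_SIZE)
                self.storage.write(buf)
                return
        raise OverflowError("slot is full")   # the toy model has no overflow handling

    def get(self, key: bytes):
        slot, fingerprint = self._locate(key)
        buf = self._read_slot(slot)           # the only read a lookup performs
        for i in range(ENTRIES_PER_SLOT):
            stored, data_offset = ENTRY.unpack_from(buf, i * ENTRY.size)
            if stored == fingerprint:
                return data_offset
        return None


index = FixedSlotIndex()
for i in range(200):
    index.put(f"key-{i}".encode(), i * 100)

index.reads = 0
print(index.get(b"key-42"), "found after", index.reads, "slot read(s)")
```

The lookup cost here depends only on the slot size, not on how many keys are stored, which is the property the fixed shard-space design is described as providing. The sketch ignores fingerprint collisions between distinct keys and what happens when a slot fills up; how Vamos handles those cases is not covered by this description.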
Key Features and Innovations
- Fixed Addressable Shard-Space for Indexes: This approach ensures that each key maps to a specific index slot on disk referenced by an in-memory slot table, reducing the complexity and time of queries.
- Significant Performance Improvement: Early tests show Vamos holding steady at 1 read per key-existence check and 1-2 reads/writes per write operation as the dataset grows, in contrast to the performance drop observed in traditional databases like BerkeleyDB.
- Optimized for Commodity Hardware: Vamos is designed to work efficiently on standard hardware configurations, making it accessible for widespread use.
Development and Testing
- Hughes re-coded the index portion of the old Vamos code to implement these ideas.
- Preliminary tests indicate that Vamos significantly outperforms BerkeleyDB under crypto-like workloads, delivering nearly twice the performance at 250 million records.
- Continuous improvements in I/O utilization and checkpointing algorithms have further enhanced Vamos's capabilities.
- Vamos has been successfully integrated with Cassandra, showing robust performance with no degradation even under substantial loads (500 writes per second on a 50 million tweet dataset).
Future Prospects
- Vamos continues to mature, showing promising results in preliminary tests and integrations.
- It is poised to be used in upcoming public tests for high-throughput applications, potentially reaching throughputs of around 1 million transactions per second.
- If these results hold up, Vamos could mark a shift in database design for blockchain and crypto applications, where high-entropy keys limit what traditional indexes can deliver.