Sharding [ /ˈʃɑːdɪŋ/ ] is a method of scaling a database horizontally by partitioning its architecture into discrete ‘shards’ whose management can be distributed across multiple nodes or servers for improved load balancing, performance, and responsiveness.
Sharding is a form of horizontal partitioning, in which the rows of a dataset are split across servers. It is distinct from vertical partitioning, which divides a large database table by columns into smaller tables, often within the same server.
Sharding in Distributed Ledgers
Within distributed ledgers (DLTs) like Radix, sharding primarily relates to data availability and transaction execution. In unsharded systems, every node must store and process the ledger's entire state, resulting in a low transactional capacity.
In contrast, sharding entails splitting a ledger into several partitions that function in parallel, with some models even proposing separate blockchains. However, a straightforward sharding technique might undermine transaction composability, a critical aspect for DeFi applications.
Distributed ledgers are built around the concept of a 'spend'. Sharding a public DLT is difficult because the network must be able to establish that each spend occurs only once network-wide. The presence of multiple shards compounds this challenge and introduces the potential for double-spending, in which the same funds are spent more than once across the network, opening the door to fraud.
Ethereum has dropped execution sharding from its roadmap in favor of a rollup-centric approach based on Layer 2 solutions.
Advantages
Breaking up large datasets into smaller, more manageable pieces makes it easier for network nodes to search through and retrieve individual data points and to handle concurrent requests. This approach is especially beneficial when unsharded systems grow so large that performance degrades. With sharding, data storage and processing tasks are distributed across multiple computers, enhancing the system's scalability.
Sharding is commonly used in distributed databases, where it allows for the efficient storage and retrieval of large amounts of data across multiple nodes. By dividing the data into shards and distributing them across multiple nodes, a sharded database can support more concurrent requests and handle larger volumes of data without slowing down or becoming overloaded.
In addition to improving scalability and performance, sharding can also help to improve the availability and reliability of a distributed system. Because data is spread over multiple nodes, the failure of one node affects only the shards it holds, and when shards are also replicated across nodes, the data remains accessible and the system can continue to serve requests even if one or more nodes fail.
There are several different approaches to sharding, each with its own tradeoffs and benefits. Some common sharding strategies include range-based sharding, which divides data into shards based on a key value or range; hash-based sharding, which uses a hash function to distribute data across shards; and directory-based sharding, which uses a lookup table to determine which shard a piece of data belongs to.
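The three strategies above can be illustrated with a minimal Python sketch. It is not taken from any particular database: the function names, the fixed shard count, the boundary list, and the directory contents are all hypothetical and chosen only to show how each strategy maps a key to a shard.

```python
import bisect
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def range_shard(key: str, boundaries: list[str]) -> int:
    """Range-based: sorted boundaries split the key space into contiguous ranges.

    Shard 0 holds keys below boundaries[0]; the last shard holds keys at or
    above the final boundary.
    """
    return bisect.bisect_right(boundaries, key)


def hash_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash-based: a hash of the key, reduced modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards


def directory_shard(key: str, directory: dict[str, int]) -> int:
    """Directory-based: an explicit lookup table records each key's shard."""
    return directory[key]


if __name__ == "__main__":
    boundaries = ["h", "p"]            # hypothetical split points: 3 shards
    directory = {"alice": 2, "bob": 0}  # hypothetical lookup table
    print(range_shard("alice", boundaries))
    print(hash_shard("alice"))
    print(directory_shard("alice", directory))
```

Range-based placement keeps neighbouring keys together, which helps range scans but can create hot spots. Hash-based placement spreads keys evenly but scatters neighbouring keys. Directory-based placement is the most flexible but adds a lookup table that must itself be stored and kept available.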
Disadvantages
Sharding introduces unique challenges and complexities, such as determining the optimal strategy for partitioning data to ensure efficient operations.
Within distributed ledgers, the complexity of sharding hinges on ensuring that each transaction occurs only once. Because ledger shards are spatially distributed, instantaneous consistency remains elusive. The CAP theorem states that a distributed data store cannot simultaneously guarantee all three of Consistency, Availability, and Partition tolerance.
Sharding in Radix
In Radix, sharding is a characteristic of the state model and refers to both data availability and execution capacity.
The current Radix Mainnet (Alexandria) is not sharded. Radix’s current roadmap is to ‘pre-shard’ its ledger into 2^256 shards upon its Xi’an release. This is in contrast to the dynamic adaptive state sharding model adopted by networks such as NEAR, where shards are added incrementally as required. In Radix, the responsibility for validating shards is assigned to groups of validators called ‘shard groups’, which may grow or shrink dynamically in response to load demand.
Despite the terminology, the ‘shards’ in Radix actually refer to ‘substates’, which are the smallest components of a transaction.
“Radix's terminology is IMO confusing. They use ‘shard’ to mean "a piece of state". Which is a common meaning in database work, but not in crypto. I prefer the term shard-bit, but the closest accepted Radix term is "substate". So sharding is a way to let a validator set work on a bunch of substates Radix calls a shard-set. The unique benefit of Radix's way to "pre-shard" the network allows the contents of a shard-set to shrink dynamically, if for example the amount of state a validator-set can handle shrinks because of the accumulation of state (as time progresses) that the validators need to remember gradually increases. I have no control over which substate will contain my vaults or smart accounts or instantiated blueprints. The location of the state data that these entities must hold is based on a hash that deterministically places the substate data in random locations among the sparse matrix with 2^256 possible entries which is one way of viewing its ledger. Because of the pseudo-randomization that the hash uses, there is (via other design features) a high probability that truly independent transactions will ALWAYS use different shards for any "utxo"s that may be needed. This enables massive parallelism and hence very high TPS.” - A.H.Simon
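The idea of deterministically hashing state into a fixed space of 2^256 shards can be sketched as follows. This is only an illustration under stated assumptions: the use of SHA-256, the function names, and the assignment of shard groups as equal contiguous ranges are hypothetical and are not taken from the Radix implementation.

```python
import hashlib

SHARD_SPACE = 2 ** 256  # size of the pre-sharded address space


def shard_address(substate_id: bytes) -> int:
    """Map a substate identifier to a shard in [0, 2^256).

    SHA-256 is an assumption here; it is used because its 256-bit output
    matches the size of the shard space.
    """
    return int.from_bytes(hashlib.sha256(substate_id).digest(), "big")


def shard_group(address: int, num_groups: int) -> int:
    """Assign a shard to one of num_groups validator groups.

    Assumes (hypothetically) that each group covers an equal contiguous
    slice of the shard space.
    """
    return address * num_groups // SHARD_SPACE


if __name__ == "__main__":
    addr = shard_address(b"example-vault")  # hypothetical substate identifier
    print(hex(addr))
    print(shard_group(addr, 4))  # group under a 4-group split
    print(shard_group(addr, 8))  # group after the groups are re-divided
```

Because the shard space is fixed in advance, changing the number of groups only changes which group serves a given address; the address itself never moves, which is the property the pre-sharding description relies on.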
One of the standout features of Radix's sharding model is its unique "braiding" consensus mechanism. This technique, introduced in the sharded form of Cerberus, braids together the various shards to form a cohesive, interconnected network. The benefit of this is twofold: it ensures security and transactional consistency across all shards.
A major challenge in sharding is composability, especially for DeFi platforms and applications. Radix addresses this by ensuring that all transactions are atomically composable across shards by default. This means a single transaction can touch state on several shards and will either take effect on all of them or on none.
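To make the all-or-nothing property concrete, here is a minimal sketch of a generic prepare-then-commit scheme across shards. It is not Cerberus and does not model consensus, validators, or braiding; the `Shard` class and `execute_atomically` function are hypothetical names used only to show how atomicity prevents a substate from being spent twice.

```python
class Shard:
    """A toy shard that tracks which substates are still unspent."""

    def __init__(self, unspent):
        self.unspent = set(unspent)
        self.locked = set()

    def prepare(self, substate) -> bool:
        # Lock the substate only if it exists and no other transaction holds it.
        if substate in self.unspent and substate not in self.locked:
            self.locked.add(substate)
            return True
        return False

    def commit(self, substate) -> None:
        # Spend the substate and release its lock.
        self.locked.discard(substate)
        self.unspent.discard(substate)

    def abort(self, substate) -> None:
        # Release the lock without spending.
        self.locked.discard(substate)


def execute_atomically(inputs) -> bool:
    """Spend every (shard, substate) pair in inputs, or none of them."""
    prepared = []
    for shard, substate in inputs:
        if shard.prepare(substate):
            prepared.append((shard, substate))
        else:
            # One shard refused: roll back everything prepared so far.
            for s, sub in prepared:
                s.abort(sub)
            return False
    for shard, substate in prepared:
        shard.commit(substate)
    return True


if __name__ == "__main__":
    a, b = Shard({"coin-1"}), Shard({"coin-2"})
    print(execute_atomically([(a, "coin-1"), (b, "coin-2")]))  # first spend
    print(execute_atomically([(a, "coin-1"), (b, "coin-2")]))  # repeat attempt
```

In this sketch a repeated spend fails at the prepare step, and a transaction that fails on one shard leaves the other shards unchanged.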