Sharding [ /ˈʃɑːdɪŋ/ ] is the practice of partitioning a database to improve load balance and responsiveness across nodes or servers. Because a large dataset is broken into smaller, more manageable pieces, each node can search through and retrieve individual data points more efficiently and handle concurrent requests without sacrificing performance or reliability. In distributed ledgers such as Radix, sharding is a characteristic of the state model and refers to both data availability and execution capacity.
Sharding in Radix
The current Radix Mainnet (Alexandria) is not sharded. Radix’s roadmap plans to introduce a fixed shardspace of 2^256 shards with its Xi’an release. This is in contrast to the dynamic sharding models adopted by networks such as NEAR, where shards are added or split incrementally as demand requires. In Radix, responsibility for validating shards is assigned to groups of validators called ‘shard groups’, which may grow or shrink dynamically in response to load.
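A rough sketch of how a fixed shardspace can be used is shown below: a key is hashed into a 2^256 space and then mapped onto one of a configurable number of shard groups. The function names and the even splitting of the shardspace are illustrative assumptions rather than the actual Xi’an design.

```python
import hashlib

SHARDSPACE_BITS = 256  # fixed shardspace of 2^256 shards

def shard_for_key(key: bytes) -> int:
    """Deterministically map a key to a point in the 2^256 shardspace."""
    return int.from_bytes(hashlib.sha256(key).digest(), "big")

def shard_group_for_shard(shard: int, num_groups: int) -> int:
    """Assign a shard to one of num_groups contiguous shard groups.

    In a live network the number of groups and their boundaries would change
    as validators join and leave; here the space is simply split evenly.
    """
    group_size = (1 << SHARDSPACE_BITS) // num_groups
    return min(shard // group_size, num_groups - 1)

# The same key always lands on the same shard, but the shard group
# responsible for it can change when the group count changes.
shard = shard_for_key(b"account:alice/xrd_vault")
print(shard_group_for_shard(shard, num_groups=4))
print(shard_group_for_shard(shard, num_groups=8))
```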
Sharding is commonly used in distributed databases, where it allows for the efficient storage and retrieval of large amounts of data across multiple nodes. By dividing the data into smaller chunks, called shards, and distributing them across multiple nodes, a sharded database can support more concurrent requests and handle larger volumes of data without slowing down or becoming overloaded.
In addition to improving scalability and performance, sharding can help improve the availability and reliability of a distributed system. Because data is spread over multiple nodes, the failure of one node affects only the shards it holds, and when shards are also replicated the system can continue to function and serve requests while keeping the data accessible.
There are several different approaches to sharding, each with its own tradeoffs and benefits. Some common sharding strategies include range-based sharding, which divides data into shards based on a key value or range; hash-based sharding, which uses a hash function to distribute data across shards; and directory-based sharding, which uses a lookup table to determine which shard a piece of data belongs to.
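As a concrete illustration of these strategies, the Python sketch below selects a shard using each approach in turn; the range boundaries and directory entries are made up for the example.

```python
import hashlib

NUM_SHARDS = 4

def range_shard(user_id: int) -> int:
    """Range-based: contiguous key ranges map to shards (0-999 -> shard 0, etc.)."""
    return min(user_id // 1000, NUM_SHARDS - 1)

def hash_shard(key: str) -> int:
    """Hash-based: a hash of the key spreads data roughly evenly across shards."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest, "big") % NUM_SHARDS

# Directory-based: an explicit lookup table records which shard holds what.
directory = {"eu-customers": 0, "us-customers": 1, "archive": 3}

def directory_shard(dataset: str) -> int:
    return directory[dataset]

print(range_shard(2500))           # -> 2
print(hash_shard("order:98431"))   # deterministic, roughly uniform
print(directory_shard("archive"))  # -> 3
```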
Overall, sharding is a powerful technique for improving the scalability, performance, and reliability of distributed systems. By dividing large datasets into smaller pieces and storing them on multiple nodes, sharding allows systems to handle more data and more concurrent requests without sacrificing performance or availability.
Radix's terminology is IMO confusing. They use ‘shard’ to mean "a piece of state", which is a common meaning in database work but not in crypto. I prefer the term shard-bit, but the closest accepted Radix term is "substate". So sharding is a way to let a validator set work on a bunch of substates, which Radix calls a shard-set. The unique benefit of Radix’s decision to "pre-shard" the network is that the contents of a shard-set can shrink dynamically, for example if the amount of state a validator set can handle falls as the state the validators need to remember gradually accumulates over time. I have no control over which shards will contain the substates for my vaults, smart accounts, or instantiated blueprints. The location of the state data these entities must hold is determined by a hash that deterministically places the substate data at effectively random positions in a sparse matrix with 2^256 possible entries, which is one way of viewing the ledger. Because of the pseudo-randomization the hash provides, there is (via other design features) a very high probability that truly independent transactions will use different shards for any "utxo"s they may need. This enables massive parallelism and hence very high TPS. - A.H.Simon
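The comment's point about independent transactions landing on different shards can be illustrated with a short sketch. The substate identifiers and the use of SHA-256 below are assumptions made for illustration, not Radix's actual addressing scheme.

```python
import hashlib

def substate_shard(substate_id: str) -> int:
    """Place a substate deterministically in the 2^256 shardspace via a hash."""
    return int.from_bytes(hashlib.sha256(substate_id.encode()).digest(), "big")

# Two unrelated transfers touch disjoint sets of substates (vaults), so the
# shards they need are almost certainly disjoint and they can run in parallel.
tx1_substates = {"vault:alice/xrd", "vault:bob/xrd"}
tx2_substates = {"vault:carol/xrd", "vault:dave/xrd"}

shards_tx1 = {substate_shard(s) for s in tx1_substates}
shards_tx2 = {substate_shard(s) for s in tx2_substates}

print(shards_tx1.isdisjoint(shards_tx2))  # True with overwhelming probability
```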