- What happens if a subset of validators cease processing transactions, leading to a liveness failure?
- How can validator sets be dynamically adjusted in case of liveness failures?
- State Model
- Advantages & Tradeoffs
- Ongoing Research and Development
Hybrid (Proof of Work / Proof of Stake)
Cassandra evolved out of Dan Hughes’ Flexathon project in January 2021. A technical demonstration using Twitter data was premiered on Twitch on March 13th, 2021. 40% of the project was described as ‘new code’.
Implementing the Xi'an upgrade to a fully sharded network is necessary for Radix to scale to “support the needs of the $400 trillion global financial market”. However, transitioning to an explicitly sharded system introduces significant security, decentralization, and liveness challenges. Sharding enables the ledger to process transactions in parallel across multiple shards, but raises the question of how validators across different shards can remain in consensus on the global state.
To study these issues, Hughes started the Cassandra research project from scratch, setting aside the consensus algorithms Radix currently builds on, such as Cerberus and HotStuff. The objective was to take a blue-sky approach and determine what components are necessary to create a secure, decentralized, and live sharded environment.
Question 1: Liveness in Sharded Networks
In a sharded environment, it is possible for a subset of validators across different shards to stop processing transactions, leading to a liveness failure where parts of the network stall. This can happen unintentionally due to technical issues or intentionally by adversaries looking to disrupt the network.
Liveness failures are especially problematic in networks like Radix that use deterministic consensus algorithms to achieve transaction finality. These algorithms require approval from 2/3 of validators, weighted by stake, to finalize transactions. If 1/3 or more of validators stop participating, the network cannot make progress.
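The 2/3 rule above can be made concrete. A minimal sketch (the function name and integer-stake representation are illustrative, not taken from any Radix codebase):

```python
# Illustrative sketch of the deterministic-finality threshold: a BFT-style
# network can only finalize when participating stake strictly exceeds 2/3
# of the total stake.
def can_finalize(participating_stake: int, total_stake: int) -> bool:
    """Return True if enough stake is voting to reach deterministic finality."""
    return 3 * participating_stake > 2 * total_stake

print(can_finalize(67, 100))  # True: more than 2/3 of stake is participating
print(can_finalize(66, 100))  # False: 1/3 or more is offline, so the network stalls
```

Integer arithmetic (`3 * p > 2 * t`) avoids floating-point rounding at the exact 2/3 boundary, which is where the liveness failure bites.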
Existing consensus algorithms lacked strong provisions to ensure liveness in sharded environments. Hughes wanted to find a solution that would allow the Radix network to maintain weak liveness during partial liveness failures, so that the live parts of the network could continue making progress.
Question 2: Dynamically Adjusting Validator Sets
In decentralized networks, validator sets are determined algorithmically through ‘sybil protection’ mechanisms like staking. However, liveness failures may require proactively removing non-participating validators from the active set. This raises a challenge: how can validators be removed without centralized control of the ledger? In addition, any adjustment process could itself be exploited by malicious validators to attack the network without needing 1/3 of the total stake. Hughes wanted a solution that allows validator sets to be adjusted in response to liveness issues while preventing adversaries from manipulating the process to their advantage.
Cassandra exists on a fixed shardspace of 2^256 shards indexed on the Vamos database structure. Transactions are processed on a first-come, first-served basis and in parallel if the transactions use different inputs. The network scales linearly as more nodes are added.
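One way to picture a fixed 2^256 shardspace is to map each transaction input to a shard via a cryptographic hash, since a 256-bit digest is itself an index into that space. This is a hedged sketch: the actual shard-assignment function used by Cassandra and the Vamos structure is not specified here, and SHA-256 is an assumption.

```python
import hashlib

SHARD_SPACE = 2 ** 256  # the fixed shardspace described above

def shard_of(key: bytes) -> int:
    """Map an input (e.g. a substate identifier) to a shard index.

    A 256-bit SHA-256 digest already lies in [0, 2^256), so it serves
    directly as a shard index in this sketch.
    """
    return int.from_bytes(hashlib.sha256(key).digest(), "big")

# Transactions touching different inputs land (with overwhelming probability)
# on different shards, so they can be processed in parallel.
a = shard_of(b"tx-input-A")
b = shard_of(b"tx-input-B")
print(a != b)
```

Because the mapping is deterministic, every node independently agrees on which shard owns which input, with no coordination needed for the assignment itself.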
Cassandra uses a hybrid probabilistic / deterministic consensus, where round leaders within shard groups are determined by proof of work. This provides flexibility to maintain liveness and safety under different network conditions. Cassandra incorporates elements of Cerberus, but is distinct from it.
The probabilistic phase is driven by a proof-of-work system. Validators who wish to propose the next set of transactions must solve a computational ‘puzzle’ to submit proposals. This makes the cost of submitting proposals proportional to computational power rather than stake.
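A generic hashcash-style puzzle illustrates the idea: the proposer searches for a nonce whose hash meets a difficulty target, and anyone can verify the solution cheaply. Cassandra's actual puzzle construction is not described in the source, so the hash function, nonce encoding, and difficulty scheme below are all assumptions.

```python
import hashlib
import itertools

def solve_puzzle(proposal: bytes, difficulty_bits: int) -> int:
    """Search for a nonce making SHA-256(proposal || nonce) fall below the target.

    Expected work doubles with each extra difficulty bit, which is what ties
    proposal cost to computational power rather than stake.
    """
    target = 2 ** (256 - difficulty_bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(proposal + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify_puzzle(proposal: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Cheap one-hash check that a submitted proposal paid its work cost."""
    digest = hashlib.sha256(proposal + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < 2 ** (256 - difficulty_bits)

nonce = solve_puzzle(b"proposal-payload", 8)
print(verify_puzzle(b"proposal-payload", nonce, 8))  # True
```

The asymmetry (expensive to solve, one hash to verify) is what lets validators filter spam proposals without trusting the sender.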
If validators submit conflicting proposals, the ledger forks until 2/3 of the validators vote in favor of the canonical version. This allows the network to dynamically adjust relative voting powers and make progress even if 1/3 or more of validators are not participating.
A key advantage of Cassandra's probabilistic phase is that validators are rewarded with increased voting power on the network for providing useful proposals. This allows the network to recover from liveness failures by bringing in new validators. For example, if 50% of validators go offline, the remaining validators create proposals to make progress on their shard. As these proposals gain majority approval, the participating validators increase their voting power, which dilutes the weight of the offline validators. Over time, this allows the network to regain a 2/3 majority of participating stake needed to finalize transactions. The more stake that goes offline, the more time it takes for the network to ‘earn back’ enough voting power to resume finalizing transactions.
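The recovery dynamic described above can be sketched numerically. The reward schedule here (a fixed amount of extra voting power per round to the participating side) is purely illustrative; the source does not specify Cassandra's actual reward curve.

```python
def rounds_to_recover(online_power: float, offline_power: float,
                      reward_per_round: float = 1.0) -> int:
    """Count rounds of proposal rewards until online validators exceed 2/3 of power.

    Each round, participating validators collectively earn `reward_per_round`
    of extra voting power, diluting the static share held by offline validators.
    """
    rounds = 0
    while 3 * online_power <= 2 * (online_power + offline_power):
        online_power += reward_per_round
        rounds += 1
    return rounds

# The more stake that goes offline, the longer the network needs to
# 'earn back' a 2/3 supermajority of participating voting power:
print(rounds_to_recover(50, 50))  # 50% offline
print(rounds_to_recover(30, 70))  # 70% offline: strictly more rounds
```

Note that the offline validators' absolute power never changes; only their relative weight shrinks, matching the dilution mechanism described above.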
Once 2/3 of validators agree on a proposal, the deterministic phase takes over: stake-weighted voting finalizes the accepted transactions.
The probabilistic proposal phase provides an inherent weak liveness guarantee, even if 1/3 or more validators stop participating. As long as some validators continue submitting proposals, the network can continue building potential blocks. This is similar to how Bitcoin's Nakamoto consensus allows the network to continue mining and extending the longest chain even if a large portion of miners disappear. Progress is slowed proportional to the lost computational power, but the network continues converging on a canonical chain. In Cassandra, validators who continue participating will eventually converge around a common branch of proposals. This allows the network to continue making progress during liveness failures until enough validators regain participation to finalize transactions again.
Cassandra's hybrid model maintains a high threshold for safety guarantees, even while providing liveness flexibility.
Attackers must overcome two hurdles to violate safety guarantees like double spends:
- Gain majority control of proposing power to build fraudulent branches during the probabilistic phase. This requires large computational resources.
- Acquire 1/3+ of voting power during the deterministic phase to finalize a fraudulent branch. This requires staking resources.
Attackers lacking either sufficient compute power or stake cannot unilaterally violate safety. The cost to attack both phases makes Cassandra more robust compared to pure probabilistic or deterministic consensus alone.
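The two-hurdle structure can be stated as a simple conjunction. The thresholds below (majority of proposing power, 1/3+ of stake) come straight from the bullets above, but expressing them as clean fractions of total power is an illustrative simplification:

```python
def attack_succeeds(hashpower_share: float, stake_share: float) -> bool:
    """An attacker must clear BOTH hurdles to violate safety (e.g. double spend)."""
    builds_fraudulent_branch = hashpower_share > 0.5   # probabilistic phase: majority compute
    finalizes_branch = stake_share >= 1 / 3            # deterministic phase: 1/3+ of stake
    return builds_fraudulent_branch and finalizes_branch

print(attack_succeeds(0.6, 0.4))  # True: both hurdles cleared
print(attack_succeeds(0.6, 0.1))  # False: compute alone cannot finalize
print(attack_succeeds(0.2, 0.4))  # False: stake alone cannot build the branch
```

The conjunction is the point: compromising either resource market alone leaves safety intact, which is the claimed robustness advantage over pure probabilistic or pure deterministic designs.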
Advantages & Tradeoffs
Cassandra provides several key advantages compared to traditional consensus algorithms:
- Strong safety guarantees: Attackers face high costs to build fraudulent branches and finalize them.
- Liveness guarantees: The network can make progress even with partial liveness failures.
- Validator set shuffling: The network can adjust relative voting powers to recover from liveness issues.
- Protection against attacks: Manipulating validator sets is costly due to the hybrid model.
It also comes with tradeoffs:
- Slower finality time: Cross-shard coordination introduces additional latency before finality.
- Throughput vs finality: Faster finality requires more centralized shard configuration. Slower finality enables higher decentralization.
- Complexity: Hybrid model is more complex than pure probabilistic or deterministic algorithms.
While Cassandra provides significant advantages, there are also tradeoffs to consider. The hybrid model requires more coordination, which can introduce latency. However, it allows Radix to optimize for both decentralization and throughput.
The Cassandra research demonstrates that maintaining both liveness and strong safety guarantees in a highly decentralized sharding environment is possible. However, it requires balancing complex tradeoffs between latency, throughput, decentralization, and simplicity.
Ongoing Research and Development
The Cassandra research project has demonstrated promising solutions to core problems around implementing sharding for Radix. However, significant work remains to refine, optimize, and evaluate Cassandra for production use.
Optimizing the Protocol
While initial results are positive, improvements can be made to Cassandra to optimize latency, throughput, attack resistance, and efficiency. Areas of ongoing research include:
- Reducing cross-shard coordination overhead
- Optimizing networking and message passing between shards
- Improving reward mechanisms and sybil resistance
- Enhancing protection against long-range attacks
- Minimizing wasted work during probabilistic proposal creation
Evaluating Production Viability
Before deploying Cassandra, extensive testing and peer review are required to validate the algorithm and analyze any potential vulnerabilities. Research areas include:
- Formal verification of safety proofs
- Security audits and attack simulation
- Benchmarking throughput, latency, and efficiency
- Testnet trials and experimental deployment
- Continued alignment with academic state of the art
Integration into Xi'an
While Cassandra may not be deployed as-is, components of it could improve Radix's HotStuff-based Xi'an implementation. Analyzing which specific components can be integrated is an ongoing process.
The Cassandra research provides a strong foundation, but remains a work in progress. By building on these concepts, Radix aims to develop production-ready solutions for secure, decentralized, and scalable sharding to meet the demands of global finance.