Feature: Turbine Improvements (1.14.17)

Turbine is the name for Solana’s block propagation methodology. Version 1.14 brings with it two significant changes to Turbine. The first of these changes is in the construction and transmission of erasure batches during turbine broadcast and the second is a modification to the Turbine Tree topology.

Erasure Batch Construction & Transmission

When a leader needs to transmit their block to the rest of the network, they do so by breaking their block data into maximum transmissible unit (MTU) sized chunks called “shreds.” In order to increase the reliability and censorship resistance of the network, these data shreds are used to generate paired recovery shreds through a Reed-Solomon erasure coding scheme. The set of data shreds and their paired recovery shreds together is known as an erasure batch. The scheme is designed such that the entirety of the block can be reconstructed even if some portion of the erasure batch is missing. With the erasure batch in hand, the leader sends data and recovery shreds one-by-one. For each individual shred, a stake weighted shuffle of the staked nodes is performed in order to construct a tree-like topology to transmit over. This is colloquially referred to as the Turbine Tree.

The old method of constructing erasure batches required a set of 32 data shreds, out of which 32 recovery shreds would be generated, creating a 32:32 erasure batch. However, for a variety of reasons, receiving nodes in the Turbine Tree may not always receive enough data shreds to generate a 32:32 erasure batch by the time they are scheduled to retransmit their data over Turbine. As a result, nodes with less than 32 data shreds would send the shreds un-encoded when called upon to retransmit, but also hold onto the shreds in memory to retransmit again as a proper 32:32 erasure batch when the missing data shreds could be acquired.

This results in delayed reliability guarantees, provided by Reed-Solomon encoding, to lower staked nodes in the Turbine Tree who are more likely to be placed in lower levels. This is because reliable propagation to lower levels is now bottlenecked by higher levels getting all the data shreds in time. If the higher level node does not get all 32 data shreds required to generate a proper erasure batch in time, lower level nodes now suffer because they only get sent the raw data shreds and not a Reed-Solomon encoded erasure batch. They eventually will get the encoded erasure-batch, but only after the higher level nodes can obtain the missing data. This bottleneck does not reduce the overall probability of a node getting all the block data over Turbine, but it extends the time over which it takes to get this data.

Version 1.14 changes the erasure batch construction in order to eliminate this delay in reliable retransmission of data shreds. This is achieved because nodes in the Turbine Tree no longer require 32 data shreds to generate an erasure batch. Instead, an equivalently reliable erasure batch is constructed from how many ever data shreds the node has at the time of retransmission. While the Reed-Solomon coding scheme used before targeted 32:32 erasure batches, the new version is able to construct variable size erasure batches with the equivalent reliability of a 32:32 batch. For example, in the worst case of only having 1 data shred, a 1:544 is constructed and transmitted over Turbine. This data shred now has the same reliability as the data shreds in a 32:32 erasure batch. This change eliminates the bottleneck described above, meaning that lower staked nodes lower in the Turbine Tree do not have delayed reliability guarantees as compared to higher staked nodes.

Remove Redundant Turbine Path

In order to understand the Turbine topology changes, it’s important to first define some relevant terms. Structurally, Turbine consists of “layers”, made up of “neighborhoods.” A layer’s number refers to the shortest number of hops the leader’s messages must make in order to reach a node in that layer. Therefore nodes in Layer 1 are one network hop from the leader, nodes in Layer 2 are two network hops from the leader, so on and so forth. Nodes in each layer are grouped into sub-groups called neighborhoods. A neighborhood consists of N distinct nodes, the first of which is referred to as the “anchor.” Turbine also defines a parameter known as the “fanout.” The fanout of the network specifies how many nodes each neighborhood node forwards data to in lower layers. For example, for a given fanout M, a node in Layer X forwards data to M other nodes in Layer X+1.

Prior to activation of this feature, the number of neighborhoods in each layer is a function of the fanout such that each neighborhood in Layer X communicates with M other neighborhoods in Layer X+1. Therefore the number of neighborhoods in each layer is M^(X-1) where X is the layer number and M is the fanout. You may notice this matches the number of nodes each node in Layer X sends data to in Layer X+1. This is by design, as nodes in Layer X don’t just send data to M other nodes in Layer X+1, but each of these nodes is in a distinct neighborhood as well. Therefore, each node in position “k” in a neighborhood, sends its data to M other nodes in position k of a neighborhood in the layer below. Additionally, the anchor node of each neighborhood forwards data to every node in its neighborhood. This additional data path from the anchor to all nodes in the neighborhood is referred to as the “redundant turbine path.” It’s referred to as redundant because nodes are already receiving this data through the normal fanout path each node takes to distribute data to lower layers. Figure 1 below shows a visual description of the Turbine topology prior to this feature activation.

Figure 1: Turbine Tree with Redundant Turbine Path

Activation of this feature removes the redundant turbine path, effectively eliminating the concept of anchor nodes and neighborhoods. This can be easily visualized in Figure 2 below.

Figure 2: Turbine Tree without Redundant Turbine Path