Shoal Framework: How to Reduce Bullshark Latency on Aptos
Aptos Labs has addressed two important open problems in DAG BFT, significantly reducing latency and, for the first time, eliminating the need for timeouts in a deterministic practical protocol. Overall, Shoal improves Bullshark's latency by 40% in the fault-free case and by 80% in the faulty case.
Shoal is a framework that enhances any Narwhal-based consensus protocol (such as DAG-Rider, Tusk, or Bullshark) with pipelining and leader reputation. Pipelining reduces DAG ordering latency by introducing an anchor in every round, while leader reputation further improves latency by ensuring that anchors are associated with the fastest validators. Moreover, leader reputation allows Shoal to leverage an asynchronous DAG construction to eliminate timeouts in all scenarios. This lets Shoal provide a universally responsive property, which includes the optimistic responsiveness that is usually required.
The technique is quite simple: it runs multiple instances of the underlying protocol one after another. So when instantiated with Bullshark, we get a group of "sharks" running a relay race.
Background
In the pursuit of high performance in blockchain networks, the focus has long been on reducing communication complexity. However, this approach has not led to significant throughput gains. For example, the Hotstuff implementation in early versions of Diem achieved only 3,500 TPS, far below the 100k+ TPS target.
The recent breakthrough stems from the realization that data propagation is the main bottleneck of leader-based protocols and can benefit from parallelization. The Narwhal system separates data propagation from the core consensus logic, proposing an architecture in which all validators propagate data simultaneously, while the consensus component orders only a small amount of metadata. The Narwhal paper reports a throughput of 160,000 TPS.
The Quorum Store introduced earlier is an implementation of Narwhal, separating data propagation from consensus to scale the current consensus protocol Jolteon. Jolteon is a leader-based protocol that combines Tendermint's linear fast path with PBFT-style view changes, reducing Hotstuff latency by 33%. However, leader-based consensus protocols cannot fully leverage Narwhal's throughput potential.
Therefore, it was decided to deploy Bullshark, a consensus protocol with zero communication overhead, on top of the Narwhal DAG. Unfortunately, the DAG structure that gives Bullshark its high throughput incurs a 50% latency cost compared to Jolteon.
This article introduces how Shoal significantly reduces Bullshark latency.
DAG-BFT Background
Each vertex in Narwhal DAG is associated with a round. To enter round r, a validator must first obtain n-f vertices from round r-1. Each validator can broadcast one vertex per round, and each vertex must reference at least n-f vertices from the previous round. Due to network asynchrony, different validators may observe different local views of the DAG at any given time.
A key property of the DAG is non-equivocation: if two validators have the same vertex v in their local views of the DAG, then they have exactly the same causal history of v.
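As a minimal sketch of the DAG rules above (hypothetical names, with a simplified fixed setting of n = 4 and f = 1), the round-advancement condition can be expressed as:

```python
from dataclasses import dataclass

N = 4  # total number of validators (n)
F = 1  # maximum faulty validators (f); a quorum is n - f = 3

@dataclass(frozen=True)
class Vertex:
    author: int          # validator that broadcast this vertex
    round: int
    parents: frozenset   # references to >= n - f vertices of the previous round

def can_advance(local_view, r):
    """A validator may enter round r once its local view of the DAG
    contains at least n - f vertices from round r - 1."""
    return sum(1 for v in local_view if v.round == r - 1) >= N - F
```

Because of asynchrony, `local_view` may differ between validators at any moment, but non-equivocation guarantees that any vertex they share carries the same causal history.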
Total Order
Consensus can be reached on a total order of all vertices in the DAG without additional communication overhead. To achieve this, the validators in DAG-Rider, Tusk, and Bullshark interpret the DAG structure as a consensus protocol, where vertices represent proposals and edges represent votes.
All existing Narwhal-based consensus protocols have the following structure:
Anchors: every few rounds there is a predetermined leader, and that leader's vertex is called the anchor.
Ordering anchors: validators independently but deterministically decide which anchors to order and which to skip.
Ordering causal histories: validators process the list of ordered anchors one by one and, for each anchor, order all previously unordered vertices in its causal history according to a deterministic rule.
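The third step above, linearizing each anchor's causal history, can be sketched as follows (a simplified model with hypothetical names; the deterministic tie-breaking rule here, round then author id, is an assumption for illustration):

```python
def order_dag(ordered_anchors, causal_history, already_ordered):
    """Deterministically linearize the DAG given the agreed anchor sequence.

    ordered_anchors: the anchors a validator decided to order; safety holds
                     because all honest validators agree on a common prefix.
    causal_history:  maps each anchor to the set of vertices it transitively
                     references (including itself).
    already_ordered: set of vertices already output in earlier iterations.
    """
    log = []
    for anchor in ordered_anchors:
        # Order every not-yet-ordered vertex in the anchor's causal history
        # by a deterministic rule (here: round first, then author id).
        batch = sorted(
            (v for v in causal_history[anchor] if v not in already_ordered),
            key=lambda v: (v.round, v.author),
        )
        log.extend(batch)
        already_ordered.update(batch)
    return log
```

Since the input anchor sequence and the tie-breaking rule are identical at every honest validator, each one derives the same total order without any extra communication.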
The key to safety is ensuring that in step (2), all honest validators create ordered lists of anchors that share a common prefix. Shoal makes the following observation about all of these protocols:
All validators agree on the first ordered anchor.
Bullshark latency
Bullshark's latency depends on the number of rounds between ordered anchors in the DAG. Although its partially synchronous versions have better latency than the asynchronous version, they are far from optimal.
Problem 1: average block latency. In Bullshark, every even round has an anchor, and every odd-round vertex is interpreted as a vote. In the common case, two rounds of the DAG are needed to order an anchor; however, the vertices in an anchor's causal history need more rounds to wait for an anchor to be ordered: typically three rounds for odd-round vertices and four rounds for even-round non-anchor vertices.
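The common-case round counts above can be captured in a small helper (a simplified model, not Aptos code; it assumes an anchor in every even round that is ordered two rounds after its own round):

```python
def bullshark_common_case_latency(vertex_round, is_anchor=False):
    """Rounds from a vertex's creation until it is ordered, common case.

    Plain Bullshark places an anchor in every even round, and an anchor is
    ordered two rounds after its own round (its vote round plus the round
    that delivers the votes).
    """
    if vertex_round % 2 == 0:
        # An anchor waits 2 rounds; other even-round vertices must wait
        # for the *next* anchor (2 rounds later) plus its 2 ordering rounds.
        return 2 if is_anchor else 4
    # Odd-round (vote) vertices: the next anchor is one round away,
    # plus 2 rounds for it to be ordered.
    return 3
```

This matches the figures in the text: three rounds for odd-round vertices and four for even-round non-anchor vertices, which is exactly the gap Shoal's pipelining closes.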
Problem 2: failure-case latency. If a round's leader fails to broadcast its anchor fast enough, the anchor is skipped, and all unordered vertices from earlier rounds must wait for the next anchor to be ordered. This significantly degrades performance in a geo-replicated network, especially since Bullshark uses timeouts to wait for the leader.
Shoal Framework
Shoal enhances Bullshark (or any other Narwhal-based BFT protocol) with pipelining, allowing an anchor in every round and reducing the latency of all non-anchor vertices in the DAG to three rounds. Shoal also introduces a zero-overhead leader-reputation mechanism into the DAG, biasing selection toward fast leaders.
Challenge
In the context of DAG protocols, pipelining and leader reputation were considered hard problems, for the following reasons:
Previous attempts at pipelining tried to modify Bullshark's core logic, which appears to be fundamentally infeasible.
Leader reputation, introduced in DiemBFT and formalized in Carousel, dynamically selects future leaders (Bullshark's anchors) based on validators' past performance. While disagreement on leader identity does not violate safety in those protocols, in Bullshark it can lead to entirely different orderings. This gets to the core of the issue: dynamically and deterministically selecting a round's anchor is necessary to reach consensus, yet validators must agree on an ordered history in order to select future anchors.
As evidence of the problem's difficulty, Bullshark's implementations, including the one currently in production, do not support these features.
Agreement
Shoal relies on the ability to perform local computation over the DAG, which makes it possible to preserve and reinterpret information from earlier rounds. Building on the insight that all validators agree on the first ordered anchor, Shoal sequentially composes multiple Bullshark instances for pipelining: the first ordered anchor serves as the switch-over point between instances, and the anchor's causal history is used to compute leader reputation.
Pipelining
Bullshark selects anchors via a predefined mapping F that maps rounds to leaders. Shoal runs Bullshark instances one after another: each instance's anchors are predetermined by the mapping F, and each instance orders one anchor, which triggers the switch to the next instance.
Initially, Shoal launches the first instance of Bullshark in the first round of the DAG and runs it until the first ordered anchor is determined, say in round r. All validators agree on this anchor, so all of them can safely reinterpret the DAG starting from round r+1. Shoal then simply starts a new Bullshark instance at round r+1.
In the best case, this allows Shoal to order one anchor per round: the anchor of round 1 is ordered by the first instance; Shoal then starts a new instance in round 2, which has its own anchor, ordered by that instance; another new instance then orders an anchor in round 3, and the process continues.
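The best-case relay described above can be sketched in a few lines (a toy model with hypothetical names; each instance is reduced to the round of the anchor it orders):

```python
def run_shoal_pipeline(first_anchor_round, num_instances):
    """Best-case Shoal pipelining: each Bullshark instance orders the anchor
    of its starting round, then hands off to a fresh instance that
    reinterprets the DAG from the next round, so one anchor is ordered
    per round."""
    ordered = []
    r = first_anchor_round
    for instance in range(num_instances):
        ordered.append((instance, r))  # instance i orders the anchor of round r
        r += 1                         # next instance starts at round r + 1
    return ordered
```

The "relay race" in the introduction is exactly this hand-off: each shark runs one leg (one anchor) and passes the baton to the next instance.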
Leader Reputation
When Bullshark's ordering skips an anchor, latency increases. In that case, pipelining is powerless, since a new instance cannot start before the previous instance orders an anchor. Shoal uses a reputation mechanism to assign each validator a score based on its recent activity, ensuring that a leader responsible for a missed anchor is less likely to be chosen in the future. Validators that respond and participate in the protocol receive high scores; otherwise they are assigned low scores, since they may be crashed, slow, or malicious.
The idea is to deterministically recompute the predefined round-to-leader mapping F whenever the scores are updated, biasing it toward leaders with higher scores. For validators to agree on the new mapping, they must agree on the scores, and therefore on the history used to derive them.
In Shoal, pipelining and leader reputation combine naturally, since both rely on the same core technique: reinterpreting the DAG after reaching agreement on the first ordered anchor.
The only difference is that after ordering the anchor in round r, validators compute a new mapping F', valid from round r+1, based on the causal history of the ordered anchor in round r. They then execute a new Bullshark instance from round r+1 using the updated anchor-selection function F'.
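One way such a deterministic, score-biased mapping F' could be derived is sketched below (an illustrative construction, not Shoal's actual scoring rule; the score weights and hash-based selection are assumptions):

```python
import hashlib

def new_leader_mapping(scores, seed):
    """Derive a deterministic round -> leader mapping F' that favors
    high-score validators.

    scores: validator id -> non-negative integer score, computed from the
            agreed causal history of the ordered anchor.
    seed:   any string all validators agree on (e.g. derived from that
            same history), so everyone computes an identical F'.
    """
    # Weight each validator by its score; score-0 validators are never
    # selected as leaders.
    pool = [v for v, s in sorted(scores.items()) for _ in range(s)]

    def F_prime(r):
        # Deterministic pseudo-random pick for round r, seeded by the
        # agreed history, so no extra communication is needed.
        h = hashlib.sha256(f"{seed}:{r}".encode()).digest()
        return pool[int.from_bytes(h[:8], "big") % len(pool)]

    return F_prime
```

Because both the scores and the seed come from the agreed causal history, every honest validator derives the same F' locally, which is exactly the zero-overhead property the text describes.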
No Timeouts
Timeouts play a crucial role in all leader-based deterministic partially synchronous BFT implementations. However, the complexity they introduce increases the number of internal states that must be managed and observed, which complicates debugging and requires more observability techniques.
Timeouts can also significantly increase latency, because configuring them correctly is important yet usually requires dynamic adjustment and is highly dependent on the environment (network). Before moving to the next leader, the protocol pays the full timeout latency penalty for a faulty leader. Timeout settings therefore cannot be too conservative; but if they are too short, the protocol may skip good leaders. For example, we observed that under high load, leaders in Jolteon/Hotstuff were overwhelmed, and timeouts expired before any progress could be made.
Unfortunately, leader-based protocols such as Hotstuff and Jolteon essentially require timeouts to ensure that the protocol makes progress whenever a leader fails. Without timeouts, even a crashed leader could halt the protocol forever. Since failed leaders cannot be distinguished from slow leaders during asynchronous periods, timeouts may cause validators to view-change through all leaders without any consensus activity.
In Bullshark, timeouts are used in the DAG construction to ensure that, during synchrony, honest leaders add anchors to the DAG fast enough for them to be ordered.
We observe that the DAG construction provides a "clock" for estimating network speed. As long as n-f honest validators keep adding vertices to the DAG, rounds keep advancing without any timeouts. Even though Bullshark may not be able to order at network speed (due to leader issues), the DAG still grows at network speed, even if some leaders fail or the network is asynchronous. Eventually, when a fault-free leader broadcasts its anchor fast enough, the anchor's entire causal history will be ordered.
In our evaluation, we compared Bullshark with and without timeouts in the following scenarios:
Fast leaders: leaders at least as fast as the other validators. Both variants provide the same latency, since anchors are ordered and no timeouts are triggered.
Faulty leaders: Bullshark without timeouts provides better latency, since validators skip a failed leader immediately rather than waiting for a timeout to expire.