Feature Name: Pipelinable Block Syncing and Lookup
Start Date: 2020-07-02
Design PR: ZcashFoundation/zebra#583
Zebra Issue: ZcashFoundation/zebra#504

Summary

The Bitcoin network protocol used by Zcash allows nodes to download blocks from other peers. This RFC describes how we find and download this data asynchronously.

Motivation

To sync the chain, we need to find out which blocks to download and then download them. Downloaded blocks can then be fed into the verification system and (assuming they verify correctly) into the state system. In zcashd, blocks are processed one at a time. In Zebra, however, we want to be able to pipeline block download and verification operations, using futures to explicitly specify logical dependencies between sub-tasks, which we execute concurrently and potentially out-of-order on a threadpool. This means that the procedure we use to determine which blocks to download must look somewhat different than zcashd.

Block fetching in Bitcoin

Zcash inherits its network protocol from Bitcoin. Bitcoin block fetching works roughly as follows. A node can request block information from peers using either a getblocks or getheaders message. Both of these messages contain a block locator object consisting of a sequence of block hashes. The block hashes are ordered from highest to lowest, and represent checkpoints along the path from the node's current tip back to genesis. The remote peer computes the intersection between its chain and the node's chain by scanning through the block locator for the first hash in its chain. Then, it sends (up to) 500 subsequent block hashes in an inv message (in the case of getblocks) or (up to) 2000 block headers in a headers message (in the case of getheaders). Note: zcashd reduces the getheaders count to 160, because Zcash headers are much larger than Bitcoin headers, as noted below.

The headers message sent after getheaders contains the actual block headers, while the inv message sent after getblocks contains only hashes, which have to be fetched with a getdata message. In Bitcoin, the block headers are small relative to the size of the full block, but this is not always the case for Zcash, where the block headers are much larger due to the use of Equihash and many blocks have only a few transactions. Also, getblocks allows parallelizing block downloads, while getheaders doesn't. For these reasons and because we know we need full blocks anyways, we should probably use getblocks.

The getblocks Bitcoin message corresponds to our zebra_network::Request::FindBlocksByHash, and the getdata message is generated by zebra_network::Request::Blocks.

Pipelining block verification

As mentioned above, our goal is to be able to pipeline block download and verification. This means that the process for block lookup should ideally attempt to fetch and begin verification of future blocks without blocking on complete verification of all earlier blocks. To do this, we split the chain state into the verified block chain (held by the state component) and the prospective block chain (held only by the syncer), and use the following algorithm to pursue prospective chain tips.

ObtainTips

Query the current state to construct the sequence of hashes

[tip, tip-1, tip-2, ..., tip-9, tip-20, tip-40, tip-80, tip-160 ]

The precise construction is unimportant, but this should have a Bitcoin-style dense-first, then-sparse hash structure.

The initial state should contain the genesis block for the relevant network. So the sequence of hashes will only contain the genesis block

[genesis ]

The network will respond with a list of hashes, starting at the child of the genesis block.

Make a FindBlocksByHash request to the network F times, where F is a fanout parameter, to get resp1, ..., respF.
For each response, starting from the beginning of the list, prune any block hashes already included in the state, stopping at the first unknown hash to get resp1', ..., respF'. (These lists may be empty).
Combine the last elements of each list into a set; this is the set of prospective tips.
Combine all elements of each list into a set, and queue download and verification of those blocks.
If there are any prospective tips, call ExtendTips, which returns a new set of prospective tips. Continue calling ExtendTips with this new set, until there are no more prospective tips.
Restart after some delay, say 15 seconds.

ExtendTips

Remove all prospective tips from the set of prospective tips, then iterate through them. For each removed tip:
Create a FindBlocksByHash request consisting of just the prospective tip. Send this request to the network F times.
For each response, check whether the first hash in the response is a genesis block (for either the main or test network). If so, discard the response. It indicates that the remote peer does not have any blocks following the prospective tip. (Or that the remote peer is on the wrong network.)
Combine the last elements of the remaining responses into a set, and add this set to the set of prospective tips.
Combine all elements of the remaining responses into a set, and queue download and verification of those blocks.

DoS resistance

Because this strategy aggressively downloads any available blocks, it could be vulnerable to a DoS attack, where a malicious peer feeds us bogus chain tips, causing us to waste network and CPU on blocks that will never be valid. However, because we separate block finding from block downloading, and because of the design of our network stack, this attack is probably not feasible. The primary reason is that zebra_network randomly loadbalances outbound requests over all available peers.

Consider a malicious peer who responds to block discovery with a bogus list of hashes. We will eagerly attempt to download all of those bogus blocks, but our requests to do so will be randomly load-balanced to other peers, who are unlikely to know about the bogus blocks. When we try to extend a bogus tip, the extension request will also be randomly load-balanced, so it will likely be routed to a peer that doesn't know about it and can't extend it. And because we perform multiple block discovery queries, which will also be randomly load balanced, we're unlikely to get stuck on a false chain tip.

Fork-finding

When starting from a verified chain tip, the choice of block locator can find forks at least up to the reorg limit (99 blocks). When extending a prospective tip, forks are ignored, but this is fine, since unless we are prefetching the longest chain, we won't be able to keep extending the tip prospectively.

Retries and Fanout

We should consider the fanout parameter F and the retry policy for the different requests. I'm not sure whether we need to retry requests to discover new block hashes, since the fanout may already provide redundancy. For the block requests themselves, we should have a retry policy with a limited number of attempts, enough to insulate against network failures but not so many that we would retry a bogus block indefinitely. Maybe fanout 4 and 3 retries?

The Zebra Book