About
Zebra is the Zcash Foundation's independent, consensus-compatible implementation of the Zcash protocol, currently under development. Please join us on Discord if you'd like to find out more or get involved!
Alpha Releases
Our first alpha release is planned for December 8th, 2020.
The goals of this release are to:
- participate in the Zcash network,
- replicate the Zcash chain state,
- implement the Zcash proof of work consensus rules, and
- sync on Mainnet under excellent network conditions.
Currently, Zebra does not validate all the Zcash consensus rules. It may be unreliable on Testnet, and under less-than-perfect network conditions. See our current features and roadmap for details.
Getting Started
Building `zebrad` requires Rust, libclang, and a C++ compiler.
Detailed Build and Run Instructions
- Install `cargo` and `rustc`.
  - Using `rustup` installs the stable Rust toolchain, which `zebrad` targets.
- Install Zebra's build dependencies:
  - libclang: the `libclang`, `libclang-dev`, `llvm`, or `llvm-dev` packages, depending on your package manager
  - clang or another C++ compiler: `g++`, `Xcode`, or `MSVC`
- Run `cargo install --locked --git https://github.com/ZcashFoundation/zebra --tag v1.0.0-alpha.0 zebrad`
- Run `zebrad start`
If you're interested in testing out `zebrad`, please feel free, but keep in mind that a lot of key functionality is still missing.
Build Troubleshooting
If you're having trouble with:
- dependencies:
  - install both `libclang` and `clang` - they are usually different packages
  - use `cargo install` without `--locked` to build with the latest versions of each dependency
- libclang: check out the clang-sys documentation
- g++ or MSVC++: try using clang or Xcode instead
- rustc: use rustc 1.48 or later
  - Zebra does not have a minimum supported Rust version (MSRV) policy yet
System Requirements
We usually build `zebrad` on systems with:
- 2+ CPU cores
- 7+ GB RAM
- 14+ GB of disk space
On many-core machines (for example, 32 cores) the build is fast; on 2-core machines it is noticeably slower.
We continuously test that our builds and tests pass on:
- Windows Server 2019
- macOS Big Sur 11.0
- Ubuntu 18.04 / the latest LTS
- Debian Buster
We usually run `zebrad` on systems with:
- 4+ CPU cores
- 16+ GB RAM
- 50GB+ available disk space for finalized state
- 100+ Mbps network connections
`zebrad` might build and run fine on smaller and slower systems - we haven't tested its exact limits yet.
Network Usage
`zebrad`'s typical network usage is:
- initial sync: 30 GB download
- ongoing updates: 10-50 MB upload and download per day, depending on peer requests
The major constraint we've found on `zebrad` performance is the network weather, especially the ability to make good connections to other Zcash network peers.
Current Features
Network:
- synchronize the chain from peers
- download gossipped blocks from peers
- answer inbound peer requests for hashes, headers, and blocks
State:
- persist block, transaction, UTXO, and nullifier indexes
- handle chain reorganizations
Proof of Work:
- validate equihash, block difficulty threshold, and difficulty adjustment
- validate transaction merkle roots
Validating proof of work increases the cost of creating a consensus split between `zebrad` and `zcashd`.
This release also implements some other Zcash consensus rules, to check that Zebra's validation architecture supports future work on a full validating node:
- block and transaction structure
- checkpoint-based verification up to Sapling
- transaction validation (incomplete)
- transaction cryptography (incomplete)
- transaction scripts (incomplete)
- batch verification (incomplete)
Dependencies
Zebra primarily depends on pure Rust crates, along with some Rust/C++ crates.
Known Issues
There are a few bugs in the Zebra alpha release that we're still working on fixing:
- Occasional panics in the `tokio` time wheel implementation #1452
  - workaround: restart `zebrad`
- Occasional panics during client requests #1471
  - workaround: restart `zebrad`
- Peer connections sometimes fail permanently #1435
  - these permanent failures can happen after a network disconnection, sleep, or individual peer disconnections
  - workaround: use `Control-C` to exit `zebrad`, and then restart `zebrad`
- Duplicate block errors #1372
  - these errors can be ignored, unless they happen frequently
Future Work
In 2021, we intend to add RPC support and wallet integration. This phased approach allows us to test Zebra's independent implementation of the consensus rules, before asking users to entrust it with their funds.
Features:
- full consensus rule validation
- transaction mempool
- wallet functionality
- RPC functionality
Performance and Reliability:
- reliable syncing on Testnet
- reliable syncing under poor network conditions
- batch verification
- performance tuning
Documentation
The Zebra website contains user documentation, such as how to run or configure Zebra, set up metrics integrations, etc., as well as developer documentation, such as design documents. We also render API documentation for the external API of our crates, as well as internal documentation for private APIs.
Architecture
Unlike `zcashd`, which originated as a Bitcoin Core fork and inherited its monolithic architecture, Zebra has a modular, library-first design, with the intent that each component can be independently reused outside of the `zebrad` full node. For instance, the `zebra-network` crate containing the network stack can also be used to implement anonymous transaction relay, network crawlers, or other functionality, without requiring a full node.
At a high level, the full node functionality required by `zebrad` is factored into several components:
- `zebra-chain`, providing definitions of core data structures for Zcash, such as blocks, transactions, addresses, etc., and related functionality. It also contains the implementation of the consensus-critical serialization formats used in Zcash. The data structures in `zebra-chain` are defined to enforce structural validity by making invalid states unrepresentable. For instance, the `Transaction` enum has variants for each transaction version, and it's impossible to construct a transaction with, e.g., spend or output descriptions but no binding signature, or, e.g., a version 2 (Sprout) transaction with Sapling proofs. Currently, `zebra-chain` is oriented towards verifying transactions, but will be extended to support creating them in the future.
- `zebra-network`, providing an asynchronous, multithreaded implementation of the Zcash network protocol inherited from Bitcoin. In contrast to `zcashd`, each peer connection has a separate state machine, and the crate translates the external network protocol into a stateless, request/response-oriented protocol for internal use. The crate provides two interfaces:
  - an auto-managed connection pool that load-balances local node requests over available peers, and sends peer requests to a local inbound service, and
  - a `connect_isolated` method that produces a peer connection completely isolated from all other node state. This can be used, for instance, to safely relay data over Tor, without revealing distinguishing information.
- `zebra-script` provides script validation. Currently, this is implemented by linking to the C++ script verification code from `zcashd`, but in the future we may implement a pure-Rust script implementation.
- `zebra-consensus` performs semantic validation of blocks and transactions: all consensus rules that can be checked independently of the chain state, such as verification of signatures, proofs, and scripts. Internally, the library uses `tower-batch` to perform automatic, transparent batch processing of contemporaneous verification requests.
- `zebra-state` is responsible for storing, updating, and querying the chain state. The state service is responsible for contextual verification: all consensus rules that check whether a new block is a valid extension of an existing chain, such as updating the nullifier set or checking that transaction inputs remain unspent.
- `zebrad` contains the full node, which connects these components together and implements logic to handle inbound requests from peers and the chain sync process.
- `zebra-rpc` and `zebra-client` will eventually contain the RPC and wallet functionality, but as mentioned above, our goal is to implement replication of chain state first, before asking users to entrust Zebra with their funds.
All of these components can be reused as independent libraries, and all communication between stateful components is handled by an internal asynchronous RPC abstraction ("microservices in one process").
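To make the "microservices in one process" idea concrete, here is a minimal, hypothetical sketch of a component exposed as a `tower::Service`; the request and response enums below are invented stand-ins, not Zebra's actual internal protocol:

```rust
use futures::future::{self, Ready};
use std::task::{Context, Poll};
use tower::Service;

// Hypothetical request/response enums standing in for a component's internal protocol.
enum Request {
    TipHeight,
}

enum Response {
    TipHeight(u32),
}

// A toy stateful component exposed as a tower::Service.
struct ToyState {
    tip_height: u32,
}

impl Service<Request> for ToyState {
    type Response = Response;
    type Error = std::convert::Infallible;
    type Future = Ready<Result<Response, Self::Error>>;

    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        // Always ready here; a real component would use this for backpressure.
        Poll::Ready(Ok(()))
    }

    fn call(&mut self, req: Request) -> Self::Future {
        match req {
            Request::TipHeight => future::ready(Ok(Response::TipHeight(self.tip_height))),
        }
    }
}

#[tokio::main]
async fn main() {
    use tower::ServiceExt;
    let state = ToyState { tip_height: 1_046_400 };
    // `oneshot` waits for readiness, then makes a single call.
    match state.oneshot(Request::TipHeight).await.unwrap() {
        Response::TipHeight(h) => println!("tip height: {}", h),
    }
}
```

A `Buffer` wrapper (as described in the state design below) can then share a service built this way between tasks.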
License
Zebra is distributed under the terms of both the MIT license and the Apache License (Version 2.0).
See LICENSE-APACHE and LICENSE-MIT.
User Documentation
This section contains details on how to install, run, and instrument Zebra.
Installing Zebra
Zebra is still under development, so there is no supported packaging or install mechanism. To run Zebra, check out the git repository:
git clone https://github.com/ZcashFoundation/zebra
and then run
cargo build
Be aware that Zebra is still in an extremely early stage of development.
Running Zebra
`zebrad generate` generates a default config. These defaults will be used if no config is present, so it's not necessary to generate a config. However, having a config file with the default fields is a useful starting point for changing the config.
The configuration format is the TOML encoding of the internal config structure, and documentation for all of the config options can be found here.
`zebrad start` starts a full node.
Return Codes
- `0`: Application exited successfully
- `1`: Application exited unsuccessfully
- `2`: Application crashed

`zebrad` may also return platform-dependent codes.
Tracing Zebra
Zebra supports dynamic tracing, configured using the config's `TracingSection` and (optionally) an HTTP RPC endpoint.
If the `endpoint_addr` is specified, `zebrad` will open an HTTP endpoint allowing dynamic runtime configuration of the tracing filter. For instance, if the config had `endpoint_addr = '127.0.0.1:3000'`, then:
- `curl -X GET localhost:3000/filter` retrieves the current filter string;
- `curl -X POST localhost:3000/filter -d "zebrad=trace"` sets the current filter string.
See the `filter` documentation for more details.
Zebra also has support for generating flamegraphs of tracing spans, configured using the `flamegraph` option.
Zebra Metrics
Zebra has support for Prometheus, configured using the `MetricsSection`.
This requires supporting infrastructure to collect and visualize metrics, for example:
- Install Prometheus and Grafana via Docker
# create a storage volume for grafana (once)
sudo docker volume create grafana-storage
# create a storage volume for prometheus (once)
sudo docker volume create prometheus-storage
# run prometheus with the included config
sudo docker run --detach --network host --volume prometheus-storage:/prometheus --volume /path/to/zebra/prometheus.yaml:/etc/prometheus/prometheus.yml prom/prometheus
# run grafana
sudo docker run --detach --network host --env GF_SERVER_HTTP_PORT=3030 --env GF_SERVER_HTTP_ADDR=localhost --volume grafana-storage:/var/lib/grafana grafana/grafana
Now the Grafana dashboard is available at http://localhost:3030 ; the default username and password is `admin`/`admin`.
Prometheus scrapes Zebra on `localhost:9999`, and provides the results on `localhost:9090`.
- Configure Grafana with a Prometheus HTTP Data Source, using Zebra's `metrics.endpoint_addr`.
In zebrad.toml:
[metrics]
endpoint_addr = "127.0.0.1:9999"
In the grafana dashboard:
- Create a new Prometheus Data Source
- Enter the HTTP URL: `127.0.0.1:9090`
- Save the configuration
Developer Documentation
This section contains the contribution guide and design documentation. It does not contain API documentation, which is generated using Rustdoc:
- `doc.zebra.zfnd.org` renders documentation for the public API;
- `doc-internal.zebra.zfnd.org` renders documentation for the internal API.
Contributing
Running and Debugging
See the user documentation for details on how to build, run, and instrument Zebra.
Bug Reports
File an issue on the issue tracker using the bug report template.
Pull Requests
PRs are welcome for small and large changes, but please don't make large PRs without coordinating with us via the issue tracker or Discord. This helps increase development coordination and makes PRs easier to merge.
Check out the help wanted or good first issue labels if you're looking for a place to get started!
Zebra RFCs
Significant changes to the Zebra codebase are planned using Zebra RFCs. These allow structured discussion about a proposed change and provide a record of the planned design.
To make a Zebra RFC:
- Choose a short feature name like `my-feature`.
- Copy the `book/src/dev/rfcs/0000-template.md` file to `book/src/dev/rfcs/XXXX-my-feature.md`.
- Edit the template header to add the feature name and the date, but leave the other fields blank for now.
- Write the design! The template has a suggested list of sections that are a useful guide.
- Create a design PR using the RFC template.
- Take the next available RFC number (not conflicting with any existing RFCs or design PRs) and name the RFC file accordingly, e.g., `0027-my-feature.md` for number 27. Make sure that `book/src/SUMMARY.md` links to the numbered RFC.
- After creating an RFC PR, update the RFC header and the PR description with the PR number.
- After the RFC is accepted, create an issue for the implementation of the design, and update the RFC header and PR description with the implementation issue number.
Design Overview
This document sketches the future design for Zebra.
Desiderata
The following are general desiderata for Zebra:
- [George's list..]
- As much as reasonably possible, it and its dependencies should be implemented in Rust. While it may not make sense to require this in every case (for instance, it probably doesn't make sense to rewrite libsecp256k1 in Rust, instead of using the same upstream library as Bitcoin), we should generally aim for it.
- As much as reasonably possible, Zebra should minimize trust in required dependencies. Note that "minimize number of dependencies" is usually a proxy for this desideratum, but is not exactly the same: for instance, a collection of crates like the tokio crates are all developed together and have one trust boundary.
- Zebra should be well-factored internally into a collection of component libraries which can be used by other applications to perform Zcash-related tasks. Implementation details of each component should not leak into all other components.
- Zebra should checkpoint on Sapling activation and drop all Sprout-related functionality not required post-Sapling.
Non-Goals
- Zebra keeps a copy of the chain state, so it isn't intended for lightweight applications like light wallets. Those applications should use a light client protocol.
Internal Structure
The following is a list of internal component libraries (crates), and a description of functional responsibility.
zebra-chain
Internal Dependencies
None: these are the core data structure definitions.
Responsible for
- definitions of commonly used data structures, e.g., `Block`, `Transaction`, `Address`, `KeyPair`, ...
- parsing bytes into these data structures
- definitions of core traits, e.g., `ZcashSerialize` and `ZcashDeserialize`, which perform consensus-critical serialization logic (a sketch of what such traits might look like follows this subsection)
Exported types
- [...]
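For illustration, here is a sketch of the general shape such serialization traits might take; the method names and signatures below are assumptions for this example, not `zebra-chain`'s actual definitions:

```rust
use std::io;

// Hypothetical trait shapes; the trait names come from the text above, the
// signatures are assumptions for illustration only.
trait ZcashSerialize: Sized {
    /// Write `self` in a consensus-critical encoding.
    fn zcash_serialize<W: io::Write>(&self, writer: W) -> Result<(), io::Error>;
}

trait ZcashDeserialize: Sized {
    /// Try to parse `Self` from a consensus-critical encoding.
    fn zcash_deserialize<R: io::Read>(reader: R) -> Result<Self, io::Error>;
}

/// A toy type implementing both traits: a little-endian u32 "height".
struct Height(u32);

impl ZcashSerialize for Height {
    fn zcash_serialize<W: io::Write>(&self, mut writer: W) -> Result<(), io::Error> {
        writer.write_all(&self.0.to_le_bytes())
    }
}

impl ZcashDeserialize for Height {
    fn zcash_deserialize<R: io::Read>(mut reader: R) -> Result<Self, io::Error> {
        let mut buf = [0u8; 4];
        reader.read_exact(&mut buf)?;
        Ok(Height(u32::from_le_bytes(buf)))
    }
}

fn main() -> Result<(), io::Error> {
    // Round-trip a value through the toy encoding.
    let mut bytes = Vec::new();
    Height(419_200).zcash_serialize(&mut bytes)?;
    let round_trip = Height::zcash_deserialize(bytes.as_slice())?;
    assert_eq!(round_trip.0, 419_200);
    Ok(())
}
```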
zebra-network
Internal Dependencies
zebra-chain
Responsible for
- definition of a well structured, internal request/response protocol
- provides an abstraction for "this node" and "the network" using the internal protocol
- dynamic, backpressure-driven peer set management
- per-peer state machine that translates the internal protocol to the Bitcoin/Zcash protocol
- tokio codec for Bitcoin/Zcash message encoding.
Exported types
- `Request`, an enum representing all possible requests in the internal protocol;
- `Response`, an enum representing all possible responses in the internal protocol;
- `AddressBook`, a data structure for storing peer addresses;
- `Config`, a configuration object for all networking-related parameters;
- `init<S: Service>(Config, S) -> (impl Service, Arc<Mutex<AddressBook>>)`, the main entry-point.
The `init` entrypoint constructs a dynamically-sized pool of peers sending inbound requests to the provided `S: tower::Service` representing "this node", and returns a `Service` that can be used to send requests to "the network", together with an `AddressBook` updated with liveness information from the peer pool. The `AddressBook` can be used to respond to inbound requests for peers.
All peerset management (finding new peers, creating new outbound connections, etc) is completely encapsulated, as is responsibility for routing outbound requests to appropriate peers.
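As an illustration of the "local inbound service" half of this contract, here is a hypothetical sketch built with `tower::service_fn`; the `Request` and `Response` variants are invented stand-ins for the crate's real enums, and the call is made directly rather than through `init`:

```rust
use std::net::SocketAddr;
use tower::{service_fn, ServiceExt};

// Invented stand-ins for zebra-network's internal protocol enums.
#[derive(Debug)]
enum Request {
    Peers,
}

#[derive(Debug)]
enum Response {
    Peers(Vec<SocketAddr>),
}

#[tokio::main]
async fn main() {
    // The "local inbound service": answers requests that remote peers send us.
    let inbound = service_fn(|req: Request| async move {
        match req {
            // A real node would read these from the shared AddressBook.
            Request::Peers => Ok::<_, std::convert::Infallible>(Response::Peers(vec![
                "127.0.0.1:8233".parse().unwrap(),
            ])),
        }
    });

    // init(config, inbound) would move this service into the peer set; here we
    // just call it once directly to show the shape of the contract.
    let rsp = inbound.oneshot(Request::Peers).await.unwrap();
    println!("{:?}", rsp);
}
```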
zebra-state
Internal Dependencies
- `zebra-chain` for data structure definitions.
Responsible for
- block storage API
- operates on parsed block structs
- these structs can be converted from and into raw bytes
- primarily aimed at network replication, not at processing
- can be used to rebuild the database below
- operates on parsed block structs
- maintaining a database of tx, address, etc data
- this database can be blown away and rebuilt from the blocks, which are otherwise unused.
- threadsafe, typed lookup API that completely encapsulates the database logic
- handles stuff like "transactions are reference counted by outputs" etc.
- providing `tower::Service` interfaces for all of the above to support backpressure.
Exported types
- `Request`, an enum representing all possible requests in the internal protocol;
  - blocks can be accessed via their chain height or hash
  - confirmed transactions can be accessed via their block, or directly via their hash
- `Response`, an enum representing all possible responses in the internal protocol;
- `init() -> impl Service`, the main entry-point.
The `init` entrypoint returns a `Service` that can be used to send requests for the chain state.
All state management (adding blocks, getting blocks by index or hash) is completely encapsulated.
zebra-script
Internal Dependencies
- ??? depends on how it's implemented internally
Responsible for
- the minimal Bitcoin script implementation required for Zcash
- script parsing
- context-free script validation
Notes
This can wrap an existing script implementation at the beginning.
If this existed in a "good" way, we could use it to implement tooling for Zcash script inspection, debugging, etc.
Questions
- How does this interact with NU4 script changes?
Exported types
- [...]
zebra-consensus
Internal Dependencies
- `zebra-chain` for data structures and parsing.
- `zebra-state` to read and update the state database.
- `zebra-script` for script parsing and validation.
Responsible for
- consensus-specific parameters (network magics, genesis block, pow parameters, etc) that determine the network consensus
- consensus logic to decide which block is the current block
- block and transaction verification
- context-free validation, e.g., signature, proof verification, etc.
- context-dependent validation, e.g., determining whether a transaction is accepted in a particular chain state context.
- verifying mempool (unconfirmed) transactions
- block checkpoints
- mandatory checkpoints (genesis block, sapling activation)
- optional regular checkpoints (every Nth block)
- modifying the chain state
  - adding new blocks to `ZebraState`, including chain reorganisation
  - adding new transactions to `ZebraMempoolState`
- storing the transaction mempool state
- mempool transactions can be accessed via their hash
- providing `tower::Service` interfaces for all of the above to support backpressure and batch validation.
Exported types
- `block::init() -> impl Service`, the main entry-point for block verification.
- `ZebraMempoolState`
  - all state management (adding transactions, getting transactions by hash) is completely encapsulated.
- `mempool::init() -> impl Service`, the main entry-point for mempool transaction verification.
The `init` entrypoints return `Service`s that can be used to verify blocks or transactions, and add them to the relevant state.
zebra-rpc
Internal Dependencies
- `zebra-chain` for data structure definitions
- `zebra-network` possibly? for definitions of network messages?
Responsible for
- rpc interface
Exported types
- [...]
zebra-client
Internal Dependencies
- `zebra-chain` for structure definitions
- `zebra-state` for transaction queries and client/wallet state storage
- `zebra-script` possibly? for constructing transactions
Responsible for
- implementation of some event a user might trigger
- would be used to implement a full wallet
- creates transactions, monitors shielded wallet state, etc.
Notes
Communication between the client code and the rest of the node should be done by a tower service interface. Since the `Service` trait can abstract from a function call to RPC, this means that it will be possible for us to isolate all client code to a subprocess.
Exported types
- [...]
zebrad
Abscissa-based application which loads configs and all application components, and connects them to each other.
Responsible for
- actually running the server
- connecting functionality in dependencies
Internal Dependencies
- `zebra-chain`
- `zebra-network`
- `zebra-state`
- `zebra-consensus`
- `zebra-client`
- `zebra-rpc`
Unassigned functionality
Responsibility for this functionality needs to be assigned to one of the modules above (subject to discussion):
- [ ... add to this list ... ]
Zebra RFCs
We are experimenting with using a process similar to the Rust RFC process to document design decisions for Zebra.
- Feature Name: Pipelinable Block Syncing and Lookup
- Start Date: 2020-07-02
- Design PR: rust-lang/rfcs#0000
- Rust Issue: rust-lang/rust#0000
Summary
The Bitcoin network protocol used by Zcash allows nodes to download blocks from other peers. This RFC describes how we find and download this data asynchronously.
Motivation
To sync the chain, we need to find out which blocks to download and then download them. Downloaded blocks can then be fed into the verification system and (assuming they verify correctly) into the state system. In `zcashd`, blocks are processed one at a time. In Zebra, however, we want to be able to pipeline block download and verification operations, using futures to explicitly specify logical dependencies between sub-tasks, which we execute concurrently and potentially out-of-order on a threadpool. This means that the procedure we use to determine which blocks to download must look somewhat different from `zcashd`'s.
Block fetching in Bitcoin
Zcash inherits its network protocol from Bitcoin. Bitcoin block fetching works roughly as follows. A node can request block information from peers using either a `getblocks` or `getheaders` message. Both of these messages contain a block locator object consisting of a sequence of block hashes. The block hashes are ordered from highest to lowest, and represent checkpoints along the path from the node's current tip back to genesis. The remote peer computes the intersection between its chain and the node's chain by scanning through the block locator for the first hash in its chain. Then, it sends (up to) 500 subsequent block hashes in an `inv` message (in the case of `getblocks`) or (up to) 2000 block headers in a `headers` message (in the case of `getheaders`). Note: `zcashd` reduces the `getheaders` count to 160, because Zcash headers are much larger than Bitcoin headers, as noted below.
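For illustration only, here is a small sketch of how a Bitcoin-style block locator's heights might be chosen - dense near the tip, then increasingly sparse back towards genesis. The exact construction Zebra uses may differ:

```rust
/// Heights to include in a block locator for a chain whose tip is at `tip`:
/// the ~10 most recent heights, then heights at exponentially growing
/// distances, always ending at the genesis height 0.
fn block_locator_heights(tip: u32) -> Vec<u32> {
    let mut heights = Vec::new();
    let mut height = tip as i64;
    let mut step = 1i64;
    while height > 0 {
        heights.push(height as u32);
        // After the 10 most recent blocks, double the step each time.
        if heights.len() >= 10 {
            step *= 2;
        }
        height -= step;
    }
    heights.push(0); // genesis
    heights
}

fn main() {
    // e.g. [1000, 999, ..., 991, 989, 985, 977, 961, 929, 865, 737, 481, 0]
    println!("{:?}", block_locator_heights(1000));
}
```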
The `headers` message sent after `getheaders` contains the actual block headers, while the `inv` message sent after `getblocks` contains only hashes, which have to be fetched with a `getdata` message. In Bitcoin, the block headers are small relative to the size of the full block, but this is not always the case for Zcash, where the block headers are much larger due to the use of Equihash and many blocks have only a few transactions. Also, `getblocks` allows parallelizing block downloads, while `getheaders` doesn't. For these reasons and because we know we need full blocks anyways, we should probably use `getblocks`.
The `getblocks` Bitcoin message corresponds to our `zebra_network::Request::FindBlocksByHash`, and the `getdata` message is generated by `zebra_network::Request::Blocks`.
Pipelining block verification
As mentioned above, our goal is to be able to pipeline block download and verification. This means that the process for block lookup should ideally attempt to fetch and begin verification of future blocks without blocking on complete verification of all earlier blocks. To do this, we split the chain state into the verified block chain (held by the state component) and the prospective block chain (held only by the syncer), and use the following algorithm to pursue prospective chain tips.
ObtainTips
- Query the current state to construct the sequence of hashes `[tip, tip-1, tip-2, ..., tip-9, tip-20, tip-40, tip-80, tip-160]`. The precise construction is unimportant, but this should have a Bitcoin-style dense-first, then-sparse hash structure. The initial state should contain the genesis block for the relevant network, so at first the sequence of hashes will only contain the genesis block `[genesis]`, and the network will respond with a list of hashes starting at the child of the genesis block.
- Make a `FindBlocksByHash` request to the network `F` times, where `F` is a fanout parameter, to get `resp1, ..., respF`.
- For each response, starting from the beginning of the list, prune any block hashes already included in the state, stopping at the first unknown hash to get `resp1', ..., respF'`. (These lists may be empty; see the sketch after this list.)
- Combine the last elements of each list into a set; this is the set of prospective tips.
- Combine all elements of each list into a set, and queue download and verification of those blocks.
- If there are any prospective tips, call ExtendTips, which returns a new set of prospective tips. Continue calling ExtendTips with this new set, until there are no more prospective tips.
- Restart after some delay, say 15 seconds.
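Here is a minimal, hypothetical sketch of the pruning and tip-collection steps above, using a plain hash alias and standard collections instead of Zebra's real types:

```rust
use std::collections::HashSet;

// Simplified stand-in for a block hash.
type Hash = [u8; 32];

/// Given the fanout responses and the set of hashes already in the state,
/// return (prospective tips, hashes to download and verify).
fn prune_and_collect(
    responses: Vec<Vec<Hash>>,
    in_state: &HashSet<Hash>,
) -> (HashSet<Hash>, HashSet<Hash>) {
    let mut tips = HashSet::new();
    let mut download = HashSet::new();

    for resp in responses {
        // Prune known hashes from the front, stopping at the first unknown hash.
        let first_unknown = resp.iter().position(|h| !in_state.contains(h));
        let pruned: &[Hash] = match first_unknown {
            Some(i) => &resp[i..],
            None => &[], // the whole response is already known
        };

        // The last element of each pruned list is a prospective tip.
        if let Some(tip) = pruned.last() {
            tips.insert(*tip);
        }
        // Queue every unknown hash for download and verification.
        download.extend(pruned.iter().copied());
    }

    (tips, download)
}

fn main() {
    let in_state: HashSet<Hash> = [[0; 32], [1; 32]].into_iter().collect();
    let responses = vec![vec![[1; 32], [2; 32], [3; 32]], vec![[1; 32]]];
    let (tips, download) = prune_and_collect(responses, &in_state);
    assert!(tips.contains(&[3; 32]));
    assert_eq!(download.len(), 2);
}
```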
ExtendTips
- Remove all prospective tips from the set of prospective tips, then iterate through them. For each removed tip:
  - Create a `FindBlocksByHash` request consisting of just the prospective tip. Send this request to the network `F` times.
  - For each response, check whether the first hash in the response is a genesis block (for either the main or test network). If so, discard the response. It indicates that the remote peer does not have any blocks following the prospective tip. (Or that the remote peer is on the wrong network.)
  - Combine the last elements of the remaining responses into a set, and add this set to the set of prospective tips.
  - Combine all elements of the remaining responses into a set, and queue download and verification of those blocks.
DoS resistance
Because this strategy aggressively downloads any available blocks, it could be vulnerable to a DoS attack, where a malicious peer feeds us bogus chain tips, causing us to waste network and CPU on blocks that will never be valid. However, because we separate block finding from block downloading, and because of the design of our network stack, this attack is probably not feasible. The primary reason is that `zebra_network` randomly load-balances outbound requests over all available peers.
Consider a malicious peer who responds to block discovery with a bogus list of hashes. We will eagerly attempt to download all of those bogus blocks, but our requests to do so will be randomly load-balanced to other peers, who are unlikely to know about the bogus blocks. When we try to extend a bogus tip, the extension request will also be randomly load-balanced, so it will likely be routed to a peer that doesn't know about it and can't extend it. And because we perform multiple block discovery queries, which will also be randomly load balanced, we're unlikely to get stuck on a false chain tip.
Fork-finding
When starting from a verified chain tip, the choice of block locator can find forks at least up to the reorg limit (99 blocks). When extending a prospective tip, forks are ignored, but this is fine, since unless we are prefetching the longest chain, we won't be able to keep extending the tip prospectively.
Retries and Fanout
We should consider the fanout parameter `F` and the retry policy for the different requests. I'm not sure whether we need to retry requests to discover new block hashes, since the fanout may already provide redundancy. For the block requests themselves, we should have a retry policy with a limited number of attempts, enough to insulate against network failures but not so many that we would retry a bogus block indefinitely. Maybe fanout 4 and 3 retries?
Parallel Verification
- Feature Name: parallel_verification
- Start Date: 2020-07-27
- Design PR: ZcashFoundation/zebra#763
- Zebra Issue: ZcashFoundation/zebra#682
Summary
Zebra verifies blocks in several stages, most of which can be executed in parallel.
We use several different design patterns to enable this parallelism:
- We download blocks and start verifying them in parallel,
- We batch signature and proof verification using verification services, and
- We defer data dependencies until just before the block is committed to the state (see the detailed design RFCs).
Motivation
Zcash (and Bitcoin) are designed to verify each block in sequence, starting from the genesis block. But during the initial sync, and when restarting with an older state, this process can be quite slow.
By deferring data dependencies, we can partially verify multiple blocks in parallel.
By parallelising block and transaction verification, we can use multithreading and batch verification for signatures, proofs, scripts, and hashes.
Definitions
Blockchain:
- chain fork: Zcash is implemented using a tree of blocks. Each block has a single previous block, and zero to many next blocks. A chain fork consists of a tip and all its previous blocks, back to the genesis block.
- genesis: The root of the tree of blocks is called the genesis block. It has no previous block.
- tip: A block which has no next block is called a tip. Each chain fork can be identified using its tip.
Data:
- consensus rule: A protocol rule which all nodes must apply consistently, so they can converge on the same chain fork.
- context-free: Consensus rules which do not have a data dependency on previous blocks.
- data dependency: Information contained in the previous block and its chain fork, which is required to verify the current block.
- state: The set of verified blocks. The state might also cache some dependent data, so that we can efficiently verify subsequent blocks.
Verification Stages:
- structural verification: Parsing raw bytes into the data structures defined by the protocol.
- semantic verification: Verifying the consensus rules on the data structures defined by the protocol.
- contextual verification: Verifying the current block, once its data dependencies have been satisfied by a verified previous block. This verification might also use the cached state corresponding to the previous block.
Guide-level explanation
In Zebra, we want to verify blocks in parallel. Some fields can be verified straight away, because they don't depend on the output of previous blocks. But other fields have data dependencies, which means that we need previous blocks before we can fully validate them.
If we delay checking some of these data dependencies, then we can do more of the verification in parallel.
Example: BlockHeight
Here's how Zebra can verify the different Block Height consensus rules in parallel:
Structural Verification:
- Parse the Block into a BlockHeader and a list of transactions.
Semantic Verification: No Data Dependencies:
- Check that the first input of the first transaction in the block is a coinbase input with a valid block height in its data field.
Semantic Verification: Deferring a Data Dependency:
- Verify other consensus rules that depend on Block Height, assuming that the Block Height is correct. For example, many consensus rules depend on the current Network Upgrade, which is determined by the Block Height. We verify these consensus rules, assuming the Block Height and Network Upgrade are correct.
Contextual Verification:
- Submit the block to the state for contextual verification. When it is ready to be committed (it may arrive before the previous block), check all deferred constraints, including the constraint that the block height of this block is one more than the block height of its parent block. If all constraints are satisfied, commit the block to the state. Otherwise, reject the block as invalid.
Zebra Design
Design Patterns
When designing changes to Zebra verification, use these design patterns:
- perform context-free verification as soon as possible, (that is, verification which has no data dependencies on previous blocks),
- defer data dependencies as long as possible, then
- check the data dependencies.
Minimise Deferred Data
Keep the data dependencies and checks as simple as possible.
For example, Zebra could defer checking both the Block Height and Network Upgrade.
But since the Network Upgrade depends on the Block Height, we only need to defer the Block Height check. Then we can use all the fields that depend on the Block Height, as if it is correct. If the final Block Height check fails, we will reject the entire block, including all the verification we performed using the assumed Network Upgrade.
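As a minimal sketch of this pattern (with invented, simplified types and checks, not Zebra's real ones), the deferred Block Height constraint can be captured as a closure that is only evaluated once the parent block is known:

```rust
// Hypothetical, simplified types for illustration only.
struct Block {
    coinbase_height: u32, // parsed from the coinbase input's data field
    // ... other fields, verified using the assumed height ...
}

/// Semantic verification: verify everything that depends on the block height,
/// *assuming* the coinbase height is correct. Returns the deferred constraint.
fn check_with_assumed_height(block: &Block) -> impl Fn(u32) -> Result<(), &'static str> {
    // e.g. pick the network upgrade from the assumed height, check sizes, etc.
    let assumed_height = block.coinbase_height;
    // The deferred data dependency: the parent block's height.
    move |parent_height: u32| {
        if assumed_height == parent_height + 1 {
            Ok(())
        } else {
            Err("block height is not parent height + 1")
        }
    }
}

fn main() {
    let block = Block { coinbase_height: 500_001 };
    // Verification with the deferred dependency happens first...
    let deferred_check = check_with_assumed_height(&block);
    // ...and the constraint is only checked once the parent block is known,
    // just before the block is committed to the state.
    assert!(deferred_check(500_000).is_ok());
    assert!(deferred_check(499_000).is_err());
}
```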
Implementation Strategy
When implementing these designs, perform as much verification as possible, await any dependencies, then perform the necessary checks.
Reference-level explanation
Verification Stages
In Zebra, verification occurs in the following stages:
- Structural Verification: Raw block data is parsed into a block header and transactions. Invalid data is not representable in these structures: deserialization (parsing) can fail, but serialization always succeeds.
- Semantic Verification: Parsed block fields are verified, based on their
data dependencies:
- Context-free fields have no data dependencies, so they can be verified as needed.
- Fields with simple data dependencies defer that dependency as long as possible, so they can perform more verification in parallel. Then they await the required data, which is typically the previous block. (And potentially older blocks in its chain fork.)
- Fields with complex data dependencies require their own parallel verification designs. These designs are out of scope for this RFC.
- Contextual Verification: After a block is verified, it is added to the state. The details of state updates, and their interaction with semantic verification, are out of scope for this RFC.
This RFC focuses on Semantic Verification, and the design patterns that enable blocks to be verified in parallel.
Verification Interfaces
Verification is implemented by the following traits and services:
- Structural Verification:
  - `zebra_chain::ZcashDeserialize`: A trait for parsing consensus-critical data structures from a byte buffer.
- Semantic Verification:
  - `ChainVerifier`: Provides a verifier service that accepts a `Block` request, performs verification on the block, and responds with a `block::Hash` on success.
  - Internally, the `ChainVerifier` selects between a `CheckpointVerifier` for blocks that are within the checkpoint range, and a `BlockVerifier` for recent blocks.
- Contextual Verification:
  - `zebra_state::init`: Provides the state update service, which accepts requests to add blocks to the state.
Checkpoint Verification
The `CheckpointVerifier` performs rapid verification of blocks, based on a set of hard-coded checkpoints. Each checkpoint hash can be used to verify all the previous blocks, back to the genesis block. So Zebra can skip almost all verification for blocks in the checkpoint range.

The `CheckpointVerifier` uses an internal queue to store pending blocks. Checkpoint verification is cheap, so it is implemented using non-async functions within the `CheckpointVerifier` service.
Here is how the `CheckpointVerifier` implements each verification stage:
- Structural Verification:
  - As Above: the `CheckpointVerifier` accepts parsed `Block` structs.
- Semantic Verification:
  - `check_height`: makes sure the block height is within the unverified checkpoint range, and adds the block to its internal queue.
  - `target_checkpoint_height`: checks for a continuous range of blocks from the previous checkpoint to a subsequent checkpoint. If the chain is incomplete, returns a future, and waits for more blocks. If the chain is complete, assumes that the `previous_block_hash` fields of these blocks form an unbroken chain from checkpoint to checkpoint, and starts processing the checkpoint range. (This constraint is an implicit part of the `CheckpointVerifier` design.)
  - `process_checkpoint_range`: makes sure that the blocks in the checkpoint range have an unbroken chain of previous block hashes (see the sketch after this list).
- Contextual Verification:
  - As Above: the `CheckpointVerifier` returns success to the `ChainVerifier`, which sends verified `Block`s to the state service.
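A simplified, hypothetical sketch of the continuity check that `process_checkpoint_range` performs, using toy types rather than Zebra's real block and hash types:

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration only.
type Hash = [u8; 32];

#[derive(Clone)]
struct Block {
    height: u32,
    hash: Hash,
    previous_block_hash: Hash,
}

/// Check that `blocks` is a continuous range ending at a checkpoint:
/// each block's `previous_block_hash` must match the previous block's hash,
/// and the final hash must equal the hard-coded checkpoint hash.
fn process_checkpoint_range(
    blocks: &[Block],
    checkpoints: &HashMap<u32, Hash>,
) -> Result<(), &'static str> {
    let last = blocks.last().ok_or("empty range")?;
    let expected = checkpoints
        .get(&last.height)
        .ok_or("no checkpoint at this height")?;
    if &last.hash != expected {
        return Err("tip hash does not match checkpoint");
    }
    for pair in blocks.windows(2) {
        if pair[1].previous_block_hash != pair[0].hash {
            return Err("broken previous-block-hash chain");
        }
        if pair[1].height != pair[0].height + 1 {
            return Err("non-contiguous heights");
        }
    }
    Ok(())
}

fn main() {
    // Tiny example: two blocks and a checkpoint at height 1.
    let b0 = Block { height: 0, hash: [0; 32], previous_block_hash: [255; 32] };
    let b1 = Block { height: 1, hash: [1; 32], previous_block_hash: b0.hash };
    let checkpoints: HashMap<u32, Hash> = [(1, b1.hash)].into_iter().collect();
    assert!(process_checkpoint_range(&[b0, b1], &checkpoints).is_ok());
}
```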
Block Verification
The `BlockVerifier` performs detailed verification of recent blocks, in parallel.
Here is how the `BlockVerifier` implements each verification stage:
- Structural Verification:
  - As Above: the `BlockVerifier` accepts parsed `Block` structs.
- Semantic Verification:
  - As Above: verifies each field in the block. Defers any data dependencies as long as possible, awaits those data dependencies, then performs data dependent checks.
  - Note: Since futures are executed concurrently, we can use the same function to:
    - perform context-free verification,
    - perform verification with deferred data dependencies,
    - await data dependencies, and
    - check data dependencies. To maximise concurrency, we should write verification functions in this specific order, so the awaits are as late as possible.
- Contextual Verification:
  - As Above: the `BlockVerifier` returns success to the `ChainVerifier`, which sends verified `Block`s to the state service.
Zcash Protocol Design
When designing a change to the Zcash protocol, minimise the data dependencies between blocks.
Try to create designs that:
- Eliminate data dependencies,
- Make the changes depend on a version field in the block header or transaction,
- Make the changes depend on the current Network Upgrade, or
- Make the changes depend on a field in the current block, with an additional consensus rule to check that field against previous blocks.
When making decisions about these design tradeoffs, consider:
- how the data dependency could be deferred, and
- the CPU cost of the verification - if it is trivial, then it does not matter if the verification is parallelised.
Drawbacks
This design is a bit complicated, but we think it's necessary to achieve our goals.
Rationale and alternatives
- What makes this design a good design?
- It enables a significant amount of parallelism
- It is simpler than some other alternatives
- It uses existing Rust language facilities, mainly Futures and await/async
- Is this design a good basis for later designs or implementations?
- We have built a UTXO design on this design
- We believe we can build "recent blocks" and "chain summary" designs on this design
- Each specific detailed design will need to consider how the relevant data dependencies are persisted
- What other designs have been considered and what is the rationale for not choosing them?
- Serial verification
- Effectively single-threaded
- Awaiting data dependencies as soon as they are needed
- Less parallelism
- Providing direct access to the state
- Might cause data races, might be prevented by Rust's ownership rules
- Higher risk of bugs
- Serial verification
- What is the impact of not doing this?
- Verification is slow, we can't batch or parallelise some parts of the verification
Prior art
TODO: expand this section
- zcashd
- serial block verification
- Zebra implements the same consensus rules, but a different design
- tower
Unresolved questions
- Is this design good enough to use as a framework for future RFCs?
- Does this design require any changes to the current implementation?
  - Implement block height consensus rule (check previous block hash and height)
- Check that the `BlockVerifier` performs checks in the following order:
  - verification, deferring dependencies as needed,
  - await dependencies,
  - check deferred data dependencies
Out of Scope:
- What is the most efficient design for parallel verification?
  - (Optimisations are out of scope.)
- How is each specific field verified?
- How do we verify fields with complex data dependencies?
- How does verification change with different network upgrades?
- How do multiple chains work, in detail?
- How do state updates work, in detail?
- Moving the verifiers into the state service
Future possibilities
- Separate RFCs for other data dependencies
- Recent blocks
- Overall chain summaries (for example, total work)
- Reorganisation limit: multiple chains to single chain transition
- Optimisations for parallel verification
- Feature Name: `inventory_tracking`
- Start Date: 2020-08-25
- Design PR: ZcashFoundation/zebra#952
- Zebra Issue: ZcashFoundation/zebra#960
Summary
The Bitcoin network protocol used by Zcash allows nodes to advertise data (inventory items) for download by other peers. This RFC describes how we track and use this information.
Motivation
In order to participate in the network, we need to be able to fetch new data that our peers notify us about. Because our network stack abstracts away individual peer connections, and load-balances over available peers, we need a way to direct requests for new inventory only to peers that advertised to us that they have it.
Definitions
- Inventory item: either a block or transaction.
- Inventory hash: the hash of an inventory item, represented by the `InventoryHash` type.
- Inventory advertisement: a notification from another peer that they have some inventory item.
- Inventory request: a request to another peer for an inventory item.
Guide-level explanation
The Bitcoin network protocol used by Zcash provides a mechanism for nodes to gossip blockchain data to each other. This mechanism is used to distribute (mined) blocks and (unmined) transactions through the network. Nodes can advertise data available in their inventory by sending an `inv` message containing the hashes and types of those data items. After receiving an `inv` message advertising data, a node can determine whether to download it.
This poses a challenge for our network stack, which goes to some effort to abstract away details of individual peers and encapsulate all peer connections behind a single request/response interface representing "the network". Currently, the peer set tracks readiness of all live peers, reports readiness if at least one peer is ready, and routes requests across ready peers randomly using the "power of two choices" algorithm.
However, while this works well for data that is already distributed across the network (e.g., existing blocks) it will not work well for fetching data during distribution across the network. If a peer informs us of some new data, and we attempt to download it from a random, unrelated peer, we will likely fail. Instead, we track recent inventory advertisements, and make a best-effort attempt to route requests to peers who advertised that inventory.
Reference-level explanation
The inventory tracking system has several components:
- A registration hook that monitors incoming messages for inventory advertisements;
- An inventory registry that tracks inventory presence by peer;
- Routing logic that uses the inventory registry to appropriately route requests.
The first two components have fairly straightforward design decisions, but the third has considerably less obvious choices and tradeoffs.
Inventory Monitoring
Zebra uses Tokio's codec mechanism to transform a byte-oriented I/O interface into a `Stream` and `Sink` for incoming and outgoing messages. These are passed to the peer connection state machine, which is written generically over any `Stream` and `Sink`. This construction makes it easy to "tap" the sequence of incoming messages using `.then` and `.with` stream and sink combinators.
We already do this to record Prometheus metrics on message rates as well as to report message timestamps used for liveness checks and last-seen address book metadata. The message timestamp mechanism is a good example to copy. The handshake logic instruments the incoming message stream with a closure that captures a sender handle for a mpsc channel with a large buffer (currently 100 timestamp entries). The receiver handle is owned by a separate task that shares an `Arc<Mutex<AddressBook>>` with other parts of the application. This task waits for new timestamp entries, acquires a lock on the address book, and updates the address book. This ensures that timestamp updates are queued asynchronously, without lock contention.
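A minimal sketch of this "tap" pattern, using `futures::StreamExt::then` and a Tokio mpsc channel; the message type and the printed output are placeholders, not Zebra's real peer messages:

```rust
use futures::stream::{self, StreamExt};
use std::time::Instant;
use tokio::sync::mpsc;

// A stand-in for a decoded peer message; in Zebra the stream comes from a codec.
#[derive(Debug, Clone)]
struct Message(&'static str);

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<Instant>(100);

    // Pretend this is the codec's stream of incoming messages.
    let incoming = stream::iter(vec![Message("ping"), Message("inv")]);

    // "Tap" the stream: record a timestamp on the channel, then pass the
    // message through unchanged to the connection state machine.
    let mut tapped = Box::pin(incoming.then(move |msg| {
        let tx = tx.clone();
        async move {
            // Metadata sends are best-effort; ignore send errors.
            let _ = tx.send(Instant::now()).await;
            msg
        }
    }));

    // A separate task owns the receiver and updates shared state
    // (in Zebra, an Arc<Mutex<AddressBook>>).
    let updater = tokio::spawn(async move {
        while let Some(seen_at) = rx.recv().await {
            println!("peer message seen at {:?}", seen_at);
        }
    });

    while let Some(msg) = tapped.next().await {
        println!("handling {:?}", msg);
    }

    drop(tapped); // drop the last sender handle so the updater task finishes
    updater.await.unwrap();
}
```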
Unlike the address book, we don't need to share the inventory data with other parts of the application, so it can be owned exclusively by the peer set. This means that no lock is necessary, and the peer set can process advertisements in its `poll_ready` implementation. This method may be called infrequently, which could cause the channel to fill. However, because inventory advertisements are time-limited, in the sense that they're only useful before some item is fully distributed across the network, it's safe to handle excess entries by dropping them. This behavior is provided by a broadcast/mpmc channel, which can be used in place of an mpsc channel.
An inventory advertisement is an `(InventoryHash, SocketAddr)` pair. The stream hook should check whether an incoming message is an `inv` message with only a small number (e.g., 1) of inventory entries. If so, it should extract the hash for each item and send it through the channel. Otherwise, it should ignore the message contents. Why? Because `inv` messages are also sent in response to queries, such as when we request subsequent block hashes, and in that case we want to assume that the inventory is generally available rather than restricting downloads to a single peer. However, items are usually gossiped individually (or potentially in small chunks; `zcashd` has an internal `inv` buffer subject to race conditions), so choosing a small bound such as 1 is likely to work as a heuristic for when we should assume that advertised inventory is not yet generally available.
Inventory Registry
The peer set's `poll_ready` implementation should extract all available `(InventoryHash, SocketAddr)` pairs from the channel, and log a warning event if the receiver is lagging. The channel should be configured with a generous buffer size (such as 100) so that this is unlikely to happen in normal circumstances. These pairs should be fed into an `InventoryRegistry` structure along these lines:
```rust
use std::collections::{HashMap, HashSet};
use std::net::SocketAddr;

// `InventoryHash` is Zebra's inventory hash type (not shown here).
struct InventoryRegistry {
    current: HashMap<InventoryHash, HashSet<SocketAddr>>,
    prev: HashMap<InventoryHash, HashSet<SocketAddr>>,
}

impl InventoryRegistry {
    /// Record that `addr` advertised `item`.
    pub fn register(&mut self, item: InventoryHash, addr: SocketAddr) {
        self.current.entry(item).or_default().insert(addr);
    }

    /// Drop the previous generation and demote the current one,
    /// implementing generational pruning of registry entries.
    pub fn rotate(&mut self) {
        self.prev = std::mem::take(&mut self.current);
    }

    /// Iterate over all peers recorded as having advertised `item`.
    pub fn peers(&self, item: InventoryHash) -> impl Iterator<Item = &SocketAddr> {
        self.prev
            .get(&item)
            .into_iter()
            .chain(self.current.get(&item))
            .flatten()
    }
}
```
This API allows pruning the inventory registry using `rotate`, which implements generational pruning of registry entries. The peer set should maintain a `tokio::time::Interval` with some interval parameter, and check in `poll_ready` whether the interval stream has any items, calling `rotate` for each one:
```rust
while let Poll::Ready(Some(_)) = timer.poll_next(cx) {
    registry.rotate();
}
```
By rotating for each available item in the interval stream, rather than just once, we ensure that if the peer set's `poll_ready` is not called for a long time, `rotate` will be called enough times to correctly flush old entries.
Inventory advertisements live in the registry for twice the length of the timer, so it should be chosen to be half of the desired lifetime for inventory advertisements. Setting the timer to 75 seconds, the block interval, seems like a reasonable choice.
Routing Logic
At this point, the peer set has information on recent inventory advertisements. However, the `Service` trait only allows `poll_ready` to report readiness based on the service's data and the type of the request, not the content of the request. This means that we must report readiness without knowing whether the request should be routed to a specific peer, and we must handle the case where `call` gets a request for an item only available at an unready peer.
This RFC suggests the following routing logic. First, check whether the request fetches data by hash. If so, and `peers()` returns `Some(ref addrs)`, iterate over `addrs` and route the request to the first ready peer if there is one. In all other cases, fall back to p2c routing. Alternatives are suggested and discussed below.
Rationale and alternatives
The rationale is described above. The alternative choices are primarily around the routing logic.
The `Service` trait does not allow applying backpressure based on the content of a request, only based on the service's internal data (via the `&mut self` parameter of `Service::poll_ready`) and on the type of the request (which determines which `impl Service` is used). This means that it is impossible for us to apply backpressure until a service that can process a specific inventory request is ready, because until we get the request, we can't determine which peers might be required to process it.
One way to ensure that the peer set would be ready to process a specific inventory request would be to pre-emptively "reserve" a peer as soon as it advertises an inventory item. But this doesn't actually work to ensure readiness, because a peer could advertise two inventory items, and only be able to service one request at a time. It also potentially locks the peer set, since if there are only a few peers and they all advertise inventory, the service can't process any other requests. So this approach does not work.
Another alternative would be to do some kind of buffering of inventory requests that cannot immediately be processed by a peer that advertised that inventory. There are two basic sub-approaches here.
In the first case, we could maintain an unbounded queue of yet-to-be
processed inventory requests in the peer set, and every time poll_ready
is
called, we check whether a service that could serve those inventory requests
became ready, and start processing the request if we can. This would provide
the lowest latency, because we can dispatch the request to the first
available peer. For instance, if peer A advertises inventory I, the peer set
gets an inventory request for I, peer A is busy so the request is queued, and
peer B advertises inventory I, we could dispatch the queued request to B
rather than waiting for A.
However, it's not clear exactly how we'd implement this, because this
mechanism is driven by calls to poll_ready
, and those might not happen. So
we'd need some separate task that would drive processing the buffered task to
completion, but this may not be able to do so by poll_ready
, since that
method requires owning the service, and the peer set will be owned by a
Buffer
worker.
In the second case, we could select an unready peer that advertised the
requested inventory, clone it, and move the cloned peer into a task that
would wait for that peer to become ready and then make the request. This is
conceptually much cleaner than the above mechanism, but it has the downside
that we don't dispatch the request to the first ready peer. In the example
above, if we cloned peer A and dispatched the request to it, we'd have to
wait for A to become ready, even if the second peer B advertised the same
inventory just after we dispatched the request to A. However, this is not
presently possible anyways, because the `peer::Client`s that handle requests
are not clonable. They could be made clonable (they send messages to the
connection state machine over a mpsc channel), but we cannot make this change
without altering our liveness mechanism, which uses bounds on the
time-since-last-message to determine whether a peer connection is live and to
prevent immediate reconnections to recently disconnected peers.
A final alternative would be to fail inventory requests that we cannot route to a peer which advertised that inventory. This moves the failure forward in time, but preemptively fails some cases where the request might succeed -- for instance, if the peer has inventory but just didn't tell us, or received the inventory between when we dispatch the request and when it receives our message. It seems preferable to try and fail than to not try at all.
In practice, we're likely to care about the gossip protocol and inventory fetching once we've already synced close to the chain tip. In this setting, we're likely to already have peer connections, and we're unlikely to be saturating our peer set with requests (as we do during initial block sync). This suggests that the common case is one where we have many idle peers, and that therefore we are unlikely to have dispatched any recent requests to the peer that advertised inventory. So our common case should be one where all of this analysis is irrelevant.
- Start Date: 2020-08-10
- Design PR: ZcashFoundation/zebra#868
- Zebra Issue: ZcashFoundation/zebra#964
Summary
This RFC describes an architecture for asynchronous script verification and its interaction with the state layer. This architecture imposes constraints on the ordering of operations in the state layer.
Motivation
As in the rest of Zebra, we want to express our work as a collection of work-items with explicit dependencies, then execute these items concurrently and in parallel on a thread pool.
Definitions
- UTXO: unspent transaction output. Transaction outputs are modeled in `zebra-chain` by the `transparent::Output` structure.
- Transaction input: an output of a previous transaction consumed by a later transaction (the one it is an input to). Modeled in `zebra-chain` by the `transparent::Input` structure.
- lock script: the script that defines the conditions under which some UTXO can be spent. Stored in the `transparent::Output::lock_script` field.
- unlock script: a script satisfying the conditions of the lock script, allowing a UTXO to be spent. Stored in the `transparent::Input::PrevOut::lock_script` field.
Guide-level explanation
Zcash's transparent address system is inherited from Bitcoin. Transactions spend unspent transaction outputs (UTXOs) from previous transactions. These UTXOs are encumbered by locking scripts that define the conditions under which they can be spent, e.g., requiring a signature from a certain key. Transactions wishing to spend UTXOs supply an unlocking script that should satisfy the conditions of the locking script for each input they wish to spend.
This means that script verification requires access to data about previous UTXOs, in order to determine the conditions under which those UTXOs can be spent. In Zebra, we aim to run operations asynchronously and out-of-order to the greatest extent possible. For instance, we may begin verification of a block before all of its ancestors have been verified or even downloaded. So we need to design a mechanism that allows script verification to declare its data dependencies and execute as soon as all required data is available.
It's not necessary for this mechanism to ensure that the transaction outputs remain unspent, only to give enough information to perform script verification. Checking that all transaction inputs are actually unspent is done later, at the point that its containing block is committed to the chain.
At a high level, this adds a new request/response pair to the state service:
- `Request::AwaitUtxo(OutPoint)` requests a `transparent::Output` specified by `OutPoint` from the state layer;
- `Response::Utxo(transparent::Output)` supplies the requested `transparent::Output`.
Note that this request is named differently from the other requests, `AwaitUtxo` rather than `GetUtxo` or similar. This is because the request has rather different behavior: the request does not complete until the state service learns about a UTXO matching the request, which could be never. For instance, if the transaction output was already spent, the service is not required to return a response. The caller is responsible for using a timeout layer or some other mechanism.
This allows a script verifier to asynchronously obtain information about previous transaction outputs and start verifying scripts as soon as the data is available. For instance, if we begin parallel download and verification of 500 blocks, we should be able to begin script verification of all scripts referencing outputs from existing blocks in parallel, and begin verification of scripts referencing outputs from new blocks as soon as they are committed to the chain.
Because spending outputs from older blocks is more common than spending outputs from recent blocks, this should allow a significant amount of parallelism.
Reference-level explanation
We add a Request::AwaitUtxo(OutPoint)
and
Response::Utxo(transparent::Output)
to the state protocol. As described
above, the request name is intended to indicate the request's behavior: the
request does not resolve until the state layer learns of a UTXO described by
the request.
To verify scripts, a script verifier requests the relevant UTXOs from the
state service and waits for all of them to resolve, or fails verification
with a timeout error. Currently, we outsource script verification to
zcash_consensus
, which does FFI into the same C++ code as zcashd
uses.
We need to ensure this code is thread-safe.
Implementing the state request correctly requires considering two sets of behaviors:
- behaviors related to the state's external API (a Buffered tower::Service);
- behaviors related to the state's internal implementation (using rocksdb).
Making this distinction helps us to ensure we don't accidentally leak
"internal" behaviors into "external" behaviors, which would violate
encapsulation and make it more difficult to replace rocksdb
.
In the first category, our state is presented to the rest of the application
as a Buffer
ed tower::Service
. The Buffer
wrapper allows shared access
to a service using an actor model, moving the service to be shared into a
worker task and passing messages to it over a multi-producer single-consumer
(mpsc) channel. The worker task receives messages and makes Service::call
s.
The Service::call
method returns a Future
, and the service is allowed to
decide how much work it wants to do synchronously (in call
) and how much
work it wants to do asynchronously (in the Future
it returns).
This means that our external API ensures that the state service sees a linearized sequence of state requests, although the exact ordering is unpredictable when there are multiple senders making requests.
Because the state service has exclusive access to the rocksdb database, and the
state service sees a linearized sequence of state requests, we have an easy
way to opt in to asynchronous database access. We can perform rocksdb operations
synchronously in the Service::call
, waiting for them to complete, and be
sure that all future requests will see the resulting rocksdb state. Or, we can
perform rocksdb operations asynchronously in the future returned by
Service::call
.
If we perform all writes synchronously and allow reads to be either synchronous or asynchronous, we ensure that writes cannot race each other. Asynchronous reads are guaranteed to read at least the state present at the time the request was processed, or a later state.
Now, returning to the UTXO lookup problem, we can map out the possible states with this restriction in mind. This description assumes that UTXO storage is split into disjoint sets, one in-memory (e.g., blocks above the reorg limit) and the other in rocksdb (e.g., blocks below the reorg limit). The details of this storage are not important for this design, only that the two sets are disjoint.
When the state service processes a Request::AwaitUtxo(OutPoint)
referencing
some UTXO u
, there are three disjoint possibilities:
1. u is already contained in an in-memory block storage;
2. u is already contained in the rocksdb UTXO set;
3. u is not yet known to the state service.
In case 3, we need to queue u
and scan all future blocks to see whether
they contain u
. However, if we have a mechanism to queue u
, we can
perform check 2 asynchronously, because restricting to synchronous writes
means that any async read will return the current or later state. If u
was
in the rocksdb UTXO set when the request was processed, the only way that an
async read would not return u
is if the UTXO were spent, in which case the
service is not required to return a response.
This behavior can be encapsulated into a PendingUtxos
structure described below.
// sketch
#[derive(Default, Debug)]
struct PendingUtxos(HashMap<OutPoint, oneshot::Sender<transparent::Output>>);

impl PendingUtxos {
    // adds the outpoint and returns the (wrapped) rx end of the oneshot;
    // the return value can be converted to `Service::Future`
    pub fn queue(&mut self, outpoint: OutPoint) -> impl Future<Output=Result<Response, ...>>;

    // if outpoint is a hashmap key, remove the entry and send output on the channel
    pub fn respond(&mut self, outpoint: OutPoint, output: transparent::Output);

    // scans the hashmap and removes any entries with closed senders
    pub fn prune(&mut self);
}
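The oneshot mechanics behind this sketch could be filled in roughly as follows. This is a hedged sketch assuming tokio's oneshot channels, a boxed error type, and the OutPoint, transparent::Output, and Response types used above; it is not the exact zebra-state code:

```rust
use std::collections::HashMap;
use std::future::Future;
use tokio::sync::oneshot;

type BoxError = Box<dyn std::error::Error + Send + Sync + 'static>;

#[derive(Default, Debug)]
struct PendingUtxos(HashMap<OutPoint, oneshot::Sender<transparent::Output>>);

impl PendingUtxos {
    /// Queue a request for `outpoint`, returning a future that resolves once a
    /// matching output is learned (or errors if the sender is dropped).
    pub fn queue(
        &mut self,
        outpoint: OutPoint,
    ) -> impl Future<Output = Result<Response, BoxError>> {
        let (tx, rx) = oneshot::channel();
        self.0.insert(outpoint, tx);
        async move { rx.await.map(Response::Utxo).map_err(BoxError::from) }
    }

    /// If `outpoint` has a pending request, remove it and send the output.
    pub fn respond(&mut self, outpoint: &OutPoint, output: transparent::Output) {
        if let Some(tx) = self.0.remove(outpoint) {
            // The receiver may have been dropped by a timeout layer; ignore that case.
            let _ = tx.send(output);
        }
    }

    /// Drop entries whose receivers were cancelled (e.g. by a timeout layer).
    pub fn prune(&mut self) {
        self.0.retain(|_, sender| !sender.is_closed());
    }
}
```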
The state service should maintain an Arc<Mutex<PendingUtxos>>
, used as follows:
- In Service::call(Request::AwaitUtxo(u)), the service should:
  - call PendingUtxos::queue(u) to get a future f to return to the caller;
  - spawn a task that does a rocksdb lookup for u, calling PendingUtxos::respond(u, output) if present;
  - check the in-memory storage for u, calling PendingUtxos::respond(u, output) if present;
  - return f to the caller (it may already be ready).
  The common case is that u references an old UTXO, so spawning the lookup task first means that we don't wait to check in-memory storage for u before starting the rocksdb lookup.
- In Service::call(Request::CommitBlock(block, ..)), the service should:
  - call PendingUtxos::check_block(block.as_ref());
  - do any other transactional checks before committing a block as normal.
  Because the AwaitUtxo request is informational, there's no need to do the transactional checks before matching against pending UTXO requests, and doing so upfront potentially notifies other tasks earlier.
- In Service::poll_ready(), the service should call PendingUtxos::prune() at least some of the time. This is required because when a consumer uses a timeout layer, the cancelled requests should be flushed from the queue to avoid a resource leak. However, doing this on every call will result in us spending a bunch of time iterating over the hashmap.
Drawbacks
One drawback of this design is that we may have to wait on a lock. However, the critical section basically amounts to a hash lookup and a channel send, so I don't think that we're likely to run into problems with long contended periods, and it's unlikely that we would get a deadlock.
Rationale and alternatives
High-level design rationale is given inline with the design sketch above. One low-level
option would be to avoid encapsulating behavior in the PendingUtxos
and
just have an Arc<Hashmap<..>>
, so that the lock only protects the hashmap
lookup and not sending through the channel. But I think the current design is
cleaner and the cost is probably not too large.
Unresolved questions
- We need to pick a timeout for UTXO lookup. This should be long enough to account for the fact that we may start verifying blocks before all of their ancestors are downloaded.
State Updates
- Feature Name: state_updates
- Start Date: 2020-08-14
- Design PR: https://github.com/ZcashFoundation/zebra/pull/902
- Zebra Issue: https://github.com/ZcashFoundation/zebra/issues/1049
Summary
Zebra manages chain state in the zebra-state
crate, which allows state
queries via asynchronous RPC (in the form of a Tower service). The state
system is responsible for contextual verification in the sense of RFC2,
checking that new blocks are consistent with the existing chain state before
committing them. This RFC describes how the state is represented internally,
and how state updates are performed.
Motivation
We need to be able to access and modify the chain state, and we want to have a description of how this happens and what guarantees are provided by the state service.
Definitions
-
state data: Any data the state service uses to represent chain state.
-
structural/semantic/contextual verification: as defined in RFC2.
-
block chain: A sequence of valid blocks linked by inclusion of the previous block hash in the subsequent block. Chains are rooted at the genesis block and extend to a tip.
-
chain state: The state of the ledger after application of a particular sequence of blocks (state transitions).
-
block work: The approximate amount of work required for a miner to generate a block hash that passes the difficulty filter. The number of block header attempts and the mining time are proportional to the work value. Numerically higher work values represent longer processing times.
-
cumulative work: The sum of the block work of all blocks in a chain, from genesis to the chain tip.
-
best chain: The chain with the greatest cumulative work. This chain represents the consensus state of the Zcash network and transactions.
-
side chain: A chain which is not contained in the best chain. Side chains are pruned at the reorg limit, when they are no longer connected to the finalized state.
-
chain reorganization: Occurs when a new best chain is found and the previous best chain becomes a side chain.
-
reorg limit: The longest reorganization accepted by
zcashd
, 100 blocks. -
orphaned block: A block which is no longer included in the best chain.
-
non-finalized state: State data corresponding to blocks above the reorg limit. This data can change in the event of a chain reorg.
-
finalized state: State data corresponding to blocks below the reorg limit. This data cannot change in the event of a chain reorg.
-
non-finalized tips: The highest blocks in each non-finalized chain. These tips might be at different heights.
-
finalized tip: The highest block in the finalized state. The tip of the best chain is usually 100 blocks (the reorg limit) above the finalized tip. But it can be lower during the initial sync, and after a chain reorganization, if the new best chain is at a lower height.
-
relevant chain: The relevant chain for a block starts at the previous block, and extends back to genesis.
-
relevant tip: The tip of the relevant chain.
Guide-level explanation
The zebra-state
crate provides an implementation of the chain state storage
logic in a Zcash consensus node. Its main responsibility is to store chain
state, validating new blocks against the existing chain state in the process,
and to allow later querying of said chain state. zebra-state
provides this
interface via a tower::Service
based on the actor model with a
request/response interface for passing messages back and forth between the
state service and the rest of the application.
The main entry point for the zebra-state
crate is the init
function. This
function takes a zebra_state::Config
and constructs a new state service,
which it returns wrapped by a tower::Buffer
. This service is then interacted
with via the tower::Service
trait.
use tower::{Service, ServiceExt};

let state = zebra_state::on_disk::init(state_config, network);
let request = zebra_state::Request::BlockLocator;
let response = state.ready_and().await?.call(request).await?;

assert!(matches!(response, zebra_state::Response::BlockLocator(_)));
Note: The tower::Service
API requires that ready
is always called
exactly once before each call
. It is up to users of the zebra state service
to uphold this contract.
The tower::Buffer
wrapper is Clone
able, allowing shared access to a common state service. This allows different tasks to share access to the chain state.
The set of operations supported by zebra-state
are encoded in its Request
enum. This enum has one variant for each supported operation.
pub enum Request {
    CommitBlock {
        block: Arc<Block>,
    },
    CommitFinalizedBlock {
        block: Arc<Block>,
    },
    Depth(Hash),
    Tip,
    BlockLocator,
    Transaction(Hash),
    Block(HashOrHeight),
    // .. some variants omitted
}
zebra-state
breaks down its requests into two categories and provides
different guarantees for each category: requests that modify the state, and
requests that do not. Requests that update the state are guaranteed to run
sequentially and will never race against each other. Requests that read state
are done asynchronously and are guaranteed to read at least the state present
at the time the request was processed by the service, or a later state
present at the time the request future is executed. The state service avoids
race conditions between the read state and the written state by doing all
contextual verification internally.
Reference-level explanation
State Components
Zcash (as implemented by zcashd
) differs from Bitcoin in its treatment of
transaction finality. If a new best chain is detected that does not extend
the previous best chain, blocks at the end of the previous best chain become
orphaned (no longer included in the best chain). Their state updates are
therefore no longer included in the best chain's chain state. The process of
rolling back orphaned blocks and applying new blocks is called a chain
reorganization. Bitcoin allows chain reorganizations of arbitrary depth,
while zcashd
limits chain reorganizations to 100 blocks. (In zcashd
, the
new best chain must be a side-chain that forked within 100 blocks of the tip
of the current best chain.)
This difference means that in Bitcoin, chain state only has probabilistic finality, while in Zcash, chain state is final once it is beyond the reorg limit. To simplify our implementation, we split the representation of the state data at the finality boundary provided by the reorg limit.
State data from blocks above the reorg limit (non-finalized state) is
stored in-memory and handles multiple chains. State data from blocks below
the reorg limit (finalized state) is stored persistently using rocksdb
and
only tracks a single chain. This allows a simplification of our state
handling, because only finalized data is persistent and the logic for
finalized data handles fewer invariants.
One downside of this design is that restarting the node loses the last 100 blocks, but node restarts are relatively infrequent and a short re-sync is cheap relative to the cost of additional implementation complexity.
Another downside of this design is that we do not achieve exactly the same
behavior as zcashd
in the event of a 51% attack: zcashd
limits each chain
reorganization to 100 blocks, but permits multiple reorgs, while Zebra limits
all chain reorgs to 100 blocks. In the event of a successful 51% attack on
Zcash, this could be resolved by wiping the rocksdb state and re-syncing the new
chain, but in this scenario there are worse problems.
Service Interface
The state is accessed asynchronously through a Tower service interface. Determining what guarantees the state service can and should provide to the rest of the application requires considering two sets of behaviors:
- behaviors related to the state's external API (a Buffered tower::Service);
- behaviors related to the state's internal implementation (using rocksdb).
Making this distinction helps us to ensure we don't accidentally leak
"internal" behaviors into "external" behaviors, which would violate
encapsulation and make it more difficult to replace rocksdb
.
In the first category, our state is presented to the rest of the application
as a Buffer
ed tower::Service
. The Buffer
wrapper allows shared access
to a service using an actor model, moving the service to be shared into a
worker task and passing messages to it over a multi-producer single-consumer
(mpsc) channel. The worker task receives messages and makes Service::call
s.
The Service::call
method returns a Future
, and the service is allowed to
decide how much work it wants to do synchronously (in call
) and how much
work it wants to do asynchronously (in the Future
it returns).
This means that our external API ensures that the state service sees a linearized sequence of state requests, although the exact ordering is unpredictable when there are multiple senders making requests.
Because the state service has exclusive access to the rocksdb database, and the
state service sees a linearized sequence of state requests, we have an easy
way to opt in to asynchronous database access. We can perform rocksdb operations
synchronously in the Service::call
, waiting for them to complete, and be
sure that all future requests will see the resulting rocksdb state. Or, we can
perform rocksdb operations asynchronously in the future returned by
Service::call
.
If we perform all writes synchronously and allow reads to be either synchronous or asynchronous, we ensure that writes cannot race each other. Asynchronous reads are guaranteed to read at least the state present at the time the request was processed, or a later state.
Summary
- rocksdb reads may be done synchronously (in call) or asynchronously (in the Future), depending on the context;
- rocksdb writes must be done synchronously (in call).
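As a concrete illustration of this split, here is a minimal sketch assuming a shared Arc<rocksdb::DB> handle and a boxed error type; it is not the zebra-state code:

```rust
use std::future::Future;
use std::sync::Arc;

type BoxError = Box<dyn std::error::Error + Send + Sync + 'static>;

/// Writes happen synchronously, inside `call`, so every later request
/// observes the committed state.
fn write_synchronously(db: &Arc<rocksdb::DB>, key: &[u8], value: &[u8]) -> Result<(), BoxError> {
    db.put(key, value)?;
    Ok(())
}

/// Reads may be deferred into the returned future; they observe the state at
/// the time the request was processed, or a later state.
fn read_asynchronously(
    db: &Arc<rocksdb::DB>,
    key: Vec<u8>,
) -> impl Future<Output = Result<Option<Vec<u8>>, BoxError>> {
    let db = db.clone();
    async move { Ok(db.get(&key)?.map(|bytes| bytes.to_vec())) }
}
```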
In-memory data structures
At a high level, the in-memory data structures store a collection of chains, each rooted at the highest finalized block. Each chain consists of a map from heights to blocks. Chains are stored using an ordered map from cumulative work to chains, so that the map ordering is the ordering of worst to best chains.
The Chain
type
The Chain
type represents a chain of blocks. Each block represents an
incremental state update, and the Chain
type caches the cumulative state
update from its root to its tip.
The Chain
type is used to represent the non-finalized portion of a complete
chain of blocks rooted at the genesis block. The parent block of the root of
a Chain
is the tip of the finalized portion of the chain. As an exception, the finalized
portion of the chain is initially empty, until the genesis block has been finalized.
The Chain
type supports several operations to manipulate chains, push
,
pop_root
, and fork
. push
is the most fundamental operation and handles
contextual validation of chains as they are extended. pop_root
is provided
for finalization, and is how we move blocks from the non-finalized portion of
the state to the finalized portion. fork
on the other hand handles creating
new chains for push
when new blocks arrive whose parent isn't a tip of an
existing chain.
Note: The Chain
type's API is only designed to handle non-finalized
data. The genesis block and all pre-Sapling blocks are always considered to
be finalized blocks and should not be handled via the Chain
type through
CommitBlock
. They should instead be committed directly to the finalized
state with CommitFinalizedBlock
. This is particularly important with the
genesis block since the Chain
will panic if used while the finalized state
is completely empty.
The Chain
type is defined by the following struct and API:
#[derive(Debug, Default, Clone)]
struct Chain {
    blocks: BTreeMap<block::Height, Arc<Block>>,
    height_by_hash: HashMap<block::Hash, block::Height>,
    tx_by_hash: HashMap<transaction::Hash, (block::Height, usize)>,

    created_utxos: HashSet<transparent::OutPoint>,
    spent_utxos: HashSet<transparent::OutPoint>,
    sprout_anchors: HashSet<sprout::tree::Root>,
    sapling_anchors: HashSet<sapling::tree::Root>,
    sprout_nullifiers: HashSet<sprout::Nullifier>,
    sapling_nullifiers: HashSet<sapling::Nullifier>,
    partial_cumulative_work: PartialCumulativeWork,
}
pub fn push(&mut self, block: Arc<Block>)
Push a block into a chain as the new tip
-
Update cumulative data members
- Add the block's hash to
height_by_hash
- Add work to
self.partial_cumulative_work
- For each
transaction
inblock
- Add key:
transaction.hash
and value:(height, tx_index)
totx_by_hash
- Add created utxos to
self.created_utxos
- Add spent utxos to
self.spent_utxos
- Add nullifiers to the appropriate
self.<version>_nullifiers
-
Add block to
self.blocks
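A minimal sketch of push along the steps above, assuming the struct sketched earlier and zebra-chain helpers such as Block::hash, Block::coinbase_height, and Transaction::hash; the cumulative work, UTXO, and nullifier updates are left as comments:

```rust
impl Chain {
    /// Push `block` onto this chain as the new tip.
    pub fn push(&mut self, block: Arc<Block>) {
        let height = block
            .coinbase_height()
            .expect("valid non-finalized blocks have a coinbase height");

        // Update cumulative data members.
        self.height_by_hash.insert(block.hash(), height);
        // ... add the block's work to self.partial_cumulative_work here ...

        for (tx_index, transaction) in block.transactions.iter().enumerate() {
            self.tx_by_hash
                .insert(transaction.hash(), (height, tx_index));
            // ... add created/spent UTXOs and nullifiers to the matching sets ...
        }

        // Finally, add the block itself.
        self.blocks.insert(height, block);
    }
}
```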
pub fn pop_root(&mut self) -> Arc<Block>
Remove the lowest height block of the non-finalized portion of a chain.
-
Remove the lowest height block from
self.blocks
-
Update cumulative data members
- Remove the block's hash from
self.height_by_hash
- Subtract work from
self.partial_cumulative_work
- For each
transaction
inblock
- Remove
transaction.hash
fromtx_by_hash
- Remove created utxos from
self.created_utxos
- Remove spent utxos from
self.spent_utxos
- Remove the nullifiers from the appropriate
self.<version>_nullifiers
-
Return the block
pub fn fork(&self, new_tip: block::Hash) -> Option<Self>
Fork a chain at the block with the given hash, if it is part of this chain.
-
If
self
does not containnew_tip
returnNone
-
Clone self as
forked
-
While the tip of
forked
is not equal tonew_tip
- call
forked.pop_tip()
and discard the old tip
- call
-
Return
forked
fn pop_tip(&mut self)
Remove the highest height block of the non-finalized portion of a chain.
-
Remove the highest height
block
fromself.blocks
-
Update cumulative data members
- Remove the corresponding hash from
self.height_by_hash
- Subtract work from
self.partial_cumulative_work
- for each
transaction
inblock
- remove
transaction.hash
fromtx_by_hash
- Remove created utxos from
self.created_utxos
- Remove spent utxos from
self.spent_utxos
- Remove the nullifiers from the appropriate
self.<version>_nullifiers
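Putting fork and pop_tip together, a minimal sketch using the field names from the struct sketch above and an assumed Block::hash helper; not the exact zebra-state code:

```rust
impl Chain {
    /// Fork this chain at the block with hash `new_tip`, if it is part of the chain.
    pub fn fork(&self, new_tip: block::Hash) -> Option<Self> {
        // The fork point must already be in this chain.
        if !self.height_by_hash.contains_key(&new_tip) {
            return None;
        }

        let mut forked = self.clone();

        // Pop blocks (rolling back their cumulative data) until `new_tip`
        // is the highest block of the forked chain.
        while forked
            .blocks
            .values()
            .next_back()
            .map(|block| block.hash())
            != Some(new_tip)
        {
            forked.pop_tip();
        }

        Some(forked)
    }
}
```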
Ord
The Chain
type implements Ord
for reorganizing chains. First chains are
compared by their partial_cumulative_work
. Ties are then broken by
comparing block::Hash
es of the tips of each chain. (This tie-breaker means
that all Chain
s in the NonFinalizedState
must have at least one block.)
Note: Unlike zcashd
, Zebra does not use block arrival times as a
tie-breaker for the best tip. Since Zebra downloads blocks in parallel,
download times are not guaranteed to be unique. Using the block::Hash
provides a consistent tip order. (As a side-effect, the tip order is also
consistent after a node restart, and between nodes.)
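A minimal sketch of this ordering, assuming PartialCumulativeWork and block::Hash implement Ord and that every Chain is non-empty; the PartialOrd/PartialEq/Eq impls are derived from cmp for consistency:

```rust
use std::cmp::Ordering;

impl Ord for Chain {
    fn cmp(&self, other: &Self) -> Ordering {
        self.partial_cumulative_work
            .cmp(&other.partial_cumulative_work)
            // Break ties with the hash of each chain's tip block, which is why
            // every Chain in the NonFinalizedState must have at least one block.
            .then_with(|| self.tip_hash().cmp(&other.tip_hash()))
    }
}

impl PartialOrd for Chain {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Chain {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Chain {}

impl Chain {
    /// Hash of this chain's highest block (assumed non-empty).
    fn tip_hash(&self) -> block::Hash {
        self.blocks
            .values()
            .next_back()
            .expect("Chain has at least one block")
            .hash()
    }
}
```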
Default
The Chain
type implements Default
for constructing new chains whose
parent block is the tip of the finalized state. This implementation should be
handled by #[derive(Default)]
.
- initialise cumulative data members
- Construct an empty
self.blocks
,height_by_hash
,tx_by_hash
,self.created_utxos
,self.spent_utxos
,self.<version>_anchors
,self.<version>_nullifiers
- Zero
self.partial_cumulative_work
Note: The ChainState
can be empty after a restart, because the
non-finalized state is empty.
NonFinalizedState
Type
The NonFinalizedState
type represents the set of all non-finalized state.
It consists of a set of non-finalized but verified chains and a set of
unverified blocks which are waiting for the full context needed to verify
them to become available.
NonFinalizedState
is defined by the following structure and API:
/// The state of the chains in memory, including queued blocks.
#[derive(Debug, Default)]
pub struct NonFinalizedState {
    /// Verified, non-finalized chains.
    chain_set: BTreeSet<Chain>,
    /// Blocks awaiting their parent blocks for contextual verification.
    contextual_queue: QueuedBlocks,
}
pub fn finalize(&mut self) -> Arc<Block>
Finalize the lowest height block in the non-finalized portion of the best chain and update all side chains to match.
-
Extract the best chain from
self.chain_set
intobest_chain
-
Extract the rest of the chains into a
side_chains
temporary variable, so they can be mutated -
Remove the lowest height block from the best chain with
let finalized_block = best_chain.pop_root();
-
Add
best_chain
back toself.chain_set
ifbest_chain
is not empty -
For each remaining
chain
inside_chains
- remove the lowest height block from
chain
- If that block is equal to
finalized_block
andchain
is not empty addchain
back toself.chain_set
- Else, drop
chain
-
Return
finalized_block
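A minimal sketch of finalize along these steps, assuming the Chain API above and a Block::hash helper; not the exact zebra-state code:

```rust
impl NonFinalizedState {
    /// Finalize the lowest block of the best chain, pruning side chains that
    /// disagree with it.
    pub fn finalize(&mut self) -> Arc<Block> {
        // Take the chains out of the set so they can be mutated.
        let mut chains: Vec<Chain> = std::mem::take(&mut self.chain_set).into_iter().collect();

        // The set is ordered worst to best, so the best chain is last.
        let mut best_chain = chains.pop().expect("there is at least one chain");
        let finalized_block = best_chain.pop_root();
        if !best_chain.blocks.is_empty() {
            self.chain_set.insert(best_chain);
        }

        // Keep only the side chains that also start with the finalized block.
        for mut side_chain in chains {
            if side_chain.pop_root().hash() == finalized_block.hash()
                && !side_chain.blocks.is_empty()
            {
                self.chain_set.insert(side_chain);
            }
        }

        finalized_block
    }
}
```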
fn commit_block(&mut self, block: Arc<Block>)
Commit block
to the non-finalized state.
- If the block is a pre-Sapling block, panic.
- If any chain's tip hash equals block.header.previous_block_hash, remove that chain from self.chain_set and use it as parent_chain
- Else find the first chain that contains block.parent and fork it with block.parent as the new tip, using the fork as parent_chain
  let fork = self.chain_set.iter().find_map(|chain| chain.fork(block.parent));
- Else panic; this should be unreachable because commit_block is only called when block is ready to be committed.
- Push block into parent_chain
- Insert parent_chain into self.chain_set
pub(super) fn commit_new_chain(&mut self, block: Arc<Block>)
Construct a new chain starting with block
.
-
Construct a new empty chain
-
push
block
into that new chain -
Insert the new chain into
self.chain_set
The QueuedBlocks
type
The queued blocks type represents the non-finalized blocks whose commit requests arrived before their parent blocks were committed. It is responsible for tracking which blocks are queued by their parent so they can be committed immediately after the parent is committed. It also tracks blocks by their height so they can be discarded if they ever end up below the reorg limit.
QueuedBlocks is defined by the following structure and API:
/// A queue of blocks, awaiting the arrival of parent blocks.
#[derive(Debug, Default)]
struct QueuedBlocks {
    /// Blocks awaiting their parent blocks for contextual verification.
    blocks: HashMap<block::Hash, QueuedBlock>,
    /// Hashes from `queued_blocks`, indexed by parent hash.
    by_parent: HashMap<block::Hash, Vec<block::Hash>>,
    /// Hashes from `queued_blocks`, indexed by block height.
    by_height: BTreeMap<block::Height, Vec<block::Hash>>,
}
pub fn queue(&mut self, new: QueuedBlock)
Add a block to the queue of blocks waiting for their requisite context to become available.
-
extract the
parent_hash
,new_hash
, andnew_height
fromnew.block
-
Add
new
toself.blocks
usingnew_hash
as the key -
Add
new_hash
to the set of hashes inself.by_parent.entry(parent_hash).or_default()
-
Add
new_hash
to the set of hashes inself.by_height.entry(new_height).or_default()
pub fn dequeue_children(&mut self, parent: block::Hash) -> Vec<QueuedBlock>
Dequeue the set of blocks waiting on parent
.
-
Remove the set of hashes waiting on
parent
fromself.by_parent
-
Remove and collect each block in that set of hashes from
self.blocks
asqueued_children
-
For each
block
inqueued_children
remove the associatedblock.hash
fromself.by_height
-
Return
queued_children
pub fn prune_by_height(&mut self, finalized_height: block::Height)
Prune all queued blocks whose height is less than or equal to
finalized_height
.
- Split the by_height list at the finalized height, removing all heights that are at or below finalized_height
- For each hash in the removed values of by_height:
  - remove the corresponding block from self.blocks
  - remove the block's hash from the list of blocks waiting on block.header.previous_block_hash in self.by_parent
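A minimal sketch of prune_by_height, assuming block::Height is an ordered u32 newtype and that QueuedBlock exposes its block; not the exact zebra-state code:

```rust
impl QueuedBlocks {
    /// Prune all queued blocks at or below `finalized_height`.
    pub fn prune_by_height(&mut self, finalized_height: block::Height) {
        // split_off keeps the keys >= its argument in the returned map, so after
        // the swap `pruned` holds every height <= finalized_height.
        let keep = self.by_height.split_off(&block::Height(finalized_height.0 + 1));
        let pruned = std::mem::replace(&mut self.by_height, keep);

        for hash in pruned.into_values().flatten() {
            // Remove the queued block itself ...
            if let Some(queued) = self.blocks.remove(&hash) {
                // ... and its entry in the parent index.
                let parent = queued.block.header.previous_block_hash;
                if let Some(siblings) = self.by_parent.get_mut(&parent) {
                    siblings.retain(|h| *h != hash);
                    if siblings.is_empty() {
                        self.by_parent.remove(&parent);
                    }
                }
            }
        }
    }
}
```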
Summary
-
Chain
represents the non-finalized portion of a single chain -
NonFinalizedState
represents the non-finalized portion of all chains -
QueuedBlocks
represents all unverified blocks that are waiting for context to be available.
The state service uses the following entry points:
-
commit_block
when it receives new blocks. -
finalize
to prevent chains inNonFinalizedState
from growing beyond the reorg limit. -
FinalizedState.queue_and_commit_finalized_blocks on the blocks returned by
finalize
, to commit those finalized blocks to disk.
Committing non-finalized blocks
New non-finalized blocks are committed as follows:
pub(super) fn queue_and_commit_non_finalized_blocks(&mut self, new: Arc<Block>) -> tokio::sync::oneshot::Receiver<block::Hash>
-
If a duplicate block hash exists in a non-finalized chain, or the finalized chain, it has already been successfully verified:
- create a new oneshot channel
- immediately send Err(DuplicateBlockHash), then drop the sender
- return the receiver
-
If a duplicate block hash exists in the queue:
- Find the
QueuedBlock
for that existing duplicate block - create a new channel for the new request
- replace the old sender in
queued_block
with the new sender - send
Err(DuplicateBlockHash)
through the old sender channel - continue to use the new receiver
-
Else create a
QueuedBlock
forblock
:- Create a
tokio::sync::oneshot
channel - Use that channel to create a
QueuedBlock
forblock
- Add
block
toself.queued_blocks
- continue to use the new receiver
-
If
block.header.previous_block_hash
is not present in the finalized or non-finalized state:- Return the receiver for the block's channel
-
Else iteratively attempt to process queued blocks by their parent hash starting with
block.header.previous_block_hash
-
While there are recently committed parent hashes to process
- Dequeue all blocks waiting on
parent
withlet queued_children = self.queued_blocks.dequeue_children(parent);
- for each queued
block
- Run contextual validation on
block
- contextual validation should check that the block height is equal to the previous block height plus 1. This check will reject blocks with invalid heights.
- If the block fails contextual validation send the result to the associated channel
- Else if the block's previous hash is the finalized tip add to the
non-finalized state with
self.mem.commit_new_chain(block)
- Else add the new block to an existing non-finalized chain or new fork
with
self.mem.commit_block(block);
- Send
Ok(hash)
over the associated channel to indicate the block was successfully committed - Add
block.hash
to the set of recently committed parent hashes to process
-
While the length of the non-finalized portion of the best chain is greater than the reorg limit
- Remove the lowest height block from the non-finalized state with
self.mem.finalize();
- Commit that block to the finalized state with
self.disk.commit_finalized_direct(finalized);
-
Prune orphaned blocks from
self.queued_blocks
withself.queued_blocks.prune_by_height(finalized_height);
-
Return the receiver for the block's channel
rocksdb data structures
rocksdb provides a persistent, thread-safe BTreeMap<&[u8], &[u8]>
. Each map is
a distinct "tree". Keys are sorted using lex order on byte strings, so
integer values should be stored using big-endian encoding (so that the lex
order on byte strings is the numeric ordering).
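The effect of the encoding choice can be checked directly (a standalone illustration, not zebra code):

```rust
/// Big-endian byte keys sort in the same order as the integers they encode.
fn be32(height: u32) -> [u8; 4] {
    height.to_be_bytes()
}

fn main() {
    // Lexicographic comparison of big-endian keys matches numeric order ...
    assert!(be32(255) < be32(256));
    // ... while little-endian keys would sort incorrectly.
    assert!(255u32.to_le_bytes() > 256u32.to_le_bytes());
}
```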
We use the following rocksdb column families:
Tree | Keys | Values |
---|---|---|
hash_by_height | BE32(height) | block::Hash |
height_by_hash | block::Hash | BE32(height) |
block_by_height | BE32(height) | Block |
tx_by_hash | transaction::Hash | (BE32(height) \|\| BE32(tx_index)) |
utxo_by_outpoint | OutPoint | TransparentOutput |
sprout_nullifiers | sprout::Nullifier | () |
sapling_nullifiers | sapling::Nullifier | () |
sprout_anchors | sprout::tree::Root | () |
sapling_anchors | sapling::tree::Root | () |
Zcash structures are encoded using ZcashSerialize
/ZcashDeserialize
.
Note: We do not store the cumulative work for the finalized chain, because the finalized work is equal for all non-finalized chains. So the additional non-finalized work can be used to calculate the relative chain order, and choose the best chain.
Notes on rocksdb column families
-
The
hash_by_height
andheight_by_hash
column families provide a bijection between block heights and block hashes. (Since the rocksdb state only stores finalized state, they are actually a bijection). -
The
block_by_height
column family provides a bijection between block heights and block data. There is no correspondingheight_by_block
column family: instead, hash the block, and useheight_by_hash
. (Since the rocksdb state only stores finalized state, they are actually a bijection). -
Blocks are stored by height, not by hash. This has the downside that looking up a block by hash requires an extra level of indirection. The upside is that blocks with adjacent heights are adjacent in the database, and many common access patterns, such as helping a client sync the chain or doing analysis, access blocks in (potentially sparse) height order. In addition, the fact that we commit blocks in order means we're writing only to the end of the rocksdb column family, which may help save space.
-
Transaction references are stored as a
(height, index)
pair referencing the height of the transaction's parent block and the transaction's index in that block. This would more traditionally be a(hash, index)
pair, but because we store blocks by height, storing the height saves one level of indirection.
Committing finalized blocks
If the parent block is not committed, add the block to an internal queue for future processing. Otherwise, commit the block described below, then commit any queued children. (Although the checkpointer generates verified blocks in order when it completes a checkpoint, the blocks are committed in the response futures, so they may arrive out of order).
Committing a block to the rocksdb state should be implemented as a wrapper around
a function also called by Request::CommitBlock
,
which should:
pub(super) fn queue_and_commit_finalized_blocks(&mut self, queued_block: QueuedBlock)
- Obtain the highest entry of
hash_by_height
as(old_height, old_tip)
. Check thatblock
's parent hash isold_tip
and its height isold_height+1
, or panic. This check is performed as defense-in-depth to prevent database corruption, but it is the caller's responsibility (e.g. the zebra-state service's responsibility) to commit finalized blocks in order.
The genesis block does not have a parent block. For genesis blocks,
check that block
's parent hash is null
(all zeroes) and its height is 0
.
-
Insert:
(hash, height)
intoheight_by_hash
;(height, hash)
intohash_by_height
;(height, block)
intoblock_by_height
.
-
If the block is a genesis block, skip any transaction updates.
(Due to a bug in zcashd, genesis block anchors and transactions are ignored during validation.)
-
Update the
sprout_anchors
andsapling_anchors
trees with the Sprout and Sapling anchors. -
Iterate over the enumerated transactions in the block. For each transaction:
-
Insert
(transaction_hash, BE32(block_height) || BE32(tx_index))
totx_by_hash
; -
For each
TransparentInput::PrevOut { outpoint, .. }
in the transaction's inputs(), remove outpoint from utxo_by_outpoint.
For each
output
in the transaction's outputs(), construct the outpoint that identifies it, and insert (outpoint, output) into utxo_by_outpoint.
For each
JoinSplit
description in the transaction, insert(nullifiers[0],())
and(nullifiers[1],())
intosprout_nullifiers
. -
For each
Spend
description in the transaction, insert(nullifier,())
intosapling_nullifiers
.
-
Note: The Sprout and Sapling anchors are the roots of the Sprout and
Sapling note commitment trees that have already been calculated for the last
transaction(s) in the block that have JoinSplit
s in the Sprout case and/or
Spend
/Output
descriptions in the Sapling case. These should be passed as
fields in the Commit*Block
requests.
Due to the coinbase maturity rules, the Sprout root is the empty root
for the first 100 blocks. (These rules are already implemented in contextual
validation and the anchor calculations.) Therefore, zcashd
's genesis bug is
irrelevant for the mainnet and testnet chains.
Hypothetically, if Sapling were activated from genesis, the specification requires
a Sapling anchor, but zcashd
would ignore that anchor.
These updates can be performed in a batch or without necessarily iterating over all transactions, if the data is available by other means; they're specified this way for clarity.
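For example, the per-block index updates could be gathered into a single atomic rocksdb write batch. This is a hedged sketch assuming the Rust rocksdb crate and that the column families in the table above have already been opened on db; exact batch APIs differ between crate versions:

```rust
use rocksdb::{WriteBatch, DB};

/// Commit the basic block indexes for one block in a single atomic batch.
fn commit_block_batch(
    db: &DB,
    height: u32,
    hash: &[u8; 32],
    block_bytes: &[u8],
) -> Result<(), rocksdb::Error> {
    let hash_by_height = db.cf_handle("hash_by_height").expect("column family exists");
    let height_by_hash = db.cf_handle("height_by_hash").expect("column family exists");
    let block_by_height = db.cf_handle("block_by_height").expect("column family exists");

    let mut batch = WriteBatch::default();
    batch.put_cf(hash_by_height, height.to_be_bytes(), hash);
    batch.put_cf(height_by_hash, hash, height.to_be_bytes());
    batch.put_cf(block_by_height, height.to_be_bytes(), block_bytes);
    // ... the transaction, UTXO, nullifier, and anchor updates go into the
    // same batch, then everything is committed in one atomic write.
    db.write(batch)
}
```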
Accessing previous blocks for contextual validation
The state service performs contextual validation of blocks received via the
CommitBlock
request. Since CommitBlock
is synchronous, contextual validation
must also be performed synchronously.
The relevant chain for a block starts at its previous block, and follows the chain of previous blocks back to the genesis block.
Relevant chain iterator
The relevant chain can be retrieved from the state service as follows:
- if the previous block is the finalized tip:
- get recent blocks from the finalized state
- if the previous block is in the non-finalized state:
- get recent blocks from the relevant chain, then
- get recent blocks from the finalized state, if required
The relevant chain can start at any non-finalized block, or at the finalized tip.
Relevant chain implementation
The relevant chain is implemented as a StateService
iterator, which returns
Arc<Block>
s.
The chain iterator implements ExactSizeIterator
, so Zebra can efficiently
assert that the relevant chain contains enough blocks to perform each contextual
validation check.
impl StateService {
    /// Return an iterator over the relevant chain of the block identified by
    /// `hash`.
    ///
    /// The block identified by `hash` is included in the chain of blocks yielded
    /// by the iterator.
    pub fn chain(&self, hash: block::Hash) -> Iter<'_> { ... }
}

impl Iterator for Iter<'_> {
    type Item = Arc<Block>;
    ...
}

impl ExactSizeIterator for Iter<'_> { ... }
impl FusedIterator for Iter<'_> {}
For further details, see PR 1271.
Request / Response API
The state API is provided by a pair of Request
/Response
enums. Each
Request
variant corresponds to particular Response
variants, and it's
fine (and encouraged) for caller code to unwrap the expected variants with
unreachable!
on the unexpected variants. This is slightly inconvenient but
it means that we have a unified state interface with unified backpressure.
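For example, a caller that only cares about Depth responses might look like this sketch (zebra-state types assumed in scope, and the depth assumed to be reported as an Option<u32>):

```rust
use tower::{Service, ServiceExt};

type BoxError = Box<dyn std::error::Error + Send + Sync + 'static>;

/// Ask the state for the best-chain depth of `hash`, unwrapping the expected
/// response variant.
async fn depth_of<S>(state: &mut S, hash: block::Hash) -> Result<Option<u32>, BoxError>
where
    S: Service<Request, Response = Response, Error = BoxError>,
{
    let rsp = state.ready_and().await?.call(Request::Depth(hash)).await?;
    match rsp {
        Response::Depth(depth) => Ok(depth),
        _ => unreachable!("Depth requests are always answered with Response::Depth"),
    }
}
```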
This API includes both write and read calls. Spotting Commit
requests in
code review should not be a problem, but in the future, if we need to
restrict access to write calls, we could implement a wrapper service that
rejects these, and export "read" and "write" frontends to the same inner service.
Request::CommitBlock
CommitBlock {
    block: Arc<Block>,
    sprout_anchor: sprout::tree::Root,
    sapling_anchor: sapling::tree::Root,
}
Performs contextual validation of the given block, committing it to the state
if successful. Returns Response::Added(block::Hash)
with the hash of
the newly committed block or an error.
Request::CommitFinalizedBlock
CommitFinalizedBlock {
    block: Arc<Block>,
    sprout_anchor: sprout::tree::Root,
    sapling_anchor: sapling::tree::Root,
}
Commits a finalized block to the rocksdb state, skipping contextual validation.
This is exposed for use in checkpointing, which produces in-order finalized
blocks. Returns Response::Added(block::Hash)
with the hash of the
committed block if successful.
Request::Depth(block::Hash)
Computes the depth in the best chain of the block identified by the given hash, returning
- Response::Depth(Some(depth)) if the block is in the best chain;
- Response::Depth(None) otherwise.
Implemented by querying:
- (non-finalized) the
height_by_hash
map in the best chain, and - (finalized) the
height_by_hash
tree
Request::Tip
Returns Response::Tip(block::Hash)
with the current best chain tip.
Implemented by querying:
- (non-finalized) the highest height block in the best chain
- (finalized) the highest height block in the
hash_by_height
tree, if thenon-finalized
state is empty
Request::BlockLocator
Returns Response::BlockLocator(Vec<block::Hash>)
with hashes starting from
the current chain tip and reaching backwards towards the genesis block. The
first hash is the best chain tip. The last hash is the tip of the finalized
portion of the state. If the finalized and non-finalized states are both
empty, the block locator is also empty.
This can be used by the sync component to request hashes of subsequent blocks.
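One common way to choose locator hashes is to take exponentially sparser heights from the tip back towards the finalized tip or genesis. The sketch below is illustrative only, and is not necessarily Zebra's exact spacing:

```rust
/// Pick locator heights: the tip, then heights spaced by growing powers of two,
/// ending at `stop` (e.g. the finalized tip or genesis height).
fn block_locator_heights(tip: u32, stop: u32) -> Vec<u32> {
    let mut heights = Vec::new();
    let mut step = 1u32;
    let mut height = tip;

    while height > stop {
        heights.push(height);
        height = height.saturating_sub(step);
        step = step.saturating_mul(2);
    }

    // Always include the stop height so the peer can find a common ancestor.
    heights.push(stop);
    heights
}
```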
Implemented by querying:
- (non-finalized) the
hash_by_height
map in the best chain - (finalized) the
hash_by_height
tree.
Request::Transaction(transaction::Hash)
Returns
-
Response::Transaction(Some(Transaction))
if the transaction identified by the given hash is contained in the state; -
Response::Transaction(None)
if the transaction identified by the given hash is not contained in the state.
Implemented by querying:
- (non-finalized) the
tx_by_hash
map (to get the block that contains the transaction) of each chain starting with the best chain, and then look up the block in that chain's blocks
(to get the block containing the transaction data) - (finalized) the
tx_by_hash
tree (to get the block that contains the transaction) and thenblock_by_height
tree (to get the block containing the transaction data), if the transaction is not in any non-finalized chain
Request::Block(block::Hash)
Returns
-
Response::Block(Some(Arc<Block>))
if the block identified by the given hash is contained in the state; -
Response::Block(None)
if the block identified by the given hash is not contained in the state;
Implemented by querying:
- (non-finalized) the
height_by_hash
of each chain starting with the best chain, then look up the block in that chain's blocks
(to get the block data) - (finalized) the
height_by_hash
tree (to get the block height) and then theblock_by_height
tree (to get the block data), if the block is not in any non-finalized chain
Request::AwaitUtxo(OutPoint)
Returns
Response::Utxo(transparent::Output)
Implemented by querying:
- (non-finalized) if any
Chains contain OutPoint in their created_utxos,
get thetransparent::Output
fromOutPoint
's transaction - (finalized) else if
OutPoint
is in utxo_by_outpoint
return the associatedtransparent::Output
. - else wait for
OutPoint
to be created as described in RFC0004
Drawbacks
-
Restarts can cause
zebrad
to redownload up to the last one hundred blocks it verified in the best chain, and potentially some recent side-chain blocks. -
The service interface puts some extra responsibility on callers to ensure it is used correctly and does not verify the usage is correct at compile time.
-
the service API is verbose and requires manually unwrapping enums
-
We do not handle reorgs the same way
zcashd
does, and could in theory need to delete our entire on disk state and resync the chain in some pathological reorg cases. -
testnet rollbacks are infrequent, but possible, due to bugs in testnet releases. Each testnet rollback will require additional state service code.
Diagrams
┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐
│PeerServer │ │PeerServer │ │PeerServer │ │PeerServer │
│ ┌───────┐ │ │ ┌───────┐ │ │ ┌───────┐ │ │ ┌───────┐ │
│ │┌─────┐│ │ │ │┌─────┐│ │ │ │┌─────┐│ │ │ │┌─────┐│ │
│ ││ Tcp ││ │ │ ││ Tcp ││ │ │ ││ Tcp ││ │ │ ││ Tcp ││ │
│ │└─────┘│ │ │ │└─────┘│ │ │ │└─────┘│ │ │ │└─────┘│ │
│ │Framed │ │ │ │Framed │ │ │ │Framed │ │ │ │Framed │ │
│ │Stream │ │ │ │Stream │ │ │ │Stream │ │ │ │Stream │ │
│ └───────┘─┼─┐ │ └───────┘─┼─┐ │ └───────┘─┼─┐ │ └───────┘─┼─┐
┏▶│ ┃ │ │ ┏▶│ ┃ │ │ ┏▶│ ┃ │ │ ┏▶│ ┃ │ │
┃ │ ┃ │ │ ┃ │ ┃ │ │ ┃ │ ┃ │ │ ┃ │ ┃ │ │
┃ │ ▼ │ │ ┃ │ ▼ │ │ ┃ │ ▼ │ │ ┃ │ ▼ │ │
┃ │ ┌───────┐ │ │ ┃ │ ┌───────┐ │ │ ┃ │ ┌───────┐ │ │ ┃ │ ┌───────┐ │ │
┃ │ │ Tower │ │ │ ┃ │ │ Tower │ │ │ ┃ │ │ Tower │ │ │ ┃ │ │ Tower │ │ │
┃ │ │Buffer │ │ │ ┃ │ │Buffer │ │ │ ┃ │ │Buffer │ │ │ ┃ │ │Buffer │ │ │
┃ │ └───────┘ │ │ ┃ │ └───────┘ │ │ ┃ │ └───────┘ │ │ ┃ │ └───────┘ │ │
┃ │ ┃ │ │ ┃ │ ┃ │ │ ┃ │ ┃ │ │ ┃ │ ┃ │ │
┃ └─────╋─────┘ │ ┃ └─────╋─────┘ │ ┃ └─────╋─────┘ │ ┃ └─────╋─────┘ │
┃ ┃ └─╋───────╋───────┴─╋───────╋───────┴─╋───────╋───────┴───────┐
┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ │
┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ │
┃ ┗━━━━━━━━━╋━━━━━━━┻━━━━━━━━━╋━━━━━━━┻━━━━━━━━━╋━━━━━━━┻━━━━━━━━━┓ │
┗━━━━━━━┓ ┗━━━━━━━┓ ┗━━━━━━━┓ ┗━━━━━━━┓ ┃ │
┌──────╋─────────────────╋─────────────────╋─────────────────╋──────┐ ┃ │
│ ┃ ┃ ┃ ┃ │ ┃ │
│┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐│ ┃ │
││PeerClient │ │PeerClient │ │PeerClient │ │PeerClient ││ ┃ │
│└───────────┘ └───────────┘ └───────────┘ └───────────┘│ ┃ │
│ │ ┃ │
│┌──────┐ ┌──────────────┐ │ ┃ │
││ load │ │peer discovery│ PeerSet│ ┃ │
││signal│ ┏━▶│ receiver │ req: Request, rsp: Response│ ┃ │
│└──────┘ ┃ └──────────────┘ routes all outgoing requests│ ┃ │
│ ┃ ┃ adds peers via discovery│ ┃ │
└────╋──────╋───────────────────────────────────────────────────────┘ ┃ │
┃ ┃ ▲ ┃ │
┃ ┣━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓ ┃ ┃ │
┃ ┃ ┏━━━━━━━━━━━╋━━━━━━━━━━━━━╋━━━━━━━━━━━━━┫ ┃ │
▼ ┃ ┃ ┃ ┃ ┃ ┃ │
┌────────────────╋───┐┌────────────┐┌─────────────┐ ┃ ┃ │
│Crawler ┃ ││ Listener ││Initial Peers│ ┃ ┃ │
│ ┌──────┐││ ││ │ ┃ ┃ │
│ │Tower │││ ││ │ ┃ ┃ │
│ │Buffer│││listens for ││ connects on │ ┃ ┃ │
│ └──────┘││ incoming ││ launch to │ ┃ ┃ │
│uses peerset to ││connections,││ seed peers │ ┃ ┃ │
│crawl network, ││ sends ││specified in │ ┃ ┃ │
│maintains candidate ││ handshakes ││ config file │ ┃ ┃ │
│peer set, connects ││ to peer ││ to build │ ┃ ┃ │
│to new peers on load││ discovery ││initial peer │ ┃ ┃ │
│signal or timer ││ receiver ││ set │ ┃ ┃ │
└────────────────────┘└────────────┘└─────────────┘ ┃ ┃ │
│ zebra-network internals ┃ ┃ │
─ ─ ─ ─ ─ ─ ─│─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┃─ ─ ─ ─ ─ ─ ╋ ─ ─ ┼
│ exposed api ┃ ┃ │
│ ┌────────────────────────┐ ┃ ┃ │
│ │Arc<Mutex<AddressBook>> │ ┃ ┃ │
│ │last-seen timestamps for│ ┃ ┃ │
└─────────────│ each peer, obtained by │◀─────╋────────────╋─────┘
│ hooking into incoming │ ┃ ┃
│ message streams │ ┃ ┃
└────────────────────────┘ ┃ ▼
┌────────────────┐┌───────────────┐
│Outbound Service││Inbound Service│
│ req: Request, ││ req: Request, │
│ rsp: Response ││ rsp: Response │
│ ││ │
│ Tower Buffer ││ routes all │
└────────────────┘│ incoming │
│requests, uses │
│ load-shed │
│ middleware to │
│ remove peers │
│ when internal │
│ services are │
│ overloaded │
└───────────────┘
zebra-checkpoints
zebra-checkpoints
uses a local zcashd
instance to generate a list of checkpoints for Zebra's checkpoint verifier.
Developers should run this tool every few months to add new checkpoints for the checkpoint_sync = true
mode.
(By default, Zebra syncs up to Sapling using checkpoints. These checkpoints don't need to be updated.)
For more information on how to run this program, see the Zebra checkpoints document.
API Reference
Zebra's API documentation is generated using Rustdoc:
- doc.zebra.zfnd.org renders documentation for the public API;
- doc-internal.zebra.zfnd.org renders documentation for the internal API.