About

Zebra is the Zcash Foundation's independent, consensus-compatible implementation of a Zcash node.

Zebra's network stack is interoperable with zcashd, and Zebra implements all the features required to reach Zcash network consensus, including the validation of all the consensus rules for the NU5 network upgrade. Here are some benefits of Zebra.

Zebra validates blocks and transactions, but needs extra software to generate them:

  • To generate transactions, run Zebra with lightwalletd.
  • To generate blocks, use a mining pool or miner with Zebra's mining JSON-RPCs. Currently Zebra can only send mining rewards to a single fixed address. To distribute rewards, use mining software that creates its own distribution transactions, a light wallet or the zcashd wallet.

Please join us on Discord if you'd like to find out more or get involved!

Getting Started

You can run Zebra using our Docker image, or you can build it manually. See the System Requirements section in the Zebra book for hardware, disk, and network requirements.

Docker

This command will run our latest release, and sync it to the tip:

docker run zfnd/zebra:latest

For more information, read our Docker documentation.

Building Zebra

Building Zebra requires Rust, libclang, and a C++ compiler.

Zebra is tested with the latest stable Rust version. Earlier versions are not supported or tested. Any Zebra release can start depending on new features in the latest stable Rust.

Every few weeks, we release a new Zebra version.

Below are quick summaries for installing the dependencies on your machine.

General instructions for installing dependencies

  1. Install cargo and rustc.

  2. Install Zebra's build dependencies:

    • libclang is a library that might have different names depending on your package manager. Typical names are libclang, libclang-dev, llvm, or llvm-dev.
    • clang or another C++ compiler: g++ (all platforms) or Xcode (macOS).
    • protoc

[!NOTE] Zebra uses the --experimental_allow_proto3_optional flag with protoc during compilation. This flag was introduced in Protocol Buffers v3.12.0, released on May 16, 2020, so make sure you're not using a version of protoc older than 3.12.
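One way to check the installed protoc against the 3.12 minimum is sketched below. The version_at_least helper is a hypothetical name introduced for this example, and the sketch assumes that protoc --version prints a line such as "libprotoc 3.21.12".

```shell
# Hypothetical helper: succeeds if dotted version $1 is at least major.minor version $2.
version_at_least() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    want_major=${2%%.*}
    want_rest=${2#*.}
    want_minor=${want_rest%%.*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Only query protoc if it is installed.
if command -v protoc >/dev/null 2>&1; then
    installed=$(protoc --version | awk '{print $2}')
    if version_at_least "$installed" 3.12; then
        echo "protoc $installed is new enough"
    else
        echo "protoc $installed is older than 3.12; please upgrade"
    fi
else
    echo "protoc not found"
fi
```

Only the major and minor components are compared, which is enough for the 3.12 threshold.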

Dependencies on Arch

sudo pacman -S rust clang protobuf

Note that the package clang includes libclang as well as the C++ compiler.

Once the dependencies are in place, you can build and install Zebra:

cargo install --locked zebrad

You can start Zebra by running:

zebrad start

See the Installing Zebra and Running Zebra sections in the book for more details.

Optional Configs & Features

Initializing Configuration File

zebrad generate -o ~/.config/zebrad.toml

The above command places the generated zebrad.toml config file in the default preferences directory on Linux. For the default locations on other OSes, see here.

Configuring Progress Bars

Configure tracing.progress_bar in your zebrad.toml to show key metrics in the terminal using progress bars. When progress bars are active, Zebra automatically sends logs to a file.

There is a known issue where progress bar estimates become extremely large.

In future releases, the progress_bar = "summary" config will show a few key metrics, and the "detailed" config will show all available metrics. Please let us know which metrics are important to you!

Configuring Mining

Zebra can be configured for mining by passing a MINER_ADDRESS and port mapping to Docker. See the mining support docs for more details.

Custom Build Features

You can also build Zebra with additional Cargo features:

You can combine multiple features by listing them as parameters of the --features flag:

cargo install --features="<feature1> <feature2> ..." ...
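As an illustration, the prometheus and sentry features mentioned elsewhere in this documentation could be combined as sketched below. The zebrad_install_cmd helper is hypothetical and only prints the command, so you can review it before running it yourself.

```shell
# Hypothetical helper: prints the cargo command for a space-separated feature list.
zebrad_install_cmd() {
    printf 'cargo install --locked --features="%s" zebrad\n' "$*"
}

zebrad_install_cmd prometheus sentry
# prints: cargo install --locked --features="prometheus sentry" zebrad
```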

Our full list of experimental and developer features is in the API documentation.

Some debugging and monitoring features are disabled in release builds to increase performance.

Known Issues

There are a few bugs in Zebra that we're still working on fixing:

Documentation

The Zcash Foundation maintains the following resources documenting Zebra:

User support

To report a bug, please open a bug report ticket in the Zebra repository.

Alternatively, join the Zcash Foundation Discord server and ask in the #zebra-support channel.

Security

Zebra has a responsible disclosure policy, which we encourage security researchers to follow.

License

Zebra is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT.

Some Zebra crates are distributed under the MIT license only, because some of their code was originally from MIT-licensed projects. See each crate's directory for details.

User Documentation

This section contains details on how to install, run, and instrument Zebra.

System Requirements

We recommend the following requirements for compiling and running zebrad:

  • 4 CPU cores
  • 16 GB RAM
  • 300 GB available disk space for building binaries and storing cached chain state
  • 100 Mbps network connection, with 300 GB of uploads and downloads per month

Zebra's tests can take over an hour, depending on your machine. Note that you might be able to build and run Zebra on slower systems — we haven't tested its exact limits yet.

Disk Requirements

Zebra uses around 300 GB for cached Mainnet data, and 10 GB for cached Testnet data. We expect disk usage to grow over time.

Zebra cleans up its database periodically, and also when you shut it down or restart it. Changes are committed using RocksDB database transactions. If you forcibly terminate Zebra, or it panics, any incomplete changes will be rolled back the next time it starts. So Zebra's state should always be valid, unless your OS or disk hardware is corrupting data.

Network Requirements and Ports

Zebra uses the following inbound and outbound TCP ports:

  • 8233 on Mainnet
  • 18233 on Testnet

If you configure Zebra with a specific listen_addr, it will advertise this address to other nodes for inbound connections. Outbound connections are required to sync, inbound connections are optional. Zebra also needs access to the Zcash DNS seeders, via the OS DNS resolver (usually port 53).

Zebra makes outbound connections to peers on any port. But zcashd prefers peers on the default ports, so that it can't be used for DDoS attacks on other networks.
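As a rough illustration, the default port for each network can be looked up with a small shell helper, for example when writing firewall or port-forwarding scripts. The zebra_default_port function is a hypothetical name used only for this sketch; the port numbers come from the list above.

```shell
# Hypothetical helper: prints the default Zcash peer port for a network name.
zebra_default_port() {
    case "$1" in
        Mainnet) echo 8233 ;;
        Testnet) echo 18233 ;;
        *) echo "unknown network: $1" >&2; return 1 ;;
    esac
}

port=$(zebra_default_port Testnet)
echo "Testnet uses TCP port $port by default"
```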

Typical Mainnet Network Usage

  • Initial sync: 300 GB download. As already noted, we expect the initial download to grow.
  • Ongoing updates: 10 MB - 10 GB upload and download per day, depending on user-created transaction size and peer requests.
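To relate the daily range to a monthly budget, the arithmetic can be sketched as follows, assuming a 30-day month. The high end works out to the 300 GB per month listed in the System Requirements.

```shell
# Assumption: a 30-day month.
days_per_month=30
low_mb_per_day=10
high_gb_per_day=10

echo "low estimate:  $((low_mb_per_day * days_per_month)) MB per month"
echo "high estimate: $((high_gb_per_day * days_per_month)) GB per month"
```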

Zebra performs an initial sync every time its internal database version changes, so some version upgrades might require a full download of the whole chain.

Zebra needs some peers which have a round-trip latency of 2 seconds or less. If this is a problem for you, please open a ticket.

Platform Support

Support for different platforms is organized into three tiers, each with a different set of guarantees. For more information on the policies for platforms at each tier, see the Platform Tier Policy.

Platforms are identified by their Rust "target triple", which is a string of the form <machine>-<vendor>-<operating system>.
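The sketch below shows how a triple breaks down into those fields. The split_target_triple helper is hypothetical; it treats everything after the vendor as the operating system field, so an environment suffix such as "gnu" stays attached to the OS.

```shell
# Hypothetical helper: splits a target triple into machine, vendor, and operating system.
split_target_triple() {
    machine=${1%%-*}   # text before the first dash
    rest=${1#*-}       # text after the first dash
    vendor=${rest%%-*}
    os=${rest#*-}
    printf 'machine=%s vendor=%s os=%s\n' "$machine" "$vendor" "$os"
}

split_target_triple x86_64-unknown-linux-gnu
# prints: machine=x86_64 vendor=unknown os=linux-gnu
split_target_triple aarch64-apple-darwin
# prints: machine=aarch64 vendor=apple os=darwin
```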

Tier 1

Tier 1 platforms can be thought of as "guaranteed to work". The Zebra project builds official binary releases for each tier 1 platform, and automated testing ensures that each tier 1 platform builds and passes tests after each change.

For the full requirements, see Tier 1 platform policy in the Platform Tier Policy.

| platform | os | notes | rust | artifacts |
| --- | --- | --- | --- | --- |
| x86_64-unknown-linux-gnu | Debian 11 | 64-bit | latest stable release | Docker |

Tier 2

Tier 2 platforms can be thought of as "guaranteed to build". The Zebra project builds in CI for each tier 2 platform, and automated builds ensure that each tier 2 platform builds after each change. Not all automated tests are run so it's not guaranteed to produce a working build, and official builds are not available, but tier 2 platforms often work to quite a good degree and patches are always welcome!

For the full requirements, see Tier 2 platform policy in the Platform Tier Policy.

| platform | os | notes | rust | artifacts |
| --- | --- | --- | --- | --- |
| x86_64-unknown-linux-gnu | GitHub ubuntu-latest | 64-bit | latest stable release | N/A |
| x86_64-unknown-linux-gnu | GitHub ubuntu-latest | 64-bit | latest beta release | N/A |
| x86_64-apple-darwin | GitHub macos-latest | 64-bit | latest stable release | N/A |

Tier 3

Tier 3 platforms are those which the Zebra codebase has support for, but which the Zebra project does not build or test automatically, so they may or may not work. Official builds are not available.

For the full requirements, see Tier 3 platform policy in the Platform Tier Policy.

| platform | os | notes | rust | artifacts |
| --- | --- | --- | --- | --- |
| aarch64-unknown-linux-gnu | Debian 11 | 64-bit | latest stable release | N/A |
| aarch64-apple-darwin | latest macOS | 64-bit, Apple M1 or M2 | latest stable release | N/A |

Platform Tier Policy

General

The Zcash Foundation provides three tiers of platform support, modeled after the Rust Target Tier Policy:

  • The Zcash Foundation provides no guarantees about tier 3 platforms; they may or may not build with the current codebase.
  • Zebra's continuous integration checks that tier 2 platforms will always build, but they may or may not pass tests.
  • Zebra's continuous integration checks that tier 1 platforms will always build and pass tests.

Adding a new tier 3 platform imposes minimal requirements; we focus primarily on avoiding disruption to ongoing Zebra development.

Tier 2 and tier 1 platforms place work on Zcash Foundation developers as a whole, to avoid breaking the platform. These tiers require commensurate and ongoing efforts from the maintainers of the platform, to demonstrate value and to minimize any disruptions to ongoing Zebra development.

This policy defines the requirements for accepting a proposed platform at a given level of support.

Each tier is based on all the requirements from the previous tier, unless overridden by a stronger requirement.

While these criteria attempt to document the policy, that policy still involves human judgment. Targets must fulfill the spirit of the requirements as well, as determined by the judgment of the Zebra team.

For a list of all supported platforms and their corresponding tiers ("tier 3", "tier 2", or "tier 1"), see platform support.

Note that a platform must have already received approval for the next lower tier, and spent a reasonable amount of time at that tier, before making a proposal for promotion to the next higher tier; this is true even if a platform meets the requirements for several tiers at once. This policy leaves the precise interpretation of "reasonable amount of time" up to the Zebra team.

The availability or tier of a platform in stable Zebra is not a hard stability guarantee about the future availability or tier of that platform. Higher-level platform tiers are an increasing commitment to the support of a platform, and we will take that commitment and potential disruptions into account when evaluating the potential demotion or removal of a platform that has been part of a stable release. The promotion or demotion of a platform will not generally affect existing stable releases, only current development and future releases.

In this policy, the words "must" and "must not" specify absolute requirements that a platform must meet to qualify for a tier. The words "should" and "should not" specify requirements that apply in almost all cases, but for which the Zebra team may grant an exception for good reason. The word "may" indicates something entirely optional, and does not indicate guidance or recommendations. This language is based on IETF RFC 2119.

Tier 3 platform policy

At this tier, the Zebra project provides no official support for a platform, so we place minimal requirements on the introduction of platforms.

  • A tier 3 platform must have a designated developer or developers (the "platform maintainers") on record to be CCed when issues arise regarding the platform. (The mechanism to track and CC such developers may evolve over time.)
  • Target names should not introduce undue confusion or ambiguity unless absolutely necessary to maintain ecosystem compatibility. For example, if the name of the platform makes people extremely likely to form incorrect beliefs about what it targets, the name should be changed or augmented to disambiguate it.
  • Tier 3 platforms must not impose burden on the authors of pull requests, or other developers in the community, to maintain the platform. In particular, do not post comments (automated or manual) on a PR that derail or suggest a block on the PR based on a tier 3 platform. Do not send automated messages or notifications (via any medium, including via @) to a PR author or others involved with a PR regarding a tier 3 platform, unless they have opted into such messages.
  • Patches adding or updating tier 3 platforms must not break any existing tier 2 or tier 1 platform, and must not knowingly break another tier 3 platform without approval of either the Zebra team or the maintainers of the other tier 3 platform.

If a tier 3 platform stops meeting these requirements, or the platform maintainers no longer have interest or time, or the platform shows no signs of activity and has not built for some time, or removing the platform would improve the quality of the Zebra codebase, we may post a PR to remove support for that platform. Any such PR will be CCed to the platform maintainers (and potentially other people who have previously worked on the platform), to check potential interest in improving the situation.

Tier 2 platform policy

At this tier, the Zebra project guarantees that a platform builds, and will reject patches that fail to build on a platform. Thus, we place requirements that ensure the platform will not block forward progress of the Zebra project.

A proposed new tier 2 platform must be reviewed and approved by the Zebra team based on these requirements.

In addition, the devops team must approve the integration of the platform into Continuous Integration (CI), and the tier 2 CI-related requirements. This review and approval may take place in a PR adding the platform to CI, or simply by a devops team member reporting the outcome of a team discussion.

  • Tier 2 platforms must implement all the Zcash consensus rules. Other Zebra features and binaries may be disabled, on a case-by-case basis.
  • A tier 2 platform must have a designated team of developers (the "platform maintainers") available to consult on platform-specific build-breaking issues. This team must have at least 1 developer.
  • The platform must not place undue burden on Zebra developers not specifically concerned with that platform. Zebra developers are expected to not gratuitously break a tier 2 platform, but are not expected to become experts in every tier 2 platform, and are not expected to provide platform-specific implementations for every tier 2 platform.
  • The platform must provide documentation for the Zcash community explaining how to build for their platform, and explaining how to run tests for the platform. If at all possible, this documentation should show how to run Zebra programs and tests for the platform using emulation, to allow anyone to do so. If the platform cannot be feasibly emulated, the documentation should document the required physical hardware or cloud systems.
  • The platform must document its baseline expectations for the features or versions of CPUs, operating systems, and any other dependencies.
  • The platform must build reliably in CI, for all components that Zebra's CI considers mandatory.
    • Since a working Rust compiler is required to build Zebra, the platform must be a Rust tier 1 platform.
  • The Zebra team may additionally require that a subset of tests pass in CI. In particular, this requirement may apply if the tests in question provide substantial value via early detection of critical problems.
  • Building the platform in CI must not take substantially longer than the current slowest platform in CI, and should not substantially raise the maintenance burden of the CI infrastructure. This requirement is subjective, to be evaluated by the devops team, and will take the community importance of the platform into account.
  • Test failures on tier 2 platforms will be handled on a case-by-case basis. Depending on the severity of the failure, the Zebra team may decide to:
    • disable the test on that platform,
    • require a fix before the next release, or
    • remove the platform from tier 2.
  • The platform maintainers should regularly run the testsuite for the platform, and should fix any test failures in a reasonably timely fashion.
  • All requirements for tier 3 apply.

A tier 2 platform may be demoted or removed if it no longer meets these requirements. Any proposal for demotion or removal will be CCed to the platform maintainers, and will be communicated widely to the Zcash community before being dropped from a stable release. (The amount of time between such communication and the next stable release may depend on the nature and severity of the failed requirement, the timing of its discovery, whether the platform has been part of a stable release yet, and whether the demotion or removal can be a planned and scheduled action.)

Tier 1 platform policy

At this tier, the Zebra project guarantees that a platform builds and passes all tests, and will reject patches that fail to build or pass the testsuite on a platform. We hold tier 1 platforms to our highest standard of requirements.

A proposed new tier 1 platform must be reviewed and approved by the Zebra team based on these requirements. In addition, the release team must approve the viability and value of supporting the platform.

In addition, the devops team must approve the integration of the platform into Continuous Integration (CI), and the tier 1 CI-related requirements. This review and approval may take place in a PR adding the platform to CI, or simply by a devops team member reporting the outcome of a team discussion.

  • Tier 1 platforms must implement Zebra's standard production feature set, including the network protocol, mempool, cached state, and RPCs. Exceptions may be made on a case-by-case basis.
    • Zebra must have reasonable security, performance, and robustness on that platform. These requirements are subjective, and determined by consensus of the Zebra team.
    • Internal developer tools and manual testing tools may be disabled for that platform.
  • The platform must serve the ongoing needs of multiple production users of Zebra across multiple organizations or projects. These requirements are subjective, and determined by consensus of the Zebra team. A tier 1 platform may be demoted or removed if it becomes obsolete or no longer meets this requirement.
  • The platform must build and pass tests reliably in CI, for all components that Zebra's CI considers mandatory.
    • Test failures on tier 1 platforms will be handled on a case-by-case basis. Depending on the severity of the failure, the Zebra team may decide to:
      • disable the test on that platform,
      • require a fix before the next release,
      • require a fix before any other PRs merge, or
      • remove the platform from tier 1.
    • The platform must not disable an excessive number of tests or pieces of tests in the testsuite in order to do so. This is a subjective requirement.
  • Building the platform and running the testsuite for the platform must not take substantially longer than other platforms, and should not substantially raise the maintenance burden of the CI infrastructure.
    • In particular, if building the platform takes a reasonable amount of time, but the platform cannot run the testsuite in a timely fashion due to low performance, that alone may prevent the platform from qualifying as tier 1.
  • If running the testsuite requires additional infrastructure (such as physical systems running the platform), the platform maintainers must arrange to provide such resources to the Zebra project, to the satisfaction and approval of the Zebra devops team.
    • Such resources may be provided via cloud systems, via emulation, or via physical hardware.
    • If the platform requires the use of emulation to meet any of the tier requirements, the Zebra team must have high confidence in the accuracy of the emulation, such that discrepancies between emulation and native operation that affect test results will constitute a high-priority bug in either the emulation, the Rust implementation of the platform, or the Zebra implementation for the platform.
    • If it is not possible to run the platform via emulation, these resources must additionally be sufficient for the Zebra devops team to make them available for access by Zebra team members, for the purposes of development and testing. (Note that the responsibility for doing platform-specific development to keep the platform well maintained remains with the platform maintainers. This requirement ensures that it is possible for other Zebra developers to test the platform, but does not obligate other Zebra developers to make platform-specific fixes.)
    • Resources provided for CI and similar infrastructure must be available for continuous exclusive use by the Zebra project. Resources provided for access by Zebra team members for development and testing must be available on an exclusive basis when in use, but need not be available on a continuous basis when not in use.
  • All requirements for tier 2 apply.

A tier 1 platform may be demoted if it no longer meets these requirements but still meets the requirements for a lower tier. Any such proposal will be communicated widely to the Zcash community, both when initially proposed and before being dropped from a stable release. A tier 1 platform is highly unlikely to be directly removed without first being demoted to tier 2 or tier 3. (The amount of time between such communication and the next stable release may depend on the nature and severity of the failed requirement, the timing of its discovery, whether the platform has been part of a stable release yet, and whether the demotion or removal can be a planned and scheduled action.)

Raising the baseline expectations of a tier 1 platform (such as the minimum CPU features or OS version required) requires the approval of the Zebra team, and should be widely communicated as well.

Installing Zebra

Follow the Docker or compilation instructions.

Installing Dependencies

To compile Zebra from source, you will need to install some dependencies.

Alternative Compilation Methods

Compiling Manually from git

To compile Zebra directly from GitHub, or from a GitHub release source archive:

  1. Install the dependencies (see above)

  2. Get the source code using git or from a GitHub source package

git clone https://github.com/ZcashFoundation/zebra.git
cd zebra
git checkout v1.6.1

  3. Build and run zebrad

cargo build --release --bin zebrad
target/release/zebrad start

Compiling from git using cargo install

cargo install --git https://github.com/ZcashFoundation/zebra --tag v1.6.1 zebrad

Compiling on ARM

If you're using an ARM machine, install the Rust compiler for ARM. If you build using the x86_64 tools, Zebra might run really slowly.

Build Troubleshooting

If you're having trouble with:

Compilers

  • clang: install both libclang and clang - they are usually different packages
  • libclang: check out the clang-sys documentation
  • g++ or MSVC++: try using clang or Xcode instead
  • rustc: use the latest stable rustc and cargo versions
    • Zebra does not have a minimum supported Rust version (MSRV) policy: any release can update the required Rust version.

Dependencies

  • use cargo install without --locked to build with the latest versions of each dependency

Experimental Shielded Scanning feature

  • install the rocksdb-tools or rocksdb packages to get the ldb binary, which allows expert users to query the scanner database. This binary is sometimes called rocksdb_ldb.

Optional Tor feature

  • sqlite linker errors: libsqlite3 is an optional dependency of the zebra-network/tor feature. If you don't have it installed, you might see errors like note: /usr/bin/ld: cannot find -lsqlite3. Follow the arti instructions to install libsqlite3, or use one of these commands instead:

cargo build
cargo build -p zebrad --all-features

Running Zebra

zebrad generate generates a default config. These defaults will be used if no config is present, so it's not necessary to generate a config. However, having a config file with the default fields is a useful starting point for changing the config.

The configuration format is the TOML encoding of the internal config structure, and documentation for all of the config options can be found here.

  • zebrad start starts a full node.

You can run Zebra as a:

Supported versions

Always run a supported version of Zebra, and upgrade it regularly, so it doesn't become unsupported and halt. More information.

Return Codes

  • 0: Application exited successfully
  • 1: Application exited unsuccessfully
  • 2: Application crashed
  • zebrad may also return platform-dependent codes.
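A wrapper script can branch on these codes. The sketch below maps a code to the descriptions above; describe_zebrad_exit is a hypothetical helper name, and any code other than 0, 1, or 2 is reported as platform-dependent.

```shell
# Hypothetical helper: describes a zebrad return code using the list above.
describe_zebrad_exit() {
    case "$1" in
        0) echo "exited successfully" ;;
        1) echo "exited unsuccessfully" ;;
        2) echo "crashed" ;;
        *) echo "platform-dependent code $1" ;;
    esac
}

# In a real wrapper you might run: zebrad start; describe_zebrad_exit "$?"
describe_zebrad_exit 2
# prints: crashed
```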

Zebra with Docker

The easiest way to run Zebra is using Docker.

We've embraced Docker in Zebra for most of the solution lifecycle, from development environments to CI (in our pipelines), and deployment to end users.

[!TIP] We recommend using the docker compose sub-command over the plain docker CLI, especially for more advanced use cases like running CI locally, as it provides a more convenient and powerful way to manage multi-container Docker applications. See CI/CD Local Testing for more information, and other compose files available in the docker folder.

Quick usage

You can deploy Zebra for daily use with the images available in Docker Hub or build it locally for testing.

Ready to use image

Using docker compose:

docker compose -f docker/docker-compose.yml up

With plain docker CLI:

docker volume create zebrad-cache

docker run -d --platform linux/amd64 \
  --restart unless-stopped \
  --env-file .env \
  --mount type=volume,source=zebrad-cache,target=/var/cache/zebrad-cache \
  -p 8233:8233 \
  --memory 16G \
  --cpus 4 \
  zfnd/zebra
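The plain docker command above reads variables from a file named .env in the current directory. A minimal sketch of creating one is shown below. The variable names are taken from the Run Time Variables list in this documentation, and the values are illustrative assumptions rather than recommended settings. Values are left unquoted because, as far as we know, docker run --env-file passes everything after the = sign through literally.

```shell
# Create an example .env only if one does not already exist.
if [ ! -e .env ]; then
    cat > .env <<'EOF'
NETWORK=Mainnet
ZEBRA_CACHED_STATE_DIR=/var/cache/zebrad-cache
LOG_COLOR=false
EOF
fi

cat .env
```

The cache directory here matches the mount target used in the docker run command.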

Build it locally

git clone --depth 1 --branch v1.6.1 https://github.com/ZcashFoundation/zebra.git
docker build --file docker/Dockerfile --target runtime --tag zebra:local .
docker run --detach zebra:local

Alternatives

See Building Zebra for more information.

Advanced usage

You're able to specify various parameters when building or launching the Docker image, which are meant to be used by developers and CI pipelines. For example, specifying the Network where Zebra will run (Mainnet, Testnet, etc), or enabling features like metrics with Prometheus.

For example, if we'd like to enable metrics on the image, we'd build it using the following build-arg:

docker build -f ./docker/Dockerfile --target runtime --build-arg FEATURES='default-release-binaries prometheus' --tag local/zebra.mining:latest .

[!IMPORTANT] To fully use and display the metrics, you'll need to run Prometheus and Grafana servers, and configure them to scrape and visualize the metrics endpoint. This is explained in more detail in the Metrics section of the User Guide.

To increase the log output we can optionally add these build-args:

--build-arg RUST_BACKTRACE=full --build-arg RUST_LOG=debug --build-arg COLORBT_SHOW_HIDDEN=1

And after our image has been built, we can run it on Mainnet with the following command, which will expose the metrics endpoint on port 9999 and force the logs to be colored:

docker run --env LOG_COLOR="true" -p 9999:9999 local/zebra.mining

Based on our current entrypoint.sh script, the following configuration file will be generated (on the fly, at startup) and used by Zebra:

[network]
network = "Mainnet"
listen_addr = "0.0.0.0"
[state]
cache_dir = "/var/cache/zebrad-cache"
[metrics]
endpoint_addr = "127.0.0.1:9999"
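The sketch below is not the real entrypoint.sh. It only illustrates how environment variables with fallback defaults could be rendered into a configuration like the one above. The render_zebrad_config function is hypothetical, the variable names come from the Run Time Variables list in this documentation, and the defaults are assumptions chosen to reproduce the example output.

```shell
# Hypothetical sketch: renders a zebrad config from environment variables,
# falling back to assumed defaults when a variable is unset or empty.
render_zebrad_config() {
    cat <<EOF
[network]
network = "${NETWORK:-Mainnet}"
listen_addr = "${ZEBRA_LISTEN_ADDR:-0.0.0.0}"
[state]
cache_dir = "${ZEBRA_CACHED_STATE_DIR:-/var/cache/zebrad-cache}"
[metrics]
endpoint_addr = "${METRICS_ENDPOINT_ADDR:-127.0.0.1}:${METRICS_ENDPOINT_PORT:-9999}"
EOF
}

# Render a Testnet variant in a subshell so the override does not leak.
(NETWORK=Testnet; render_zebrad_config)
```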

Running Zebra with Lightwalletd

To run Zebra with Lightwalletd, we recommend using the provided docker compose files for Zebra and Lightwalletd, which will start both services and connect them together, while exposing ports, mounting volumes, and setting environment variables.

docker compose -f docker/docker-compose.yml -f docker/docker-compose.lwd.yml up

CI/CD Local Testing

To run CI tests locally, which mimics the testing done in our CI pipelines on GitHub Actions, use the docker-compose.test.yml file. This setup allows for a consistent testing environment both locally and in CI.

Running Tests Locally

  1. Setting Environment Variables:

    • Modify the test.env file to set the desired test configurations.
    • For running all tests, set RUN_ALL_TESTS=1 in test.env.
  2. Starting the Test Environment:

    • Use Docker Compose to start the testing environment:

      docker-compose -f docker/docker-compose.test.yml up
      
    • This will start the Docker container and run the tests based on test.env settings.

  3. Viewing Test Output:

    • The test results and logs will be displayed in the terminal.
  4. Stopping the Environment:

    • Once testing is complete, stop the environment using:

      docker-compose -f docker/docker-compose.test.yml down
      

This approach ensures you can run the same tests locally that are run in CI, providing a robust way to validate changes before pushing to the repository.
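Step 1 above can be scripted instead of edited by hand. The set_env_var helper below is a hypothetical sketch that replaces an existing KEY=VALUE line or appends a new one; it assumes values contain no "|" or "&" characters. The usage example works on a temporary file, and for Zebra you would pass the test.env file instead.

```shell
# Hypothetical helper: sets KEY=VALUE in an env file, replacing or appending.
set_env_var() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        tmp=$(mktemp)
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && mv "$tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Demonstration on a temporary file.
env_file=$(mktemp)
printf 'RUN_ALL_TESTS=0\n' > "$env_file"
set_env_var "$env_file" RUN_ALL_TESTS 1
cat "$env_file"
```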

Build and Run Time Configuration

Build Time Arguments

Configuration

  • FEATURES: Specifies the features to build zebrad with. Example: "default-release-binaries getblocktemplate-rpcs"
  • TEST_FEATURES: Specifies the features for tests. Example: "lightwalletd-grpc-tests zebra-checkpoints"

Logging

  • RUST_LOG: Sets the trace log level. Example: "debug"
  • RUST_BACKTRACE: Enables or disables backtraces. Example: "full"
  • RUST_LIB_BACKTRACE: Enables or disables library backtraces. Example: 1
  • COLORBT_SHOW_HIDDEN: Enables or disables showing hidden backtraces. Example: 1

Tests

  • TEST_FEATURES: Specifies the features for tests. Example: "lightwalletd-grpc-tests zebra-checkpoints"
  • ZEBRA_SKIP_IPV6_TESTS: Skips IPv6 tests. Example: 1
  • ENTRYPOINT_FEATURES: Overrides the specific features used to run tests in entrypoint.sh. Example: "default-release-binaries lightwalletd-grpc-tests"

CI/CD

  • SHORT_SHA: Represents the short SHA of the commit. Example: "a1b2c3d"

Run Time Variables

  • NETWORK: Specifies the network type. Example: "Mainnet"

Zebra Configuration

  • ZEBRA_CHECKPOINT_SYNC: Enables or disables checkpoint sync. Example: true
  • ZEBRA_LISTEN_ADDR: Address for Zebra to listen on. Example: "0.0.0.0"
  • ZEBRA_CACHED_STATE_DIR: Directory for cached state. Example: "/var/cache/zebrad-cache"

Mining Configuration

  • RPC_LISTEN_ADDR: Address for RPC to listen on. Example: "0.0.0.0"
  • RPC_PORT: Port for RPC. Example: 8232
  • MINER_ADDRESS: Address for the miner. Example: "t1XhG6pT9xRqRQn3BHP7heUou1RuYrbcrCc"

Other Configuration

  • METRICS_ENDPOINT_ADDR: Address for metrics endpoint. Example: "0.0.0.0"
  • METRICS_ENDPOINT_PORT: Port for metrics endpoint. Example: 9999
  • LOG_FILE: Path to the log file. Example: "/path/to/log/file.log"
  • LOG_COLOR: Enables or disables log color. Example: false
  • TRACING_ENDPOINT_ADDR: Address for tracing endpoint. Example: "0.0.0.0"
  • TRACING_ENDPOINT_PORT: Port for tracing endpoint. Example: 3000

Specific tests are defined in the docker/test.env file and can be enabled by setting the corresponding environment variable to 1.

Registries

The images built by the Zebra team are all publicly hosted. Old image versions meant to be used by our CI pipeline (zebrad-test, lightwalletd) might be deleted on a scheduled basis.

We use Docker Hub for end-user images and Google Artifact Registry to build external tools and test images.

Tracing Zebra

Dynamic Tracing

Zebra supports dynamic tracing, configured using the config's TracingSection and an HTTP RPC endpoint.

Activate this feature using the filter-reload compile-time feature, and the filter and endpoint_addr runtime config options.

If the endpoint_addr is specified, zebrad will open an HTTP endpoint allowing dynamic runtime configuration of the tracing filter. For instance, if the config had endpoint_addr = '127.0.0.1:3000', then

  • curl -X GET localhost:3000/filter retrieves the current filter string;
  • curl -X POST localhost:3000/filter -d "zebrad=trace" sets the current filter string.

See the filter documentation for more details.
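
As a rough sketch of how the two calls above can be combined, the following shell functions raise the filter for a fixed time and then restore the previous value. The TRACING_ADDR variable and the function names are hypothetical helpers, not part of Zebra, and the sketch assumes that the body returned by the GET request can be sent back unchanged as a filter string.

```shell
# Hypothetical helper variable: address of the tracing endpoint (endpoint_addr).
TRACING_ADDR="${TRACING_ADDR:-localhost:3000}"

# Build the URL of the filter endpoint.
filter_url() { printf 'http://%s/filter' "$TRACING_ADDR"; }

# Retrieve and set the current filter string, as in the curl calls above.
get_filter() { curl --silent --fail -X GET "$(filter_url)"; }
set_filter() { curl --silent --fail -X POST "$(filter_url)" -d "$1"; }

# Apply a filter for a number of seconds, then restore the previous one.
# Usage: with_temporary_filter "zebrad=trace" 60
with_temporary_filter() {
    local new_filter="$1" seconds="$2" previous
    previous="$(get_filter)" || return 1
    set_filter "$new_filter" || return 1
    sleep "$seconds"
    set_filter "$previous"
}
```

Restoring the saved filter keeps a short debugging session from leaving trace-level logging switched on.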

journald Logging

Zebra can send tracing spans and events to systemd-journald, on Linux distributions that use systemd.

Activate journald logging using the journald compile-time feature, and the use_journald runtime config option.

Flamegraphs

Zebra can generate flamegraphs of tracing spans.

Activate flamegraphs using the flamegraph compile-time feature, and the flamegraph runtime config option.

Sentry Production Monitoring

Compile Zebra with --features sentry to monitor it using Sentry in production.

Zebra Metrics

Zebra has support for Prometheus, configured using the prometheus compile-time feature, and the MetricsSection runtime configuration.

The following steps send real-time Zebra metrics to a Grafana front end, where you can visualize them:

  1. Build Zebra with the prometheus feature:

    cargo install --features prometheus --locked --git https://github.com/ZcashFoundation/zebra zebrad
    
  2. Create a zebrad.toml file that we can edit:

    zebrad generate -o zebrad.toml
    
  3. Add endpoint_addr to the metrics section:

    [metrics]
    endpoint_addr = "127.0.0.1:9999"
    
  4. Run Zebra, and specify the path to the zebrad.toml file, for example:

    zebrad -c zebrad.toml start
    
  5. Install and run Prometheus and Grafana via Docker:

    # create a storage volume for grafana (once)
    sudo docker volume create grafana-storage
    # create a storage volume for prometheus (once)
    sudo docker volume create prometheus-storage
    
    # run prometheus with the included config
    sudo docker run --detach --network host --volume prometheus-storage:/prometheus --volume /path/to/zebra/prometheus.yaml:/etc/prometheus/prometheus.yml  prom/prometheus
    
    # run grafana
    sudo docker run --detach --network host --env GF_SERVER_HTTP_PORT=3030 --env GF_SERVER_HTTP_ADDR=localhost --volume grafana-storage:/var/lib/grafana grafana/grafana
    

    The Grafana dashboard is now available at http://localhost:3030; the default username and password are admin/admin. Prometheus scrapes Zebra on localhost:9999, and provides the results on localhost:9090.

  6. Configure Grafana with a Prometheus HTTP Data Source, using Zebra's metrics.endpoint_addr.

    In the grafana dashboard:

    1. Create a new Prometheus Data Source Prometheus-Zebra
    2. Enter the HTTP URL: 127.0.0.1:9090
    3. Save the configuration
  7. Now you can add the grafana dashboards from zebra/grafana (Create > Import > Upload JSON File), or create your own.

Running lightwalletd with zebra

Zebra's RPC methods can support a lightwalletd service backed by zebrad. We recommend using zcash/lightwalletd because we use it in testing. Other lightwalletd forks have limited support; see the Sync lightwalletd section for more info.

Note: You can also use Docker to run lightwalletd with Zebra. Please see our Docker documentation for more information.

Configure zebra for lightwalletd

We need a zebra configuration file. First, we create a file with the default settings:

zebrad generate -o ~/.config/zebrad.toml

The above command places the generated zebrad.toml config file in the default preferences directory on Linux. For the default locations on other OSes, see here.

Tweak the following option in order to prepare for lightwalletd setup.

JSON-RPC

We need to configure Zebra to behave as an RPC endpoint. The standard RPC port for Zebra is:

  • 8232 for Mainnet, and
  • 18232 for Testnet.

For example, to use Zebra as a lightwalletd backend on Mainnet, give it this ~/.config/zebrad.toml:

[rpc]
# listen for RPC queries on localhost
listen_addr = '127.0.0.1:8232'

# automatically use multiple CPU threads
parallel_cpu_threads = 0

WARNING: This config allows multiple Zebra instances to share the same RPC port. See the RPC config documentation for details.

Sync Zebra

With the configuration in place you can start synchronizing Zebra with the Zcash blockchain. This may take a while depending on your hardware.

zebrad start

Zebra will display information about the sync process:

...
zebrad::commands::start: estimated progress to chain tip sync_percent=10.783 %
...

Eventually, it will reach the chain tip:

...
zebrad::commands::start: finished initial sync to chain tip, using gossiped blocks sync_percent=100.000 %
...
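
To follow progress from a script rather than by eye, the sync_percent field can be pulled out of the log lines. This is a minimal sketch that assumes the sync_percent=<number> format shown in the excerpts above; the function name is hypothetical.

```shell
# Print the sync_percent value from the most recent progress line on stdin.
# Lines without a sync_percent field are ignored.
latest_sync_percent() {
    sed -n 's/.*sync_percent=\([0-9.]*\).*/\1/p' | tail -n 1
}
```

For example, if Zebra's output was captured in a file (here a hypothetical zebrad.log), running latest_sync_percent < zebrad.log prints the last reported percentage.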

You can interrupt the process at any time with ctrl-c. The next time it starts, Zebra will resume at around the block it was downloading when it was stopped.

When deploying for production infrastructure, the above command can be run as a service or daemon.

For implementing zebra as a service please see here.

Download and build lightwalletd

While you synchronize Zebra you can install lightwalletd.

Before installing, you need to have Go installed. Please visit the Go install page for download and installation instructions.

With go installed and in your path, download and install lightwalletd:

git clone https://github.com/zcash/lightwalletd
cd lightwalletd
make
make install

If everything went well, you should have a lightwalletd binary in ~/go/bin/.

Sync lightwalletd

Please make sure you have zebrad running (with RPC endpoint and up to date blockchain) to synchronize lightwalletd.

  • lightwalletd requires a zcash.conf file, however this file can be empty if you are using the default Zebra rpc endpoint (127.0.0.1:8232) and the zcash/lightwalletd fork.

    • Some lightwalletd forks also require an rpcuser and rpcpassword, but Zebra ignores them if it receives them from lightwalletd.
    • When using a non-default port, use rpcport=28232 and rpcbind=127.0.0.1
    • When using testnet, use testnet=1
  • For production setups lightwalletd requires a cert.pem. For more information on how to do this please see here.

  • lightwalletd can run without the certificate (with the --no-tls-very-insecure flag) however this is not recommended for production environments.
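
The zcash.conf options listed above can be written with a small helper. This is only a sketch: the function name and its arguments are hypothetical, and it writes nothing beyond the rpcbind, rpcport, and testnet settings mentioned in this section.

```shell
# Write a minimal zcash.conf for lightwalletd.
# Usage: write_zcash_conf <path> [rpcport] [testnet]
write_zcash_conf() {
    local path="$1" port="${2:-}" testnet="${3:-}"
    # An empty file is enough for the default Zebra RPC endpoint.
    : > "$path"
    # A non-default port needs both rpcbind and rpcport.
    if [ -n "$port" ]; then
        printf 'rpcbind=127.0.0.1\nrpcport=%s\n' "$port" >> "$path"
    fi
    # Pass the literal word "testnet" as the third argument for Testnet.
    if [ "$testnet" = "testnet" ]; then
        printf 'testnet=1\n' >> "$path"
    fi
}
```

Calling write_zcash_conf ~/.config/zcash.conf creates the empty file used for the default endpoint, while write_zcash_conf ~/.config/zcash.conf 28232 adds the non-default port settings.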

With the cert in ./ and an empty zcash.conf we can start the sync with:

lightwalletd --zcash-conf-path ~/.config/zcash.conf --data-dir ~/.cache/lightwalletd --log-file /dev/stdout

By default, the lightwalletd service listens on 127.0.0.1:9067.

lightwalletd performs its own synchronization. While it is syncing, you will see messages such as:

...
{"app":"lightwalletd","level":"info","msg":"Ingestor adding block to cache: 748000","time":"2022-05-28T19:25:49-03:00"}
{"app":"lightwalletd","level":"info","msg":"Ingestor adding block to cache: 749540","time":"2022-05-28T19:25:53-03:00"}
{"app":"lightwalletd","level":"info","msg":"Ingestor adding block to cache: 751074","time":"2022-05-28T19:25:57-03:00"}
...

Wait until lightwalletd is in sync before connecting any wallet to it. You will know it is in sync when those messages are no longer displayed.
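
One way to see how far lightwalletd has progressed is to extract the block height from its log lines. The sketch below assumes the JSON log format shown in the sample above; the function name is hypothetical.

```shell
# Print the height from the most recent "Ingestor adding block to cache"
# message on stdin. Other log lines are ignored.
last_ingested_height() {
    sed -n 's/.*"msg":"Ingestor adding block to cache: \([0-9][0-9]*\)".*/\1/p' | tail -n 1
}
```

Comparing this height with the chain tip reported by Zebra gives a rough idea of how much of the sync remains.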

Run tests

The Zebra team created tests for the interaction of zebrad and lightwalletd.

To run all the Zebra lightwalletd tests:

  1. install lightwalletd
  2. install protoc
  3. build Zebra with --features=lightwalletd-grpc-tests

Please refer to acceptance tests documentation in the Lightwalletd tests section.

Connect a wallet to lightwalletd

The final goal is to connect wallets to the lightwalletd service backed by Zebra.

For demo purposes we used zecwallet-cli with the adityapk00/lightwalletd fork. We have not yet tested zecwallet-cli with zcash/lightwalletd.

Make sure both zebrad and lightwalletd are running and listening.

Download and build the cli-wallet

cargo install --locked --git https://github.com/adityapk00/zecwallet-light-cli

The zecwallet-cli binary will be at ~/.cargo/bin/zecwallet-cli.

Run the wallet

$ zecwallet-cli --server 127.0.0.1:9067
Lightclient connecting to http://127.0.0.1:9067/
{
  "result": "success",
  "latest_block": 1683911,
  "total_blocks_synced": 49476
}
Ready!
(main) Block:1683911 (type 'help') >>

Zebra zk-SNARK Parameters

The privacy features provided by Zcash are backed by several zk-SNARK proving systems: cryptographic primitives that allow a prover to convince a verifier that a statement is true while revealing no more information than the proof itself.

One of these proving systems is Groth16, which is used by Zcash transactions of version 4 and greater. More specifically, it is used in the Sapling spend and output description circuits and in the Sprout JoinSplit description circuit.

https://zips.z.cash/protocol/protocol.pdf#groth

The Groth16 proving system requires a trusted setup: a set of predefined parameters that every node must possess to verify the proofs that appear in the blockchain.

These parameters are built into the zebrad binary. They are predefined keys that will allow verification of the circuits. They were initially obtained by this process.

Three parameters are needed, one for each circuit. This is part of the Zcash consensus protocol:

https://zips.z.cash/protocol/protocol.pdf#grothparameters

Zebra uses the bellman crate groth16 implementation for all groth16 types.

Each time a transaction has any Sprout JoinSplit, Sapling spend, or Sapling output, these loaded parameters are used in the verification process. Zebra verifies proofs in parallel and in batches, and uses these parameters in each verification.

The first time any parameters are used, Zebra automatically parses all of the parameters. This work is only done once.

Mining Zcash with zebra

Zebra's RPC methods support miners and mining pools.

Download Zebra

The easiest way to run Zebra for mining is with our docker images.

If you have installed Zebra another way, follow the instructions below to start mining:

Configure zebra for mining

We need a configuration file. First, we create a file with the default settings:

mkdir -p ~/.config
zebrad generate -o ~/.config/zebrad.toml

The above command places the generated zebrad.toml config file in the default preferences directory on Linux. For the default locations on other OSes, see here.

Tweak the following options in order to prepare for mining.

Miner address

A miner address for the node is required. At the moment, Zebra only allows P2PKH or P2SH transparent addresses.

[mining]
miner_address = 't3dvVE3SQEi7kqNzwrfNePxZ1d4hUyztBA1'

The above address is the ZF Mainnet funding stream address. It is used here purely as an example.

RPC section

This change is required for zebra to behave as an RPC endpoint. The standard port for RPC endpoint is 8232 on mainnet.

[rpc]
listen_addr = "127.0.0.1:8232"

Running zebra

If the configuration file is in the default directory, then zebra will just read from it. All we need to do is to start zebra as follows:

zebrad

You can specify the configuration file path with -c /path/to/config.file.

Wait until Zebra is in sync. You will see the sync at 100% when this happens:

...
2023-02-21T18:41:09.088931Z  INFO {zebrad="4daedbc" net="Main"}: zebrad::components::sync::progress: finished initial sync to chain tip, using gossiped blocks sync_percent=100.000% current_height=Height(1992055) network_upgrade=Nu5 remaining_sync_blocks=1 time_since_last_state_block=0s
...

Testing the setup

The easiest way to check your setup is to call the getblocktemplate RPC method and check the result.

$ curl --silent --data-binary '{"jsonrpc": "1.0", "id":"curltest", "method": "getblocktemplate", "params": [] }' -H 'Content-type: application/json' http://127.0.0.1:8232/ | jq

If you can see something similar to the following then you are good to go.

Demo command output:
{
  "result": {
    "capabilities": [
      "proposal"
    ],
    "version": 4,
    "previousblockhash": "000000000173ae4123b7cb0fbed51aad913a736b846eaa9f23c3bb7f6c65b011",
    "blockcommitmentshash": "84ac267e51ce10e6e4685955e3a3b08d96a7f862d74b2d60f141c8e91f1af3a7",
    "lightclientroothash": "84ac267e51ce10e6e4685955e3a3b08d96a7f862d74b2d60f141c8e91f1af3a7",
    "finalsaplingroothash": "84ac267e51ce10e6e4685955e3a3b08d96a7f862d74b2d60f141c8e91f1af3a7",
    "defaultroots": {
      "merkleroot": "5e312942e7f024166f3cb9b52627c07872b6bfa95754ccc96c96ca59b2938d11",
      "chainhistoryroot": "97be47b0836d629f094409f5b979e011cbdb51d4a7e6f1450acc08373fe0901a",
      "authdataroot": "dc40ac2b3a4ae92e4aa0d42abeea6934ef91e6ab488772c0466d7051180a4e83",
      "blockcommitmentshash": "84ac267e51ce10e6e4685955e3a3b08d96a7f862d74b2d60f141c8e91f1af3a7"
    },
    "transactions": [
      {
        "data": "0400008085202f890120a8b2e646b5c5ee230a095a3a19ffea3c2aa389306b1ee3c31e9abd4ac92e08010000006b483045022100fb64eac188cb0b16534e0bd75eae7b74ed2bdde20102416f2e2c18638ec776dd02204772076abbc4f9baf19bd76e3cdf953a1218e98764f41ebc37b4994886881b160121022c3365fba47d7db8422d8b4a410cd860788152453f8ab75c9e90935a7a693535ffffffff015ca00602000000001976a914411d4bb3c17e67b5d48f1f6b7d55ee3883417f5288ac000000009d651e000000000000000000000000",
        "hash": "63c939ad16ef61a1d382a2149d826e3a9fe9a7dbb8274bfab109b8e70f469012",
        "authdigest": "ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff",
        "depends": [],
        "fee": 11300,
        "sigops": 1,
        "required": false
      },
      {
        "data": "0400008085202f890192e3403f2fb04614a7faaf66b5f59a78101fe3f721aee3291dea3afcc5a4080d000000006b483045022100b39702506ff89302dcde977e3b817c8bb674c4c408df5cd14b0cc3199c832be802205cbbfab3a14e80c9765af69d21cd2406cea4e8e55af1ff5b64ec00a6df1f5e6b01210207d2b6f6b3b500d567d5cf11bc307fbcb6d342869ec1736a8a3a0f6ed17f75f4ffffffff0147c717a8040000001976a9149f68dd83709ae1bc8bc91d7068f1d4d6418470b688ac00000000000000000000000000000000000000",
        "hash": "d5c6e9eb4c378c8304f045a43c8a07c1ac377ab6b4d7206e338eda38c0f196ba",
        "authdigest": "ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff",
        "depends": [],
        "fee": 185,
        "sigops": 1,
        "required": false
      },
      {
        "data": "0400008085202f8901dca2e357fe25c062e988a90f6e055bf72b631f286833bcdfcc140b47990e22cc040000006a47304402205423166bba80f5d46322a7ea250f2464edcd750aa8d904d715a77e5eaad4417c0220670c6112d7f6dc3873143bdf5c3652c49c3e306d4d478632ca66845b2bfae2a6012102bc7156d237dbfd2f779603e3953dbcbb3f89703d21c1f5a3df6f127aa9b10058feffffff23da1b0000000000001976a914227ea3051630d4a327bcbe3b8fcf02d17a2c8f9a88acc2010000000000001976a914d1264ed5acc40e020923b772b1b8fdafff2c465c88ac661c0000000000001976a914d8bae22d9e23bfa78d65d502fbbe32e56f349e5688ac02210000000000001976a91484a1d34e31feac43b3965beb6b6dedc55d134ac588ac92040000000000001976a91448d9083a5d92124e8c1b6a2d895874bb6a077d1d88ac78140000000000001976a91433bfa413cd714601a100e6ebc99c49a8aaec558888ac4c1d0000000000001976a91447aebb77822273df8c9bc377e18332ce2af707f488ac004c0000000000001976a914a095a81f6fb880c0372ad3ea74366adc1545490888ac16120000000000001976a914d1f2052f0018fb4a6814f5574e9bc1befbdfce9388acbfce0000000000001976a914aa052c0181e434e9bbd87566aeb414a23356116088ac5c2b0000000000001976a914741a131b859e83b802d0eb0f5d11c75132a643a488ac40240000000000001976a914c5e62f402fe5b13f31f5182299d3204c44fc2d5288ace10a0000000000001976a914b612ff1d9efdf5c45eb8e688764c5daaf482df0c88accc010000000000001976a9148692f64b0a1d7fc201d7c4b86f5a6703b80d7dfe88aca0190000000000001976a9144c998d1b661126fd82481131b2abdc7ca870edc088ac44020000000000001976a914bd60ea12bf960b3b27c9ea000a73e84bbe59591588ac00460000000000001976a914b0c711a99ff21f2090fa97d49a5403eaa3ad9e0988ac9a240000000000001976a9145a7c7d50a72355f07340678ca2cba5f2857d15e788ac2a210000000000001976a91424cb780ce81cc384b61c5cc5853585dc538eb9af88ac30430000000000001976a9148b9f78cb36e4126920675fe5420cbd17384db44288ac981c0000000000001976a9145d1c183b0bde829b5363e1007f4f6f1d29d3bb4a88aca0140000000000001976a9147f44beaacfb56ab561648a2ba818c33245b39dbb88acee020000000000001976a914c485f4edcefcf248e883ad1161959efc14900ddf88acc03a0000000000001976a91419bfbbd0b5f63590290e063e35285fd070a36b6a88ac98030000000000001976a9147a557b673a45a
255ff21f3746846c28c1b1e53b988acdc230000000000001976a9146c1bf6a4e0a06d3498534cec7e3b976ab5c2dcbc88ac3187f364000000001976a914a1a906b35314449892f2e6d674912e536108e06188ace61e0000000000001976a914fcaafc8ae90ac9f5cbf139d626cfbd215064034888ace4020000000000001976a914bb1bfa7116a9fe806fb3ca30fa988ab8f98df94088ac88180000000000001976a9146a43a0a5ea2b421c9134930d037cdbcd86b9e84c88ac0a3c0000000000001976a91444874ae13b1fa73f900b451f4b69dbabb2b2f93788ac0a410000000000001976a914cd89fbd4f8683d97c201e34c8431918f6025c50d88ac76020000000000001976a91482035b454977ca675328c4c7de097807d5c842d688ac1c160000000000001976a9142c9a51e381b27268819543a075bbe71e80234a6b88ac70030000000000001976a914a8f48fd340da7fe1f8bb13ec5856c9d1f5f50c0388ac6c651e009f651e000000000000000000000000",
        "hash": "2e9296d48f036112541b39522b412c06057b2d55272933a5aff22e17aa1228cd",
        "authdigest": "ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff",
        "depends": [],
        "fee": 1367,
        "sigops": 35,
        "required": false
      }
    ],
    "coinbasetxn": {
      "data": "0400008085202f89010000000000000000000000000000000000000000000000000000000000000000ffffffff050378651e00ffffffff04b4e4e60e0000000017a9140579e6348f398c5e78611da902ca457885cda2398738c94d010000000017a9145d190948e5a6982893512c6d269ea14e96018f7e8740787d010000000017a914931fec54c1fea86e574462cc32013f5400b8912987286bee000000000017a914d45cb1adffb5215a42720532a076f02c7c778c90870000000078651e000000000000000000000000",
      "hash": "f77c29f032f4abe579faa891c8456602f848f423021db1f39578536742e8ff3e",
      "authdigest": "ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff",
      "depends": [],
      "fee": -12852,
      "sigops": 0,
      "required": true
    },
    "longpollid": "0001992055c6e3ad7916770099070000000004516b4994",
    "target": "0000000001a11f00000000000000000000000000000000000000000000000000",
    "mintime": 1677004508,
    "mutable": [
      "time",
      "transactions",
      "prevblock"
    ],
    "noncerange": "00000000ffffffff",
    "sigoplimit": 20000,
    "sizelimit": 2000000,
    "curtime": 1677004885,
    "bits": "1c01a11f",
    "height": 1992056,
    "maxtime": 1677009907
  },
  "id": "curltest"
}
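
The curl call above can be wrapped in a small helper so that the same request shape is reused for other RPC methods. This is a sketch only: the function names and the ZEBRA_RPC_URL variable are hypothetical, and the params list is always empty, as in the example.

```shell
# Build the JSON-RPC request body for a method that takes no parameters.
rpc_body() {
    printf '{"jsonrpc": "1.0", "id":"curltest", "method": "%s", "params": [] }' "$1"
}

# Send the request to Zebra's RPC endpoint.
# ZEBRA_RPC_URL is a hypothetical override; the default matches the config above.
zebra_rpc() {
    curl --silent --data-binary "$(rpc_body "$1")" \
        -H 'Content-type: application/json' \
        "${ZEBRA_RPC_URL:-http://127.0.0.1:8232/}"
}

# Example: print only the height of the block template.
#   zebra_rpc getblocktemplate | jq '.result.height'
```

Filtering with jq keeps the check readable, since the full template includes raw transaction data.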

Run a mining pool

Just point your mining pool software to the Zebra RPC endpoint (127.0.0.1:8232). Zebra supports the RPC methods needed to run most mining pool software.

If you want to run an experimental s-nomp mining pool with Zebra on testnet, please refer to this document for a very detailed guide. s-nomp is not compatible with NU5, so some mining functions are disabled.

If your mining pool software needs additional support, or if you as a miner need additional RPC methods, then please open a ticket in the Zebra repository.

How to mine with Zebra on testnet

Important

s-nomp has not been updated for NU5, so you'll need the fixes in the branches below.

These fixes disable mining pool operator payments and miner payments: they just pay to the address configured for the node.

Install, run, and sync Zebra

  1. Configure zebrad.toml:

    • change the network.network config to Testnet
    • add your testnet transparent address in mining.miner_address, or you can use the ZF testnet address t27eWDgjFYJGVXmzrXeVjnb5J3uXDM9xH9v
    • ensure that there is an rpc.listen_addr in the config to enable the RPC server

    Example config:

    [consensus]
    checkpoint_sync = true
    
    [mempool]
    eviction_memory_time = '1h'
    tx_cost_limit = 80000000
    
    [metrics]
    
    [network]
    crawl_new_peer_interval = '1m 1s'
    initial_mainnet_peers = [
        'dnsseed.z.cash:8233',
        'dnsseed.str4d.xyz:8233',
        'mainnet.seeder.zfnd.org:8233',
        'mainnet.is.yolo.money:8233',
    ]
    initial_testnet_peers = [
        'dnsseed.testnet.z.cash:18233',
        'testnet.seeder.zfnd.org:18233',
        'testnet.is.yolo.money:18233',
    ]
    listen_addr = '0.0.0.0:18233'
    network = 'Testnet'
    peerset_initial_target_size = 25
    
    [rpc]
    debug_force_finished_sync = false
    parallel_cpu_threads = 1
    listen_addr = '127.0.0.1:18232'
    
    [state]
    cache_dir = '/home/ar/.cache/zebra'
    delete_old_database = true
    ephemeral = false
    
    [sync]
    checkpoint_verify_concurrency_limit = 1000
    download_concurrency_limit = 50
    full_verify_concurrency_limit = 20
    parallel_cpu_threads = 0
    
    [tracing]
    buffer_limit = 128000
    force_use_color = false
    use_color = true
    use_journald = false
    
    [mining]
    miner_address = 't27eWDgjFYJGVXmzrXeVjnb5J3uXDM9xH9v'
    
  2. Run Zebra with the config you created:

    zebrad -c zebrad.toml
    
  3. Wait for Zebra to sync to the testnet tip. This takes 8-12 hours on testnet (or 2-3 days on mainnet) as of October 2023.

Install s-nomp

General instructions with Debian/Ubuntu examples

Install dependencies

  1. Install redis and run it on the default port: https://redis.io/docs/getting-started/

    sudo apt install lsb-release
    curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
    
    echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
    
    sudo apt-get update
    sudo apt-get install redis
    redis-server
    
  2. Install and activate a node version manager (e.g. nodenv or nvm)

  3. Install boost and libsodium development libraries

    sudo apt install libboost-all-dev
    sudo apt install libsodium-dev
    

Install s-nomp

  1. git clone https://github.com/ZcashFoundation/s-nomp

  2. cd s-nomp

  3. Use the Zebra fixes: git checkout zebra-mining

  4. Use node 10:

    nodenv install 10
    nodenv local 10
    

    or

    nvm install 10
    nvm use 10
    
  5. Update dependencies and install:

    export CXXFLAGS="-std=gnu++17"
    npm update
    npm install
    
Arch-specific instructions

Install s-nomp

  1. Install Redis, and development libraries required by s-nomp

    sudo pacman -S redis boost libsodium
    
  2. Install nvm, Python 3.10 and virtualenv

    paru -S python310 nvm
    sudo pacman -S python-virtualenv
    
  3. Start Redis

    sudo systemctl start redis
    
  4. Clone the repository

    git clone https://github.com/ZcashFoundation/s-nomp && cd s-nomp
    
  5. Use Node 10:

    unset npm_config_prefix
    source /usr/share/nvm/init-nvm.sh
    nvm install 10
    nvm use 10
    
  6. Use Python 3.10

    virtualenv -p 3.10 s-nomp
    source s-nomp/bin/activate
    
  7. Update dependencies and install:

    npm update
    npm install
    

Run s-nomp

  1. Edit pool_configs/zcash.json so daemons[0].port is your Zebra port
  2. Run s-nomp using npm start

Note: the website will log an RPC error even when it is disabled in the config. This seems like an s-nomp bug.

Install a CPU or GPU miner

Install dependencies

General instructions

  1. Install a statically compiled boost and icu.
  2. Install cmake.

Arch-specific instructions

sudo pacman -S cmake boost icu

Install nheqminer

We're going to install nheqminer, which supports multiple CPU and GPU Equihash solvers, namely djezo, xenoncat, and tromp. We're using tromp on a CPU in the following instructions since it is the easiest to install and use.

  1. git clone https://github.com/ZcashFoundation/nheqminer
  2. cd nheqminer
  3. Use the Zebra fixes: git checkout zebra-mining
  4. Follow the build instructions at https://github.com/nicehash/nheqminer#general-instructions, or run:

    mkdir build
    cd build
    # Turn off `djezo` and `xenoncat`, which are enabled by default, and turn on `tromp` instead.
    cmake -DUSE_CUDA_DJEZO=OFF -DUSE_CPU_XENONCAT=OFF -DUSE_CPU_TROMP=ON ..
    make -j $(nproc)

Run miner

  1. Follow the run instructions at: https://github.com/nicehash/nheqminer#run-instructions

    # you can use your own testnet address here
    # miner and pool payments are disabled, configure your address on your node to get paid
    ./nheqminer -l 127.0.0.1:1234 -u tmRGc4CD1UyUdbSJmTUzcB6oDqk4qUaHnnh.worker1 -t 1

Notes:

  • A typical solution rate is 2-4 Sols/s per core
  • nheqminer sometimes ignores Control-C. If that happens, you can quit it using:
    • killall nheqminer, or
    • Control-Z then kill %1
  • Running nheqminer with a single thread (-t 1) can help avoid this issue

Mining with Zebra in Docker

Zebra's Docker images can be used for your mining operations. If you don't have Docker, see the manual configuration instructions.

Using docker, you can start mining by running:

docker run -e MINER_ADDRESS="t3dvVE3SQEi7kqNzwrfNePxZ1d4hUyztBA1" -p 8232:8232 zfnd/zebra:latest

This command starts a container on Mainnet and binds port 8232 on your Docker host. If you want to start generating blocks, you need to let Zebra sync first.

Note that you must pass the address for your mining rewards via the MINER_ADDRESS environment variable when you are starting the container, as we did with the ZF funding stream address above. The address we used starts with the prefix t3, meaning it is a Mainnet P2SH address. Please remember to set your own address for the rewards.

The port we mapped between the container and the host with the -p flag in the example above is Zebra's default Mainnet RPC port. If you want to use a different one, you can specify it in the RPC_PORT environment variable, similarly to MINER_ADDRESS, and then map it with Docker's -p flag.

Instead of listing the environment variables on the command line, you can use Docker's --env-file flag to specify a file containing the variables. You can find more info here https://docs.docker.com/engine/reference/commandline/run/#env.
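
As an illustration of the --env-file approach, the sketch below writes the mining variables to a file. The function name and the zebra-mining.env file name are hypothetical; only MINER_ADDRESS and RPC_PORT, which are described above, are written.

```shell
# Write a Docker env file for a mining node.
# Usage: write_mining_env <file> <miner-address> [rpc-port]
write_mining_env() {
    local file="$1" address="$2" port="${3:-}"
    # Docker reads env-file values literally, so the values are not quoted.
    printf 'MINER_ADDRESS=%s\n' "$address" > "$file"
    if [ -n "$port" ]; then
        printf 'RPC_PORT=%s\n' "$port" >> "$file"
    fi
}

# Example (replace the address with your own):
#   write_mining_env zebra-mining.env t3dvVE3SQEi7kqNzwrfNePxZ1d4hUyztBA1 8232
#   docker run --env-file zebra-mining.env -p 8232:8232 zfnd/zebra:latest
```

Keeping the variables in a file makes it easier to reuse the same settings across container restarts.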

Mining on Testnet

If you want to mine on Testnet, you need to set the NETWORK environment variable to Testnet and use a Testnet address for the rewards. For example, running

docker run -e NETWORK="Testnet" -e MINER_ADDRESS="t27eWDgjFYJGVXmzrXeVjnb5J3uXDM9xH9v" -p 18232:18232 zfnd/zebra:latest

will start a container on Testnet and bind port 18232 on your Docker host, which is the standard Testnet RPC port. Notice that we also used a different rewards address. It starts with the prefix t2, indicating that it is a Testnet address. A Mainnet address would prevent Zebra from starting on Testnet, and conversely, a Testnet address would prevent Zebra from starting on Mainnet.
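
Because a mismatched address prevents Zebra from starting, it can help to check the prefix before launching the container. The following sketch relies on the standard transparent address prefixes (t1 and t3 on Mainnet, tm and t2 on Testnet). The function names are hypothetical, and the check looks only at the prefix; it does not validate the address itself.

```shell
# Print the network implied by a transparent address prefix.
address_network() {
    case "$1" in
        t1*|t3*) echo "Mainnet" ;;   # Mainnet P2PKH / P2SH
        tm*|t2*) echo "Testnet" ;;   # Testnet P2PKH / P2SH
        *)       echo "unknown" ;;
    esac
}

# Succeed only if the address prefix matches the given NETWORK value.
# Usage: miner_address_matches <Mainnet|Testnet> <address>
miner_address_matches() {
    [ "$(address_network "$2")" = "$1" ]
}
```

For example, miner_address_matches Testnet t27eWDgjFYJGVXmzrXeVjnb5J3uXDM9xH9v succeeds, while the same address checked against Mainnet fails.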

Zebra Shielded Scanning

This document describes Zebra's shielded scanning from a user's perspective.

For now, we only support Sapling, and only store transaction IDs in the scanner results database. Ongoing development is tracked in issue #7728.

Important Security Warning

Zebra's shielded scanning feature has known security issues. It is for experimental use only.

Do not use regular or sensitive viewing keys with Zebra's experimental scanning feature. Do not use this feature on a shared machine. We suggest generating new keys for experimental use or publicly known keys.

Build & Install

Use Zebra 1.6.0 or greater, or the main branch to get the latest features, and enable the shielded-scan feature during the build. You can also use Rust's cargo to install the latest release:

cargo install --features shielded-scan --locked --git https://github.com/ZcashFoundation/zebra zebrad

The zebrad binary will be at ~/.cargo/bin/zebrad, which should be in your PATH.

Configuration

Generate a configuration file with the default settings:

zebrad generate -o ~/.config/zebrad.toml

In the generated zebrad.toml file, use:

  • the [shielded_scan] table for database settings, and
  • the [shielded_scan.sapling_keys_to_scan] table for diversifiable full viewing keys.

Sapling diversifiable/extended full viewing key strings start with zxviews, as described in ZIP-32.

For example, to scan the block chain with the public ZECpages viewing key, use:

[shielded_scan.sapling_keys_to_scan]
"zxviews1q0duytgcqqqqpqre26wkl45gvwwwd706xw608hucmvfalr759ejwf7qshjf5r9aa7323zulvz6plhttp5mltqcgs9t039cx2d09mgq05ts63n8u35hyv6h9nc9ctqqtue2u7cer2mqegunuulq2luhq3ywjcz35yyljewa4mgkgjzyfwh6fr6jd0dzd44ghk0nxdv2hnv4j5nxfwv24rwdmgllhe0p8568sgqt9ckt02v2kxf5ahtql6s0ltjpkckw8gtymxtxuu9gcr0swvz" = 419200

Where the number 419200 is the birthday of the key:

  • A birthday lower than the Sapling activation height defaults to the Sapling activation height.
  • A birthday greater than or equal to the Sapling activation height starts scanning at the provided height, improving scanner speed.
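
The two birthday rules amount to taking the larger of the birthday and the activation height. A minimal sketch, assuming the Mainnet Sapling activation height of 419200 (the same value as the example birthday above); the variable and function names are hypothetical:

```shell
# Mainnet Sapling activation height (assumed constant for this sketch).
SAPLING_ACTIVATION_HEIGHT=419200

# Print the height at which scanning starts for a given key birthday.
scan_start_height() {
    local birthday="$1"
    if [ "$birthday" -lt "$SAPLING_ACTIVATION_HEIGHT" ]; then
        echo "$SAPLING_ACTIVATION_HEIGHT"
    else
        echo "$birthday"
    fi
}
```

A key with birthday 100 therefore starts at 419200, while a key with birthday 1000000 skips the earlier blocks and starts at 1000000.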

Scanning the Block Chain

Simply run

zebrad

The scanning will start once Zebra syncs its state past the Sapling activation height. Scanning a synced state takes between 12 and 24 hours. The scanner looks for transactions containing Sapling notes with outputs decryptable by the provided viewing keys.

You should see log messages in the output every 10 000 blocks scanned, similar to:

2023-12-16T12:14:41.526740Z  INFO zebra_scan::storage::db: Last scanned height for key number 0 is 435000, resuming at 435001
2023-12-16T12:14:41.526745Z  INFO zebra_scan::storage::db: loaded Zebra scanner cache
...
2023-12-16T12:15:19.063796Z  INFO {zebrad="39830b0" net="Main"}: zebra_scan::scan: Scanning the blockchain for key 0, started at block 435001, now at block 440000, current tip 2330550
...

The Zebra scanner will resume the task if your Zebra instance goes down for any reason. On the next start, Zebra will display:

Last scanned height for key number 0 is 1798000, resuming at 1798001

Displaying Scanning Results

An easy way to query the results is to use the Scanning Results Reader.

Querying Raw Scanning Results

A more advanced way to query results is to use the ldb tool, which requires a certain level of expertise.

Install ldb:

sudo apt install rocksdb-tools

Run ldb with the scanner database:

ldb --db="$HOME/.cache/zebra/private-scan/v1/mainnet" --secondary_path= --column_family=sapling_tx_ids --hex scan

Some of the output entries are markers the scanner uses to keep track of progress; others are the transactions it found.

To learn more about how to filter the database, please refer to RocksDB Administration and Data Access Tool.

Zebra Shielded Scanning gRPC Server

Get Started

Setup

After setting up Zebra Shielded Scanning, add a listen_addr field to the shielded-scan configuration:

[shielded_scan]
listen_addr = "127.0.0.1:8231"

Then, run zebrad to start the scan gRPC server.

Making requests to the server also requires a gRPC client. The examples here use grpcurl, though any gRPC client should work.

See installation instructions for grpcurl here.

The types can be accessed through the zebra-grpc crate's root scanner module for clients in a Rust environment, and the scanner.proto file here can be used to build types in other environments.

Usage

To check that the gRPC server is running, try calling scanner.Scanner/GetInfo, for example with grpcurl:

grpcurl -plaintext '127.0.0.1:8231' scanner.Scanner/GetInfo

The response should look like:

{
  "minSaplingBirthdayHeight": 419200
}

An example request to the Scan method with grpcurl would look like:

grpcurl -plaintext -d '{ "keys": { "key": ["sapling_extended_full_viewing_key"] } }' '127.0.0.1:8231' scanner.Scanner/Scan

This will start scanning for transactions in Zebra's state and in new blocks as they're validated.

Or, to use the scanner gRPC server without streaming, try calling RegisterKeys with your Sapling extended full viewing key, waiting for the scanner to cache some results, then calling GetResults:

grpcurl -plaintext -d '{ "keys": { "key": ["sapling_extended_full_viewing_key"] } }' '127.0.0.1:8231' scanner.Scanner/RegisterKeys
grpcurl -plaintext -d '{ "keys": ["sapling_extended_full_viewing_key"] }' '127.0.0.1:8231' scanner.Scanner/GetResults
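The request bodies above have two different shapes: Scan and RegisterKeys wrap the key list in a nested key field, while GetResults takes a flat list. The following is a minimal Rust sketch that builds both JSON bodies for grpcurl's -d flag. The function names are hypothetical and are not part of zebra-grpc, and the sketch assumes keys contain no characters that need JSON escaping.

```rust
/// Render keys as a JSON array of strings.
/// Assumption: keys are plain ASCII without quotes or backslashes,
/// so no JSON escaping is performed.
fn json_string_array(keys: &[&str]) -> String {
    let quoted: Vec<String> = keys.iter().map(|k| format!("\"{}\"", k)).collect();
    format!("[{}]", quoted.join(", "))
}

/// Body for Scan and RegisterKeys, shaped like the grpcurl examples above.
fn register_keys_body(keys: &[&str]) -> String {
    format!("{{ \"keys\": {{ \"key\": {} }} }}", json_string_array(keys))
}

/// Body for GetResults, shaped like the grpcurl example above.
fn get_results_body(keys: &[&str]) -> String {
    format!("{{ \"keys\": {} }}", json_string_array(keys))
}

fn main() {
    let keys = ["sapling_extended_full_viewing_key"];
    println!("{}", register_keys_body(&keys));
    println!("{}", get_results_body(&keys));
}
```

Each printed line can be passed as the -d argument of the corresponding grpcurl call.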

gRPC Reflection

To see all of the provided methods with grpcurl, try:

grpcurl -plaintext '127.0.0.1:8231' list scanner.Scanner

This will list the paths to each method in the Scanner service:

scanner.Scanner.ClearResults
scanner.Scanner.DeleteKeys
scanner.Scanner.GetInfo
scanner.Scanner.GetResults
scanner.Scanner.RegisterKeys

To see the request and response types for a method, for example the GetResults method, try:

grpcurl -plaintext '127.0.0.1:8231' describe scanner.Scanner.GetResults \
&& grpcurl -plaintext '127.0.0.1:8231' describe scanner.GetResultsRequest \
&& grpcurl -plaintext '127.0.0.1:8231' describe scanner.GetResultsResponse \
&& grpcurl -plaintext '127.0.0.1:8231' describe scanner.Results \
&& grpcurl -plaintext '127.0.0.1:8231' describe scanner.Transactions \
&& grpcurl -plaintext '127.0.0.1:8231' describe scanner.Transaction

The response should be the request and response types for the GetResults method:

scanner.Scanner.GetResults is a method:
// Get all data we have stored for the given keys.
rpc GetResults ( .scanner.GetResultsRequest ) returns ( .scanner.GetResultsResponse );
scanner.GetResultsRequest is a message:
// A request for getting results for a set of keys.
message GetResultsRequest {
  // Keys for which to get results.
  repeated string keys = 1;
}
scanner.GetResultsResponse is a message:
// A set of responses for each provided key of a GetResults call.
message GetResultsResponse {
  // Results for each key.
  map<string, .scanner.Results> results = 1;
}
scanner.Results is a message:
// A result for a single key.
message Results {
  // A height, transaction id map
  map<uint32, .scanner.Transactions> by_height = 1;
}
scanner.Transactions is a message:
// A vector of transaction hashes
message Transactions {
  // Transactions
  repeated Transaction transactions = 1;
}
scanner.Transaction is a message:
// Transaction data
message Transaction {
  // The transaction hash/id
  string hash = 1;
}
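The nesting of these messages (key, then height, then transactions) can be easier to follow as plain data types. The sketch below mirrors the described messages with hand-written Rust structs and flattens a response into rows. These structs are illustrative only: they are not the generated zebra-grpc types, and the flatten helper and placeholder values are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hand-written mirrors of the messages described above.
/// Illustrative only; real clients would use generated types.
#[derive(Debug, Default)]
struct Transaction {
    hash: String,
}

#[derive(Debug, Default)]
struct Transactions {
    transactions: Vec<Transaction>,
}

#[derive(Debug, Default)]
struct Results {
    by_height: BTreeMap<u32, Transactions>,
}

#[derive(Debug, Default)]
struct GetResultsResponse {
    results: BTreeMap<String, Results>,
}

/// Flatten a response into (key, height, transaction hash) rows,
/// ordered by key and then by height.
fn flatten(response: &GetResultsResponse) -> Vec<(String, u32, String)> {
    let mut rows = Vec::new();
    for (key, results) in &response.results {
        for (height, txs) in &results.by_height {
            for tx in &txs.transactions {
                rows.push((key.clone(), *height, tx.hash.clone()));
            }
        }
    }
    rows
}

fn main() {
    // Placeholder data, not real scan results.
    let mut by_height = BTreeMap::new();
    by_height.insert(
        419_200,
        Transactions {
            transactions: vec![Transaction { hash: "placeholder_hash".to_string() }],
        },
    );
    let mut results = BTreeMap::new();
    results.insert("sapling_extended_full_viewing_key".to_string(), Results { by_height });
    let response = GetResultsResponse { results };

    for (key, height, hash) in flatten(&response) {
        println!("{key} @ {height}: {hash}");
    }
}
```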

Methods


GetInfo

Returns basic information about the zebra-scan instance.

RegisterKeys

Starts scanning for a set of keys, with optional start heights, and caching the results. Cached results can later be retrieved by calling the GetResults or Scan methods.

DeleteKeys

Stops scanning transactions for a set of keys. Deletes the keys and their cached results from zebra-scan.

GetResults

Returns cached results for a set of keys.

ClearResults

Deletes any cached results for a set of keys.

Scan

Starts scanning for a set of keys and returns a stream of results.

Kibana blockchain explorer

The goal here is to export block data from Zebra into an elasticsearch database and visualize it with the kibana front end.

Attention: This is an experimental feature tested only in the Zcash Testnet.

Elasticsearch support was introduced to Zebra in pull request #6274.

Download, build and run Elasticsearch

Installing Elasticsearch is easy on Linux and macOS if you follow the .tar.gz installation guide.

Make sure you end up with an elasticsearch binary. Run it:

./bin/elasticsearch

The first time you run the database, the elastic password and the enrollment token for Kibana will be displayed on the screen (see here). Please save these, as you will need them.

Elasticsearch will listen on https://localhost:9200 by default.

Download, build and run Kibana

Installing Kibana is also easy on Linux and macOS if you follow the .tar.gz installation guide.

Make sure you end up with a kibana binary and run it:

./bin/kibana

The first time you run kibana, it will provide a link for configuration:

Kibana has not been configured.
    
Go to http://localhost:5601/?code=405316 to get started.

Visit the URL to get started. You will need the Kibana enrollment token from Elasticsearch and the elastic password from the previous step.

Kibana will listen on http://localhost:5601 by default.

You are now ready to start exporting data into Elasticsearch with Zebra.

Download and build zebra with elasticsearch feature

Elasticsearch is an optional and experimental feature, so Zebra needs to be built and installed with the elasticsearch Rust feature enabled, using the following command:

cargo install --features elasticsearch --locked --git https://github.com/ZcashFoundation/zebra zebrad

The Zebra binary will be at ~/.cargo/bin/zebrad.

Configure Zebra for elasticsearch

Generate a configuration file with the default settings:

zebrad generate -o ~/.config/zebrad.toml

The following changes are needed:

network section

Change the network field to Testnet. Mainnet should work, but it is untested. Also, the preferred p2p port for the Testnet is 18233, so optionally change the listen_addr field to 0.0.0.0:18233.

state section

Add your elastic password generated when running the database for the first time into the elasticsearch_password field.

Run Zebra

If the config is in the default path then just running the binary will start the sync.

zebrad

Sync will take time, but you can use kibana to make sure blocks are being inserted during the process.

Visualize your data

As soon as the first batch of data is inserted from Zebra into the Elasticsearch database, an index named zcash_testnet will be created.

To observe data, go to Analytics → Discover and create a new data view with the zcash_testnet index. Make sure you select the header.time field as the Timestamp field.

To see the data, use the calendar to get records for the last 10 years. The first blocks to be inserted will be very old blocks following the chain genesis.


After a while the chain will be in sync.


You can now use all the kibana features. For example, creating dashboards for specific data visualization.


Forking the Zcash Testnet with Zebra

The Zcash blockchain community consistently explores upgrades to the Zcash protocol, introducing new features to the consensus layer. This tutorial guides teams or individuals through forking the Zcash Testnet locally using Zebra, enabling testing of custom functionalities in a private testnet environment.

As of writing, the current network upgrade on the Zcash Testnet is Nu5. While a future upgrade (Nu6) activation height will be known later, for this tutorial, we aim to activate after Nu5, allowing us to observe our code crossing the network upgrade and continuing isolated.

To achieve this, we'll use Zebra as the node, s-nomp as the mining pool, and nheqminer as the Equihash miner.

Note: This tutorial aims to remain generally valid after Nu6, with adjustments to the network upgrade name and block heights.

Requirements

  • A modified Zebra version capable of syncing up to our chosen activation height, including the changes from the code changes step.
  • Mining tools:
    • s-nomp pool
    • nheqminer

You may have two Zebra versions: one for syncing up to the activation height and another (preferably built on top of the first one) with the network upgrade and additional functionality.

Note: For mining setup please see How to mine with Zebra on testnet

Sync the Testnet to a Block after Nu5 Activation

Select a height for the new network upgrade after Nu5. In the Zcash public testnet, Nu5 activation height is 1_842_420, and at the time of writing, the testnet was at around block 2_598_958. To avoid dealing with checkpoints, choose a block that is not only after Nu5 but also in the future. In this tutorial, we chose block 2_599_958, which is 1000 blocks ahead of the current testnet tip.
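The height selection above is simple arithmetic, sketched below in Rust using the numbers from this tutorial. The helper name choose_fork_height is hypothetical, and the check only covers the two conditions stated in the text: the height must be after Nu5 activation and ahead of the current tip.

```rust
/// Nu5 activation height on the public Testnet, from the text above.
const NU5_TESTNET_ACTIVATION: u32 = 1_842_420;

/// Pick a fork height `margin` blocks ahead of the current tip.
/// Returns None if the result would not be strictly after both
/// Nu5 activation and the current tip, or if the addition overflows.
fn choose_fork_height(current_tip: u32, margin: u32) -> Option<u32> {
    let candidate = current_tip.checked_add(margin)?;
    if candidate > NU5_TESTNET_ACTIVATION && candidate > current_tip {
        Some(candidate)
    } else {
        None
    }
}

fn main() {
    // Values used in this tutorial: tip 2_598_958, margin 1000.
    println!("{:?}", choose_fork_height(2_598_958, 1000));
}
```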

Clone Zebra, create a config file, and use state.debug_stop_at_height to halt the Zebra sync after reaching our chosen network upgrade block height (2_599_958).

The relevant parts of the config file are:

[network]
listen_addr = "0.0.0.0:18233"
network = "Testnet"

[state]
debug_stop_at_height = 2599958
cache_dir = "/home/user/.cache/zebra"

Generate a Zebra config file:

zebrad generate -o myconf.toml

Start Zebra with the modified config:

zebrad -c myconf.toml start

Wait for the sync to complete (this may take up to 24 hours, depending on testnet conditions), resulting in a state up to the desired block in ~/.cache/zebra.

Code changes

We need to add the network upgrade variant to the zcash_primitives crate and Zebra.

librustzcash / zcash_primitives

Add the new network upgrade variant and a branch id among some changes needed for the library to compile. Here are some examples:

After the changes, check that the library can be built with cargo build --release.

Zebra

Here we are making changes to create an isolated network version of Zebra. In addition to your own changes, this Zebra version needs to have the following:

  • Add a Nu6 variant to the NetworkUpgrade enum located in zebra-chain/src/parameters/network_upgrade.rs.

  • Add consensus branch id, a random non-repeated string. We used 00000006 in our tests when writing this tutorial.

  • Point to the modified zcash_primitives in zebra-chain/Cargo.toml. In my case, I had to replace the dependency line with something like:

    zcash_primitives = { git = "https://github.com/oxarbitrage/librustzcash", branch = "nu6-test", features = ["transparent-inputs"] }
    
  • Make fixes needed to compile.

  • Ignore how far we are from the tip in get block template: zebra-rpc/src/methods/get_block_template_rpcs/get_block_template.rs

Unclean test commit for Zebra: Zebra commit

Make sure you can build the zebrad binary after the changes with cargo build --release.

Configuration for isolated network

Now that you have a synced state and a modified Zebra version, it's time to run your isolated network. Relevant parts of the configuration file:


[mempool]
debug_enable_at_height = 0
    
[mining]
debug_like_zcashd = true
miner_address = 't27eWDgjFYJGVXmzrXeVjnb5J3uXDM9xH9v'
    
[network]
cache_dir = false
initial_testnet_peers = [
  "dnsseed.testnet.z.cash:18233",
  "testnet.seeder.zfnd.org:18233",
  "testnet.is.yolo.money:18233",
]
listen_addr = "0.0.0.0:18233"
network = "Testnet"
    
[rpc]
listen_addr = "0.0.0.0:18232"
    
[state]
cache_dir = "/home/oxarbitrage/.cache/zebra"

  • debug_enable_at_height = 0 enables the mempool independently of the tip height.
  • The [mining] section and the RPC endpoint rpc.listen_addr are both necessary for mining blocks.
  • initial_testnet_peers is needed as Zebra starts behind the fork block, approximately 100 blocks behind, so it needs to receive those blocks again. This is necessary until the new fork passes more than 100 blocks after the fork height. At that point, this network can be isolated, and initial_testnet_peers can be set to [].
  • Ensure your state.cache_dir is the same as when you saved state in step 1.
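The rule for when initial_testnet_peers can be emptied can be written as a small check. This is a sketch under the assumptions stated in the list above: the 100-block figure is the approximate value from the text, not a constant read from Zebra, and can_isolate is a hypothetical helper name.

```rust
/// Approximate number of blocks Zebra starts behind the fork block,
/// per the note above. An approximation from the text, not a Zebra constant.
const ROLLBACK_BLOCKS: u32 = 100;

/// True once the forked chain tip is more than ROLLBACK_BLOCKS past the
/// fork height, at which point initial_testnet_peers can be set to [].
fn can_isolate(tip_height: u32, fork_height: u32) -> bool {
    tip_height > fork_height.saturating_add(ROLLBACK_BLOCKS)
}

fn main() {
    let fork_height = 2_599_958;
    for tip in [2_599_968, 2_600_058, 2_600_059] {
        println!("tip {tip}: can isolate = {}", can_isolate(tip, fork_height));
    }
}
```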

Start the chain with:

zebrad -c myconf.toml start

Start s-nomp:

npm start

Start the miner:

nheqminer -l 127.0.0.1:1234 -u tmRGc4CD1UyUdbSJmTUzcB6oDqk4qUaHnnh.worker1 -t 1

Confirm Forked Chain

After Zebra retrieves blocks up to your activation height from the network, the network upgrade will change, and no more valid blocks can be received from outside.

After a while, in s-nomp, you should see submitted blocks from time to time after the fork height.

...
2023-11-24 16:32:05 [Pool]        [zcash_testnet] (Thread 1) Block notification via RPC after block submission
2023-11-24 16:32:24 [Pool]        [zcash_testnet] (Thread 1) Submitted Block using submitblock successfully to daemon instance(s)
2023-11-24 16:32:24 [Pool]        [zcash_testnet] (Thread 1) Block found: 0049f2daaaf9e90cd8b17041de0a47350e6811c2d0c9b0aed9420e91351abe43 by tmRGc4CD1UyUdbSJmTUzcB6oDqk4qUaHnnh.worker1
2023-11-24 16:32:24 [Pool]        [zcash_testnet] (Thread 1) Block notification 
...

You'll also see this in Zebra:

...
2023-11-24T19:32:05.574715Z  INFO zebra_rpc::methods::get_block_template_rpcs: submit block accepted block_hash=block::Hash("0084e1df2369a1fd5f75ab2b8b24472c49812669c812c7d528b0f8f88a798578") block_height="2599968"
2023-11-24T19:32:24.661758Z  INFO zebra_rpc::methods::get_block_template_rpcs: submit block accepted block_hash=block::Hash("0049f2daaaf9e90cd8b17041de0a47350e6811c2d0c9b0aed9420e91351abe43") block_height="2599969"
...

Ignore messages in Zebra related to how far you are from the tip or network/system clock issues, etc.

Check that you are on the right branch with the curl command:

curl --silent --data-binary '{"jsonrpc": "1.0", "id":"curltest", "method": "getblockchaininfo", "params": [] }' -H 'Content-type: application/json' http://127.0.0.1:18232/ | jq

In the result, verify the tip of the chain is after your activation height for Nu6 and that you are in branch 00000006 as expected.

Final words

Next steps depend on your use case. You might want to submit transactions with new fields, accept those transactions as part of new blocks in the forked chain, or observe changes at activation without sending transactions. Further actions are not covered in this tutorial.

Zebra OpenAPI specification

The Zebra RPC methods are a collection of endpoints used for interacting with the Zcash blockchain. These methods are utilized by wallets, block explorers, web and mobile applications, and more, for retrieving and sending information to the blockchain.

While the Zebra source code and RPC methods are well-documented, accessing this information typically involves searching for each function within the Zebra crate documentation, which may be inconvenient for users who are not familiar with Rust development.

To address this issue, the Zebra team has created an OpenAPI specification in the YAML format.

The Zebra OpenAPI specification is stored in a file named openapi.yaml, located at the root of the project. The latest version of this specification will always be available here.

Usage

There are several ways to utilize the specification. For users unfamiliar with OpenAPI and Swagger, simply navigate to the Swagger Editor and paste the specification there.


To send and receive data directly from/to the blockchain within the Swagger web app, you'll need a Zebra node with the RPC endpoint enabled.

To enable this functionality, start zebrad with a custom configuration. Generate a default configuration by running the following command:

mkdir -p ~/.config
zebrad generate -o ~/.config/zebrad.toml

Then, add the IP address and port to the rpc section of the configuration:

[rpc]
listen_addr = "127.0.0.1:8232"

If you modify the address and port in the Zebra configuration, make sure to update them in the openapi.yaml specification as well.

Start Zebra with the following command:

zebrad

You should now be able to send requests and receive responses within Swagger.


Troubleshooting

We continuously test that our builds and tests pass on the latest GitHub Runners for:

  • macOS,
  • Ubuntu,
  • Docker:
    • Debian Bookworm.

Memory Issues

  • If Zebra's build runs out of RAM, try setting export CARGO_BUILD_JOBS=2.
  • If Zebra's tests time out or run out of RAM, try running cargo test -- --test-threads=2. Note that cargo uses all processor cores on your machine by default.

Network Issues

Some of Zebra's tests download Zcash blocks, so they might be unreliable depending on your network connection. You can set ZEBRA_SKIP_NETWORK_TESTS=1 to skip the network tests.

Issues with Tests on macOS

Some of Zebra's tests deliberately cause errors that make Zebra panic. macOS records these panics as crash reports. If you are seeing "Crash Reporter" dialogs during Zebra tests, you can disable them using this Terminal.app command:

defaults write com.apple.CrashReporter DialogType none

Improving Performance

Zebra usually syncs in around three days on Mainnet and half a day on Testnet. The sync speed depends on your network connection and the overall Zcash network load. The major constraint we've found on zebrad performance is the network weather, especially the ability to make good connections to other Zcash network peers. If you're having trouble syncing, try the following config changes.

Release Build

Make sure you're using a release build on your native architecture.

Syncer Lookahead Limit

If your connection is slow, try downloading fewer blocks at a time:

[sync]
lookahead_limit = 1000
max_concurrent_block_requests = 25

Peer Set Size

If your connection is slow, try connecting to fewer peers:

[network]
peerset_initial_target_size = 25

Turn off debug logging

Zebra logs at info level by default.

If Zebra is slow, make sure it is logging at info level:

[tracing]
filter = 'info'

Or restrict debug logging to a specific Zebra component:

[tracing]
filter = 'info,zebra_network=debug'

If you keep on seeing multiple info logs per second, please open a bug.

Developer Documentation

This section contains the contribution guide and design documentation. It does not contain:

Contributing

Running and Debugging

See the user documentation for details on how to build, run, and instrument Zebra.

Bug Reports

Please create an issue on the Zebra issue tracker.

Pull Requests

PRs are welcome for small and large changes, but please don't make large PRs without coordinating with us via the issue tracker or Discord. This helps increase development coordination and makes PRs easier to merge.

Check out the help wanted or good first issue labels if you're looking for a place to get started!

Zebra follows the conventional commits standard for the commits merged to main. Since PRs are squashed before merging to main, the PR titles should follow the conventional commits standard so that the merged commits are conformant.
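As a rough illustration of the title shape this implies, the sketch below checks for the basic Conventional Commits form type(optional-scope)!: description. It is a simplified, hypothetical helper: it does not enforce any list of allowed types, and it is not the check used by Zebra's CI.

```rust
/// Check whether a PR title has the basic Conventional Commits shape:
/// `type(optional-scope)!: description`.
/// Simplified: any lowercase ASCII word is accepted as a type.
fn is_conventional_title(title: &str) -> bool {
    let (prefix, description) = match title.split_once(": ") {
        Some(parts) => parts,
        None => return false,
    };
    if description.trim().is_empty() {
        return false;
    }
    // Strip the optional breaking-change marker.
    let prefix = prefix.strip_suffix('!').unwrap_or(prefix);
    // Split off an optional `(scope)`.
    let (kind, scope) = match prefix.split_once('(') {
        Some((kind, rest)) => match rest.strip_suffix(')') {
            Some(scope) => (kind, Some(scope)),
            None => return false,
        },
        None => (prefix, None),
    };
    let kind_ok = !kind.is_empty() && kind.chars().all(|c| c.is_ascii_lowercase());
    let scope_ok = match scope {
        Some(s) => !s.is_empty() && s.chars().all(|c| c != '(' && c != ')' && c != ' '),
        None => true,
    };
    kind_ok && scope_ok
}

fn main() {
    for title in ["fix(rpc): handle empty mempool", "Fix stuff"] {
        println!("{title:?}: {}", is_conventional_title(title));
    }
}
```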

Coverage Reports

Zebra's CI currently generates coverage reports for every PR with Rust's source-based coverage feature. The coverage reports are generated by the coverage.yml file.

These reports are then saved as HTML and zipped into a GitHub Actions artifact. These artifacts can be accessed on the checks tab of any PR, next to the "re-run jobs" button on the Coverage (+nightly) CI job's tab (example).

To access a report, download and extract the zip artifact, then open the top-level index.html.

Design Overview

This document sketches the design for Zebra.

Desiderata

The following are general desiderata for Zebra:

  • [George's list..]

  • As much as reasonably possible, it and its dependencies should be implemented in Rust. While it may not make sense to require this in every case (for instance, it probably doesn't make sense to rewrite libsecp256k1 in Rust, instead of using the same upstream library as Bitcoin), we should generally aim for it.

  • As much as reasonably possible, Zebra should minimize trust in required dependencies. Note that "minimize number of dependencies" is usually a proxy for this desideratum, but is not exactly the same: for instance, a collection of crates like the tokio crates are all developed together and have one trust boundary.

  • Zebra should be well-factored internally into a collection of component libraries which can be used by other applications to perform Zcash-related tasks. Implementation details of each component should not leak into all other components.

  • Zebra should checkpoint on Canopy activation and drop all Sprout-related functionality not required post-Canopy.

Non-Goals

  • Zebra keeps a copy of the chain state, so it isn't intended for lightweight applications like light wallets. Those applications should use a light client protocol.

Notable Blog Posts

Service Dependencies

Note: dotted lines are for "getblocktemplate-rpcs" feature

Service dependency edges (service -> services it depends on):

  • transaction_verifier -> state
  • mempool -> transaction_verifier, state, peer_set
  • inbound -> state, mempool, block_verifier_router
  • rpc_server -> state, mempool, block_verifier_router
  • block_verifier_router -> checkpoint_verifier, block_verifier
  • checkpoint_verifier -> state
  • syncer -> block_verifier_router, peer_set
  • block_verifier -> transaction_verifier, state
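The edges of the service graph above can be encoded as data to answer questions such as "which services does the syncer depend on, directly or indirectly?". The sketch below is a plain Rust traversal over those edges. It does not distinguish the dotted getblocktemplate-rpcs edges, since that detail is not recoverable from the text, and the names EDGES and transitive_deps are hypothetical.

```rust
use std::collections::BTreeSet;

/// (service, dependency) pairs taken from the service graph above.
const EDGES: &[(&str, &str)] = &[
    ("transaction_verifier", "state"),
    ("mempool", "transaction_verifier"),
    ("mempool", "state"),
    ("mempool", "peer_set"),
    ("inbound", "state"),
    ("inbound", "mempool"),
    ("inbound", "block_verifier_router"),
    ("rpc_server", "state"),
    ("rpc_server", "mempool"),
    ("rpc_server", "block_verifier_router"),
    ("block_verifier_router", "checkpoint_verifier"),
    ("block_verifier_router", "block_verifier"),
    ("checkpoint_verifier", "state"),
    ("syncer", "block_verifier_router"),
    ("syncer", "peer_set"),
    ("block_verifier", "transaction_verifier"),
    ("block_verifier", "state"),
];

/// Collect every service reachable from `service` by following edges.
fn transitive_deps(service: &str) -> BTreeSet<&'static str> {
    let mut seen: BTreeSet<&'static str> = BTreeSet::new();
    let mut stack: Vec<&str> = vec![service];
    while let Some(current) = stack.pop() {
        for &(from, to) in EDGES {
            // `insert` returns false for already-visited services,
            // so each service is expanded at most once.
            if from == current && seen.insert(to) {
                stack.push(to);
            }
        }
    }
    seen
}

fn main() {
    println!("{:?}", transitive_deps("syncer"));
}
```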

Architecture

Unlike zcashd, which originated as a Bitcoin Core fork and inherited its monolithic architecture, Zebra has a modular, library-first design, with the intent that each component can be independently reused outside of the zebrad full node. For instance, the zebra-network crate containing the network stack can also be used to implement anonymous transaction relay, network crawlers, or other functionality, without requiring a full node.

At a high level, the full node functionality required by zebrad is factored into several components:

  • zebra-chain, providing definitions of core data structures for Zcash, such as blocks, transactions, addresses, etc., and related functionality. It also contains the implementation of the consensus-critical serialization formats used in Zcash. The data structures in zebra-chain are defined to enforce structural validity by making invalid states unrepresentable. For instance, the Transaction enum has variants for each transaction version, and it's impossible to construct a transaction with, e.g., spend or output descriptions but no binding signature, or, e.g., a version 2 (Sprout) transaction with Sapling proofs. Currently, zebra-chain is oriented towards verifying transactions, but will be extended to support creating them in the future.

  • zebra-network, providing an asynchronous, multithreaded implementation of the Zcash network protocol inherited from Bitcoin. In contrast to zcashd, each peer connection has a separate state machine, and the crate translates the external network protocol into a stateless, request/response-oriented protocol for internal use. The crate provides two interfaces:

    • an auto-managed connection pool that load-balances local node requests over available peers, and sends peer requests to a local inbound service, and
    • a connect_isolated method that produces a peer connection completely isolated from all other node state. This can be used, for instance, to safely relay data over Tor, without revealing distinguishing information.
  • zebra-script provides script validation. Currently, this is implemented by linking to the C++ script verification code from zcashd, but in the future we may implement a pure-Rust script implementation.

  • zebra-consensus performs semantic validation of blocks and transactions: all consensus rules that can be checked independently of the chain state, such as verification of signatures, proofs, and scripts. Internally, the library uses tower-batch-control to perform automatic, transparent batch processing of contemporaneous verification requests.

  • zebra-state is responsible for storing, updating, and querying the chain state. The state service is responsible for contextual verification: all consensus rules that check whether a new block is a valid extension of an existing chain, such as updating the nullifier set or checking that transaction inputs remain unspent.

  • zebrad contains the full node, which connects these components together and implements logic to handle inbound requests from peers and the chain sync process.

All of these components can be reused as independent libraries, and all communication between stateful components is handled by an internal asynchronous RPC abstraction ("microservices in one process").
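The zebra-chain idea of "making invalid states unrepresentable" can be illustrated with a small Rust sketch. The types below are hypothetical and heavily simplified; the real zebra-chain Transaction enum has more variants and fields. The point is only that the type layout rules out a version 2 transaction carrying Sapling data, or Sapling data without a binding signature.

```rust
/// Hypothetical, heavily simplified types for illustration only.
#[derive(Debug)]
struct BindingSignature([u8; 64]);

#[derive(Debug)]
struct SaplingShieldedData {
    spend_count: usize,
    output_count: usize,
    // The signature lives next to the spends and outputs, so shielded
    // data cannot be constructed without one.
    binding_sig: BindingSignature,
}

#[derive(Debug)]
enum Transaction {
    // Pre-Sapling version: there is no field that could hold Sapling data.
    V2 { transparent_inputs: usize },
    // Sapling version: shielded data is optional, but all-or-nothing.
    V4 {
        transparent_inputs: usize,
        sapling: Option<SaplingShieldedData>,
    },
}

/// Report whether a transaction carries Sapling shielded data.
fn has_sapling_data(tx: &Transaction) -> bool {
    match tx {
        Transaction::V2 { .. } => false,
        Transaction::V4 { sapling, .. } => sapling.is_some(),
    }
}

fn main() {
    let shielded = Transaction::V4 {
        transparent_inputs: 0,
        sapling: Some(SaplingShieldedData {
            spend_count: 1,
            output_count: 2,
            binding_sig: BindingSignature([0u8; 64]),
        }),
    };
    println!("{}", has_sapling_data(&shielded));
    println!("{}", has_sapling_data(&Transaction::V2 { transparent_inputs: 1 }));
}
```

Because the compiler rejects the invalid combinations, verification code does not need runtime checks for them.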

zebra-chain

Internal Dependencies

None: these are the core data structure definitions.

Responsible for

  • definitions of commonly used data structures, e.g.,

    • Block,
    • Transaction,
    • Address,
    • KeyPair...
  • parsing bytes into these data structures

  • definitions of core traits, e.g.,

    • ZcashSerialize and ZcashDeserialize, which perform consensus-critical serialization logic.
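A minimal sketch of what such a pair of serialization traits can look like is shown below. These are simplified stand-ins that share the names used above; the real zebra-chain traits may differ in their error types and helper methods. The Height type and its 4-byte little-endian encoding are assumptions chosen for the example.

```rust
use std::io::{self, Read, Write};

/// Simplified stand-ins for the traits named above.
trait ZcashSerialize {
    fn zcash_serialize<W: Write>(&self, writer: W) -> io::Result<()>;
}

trait ZcashDeserialize: Sized {
    fn zcash_deserialize<R: Read>(reader: R) -> io::Result<Self>;
}

/// Hypothetical example type: a height stored as a little-endian u32.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct Height(u32);

impl ZcashSerialize for Height {
    fn zcash_serialize<W: Write>(&self, mut writer: W) -> io::Result<()> {
        writer.write_all(&self.0.to_le_bytes())
    }
}

impl ZcashDeserialize for Height {
    fn zcash_deserialize<R: Read>(mut reader: R) -> io::Result<Self> {
        let mut bytes = [0u8; 4];
        // Fails with an error if fewer than 4 bytes are available.
        reader.read_exact(&mut bytes)?;
        Ok(Height(u32::from_le_bytes(bytes)))
    }
}

fn main() -> io::Result<()> {
    let mut buf = Vec::new();
    Height(419_200).zcash_serialize(&mut buf)?;
    let decoded = Height::zcash_deserialize(buf.as_slice())?;
    assert_eq!(decoded, Height(419_200));
    println!("{:?} round-trips through {} bytes", decoded, buf.len());
    Ok(())
}
```

Taking a generic reader or writer keeps the serialization logic independent of whether the bytes come from a socket, a file, or an in-memory buffer.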

Exported types

  • [...]

zebra-network

Internal Dependencies

  • zebra-chain

Responsible for

  • definition of a well structured, internal request/response protocol
  • provides an abstraction for "this node" and "the network" using the internal protocol
  • dynamic, backpressure-driven peer set management
  • per-peer state machine that translates the internal protocol to the Bitcoin/Zcash protocol
  • tokio codec for Bitcoin/Zcash message encoding.

Exported types

  • Request, an enum representing all possible requests in the internal protocol;
  • Response, an enum representing all possible responses in the internal protocol;
  • AddressBook, a data structure for storing peer addresses;
  • Config, a configuration object for all networking-related parameters;
  • init<S: Service>(Config, S) -> (impl Service, Arc<Mutex<AddressBook>>), the main entry-point.

The init entrypoint constructs a dynamically-sized pool of peers sending inbound requests to the provided S: tower::Service representing "this node", and returns a Service that can be used to send requests to "the network", together with an AddressBook updated with liveness information from the peer pool. The AddressBook can be used to respond to inbound requests for peers.

All peerset management (finding new peers, creating new outbound connections, etc) is completely encapsulated, as is responsibility for routing outbound requests to appropriate peers.

zebra-state

Internal Dependencies

  • zebra-chain for data structure definitions.

Responsible for

  • block storage API
    • operates on parsed block structs
      • these structs can be converted from and into raw bytes
    • primarily aimed at network replication, not at processing
    • can be used to rebuild the database below
  • maintaining a database of tx, address, etc data
    • this database can be blown away and rebuilt from the blocks, which are otherwise unused.
    • threadsafe, typed lookup API that completely encapsulates the database logic
    • handles stuff like "transactions are reference counted by outputs" etc.
  • providing tower::Service interfaces for all of the above to support backpressure.

Exported types

  • Request, an enum representing all possible requests in the internal protocol;
    • blocks can be accessed via their chain height or hash
    • confirmed transactions can be accessed via their block, or directly via their hash
  • Response, an enum representing all possible responses in the internal protocol;
  • init() -> impl Service, the main entry-point.

The init entrypoint returns a Service that can be used to send requests for the chain state.

All state management (adding blocks, getting blocks by index or hash) is completely encapsulated.
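The request/response style described above can be sketched with plain enums. The example below is a synchronous, in-memory stand-in and does not use tower. The type names echo the text, but the variants, fields, and the call method are hypothetical simplifications rather than the zebra-state API.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified block record.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Block {
    height: u32,
    hash: String,
}

/// A block can be looked up by chain height or by hash.
enum HashOrHeight {
    Hash(String),
    Height(u32),
}

/// All requests the sketch state understands.
enum Request {
    Block(HashOrHeight),
    Tip,
}

/// All responses the sketch state can return.
#[derive(Debug, PartialEq, Eq)]
enum Response {
    Block(Option<Block>),
    Tip(Option<u32>),
}

/// In-memory stand-in for the state; callers only ever see `call`.
struct State {
    by_height: BTreeMap<u32, Block>,
}

impl State {
    fn new(blocks: Vec<Block>) -> Self {
        let by_height = blocks.into_iter().map(|b| (b.height, b)).collect();
        State { by_height }
    }

    /// Synchronous stand-in for an asynchronous service call.
    fn call(&self, request: Request) -> Response {
        match request {
            Request::Block(HashOrHeight::Height(height)) => {
                Response::Block(self.by_height.get(&height).cloned())
            }
            Request::Block(HashOrHeight::Hash(hash)) => {
                Response::Block(self.by_height.values().find(|b| b.hash == hash).cloned())
            }
            Request::Tip => Response::Tip(self.by_height.keys().next_back().copied()),
        }
    }
}

fn main() {
    let state = State::new(vec![
        Block { height: 0, hash: "genesis".to_string() },
        Block { height: 1, hash: "block-one".to_string() },
    ]);
    println!("{:?}", state.call(Request::Tip));
    println!("{:?}", state.call(Request::Block(HashOrHeight::Height(1))));
}
```

Keeping the storage behind a single request type is what allows the database layout to change without affecting callers.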

zebra-script

Internal Dependencies

  • ??? depends on how it's implemented internally

Responsible for

  • the minimal Bitcoin script implementation required for Zcash
  • script parsing
  • context-free script validation

Notes

This can wrap an existing script implementation at the beginning.

If this existed in a "good" way, we could use it to implement tooling for Zcash script inspection, debugging, etc.

Questions

  • How does this interact with NU4 script changes?

Exported types

  • [...]

zebra-consensus

Internal Dependencies

  • zebra-chain for data structures and parsing.
  • zebra-state to read and update the state database.
  • zebra-script for script parsing and validation.

Responsible for

  • consensus-specific parameters (network magics, genesis block, pow parameters, etc) that determine the network consensus
  • consensus logic to decide which block is the current block
  • block and transaction verification
    • context-free validation, e.g., signature, proof verification, etc.
    • context-dependent validation, e.g., determining whether a transaction is accepted in a particular chain state context.
    • verifying mempool (unconfirmed) transactions
  • block checkpoints
    • mandatory checkpoints (genesis block, canopy activation)
    • optional regular checkpoints (every Nth block)
  • modifying the chain state
    • adding new blocks to ZebraState, including chain reorganisation
    • adding new transactions to ZebraMempoolState
  • storing the transaction mempool state
    • mempool transactions can be accessed via their hash
  • providing tower::Service interfaces for all of the above to support backpressure and batch validation.

Exported types

  • block::init() -> impl Service, the main entry-point for block verification.
  • ZebraMempoolState
    • all state management (adding transactions, getting transactions by hash) is completely encapsulated.
  • mempool::init() -> impl Service, the main entry-point for mempool transaction verification.

The init entrypoints return Services that can be used to verify blocks or transactions, and add them to the relevant state.

zebra-rpc

Internal Dependencies

  • zebra-chain for data structure definitions
  • zebra-node-services for shared request type definitions
  • zebra-utils for developer and power user tools

Responsible for

  • rpc interface

Exported types

  • [...]

zebra-client

Internal Dependencies

  • zebra-chain for structure definitions
  • zebra-state for transaction queries and client/wallet state storage
  • zebra-script possibly? for constructing transactions

Responsible for

  • implementation of some event a user might trigger
  • would be used to implement a full wallet
  • create transactions, monitors shielded wallet state, etc.

Notes

Communication between the client code and the rest of the node should be done by a tower service interface. Since the Service trait can abstract from a function call to RPC, this means that it will be possible for us to isolate all client code to a subprocess.

Exported types

  • [...]

zebrad

Abscissa-based application which loads configs, all application components, and connects them to each other.

Responsible for

  • actually running the server
  • connecting functionality in dependencies

Internal Dependencies

  • zebra-chain
  • zebra-network
  • zebra-state
  • zebra-consensus
  • zebra-client
  • zebra-rpc

Unassigned functionality

Responsibility for this functionality needs to be assigned to one of the modules above (subject to discussion):

  • [ ... add to this list ... ]

Diagrams

  ┌───────────┐     ┌───────────┐     ┌───────────┐     ┌───────────┐
  │PeerServer │     │PeerServer │     │PeerServer │     │PeerServer │
  │ ┌───────┐ │     │ ┌───────┐ │     │ ┌───────┐ │     │ ┌───────┐ │
  │ │┌─────┐│ │     │ │┌─────┐│ │     │ │┌─────┐│ │     │ │┌─────┐│ │
  │ ││ Tcp ││ │     │ ││ Tcp ││ │     │ ││ Tcp ││ │     │ ││ Tcp ││ │
  │ │└─────┘│ │     │ │└─────┘│ │     │ │└─────┘│ │     │ │└─────┘│ │
  │ │Framed │ │     │ │Framed │ │     │ │Framed │ │     │ │Framed │ │
  │ │Stream │ │     │ │Stream │ │     │ │Stream │ │     │ │Stream │ │
  │ └───────┘─┼─┐   │ └───────┘─┼─┐   │ └───────┘─┼─┐   │ └───────┘─┼─┐
┏▶│     ┃     │ │ ┏▶│     ┃     │ │ ┏▶│     ┃     │ │ ┏▶│     ┃     │ │
┃ │     ┃     │ │ ┃ │     ┃     │ │ ┃ │     ┃     │ │ ┃ │     ┃     │ │
┃ │     ▼     │ │ ┃ │     ▼     │ │ ┃ │     ▼     │ │ ┃ │     ▼     │ │
┃ │ ┌───────┐ │ │ ┃ │ ┌───────┐ │ │ ┃ │ ┌───────┐ │ │ ┃ │ ┌───────┐ │ │
┃ │ │ Tower │ │ │ ┃ │ │ Tower │ │ │ ┃ │ │ Tower │ │ │ ┃ │ │ Tower │ │ │
┃ │ │Buffer │ │ │ ┃ │ │Buffer │ │ │ ┃ │ │Buffer │ │ │ ┃ │ │Buffer │ │ │
┃ │ └───────┘ │ │ ┃ │ └───────┘ │ │ ┃ │ └───────┘ │ │ ┃ │ └───────┘ │ │
┃ │     ┃     │ │ ┃ │     ┃     │ │ ┃ │     ┃     │ │ ┃ │     ┃     │ │
┃ └─────╋─────┘ │ ┃ └─────╋─────┘ │ ┃ └─────╋─────┘ │ ┃ └─────╋─────┘ │
┃       ┃       └─╋───────╋───────┴─╋───────╋───────┴─╋───────╋───────┴───────┐
┃       ┃         ┃       ┃         ┃       ┃         ┃       ┃               │
┃       ┃         ┃       ┃         ┃       ┃         ┃       ┃               │
┃       ┗━━━━━━━━━╋━━━━━━━┻━━━━━━━━━╋━━━━━━━┻━━━━━━━━━╋━━━━━━━┻━━━━━━━━━┓     │
┗━━━━━━━┓         ┗━━━━━━━┓         ┗━━━━━━━┓         ┗━━━━━━━┓         ┃     │
 ┌──────╋─────────────────╋─────────────────╋─────────────────╋──────┐  ┃     │
 │      ┃                 ┃                 ┃                 ┃      │  ┃     │
 │┌───────────┐     ┌───────────┐     ┌───────────┐     ┌───────────┐│  ┃     │
 ││PeerClient │     │PeerClient │     │PeerClient │     │PeerClient ││  ┃     │
 │└───────────┘     └───────────┘     └───────────┘     └───────────┘│  ┃     │
 │                                                                   │  ┃     │
 │┌──────┐      ┌──────────────┐                                     │  ┃     │
 ││ load │      │peer discovery│                              PeerSet│  ┃     │
 ││signal│   ┏━▶│   receiver   │          req: Request, rsp: Response│  ┃     │
 │└──────┘   ┃  └──────────────┘         routes all outgoing requests│  ┃     │
 │    ┃      ┃                               adds peers via discovery│  ┃     │
 └────╋──────╋───────────────────────────────────────────────────────┘  ┃     │
      ┃      ┃                                             ▲            ┃     │
      ┃      ┣━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓             ┃            ┃     │
      ┃      ┃     ┏━━━━━━━━━━━╋━━━━━━━━━━━━━╋━━━━━━━━━━━━━┫            ┃     │
      ▼      ┃     ┃           ┃             ┃             ┃            ┃     │
  ┌────────────────╋───┐┌────────────┐┌─────────────┐      ┃            ┃     │
  │Crawler         ┃   ││  Listener  ││Initial Peers│      ┃            ┃     │
  │            ┌──────┐││            ││             │      ┃            ┃     │
  │            │Tower │││            ││             │      ┃            ┃     │
  │            │Buffer│││listens for ││ connects on │      ┃            ┃     │
  │            └──────┘││  incoming  ││  launch to  │      ┃            ┃     │
  │uses peerset to     ││connections,││ seed peers  │      ┃            ┃     │
  │crawl network,      ││   sends    ││specified in │      ┃            ┃     │
  │maintains candidate ││ handshakes ││ config file │      ┃            ┃     │
  │peer set, connects  ││  to peer   ││  to build   │      ┃            ┃     │
  │to new peers on load││ discovery  ││initial peer │      ┃            ┃     │
  │signal or timer     ││  receiver  ││     set     │      ┃            ┃     │
  └────────────────────┘└────────────┘└─────────────┘      ┃            ┃     │
             │        zebra-network internals              ┃            ┃     │
─ ─ ─ ─ ─ ─ ─│─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┃─ ─ ─ ─ ─ ─ ╋ ─ ─ ┼
             │              exposed api                    ┃            ┃     │
             │             ┌────────────────────────┐      ┃            ┃     │
             │             │Arc<Mutex<AddressBook>> │      ┃            ┃     │
             │             │last-seen timestamps for│      ┃            ┃     │
             └─────────────│ each peer, obtained by │◀─────╋────────────╋─────┘
                           │ hooking into incoming  │      ┃            ┃
                           │    message streams     │      ┃            ┃
                           └────────────────────────┘      ┃            ▼
                                             ┌────────────────┐┌───────────────┐
                                             │Outbound Service││Inbound Service│
                                             │ req: Request,  ││ req: Request, │
                                             │ rsp: Response  ││ rsp: Response │
                                             │                ││               │
                                             │  Tower Buffer  ││  routes all   │
                                             └────────────────┘│   incoming    │
                                                               │requests, uses │
                                                               │   load-shed   │
                                                               │ middleware to │
                                                               │ remove peers  │
                                                               │ when internal │
                                                               │ services are  │
                                                               │  overloaded   │
                                                               └───────────────┘

Zebra Cached State Database Implementation

Adding a Column Family

Most Zebra column families are implemented using low-level methods that allow accesses using any type. But this is error-prone, because we can accidentally use different types to read and write them. (Or read using different types in different methods.)

If we type the column family name out every time, a typo can lead to a panic, because the column family doesn't exist.

Instead:

  • define the name and type of each column family at the top of the implementation module,
  • add a method on the database that returns that type, and
  • add the column family name to the list of column families in the database.

For example:

/// The name of the sapling transaction IDs result column family.
pub const SAPLING_TX_IDS: &str = "sapling_tx_ids";

/// The column families supported by the running `zebra-scan` database code.
pub const SCANNER_COLUMN_FAMILIES_IN_CODE: &[&str] = &[
    sapling::SAPLING_TX_IDS,
];

/// The type for reading sapling transaction IDs results from the database.
pub type SaplingTxIdsCf<'cf> =
    TypedColumnFamily<'cf, SaplingScannedDatabaseIndex, Option<SaplingScannedResult>>;

impl Storage {
    /// Returns a typed handle to the `sapling_tx_ids` column family.
    pub(crate) fn sapling_tx_ids_cf(&self) -> SaplingTxIdsCf {
        SaplingTxIdsCf::new(&self.db, SAPLING_TX_IDS)
            .expect("column family was created when database was created")
    }
}

Then, every read of the column family uses that method, which enforces the correct types. (These methods have the same names as the low-level methods, but are easier to call.)

impl Storage {
    /// Returns the result for a specific database index (key, block height, transaction index).
    pub fn sapling_result_for_index(
        &self,
        index: &SaplingScannedDatabaseIndex,
    ) -> Option<SaplingScannedResult> {
        self.sapling_tx_ids_cf().zs_get(index).flatten()
    }

    /// Returns the Sapling indexes and results in the supplied range.
    fn sapling_results_in_range(
        &self,
        range: impl RangeBounds<SaplingScannedDatabaseIndex>,
    ) -> BTreeMap<SaplingScannedDatabaseIndex, Option<SaplingScannedResult>> {
        self.sapling_tx_ids_cf().zs_items_in_range_ordered(range)
    }
}

This simplifies the implementation compared with the raw ReadDisk methods.

To write to the database, use the new_batch_for_writing() method on the column family type. This returns a batch that enforces the correct types. Use write_batch() to write it to the database:

impl Storage {
    /// Insert a sapling scanning `key`, and mark all heights before `birthday_height` so they
    /// won't be scanned.
    pub(crate) fn insert_sapling_key(
        &mut self,
        storage: &Storage,
        sapling_key: &SaplingScanningKey,
        birthday_height: Option<Height>,
    ) {
        ...
        self.sapling_tx_ids_cf()
            .new_batch_for_writing()
            .zs_insert(&index, &None)
            .write_batch()
            .expect("unexpected database write failure");
    }
}

To write to an existing batch in legacy code, use with_batch_for_writing() instead. This relies on the caller to write the batch to the database:

impl DiskWriteBatch {
    /// Updates the history tree for the tip, if it is not empty.
    ///
    /// The batch must be written to the database by the caller.
    pub fn update_history_tree(&mut self, db: &ZebraDb, tree: &HistoryTree) {
        let history_tree_cf = db.history_tree_cf().with_batch_for_writing(self);

        if let Some(tree) = tree.as_ref().as_ref() {
            // The batch is modified by this method and written by the caller.
            let _ = history_tree_cf.zs_insert(&(), tree);
        }
    }
}

To write to a legacy batch, then write it to the database, you can use take_batch_for_writing(batch).write_batch().

During database upgrades, you might need to access the same column family using different types. Define a type and convenience method for each legacy type, and use them during the upgrade.

Some full examples of legacy code conversions, and the typed column family implementation itself, are in PR #8112 and PR #8115.

Current Implementation

Verification Modes

Zebra's state has two verification modes:

  • block hash checkpoints, and
  • full verification.

This means that verification uses two different codepaths, and they must produce the same results.

By default, Zebra uses as many checkpoints as it can, because they are more secure against rollbacks (and some other kinds of chain attacks). Then it uses full verification for the last few thousand blocks.

When Zebra gets more checkpoints in each new release, it checks the previously verified cached state against those checkpoints. This checks that the two codepaths produce the same results.

Upgrading the State Database

For most state upgrades, we want to modify the database format of the existing database. If we change the major database version, every user needs to re-download and re-verify all the blocks, which can take days.

Writing Blocks to the State

Blocks can be written to the database via two different code paths, and both must produce the same results:

  • Upgrading a pre-existing database to the latest format
  • Writing newly-synced blocks in the latest format

This code is high risk, because discovering bugs is tricky, and fixing bugs can require a full reset and re-write of an entire column family.

Most Zebra instances will do an upgrade, because they already have a cached state, and upgrades are faster. But we run a full sync in CI every week, because new users use that codepath. (And it is their first experience of Zebra.)

When Zebra starts up and shuts down (and periodically in CI tests), we run checks on the state format. This makes sure that the two codepaths produce the same state on disk.

To reduce code and testing complexity:

  • when a previous Zebra version opens a newer state, the entire state is considered to have that lower version, and
  • when a newer Zebra version opens an older state, each required upgrade is run on the entire state.
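The two rules above can be sketched as a single comparison between the on-disk format version and the version of the running code. This is only an illustration: the `FormatAction` enum and `format_action` function are hypothetical names, and versions are modelled as plain `(minor, patch)` pairs rather than Zebra's own version types.

```rust
use std::cmp::Ordering;

/// What to do with the whole state when it is opened (hypothetical type).
#[derive(Debug, PartialEq)]
enum FormatAction {
    /// Disk and code versions match: nothing to do.
    NoChange,
    /// Older code opened a newer state: treat the entire state as the lower version.
    MarkAsOlder,
    /// Newer code opened an older state: run each required upgrade on the entire state.
    Upgrade,
}

/// Compares `(minor, patch)` versions; tuples compare minor first, then patch.
fn format_action(disk: (u64, u64), running: (u64, u64)) -> FormatAction {
    match running.cmp(&disk) {
        Ordering::Equal => FormatAction::NoChange,
        Ordering::Less => FormatAction::MarkAsOlder,
        Ordering::Greater => FormatAction::Upgrade,
    }
}
```

Because the decision applies to the whole state, there is no need to track a separate format version for each block or column family.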

In-Place Upgrade Goals

Here are the goals of in-place upgrades:

  • avoid a full download and rebuild of the state
  • Zebra must be able to upgrade the format from previous minor or patch versions of its disk format. (Major disk format versions are breaking changes. They create a new empty state and re-sync the whole chain.)
    • this is checked the first time CI runs on a PR with a new state version. After the first CI run, the cached state is marked as upgraded, so the upgrade doesn't run again. If CI fails on the first run, any cached states with that version should be deleted.
  • the upgrade and full sync formats must be identical
    • this is partially checked by the state validity checks for each upgrade (see above)
  • previous Zebra versions should be able to load the new format
    • this is checked by other PRs running using the upgraded cached state, but only if a Rust PR runs after the new PR's CI finishes, but before it merges
  • best-effort loading of older supported states by newer Zebra versions
  • best-effort compatibility between newer states and older supported Zebra versions

Design Constraints

Upgrades run concurrently with state verification and RPC requests.

This means that:

  • the state must be able to read the old and new formats
    • it can't panic if the data is missing
    • it can't give incorrect results, because that can affect verification or wallets
    • it can return an error
    • it can only return an Option if the caller handles it correctly
  • full syncs and upgrades must write the same format
    • the same write method should be called from both the full sync and upgrade code, which helps prevent data inconsistencies
  • repeated upgrades must produce a valid state format
    • if Zebra is restarted, the format upgrade will run multiple times
    • if an older Zebra version opens the state, data can be written in an older format
  • the format must be valid before and after each database transaction or API call, because an upgrade can be cancelled at any time
    • multi-column family changes should be made in database transactions
    • if you are building a new column family:
      • disable state queries, then enable them once it's done, or
      • do the upgrade in an order that produces correct results (for example, some data is valid from genesis forward, and some from the tip backward)
    • if each database API call produces a valid format, transactions aren't needed
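As a rough sketch of two of these constraints (one shared write method, and upgrades that can safely run more than once), the example below uses in-memory maps as stand-ins for column families. The `State` type, its fields, and its methods are all hypothetical and are not part of Zebra.

```rust
use std::collections::BTreeMap;

/// Hypothetical state: `blocks` stands in for an existing column family,
/// and `size_by_height` for a new column family derived from it.
#[derive(Default)]
struct State {
    blocks: BTreeMap<u32, Vec<u8>>,
    size_by_height: BTreeMap<u32, usize>,
}

impl State {
    /// The single write method for the derived entry, shared by both code paths.
    fn write_size_entry(&mut self, height: u32) {
        if let Some(block) = self.blocks.get(&height) {
            let size = block.len();
            self.size_by_height.insert(height, size);
        }
    }

    /// Full sync path: write the block, then its derived entry.
    fn commit_block(&mut self, height: u32, block: Vec<u8>) {
        self.blocks.insert(height, block);
        self.write_size_entry(height);
    }

    /// Upgrade path: fill in derived entries that are missing.
    /// Re-running it after a restart is harmless, because finished heights are skipped.
    fn upgrade(&mut self) {
        let missing: Vec<u32> = self
            .blocks
            .keys()
            .copied()
            .filter(|height| !self.size_by_height.contains_key(height))
            .collect();

        for height in missing {
            self.write_size_entry(height);
        }
    }
}
```

Since both paths call `write_size_entry`, a state built by a full sync and a state built by an upgrade end up with the same derived entries.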

If there is an upgrade failure, panic and tell the user to delete their cached state and re-launch Zebra.

Performance Constraints

Some column family access patterns can lead to very poor performance.

Known performance issues include:

  • using an iterator on a column family which also deletes keys
  • creating large numbers of iterators
  • holding an iterator for a long time

See the performance notes and bug reports in:

  • https://github.com/facebook/rocksdb/wiki/Iterator#iterating-upper-bound-and-lower-bound
  • https://tracker.ceph.com/issues/55324
  • https://jira.mariadb.org/browse/MDEV-19670

But we need to use iterators for some operations, so our alternatives are (in preferred order):

  1. Minimise the number of keys we delete, and how often we delete them
  2. Avoid using iterators on column families where we delete keys
  3. If we must use iterators on those column families, set read bounds to minimise the amount of deleted data that is read

Currently only UTXOs require key deletion, and only utxo_loc_by_transparent_addr_loc requires deletion and iterators.
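For the third option, a bounded read over one key prefix needs an exclusive upper bound. A minimal sketch of how such a bound could be computed is shown here; `prefix_upper_bound` is a hypothetical helper, not a Zebra or RocksDB function.

```rust
/// Returns the smallest byte string that sorts after every key starting with `prefix`,
/// or `None` if there is no such bound (the prefix is empty or consists only of 0xFF bytes).
fn prefix_upper_bound(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut bound = prefix.to_vec();

    // Drop trailing 0xFF bytes, then increment the last remaining byte.
    while let Some(last) = bound.pop() {
        if last < u8::MAX {
            bound.push(last + 1);
            return Some(bound);
        }
    }

    None
}
```

Reading only the range from `prefix` (inclusive) to this bound (exclusive) keeps the iterator away from deleted entries that belong to other prefixes.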

Required Tests

State upgrades are a high-risk change. They permanently modify the state format on production Zebra instances. Format issues are tricky to diagnose, and require extensive testing and a new release to fix. Deleting and rebuilding an entire column family can also be costly, taking minutes or hours the first time a cached state is upgraded to a new Zebra release.

Some format bugs can't be fixed, and require an entire rebuild of the state. For example, deleting or corrupting transactions or block headers.

So testing format upgrades is extremely important. Every state format upgrade should test:

  • new format serializations
  • new calculations or data processing
  • the upgrade produces a valid format
  • a full sync produces a valid format

Together, the tests should cover every code path. For example, the subtree upgrade needed mid-block, end-of-block, Sapling, and Orchard tests, which mainly relied on the validity checks for coverage.

Each test should be followed by a restart, a sync of 200+ blocks, and another restart. This simulates typical user behaviour.

And ideally:

  • An upgrade from the earliest supported Zebra version (the CI sync-past-checkpoint tests do this on every PR)

Manually Triggering a Format Upgrade

Zebra stores the current state minor and patch versions in a version file in the database directory. This path varies based on the OS, major state version, network, and config.

For example, the default mainnet state version on Linux is at: ~/.cache/zebra/state/v25/mainnet/version

To upgrade a cached Zebra state from v25.0.0 to the latest disk format, delete the version file. To upgrade from a specific version v25.x.y, edit the file so it contains x.y.
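As an illustration of the `x.y` file contents described above, the sketch below parses a minor and patch version from a string. The `parse_version_file` function is hypothetical, and details such as trimming whitespace are assumptions rather than Zebra's actual parsing rules.

```rust
/// Parses version file contents of the form `minor.patch` (hypothetical helper).
/// Returns `None` if the contents do not have that shape.
fn parse_version_file(contents: &str) -> Option<(u64, u64)> {
    // Assumption: surrounding whitespace, such as a trailing newline, is ignored.
    let (minor, patch) = contents.trim().split_once('.')?;

    Some((minor.parse().ok()?, patch.parse().ok()?))
}
```

Under these assumptions, a file containing `1.0` would describe the state format v25.1.0 when it sits in the `v25` directory.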

Editing the file and running Zebra will trigger a re-upgrade over an existing state. Re-upgrades can hide format bugs. For example, if the old code was correct, and the new code skips blocks, the validity checks won't find that bug.

So it is better to test with a full sync, and an older cached state.

Current State Database Format

rocksdb provides a persistent, thread-safe BTreeMap<&[u8], &[u8]>. Each map is a distinct "tree". Keys are sorted using lexicographic order ([u8].sorted()) on byte strings, so integer values should be stored using big-endian encoding (so that the lexicographic order on byte strings matches the numeric order).

Note that the lexicographic order allows creating 1-to-many maps using keys only. For example, the tx_loc_by_transparent_addr_loc column family maps each address to all transactions related to it, by storing each transaction location prefixed with the address location as the key, and leaving the value empty. Since rocksdb allows listing all keys with a given prefix, this allows listing all transactions related to a given address.
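Both points can be sketched with an in-memory BTreeMap standing in for a rocksdb column family. The helper names `height_key` and `keys_with_prefix` are hypothetical, and the 4-byte key width is chosen only for this illustration.

```rust
use std::collections::BTreeMap;

/// Encodes an integer as big-endian bytes, so byte order matches numeric order.
fn height_key(height: u32) -> [u8; 4] {
    height.to_be_bytes()
}

/// Returns every key in `map` that starts with `prefix`, in sorted order.
/// This models a keys-only 1-to-many map: the prefix is the "one" side,
/// and the rest of each key is one of the "many" items.
fn keys_with_prefix<'a>(
    map: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    prefix: &[u8],
) -> Vec<&'a Vec<u8>> {
    map.range::<Vec<u8>, _>(prefix.to_vec()..)
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(key, _)| key)
        .collect()
}
```

With little-endian encoding the same comparison fails: 255 encodes as `[255, 0, 0, 0]`, which sorts after 256 encoded as `[0, 1, 0, 0]`.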

We use the following rocksdb column families:

Column Family                     Keys                  Values                       Changes

Blocks
hash_by_height                    block::Height         block::Hash                  Create
height_by_hash                    block::Hash           block::Height                Create
block_header_by_height            block::Height         block::Header                Create

Transactions
tx_by_loc                         TransactionLocation   Transaction                  Create
hash_by_tx_loc                    TransactionLocation   transaction::Hash            Create
tx_loc_by_hash                    transaction::Hash     TransactionLocation          Create

Transparent
balance_by_transparent_addr       transparent::Address  Amount || AddressLocation    Update
tx_loc_by_transparent_addr_loc    AddressTransaction    ()                           Create
utxo_by_out_loc                   OutputLocation        transparent::Output          Delete
utxo_loc_by_transparent_addr_loc  AddressUnspentOutput  ()                           Delete

Sprout
sprout_nullifiers                 sprout::Nullifier     ()                           Create
sprout_anchors                    sprout::tree::Root    sprout::NoteCommitmentTree   Create
sprout_note_commitment_tree       ()                    sprout::NoteCommitmentTree   Update

Sapling
sapling_nullifiers                sapling::Nullifier    ()                           Create
sapling_anchors                   sapling::tree::Root   ()                           Create
sapling_note_commitment_tree      block::Height         sapling::NoteCommitmentTree  Create
sapling_note_commitment_subtree   block::Height         NoteCommitmentSubtreeData    Create

Orchard
orchard_nullifiers                orchard::Nullifier    ()                           Create
orchard_anchors                   orchard::tree::Root   ()                           Create
orchard_note_commitment_tree      block::Height         orchard::NoteCommitmentTree  Create
orchard_note_commitment_subtree   block::Height         NoteCommitmentSubtreeData    Create

Chain
history_tree                      ()                    NonEmptyHistoryTree          Update
tip_chain_value_pool              ()                    ValueBalance                 Update

Data Formats

We use big-endian encoding for keys, to allow database index prefix searches.

Most Zcash protocol structures are encoded using ZcashSerialize/ZcashDeserialize. Other structures are encoded using custom IntoDisk/FromDisk implementations.

Block and Transaction Data:

  • Height: 24 bits, big-endian, unsigned (allows for ~30 years worth of blocks)
  • TransactionIndex: 16 bits, big-endian, unsigned (max ~23,000 transactions in the 2 MB block limit)
  • TransactionCount: same as TransactionIndex
  • TransactionLocation: Height || TransactionIndex
  • OutputIndex: 24 bits, big-endian, unsigned (max ~223,000 transfers in the 2 MB block limit)
  • transparent and shielded input indexes, and shielded output indexes: 16 bits, big-endian, unsigned (max ~49,000 transfers in the 2 MB block limit)
  • OutputLocation: TransactionLocation || OutputIndex
  • AddressLocation: the first OutputLocation used by a transparent::Address. Always has the same value for each address, even if the first output is spent.
  • Utxo: Output, derives extra fields from the OutputLocation key
  • AddressUnspentOutput: AddressLocation || OutputLocation, used instead of a BTreeSet<OutputLocation> value, to improve database performance
  • AddressTransaction: AddressLocation || TransactionLocation, used instead of a BTreeSet<TransactionLocation> value, to improve database performance
  • NoteCommitmentSubtreeIndex: 16 bits, big-endian, unsigned
  • NoteCommitmentSubtreeData<{sapling, orchard}::tree::Node>: Height || {sapling, orchard}::tree::Node
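The concatenated key formats in this list can be sketched as fixed-size byte arrays. The function names below are hypothetical and only illustrate the layout (3-byte Height, 2-byte TransactionIndex, 3-byte OutputIndex); they are not Zebra's IntoDisk implementations.

```rust
/// Height: 24 bits, big-endian, unsigned.
fn height_bytes(height: u32) -> [u8; 3] {
    assert!(height < (1 << 24), "height must fit in 24 bits");
    let bytes = height.to_be_bytes();
    // Drop the most significant byte, which is always zero here.
    [bytes[1], bytes[2], bytes[3]]
}

/// TransactionLocation: Height || TransactionIndex (3 + 2 = 5 bytes).
fn transaction_location_bytes(height: u32, tx_index: u16) -> [u8; 5] {
    let mut out = [0u8; 5];
    out[..3].copy_from_slice(&height_bytes(height));
    out[3..].copy_from_slice(&tx_index.to_be_bytes());
    out
}

/// OutputLocation: TransactionLocation || OutputIndex (5 + 3 = 8 bytes).
fn output_location_bytes(height: u32, tx_index: u16, output_index: u32) -> [u8; 8] {
    assert!(output_index < (1 << 24), "output index must fit in 24 bits");
    let mut out = [0u8; 8];
    out[..5].copy_from_slice(&transaction_location_bytes(height, tx_index));
    out[5..].copy_from_slice(&output_index.to_be_bytes()[1..]);
    out
}
```

Because every field is big-endian and fixed-width, comparing the encoded bytes gives chain order: first by height, then by transaction index, then by output index.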

Amounts:

  • Amount: 64 bits, little-endian, signed
  • ValueBalance: [Amount; 4]
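A minimal sketch of the amount encodings, using plain `i64` values in place of Zebra's Amount type (the function names are hypothetical, and the order of the four amounts inside a ValueBalance is not specified here):

```rust
/// Amount: 64 bits, little-endian, signed.
fn amount_bytes(zatoshis: i64) -> [u8; 8] {
    zatoshis.to_le_bytes()
}

/// ValueBalance: four Amounts, one after another (4 * 8 = 32 bytes).
fn value_balance_bytes(amounts: [i64; 4]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (chunk, amount) in out.chunks_exact_mut(8).zip(amounts.iter()) {
        chunk.copy_from_slice(&amount_bytes(*amount));
    }
    out
}
```

Little-endian bytes do not sort in numeric order, which is acceptable for these types because the column family table above lists them only as values, not as keys.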

Derived Formats (legacy):

  • *::NoteCommitmentTree: bincode using serde
    • stored note commitment trees always have cached roots
  • NonEmptyHistoryTree: bincode using serde, using our copy of an old zcash_history serde implementation

bincode is a risky format to use, because it depends on the exact order and type of struct fields. Do not use it for new column families.

Address Format

The following figure helps visualize the address index, which is the most complicated part. Numbers in brackets are array sizes; bold arrows are compositions (for example, TransactionLocation is the concatenation of Height and TransactionIndex); dashed arrows are compositions that are also 1-to-many maps (for example, AddressTransaction is the concatenation of AddressLocation and TransactionLocation, but is also used to map each AddressLocation to multiple TransactionLocations).

graph TD;
    Address -->|"balance_by_transparent_addr<br/>"| AddressBalance;
    AddressBalance ==> Amount;
    AddressBalance ==> AddressLocation;
    AddressLocation ==> FirstOutputLocation;
    AddressLocation -.->|"tx_loc_by_transparent_addr_loc<br/>(AddressTransaction[13])"| TransactionLocation;
    TransactionLocation ==> Height;
    TransactionLocation ==> TransactionIndex;
    OutputLocation -->|utxo_by_out_loc| Output;
    OutputLocation ==> TransactionLocation;
    OutputLocation ==> OutputIndex;
    AddressLocation -.->|"utxo_loc_by_transparent_addr_loc<br/>(AddressUnspentOutput[16])"| OutputLocation;

    AddressBalance["AddressBalance[16]"];
    Amount["Amount[8]"];
    Height["Height[3]"];
    Address["Address[21]"];
    TransactionIndex["TransactionIndex[2]"];
    TransactionLocation["TransactionLocation[5]"];
    OutputIndex["OutputIndex[3]"];
    OutputLocation["OutputLocation[8]"];
    FirstOutputLocation["First OutputLocation[8]"];
    AddressLocation["AddressLocation[8]"];

Implementing consensus rules using rocksdb

Each column family handles updates differently, based on its specific consensus rules:

  • Create:
    • Each key-value entry is created once.
    • Keys are never deleted, values are never updated.
  • Delete:
    • Each key-value entry is created once.
    • Keys can be deleted, but values are never updated.
    • Code called by ReadStateService must ignore deleted keys, or use a read lock.
    • We avoid deleting keys, and avoid using iterators on Delete column families, for performance.
    • TODO: should we prevent re-inserts of keys that have been deleted?
  • Update:
    • Each key-value entry is created once.
    • Keys are never deleted, but values can be updated.
    • Code called by ReadStateService must handle old or new values, or use a read lock.

We can't do some kinds of value updates, because they cause RocksDB performance issues:

  • Append:
    • Keys are never deleted.
    • Existing values are never updated.
    • Sets of values have additional items appended to the end of the set.
    • Code called by ReadStateService must handle shorter or longer sets, or use a read lock.
  • Up/Del:
    • Keys can be deleted.
    • Sets of values have items added or deleted (in any position).
    • Code called by ReadStateService must ignore deleted keys and values, accept shorter or longer sets, and accept old or new values. Or it should use a read lock.

Avoid using large sets of values as RocksDB keys or values.

RocksDB read locks

The read-only ReadStateService needs to handle concurrent writes and deletes of the finalized column families it reads. It must also handle overlaps between the cached non-finalized Chain, and the current finalized state database.

The StateService uses RocksDB transactions for each block write. So ReadStateService queries that only access a single key or value will always see a consistent view of the database.

If a ReadStateService query only uses column families that have keys and values appended (Create in the Changes column of the table above), it should ignore extra appended values. Most queries do this by default.

For more complex queries, there are several options:

Reading across multiple column families:

  1. Ignore deleted values using custom Rust code
  2. Take a database snapshot - https://docs.rs/rocksdb/latest/rocksdb/struct.DBWithThreadMode.html#method.snapshot

Reading a single column family:

  3. multi_get - https://docs.rs/rocksdb/latest/rocksdb/struct.DBWithThreadMode.html#method.multi_get_cf
  4. iterator - https://docs.rs/rocksdb/latest/rocksdb/struct.DBWithThreadMode.html#method.iterator_cf

RocksDB also has read transactions, but they don't seem to be exposed in the Rust crate.

Low-Level Implementation Details

RocksDB ignores duplicate puts and deletes, preserving the latest values. If rejecting duplicate puts or deletes is consensus-critical, check db.get_cf(cf, key)? before putting or deleting any values in a batch.

Currently, these restrictions should be enforced by code review:

  • multiple zs_inserts are only allowed on Update column families, and
  • delete_cf is only allowed on Delete column families.

In future, we could enforce these restrictions by:

  • creating traits for Create, Delete, and Update
  • doing different checks in zs_insert depending on the trait
  • wrapping delete_cf in a trait, and only implementing that trait for types that use Delete column families.
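One possible shape for that trait-based enforcement is sketched below, with an in-memory map standing in for a column family. Every name here (`ChangeKind`, `AllowsDelete`, `CheckedCf`, and the marker types) is hypothetical, and this is not how Zebra currently implements its column families.

```rust
use std::collections::BTreeMap;
use std::marker::PhantomData;

/// Describes how a column family may change (hypothetical trait).
trait ChangeKind {
    /// Whether inserting over an existing key is allowed.
    const ALLOW_OVERWRITE: bool;
}

/// Marker for change kinds that permit key deletion.
trait AllowsDelete: ChangeKind {}

struct Create;
struct Delete;
struct Update;

impl ChangeKind for Create {
    const ALLOW_OVERWRITE: bool = false;
}
impl ChangeKind for Delete {
    const ALLOW_OVERWRITE: bool = false;
}
impl ChangeKind for Update {
    const ALLOW_OVERWRITE: bool = true;
}
impl AllowsDelete for Delete {}

/// An in-memory stand-in for a column family, tagged with its change kind.
struct CheckedCf<Kind: ChangeKind> {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
    _kind: PhantomData<Kind>,
}

impl<Kind: ChangeKind> CheckedCf<Kind> {
    fn new() -> Self {
        Self { entries: BTreeMap::new(), _kind: PhantomData }
    }

    /// Inserts a value, rejecting overwrites unless the change kind allows them.
    fn insert(&mut self, key: &[u8], value: &[u8]) -> Result<(), &'static str> {
        if !Kind::ALLOW_OVERWRITE && self.entries.contains_key(key) {
            return Err("key already exists in a column family that forbids updates");
        }
        self.entries.insert(key.to_vec(), value.to_vec());
        Ok(())
    }
}

/// `delete` only exists for column families whose change kind allows deletion,
/// so calling it on a Create or Update column family is a compile error.
impl<Kind: AllowsDelete> CheckedCf<Kind> {
    fn delete(&mut self, key: &[u8]) {
        self.entries.remove(key);
    }
}
```

In this sketch, re-inserting a key after it has been deleted is still allowed, so the open question in the Delete rules above is left unanswered.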

As of June 2021, the Rust rocksdb crate ignores the delete callback, and merge operators are unreliable (or have undocumented behaviour). So they should not be used for consensus-critical checks.

Notes on rocksdb column families

  • The hash_by_height and height_by_hash column families provide a bijection between block heights and block hashes. (Since the rocksdb state only stores finalized state, they are actually a bijection).

  • Similarly, the tx_loc_by_hash and hash_by_tx_loc column families provide a bijection between transaction locations and transaction hashes.

  • The block_header_by_height column family provides a bijection between block heights and block header data. There is no corresponding height_by_block column family: instead, hash the block header, and look up that hash in height_by_hash. (Since the rocksdb state only stores finalized state, they are actually a bijection). Similarly, there are no column families that go from transaction data to transaction locations: hash the transaction and use tx_loc_by_hash.

  • Block headers and transactions are stored separately in the database, so that individual transactions can be accessed efficiently. Blocks can be re-created on request using the following process:

    • Look up height in height_by_hash
    • Get the block header for height from block_header_by_height
    • Iterate from TransactionIndex 0, to get each transaction with height from tx_by_loc, stopping when there are no more transactions in the block
  • Block headers are stored by height, not by hash. This has the downside that looking up a block by hash requires an extra level of indirection. The upside is that blocks with adjacent heights are adjacent in the database, and many common access patterns, such as helping a client sync the chain or doing analysis, access blocks in (potentially sparse) height order. In addition, the fact that we commit blocks in order means we're writing only to the end of the rocksdb column family, which may help save space.

  • Similarly, transaction data is stored in chain order in tx_by_loc and utxo_by_out_loc, and chain order within each vector in utxo_loc_by_transparent_addr_loc and tx_loc_by_transparent_addr_loc.

  • TransactionLocations are stored as a (height, index) pair referencing the height of the transaction's parent block and the transaction's index in that block. This would more traditionally be a (hash, index) pair, but because we store blocks by height, storing the height saves one level of indirection. Transaction hashes can be looked up using hash_by_tx_loc.

  • Similarly, UTXOs are stored in utxo_by_out_loc by OutputLocation, rather than OutPoint. OutPoints can be looked up using tx_loc_by_hash, and reconstructed using hash_by_tx_loc.

  • The Utxo type can be constructed from the OutputLocation and Output data, height: OutputLocation.height, and is_coinbase: OutputLocation.transaction_index == 0 (coinbase transactions are always the first transaction in a block).

  • balance_by_transparent_addr is the sum of all utxo_loc_by_transparent_addr_locs that are still in utxo_by_out_loc. It is cached to improve performance for addresses with large UTXO sets. It also stores the AddressLocation for each address, which allows for efficient lookups.

  • utxo_loc_by_transparent_addr_loc stores unspent transparent output locations by address. The address location and UTXO location are stored as a RocksDB key, so they are in chain order, and get good database performance. This column family also includes the original address location UTXO, if it has not been spent.

  • When a block write deletes a UTXO from utxo_by_out_loc, that UTXO location should be deleted from utxo_loc_by_transparent_addr_loc. The deleted UTXO can be removed efficiently, because the UTXO location is part of the key. This is an index optimisation, which does not affect query results.

  • tx_loc_by_transparent_addr_loc stores transaction locations by address. This list includes transactions containing spent UTXOs. The address location and transaction location are stored as a RocksDB key, so they are in chain order, and get good database performance. This column family also includes the TransactionLocation of the transaction for the AddressLocation.

  • The sprout_note_commitment_tree stores the note commitment tree state at the tip of the finalized state, for the specific pool. There is always a single entry. Each tree is stored as a "Merkle tree frontier" which is basically a (logarithmic) subset of the Merkle tree nodes as required to insert new items. For each block committed, the old tree is deleted and a new one is inserted by its new height.

  • The {sapling, orchard}_note_commitment_tree stores the note commitment tree state for every height, for the specific pool. Each tree is stored as a "Merkle tree frontier" which is basically a (logarithmic) subset of the Merkle tree nodes as required to insert new items.

  • The {sapling, orchard}_note_commitment_subtree stores the completion height and root for every completed level 16 note commitment subtree, for the specific pool.

  • history_tree stores the ZIP-221 history tree state at the tip of the finalized state. There is always a single entry for it. The tree is stored as the set of "peaks" of the "Merkle mountain range" tree structure, which is what is required to insert new items.

  • Each *_anchors stores the anchor (the root of a Merkle tree) of the note commitment tree of a certain block. We only use the keys since we just need the set of anchors, regardless of where they come from. The exception is sprout_anchors which also maps the anchor to the matching note commitment tree. This is required to support interstitial treestates, which are unique to Sprout. TODO: store the Root hash in sprout_note_commitment_tree, and use it to look up the note commitment tree. This de-duplicates tree state data. But we currently only store one sprout tree by height.

  • The value pools are only stored for the finalized tip.

  • We do not store the cumulative work for the finalized chain, because the finalized work is equal for all non-finalized chains. So the additional non-finalized work can be used to calculate the relative chain order, and choose the best chain.
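The block re-creation process described in the notes above can be sketched with in-memory maps standing in for the height_by_hash, block_header_by_height, and tx_by_loc column families. The `Db` type and `block_by_hash` method are hypothetical, and headers and transactions are simplified to strings.

```rust
use std::collections::BTreeMap;

type Hash = [u8; 32];

/// In-memory stand-ins for three column families (hypothetical type).
struct Db {
    height_by_hash: BTreeMap<Hash, u32>,
    block_header_by_height: BTreeMap<u32, String>,
    /// Keyed by (height, transaction index), like a TransactionLocation.
    tx_by_loc: BTreeMap<(u32, u16), String>,
}

impl Db {
    /// Rebuilds a block as (header, transactions), or returns `None` if it is unknown.
    fn block_by_hash(&self, hash: &Hash) -> Option<(String, Vec<String>)> {
        // Step 1: look up the height for this hash.
        let height = *self.height_by_hash.get(hash)?;

        // Step 2: get the block header stored at that height.
        let header = self.block_header_by_height.get(&height)?.clone();

        // Step 3: read transactions from index 0 until one is missing.
        let mut transactions = Vec::new();
        for index in 0..=u16::MAX {
            match self.tx_by_loc.get(&(height, index)) {
                Some(tx) => transactions.push(tx.clone()),
                None => break,
            }
        }

        Some((header, transactions))
    }
}
```

The loop stops at the first missing index, which relies on transaction indexes within a block being contiguous from 0.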

Zebra versioning and releases

This document contains the practices that we follow to provide you with a leading-edge application, balanced with stability. We strive to ensure that future changes are always introduced in a predictable way. We want everyone who depends on Zebra to know when and how new features are added, and to be well-prepared when obsolete ones are removed.

Before reading, you should understand Semantic Versioning and how trunk-based development works.

Zebra versioning

Zebra version numbers show the impact of the changes in a release. They are composed of three parts: major.minor.patch. For example, version 3.1.11 indicates major version 3, minor version 1, and patch level 11.

The version number is incremented based on the level of change included in the release.

NOTE:
Zebra is in a pre-release state: it is unstable and might not satisfy the intended compatibility requirements denoted by its associated normal version. A pre-release version is denoted by appending a hyphen and a series of dot-separated identifiers immediately after the patch version.

| Level of change | Details |
|---|---|
| Major release | Contains significant new features, and commonly corresponds to network upgrades; some technical assistance may be needed during the update. When updating to a major release, you may need to follow the specific upgrade instructions provided in the release notes. |
| Minor release | Contains new smaller features. Minor releases should be fully backward-compatible. No technical assistance is expected during update. If you want to use the new features in a minor release, you might need to follow the instructions in the release notes. |
| Patch release | Low risk, bug fix release. No technical assistance is expected during update. |
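The version format described above can be sketched as a small parser. This is a minimal illustration, not code from Zebra: the `Version` struct and `parse_version` function are hypothetical names, and build metadata is not handled.

```rust
/// Hypothetical parsed form of a `major.minor.patch[-pre]` version string.
#[derive(Debug, PartialEq, Eq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
    /// Dot-separated pre-release identifiers, for example `["rc", "0"]`.
    pre_release: Vec<String>,
}

/// Parses a version string, returning `None` if it is malformed.
fn parse_version(text: &str) -> Option<Version> {
    // Split off the optional pre-release part after the first hyphen.
    let (core, pre) = match text.split_once('-') {
        Some((core, pre)) => (core, Some(pre)),
        None => (text, None),
    };

    // The core must have exactly three numeric parts.
    let mut parts = core.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }

    let pre_release = match pre {
        Some(pre) if pre.is_empty() => return None,
        Some(pre) => pre.split('.').map(str::to_owned).collect(),
        None => Vec::new(),
    };

    Some(Version { major, minor, patch, pre_release })
}
```

With this sketch, `parse_version("3.1.11")` yields major 3, minor 1, and patch 11 with no pre-release identifiers.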

Supported Releases

Every Zebra version released by the Zcash Foundation is supported up to a specific height. Currently we support each version for about 16 weeks but this can change from release to release.

When the Zcash chain reaches this end of support height, zebrad will shut down and the binary will refuse to start.
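As a rough sketch of how an end of support height relates to the 16 week period, assuming a 75 second target block spacing (Zcash's spacing since the Blossom upgrade): the function names and the idea of deriving the height from a release height are assumptions for illustration, and are not taken from zebrad's code.

```rust
/// Assumed target block spacing in seconds.
const TARGET_SPACING_SECS: u64 = 75;
/// Support period from the text above; it can change from release to release.
const SUPPORT_WEEKS: u64 = 16;

/// Approximate number of blocks produced during the support period.
fn support_period_blocks() -> u64 {
    SUPPORT_WEEKS * 7 * 24 * 60 * 60 / TARGET_SPACING_SECS
}

/// Hypothetical helper: the end of support height for a release built near `release_height`.
fn end_of_support_height(release_height: u64) -> u64 {
    release_height + support_period_blocks()
}

/// Returns `true` once the chain tip has reached the end of support height.
fn is_expired(release_height: u64, tip_height: u64) -> bool {
    tip_height >= end_of_support_height(release_height)
}
```

Under these assumptions the support period is 129,024 blocks, so a release built near height 2,000,000 would stop running at height 2,129,024.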

Our process is similar to zcashd: https://zcash.github.io/zcash/user/release-support.html

Older Zebra versions that only support previous network upgrades will never be supported, because they are operating on an unsupported Zcash chain fork.

Supported update paths

You can update to any version of Zebra, provided that the following criteria are met:

  • The version you want to update to is supported.
  • The version you want to update from is within one major version of the version you want to upgrade to.
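The two criteria above can be written as a single check. This is only a sketch: `can_update` is a hypothetical helper, and whether a version is supported is passed in as a flag rather than looked up.

```rust
/// Hypothetical check for the two update criteria listed above:
/// the target version is supported, and the major versions differ by at most one.
fn can_update(from_major: u64, to_major: u64, to_is_supported: bool) -> bool {
    to_is_supported && from_major.abs_diff(to_major) <= 1
}
```

For example, under this reading an update from major version 1 to a supported major version 2 is allowed, but an update from 1 to 3 is not.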

See Keeping Up-to-Date for more information about updating your Zebra projects to the most recent version.

Preview releases

We let you preview what's coming by providing Release Candidate (rc) pre-releases for some major releases:

| Pre-release type | Details |
|---|---|
| Beta | The release that is under active development and testing. The beta release is indicated by a release tag appended with the -beta identifier, such as 8.1.0-beta.0. |
| Release candidate | A release for final testing of new features. A release candidate is indicated by a release tag appended with the -rc identifier, such as version 8.1.0-rc.0. |

Distribution tags

Zebra's tagging relates directly to versions published on Docker Hub. We reference these Docker Hub distribution tags throughout:

| Tag | Description |
|---|---|
| latest | The most recent stable version. |
| beta | The most recent pre-release version of Zebra for testing. May not always exist. |
| rc | The most recent release candidate of Zebra, meant to become a stable version. May not always exist. |
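The relationship between release versions and distribution tags can be sketched as a simple mapping. This is an illustration only: `DistTag` and `dist_tag` are hypothetical names, and the actual tagging is done by the release process, not by code like this.

```rust
/// Docker Hub distribution tags from the table above.
#[derive(Debug, PartialEq, Eq)]
enum DistTag {
    Latest,
    Beta,
    Rc,
}

/// Hypothetical mapping from a release version string to its distribution tag.
/// Returns `None` for pre-release identifiers that the table does not cover.
fn dist_tag(version: &str) -> Option<DistTag> {
    match version.split_once('-') {
        // No pre-release part: a stable release.
        None => Some(DistTag::Latest),
        // Otherwise, look at the first pre-release identifier.
        Some((_, pre)) => match pre.split('.').next() {
            Some("beta") => Some(DistTag::Beta),
            Some("rc") => Some(DistTag::Rc),
            _ => None,
        },
    }
}
```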

Feature Flags

To keep the main branch in a releasable state, experimental features must be gated behind a Rust feature flag. Breaking changes should also be gated behind a feature flag, unless the team decides they are urgent. (For example, security fixes which also break backwards compatibility.)
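As a minimal sketch of feature gating, the experimental code path below is only compiled when a feature is enabled. The feature name `experimental-sync` and the function `sync_strategy` are hypothetical, not real Zebra features. The feature would also need to be declared in the crate's Cargo.toml, in the `[features]` section.

```rust
/// Experimental code path, only compiled when the hypothetical
/// `experimental-sync` feature is enabled.
#[cfg(feature = "experimental-sync")]
fn sync_strategy() -> &'static str {
    "experimental"
}

/// Default code path, compiled when the feature is disabled.
#[cfg(not(feature = "experimental-sync"))]
fn sync_strategy() -> &'static str {
    "stable"
}
```

Because the gated code is not compiled by default, the main branch keeps its default behaviour unless a build explicitly opts in with `--features`.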

Release frequency

We work toward a regular schedule of releases, so that you can plan and coordinate your updates with the continuing evolution of Zebra.

Dates are offered as general guidance and are subject to change.

In general, expect the following release cycle:

  • A major release for each network upgrade, whenever there are breaking changes to Zebra (by API, severe bugs or other kind of upgrades)
  • Minor releases for significant new Zebra features or severe bug fixes
  • A patch release every few weeks

This cadence of releases gives eager developers access to new features as soon as they are fully developed and have passed our code review and integration testing processes. It also maintains the stability and reliability of the platform for production users who prefer to receive features after they have been validated by Zcash and by other developers who use the pre-release builds.

Deprecation practices

Sometimes "breaking changes", such as the removal of support for RPCs, APIs, and features, are necessary to:

  • add new Zebra features,
  • improve Zebra performance or reliability,
  • stay current with changing dependencies, or
  • implement changes in the blockchain itself.

To make these transitions as straightforward as possible, we make these commitments to you:

  • We work hard to minimize the number of breaking changes and to provide migration tools, when possible
  • We follow the deprecation policy described here, so you have time to update your applications to the latest Zebra binaries, RPCs and APIs
  • If a feature has critical security or reliability issues, and we need to remove it as soon as possible, we will explain why at the top of the release notes

To help ensure that you have sufficient time and a clear path to update, this is our deprecation policy:

| Deprecation stages | Details |
|---|---|
| Announcement | We announce deprecated RPCs and features in the change log. When we announce a deprecation, we also announce a recommended update path. |
| Deprecation period | When an RPC or a feature is deprecated, it is still present until the next major release. A deprecation can be announced in any release, but the removal of a deprecated RPC or feature happens only in a major release. Until a deprecated RPC or feature is removed, it is maintained according to the Tier 1 support policy, meaning that only critical and security issues are fixed. |
| Rust APIs | The Rust APIs of the Zebra crates are currently unstable and unsupported. Use the zebrad commands or JSON-RPCs to interact with Zebra. |

Release candidate & release process

Our release checklist is available as a template, which defines each step our team needs to follow to create a new pre-release or release, and to build and push the binaries to the official channels: Release Checklist Template.

Zebra Continuous Integration

Overview

Zebra has extensive continuous integration tests for node syncing and lightwalletd integration.

On every PR change, Zebra runs these Docker tests:

  • Zebra update syncs from a cached state Google Cloud tip image
  • lightwalletd full syncs from a cached state Google Cloud tip image
  • lightwalletd update syncs from a cached state Google Cloud tip image
  • lightwalletd integration with Zebra JSON-RPC and Light Wallet gRPC calls

When a PR is merged to the main branch, we also run a Zebra full sync test from genesis. Some of our builds and tests are repeated on the main branch, due to:

  • GitHub's cache sharing rules,
  • our cached state sharing rules, or
  • generating base coverage for PR coverage reports.

Currently, each Zebra and lightwalletd full and update sync updates the cached state images, which are shared by all tests. Tests prefer the latest image generated from the same commit. But if a state from the same commit is not available, tests will use the latest image from any branch and commit, as long as the state version is the same.
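The image preference order described above can be sketched as follows. The `CachedImage` struct, its fields, and `pick_image` are hypothetical: the real selection happens in the CI workflows, and this sketch only models the rule.

```rust
/// Hypothetical description of a cached state image.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CachedImage {
    commit: String,
    state_version: u32,
    /// Creation time as a Unix timestamp.
    created: u64,
}

/// Picks the latest image from the same commit, otherwise the latest image
/// from any commit. Only images with a matching state version are considered.
fn pick_image<'a>(
    images: &'a [CachedImage],
    commit: &str,
    state_version: u32,
) -> Option<&'a CachedImage> {
    let compatible: Vec<&CachedImage> = images
        .iter()
        .filter(|image| image.state_version == state_version)
        .collect();

    compatible
        .iter()
        .copied()
        .filter(|image| image.commit == commit)
        .max_by_key(|image| image.created)
        // Fall back to the newest compatible image from any branch and commit.
        .or_else(|| compatible.iter().copied().max_by_key(|image| image.created))
}
```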

Zebra also does a smaller set of tests on tier 2 platforms using GitHub Actions runners.

Automated Merges

We use Mergify to automatically merge most pull requests. To merge, a PR has to pass all required main branch protection checks, and be approved by a Zebra developer.

We try to use Mergify as much as we can, so all PRs get consistent checks.

Some PRs don't use Mergify:

  • Mergify config updates
  • Admin merges, which happen when there are multiple failures on the main branch
  • Manual merges (these are allowed by our branch protection rules, but we almost always use Mergify)

Merging with failing CI is usually disabled by our branch protection rules. See the Admin: Manually Merging PRs section below for manual merge instructions.

We use workflow conditions to skip some checks on PRs, Mergify, or the main branch. For example, some workflow changes skip Rust code checks. When a workflow can skip a check, we need to create a patch workflow with an empty job with the same name. This is a known Actions issue. This lets the branch protection rules pass when the job is skipped. In Zebra, we name these workflows with the extension .patch.yml.

Branch Protection Rules

Branch protection rules should be added for every failure that should stop a PR merging, break a release, or cause problems for Zebra users. We also add branch protection rules for developer or devops features that we need to keep working, like coverage.

But the following jobs don't need branch protection rules:

  • Testnet jobs: testnet is unreliable.
  • Optional linting jobs: some lint jobs are required, but some jobs like spelling and actions are optional.
  • Jobs that rarely run: for example, cached state rebuild jobs.
  • Setup jobs that will fail another later job which always runs, for example: Google Cloud setup jobs. We have branch protection rules for build jobs, but we could remove them if we want.

When a new job is added in a PR, use the #devops Slack channel to ask a GitHub admin to add a branch protection rule after it merges. Adding a new Zebra crate automatically adds a new job to build that crate by itself in ci-build-crates.yml, so new crate PRs also need to add a branch protection rule.

Admin: Changing Branch Protection Rules

Zebra repository admins and Zcash Foundation organisation owners can add or delete branch protection rules in the Zebra repository.

To change branch protection rules:

Any developer:

  1. Run a PR containing the new rule, so its name is available to autocomplete.
  2. If the job doesn't run on all PRs, add a patch job with the name of the job. If the job calls a reusable workflow, the name is Caller job / Reusable step. (The name of the job inside the reusable workflow is ignored.)

Admin:

  1. Go to the branch protection rule settings
  2. Click on Edit for the main branch
  3. Scroll down to the Require status checks to pass before merging section. (This section must always be enabled. If it is disabled, all the rules get deleted.)

To add jobs:

  1. Start typing the name of the job or step in the search box
  2. Select the name of the job or step to add it

To remove jobs:

  1. Go to Status checks that are required.
  2. Find the job name, and click the cross on the right to remove it

And finally:

  1. Click Save changes, using your security key if needed

If you accidentally delete a lot of rules, and you can't remember what they were, ask a ZF organisation owner to send you a copy of the rules from the audit log.

Organisation owners can also monitor rule changes and other security settings using this log.

Admin: Manually Merging PRs

Admins can allow merges with failing CI, to fix CI when multiple issues are causing failures.

Admin:

  1. Follow steps 2 and 3 above to open the main branch protection rule settings
  2. Scroll down to Do not allow bypassing the above settings
  3. Uncheck it
  4. Click Save changes
  5. Do the manual merge, and put an explanation on the PR
  6. Re-open the branch protection rule settings, and re-enable Do not allow bypassing the above settings

Pull Requests from Forked Repositories

GitHub doesn't allow PRs from forked repositories to have access to our repository secret keys, even after we approve their CI. This means that Google Cloud CI fails on these PRs.

Until we fix this CI bug, we can merge external PRs by:

  1. Reviewing the code to make sure it won't give our secret keys to anyone
  2. Pushing a copy of the branch to the Zebra repository
  3. Opening a PR using that branch
  4. Closing the original PR with a note that it will be merged (closing duplicate PRs is required by Mergify)
  5. Asking another Zebra developer to approve the new PR

Manual Testing Using Google Cloud

Some Zebra developers have access to the Zcash Foundation's Google Cloud instance, which also runs our automatic CI.

Please shut down large instances when they are not being used.

Automated Deletion

The Delete GCP Resources workflow automatically deletes test instances, instance templates, disks, and images older than a few days.

If you want to keep instances, instance templates, disks, or images in Google Cloud, name them so they don't match the automated names:

  • deleted instances, instance templates and disks end in a commit hash, so use a name that doesn't end in -[0-9a-f]{7,}
  • deleted disks and images start with zebrad- or lwd-, so use a name starting with anything else
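The naming rules above can be checked with a short sketch. The function names are hypothetical, and `is_kept_name` takes the conservative reading that a name is safe for any resource type only if it matches neither pattern.

```rust
/// Returns `true` if `name` ends in `-` followed by 7 or more lowercase hex digits,
/// matching the `-[0-9a-f]{7,}` suffix described above.
fn ends_in_commit_hash(name: &str) -> bool {
    match name.rsplit_once('-') {
        Some((_, suffix)) => {
            suffix.len() >= 7 && suffix.chars().all(|c| matches!(c, '0'..='9' | 'a'..='f'))
        }
        None => false,
    }
}

/// Returns `true` if `name` starts with one of the automated prefixes.
fn has_automated_prefix(name: &str) -> bool {
    name.starts_with("zebrad-") || name.starts_with("lwd-")
}

/// Conservative check: the name matches neither automated pattern.
fn is_kept_name(name: &str) -> bool {
    !ends_in_commit_hash(name) && !has_automated_prefix(name)
}
```

For example, `my-debug-disk` passes this check, while `lwd-manual` and `debug-8bbc5b21c9` do not.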

Our production Google Cloud project doesn't have automated deletion.

Troubleshooting

To improve CI performance, some Docker tests are stateful.

Tests can depend on:

  • built Zebra and lightwalletd docker images
  • cached state images in Google Cloud
  • jobs that launch Google Cloud instances for each test
  • multiple jobs that follow the logs from Google Cloud (to work around the 6 hour GitHub Actions limit)
  • a final "Run" job that checks the exit status of the Rust acceptance test
  • the current height and user-submitted transactions on the blockchain, which changes every minute

To support this test state, some Docker tests depend on other tests finishing first. This means that the entire workflow must be re-run when a single test fails.

Finding Errors

  1. Check if the same failure is happening on the main branch or multiple PRs. If it is, open a ticket and tell the Zebra team lead.

  2. Look for the earliest job that failed, and find the earliest failure.

For example, this failure doesn't tell us what actually went wrong:

Error: The template is not valid. ZcashFoundation/zebra/.github/workflows/sub-build-docker-image.yml@8bbc5b21c97fafc83b70fbe7f3b5e9d0ffa19593 (Line: 52, Col: 19): Error reading JToken from JsonReader. Path '', line 0, position 0.

https://github.com/ZcashFoundation/zebra/runs/8181760421?check_suite_focus=true#step:41:4

But the specific failure is a few steps earlier:

#24 2117.3 error[E0308]: mismatched types ...

https://github.com/ZcashFoundation/zebra/runs/8181760421?check_suite_focus=true#step:8:2112

  3. The earliest failure can also be in another job or pull request:

    a. check the whole workflow run (use the "Summary" button on the top left of the job details, and zoom in)

    b. if Mergify failed with "The pull request embarked with main cannot be merged", look at the PR "Conversation" tab, and find the latest Mergify PR that tried to merge this PR. Then start again from step 1.

  4. If that doesn't help, try looking for the latest failure. In Rust tests, the "failure:" notice contains the failed test names.

Fixing CI Sync Timeouts

CI sync jobs near the tip will take different amounts of time as:

  • the blockchain grows, and
  • Zebra's checkpoints are updated.

To fix a CI sync timeout, follow these steps until the timeouts are fixed:

  1. Check for recent PRs that could have caused a performance decrease

  2. Update Zebra's checkpoints

  3. If a Rust test fails with "command did not log any matches for the given regex, within the ... timeout":

    a. If it's the full sync test, increase the full sync timeout

    b. If it's an update sync test, increase the update sync timeouts

Fixing Duplicate Dependencies in Check deny.toml bans

Zebra's CI checks for duplicate crate dependencies: multiple dependencies on different versions of the same crate. If a developer or dependabot adds a duplicate dependency, the Check deny.toml bans CI job will fail.

You can view Zebra's entire dependency tree using cargo tree. It can also show the active features on each dependency.
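To make the definition of a duplicate dependency concrete, here is a small sketch that finds crates appearing with more than one version. It is not how cargo deny works internally: `duplicate_crates` is a hypothetical helper, and the `(name, version)` pairs are assumed to have been collected separately, for example from `cargo tree` output.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Finds crates that appear with more than one version in a list of
/// `(name, version)` pairs.
fn duplicate_crates<'a>(deps: &[(&'a str, &'a str)]) -> BTreeMap<&'a str, BTreeSet<&'a str>> {
    let mut versions: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for &(name, version) in deps {
        versions.entry(name).or_default().insert(version);
    }
    // Keep only the crates that resolve to two or more distinct versions.
    versions.retain(|_, found| found.len() > 1);
    versions
}
```

In practice, `cargo tree --duplicates` lists packages that are built with multiple versions, so a helper like this is rarely needed outside of illustration.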

To fix duplicate dependencies, follow these steps until the duplicate dependencies are fixed:

  1. Check for updates to the crates mentioned in the Check deny.toml bans logs, and try doing them in the same PR. For an example, see PR #5009.

    a. Check for open dependabot PRs, and

    b. Manually check for updates to those crates on https://crates.io .

  2. If there are still duplicate dependencies, try removing those dependencies by disabling crate features:

    a. Check for features that Zebra activates in its Cargo.toml files, and try turning them off, then

    b. Try adding default-features = false to Zebra's dependencies (see PR #4082).

  3. If there are still duplicate dependencies, add or update skip-tree in deny.toml:

    a. Prefer exceptions for dependencies that are closer to Zebra in the dependency tree (sometimes this resolves other duplicates as well),

    b. Add or update exceptions for the earlier version of duplicate dependencies, not the later version, and

    c. Add a comment about why the dependency exception is needed: what was the direct Zebra dependency that caused it?

    d. For an example, see PR #4890.

  4. Repeat step 3 until the dependency warnings are fixed. Adding a single skip-tree exception can resolve multiple warnings.

Fixing "unmatched skip root" warnings in Check deny.toml bans

  1. Run cargo deny --all-features check bans, or look at the output of the latest "Check deny.toml bans --all-features" job on the main branch

  2. If there are any "skip tree root was not found in the dependency graph" warnings, delete those versions from deny.toml

Fixing Disk Full Errors

If the Docker cached state disks are full, increase the disk sizes in:

If the GitHub Actions disks are full, follow these steps until the errors are fixed:

  1. Check if the error is also happening on the main branch. If it is, skip the next step.
  2. Update your branch to the latest main branch. This builds with all the latest dependencies in the main branch cache.
  3. Clear the GitHub Actions code cache for the failing branch. Code caches are named after the compiler version.
  4. Clear the GitHub Actions code caches for all the branches and the main branch.

These errors often happen after a new compiler version is released, because the caches can end up with files from both compiler versions.

You can find a list of caches using:

gh api -H "Accept: application/vnd.github+json" repos/ZcashFoundation/Zebra/actions/caches

And delete a cache by id using:

gh api --method DELETE -H "Accept: application/vnd.github+json" /repos/ZcashFoundation/Zebra/actions/caches/<id>

These commands are from the GitHub Actions Cache API reference.

Retrying After Temporary Errors

Some errors happen due to network connection issues, high load, or other rare situations.

If it looks like a failure might be temporary, try re-running all the jobs on the PR using one of these methods:

  1. @mergifyio update
  2. @dependabot recreate (for dependabot PRs only)
  3. click on the failed job, and select "re-run all jobs". If the workflow hasn't finished, you might need to cancel it, and wait for it to finish.

Here are some of the rare and temporary errors that should be retried:

  • Docker: "buildx failed with ... cannot reuse body, request must be retried"
  • Failure in local_listener_fixed_port_localhost_addr_v4 Rust test, mention ticket #4999 on the PR
  • any network connection or download failures

We track some rare errors using tickets, so we know if they are becoming more common and we need to fix them.

Zebra Continuous Delivery

Zebra extends its continuous integration with continuous delivery: it automatically deploys all code changes to a testing and/or pre-production environment after each PR is merged into the main branch, and on each Zebra release.

Triggers

The continuous delivery pipeline is triggered when:

  • A PR is merged to main (technically, a push event)
  • A new release is published in GitHub

Deployments

On each trigger, Zebra is deployed using the branch or version references as part of the deployment naming convention. Deployments are made using Managed Instance Groups (MIGs) from Google Cloud Platform, with 2 nodes in the us-central1 region.

Note: These MIGs are always replaced when PRs are merged to the main branch and when a release is published. If a new major version is released, a new MIG is also created, keeping the previous major version running until it's no longer needed.

A single instance can also be deployed on demand, when a long-lived instance with specific changes needs to be tested on Mainnet using the same infrastructure as CI and CD.

For the details of the actual process, see our continuous delivery workflow file.

zebra-checkpoints

zebra-checkpoints uses a local zebrad or zcashd instance to generate a list of checkpoints for Zebra's checkpoint verifier.

Developers should run this tool every few months to add new checkpoints to Zebra. (By default, Zebra uses these checkpoints to sync to the chain tip.)

For more information on how to run this program, see the Zebra checkpoints README.

Doing Mass Renames in Zebra Code

Sometimes we want to rename a Rust type or function, or change a log message.

But our types and functions are also used in our documentation, so the compiler can sometimes miss when their names are changed.

Our log messages are also used in our integration tests, so changing them can lead to unexpected test failures or hangs.

Universal Renames with sed

You can use sed to rename all the instances of a name in Zebra's code, documentation, and tests:

git ls-tree --full-tree -r --name-only HEAD | \
xargs sed -i -e 's/OldName/NewName/g' -e 's/OtherOldName/OtherNewName/g'

Or excluding specific paths:

git ls-tree --full-tree -r --name-only HEAD | \
grep -v -e 'path-to-skip' -e 'other-path-to-skip' | \
xargs sed -i -e 's/OldName/NewName/g' -e 's/OtherOldName/OtherNewName/g'

sed also supports regular expressions to replace a pattern with another pattern.

Here's how to make a PR with these replacements:

  1. Run the sed commands
  2. Run cargo fmt --all after doing all the replacements
  3. Put the commands in the commit message and pull request, so the reviewer can check them

Here's how to review that PR:

  1. Check out two copies of the repository, one with the PR, and one without:
cd zebra
git fetch --all
# clear the checkout so we can use main elsewhere
git checkout main^
# Use the base branch or commit for the PR, which is usually main
git worktree add ../zebra-sed main
git worktree add ../zebra-pr origin/pr-branch-name
  2. Run the scripts on the repository without the PR:
cd ../zebra-sed
# run the scripts in the PR or commit message
git ls-tree --full-tree -r --name-only HEAD | \
grep -v -e 'path-to-skip' -e 'other-path-to-skip' | \
xargs sed -i -e 's/OldName/NewName/g' -e 's/OtherOldName/OtherNewName/g'
cargo fmt --all
  3. Automatically check that they match:
cd ..
git diff zebra-sed zebra-pr

If there are no differences, then the PR can be approved.

If there are differences, then post them as a review in the PR, and ask the author to re-run the script on the latest main.

Interactive Renames with fastmod

You can use fastmod to rename some instances, but skip others:

fastmod --hidden --fixed-strings "OldName" "NewName" [paths to change]

Using the --hidden flag does renames in .github workflows, issue templates, and other configs.

fastmod also supports regular expressions to replace a pattern with another pattern.

Here's how to make a PR with these replacements:

  1. Run the fastmod commands, choosing which instances to replace
  2. Run cargo fmt --all after doing all the replacements
  3. Put the commands in the commit message and pull request, so the reviewer can check them
  4. If there are a lot of renames:
    • use sed on any directories or files that are always renamed, and put them in the first PR,
    • do a cleanup using fastmod in the next PR.

Here's how to review that PR:

  1. Manually review each replacement (there's no shortcut)

When you're referencing a type or function in a doc comment, use a rustdoc link to refer to it.

This makes the documentation easier to navigate, and our rustdoc lint will detect any typos or name changes.

//! This is what `rustdoc` links look like:
//! - [`u32`] type or trait
//! - [`drop()`] function
//! - [`Clone::clone()`] method
//! - [`Option::None`] enum variant
//! - [`Option::Some(_)`](Option::Some) enum variant with data
//! - [`HashMap`](std::collections::HashMap) fully-qualified path
//! - [`BTreeSet<String>`](std::collections::BTreeSet) fully-qualified path with generics

If a type isn't imported in the module or Rust prelude, then it needs a fully-qualified path in the docs, or an unused import:

// For rustdoc
#[allow(unused_imports)]
use std::collections::LinkedList;

/// Link to [`LinkedList`].
struct Type;

Updating the ECC dependencies

Zebra relies on numerous Electric Coin Company (ECC) dependencies, and updating them can be a complex task. This guide will help you navigate the process.

The main dependency that influences this process is zcash itself. This is because zebra_script links to specific files from it (zcash_script.cpp and everything it depends on). Due to the architecture of zcash, this requires linking to a lot of seemingly unrelated dependencies like orchard, halo2, etc. (which are all Rust crates).

Steps for upgrading

Let's dive into the details of each step required to perform an upgrade:

Before starting

  • Zebra developers often dismiss ECC dependency upgrade suggestions from dependabot. For instance, see this closed PR in favor of the 5.7.0 zcashd upgrade PR, which followed this guide.

  • Determine the version of zcashd to use. This version will determine which versions of other crates to use. Typically, this should be a tag, but in some cases, it might be a reference to a branch (e.g., nu5-consensus) for testing unreleased developments.

  • Upgrading the zcash_script crate can be challenging, depending on changes in the latest zcashd release. Follow the instructions in the project's README for guidance.

  • Upgrade and release zcash_script before upgrading other ECC dependencies in Zebra.

Upgrade versions

  • Use the cargo upgrade command to upgrade all the ECC dependency versions in Zebra. For example, in this PR, the following command was used:
cargo upgrade --incompatible -p bridgetree -p incrementalmerkletree -p orchard -p zcash_primitives -p zcash_proofs -p zcash_address -p zcash_encoding -p zcash_note_encryption -p zcash_script

Notes:

  • Add all the crate names to be updated to the command.

  • Use crate-name@version to upgrade to a specific version of that crate, instead of just the highest version.

  • The cargo upgrade subcommand is provided by cargo-edit, which needs to be installed for this command to work.

Version consistency check

  • Ensure that the crate versions in the Cargo.toml of the zcashd release, Cargo.toml of zcash_script, and the Cargo.toml files of Zebra crates are all the same. Version consistency is crucial.

Build/Test Zebra & fix issues

  • Build Zebra and make sure it compiles:
cargo build
  • Test Zebra and make sure all test code compiles and all tests pass:
cargo test
  • When upgrading, it's common for things to break, such as deprecated or removed functionality. Address these issues by referring to the broken dependency's changelog, which often provides explanations and workarounds.

  • If you encounter issues that you can't resolve, consider reaching out to ECC team members who worked on the upgrade, as they may have more context.

Check deny.toml

  • Review Zebra's deny.toml file for potential duplicates that can be removed due to the upgrade. You may also need to add new entries to deny.toml.
  • You can identify issues with the dependencies using the cargo deny check bans command. This requires cargo-deny to be installed.
  • Push your changes and let the CI identify any additional problems.

Push the Pull Request (PR)

  • Push the pull request with all the changes and ensure that the full CI process passes.
  • Seek approval for the PR.
  • Merge to main branch.

Zebra RFCs

We are experimenting with using a process similar to the Rust RFC process to document design decisions for Zebra.

Summary

The Bitcoin network protocol used by Zcash allows nodes to download blocks from other peers. This RFC describes how we find and download this data asynchronously.

Motivation

To sync the chain, we need to find out which blocks to download and then download them. Downloaded blocks can then be fed into the verification system and (assuming they verify correctly) into the state system. In zcashd, blocks are processed one at a time. In Zebra, however, we want to be able to pipeline block download and verification operations, using futures to explicitly specify logical dependencies between sub-tasks, which we execute concurrently and potentially out-of-order on a threadpool. This means that the procedure we use to determine which blocks to download must look somewhat different from zcashd's.

Block fetching in Bitcoin

Zcash inherits its network protocol from Bitcoin. Bitcoin block fetching works roughly as follows. A node can request block information from peers using either a getblocks or getheaders message. Both of these messages contain a block locator object consisting of a sequence of block hashes. The block hashes are ordered from highest to lowest, and represent checkpoints along the path from the node's current tip back to genesis. The remote peer computes the intersection between its chain and the node's chain by scanning through the block locator for the first hash in its chain. Then, it sends (up to) 500 subsequent block hashes in an inv message (in the case of getblocks) or (up to) 2000 block headers in a headers message (in the case of getheaders). Note: zcashd reduces the getheaders count to 160, because Zcash headers are much larger than Bitcoin headers, as noted below.

The headers message sent after getheaders contains the actual block headers, while the inv message sent after getblocks contains only hashes, which have to be fetched with a getdata message. In Bitcoin, the block headers are small relative to the size of the full block, but this is not always the case for Zcash, where the block headers are much larger due to the use of Equihash, and many blocks have only a few transactions. Also, getblocks allows parallelizing block downloads, while getheaders doesn't. For these reasons, and because we know we need full blocks anyway, we should probably use getblocks.

The getblocks Bitcoin message corresponds to our zebra_network::Request::FindBlocksByHash, and the getdata message is generated by zebra_network::Request::Blocks.

Pipelining block verification

As mentioned above, our goal is to be able to pipeline block download and verification. This means that the process for block lookup should ideally attempt to fetch and begin verification of future blocks without blocking on complete verification of all earlier blocks. To do this, we split the chain state into the verified block chain (held by the state component) and the prospective block chain (held only by the syncer), and use the following algorithm to pursue prospective chain tips.

ObtainTips

  1. Query the current state to construct the sequence of hashes
[tip, tip-1, tip-2, ..., tip-9, tip-20, tip-40, tip-80, tip-160 ]

The precise construction is unimportant, but this should have a Bitcoin-style dense-first, then-sparse hash structure.

The initial state should contain the genesis block for the relevant network. So the initial sequence of hashes will only contain the genesis block:

[genesis ]

The network will respond with a list of hashes, starting at the child of the genesis block.

  2. Make a FindBlocksByHash request to the network F times, where F is a fanout parameter, to get resp1, ..., respF.

  3. For each response, starting from the beginning of the list, prune any block hashes already included in the state, stopping at the first unknown hash to get resp1', ..., respF'. (These lists may be empty).

  4. Combine the last elements of each list into a set; this is the set of prospective tips.

  5. Combine all elements of each list into a set, and queue download and verification of those blocks.

  6. If there are any prospective tips, call ExtendTips, which returns a new set of prospective tips. Continue calling ExtendTips with this new set, until there are no more prospective tips.

  7. Restart after some delay, say 15 seconds.
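The locator construction and the pruning and tip-collection steps above can be sketched as follows. This is a minimal illustration, not Zebra's syncer code: `Hash` is a `u64` stand-in for a block hash, and `locator_heights` and `prune_and_collect` are hypothetical helper names. The offsets mirror the example sequence in this section.

```rust
use std::collections::HashSet;

/// Stand-in for a block hash (assumption: a real node would use a 32-byte hash type).
type Hash = u64;

/// Heights to include in a block locator for a chain whose tip is at `tip`:
/// the ten most recent heights, then offsets 20, 40, 80, and 160 below the tip.
/// Offsets that would fall below the genesis block (height 0) are skipped.
fn locator_heights(tip: u32) -> Vec<u32> {
    let dense = 0..10u32;
    let sparse = [20u32, 40, 80, 160];
    dense
        .chain(sparse)
        .filter_map(|offset| tip.checked_sub(offset))
        .collect()
}

/// For each response, drop the leading hashes that the state already knows.
/// Returns (prospective tips, hashes to download and verify).
fn prune_and_collect(
    responses: &[Vec<Hash>],
    known: &HashSet<Hash>,
) -> (HashSet<Hash>, HashSet<Hash>) {
    let mut tips = HashSet::new();
    let mut downloads = HashSet::new();
    for response in responses {
        // Stop pruning at the first unknown hash; the remainder may be empty.
        let unknown: Vec<Hash> = response
            .iter()
            .copied()
            .skip_while(|hash| known.contains(hash))
            .collect();
        // The last remaining hash of each response is a prospective tip.
        if let Some(last) = unknown.last() {
            tips.insert(*last);
        }
        downloads.extend(unknown);
    }
    (tips, downloads)
}
```

With an empty state that holds only the genesis block, `locator_heights(0)` contains just height 0, which matches the `[genesis]` case described above.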

ExtendTips

  1. Remove all prospective tips from the set of prospective tips, then iterate through them. For each removed tip:

  2. Create a FindBlocksByHash request consisting of just the prospective tip. Send this request to the network F times.

  3. For each response, check whether the first hash in the response is a genesis block (for either the main or test network). If so, discard the response. It indicates that the remote peer does not have any blocks following the prospective tip. (Or that the remote peer is on the wrong network.)

  4. Combine the last elements of the remaining responses into a set, and add this set to the set of prospective tips.

  5. Combine all elements of the remaining responses into a set, and queue download and verification of those blocks.
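The response handling for a single prospective tip can be sketched like this. It is an illustrative sketch only: `Hash` is a `u64` stand-in, `extend_tip` is a hypothetical helper name, and the network requests themselves are assumed to have already produced `responses`.

```rust
use std::collections::HashSet;

/// Stand-in for a block hash (assumption, for illustration only).
type Hash = u64;

/// Process the responses to a FindBlocksByHash request for one prospective tip.
/// Responses that are empty, or that start at a genesis block, are discarded.
/// Returns (new prospective tips, hashes to download and verify).
fn extend_tip(
    responses: &[Vec<Hash>],
    genesis_hashes: &[Hash],
) -> (HashSet<Hash>, HashSet<Hash>) {
    let mut new_tips = HashSet::new();
    let mut downloads = HashSet::new();
    for response in responses {
        match response.first() {
            // Nothing to extend with.
            None => continue,
            // The peer could not extend the tip, or is on the wrong network.
            Some(first) if genesis_hashes.contains(first) => continue,
            Some(_) => {}
        }
        if let Some(last) = response.last() {
            new_tips.insert(*last);
        }
        downloads.extend(response.iter().copied());
    }
    (new_tips, downloads)
}
```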

DoS resistance

Because this strategy aggressively downloads any available blocks, it could be vulnerable to a DoS attack, where a malicious peer feeds us bogus chain tips, causing us to waste network bandwidth and CPU time on blocks that will never be valid. However, because we separate block finding from block downloading, and because of the design of our network stack, this attack is probably not feasible. The primary reason is that zebra_network randomly load-balances outbound requests over all available peers.

Consider a malicious peer who responds to block discovery with a bogus list of hashes. We will eagerly attempt to download all of those bogus blocks, but our requests to do so will be randomly load-balanced to other peers, who are unlikely to know about the bogus blocks. When we try to extend a bogus tip, the extension request will also be randomly load-balanced, so it will likely be routed to a peer that doesn't know about it and can't extend it. And because we perform multiple block discovery queries, which will also be randomly load-balanced, we're unlikely to get stuck on a false chain tip.

Fork-finding

When starting from a verified chain tip, the choice of block locator can find forks at least up to the reorg limit (99 blocks). When extending a prospective tip, forks are ignored, but this is fine, since unless we are prefetching the longest chain, we won't be able to keep extending the tip prospectively.

Retries and Fanout

We should consider the fanout parameter F and the retry policy for the different requests. I'm not sure whether we need to retry requests to discover new block hashes, since the fanout may already provide redundancy. For the block requests themselves, we should have a retry policy with a limited number of attempts, enough to insulate against network failures but not so many that we would retry a bogus block indefinitely. Maybe fanout 4 and 3 retries?
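A bounded retry policy of the kind discussed above might look like the following sketch. It is synchronous and uses only the standard library for brevity; `with_retries` is a hypothetical helper, and a real implementation would more likely be an asynchronous retry layer around the network service.

```rust
/// Run `op` once, then retry it up to `max_retries` more times if it fails.
/// The closure receives the zero-based attempt number.
/// Returns the first success, or the error from the final attempt.
fn with_retries<T, E>(
    max_retries: usize,
    mut op: impl FnMut(usize) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of retries: give up, so a bogus block is not retried forever.
            Err(error) if attempt >= max_retries => return Err(error),
            // Possibly a transient network failure: try again.
            Err(_) => attempt += 1,
        }
    }
}
```

With the tentative value of 3 retries, a block request is attempted at most four times before the block is abandoned.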

Parallel Verification

Summary

Zebra verifies blocks in several stages, most of which can be executed in parallel.

We use several different design patterns to enable this parallelism:

  • We download blocks and start verifying them in parallel,
  • We batch signature and proof verification using verification services, and
  • We defer data dependencies until just before the block is committed to the state (see the detailed design RFCs).

Motivation

Zcash (and Bitcoin) are designed to verify each block in sequence, starting from the genesis block. But during the initial sync, and when restarting with an older state, this process can be quite slow.

By deferring data dependencies, we can partially verify multiple blocks in parallel.

By parallelising block and transaction verification, we can use multithreading and batch verification for signatures, proofs, scripts, and hashes.

Definitions

Blockchain:

  • chain fork: Zcash is implemented using a tree of blocks. Each block has a single previous block, and zero to many next blocks. A chain fork consists of a tip and all its previous blocks, back to the genesis block.
  • genesis: The root of the tree of blocks is called the genesis block. It has no previous block.
  • tip: A block which has no next block is called a tip. Each chain fork can be identified using its tip.

Data:

  • consensus rule: A protocol rule which all nodes must apply consistently, so they can converge on the same chain fork.
  • context-free: Consensus rules which do not have a data dependency on previous blocks.
  • data dependency: Information contained in the previous block and its chain fork, which is required to verify the current block.
  • state: The set of verified blocks. The state might also cache some dependent data, so that we can efficiently verify subsequent blocks.

Verification Stages:

  • structural verification: Parsing raw bytes into the data structures defined by the protocol.
  • semantic verification: Verifying the consensus rules on the data structures defined by the protocol.
  • contextual verification: Verifying the current block, once its data dependencies have been satisfied by a verified previous block. This verification might also use the cached state corresponding to the previous block.

Guide-level explanation

In Zebra, we want to verify blocks in parallel. Some fields can be verified straight away, because they don't depend on the output of previous blocks. But other fields have data dependencies, which means that we need previous blocks before we can fully validate them.

If we delay checking some of these data dependencies, then we can do more of the verification in parallel.

Example: BlockHeight

Here's how Zebra can verify the different Block Height consensus rules in parallel:

Structural Verification:

  1. Parse the Block into a BlockHeader and a list of transactions.

Semantic Verification: No Data Dependencies:

  1. Check that the first input of the first transaction in the block is a coinbase input with a valid block height in its data field.

Semantic Verification: Deferring a Data Dependency:

  1. Verify other consensus rules that depend on Block Height, assuming that the Block Height is correct. For example, many consensus rules depend on the current Network Upgrade, which is determined by the Block Height. We verify these consensus rules, assuming the Block Height and Network Upgrade are correct.

Contextual Verification:

  1. Submit the block to the state for contextual verification. When it is ready to be committed (it may arrive before the previous block), check all deferred constraints, including the constraint that the block height of this block is one more than the block height of its parent block. If all constraints are satisfied, commit the block to the state. Otherwise, reject the block as invalid.
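The deferral in this example can be sketched in a few lines. The types here are simplified stand-ins rather than Zebra's: `Hash` is a `u64`, `NetworkUpgrade` has two placeholder variants, and `ACTIVATION_HEIGHT` is a made-up activation height chosen only for illustration.

```rust
use std::collections::HashMap;

/// Stand-in for a block hash (assumption).
type Hash = u64;

/// Placeholder network upgrades (assumption: real upgrades are more numerous).
#[derive(Debug, PartialEq)]
enum NetworkUpgrade {
    Before,
    After,
}

/// Hypothetical activation height, for illustration only.
const ACTIVATION_HEIGHT: u32 = 1_000;

struct Block {
    parent: Hash,
    /// The height claimed in the coinbase input's data field.
    coinbase_height: u32,
}

/// Semantic verification with a deferred dependency: pick the network upgrade
/// from the claimed height, assuming that the claimed height is correct.
fn assumed_upgrade(block: &Block) -> NetworkUpgrade {
    if block.coinbase_height >= ACTIVATION_HEIGHT {
        NetworkUpgrade::After
    } else {
        NetworkUpgrade::Before
    }
}

/// Contextual verification: once the parent is committed, check the deferred
/// constraint that the claimed height is one more than the parent's height.
fn check_deferred_height(
    block: &Block,
    committed_heights: &HashMap<Hash, u32>,
) -> Result<(), String> {
    let parent_height = committed_heights
        .get(&block.parent)
        .ok_or_else(|| "parent block is not committed yet".to_string())?;
    if Some(block.coinbase_height) == parent_height.checked_add(1) {
        Ok(())
    } else {
        Err(format!(
            "claimed height {} does not follow parent height {}",
            block.coinbase_height, parent_height
        ))
    }
}
```

If `check_deferred_height` fails, everything derived from `assumed_upgrade` is discarded along with the block.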

Zebra Design

Design Patterns

When designing changes to Zebra verification, use these design patterns:

  • perform context-free verification as soon as possible, (that is, verification which has no data dependencies on previous blocks),
  • defer data dependencies as long as possible, then
  • check the data dependencies.

Minimise Deferred Data

Keep the data dependencies and checks as simple as possible.

For example, Zebra could defer checking both the Block Height and Network Upgrade.

But since the Network Upgrade depends on the Block Height, we only need to defer the Block Height check. Then we can use all the fields that depend on the Block Height, as if it is correct. If the final Block Height check fails, we will reject the entire block, including all the verification we performed using the assumed Network Upgrade.

Implementation Strategy

When implementing these designs, perform as much verification as possible, await any dependencies, then perform the necessary checks.

Reference-level explanation

Verification Stages

In Zebra, verification occurs in the following stages:

  • Structural Verification: Raw block data is parsed into a block header and transactions. Invalid data is not representable in these structures: deserialization (parsing) can fail, but serialization always succeeds.
  • Semantic Verification: Parsed block fields are verified, based on their data dependencies:
    • Context-free fields have no data dependencies, so they can be verified as needed.
    • Fields with simple data dependencies defer that dependency as long as possible, so they can perform more verification in parallel. Then they await the required data, which is typically the previous block. (And potentially older blocks in its chain fork.)
    • Fields with complex data dependencies require their own parallel verification designs. These designs are out of scope for this RFC.
  • Contextual Verification: After a block is verified, it is added to the state. The details of state updates, and their interaction with semantic verification, are out of scope for this RFC.

This RFC focuses on Semantic Verification, and the design patterns that enable blocks to be verified in parallel.

Verification Interfaces

Verification is implemented by the following traits and services:

  • Structural Verification:
    • zebra_chain::ZcashDeserialize: A trait for parsing consensus-critical data structures from a byte buffer.
  • Semantic Verification:
    • ChainVerifier: Provides a verifier service that accepts a Block request, performs verification on the block, and responds with a block::Hash on success.
    • Internally, the ChainVerifier selects between a CheckpointVerifier for blocks that are within the checkpoint range, and a BlockVerifier for recent blocks.
  • Contextual Verification:
    • zebra_state::init: Provides the state update service, which accepts requests to add blocks to the state.

Checkpoint Verification

The CheckpointVerifier performs rapid verification of blocks, based on a set of hard-coded checkpoints. Each checkpoint hash can be used to verify all the previous blocks, back to the genesis block. So Zebra can skip almost all verification for blocks in the checkpoint range.

The CheckpointVerifier uses an internal queue to store pending blocks. Checkpoint verification is cheap, so it is implemented using non-async functions within the CheckpointVerifier service.

Here is how the CheckpointVerifier implements each verification stage:

  • Structural Verification:
    • As Above: the CheckpointVerifier accepts parsed Block structs.
  • Semantic Verification:
    • check_height: makes sure the block height is within the unverified checkpoint range, and adds the block to its internal queue.
    • target_checkpoint_height: Checks for a continuous range of blocks from the previous checkpoint to a subsequent checkpoint. If the chain is incomplete, returns a future, and waits for more blocks. If the chain is complete, assumes that the previous_block_hash fields of these blocks form an unbroken chain from checkpoint to checkpoint, and starts processing the checkpoint range. (This constraint is an implicit part of the CheckpointVerifier design.)
    • process_checkpoint_range: makes sure that the blocks in the checkpoint range have an unbroken chain of previous block hashes.
  • Contextual Verification:
    • As Above: the CheckpointVerifier returns success to the ChainVerifier, which sends verified Blocks to the state service.
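The hash-chain check performed on a complete checkpoint range can be sketched as follows. This is an illustration of the idea only, not the CheckpointVerifier's code: `Hash` is a `u64` stand-in, `Header` keeps just the two fields needed here, and `range_is_unbroken` is a hypothetical name.

```rust
/// Stand-in for a block hash (assumption).
type Hash = u64;

/// The parts of a block needed for the chain check (simplified stand-in).
struct Header {
    hash: Hash,
    previous_block_hash: Hash,
}

/// Check that `blocks` (ordered by increasing height) link the previous
/// checkpoint to the target checkpoint through their previous block hashes.
fn range_is_unbroken(
    blocks: &[Header],
    previous_checkpoint: Hash,
    target_checkpoint: Hash,
) -> bool {
    // Walk backwards from the target checkpoint hash.
    let mut expected = target_checkpoint;
    for block in blocks.iter().rev() {
        if block.hash != expected {
            return false;
        }
        expected = block.previous_block_hash;
    }
    // The lowest block in the range must build on the previous checkpoint.
    expected == previous_checkpoint
}
```

Because each hash commits to the previous block's hash, matching the single hard-coded target hash is enough to authenticate every block in the range.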

Block Verification

The BlockVerifier performs detailed verification of recent blocks, in parallel.

Here is how the BlockVerifier implements each verification stage:

  • Structural Verification:
    • As Above: the BlockVerifier accepts parsed Block structs.
  • Semantic Verification:
    • As Above: verifies each field in the block. Defers any data dependencies as long as possible, awaits those data dependencies, then performs data dependent checks.
    • Note: Since futures are executed concurrently, we can use the same function to:
      • perform context-free verification,
      • perform verification with deferred data dependencies,
      • await data dependencies, and
      • check data dependencies.

      To maximise concurrency, we should write verification functions in this specific order, so the awaits are as late as possible.
  • Contextual Verification:
    • As Above: the BlockVerifier returns success to the ChainVerifier, which sends verified Blocks to the state service.
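The ordering described in the note above can be illustrated with a small sketch. It substitutes a blocking channel receive for an `.await` so that it needs only the standard library; `Block`, its fields, and `verify_block` are simplified, hypothetical stand-ins rather than the BlockVerifier's real types.

```rust
use std::sync::mpsc::Receiver;

/// Simplified stand-in for a parsed block (assumption).
struct Block {
    coinbase_height: u32,
    transaction_count: usize,
}

/// Verify a block, waiting for its data dependency as late as possible.
/// `parent_height` delivers the parent's height once the parent is verified.
fn verify_block(block: &Block, parent_height: Receiver<u32>) -> bool {
    // 1. Context-free verification: needs no other block.
    //    (Placeholder rule: a block must contain at least a coinbase transaction.)
    let context_free_ok = block.transaction_count >= 1;

    // 2. Verification with deferred dependencies would go here, treating
    //    `block.coinbase_height` as if it were already known to be correct.

    // 3. Wait for the data dependency (stands in for an `.await`).
    let parent = match parent_height.recv() {
        Ok(height) => height,
        // The sender was dropped: the parent will never arrive.
        Err(_) => return false,
    };

    // 4. Check the data dependency.
    context_free_ok && Some(block.coinbase_height) == parent.checked_add(1)
}
```

Because the wait happens in step 3, the work in steps 1 and 2 for many blocks can proceed before any of their parents are available.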

Zcash Protocol Design

When designing a change to the Zcash protocol, minimise the data dependencies between blocks.

Try to create designs that:

  • Eliminate data dependencies,
  • Make the changes depend on a version field in the block header or transaction,
  • Make the changes depend on the current Network Upgrade, or
  • Make the changes depend on a field in the current block, with an additional consensus rule to check that field against previous blocks.

When making decisions about these design tradeoffs, consider:

  • how the data dependency could be deferred, and
  • the CPU cost of the verification - if it is trivial, then it does not matter if the verification is parallelised.

Drawbacks

This design is a bit complicated, but we think it's necessary to achieve our goals.

Rationale and alternatives

  • What makes this design a good design?
    • It enables a significant amount of parallelism
    • It is simpler than some other alternatives
    • It uses existing Rust language facilities, mainly Futures and await/async
  • Is this design a good basis for later designs or implementations?
    • We have built a UTXO design on this design
    • We believe we can build "recent blocks" and "chain summary" designs on this design
    • Each specific detailed design will need to consider how the relevant data dependencies are persisted
  • What other designs have been considered and what is the rationale for not choosing them?
    • Serial verification
      • Effectively single-threaded
    • Awaiting data dependencies as soon as they are needed
      • Less parallelism
    • Providing direct access to the state
      • Might cause data races, might be prevented by Rust's ownership rules
      • Higher risk of bugs
  • What is the impact of not doing this?
    • Verification is slow, and we can't batch or parallelise some parts of the verification

Prior art

TODO: expand this section

  • zcashd
    • serial block verification
    • Zebra implements the same consensus rules, but a different design
  • tower

Unresolved questions

  • Is this design good enough to use as a framework for future RFCs?
  • Does this design require any changes to the current implementation?
    • Implement block height consensus rule (check previous block hash and height)
    • Check that the BlockVerifier performs checks in the following order:
      • verification, deferring dependencies as needed,
      • await dependencies,
      • check deferred data dependencies

Out of Scope:

  • What is the most efficient design for parallel verification?

    • (Optimisations are out of scope.)
  • How is each specific field verified?

  • How do we verify fields with complex data dependencies?

  • How does verification change with different network upgrades?

  • How do multiple chains work, in detail?

  • How do state updates work, in detail?

  • Moving the verifiers into the state service

Future possibilities

  • Separate RFCs for other data dependencies
    • Recent blocks
    • Overall chain summaries (for example, total work)
    • Reorganisation limit: multiple chains to single chain transition
  • Optimisations for parallel verification

Summary

The Bitcoin network protocol used by Zcash allows nodes to advertise data (inventory items) for download by other peers. This RFC describes how we track and use this information.

Motivation

In order to participate in the network, we need to be able to fetch new data that our peers notify us about. Because our network stack abstracts away individual peer connections, and load-balances over available peers, we need a way to direct requests for new inventory only to peers that advertised to us that they have it.

Definitions

  • Inventory item: either a block or transaction.
  • Inventory hash: the hash of an inventory item, represented by the InventoryHash type.
  • Inventory advertisement: a notification from another peer that they have some inventory item.
  • Inventory request: a request to another peer for an inventory item.

Guide-level explanation

The Bitcoin network protocol used by Zcash provides a mechanism for nodes to gossip blockchain data to each other. This mechanism is used to distribute (mined) blocks and (unmined) transactions through the network. Nodes can advertise data available in their inventory by sending an inv message containing the hashes and types of those data items. After receiving an inv message advertising data, a node can determine whether to download it.

This poses a challenge for our network stack, which goes to some effort to abstract away details of individual peers and encapsulate all peer connections behind a single request/response interface representing "the network". Currently, the peer set tracks readiness of all live peers, reports readiness if at least one peer is ready, and routes requests across ready peers randomly using the "power of two choices" algorithm.

However, while this works well for data that is already distributed across the network (e.g., existing blocks) it will not work well for fetching data during distribution across the network. If a peer informs us of some new data, and we attempt to download it from a random, unrelated peer, we will likely fail. Instead, we track recent inventory advertisements, and make a best-effort attempt to route requests to peers who advertised that inventory.

Reference-level explanation

The inventory tracking system has several components:

  1. A registration hook that monitors incoming messages for inventory advertisements;
  2. An inventory registry that tracks inventory presence by peer;
  3. Routing logic that uses the inventory registry to appropriately route requests.

The first two components have fairly straightforward design decisions, but the third has considerably less obvious choices and tradeoffs.

Inventory Monitoring

Zebra uses Tokio's codec mechanism to transform a byte-oriented I/O interface into a Stream and Sink for incoming and outgoing messages. These are passed to the peer connection state machine, which is written generically over any Stream and Sink. This construction makes it easy to "tap" the sequence of incoming messages using .then and .with stream and sink combinators.

We already do this to record Prometheus metrics on message rates as well as to report message timestamps used for liveness checks and last-seen address book metadata. The message timestamp mechanism is a good example to copy. The handshake logic instruments the incoming message stream with a closure that captures a sender handle for an mpsc channel with a large buffer (currently 100 timestamp entries). The receiver handle is owned by a separate task that shares an Arc<Mutex<AddressBook>> with other parts of the application. This task waits for new timestamp entries, acquires a lock on the address book, and updates the address book. This ensures that timestamp updates are queued asynchronously, without lock contention.

Unlike the address book, we don't need to share the inventory data with other parts of the application, so it can be owned exclusively by the peer set. This means that no lock is necessary, and the peer set can process advertisements in its poll_ready implementation. This method may be called infrequently, which could cause the channel to fill. However, because inventory advertisements are time-limited, in the sense that they're only useful before some item is fully distributed across the network, it's safe to handle excess entries by dropping them. This behavior is provided by a broadcast/mpmc channel, which can be used in place of an mpsc channel.

An inventory advertisement is an (InventoryHash, SocketAddr) pair. The stream hook should check whether an incoming message is an inv message with only a small number (e.g., 1) of inventory entries. If so, it should extract the hash for each item and send it through the channel. Otherwise, it should ignore the message contents. Why? Because inv messages are also sent in response to queries, such as when we request subsequent block hashes, and in that case we want to assume that the inventory is generally available rather than restricting downloads to a single peer. However, items are usually gossiped individually (or potentially in small chunks; zcashd has an internal inv buffer subject to race conditions), so choosing a small bound such as 1 is likely to work as a heuristic for when we should assume that advertised inventory is not yet generally available.
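The size heuristic can be sketched as a small filter. The names here are assumptions for illustration: `InventoryHash` is a `[u8; 32]` stand-in for the real type, and `MAX_GOSSIPED_ENTRIES` and `advertisements` are hypothetical.

```rust
use std::net::SocketAddr;

/// Stand-in for the real inventory hash type (assumption).
type InventoryHash = [u8; 32];

/// Largest inv message that is treated as gossip rather than a query response.
const MAX_GOSSIPED_ENTRIES: usize = 1;

/// Turn the entries of an incoming inv message into advertisements, but only
/// when the message is small enough to look like gossip of new inventory.
fn advertisements(
    inv_entries: &[InventoryHash],
    peer: SocketAddr,
) -> Vec<(InventoryHash, SocketAddr)> {
    if inv_entries.is_empty() || inv_entries.len() > MAX_GOSSIPED_ENTRIES {
        // Probably a response to a query: assume the inventory is generally
        // available, and do not tie it to this peer.
        return Vec::new();
    }
    inv_entries.iter().map(|hash| (*hash, peer)).collect()
}
```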

Inventory Registry

The peer set's poll_ready implementation should extract all available (InventoryHash, SocketAddr) pairs from the channel, and log a warning event if the receiver is lagging. The channel should be configured with a generous buffer size (such as 100) so that this is unlikely to happen in normal circumstances. These pairs should be fed into an InventoryRegistry structure along these lines:

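use std::collections::{HashMap, HashSet};
use std::net::SocketAddr;

struct InventoryRegistry {
    /// Advertisements received since the last rotation.
    current: HashMap<InventoryHash, HashSet<SocketAddr>>,
    /// Advertisements from the previous generation.
    prev: HashMap<InventoryHash, HashSet<SocketAddr>>,
}

impl InventoryRegistry {
    /// Records that the peer at `addr` advertised `item`.
    pub fn register(&mut self, item: InventoryHash, addr: SocketAddr) {
        self.current.entry(item).or_default().insert(addr);
    }

    /// Drops the previous generation and starts a new one.
    pub fn rotate(&mut self) {
        self.prev = std::mem::take(&mut self.current);
    }

    /// Returns the peers that advertised `item` in either generation.
    pub fn peers(&self, item: InventoryHash) -> impl Iterator<Item = &SocketAddr> {
        self.prev
            .get(&item)
            .into_iter()
            .chain(self.current.get(&item))
            .flatten()
    }
}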

This API allows pruning the inventory registry using rotate, which implements generational pruning of registry entries. The peer set should maintain a tokio::time::Interval with some interval parameter, and check in poll_ready whether the interval stream has any items, calling rotate for each one:

while let Poll::Ready(Some(_)) = timer.poll_next(cx) {
    registry.rotate();
}

By rotating for each available item in the interval stream, rather than just once, we ensure that if the peer set's poll_ready is not called for a long time, rotate will be called enough times to correctly flush old entries.

Inventory advertisements live in the registry for at most twice the timer interval (an entry is dropped by the second rotation after it is registered), so the interval should be chosen to be about half of the desired lifetime for inventory advertisements. Setting the timer to 75 seconds, the block interval, seems like a reasonable choice.
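The lifetime of an entry under generational pruning can be seen in a compact, self-contained sketch. `Registry` is a cut-down stand-in for the registry described above, with a `u64` in place of the real inventory hash type, and `is_tracked` is a hypothetical helper added only to observe the effect of rotation.

```rust
use std::collections::{HashMap, HashSet};
use std::net::SocketAddr;

/// Stand-in for the real inventory hash type (assumption).
type InventoryHash = u64;

/// Cut-down two-generation registry, for illustration only.
#[derive(Default)]
struct Registry {
    current: HashMap<InventoryHash, HashSet<SocketAddr>>,
    prev: HashMap<InventoryHash, HashSet<SocketAddr>>,
}

impl Registry {
    fn register(&mut self, item: InventoryHash, addr: SocketAddr) {
        self.current.entry(item).or_default().insert(addr);
    }

    /// Called once per timer tick: the old `prev` generation is dropped.
    fn rotate(&mut self) {
        self.prev = std::mem::take(&mut self.current);
    }

    /// Whether any peer is still recorded as having advertised `item`.
    fn is_tracked(&self, item: InventoryHash) -> bool {
        self.current.contains_key(&item) || self.prev.contains_key(&item)
    }
}
```

An item registered just after a tick survives almost two intervals, while one registered just before a tick survives just over one, which is why the interval is chosen as roughly half the desired lifetime.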

Routing Logic

At this point, the peer set has information on recent inventory advertisements. However, the Service trait only allows poll_ready to report readiness based on the service's data and the type of the request, not the content of the request. This means that we must report readiness without knowing whether the request should be routed to a specific peer, and we must handle the case where call gets a request for an item only available at an unready peer.

This RFC suggests the following routing logic. First, check whether the request fetches data by hash. If so, and peers() yields any addresses for that hash, iterate over those addresses and route the request to the first ready peer if there is one. In all other cases, fall back to p2c routing. Alternatives are suggested and discussed below.
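A minimal sketch of this routing decision follows. The `Route` enum and `choose_route` function are hypothetical names, the set of ready peers is passed in as a plain `HashSet`, and the p2c selection itself is left out and represented only by a fallback variant.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Where a request should be sent (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
enum Route {
    /// Send to a specific peer that advertised the inventory and is ready.
    Peer(SocketAddr),
    /// Fall back to ordinary "power of two choices" load balancing.
    PowerOfTwoChoices,
}

/// Pick a route for a request, given the peers that advertised the requested
/// inventory (in registry order) and the set of currently ready peers.
fn choose_route<'a>(
    fetches_by_hash: bool,
    advertisers: impl IntoIterator<Item = &'a SocketAddr>,
    ready: &HashSet<SocketAddr>,
) -> Route {
    if fetches_by_hash {
        // Best effort: the first advertiser that is ready right now.
        if let Some(addr) = advertisers.into_iter().find(|addr| ready.contains(*addr)) {
            return Route::Peer(*addr);
        }
    }
    Route::PowerOfTwoChoices
}
```

The fallback means a request is never failed just because no advertiser is ready; it is simply routed as if no advertisement had been seen.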

Rationale and alternatives

The rationale is described above. The alternative choices are primarily around the routing logic.

The Service trait does not allow applying backpressure based on the content of a request, only based on the service's internal data (via the &mut self parameter of Service::poll_ready) and on the type of the request (which determines which impl Service is used). This means that it is impossible for us to apply backpressure until a service that can process a specific inventory request is ready, because until we get the request, we can't determine which peers might be required to process it.

One way to attempt to ensure that the peer set is ready to process a specific inventory request would be to pre-emptively "reserve" a peer as soon as it advertises an inventory item. But this doesn't actually work to ensure readiness, because a peer could advertise two inventory items, and only be able to service one request at a time. It also potentially locks the peer set, since if there are only a few peers and they all advertise inventory, the service can't process any other requests. So this approach does not work.

Another alternative would be to do some kind of buffering of inventory requests that cannot immediately be processed by a peer that advertised that inventory. There are two basic sub-approaches here.

In the first case, we could maintain an unbounded queue of yet-to-be processed inventory requests in the peer set, and every time poll_ready is called, we check whether a service that could serve those inventory requests became ready, and start processing the request if we can. This would provide the lowest latency, because we can dispatch the request to the first available peer. For instance, if peer A advertises inventory I, the peer set gets an inventory request for I, peer A is busy so the request is queued, and peer B advertises inventory I, we could dispatch the queued request to B rather than waiting for A.

However, it's not clear exactly how we'd implement this, because this mechanism is driven by calls to poll_ready, and those might not happen. So we'd need some separate task that would drive processing of the buffered requests to completion, but this task may not be able to do so via poll_ready, since that method requires owning the service, and the peer set will be owned by a Buffer worker.

In the second case, we could select an unready peer that advertised the requested inventory, clone it, and move the cloned peer into a task that would wait for that peer to become ready and then make the request. This is conceptually much cleaner than the above mechanism, but it has the downside that we don't dispatch the request to the first ready peer. In the example above, if we cloned peer A and dispatched the request to it, we'd have to wait for A to become ready, even if the second peer B advertised the same inventory just after we dispatched the request to A. However, this is not presently possible anyways, because the peer::Clients that handle requests are not clonable. They could be made clonable (they send messages to the connection state machine over a mpsc channel), but we cannot make this change without altering our liveness mechanism, which uses bounds on the time-since-last-message to determine whether a peer connection is live and to prevent immediate reconnections to recently disconnected peers.

A final alternative would be to fail inventory requests that we cannot route to a peer which advertised that inventory. This moves the failure forward in time, but preemptively fails some cases where the request might succeed -- for instance, if the peer has inventory but just didn't tell us, or received the inventory between when we dispatch the request and when it receives our message. It seems preferable to try and fail than to not try at all.

In practice, we're likely to care about the gossip protocol and inventory fetching once we've already synced close to the chain tip. In this setting, we're likely to already have peer connections, and we're unlikely to be saturating our peer set with requests (as we do during initial block sync). This suggests that the common case is one where we have many idle peers, and that therefore we are unlikely to have dispatched any recent requests to the peer that advertised inventory. So our common case should be one where all of this analysis is irrelevant.

Summary

This RFC describes an architecture for asynchronous script verification and its interaction with the state layer. This architecture imposes constraints on the ordering of operations in the state layer.

Motivation

As in the rest of Zebra, we want to express our work as a collection of work-items with explicit dependencies, then execute these items concurrently and in parallel on a thread pool.

Definitions

  • UTXO: unspent transparent transaction output. Transparent transaction outputs are modeled in zebra-chain by the transparent::Output structure.
  • outpoint: a reference to an unspent transparent transaction output, including a transaction hash and output index. Outpoints are modeled in zebra-chain by the transparent::OutPoint structure.
  • transparent input: a previous transparent output consumed by a later transaction (the one it is an input to). Modeled in zebra-chain by the transparent::Input::PrevOut enum variant.
  • coinbase transaction: the first transaction in each block, which creates new coins.
  • lock script: the script that defines the conditions under which some UTXO can be spent. Stored in the transparent::Output::lock_script field.
  • unlock script: a script satisfying the conditions of the lock script, allowing a UTXO to be spent. Stored in the transparent::Input::PrevOut::unlock_script field.

Guide-level explanation

Zcash's transparent address system is inherited from Bitcoin. Transactions spend unspent transparent transaction outputs (UTXOs) from previous transactions. These UTXOs are encumbered by locking scripts that define the conditions under which they can be spent, e.g., requiring a signature from a certain key. Transactions wishing to spend UTXOs supply an unlocking script that should satisfy the conditions of the locking script for each input they wish to spend.

This means that script verification requires access to data about previous UTXOs, in order to determine the conditions under which those UTXOs can be spent. In Zebra, we aim to run operations asynchronously and out-of-order to the greatest extent possible. For instance, we may begin verification of a block before all of its ancestors have been verified or even downloaded. So we need to design a mechanism that allows script verification to declare its data dependencies and execute as soon as all required data is available.

It's not necessary for this mechanism to ensure that the transaction outputs remain unspent, only to give enough information to perform script verification. Checking that all transaction inputs are actually unspent is done later, at the point that its containing block is committed to the chain.

At a high level, this adds a new request/response pair to the state service:

  • Request::AwaitSpendableUtxo { outpoint: OutPoint, ..conditions } requests a spendable transparent::Output, looked up using OutPoint.
  • Response::SpendableUtxo(Utxo) supplies the requested transparent::Output as part of a new Utxo type, if the output is spendable based on conditions.

Note that this request is named differently from the other requests, AwaitSpendableUtxo rather than GetUtxo or similar. This is because the request has rather different behavior:

  • the request does not complete until the state service learns about a UTXO matching the request, which could be never. For instance, if the transaction output was already spent, the service is not required to return a response.
  • the request does not complete until the output is spendable, based on the conditions in the request.

The state service does not cancel long-running UTXO requests. Instead, the caller is responsible for deciding when a request is unlikely to complete. (For example, using a timeout layer.)

This allows a script verifier to asynchronously obtain information about previous transaction outputs and start verifying scripts as soon as the data is available. For instance, if we begin parallel download and verification of 500 blocks, we should be able to begin script verification of all scripts referencing outputs from existing blocks in parallel, and begin verification of scripts referencing outputs from new blocks as soon as they are committed to the chain.

Because spending outputs from older blocks is more common than spending outputs from recent blocks, this should allow a significant amount of parallelism.

Reference-level explanation

Data structures

We add the following request and response to the state protocol:

#![allow(unused)]
fn main() {
enum Request {
    // .. other variants omitted

    /// Waits for a spendable UTXO identified by `outpoint`.
    AwaitSpendableUtxo {
        outpoint: OutPoint,
        spend_height: Height,
        spend_restriction: SpendRestriction,
    },
}

/// Consensus rule:
/// "A transaction with one or more transparent inputs from coinbase transactions
/// MUST have no transparent outputs (i.e. tx_out_count MUST be 0)."
enum SpendRestriction {
    /// The UTXO is spent in a transaction with transparent outputs
    SomeTransparentOutputs,
    /// The UTXO is spent in a transaction with all shielded outputs
    AllShieldedOutputs,
}
}

As described above, the request name is intended to indicate the request's behavior. The request does not resolve until:

  • the state layer learns of a UTXO described by the request, and
  • the output is spendable at spend_height with spend_restriction.

The new Utxo type adds a coinbase flag and height to transparent::Outputs that we look up in the state, or get from newly committed blocks:

#![allow(unused)]
fn main() {
enum Response {
    // .. other variants omitted
    SpendableUtxo(Utxo),
}

pub struct Utxo {
    /// The output itself.
    pub output: transparent::Output,

    /// The height at which the output was created.
    pub height: block::Height,

    /// Whether the output originated in a coinbase transaction.
    pub from_coinbase: bool,
}
}

Transparent coinbase consensus rules

Specifically, if the UTXO is a transparent coinbase output, the service is not required to return a response if:

  • spend_height is less than MIN_TRANSPARENT_COINBASE_MATURITY (100) blocks after the Utxo.height, or
  • spend_restriction is SomeTransparentOutputs.

This implements the following consensus rules:

A transaction MUST NOT spend a transparent output of a coinbase transaction from a block less than 100 blocks prior to the spend.

Note that transparent outputs of coinbase transactions include Founders’ Reward outputs and transparent funding stream outputs.

A transaction with one or more transparent inputs from coinbase transactions MUST have no transparent outputs (i.e. tx_out_count MUST be 0).

Inputs from coinbase transactions include Founders’ Reward outputs and funding stream outputs.

https://zips.z.cash/protocol/protocol.pdf#txnencodingandconsensus
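As a minimal sketch, the two transparent coinbase conditions above could be combined into a single predicate. The `UtxoInfo` struct and the `is_spendable` function are hypothetical names used only for illustration; `UtxoInfo` is a simplified stand-in for the `Utxo` type that omits the output itself, and heights are plain `u32` values rather than `block::Height`.

```rust
/// Minimum number of blocks between a transparent coinbase output and its spend.
const MIN_TRANSPARENT_COINBASE_MATURITY: u64 = 100;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SpendRestriction {
    /// The UTXO is spent in a transaction with transparent outputs.
    SomeTransparentOutputs,
    /// The UTXO is spent in a transaction with all shielded outputs.
    AllShieldedOutputs,
}

/// Hypothetical, simplified stand-in for `Utxo` (assumption: output omitted).
#[derive(Clone, Copy, Debug)]
struct UtxoInfo {
    height: u32,
    from_coinbase: bool,
}

/// Returns `true` if spending `utxo` at `spend_height` satisfies both
/// transparent coinbase rules. Non-coinbase outputs are never restricted.
fn is_spendable(utxo: UtxoInfo, spend_height: u32, restriction: SpendRestriction) -> bool {
    if !utxo.from_coinbase {
        return true;
    }
    // Widen to u64 so the addition cannot overflow.
    let mature =
        u64::from(spend_height) >= u64::from(utxo.height) + MIN_TRANSPARENT_COINBASE_MATURITY;
    mature && restriction == SpendRestriction::AllShieldedOutputs
}
```

In the design described here, a failed check does not produce an error response: the state service simply never answers, and the caller's timeout ends the request.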

Parallel coinbase checks

We can perform these coinbase checks asynchronously, in the presence of multiple chain forks, as long as the following conditions both hold:

  1. We don't mistakenly accept or reject spends to the transparent pool.

  2. We don't mistakenly accept or reject mature spends.

Parallel coinbase justification

There are two parts to a spend restriction:

  • the from_coinbase flag, and
  • if the from_coinbase flag is true, the coinbase height.

If a particular transaction hash h always has the same from_coinbase value, and h exists in multiple chains, then regardless of which Utxo arrives first, the outputs of h always get the same from_coinbase value during validation. So spends cannot be mistakenly accepted or rejected due to a different coinbase flag.

Similarly, if a particular coinbase transaction hash h always has the same height value, and h exists in multiple chains, then regardless of which Utxo arrives first, the outputs of h always get the same height value during validation. So coinbase spends cannot be mistakenly accepted or rejected due to a different height value. (The heights of non-coinbase outputs are irrelevant, because they are never checked.)

These conditions hold as long as the following multi-chain properties are satisfied:

  • from_coinbase: across all chains, the set of coinbase transaction hashes is disjoint from the set of non-coinbase transaction hashes, and
  • coinbase height: across all chains, duplicate coinbase transaction hashes can only occur at exactly the same height.

Parallel coinbase consensus rules

These multi-chain properties can be derived from the following consensus rules:

Transaction versions 1-4:

[Pre-Sapling ] If effectiveVersion = 1 or nJoinSplit = 0, then both tx_in_count and tx_out_count MUST be nonzero. ... [Sapling onward] If effectiveVersion < 5, then at least one of tx_in_count, nSpendsSapling, and nJoinSplit MUST be nonzero.

A coinbase transaction for a block at block height greater than 0 MUST have a script that, as its first item, encodes the block height height as follows.

For height in the range {1 .. 16}, the encoding is a single byte of value 0x50 + height.

Otherwise, let heightBytes be the signed little-endian representation of height, using the minimum nonzero number of bytes such that the most significant byte is < 0x80. The length of heightBytes MUST be in the range {1 .. 8}. Then the encoding is the length of heightBytes encoded as one byte, followed by heightBytes itself.

https://zips.z.cash/protocol/protocol.pdf#txnencodingandconsensus

The transaction ID of a version 4 or earlier transaction is the SHA-256d hash of the transaction encoding in the pre-v5 format described above.

https://zips.z.cash/protocol/protocol.pdf#txnidentifiers

Transaction version 5:

[NU5 onward] If effectiveVersion ≥ 5, then this condition must hold: tx_in_count > 0 or nSpendsSapling > 0 or (nActionsOrchard > 0 and enableSpendsOrchard = 1). ... [NU5 onward] The nExpiryHeight field of a coinbase transaction MUST be equal to its block height.

https://zips.z.cash/protocol/protocol.pdf#txnencodingandconsensus

non-malleable transaction identifiers ... commit to all transaction data except for attestations to transaction validity ... A new transaction digest algorithm is defined that constructs the identifier for a transaction from a tree of hashes ... A BLAKE2b-256 hash of the following values: ... T.1e: expiry_height (4-byte little-endian block height)

https://zips.z.cash/zip-0244#t-1-header-digest

Since:

  • coinbase transaction hashes commit to the block Height,
  • non-coinbase transaction hashes commit to their inputs, and
  • double-spends are not allowed;

Therefore:

  • coinbase transaction hashes are unique for distinct heights in any chain,
  • coinbase transaction hashes are unique in a single chain, and
  • non-coinbase transaction hashes are unique in a single chain, because they recursively commit to unique inputs.

So the required parallel verification conditions are satisfied.
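To make the height commitment concrete, here is a small sketch of the coinbase script height encoding quoted above for pre-v5 transactions. The function name `encode_coinbase_height` is hypothetical, and the sketch assumes heights fit in a `u32`; it only produces the first script item, not a full coinbase script.

```rust
/// Encodes a block height as the first item of a coinbase script,
/// following the rule quoted from the protocol specification.
/// Returns `None` for height 0, which the rule does not cover.
fn encode_coinbase_height(height: u32) -> Option<Vec<u8>> {
    match height {
        0 => None,
        // Heights 1..=16 are a single byte of value 0x50 + height.
        1..=16 => Some(vec![0x50 + height as u8]),
        _ => {
            // Minimal little-endian bytes of the height.
            let mut height_bytes = Vec::new();
            let mut rest = height;
            while rest > 0 {
                height_bytes.push((rest & 0xff) as u8);
                rest >>= 8;
            }
            // The most significant byte must be < 0x80, so that the
            // signed representation stays non-negative.
            if height_bytes.last().map_or(false, |byte| *byte >= 0x80) {
                height_bytes.push(0x00);
            }
            // Length prefix, then the height bytes themselves.
            let mut encoding = vec![height_bytes.len() as u8];
            encoding.extend(height_bytes);
            Some(encoding)
        }
    }
}
```

Because distinct heights produce distinct encodings, two coinbase transactions at different heights have different serialized bytes, and so different transaction IDs.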

Script verification

To verify scripts, a script verifier requests the relevant UTXOs from the state service and waits for all of them to resolve, or fails verification with a timeout error. Currently, we outsource script verification to zcash_consensus, which does FFI into the same C++ code as zcashd uses. We need to ensure this code is thread-safe.

Database implementation

Implementing the state request correctly requires considering two sets of behaviors:

  1. behaviors related to the state's external API (a Buffered tower::Service);
  2. behaviors related to the state's internal implementation (using rocksdb).

Making this distinction helps us to ensure we don't accidentally leak "internal" behaviors into "external" behaviors, which would violate encapsulation and make it more difficult to replace rocksdb.

In the first category, our state is presented to the rest of the application as a Buffered tower::Service. The Buffer wrapper allows shared access to a service using an actor model, moving the service to be shared into a worker task and passing messages to it over a multi-producer single-consumer (mpsc) channel. The worker task receives messages and makes Service::calls. The Service::call method returns a Future, and the service is allowed to decide how much work it wants to do synchronously (in call) and how much work it wants to do asynchronously (in the Future it returns).

This means that our external API ensures that the state service sees a linearized sequence of state requests, although the exact ordering is unpredictable when there are multiple senders making requests.
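The actor model described above can be illustrated with standard library threads and channels. This is only a sketch of the idea, not how tower::Buffer is implemented: it is blocking rather than asynchronous, and the `Request` variants, `Handle`, and `spawn_state_worker` are hypothetical names invented for this example.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical requests for this sketch; not Zebra's real `Request` enum.
enum Request {
    Write { key: u32, value: u64 },
    Read { key: u32 },
}

/// A message pairs a request with a channel for sending back the response.
type Message = (Request, mpsc::Sender<Option<u64>>);

/// A cloneable handle, playing the role of the `Buffer` wrapper.
#[derive(Clone)]
struct Handle(mpsc::Sender<Message>);

impl Handle {
    /// Sends a request to the worker and blocks until it responds.
    fn call(&self, request: Request) -> Option<u64> {
        let (tx, rx) = mpsc::channel();
        self.0.send((request, tx)).expect("worker is running");
        rx.recv().expect("worker sends a response")
    }
}

/// Moves the state into a worker thread that handles one message at a time.
fn spawn_state_worker() -> Handle {
    let (tx, rx) = mpsc::channel::<Message>();
    thread::spawn(move || {
        let mut state: HashMap<u32, u64> = HashMap::new();
        // The single consumer sees all requests in one linear order,
        // regardless of how many handles are sending.
        for (request, respond_to) in rx {
            let response = match request {
                // Returns the previous value for the key, if any.
                Request::Write { key, value } => state.insert(key, value),
                Request::Read { key } => state.get(&key).copied(),
            };
            // The caller may have given up; ignore a closed response channel.
            let _ = respond_to.send(response);
        }
    });
    Handle(tx)
}
```

Since only the worker thread owns the state, no locking is needed around it, and every request observes the effects of all requests the worker processed earlier.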

Because the state service has exclusive access to the rocksdb database, and the state service sees a linearized sequence of state requests, we have an easy way to opt in to asynchronous database access. We can perform rocksdb operations synchronously in the Service::call, waiting for them to complete, and be sure that all future requests will see the resulting rocksdb state. Or, we can perform rocksdb operations asynchronously in the future returned by Service::call.

If we perform all writes synchronously and allow reads to be either synchronous or asynchronous, we ensure that writes cannot race each other. Asynchronous reads are guaranteed to read at least the state present at the time the request was processed, or a later state.

Lookup states

Now, returning to the UTXO lookup problem, we can map out the possible states with this restriction in mind. This description assumes that UTXO storage is split into disjoint sets, one in-memory (e.g., blocks after the reorg limit) and the other in rocksdb (e.g., blocks before the reorg limit). The details of this storage are not important for this design, only that the two sets are disjoint.

When the state service processes a Request::AwaitSpendableUtxo referencing some UTXO u, there are three disjoint possibilities:

  1. u is already contained in an in-memory block storage;
  2. u is already contained in the rocksdb UTXO set;
  3. u is not yet known to the state service.

In case 3, we need to queue u and scan all future blocks to see whether they contain u. However, if we have a mechanism to queue u, we can perform check 2 asynchronously, because restricting to synchronous writes means that any async read will return the current or later state. If u was in the rocksdb UTXO set when the request was processed, the only way that an async read would not return u is if the UTXO were spent, in which case the service is not required to return a response.

Lookup implementation

This behavior can be encapsulated into a PendingUtxos structure described below.

#![allow(unused)]
fn main() {
// sketch
#[derive(Default, Debug)]
struct PendingUtxos(HashMap<OutPoint, oneshot::Sender<Utxo>>);

impl PendingUtxos {
    // adds the outpoint and returns (wrapped) rx end of oneshot
    // checks the spend height and restriction before sending the utxo response
    // return can be converted to `Service::Future`
    pub fn queue(
        &mut self,
        outpoint: OutPoint,
        spend_height: Height,
        spend_restriction: SpendRestriction,
    ) -> impl Future<Output=Result<Response, ...>>;

    // if outpoint is a hashmap key, remove the entry and send output on the channel
    pub fn respond(&mut self, outpoint: OutPoint, output: transparent::Output);

    /// check the list of pending UTXO requests against the supplied `utxos`
    pub fn check_against(&mut self, utxos: &HashMap<transparent::OutPoint, Utxo>);

    // scans the hashmap and removes any entries with closed senders
    pub fn prune(&mut self);
}
}

The state service should maintain an Arc<Mutex<PendingUtxos>>, used as follows:

  1. In Service::call(Request::AwaitSpendableUtxo { outpoint: u, .. }), the service should:
  • call PendingUtxos::queue(u) to get a future f to return to the caller;
  • spawn a task that does a rocksdb lookup for u, calling PendingUtxos::respond(u, output) if present;
  • check the in-memory storage for u, calling PendingUtxos::respond(u, output) if present;
  • return f to the caller (it may already be ready). The common case is that u references an old spendable UTXO, so spawning the lookup task first means that we don't wait to check in-memory storage for u before starting the rocksdb lookup.
  2. In f, the future returned by PendingUtxos::queue(u), the service should check that the Utxo is spendable before returning it:
  • if Utxo.from_coinbase is false, return the utxo;
  • if Utxo.from_coinbase is true, check that:
    • spend_restriction is AllShieldedOutputs, and
    • spend_height is greater than or equal to MIN_TRANSPARENT_COINBASE_MATURITY plus the Utxo.height,
    • if both checks pass, return the utxo.
    • if any check fails, drop the utxo, and let the request time out.
  3. In Service::call(Request::CommitBlock(block, ..)), the service should:
  • check for double-spends of each UTXO in the block, and
  • do any other transactional checks before committing a block as normal. Because the AwaitSpendableUtxo request is informational, there's no need to do the transactional checks before matching against pending UTXO requests, and doing so upfront can run expensive verification earlier than needed.
  4. In Service::poll_ready(), the service should call PendingUtxos::prune() at least some of the time. This is required because when a consumer uses a timeout layer, the cancelled requests should be flushed from the queue to avoid a resource leak. However, pruning on every call would spend a lot of time iterating over the hashmap.
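The queue-and-respond flow above can be sketched with standard library channels. This is a simplified illustration under several assumptions: `std::sync::mpsc` stands in for a oneshot channel, `OutPoint` and `Utxo` are hypothetical simplified types, each outpoint keeps a list of waiters (the RFC sketch keeps one sender per outpoint), and the spendability checks and `prune` are left out.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical simplified outpoint: (transaction id, output index).
type OutPoint = (u64, u32);

/// Hypothetical simplified UTXO; the real type wraps a `transparent::Output`.
#[derive(Clone, Debug, PartialEq)]
struct Utxo {
    value: u64,
    height: u32,
    from_coinbase: bool,
}

/// Pending requests, keyed by the outpoint each waiter is interested in.
#[derive(Default)]
struct PendingUtxos(HashMap<OutPoint, Vec<Sender<Utxo>>>);

impl PendingUtxos {
    /// Registers interest in `outpoint` and returns the receiving end.
    fn queue(&mut self, outpoint: OutPoint) -> Receiver<Utxo> {
        let (tx, rx) = channel();
        self.0.entry(outpoint).or_default().push(tx);
        rx
    }

    /// If anyone is waiting for `outpoint`, removes the entry and sends the UTXO.
    fn respond(&mut self, outpoint: OutPoint, utxo: Utxo) {
        if let Some(senders) = self.0.remove(&outpoint) {
            for sender in senders {
                // A waiter that timed out has dropped its receiver; ignore it.
                let _ = sender.send(utxo.clone());
            }
        }
    }

    /// Answers every pending request matched by newly available UTXOs.
    fn check_against(&mut self, utxos: &HashMap<OutPoint, Utxo>) {
        for (outpoint, utxo) in utxos {
            self.respond(*outpoint, utxo.clone());
        }
    }

    /// Number of outpoints that still have waiters.
    fn pending_count(&self) -> usize {
        self.0.len()
    }
}
```

A request for an outpoint that never appears simply stays in the map, which is why the design relies on caller-side timeouts and periodic pruning.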

Drawbacks

One drawback of this design is that we may have to wait on a lock. However, the critical section basically amounts to a hash lookup and a channel send, so I don't think that we're likely to run into problems with long contended periods, and it's unlikely that we would get a deadlock.

Rationale and alternatives

High-level design rationale is inline with the design sketch. One low-level option would be to avoid encapsulating behavior in the PendingUtxos and just have an Arc<HashMap<..>>, so that the lock only protects the hashmap lookup and not sending through the channel. But I think the current design is cleaner and the cost is probably not too large.

Unresolved questions

  • We need to pick a timeout for UTXO lookup. This should be long enough to account for the fact that we may start verifying blocks before all of their ancestors are downloaded.

State Updates

  • Feature Name: state_updates
  • Start Date: 2020-08-14
  • Design PR: https://github.com/ZcashFoundation/zebra/pull/902
  • Zebra Issue: https://github.com/ZcashFoundation/zebra/issues/1049

Summary

Zebra manages chain state in the zebra-state crate, which allows state queries via asynchronous RPC (in the form of a Tower service). The state system is responsible for contextual verification in the sense of RFC2, checking that new blocks are consistent with the existing chain state before committing them. This RFC describes how the state is represented internally, and how state updates are performed.

Motivation

We need to be able to access and modify the chain state, and we want to have a description of how this happens and what guarantees are provided by the state service.

Definitions

  • state data: Any data the state service uses to represent chain state.

  • structural/semantic/contextual verification: as defined in RFC2.

  • block chain: A sequence of valid blocks linked by inclusion of the previous block hash in the subsequent block. Chains are rooted at the genesis block and extend to a tip.

  • chain state: The state of the ledger after application of a particular sequence of blocks (state transitions).

  • block work: The approximate amount of work required for a miner to generate a block hash that passes the difficulty filter. The number of block header attempts and the mining time are proportional to the work value. Numerically higher work values represent longer processing times.

  • cumulative work: The sum of the block work of all blocks in a chain, from genesis to the chain tip.

  • best chain: The chain with the greatest cumulative work. This chain represents the consensus state of the Zcash network and transactions.

  • side chain: A chain which is not contained in the best chain. Side chains are pruned at the reorg limit, when they are no longer connected to the finalized state.

  • chain reorganization: Occurs when a new best chain is found and the previous best chain becomes a side chain.

  • reorg limit: The longest reorganization accepted by zcashd, 100 blocks.

  • orphaned block: A block which is no longer included in the best chain.

  • non-finalized state: State data corresponding to blocks above the reorg limit. This data can change in the event of a chain reorg.

  • finalized state: State data corresponding to blocks below the reorg limit. This data cannot change in the event of a chain reorg.

  • non-finalized tips: The highest blocks in each non-finalized chain. These tips might be at different heights.

  • finalized tip: The highest block in the finalized state. The tip of the best chain is usually 100 blocks (the reorg limit) above the finalized tip. But it can be lower during the initial sync, and after a chain reorganization, if the new best chain is at a lower height.

  • relevant chain: The relevant chain for a block starts at the previous block, and extends back to genesis.

  • relevant tip: The tip of the relevant chain.

Guide-level explanation

The zebra-state crate provides an implementation of the chain state storage logic in a Zcash consensus node. Its main responsibility is to store chain state, validating new blocks against the existing chain state in the process, and to allow later querying of said chain state. zebra-state provides this interface via a tower::Service based on the actor model with a request/response interface for passing messages back and forth between the state service and the rest of the application.

The main entry point for the zebra-state crate is the init function. This function takes a zebra_state::Config and constructs a new state service, which it returns wrapped by a tower::Buffer. This service is then interacted with via the tower::Service trait.

#![allow(unused)]
fn main() {
use tower::{Service, ServiceExt};

// `ready` takes `&mut self`, so the service binding must be mutable.
let mut state = zebra_state::on_disk::init(state_config, network);
let request = zebra_state::Request::BlockLocator;
let response = state.ready().await?.call(request).await?;

assert!(matches!(response, zebra_state::Response::BlockLocator(_)));
}

Note: The tower::Service API requires that ready is always called exactly once before each call. It is up to users of the zebra state service to uphold this contract.

The tower::Buffer wrapper is Cloneable, allowing shared access to a common state service. This allows different tasks to share access to the chain state.

The set of operations supported by zebra-state are encoded in its Request enum. This enum has one variant for each supported operation.

#![allow(unused)]
fn main() {
pub enum Request {
    CommitBlock {
        block: Arc<Block>,
    },
    CommitFinalizedBlock {
        block: Arc<Block>,
    },
    Depth(Hash),
    Tip,
    BlockLocator,
    Transaction(Hash),
    Block(HashOrHeight),

    // .. some variants omitted
}
}

zebra-state breaks down its requests into two categories and provides different guarantees for each category: requests that modify the state, and requests that do not. Requests that update the state are guaranteed to run sequentially and will never race against each other. Requests that read state are done asynchronously and are guaranteed to read at least the state present at the time the request was processed by the service, or a later state present at the time the request future is executed. The state service avoids race conditions between the read state and the written state by doing all contextual verification internally.

Reference-level explanation

State Components

Zcash (as implemented by zcashd) differs from Bitcoin in its treatment of transaction finality. If a new best chain is detected that does not extend the previous best chain, blocks at the end of the previous best chain become orphaned (no longer included in the best chain). Their state updates are therefore no longer included in the best chain's chain state. The process of rolling back orphaned blocks and applying new blocks is called a chain reorganization. Bitcoin allows chain reorganizations of arbitrary depth, while zcashd limits chain reorganizations to 100 blocks. (In zcashd, the new best chain must be a side-chain that forked within 100 blocks of the tip of the current best chain.)

This difference means that in Bitcoin, chain state only has probabilistic finality, while in Zcash, chain state is final once it is beyond the reorg limit. To simplify our implementation, we split the representation of the state data at the finality boundary provided by the reorg limit.

State data from blocks above the reorg limit (non-finalized state) is stored in-memory and handles multiple chains. State data from blocks below the reorg limit (finalized state) is stored persistently using rocksdb and only tracks a single chain. This allows a simplification of our state handling, because only finalized data is persistent and the logic for finalized data handles fewer invariants.

One downside of this design is that restarting the node loses the last 100 blocks, but node restarts are relatively infrequent and a short re-sync is cheap relative to the cost of additional implementation complexity.

Another downside of this design is that we do not achieve exactly the same behavior as zcashd in the event of a 51% attack: zcashd limits each chain reorganization to 100 blocks, but permits multiple reorgs, while Zebra limits all chain reorgs to 100 blocks. In the event of a successful 51% attack on Zcash, this could be resolved by wiping the rocksdb state and re-syncing the new chain, but in this scenario there are worse problems.

Service Interface

The state is accessed asynchronously through a Tower service interface. Determining what guarantees the state service can and should provide to the rest of the application requires considering two sets of behaviors:

  1. behaviors related to the state's external API (a Buffered tower::Service);
  2. behaviors related to the state's internal implementation (using rocksdb).

Making this distinction helps us to ensure we don't accidentally leak "internal" behaviors into "external" behaviors, which would violate encapsulation and make it more difficult to replace rocksdb.

In the first category, our state is presented to the rest of the application as a Buffered tower::Service. The Buffer wrapper allows shared access to a service using an actor model, moving the service to be shared into a worker task and passing messages to it over a multi-producer single-consumer (mpsc) channel. The worker task receives messages and makes Service::calls. The Service::call method returns a Future, and the service is allowed to decide how much work it wants to do synchronously (in call) and how much work it wants to do asynchronously (in the Future it returns).

This means that our external API ensures that the state service sees a linearized sequence of state requests, although the exact ordering is unpredictable when there are multiple senders making requests.

Because the state service has exclusive access to the rocksdb database, and the state service sees a linearized sequence of state requests, we have an easy way to opt in to asynchronous database access. We can perform rocksdb operations synchronously in the Service::call, waiting for them to complete, and be sure that all future requests will see the resulting rocksdb state. Or, we can perform rocksdb operations asynchronously in the future returned by Service::call.

If we perform all writes synchronously and allow reads to be either synchronous or asynchronous, we ensure that writes cannot race each other. Asynchronous reads are guaranteed to read at least the state present at the time the request was processed, or a later state.

Summary

  • rocksdb reads may be done synchronously (in call) or asynchronously (in the Future), depending on the context;

  • rocksdb writes must be done synchronously (in call).

In-memory data structures

At a high level, the in-memory data structures store a collection of chains, each rooted at the highest finalized block. Each chain consists of a map from heights to blocks. Chains are stored using an ordered map from cumulative work to chains, so that the map ordering is the ordering of worst to best chains.

The Chain type

The Chain type represents a chain of blocks. Each block represents an incremental state update, and the Chain type caches the cumulative state update from its root to its tip.

The Chain type is used to represent the non-finalized portion of a complete chain of blocks rooted at the genesis block. The parent block of the root of a Chain is the tip of the finalized portion of the chain. As an exception, the finalized portion of the chain is initially empty, until the genesis block has been finalized.

The Chain type supports several operations to manipulate chains: push, pop_root, and fork. push is the most fundamental operation and handles contextual validation of chains as they are extended. pop_root is provided for finalization, and is how we move blocks from the non-finalized portion of the state to the finalized portion. fork, on the other hand, creates new chains for push when new blocks arrive whose parent isn't the tip of an existing chain.

Note: The Chain type's API is only designed to handle non-finalized data. The genesis block and all pre-Canopy blocks are always considered to be finalized blocks and should not be handled via the Chain type through CommitBlock. They should instead be committed directly to the finalized state with CommitFinalizedBlock. This is particularly important for the genesis block, since the Chain will panic if used while the finalized state is completely empty.

The Chain type is defined by the following struct and API:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct Chain {
    // The function `eq_internal_state` must be updated every time a field is added to [`Chain`].
    /// The configured network for this chain.
    network: Network,

    /// The contextually valid blocks which form this non-finalized partial chain, in height order.
    pub(crate) blocks: BTreeMap<block::Height, ContextuallyValidBlock>,

    /// An index of block heights for each block hash in `blocks`.
    pub height_by_hash: HashMap<block::Hash, block::Height>,

    /// An index of [`TransactionLocation`]s for each transaction hash in `blocks`.
    pub tx_loc_by_hash: HashMap<transaction::Hash, TransactionLocation>,

    /// The [`transparent::Utxo`]s created by `blocks`.
    ///
    /// Note that these UTXOs may not be unspent.
    /// Outputs can be spent by later transactions or blocks in the chain.
    //
    // TODO: replace OutPoint with OutputLocation?
    pub(crate) created_utxos: HashMap<transparent::OutPoint, transparent::OrderedUtxo>,
    /// The [`transparent::OutPoint`]s spent by `blocks`,
    /// including those created by earlier transactions or blocks in the chain.
    pub(crate) spent_utxos: HashSet<transparent::OutPoint>,

    /// The Sprout note commitment tree of the tip of this [`Chain`],
    /// including all finalized notes, and the non-finalized notes in this chain.
    pub(super) sprout_note_commitment_tree: sprout::tree::NoteCommitmentTree,
    /// The Sprout note commitment tree for each anchor.
    /// This is required for interstitial states.
    pub(crate) sprout_trees_by_anchor:
        HashMap<sprout::tree::Root, sprout::tree::NoteCommitmentTree>,
    /// The Sapling note commitment tree of the tip of this [`Chain`],
    /// including all finalized notes, and the non-finalized notes in this chain.
    pub(super) sapling_note_commitment_tree: sapling::tree::NoteCommitmentTree,
    /// The Sapling note commitment tree for each height.
    pub(crate) sapling_trees_by_height: BTreeMap<block::Height, sapling::tree::NoteCommitmentTree>,
    /// The Orchard note commitment tree of the tip of this [`Chain`],
    /// including all finalized notes, and the non-finalized notes in this chain.
    pub(super) orchard_note_commitment_tree: orchard::tree::NoteCommitmentTree,
    /// The Orchard note commitment tree for each height.
    pub(crate) orchard_trees_by_height: BTreeMap<block::Height, orchard::tree::NoteCommitmentTree>,
    /// The ZIP-221 history tree of the tip of this [`Chain`],
    /// including all finalized blocks, and the non-finalized `blocks` in this chain.
    pub(crate) history_tree: HistoryTree,

    /// The Sprout anchors created by `blocks`.
    pub(crate) sprout_anchors: MultiSet<sprout::tree::Root>,
    /// The Sprout anchors created by each block in `blocks`.
    pub(crate) sprout_anchors_by_height: BTreeMap<block::Height, sprout::tree::Root>,
    /// The Sapling anchors created by `blocks`.
    pub(crate) sapling_anchors: MultiSet<sapling::tree::Root>,
    /// The Sapling anchors created by each block in `blocks`.
    pub(crate) sapling_anchors_by_height: BTreeMap<block::Height, sapling::tree::Root>,
    /// The Orchard anchors created by `blocks`.
    pub(crate) orchard_anchors: MultiSet<orchard::tree::Root>,
    /// The Orchard anchors created by each block in `blocks`.
    pub(crate) orchard_anchors_by_height: BTreeMap<block::Height, orchard::tree::Root>,

    /// The Sprout nullifiers revealed by `blocks`.
    pub(super) sprout_nullifiers: HashSet<sprout::Nullifier>,
    /// The Sapling nullifiers revealed by `blocks`.
    pub(super) sapling_nullifiers: HashSet<sapling::Nullifier>,
    /// The Orchard nullifiers revealed by `blocks`.
    pub(super) orchard_nullifiers: HashSet<orchard::Nullifier>,

    /// Partial transparent address index data from `blocks`.
    pub(super) partial_transparent_transfers: HashMap<transparent::Address, TransparentTransfers>,

    /// The cumulative work represented by `blocks`.
    ///
    /// Since the best chain is determined by the largest cumulative work,
    /// the work represented by finalized blocks can be ignored,
    /// because they are common to all non-finalized chains.
    pub(super) partial_cumulative_work: PartialCumulativeWork,

    /// The chain value pool balances of the tip of this [`Chain`],
    /// including the block value pool changes from all finalized blocks,
    /// and the non-finalized blocks in this chain.
    ///
    /// When a new chain is created from the finalized tip,
    /// it is initialized with the finalized tip chain value pool balances.
    pub(crate) chain_value_pools: ValueBalance<NonNegative>,
}
}

pub fn push(&mut self, block: Arc<Block>)

Push a block into a chain as the new tip

  1. Update cumulative data members

    • Add the block's hash to height_by_hash
    • Add work to self.partial_cumulative_work
    • For each transaction in block
      • Add key: transaction.hash and value: (height, tx_index) to tx_loc_by_hash
      • Add created utxos to self.created_utxos
      • Add spent utxos to self.spent_utxos
      • Add nullifiers to the appropriate self.<version>_nullifiers
  2. Add block to self.blocks

pub fn pop_root(&mut self) -> Arc<Block>

Remove the lowest height block of the non-finalized portion of a chain.

  1. Remove the lowest height block from self.blocks

  2. Update cumulative data members

    • Remove the block's hash from self.height_by_hash
    • Subtract work from self.partial_cumulative_work
    • For each transaction in block
      • Remove transaction.hash from tx_loc_by_hash
      • Remove created utxos from self.created_utxos
      • Remove spent utxos from self.spent_utxos
      • Remove the nullifiers from the appropriate self.<version>_nullifiers
  3. Return the block

pub fn fork(&self, new_tip: block::Hash) -> Option<Self>

Fork a chain at the block with the given hash, if it is part of this chain.

  1. If self does not contain new_tip return None

  2. Clone self as forked

  3. While the tip of forked is not equal to new_tip

    • call forked.pop_tip() and discard the old tip
  4. Return forked

fn pop_tip(&mut self)

Remove the highest height block of the non-finalized portion of a chain.

  1. Remove the highest height block from self.blocks

  2. Update cumulative data members

    • Remove the corresponding hash from self.height_by_hash
    • Subtract work from self.partial_cumulative_work
    • For each transaction in block
      • Remove transaction.hash from tx_loc_by_hash
      • Remove created utxos from self.created_utxos
      • Remove spent utxos from self.spent_utxos
      • Remove the nullifiers from the appropriate self.<version>_nullifiers
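
The push, pop_root, fork, and pop_tip steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Zebra's implementation: Hash, Height, and the Block fields are simplified stand-ins, UTXO and nullifier tracking is omitted, pop_root returns an Option instead of Arc<Block>, and the revert, work, and tip_hash helpers are hypothetical names introduced here.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::Arc;

// Simplified stand-ins for Zebra's hash and height types (assumptions).
pub type Hash = u64;
pub type Height = u32;

#[derive(Debug, PartialEq)]
pub struct Block {
    pub hash: Hash,
    pub height: Height,
    pub work: u128,
    pub tx_hashes: Vec<Hash>,
}

#[derive(Clone, Debug, Default)]
pub struct Chain {
    blocks: BTreeMap<Height, Arc<Block>>,
    height_by_hash: HashMap<Hash, Height>,
    tx_loc_by_hash: HashMap<Hash, (Height, usize)>,
    partial_cumulative_work: u128,
}

impl Chain {
    /// Push a block into the chain as the new tip.
    pub fn push(&mut self, block: Arc<Block>) {
        // 1. Update cumulative data members.
        self.height_by_hash.insert(block.hash, block.height);
        self.partial_cumulative_work += block.work;
        for (tx_index, tx_hash) in block.tx_hashes.iter().enumerate() {
            self.tx_loc_by_hash.insert(*tx_hash, (block.height, tx_index));
        }
        // 2. Add the block to `blocks`.
        self.blocks.insert(block.height, block);
    }

    /// Undo the cumulative updates that `push` made for `block`.
    fn revert(&mut self, block: &Block) {
        self.height_by_hash.remove(&block.hash);
        self.partial_cumulative_work -= block.work;
        for tx_hash in &block.tx_hashes {
            self.tx_loc_by_hash.remove(tx_hash);
        }
    }

    /// Remove and return the lowest height block, if any.
    pub fn pop_root(&mut self) -> Option<Arc<Block>> {
        let (_, block) = self.blocks.pop_first()?;
        self.revert(&block);
        Some(block)
    }

    /// Remove the highest height block, if any.
    fn pop_tip(&mut self) {
        if let Some((_, block)) = self.blocks.pop_last() {
            self.revert(&block);
        }
    }

    pub fn tip_hash(&self) -> Option<Hash> {
        self.blocks.values().next_back().map(|block| block.hash)
    }

    pub fn work(&self) -> u128 {
        self.partial_cumulative_work
    }

    /// Fork the chain at `new_tip`, if it is part of this chain.
    pub fn fork(&self, new_tip: Hash) -> Option<Self> {
        if !self.height_by_hash.contains_key(&new_tip) {
            return None;
        }
        let mut forked = self.clone();
        while forked.tip_hash() != Some(new_tip) {
            forked.pop_tip();
        }
        Some(forked)
    }
}
```

Sharing one revert helper between pop_root and pop_tip keeps the removal steps in one place, which mirrors how the two lists above repeat the same cumulative updates.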

Ord

The Chain type implements Ord for reorganizing chains. Chains are first compared by their partial_cumulative_work. Ties are then broken by comparing the block::Hashes of the tips of each chain. (This tie-breaker means that all Chains in the NonFinalizedState must have at least one block.)

Note: Unlike zcashd, Zebra does not use block arrival times as a tie-breaker for the best tip. Since Zebra downloads blocks in parallel, download times are not guaranteed to be unique. Using the block::Hash provides a consistent tip order. (As a side-effect, the tip order is also consistent after a node restart, and between nodes.)
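
As a rough sketch of this ordering, the comparison can be written with Ordering::then_with. ChainSummary and best_chain are hypothetical names, u128 stands in for the work type, and the direction of the hash tie-break (the numerically larger tip hash sorts last and wins) is an assumption, because the text only says that the hashes are compared.

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;

/// Hypothetical summary of the two fields that the ordering uses.
#[derive(Debug, PartialEq, Eq)]
pub struct ChainSummary {
    pub partial_cumulative_work: u128,
    pub tip_hash: [u8; 32],
}

impl Ord for ChainSummary {
    fn cmp(&self, other: &Self) -> Ordering {
        // Compare by work first, then break ties using the tip hash.
        self.partial_cumulative_work
            .cmp(&other.partial_cumulative_work)
            .then_with(|| self.tip_hash.cmp(&other.tip_hash))
    }
}

impl PartialOrd for ChainSummary {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// The best chain is the greatest element of the ordered set.
pub fn best_chain(chains: &BTreeSet<ChainSummary>) -> Option<&ChainSummary> {
    chains.iter().next_back()
}
```

Because the ordering depends only on data stored in the chain, two nodes holding the same chains pick the same best tip, regardless of the order in which blocks arrived.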

Default

The Chain type implements Default for constructing new chains whose parent block is the tip of the finalized state. This implementation should be handled by #[derive(Default)].

  1. initialise cumulative data members
    • Construct an empty self.blocks, height_by_hash, tx_loc_by_hash, self.created_utxos, self.spent_utxos, self.<version>_anchors, self.<version>_nullifiers
    • Zero self.partial_cumulative_work

Note: The ChainState can be empty after a restart, because the non-finalized state is empty.

NonFinalizedState Type

The NonFinalizedState type represents the set of all non-finalized state. It consists of a set of non-finalized but verified chains and a set of unverified blocks which are waiting for the full context needed to verify them to become available.

NonFinalizedState is defined by the following structure and API:

#![allow(unused)]
fn main() {
/// The state of the chains in memory, including queued blocks.
#[derive(Debug, Default)]
pub struct NonFinalizedState {
    /// Verified, non-finalized chains.
    chain_set: BTreeSet<Chain>,
    /// Blocks awaiting their parent blocks for contextual verification.
    contextual_queue: QueuedBlocks,
}
}

pub fn finalize(&mut self) -> Arc<Block>

Finalize the lowest height block in the non-finalized portion of the best chain, and update all side chains to match.

  1. Extract the best chain from self.chain_set into best_chain

  2. Extract the rest of the chains into a side_chains temporary variable, so they can be mutated

  3. Remove the lowest height block from the best chain with let finalized_block = best_chain.pop_root();

  4. Add best_chain back to self.chain_set if best_chain is not empty

  5. For each remaining chain in side_chains

    • remove the lowest height block from chain
    • If that block is equal to finalized_block and chain is not empty add chain back to self.chain_set
    • Else, drop chain
  6. Return finalized_block
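
The finalize steps can be sketched on a simplified model. MiniChain and MiniState are hypothetical stand-ins: each chain is a queue of u64 block hashes from root to tip, every block counts as one unit of work, the derived Ord only approximates the ordering described earlier, and finalize returns an Option instead of Arc<Block>.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Hypothetical stand-in for `Chain`: block hashes from root to tip.
/// The derived `Ord` compares `work` first, then the block list.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct MiniChain {
    pub work: u64,
    pub blocks: VecDeque<u64>,
}

impl MiniChain {
    fn pop_root(&mut self) -> Option<u64> {
        let root = self.blocks.pop_front()?;
        self.work -= 1; // each block is one unit of work in this sketch
        Some(root)
    }
}

#[derive(Debug, Default)]
pub struct MiniState {
    pub chain_set: BTreeSet<MiniChain>,
}

impl MiniState {
    pub fn finalize(&mut self) -> Option<u64> {
        // Steps 1-2: take every chain out of the set so it can be mutated.
        // The set iterates in ascending order, so the best chain is last.
        let mut chains: Vec<MiniChain> =
            std::mem::take(&mut self.chain_set).into_iter().collect();
        let mut best_chain = chains.pop()?;

        // Step 3: remove the lowest height block from the best chain.
        // Chains are assumed to be non-empty, as the text requires.
        let finalized_block = best_chain.pop_root()?;

        // Step 4: keep the best chain if it still has blocks.
        if !best_chain.blocks.is_empty() {
            self.chain_set.insert(best_chain);
        }

        // Step 5: keep side chains that shared the finalized root,
        // and drop the rest.
        for mut chain in chains {
            if chain.pop_root() == Some(finalized_block) && !chain.blocks.is_empty() {
                self.chain_set.insert(chain);
            }
        }

        // Step 6: return the finalized block.
        Some(finalized_block)
    }
}
```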

fn commit_block(&mut self, block: Arc<Block>)

Commit block to the non-finalized state.

  1. If the block is a pre-Canopy block, or the canopy activation block, panic.

  2. If any chain's tip hash equals block.header.previous_block_hash, remove that chain from self.chain_set

  3. Else find the first chain that contains block.parent and fork it with block.parent as the new tip

    • let fork = self.chain_set.iter().find_map(|chain| chain.fork(block.parent));
  4. Else panic, this should be unreachable because commit_block is only called when block is ready to be committed.

  5. Push block into parent_chain

  6. Insert parent_chain into self.chain_set

pub(super) fn commit_new_chain(&mut self, block: Arc<Block>)

Construct a new chain starting with block.

  1. Construct a new empty chain

  2. push block into that new chain

  3. Insert the new chain into self.chain_set

The QueuedBlocks type

The queued blocks type represents the non-finalized blocks that were committed before their parent blocks were. It is responsible for tracking which blocks are queued by their parent so they can be committed immediately after the parent is committed. It also tracks blocks by their height so they can be discarded if they ever end up below the reorg limit.

QueuedBlocks is defined by the following structure and API:

#![allow(unused)]
fn main() {
/// A queue of blocks, awaiting the arrival of parent blocks.
#[derive(Debug, Default)]
struct QueuedBlocks {
    /// Blocks awaiting their parent blocks for contextual verification.
    blocks: HashMap<block::Hash, QueuedBlock>,
    /// Hashes from `blocks`, indexed by parent hash.
    by_parent: HashMap<block::Hash, Vec<block::Hash>>,
    /// Hashes from `blocks`, indexed by block height.
    by_height: BTreeMap<block::Height, Vec<block::Hash>>,
}
}

pub fn queue(&mut self, new: QueuedBlock)

Add a block to the queue of blocks waiting for their requisite context to become available.

  1. extract the parent_hash, new_hash, and new_height from new.block

  2. Add new to self.blocks using new_hash as the key

  3. Add new_hash to the set of hashes in self.by_parent.entry(parent_hash).or_default()

  4. Add new_hash to the set of hashes in self.by_height.entry(new_height).or_default()

pub fn dequeue_children(&mut self, parent: block::Hash) -> Vec<QueuedBlock>

Dequeue the set of blocks waiting on parent.

  1. Remove the set of hashes waiting on parent from self.by_parent

  2. Remove and collect each block in that set of hashes from self.blocks as queued_children

  3. For each block in queued_children remove the associated block.hash from self.by_height

  4. Return queued_children

pub fn prune_by_height(&mut self, finalized_height: block::Height)

Prune all queued blocks whose heights are less than or equal to finalized_height.

  1. Split the by_height list at the finalized height, removing all heights that are less than or equal to finalized_height

  2. for each hash in the removed values of by_height

    • remove the corresponding block from self.blocks
    • remove the block's hash from the list of blocks waiting on block.header.previous_block_hash from self.by_parent
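
The three QueuedBlocks operations can be sketched with the standard collections. This is an illustration only: Hash and Height are simplified aliases, QueuedBlock here carries just the fields the indexes need, and len is a hypothetical helper added for inspection.

```rust
use std::collections::{BTreeMap, HashMap};

// Simplified stand-ins for Zebra's hash and height types (assumptions).
pub type Hash = u64;
pub type Height = u32;

#[derive(Clone, Debug, PartialEq)]
pub struct QueuedBlock {
    pub hash: Hash,
    pub parent_hash: Hash,
    pub height: Height,
}

#[derive(Debug, Default)]
pub struct QueuedBlocks {
    blocks: HashMap<Hash, QueuedBlock>,
    by_parent: HashMap<Hash, Vec<Hash>>,
    by_height: BTreeMap<Height, Vec<Hash>>,
}

/// Remove `hash` from the list stored under `key`, dropping empty lists.
fn remove_hash<K: Ord>(index: &mut BTreeMap<K, Vec<Hash>>, key: &K, hash: Hash) {
    let now_empty = match index.get_mut(key) {
        Some(hashes) => {
            hashes.retain(|candidate| *candidate != hash);
            hashes.is_empty()
        }
        None => false,
    };
    if now_empty {
        index.remove(key);
    }
}

impl QueuedBlocks {
    pub fn queue(&mut self, new: QueuedBlock) {
        self.by_parent.entry(new.parent_hash).or_default().push(new.hash);
        self.by_height.entry(new.height).or_default().push(new.hash);
        self.blocks.insert(new.hash, new);
    }

    pub fn dequeue_children(&mut self, parent: Hash) -> Vec<QueuedBlock> {
        let hashes = self.by_parent.remove(&parent).unwrap_or_default();
        let queued_children: Vec<QueuedBlock> = hashes
            .iter()
            .filter_map(|hash| self.blocks.remove(hash))
            .collect();
        for child in &queued_children {
            remove_hash(&mut self.by_height, &child.height, child.hash);
        }
        queued_children
    }

    pub fn prune_by_height(&mut self, finalized_height: Height) {
        // `split_off(&k)` returns the entries with keys >= k and leaves
        // the entries with keys < k behind, so swap the two halves.
        let remaining = self.by_height.split_off(&(finalized_height + 1));
        let pruned = std::mem::replace(&mut self.by_height, remaining);
        for hash in pruned.into_values().flatten() {
            if let Some(block) = self.blocks.remove(&hash) {
                if let Some(siblings) = self.by_parent.get_mut(&block.parent_hash) {
                    siblings.retain(|candidate| *candidate != hash);
                }
                if self.by_parent.get(&block.parent_hash).map_or(false, |s| s.is_empty()) {
                    self.by_parent.remove(&block.parent_hash);
                }
            }
        }
    }

    pub fn len(&self) -> usize {
        self.blocks.len()
    }
}
```

Keeping by_height in a BTreeMap is what makes pruning cheap: all heights at or below the finalized height form a contiguous prefix that split_off can detach in one call.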

Summary

  • Chain represents the non-finalized portion of a single chain

  • NonFinalizedState represents the non-finalized portion of all chains

  • QueuedBlocks represents all unverified blocks that are waiting for context to be available.

The state service uses the following entry points:

  • commit_block when it receives new blocks.

  • finalize to prevent chains in NonFinalizedState from growing beyond the reorg limit.

  • FinalizedState.queue_and_commit_finalized_blocks on the blocks returned by finalize, to commit those finalized blocks to disk.

Committing non-finalized blocks

New non-finalized blocks are committed as follows:

pub(super) fn queue_and_commit_non_finalized_blocks(&mut self, new: Arc<Block>) -> tokio::sync::oneshot::Receiver<block::Hash>

  1. If a duplicate block hash exists in a non-finalized chain, or the finalized chain, it has already been successfully verified:

    • create a new oneshot channel
    • immediately send Err(DuplicateBlockHash) and drop the sender
    • return the receiver
  2. If a duplicate block hash exists in the queue:

    • Find the QueuedBlock for that existing duplicate block
    • create a new channel for the new request
    • replace the old sender in queued_block with the new sender
    • send Err(DuplicateBlockHash) through the old sender channel
    • continue to use the new receiver
  3. Else create a QueuedBlock for block:

    • Create a tokio::sync::oneshot channel
    • Use that channel to create a QueuedBlock for block
    • Add block to self.queued_blocks
    • continue to use the new receiver
  4. If block.header.previous_block_hash is not present in the finalized or non-finalized state:

    • Return the receiver for the block's channel
  5. Else iteratively attempt to process queued blocks by their parent hash starting with block.header.previous_block_hash

  6. While there are recently committed parent hashes to process

    • Dequeue all blocks waiting on parent with let queued_children = self.queued_blocks.dequeue_children(parent);
    • for each queued block
      • Run contextual validation on block
        • contextual validation should check that the block height is equal to the previous block height plus 1. This check will reject blocks with invalid heights.
      • If the block fails contextual validation send the result to the associated channel
      • Else if the block's previous hash is the finalized tip add to the non-finalized state with self.mem.commit_new_chain(block)
      • Else add the new block to an existing non-finalized chain or new fork with self.mem.commit_block(block);
      • Send Ok(hash) over the associated channel to indicate the block was successfully committed
      • Add block.hash to the set of recently committed parent hashes to process
  7. While the length of the non-finalized portion of the best chain is greater than the reorg limit

    • Remove the lowest height block from the non-finalized state with self.mem.finalize();
    • Commit that block to the finalized state with self.disk.commit_finalized_direct(finalized);
  8. Prune orphaned blocks from self.queued_blocks with self.queued_blocks.prune_by_height(finalized_height);

  9. Return the receiver for the block's channel

rocksdb data structures

The current database format is documented in Upgrading the State Database.

Committing finalized blocks

If the parent block is not committed, add the block to an internal queue for future processing. Otherwise, commit the block as described below, then commit any queued children. (Although the checkpointer generates verified blocks in order when it completes a checkpoint, the blocks are committed in the response futures, so they may arrive out of order.)

Committing a block to the rocksdb state should be implemented as a wrapper around a function also called by Request::CommitBlock, which should:

pub(super) fn queue_and_commit_finalized_blocks(&mut self, queued_block: QueuedBlock)

  1. Obtain the highest entry of hash_by_height as (old_height, old_tip). Check that block's parent hash is old_tip and its height is old_height+1, or panic. This check is performed as defense-in-depth to prevent database corruption, but it is the caller's responsibility (e.g. the zebra-state service's responsibility) to commit finalized blocks in order.

The genesis block does not have a parent block. For genesis blocks, check that block's parent hash is null (all zeroes) and its height is 0.

  2. Insert the block and transaction data into the relevant column families.

  3. If the block is a genesis block, skip any transaction updates.

    (Due to a bug in zcashd, genesis block anchors and transactions are ignored during validation.)

  4. Update the block anchors, history tree, and chain value pools.

  5. Iterate over the enumerated transactions in the block. For each transaction, update the relevant column families.

Note: The Sprout and Sapling anchors are the roots of the Sprout and Sapling note commitment trees that have already been calculated for the last transaction(s) in the block that have JoinSplits in the Sprout case and/or Spend/Output descriptions in the Sapling case. These should be passed as fields in the Commit*Block requests.

Due to the coinbase maturity rules, the Sprout root is the empty root for the first 100 blocks. (These rules are already implemented in contextual validation and the anchor calculations.)

Hypothetically, if Sapling were activated from genesis, the specification requires a Sapling anchor, but zcashd would ignore that anchor.

These updates can be performed in a batch or without necessarily iterating over all transactions, if the data is available by other means; they're specified this way for clarity.

Accessing previous blocks for contextual validation

The state service performs contextual validation of blocks received via the CommitBlock request. Since CommitBlock is synchronous, contextual validation must also be performed synchronously.

The relevant chain for a block starts at its previous block, and follows the chain of previous blocks back to the genesis block.

Relevant chain iterator

The relevant chain can be retrieved from the state service as follows:

  • if the previous block is the finalized tip:
    • get recent blocks from the finalized state
  • if the previous block is in the non-finalized state:
    • get recent blocks from the relevant chain, then
    • get recent blocks from the finalized state, if required

The relevant chain can start at any non-finalized block, or at the finalized tip.

Relevant chain implementation

The relevant chain is implemented as a StateService iterator, which returns Arc<Block>s.

The chain iterator implements ExactSizeIterator, so Zebra can efficiently assert that the relevant chain contains enough blocks to perform each contextual validation check.

#![allow(unused)]
fn main() {
impl StateService {
    /// Return an iterator over the relevant chain of the block identified by
    /// `hash`.
    ///
    /// The block identified by `hash` is included in the chain of blocks yielded
    /// by the iterator.
    pub fn chain(&self, hash: block::Hash) -> Iter<'_> { ... }
}

impl Iterator for Iter<'_>  {
    type Item = Arc<Block>;
    ...
}
impl ExactSizeIterator for Iter<'_> { ... }
impl FusedIterator for Iter<'_> {}
}

For further details, see PR 1271.
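
One way such an iterator could be structured is shown below. This is a sketch under assumptions rather than the code from the PR: RelevantChain is a hypothetical stand-in for Iter<'_>, Block holds only a height, and both parts of the state are modelled as slices ordered from lowest to highest height.

```rust
use std::iter::{FusedIterator, Rev};
use std::slice;
use std::sync::Arc;

#[derive(Debug, PartialEq)]
pub struct Block {
    pub height: u32,
}

/// Hypothetical stand-in for `Iter<'_>`: yields non-finalized blocks
/// first, then finalized blocks, from the highest height downwards.
pub struct RelevantChain<'a> {
    non_finalized: Rev<slice::Iter<'a, Arc<Block>>>,
    finalized: Rev<slice::Iter<'a, Arc<Block>>>,
}

impl<'a> RelevantChain<'a> {
    pub fn new(finalized: &'a [Arc<Block>], non_finalized: &'a [Arc<Block>]) -> Self {
        Self {
            non_finalized: non_finalized.iter().rev(),
            finalized: finalized.iter().rev(),
        }
    }
}

impl Iterator for RelevantChain<'_> {
    type Item = Arc<Block>;

    fn next(&mut self) -> Option<Self::Item> {
        self.non_finalized
            .next()
            .or_else(|| self.finalized.next())
            .cloned()
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // Both halves know their exact remaining length.
        let len = self.non_finalized.len() + self.finalized.len();
        (len, Some(len))
    }
}

impl ExactSizeIterator for RelevantChain<'_> {}
impl FusedIterator for RelevantChain<'_> {}
```

With an exact size_hint, a caller can check chain.len() against the number of blocks a contextual check needs before it reads any of them.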

Request / Response API

The state API is provided by a pair of Request/Response enums. Each Request variant corresponds to particular Response variants, and it's fine (and encouraged) for caller code to unwrap the expected variants with unreachable! on the unexpected variants. This is slightly inconvenient but it means that we have a unified state interface with unified backpressure.

This API includes both write and read calls. Spotting Commit requests in code review should not be a problem, but in the future, if we need to restrict access to write calls, we could implement a wrapper service that rejects these, and export "read" and "write" frontends to the same inner service.
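
The unwrapping pattern described above might look like this in caller code. The sketch replaces the asynchronous service with a plain function named call, models the best chain as a slice of u64 hashes ordered from lowest to highest height, and wraps the tip in an Option; all of these are simplifying assumptions, and best_tip is a hypothetical caller.

```rust
#[derive(Debug)]
pub enum Request {
    Tip,
    Depth(u64),
}

#[derive(Debug, PartialEq)]
pub enum Response {
    Tip(Option<u64>),
    Depth(Option<u32>),
}

/// Hypothetical synchronous stand-in for the state service.
pub fn call(best_chain: &[u64], request: Request) -> Response {
    match request {
        Request::Tip => Response::Tip(best_chain.last().copied()),
        Request::Depth(hash) => Response::Depth(
            best_chain
                .iter()
                .rev() // position 0 is the tip, so the position is the depth
                .position(|candidate| *candidate == hash)
                .map(|depth| depth as u32),
        ),
    }
}

/// Caller code unwraps the expected variant and treats any other
/// variant as a bug in the service.
pub fn best_tip(best_chain: &[u64]) -> Option<u64> {
    match call(best_chain, Request::Tip) {
        Response::Tip(tip) => tip,
        _ => unreachable!("Request::Tip always returns Response::Tip"),
    }
}
```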

Request::CommitBlock

#![allow(unused)]
fn main() {
CommitBlock {
    block: Arc<Block>,
    sprout_anchor: sprout::tree::Root,
    sapling_anchor: sapling::tree::Root,
}
}

Performs contextual validation of the given block, committing it to the state if successful. Returns Response::Added(block::Hash) with the hash of the newly committed block or an error.

Request::CommitFinalizedBlock

#![allow(unused)]
fn main() {
CommitFinalizedBlock {
    block: Arc<Block>,
    sprout_anchor: sprout::tree::Root,
    sapling_anchor: sapling::tree::Root,
}
}

Commits a finalized block to the rocksdb state, skipping contextual validation. This is exposed for use in checkpointing, which produces in-order finalized blocks. Returns Response::Added(block::Hash) with the hash of the committed block if successful.

Request::Depth(block::Hash)

Computes the depth in the best chain of the block identified by the given hash, returning

  • Response::Depth(Some(depth)) if the block is in the best chain;
  • Response::Depth(None) otherwise.

Implemented by querying:

  • (non-finalized) the height_by_hash map in the best chain, and
  • (finalized) the height_by_hash tree

Request::Tip

Returns Response::Tip(block::Hash) with the current best chain tip.

Implemented by querying:

  • (non-finalized) the highest height block in the best chain
  • (finalized) the highest height block in the hash_by_height tree, if the non-finalized state is empty

Request::BlockLocator

Returns Response::BlockLocator(Vec<block::Hash>) with hashes starting from the current chain tip and reaching backwards towards the genesis block. The first hash is the best chain tip. The last hash is the tip of the finalized portion of the state. If the finalized and non-finalized states are both empty, the block locator is also empty.

This can be used by the sync component to request hashes of subsequent blocks.

Implemented by querying:

  • (non-finalized) the hash_by_height map in the best chain
  • (finalized) the hash_by_height tree.

Request::Transaction(transaction::Hash)

Returns

  • Response::Transaction(Some(Transaction)) if the transaction identified by the given hash is contained in the state;

  • Response::Transaction(None) if the transaction identified by the given hash is not contained in the state.

Implemented by querying:

  • (non-finalized) the tx_loc_by_hash map (to get the block that contains the transaction) of each chain starting with the best chain, and then find that block in that chain's blocks (to get the block containing the transaction data)
  • (finalized) the tx_loc_by_hash tree (to get the block that contains the transaction) and then block_header_by_height tree (to get the block containing the transaction data), if the transaction is not in any non-finalized chain

Request::Block(block::Hash)

Returns

  • Response::Block(Some(Arc<Block>)) if the block identified by the given hash is contained in the state;

  • Response::Block(None) if the block identified by the given hash is not contained in the state.

Implemented by querying:

  • (non-finalized) the height_by_hash of each chain starting with the best chain, then find the block in that chain's blocks (to get the block data)
  • (finalized) the height_by_hash tree (to get the block height) and then the block_header_by_height tree (to get the block data), if the block is not in any non-finalized chain

Request::AwaitSpendableUtxo { outpoint: OutPoint, spend_height: Height, spend_restriction: SpendRestriction }

Returns

  • Response::SpendableUtxo(transparent::Output)

Implemented by querying:

  • (non-finalized) if any Chains contain OutPoint in their created_utxos, return the Utxo for OutPoint;
  • (finalized) else if OutPoint is in utxos_by_outpoint, return the Utxo for OutPoint;
  • else wait for OutPoint to be created as described in RFC0004;

Then validating:

  • check the transparent coinbase spend restrictions specified in RFC0004;
  • if the restrictions are satisfied, return the response;
  • if the spend is invalid, drop the request (and the caller will time out).

Drawbacks

  • Restarts can cause zebrad to redownload up to the last one hundred blocks it verified in the best chain, and potentially some recent side-chain blocks.

  • The service interface puts some extra responsibility on callers to ensure it is used correctly and does not verify the usage is correct at compile time.

  • The service API is verbose and requires manually unwrapping enums.

  • We do not handle reorgs the same way zcashd does, and could in theory need to delete our entire on disk state and resync the chain in some pathological reorg cases.

  • Testnet rollbacks are infrequent, but possible, due to bugs in Testnet releases. Each Testnet rollback will require additional state service code.

Summary

Zcash nodes use a Proof of Work algorithm to reach consensus on the best chain. Valid blocks must reach a difficulty threshold, which is adjusted after every block. The difficulty adjustment calculations depend on the difficulties and times of recent blocks. So Zebra performs contextual validation (see RFC2) of difficulty adjustments as part of committing blocks to the state.

Motivation

The Zcash block difficulty adjustment is one of the core Zcash consensus rules. Zebra must implement this consensus rule to make sure that its cached chain state is consistent with the consensus of Zcash nodes.

Difficulty adjustment is also a significant part of Zcash's security guarantees. It ensures that the network continues to resist takeover attacks, even as the number of Zcash miners grows.

Difficulty adjustment also ensures that blocks are regularly spaced, which allows users to create and finalise transactions with short, consistent delays. These predictable delays contribute to Zcash's usability.

Definitions

Difficulty:

  • hash difficulty: An arbitrary ranking of blocks, based on their hashes. Defined as the hash of the block, interpreted as a big-endian 256-bit number. Numerically smaller difficulties are harder to generate.

  • difficulty threshold: The easiest valid hash difficulty for a block. Numerically lower thresholds are harder to satisfy.

  • difficulty filter: A block passes the difficulty filter if the hash difficulty is less than or equal to the difficulty threshold (based on the block's difficulty field).

  • block work: The approximate amount of work required for a miner to generate a block hash that passes the difficulty filter. The number of block header attempts and the mining time are proportional to the work value. Numerically higher work values represent longer processing times.

  • averaging window: The 17 most recent blocks in the relevant chain.

  • median block span: The 11 most recent blocks from a chosen tip, typically the relevant tip.

  • target spacing: 150 seconds per block before Blossom activation, 75 seconds per block from Blossom activation onwards.

  • adjusted difficulty: After each block is mined, the difficulty threshold of the next block is adjusted, to keep the block gap close to the target spacing.

  • mean target difficulty: The arithmetic mean of the difficulty thresholds of the blocks in the averaging window.

  • median timespan: The average number of seconds taken to generate the blocks in the averaging window. Calculated using the difference of median block spans in and after the averaging window, then damped and bounded.

  • target timespan: The target spacing for an averaging window's worth of blocks.

Consensus:

  • consensus rule: A protocol rule which all nodes must apply consistently, so they can converge on the same chain fork.

  • structural/semantic/contextual verification: as defined in RFC2.

State:

  • block chain: A sequence of valid blocks linked by inclusion of the previous block hash in the subsequent block. Chains are rooted at the genesis block and extend to a tip.

  • relevant chain: The relevant chain for a block starts at the previous block, and extends back to genesis.

  • relevant tip: The tip of the relevant chain.

  • non-finalized state: State data corresponding to blocks above the reorg limit. This data can change in the event of a chain reorg.

  • finalized state: State data corresponding to blocks below the reorg limit. This data cannot change in the event of a chain reorg.

  • non-finalized tips: The highest blocks in each non-finalized chain. These tips might be at different heights.

  • finalized tip: The highest block in the finalized state. The tip of the best chain is usually 100 blocks (the reorg limit) above the finalized tip. But it can be lower during the initial sync, and after a chain reorganization, if the new best chain is at a lower height.

Guide-level explanation

Zcash's difficulty consensus rules are similar to Bitcoin's.

Each block contains a difficulty threshold in its header. The hash of the block header, interpreted as a 256-bit integer in big-endian byte order, must be less than or equal to this difficulty threshold. This context-free semantic verification check is performed by the BlockVerifier.
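
As a small illustration of the difficulty filter from the Definitions (hash difficulty less than or equal to the difficulty threshold), the comparison can be done directly on byte arrays. The function name is hypothetical, and the sketch assumes both values are already stored as 32-byte big-endian arrays, so that lexicographic byte order matches numeric order.

```rust
/// Returns true if `hash` passes the difficulty filter for `threshold`.
///
/// Both arguments are assumed to be big-endian 256-bit numbers, so the
/// lexicographic array comparison is also the numeric comparison.
pub fn passes_difficulty_filter(hash: &[u8; 32], threshold: &[u8; 32]) -> bool {
    hash <= threshold
}
```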

After each block, the difficulty threshold is adjusted so that the block gap is close to the target spacing. On average, harder blocks take longer to mine, and easier blocks take less time.

The adjusted difficulty for the next block is calculated using the difficulty thresholds and times of recent blocks. Zcash uses the most recent 28 blocks in the relevant chain in its difficulty adjustment calculations.

The difficulty adjustment calculations adjust the mean target difficulty, based on the difference between the median timespan and the target timespan. If the median timespan is less than the target timespan, the next block is harder to mine.

The StateService calculates the adjusted difficulty using the context from the relevant chain. The difficulty contextual verification check ensures that the difficulty threshold of the next block is equal to the adjusted difficulty for its relevant chain.

State service interface changes

Contextual validation accesses recent blocks. So we modify the internal state service interface to provide an abstraction for accessing recent blocks.

The relevant chain

The relevant chain consists of the ancestors of a block, starting with its parent block, and extending back to the genesis block.

In Zebra, recent blocks are part of the non-finalized state, which can contain multiple chains. Past the reorganization limit, Zebra commits a single chain to the finalized state.

The relevant chain can start at any block in the non-finalized state, or at the finalized tip. See RFC5 for details.

Contextual validation design

Contextual validation is performed synchronously by the state service, as soon as the state has:

  • received the semantically valid next block (via CommitBlock), and
  • committed the previous block.

The difficulty adjustment check calculates the correct adjusted difficulty threshold value for a candidate block, and ensures that the block's difficulty_threshold field is equal to that value.

This check is implemented as follows:

Difficulty adjustment

The block difficulty threshold is adjusted by scaling the mean target difficulty by the median timespan.

On Testnet, if a long time has elapsed since the previous block, the difficulty adjustment is modified to allow minimum-difficulty blocks.

Mean target difficulty

The mean target difficulty is the arithmetic mean of the difficulty thresholds of the PoWAveragingWindow (17) most recent blocks in the relevant chain.

Zcash uses block difficulty thresholds in its difficulty adjustment calculations. (Block hashes are not used for difficulty adjustment.)

Median timespan

The median timespan is the average number of seconds taken to generate the 17 blocks in the averaging window.

The median timespan is calculated by taking the difference of the median times for:

  • the relevant tip: the PoWMedianBlockSpan (11) most recent blocks, and
  • the 11 blocks after the 17-block PoWAveragingWindow: that is, blocks 18-28 behind the relevant tip.

The median timespan is damped by the PoWDampingFactor, and bounded by PoWMaxAdjustDown and PoWMaxAdjustUp.
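
The calculation can be sketched with i64 arithmetic. The window sizes come from the text above; the damping factor (4) and the adjustment bounds (16% up, 32% down) are assumed from the Zcash protocol specification and should be checked against it. The function names are hypothetical, and times is assumed to hold the 28 most recent block times in seconds, newest first.

```rust
pub const POW_AVERAGING_WINDOW: usize = 17;
pub const POW_MEDIAN_BLOCK_SPAN: usize = 11;

// Assumed values of PoWDampingFactor, PoWMaxAdjustUp, and PoWMaxAdjustDown.
pub const POW_DAMPING_FACTOR: i64 = 4;
pub const POW_MAX_ADJUST_UP_PERCENT: i64 = 16;
pub const POW_MAX_ADJUST_DOWN_PERCENT: i64 = 32;

/// Median of a block span's worth of times, which might be unsorted.
fn median_time(times: &[i64]) -> i64 {
    let mut sorted = times.to_vec();
    sorted.sort_unstable();
    sorted[sorted.len() / 2]
}

/// Damped and bounded median timespan, in seconds.
pub fn median_timespan_bounded(
    times: &[i64; POW_AVERAGING_WINDOW + POW_MEDIAN_BLOCK_SPAN],
    target_spacing: i64,
) -> i64 {
    // Median of the 11 most recent blocks, and of blocks 18-28 behind the tip.
    let newer_median = median_time(&times[..POW_MEDIAN_BLOCK_SPAN]);
    let older_median = median_time(&times[POW_AVERAGING_WINDOW..]);
    let actual_timespan = newer_median - older_median;

    let window_timespan = target_spacing * POW_AVERAGING_WINDOW as i64;

    // Damp: move only a quarter of the way from the target to the actual
    // timespan. Integer division truncates towards zero.
    let damped = window_timespan + (actual_timespan - window_timespan) / POW_DAMPING_FACTOR;

    // Bound the damped timespan on both sides.
    let min_timespan = window_timespan * (100 - POW_MAX_ADJUST_UP_PERCENT) / 100;
    let max_timespan = window_timespan * (100 + POW_MAX_ADJUST_DOWN_PERCENT) / 100;
    damped.clamp(min_timespan, max_timespan)
}
```

With a 75 second target spacing the window timespan is 1275 seconds, so under these assumed constants the result always falls between 1071 and 1683 seconds, however extreme the miner-supplied times are.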

Test network minimum difficulty blocks

If there is a large gap after a Testnet block, the next block becomes a minimum difficulty block. Testnet minimum difficulty blocks have their difficulty_threshold set to the minimum difficulty for Testnet.

Block difficulty threshold

The block difficulty threshold for the next block is calculated by scaling the mean target difficulty by the ratio between the median timespan and the averaging window timespan.

The result of this calculation is limited by ToCompact(PoWLimit(network)), a per-network minimum block difficulty. This minimum difficulty is also used when a Testnet block's time gap exceeds the minimum difficulty gap.

Reference-level explanation

Contextual validation

Contextual validation is implemented in StateService::check_contextual_validity, which calls a separate function for each contextual validity check.

In Zebra, contextual validation starts after Canopy activation, so we can assume that the relevant chain contains at least 28 blocks on Mainnet and Testnet. (And panic if this assumption does not hold at runtime.)

Fundamental data types

Zebra is free to implement its difficulty calculations in any way that produces equivalent results to zcashd and the Zcash specification.

Difficulty

In Zcash block headers, difficulty thresholds are stored as a "compact" nBits value, which uses a custom 32-bit floating-point encoding. Zebra calls this type CompactDifficulty.

In Zcash, difficulty threshold calculations are performed using unsigned 256-bit integers. Rust has no standard u256 type, but there are a number of crates available which implement the required operations on 256-bit integers. Zebra abstracts over the chosen u256 implementation using its ExpandedDifficulty type.
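
To make the compact encoding concrete, here is a hedged sketch of expanding an nBits value into a 32-byte big-endian threshold. It assumes the Bitcoin-style layout (top byte is a base-256 exponent, bit 0x00800000 is a sign bit, and the low 23 bits are the mantissa), and it is not Zebra's CompactDifficulty code; expand_compact is a hypothetical name, and returning None for negative or overflowing values is a choice made for this sketch.

```rust
/// Expand a compact `nBits` value into a big-endian 256-bit threshold.
///
/// The value is `mantissa * 256^(exponent - 3)`. Returns `None` if the
/// sign bit is set, or if the value does not fit in 256 bits.
pub fn expand_compact(n_bits: u32) -> Option<[u8; 32]> {
    let exponent = (n_bits >> 24) as usize;
    let mantissa = n_bits & 0x007f_ffff;
    if (n_bits & 0x0080_0000) != 0 {
        return None;
    }

    let mantissa_bytes = mantissa.to_be_bytes();
    let mut target = [0u8; 32];
    // `i` counts mantissa bytes from the least significant one.
    for i in 0..3usize {
        let byte = mantissa_bytes[3 - i];
        if byte == 0 {
            continue;
        }
        // Little-endian byte position of this mantissa byte in the result.
        match (exponent + i).checked_sub(3) {
            // Shifted out below the least significant byte: truncated.
            None => continue,
            // A non-zero byte above bit 255: the value overflows.
            Some(position) if position >= 32 => return None,
            Some(position) => target[31 - position] = byte,
        }
    }
    Some(target)
}
```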

Time

In Zcash, time values are unsigned 32-bit integers. But the difficulty adjustment calculations include time subtractions which could overflow an unsigned type, so they are performed using signed 64-bit integers in zcashd.

Zebra parses the header.time field into a DateTime<Utc>. Conveniently, the chrono::DateTime<_>::timestamp() function returns i64 values. So Zebra can do its signed time calculations using i64 values internally.

Note: i32 is an unsuitable type for signed time calculations. It is theoretically possible for the time gap between blocks to be larger than i32::MAX, because those times are provided by miners. Even if the median time gap is that large, the bounds and minimum difficulty in Zcash's difficulty adjustment algorithm will preserve a reasonable difficulty threshold. So Zebra must support this edge case.

Consensus-Critical Operations

The order of operations and overflow semantics for 256-bit integers can be consensus-critical.

For example:

  • dividing before multiplying discards lower-order bits, but
  • multiplying before dividing can cause overflow.

Zebra's implementation should try to match zcashd's order of operations and overflow handling as closely as possible.
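
The difference between the two orders can be seen with ordinary integers. This sketch uses u128 as a stand-in for a 256-bit type, and both function names are hypothetical; it shows why the order matters, and does not state which order zcashd uses. The window timespan is assumed to be non-zero.

```rust
/// Divide first, then multiply: cannot overflow in the division,
/// but discards the low-order bits of the quotient.
pub fn scale_divide_first(
    mean_target: u128,
    window_timespan: u128,
    bounded_timespan: u128,
) -> Option<u128> {
    (mean_target / window_timespan).checked_mul(bounded_timespan)
}

/// Multiply first, then divide: keeps the low-order bits,
/// but the intermediate product can overflow (returns `None`).
pub fn scale_multiply_first(
    mean_target: u128,
    window_timespan: u128,
    bounded_timespan: u128,
) -> Option<u128> {
    mean_target
        .checked_mul(bounded_timespan)
        .map(|product| product / window_timespan)
}
```

For a small target such as 1000, with timespans of 1275 and 1683, dividing first gives 0 while multiplying first gives 1320. For a target near the top of the integer range, multiplying first overflows while dividing first still succeeds. Two implementations that pick different orders would therefore disagree on some thresholds.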

Difficulty adjustment check

The difficulty adjustment check calculates the correct difficulty threshold value for a candidate block, and ensures that the block's difficulty_threshold field is equal to that value.

Context data type

The difficulty adjustment functions use a context consisting of the difficulties and times from the previous 28 blocks in the relevant chain.

These functions also use the candidate block's height and network.

To make these functions more ergonomic, we create an AdjustedDifficulty type, and implement the difficulty adjustment calculations as methods on that type.

#![allow(unused)]
fn main() {
/// The averaging window for difficulty threshold arithmetic mean calculations.
///
/// `PoWAveragingWindow` in the Zcash specification.
pub const POW_AVERAGING_WINDOW: usize = 17;

/// The median block span for time median calculations.
///
/// `PoWMedianBlockSpan` in the Zcash specification.
pub const POW_MEDIAN_BLOCK_SPAN: usize = 11;

/// Contains the context needed to calculate the adjusted difficulty for a block.
struct AdjustedDifficulty {
    candidate_time: DateTime<Utc>,
    candidate_height: block::Height,
    network: Network,
    relevant_difficulty_thresholds: [CompactDifficulty; POW_AVERAGING_WINDOW + POW_MEDIAN_BLOCK_SPAN],
    relevant_times: [DateTime<Utc>; POW_AVERAGING_WINDOW + POW_MEDIAN_BLOCK_SPAN],
}
}

We implement some initialiser methods on AdjustedDifficulty for convenience. We might want to validate downloaded headers in future, so we include a new_from_header initialiser.

#![allow(unused)]
fn main() {
/// Initialise and return a new `AdjustedDifficulty` using a `candidate_block`,
/// `network`, and a `context`.
///
/// The `context` contains the previous
/// `PoWAveragingWindow + PoWMedianBlockSpan` (28) `difficulty_threshold`s and
/// `time`s from the relevant chain for `candidate_block`, in reverse height
/// order, starting with the previous block.
///
/// Note that the `time`s might not be in reverse chronological order, because
/// block times are supplied by miners.
///
/// Panics:
/// If the `context` contains fewer than 28 items.
pub fn new_from_block<C>(candidate_block: &Block,
                         network: Network,
                         context: C)
                         -> AdjustedDifficulty
    where
        C: IntoIterator<Item = (CompactDifficulty, DateTime<Utc>)>,
    { ... }

/// Initialise and return a new `AdjustedDifficulty` using a
/// `candidate_header`, `previous_block_height`, `network`, and a `context`.
///
/// Designed for use when validating block headers, where the full block has not
/// been downloaded yet.
///
/// See `new_from_block` for detailed information about the `context`.
///
/// Panics:
/// If the context contains fewer than 28 items.
pub fn new_from_header<C>(candidate_header: &block::Header,
                          previous_block_height: block::Height,
                          network: Network,
                          context: C)
                          -> AdjustedDifficulty
    where
        C: IntoIterator<Item = (CompactDifficulty, DateTime<Utc>)>,
    { ... }
}

Memory usage note

Copying CompactDifficulty values into the AdjustedDifficulty struct uses less memory than borrowing those values. CompactDifficulty values are 32 bits, but pointers are 64-bit on most modern machines. (And since they all come from different blocks, we need a pointer to each individual value.)

Borrowing DateTime<Utc> values might use slightly less memory than copying them - but that depends on the exact way that Rust stores associated types derived from a generic argument.

In any case, the overall size of each AdjustedDifficulty is only a few hundred bytes. If it turns up in profiles, we can look at borrowing the block header data.
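The size comparison above can be checked with a minimal sketch. It assumes `u32` as a stand-in for `CompactDifficulty` (the RFC describes the values as 32 bits); the function name is hypothetical and is not part of the design.

```rust
use std::mem::size_of;

/// Number of context entries: `PoWAveragingWindow + PoWMedianBlockSpan`.
const CONTEXT_LEN: usize = 17 + 11;

/// Returns the `(copied, borrowed)` storage sizes in bytes for 28 compact
/// difficulty values. `u32` is an assumed stand-in for `CompactDifficulty`.
fn compact_difficulty_storage_bytes() -> (usize, usize) {
    // Copying stores the 32-bit values inline.
    let copied = size_of::<[u32; CONTEXT_LEN]>();
    // Borrowing stores one pointer-sized reference per value.
    let borrowed = size_of::<[&u32; CONTEXT_LEN]>();
    (copied, borrowed)
}
```

On a target with 64-bit pointers, the borrowed array is twice the size of the copied one.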

Difficulty adjustment check implementation

The difficulty adjustment check ensures that the candidate_difficulty_threshold is equal to the difficulty_threshold value calculated using AdjustedDifficulty::adjusted_difficulty_threshold.

We implement this function:

#![allow(unused)]
fn main() {
/// Validate the `difficulty_threshold` from a candidate block's header, based
/// on an `expected_difficulty` for that block.
///
/// Uses `expected_difficulty` to calculate the expected `ToCompact(Threshold())`
/// value, then compares that value to the `difficulty_threshold`. Returns
/// `Ok(())` if the values are equal.
pub fn difficulty_threshold_is_valid(difficulty_threshold: CompactDifficulty,
                                     expected_difficulty: AdjustedDifficulty)
                                     -> Result<(), ValidateContextError> { ... }
}
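As a rough illustration of the check, the sketch below compares a candidate threshold with an already-calculated expected value. It is a simplification under stated assumptions: `CompactDifficulty` is modelled as a `u32` wrapper, `DifficultyError` is a hypothetical error type standing in for `ValidateContextError`, and the expected value is passed in directly rather than derived from an `AdjustedDifficulty`.

```rust
/// Assumed stand-in for `CompactDifficulty`: the 32-bit `nBits` encoding.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct CompactDifficulty(u32);

/// Hypothetical error type; the RFC returns a `ValidateContextError`.
#[derive(Debug, PartialEq, Eq)]
enum DifficultyError {
    InvalidDifficultyThreshold {
        actual: CompactDifficulty,
        expected: CompactDifficulty,
    },
}

/// Returns `Ok(())` only if the header's threshold equals the expected one.
fn difficulty_threshold_is_valid(
    difficulty_threshold: CompactDifficulty,
    expected: CompactDifficulty,
) -> Result<(), DifficultyError> {
    if difficulty_threshold == expected {
        Ok(())
    } else {
        Err(DifficultyError::InvalidDifficultyThreshold {
            actual: difficulty_threshold,
            expected,
        })
    }
}
```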

Mean target difficulty calculation

The mean target difficulty is the arithmetic mean of the difficulty thresholds of the PoWAveragingWindow (17) most recent blocks in the relevant chain.

We implement this method on AdjustedDifficulty:

#![allow(unused)]
fn main() {
/// Calculate the arithmetic mean of the averaging window thresholds: the
/// expanded `difficulty_threshold`s from the previous `PoWAveragingWindow` (17)
/// blocks in the relevant chain.
///
/// Implements `MeanTarget` from the Zcash specification.
fn mean_target_difficulty(&self) -> ExpandedDifficulty { ... }
}

Implementation notes

Since the PoWLimits are 2^251 − 1 for Testnet, and 2^243 − 1 for Mainnet, the sum of these difficulty thresholds will be less than or equal to (2^251 − 1)*17 = 2^255 + 2^251 - 17. Therefore, this calculation cannot overflow a u256 value, so the function is infallible.
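Written out step by step, the bound on the sum of the 17 thresholds is as follows. Here T_i is our own notation for the i-th expanded threshold in the averaging window; it is not a symbol from the specification.

```latex
\begin{align*}
\sum_{i=1}^{17} T_i
  &\le 17 \cdot \left(2^{251} - 1\right) \\
  &= 16 \cdot 2^{251} + 2^{251} - 17 \\
  &= 2^{255} + 2^{251} - 17 \\
  &< 2^{256}.
\end{align*}
```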

In Zebra, contextual validation starts after Canopy activation, so we can assume that the relevant chain contains at least 17 blocks. Therefore, the PoWLimit case of MeanTarget() in the Zcash specification is unreachable.

Median timespan calculation

The median timespan is the difference of the median times for:

  • the relevant tip: the PoWMedianBlockSpan (11) most recent blocks, and
  • the 11 blocks after the 17-block PoWAveragingWindow: that is, blocks 18-28 behind the relevant tip.

(The median timespan is known as the ActualTimespan in the Zcash specification, but this terminology is confusing, because it is a difference of medians, rather than any "actual" elapsed time.)

Zebra implements the median timespan using the following methods on AdjustedDifficulty:

#![allow(unused)]
fn main() {
/// Calculate the bounded median timespan. The median timespan is the
/// difference of medians of the timespan times, which are the `time`s from
/// the previous `PoWAveragingWindow + PoWMedianBlockSpan` (28) blocks in the
/// relevant chain.
///
/// Uses the candidate block's `height` and `network` to calculate the
/// `AveragingWindowTimespan` for that block.
///
/// The median timespan is damped by the `PoWDampingFactor`, and bounded by
/// `PoWMaxAdjustDown` and `PoWMaxAdjustUp`.
///
/// Implements `ActualTimespanBounded` from the Zcash specification.
///
/// Note: This calculation only uses `PoWMedianBlockSpan` (11) times at the
/// start and end of the timespan times. Timespan times `[11..=16]` are ignored.
fn median_timespan_bounded(&self) -> Duration { ... }

/// Calculate the median timespan. The median timespan is the difference of
/// medians of the timespan times, which are the `time`s from the previous
/// `PoWAveragingWindow + PoWMedianBlockSpan` (28) blocks in the relevant chain.
///
/// Implements `ActualTimespan` from the Zcash specification.
///
/// See `median_timespan_bounded` for details.
fn median_timespan(&self) -> Duration { ... }

/// Calculate the median of the `median_block_span_times`: the `time`s from a
/// slice of `PoWMedianBlockSpan` (11) blocks in the relevant chain.
///
/// Implements `MedianTime` from the Zcash specification.
fn median_time(mut median_block_span_times: [DateTime<Utc>; POW_MEDIAN_BLOCK_SPAN])
               -> DateTime<Utc> { ... }
}

Zebra implements the AveragingWindowTimespan using the following methods on NetworkUpgrade:

#![allow(unused)]
fn main() {
impl NetworkUpgrade {
    /// Returns the `AveragingWindowTimespan` for the network upgrade.
    pub fn averaging_window_timespan(&self) -> Duration { ... }

    /// Returns the `AveragingWindowTimespan` for `network` and `height`.
    pub fn averaging_window_timespan_for_height(network: Network,
                                                height: block::Height)
                                                -> Duration { ... }
}
}
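The damping and bounding described in the `median_timespan_bounded` documentation can be sketched as follows. This is an illustration under stated assumptions, not the Zebra implementation: durations are plain `i64` seconds, the target spacing is passed in directly, and the constant values (damping factor 4, maximum upward adjustment 16/100, maximum downward adjustment 32/100) are recalled from the Zcash specification and should be checked against it. The sketch uses Rust's integer division, which truncates toward zero; the rounding rule for negative differences should also be confirmed against the specification and zcashd.

```rust
/// Constants recalled from the Zcash specification; treat them as assumptions.
const POW_AVERAGING_WINDOW: i64 = 17;
const POW_DAMPING_FACTOR: i64 = 4;
const POW_MAX_ADJUST_UP_PERCENT: i64 = 16;
const POW_MAX_ADJUST_DOWN_PERCENT: i64 = 32;

/// `AveragingWindowTimespan` in seconds for a given target spacing.
fn averaging_window_timespan(target_spacing_secs: i64) -> i64 {
    POW_AVERAGING_WINDOW * target_spacing_secs
}

/// Damps the median timespan towards the averaging window timespan, then
/// bounds it between the minimum and maximum actual timespans.
fn median_timespan_bounded(median_timespan_secs: i64, target_spacing_secs: i64) -> i64 {
    let window = averaging_window_timespan(target_spacing_secs);

    // Only a quarter of the deviation from the window timespan is applied.
    let damped = window + (median_timespan_secs - window) / POW_DAMPING_FACTOR;

    let min_timespan = window * (100 - POW_MAX_ADJUST_UP_PERCENT) / 100;
    let max_timespan = window * (100 + POW_MAX_ADJUST_DOWN_PERCENT) / 100;

    damped.clamp(min_timespan, max_timespan)
}
```

With a 150 second target spacing, the window is 2550 seconds, and the result always lies between 2142 and 3366 seconds.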

Implementation notes

In Zebra, contextual validation starts after Canopy activation, so we can assume that the relevant chain contains at least 28 blocks. Therefore:

  • max(0, height − PoWMedianBlockSpan) in the MedianTime() calculation simplifies to height − PoWMedianBlockSpan, and
  • there is always an odd number of blocks in MedianTime(), so the median is always the exact middle of the sequence.

Therefore, the function is infallible.
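A minimal sketch of the median calculations follows. It assumes block times are Unix timestamps stored as `i64` seconds, rather than the `DateTime<Utc>` values used in the RFC, and it uses free functions instead of methods on `AdjustedDifficulty`.

```rust
/// `PoWAveragingWindow` and `PoWMedianBlockSpan` from the Zcash specification.
const POW_AVERAGING_WINDOW: usize = 17;
const POW_MEDIAN_BLOCK_SPAN: usize = 11;

/// Median of exactly 11 block times (assumed to be Unix seconds).
fn median_time(mut times: [i64; POW_MEDIAN_BLOCK_SPAN]) -> i64 {
    // Block times are supplied by miners, so they can be out of order.
    times.sort_unstable();
    // 11 is odd, so the median is the exact middle element.
    times[POW_MEDIAN_BLOCK_SPAN / 2]
}

/// Difference of medians over the 28 context times, in reverse height order
/// (the previous block's time comes first).
fn median_timespan(times: &[i64; POW_AVERAGING_WINDOW + POW_MEDIAN_BLOCK_SPAN]) -> i64 {
    // The 11 most recent blocks: indexes 0..=10.
    let mut newest = [0i64; POW_MEDIAN_BLOCK_SPAN];
    newest.copy_from_slice(&times[..POW_MEDIAN_BLOCK_SPAN]);

    // The 11 blocks after the averaging window: indexes 17..=27.
    let mut oldest = [0i64; POW_MEDIAN_BLOCK_SPAN];
    oldest.copy_from_slice(&times[POW_AVERAGING_WINDOW..]);

    // Indexes 11..=16 are never read.
    median_time(newest) - median_time(oldest)
}
```

Because the array lengths are fixed by the types, neither function needs to handle an empty or even-length input.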

Test network minimum difficulty calculation

A block is a Testnet minimum difficulty block if:

  • the block is a Testnet block,
  • the block's height is 299188 or greater, and
  • the time gap from the previous block is greater than the Testnet minimum difficulty gap, which is 6 times the target spacing for the block's height. (The target spacing was halved from the Blossom network upgrade onwards.)

The difficulty adjustment is modified for Testnet minimum difficulty blocks as follows:

  • the difficulty threshold in the block header is set to the Testnet minimum difficulty threshold, ToCompact(PoWLimit(network)).

Since the new difficulty changes the block header, Testnet blocks can only satisfy one of the alternate difficulty adjustment rules:

  • if the time gap is less than or equal to the Testnet minimum difficulty gap: the difficulty threshold is calculated using the default difficulty adjustment rule,
  • if the time gap is greater than the Testnet minimum difficulty gap: the difficulty threshold is the Testnet minimum difficulty threshold.

See ZIP-208 for details.

Note: some older versions of ZIPs 205 and 208 incorrectly said that:

  • the time gap threshold uses an "at least" check (it is strictly greater than),
  • the minimum difficulty threshold value was PoWLimit (it is ToCompact(PoWLimit)),
  • the difficulty_threshold (nBits) field is not modified in Testnet minimum difficulty blocks (the field is modified), and
  • the Testnet minimum difficulty value is not used to calculate future difficulty adjustments (the modified value is used in future adjustments).

ZIP 205 and 208 were fixed on 14 November 2020, see ZIP PR 417 and ZIP commit 806076c for details.

Test network minimum difficulty implementation

The Testnet minimum difficulty calculation uses the existing NetworkUpgrade::minimum_difficulty_spacing_for_height function to calculate the minimum difficulty gap.

We implement this method on NetworkUpgrade:

#![allow(unused)]
fn main() {
/// Returns true if the gap between `block_time` and `previous_block_time` is                             
/// greater than the Testnet minimum difficulty time gap. This time gap                                   
/// depends on the `network` and `block_height`.                                                          
///                                                                                                       
/// Returns false on Mainnet, when `block_height` is less than the minimum                                
/// difficulty start height, and when the time gap is too small.                                          
///                                                                                                       
/// `block_time` can be less than, equal to, or greater than                                              
/// `previous_block_time`, because block times are provided by miners.                                    
///                                                                                                       
/// Implements the Testnet minimum difficulty adjustment from ZIPs 205 and 208.                           
///                                                                                                       
/// Spec Note: Some parts of ZIPs 205 and 208 previously specified an incorrect                           
/// check for the time gap. This function implements the correct "greater than"                           
/// check.                                                                                                
pub fn is_testnet_min_difficulty_block(
    network: Network,
    block_height: block::Height,
    block_time: DateTime<Utc>,
    previous_block_time: DateTime<Utc>,
) -> bool { ... }
}

Implementation notes

In Zcash, the Testnet minimum difficulty rule starts at block 299188, and in Zebra, contextual validation starts after Canopy activation. So we can assume that there is always a previous block.

Therefore, this function is infallible.
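The three conditions for a Testnet minimum difficulty block can be sketched as below. This is a simplified illustration: the `Network` enum is a local stand-in, times are `i64` Unix seconds, heights are plain `u32` values, and the target spacing is passed in as a hypothetical extra parameter, whereas the RFC derives it from the network and height.

```rust
/// Local stand-in for Zebra's `Network` type.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Network {
    Mainnet,
    Testnet,
}

/// The first height at which the Testnet minimum difficulty rule applies.
const TESTNET_MIN_DIFFICULTY_START_HEIGHT: u32 = 299_188;

/// The minimum difficulty gap is 6 times the target spacing.
const MIN_DIFFICULTY_GAP_MULTIPLIER: i64 = 6;

/// Returns true if the block qualifies as a Testnet minimum difficulty block.
fn is_testnet_min_difficulty_block(
    network: Network,
    block_height: u32,
    block_time_secs: i64,
    previous_block_time_secs: i64,
    target_spacing_secs: i64,
) -> bool {
    if network != Network::Testnet || block_height < TESTNET_MIN_DIFFICULTY_START_HEIGHT {
        return false;
    }

    // The gap can be zero or negative, because miners supply block times.
    let gap = block_time_secs - previous_block_time_secs;

    // Strictly greater than, not "at least".
    gap > MIN_DIFFICULTY_GAP_MULTIPLIER * target_spacing_secs
}
```

The strict comparison matters at the boundary: a gap of exactly 6 times the target spacing does not trigger the minimum difficulty rule.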

Block difficulty threshold calculation

The block difficulty threshold for the next block is calculated by scaling the mean target difficulty by the ratio between the median timespan and the averaging window timespan.

The result of the scaled threshold calculation is limited by ToCompact(PoWLimit(network)), a per-network minimum block difficulty. This minimum difficulty is also used when a Testnet block's time gap exceeds the minimum difficulty gap. We use the existing ExpandedDifficulty::target_difficulty_limit function to calculate the value of ToCompact(PoWLimit(network)).

In Zebra, contextual validation starts after Canopy activation, so the genesis case of Threshold() in the Zcash specification is unreachable.

Block difficulty threshold implementation

We implement these methods on AdjustedDifficulty:

#![allow(unused)]
fn main() {
/// Calculate the expected `difficulty_threshold` for a candidate block, based
/// on the `candidate_time`, `candidate_height`, `network`, and the
/// `difficulty_threshold`s and `time`s from the previous
/// `PoWAveragingWindow + PoWMedianBlockSpan` (28) blocks in the relevant chain.
///
/// Implements `ThresholdBits` from the Zcash specification, and the Testnet
/// minimum difficulty adjustment from ZIPs 205 and 208.
pub fn expected_difficulty_threshold(&self) -> CompactDifficulty { ... }

/// Calculate the `difficulty_threshold` for a candidate block, based on the
/// `candidate_height`, `network`, and the relevant `difficulty_threshold`s and
/// `time`s.
///
/// See `expected_difficulty_threshold` for details.
///
/// Implements `ThresholdBits` from the Zcash specification. (Which excludes the
/// Testnet minimum difficulty adjustment.)
fn threshold_bits(&self) -> CompactDifficulty { ... }
}

Implementation notes

Since:

  • the PoWLimits are 2^251 − 1 for Testnet, and 2^243 − 1 for Mainnet,
  • the ActualTimespanBounded can be at most MaxActualTimespan, which is floor(PoWAveragingWindow * PoWTargetSpacing * (1 + PoWMaxAdjustDown)) or floor(17 * 150 * (1 + 32/100)) = 3366,
  • AveragingWindowTimespan is at most 17 * 150 = 2550, and
  • MeanTarget is at most PoWLimit, ...

The maximum scaled value inside the Threshold() calculation is:

  • floor(PoWLimit / 2550) * 3366, which equals
  • floor((2^251 − 1) / 2550) * 3366, which is at most
  • (2^251 − 1) * 132/100,
  • which is less than 2^252.

Therefore, this calculation cannot overflow a u256 value. (And even if it did overflow, it would be constrained to a valid value by the PoWLimit minimum.)

Note that the multiplication by ActualTimespanBounded must happen after the division by AveragingWindowTimespan. Performing the multiplication first could overflow.

If implemented in this way, the function is infallible.

zcashd truncates the MeanTarget after the mean calculation, and after dividing by AveragingWindowTimespan. But as long as there is no overflow, this is equivalent to the single truncation of the final result in the Zcash specification. However, Zebra should follow the order of operations in zcashd, and use repeated divisions, because that can't overflow. See the relevant comment in the zcashd source code.
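The order of operations can be illustrated with a scaled-down sketch. Rust's standard library has no 256-bit integer, so this example uses `u128` with a hypothetical limit of 2^123 − 1, which leaves the same 5 bits of headroom that the Testnet PoWLimit of 2^251 − 1 leaves in a u256. The names and the scaled limit are assumptions made for illustration only.

```rust
/// Scaled-down stand-in for `PoWLimit`: 5 bits of headroom in a `u128`,
/// mirroring 2^251 - 1 in a 256-bit integer. Hypothetical value.
const SCALED_POW_LIMIT: u128 = (1u128 << 123) - 1;

/// Scales the mean target by `timespan_bounded / averaging_window_timespan`,
/// then limits the result to the (scaled) `PoWLimit`.
fn scaled_threshold(
    mean_target: u128,
    averaging_window_timespan: u128,
    timespan_bounded: u128,
) -> u128 {
    // Divide first, then multiply: the intermediate value stays in range.
    let threshold = mean_target / averaging_window_timespan * timespan_bounded;
    threshold.min(SCALED_POW_LIMIT)
}
```

With the maximum inputs from the list above (a mean target at the limit, a 2550 second window, and a 3366 second bounded timespan), dividing first stays in range, while `SCALED_POW_LIMIT.checked_mul(3366)` returns `None`.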

Module Structure

The structs and functions in this RFC are implemented in a new zebra_state::service::check::difficulty module.

This module has two entry points:

  • AdjustedDifficulty::new_from_block
  • difficulty_threshold_is_valid

These entry points are both called from StateService::check_contextual_validity.

Test Plan

Explain how the feature will be tested, including:

  • tests for consensus-critical functionality
  • existing test vectors, if available
  • Zcash blockchain block test vectors (specify the network upgrade, feature, or block height and network)
  • property testing or fuzzing

The tests should cover:

  • positive cases: make sure the feature accepts valid inputs
    • using block test vectors for each network upgrade provides some coverage of valid inputs
  • negative cases: make sure the feature rejects invalid inputs
    • make sure there is a test case for each error condition in the code
    • if there are lots of potential errors, prioritise:
      • consensus-critical errors
      • security-critical errors, and
      • likely errors
  • edge cases: make sure that boundary conditions are correctly handled

Drawbacks

Why should we not do this?

Alternate consensus parameters

Any alternate consensus parameters or regtest mode would have to respect the constraints set by this design.

In particular:

  • the PoWLimit must be less than or equal to (2^256 - 1) / PoWAveragingWindow (approximately 2^251) to avoid overflow,
  • the PoWAveragingWindow and PoWMedianBlockSpan are fixed by function argument types (at least until Rust gets stable const generics), and
  • the design eliminates a significant number of edge cases by assuming that difficulty adjustments aren't validated for the first PoWAveragingWindow + PoWMedianBlockSpan (28) blocks in the chain.

Rationale and alternatives

Is this design a good basis for later designs or implementations?

The design includes specific methods for a future header-only validation design.

What other designs have been considered and what is the rationale for not choosing them?

A previous version of the RFC did not have the AdjustedDifficulty struct and methods. That design was easy to misuse, because each function had a complicated argument list.

What is the impact of not doing this?

Zebra could accept invalid, low-difficulty blocks from arbitrary miners. That would be a security issue.

Prior art

  • zcashd
  • the Zcash specification
  • Bitcoin

Unresolved questions

  • What parts of the design do you expect to resolve through the implementation of this feature before stabilization?

    • Guide-level examples
    • Reference-level examples
    • Corner case examples
    • Testing
  • What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?

    • Monitoring and maintenance

Future possibilities

Re-using the relevant chain API in other contextual checks

The relevant chain iterator can be re-used to implement other contextual validation checks.

For example, we could respond to peer requests for block locators by implementing relevant chain hash queries as a StateService request.

Header-only difficulty adjustment validation

Implementing header-only difficulty adjustment validation as a StateService request.

Caching difficulty calculations

Difficulty calculations use u256, which could be a bit expensive, particularly if we get a flood of low-difficulty blocks. To reduce the impact of this kind of DoS, we could cache the value returned by threshold_bits for each block in the non-finalized state, and for the finalized tip. This value could be used to quickly calculate the difficulties for any child blocks of these blocks.

There's no need to persist this cache, or pre-fill it. (Minimum-difficulty Testnet blocks don't call threshold_bits, and some side-chain blocks will never have a next block.)

This caching is only worth implementing if these calculations show up in zebrad profiles.

Summary

The zebra-client crate handles client functionality. Client functionality is defined as all functionality related to a particular user's private data, in contrast to the other full node functionality which handles public chain state. This includes:

  • note and key management;
  • transaction generation;
  • a client component for zebrad that handles block chain scanning, with appropriate side-channel protections;
  • an RPC endpoint for zebrad that allows access to the client component;
  • Rust library code that implements basic wallet functionality;
  • a zebra-cli binary that wraps the wallet library and RPC queries in a command-line interface.

Client functionality is restricted to transparent and Sapling shielded transactions; Sprout shielded transactions are not supported. (Users should migrate to Sapling).

Motivation

We want to allow users to efficiently and securely send and receive funds via Zebra. One challenge unique to Zcash is block chain scanning: because shielded transactions reveal no metadata about the sender or receiver, users must scan the block chain for relevant transactions using viewing keys. This means that unlike a transparent blockchain with public transactions, a full node must have online access to viewing keys to scan the chain. This creates the risk of a privacy leak, because the node should not reveal which viewing keys it has access to.

Block chain scanning requires a mechanism that allows users to manage and store key material. This mechanism should also provide basic wallet functionality, so that users can send and receive funds without requiring third-party software.

To protect user privacy, this and all secret-dependent functionality should be strongly isolated from the rest of the node implementation. Care should be taken to protect against side channels that could reveal information about viewing keys. To make this isolation easier, all secret-dependent functionality is provided only by the zebra-client crate.

Definitions

  • client functionality: all functionality related to a particular user's private data, in contrast to other full node functionality which handles public chain state.

  • block chain scanning: the process of scanning the block chain for relevant transactions using a viewing key, as described in §4.19 of the protocol specification.

  • viewing key: Sapling shielded addresses support viewing keys, which represent the capability to decrypt transactions, as described in §3.1 and §4.2.2 of the protocol specification.

  • task: In this document, task refers specifically to a Tokio task. In brief, a task is a lightweight, non-blocking unit of execution (green thread), similar to a Goroutine or Erlang process. Tasks execute independently and are scheduled co-operatively using explicit yield points. Tasks are executed on the Tokio runtime, which can either be single- or multi-threaded.

Guide-level explanation

There are two main parts of this functionality. The first is a Client component running as part of zebrad, and the second is a zebra-cli command-line tool.

The Client component is responsible for blockchain scanning. It maintains its own distinct sled database, which stores the viewing keys it uses to scan as well as the results of scanning. When a new block is added to the chain state, the Client component is notified asynchronously using a channel. For each Sapling shielded transaction in the block, the component attempts to perform trial decryption of that transaction's notes using each registered viewing key, as described in §4.19. If successful, decrypted notes are saved to the database.

The PING/REJECT attack demonstrates the importance of decoupling execution of normal node operations from secret-dependent operations. Zebra's network stack already makes it immune to those particular attacks, because each peer connection is executed in a different task. However, to eliminate this entire class of vulnerability, we execute the Client component in its own task, decoupled from the rest of the node functionality. In fact, each viewing key's scanning is performed independently, as described in more detail below, with an analysis of potential side-channels.

The second part is the zebra-cli command-line tool, which provides basic wallet functionality. This tool manages spending keys and addresses, and communicates with the Client component in zebrad. Specifically, zebra-cli uses a distinct RPC endpoint to load viewing keys into zebrad and to query the results of block chain scanning. zebra-cli can then use the results of those queries to generate transactions and submit them to the network using zebrad.

This design upholds the principle of least authority by separating key material required for spending funds from the key material required for block chain scanning. This allows compartmentalization. For instance, a user could in principle run zebrad on a cloud VPS with only their viewing keys and store their spending keys on a laptop, or a user could run zebrad on a local machine and store their spending keys in a hardware wallet. Both of these use cases would require some additional tooling support, but are possible with this design.

Reference-level explanation

State notifications

We want a way to subscribe to updates from the state system via a channel. For the purposes of this RFC, these changes are in-flight, but in the future, these could be used for a push-based RPC mechanism.

Subscribers can subscribe to all state change notifications as they come in.

Currently the zebra_state::init() method returns a BoxService that allows you to make requests to the chain state. Instead, we would return a (BoxService, StateNotifications) tuple, where StateNotifications is a new structure initially defined as:

#[non_exhaustive]
pub struct StateNotifications {
  pub new_blocks: tokio::sync::watch::Receiver<Arc<Block>>,
}

Instead of making repeated polling requests to a state service to look for any new blocks, this channel will push new blocks to a consumer as they come in, for the consumer to use or discard at their discretion. This will be used by the client component described below. This will also be needed for gossiping blocks to other peers, as they are validated.
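The push-based pattern can be sketched with the standard library alone. This is not the proposed API: it uses `std::sync::mpsc` and a thread as stand-ins for `tokio::sync::watch` and a Tokio task, and `Block` is a hypothetical placeholder type. One difference to keep in mind is that an mpsc channel queues every block, whereas a watch channel only retains the most recent value.

```rust
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical placeholder for a chain block.
struct Block {
    height: u32,
}

/// Pushes `blocks` to a consumer thread, and returns the heights it received.
fn collect_pushed_heights(blocks: Vec<Block>) -> Vec<u32> {
    let (sender, receiver) = mpsc::channel::<Arc<Block>>();

    // The consumer never polls the state: it waits until a block is pushed.
    let consumer = thread::spawn(move || {
        receiver
            .iter()
            .map(|block| block.height)
            .collect::<Vec<u32>>()
    });

    for block in blocks {
        sender
            .send(Arc::new(block))
            .expect("consumer is still running");
    }

    // Dropping the sender closes the channel, which ends the consumer loop.
    drop(sender);
    consumer.join().expect("consumer thread does not panic")
}
```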

Online client component

This component maintains its own Sled tree. See RFC#0005 for more details on Sled.

We use the following Sled trees:

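| Tree                 | Keys               | Values       |
|----------------------|--------------------|--------------|
| viewing_keys         | IncomingViewingKey | String       |
| height_by_key        | IncomingViewingKey | BE32(height) |
| received_set_by_key  | IncomingViewingKey | ?            |
| spend_set_by_key     | IncomingViewingKey | ?            |
| nullifier_map_by_key | IncomingViewingKey | ?            |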

See https://zips.z.cash/protocol/protocol.pdf#saplingscan

Zcash structures are encoded using ZcashSerialize/ZcashDeserialize.
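The `BE32(height)` value in the table is a 32-bit big-endian encoding. A minimal sketch, assuming heights are plain `u32` values and using hypothetical helper names:

```rust
/// Encodes a block height as 4 big-endian bytes, matching `BE32(height)`.
fn be32(height: u32) -> [u8; 4] {
    height.to_be_bytes()
}

/// Decodes a height from its 4-byte big-endian encoding.
fn height_from_be32(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}
```

A general property of big-endian encodings is that comparing the bytes lexicographically gives the same order as comparing the heights numerically.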

This component runs inside zebrad. After incoming viewing keys are registered, it holds onto them in order to do blockchain scanning. The component keeps track of where it has scanned to (TODO: per key?). It runs in its own separate task, so that a crash is not noticeable to the rest of the node, and it executes independently of normal node operation (but in the same process).

In the case of the client component that needs to do blockchain scanning and trial decryption, every valid block with non-coinbase transactions will need to be checked and its transactions trial-decrypted with registered incoming viewing keys to see if any notes have been received by the key's owner and if any notes have already been spent elsewhere.

RPCs

A specific set of privileged RPC endpoints:

  • Allows registering of incoming viewing keys with zebrad in order to do blockchain scanning
  • Allows querying of the results of that scanning, to get wallet balance, etc
  • Not authenticated to start (see 'Future possibilities')
  • Users can control access by controlling access to the privileged endpoint (ie via a firewall)

Support for sending transactions via non-privileged RPC endpoints, or via Stolon:

  • sendTransaction: once you author a transaction you can gossip it via any Zcash node, not just a specific instance of zebrad

Wallet functionality

  • Holds on to your spending keys so you can author transactions
  • Uses RPC methods to query the online client component inside zebrad about wallet balances

CLI binary

  • zebra-cli talks to the subcomponent running in zebrad
    • (can use servo/bincode to communicate with zebrad)
    • via the privileged (and possibly the unprivileged) RPC endpoints
    • can use cap-std to restrict filesystem and network access for zebra-client. See https://github.com/ZcashFoundation/zebra/issues/2340
    • can use the tui crate to render a terminal UI

Task isolation in Tokio

  • TODO: fill in
  • cooperative multitasking is fine, IF you cooperate
  • lots of tasks

Module Structure

  • zebra-client (currently an empty stub)
  • zebra-cli (does not exist yet)
  • zebra-rfc? (exists as an empty stub; we may have zebra-cli communicate with zebra-client inside zebrad via an RPC method and/or a private IPC layer)

Test Plan

Drawbacks

Supporting a wallet means taking on risk, and implementing wallet functionality requires effort:

  • need to responsibly handle secret key material;
  • currently we only handle public data.

Rationale and alternatives

  • why have a separate RPC endpoint?

    • extra endpoints are cheap
    • allows segmentation by capability
    • alternative is error-prone after-the-fact ACLs like Tor control port filters
  • What is the impact of not doing this?

    • We can't send money with zebra alone.
    • rely on third party wallet software to send funds with zebra
      • we need to provide basic functionality within zebra's trust boundary, rather than forcing users to additionally trust 3p software
      • there are great 3p wallets, we want to integrate with them, just don't want to rely on them
  • What about the light client protocol?

    • does not address this use case, has different trust model (private lookup, no scanning)
    • we want our first client that interacts with zebrad to not have a long startup time, which a lightclient implementation would require
    • zebra-cli should be within the same trust and privacy boundary as the zebrad node it is interacting with
    • light client protocol as currently implemented requires stack assumptions such as protobufs and a hardcoded lightserver to talk to
  • What about having one database per key?

    • easy to reliably delete or backup all data related to a single key
    • might use slightly more space/CPU
    • slightly harder to delete all the keys

Unresolved questions

  • wait to fill this in until doing the detailed writeup.

Future possibilities

  • BlazeSync algorithm for fast syncing, like Zecwallet

  • mandatory sweeps for legacy keys

    • blazingly fast wallet startup, to match zebrad's blazingly fast sync
    • generate unified address from a new seed phrase (or one provided by the user)
    • user can just backup seed phrase rather than a set of private keys
    • handles arbitrary keys from zcashd and other wallets, even if they weren't generated from a seed phrase
    • handles Sprout funds without zebra-client having to support Sprout balances
    • startup is incredibly fast
      • sweep takes a few minutes to be confirmed
      • scanning the entire chain could take hours
      • if we know when the seed phrase was created, we can skip millions of blocks during scanning
    • sweeps can also be initiated by the user for non-linkability / performance / refresh
    • sweeps should handle the "block reward recipient" case where there are a lot of small outputs
    • initial release could support mandatory sweeps, and future releases could support legacy keys
  • split Client component into subprocess

    • this helps somewhat but the benefit is reduced by our preexisting memory safety, thanks to Rust
    • not meaningful without other isolation (need to restrict zebrad from accessing viewing keys on disk, etc)
    • could use cap-std to restrict filesystem and network access for zebra-client. See https://github.com/ZcashFoundation/zebra/issues/2340
    • instead of process isolation, maybe you actually want the Light Client Protocol, or something similar?
  • hardware wallet integration for zebra-cli

    • having zebra-cli allows us to do this
    • much higher security ROI than subprocess
    • very cool future feature
  • authenticate queries for a particular viewing key by proving knowledge of the viewing key (requires crypto). this could allow public access to the client endpoint

  • Use Unified Addresses only, no legacy addrs.

Summary

Network Upgrade number 5 (NU5) introduces a new transaction type (transaction version 5). This document is a proposed design for implementing such a transaction version.

Motivation

Zebra aims to be a protocol-compatible Zcash implementation. One of the tasks required for this is support for the new version 5 transactions that will be introduced in Network Upgrade 5 (NU5).

Definitions

  • NU5 - the 5th Zcash network upgrade, counting from the Overwinter upgrade as upgrade zero.
  • Orchard - a new shielded pool introduced in NU5.
  • Sapling - a new shielded pool introduced in the 1st network upgrade. (Sapling is also the name of that network upgrade, but this RFC is focused on the Sapling shielded pool.)
  • orchard data - Data types needed to support orchard transactions.
  • sapling data - Data types needed to support sapling transactions.
  • orchard transaction version - Transactions that support orchard data. Currently only V5.
  • sapling transaction version - Transactions that support sapling data. Currently V4 and V5 but the data is implemented differently in them.

Guide-level explanation

V5 transactions are described by the protocol in the second table of Transaction Encoding and Consensus.

All of the changes proposed in this document are only to the zebra-chain crate.

To highlight the changes, most of the documentation comments were removed from the code snippets in the reference section.

Sapling Changes Overview

V4 and V5 transactions both support sapling, but the underlying data structures are different. So we need to make the sapling data types generic over the V4 and V5 structures.

In V4, anchors are per-spend, but in V5, they are per-transaction. In V5, the shared anchor is only present if there is at least one spend.

For consistency, we also move some fields into the ShieldedData type, and rename some fields and types.

Orchard Additions Overview

V5 is the only transaction version that supports Orchard, using the new Orchard data types.

Orchard uses Halo2Proofs with corresponding signature type changes. Each Orchard Action contains a spend and an output. Placeholder values are substituted for unused spends and outputs.

Other Transaction V5 Changes

V5 transactions split Spends, Outputs, and AuthorizedActions into multiple arrays, with a single CompactSize count before the first array. We add new zcash_deserialize_external_count and zcash_serialize_external_count utility functions, which make it easier to serialize and deserialize these arrays correctly.
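As a rough illustration of the external-count idea, here is a minimal sketch using only the standard library. The helper names (`serialize_external_count`, `deserialize_external_count`, `serialize_split`, `deserialize_split`) and the 4-byte item type are assumptions made for this example; they are not Zebra's actual signatures. A single CompactSize count is written once, followed by two arrays that both rely on that count.

```rust
use std::io::{self, Read, Write};

/// Write a Bitcoin-style CompactSize integer.
fn write_compact_size<W: Write>(mut w: W, n: u64) -> io::Result<()> {
    match n {
        0..=0xfc => w.write_all(&[n as u8]),
        0xfd..=0xffff => {
            w.write_all(&[0xfd])?;
            w.write_all(&(n as u16).to_le_bytes())
        }
        0x1_0000..=0xffff_ffff => {
            w.write_all(&[0xfe])?;
            w.write_all(&(n as u32).to_le_bytes())
        }
        _ => {
            w.write_all(&[0xff])?;
            w.write_all(&n.to_le_bytes())
        }
    }
}

/// Read a CompactSize integer (this sketch does not reject non-canonical encodings).
fn read_compact_size<R: Read>(mut r: R) -> io::Result<u64> {
    let mut first = [0u8; 1];
    r.read_exact(&mut first)?;
    match first[0] {
        0xfd => {
            let mut b = [0u8; 2];
            r.read_exact(&mut b)?;
            Ok(u64::from(u16::from_le_bytes(b)))
        }
        0xfe => {
            let mut b = [0u8; 4];
            r.read_exact(&mut b)?;
            Ok(u64::from(u32::from_le_bytes(b)))
        }
        0xff => {
            let mut b = [0u8; 8];
            r.read_exact(&mut b)?;
            Ok(u64::from_le_bytes(b))
        }
        small => Ok(u64::from(small)),
    }
}

/// Hypothetical helper: write items with no length prefix.
fn serialize_external_count<W: Write>(items: &[[u8; 4]], mut w: W) -> io::Result<()> {
    for item in items {
        w.write_all(item)?;
    }
    Ok(())
}

/// Hypothetical helper: read exactly `count` items, where `count` was parsed elsewhere.
fn deserialize_external_count<R: Read>(count: usize, mut r: R) -> io::Result<Vec<[u8; 4]>> {
    let mut items = Vec::new();
    for _ in 0..count {
        let mut item = [0u8; 4];
        r.read_exact(&mut item)?;
        items.push(item);
    }
    Ok(items)
}

/// One count, then the prefix array, then the proof array.
fn serialize_split(prefixes: &[[u8; 4]], proofs: &[[u8; 4]]) -> io::Result<Vec<u8>> {
    if prefixes.len() != proofs.len() {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "array lengths differ"));
    }
    let mut out = Vec::new();
    write_compact_size(&mut out, prefixes.len() as u64)?;
    serialize_external_count(prefixes, &mut out)?;
    serialize_external_count(proofs, &mut out)?;
    Ok(out)
}

/// Parse the shared count once, then reuse it for both arrays.
fn deserialize_split(bytes: &[u8]) -> io::Result<(Vec<[u8; 4]>, Vec<[u8; 4]>)> {
    let mut r = bytes;
    let count = read_compact_size(&mut r)? as usize;
    let prefixes = deserialize_external_count(count, &mut r)?;
    let proofs = deserialize_external_count(count, &mut r)?;
    Ok((prefixes, proofs))
}
```

Because the second array has no count of its own, the reader has to carry the count from the first array forward, which is the bookkeeping these utility functions are meant to centralise.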

The order of some of the fields changed from V4 to V5. For example, lock_time and expiry_height were moved above the transparent inputs and outputs.

The serialized field order and field splits are in the V5 transaction section in the NU5 spec. (Currently, the V5 spec is on a separate page after the V1-V4 specs.)

Zebra's structs sometimes use a different order from the spec. We combine fields that occur together, to make it impossible to represent structurally invalid Zcash data.

In general:

  • Zebra enums and structs put fields in serialized order.
  • Composite structs and enum variants are ordered based on last data deserialized for the composite.

Reference-level explanation

Sapling Changes

The protocol (second table of Transaction Encoding and Consensus) specifies that V5 transactions support Sapling data. However, the protocol also specifies that the spend (Spend Description Encoding and Consensus, see †) and output (Output Description Encoding and Consensus, see †) fields change from V4 to V5.

ShieldedData is currently defined and implemented in zebra-chain/src/transaction/shielded_data.rs. As this is Sapling specific we propose to move this file to zebra-chain/src/sapling/shielded_data.rs.

Changes to V4 Transactions

Here we have the proposed changes for V4 transactions:

  • make sapling_shielded_data use the PerSpendAnchor anchor variant
  • rename shielded_data to sapling_shielded_data
  • move value_balance into the sapling::ShieldedData type
  • order fields based on the last data deserialized for each field

enum Transaction::V4 {
    inputs: Vec<transparent::Input>,
    outputs: Vec<transparent::Output>,
    lock_time: LockTime,
    expiry_height: block::Height,
    joinsplit_data: Option<JoinSplitData<Groth16Proof>>,
    sapling_shielded_data: Option<sapling::ShieldedData<PerSpendAnchor>>,
}

The following types have ZcashSerialize and ZcashDeserialize implementations, because they can be serialized into a single byte vector:

  • transparent::Input
  • transparent::Output
  • LockTime
  • block::Height
  • Option<JoinSplitData<Groth16Proof>>

Note: Option<sapling::ShieldedData<PerSpendAnchor>> does not have serialize or deserialize implementations, because the binding signature is after the joinsplits. Its serialization and deserialization is handled as part of Transaction::V4.

Anchor Variants

We add an AnchorVariant generic type trait, because V4 transactions have a per-Spend anchor, but V5 transactions have a shared anchor. This trait can be added to sapling/shielded_data.rs:

struct PerSpendAnchor {}
struct SharedAnchor {}

/// This field is not present in this transaction version.
struct FieldNotPresent;

impl AnchorVariant for PerSpendAnchor {
    type Shared = FieldNotPresent;
    type PerSpend = sapling::tree::Root;
}

impl AnchorVariant for SharedAnchor {
    type Shared = sapling::tree::Root;
    type PerSpend = FieldNotPresent;
}

trait AnchorVariant {
    type Shared;
    type PerSpend;
}

Changes to Sapling ShieldedData

We use AnchorVariant in ShieldedData to model the anchor differences between V4 and V5:

  • in V4, there is a per-spend anchor
  • in V5, there is a shared anchor, which is only present when there are spends

If there are no spends and no outputs:

  • in v4, the value_balance is fixed to zero
  • in v5, the value balance field is not present
  • in both versions, the binding_sig field is not present

/// ShieldedData ensures that value_balance and binding_sig are only present when
/// there is at least one spend or output.
struct sapling::ShieldedData<AnchorV: AnchorVariant> {
    value_balance: Amount,
    transfers: sapling::TransferData<AnchorV>,
    binding_sig: redjubjub::Signature<Binding>,
}

/// TransferData ensures that:
/// * there is at least one spend or output, and
/// * the shared anchor is only present when there are spends
enum sapling::TransferData<AnchorV: AnchorVariant> {
    /// In Transaction::V5, if there are any spends,
    /// there must also be a shared spend anchor.
    SpendsAndMaybeOutputs {
        shared_anchor: AnchorV::Shared,
        spends: AtLeastOne<Spend<AnchorV>>,
        maybe_outputs: Vec<Output>,
    },

    /// If there are no spends, there must not be a shared
    /// anchor.
    JustOutputs {
        outputs: AtLeastOne<Output>,
    },
}

The AtLeastOne type is a vector wrapper which always contains at least one element. For more details, see its documentation.
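To make the invariant concrete, here is a minimal sketch of a non-empty vector wrapper. It is an illustration only: the constructor, the error type, and the accessor names are assumptions, and Zebra's real `AtLeastOne` may differ.

```rust
use std::convert::TryFrom;

/// A vector wrapper that always holds at least one element (illustrative only).
#[derive(Debug, Clone, PartialEq, Eq)]
struct AtLeastOne<T> {
    // Private field: the only way to build the wrapper is through `try_from`,
    // so the non-empty invariant cannot be bypassed.
    inner: Vec<T>,
}

impl<T> TryFrom<Vec<T>> for AtLeastOne<T> {
    type Error = &'static str;

    fn try_from(vec: Vec<T>) -> Result<Self, Self::Error> {
        if vec.is_empty() {
            Err("AtLeastOne requires at least one element")
        } else {
            Ok(AtLeastOne { inner: vec })
        }
    }
}

impl<T> AtLeastOne<T> {
    /// Never panics, because the vector is never empty.
    fn first(&self) -> &T {
        &self.inner[0]
    }

    /// Read-only access to all elements.
    fn as_slice(&self) -> &[T] {
        &self.inner
    }
}
```

With a type like this, a `JustOutputs` value with zero outputs cannot be constructed, so the "at least one spend or output" rule is enforced when the data is built rather than during later verification.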

Some of these fields are in a different order from the serialized data. See the V4 and V5 transaction specs for details.

The following types have ZcashSerialize and ZcashDeserialize implementations, because they can be serialized into a single byte vector:

  • Amount
  • sapling::tree::Root
  • redjubjub::Signature<Binding>

Adding V5 Sapling Spend

Sapling spend code is located at zebra-chain/src/sapling/spend.rs. We use AnchorVariant to model the anchor differences between V4 and V5. We also create a struct for serializing V5 transaction spends:

struct Spend<AnchorV: AnchorVariant> {
    cv: commitment::ValueCommitment,
    per_spend_anchor: AnchorV::PerSpend,
    nullifier: note::Nullifier,
    rk: redjubjub::VerificationKeyBytes<SpendAuth>,
    // This field is stored in a separate array in v5 transactions, see:
    // https://zips.z.cash/protocol/nu5.pdf#txnencodingandconsensus
    // parse using `zcash_deserialize_external_count` and `zcash_serialize_external_count`
    zkproof: Groth16Proof,
    // This field is stored in another separate array in v5 transactions
    spend_auth_sig: redjubjub::Signature<SpendAuth>,
}

/// The serialization prefix fields of a `Spend` in Transaction V5.
///
/// In `V5` transactions, spends are split into multiple arrays, so the prefix,
/// proof, and signature must be serialised and deserialized separately.
///
/// Serialized as `SpendDescriptionV5` in [protocol specification §7.3].
struct SpendPrefixInTransactionV5 {
    cv: commitment::ValueCommitment,
    nullifier: note::Nullifier,
    rk: redjubjub::VerificationKeyBytes<SpendAuth>,
}

The following types have ZcashSerialize and ZcashDeserialize implementations, because they can be serialized into a single byte vector:

  • Spend<PerSpendAnchor> (moved from the pre-RFC Spend)
  • SpendPrefixInTransactionV5 (new)
  • Groth16Proof
  • redjubjub::Signature<redjubjub::SpendAuth> (new - for v5 spend auth sig arrays)

Note: Spend<SharedAnchor> does not have serialize and deserialize implementations. It must be split using into_v5_parts before serialization, and recombined using from_v5_parts after deserialization.

These convenience methods convert between Spend<SharedAnchor> and its v5 parts: SpendPrefixInTransactionV5, the spend proof, and the spend auth signature.
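The split-and-recombine step can be sketched as follows. This is a simplified illustration, not Zebra's code: the byte-array field types, the `SpendV5` and `SpendPrefixV5` names, the method signatures, and the `recombine` helper are all assumptions standing in for the real cryptographic types.

```rust
/// Marker for a field that is absent in this transaction version.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FieldNotPresent;

/// Stand-in for `Spend<SharedAnchor>`, with placeholder byte types.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SpendV5 {
    cv: [u8; 32],
    per_spend_anchor: FieldNotPresent,
    nullifier: [u8; 32],
    rk: [u8; 32],
    zkproof: Vec<u8>,
    spend_auth_sig: [u8; 64],
}

/// Stand-in for `SpendPrefixInTransactionV5`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SpendPrefixV5 {
    cv: [u8; 32],
    nullifier: [u8; 32],
    rk: [u8; 32],
}

impl SpendV5 {
    /// Split a spend into the three parts that V5 stores in separate arrays.
    fn into_v5_parts(self) -> (SpendPrefixV5, Vec<u8>, [u8; 64]) {
        let prefix = SpendPrefixV5 { cv: self.cv, nullifier: self.nullifier, rk: self.rk };
        (prefix, self.zkproof, self.spend_auth_sig)
    }

    /// Rebuild a spend from its three deserialized parts.
    fn from_v5_parts(prefix: SpendPrefixV5, zkproof: Vec<u8>, spend_auth_sig: [u8; 64]) -> SpendV5 {
        SpendV5 {
            cv: prefix.cv,
            per_spend_anchor: FieldNotPresent,
            nullifier: prefix.nullifier,
            rk: prefix.rk,
            zkproof,
            spend_auth_sig,
        }
    }
}

/// Hypothetical helper: zip three parallel arrays back into spends.
/// Returns `None` if the arrays have different lengths.
fn recombine(
    prefixes: Vec<SpendPrefixV5>,
    proofs: Vec<Vec<u8>>,
    sigs: Vec<[u8; 64]>,
) -> Option<Vec<SpendV5>> {
    if prefixes.len() != proofs.len() || prefixes.len() != sigs.len() {
        return None;
    }
    Some(
        prefixes
            .into_iter()
            .zip(proofs)
            .zip(sigs)
            .map(|((prefix, proof), sig)| SpendV5::from_v5_parts(prefix, proof, sig))
            .collect(),
    )
}
```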

Changes to Sapling Output

In Zcash the Sapling output fields are the same for V4 and V5 transactions, so the Output struct is unchanged. However, V4 and V5 transactions serialize outputs differently, so we create additional structs for serializing outputs in each transaction version.

The output code is located at zebra-chain/src/sapling/output.rs:

struct Output {
    cv: commitment::ValueCommitment,
    cm_u: jubjub::Fq,
    ephemeral_key: keys::EphemeralPublicKey,
    enc_ciphertext: note::EncryptedNote,
    out_ciphertext: note::WrappedNoteKey,
    // This field is stored in a separate array in v5 transactions, see:
    // https://zips.z.cash/protocol/nu5.pdf#txnencodingandconsensus
    // parse using `zcash_deserialize_external_count` and `zcash_serialize_external_count`
    zkproof: Groth16Proof,
}

/// Wrapper for `Output` serialization in a `V4` transaction.
struct OutputInTransactionV4(pub Output);

/// The serialization prefix fields of an `Output` in Transaction V5.
///
/// In `V5` transactions, outputs are split into multiple arrays, so the prefix
/// and proof must be serialised and deserialized separately.
///
/// Serialized as `OutputDescriptionV5` in [protocol specification §7.3].
struct OutputPrefixInTransactionV5 {
    cv: commitment::ValueCommitment,
    cm_u: jubjub::Fq,
    ephemeral_key: keys::EphemeralPublicKey,
    enc_ciphertext: note::EncryptedNote,
    out_ciphertext: note::WrappedNoteKey,
}

The following types have ZcashSerialize and ZcashDeserialize implementations, because they can be serialized into a single byte vector:

  • OutputInTransactionV4 (moved from Output)
  • OutputPrefixInTransactionV5 (new)
  • Groth16Proof

Note: The serialize and deserialize implementations on Output are moved to OutputInTransactionV4. In v4 transactions, outputs must be wrapped using into_v4 before serialization, and unwrapped using from_v4 after deserialization. In transaction v5, outputs must be split using into_v5_parts before serialization, and recombined using from_v5_parts after deserialization.

These convenience methods convert Output to:

  • its v4 serialization wrapper OutputInTransactionV4, and
  • its v5 parts: OutputPrefixInTransactionV5 and the output proof.

Adding V5 Transactions

Now let's see how the V5 transaction is specified in the protocol (the second table of Transaction Encoding and Consensus), and how we will represent it based on the above changes to the Sapling fields and the new Orchard fields.

We propose the following representation for transaction V5 in Zebra:

enum Transaction::V5 {
    lock_time: LockTime,
    expiry_height: block::Height,
    inputs: Vec<transparent::Input>,
    outputs: Vec<transparent::Output>,
    sapling_shielded_data: Option<sapling::ShieldedData<SharedAnchor>>,
    orchard_shielded_data: Option<orchard::ShieldedData>,
}

To model the V5 anchor type, sapling_shielded_data uses the SharedAnchor variant located at zebra-chain/src/sapling/shielded_data.rs.

The following types have ZcashSerialize and ZcashDeserialize implementations, because they can be serialized into a single byte vector:

  • LockTime
  • block::Height
  • transparent::Input
  • transparent::Output
  • Option<sapling::ShieldedData<SharedAnchor>> (new)
  • Option<orchard::ShieldedData> (new)

Orchard Additions

Adding Orchard ShieldedData

The new V5 structure requires a new orchard::ShieldedData type. This new type will be defined in a new zebra-chain/src/orchard/shielded_data.rs file:

struct orchard::ShieldedData {
    flags: Flags,
    value_balance: Amount,
    shared_anchor: orchard::tree::Root,
    proof: Halo2Proof,
    actions: AtLeastOne<AuthorizedAction>,
    binding_sig: redpallas::Signature<Binding>,
}

The fields are ordered based on the last data deserialized for each field.

The following types have ZcashSerialize and ZcashDeserialize implementations, because they can be serialized into a single byte vector:

  • orchard::Flags (new)
  • Amount
  • Halo2Proof (new)
  • redpallas::Signature<Binding> (new)

Adding Orchard AuthorizedAction

In V5 transactions, there is one SpendAuth signature for every Action. To ensure that this structural rule is followed, we create an AuthorizedAction type in orchard/shielded_data.rs:

/// An authorized action description.
///
/// Every authorized Orchard `Action` must have a corresponding `SpendAuth` signature.
struct orchard::AuthorizedAction {
    action: Action,
    // This field is stored in a separate array in v5 transactions, see:
    // https://zips.z.cash/protocol/nu5.pdf#txnencodingandconsensus
    // parse using `zcash_deserialize_external_count` and `zcash_serialize_external_count`
    spend_auth_sig: redpallas::Signature<SpendAuth>,
}

Here, Action is defined as in the Action definition.

The following types have ZcashSerialize and ZcashDeserialize implementations, because they can be serialized into a single byte vector:

  • Action (new)
  • redpallas::Signature<SpendAuth> (new)

Note: AuthorizedAction does not have serialize and deserialize implementations. It must be split using into_parts before serialization, and recombined using from_parts after deserialization.

These convenience methods convert between AuthorizedAction and its parts: Action and the spend auth signature.

Adding Orchard Flags

Finally, in the V5 transaction we have a new orchard::Flags type. This is a bitfield type defined as:

bitflags! {
    /// Per-Transaction flags for Orchard.
    ///
    /// The spend and output flags are passed to the `Halo2Proof` verifier, which verifies
    /// the relevant note spending and creation consensus rules.
    struct orchard::Flags: u8 {
        /// Enable spending non-zero valued Orchard notes.
        ///
        /// "the `enableSpendsOrchard` flag, if present, MUST be 0 for coinbase transactions"
        const ENABLE_SPENDS = 0b00000001;
        /// Enable creating new non-zero valued Orchard notes.
        const ENABLE_OUTPUTS = 0b00000010;
        // Reserved, zeros (bits 2 .. 7)
    }
}

This type is also defined in orchard/shielded_data.rs.

Note: A consensus rule was added to the protocol specification stating that:

In a version 5 transaction, the reserved bits 2..7 of the flagsOrchard field MUST be zero.
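The reserved-bits rule can be checked while parsing the flags byte. The following is a minimal sketch using plain `u8` constants instead of the `bitflags!` macro; the `Flags::from_byte` constructor, the accessor names, and the error type are assumptions made for this example.

```rust
const ENABLE_SPENDS: u8 = 0b0000_0001;
const ENABLE_OUTPUTS: u8 = 0b0000_0010;
/// Bits 2..7: every bit that is not a defined flag.
const RESERVED: u8 = !(ENABLE_SPENDS | ENABLE_OUTPUTS);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Flags(u8);

impl Flags {
    /// Parse a flags byte, rejecting any byte with a reserved bit set.
    fn from_byte(byte: u8) -> Result<Flags, &'static str> {
        if (byte & RESERVED) != 0 {
            Err("reserved bits 2..7 of flagsOrchard must be zero")
        } else {
            Ok(Flags(byte))
        }
    }

    fn spends_enabled(self) -> bool {
        (self.0 & ENABLE_SPENDS) != 0
    }

    fn outputs_enabled(self) -> bool {
        (self.0 & ENABLE_OUTPUTS) != 0
    }
}
```

Rejecting the byte at parse time means that a `Flags` value with reserved bits set can never reach later verification code.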

Test Plan

  • All renamed, modified and new types should serialize and deserialize.
  • The full V4 and V5 transactions should serialize and deserialize.
  • Proptest strategies will be updated for V4 and created for V5.
  • Before NU5 activation on testnet, test on the following test vectors:
    • Hand-crafted Orchard-only, Orchard/Sapling, Orchard/Transparent, and Orchard/Sapling/Transparent transactions based on the spec
    • "Fake" Sapling-only and Sapling/Transparent transactions based on the existing test vectors, converted from V4 to V5 format
      • We can write a test utility function to automatically do these conversions
    • An empty transaction, with no Orchard, Sapling, or Transparent data
      • A v5 transaction with no spends, but some outputs, to test the shared anchor serialization rule
    • Any available zcashd test vectors
  • After NU5 activation on testnet:
    • Add test vectors using the testnet activation block and 2 more post-activation blocks
  • After NU5 activation on mainnet:
    • Add test vectors using the mainnet activation block and 2 more post-activation blocks

Security

To avoid parsing memory exhaustion attacks, we will make the following changes across all Transaction, ShieldedData, Spend and Output variants, V1 through to V5:

  • Check cardinality consensus rules at parse time, before deserializing any Vecs
    • In general, Zcash requires that each transaction has at least one Transparent/Sprout/Sapling/Orchard transfer. This rule is not currently encoded in our data structures (it is only checked during semantic verification).
  • Stop parsing as soon as the first error is detected
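One common way to bound allocations before deserializing a vector is to compare the claimed item count with the number of bytes that could possibly remain. The sketch below illustrates that idea under stated assumptions: the `read_items` function, its parameters, and the fixed 4-byte item size are hypothetical, and this is not the design tracked in the pull request mentioned below.

```rust
use std::io::{self, Read};

/// Size of one serialized item in this sketch.
const ITEM_LEN: usize = 4;

/// Read `claimed_count` fixed-size items, refusing counts that cannot fit
/// in `remaining_bytes`, and stopping at the first read error.
fn read_items<R: Read>(
    mut reader: R,
    claimed_count: u64,
    remaining_bytes: u64,
) -> io::Result<Vec<[u8; ITEM_LEN]>> {
    // An honest count can never need more bytes than the message still holds.
    let max_count = remaining_bytes / ITEM_LEN as u64;
    if claimed_count > max_count {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "item count exceeds remaining data",
        ));
    }

    // Safe to preallocate: the capacity is bounded by the message size.
    let mut items = Vec::with_capacity(claimed_count as usize);
    for _ in 0..claimed_count {
        let mut item = [0u8; ITEM_LEN];
        // `?` returns immediately on the first error.
        reader.read_exact(&mut item)?;
        items.push(item);
    }
    Ok(items)
}
```

Without the bound, an attacker could send a tiny message that claims a huge count and trigger a large allocation before any item is read.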

These changes should be made in a later pull request, see #1917 for details.

Summary

Zebra programmers need to carefully write async code so it doesn't deadlock or hang. This is particularly important for poll, select, Buffer, Batch, and Mutex.

Zebra executes concurrent tasks using async Rust, with the tokio executor.

At a higher level, Zebra also uses tower::Services, tower::Buffers, and our own tower-batch-control implementation.

Motivation

Like all concurrent codebases, Zebra needs to obey certain constraints to avoid hangs. Unfortunately, Rust's tooling in these areas is still developing. So Zebra developers need to manually check these constraints during design, development, reviews, and testing.

Definitions

  • hang: a Zebra component stops making progress.
  • constraint: a rule that Zebra must follow to prevent hangs.
  • CORRECTNESS comment: the documentation for a constraint in Zebra's code.
  • task: an async task can execute code independently of other tasks, using cooperative multitasking.
  • contention: slower execution because multiple tasks are waiting to acquire a lock, buffer/batch slot, or readiness.
  • missed wakeup: a task hangs because it is never scheduled for wakeup.
  • lock: exclusive access to a shared resource. Locks stop other code from running until they are released. For example, a mutex, buffer slot, or service readiness.
  • critical section: code that is executed while holding a lock.
  • deadlock: a hang that stops an async task executing code, because it is waiting for a lock, slot, or task readiness. For example: a task is waiting for a service to be ready, but the service readiness depends on that task making progress.
  • starvation or livelock: a hang that executes code, but doesn't do anything useful. For example: a loop never terminates.

Guide-level explanation

If you are designing, developing, or testing concurrent Zebra code, follow the patterns in these examples to avoid hangs.

If you are reviewing concurrent Zebra designs or code, make sure that:

  • it is clear how the design or code avoids hangs
  • the design or code follows the patterns in these examples (as much as possible)
  • the concurrency constraints and risks are documented

The Reference section contains in-depth background information about Rust async concurrency in Zebra.

Here are some examples of concurrent designs and documentation in Zebra:

Registering Wakeups Before Returning Poll::Pending

To avoid missed wakeups, futures must schedule a wakeup before they return Poll::Pending. For more details, see the Poll::Pending and Wakeups section.

Zebra's unready_service.rs uses the ready! macro to correctly handle Poll::Pending from the inner service.

You can see some similar constraints in pull request #1954.

// CORRECTNESS
//
// The current task must be scheduled for wakeup every time we return
// `Poll::Pending`.
//
// `ready!` returns `Poll::Pending` when the service is unready, and
// the inner `poll_ready` schedules this task for wakeup.
//
// `cancel.poll` also schedules this task for wakeup if it is canceled.
let res = ready!(this
    .service
    .as_mut()
    .expect("poll after ready")
    .poll_ready(cx));

Futures-Aware Mutexes

To avoid hangs or slowdowns, prefer futures-aware types, particularly for complex waiting or locking code. But in some simple cases, std::sync::Mutex is more efficient. For more details, see the Futures-Aware Types section.

Zebra's Handshake won't block other tasks on its thread, because it uses futures::lock::Mutex:

pub async fn negotiate_version(
    peer_conn: &mut Framed<TcpStream, Codec>,
    addr: &SocketAddr,
    config: Config,
    nonces: Arc<futures::lock::Mutex<HashSet<Nonce>>>,
    user_agent: String,
    our_services: PeerServices,
    relay: bool,
) -> Result<(Version, PeerServices), HandshakeError> {
    // Create a random nonce for this connection
    let local_nonce = Nonce::default();
    // # Correctness
    //
    // It is ok to wait for the lock here, because handshakes have a short
    // timeout, and the async mutex will be released when the task times
    // out.
    nonces.lock().await.insert(local_nonce);

    ...
}

Zebra's Inbound service can't use an async-aware mutex for its AddressBook, because the mutex is shared with non-async code. It only holds the mutex to clone the address book, reducing the amount of time that other tasks on its thread are blocked:

// # Correctness
//
// Briefly hold the address book threaded mutex while
// cloning the address book. Then sanitize after releasing
// the lock.
let peers = address_book.lock().unwrap().clone();
let mut peers = peers.sanitized();

Avoiding Deadlocks when Acquiring Buffer or Service Readiness

To avoid deadlocks, readiness and locks must be acquired in a consistent order. For more details, see the Acquiring Buffer Slots, Mutexes, or Readiness section.

Zebra's ChainVerifier avoids deadlocks, contention, and errors by:

  • calling poll_ready before each call
  • acquiring buffer slots for the earlier verifier first (based on blockchain order)
  • ensuring that buffers are large enough for concurrent tasks

// We acquire checkpoint readiness before block readiness, to avoid an unlikely
// hang during the checkpoint to block verifier transition. If the checkpoint and
// block verifiers are contending for the same buffer/batch, we want the checkpoint
// verifier to win, so that checkpoint verification completes, and block verification
// can start. (Buffers and batches have multiple slots, so this contention is unlikely.)
//
// The chain verifier holds one slot in each verifier, for each concurrent task.
// Therefore, any shared buffers or batches polled by these verifiers should double
// their bounds. (For example, the state service buffer.)
ready!(self
    .checkpoint
    .poll_ready(cx)
    .map_err(VerifyChainError::Checkpoint))?;
ready!(self.block.poll_ready(cx).map_err(VerifyChainError::Block))?;
Poll::Ready(Ok(()))

Critical Section Compiler Errors

To avoid deadlocks or slowdowns, critical sections should be as short as possible, and they should not depend on any other tasks. For more details, see the Acquiring Buffer Slots, Mutexes, or Readiness section.

Zebra's CandidateSet must release a std::sync::Mutex lock before awaiting a tokio::time::Sleep future. This ensures that the threaded mutex lock isn't held over the await point.

If the lock isn't dropped, compilation fails, because the mutex lock can't be sent between threads.

// # Correctness
//
// In this critical section, we hold the address mutex, blocking the
// current thread, and all async tasks scheduled on that thread.
//
// To avoid deadlocks, the critical section:
// - must not acquire any other locks
// - must not await any futures
//
// To avoid hangs, any computation in the critical section should
// be kept to a minimum.
let reconnect = {
    let mut guard = self.address_book.lock().unwrap();
    ...
    let reconnect = guard.reconnection_peers().next()?;

    let reconnect = MetaAddr::new_reconnect(&reconnect.addr, &reconnect.services);
    guard.update(reconnect);
    reconnect
};

// SECURITY: rate-limit new candidate connections
sleep.await;

Sharing Progress between Multiple Futures

To avoid starvation and deadlocks, tasks that depend on multiple futures should make progress on all of those futures. This is particularly important for tasks that depend on their own outputs. For more details, see the Unbiased Selection section.

Zebra's peer crawler task avoids starvation and deadlocks by spawning a separate task for each handshake, and by using the select! macro for all actions.

You can see a range of hang fixes in pull request #1950.

// CORRECTNESS
//
// To avoid hangs and starvation, the crawler must:
// - spawn a separate task for each handshake, so they can make progress
//   independently (and avoid deadlocking each other)
// - use the `select!` macro for all actions, because the `select` function
//   is biased towards the first ready future

loop {
    let crawler_action = tokio::select! {
        a = handshakes.next() => a,
        a = crawl_timer.next() => a,
        _ = demand_rx.next() => {
            if let Some(candidate) = candidates.next().await {
                // candidates.next has a short delay, and briefly holds the address
                // book lock, so it shouldn't hang
                DemandHandshake { candidate }
            } else {
                DemandCrawl
            }
        }
    };

    match crawler_action {
        DemandHandshake { candidate } => {
            // spawn each handshake into an independent task, so it can make
            // progress independently of the crawls
            let hs_join =
                tokio::spawn(dial(candidate, connector.clone()));
            handshakes.push(Box::pin(hs_join));
        }
        DemandCrawl => {
            // update has timeouts, and briefly holds the address book
            // lock, so it shouldn't hang
            candidates.update().await?;
        }
        // handle handshake responses and the crawl timer
    }
}

Prioritising Cancellation Futures

To avoid starvation, cancellation futures must take priority over other futures, if multiple futures are ready. For more details, see the Biased Selection section.

Zebra's connection.rs avoids hangs by prioritising the cancel and timer futures over the peer receiver future. Under heavy load, the peer receiver future could always be ready with a new message, starving the cancel or timer futures.

You can see a range of hang fixes in pull request #1950.

// CORRECTNESS
//
// Currently, select prefers the first future if multiple
// futures are ready.
//
// If multiple futures are ready, we want the cancellation
// to take priority, then the timeout, then peer responses.
let cancel = future::select(tx.cancellation(), timer_ref);
match future::select(cancel, peer_rx.next()) {
    ...
}

Atomic Shutdown Flag

As of April 2021, Zebra implements some shutdown checks using an atomic bool.

Zebra's shutdown.rs avoids data races and missed updates by using the strongest memory ordering (SeqCst).

We plan to replace this raw atomic code with a channel, see #1678.

/// A flag to indicate if Zebra is shutting down.
///
/// Initialized to `false` at startup.
pub static IS_SHUTTING_DOWN: AtomicBool = AtomicBool::new(false);

/// Returns true if the application is shutting down.
pub fn is_shutting_down() -> bool {
    // ## Correctness:
    //
    // Since we're shutting down, and this is a one-time operation,
    // performance is not important. So we use the strongest memory
    // ordering.
    // https://doc.rust-lang.org/nomicon/atomics.html#sequentially-consistent
    IS_SHUTTING_DOWN.load(Ordering::SeqCst)
}

/// Sets the Zebra shutdown flag to `true`.
pub fn set_shutting_down() {
    IS_SHUTTING_DOWN.store(true, Ordering::SeqCst);
}

Integration Testing Async Code

Sometimes, it is difficult to unit test async code, because it has complex dependencies. For more details, see the Testing Async Code section.

zebrad's acceptance tests run short Zebra syncs on the Zcash mainnet or testnet. These acceptance tests make sure that zebrad can:

  • sync blocks using its async block download and verification pipeline
  • cancel a sync
  • reload disk state after a restart

These tests were introduced in pull request #1193.

/// Test if `zebrad` can sync some larger checkpoints on mainnet.
#[test]
fn sync_large_checkpoints_mainnet() -> Result<()> {
    let reuse_tempdir = sync_until(
        LARGE_CHECKPOINT_TEST_HEIGHT,
        Mainnet,
        STOP_AT_HEIGHT_REGEX,
        LARGE_CHECKPOINT_TIMEOUT,
        None,
    )?;

    // if stopping corrupts the rocksdb database, zebrad might hang or crash here
    // if stopping does not write the rocksdb database to disk, Zebra will
    // sync, rather than stopping immediately at the configured height
    sync_until(
        (LARGE_CHECKPOINT_TEST_HEIGHT - 1).unwrap(),
        Mainnet,
        "previous state height is greater than the stop height",
        STOP_ON_LOAD_TIMEOUT,
        Some(reuse_tempdir),
    )?;

    Ok(())
}

Instrumenting Async Functions

Sometimes, it is difficult to debug async code, because there are many tasks running concurrently. For more details, see the Monitoring Async Code section.

Zebra instruments some of its async functions using tracing. Here's an instrumentation example from Zebra's sync block downloader:

/// Queue a block for download and verification.
///
/// This method waits for the network to become ready, and returns an error
/// only if the network service fails. It returns immediately after queuing
/// the request.
#[instrument(level = "debug", skip(self), fields(%hash))]
pub async fn download_and_verify(&mut self, hash: block::Hash) -> Result<(), Report> {
    ...
}

Tracing and Metrics in Async Functions

Sometimes, it is difficult to monitor async code, because there are many tasks running concurrently. For more details, see the Monitoring Async Code section.

Zebra's client requests are monitored via:

  • trace and debug logs using the tracing crate
  • related work spans using the tracing crate
  • counters using the metrics crate

/// Handle an incoming client request, possibly generating outgoing messages to the
/// remote peer.
///
/// NOTE: the caller should use .instrument(msg.span) to instrument the function.
async fn handle_client_request(&mut self, req: InProgressClientRequest) {
    trace!(?req.request);

    let InProgressClientRequest { request, tx, span } = req;

    if tx.is_canceled() {
        metrics::counter!("peer.canceled", 1);
        tracing::debug!("ignoring canceled request");
        return;
    }
    ...
}

Reference-level explanation

The reference section contains in-depth information about concurrency in Zebra:

Most Zebra designs or code changes will only touch on one or two of these areas.

After an await, the rest of the Future might not be run

Futures can be "canceled" at any await point. Authors of futures must be aware that after an await, the code might not run. Futures might be polled to completion causing the code to work. But then many years later, the code is changed and the future might conditionally not be polled to completion which breaks things. The burden falls on the user of the future to poll to completion, and there is no way for the lib author to enforce this - they can only document this invariant.

https://github.com/rust-lang/wg-async-foundations/blob/master/src/vision/submitted_stories/status_quo/alan_builds_a_cache.md#-frequently-asked-questions

In particular, FutureExt::now_or_never:

  • drops the future, and
  • doesn't schedule the task for wakeups.

So even if the future or service passed to now_or_never is cloned, the task won't be awoken when it is ready again.

Task starvation

Tokio tasks are scheduled cooperatively:

a task is allowed to run until it yields, indicating to the Tokio runtime’s scheduler that it cannot currently continue executing. When a task yields, the Tokio runtime switches to executing the next task.

If a task doesn't yield during a CPU-intensive operation, or a tight loop, it can starve other tasks on the same thread. This can cause hangs or timeouts.

There are a few different ways to avoid task starvation:

Poll::Pending and Wakeups

When returning Poll::Pending, poll functions must ensure that the task will be woken up when it is ready to make progress.

In most cases, the poll function calls another poll function that schedules the task for wakeup.

Any code that generates a new Poll::Pending should either have:

  • a CORRECTNESS comment explaining how the task is scheduled for wakeup, or
  • a wakeup implementation, with tests to ensure that the wakeup functions as expected.

Note: poll functions often have a qualifier, like poll_ready or poll_next.
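The following sketch shows a hand-written wakeup implementation using only the standard library. It is an illustration under stated assumptions: `FlagFuture`, `Shared`, `set_ready`, `ThreadWaker`, `block_on`, and `run_demo` are hypothetical names, and the tiny executor stands in for tokio, which Zebra actually uses. The key point is that `poll` stores the waker before it returns `Poll::Pending`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::Duration;

#[derive(Default)]
struct Shared {
    ready: bool,
    waker: Option<Waker>,
}

/// A future that completes once `ready` is set by another thread.
struct FlagFuture {
    shared: Arc<Mutex<Shared>>,
}

impl Future for FlagFuture {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut shared = self.shared.lock().unwrap();
        if shared.ready {
            return Poll::Ready(());
        }
        // CORRECTNESS
        //
        // Store the waker before returning `Poll::Pending`, while holding the
        // lock, so `set_ready` cannot run between the check and the store.
        shared.waker = Some(cx.waker().clone());
        Poll::Pending
    }
}

/// Mark the flag as ready and wake the task that is waiting on it.
fn set_ready(shared: &Arc<Mutex<Shared>>) {
    let waker = {
        let mut guard = shared.lock().unwrap();
        guard.ready = true;
        guard.waker.take()
    };
    // Wake outside the critical section.
    if let Some(waker) = waker {
        waker.wake();
    }
}

/// Waker for a toy executor: unparks the thread that is blocked in `block_on`.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy single-future executor: poll, then park until woken.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Run the future to completion while another thread sets the flag.
fn run_demo() -> bool {
    let shared = Arc::new(Mutex::new(Shared::default()));
    let setter = Arc::clone(&shared);
    let handle = thread::spawn(move || {
        thread::sleep(Duration::from_millis(20));
        set_ready(&setter);
    });
    block_on(FlagFuture { shared: Arc::clone(&shared) });
    handle.join().unwrap();
    let ready = shared.lock().unwrap().ready;
    ready
}
```

If `poll` returned `Poll::Pending` without storing the waker, `block_on` would park forever after the first poll, which is the missed wakeup described in the definitions.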

Futures-Aware Types

Prefer futures-aware types in complex locking or waiting code, rather than types which will block the current thread.

For example:

  • Use futures::lock::Mutex rather than std::sync::Mutex
  • Use tokio::time::{sleep, timeout} rather than std::thread::sleep

Always qualify ambiguous names like Mutex and sleep, so that it is obvious when a call will block.

If you are unable to use futures-aware types:

  • block the thread for as short a time as possible
  • document the correctness of each blocking call
  • consider re-designing the code to use tower::Services, or other futures-aware types

In some simple cases, std::sync::Mutex is correct and more efficient, when:

  • the value behind the mutex is just data, and
  • the locking behaviour is simple.

In these cases:

wrap the Arc<Mutex<...>> in a struct that provides non-async methods for performing operations on the data within, and only lock the mutex inside these methods

For more details, see the tokio documentation.
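The wrapper pattern quoted above could look like the following sketch. SharedCounter is a hypothetical type, not part of Zebra. The mutex is only locked inside short, non-async methods, so a guard can never be held across an await point.

```rust
use std::sync::Arc;

/// A hypothetical shared counter: the value behind the mutex is just data,
/// and the locking behaviour is simple.
#[derive(Clone, Debug, Default)]
pub struct SharedCounter {
    // The fully qualified name makes it obvious that locking can block the thread.
    inner: Arc<std::sync::Mutex<u64>>,
}

impl SharedCounter {
    /// Adds `amount` to the counter, and returns the new total.
    ///
    /// The lock is acquired and released inside this non-async method.
    pub fn add(&self, amount: u64) -> u64 {
        let mut guard = self
            .inner
            .lock()
            .expect("mutex poisoned by a panicking thread");
        *guard += amount;
        *guard
    }

    /// Returns a copy of the current total.
    pub fn get(&self) -> u64 {
        *self
            .inner
            .lock()
            .expect("mutex poisoned by a panicking thread")
    }
}
```

Callers clone the SharedCounter and call add() or get(). They never see the mutex or its guard.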

Acquiring Buffer Slots, Mutexes, or Readiness

Ideally, buffer slots, mutexes, or readiness should be:

  • acquired with one lock per critical section, and
  • held for as short a time as possible.

If multiple locks are required for a critical section, acquire them in the same order any time those locks are used. If tasks acquire multiple locks in different orders, they can deadlock, each holding a lock that the other needs.
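One way to get a consistent lock order is to sort the locks by a stable key before acquiring them. The sketch below assumes each piece of shared state has a unique ID. Slot and transfer are hypothetical names used only for illustration.

```rust
/// A hypothetical piece of shared state, protected by its own lock.
pub struct Slot {
    /// A unique ID, used only to decide the lock order.
    pub id: u32,
    pub value: std::sync::Mutex<i64>,
}

/// Moves `amount` from `from` to `to`. This critical section needs both locks.
pub fn transfer(from: &Slot, to: &Slot, amount: i64) {
    assert_ne!(from.id, to.id, "locking the same mutex twice would deadlock");

    // CORRECTNESS: every caller locks the slot with the lower ID first,
    // so two concurrent transfers can never hold one lock each
    // while waiting for the other.
    let (first, second) = if from.id < to.id { (from, to) } else { (to, from) };
    let mut first_guard = first.value.lock().expect("mutex poisoned");
    let mut second_guard = second.value.lock().expect("mutex poisoned");

    if from.id < to.id {
        // `first` is `from`, `second` is `to`
        *first_guard -= amount;
        *second_guard += amount;
    } else {
        // `first` is `to`, `second` is `from`
        *second_guard -= amount;
        *first_guard += amount;
    }
}
```

With this ordering, transfers in opposite directions can run on different threads without deadlocking.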

If a buffer, mutex, future or service has complex readiness dependencies, schedule those dependencies as separate tasks using tokio::spawn. Otherwise, it might deadlock due to a dependency loop within a single executor task.

Carefully read the documentation of the channel methods you call, to check if they lock. For example, tokio::sync::watch::Receiver::borrow holds a read lock, so the borrowed data should always be cloned. Use Arc for efficient clones if needed.

Never have two active watch borrow guards in the same scope, because that can cause a deadlock. The watch::Sender may start acquiring a write lock while the first borrow guard is active but the second one isn't. That means that the first read lock was acquired, but the second never will be because starting to acquire the write lock blocks any other read locks from being acquired. At the same time, the write lock will also never finish acquiring, because it waits for all read locks to be released, and the first read lock won't be released before the second read lock is acquired.
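The "clone, then release" pattern can be sketched with the standard library. This example uses std::sync::RwLock as a stand-in for the read lock inside a watch channel, so that it has no external dependencies. It is not tokio's API, and snapshot and snapshot_pair are hypothetical helper names.

```rust
use std::sync::{Arc, RwLock};

/// Returns a cheap clone of the shared value, holding the read lock
/// for as short a time as possible.
pub fn snapshot<T>(shared: &RwLock<Arc<T>>) -> Arc<T> {
    let guard = shared.read().expect("lock poisoned by a panicking thread");
    // Cloning an `Arc` only increments a reference count.
    let value = Arc::clone(&*guard);
    // Release the read lock before returning, so writers are not blocked.
    drop(guard);
    value
}

/// Reads two shared values without ever holding two read guards at once.
pub fn snapshot_pair<T>(a: &RwLock<Arc<T>>, b: &RwLock<Arc<T>>) -> (Arc<T>, Arc<T>) {
    // The first guard is dropped inside `snapshot`, before the second is acquired.
    let first = snapshot(a);
    let second = snapshot(b);
    (first, second)
}
```

Because the caller only holds Arc clones, a writer can update the shared value while the old snapshot is still in use.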

In all of these cases:

  • make critical sections as short as possible, and
  • do not depend on other tasks or locks inside the critical section.

Acquiring Service Readiness

Note: do not call poll_ready on multiple tasks, then match against the results. Use the ready! macro instead, to acquire service readiness in a consistent order.

Buffer and Batch

The constraints imposed by the tower::Buffer and tower::Batch implementations are:

  1. poll_ready must be called at least once for each call
  2. Once we've reserved a buffer slot, we always get Poll::Ready from a buffer, regardless of the current readiness of the buffer or its underlying service
  3. The Buffer/Batch capacity limits the number of concurrently waiting tasks. Once this limit is reached, further tasks will block, awaiting a free reservation.
  4. Some tasks can depend on other tasks before they resolve. (For example: block validation.) If there are task dependencies, the Buffer/Batch capacity must be larger than the maximum number of concurrently waiting tasks, or Zebra could deadlock (hang).

We also avoid hangs because:

  • the timeouts on network messages, block downloads, and block verification will restart verification if it hangs
  • Buffer and Batch release their reservations when the response future is returned by the buffered/batched service, even if the returned future hangs
    • in general, we should move as much work into futures as possible, unless the design requires sequential calls
  • larger Buffer/Batch bounds

Buffered Services

A service should be provided wrapped in a Buffer if:

  • it is a complex service
  • it has multiple callers, or
  • it has a single caller that calls it multiple times concurrently.

Services might also have other reasons for using a Buffer. These reasons should be documented.

Choosing Buffer Bounds

Zebra's Buffer bounds should be set to the maximum number of concurrent requests, plus 1:

it's advisable to set bound to be at least the maximum number of concurrent requests the Buffer will see https://docs.rs/tower/0.4.3/tower/buffer/struct.Buffer.html#method.new

The extra slot protects us from future changes that add an extra caller, or extra concurrency.

As a general rule, Zebra Buffers should all have at least 5 slots, because most Zebra services can be called concurrently by:

  • the sync service,
  • the inbound service, and
  • multiple concurrent zebra-client blockchain scanning tasks.

Services might also have other reasons for a larger bound. These reasons should be documented.

We should limit Buffer lengths for services whose requests or responses contain Blocks (or other large data items, such as Transaction vectors). A long Buffer full of Blocks can significantly increase memory usage.

For example, parsing a malicious 2 MB block can take up to 12 MB of RAM. So a 5 slot buffer can use 60 MB of RAM.
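The bound and memory rules above can be written as a small worked calculation. This is only a sketch: MIN_BUFFER_BOUND, buffer_bound, and worst_case_buffer_memory_mb are hypothetical names, not Zebra constants or functions.

```rust
/// The general minimum number of slots suggested in this section.
pub const MIN_BUFFER_BOUND: usize = 5;

/// The maximum number of concurrent requests, plus one spare slot,
/// but never less than the general minimum.
pub fn buffer_bound(max_concurrent_requests: usize) -> usize {
    std::cmp::max(max_concurrent_requests + 1, MIN_BUFFER_BOUND)
}

/// Worst-case memory used by a full buffer, in megabytes.
pub fn worst_case_buffer_memory_mb(bound: usize, max_item_mb: usize) -> usize {
    bound * max_item_mb
}
```

For example, worst_case_buffer_memory_mb(5, 12) gives the 60 MB figure quoted above, and a service with 7 concurrent callers would get a bound of 8.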

Long Buffers can also increase request latency. Latency isn't a concern for Zebra's core use case as node software, but it might be an issue if wallets, exchanges, or block explorers want to use Zebra.

Awaiting Multiple Futures

When awaiting multiple futures, Zebra can use biased or unbiased selection.

Typically, we prefer unbiased selection, so that if multiple futures are ready, they each have a chance of completing. But if one of the futures needs to take priority (for example, cancellation), you might want to use biased selection.

Unbiased Selection

The futures::select! and tokio::select! macros select ready arguments at random by default.

To poll a tokio::select! in order, pass biased; as the first argument to the macro. The futures crate provides a separate select_biased! macro for ordered polling.

Also consider the FuturesUnordered stream for unbiased selection of a large number of futures. However, this macro and stream require mapping all arguments to the same type.

Consider mapping the returned type to a custom enum with module-specific names.

Biased Selection

The futures::select is biased towards its first argument. If the first argument is always ready, the second argument will never be returned. (This behavior is not documented or guaranteed.) This bias can cause starvation or hangs. Consider edge cases where queues are full, or there are a lot of messages. If in doubt:

  • put shutdown or cancel oneshots first, then timers, then other futures
  • use the select! macro to ensure fairness

Select's bias can be useful to ensure that cancel oneshots and timers are always executed first. Consider the select_biased! macro and FuturesOrdered stream for guaranteed ordered selection of futures. (However, this macro and stream require mapping all arguments to the same type.)

The futures::select Either return type is complex, particularly when nested. This makes code hard to read and maintain. Map the Either to a custom enum.

Replacing Atomics with Channels

If you're considering using atomics, prefer a safe, tested, portable abstraction, like tokio's watch or oneshot channels.

In Zebra, we try to use safe abstractions, and write obviously correct code. It takes a lot of effort to write, test, and maintain low-level code. Almost all of our performance-critical code is in cryptographic libraries. And our biggest performance gains from those libraries come from async batch cryptography.

We are gradually replacing atomics with channels in Zebra.

Atomic Risks

Some atomic sizes and atomic operations are not available on some platforms. Others come with a performance penalty on some platforms.

It's also easy to use a memory ordering that's too weak. Future code changes might require a stronger memory ordering. But it's hard to test for these kinds of memory ordering bugs.

Some memory ordering bugs can only be discovered on non-x86 platforms. And when they do occur, they can be rare. x86 processors guarantee strong orderings, even for Relaxed accesses. Since Zebra's CI all runs on x86 (as of June 2021), our tests get AcqRel orderings, even when we specify Relaxed. But ARM processors like the Apple M1 implement weaker memory orderings, including genuinely Relaxed access. For more details, see the hardware reordering section of the Rust nomicon.

But if a Zebra feature requires atomics:

  1. use an AtomicUsize with the strongest memory ordering (SeqCst)
  2. use a weaker memory ordering, with:
    • a correctness comment,
    • multithreaded tests with a concurrency permutation harness like loom, on x86 and ARM, and
    • benchmarks to prove that the low-level code is faster.

Tokio's watch channel uses SeqCst for reads and writes to its internal "version" atomic. So Zebra should do the same.
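If a feature does need an atomic, the first option could look like this sketch: an AtomicUsize where every access uses SeqCst. VersionCounter is a hypothetical type, loosely modelled on the "version" counter idea mentioned above. It is not tokio's or Zebra's implementation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// A hypothetical version counter shared between threads.
#[derive(Debug, Default)]
pub struct VersionCounter {
    version: AtomicUsize,
}

impl VersionCounter {
    /// Increments the version, and returns the new value.
    pub fn bump(&self) -> usize {
        // `SeqCst` is the strongest ordering, so no correctness comment
        // about weaker orderings is needed.
        self.version.fetch_add(1, Ordering::SeqCst) + 1
    }

    /// Returns the current version.
    pub fn current(&self) -> usize {
        self.version.load(Ordering::SeqCst)
    }
}
```

Switching any of these accesses to a weaker ordering would need the correctness comment, tests, and benchmarks listed above.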

Testing Async Code

Zebra's existing acceptance and integration tests will catch most hangs and deadlocks.

Some tests are only run after merging to main. If a recently merged PR fails on main, we revert the PR, and fix the failure.

Some concurrency bugs only happen intermittently. Zebra developers should run regular full syncs to ensure that their code doesn't cause intermittent hangs. This is particularly important for code that modifies Zebra's highly concurrent crates:

  • zebrad
  • zebra-network
  • zebra-state
  • zebra-consensus
  • tower-batch-control
  • tower-fallback

Monitoring Async Code

Zebra uses the following crates for monitoring and diagnostics:

These introspection tools are also useful during testing:

  • tracing logs individual events
    • spans track related work through the download and verification pipeline
  • metrics monitors overall progress and error rates
    • labels split counters or gauges into different categories (for example, by peer address)

Drawbacks

Implementing and reviewing these constraints creates extra work for developers. But concurrency bugs slow down every developer, and impact users. And diagnosing those bugs can take a lot of developer effort.

Unresolved questions

Can we catch these bugs using automated tests?

How can we diagnose these kinds of issues faster and more reliably?

Summary

This document describes how to verify the Zcash chain and transaction value pools in Zebra.

Motivation

In the Zcash protocol there are consensus rules that:

  • prohibit negative chain value pools ZIP-209, and
  • restrict the creation of new money to a specific number of coins in each coinbase transaction. Spec Section 3.4

These rules make sure that a fixed amount of Zcash is created by each block, even if there are vulnerabilities in some shielded pools.

(Checking the coins created by coinbase transactions and funding streams is out of scope for this design.)

Definitions

Transaction Value Balances

  • transaction value pool - The unspent input value in a transaction. Inputs add value, outputs remove value, and value balances modify value. The pool represents the sum of transparent and shielded inputs, minus the sum of transparent and shielded outputs.
  • value balance - The change in a transaction's value pool. There is a separate value balance for each transparent and shielded pool.
  • transparent value balance - The change in the transaction value pool, due to transparent inputs and outputs. The sum of the UTXOs spent by transparent inputs in tx_in fields, minus the sum of newly created outputs in tx_out fields.
  • sprout value balance - The change in the transaction value pool, due to sprout JoinSplits. The sum of all v_sprout_new fields, minus the sum of all v_sprout_old fields.
  • sapling value balance - The change in the transaction value pool, due to sapling Spends and Outputs. Equal to the valueBalanceSapling field.
  • orchard value balance - The change in the transaction value pool, due to orchard Actions. Equal to the valueBalanceOrchard field.
  • remaining transaction value - The unspent value in the transaction value pool. The sum of the transparent and shielded value balances in each transaction. This value is equal to the transaction value pool after we know the values of all the input UTXOs.
  • coinbase transaction - A transaction which spends newly created value (coinbase), and the remaining value of other transactions in its block (miner fees). Coinbase transactions do not have any other inputs, so they can't spend the outputs of other transactions.

Chain Value Pools

Note: chain value pools and transaction value balances have opposite signs.

  • chain value pool balance - The total value of unspent outputs in the chain, for each transparent and shielded pool. The sum of all block chain value pool changes in the chain. Each of the transparent, sprout, sapling, and orchard chain value pool balances must be non-negative.
  • block chain value pool change - The change in the chain value pools caused by a block. The negative sum of all the value balances in each block.
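The sign convention in these definitions can be sketched for a single pool, using plain i64 zatoshi values instead of Zebra's Amount type. Both function names are hypothetical, for illustration only.

```rust
/// Returns the block chain value pool change for one pool: the negative sum
/// of that pool's value balances over every transaction in the block.
///
/// Returns `None` on integer overflow.
pub fn block_chain_value_pool_change(tx_value_balances: &[i64]) -> Option<i64> {
    let mut sum: i64 = 0;
    for balance in tx_value_balances {
        sum = sum.checked_add(*balance)?;
    }
    sum.checked_neg()
}

/// Applies a block's change to one chain value pool balance.
///
/// Returns `None` if the balance would overflow or become negative,
/// which means the block must be rejected.
pub fn updated_chain_value_pool_balance(current: i64, change: i64) -> Option<i64> {
    let updated = current.checked_add(change)?;
    if updated >= 0 {
        Some(updated)
    } else {
        None
    }
}
```

For example, a transaction with a sapling value balance of -50 moves 50 zatoshi into the sapling chain value pool, so its contribution to the block's chain value pool change is +50.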

Guide-level explanation

Transaction Value Balances

Each transaction has an individual value pool, containing its unspent input value.

Spent transparent inputs add value to this pool, and newly created transparent outputs remove value. Similarly, Sprout JoinSplits have a field that adds value to the transaction pool, and a field that removes value. These transparent and sprout values are unsigned.

Sapling and Orchard have a single signed value balance per transaction, which modifies the transaction value pool.

We need to check that each transaction's total output value is less than or equal to the total input value. The remaining value in the transaction must not be negative. This makes sure that transactions can only spend up to their total input value. (Only coinbase transactions can create new value.)

In the spec, this is called the remaining value in the transparent transaction value pool. But in Zebra, we don't assign this value to a specific pool. We just call it the transaction value pool.
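As a rough sketch of this check, the remaining value is the sum of the four value balances, and it must not be negative. The example uses plain i64 zatoshi fields rather than Zebra's Amount type, and TxValueBalance is a hypothetical name.

```rust
/// A hypothetical, simplified set of value balances for one transaction, in zatoshi.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct TxValueBalance {
    /// Spent transparent inputs, minus newly created transparent outputs.
    pub transparent: i64,
    /// Value added to the transaction pool by JoinSplits, minus value removed.
    pub sprout: i64,
    /// The signed sapling value balance.
    pub sapling: i64,
    /// The signed orchard value balance.
    pub orchard: i64,
}

impl TxValueBalance {
    /// Returns the remaining transaction value, or `None` if the sum
    /// overflows or is negative.
    pub fn remaining_transaction_value(&self) -> Option<u64> {
        let total = self
            .transparent
            .checked_add(self.sprout)?
            .checked_add(self.sapling)?
            .checked_add(self.orchard)?;

        if total >= 0 {
            Some(total as u64)
        } else {
            None
        }
    }
}
```

A transaction that spends 100 zatoshi of transparent inputs and moves 60 into sapling has a remaining value of 40, which is available as a miner fee. A transaction with no inputs and a sapling value balance of -1 fails the check.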

Chain Value Pools

There is one chain value pool for transparent funds, and one for each kind of shielded transfer, containing their unspent outputs.

These value pools are updated using chain value pool changes, which are the negation of transaction value balances. (Transaction value balances use unspent input value, but chain value balances use unspent outputs.)

Each of the chain value pools can change its value with every block added to the chain. This is a state feature, and Zebra handles it in the zebra-state crate. We propose to store the pool values for the finalized tip height on disk.

We need to check each chain value pool as blocks are added to the chain, to make sure that chain balances never go negative.

Summary of the implementation:

  • Create a new type ValueBalance that will contain Amounts for each pool (transparent, sprout, sapling, orchard).
  • Create value_balance() methods on each relevant submodule (transparent, joinsplit, sapling and orchard).
  • Create a value_balance() method in transaction with all the above and in block with all the transaction value balances.
  • Pass the value balance of the incoming block into the state.
  • Get a previously stored value balance.
  • With both values check the consensus rules (constraint violations).
  • Update the saved values for the new tip.

Reference-level explanation

Consensus rules

Shielded Chain Value Pools

Consensus rules:

If any of the "Sprout chain value pool balance", "Sapling chain value pool balance", or "Orchard chain value pool balance" would become negative in the block chain created as a result of accepting a block, then all nodes MUST reject the block as invalid.

Nodes MAY relay transactions even if one or more of them cannot be mined due to the aforementioned restriction.

https://zips.z.cash/zip-0209#specification

Transparent Transaction Value Pool & Remaining Value

The unspent input value in a transaction: the sum of the transaction value balances.

Consensus rules:

Transparent inputs to a transaction insert value into a transparent transaction value pool associated with the transaction, and transparent outputs remove value from this pool.

As in Bitcoin, the remaining value in the transparent transaction value pool of a non-coinbase transaction is available to miners as a fee. The remaining value in the transparent transaction value pool of a coinbase transaction is destroyed.

The remaining value in the transparent transaction value pool MUST be nonnegative.

https://zips.z.cash/protocol/protocol.pdf#transactions

In Zebra, the remaining value in non-coinbase transactions is not assigned to any particular pool, until a miner spends it as part of a coinbase output.

Sprout Chain Value Pool

Consensus rules:

Each JoinSplit transfer can be seen, from the perspective of the transparent transaction value pool, as an input and an output simultaneously.

v_sprout_old takes value from the transparent transaction value pool and v_sprout_new adds value to the transparent transaction value pool. As a result, v_sprout_old is treated like an output value, whereas v_sprout_new is treated like an input value.

As defined in ZIP-209, the Sprout chain value pool balance for a given block chain is the sum of all v_sprout_old field values for transactions in the block chain, minus the sum of all v_sprout_new fields values for transactions in the block chain.

If the Sprout chain value pool balance would become negative in the block chain created as a result of accepting a block, then all nodes MUST reject the block as invalid.

https://zips.z.cash/protocol/protocol.pdf#joinsplitbalance

Sapling Chain Value Pool

Consensus rules:

A positive Sapling balancing value takes value from the Sapling transaction value pool and adds it to the transparent transaction value pool. A negative Sapling balancing value does the reverse. As a result, positive vbalanceSapling is treated like an input to the transparent transaction value pool, whereas negative vbalanceSapling is treated like an output from that pool.

As defined in ZIP-209, the Sapling chain value pool balance for a given block chain is the negation of the sum of all valueBalanceSapling field values for transactions in the block chain.

If the Sapling chain value pool balance would become negative in the block chain created as a result of accepting a block, then all nodes MUST reject the block as invalid.

https://zips.z.cash/protocol/protocol.pdf#saplingbalance

Orchard Chain Value Pool

Consensus rules:

Orchard introduces Action transfers, each of which can optionally perform a spend, and optionally perform an output. Similarly to Sapling, the net value of Orchard spends minus outputs in a transaction is called the Orchard balancing value, measured in zatoshi as a signed integer vbalanceOrchard.

vbalanceOrchard is encoded in a transaction as the field valueBalanceOrchard. If a transaction has no Action descriptions, vbalanceOrchard is implicitly zero. Transaction fields are described in § 7.1 ‘Transaction Encoding and Consensus’ on p. 116.

A positive Orchard balancing value takes value from the Orchard transaction value pool and adds it to the transparent transaction value pool. A negative Orchard balancing value does the reverse. As a result, positive vbalanceOrchard is treated like an input to the transparent transaction value pool, whereas negative vbalanceOrchard is treated like an output from that pool.

Similarly to the Sapling chain value pool balance defined in ZIP-209, the Orchard chain value pool balance for a given block chain is the negation of the sum of all valueBalanceOrchard field values for transactions in the block chain.

If the Orchard chain value pool balance would become negative in the block chain created as a result of accepting a block, then all nodes MUST reject the block as invalid.

https://zips.z.cash/protocol/protocol.pdf#orchardbalance

Transparent Chain Value Pool

Consensus rule:

Transfers of transparent value work essentially as in Bitcoin

https://zips.z.cash/protocol/protocol.pdf#overview

There is no explicit Zcash consensus rule that the transparent chain value pool balance must be non-negative. But an equivalent rule must be enforced by Zcash implementations, so that each block only creates a fixed amount of coins.

Specifically, this rule can be derived from other consensus rules:

  • a transparent output must have a non-negative value,
  • a transparent input can only spend an unspent transparent output,
  • so, there must be a non-negative remaining value in the transparent transaction value pool.

Some of these consensus rules are derived from Bitcoin, so they may not be documented in the Zcash Specification.

Coinbase Transactions

In this design, we assume that all coinbase outputs are valid, to avoid checking the newly created coinbase value, and the miner fees.

The coinbase value and miner fee rules will be checked as part of a future design.

Exceptions and Edge Cases

Value pools and value balances include the value of all unspent outputs, regardless of whether they can actually be spent.

For example:

  • transparent outputs which have unsatisfiable lock scripts
  • shielded outputs which have invalid private keys

However, some value is not part of any output:

  • if created value or miner fees are not spent in a coinbase transaction, they are destroyed
  • since coinbase transaction output values are rounded to the nearest zatoshi, any fractional part of miner-controlled or funding stream outputs is destroyed by rounding

Therefore:

  • the total of all chain value pools will always be strictly less than MAX_MONEY, and
  • the current total of all chain value pools will always be less than or equal to the number of coins created in coinbase transactions.

These properties are implied by other consensus rules, and do not need to be checked separately.

Proposed Implementation

Create a new ValueBalance type

  • Code will be located in a new file: zebra-chain/src/value_balance.rs.
  • Supported operators apply to all the Amounts inside the type: +, -, +=, -=, sum().
  • Implementations of the above operators are similar to the ones implemented for Amount<C> in zebra-chain/src/amount.rs. In particular, we want them to return a Result, so we can return an error when a constraint is violated.
  • We will use Default to represent a totally empty ValueBalance. This is the state of all pools at the genesis block.
#[serde(bound = "C: Constraint")]
struct ValueBalance<C = NegativeAllowed> {
    transparent: Amount<C>,
    sprout: Amount<C>,
    sapling: Amount<C>,
    orchard: Amount<C>,
}

impl ValueBalance {
    /// [Consensus rule]: The remaining value in the transparent transaction value pool MUST be nonnegative.
    ///
    /// This rule applies to Block and Mempool transactions.
    ///
    /// [Consensus rule]: https://zips.z.cash/protocol/protocol.pdf#transactions
    fn remaining_transaction_value(&self) -> Result<Amount<NonNegative>, Err> {
        // This rule checks that the sum of the transparent, sprout, sapling, and orchard
        // value balances in a transaction is nonnegative
        (self.transparent + self.sprout + self.sapling + self.orchard)?.constrain::<NonNegative>()
    }
}

impl<C> Add<ValueBalance<C>> for Result<ValueBalance<C>>
where
    C: Constraint,
{

}

impl<C> Sub<ValueBalance<C>> for Result<ValueBalance<C>>
where
    C: Constraint,
{

}

impl<C> AddAssign<ValueBalance<C>> for Result<ValueBalance<C>>
where
    C: Constraint,
{

}

impl<C> SubAssign<ValueBalance<C>> for Result<ValueBalance<C>>
where
    C: Constraint,
{

}

impl<C> Sum<ValueBalance<C>> for Result<ValueBalance<C>>
where
    C: Constraint,
{

}

impl<C> Default for ValueBalance<C>
where
    C: Constraint,
{

}
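To illustrate the idea of operators that return a Result, here is a simplified, self-contained sketch. It uses i64 fields and a single range check in place of Amount<C> and its constraints. SimpleValueBalance, BalanceError, and checked_amount_add are hypothetical names, and the -MAX_MONEY..=MAX_MONEY range is an assumption standing in for the NegativeAllowed constraint.

```rust
use std::ops::Add;

/// The maximum amount of zatoshi: 21 million coins, with 10^8 zatoshi per coin.
pub const MAX_MONEY: i64 = 21_000_000 * 100_000_000;

/// A hypothetical error returned when a constraint is violated.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BalanceError {
    OutOfRange,
}

/// A simplified value balance, with one signed zatoshi amount per pool.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct SimpleValueBalance {
    pub transparent: i64,
    pub sprout: i64,
    pub sapling: i64,
    pub orchard: i64,
}

/// Adds two amounts, checking for overflow and for the assumed valid range.
fn checked_amount_add(a: i64, b: i64) -> Result<i64, BalanceError> {
    let sum = a.checked_add(b).ok_or(BalanceError::OutOfRange)?;
    if (-MAX_MONEY..=MAX_MONEY).contains(&sum) {
        Ok(sum)
    } else {
        Err(BalanceError::OutOfRange)
    }
}

/// `balance + balance` applies the operator to every pool,
/// and fails if any pool violates the constraint.
impl Add for SimpleValueBalance {
    type Output = Result<SimpleValueBalance, BalanceError>;

    fn add(self, rhs: SimpleValueBalance) -> Self::Output {
        Ok(SimpleValueBalance {
            transparent: checked_amount_add(self.transparent, rhs.transparent)?,
            sprout: checked_amount_add(self.sprout, rhs.sprout)?,
            sapling: checked_amount_add(self.sapling, rhs.sapling)?,
            orchard: checked_amount_add(self.orchard, rhs.orchard)?,
        })
    }
}

/// `result + balance` lets additions be chained: an earlier error is passed through.
impl Add<SimpleValueBalance> for Result<SimpleValueBalance, BalanceError> {
    type Output = Result<SimpleValueBalance, BalanceError>;

    fn add(self, rhs: SimpleValueBalance) -> Self::Output {
        self? + rhs
    }
}
```

With both impls, an expression like (a + b) + c type-checks, and the caller handles a single Result at the end. The default value is the identity for addition, which matches its use as the empty state at the genesis block.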

Create a method in Transaction that returns ValueBalance<NegativeAllowed> for the transaction

We first add value_balance() methods in all the modules we need and use them to get the value balance for the whole transaction.

Create a method in Input that returns ValueBalance<NegativeAllowed>

  • Method location is at zebra-chain/src/transparent.rs.
  • The method needs utxos. This information is available in verify_transparent_inputs_and_outputs.
  • If the utxos are not available in the block or state, verification will time out and return an error.
impl Input {
    fn value_balance(&self, utxos: &HashMap<OutPoint, Utxo>) -> ValueBalance<NegativeAllowed> {

    }
}

Create a method in Output that returns ValueBalance<NegativeAllowed>

  • Method location is at zebra-chain/src/transparent.rs.
impl Output {
    fn value_balance(&self) -> ValueBalance<NegativeAllowed> {

    }
}

Create a method in JoinSplitData that returns ValueBalance<NegativeAllowed>

  • Method location is at zebra-chain/src/transaction/joinsplit.rs
pub fn value_balance(&self) -> ValueBalance<NegativeAllowed> {

}

Create a method in sapling::ShieldedData that returns ValueBalance<NegativeAllowed>

  • Method location is at zebra-chain/src/transaction/sapling/shielded_data.rs
pub fn value_balance(&self) -> ValueBalance<NegativeAllowed> {

}

Create a method in orchard::ShieldedData that returns ValueBalance<NegativeAllowed>

  • Method location is at zebra-chain/src/transaction/orchard/shielded_data.rs
pub fn value_balance(&self) -> ValueBalance<NegativeAllowed> {

}

Create the Transaction method

  • Method location: zebra-chain/src/transaction.rs
  • Method will use all the value_balance() methods we created until now.
/// utxos must contain the utxos of every input in the transaction,
/// including UTXOs created by earlier transactions in this block.
pub fn value_balance(&self, utxos: &HashMap<transparent::OutPoint, Utxo>) -> ValueBalance<NegativeAllowed> {

}

Create a method in Block that returns ValueBalance<NegativeAllowed> for the block

  • Method location is at zebra-chain/src/block.rs.
  • Method will make use of Transaction::value_balance method created before.
/// utxos must contain the utxos of every input in the block,
/// including UTXOs created by a transaction in this block,
/// then spent by a later transaction that's also in this block.
pub fn value_balance(&self, utxos: &HashMap<transparent::OutPoint, Utxo>) -> ValueBalance<NegativeAllowed> {
    self.transactions()
        // Each transaction needs the utxos to calculate its transparent value balance
        .map(|transaction| transaction.value_balance(utxos))
        .sum()
        .expect("Each block should have at least one coinbase transaction")
}

Check the remaining transaction value consensus rule

  • Do the check in zebra-consensus/src/transaction.rs
  • Make the check part of the basic checks
..
// Check the remaining transaction value consensus rule:
tx.value_balance().remaining_transaction_value()?;
..

Pass the value balance for this block from the consensus into the state

  • Add a new field into PreparedBlock, located at zebra-state/src/request.rs. This is the NonFinalized section of the state.
pub struct PreparedBlock {
    ..
    /// The value balances for each pool for this block.
    pub block_value_balance: ValueBalance<NegativeAllowed>,
}
  • In zebra-consensus/src/block.rs pass the value balance to the zebra-state:
let block_value_balance = block.value_balance();
let prepared_block = zs::PreparedBlock {
    ..
    block_value_balance,
};

Add a value pool into the state Chain struct

  • This is the value pool for the non-finalized part of the blockchain.
  • Location of the Chain structure where the pool field will be added: zebra-state/src/service/non_finalized_state/chain.rs
pub struct Chain {
    ..
    /// The chain value pool balance at the tip of this chain.
    value_pool: ValueBalance<NonNegative>,
}
  • Add a new argument finalized_tip_value_balance to the commit_new_chain() method located in the same file.
  • Pass the new argument to the Chain in:
let mut chain = Chain::new(finalized_tip_history_tree, finalized_tip_value_balance);

Note: We don't need to pass the finalized tip value balance into the commit_block() method.

Check the consensus rules when the chain is updated or reversed

  • Location: zebra-state/src/service/non_finalized_state/chain.rs
impl UpdateWith<ValueBalance<NegativeAllowed>> for Chain {
    fn update_chain_state_with(&mut self, value_balance: &ValueBalance<NegativeAllowed>) -> Result<(), Err> {
        // Apply the block's value balance to the chain value pool
        self.value_pool = (self.value_pool + value_balance)?;
        Ok(())
    }
    fn revert_chain_state_with(&mut self, value_balance: &ValueBalance<NegativeAllowed>) -> Result<(), Err> {
        // Reverting is the inverse of updating, so the value balance is subtracted
        self.value_pool = (self.value_pool - value_balance)?;
        Ok(())
    }
}
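The following sketch treats updating and reverting as inverse operations on the chain value pools, and rejects any change that would make a pool negative. It uses an array of four i64 zatoshi values (transparent, sprout, sapling, orchard) instead of ValueBalance. SketchChain and PoolError are hypothetical names, and the methods here return Result<(), PoolError>, unlike the trait methods in the design.

```rust
/// A hypothetical error: a chain value pool would overflow or become negative.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PoolError;

/// A simplified chain, tracking only the value pools at its tip.
/// The array order is: transparent, sprout, sapling, orchard.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SketchChain {
    pub value_pool: [i64; 4],
}

impl SketchChain {
    /// Returns the pools after adding `sign * change`, without modifying `self`.
    fn checked_apply(&self, change: &[i64; 4], sign: i64) -> Result<[i64; 4], PoolError> {
        let mut updated = self.value_pool;
        for (balance, delta) in updated.iter_mut().zip(change) {
            let signed_delta = delta.checked_mul(sign).ok_or(PoolError)?;
            let new_balance = balance.checked_add(signed_delta).ok_or(PoolError)?;
            // Consensus rule: chain value pool balances must be non-negative.
            if new_balance < 0 {
                return Err(PoolError);
            }
            *balance = new_balance;
        }
        Ok(updated)
    }

    /// Applies a block's chain value pool change when the block is added.
    pub fn update_chain_state_with(&mut self, change: &[i64; 4]) -> Result<(), PoolError> {
        self.value_pool = self.checked_apply(change, 1)?;
        Ok(())
    }

    /// Removes a block's chain value pool change when the block is reverted.
    pub fn revert_chain_state_with(&mut self, change: &[i64; 4]) -> Result<(), PoolError> {
        self.value_pool = self.checked_apply(change, -1)?;
        Ok(())
    }
}
```

In this sketch, a failed update leaves the stored pools unchanged, because the new values are only assigned after every pool passes the check. An update followed by a revert of the same change restores the original pools.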

Changes to finalized state

The state service will call commit_new_chain(). We need to pass the value pool from the disk into this function.

self.mem
    .commit_new_chain(prepared, self.disk.history_tree(), self.disk.get_pool())?;

We now detail what is needed in order to have the get_pool() method available.

Serialization of ValueBalance<C>

In order to save ValueBalance into the disk database we must implement IntoDisk and FromDisk for ValueBalance and for Amount:

impl<C: Constraint> IntoDisk for ValueBalance<C> {
    type Bytes = [u8; 32];

    fn as_bytes(&self) -> Self::Bytes {
        // Four 8-byte amounts, in a fixed order: transparent, sprout, sapling, orchard
        [self.transparent.to_bytes(), self.sprout.to_bytes(),
        self.sapling.to_bytes(), self.orchard.to_bytes()].concat().try_into().unwrap()
    }
}

impl<C: Constraint> FromDisk for ValueBalance<C> {
    fn from_bytes(bytes: impl AsRef<[u8]>) -> Self {
        let array: [u8; 32] = bytes.as_ref().try_into().unwrap();
        // Read the amounts back in the same order they were written
        ValueBalance {
            transparent: Amount::from_bytes(array[0..8].try_into().unwrap()),
            sprout: Amount::from_bytes(array[8..16].try_into().unwrap()),
            sapling: Amount::from_bytes(array[16..24].try_into().unwrap()),
            orchard: Amount::from_bytes(array[24..32].try_into().unwrap()),
        }
    }
}

impl IntoDisk for Amount {
    type Bytes = [u8; 8];

    fn as_bytes(&self) -> Self::Bytes {
        self.to_bytes()
    }
}

impl FromDisk for Amount {
    fn from_bytes(bytes: impl AsRef<[u8]>) -> Self {
        let array = bytes.as_ref().try_into().unwrap();
        Amount::from_bytes(array)
    }
}

The above code is going to need a new Amount::from_bytes method.

Add a from_bytes method in Amount

  • Method location is at zebra-chain/src/amount.rs
  • A to_bytes() method already exists. Place from_bytes() right after it.
/// From little endian byte array
///
/// This is a constructor, so it does not take a self argument.
pub fn from_bytes(bytes: [u8; 8]) -> Self {
    let amount = i64::from_le_bytes(bytes).try_into().unwrap();
    Self(amount, PhantomData)
}
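The disk format described above, four little-endian 8-byte amounts in a fixed order, can be sketched and round-trip tested without the database traits. The two functions below are hypothetical helpers that work on plain i64 arrays, in the order transparent, sprout, sapling, orchard.

```rust
/// Serializes four amounts as 32 bytes: each amount is 8 little-endian bytes.
pub fn value_balance_to_bytes(amounts: [i64; 4]) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    for (chunk, amount) in bytes.chunks_exact_mut(8).zip(amounts.iter()) {
        chunk.copy_from_slice(&amount.to_le_bytes());
    }
    bytes
}

/// Deserializes 32 bytes into four amounts, in the same order they were written.
pub fn value_balance_from_bytes(bytes: [u8; 32]) -> [i64; 4] {
    let mut amounts = [0i64; 4];
    for (amount, chunk) in amounts.iter_mut().zip(bytes.chunks_exact(8)) {
        let mut buffer = [0u8; 8];
        buffer.copy_from_slice(chunk);
        *amount = i64::from_le_bytes(buffer);
    }
    amounts
}
```

A round-trip test checks that value_balance_from_bytes(value_balance_to_bytes(x)) returns x, including for negative amounts.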

Changes to zebra-state/src/request.rs

Add a new field to FinalizedBlock:

pub struct FinalizedBlock {
    ..
    /// The value balance for transparent, sprout, sapling and orchard
    /// inside all the transactions of this block.
    pub(crate) block_value_balance: ValueBalance<NegativeAllowed>,
}

Populate it when PreparedBlock is converted into FinalizedBlock:

impl From<PreparedBlock> for FinalizedBlock {
    fn from(prepared: PreparedBlock) -> Self {
        let PreparedBlock {
            ..
            block_value_balance,
        } = prepared;
        Self {
            ..
            block_value_balance,
        }
    }
}

Changes to zebra-state/src/service/finalized_state.rs

First we add a column of type ValueBalance that will contain Amounts for all the pools: transparent, sprout, sapling, orchard:

rocksdb::ColumnFamilyDescriptor::new("tip_chain_value_pool", db_options.clone()),

At block commit (commit_finalized_direct()), we create a handle for the new column:

#![allow(unused)]
fn main() {
let tip_chain_value_pool = self.db.cf_handle("tip_chain_value_pool").unwrap();
}

Next, for every block except the genesis block, we add the block's value balance to the current tip value pool and save the result. For the genesis block, we save the default value balance:

#![allow(unused)]
fn main() {
// Consensus rule: The block height of the genesis block is 0
// https://zips.z.cash/protocol/protocol.pdf#blockchain
if height == block::Height(0) {
    batch.zs_insert(tip_chain_value_pool, height, ValueBalance::default());
} else {
    let current_pool = self.current_value_pool();
    batch.zs_insert(tip_chain_value_pool, height, (current_pool + finalized.block_value_balance)?);
}
}
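The update rule above can be sketched without the database. The example below is an illustration under stated assumptions, not Zebra's implementation: `ValueBalance` is a simplified stand-in holding plain `i64` totals, and `InvalidPool`, `add_pool`, and `next_value_pool` are hypothetical names. It models the `?` in the snippet above as a checked addition that fails when any pool would become negative.

```rust
/// Simplified stand-in for `ValueBalance`: one signed zatoshi total per pool.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct ValueBalance {
    pub transparent: i64,
    pub sprout: i64,
    pub sapling: i64,
    pub orchard: i64,
}

/// Hypothetical error naming the pool that would become invalid.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidPool(pub &'static str);

/// Adds a block's change to one pool, rejecting overflow and negative totals.
fn add_pool(name: &'static str, pool: i64, change: i64) -> Result<i64, InvalidPool> {
    match pool.checked_add(change) {
        Some(total) if total >= 0 => Ok(total),
        _ => Err(InvalidPool(name)),
    }
}

/// Computes the value pool to store for the block at `height`.
pub fn next_value_pool(
    height: u32,
    current: ValueBalance,
    block: ValueBalance,
) -> Result<ValueBalance, InvalidPool> {
    // The genesis block (height 0) stores the default, all-zero balance.
    if height == 0 {
        return Ok(ValueBalance::default());
    }
    Ok(ValueBalance {
        transparent: add_pool("transparent", current.transparent, block.transparent)?,
        sprout: add_pool("sprout", current.sprout, block.sprout)?,
        sapling: add_pool("sapling", current.sapling, block.sapling)?,
        orchard: add_pool("orchard", current.orchard, block.orchard)?,
    })
}
```

In this sketch, a block whose balance would take a pool below zero returns an error instead of a new pool, so nothing would be written for that block.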

The current_value_pool() function gets the value pool stored at the tip:

#![allow(unused)]
fn main() {
pub fn current_value_pool(&self) -> ValueBalance<NonNegative> {
    // Sketch only: reading the stored value through this handle is elided.
    self.db.cf_handle("tip_chain_value_pool")
}
}

Test Plan

Unit tests

  • Create a transaction that has a negative remaining value.
    • Test that the transaction fails the verification in Transaction::value_balance()
    • To avoid passing UTXOs, use 0 as the transparent pool amount and a negative amount for one of the shielded pools.

Prop tests

  • Create a chain strategy that ends up with a valid value balance for all the pools (transparent, sprout, sapling, orchard)
    • Test that the amounts are all added to disk.
  • Add new blocks that will make each pool become negative.
    • Test for constraint violations in the value balances for each case.
    • Failures should be at update_chain_state_with().
  • Test consensus rule successes and failures in revert_chain_state_with()
    • TODO: how?
  • Serialize and deserialize ValueBalance using IntoDisk and FromDisk
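
For the serialization item, a round-trip check might look like the sketch below. It assumes the 32-byte layout used by the FromDisk implementation earlier in this section (transparent, sprout, sapling, orchard, 8 little-endian bytes each). The `ValueBalance` struct and the two free functions are hypothetical stand-ins for the real type and its IntoDisk and FromDisk implementations.

```rust
use std::convert::TryInto;
use std::ops::Range;

/// Simplified stand-in for `ValueBalance`: one signed zatoshi total per pool.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ValueBalance {
    pub transparent: i64,
    pub sprout: i64,
    pub sapling: i64,
    pub orchard: i64,
}

/// Hypothetical stand-in for `IntoDisk::as_bytes`: four 8-byte
/// little-endian fields, in pool order.
pub fn value_balance_to_bytes(balance: &ValueBalance) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    bytes[0..8].copy_from_slice(&balance.transparent.to_le_bytes());
    bytes[8..16].copy_from_slice(&balance.sprout.to_le_bytes());
    bytes[16..24].copy_from_slice(&balance.sapling.to_le_bytes());
    bytes[24..32].copy_from_slice(&balance.orchard.to_le_bytes());
    bytes
}

/// Hypothetical stand-in for `FromDisk::from_bytes`.
/// Panics if the input is not exactly 32 bytes long.
pub fn value_balance_from_bytes(bytes: impl AsRef<[u8]>) -> ValueBalance {
    let array: [u8; 32] = bytes.as_ref().try_into().unwrap();
    // Reads one 8-byte little-endian field from the given byte range.
    let pool = |range: Range<usize>| -> i64 {
        i64::from_le_bytes(array[range].try_into().unwrap())
    };
    ValueBalance {
        transparent: pool(0..8),
        sprout: pool(8..16),
        sapling: pool(16..24),
        orchard: pool(24..32),
    }
}
```

A property test could generate arbitrary balances and assert that `value_balance_from_bytes(value_balance_to_bytes(&balance))` equals the original balance.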

Manual tests

  • Zebra must sync to the tip, computing all value balances without ever breaking the value pool rules.

Future Work

Add an extra state request to verify the speculative chain balance after applying a Mempool transaction. (This is out of scope for our current NU5 and mempool work.)

Note: The chain value pool balance rules apply to Block transactions, but they are optional for Mempool transactions:

Nodes MAY relay transactions even if one or more of them cannot be mined due to the aforementioned restriction.

https://zips.z.cash/zip-0209#specification

Since Zebra does chain value pool balance validation in the state, we want to skip verifying the speculative chain balance of Mempool transactions.

API Reference

The Zcash Foundation maintains the following API documentation for Zebra: