ZK System Benchmarking

Jan 11th, 2023

Delendum Research, Miden, RISC Zero

By Bobbin Threadbare, Dominik Schmid, Tim Carstens, Brian Retford, Daniel Lubarov, Andrei Nagornyi and Ventali Tan

Github repository: delendum-xyz/zk-benchmarking


Introduction

The purpose of this benchmarking project is to produce a collection of standard benchmarks for comparing the performance of different zero-knowledge proof libraries, in an easy-to-read format that includes graphs and tables. The first question we ask ourselves is: who is the target audience?

Is it (a) developers who are not necessarily familiar with the internals of ZK systems but want to make an informed decision based on the tradeoffs involved in using a given proving system for a concrete, real-world use case, or (b) individuals who understand the broad underlying primitives and are interested in the performance of key cryptographic building blocks?

Our goal is to cover both.[1]

Principles

Relevant

Our benchmarks aim to measure the performance of realistic, relevant tasks and use cases. This allows third-party engineers to estimate how each proof system would perform in their application.

Neutral

We do not favor or disfavor any particular proof system. We seek to allow each proof system to show its best features in a variety of realistic test cases.

Idiomatic

We allow each proof system to interpret each benchmark in whatever way makes the most sense for that system. This allows each proof system to showcase its performance when used idiomatically.

Reproducible

Our measurements are automated and reproducible.

What are we testing?

Performance

We measure proving time[2] in various standard hardware environments, thereby allowing each proof system to showcase its real-world behavior when running on commonly available systems.
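To illustrate how such a measurement can be taken, here is a minimal wall-clock timing sketch in Rust. This is not the repository's actual harness, and `prove_hash_chain` is a hypothetical stand-in for a proving system's entry point:

```rust
use std::time::Instant;

// Wall-clock timing wrapper: runs `f` once and prints its elapsed time.
// A real benchmark would also run warm-ups and collect multiple samples.
fn time_it<T>(label: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = f();
    println!("{label}: {:.2?}", start.elapsed());
    result
}

fn main() {
    // `prove_hash_chain` is a hypothetical stand-in for a prover call:
    // let proof = time_it("SHA256 (100)", || prove_hash_chain(100));
    let _sum = time_it("toy workload", || (0..1_000_000u64).sum::<u64>());
}
```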

For each benchmark, we measure proof generation time, proof verification time, and the size of the resulting proof (see the Sample results section below).

We start with smaller computations such as iterated hashing, Merkle inclusion proofs, and recursion, and will eventually move on to larger end-to-end scenarios.

The following graph explains our approach:

[Figure: benchmarking approach]

Definitions

Here are the definitions of the three tests mentioned above:

Iterated hashing

Iterated hashing is an essential building block of Merkle tree structures and, more generally, of any scheme that needs to succinctly commit to large amounts of data.

To benchmark iterated hashing, we compute a hash chain H(H(H(...H(x)))), where H() is a cryptographic hash function and x is some input. As input x we chose the 32-byte value [0_u8; 32]; the number of invocations of H() defines the length of the hash chain.
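For concreteness, here is a minimal Rust sketch of the hash chain, using SHA-256 via the `sha2` crate as an illustrative stand-in for H(); each benchmarked VM implements the chain with its own supported hash functions:

```rust
use sha2::{Digest, Sha256};

// Computes H(H(...H(x))) with `num_iters` invocations of H(),
// starting from the fixed input x = [0_u8; 32].
fn hash_chain(num_iters: usize) -> [u8; 32] {
    let mut data = [0u8; 32];
    for _ in 0..num_iters {
        let digest = Sha256::digest(data);
        data.copy_from_slice(&digest);
    }
    data
}

fn main() {
    // A chain of length 100 corresponds to the "SHA256 (100)" columns below.
    println!("{:?}", hash_chain(100));
}
```

The benchmark then measures how long each system takes to prove that such a chain was computed correctly for a given length.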

Merkle inclusion proof

A Merkle inclusion proof shows that a certain leaf is part of a Merkle tree. The VM is given the leaf and the Merkle path and needs to prove inclusion by iterated hashing. We define the benchmark to pick Merkle paths of depth 32; the job_size is then the number of Merkle paths we verify.
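A minimal sketch of the verification logic follows, again with SHA-256 as an illustrative stand-in for the VM's hash function; the left/right ordering convention, derived from the leaf index bits, is an assumption for illustration:

```rust
use sha2::{Digest, Sha256};

type Digest32 = [u8; 32];

// Recomputes the Merkle root from a leaf, its index, and a depth-32
// authentication path. Inclusion holds iff the result equals the
// committed root.
fn compute_root(leaf: Digest32, index: u64, path: &[Digest32; 32]) -> Digest32 {
    let mut node = leaf;
    for (depth, sibling) in path.iter().enumerate() {
        // The bit of `index` at this depth tells us whether `node`
        // is a left (0) or right (1) child.
        let mut hasher = Sha256::new();
        if (index >> depth) & 1 == 0 {
            hasher.update(node);
            hasher.update(sibling);
        } else {
            hasher.update(sibling);
            hasher.update(node);
        }
        node.copy_from_slice(&hasher.finalize());
    }
    node
}
```

With job_size paths, the VM repeats this check job_size times, so the workload amounts to roughly job_size × 32 hash invocations.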

[Benchmarks will be presented soon]

Recursion

[Benchmarks will be presented soon]

Other factors

The following factors are also very important to consider, and we will try to describe them in the future:

Security

Ease of building new apps

Upgradability

Sample results

Currently, the following ZK systems are benchmarked.

| System | ZKP System | Default Security Level |
| --- | --- | --- |
| Polygon Miden | STARK | 96 bits |
| RISC Zero | STARK | 100 bits |

Prover performance

The table below shows the time it takes to generate a proof for a hash chain of a given length using a given hash function. This time includes the time needed to generate the witness for the computation. Times shown are in seconds. In this and the following tables, each column header gives the hash function and, in parentheses, the hash-chain length; a dash marks configurations that were not benchmarked.

| Prover time (sec) | SHA256 (10) | SHA256 (100) | BLAKE3 (10) | BLAKE3 (100) | RP64_256 (100) | RP64_256 (1000) |
| --- | --- | --- | --- | --- | --- | --- |
| **Apple M2 (4P + 4E cores), 8 GB RAM** | | | | | | |
| Miden VM | 1.91 | 40.39 | 0.96 | 9.87 | 0.05 | 0.28 |
| RISC Zero | 1.29 | 5.48 | – | – | – | – |
| **AWS Graviton 3 (64 cores), 128 GB RAM** | | | | | | |
| Miden VM | 0.49 | 3.99 | 0.33 | 2.06 | 0.05 | 0.13 |
| RISC Zero | 0.40 | 1.59 | – | – | – | – |


Verifier performance

The table below shows the time it takes to verify a proof that a hash chain of a given length was computed correctly with a given hash function. Times shown are in milliseconds.

| Verifier time (ms) | SHA256 (10) | SHA256 (100) | BLAKE3 (10) | BLAKE3 (100) | RP64_256 (100) | RP64_256 (1000) |
| --- | --- | --- | --- | --- | --- | --- |
| **Apple M2 (4P + 4E cores), 8 GB RAM** | | | | | | |
| Miden VM | 2.42 | 3.73 | 2.56 | 2.52 | 2.28 | 2.42 |
| RISC Zero | 1.92 | 2.44 | – | – | – | – |
| **AWS Graviton 3 (64 cores), 128 GB RAM** | | | | | | |
| Miden VM | 3.26 | 3.54 | 3.24 | 3.47 | 2.81 | 3.04 |
| RISC Zero | 3.03 | 4.05 | – | – | – | – |

Proof size

The table below shows the size of a generated proof in kilobytes. Proof sizes do not depend on the platform used to generate proofs.

| Proof size (KB) | SHA256 (10) | SHA256 (100) | BLAKE3 (10) | BLAKE3 (100) | RP64_256 (100) | RP64_256 (1000) |
| --- | --- | --- | --- | --- | --- | --- |
| Miden VM | 87.7 | 105.0 | 81.3 | 98.4 | 56.2 | 71.0 |
| RISC Zero | 183.4 | 205.1 | – | – | – | – |

Roadmap

  1. Identify key performance metrics for evaluating the efficiency of zero-knowledge proof libraries (completed)
  2. Develop a set of test cases that exercise a range of different functionality in the libraries, including creating and verifying proofs for various types of statements (completed)
  3. Implement automation scripts for running the test cases and collecting performance data (completed)
  4. Set up continuous integration to regularly run the benchmarking suite and track performance over time (ongoing)
  5. Collaborate with other organizations to add more libraries and incorporate their test cases and use cases into the benchmarking suite (TODO)
  6. Publish the results of the benchmarking efforts in a public repository, along with documentation and scripts for reproducing the benchmarks (completed)
  7. Continuously update and improve the benchmarking suite as new features and functionality are added to the libraries (ongoing)

Call for collaboration

We are seeking collaborators to help us expand and improve our benchmarking efforts. If you are interested in joining this effort, there are several ways you can contribute.

To get involved, please reach out to us via the repository’s issue tracker or email research@delendum.xyz. We welcome contributions from individuals and organizations of all experience levels and backgrounds. Together, we can create a comprehensive and reliable benchmarking resource for the ZK community.

Notes

  1. You may find Arm's benchmarking whitepaper and the programming language benchmarks game interesting.

  2. We don't measure memory usage right now, but it would be valuable to add in the future.