
dag_exec: a std-only DAG executor for CPU-heavy pipelines (pruning + bounded parallelism)
TL;DR —
dag_exec is a small, std-only executor for CPU-heavy pipelines that are naturally graph-shaped.
You request the outputs you care about, it prunes the rest, and it can run independent nodes in parallel with explicit bounds (no async runtime, no Rayon).
Links: crate + docs + repo
- Crate: https://crates.io/crates/dag_exec
- Docs: https://docs.rs/dag_exec/latest/dag_exec/
- Repo: https://github.com/reymom/rust-dag-executor
Why this crate exists
A lot of “real” pipelines aren’t linear — they’re dependency graphs:
- hashing / commitments / decoding
- multi-stage indexing
- proof plumbing (witness/trace → commitments → aggregation)
- “compute multiple artifacts from the same input” workflows
Two things show up over and over:
- You often don’t need all outputs. Computing everything and throwing most away is wasteful.
- When nodes are CPU-heavy, you want parallelism — but with predictable bounds (no unbounded dispatch / memory blowups).
dag_exec is a small executor for that niche: bounded parallel execution + partial evaluation in a tiny, dependency-free package.
When to use it (and when not to)
Use dag_exec if:
- your pipeline is naturally a dependency graph (DAG)
- nodes are CPU-ish (hashing, decoding, proof steps, indexing stages)
- you want a std-only solution
- you care about explicit bounds: max_in_flight, per-worker queue caps
Don’t use it if:
- your workload is I/O bound and naturally async (Tokio/async is a better fit)
- you just want “parallel map” over independent items (Rayon is simpler)
- you need distributed execution / caching / retries / persistence (that’s a bigger system)
What you get
1) Partial evaluation (pruning)
You run the DAG requesting only the outputs you want; the executor evaluates only the ancestors needed to produce them.
Expected shape (merkle example demo):
- request root → leaf_hash=16, internal_hash=15
- request subtree → leaf_hash=4, internal_hash=3
This is the most important property: fewer requested outputs ⇒ fewer executed nodes.
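To make that property concrete, here is a minimal std-only sketch of the general technique (illustrative, not dag_exec's actual internals): walk dependency edges backwards from the requested outputs and collect only the ancestors that must run. The `required_nodes` helper and the tiny two-level Merkle shape are hypothetical names for illustration.

```rust
use std::collections::{HashMap, HashSet};

// Sketch of output-driven pruning: given node -> dependencies, collect the
// set of nodes that must execute to produce `requested`.
fn required_nodes(
    deps: &HashMap<&str, Vec<&str>>,
    requested: &[&str],
) -> HashSet<String> {
    let mut needed: HashSet<String> = HashSet::new();
    let mut stack: Vec<&str> = requested.to_vec();
    while let Some(node) = stack.pop() {
        // Walk each node's dependencies at most once.
        if needed.insert(node.to_string()) {
            if let Some(parents) = deps.get(node) {
                stack.extend(parents.iter().copied());
            }
        }
    }
    needed
}

fn main() {
    // Two-level tree: root <- (left, right), left <- (a, b), right <- (c, d).
    let mut deps: HashMap<&str, Vec<&str>> = HashMap::new();
    deps.insert("root", vec!["left", "right"]);
    deps.insert("left", vec!["a", "b"]);
    deps.insert("right", vec!["c", "d"]);

    // Requesting the root needs all 7 nodes; requesting only `left` needs 3.
    assert_eq!(required_nodes(&deps, &["root"]).len(), 7);
    assert_eq!(required_nodes(&deps, &["left"]).len(), 3);
    println!("root needs 7 nodes, left subtree needs 3");
}
```

Fewer requested outputs shrinks the frontier before any work is scheduled, which is exactly the 16 → 4 leaf-hash drop in the merkle demo.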
2) Bounded parallelism (std threads)
The parallel executor uses a worker pool and explicit caps:
- max_in_flight: global bound on queued + running tasks
- worker_queue_cap: per-worker bounded queue
This keeps dispatch predictable instead of “spawn forever”.
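As a rough illustration of the bounding idea (a hypothetical sketch, not dag_exec's actual scheduler), std's `mpsc::sync_channel` gives each worker a bounded queue: once a queue holds `WORKER_QUEUE_CAP` tasks, `send` blocks, so the dispatcher can never run ahead unboundedly. `WORKERS` and `WORKER_QUEUE_CAP` are illustrative names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

const WORKERS: usize = 4;
const WORKER_QUEUE_CAP: usize = 2;

fn main() {
    let in_flight = Arc::new(AtomicUsize::new(0));
    let observed_max = Arc::new(AtomicUsize::new(0));
    let (result_tx, result_rx) = mpsc::channel::<u64>();

    let mut task_txs = Vec::new();
    let mut handles = Vec::new();
    for _ in 0..WORKERS {
        // Bounded per-worker queue: `send` blocks when the queue is full.
        let (tx, rx) = mpsc::sync_channel::<u64>(WORKER_QUEUE_CAP);
        task_txs.push(tx);
        let (in_flight, observed_max, result_tx) =
            (in_flight.clone(), observed_max.clone(), result_tx.clone());
        handles.push(thread::spawn(move || {
            for n in rx {
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                observed_max.fetch_max(now, Ordering::SeqCst);
                result_tx.send(n * n).unwrap(); // the "work"
                in_flight.fetch_sub(1, Ordering::SeqCst);
            }
        }));
    }
    drop(result_tx);

    // Round-robin dispatch; backpressure happens inside `send`.
    for n in 0..100u64 {
        task_txs[(n as usize) % WORKERS].send(n).unwrap();
    }
    drop(task_txs); // close queues so workers exit

    let sum: u64 = result_rx.iter().sum();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(sum, (0..100u64).map(|n| n * n).sum());
    // At most WORKERS tasks can ever be running at once.
    assert!(observed_max.load(Ordering::SeqCst) <= WORKERS);
}
```

The design point is that memory for pending work is capped at `WORKERS * WORKER_QUEUE_CAP` tasks, regardless of how many nodes the DAG contains.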
Results at a glance (from examples/rollup.rs):
- Pruning: request one chunk proof ⇒ 16 → 4 leaf hashes
- Parallel (heavy mode): ~3× wall-time improvement @4 workers
- Backpressure: parallel scheduler bounded by max_in_flight + per-worker queue caps
Minimal API (60 seconds)
```rust
use dag_exec::{DagBuilder, Executor, ExecutorConfig};
use std::sync::Arc;

let mut b = DagBuilder::<String, i32, ()>::new();
b.add_source("a".into(), 1)?;
b.add_source("b".into(), 2)?;
b.add_task("c".into(), vec!["a".into(), "b".into()], |xs: &[Arc<i32>]| Ok(*xs[0] + *xs[1]))?;
let dag = b.build()?;

let exec = Executor::new(ExecutorConfig::default());
let out = exec.run_sequential(&dag, ["c".to_string()])?;
assert_eq!(*out["c"], 3);
# Ok::<(), dag_exec::BuildError<String>>(())
```

Examples (copy/paste runnable)
I kept the examples intentionally “infra-shaped”:
1) examples/pipeline.rs — branch pruning by output selection
Demonstrates output-driven pruning across branches (commitment vs receipts, etc).
```
cargo run --example pipeline
DAG_EXEC_HASH_ITERS=200000 cargo run --release --example pipeline
```

2) examples/merkle.rs — pruning-only demo (pure and fast)
Showcases pruning in a basic Merkle tree example:
```
cargo run --example merkle
```

3) examples/rollup.rs — rollup-shaped pipeline (prune + parallel + real numbers)
This is the “flagship”: leaf commitments → chunk roots → batch root, plus an inclusion-proof artifact node. Also includes honest perf numbers showing when parallelism loses (tiny work) and when it wins (heavy work).
```
cargo run --release --example rollup
DAG_EXEC_MAX_WORKERS=4 DAG_EXEC_HASH_ITERS=200000 cargo run --release --example rollup
DAG_EXEC_MAX_WORKERS=4 DAG_EXEC_HASH_ITERS=2000000 cargo run --release --example rollup
```

What to look for (in heavy mode):
- requesting one chunk proof drops work from 16 leaves → 4 leaves
- parallel batch root can drop wall-clock by ~3× at 4 workers (~24ms → ~8ms in heavy mode)
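The batch-root shape behind those numbers can be sketched in a few lines of std-only Rust (illustrative, not examples/rollup.rs itself): hash leaves into chunk roots, then fold chunk roots into a batch root. `heavy_hash` plays the role of DAG_EXEC_HASH_ITERS, making each leaf CPU-heavy enough that one thread per chunk pays off; the XOR combiner is a toy stand-in for real Merkle hashing.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

// Iterated hashing to simulate CPU-heavy leaf work (DAG_EXEC_HASH_ITERS).
fn heavy_hash(data: u64, iters: u32) -> u64 {
    let mut acc = data;
    for _ in 0..iters {
        let mut h = DefaultHasher::new();
        acc.hash(&mut h);
        acc = h.finish();
    }
    acc
}

// Toy chunk root: hash each leaf, fold with XOR (stand-in for a Merkle fold).
fn chunk_root(leaves: &[u64], iters: u32) -> u64 {
    leaves.iter().map(|&l| heavy_hash(l, iters)).fold(0, |a, b| a ^ b)
}

fn main() {
    let leaves: Vec<u64> = (0..16).collect();
    let iters = 10_000;

    // Sequential: one chunk root per 4-leaf chunk.
    let seq: Vec<u64> = leaves.chunks(4).map(|c| chunk_root(c, iters)).collect();

    // Parallel: one thread per chunk — independent work, same result.
    let par: Vec<u64> = leaves
        .chunks(4)
        .map(|c| {
            let c = c.to_vec();
            thread::spawn(move || chunk_root(&c, iters))
        })
        .collect::<Vec<_>>()
        .into_iter()
        .map(|h| h.join().unwrap())
        .collect();

    assert_eq!(seq, par);
    println!("batch root: {:#x}", seq.iter().fold(0u64, |a, b| a ^ b));
}
```

Because the four chunk roots share no edges, a scheduler is free to run them on four workers, which is where the wall-clock win comes from.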
Benchmarks (criterion)
```
cargo bench
```

Notes:
- chain_* has no inherent parallelism → measures scheduler overhead.
- fanout_*_heavy shows the parallel win when nodes are CPU-heavy.
If you want the HTML report locally:
```
# linux
xdg-open target/criterion/report/index.html
```

Design notes & honest tradeoffs
- For tiny per-node work, parallel can be slower (thread + channel overhead dominates).
- This is not for I/O-bound async workloads — Tokio/async is a better fit.
- This is not “Rayon replacement” — it’s a DAG scheduler with pruning + explicit backpressure.
Build log / posts from the implementation
This crate came out of a few focused engineering problems:
- Backpressure in the parallel scheduler (bounded queues + global in-flight cap): https://www.reymom.xyz/blog/rust/2026-02-21-backpressure-in-parallel-executor
- Testing concurrency invariants deterministically (no sleep()): https://www.reymom.xyz/blog/rust/2026-02-24-testing-invariants-atomics
- Rollup-shaped pipeline as a DAG (pruning + proof artifact + real perf numbers): https://www.reymom.xyz/blog/rust/2026-03-01-merkle-rollup-exec-dag
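The "testing without sleep()" idea can be sketched deterministically with std primitives (a hypothetical sketch of the technique, not the crate's actual test suite): instead of sleeping and hoping threads overlap, gate them on a `Barrier` so the overlap is guaranteed, then observe the in-flight counter with atomics.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

fn main() {
    const N: usize = 4;
    let in_flight = Arc::new(AtomicUsize::new(0));
    // Two barriers bracket the observation window deterministically.
    let enter = Arc::new(Barrier::new(N));
    let exit = Arc::new(Barrier::new(N));

    let handles: Vec<_> = (0..N)
        .map(|_| {
            let (in_flight, enter, exit) =
                (in_flight.clone(), enter.clone(), exit.clone());
            thread::spawn(move || {
                in_flight.fetch_add(1, Ordering::SeqCst);
                enter.wait(); // all N threads have incremented by now
                let seen = in_flight.load(Ordering::SeqCst);
                exit.wait(); // no thread decrements before every load
                in_flight.fetch_sub(1, Ordering::SeqCst);
                seen
            })
        })
        .collect();

    // Every thread observed exactly N tasks in flight — no sleep() needed.
    for h in handles {
        assert_eq!(h.join().unwrap(), N);
    }
}
```

The second barrier is the important part: without it, a fast thread could decrement before a slow one loads, making the assertion flaky again.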
If you try dag_exec in a real pipeline (proof systems, indexing, commitments), I’d love to hear what worked and what didn’t.