Architecture
Goals
- Researchers: reproducible, statistically defensible comparisons (CIs, effect sizes, non-parametric tests).
- Serializer authors: A/B compare versions of the same serializer (--compare-a/--compare-b).
- System integrators: swap payloads and environments; keep the same metrics pipeline.
- Extensibility: add languages without rewriting analysis.
Layout
config/benchmark_config.yaml # master parameters
schemas/ # shared test data config + protos
logs/<language>/ # per-language CSV outputs
analysis/ # scientific stats + reports
python/ | c-sharp/ | rust/ | c/ | javascript/ # language harnesses
docs/ # theory + per-language serializer docs
scripts/run-all-benchmarks.sh # orchestrator
Measurement model
prepare(type)                 # untimed: codecs, schemas, buffers
obj = prepare_data(fixture)   # untimed: convert to serializer-native model
for i in 0..N-1:
    t0 = now()
    bytes = serialize(obj)
    t1 = now()
    obj2 = deserialize(bytes)
    t2 = now()
    assert fidelity(fixture, obj2)   # untimed: runs after t2
    log(ser=t1-t0, deser=t2-t1, size=len(bytes), i)
Repetition i=0 is warmup (excluded from aggregates).
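
As an illustration only, the loop above might look like this in Python. The sketch uses the standard-library json module as a stand-in serializer, and the names run_benchmark, drop_warmup, ser_ns, and deser_ns are hypothetical; the actual harnesses and CSV column names may differ.

```python
import json
import time


def run_benchmark(fixture, repetitions=5):
    """Time serialize and deserialize separately for each repetition."""
    # Untimed preparation: json needs no codec setup, so only the data
    # conversion step is shown (the fixture is copied into a plain dict).
    obj = dict(fixture)

    rows = []
    for i in range(repetitions):
        t0 = time.perf_counter_ns()
        payload = json.dumps(obj).encode("utf-8")
        t1 = time.perf_counter_ns()
        obj2 = json.loads(payload)
        t2 = time.perf_counter_ns()

        # The fidelity check runs after t2, so it never counts toward timings.
        assert obj2 == fixture, f"round-trip mismatch at repetition {i}"
        rows.append({
            "i": i,
            "ser_ns": t1 - t0,
            "deser_ns": t2 - t1,
            "size": len(payload),
        })
    return rows


def drop_warmup(rows):
    """Exclude repetition 0 (the warmup) before computing aggregates."""
    return [row for row in rows if row["i"] != 0]
```

Keeping the warmup row in the raw log and dropping it only at analysis time means the logged data stays complete, while aggregates are not skewed by first-call effects such as lazy initialization.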
Statistics pipeline
- Parse CSV (normalize ticks → ns for C#)
- Drop warmup
- IQR outlier filter (configurable)
- Mean/median/std/MAD/CV/percentiles
- Bootstrap CI on mean (default 95%, 2000 resamples)
- Within-group effect sizes vs fastest (Cliff's δ, Hedges' g)
- Optional Mann–Whitney + Holm for version A/B
See Analysis Methodology.
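
To make a few of the pipeline steps concrete, here is a minimal standard-library sketch of the IQR filter, a percentile bootstrap CI on the mean, and the two effect sizes. The function names, the 1.5 × IQR fence, the percentile bootstrap variant, and the small-sample correction used for Hedges' g are assumptions chosen for illustration; the code in analysis/ may use different formulas or libraries. The Mann–Whitney and Holm steps are not shown.

```python
import math
import random
import statistics


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    lo, hi = q1 - k * spread, q3 + k * spread
    return [v for v in values if lo <= v <= hi]


def bootstrap_ci_mean(values, confidence=0.95, resamples=2000, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    lo_idx = max(0, int(alpha * resamples))
    hi_idx = min(resamples - 1, int((1.0 - alpha) * resamples) - 1)
    return means[lo_idx], means[hi_idx]


def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all pairs, in [-1, 1]."""
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    return (greater - less) / (len(a) * len(b))


def hedges_g(a, b):
    """Cohen's d with pooled variance, scaled by a small-sample correction."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    d = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)
    correction = 1.0 - 3.0 / (4.0 * (na + nb) - 9.0)  # common approximation
    return d * correction
```

With timings as input, calling cliffs_delta(candidate, fastest) or hedges_g(candidate, fastest) gives the "vs fastest" comparison: values near 0 mean the candidate is indistinguishable from the fastest serializer, and larger positive values mean it tends to be slower.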
Configuration
Most constants live in config/benchmark_config.yaml. Test-data shape lives in schemas/test_data_config.json (shared with existing generators).
Adding languages
See Adding a Language.