# Adding a Language Benchmark

This project is designed so that new language harnesses can be added without changing the analysis core.
## 1. Register the language

Edit `config/benchmark_config.yaml`:

```yaml
languages:
  go: # example
    display_name: Go
    enabled: true
    runner_dir: go
    runner_script: scripts/run-benchmarks.sh
    log_dir: logs/go
    time_unit: nanoseconds
    docs_dir: docs/go
    serializers: [...]
```

Also add `paths.language_log_dirs.go: logs/go`.
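A minimal Python sketch of a pre-flight check for the registration above. It assumes the YAML has already been parsed into a dict (for example with PyYAML, not shown) and that `log_dir` should equal the `paths.language_log_dirs` entry, as it does in the `go` example. `check_language_entry` and `REQUIRED_KEYS` are hypothetical names, not part of the project.

```python
# Hypothetical pre-flight check for a languages.<id> entry. The config dict is
# assumed to be already parsed from config/benchmark_config.yaml.
REQUIRED_KEYS = {
    "display_name", "enabled", "runner_dir", "runner_script",
    "log_dir", "time_unit", "docs_dir", "serializers",
}


def check_language_entry(config: dict, lang: str) -> list[str]:
    """Return a list of registration problems for `lang` (empty means OK)."""
    entry = config.get("languages", {}).get(lang)
    if entry is None:
        return [f"languages.{lang} is missing"]
    problems = []
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        problems.append(f"languages.{lang} lacks: {', '.join(sorted(missing))}")
    # New runners are expected to report nanoseconds.
    if entry.get("time_unit") != "nanoseconds":
        problems.append(f"languages.{lang}.time_unit should be nanoseconds")
    # Assumption: the paths entry mirrors log_dir, as in the go example.
    log_dirs = config.get("paths", {}).get("language_log_dirs", {})
    if log_dirs.get(lang) != entry.get("log_dir"):
        problems.append(f"paths.language_log_dirs.{lang} should equal log_dir")
    return problems
```

Returning a list instead of raising on the first problem lets a contributor see every missing piece of the registration in one run.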
## 2. Implement the harness contract

| Requirement | Detail |
|---|---|
| Output CSV | `logs/<lang>/benchmark-log.csv`, following the schema in `csv_schema` |
| Language column | Must match the language id (e.g. `go`) |
| Time unit | Nanoseconds for all new runners |
| Modes | `bytes` and `stream` (or `string`/`stream` if that matches the existing C# convention) |
| Warmup | Repetition index 0 is excluded by the analysis |
| Prepare outside the loop | Schema compilation, type registration, buffer pools; not timed |
| Timed section | Serialize + deserialize only |
| Fidelity | Round-trip semantic check; record failures in `benchmark-errors.csv` |
| ObjectGraph | Skip serializers without cycle support |
| Seed | Read `RandomSeed` from `schemas/test_data_config.json` (or `reproducibility.random_seed` from the benchmark config) |
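The timing rules in the table can be illustrated with a Python sketch of one benchmark case. `run_case`, `LOG_COLUMNS`, and the column order are hypothetical (the real columns come from `csv_schema`), JSON stands in for a serializer under test, and `python` is used as the language id purely for illustration. Setup happens before the call, only serialize + deserialize sit between the two clock reads, the fidelity check runs after the clock stops, and repetition 0 is still written so the analysis can drop it.

```python
import csv
import io
import json
import time
from typing import Any, Callable

# Hypothetical row layout; the authoritative columns come from csv_schema.
LOG_COLUMNS = ["language", "serializer", "test_data", "mode",
               "repetition", "time_ns", "size_bytes"]


def run_case(lang: str, serializer: str, test_data: str, mode: str,
             payload: Any,
             serialize: Callable[[Any], bytes],
             deserialize: Callable[[bytes], Any],
             repetitions: int, log: Any, errors: Any) -> None:
    """Time serialize + deserialize for one case and append rows to CSV writers.

    Expensive setup (schema compilation, type registration, buffer pools)
    must happen before this call so that it stays outside the timed section.
    """
    for rep in range(repetitions):
        start = time.perf_counter_ns()
        blob = serialize(payload)      # timed
        result = deserialize(blob)     # timed
        elapsed = time.perf_counter_ns() - start
        # Fidelity check runs after the clock stops. Skipping the timing row
        # for a failed round-trip is an assumption of this sketch.
        if result != payload:
            errors.writerow([lang, serializer, test_data, mode, rep,
                             "round-trip mismatch"])
            continue
        # Repetition 0 is still written; the analysis drops it as warmup.
        log.writerow([lang, serializer, test_data, mode, rep, elapsed, len(blob)])


# Example: JSON stands in for a real serializer; "python" is an illustrative id.
log_buf, err_buf = io.StringIO(), io.StringIO()
log_writer, error_writer = csv.writer(log_buf), csv.writer(err_buf)
log_writer.writerow(LOG_COLUMNS)
run_case("python", "json", "Person", "bytes", {"name": "Ada", "age": 36},
         lambda obj: json.dumps(obj).encode("utf-8"), json.loads,
         5, log_writer, error_writer)
```

`time.perf_counter_ns` is used because it returns integer nanoseconds from a monotonic clock, matching the required time unit without float conversion.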
## 3. Test data types

Implement equivalents of: `Person`, `Integer`, `Telemetry`, `SimpleObject`, `StringArray`, `EDI_835`, `ObjectGraph`.
Use the same collection sizes as defined in `schemas/test_data_config.json`.
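A Python sketch of seeded, reproducible test data, combining the seed rule from the contract table with the shared collection sizes. The `Person` fields, `resolve_seed`, and `make_people` are hypothetical, and because the key that holds each collection size is not shown in this guide, the count is passed in directly. Note that the same seed only reproduces the same values within one language unless the generator algorithm itself is also shared.

```python
import json
import random
from dataclasses import dataclass


def load_json(path: str) -> dict:
    """Read a JSON file such as schemas/test_data_config.json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def resolve_seed(test_data_config: dict, benchmark_config: dict) -> int:
    """Prefer RandomSeed from test_data_config.json, else reproducibility.random_seed."""
    if "RandomSeed" in test_data_config:
        return int(test_data_config["RandomSeed"])
    return int(benchmark_config["reproducibility"]["random_seed"])


@dataclass
class Person:
    # Hypothetical fields; mirror the shared Person definition in practice.
    name: str
    age: int


def make_people(count: int, seed: int) -> list[Person]:
    """Build `count` people deterministically from `seed`.

    `count` should be the collection size from test_data_config.json.
    """
    rng = random.Random(seed)  # private RNG: other code cannot shift the sequence
    return [Person(name=f"person-{i}", age=rng.randint(18, 90)) for i in range(count)]
```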
## 4. Runner script

`<runner_dir>/scripts/run-benchmarks.sh` must accept one of these modes as its argument: `smoke`, `all-single`, `full`, or `research`.
Map each mode to the repetition count defined for it in `benchmark_config.yaml`.
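The real runner is a shell script; the Python sketch below only illustrates the lookup and validation it needs to perform. It assumes a hypothetical layout in which each mode under `modes` in `benchmark_config.yaml` carries a `repetitions` key; `repetitions_for` is an illustrative name.

```python
VALID_MODES = ("smoke", "all-single", "full", "research")


def repetitions_for(mode: str, modes_config: dict) -> int:
    """Return the repetition count configured for `mode`.

    Assumes a hypothetical layout where each mode has a `repetitions` key,
    e.g. modes: {smoke: {repetitions: 3}}.
    """
    if mode not in VALID_MODES:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of: {', '.join(VALID_MODES)}")
    try:
        return int(modes_config[mode]["repetitions"])
    except KeyError as exc:
        raise ValueError(f"no repetitions configured for mode {mode!r}") from exc
```

Rejecting unknown modes up front keeps a typo such as `ful` from silently running with a default repetition count.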
## 5. Documentation

- `docs/<lang>/index.md`: ecosystem overview
- `docs/<lang>/<lang>_tested_serializers.md`: each serializer, its optimal API, and its limitations
## 6. Wire orchestration

Update `scripts/run-all-benchmarks.sh` to invoke the new runner.

The analysis auto-discovers `logs/<lang>/benchmark-log.csv`. To pass a log explicitly, use:

```bash
analyze-benchmarks --extra-logs go=logs/go/benchmark-log.csv
```
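How auto-discovery and `--extra-logs` might combine can be sketched in Python. The analyzer's internals are not described here, so `parse_extra_logs`, `discover_logs`, the acceptance of several `lang=path` values, and the rule that explicit entries override discovered ones are all assumptions for illustration.

```python
from pathlib import Path


def parse_extra_logs(values: list[str]) -> dict[str, Path]:
    """Parse `lang=path` pairs in the form accepted by --extra-logs."""
    logs = {}
    for value in values:
        lang, sep, path = value.partition("=")
        if not sep or not lang or not path:
            raise ValueError(f"expected lang=path, got {value!r}")
        logs[lang] = Path(path)
    return logs


def discover_logs(log_root: Path, extra: dict[str, Path]) -> dict[str, Path]:
    """Map each logs/<lang>/benchmark-log.csv to its <lang> directory name.

    Explicitly passed entries take precedence (an assumption of this sketch).
    """
    found = {p.parent.name: p for p in sorted(log_root.glob("*/benchmark-log.csv"))}
    found.update(extra)
    return found
```

Keying discovery on the directory name is what makes the `language` column and the `logs/<lang>` directory need to agree.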
## 7. Tests

Add at least these tests:

- A smoke run produces a non-empty CSV.
- All recorded times are positive.
- All required columns are present.
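The three minimum checks can be expressed as one Python helper. `REQUIRED_COLUMNS`, `TIME_COLUMN`, and `check_smoke_log` are hypothetical; substitute the columns and time field defined in `csv_schema`. In a pytest suite, a test could run the runner in `smoke` mode and then assert that `check_smoke_log` returns an empty list for the contents of the produced `benchmark-log.csv`.

```python
import csv
import io

# Hypothetical: substitute the columns and time field defined in csv_schema.
REQUIRED_COLUMNS = {"language", "serializer", "test_data", "mode", "repetition", "time_ns"}
TIME_COLUMN = "time_ns"


def check_smoke_log(csv_text: str) -> list[str]:
    """Return failures for the minimum checks (an empty list means pass)."""
    # newline="" lets the csv module handle \r\n and quoted newlines itself.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    header = set(reader.fieldnames or [])
    missing = REQUIRED_COLUMNS - header
    if missing:
        return [f"missing columns: {', '.join(sorted(missing))}"]
    rows = list(reader)
    if not rows:
        return ["log has no data rows"]
    failures = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        try:
            value = float(row[TIME_COLUMN])
        except (TypeError, ValueError):  # empty, missing, or non-numeric cell
            value = 0.0
        if value <= 0:
            failures.append(f"line {line_no}: non-positive time {row[TIME_COLUMN]!r}")
    return failures
```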