Benchmarks
How Stowage performs under load. The full benchmark harness lives at
benchmarks/.
Numbers below come from the published results files at the time of
v1.0.
#Setup
Captured 2026-04-26 to 2026-04-29.
- 16 concurrent workers per case.
- 15 s per case.
- Stowage and the upstream MinIO each constrained to 1 CPU and
200 MiB via cgroup limits in
benchmarks/docker-compose.bench.yml. GOMEMLIMIT=180MiB on both processes so Go's GC stays inside the cgroup.
- Bench client and both servers shared the same host; loopback, no TLS.
#Dashboard endpoints
From benchmarks/results.md. Single-CPU, 16-concurrency:
| Endpoint | rps | p50 (ms) | p99 (ms) |
|---|---|---|---|
| GET /healthz | 5858 | 1.97 | 24.68 |
| GET /readyz | 5898 | 1.98 | 23.57 |
| GET /api/auth/config | 5615 | 2.06 | 23.57 |
| GET /api/me | 4445 | 2.65 | 26.22 |
| GET /api/backends | 5360 | 2.16 | 24.55 |
| GET /api/backends/{id}/buckets | 1078 | 7.15 | 69.54 |
| GET …/objects | 791 | 9.03 | 80.05 |
| HEAD …/object | 965 | 7.50 | 73.24 |
| GET …/object (1 KiB) | 807 | 9.33 | 77.27 |
| POST /auth/login/local | 5.9 | 110.5 | 331.4 |
Login is intentionally slow — argon2id m=65536 per hash, capped by
a 10-attempts/15-min/IP limiter. Login concurrency cannot safely
exceed 1 inside the 200 MiB container without OOM-killing the
server.
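
The memory pressure behind the login limit can be worked through with simple arithmetic. The sketch below assumes the argon2id m parameter is expressed in KiB, as in RFC 9106 and golang.org/x/crypto/argon2, so m=65536 is 64 MiB per hash. It counts the hash buffers only; the server's own heap and GC slack come on top, so the practical ceiling is lower than these figures suggest. The function names are hypothetical.

```go
package main

import "fmt"

const (
	argon2MemoryKiB = 65536 // m parameter from the text; the KiB unit is an assumption
	containerMiB    = 200   // cgroup memory cap from the Setup section
)

// hashMemoryMiB converts the argon2id m parameter from KiB to MiB.
func hashMemoryMiB(mKiB int) int { return mKiB / 1024 }

// hashBuffersMiB is the memory held by `concurrent` in-flight hashes.
func hashBuffersMiB(concurrent int) int {
	return concurrent * hashMemoryMiB(argon2MemoryKiB)
}

// remainingMiB is what the container has left for everything else
// (heap, stacks, GC overhead) while those hashes are running.
func remainingMiB(concurrent int) int {
	return containerMiB - hashBuffersMiB(concurrent)
}

func main() {
	for _, n := range []int{1, 2, 3, 16} {
		fmt.Printf("%2d concurrent logins: %4d MiB of hash buffers, %5d MiB left\n",
			n, hashBuffersMiB(n), remainingMiB(n))
	}
}
```

At the 16-worker concurrency used for the other cases, the hash buffers alone would need 1024 MiB, about five times the container limit.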
#S3 proxy
From benchmarks/results-s3proxy.md:
| Case | rps | p50 (ms) | p99 (ms) |
|---|---|---|---|
| Proxy ListBuckets (synthesised) | 8932 | 1.34 | 6.15 |
| Proxy HeadBucket | 1637 | 4.33 | 66.02 |
| Proxy ListObjectsV2 | 722 | 9.58 | 82.93 |
| Proxy HeadObject | 1070 | 6.16 | 74.45 |
| Proxy GetObject 1 KiB | 875 | 7.91 | 78.35 |
| Proxy GetObject 1 MiB | 214 | 79.89 | 217.38 |
| Proxy GetObject (presigned) | 908 | 6.88 | 79.50 |
| Proxy GetObject (anonymous) | 984 | 6.17 | 78.03 |
| Proxy PutObject 1 KiB | 566 | 13.40 | 88.36 |
| Proxy PutObject 1 MiB | 140 | 104.47 | 272.97 |
| Proxy DeleteObject | 1205 | 6.08 | 68.98 |
| Proxy Auth Failure (bad sig) | 10645 | 0.75 | 29.86 |
| Proxy Scope Violation | 7672 | 1.41 | 15.96 |
#Proxy vs raw MinIO (head-to-head)
Under matched 1 CPU / 200 MiB constraints, the proxy adds 1–3 ms to
p50 and costs 0–11 % of throughput on upstream-bound calls compared
with talking directly to MinIO. PutObject is the exception: both the
1 KiB and 1 MiB cases are faster through the proxy than direct (+8 %
and +18 % rps respectively). Synthesised paths
(ListBuckets, scope reject, bad-sig reject) are much faster than
MinIO's equivalent reject paths because the proxy answers without
ever calling the upstream.
The detailed per-case comparison lives at
benchmarks/results-comparison-proxy.md.
#Where the perf work landed
Three commits' worth, each driven by pprof under bench load:
| Stage | Fix |
|---|---|
| 1 | Bespoke http.Transport (256 idle/host, HTTP/2). Batched audit recorder. Dominant fix — pprof showed the dial storm at ~52 % of CPU before this. |
| 2 | SigV4 derived signing-key cache with secret-fingerprint binding. 4-step HMAC chain → 1-HMAC on cache hits. |
| 3 | audit.sampling.proxy_success_read_rate defaults to 0.0. Successful proxy reads no longer audit by default. |
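
Stage 2 can be illustrated with a minimal sketch of a derived signing-key cache. This is not Stowage's implementation: the type and function names are hypothetical, the cache is unbounded (a real one would evict entries as the date rolls over), and the credentials in main are made-up placeholders. The four-step HMAC chain itself is the standard SigV4 key derivation.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

func hmacSHA256(key, data []byte) []byte {
	m := hmac.New(sha256.New, key)
	m.Write(data)
	return m.Sum(nil)
}

// deriveSigningKey runs the standard four-step SigV4 HMAC chain.
func deriveSigningKey(secret, date, region, service string) []byte {
	kDate := hmacSHA256([]byte("AWS4"+secret), []byte(date))
	kRegion := hmacSHA256(kDate, []byte(region))
	kService := hmacSHA256(kRegion, []byte(service))
	return hmacSHA256(kService, []byte("aws4_request"))
}

// cacheKey identifies a derived key. Including a fingerprint of the secret
// means a rotated secret can never hit an entry derived from the old one.
type cacheKey struct {
	accessKey, date, region, service string
	secretFP                         [sha256.Size]byte
}

// keyCache is a hypothetical, unbounded cache of derived signing keys.
type keyCache struct {
	mu   sync.RWMutex
	keys map[cacheKey][]byte
}

func newKeyCache() *keyCache { return &keyCache{keys: make(map[cacheKey][]byte)} }

// signingKey returns the cached key or derives and stores it on a miss.
// The returned slice is shared, so callers must not modify it.
func (c *keyCache) signingKey(accessKey, secret, date, region, service string) []byte {
	k := cacheKey{
		accessKey: accessKey, date: date, region: region, service: service,
		secretFP: sha256.Sum256([]byte(secret)),
	}
	c.mu.RLock()
	key, ok := c.keys[k]
	c.mu.RUnlock()
	if ok {
		return key
	}
	key = deriveSigningKey(secret, date, region, service)
	c.mu.Lock()
	c.keys[k] = key
	c.mu.Unlock()
	return key
}

// sign computes the signature: on a cache hit this is a single HMAC
// over the string-to-sign.
func (c *keyCache) sign(accessKey, secret, date, region, service, stringToSign string) string {
	key := c.signingKey(accessKey, secret, date, region, service)
	return hex.EncodeToString(hmacSHA256(key, []byte(stringToSign)))
}

func main() {
	c := newKeyCache()
	// Placeholder credentials and strings-to-sign, for illustration only.
	sig1 := c.sign("AKIDEXAMPLE", "example-secret", "20260426", "us-east-1", "s3", "string-to-sign-1")
	sig2 := c.sign("AKIDEXAMPLE", "example-secret", "20260426", "us-east-1", "s3", "string-to-sign-2")
	fmt.Println(sig1, sig2, len(c.keys))
}
```

This sketch fingerprints the secret with one SHA-256 per lookup to keep the code short; an implementation could compute the fingerprint once per credential instead.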
#What you can do to make it faster
For the proxy:
- Pool the response-stream copy buffer. Alloc profiling shows
io.copyBuffer at ~51 % of total bytes allocated on the read path. A sync.Pool of 32 KiB buffers would halve it.
- Replace the outbound
aws-sdk-go-v2/v4.Signer with a hand-rolled signer sharing the verifier's signing-key cache. ~3–4 % alloc win.
- Move the audit DB to its own SQLite file. Today's audit and main writes share one mutex.
These are tracked in Roadmap.
#How to run the benchmarks yourself
    cd benchmarks
    docker compose -f docker-compose.bench.yml up -d --build
    ./run.sh

The harness is in benchmarks/bench.go (dashboard) and
benchmarks/s3proxybench/ (S3 proxy). Output is JSON under
benchmarks/results-*.json; the markdown summary files are
generated by ./check.
#Calibration: what to read into these numbers
- Single 15 s sample per case. ±10–15 % run-to-run variance is normal.
- cgroup v1 doesn't enforce CPU as strictly as v2 — the "1 CPU" cap is approximate.
- Real deployments add TLS termination on both sides (~4 ms p50). The ratio between Stowage and direct shrinks as upstream RTT grows; the absolute Stowage overhead stays roughly constant.
- The four S3-shaped endpoints with a 1:1 MinIO mapping are the
ones with a fair head-to-head comparison. Stowage's
/api/me, /api/auth/config, etc. have no MinIO equivalent.
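
The point about upstream RTT can be made concrete with a small calculation. The sketch below assumes the proxy adds a fixed overhead on top of the upstream round trip, which is a simplification. The 2 ms overhead is an illustrative value inside the 1–3 ms p50 range quoted above, and the upstream latencies are hypothetical, not measurements.

```go
package main

import "fmt"

// slowdown returns proxied latency divided by direct latency, assuming the
// proxy adds a fixed overhead on top of the upstream round trip.
func slowdown(upstreamMs, overheadMs float64) float64 {
	return (upstreamMs + overheadMs) / upstreamMs
}

func main() {
	const overheadMs = 2.0 // illustrative fixed proxy overhead
	for _, rtt := range []float64{5, 20, 100} { // hypothetical upstream latencies
		fmt.Printf("upstream %5.1f ms -> proxied %5.1f ms (x%.2f)\n",
			rtt, rtt+overheadMs, slowdown(rtt, overheadMs))
	}
}
```

Under these assumptions the ratio falls from 1.40 at 5 ms to 1.02 at 100 ms, while the absolute overhead stays at 2 ms.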