
Benchmarks

How Stowage performs under load. The full benchmark harness lives at benchmarks/. Numbers below come from the published results files at the time of v1.0.

#Setup

Captured 2026-04-26 to 2026-04-29.

  • 16 concurrent workers per case.
  • 15 s per case.
  • Stowage and the upstream MinIO each constrained to 1 CPU and 200 MiB via cgroup limits in benchmarks/docker-compose.bench.yml.
  • GOMEMLIMIT=180MiB on both processes so Go's GC stays inside the cgroup.
  • Bench client and both servers shared the same host; loopback, no TLS.
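The harness sets the memory limit through the GOMEMLIMIT environment variable. A Go process can also set the same soft limit itself with runtime/debug.SetMemoryLimit (Go 1.19+). The sketch below only illustrates that equivalence. The helper name applyMemLimit is hypothetical and does not come from Stowage's code.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// benchMemLimit mirrors GOMEMLIMIT=180MiB: a soft limit 20 MiB below the
// 200 MiB cgroup cap, leaving headroom for memory outside the Go runtime's
// accounting.
const benchMemLimit int64 = 180 << 20 // 188743680 bytes

// applyMemLimit sets the Go runtime's soft memory limit and returns the
// previous value. Passing a negative value only queries the current limit.
func applyMemLimit(limit int64) int64 {
	return debug.SetMemoryLimit(limit)
}

func main() {
	prev := applyMemLimit(benchMemLimit)
	// As the heap approaches the limit, the GC runs more often instead of
	// letting the process grow until the cgroup OOM-kills it.
	fmt.Printf("soft memory limit: %d -> %d bytes\n", prev, applyMemLimit(-1))
}
```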

#Dashboard endpoints

From benchmarks/results.md (single CPU, 16 concurrent workers):

| Endpoint | rps | p50 (ms) | p99 (ms) |
|---|---:|---:|---:|
| GET /healthz | 5858 | 1.97 | 24.68 |
| GET /readyz | 5898 | 1.98 | 23.57 |
| GET /api/auth/config | 5615 | 2.06 | 23.57 |
| GET /api/me | 4445 | 2.65 | 26.22 |
| GET /api/backends | 5360 | 2.16 | 24.55 |
| GET /api/backends/{id}/buckets | 1078 | 7.15 | 69.54 |
| GET …/objects | 791 | 9.03 | 80.05 |
| HEAD …/object | 965 | 7.50 | 73.24 |
| GET …/object (1 KiB) | 807 | 9.33 | 77.27 |
| POST /auth/login/local | 5.9 | 110.5 | 331.4 |
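Each case runs 16 closed-loop workers, so the rps and latency columns are linked by Little's law: mean latency ≈ workers / rps. The sketch below shows that arithmetic. It assumes every worker sends its next request as soon as the previous one completes, with no think time. The function name impliedMeanLatencyMs is hypothetical.

```go
package main

import "fmt"

// impliedMeanLatencyMs applies Little's law (L = λW, so W = L/λ) to a
// closed-loop load generator: with `workers` requests always in flight and a
// measured throughput of `rps`, each request spends workers/rps seconds in
// the system on average. Assumes no think time between requests.
func impliedMeanLatencyMs(workers int, rps float64) float64 {
	return float64(workers) / rps * 1000
}

func main() {
	// rps and p50 from the dashboard table (16 workers per case).
	rows := []struct {
		name     string
		rps, p50 float64
	}{
		{"GET /healthz", 5858, 1.97},
		{"GET /api/backends/{id}/buckets", 1078, 7.15},
		{"GET …/object (1 KiB)", 807, 9.33},
	}
	for _, r := range rows {
		fmt.Printf("%-32s implied mean %6.2f ms, measured p50 %5.2f ms\n",
			r.name, impliedMeanLatencyMs(16, r.rps), r.p50)
	}
}
```

For GET /healthz this gives about 2.7 ms against a 1.97 ms p50. A mean above the median is consistent with the long tail in the p99 column.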

Login is intentionally slow. Each attempt computes an argon2id hash with m=65536 (64 MiB of memory per hash), and a limiter caps attempts at 10 per 15 minutes per IP. Inside the 200 MiB container, login concurrency cannot safely exceed 1 without the server being OOM-killed.
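One way to implement a 10-attempts/15-min/IP limiter is a sliding log of accepted attempts per IP. The sketch below is a minimal version of that approach. The names loginLimiter, Allow and demoStart are hypothetical, as is the choice not to count refused attempts; none of this is Stowage's implementation. If a limiter like this is checked before hashing, refused attempts never trigger an argon2id computation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Budget from the docs: 10 attempts per 15 minutes per IP.
const (
	loginAttemptLimit = 10
	loginWindow       = 15 * time.Minute
)

// demoStart is a fixed clock reading so the demo is deterministic.
var demoStart = time.Date(2026, 4, 26, 12, 0, 0, 0, time.UTC)

// loginLimiter is a hypothetical sliding-log limiter: it keeps the timestamps
// of accepted attempts per IP and refuses a new one once `limit` of them fall
// inside the trailing `window`.
type loginLimiter struct {
	mu       sync.Mutex
	limit    int
	window   time.Duration
	attempts map[string][]time.Time // accepted attempts per IP, oldest first
}

func newLoginLimiter(limit int, window time.Duration) *loginLimiter {
	return &loginLimiter{limit: limit, window: window, attempts: make(map[string][]time.Time)}
}

// Allow reports whether an attempt from ip at time now is within budget and,
// if so, records it. Refused attempts are not recorded.
func (l *loginLimiter) Allow(ip string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	cutoff := now.Add(-l.window)
	recent := l.attempts[ip]
	i := 0
	for i < len(recent) && !recent[i].After(cutoff) {
		i++ // this attempt has slid out of the window
	}
	recent = recent[i:]

	if len(recent) >= l.limit {
		l.attempts[ip] = recent
		return false
	}
	l.attempts[ip] = append(recent, now)
	return true
}

func main() {
	lim := newLoginLimiter(loginAttemptLimit, loginWindow)
	for n := 1; n <= loginAttemptLimit+1; n++ {
		fmt.Printf("attempt %2d allowed: %v\n", n, lim.Allow("203.0.113.7", demoStart))
	}
	// One nanosecond past the window, the IP's budget is restored.
	fmt.Println("after window:", lim.Allow("203.0.113.7", demoStart.Add(loginWindow+1)))
}
```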

#S3 proxy

From benchmarks/results-s3proxy.md:

| Case | rps | p50 (ms) | p99 (ms) |
|---|---:|---:|---:|
| Proxy ListBuckets (synthesised) | 8932 | 1.34 | 6.15 |
| Proxy HeadBucket | 1637 | 4.33 | 66.02 |
| Proxy ListObjectsV2 | 722 | 9.58 | 82.93 |
| Proxy HeadObject | 1070 | 6.16 | 74.45 |
| Proxy GetObject 1 KiB | 875 | 7.91 | 78.35 |
| Proxy GetObject 1 MiB | 214 | 79.89 | 217.38 |
| Proxy GetObject (presigned) | 908 | 6.88 | 79.50 |
| Proxy GetObject (anonymous) | 984 | 6.17 | 78.03 |
| Proxy PutObject 1 KiB | 566 | 13.40 | 88.36 |
| Proxy PutObject 1 MiB | 140 | 104.47 | 272.97 |
| Proxy DeleteObject | 1205 | 6.08 | 68.98 |
| Proxy Auth Failure (bad sig) | 10645 | 0.75 | 29.86 |
| Proxy Scope Violation | 7672 | 1.41 | 15.96 |

#Proxy vs raw MinIO (head-to-head)

Under matched 1 CPU / 200 MiB constraints, the proxy adds 1–3 ms to p50 latency and costs 0–11 % of throughput on upstream-bound calls compared with talking directly to MinIO. PutObject is the exception: both the 1 KiB and 1 MiB cases are faster through the proxy than direct (+8 % and +18 % rps respectively). Synthesised paths (ListBuckets, scope reject, bad-sig reject) are much faster than MinIO's equivalent reject paths because the proxy answers them without ever calling the upstream.

The detailed per-case comparison lives at benchmarks/results-comparison-proxy.md.
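Synthesised paths are cheap because the proxy decides locally and forwards only the requests that pass both checks. The sketch below shows this with net/http/httputil's ReverseProxy. The functions verify and inScope are hypothetical stand-ins for Stowage's SigV4 verifier and scope check. In the demo, a header replaces real signature verification.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
	"sync/atomic"
)

// proxyHandler answers auth and scope failures locally and forwards
// everything else. verify and inScope are hypothetical stand-ins for
// Stowage's SigV4 verifier and scope check.
func proxyHandler(upstream *url.URL,
	verify func(*http.Request) (principal string, ok bool),
	inScope func(principal, bucket string) bool) http.Handler {

	forward := httputil.NewSingleHostReverseProxy(upstream)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		principal, ok := verify(r)
		if !ok {
			// Real S3 returns an XML error body; plain text keeps the sketch short.
			http.Error(w, "SignatureDoesNotMatch", http.StatusForbidden)
			return
		}
		// Path-style addressing: the first path segment names the bucket.
		bucket, _, _ := strings.Cut(strings.TrimPrefix(r.URL.Path, "/"), "/")
		if bucket != "" && !inScope(principal, bucket) {
			http.Error(w, "AccessDenied", http.StatusForbidden)
			return
		}
		forward.ServeHTTP(w, r) // only this branch costs an upstream round trip
	})
}

// runDemo sends a bad-signature, an out-of-scope, and an in-scope request
// through the handler and reports their status codes and upstream hits.
func runDemo() (codes []int, upstreamHits int64) {
	var hits atomic.Int64
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		hits.Add(1)
		w.WriteHeader(http.StatusOK)
	}))
	defer upstream.Close()
	target, _ := url.Parse(upstream.URL)

	// Demo-only auth: a header names the principal; absent means a bad signature.
	verify := func(r *http.Request) (string, bool) {
		p := r.Header.Get("X-Demo-Principal")
		return p, p != ""
	}
	inScope := func(principal, bucket string) bool { return bucket == "team-"+principal }

	h := proxyHandler(target, verify, inScope)
	for _, tc := range []struct{ principal, path string }{
		{"", "/team-a/key"},  // bad signature
		{"a", "/team-b/key"}, // outside scope
		{"a", "/team-a/key"}, // forwarded
	} {
		req := httptest.NewRequest(http.MethodGet, tc.path, nil)
		if tc.principal != "" {
			req.Header.Set("X-Demo-Principal", tc.principal)
		}
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, req)
		codes = append(codes, rec.Code)
	}
	return codes, hits.Load()
}

func main() {
	codes, hits := runDemo()
	fmt.Println("status codes:", codes, "upstream hits:", hits)
}
```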

#Where the perf work landed

Three commits' worth of changes, each guided by pprof profiles taken under bench load:

| Stage | Fix |
|---|---|
| 1 | Bespoke http.Transport (256 idle connections per host, HTTP/2) and a batched audit recorder. The dominant fix: before it, pprof showed the dial storm at ~52 % of CPU. |
| 2 | SigV4 derived-signing-key cache with secret-fingerprint binding. The 4-step HMAC chain becomes a single HMAC on cache hits. |
| 3 | audit.sampling.proxy_success_read_rate defaults to 0.0, so successful proxy reads are no longer audited by default. |
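SigV4 derives a per-day signing key from the secret through four chained HMAC-SHA256 steps (date, region, service, "aws4_request"). It then signs the string-to-sign with one more HMAC. The sketch below caches the derived key. It assumes that "secret-fingerprint binding" means each cache entry is keyed on a SHA-256 fingerprint of the secret, so a rotated secret can never hit a stale key. All names are hypothetical, and this is not Stowage's code.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
	"sync"
)

func hmacSHA256(key []byte, data string) []byte {
	m := hmac.New(sha256.New, key)
	m.Write([]byte(data))
	return m.Sum(nil)
}

// deriveSigningKey is the standard SigV4 derivation: four chained HMACs.
func deriveSigningKey(secret, date, region, service string) []byte {
	kDate := hmacSHA256([]byte("AWS4"+secret), date)
	kRegion := hmacSHA256(kDate, region)
	kService := hmacSHA256(kRegion, service)
	return hmacSHA256(kService, "aws4_request")
}

// cacheKey identifies a derived key. fingerprint is a SHA-256 of the secret,
// so a rotated secret misses the cache instead of reusing a stale key.
type cacheKey struct {
	accessKey, date, region, service string
	fingerprint                      [sha256.Size]byte
}

// signingKeyCache is unbounded for brevity; a real cache would also evict
// entries for past dates.
type signingKeyCache struct {
	mu   sync.Mutex
	keys map[cacheKey][]byte
}

func newSigningKeyCache() *signingKeyCache {
	return &signingKeyCache{keys: make(map[cacheKey][]byte)}
}

// get returns the cached signing key or derives and stores it; the bool
// reports whether it was a cache hit.
func (c *signingKeyCache) get(accessKey, secret, date, region, service string) ([]byte, bool) {
	k := cacheKey{accessKey: accessKey, date: date, region: region, service: service,
		fingerprint: sha256.Sum256([]byte(secret))}
	c.mu.Lock()
	defer c.mu.Unlock()
	if key, ok := c.keys[k]; ok {
		return key, true
	}
	key := deriveSigningKey(secret, date, region, service)
	c.keys[k] = key
	return key, false
}

// signature is the single HMAC left to compute on a cache hit.
func signature(signingKey []byte, stringToSign string) []byte {
	return hmacSHA256(signingKey, stringToSign)
}

func main() {
	c := newSigningKeyCache()
	for _, secret := range []string{"secret-v1", "secret-v1", "secret-v2"} {
		key, hit := c.get("AKIDEXAMPLE", secret, "20260426", "us-east-1", "s3")
		sig := signature(key, "example string to sign")
		fmt.Printf("secret=%s hit=%v sig=%x\n", secret, hit, sig[:8])
	}
}
```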

#What you can do to make it faster

For the proxy:

  1. Pool the response-stream copy buffer. Alloc profiling shows io.copyBuffer at ~51 % of total bytes allocated on the read path; a sync.Pool of 32 KiB buffers would halve it.
  2. Replace the outbound aws-sdk-go-v2 v4.Signer with a hand-rolled signer that shares the verifier's signing-key cache, for a ~3–4 % allocation win.
  3. Move the audit DB to its own SQLite file. Today, audit writes and main-database writes share one mutex.

These are tracked in Roadmap.
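The first suggestion could look roughly like this: a sync.Pool of 32 KiB buffers passed to io.CopyBuffer. The names pooledCopy, copyBufPool and roundTrip are hypothetical. The sketch does not measure the allocation saving claimed above.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// copyBufSize matches the 32 KiB buffer io.Copy allocates when it has to
// fall back to its own buffer.
const copyBufSize = 32 << 10

// copyBufPool recycles copy buffers across requests. It stores *[]byte so
// that Put does not allocate when the slice is converted to an interface.
var copyBufPool = sync.Pool{
	New: func() any {
		b := make([]byte, copyBufSize)
		return &b
	},
}

// pooledCopy behaves like io.Copy but borrows its buffer from the pool.
// Note: if src implements io.WriterTo or dst implements io.ReaderFrom,
// io.CopyBuffer delegates to those and the pooled buffer goes unused.
func pooledCopy(dst io.Writer, src io.Reader) (int64, error) {
	bp := copyBufPool.Get().(*[]byte)
	defer copyBufPool.Put(bp)
	return io.CopyBuffer(dst, src, *bp)
}

// roundTrip copies s through pooledCopy. Wrapping the reader in
// struct{ io.Reader } hides strings.Reader's WriteTo, and strings.Builder has
// no ReadFrom, so the copy really goes through the pooled buffer.
func roundTrip(s string) (string, int64, error) {
	var dst strings.Builder
	n, err := pooledCopy(&dst, struct{ io.Reader }{strings.NewReader(s)})
	return dst.String(), n, err
}

func main() {
	out, n, err := roundTrip("object body streamed to the client")
	fmt.Println(out, n, err)
}
```

The WriterTo/ReaderFrom caveat matters on a real proxy path: whether the pool helps depends on the concrete reader and writer types involved.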

#How to run the benchmarks yourself

```shell
cd benchmarks
docker compose -f docker-compose.bench.yml up -d --build
./run.sh
```

The harness is in benchmarks/bench.go (dashboard) and benchmarks/s3proxybench/ (S3 proxy). Output is JSON under benchmarks/results-*.json; the markdown summary files are generated by ./check.

#Calibration: what to read into these numbers

  • Single 15 s sample per case. ±10–15 % run-to-run variance is normal.
  • cgroup v1 doesn't enforce CPU as strictly as v2 — the "1 CPU" cap is approximate.
  • Real deployments add TLS termination on both sides (roughly +4 ms at p50). As upstream RTT grows, the ratio of Stowage latency to direct latency shrinks toward 1, while Stowage's absolute overhead stays roughly constant.
  • The four S3-shaped endpoints with a 1:1 MinIO mapping are the ones with a fair head-to-head comparison. Stowage's /api/me, /api/auth/config, etc. have no MinIO equivalent.
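The point about ratio versus absolute overhead is simple arithmetic. The sketch below assumes a fixed 2 ms proxy overhead, a hypothetical value taken from the 1–3 ms p50 range measured against MinIO, and uses a few hypothetical direct latencies.

```go
package main

import "fmt"

// relativeOverhead returns the fraction by which a fixed per-request proxy
// overhead inflates end-to-end latency when the direct path takes directMs.
func relativeOverhead(directMs, proxyOverheadMs float64) float64 {
	return proxyOverheadMs / directMs
}

func main() {
	const overheadMs = 2.0 // hypothetical: within the 1–3 ms p50 range
	for _, directMs := range []float64{2, 10, 50} {
		fmt.Printf("direct %5.1f ms -> proxied %5.1f ms (+%.0f%%)\n",
			directMs, directMs+overheadMs, 100*relativeOverhead(directMs, overheadMs))
	}
}
```

The absolute cost stays at 2 ms in every row, while its share of total latency falls as the direct path gets slower.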