~3× faster runs · 677 hrs/month freed · fleet-wide
neeto-ci-web · epic #3799 · May 2026
Cal-web: full pre/post measurement on add-minitest-distributed (4,024 tests). 8-app measurement: avg of block-pipeline runs (test_pods > 0) from production ci_jobs since 2026-05-19 15:30; baseline = 7–13 May p50.
12,068 default.yml CI runs in 30 days across 100 products; the largest suites dominate:
| Project | Success p50 | p95 | Runs / 30d | Hrs / mo |
|---|---|---|---|---|
| neeto-cal-web | 17.5 min | 37.1 min | 1,751 | ~580 |
| neeto-form-web | 16.5 min | 31.8 min | 544 | ~162 |
| neeto-desk-web | 12.6 min | 25.9 min | 954 | ~224 |
| neeto-invoice-web | 14.3 min | 28.6 min | 125 | ~33 |
| neeto-crm-web | 13.8 min | 23.0 min | 110 | ~26 |
| all 100 products | — | — | 12,068 | ~1,505 |
A successful neeto-cal-web run takes ~20 min today. That's ~580 hours of CI/month on one product alone — and every dev waits on it.
- neetoci-version ruby/node, install services
- node_modules / vendor/bundle / .nvm from S3 → unpack to EBS root
- bundle install, yarn install, start postgres + redis (per-pod apt-install pgvector)
- rake db:create db:schema:load
- bundle exec rails test serially against the suite

Every command runs in a single Kubernetes pod. No fan-out, no parallelism, no shared cache between blocks. EBS root volume is the only filesystem.
Per-step p50 · p95 across 18 pre-optimization neeto-cal-web default.yml runs (7–13 May 2026):
[Chart: per-step p50 · p95 for checkout, neetoci-version ruby, neetoci-version node, postgres start, redis start, cache restore (EBS), bundle install, yarn install, db:create + schema:load, rubocop + erblint, bundle exec rails test, bundle exec rake setup]
rails test ≈ 65% of in-command time (p50). rake setup, cache restore and postgres startup are the next-largest — and every one of them repeats on each parallel pod.
A CI job is now a DAG of blocks. Each block has named jobs; each job is its own K8s job; pods within a job can fan out.
Block 1 (Setup & Checks) runs 2 K8s jobs in parallel. Once it succeeds, Block 2 and Block 3 fan out together (both depend only on Block 1). Block 3's Run tests job is itself 4 K8s pods coordinated via a shared Redis queue.
global_job_config:                     # runs on EVERY pod, before everything
  env_vars: [{ name: TZ, value: UTC }]
  prologue:
    commands: [neetoci-version ruby 4.0.1, checkout, cache restore, bundle install]
blocks:
  - name: Setup & Checks
    dependencies: []                   # entry block
    task:
      jobs:                            # each entry = its own K8s job
        - { name: Install + ESLint, commands: [...] }
        - { name: Auditors + Linters, commands: [...] }
  - name: Tests
    dependencies: [Setup & Checks]     # fan-out from Block 1
    task:
      prologue:                        # block-level prologue, runs once per pod
        commands: [neetoci-service start postgres 18, ...]
      jobs:
        - name: Run tests
          commands: [bundle exec rails test]
          parallelism: 4
          shared_redis: true           # minitest-distributed coordinator
      epilogue:
        always: { commands: [bundle exec rake simplecov_coverage:publish] }
Fully backward compatible. The parser falls back to legacy behavior when task:, dependencies:, or global_job_config: are absent — old flat commands: configs still work unchanged.
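The dependency ordering described for the blocks can be sketched as a small wave scheduler. This is a minimal illustration using Python's standard graphlib, not NeetoCI's actual scheduler; the block name "Build" is a hypothetical stand-in for Block 2, which the config excerpt does not name.

```python
from graphlib import TopologicalSorter


def block_waves(dependencies):
    """Group blocks into waves; every block in a wave can start together."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        # Blocks whose dependencies have all been marked done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Assumption for the sketch: every block in the wave succeeds.
        sorter.done(*ready)
    return waves


# Block name -> blocks it depends on (mirrors `dependencies:` in the YAML).
pipeline = {
    "Setup & Checks": [],
    "Build": ["Setup & Checks"],  # hypothetical name for Block 2
    "Tests": ["Setup & Checks"],
}

print(block_waves(pipeline))
```

With this input the scheduler yields two waves: the entry block alone, then the two blocks that depend only on it.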
Problem.
Every CI pod that started Postgres ran apt-get update && apt-get install postgresql-N-pgvector against the public pgdg repo. Two index updates + a 12 MB package install, ~6–10 s per pod, multiplied by every test pod in the run.
# docker-ci/utils/neetoci-service (pre-fix)
sudo podman exec postgres bash -c \
"apt-get update -qq && apt-get install -y -qq postgresql-${pg_major}-pgvector"
Fix. Rebuilt all postgres:* images in the internal registry with pgvector baked in (issue #3880). Deleted the runtime apt-install line from neetoci-service.
Problem.
Every pod paid for neetoci-version ruby 4.0.1: a tarball download from the in-cluster binaries-cache service, then an unpack into ~/.rbenv/versions/4.0.1. Node went through the same steps via nvm. Postgres + Redis were pulled as podman images on first use. ~30–60 s of overhead per fresh pod.
Fix. New declarative docker-ci/dependencies file (#3879 → PR #3907) and Dockerfile bake steps that pre-install everything into the CI image:
RUBY_VERSIONS=(4.0.1) # pre-installed under ~/.rbenv
NODE_VERSIONS=(22.13) # pre-installed under ~/.nvm, default alias
APT_POSTGRES_VERSIONS=(18 18.3) # pgdg apt + pgvector
REDIS_VERSIONS=(7.0.5) # compiled from source at /opt/redis/7.0.5/
neetoci-version ruby 4.0.1 becomes a pure rbenv switch: 3 s → 0.08 s. neetoci-service start redis 7.0.5 becomes a redis-server --daemonize: 5 s → 30 ms.
Problem.
Same commit, same image, same node family (r8g) — two consecutive runs of cache restore unpacking the same 1.4 GB of node_modules + vendor/bundle + .nvm:
| Run | Download | Unpack node_modules | Unpack vendor/bundle | Total |
|---|---|---|---|---|
| A | 4.5 s | 16 s | 21 s | 26 s |
| B | 7.0 s | 58 s | 64 s | 68 s |
Download barely moved — variance was entirely in the unpack, i.e. writing 1.4 GB into the pod filesystem. Cause: r8g nodes have an EBS-only root. Every pod's scratch I/O — cache unpack, the Postgres data dir, db:schema:load, log files — lands on one network-attached gp3 volume shared by every pod on the node. Run enough pods at once and its IOPS saturate.
Fix. Moved the CI Karpenter NodePool to the r8gd family — same Graviton4 silicon, but with a physically-attached local NVMe SSD. Setting EC2NodeClass.spec.instanceStorePolicy: RAID0 tells Karpenter to RAID the instance-store NVMe and repoint kubelet + containerd ephemeral storage onto it, so pod scratch I/O hits local NVMe instead of contending for shared EBS.
cache restore was slow, not just noisy
Problem.
Even on a fast node, cache restore took ~69 s. The Go cache binary — NeetoCI's fork of SemaphoreCI's cache-cli — restored every key (nvm, gems, yarn-cache, node_modules) one after another. And yarn.lock mapped to a redundant ~/.cache/yarn archive — 1.4 GB, ~30 s — that bought nothing: node_modules is already cached, so yarn install is instant on a hit.
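To see why restoring keys one after another is costly, here is a minimal sketch comparing a sequential restore with a concurrent one. The real binary is written in Go; the Python thread pool below is only a stand-in, and the per-key timings are hypothetical, chosen for illustration rather than measured.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-key restore times in seconds (illustrative, not measured).
RESTORE_SECONDS = {"nvm": 8, "gems": 30, "node_modules": 22}
SCALE = 0.001  # sleep 1 ms per modelled second so the demo finishes quickly


def restore(key):
    """Stand-in for downloading and unpacking one cache key."""
    time.sleep(RESTORE_SECONDS[key] * SCALE)
    return key


def restore_sequential(keys):
    """Restore keys one after another: total time is the sum of all keys."""
    return [restore(key) for key in keys]


def restore_concurrent(keys):
    """Restore independent keys at once: total time is the slowest key."""
    with ThreadPoolExecutor(max_workers=len(keys)) as pool:
        return list(pool.map(restore, keys))


keys = list(RESTORE_SECONDS)
sequential_cost = sum(RESTORE_SECONDS.values())  # every key waits on the previous one
concurrent_cost = max(RESTORE_SECONDS.values())  # bound by the slowest key
print(sequential_cost, concurrent_cost)
```

Under these assumed timings the modelled cost drops from the sum of the keys to the duration of the slowest one.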
Fix. Rebuilt the binary from toolbox PR #2 — shipped as static arm64/amd64 binaries in PR #3892:
- Parallel restore via sync.WaitGroup: independent keys download concurrently → ~70 s → ~30 s (bound by the slowest key, gems).
- Dropped yarn-cache: yarn.lock no longer caches ~/.cache/yarn → −30 s/job and −1.4 GB of S3 per run.

parallelism: 4 but one pod ran the entire suite
Problem.
Setting parallelism: 4 spawned 4 test pods, but nothing split the work between them: the first pod to start ran the whole suite and the rest found nothing to do. A 4,024-test run on the buggy build:
| Pod | Tests run | Duration |
|---|---|---|
| pod 0 | 4,024 (all) | ~14 min |
| pod 1 | 0 | 0.06 s |
| pod 2 | 0 | 0.06 s |
| pod 3 | 0 | 0.06 s |
Fix. Added the minitest-distributed gem (loaded conditionally via MINITEST_COORDINATOR env var) and a new shared_redis: true per-job flag. NeetoCI provisions a per-job Redis; pods enqueue/work-steal tests until the queue drains.
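The shared-queue idea can be sketched in a few lines. This is a simplified illustration, not minitest-distributed's actual protocol: an in-process queue.Queue stands in for the per-job Redis, threads stand in for pods, and "running" a test is just recording its name.

```python
import queue
import threading


def run_distributed(test_names, pod_count):
    """Each worker pulls the next test from one shared queue until it drains."""
    work = queue.Queue()
    for name in test_names:
        work.put(name)

    executed = {pod: [] for pod in range(pod_count)}

    def worker(pod):
        while True:
            try:
                name = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            executed[pod].append(name)  # stand-in for actually running the test

    threads = [threading.Thread(target=worker, args=(pod,)) for pod in range(pod_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return executed


tests = [f"test_{i}" for i in range(4024)]
result = run_distributed(tests, pod_count=4)
all_run = sorted(name for names in result.values() for name in names)
print({pod: len(names) for pod, names in result.items()})
```

Every test is claimed by exactly one worker, whichever reaches it first. Because the stand-in work here is instantaneous, a single worker may still claim most of the queue; real tests take time, which is what lets the load spread across pods.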
Three findings from instrumenting the pod logs (JSON-event stream into the UI accordion):
- One pod's rails test finished the whole run before the others joined (slide 12).
- apt-get update + install postgresql-N-pgvector ran in every postgres start, every pod, every run. ~10 s of pure waste (slide 8).
- bundle exec rake simplecov_coverage:publish ran but never appeared in the pipeline view: the post-deployment script bypassed the JSON-event logger.

Each one was a multi-second tax; combined they were the difference between a 7-minute and a 20-minute run.
- Block pipeline: CiJobBlock/CiJobBlockJob models, YAML blocks:/task:/dependencies, ExecuteService/SpawnBlockService/SyncPodService
- CI image (:v62): Ruby 4.0.1, Node 22.13, Postgres 18 + pgvector, Redis 7.0.5 all pre-installed via the new declarative docker-ci/dependencies file
- Postgres images: postgres:{13,14,15,15.1,18,18.3} all carry pgvector baked in; runtime apt-install removed from neetoci-service
- r8gd nodes: instanceStorePolicy: RAID0; pod scratch I/O off EBS

Tracking issue: neeto-ci-web#3799 · 14 sub-issues, 13 PRs merged
| Step | Before (p50 · p95) | After (:v62, r8gd + bake) | Δ p50 |
|---|---|---|---|
| neetoci-version ruby 4.0.1 | 3.1 s · 5.0 s | 0.08 s | −97% |
| neetoci-version node 22.13 | 4.6 s · 6.9 s | ~1 s | −78% |
| cache restore (1.4 GB) | 63 s · 85 s | ~24 s | −62% |
| bundle install --jobs 2 | 1.4 s · 2.9 s | 0.8 s | −43% |
| neetoci-service start postgres 18 | 42 s · 58 s | ~3 s | −93% |
| neetoci-service start redis 7.0.5 | 9.5 s · 14 s | 0.03 s | −99% |
| bundle exec rake db:create db:schema:load | 12 s · 21 s | ~8 s | −35% |
| bundle exec rails test | 9.1 min · 15.2 min | ~5 min (4 pods) | −45% |
Before = p50 · p95 of 18 production neeto-cal-web default.yml runs, 7–13 May 2026 (per-command durations parsed from job logs). After = median of 3+ runs on add-minitest-distributed, commit dd261621.
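As a worked check of the Δ p50 column, here is a small sketch that recomputes the percentage change for a few rows, taking the "Before" p50 and the "After" value as quoted in the table. The helper name delta_p50 is introduced here for illustration only.

```python
def delta_p50(before_s, after_s):
    """Percentage change from the before-p50 to the after value, rounded."""
    return round((after_s - before_s) / before_s * 100)


# (before p50, after) in seconds, copied from the table above.
steps = {
    "neetoci-version ruby 4.0.1": (3.1, 0.08),
    "cache restore (1.4 GB)": (63, 24),
    "neetoci-service start postgres 18": (42, 3),
    "bundle exec rails test": (9.1 * 60, 5 * 60),
}

deltas = {step: delta_p50(before, after) for step, (before, after) in steps.items()}
for step, delta in deltas.items():
    print(f"{step}: {delta}%")
```

Rows whose "After" value is marked approximate (~) will only match the published delta to within rounding.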
Baseline = success p50 from production ci_jobs, 7–13 May 2026. Post-merge = avg of block-pipeline runs only (test_pods > 0), 19 May 15:30 onward. n = 2–6/app, still early.
Top 15 projects by default.yml avg run-time. ✓ = block pipeline live; avg from production logs since merge.
| # | Project | Runs/mo | Baseline p50 | Post-merge | Δ | hrs saved/mo |
|---|---|---|---|---|---|---|
| 1 | neeto-cal-web ✓ | 763 | 19.8 min | ~7.0 min | −65% | ~165 |
| 2 | neeto-form-web ✓ | 292 | 16.5 min | 6.5 min | −61% | ~49 |
| 3 | neeto-invoice-web ✓ | 70 | 15.0 min | 7.9 min | −48% | ~8 |
| 4 | neeto-crm-web ✓ | 69 | 13.8 min | 7.7 min | −44% | ~7 |
| 5 | neeto-desk-web ✓ | 507 | 12.6 min | 8.2 min | −35% | ~37 |
| 6 | neeto-chat-web ✓ | 178 | 10.3 min | 7.5 min | −28% | ~9 |
| 7 | neeto-monitor-ruby | 133 | 9.7 min | ~5.5 min* | −43%* | ~9* |
| 8 | neeto-deploy-web ✓ | 118 | 8.2 min | 5.6 min | −32% | ~5 |
| 9 | neeto-planner-web ✓ | 115 | 7.8 min | 7.2 min | −7% | ~1 |
| 10 | neeto-auth-web ✓ | 292 | 7.5 min | 5.3 min | −29% | ~11 |
| | + 5 nanos (block pipeline not yet shipped; pending minitest-distributed wiring) | | 5.5–6.3 min | | | ~7* |
| | TOTAL (8 measured + 7 projected) | 2,637 | ~830 hrs/mo | ~520 hrs/mo | −37% | ~308 hrs/mo |
✓ measured: avg of block-pipeline runs (test_pods > 0) since 2026-05-19 15:30, n = 2–6/app, baseline = success p50 of 7–13 May 2026.
* = projected, suite-size model: large (≥14 min) −45%, medium (8–14 min) −35%, small (<8 min) −15%.
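The "hrs saved/mo" column and the suite-size projection can be reproduced with a short sketch. It assumes the column is runs per month times minutes saved per run, converted to hours, which matches the measured rows to within rounding; the function names are introduced here for illustration.

```python
def hours_saved_per_month(runs_per_month, baseline_min, post_merge_min):
    """Runs per month times minutes saved per run, converted to hours."""
    return runs_per_month * (baseline_min - post_merge_min) / 60


def projected_minutes(baseline_min):
    """Suite-size model from the footnote: large -45%, medium -35%, small -15%."""
    if baseline_min >= 14:
        reduction = 0.45
    elif baseline_min >= 8:
        reduction = 0.35
    else:
        reduction = 0.15
    return baseline_min * (1 - reduction)


# Measured rows from the table: (runs/mo, baseline p50 min, post-merge min).
form_web = hours_saved_per_month(292, 16.5, 6.5)
desk_web = hours_saved_per_month(507, 12.6, 8.2)
auth_web = hours_saved_per_month(292, 7.5, 5.3)
print(round(form_web), round(desk_web), round(auth_web))

# A hypothetical small suite with a 6.0 min baseline falls in the -15% tier.
print(round(projected_minutes(6.0), 1))
```

For neeto-form-web, neeto-desk-web and neeto-auth-web this gives roughly 49, 37 and 11 hours, matching the table.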
Two consistency improvements that don't show up in averages but matter every day:
Predictable CI is more valuable than fast CI. The new pipeline is both.
Parallelism reshapes wall-time but doesn't reduce total CPU-minutes much by itself:
| Phase | Before | After |
|---|---|---|
| rails test (compute) | 1 pod × 14 min = 14 pod-min | 4 pods × 5 min = 20 pod-min |
| Setup (compute) | ~2 min per pod × 1 pod = 2 pod-min | ~30 s per pod × 5 pods = 2.5 pod-min |
| Total pod-minutes per run: roughly the same. | | |
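The pod-minute arithmetic behind the table can be written out directly. This sketch uses only the figures in the rows above; pod_minutes is a helper name introduced here.

```python
def pod_minutes(pods, minutes_per_pod):
    """Compute consumed, in pod-minutes, by a phase that runs on `pods` pods."""
    return pods * minutes_per_pod


# Before: one pod runs setup and the whole suite.
before_total = pod_minutes(1, 14) + pod_minutes(1, 2)

# After: 4 test pods at ~5 min each, plus ~30 s of setup on each of 5 pods.
after_total = pod_minutes(4, 5) + pod_minutes(5, 0.5)

print(before_total, after_total)
```

On these figures the totals are 16 and 22.5 pod-minutes: the test phase's wall-clock falls from 14 to 5 minutes, while total compute stays in the same range and is somewhat higher for the parallel run.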
The real compute savings come from the bake — ~3 min of setup × every pod removed:
The headline win is wall-clock, not cost. But the cluster also stops thrashing — that has its own quiet value.
CI wait-time isn't a storage line item. Every minute is a developer either waiting on a green build, or context-switching away and losing flow.
Compounding effect on team rhythm: when CI is fast and predictable, smaller PRs become viable. Smaller PRs land faster, are easier to review, and break less. The 20 min → 7 min cycle is what makes the whole loop tighter.
- CI image (:v62), PR #3907: Ruby + Node + Postgres + pgvector + Redis 7.0.5 pre-installed via declarative docker-ci/dependencies
- Postgres images rebuilt with pgvector baked in (postgres:{13,14,15,15.1,18,18.3})

Tracking issue: neeto-ci-web#3799
| Option | Original idea | Honest take | Verdict |
|---|---|---|---|
| Phase 2: shared EFS workspace across blocks | Run setup once, share to all pods | Breaks per-pod isolation; minitest-distributed needs isolated FS; concurrent pods clobber each other in cache restore | SKIP · Issue #3895 closed |
| gp3 IOPS bump (5000 → 16000) | Cheaper than r8gd | Still EBS-bound; doesn't fix variance under concurrent pods; ~$40/node/mo surcharge | DROPPED · r8gd wins |
| Redis from apt repo | Lighter than source-compile | packages.redis.io ships only current latest; can't pin 7.0.5 | REJECTED · Compiled from source instead |
| True DAG fan-out UI (arrows for arbitrary deps) | Render directed edges between every block pair | Linear depth-column layout reads cleanly for current configs; full DAG drawing adds complexity for no UX gain | DEFERRED |
| EFS CSI driver for shared scratch | One file system, all pods see it | Provisioned + tested; per-pod isolation broke parallel tests; tore it down | REJECTED · r8gd local NVMe instead |
Going broad first kept the design honest. Knowing when to stop is part of the work.
ECR (728988564940.dkr.ecr.us-east-1.amazonaws.com)
└── neeto-ci-deployment-image:v62
├── Ruby 4.0.1 pre-installed via rbenv
├── Node 22.13.1 pre-installed via nvm, default alias
├── Postgres 18.4 + pgvector 0.8.2 (apt cluster, fresh per pod)
├── Redis 7.0.5 compiled from source at /opt/redis/7.0.5/
└── docker-ci/dependencies (declarative, easy to extend)
Internal registry (10.100.0.20:5000)
└── postgres:{13,14,15,15.1,18,18.3} (rebuilt with pgvector baked)
Cluster (EKS, neeto-ci)
├── EC2NodeClass arm64
│ ├── instanceStorePolicy: RAID0 (NVMe → kubelet + containerd)
│ └── blockDeviceMappings 200Gi gp3, 8000 IOPS, 500 MB/s
└── NodePool default
└── instance-family: r8gd (Graviton4 + local NVMe SSD)
Code
└── PR #3910 → main Epic merged (14 commits, 65 files)
Questions?
Detailed report: ci-test-optimization-results.md
Deck source: github.com/vishal24367/ci-test-optimization-deck
Gist: gist.github.com/vishal24367/674d77e…
Epic: neeto-ci-web#3799