29% faster builds · 84% smaller cache · fleet-wide
neeto-deploy · May 2026
Measured on neeto-planner-web-staging (representative Rails monorepo). Source: ClickHouse logs.app_logs + ECR manifest inspection.
1,854 pack builds in 7 days hit the EKS arm-builds nodes:
| Percentile | Duration | What you felt |
|---|---|---|
| p50 | 624 s (10.4 min) | median wait per deploy |
| p90 | 905 s (15.1 min) | slow deploys |
| p95 | 961 s (16.0 min) | painful |
| p99 | 1,403 s (23.4 min) | brutal |
| max | 1,895 s (31.6 min) | hit the timeout zone |
EXPORT phase was the dominant cost, averaging ~370 s per build, with no obvious cause on the surface.
For every layer: tar(dir) → gzip → sha256 → upload-or-reuse.
Layer size drives the wall-clock cost.
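The per-layer pipeline above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the lifecycle's actual exporter: `layerDigest`, `addFile`, and `sampleLayer` are hypothetical names, only regular files are archived, and the digest is taken over the compressed stream. It shows why cost scales with layer size: every byte in the directory passes through tar, gzip, and sha256 before the upload-or-reuse decision can be made.

```go
package main

import (
	"archive/tar"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// layerDigest streams dir through tar -> gzip -> sha256 and returns the
// digest of the compressed stream. Hypothetical helper, not lifecycle code.
func layerDigest(dir string) (string, error) {
	hash := sha256.New()
	gz := gzip.NewWriter(hash)
	tw := tar.NewWriter(gz)

	walkErr := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // sketch only: directories and symlinks are skipped
		}
		return addFile(tw, dir, path, d)
	})
	if walkErr != nil {
		return "", walkErr
	}
	if err := tw.Close(); err != nil {
		return "", err
	}
	if err := gz.Close(); err != nil {
		return "", err
	}
	return "sha256:" + hex.EncodeToString(hash.Sum(nil)), nil
}

// addFile writes one regular file into the tar stream under its relative path.
func addFile(tw *tar.Writer, root, path string, d fs.DirEntry) error {
	info, err := d.Info()
	if err != nil {
		return err
	}
	hdr, err := tar.FileInfoHeader(info, "")
	if err != nil {
		return err
	}
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return err
	}
	hdr.Name = filepath.ToSlash(rel)
	if err := tw.WriteHeader(hdr); err != nil {
		return err
	}
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(tw, f)
	return err
}

// sampleLayer creates a throwaway directory with one file, for demonstration.
func sampleLayer() (string, error) {
	dir, err := os.MkdirTemp("", "layer-")
	if err != nil {
		return "", err
	}
	return dir, os.WriteFile(filepath.Join(dir, "Gemfile.lock"), []byte("GEM\n"), 0o644)
}

func main() {
	dir, err := sampleLayer()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	digest, err := layerDigest(dir)
	if err != nil {
		panic(err)
	}
	fmt.Println(digest)
}
```

Hashing an unchanged directory twice yields the same digest, which is what makes the reuse branch possible. The tar, gzip, and hash work is still paid on every export, even when the upload is skipped.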
Per-phase duration from the before baseline (deployment 7fdc4f7c):
EXPORT alone = 56% of total build time. So we needed to know what was happening inside it.
| | build-gems | launch-gems |
|---|---|---|
| flags | build:true cache:true | launch:true |
| lives in | <app>:cache ECR image | <app>:<deploy-id> app image |
| used during | build phase only | runtime (in the running pod) |
| contents | all gems incl. dev/test | only :default + :production |
| purpose | speed up future bundle install | what the app actually loads |
Critical detail: launch-gems is created by copying build-gems, then running
bundle install --without development:test --clean true to strip the dev/test gems out.
:cache exists → no-op
bundle install (no --without) → installs every gem into build-gems
Push :cache with build-gems blob. Push :latest with launch-gems.

:cache → untar build-gems into the build container
cache_sha metadata. If Gemfile.lock unchanged → "Reusing cached layer" → install is skipped entirely
:cache
Problem. When a developer deploys an app without changing any dependencies, the build pipeline should reuse the cached gem and asset layers from the previous build — not re-upload them.
But every build — even ones where the only change was app code — was pushing all 9 cache layers to ECR again.
Inspecting the actual :cache image revealed why: its manifest listed 18 layer references, but only 9 unique data blobs existed — every blob was being uploaded twice.
As a result, the :cache image kept growing unnecessarily.
Cause.
A missing return in neeto-deploy-lifecycle/phase/cache.go. When a layer's SHA matched the previous build's (reuse path), ReuseLayer() ran correctly — but execution then fell through and also ran AddLayerFile(), re-uploading the same blob:
if layer.Digest == previousSHA {
    if err = cache.VerifyLayer(previousSHA); err == nil {
        if err = cache.ReuseLayer(previousSHA); err != nil { /* handle */ }
        // ← MISSING `return` here. Falls through.
    }
}
return layer.Digest, cache.AddLayerFile(layer.TarPath, layer.Digest)
// ↑ called even when ReuseLayer already succeeded
Fix. Add return layer.Digest, nil after the successful ReuseLayer call. One line of code.
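A self-contained sketch of the corrected control flow is shown below. The `layerCache` type is a hypothetical in-memory stand-in that only records calls; the method names `VerifyLayer`, `ReuseLayer`, and `AddLayerFile` and the function name `addOrReuseCacheLayer` come from the deck, but their real signatures are assumed here. Returning the error when `ReuseLayer` fails is also an assumption, since the original elides that branch.

```go
package main

import (
	"errors"
	"fmt"
)

// layerCache is a hypothetical stand-in that records what the export did.
type layerCache struct {
	blobs  map[string]bool // digests present in the previous :cache image
	reused []string
	added  []string
}

func (c *layerCache) VerifyLayer(sha string) error {
	if !c.blobs[sha] {
		return errors.New("layer not in cache: " + sha)
	}
	return nil
}

func (c *layerCache) ReuseLayer(sha string) error {
	c.reused = append(c.reused, sha)
	return nil
}

func (c *layerCache) AddLayerFile(tarPath, sha string) error {
	c.added = append(c.added, sha)
	return nil
}

type layer struct{ Digest, TarPath string }

// addOrReuseCacheLayer reuses the previous blob when the digest matches and
// verifies, and only falls back to uploading otherwise.
func addOrReuseCacheLayer(c *layerCache, l layer, previousSHA string) (string, error) {
	if l.Digest == previousSHA {
		if err := c.VerifyLayer(previousSHA); err == nil {
			if err := c.ReuseLayer(previousSHA); err != nil {
				return "", err // assumption: surface the failure to the caller
			}
			return l.Digest, nil // the fix: stop here, do not fall through
		}
		// Verification failed: fall through and upload the layer again.
	}
	return l.Digest, c.AddLayerFile(l.TarPath, l.Digest)
}

func main() {
	c := &layerCache{blobs: map[string]bool{"sha256:aaa": true}}
	_, _ = addOrReuseCacheLayer(c, layer{"sha256:aaa", "/tmp/a.tar"}, "sha256:aaa")
	fmt.Println("reused:", len(c.reused), "added:", len(c.added))
}
```

With the early return in place, an unchanged layer produces one reuse and zero adds, which matches the cache_add 9 → 0 result reported for warm rebuilds.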
cache_add dropped from 9 → 0 on warm rebuilds; EXPORT phase 453 s → 371 s (−82 s/build), recovered before the bundle-install work even started.
PR (merged): neeto-deploy-lifecycle#2 · Released as lifecycle:0.2
--previous-image launch reuse
Problem. Every build's EXPORT phase tars + gzips + uploads each layer of the app image to ECR. Some of those layers (launcher, config, process-types) come from the CNB lifecycle itself and only change when we bump the lifecycle version (months apart).
But pack had no way to look at the previous build's app image. So it treated every layer as new, re-uploading bytes that were byte-for-byte identical to last week's build. Every build's logs were spammed with Adding layer 'buildpacksio/lifecycle:launcher' when they could have been Reusing layer ….
Fix. In neeto-deploy-slug-compiler-web/.docker/pack-build/build.sh: look up :latest via aws ecr describe-images, then pass it to pack build as --previous-image:
# Start with an empty array so the expansion below is well-defined when no :latest image exists yet.
previous_image_args=()
if aws ecr describe-images --repository-name "$APP_IMAGE_REPOSITORY" \
    --image-ids imageTag=latest --region us-east-1 > /dev/null 2>&1; then
  previous_image_args=('--previous-image' "$LATEST_IMAGE_TAG")
fi
pack build "$APP_IMAGE_TAG" "${previous_image_args[@]}" --tag "$LATEST_IMAGE_TAG" …
Effect. The exporter now compares each new layer's SHA against the previous image's manifest. Same SHA → reference the existing blob by digest, skip the upload entirely. Logs flip from Adding layer … to Reusing layer ….
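The comparison described above can be illustrated with a small Go sketch. It is not the exporter's real code: `layerRef` and `planExport` are hypothetical names, the digests are made up, and the previous image's manifest is reduced to a plain name-to-digest map.

```go
package main

import "fmt"

// layerRef is a hypothetical name/digest pair for one app-image layer.
type layerRef struct{ Name, Digest string }

// planExport compares each new layer against the previous image's layers and
// returns the names to reuse (digest matches) and the names to upload.
func planExport(previous map[string]string, next []layerRef) (reuse, add []string) {
	for _, l := range next {
		if prev, ok := previous[l.Name]; ok && prev == l.Digest {
			reuse = append(reuse, l.Name)
			continue
		}
		add = append(add, l.Name)
	}
	return reuse, add
}

func main() {
	// Layers recorded in the previous build's image (illustrative digests).
	previous := map[string]string{
		"buildpacksio/lifecycle:launcher": "sha256:1111",
		"app":                             "sha256:2222",
	}
	// Layers produced by the current build: only the app code changed.
	next := []layerRef{
		{"buildpacksio/lifecycle:launcher", "sha256:1111"},
		{"app", "sha256:3333"},
	}
	reuse, add := planExport(previous, next)
	for _, name := range reuse {
		fmt.Printf("Reusing layer '%s'\n", name)
	}
	for _, name := range add {
		fmt.Printf("Adding layer '%s'\n", name)
	}
}
```

Without a previous image the map is empty, so every layer lands in the add list. That is the behavior the builds showed before `--previous-image` was passed.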
Pulled the :cache ECR image, aggregated uncompressed bytes by category:
| Category | Size (MB) | % of cache |
|---|---|---|
| wkhtmltopdf-binary (in :development, :test group) | 1,134 | 35% |
| Other production gems | 300 | 9% |
| Dev/test gems (brakeman, faker, rbs, …) | 152 | 5% |
| Duplicate gem versions | 120 | 4% |
| Native ext sources + bundler cache | 1,338 | 41% |
| Misc | 216 | 6% |
| Total | 3,236 | 100% |
The surprise was wkhtmltopdf-binary in the cache. That gem is in the :development, :test group of Gemfile.common.rb, so it should never ship to production. Yet there it was, in every neeto product's build-gems cache.
Upstream Paketo bundle-install in build.go — asymmetric configs:
// BUILD layer install
installProcess.Execute(..., map[string]string{
    "path": layer.Path,
    "clean": "true",
    // ← NO "without". Installs every group.
})
// LAUNCH layer install
installProcess.Execute(..., map[string]string{
    "path": layer.Path,
    "without": "development:test", // ← hardcoded, launch only
    "clean": "true",
})
First build (no cache) → build-gems gets all gems.
Subsequent builds "Reuse cached layer" without re-installing → dev/test gems live in the cache forever.
Launch image was fine. But the cache layer still had to be tarred + gzipped + hashed every export — ~85 s of wasted work per build.
launch-gems stayed bloated even though launch install runs bundle install --without development:test --clean true:
The --previous-image optimization tells the exporter:
"if this layer's SHA matches the previous build's, reference that blob; don't re-upload".
When Gemfile.lock doesn't change, the buildpack logs "Reusing cached layer …/launch-gems" → exporter re-uses the same SHA from the previous build's manifest.
Fix: delete :latest + :cache, then 1 cold rebuild → 700–900 MB each (−69%).
bundle-install:0.9.0
Added BP_BUNDLE_WITHOUT env var (+ RAILS_ENV/RACK_ENV-derived default) honored by both layer installs:
// environment.go
switch railsEnv {
case "production", "staging": return "development:test"
case "development": return "production:test:staging:heroku"
case "test": return "production:development:staging:heroku"
default: return "" // legacy
}
// build.go — same logic now applied to BOTH layers
if environment.BundleWithout != "" {
    buildConfig["without"] = environment.BundleWithout
}
Defaults preserve back-compat. Apps with RAILS_ENV=production (i.e., every neeto product) automatically get the savings.
PR (merged): neeto-deploy-paketo-bundle-install-buildpack#12
· Tracking issue: neeto-deploy-web#7146
· 11 new tests, 0 regressions
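A runnable sketch of how the BP_BUNDLE_WITHOUT override and the environment-derived default could combine. The function names are hypothetical, and three behaviors are assumptions rather than facts from the buildpack: an explicit BP_BUNDLE_WITHOUT wins over the default, RAILS_ENV is consulted before RACK_ENV, and an empty value is treated as unset.

```go
package main

import "fmt"

// defaultBundleWithout mirrors the switch shown in environment.go.
func defaultBundleWithout(railsEnv string) string {
	switch railsEnv {
	case "production", "staging":
		return "development:test"
	case "development":
		return "production:test:staging:heroku"
	case "test":
		return "production:development:staging:heroku"
	default:
		return "" // legacy behavior: exclude nothing
	}
}

// resolveBundleWithout applies the assumed precedence:
// BP_BUNDLE_WITHOUT, then RAILS_ENV, then RACK_ENV.
func resolveBundleWithout(env map[string]string) string {
	if v := env["BP_BUNDLE_WITHOUT"]; v != "" {
		return v
	}
	railsEnv := env["RAILS_ENV"]
	if railsEnv == "" {
		railsEnv = env["RACK_ENV"]
	}
	return defaultBundleWithout(railsEnv)
}

// installConfig builds the same bundler settings for either layer, so the
// build and launch installs can no longer drift apart.
func installConfig(layerPath, without string) map[string]string {
	cfg := map[string]string{"path": layerPath, "clean": "true"}
	if without != "" {
		cfg["without"] = without
	}
	return cfg
}

func main() {
	without := resolveBundleWithout(map[string]string{"RAILS_ENV": "production"})
	fmt.Println(installConfig("/layers/build-gems", without))
	fmt.Println(installConfig("/layers/launch-gems", without))
}
```

Routing both installs through one helper is the point of the sketch: once the "without" value is computed in a single place, the asymmetry between the build and launch configs has nowhere to reappear.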
Layer 5 (build-gems) dropped from 2,652 MB → 125 MB. All other layers unchanged.
Restoring dropped because the cache image is 84% smaller → less to pull + untar. Export dropped because there's less data to tar+gzip+hash.
Top 5 apps per env, builds before vs after 2026-05-11 14:00 UTC:
| App | Before | After | Δ |
|---|---|---|---|
| neeto-cal-web | 937s | 487s | −48% |
| neeto-desk-web | 865s | 454s | −48% |
| neeto-chat-web | 899s | 536s | −40% |
| neeto-git-web | 752s | 463s | −38% |
| neeto-planner-web | 728s | 468s | −36% |
| App | Before | After | Δ |
|---|---|---|---|
| neeto-git-web | 710s | 442s | −38% |
| neeto-engage-web | 752s | 514s | −32% |
| neeto-pay-web | 741s | 521s | −30% |
| neeto-tower-web | 691s | 492s | −29% |
| neeto-deploy-web | 841s | 614s | −27% |
3 production apps audited (had bloated 2.5+ GB launch-gems from dev/test gems):
After deleting :latest + :cache and forcing one cold rebuild: combined size 8,315 MB → 2,541 MB (−5.8 GB / −69%).
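The headline delta can be reproduced with a few lines of Go. `sizeDelta` is a hypothetical helper, and the conversion assumes decimal units (1 GB = 1,000 MB), which is what makes 5,774 MB round to the quoted 5.8 GB.

```go
package main

import "fmt"

// sizeDelta returns the absolute saving in MB and the saving as a percentage
// of the original size.
func sizeDelta(beforeMB, afterMB float64) (savedMB, pct float64) {
	savedMB = beforeMB - afterMB
	pct = savedMB / beforeMB * 100
	return savedMB, pct
}

func main() {
	// Combined size of the three audited production images, before and after.
	saved, pct := sizeDelta(8315, 2541)
	fmt.Printf("-%.1f GB / -%.0f%%\n", saved/1000, pct) // prints "-5.8 GB / -69%"
}
```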
Across the apps that have rebuilt since the fleet-wide tag clear, 17 of 72 apps got measurably smaller (the other 55 simply haven't redeployed yet). Top 12 by absolute MB saved:
| App | Before | After | Δ MB | Δ % |
|---|---|---|---|---|
| neeto-record-web-prod | 2,801 MB | 752 MB | −2,049 | −73% |
| neeto-cal-web-prod | 2,829 MB | 870 MB | −1,959 | −69% |
| neeto-form-web-prod | 2,685 MB | 920 MB | −1,765 | −66% |
| neeto-form-web-stag | 1,143 MB | 647 MB | −496 | −43% |
| neeto-chat-web-stag | 1,126 MB | 630 MB | −495 | −44% |
| neeto-chat-web-prod | 950 MB | 630 MB | −320 | −34% |
| neeto-editor-prod | 740 MB | 462 MB | −278 | −38% |
| bigbinary-website-stag | 1,161 MB | 899 MB | −262 | −23% |
| neeto-git-web-prod / stag | 851 MB | 660 MB | −191 | −22% |
| neeto-tower-web-prod / stag | 670 MB | 521 MB | −148 | −22% |
Image size isn't just a storage line item. Every byte gets pulled by the kubelet onto every node, on every cold start.
Compounding effect on traffic spikes: when an app gets a sudden burst, the autoscaler responds faster because new pods come ready in seconds instead of half a minute. Especially impactful for the 3 worst-offender apps (cal-web, record-web, form-web) which each had ~21 s of unnecessary pull latency on every pod startup.
[Build][phase=…][duration_ms=…] log lines, foundation for everything else
--previous-image launch reuse: exporter references existing app-image blobs by digest
Missing return in addOrReuseCacheLayer (duplicate manifest entries bug)
--without on both build and launch layers. THE big lever.
Tracking issue: neeto-deploy-web#7146
| Option | Original estimate | Honest estimate | Verdict |
|---|---|---|---|
| zstd compression | 30–50 s | 8–15 s | SKIP Lifecycle doesn't support it, code change required |
| Parallel layer export | "cut in half" | 15–25 s | SKIP Diminishing returns post-fix, race-condition risk |
| Bigger build pods (3→6 CPU) | 30–60 s | <5 s | SKIP gzip is single-threaded — no win |
| Reduce launch-layer count | "modest" | ~3 s/layer | SKIP UX risk for tiny gain |
| PVC local layer cache | 200–300 s | ~40 s | SKIP 5-7 days of ops work for 40 s |
| arm-builds image pre-pull | 30–60 s saved | n/a | WRONG Pack uses podman in-pod, not containerd — node-level pre-pull invisible |
The bundle-install fix already extracted the easy multi-minute wins. Every remaining option had diminishing returns relative to its implementation cost. Knowing when to stop is part of the work.
348674388966.dkr.ecr.us-east-1.amazonaws.com/
├── neeto-deploy/paketo/lifecycle:0.2 (cache duplicate-layer fix)
│ ├── 0.2-amd64 sha256:8d6cd7…
│ └── 0.2-arm64 sha256:d79c82…
│
├── neeto-deploy/paketo/lifecycle:0.3 (SBOM placeholder files)
│ └── multi-arch sha256:17637a…
│
├── neeto-deploy/paketo/buildpack/bundle-install:0.9.0 ← FLAGSHIP
│ ├── 0.9.0-amd64
│ ├── 0.9.0-arm64
│ └── multi-arch sha256:736355…
│
└── neeto-deploy/paketo/buildpack/ruby:0.47.21
└── multi-arch sha256:fffc12… (composite — pulls bundle-install:0.9.0)
--without (flagship)
--trust-builder, bump lifecycle
Questions?
Detailed report:
pack-build-optimization-results.md
Deck source:
github.com/vishal24367/pack-build-optimization-deck
Gist:
gist.github.com/vishal24367/e06ad…