Pack Build Optimization

29% faster builds · 84% smaller cache · fleet-wide

neeto-deploy · May 2026


TL;DR

Build time (warm)

668s → 473s

−195s · −29%

Export phase

371s → 235s

−136s · −37%

:cache ECR image

3,236 MB → 531 MB

−2,705 MB · −84%

build-gems layer

2,652 MB → 125 MB

−2,527 MB · −95%

Measured on neeto-planner-web-staging (representative Rails monorepo). Source: ClickHouse logs.app_logs + ECR manifest inspection.

The Problem

1,854 pack builds in 7 days hit the EKS arm-builds nodes:

Percentile	Duration	What you felt
p50	624 s (10.4 min)	median wait per deploy
p90	905 s (15.1 min)	slow deploys
p95	961 s (16.0 min)	painful
p99	1,403 s (23.4 min)	brutal
max	1,895 s (31.6 min)	hit the timeout zone

The EXPORT phase was the dominant cost, averaging ~370 s per build with no obvious cause on the surface.

Pack Build — 5 lifecycle phases

1. ANALYZE: pull previous-image manifest, decide which layers to reuse
2. DETECT: each buildpack votes "I apply"
3. RESTORE: pull :cache, untar layers into build container
4. BUILD: each buildpack's bin/build runs (bundle install, assets:precompile, …)
5. EXPORT (← dominant cost): walk dirs, tar, gzip, SHA256, push to ECR

For every layer: tar(dir) → gzip → sha256 → upload-or-reuse. Layer size drives the wall-clock cost.

Where time was actually going

Per-phase duration from the before baseline (deployment 7fdc4f7c):

Phase	Duration
setup_env	4.7s
clone_repo	1.1s
analyzing	42.7s
detecting	1.8s
restoring	59.5s
building	134.5s
exporting	371.0s

EXPORT alone = 56% of total build time (371 s of the 668 s warm build from the TL;DR). So we needed to know what was happening inside it.
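
As a quick check of the arithmetic, the sketch below recomputes each phase's share from the durations listed above. It assumes the 56% figure is taken against the 668 s warm build time from the TL;DR rather than against the sum of the listed phases. That reading appears consistent with the numbers, but the deck does not state the denominator. The `phase` type and helper names are illustrative.

```go
package main

import "fmt"

// phase is one lifecycle phase and its measured duration in seconds.
type phase struct {
	name    string
	seconds float64
}

// share returns part as a percentage of total.
func share(part, total float64) float64 { return part / total * 100 }

// sum adds up the durations of all listed phases.
func sum(phases []phase) float64 {
	total := 0.0
	for _, p := range phases {
		total += p.seconds
	}
	return total
}

func main() {
	phases := []phase{
		{"setup_env", 4.7}, {"clone_repo", 1.1}, {"analyzing", 42.7},
		{"detecting", 1.8}, {"restoring", 59.5}, {"building", 134.5},
		{"exporting", 371.0},
	}
	const wallClock = 668.0 // warm build time from the TL;DR (assumed denominator)

	for _, p := range phases {
		fmt.Printf("%-10s %6.1fs  %5.1f%% of wall clock\n", p.name, p.seconds, share(p.seconds, wallClock))
	}
	listed := sum(phases)
	fmt.Printf("listed phases: %.1fs, not attributed to a phase: %.1fs\n", listed, wallClock-listed)
}
```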

Bundle-install creates two layers per build

build-gems

flags	`build:true cache:true`
lives in	`<app>:cache` ECR image
used during	build phase only
contents	all gems incl. dev/test
purpose	speed up future `bundle install`

launch-gems

flags	`launch:true`
lives in	`<app>:<deploy-id>` app image
used during	runtime (in the running pod)
contents	only `:default + :production`
purpose	what the app actually loads

Critical detail: launch-gems is created by copying build-gems, then running bundle install --without development:test --clean true to strip the dev/test gems out.

How the cache flows across builds

Build 1
(cold)

RESTORE: no :cache exists → no-op
BUILD: bundle install (no --without) → installs every gem into build-gems
EXPORT: push fresh :cache with build-gems blob. Push :latest with launch-gems.

Build 2+
(warm)

RESTORE: pull :cache → untar build-gems into the build container
BUILD: buildpack reads cache_sha metadata. If Gemfile.lock unchanged → "Reusing cached layer" → install is skipped entirely
EXPORT: re-tar + gzip + hash the same build-gems content → push same :cache

Consequence: build-gems content is frozen from Build 1. Whatever leaked in on day one (dev/test gems, old wkhtmltopdf-binary versions) gets dragged forward forever — and re-hashed every export.
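
The freeze follows from what the reuse check looks at. Below is a minimal sketch, assuming `cache_sha` is a checksum derived from Gemfile.lock alone (a simplification; the real buildpack may fold in more inputs). `layerMetadata`, `checksum`, and `shouldReuse` are hypothetical names. The check never inspects which gem groups are installed in the layer, so a layer built with every group passes for as long as the lockfile stays the same.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// layerMetadata stands in for the metadata stored next to a cached layer.
type layerMetadata struct {
	CacheSHA string // empty on a cold build, when no :cache image exists
}

// checksum hashes the lockfile contents.
func checksum(lockfile []byte) string {
	sum := sha256.Sum256(lockfile)
	return hex.EncodeToString(sum[:])
}

// shouldReuse reports whether the restored layer can be reused without
// running bundle install again. Only the lockfile checksum is compared.
func shouldReuse(meta layerMetadata, lockfile []byte) bool {
	return meta.CacheSHA != "" && meta.CacheSHA == checksum(lockfile)
}

func main() {
	lock := []byte("GEM\n  specs:\n    rails (7.1.0)\n")

	cold := layerMetadata{}
	fmt.Println("build 1 reuse:", shouldReuse(cold, lock))

	warm := layerMetadata{CacheSHA: checksum(lock)}
	fmt.Println("build 2 reuse:", shouldReuse(warm, lock))

	bumped := append([]byte{}, lock...)
	bumped = append(bumped, []byte("    puma (6.4.0)\n")...)
	fmt.Println("after lockfile change:", shouldReuse(warm, bumped))
}
```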

Quick win #1 — lifecycle cache duplicate-layer bug

Problem. When a developer deploys an app without changing any dependencies, the build pipeline should reuse the cached gem and asset layers from the previous build — not re-upload them.

But every build — even ones where the only change was app code — was pushing all 9 cache layers to ECR again.

Inspecting the actual :cache image revealed why: its manifest listed 18 layer references, but only 9 unique data blobs existed — every blob was being uploaded twice.

Cost: ~80 s of duplicate network transfer per build, and the :cache image kept growing unnecessarily.


Quick win #1 — cause + fix

Cause. A missing return in neeto-deploy-lifecycle/phase/cache.go. When a layer's SHA matched the previous build's (reuse path), ReuseLayer() ran correctly — but execution then fell through and also ran AddLayerFile(), re-uploading the same blob:

if layer.Digest == previousSHA {
    if err = cache.VerifyLayer(previousSHA); err == nil {
        if err = cache.ReuseLayer(previousSHA); err != nil { /* handle */ }
        // ← MISSING `return` here. Falls through.
    }
}
return layer.Digest, cache.AddLayerFile(layer.TarPath, layer.Digest)
//                   ↑ called even when ReuseLayer already succeeded

Fix. Add return layer.Digest, nil after the successful ReuseLayer call. One line of code.
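
A self-contained sketch of the corrected control flow follows. The function name `addOrReuseCacheLayer` comes from the shipped-items list later in this deck; the `layer` struct and `fakeCache` are hypothetical stand-ins that only record calls, and returning the error from a failed `ReuseLayer` is an assumption, since the original snippet elides that branch.

```go
package main

import (
	"errors"
	"fmt"
)

// layer is a stand-in for the lifecycle's layer descriptor.
type layer struct {
	Digest  string
	TarPath string
}

// fakeCache records which calls were made so the control flow can be observed.
type fakeCache struct {
	known  map[string]bool
	reused []string
	added  []string
}

func (c *fakeCache) VerifyLayer(sha string) error {
	if !c.known[sha] {
		return errors.New("layer not in cache: " + sha)
	}
	return nil
}

func (c *fakeCache) ReuseLayer(sha string) error {
	c.reused = append(c.reused, sha)
	return nil
}

func (c *fakeCache) AddLayerFile(tarPath, sha string) error {
	c.added = append(c.added, sha)
	c.known[sha] = true
	return nil
}

// addOrReuseCacheLayer reuses the cached blob when the digest matches and the
// blob verifies; otherwise it uploads the layer tarball.
func addOrReuseCacheLayer(c *fakeCache, l layer, previousSHA string) (string, error) {
	if l.Digest == previousSHA {
		if err := c.VerifyLayer(previousSHA); err == nil {
			if err := c.ReuseLayer(previousSHA); err != nil {
				return "", err // assumed handling; elided in the original
			}
			return l.Digest, nil // the fix: stop here instead of falling through
		}
	}
	return l.Digest, c.AddLayerFile(l.TarPath, l.Digest)
}

func main() {
	c := &fakeCache{known: map[string]bool{"sha256:aaa": true}}
	if _, err := addOrReuseCacheLayer(c, layer{Digest: "sha256:aaa", TarPath: "/tmp/a.tar"}, "sha256:aaa"); err != nil {
		panic(err)
	}
	fmt.Println("reused:", len(c.reused), "added:", len(c.added))
}
```

With the early return in place, a matching digest produces one reuse and zero adds. Deleting that line reproduces the duplicate: one reuse and one add for the same blob.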

Result: cache_add dropped from 9 → 0 on warm rebuilds; EXPORT phase 453 s → 371 s (−82 s/build), recovered before the bundle-install work even started.

PR (merged): neeto-deploy-lifecycle#2 · Released as lifecycle:0.2

Quick win #2 — `--previous-image` launch reuse

Problem. Every build's EXPORT phase tars + gzips + uploads each layer of the app image to ECR. Some of those layers — launcher, config, process-types — come from the CNB lifecycle itself and only change when we bump the lifecycle version (months apart).

But pack had no way to look at the previous build's app image. So it treated every layer as new, re-uploading bytes that were byte-for-byte identical to last week's build. Every build's logs were spammed with Adding layer 'buildpacksio/lifecycle:launcher' when they could have been Reusing layer ….

Fix. In neeto-deploy-slug-compiler-web/.docker/pack-build/build.sh: look up :latest via aws ecr describe-images, then pass it to pack build as --previous-image:

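# Start with no extra args so the expansion below is safe on a first build.
previous_image_args=()
# Pass --previous-image only when a :latest tag already exists in ECR.
if aws ecr describe-images --repository-name "$APP_IMAGE_REPOSITORY" \
     --image-ids imageTag=latest --region us-east-1 > /dev/null 2>&1; then
  previous_image_args=(--previous-image "$LATEST_IMAGE_TAG")
fi
pack build "$APP_IMAGE_TAG" "${previous_image_args[@]}" --tag "$LATEST_IMAGE_TAG" …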

Effect. The exporter now compares each new layer's SHA against the previous image's manifest. Same SHA → reference the existing blob by digest, skip the upload entirely. Logs flip from Adding layer … to Reusing layer ….
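
The effect can be sketched as a set-membership check against the previous image's layer digests. This illustrates the decision as the slide describes it, not the lifecycle's real exporter code. `planExport` and `totalMB` are hypothetical names, and the digests and sizes are made up.

```go
package main

import "fmt"

// layer describes one app-image layer produced by the current build.
type layer struct {
	Name   string
	Digest string
	SizeMB int
}

// planExport splits layers into those already present in the previous image
// (referenced by digest, no upload) and those that must be uploaded.
func planExport(previous map[string]bool, layers []layer) (reuse, upload []layer) {
	for _, l := range layers {
		if previous[l.Digest] {
			reuse = append(reuse, l)
		} else {
			upload = append(upload, l)
		}
	}
	return reuse, upload
}

// totalMB sums the sizes of the given layers.
func totalMB(layers []layer) int {
	total := 0
	for _, l := range layers {
		total += l.SizeMB
	}
	return total
}

func main() {
	// Digests of the previous :latest image (illustrative values).
	previous := map[string]bool{"sha256:launcher-v1": true, "sha256:config-v1": true}
	layers := []layer{
		{"launcher", "sha256:launcher-v1", 3},
		{"config", "sha256:config-v1", 1},
		{"application code", "sha256:app-v2", 40},
	}

	reuse, upload := planExport(previous, layers)
	fmt.Printf("with --previous-image: reuse %d MB, upload %d MB\n", totalMB(reuse), totalMB(upload))

	// Without a previous image the set is empty, so every layer is uploaded.
	_, uploadAll := planExport(nil, layers)
	fmt.Printf("without it: upload %d MB\n", totalMB(uploadAll))
}
```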

Investigation — the smoking gun

Pulled the :cache ECR image, aggregated uncompressed bytes by category:

Category	Size (MB)	% of cache
`wkhtmltopdf-binary` (in `:development, :test` group)	1,134	35%
Other production gems	300	9%
Dev/test gems (brakeman, faker, rbs, …)	152	5%
Duplicate gem versions	120	4%
Native ext sources + bundler cache	1,338	41%
Misc	216	6%
Total	3,236	100%

Finding: 1.13 GB of wkhtmltopdf-binary in the cache. That gem is in the :development, :test group of Gemfile.common.rb, so it should NEVER ship to production. Yet there it was, in every neeto product's build-gems cache.

Root cause

Upstream Paketo bundle-install in build.go — asymmetric configs:

// BUILD layer install
installProcess.Execute(..., map[string]string{
    "path":  layer.Path,
    "clean": "true",
    // ← NO "without". Installs every group.
})

// LAUNCH layer install
installProcess.Execute(..., map[string]string{
    "path":    layer.Path,
    "without": "development:test",  // ← hardcoded, launch only
    "clean":   "true",
})

First build (no cache) → build-gems gets all gems. Subsequent builds "Reuse cached layer" without re-installing → dev/test gems live in the cache forever.

Launch image was fine. But the cache layer still had to be tarred + gzipped + hashed every export — ~85 s of wasted work per build.

And the bloat reached production app images

Even though launch install runs bundle install --without development:test --clean true:

Launch-gems starts as a copy of build-gems (which has dev/test).
Pack's --previous-image optimization tells the exporter: "if this layer's SHA matches the previous build's, reference that blob — don't re-upload".
When Gemfile.lock doesn't change, the buildpack logs "Reusing cached layer …/launch-gems" → exporter re-uses the same SHA from the previous build's manifest.
Result: the polluted launch-gems blob created on the very first build is referenced by every subsequent app image. The 2.5+ GB launch-gems persists in production indefinitely.

Fleet audit found: 3 production app images at 2.7–2.8 GB each. After deleting :latest + :cache and running 1 cold rebuild → 750–920 MB each (−69% combined).

The fix — `bundle-install:0.9.0`

Added BP_BUNDLE_WITHOUT env var (+ RAILS_ENV/RACK_ENV-derived default) honored by both layer installs:

// environment.go
switch railsEnv {
case "production", "staging": return "development:test"
case "development":           return "production:test:staging:heroku"
case "test":                  return "production:development:staging:heroku"
default:                      return ""           // legacy
}

// build.go — same logic now applied to BOTH layers
if environment.BundleWithout != "" {
    buildConfig["without"] = environment.BundleWithout
}

Defaults preserve back-compat. Apps with RAILS_ENV=production (i.e., every neeto product) automatically get the savings.
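
A runnable sketch of how the resolved value could be computed is shown below. The switch mirrors the `environment.go` excerpt above. The precedence (explicit `BP_BUNDLE_WITHOUT` first, then `RAILS_ENV`, then `RACK_ENV`) is an assumption, as the deck only says the default is derived from those variables. `defaultWithout` and `resolveBundleWithout` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
)

// defaultWithout maps an environment name to the gem groups to exclude,
// following the switch shown in the environment.go excerpt.
func defaultWithout(env string) string {
	switch env {
	case "production", "staging":
		return "development:test"
	case "development":
		return "production:test:staging:heroku"
	case "test":
		return "production:development:staging:heroku"
	default:
		return "" // legacy behaviour: exclude nothing
	}
}

// resolveBundleWithout picks the effective value. The precedence used here
// is assumed, not confirmed by the source.
func resolveBundleWithout(getenv func(string) string) string {
	if v := getenv("BP_BUNDLE_WITHOUT"); v != "" {
		return v
	}
	env := getenv("RAILS_ENV")
	if env == "" {
		env = getenv("RACK_ENV")
	}
	return defaultWithout(env)
}

func main() {
	without := resolveBundleWithout(os.Getenv)
	if without == "" {
		fmt.Println("no groups excluded (legacy behaviour)")
		return
	}
	// The same value would apply to both the build and the launch install.
	fmt.Println("bundle install would exclude:", without)
}
```

Passing `getenv` as a parameter keeps the resolution logic testable without mutating the process environment.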

PR (merged): neeto-deploy-paketo-bundle-install-buildpack#12 · Tracking issue: neeto-deploy-web#7146 · 11 new tests, 0 regressions

Result #1 — cache image size

Before (3,236 MB)

After (531 MB)

Layer 5 (build-gems) dropped from 2,652 MB → 125 MB. All other layers unchanged.

Result #2 — per-phase timing

Phase	Before → After (warm)
setup_env	4.6s
analyzing	42.7s
restoring	59.5s → 20.5s (−65%)
building	134.5s → 116.2s (−14%)
exporting	371.0s → 235.2s (−37%)

Restoring dropped because the cache image is 84% smaller → less to pull + untar. Export dropped because there's less data to tar+gzip+hash.

Result #3 — fleet-wide impact

Top 5 apps per env, builds before vs after 2026-05-11 14:00 UTC:

Staging

App	Before	After	Δ
neeto-cal-web	937s	487s	−48%
neeto-desk-web	865s	454s	−48%
neeto-chat-web	899s	536s	−40%
neeto-git-web	752s	463s	−38%
neeto-planner-web	728s	468s	−36%

Production

App	Before	After	Δ
neeto-git-web	710s	442s	−38%
neeto-engage-web	752s	514s	−32%
neeto-pay-web	741s	521s	−30%
neeto-tower-web	691s	492s	−29%
neeto-deploy-web	841s	614s	−27%

Staging median

−41%

Production median

−27%

Result #4 — production image sizes

3 production apps audited (had bloated 2.5+ GB launch-gems from dev/test gems):

After deleting :latest + :cache and forcing one cold rebuild: combined size 8,315 MB → 2,541 MB (−5.8 GB / −69%).

Result #5 — fleet-wide image savings

Across the apps that have rebuilt since the fleet-wide tag clear, 17 of 72 apps got measurably smaller (the other 55 simply haven't redeployed yet). Top 12 by absolute MB saved:

App	Before	After	Δ MB	Δ %
neeto-record-web-prod	2,801 MB	752 MB	−2,049	−73%
neeto-cal-web-prod	2,829 MB	870 MB	−1,959	−69%
neeto-form-web-prod	2,685 MB	920 MB	−1,765	−66%
neeto-form-web-stag	1,143 MB	647 MB	−496	−43%
neeto-chat-web-stag	1,126 MB	630 MB	−495	−44%
neeto-chat-web-prod	950 MB	630 MB	−320	−34%
neeto-editor-prod	740 MB	462 MB	−278	−38%
bigbinary-website-stag	1,161 MB	899 MB	−262	−23%
neeto-git-web-prod / stag	851 MB	660 MB	−191	−22%
neeto-tower-web-prod / stag	670 MB	521 MB	−148	−22%

Aggregate so far: ~8.3 GB freed across just 17 apps. The remaining 55 prod/staging apps will follow the same pattern on their next natural deploy — expected fleet-wide reclaim ~30–60 GB.

Why smaller images matter operationally

Image size isn't just a storage line item. Every byte gets pulled by the kubelet onto every node, on every cold start.

Faster pod cold start

~30 s → ~9 s

Cold image pull (2.8 GB → 870 MB)

Faster horizontal scale-out

~21 s/pod saved

HPA scales 1 → N replicas faster

Faster rolling deploys

~21 s × replicas

Each pod rotation is quicker

Lower ECR storage

~$0.10 / GB-month

~30–60 GB freed fleet-wide

Compounding effect on traffic spikes: when an app gets a sudden burst, the autoscaler responds faster because new pods come ready in seconds instead of half a minute. Especially impactful for the 3 worst-offender apps (cal-web, record-web, form-web) which each had ~21 s of unnecessary pull latency on every pod startup.
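
The figures above can be reproduced with back-of-envelope arithmetic. The sketch assumes pull time scales linearly with image size, so the throughput implied by the "before" numbers is reused for the "after" image. Real pulls also depend on layer count, decompression, and node caching. The helper names are illustrative.

```go
package main

import "fmt"

// pullSeconds estimates pull time for an image of sizeMB at a fixed throughput.
func pullSeconds(sizeMB, throughputMBps float64) float64 {
	return sizeMB / throughputMBps
}

// monthlyStorageUSD estimates the storage cost of gb at a per-GB-month price.
func monthlyStorageUSD(gb, usdPerGBMonth float64) float64 {
	return gb * usdPerGBMonth
}

func main() {
	const beforeMB, afterMB = 2800.0, 870.0 // image sizes from the slide
	const beforePullSec = 30.0              // observed cold pull before the fix

	throughput := beforeMB / beforePullSec // implied MB/s
	after := pullSeconds(afterMB, throughput)
	saved := beforePullSec - after
	fmt.Printf("implied throughput: %.1f MB/s\n", throughput)
	fmt.Printf("cold pull: %.0f s -> %.1f s (%.1f s saved per pod)\n", beforePullSec, after, saved)

	// Upper bound for a rolling deploy that rotates pods one at a time
	// onto nodes that have not cached the image.
	for _, replicas := range []int{1, 5, 10} {
		fmt.Printf("rolling deploy, %2d replicas: up to %.0f s saved\n", replicas, saved*float64(replicas))
	}

	fmt.Printf("storage freed: $%.2f to $%.2f per month\n",
		monthlyStorageUSD(30, 0.10), monthlyStorageUSD(60, 0.10))
}
```

At roughly $3 to $6 per month, the storage saving is small. The pull-latency saving is the operationally significant part.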

What we actually shipped

SHIPPED Per-phase instrumentation — [Build][phase=…][duration_ms=…] log lines, foundation for everything else

SHIPPED --previous-image launch reuse — exporter references existing app-image blobs by digest

SHIPPED lifecycle:0.2 — PR #2: fixed missing return in addOrReuseCacheLayer (duplicate manifest entries bug)

SHIPPED lifecycle:0.3 — PR #4: 6 SBOM placeholder files, eliminates exporter warnings

SHIPPED slug-compiler — PR #316: removed no-op --trust-builder, bumped lifecycle ref to 0.3

FLAGSHIP bundle-install:0.9.0 — PR #12: env-aware --without on both build and launch layers. THE big lever.

SHIPPED ruby:0.47.21 composite — references bundle-install:0.9.0, propagates fix to every Ruby app

OPS Fleet-wide tag clear — 250 :cache + 118 prod/staging :latest deleted, forcing fresh clean rebuilds

Tracking issue: neeto-deploy-web#7146

What we evaluated and skipped (honestly)

Option	Original estimate	Honest estimate	Verdict
zstd compression	30–50 s	8–15 s	SKIP Lifecycle doesn't support it, code change required
Parallel layer export	"cut in half"	15–25 s	SKIP Diminishing returns post-fix, race-condition risk
Bigger build pods (3→6 CPU)	30–60 s	<5 s	SKIP gzip is single-threaded — no win
Reduce launch-layer count	"modest"	~3 s/layer	SKIP UX risk for tiny gain
PVC local layer cache	200–300 s	~40 s	SKIP 5-7 days of ops work for 40 s
arm-builds image pre-pull	30–60 s saved	n/a	WRONG Pack uses podman in-pod, not containerd — node-level pre-pull invisible

The bundle-install fix already extracted the easy multi-minute wins. Every remaining option had diminishing returns relative to its implementation cost. Knowing when to stop is part of the work.

Artifacts shipped to ECR

348674388966.dkr.ecr.us-east-1.amazonaws.com/
├── neeto-deploy/paketo/lifecycle:0.2          (cache duplicate-layer fix)
│   ├── 0.2-amd64    sha256:8d6cd7…
│   └── 0.2-arm64    sha256:d79c82…
│
├── neeto-deploy/paketo/lifecycle:0.3          (SBOM placeholder files)
│   └── multi-arch   sha256:17637a…
│
├── neeto-deploy/paketo/buildpack/bundle-install:0.9.0    ← FLAGSHIP
│   ├── 0.9.0-amd64
│   ├── 0.9.0-arm64
│   └── multi-arch   sha256:736355…
│
└── neeto-deploy/paketo/buildpack/ruby:0.47.21
    └── multi-arch   sha256:fffc12…   (composite — pulls bundle-install:0.9.0)

PRs (all merged):
· neeto-deploy-paketo-bundle-install-buildpack #12 — env-aware --without (flagship)
· neeto-deploy-slug-compiler-web #316 — remove --trust-builder, bump lifecycle
· neeto-deploy-lifecycle #4 — SBOM placeholder files (0.3)
· neeto-deploy-lifecycle #2 — cache duplicate-layer fix (0.2)
Tracking issue: neeto-deploy-web #7146

Bottom line

29%

faster builds

84%

smaller cache

95%

smaller build-gems layer

8 apps

7 PRs merged, 1 open

368 tags

cleared fleet-wide

~6 GB

freed from just 3 apps

Thanks 🙌

Questions?

Detailed report: pack-build-optimization-results.md
Deck source: github.com/vishal24367/pack-build-optimization-deck
Gist: gist.github.com/vishal24367/e06ad…