Ephemeral GitHub Actions Runners on Cloudflare Containers

The Motivation

AI coding agents are hard on continuous integration. Every push, every “fix the lint”, every iteration of an agent loop fires a workflow. A week of working with Claude Code on a couple of repositories chews through a GitHub-hosted minutes allowance much faster than a human typing commits ever did. The bill is real, and the obvious escape, a self-hosted runner, has its own problems: it is a single box that runs one job at a time, holds your deploy credentials and SSH keys the whole time it is alive, reuses its workspace between jobs, and has to be started and kept running by hand.

I wanted the opposite of that box: a runner that is created on demand, runs exactly one job in a clean isolated environment, and disappears. No standing VM, no shared state, no idle machine sitting on secrets.

Cloudflare Containers fit the shape. They scale to zero, bill only for the time an instance is running (metered in 10 ms increments), and are driven by code in a Worker. So the listener that hears GitHub and the compute that runs the job can both live on Cloudflare. The result is flare-runner, open source under MIT.

How It Works

The whole system is three small pieces: a Worker, a Container-backed Durable Object, and a runner image.

GitHub (workflow_job: queued)
   |  HMAC webhook
   v
Cloudflare Worker
   - verify HMAC signature
   - ignore anything not queued / not our labels
   - POST .../generate-jitconfig  ->  GitHub API  ->  encoded_jit_config
   - start one container with that config
   v
Cloudflare Container (ephemeral)
   - ./run.sh --jitconfig "$JIT_CONFIG"
   - registers, claims ONE job, runs it, exits
   - instance reclaimed (scale to zero)

When a job is queued, GitHub sends a workflow_job webhook. The Worker verifies the HMAC signature, ignores everything that is not a queued event carrying the labels we care about, and asks the GitHub API for a just-in-time runner configuration. A JIT runner is single-use by design: it registers, claims one job, and removes itself afterwards.
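The signature check can be sketched in a few lines. GitHub sends the HMAC-SHA256 of the raw body in the X-Hub-Signature-256 header as "sha256=<hex>". This version is an illustration, not the project's code: it assumes node:crypto is available (on Workers that means the nodejs_compat flag), whereas the real Worker may well use Web Crypto instead.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify GitHub's X-Hub-Signature-256 header ("sha256=<hex digest>") against
// the raw request body. Returns false for a missing or malformed header.
// Assumption: node:crypto and Buffer are available in the runtime.
export function verifySignature(
  secret: string,
  body: string,
  header: string | null,
): boolean {
  if (!header || !header.startsWith("sha256=")) return false;

  const expected = createHmac("sha256", secret).update(body, "utf8").digest();
  const received = Buffer.from(header.slice("sha256=".length), "hex");

  // timingSafeEqual throws on a length mismatch, so rule that out first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

The comparison runs over the decoded digest bytes, so the time it takes does not depend on where the first mismatching byte sits.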

The Worker then starts one Cloudflare Container, passing that JIT config in as an environment variable. The container’s entrypoint is just the stock GitHub Actions agent:

exec ./run.sh --jitconfig "${JIT_CONFIG}"

The runner is an outbound long-poll client, not an HTTP server, so the container needs no listening port. Cloudflare documents exactly this batch-job shape: start() the container, let the process run, and when it exits the instance is reclaimed. One container equals one job. There is no _work directory carried between runs and no box left holding credentials.
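Put together, the Worker's control flow looks roughly like the sketch below. Everything in it is hypothetical scaffolding for illustration: the Deps interface, the handleWebhook name, the flare-<job id> naming, and the 401 for a bad signature are my assumptions, and it returns bare status codes instead of Response objects so it runs outside the Workers runtime. Only the 204 for a ping and the 202 for a spawned job come from the behaviour described in this post.

```typescript
// Hypothetical dependency bundle; the project's real signatures may differ.
interface Deps {
  verify(body: string, signature: string | null): boolean;
  shouldSpawn(event: string, payload: unknown): boolean;
  mintJitConfig(runnerName: string): Promise<string>;
  startContainer(instanceId: string, env: Record<string, string>): Promise<void>;
}

// Returns the HTTP status the Worker would answer with.
export async function handleWebhook(
  event: string | null,
  signature: string | null,
  body: string,
  deps: Deps,
): Promise<number> {
  // Assumption: reject unsigned or mis-signed deliveries with 401.
  if (!deps.verify(body, signature)) return 401;

  // GitHub sends a ping when the webhook is created.
  if (event === "ping") return 204;

  const payload = JSON.parse(body);
  // Assumption: events we ignore are acknowledged without doing any work.
  if (event === null || !deps.shouldSpawn(event, payload)) return 204;

  // One job, one runner name, one container instance.
  const name = `flare-${String(payload.workflow_job.id)}`;
  const jitConfig = await deps.mintJitConfig(name);
  await deps.startContainer(name, { JIT_CONFIG: jitConfig });
  return 202;
}
```

Passing the four steps in as functions keeps the flow testable with fakes; in the real Worker the last one would be the call that starts the Container-backed Durable Object.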

A single instance type and a concurrency ceiling are all the configuration the container needs:

"containers": [
  {
    "class_name": "RunnerContainer",
    "image": "./Dockerfile",
    "max_instances": 5,
    "instance_type": "standard-2"
  }
]

standard-2 is 1 vCPU, 6 GiB of memory, and 12 GB of disk, which is enough for typical pnpm / go test / pytest work.

How to Use It in Your Own Repo or Org

You deploy one instance per GitHub repo or org. Here is the whole path.

1. Pick a scope. In wrangler.jsonc, set exactly one of GITHUB_REPO or GITHUB_ORG. Repo scope works for user repositories (which have no runner groups):

"vars": {
  "GITHUB_REPO": "your-name/your-repo",
  "RUNNER_GROUP_ID": "1",
  "RUNNER_LABELS": "self-hosted,cloudflare"
}

For an organization, set GITHUB_ORG instead and, optionally, a runner group id.

2. Create a token. A fine-grained personal access token is the fastest path. For a repo, give it Administration: Read and write on that repository; for an org, Self-hosted runners: Read and write. (For production you would swap this for a GitHub App; the Worker reads the token through a single function, so it is a config change, not a rewrite.)

3. Deploy. wrangler deploy builds the Dockerfile, pushes the image to Cloudflare’s own registry, and deploys the Worker:

npm install
npx wrangler deploy
npx wrangler secret put GITHUB_TOKEN     # the token from step 2
npx wrangler secret put WEBHOOK_SECRET   # any high-entropy string

4. Add the webhook. In the repo or org settings, add a webhook pointing at https://<your-worker>.workers.dev/webhook, content type application/json, the same WEBHOOK_SECRET, and subscribe to Workflow jobs only. GitHub immediately sends a ping; the Worker answers 204, which confirms the signature wiring.

5. Opt a workflow in. Point any job at the labels:

jobs:
  test:
    runs-on: [self-hosted, cloudflare]
    steps:
      - uses: actions/checkout@v7
      # ...

Push a commit that triggers the job and watch it: the webhook fires, the Worker mints a JIT config and starts a container, the runner claims the job and exits. wrangler tail shows the spawn, and the repo’s runner list shows the ephemeral runner appear and remove itself around the job.

What Actually Needs Docker

Cloudflare Containers cannot nest Docker, so the instinct is to ask “how do I run Docker without a daemon”. The better first question is whether the job needs Docker at all - far more CI is Docker-free than people assume.

pnpm/npm typecheck, lint, build; go test unit runs; pytest; wrangler deploy - none touch a daemon. They move to flare-runner unchanged.
Tests that boot real services with testcontainers do need a container runtime, but in a tidy suite those sit behind a build tag (//go:build integration in Go) and the default CI run excludes them. A go.mod that lists testcontainers does not mean your test job needs Docker. I had a suite of ~1900 tests across ~100 packages I assumed was Docker-bound; it ran green with the daemon stopped, because every container-backed test was tagged out of the default run. The check is one command: stop Docker, run the job, see if it passes.
The one real exception is building container images. Swap docker build for a rootless, daemonless builder:

- run: |
    buildah bud --isolation chroot --storage-driver vfs -t "$IMAGE" .
    buildah push "$IMAGE"

The runner image ships buildah for exactly this. The catch on a fresh-every-time runner is caching: cold image builds are slower because nothing carries over between jobs. A follow-up post covers the fix - a registry layer cache (buildah --cache-from/--cache-to) and, for compiled languages, compiling on the runner and packaging a thin image so the build cache never lives in a layer. Anything that genuinely needs a live Docker daemon (untagged testcontainers tests, say) stays on a Docker-capable runner; the container disk ceiling is 20 GB either way.

How It Is Built and Tested

The logic that matters is small and pure, so it is unit-tested without any runtime. The Worker’s two responsibilities are separated into functions that take their inputs as arguments:

verifySignature(secret, body, header) does an HMAC-SHA256 compare in constant time.
shouldSpawn(event, body, labels) returns true only for a queued workflow_job whose requested labels are all in the configured set (a subset of it), so the Worker never spawns a container for a job that asked for a different runner.
mintJitConfig(params, fetch) builds and sends the generate-jitconfig request and takes an injectable fetch, so the success and error paths are tested against a fake.
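As an illustration of the injectable-fetch idea, here is a minimal sketch of the JIT request. The endpoint, the request fields, and encoded_jit_config in the response follow GitHub's REST API for self-hosted runners; the JitParams shape, the FetchLike type, the User-Agent value, and the error wording are assumptions of mine, not the project's actual code.

```typescript
// Hypothetical parameter shape; field names are assumptions.
interface JitParams {
  token: string;
  scope: { repo: string } | { org: string }; // "owner/name" or an org login
  runnerName: string;
  runnerGroupId: number;
  labels: string[];
}

// The narrow slice of fetch this function needs, so a fake is easy to write.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
  text(): Promise<string>;
}>;

export async function mintJitConfig(
  params: JitParams,
  fetchImpl: FetchLike,
): Promise<string> {
  const base =
    "repo" in params.scope
      ? `https://api.github.com/repos/${params.scope.repo}`
      : `https://api.github.com/orgs/${params.scope.org}`;

  const res = await fetchImpl(`${base}/actions/runners/generate-jitconfig`, {
    method: "POST",
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${params.token}`,
      "X-GitHub-Api-Version": "2022-11-28",
      "User-Agent": "flare-runner-sketch", // assumption: any non-empty value
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: params.runnerName,
      runner_group_id: params.runnerGroupId,
      labels: params.labels,
      work_folder: "_work",
    }),
  });

  if (!res.ok) {
    throw new Error(`generate-jitconfig failed: ${res.status} ${await res.text()}`);
  }
  const data = (await res.json()) as { encoded_jit_config?: string };
  if (!data.encoded_jit_config) {
    throw new Error("response did not contain encoded_jit_config");
  }
  return data.encoded_jit_config;
}
```

A test passes a fake that records the URL and returns a canned body, so both the success path and a 403 can be exercised without touching the network.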

Vitest covers the signature checks (valid, tampered, wrong secret, missing header), the event filter (ignore non-queued, wrong label, wrong event, case-insensitive matching), and the JIT request shape. tsc --noEmit type checks the Worker and the Durable Object against the real Cloudflare types.
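The event filter those tests exercise might look like the sketch below. It assumes the rule is that every label the job requests must be among the configured RUNNER_LABELS, compared case-insensitively, which mirrors how GitHub only assigns a job to a runner that carries all of its requested labels. The payload interface is trimmed to the fields used here, and rejecting a job with no labels is a defensive choice of mine rather than something the project documents.

```typescript
// Only the webhook fields this filter reads.
interface WorkflowJobPayload {
  action?: string;
  workflow_job?: { labels?: string[] };
}

export function shouldSpawn(
  event: string,
  body: WorkflowJobPayload,
  labels: string[],
): boolean {
  // Only a freshly queued workflow_job is worth a container.
  if (event !== "workflow_job" || body.action !== "queued") return false;

  const requested = body.workflow_job?.labels ?? [];
  // Assumption: a job with no labels is never ours.
  if (requested.length === 0) return false;

  // Every requested label must be one this runner is configured with.
  const configured = new Set(labels.map((l) => l.toLowerCase()));
  return requested.every((l) => configured.has(l.toLowerCase()));
}
```

With RUNNER_LABELS set to self-hosted,cloudflare, a job asking for [self-hosted, cloudflare] passes, while one asking for ubuntu-latest or for an extra label the runner does not carry is ignored.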

The runner image is plain infrastructure, so it is verified by building and running it rather than by a unit test. There is a demo workflow that runs on the Cloudflare runner itself and does two things: it starts a tiny standard-library Python API and curls its /health endpoint, and it builds a small image with buildah and pushes it to GHCR. That single run proves the whole chain end to end: a real job lands on a real ephemeral container with no Docker daemon, and the no-Docker image build works.

Everything else runs on GitHub-hosted runners:

ci runs the tests and type check on every push and pull request.
build-push builds the runner image and publishes it to ghcr.io/andrius/flare-runner for anyone who wants to pull or inspect it. Cloudflare builds the same Dockerfile to its own registry on deploy, so the two stay in sync from one source.
deploy runs wrangler deploy and syncs the Worker secrets.

When I wired it onto a repository, the first live run went exactly as designed: the queued webhook returned 202, the in_progress event arrived five seconds later as the container booted and the JIT runner claimed the job, the job finished in under half a minute, and afterwards the repository had zero registered runners. One container, one job, gone.

Dependency Auto-Discovery

A runner image has one dependency that drifts on someone else’s schedule: the GitHub Actions agent itself. Pin it and it goes stale; float it and a build can change under you. flare-runner pins the version in the Dockerfile

ARG RUNNER_VERSION=2.335.1

and a scheduled workflow keeps that pin honest. runner-version runs weekly. It asks the GitHub API for the latest actions/runner release, compares it to the pin in the Dockerfile, and if there is a newer version it edits the ARG and opens a pull request:

- name: Resolve latest runner vs current pin
  run: |
    latest=$(gh api repos/actions/runner/releases/latest --jq .tag_name | sed 's/^v//')
    current=$(grep -oP 'ARG RUNNER_VERSION=\K.*' Dockerfile)
    echo "latest=$latest"   >> "$GITHUB_OUTPUT"
    echo "current=$current" >> "$GITHUB_OUTPUT"

Merging that pull request changes the Dockerfile, which triggers build-push, which republishes the image. The discovery, the bump, and the rebuild are one chain, and nothing is done by hand. There is no marketplace action involved; it is the gh CLI and a sed.
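The real workflow does this in shell, but the decision it makes is easy to show in TypeScript. The sketch below is only an illustration of the compare-and-bump step: the function names are hypothetical, and comparing the parts numerically is my assumption (the workflow may simply test whether the two strings differ).

```typescript
// Turn "2.335.1" or "v2.335.1" into [2, 335, 1].
export function parseVersion(tag: string): number[] {
  return tag
    .replace(/^v/, "")
    .split(".")
    .map((part) => Number.parseInt(part, 10));
}

// True when `latest` is strictly newer than `current`, comparing each
// dotted part as a number (so 2.335.1 is newer than 2.99.0).
export function isNewer(latest: string, current: string): boolean {
  const a = parseVersion(latest);
  const b = parseVersion(current);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return false;
}

// Rewrite the pin line in a Dockerfile's text, leaving the rest untouched.
export function bumpPin(dockerfile: string, version: string): string {
  return dockerfile.replace(
    /^ARG RUNNER_VERSION=.*$/m,
    `ARG RUNNER_VERSION=${version}`,
  );
}
```

Numeric comparison matters because a plain string compare would rank 2.99.0 above 2.335.1.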

The rest of the dependencies do not drift silently, so they are kept current by review rather than automation. That review caught something worth mentioning: it is easy, even with documentation open, to bump an action to a version that was current last quarter rather than today. The fix was to check the actual latest of every action and package before committing, which moved the workflows to actions/checkout@v7, actions/setup-node@v6, the v4 and v7 Docker actions, and the project to TypeScript 6 and Vitest 4, each verified by the test and type-check run before landing. The lesson is the same one the runner pin encodes: do not trust a remembered version number, resolve the current one.

Try It

The project is at github.com/andrius/flare-runner with a setup.md that walks through the wiring above. It is small on purpose: a Worker, a Durable Object, a Dockerfile, and a handful of workflows. If your CI is mostly pnpm, go test, pytest, and wrangler deploy, it will move onto ephemeral Cloudflare Containers cleanly, and your minutes allowance stops being the thing the AI agent runs out of first.