ADR 0008 — Judge auth_mode and claude-code OAuth driver
Status: Proposed (issue #126)
Context
ADR 0007 (issue #125) introduced a pluggable driver layer for
tests/e2e/lib/llm_judge.sh, but every driver still assumes
API-key-in-env authentication: the core loader exports
JUDGE_API_KEY_ENV, drivers dereference it via indirect expansion, and
the preflight contract returns AUTH_TOKEN=<api-key> or rc=2.
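To make the preflight contract concrete, here is a minimal bash sketch of how a core could consume a driver's _preflight. It assumes only what this ADR states: AUTH_TOKEN=<token> on stdout with rc=0, rc=2 for "auth not configured here" (mapped to UNCERTAIN), and other non-zero codes treated as hard failures. The names sketch_core_preflight and SKETCH_AUTH_TOKEN are hypothetical, and treating malformed rc=0 output as a failure is an assumption of the sketch, not documented ADR 0007 behavior.

```shell
#!/usr/bin/env bash
# Hypothetical core-side consumer of a driver _preflight function.
# Prints OK / UNCERTAIN / FAIL and never echoes the token itself.
sketch_core_preflight() {
  local preflight_fn="$1" out rc
  # Capture stdout and the exit code without tripping `set -e`.
  out="$("$preflight_fn")" && rc=0 || rc=$?
  case "$rc" in
    0)
      # Assumption: exactly one line of the form AUTH_TOKEN=<non-empty>.
      if [[ "$out" == AUTH_TOKEN=?* && "$out" != *$'\n'* ]]; then
        SKETCH_AUTH_TOKEN="${out#AUTH_TOKEN=}"
        echo OK
      else
        echo FAIL
        return 1
      fi
      ;;
    2)
      # Soft auth-missing: the judge verdict becomes UNCERTAIN.
      echo UNCERTAIN
      ;;
    *)
      # Any other code (e.g. rc=1 for an unsupported auth_mode) is fatal.
      echo FAIL
      return 1
      ;;
  esac
}
```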
The Claude Code CLI's primary auth surface is OAuth, not an API key.
Credentials minted by task e2e:auth:claude (ADR 0002, Decision 2)
land in ~/.crewrig-e2e/claude/.credentials.json (and at
~/.claude/.credentials.json for a developer's day-to-day session).
Re-using that on-disk token to call the Anthropic Messages API as the
judge would let contributors run the e2e oracle without minting a
separate ANTHROPIC_JUDGE_API_KEY, a real point of friction raised in
issue #126.
The Messages API accepts the OAuth access token via the standard
Authorization: Bearer <token> header in place of x-api-key, so the
delta is small: a second driver and a config switch.
Decision
1. Add an auth_mode field to [judge]
```toml
[judge]
auth_mode = "api_key" # default — preserves today's behaviour
# or
auth_mode = "oauth" # driver reads CLI credential store from disk
```
Loader changes (_llm_judge_load_config in tests/e2e/lib/llm_judge.sh):
- Default literal: auth_mode="api_key".
- Parsed via jq -r '.judge.auth_mode // "api_key"' effective.json.
- Emitted as JUDGE_AUTH_MODE=%q alongside the existing JUDGE_* lines.
- Declared local and exported in llm_judge() next to JUDGE_API_KEY_ENV, so drivers can read it via plain expansion.
Forwarding model: env, not positional. auth_mode is a
driver-internal concern — the core never branches on it. Exporting it
keeps the _call positional signature (ADR 0007 §1) intact and avoids
a breaking change to the existing anthropic driver. Drivers that
care branch in their own _preflight; drivers that do not (e.g.
anthropic) ignore the variable.
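A minimal bash sketch of this env-based forwarding follows. The sketch_* functions are hypothetical stand-ins for _llm_judge_load_config, llm_judge() and a driver; only the JUDGE_AUTH_MODE name comes from this ADR. The loader's jq call is replaced by a plain argument so the sketch runs without jq. It shows a %q-quoted KEY=value line being eval'd into a local, exported, and then read by a driver function (and by a child process) without any change to the _call positionals.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for _llm_judge_load_config. The real loader gets
# the value from: jq -r '.judge.auth_mode // "api_key"' effective.json
sketch_load_config() {
  local auth_mode="${1:-api_key}"   # default literal, as in the ADR
  printf 'JUDGE_AUTH_MODE=%q\n' "$auth_mode"
}

# Hypothetical driver preflight: reads the mode by plain expansion.
sketch_driver_preflight() {
  printf 'driver saw auth_mode=%s\n' "${JUDGE_AUTH_MODE:-unset}"
}

# Hypothetical stand-in for llm_judge().
sketch_llm_judge() {
  local JUDGE_AUTH_MODE
  # %q quoting on the loader side is what makes this eval reasonable.
  eval "$(sketch_load_config "${1:-}")"
  export JUDGE_AUTH_MODE
  sketch_driver_preflight
  # Child processes inherit it too, because the local is exported.
  bash -c 'printf "child saw auth_mode=%s\n" "$JUDGE_AUTH_MODE"'
}
```

Because JUDGE_AUTH_MODE is a local inside sketch_llm_judge, it disappears when the function returns, so one judge run does not leak its mode into the caller's environment.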
2. New driver: tests/e2e/lib/llm_judge_drivers/claude-code.sh
Implements the ADR 0007 §1 contract.
_preflight:
- If E2E_JUDGE_MOCK=1 → printf 'AUTH_TOKEN=mock\n'; return 0 (mirrors anthropic.sh).
- Read JUDGE_AUTH_MODE. The driver supports oauth (primary) and api_key (fallback for users who set ANTHROPIC_JUDGE_API_KEY alongside backend = "claude-code"); any other value → rc=1 hard failure with an _e2e_assert_diag line.
- When auth_mode = "oauth":
  - Credential path: ${CLAUDE_CREDENTIALS_PATH:-$HOME/.claude/.credentials.json}. The env override is added so the e2e harness can point the driver at ${CREWRIG_E2E_HOME}/claude/.credentials.json without symlinking into $HOME.
  - Missing or unreadable file → rc=2 (soft auth-missing; the core maps it to UNCERTAIN per ADR 0007 §3).
  - Read the access token via:
    token="$(jq -r '.claudeAiOauth.accessToken // empty' "$path")"
    UNVERIFIED: verify before merge. The Claude Code CLI's on-disk schema is not documented in this repository. The conventional upstream layout is {"claudeAiOauth": {"accessToken": "sk-ant-oat01-…", "refreshToken": "…", "expiresAt": <ms>, "scopes": […], "subscriptionType": "…"}}. The developer MUST run jq 'keys, (.claudeAiOauth | keys?)' ~/.crewrig-e2e/claude/.credentials.json (or the developer's own ~/.claude/.credentials.json) against a freshly minted file and confirm the key path before landing the driver. The parentheses matter: in jq the pipe binds more loosely than the comma, so without them keys? would be applied to both outputs. If the observed schema differs, update the jq selector here and amend this ADR in the same PR; do not guess.
  - Empty/null token → rc=2 (treat as auth-missing, not a hard failure; consistent with ADR 0007 §3: "user has not configured a key on this machine").
  - expiresAt check: if the field is present AND parseable AND expiresAt < now_ms, return rc=2 with a # WARN message on stderr (e.g. # WARN claude-code judge: OAuth token expired — re-run task e2e:auth:claude). Missing/unparseable expiresAt → proceed and let the API surface the 401 on _call.
  - On success: printf 'AUTH_TOKEN=%s\n' "$token"; return 0.
- When auth_mode = "api_key": identical body to _llm_judge_driver_anthropic_preflight (read JUDGE_API_KEY_ENV via indirect expansion). This is intentional duplication, not a shared helper; ADR 0007 §1 keeps drivers self-contained.
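The _preflight steps above could be sketched in bash roughly as follows. This is an illustration, not the driver: the function name copies the _llm_judge_driver_<backend>_preflight pattern of the anthropic driver but is hypothetical for claude-code; the jq key paths are the UNVERIFIED ones flagged above; the real driver reports unknown modes through _e2e_assert_diag rather than plain stderr; treating malformed JSON as rc=2 is an assumption; and the §4a permission check is left out. It needs bash and jq.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the claude-code driver _preflight.
_llm_judge_driver_claude_code_preflight() {
  # Mock short-circuit, mirroring anthropic.sh.
  if [[ "${E2E_JUDGE_MOCK:-0}" == "1" ]]; then
    printf 'AUTH_TOKEN=mock\n'
    return 0
  fi

  case "${JUDGE_AUTH_MODE:-api_key}" in
    oauth)
      ;;  # handled below
    api_key)
      # Same idea as the anthropic preflight: indirect expansion of the
      # variable named by JUDGE_API_KEY_ENV (validated as a shell name).
      local key_var="${JUDGE_API_KEY_ENV:-}" key=""
      if [[ "$key_var" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
        key="${!key_var:-}"
      fi
      [[ -n "$key" ]] || return 2
      printf 'AUTH_TOKEN=%s\n' "$key"
      return 0
      ;;
    *)
      # The real driver would emit an _e2e_assert_diag line here.
      printf 'claude-code judge: unsupported auth_mode: %s\n' "$JUDGE_AUTH_MODE" >&2
      return 1
      ;;
  esac

  local path="${CLAUDE_CREDENTIALS_PATH:-$HOME/.claude/.credentials.json}"
  # Missing or unreadable file: soft auth-missing.
  [[ -f "$path" && -r "$path" ]] || return 2

  local token expires_at now_ms
  # UNVERIFIED key path. Assumption: invalid JSON is also rc=2.
  token="$(jq -r '.claudeAiOauth.accessToken // empty' "$path" 2>/dev/null)" || return 2
  [[ -n "$token" ]] || return 2

  # Expiry is only enforced when expiresAt is present and an integer.
  expires_at="$(jq -r '.claudeAiOauth.expiresAt // empty' "$path" 2>/dev/null)" || expires_at=""
  if [[ "$expires_at" =~ ^[0-9]+$ ]]; then
    now_ms=$(( $(date +%s) * 1000 ))
    if (( 10#$expires_at < now_ms )); then
      printf '%s\n' '# WARN claude-code judge: OAuth token expired — re-run task e2e:auth:claude' >&2
      return 2
    fi
  fi

  printf 'AUTH_TOKEN=%s\n' "$token"
  return 0
}
```

The 10# prefix matters: without it, bash arithmetic would read an expiresAt value with a leading zero as octal.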
_call:
- Same body as anthropic.sh, except that the curl header line swaps -H "x-api-key: ${api_key}" for -H "Authorization: Bearer ${api_key}". The positional is named api_key for parity with the contract; semantically it is the Bearer token under oauth.
- The anthropic-version: 2023-06-01 header is retained.
- Request body, retry loop, counter increment, and verdict regex are copy-equivalent to the Anthropic driver. The duplication is deliberate: ADR 0007 §1 commits to one file per backend.
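A small sketch of the one-line header difference, assuming a hypothetical helper sketch_judge_headers that collects the curl -H arguments into a bash array named SKETCH_HEADERS (also hypothetical). The real drivers build their curl command inline and may send headers beyond the two shown; only the anthropic-version value and the x-api-key / Authorization: Bearer swap come from this ADR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fills the global array SKETCH_HEADERS with curl -H
# arguments for the given backend. Other headers are deliberately omitted.
sketch_judge_headers() {
  local backend="$1" api_key="$2"
  SKETCH_HEADERS=(-H "anthropic-version: 2023-06-01")
  case "$backend" in
    anthropic)   SKETCH_HEADERS+=(-H "x-api-key: ${api_key}") ;;
    claude-code) SKETCH_HEADERS+=(-H "Authorization: Bearer ${api_key}") ;;
    *)           return 1 ;;   # unknown backend
  esac
}
```

A driver would then expand "${SKETCH_HEADERS[@]}" into its curl invocation. Keeping the token inside a single array element means it reaches curl as one argument, whatever characters it contains.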
3. Backward compatibility
- auth_mode defaults to "api_key". Users who do not edit local.toml see zero behavior change.
- The anthropic driver is not modified. It continues to read JUDGE_API_KEY_ENV and ignores JUDGE_AUTH_MODE.
- A new, commented local.toml.example stanza documents the claude-code oauth path:

```toml
# Re-use the OAuth token minted by `task e2e:auth:claude` instead of
# `ANTHROPIC_JUDGE_API_KEY`. Reads ${CLAUDE_CREDENTIALS_PATH:-~/.claude/.credentials.json}.
# [judge]
# backend = "claude-code"
# auth_mode = "oauth"
# model = "claude-sonnet-4-6"
# temperature = 0.0
# strict = false
```
4. ADR scope
ADR 0007 covers the driver-contract surface. This ADR adds a single
field (auth_mode) and a single driver — small enough that a new ADR
is borderline. Recorded as ADR 0008 because:
- It introduces a public TOML field that becomes load-bearing the moment a third driver wants OAuth (e.g. a future gemini-cli backend that reuses ~/.gemini/oauth_creds.json).
- The "credential file on disk" auth mode is a new threat surface (file permissions, expiry, stale tokens) that deserves a written decision the security agent can audit against.
4a. Threat model — CLAUDE_CREDENTIALS_PATH trust assumption
CLAUDE_CREDENTIALS_PATH is treated as trusted input: it must be
controlled by the same operator who configures JUDGE_ENDPOINT. An
attacker who can set both variables can exfiltrate any JSON file on
disk that contains a .claudeAiOauth.accessToken field, by pointing
the driver at the target file and at an attacker-controlled endpoint
that captures the bearer token.
This is an accepted threat. The mitigation is environmental: setting
either variable already requires shell-level access equivalent to
running arbitrary commands as the operator, so a successful attacker
has strictly more direct paths to credential disclosure (e.g.
reading $HOME/.claude/.credentials.json directly). The driver does
not attempt to sandbox the path; it only refuses to read credential
files whose POSIX permissions are more permissive than 0600 (any
group/other bit set), which closes the local other-user
disclosure path without claiming to address the compromised
operator path.
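The 0600 rule above could be checked portably with find, as in this sketch. sketch_credentials_perms_ok is a hypothetical name, and since this ADR does not say which rc the driver returns when it refuses a file, the sketch only answers yes (0) or no (1).

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only when no group/other permission bit is
# set on the credential file (mode & 0077 == 0), so 0600 and 0400 pass.
sketch_credentials_perms_ok() {
  local path="$1" loose
  [[ -f "$path" ]] || return 1
  # -H follows a symlink named on the command line, so the target's mode is
  # checked rather than the link's. POSIX -perm -MODE is true when all bits
  # in MODE are set, hence one test per group/other bit.
  loose="$(find -H "$path" -prune \( -perm -040 -o -perm -020 -o -perm -010 \
    -o -perm -004 -o -perm -002 -o -perm -001 \) -print)" || return 1
  [[ -z "$loose" ]]
}
```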
File list
| Path | Change |
|---|---|
| tests/e2e/lib/llm_judge.sh | Loader: parse + export JUDGE_AUTH_MODE. No quorum or verdict changes. |
| tests/e2e/lib/llm_judge_drivers/claude-code.sh | New. Bearer-auth driver with oauth/api_key preflight branches. |
| tests/e2e/defaults.toml | Add auth_mode = "api_key" under [judge] with a one-line comment. |
| tests/e2e/local.toml.example | Add commented [judge] stanza demoing backend = "claude-code" + auth_mode = "oauth". |
| docs/adr/0008-judge-oauth-auth-mode.md | This file. |
| scripts/tests/test-e2e-llm-judge-lib.sh | New cases: oauth happy-path (stubbed credentials.json), missing file → UNCERTAIN, empty token → UNCERTAIN, expired → UNCERTAIN, api_key parity unchanged. |
| tests/e2e/lib/README.md | One-paragraph note pointing at ADR 0008. |
| docs/cli-matrix.md | Touch only if a row references the judge backend. Spot-check during implementation. |
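For the test cases listed against scripts/tests/test-e2e-llm-judge-lib.sh, a stub credentials file could come from a helper like the one below. sketch_write_stub_credentials is hypothetical, and its JSON layout follows the UNVERIFIED claudeAiOauth schema, so it must track whatever the developer confirms before merge. Tokens are restricted to characters that need no JSON escaping, which keeps the helper free of a jq dependency.

```shell
#!/usr/bin/env bash
# Hypothetical test helper: writes <dir>/.credentials.json with mode 0600
# and prints its path. An empty token yields "accessToken":"" and an empty
# or missing third argument omits expiresAt.
sketch_write_stub_credentials() {
  local dir="$1" token="$2" expires_ms="${3:-}" file
  # Refuse values that would need JSON escaping.
  [[ "$token" =~ ^[A-Za-z0-9._-]*$ ]] || return 1
  [[ -z "$expires_ms" || "$expires_ms" =~ ^[0-9]+$ ]] || return 1
  file="$dir/.credentials.json"
  mkdir -p "$dir" || return 1
  if [[ -n "$expires_ms" ]]; then
    (umask 077; printf '{"claudeAiOauth":{"accessToken":"%s","expiresAt":%s}}\n' \
      "$token" "$expires_ms" > "$file") || return 1
  else
    (umask 077; printf '{"claudeAiOauth":{"accessToken":"%s"}}\n' \
      "$token" > "$file") || return 1
  fi
  chmod 600 "$file" || return 1   # in case the file already existed
  printf '%s\n' "$file"
}
```

For example, pointing CLAUDE_CREDENTIALS_PATH at the path printed for a small expiresAt such as 1000 sets up the expired case, and passing an empty token sets up the empty-token case.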
Non-goals
- No change to the anthropic driver. Bearer auth on the anthropic backend is out of scope; users who want OAuth pick the claude-code backend.
- No new env vars beyond CLAUDE_CREDENTIALS_PATH. No CLAUDE_OAUTH_TOKEN shortcut; the file is the source of truth.
- No token refresh. ADR 0002 already documents that refresh fails under the RO mount. The driver surfaces expiry as UNCERTAIN and does not attempt a refresh write; users re-run task e2e:auth:claude.
- No Copilot / Gemini OAuth drivers. Symmetric implementations are out of scope for this PR; ADR 0008 only adds the mechanism.
- No bundled-component changes. Nothing under community-config/, .gemini/, or .claude/ is touched, so running scripts/build-components.sh is not required.
- No version bump. No community-config/skills/*/SKILL.md or community-config/agents/*/AGENT.md is modified.
Blast radius
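- Files modified on main today: 5 (llm_judge.sh, defaults.toml, local.toml.example, test-e2e-llm-judge-lib.sh, tests/e2e/lib/README.md), plus docs/cli-matrix.md if the spot-check finds a row that references the judge backend.
- Files added: 2 (the driver and this ADR).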
- Public-contract additions:
  - New [judge].auth_mode field: additive; the default preserves behavior.
  - New claude-code backend value: additive.
  - New CLAUDE_CREDENTIALS_PATH env override: additive.
- Risks:
  - OAuth token schema drift if the Claude Code CLI changes the claudeAiOauth.accessToken path. Mitigated by the UNVERIFIED flag above: the developer confirms the path empirically before merge.
  - Token expiry surfacing as a 401 inside _call rather than _preflight when expiresAt is missing. The caller maps it to UNCERTAIN via the existing malformed-output path (ADR 0007 §3); acceptable, not catastrophic.
  - File-permission leak (world-readable .credentials.json). The 0600 refusal described in §4a closes the local other-user path; anything beyond that is out of scope for this PR but worth a security-skill check during review.
- Reversibility: Easy. The driver file can be deleted and the auth_mode field removed; default behavior is untouched.