Security
Defense-in-depth security architecture and governance
Core Security Philosophy
Four principles guide every design decision in the gh-aw security architecture.
Defense in Depth
Multiple independent layers — substrate isolation, configuration hardening, and runtime planning — each enforce distinct security properties, so a breach of one layer does not defeat the others.
Zero-Secret Agents
Agents have zero access to tokens, keys, or credentials. LLM keys live in an isolated API Proxy; MCP credentials in a separate Gateway. Architectural boundary, not policy.
Stage & Vet All Writes
Every write is buffered as a structured artifact, analyzed by deterministic filters and a security-focused AI, then executed by separate permission-scoped jobs.
Log Everything
Comprehensive observability at every trust boundary enables forensic reconstruction and anomaly detection. Logging is a structural feature, not optional instrumentation.
“Our first mitigation was to isolate the agent in a dedicated container with tightly controlled egress: firewalled internet access, MCP access through a trusted MCP gateway, and LLM API calls through an API proxy.” — Landon Cox & Jiaxiao Zhou, GitHub Engineering Blog
What the Adversary Can Do
Two properties fundamentally change the threat model: agents reason autonomously over untrusted inputs, and GitHub Actions provides a shared, highly-permissive trust domain.
Access & Corrupt Shared State
A compromised agent can read environment variables, config files, SSH keys, /proc state, and workflow logs to discover credentials and sensitive data.
Communicate Over Unintended Channels
Encode secrets in PR descriptions, issue bodies, comments, or other public-facing objects to exfiltrate data through legitimate GitHub features.
Abuse Legitimate Channels
Spam issues and PRs to overwhelm maintainers, inject objectionable content, or trigger cost escalation through excessive API usage.
Confuse Control Logic
Deviate from expected workflows, manipulate upstream decision-making, or exploit prompt injection via crafted issues, comments, and hidden Unicode.
Three Primary Risk Classes
Data Exfiltration
Agent leaks context — including tokens — to unintended destinations via network calls or encoded in public-facing objects like PR descriptions.
Impersonation
Unclear who issued a directive or who is accountable for agent actions. Without proper attribution, malicious actions can be concealed.
Prompt Injection
Malicious users craft inputs — issues, files, comments with hidden Unicode — that trick agents into performing unauthorized behavior.
Three-Layer Defense in Depth
Each layer enforces independent security properties and constrains the impact of failures in the layers above it.
🧠 Planning Layer
Constrains behavior over time. SafeOutputs MCP buffers writes as artifacts. Content sanitization strips injection vectors. Threat detection AI analyzes all outputs. Secret redaction scrubs credentials. GitHub Lockdown Mode filters untrusted input in public repos.
⚙️ Configuration Layer
Declarative artifacts and a trusted compiler produce security-hardened .lock.yml workflows. Schema validation, expression allowlists, Action SHA pinning via actions-lock.json, and security scanners (actionlint, zizmor, poutine) all run at compile time. Strict mode adds further constraints.
🏗️ Substrate Layer
Kernel-enforced isolation via Actions runner VM, Docker containers, iptables, and cgroups. Three privileged containers — Network Firewall (AWF), API Proxy, and MCP Gateway — mediate all external communication. Holds even if the agent container is fully compromised.
Critical Guarantee
Substrate protections hold even if the agent executes arbitrary code. A breach at this layer would require vulnerabilities in the container runtime, kernel, hypervisor, or hardware.
Safe Outputs Pipeline
All write operations follow a staged pipeline: buffer → detect → apply. The agent never writes directly to external state.
Buffer
Agent runs with read-only permissions. Write actions (create issue, open PR) are buffered as structured JSON artifacts — never executed immediately.
Detect
Buffered artifacts pass through deterministic filters and a security-focused AI agent. Threat detection emits a pass/fail verdict that gates all writes.
Apply
Only artifacts that survive the full pipeline are applied by separate, permission-scoped jobs — create_issue, add_comment, create_pr, add_labels.
Three Categories of Control
| Control | Description | Example |
|---|---|---|
| Operation Allowlisting | Authors specify which write operations an agent may perform | create-issue, create-pull-request, add-comment |
| Volume Limits | Cap the number of allowed operations per run | At most 3 PRs per run |
| Content Sanitization | Analyze and filter output content | Remove untrusted URLs, potential secrets, objectionable content |
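The three controls above are declared in the workflow's frontmatter. A minimal sketch of how an operation allowlist plus volume limits might look — the key names follow the gh-aw documentation's naming style, but treat the exact schema as illustrative:

```yaml
# Illustrative safe-outputs frontmatter for a gh-aw workflow.
# Keys and structure are a sketch, not a definitive reference.
safe-outputs:
  create-issue:          # operation allowlist: agent may open issues
    max: 1               # volume limit: at most 1 issue per run
  create-pull-request:   # agent may open PRs
    max: 3               # at most 3 PRs per run
  add-comment:
    max: 5
```

Any write operation not listed is never applied: the permission-scoped Apply jobs only exist for the operations the author enabled.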
Missing-Data Safe Output
A built-in safe output called missing-data encourages agents to explicitly report when they lack data to complete a task, rather than hallucinating. This rewards honest agent behavior and creates a feedback loop for improvement.
Agent Workflow Firewall (AWF)
Open-source network firewall (github/gh-aw-firewall) that containerizes the agent, isolates the network, and routes all traffic through a Squid proxy with configurable domain allowlists.
Traffic flow: 🤖 Agent Container (isolated network namespace) → 🔀 Squid Proxy (domain allowlist at 172.30.0.10) → ✅ Allowed: traffic proceeds, or 🚫 Blocked: dropped and logged.
How It Works
- AWF establishes an isolated network namespace with a Squid proxy
- iptables rules redirect all HTTP/HTTPS traffic through the proxy
- The proxy evaluates each request against the domain allowlist
- Allowed traffic proceeds; blocked traffic is dropped and logged
- AWF drops its iptables capabilities before launching the agent, preventing network reconfiguration
Configuration
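The firewall's configuration is expressed as a domain allowlist in the workflow frontmatter. A hedged sketch — the `network:` key and the `defaults` entry are assumptions based on the gh-aw documentation, not a verified schema:

```yaml
# Illustrative AWF domain allowlist in gh-aw frontmatter.
network:
  allowed:
    - defaults            # assumed baseline set (e.g. package registries)
    - "api.github.com"    # explicit domain the agent may reach
    - "*.example.com"     # illustrative wildcard subdomain entry
```

Any request to a domain outside this list is dropped by the Squid proxy and recorded in the traffic log.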
Three-Container Design
Squid Proxy
Filters all outbound traffic by domain allowlist. Supports SSL Bump for HTTPS content inspection and URL path filtering.
Agent Container
Runs the command with all HTTP/HTTPS routed through Squid. Cannot reconfigure iptables — capabilities are dropped pre-launch.
API Proxy Sidecar
Holds LLM API keys so they never reach the agent process. Injects credentials into outbound requests transparently.
Chroot Mode for Coding Agents
Host filesystem mounted read-only at /host. Selected paths overlaid with empty tmpfs layers. Agent runs in a chroot jail, retaining access to host binaries (Python, Node, Go) while controlling network access.
Cosign-Signed Releases
All AWF release artifacts are cosign-signed for image verification. All network traffic is logged for forensic analysis. SSL Bump enables HTTPS content inspection.
Threat Detection Job
A critical sub-stage within SafeOutputs that runs as a separate, isolated job — analyzing all agent outputs before any writes are externalized.
How It Works
- Downloads the buffered artifacts — agent outputs, git patches, prompt context
- Invokes a security-focused AI agent with a hardened system prompt to analyze them
- Emits a pass/fail verdict that gates all subsequent safe output jobs
If threats are detected, the workflow terminates before any writes are externalized.
What It Scans For
Secret Leaks
API keys, tokens, credentials embedded in outputs or code patches
Malicious Patches
Backdoors, vulnerabilities, suspicious code modifications in git diffs
Policy Violations
Scope violations, unauthorized operations, out-of-bounds actions
Key Properties
Isolated Execution
No write permissions, no access to the agent’s runtime state
AI-Powered Analysis
Uses the same AI engine with a security-focused system prompt
Artifact-Based
Only sees buffered artifacts, not live repository state
Blocking Gate
Must emit a “safe” verdict before any writes proceed
Custom Detection
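Beyond the built-in checks, workflow authors can reportedly layer custom criteria onto the threat detection job. A sketch of what that might look like — the `threat-detection` and `prompt` keys are assumptions, not confirmed syntax:

```yaml
# Hypothetical frontmatter extending the threat-detection system prompt
# with repository-specific criteria.
safe-outputs:
  threat-detection:
    prompt: |
      Additionally flag any output that references internal
      hostnames or reproduces content from deployment config files.
```

The custom instructions would be appended to the hardened system prompt; the pass/fail verdict still gates all downstream safe output jobs.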
API Proxy & MCP Gateway
Secrets never enter the agent container. Two trusted sidecar containers hold all authentication material and mediate every external call.
🔑 API Proxy
A trusted sidecar that holds agent authentication tokens (LLM API keys) so they never enter the agent container.
- Agents route model traffic through the proxy
- Proxy injects credentials into outbound requests
- Agent sees only the proxy endpoint — never actual API keys
- All request/response metadata, token usage, and authenticated request details are logged
🌐 MCP Gateway
A trusted container managing communication between the agent and MCP (Model Context Protocol) servers.
- Spawns isolated MCP server containers — prevents cross-server attacks
- Holds MCP authentication material outside agent boundary
- Logs every tool invocation: name, arguments, result, timing
- Routes all MCP requests through a unified HTTP endpoint
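An MCP server declared in frontmatter might look like the sketch below — the `mcp-servers`, `container`, and `allowed` keys are illustrative assumptions; the point is that credentials are referenced by the gateway, never handed to the agent:

```yaml
# Hypothetical MCP server declaration. The gateway spawns the container,
# holds the secret, and logs every tool invocation.
mcp-servers:
  fetch:
    container: mcp/fetch        # isolated per-server container
    allowed: [fetch]            # tool allowlist for this server
    env:
      API_TOKEN: ${{ secrets.FETCH_TOKEN }}  # injected gateway-side only
```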
Why This Matters
Prompt-injected agents with shell access could read config files, SSH keys, /proc state, and workflow logs to discover credentials. The API Proxy and MCP Gateway eliminate this entire attack surface by keeping secrets outside the agent’s isolation boundary.
Security Scanners
Three security scanners integrate into the compilation pipeline, catching issues before workflows ever execute.
actionlint
Workflow linting — catches syntax errors, type mismatches, and best-practice violations in GitHub Actions workflow definitions.
zizmor
Security auditing — detects script injection vulnerabilities, excessive permissions, and unsafe GitHub context expressions that could enable privilege escalation.
poutine
Supply-chain security — identifies unpinned action references, dangerous pull request event triggers, and other supply chain attack vectors.
Compilation Commands
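All three scanners run as part of workflow compilation. Typical CLI usage — the base `gh aw compile` command is documented, but treat the flags as illustrative and check `gh aw compile --help` for the exact set:

```shell
# Compile agentic workflow markdown into hardened .lock.yml files;
# actionlint, zizmor, and poutine run as part of this step.
gh aw compile

# Illustrative: enable strict mode for additional compile-time constraints
gh aw compile --strict
```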
Real-World Results
From the December 2025 static analysis report scanning 119 agentic workflows:
The report breaks findings down by scanner (actionlint, poutine, zizmor, plus permissions checks) and by severity.
Content Sanitization Pipeline
User-generated content is sanitized before the agent ever sees it, neutralizing injection vectors at the activation boundary.
| Mechanism | Example | Protection |
|---|---|---|
| @mention Neutralization | @user → `@user` | Prevents unintended notifications |
| Bot Trigger Protection | fixes #123 → `fixes #123` | Prevents automatic issue linking |
| XML/HTML Tag Conversion | <script> → (script) | Prevents injection via HTML tags |
| URI Filtering | http://evil.com → (redacted) | Restricts to HTTPS from trusted domains |
| Unicode Normalization | Homoglyphs → normalized forms | Prevents visual spoofing attacks |
| Content Limits | Large payloads → truncated | Enforces 0.5 MB max, 65k lines max |
| Control Character Removal | ANSI escapes → stripped | Removes terminal manipulation codes |
GitHub’s 6 Rules for Agentic Security
From Rahul Zhade’s blog post (November 2025) — six rules that apply across all of GitHub’s hosted agentic products.
Ensure All Context Is Visible
Display context source files and strip invisible Unicode/HTML before passing to the agent. Users must be able to see exactly what the agent sees.
Firewall the Agent
AWF limits network access with configurable domain allowlists. Agents cannot reach arbitrary endpoints — all traffic is proxied and filtered.
Limit Access to Sensitive Information
Only provide information absolutely necessary for the task. CI secrets are excluded. Authentication tokens are revoked after each session completes.
Prevent Irreversible State Changes
Agents create pull requests — not direct commits. CI doesn’t auto-run on agent PRs. A human must validate the code and manually trigger workflows.
Consistently Attribute Actions
Actions are co-committed by the initiating user. The agent identity (Copilot) is clearly marked on all PRs. Accountability chains are always visible.
Only Gather Context from Authorized Users
Agents can only be assigned by users with write access. In public repos, agents only read comments from users with write access — preventing context poisoning.
Enterprise Governance
Centralized policy management, comprehensive audit trails, and role-based controls for organizations operating agents at scale.
Centralized Policy via Imports
Organizations maintain shared agents, skills, and security policies in a central repository and distribute them via version-pinned imports across all repos.
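A version-pinned import might be declared as in the sketch below — the cross-repo path syntax and the `@v1.2.0` pin are illustrative assumptions about the import format:

```yaml
# Hypothetical central-policy import, pinned to a tag so every
# consuming repo gets the same vetted version.
imports:
  - org/policy-repo/workflows/shared/security-policy.md@v1.2.0
```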
Role-Based Activation
Pre-activation validates that the triggering user has appropriate permissions. Integrates with GitHub’s roles (read, triage, write, maintain, admin) and can be further constrained with roles: restrictions.
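A `roles:` restriction on top of the built-in permission check might look like this — the accepted role values shown are assumptions:

```yaml
# Hypothetical frontmatter: only maintainers and admins may trigger this workflow.
roles: [maintainer, admin]
```

Activation by any user below the listed roles fails pre-activation validation before the agent ever starts.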
Complete Audit Trail
Every workflow run produces a downloadable artifact trail enabling post-incident forensics, policy compliance validation, cost monitoring, and anomaly detection.
Repository Ruleset Integration
All existing rulesets and branch protections apply. Required reviewers, status checks, CODEOWNERS, and merge queue policies gate agent PRs exactly as human PRs.
No Auto-Merge, No Auto-CI
Pull requests from agents are never auto-merged. CI does not auto-run on agent PRs. A human must inspect, approve, and manually trigger — preserving the human-in-the-loop boundary.
Observability & Auditing
| Boundary | What Is Logged |
|---|---|
| Network Firewall | All HTTP/HTTPS requests: destination, status, bytes transferred |
| API Proxy | LLM request/response metadata, token usage, authenticated request details |
| MCP Gateway | Every tool invocation: tool name, arguments, result, timing |
| MCP Servers | Individual tool call details and responses |
| Agent Container | Environment variable accesses, file operations, process execution |
Saved Artifacts per Run
agent_output.json
Agent decisions & buffered actions
prompt.txt
Generated prompts sent to AI engine
aw.patch
Git diff of code changes
Complete Security Layers
Every layer from kernel isolation to AI-powered detection working together to ensure automation scales without scaling risk.
| Layer | Mechanism | Protects Against |
|---|---|---|
| Substrate | Actions runner VM (kernel, hypervisor) | Memory corruption, privilege escalation, host escape |
| Substrate | Docker container isolation | Process isolation bypass, shared state access |
| Substrate | AWF network controls (iptables + Squid) | Data exfiltration, unauthorized API calls |
| Substrate | MCP sandboxing (per-server containers) | Cross-server attacks, unauthorized tool access |
| Config | Schema validation, expression allowlist | Invalid configurations, secret exposure |
| Config | Action SHA pinning | Supply-chain attacks, tag hijacking |
| Config | Security scanners (actionlint, zizmor, poutine) | Privilege escalation, misconfigurations |
| Plan | Content sanitization & Lockdown Mode | @mention abuse, injection, context poisoning |
| Plan | Secret redaction (pre-upload) | Credential leakage in logs/artifacts |
| Plan | AI-powered threat detection | Malicious patches, secret leaks |
| Plan | SafeOutputs permission separation | Direct write access abuse |
“AI agents cannot be trusted the way deterministic automation can. Rather than attempting to make agents trustworthy, the system makes them containable.” — GitHub Engineering, Agentic Workflows Security Architecture
References & Further Reading
1. Cox, L. & Zhou, J. (2026). “Under the hood: Security architecture of GitHub Agentic Workflows.” GitHub Blog.
2. Zhade, R. (2025). “How GitHub’s agentic security principles make our AI agents as secure as possible.” GitHub Blog.
3. GitHub. github/gh-aw-firewall — Open-source network firewall for agentic workflows.
4. GitHub. GitHub Agentic Workflows — Official Documentation.
5. GitHub. Static Analysis Report — December 2025.