🔒

Security

Defense-in-depth security architecture and governance

Core Security Philosophy

Four principles guide every design decision in the gh-aw security architecture.

🛡️

Defense in Depth

Multiple independent layers — substrate isolation, configuration hardening, and runtime planning — each enforce distinct security properties, so a failure in one layer does not defeat the others.

🔑

Zero-Secret Agents

Agents have zero access to tokens, keys, or credentials. LLM keys live in an isolated API Proxy; MCP credentials in a separate Gateway. Architectural boundary, not policy.

📋

Stage & Vet All Writes

Every write is buffered as a structured artifact, analyzed by deterministic filters and a security-focused AI, then executed by separate permission-scoped jobs.

📊

Log Everything

Comprehensive observability at every trust boundary enables forensic reconstruction and anomaly detection. Logging is a structural feature, not optional instrumentation.

“Our first mitigation was to isolate the agent in a dedicated container with tightly controlled egress: firewalled internet access, MCP access through a trusted MCP gateway, and LLM API calls through an API proxy.” — Landon Cox & Jiaxiao Zhou, GitHub Engineering Blog

What the Adversary Can Do

Two properties fundamentally change the threat model: agents reason autonomously over untrusted inputs, and GitHub Actions provides a shared, highly permissive trust domain.

⚠️

Access & Corrupt Shared State

A compromised agent can read environment variables, config files, SSH keys, /proc state, and workflow logs to discover credentials and sensitive data.

⚠️

Communicate Over Unintended Channels

Encode secrets in PR descriptions, issue bodies, comments, or other public-facing objects to exfiltrate data through legitimate GitHub features.

⚠️

Abuse Legitimate Channels

Spam issues and PRs to overwhelm maintainers, inject objectionable content, or trigger cost escalation through excessive API usage.

⚠️

Confuse Control Logic

Deviate from expected workflows, manipulate upstream decision-making, or exploit prompt injection via crafted issues, comments, and hidden Unicode.

Three Primary Risk Classes

Critical

Data Exfiltration

Agent leaks context — including tokens — to unintended destinations via network calls or encoded in public-facing objects like PR descriptions.

High

Impersonation

Unclear who issued a directive or who is accountable for agent actions. Without proper attribution, malicious actions can be concealed.

High

Prompt Injection

Malicious users craft inputs — issues, files, comments with hidden Unicode — that trick agents into performing unauthorized behavior.

Three-Layer Defense in Depth

Each layer enforces independent security properties and constrains the impact of failures in the layers above it.

Layer 3

🧠 Planning Layer

Constrains behavior over time. SafeOutputs MCP buffers writes as artifacts. Content sanitization strips injection vectors. Threat detection AI analyzes all outputs. Secret redaction scrubs credentials. GitHub Lockdown Mode filters untrusted input in public repos.

Layer 2

⚙️ Configuration Layer

Declarative artifacts and a trusted compiler produce security-hardened .lock.yml workflows. Schema validation, expression allowlists, Action SHA pinning via actions-lock.json, and security scanners (actionlint, zizmor, poutine) all run at compile time. Strict mode adds further constraints.
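
The SHA-pinning check can be sketched as a compile-time validation. The regex and function below are an illustrative sketch under the assumption that a pinned reference is `owner/action@<40-hex-char SHA>`; this is not gh-aw's actual implementation.

```python
import re

# A reference is considered pinned only if it ends in a full 40-character
# commit SHA; tag and branch refs are mutable and therefore rejected.
SHA_PINNED = re.compile(r"^[^@\s]+@[0-9a-f]{40}$")

def unpinned_actions(uses_refs):
    """Return the action references that are not SHA-pinned."""
    return [ref for ref in uses_refs if not SHA_PINNED.match(ref)]

refs = [
    "actions/checkout@8edcb1bdb4e267140fa742c62e395cd74f332709",  # pinned
    "actions/setup-node@v4",                                      # mutable tag
]
print(unpinned_actions(refs))  # → ['actions/setup-node@v4']
```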

Layer 1

🏗️ Substrate Layer

Kernel-enforced isolation via Actions runner VM, Docker containers, iptables, and cgroups. Three privileged containers — Network Firewall (AWF), API Proxy, and MCP Gateway — mediate all external communication. Holds even if the agent container is fully compromised.

Critical Guarantee

Substrate protections hold even if the agent executes arbitrary code. A breach at this layer would require vulnerabilities in the container runtime, kernel, hypervisor, or hardware.

Safe Outputs Pipeline

All write operations follow a staged pipeline: buffer → detect → apply. The agent never writes directly to external state.

1

Buffer

Agent runs with read-only permissions. Write actions (create issue, open PR) are buffered as structured JSON artifacts — never executed immediately.

Read-Only Perms
2

Detect

Buffered artifacts pass through deterministic filters and a security-focused AI agent. Threat detection emits a pass/fail verdict that gates all writes.

No Write Perms
3

Apply

Only artifacts that survive the full pipeline are applied by separate, permission-scoped jobs — create_issue, add_comment, create_pr, add_labels.

Scoped Write Perms
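
The three stages above can be sketched end to end. The artifact shape, filter, and function names below are illustrative assumptions, not gh-aw's actual schema:

```python
import json

def buffer_write(action, **fields):
    """Stage 1: buffer a write as a structured artifact, never execute it."""
    return json.dumps({"type": action, **fields})

def detect(artifact):
    """Stage 2: deterministic filter; reject artifacts that look unsafe."""
    data = json.loads(artifact)
    return "ghp_" not in json.dumps(data)  # e.g. a leaked GitHub token prefix

def apply(artifact):
    """Stage 3: permission-scoped job; runs only for artifacts that passed."""
    data = json.loads(artifact)
    return f"applied {data['type']}"

staged = buffer_write("create-issue", title="Flaky test in CI")
if detect(staged):
    print(apply(staged))  # → applied create-issue
```

In the real pipeline the detect stage also includes an AI analysis pass, and the apply stage runs as separate jobs with narrowly scoped write permissions.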

Three Categories of Control

| Control | Description | Example |
| --- | --- | --- |
| Operation Allowlisting | Authors specify which write operations an agent may perform | `create-issue`, `create-pull-request`, `add-comment` |
| Volume Limits | Cap the number of allowed operations per run | At most 3 PRs per run |
| Content Sanitization | Analyze and filter output content | Remove untrusted URLs, potential secrets, objectionable content |
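
The first two controls reduce to simple deterministic checks. A minimal sketch, assuming hypothetical allowlists and caps (the names and defaults below are not gh-aw's):

```python
# Operations this workflow is allowed to perform, and per-run volume caps.
ALLOWED_OPS = {"create-issue", "add-comment", "create-pull-request"}
MAX_PER_RUN = {"create-pull-request": 3, "create-issue": 5, "add-comment": 10}

def admit(requested_ops):
    """Filter requested write operations by allowlist and volume limits."""
    counts = {}
    admitted = []
    for op in requested_ops:
        if op not in ALLOWED_OPS:
            continue  # operation not allowlisted for this workflow
        counts[op] = counts.get(op, 0) + 1
        if counts[op] > MAX_PER_RUN.get(op, 1):
            continue  # volume cap exceeded: drop the excess operation
        admitted.append(op)
    return admitted

print(admit(["create-issue", "delete-repo", "add-comment"]))
```
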
💡

Missing-Data Safe Output

A built-in safe output called missing-data encourages agents to explicitly report when they lack data to complete a task, rather than hallucinating. This rewards honest agent behavior and creates a feedback loop for improvement.
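
As a hedged illustration, a buffered missing-data entry might look something like the following; the field names are assumptions, not the actual gh-aw schema:

```python
import json

# Hypothetical shape of a missing-data safe output artifact.
entry = {
    "type": "missing-data",
    "description": "Cannot reproduce the reported bug: the issue omits the OS version",
    "needed": ["operating system", "exact CLI command"],
}
print(json.dumps(entry, indent=2))
```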

Agent Workflow Firewall (AWF)

Open-source network firewall (github/gh-aw-firewall) that containerizes the agent, isolates the network, and routes all traffic through a Squid proxy with configurable domain allowlists.

🤖 Agent Container

Isolated network
namespace

🔀 Squid Proxy

Domain allowlist
at 172.30.0.10

✅ Allowed

Traffic proceeds

🚫 Blocked

Dropped & logged

How It Works

  1. AWF establishes an isolated network namespace with a Squid proxy
  2. iptables rules redirect all HTTP/HTTPS traffic through the proxy
  3. The proxy evaluates each request against the domain allowlist
  4. Allowed traffic proceeds; blocked traffic is dropped and logged
  5. AWF drops its iptables capabilities before launching the agent, preventing network reconfiguration
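
The proxy's per-request decision in step 3 can be sketched with glob-style domain patterns. Real AWF uses Squid ACLs, so the matching semantics and the allowlist below are a simplification:

```python
from fnmatch import fnmatch

# Illustrative domain allowlist; patterns use shell-style globs.
ALLOWLIST = ["pypi.org", "*.npmjs.com", "api.example.com"]

def allowed(host):
    """Return True if the destination host matches any allowlist pattern."""
    return any(fnmatch(host, pattern) for pattern in ALLOWLIST)

for host in ["pypi.org", "registry.npmjs.com", "evil.example.net"]:
    verdict = "ALLOW" if allowed(host) else "DENY (dropped & logged)"
    print(f"{host}: {verdict}")
```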

Configuration

```yaml
network:
  firewall: true
  allowed:
    - defaults          # Certificates, JSON schema
    - python            # PyPI, Conda
    - node              # npm, npmjs.com
    - "api.example.com" # Custom domain
```

Three-Container Design

🔀

Squid Proxy

Filters all outbound traffic by domain allowlist. Supports SSL Bump for HTTPS content inspection and URL path filtering.

🤖

Agent Container

Runs the command with all HTTP/HTTPS routed through Squid. Cannot reconfigure iptables — capabilities are dropped pre-launch.

🔑

API Proxy Sidecar

Holds LLM API keys so they never reach the agent process. Injects credentials into outbound requests transparently.

🔒

Chroot Mode for Coding Agents

Host filesystem mounted read-only at /host. Selected paths overlaid with empty tmpfs layers. Agent runs in a chroot jail, retaining access to host binaries (Python, Node, Go) while controlling network access.

✍️

Cosign-Signed Releases

All AWF release artifacts are cosign-signed for image verification. All network traffic is logged for forensic analysis. SSL Bump enables HTTPS content inspection.

Threat Detection Job

A critical sub-stage within SafeOutputs that runs as a separate, isolated job — analyzing all agent outputs before any writes are externalized.

How It Works

  1. Downloads the buffered artifacts — agent outputs, git patches, prompt context
  2. Invokes a security-focused AI agent with a hardened system prompt to analyze them
  3. Emits a pass/fail verdict that gates all subsequent safe output jobs

If threats are detected, the workflow terminates before any writes are externalized.

What It Scans For

🔍

Secret Leaks

API keys, tokens, credentials embedded in outputs or code patches

🔍

Malicious Patches

Backdoors, vulnerabilities, suspicious code modifications in git diffs

🔍

Policy Violations

Scope violations, unauthorized operations, out-of-bounds actions
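
A toy version of the deterministic side of this scan, using two well-known token prefixes as example patterns; the real job pairs such filters with an AI analysis pass over all buffered outputs:

```python
import re

# Example credential patterns for the "secret leaks" category.
TOKEN_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),  # GitHub personal access token
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key ID
]

def scan(text):
    """Return the patterns that matched anywhere in the buffered output."""
    return [p.pattern for p in TOKEN_PATTERNS if p.search(text)]

findings = scan("patch adds AKIAABCDEFGHIJKLMNOP to config")
print("fail" if findings else "pass")  # the verdict gates all safe-output jobs
```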

Key Properties

🔐

Isolated Execution

No write permissions, no access to the agent’s runtime state

🤖

AI-Powered Analysis

Uses the same AI engine with a security-focused system prompt

📦

Artifact-Based

Only sees buffered artifacts, not live repository state

🚫

Blocking Gate

Must emit a “safe” verdict before any writes proceed

Custom Detection

```yaml
threat-detection:
  prompt: |
    Additionally check for:
    - References to internal infrastructure URLs
    - Attempts to modify CI/CD configuration files
    - Changes to security-sensitive files
  steps:
    - name: Run TruffleHog
      run: trufflehog filesystem . --only-verified
    - name: Run Semgrep
      run: semgrep scan aw.patch --config=auto
    - name: Run LlamaGuard
      run: llamaguard analyze agent_output.json
```

API Proxy & MCP Gateway

Secrets never enter the agent container. Two trusted sidecar containers hold all authentication material and mediate every external call.

🔑 API Proxy

A trusted sidecar that holds agent authentication tokens (LLM API keys) so they never enter the agent container.

  • Agents route model traffic through the proxy
  • Proxy injects credentials into outbound requests
  • Agent sees only the proxy endpoint — never actual API keys
  • All request/response metadata, token usage, and details are logged
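
Transparent credential injection can be sketched in a few lines. The request shape and function name below are illustrative, not the actual proxy implementation:

```python
# The proxy holds the key; the agent's request never contains it.
def proxy_forward(request, api_key):
    """Add the Authorization header on the way out; log only metadata."""
    headers = dict(request.get("headers", {}))
    headers["Authorization"] = f"Bearer {api_key}"
    # Logged here: method, path, token usage, timing — never the key itself.
    return {**request, "headers": headers}

agent_request = {"method": "POST", "path": "/v1/messages", "headers": {}}
forwarded = proxy_forward(agent_request, api_key="sk-held-by-proxy-only")
assert "Authorization" not in agent_request["headers"]  # agent never sees it
```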

🌐 MCP Gateway

A trusted container managing communication between the agent and MCP (Model Context Protocol) servers.

  • Spawns isolated MCP server containers — prevents cross-server attacks
  • Holds MCP authentication material outside agent boundary
  • Logs every tool invocation: name, arguments, result, timing
  • Routes all MCP requests through a unified HTTP endpoint
🛡️

Why This Matters

Prompt-injected agents with shell access could read config files, SSH keys, /proc state, and workflow logs to discover credentials. The API Proxy and MCP Gateway eliminate this entire attack surface by keeping secrets outside the agent’s isolation boundary.

Security Scanners

Three security scanners integrate into the compilation pipeline, catching issues before workflows ever execute.

Linting

actionlint

`--actionlint`

Workflow linting — catches syntax errors, type mismatches, and best-practice violations in GitHub Actions workflow definitions.

Security

zizmor

`--zizmor`

Security auditing — detects script injection vulnerabilities, excessive permissions, and unsafe GitHub context expressions that could enable privilege escalation.

Supply Chain

poutine

`--poutine`

Supply-chain security — identifies unpinned action references, dangerous pull request event triggers, and other supply chain attack vectors.

Compilation Commands

```shell
# Standard compilation
gh aw compile

# With all security scanners
gh aw compile --actionlint --zizmor --poutine

# Strict mode with validation
gh aw compile --strict

# Full validation (compile --validate --no-emit --zizmor --actionlint --poutine)
gh aw validate
```

Real-World Results

From the December 2025 static analysis report scanning 119 agentic workflows:

  • 0 linting errors (actionlint)
  • 0 supply chain issues (poutine)
  • 13 info/low warnings (zizmor)
  • 3 medium-severity findings (permissions)
  • 0 critical/high-severity findings
  • 93% of workflows passed all checks

Content Sanitization Pipeline

User-generated content is sanitized before the agent ever sees it, neutralizing injection vectors at the activation boundary.

| Mechanism | Example | Protection |
| --- | --- | --- |
| @mention Neutralization | `@user` → `` `@user` `` | Prevents unintended notifications |
| Bot Trigger Protection | `fixes #123` → `` `fixes #123` `` | Prevents automatic issue linking |
| XML/HTML Tag Conversion | `<script>` → `(script)` | Prevents injection via HTML tags |
| URI Filtering | `http://evil.com` → `(redacted)` | Restricts to HTTPS from trusted domains |
| Unicode Normalization | Homoglyphs → normalized forms | Prevents visual spoofing attacks |
| Content Limits | Large payloads → truncated | Enforces 0.5 MB max, 65k lines max |
| Control Character Removal | ANSI escapes → stripped | Removes terminal manipulation codes |
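
Several of these mechanisms reduce to short string transformations. A minimal sketch, assuming this ordering and these replacement tokens (the actual pipeline may differ):

```python
import re
import unicodedata

def sanitize(text, max_bytes=512 * 1024):
    """Illustrative sanitization pass over untrusted user content."""
    text = unicodedata.normalize("NFKC", text)               # homoglyphs
    text = re.sub(r"\x1b\[[0-9;]*m", "", text)               # ANSI escapes
    text = re.sub(r"(?<!`)@(\w+)", r"`@\1`", text)           # @mention neutralization
    text = text.replace("<", "(").replace(">", ")")          # HTML/XML tags
    text = re.sub(r"http://\S+", "(redacted)", text)         # non-HTTPS URIs
    return text.encode()[:max_bytes].decode(errors="ignore") # size cap

print(sanitize("@user ran <script> from http://evil.com"))
# → `@user` ran (script) from (redacted)
```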

GitHub’s 6 Rules for Agentic Security

From Rahul Zhade’s blog post (November 2025) — six rules that apply across all of GitHub’s hosted agentic products.

1

Ensure All Context Is Visible

Display context source files and strip invisible Unicode/HTML before passing to the agent. Users must be able to see exactly what the agent sees.

2

Firewall the Agent

AWF limits network access with configurable domain allowlists. Agents cannot reach arbitrary endpoints — all traffic is proxied and filtered.

3

Limit Access to Sensitive Information

Only provide information absolutely necessary for the task. CI secrets are excluded. Authentication tokens are revoked after each session completes.

4

Prevent Irreversible State Changes

Agents create pull requests — not direct commits. CI doesn’t auto-run on agent PRs. A human must validate the code and manually trigger workflows.

5

Consistently Attribute Actions

Actions are co-committed by the initiating user. The agent identity (Copilot) is clearly marked on all PRs. Accountability chains are always visible.

6

Only Gather Context from Authorized Users

Agents can only be assigned by users with write access. In public repos, agents only read comments from users with write access — preventing context poisoning.

Enterprise Governance

Centralized policy management, comprehensive audit trails, and role-based controls for organizations operating agents at scale.

📦

Centralized Policy via Imports

Organizations maintain shared agents, skills, and security policies in a central repository and distribute them via version-pinned imports across all repos.

```yaml
imports:
  - org/security-policies/.github/agents/secure-agent.agent.md@v2.1.0
  - org/shared-skills/.github/skills/compliance-check/SKILL.md@v1.0.0
```
👤

Role-Based Activation

Pre-activation validates that the triggering user has appropriate permissions. Integrates with GitHub’s roles (read, triage, write, maintain, admin) and can be further constrained with roles: restrictions.

📜

Complete Audit Trail

Every workflow run produces a downloadable artifact trail enabling post-incident forensics, policy compliance validation, cost monitoring, and anomaly detection.

🏛️

Repository Ruleset Integration

All existing rulesets and branch protections apply. Required reviewers, status checks, CODEOWNERS, and merge queue policies gate agent PRs exactly as human PRs.

🚫

No Auto-Merge, No Auto-CI

Pull requests from agents are never auto-merged. CI does not auto-run on agent PRs. A human must inspect, approve, and manually trigger — preserving the human-in-the-loop boundary.

Observability & Auditing

| Boundary | What Is Logged |
| --- | --- |
| Network Firewall | All HTTP/HTTPS requests: destination, status, bytes transferred |
| API Proxy | LLM request/response metadata, token usage, authenticated request details |
| MCP Gateway | Every tool invocation: tool name, arguments, result, timing |
| MCP Servers | Individual tool call details and responses |
| Agent Container | Environment variable accesses, file operations, process execution |

Saved Artifacts per Run

📄

agent_output.json

Agent decisions & buffered actions

💬

prompt.txt

Generated prompts sent to AI engine

🔧

aw.patch

Git diff of code changes

Complete Security Layers

Every layer from kernel isolation to AI-powered detection working together to ensure automation scales without scaling risk.

| Layer | Mechanism | Protects Against |
| --- | --- | --- |
| Substrate | Actions runner VM (kernel, hypervisor) | Memory corruption, privilege escalation, host escape |
| Substrate | Docker container isolation | Process isolation bypass, shared state access |
| Substrate | AWF network controls (iptables + Squid) | Data exfiltration, unauthorized API calls |
| Substrate | MCP sandboxing (per-server containers) | Cross-server attacks, unauthorized tool access |
| Config | Schema validation, expression allowlist | Invalid configurations, secret exposure |
| Config | Action SHA pinning | Supply-chain attacks, tag hijacking |
| Config | Security scanners (actionlint, zizmor, poutine) | Privilege escalation, misconfigurations |
| Plan | Content sanitization & Lockdown Mode | @mention abuse, injection, context poisoning |
| Plan | Secret redaction (pre-upload) | Credential leakage in logs/artifacts |
| Plan | AI-powered threat detection | Malicious patches, secret leaks |
| Plan | SafeOutputs permission separation | Direct write access abuse |

“AI agents cannot be trusted the way deterministic automation can. Rather than attempting to make agents trustworthy, the system makes them containable.” — GitHub Engineering, Agentic Workflows Security Architecture

References & Further Reading

1. Cox, L. & Zhou, J. (2026). “Under the hood: Security architecture of GitHub Agentic Workflows.” GitHub Blog.

2. Zhade, R. (2025). “How GitHub’s agentic security principles make our AI agents as secure as possible.” GitHub Blog.

3. GitHub. github/gh-aw-firewall — Open-source network firewall for agentic workflows.

4. GitHub. GitHub Agentic Workflows — Official Documentation.

5. GitHub. Static Analysis Report — December 2025.