Security
Defense-in-depth security architecture and governance
Core Security Philosophy
Four principles guide every design decision in the gh-aw security architecture.
Defense in Depth
Multiple independent layers — substrate isolation, configuration hardening, and runtime planning — each enforce distinct security properties, so a breach of one layer does not defeat the others.
Zero-Secret Agents
Agents have zero access to tokens, keys, or credentials. LLM keys live in an isolated API Proxy; MCP credentials in a separate Gateway. Architectural boundary, not policy.
Stage & Vet All Writes
Every write is buffered as a structured artifact, analyzed by deterministic filters and a security-focused AI, then executed by separate permission-scoped jobs.
Log Everything
Comprehensive observability at every trust boundary enables forensic reconstruction and anomaly detection. Logging is a structural feature, not optional instrumentation.
“Our first mitigation was to isolate the agent in a dedicated container with tightly controlled egress: firewalled internet access, MCP access through a trusted MCP gateway, and LLM API calls through an API proxy.” — Landon Cox & Jiaxiao Zhou, GitHub Engineering Blog
What the Adversary Can Do
Two properties fundamentally change the threat model: agents reason autonomously over untrusted inputs, and GitHub Actions provides a shared, highly-permissive trust domain.
Access & Corrupt Shared State
A compromised agent can read environment variables, config files, SSH keys, /proc state, and workflow logs to discover credentials and sensitive data.
Communicate Over Unintended Channels
Encode secrets in PR descriptions, issue bodies, comments, or other public-facing objects to exfiltrate data through legitimate GitHub features.
Abuse Legitimate Channels
Spam issues and PRs to overwhelm maintainers, inject objectionable content, or trigger cost escalation through excessive API usage.
Confuse Control Logic
Deviate from expected workflows, manipulate upstream decision-making, or exploit prompt injection via crafted issues, comments, and hidden Unicode.
Three Primary Risk Classes
Data Exfiltration
Agent leaks context — including tokens — to unintended destinations via network calls or encoded in public-facing objects like PR descriptions.
Impersonation
Unclear who issued a directive or who is accountable for agent actions. Without proper attribution, malicious actions can be concealed.
Prompt Injection
Malicious users craft inputs — issues, files, comments with hidden Unicode — that trick agents into performing unauthorized behavior.
Three-Layer Defense in Depth
Each layer enforces independent security properties and constrains the impact of failures in the layers above it.
🧠 Planning Layer
Constrains behavior over time. SafeOutputs MCP buffers writes as artifacts. Content sanitization strips injection vectors. Threat detection AI analyzes all outputs. Secret redaction scrubs credentials. GitHub Lockdown Mode filters untrusted input in public repos.
⚙️ Configuration Layer
Declarative artifacts and a trusted compiler produce security-hardened .lock.yml workflows. Schema validation, expression allowlists, Action SHA pinning via actions-lock.json, and security scanners (actionlint, zizmor, poutine) all run at compile time. Strict mode adds further constraints.
🏗️ Substrate Layer
Kernel-enforced isolation via Actions runner VM, Docker containers, iptables, and cgroups. Three privileged containers — Network Firewall (AWF), API Proxy, and MCP Gateway — mediate all external communication. Holds even if the agent container is fully compromised.
Critical Guarantee
Substrate protections hold even if the agent executes arbitrary code. A breach at this layer would require vulnerabilities in the container runtime, kernel, hypervisor, or hardware.
Safe Outputs Pipeline
All write operations follow a staged pipeline: buffer → detect → apply. The agent never writes directly to external state.
Buffer
Agent runs with read-only permissions. Write actions (create issue, open PR) are buffered as structured JSON artifacts — never executed immediately.
Detect
Buffered artifacts pass through deterministic filters and a security-focused AI agent. Threat detection emits a pass/fail verdict that gates all writes.
Apply
Only artifacts that survive the full pipeline are applied by separate, permission-scoped jobs — create_issue, add_comment, create_pr, add_labels.
Three Categories of Control
| Control | Description | Example |
|---|---|---|
| Operation Allowlisting | Authors specify which write operations an agent may perform | create-issue, create-pull-request, add-comment |
| Volume Limits | Cap the number of allowed operations per run | At most 3 PRs per run |
| Content Sanitization | Analyze and filter output content | Remove untrusted URLs, potential secrets, objectionable content |
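The three controls above are declared in the workflow's frontmatter. A minimal sketch of how an operation allowlist plus volume limits might look — the key names follow the gh-aw documentation's naming style, but treat the exact schema as illustrative:

```yaml
# Illustrative safe-outputs frontmatter for a gh-aw workflow.
# Keys and structure are a sketch, not a definitive reference.
safe-outputs:
  create-issue:          # operation allowlist: agent may open issues
    max: 1               # volume limit: at most 1 issue per run
  create-pull-request:   # agent may open PRs
    max: 3               # at most 3 PRs per run
  add-comment:
    max: 5
```

Any write operation not listed is never applied: the permission-scoped Apply jobs only exist for the operations the author enabled.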
Missing-Data Safe Output
A built-in safe output called missing-data encourages agents to explicitly report when they lack data to complete a task, rather than hallucinating. This rewards honest agent behavior and creates a feedback loop for improvement.
Agent Workflow Firewall (AWF)
Open-source network firewall (github/gh-aw-firewall) that containerizes the agent, isolates the network, and routes all traffic through a Squid proxy with configurable domain allowlists.
Traffic flow: 🤖 Agent Container (isolated network namespace) → 🔀 Squid Proxy (domain allowlist at 172.30.0.10) → ✅ Allowed: traffic proceeds, or 🚫 Blocked: dropped and logged.
How It Works
- AWF establishes an isolated network namespace with a Squid proxy
- iptables rules redirect all HTTP/HTTPS traffic through the proxy
- The proxy evaluates each request against the domain allowlist
- Allowed traffic proceeds; blocked traffic is dropped and logged
- AWF drops its iptables capabilities before launching the agent, preventing network reconfiguration
Configuration
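The firewall's configuration is expressed as a domain allowlist in the workflow frontmatter. A hedged sketch — the `network:` key and the `defaults` entry are assumptions based on the gh-aw documentation, not a verified schema:

```yaml
# Illustrative AWF domain allowlist in gh-aw frontmatter.
network:
  allowed:
    - defaults            # assumed baseline set (e.g. package registries)
    - "api.github.com"    # explicit domain the agent may reach
    - "*.example.com"     # illustrative wildcard subdomain entry
```

Any request to a domain outside this list is dropped by the Squid proxy and recorded in the traffic log.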
Three-Container Design
Squid Proxy
Filters all outbound traffic by domain allowlist. Supports SSL Bump for HTTPS content inspection and URL path filtering.
Agent Container
Runs the command with all HTTP/HTTPS routed through Squid. Cannot reconfigure iptables — capabilities are dropped pre-launch.
API Proxy Sidecar
Holds LLM API keys so they never reach the agent process. Injects credentials into outbound requests transparently.
Chroot Mode for Coding Agents
Host filesystem mounted read-only at /host. Selected paths overlaid with empty tmpfs layers. Agent runs in a chroot jail, retaining access to host binaries (Python, Node, Go) while controlling network access.
Cosign-Signed Releases
All AWF release artifacts are cosign-signed for image verification. All network traffic is logged for forensic analysis. SSL Bump enables HTTPS content inspection.
Threat Detection Job
A critical sub-stage within SafeOutputs that runs as a separate, isolated job — analyzing all agent outputs before any writes are externalized.
How It Works
- Downloads the buffered artifacts — agent outputs, git patches, prompt context
- Invokes a security-focused AI agent with a hardened system prompt to analyze them
- Emits a pass/fail verdict that gates all subsequent safe output jobs
If threats are detected, the workflow terminates before any writes are externalized.
What It Scans For
Secret Leaks
API keys, tokens, credentials embedded in outputs or code patches
Malicious Patches
Backdoors, vulnerabilities, suspicious code modifications in git diffs
Policy Violations
Scope violations, unauthorized operations, out-of-bounds actions
Key Properties
Isolated Execution
No write permissions, no access to the agent’s runtime state
AI-Powered Analysis
Uses the same AI engine with a security-focused system prompt
Artifact-Based
Only sees buffered artifacts, not live repository state
Blocking Gate
Must emit a “safe” verdict before any writes proceed
Custom Detection
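Beyond the built-in checks, workflow authors can reportedly layer custom criteria onto the threat detection job. A sketch of what that might look like — the `threat-detection` and `prompt` keys are assumptions, not confirmed syntax:

```yaml
# Hypothetical frontmatter extending the threat-detection system prompt
# with repository-specific criteria.
safe-outputs:
  threat-detection:
    prompt: |
      Additionally flag any output that references internal
      hostnames or reproduces content from deployment config files.
```

The custom instructions would be appended to the hardened system prompt; the pass/fail verdict still gates all downstream safe output jobs.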
API Proxy & MCP Gateway
Secrets never enter the agent container. Two trusted sidecar containers hold all authentication material and mediate every external call.
🔑 API Proxy
A trusted sidecar that holds agent authentication tokens (LLM API keys) so they never enter the agent container.
- Agents route model traffic through the proxy
- Proxy injects credentials into outbound requests
- Agent sees only the proxy endpoint — never actual API keys
- All request/response metadata, token usage, and authenticated request details are logged
🌐 MCP Gateway
A trusted container managing communication between the agent and MCP (Model Context Protocol) servers.
- Spawns isolated MCP server containers — prevents cross-server attacks
- Holds MCP authentication material outside agent boundary
- Logs every tool invocation: name, arguments, result, timing
- Routes all MCP requests through a unified HTTP endpoint
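An MCP server declared in frontmatter might look like the sketch below — the `mcp-servers`, `container`, and `allowed` keys are illustrative assumptions; the point is that credentials are referenced by the gateway, never handed to the agent:

```yaml
# Hypothetical MCP server declaration. The gateway spawns the container,
# holds the secret, and logs every tool invocation.
mcp-servers:
  fetch:
    container: mcp/fetch        # isolated per-server container
    allowed: [fetch]            # tool allowlist for this server
    env:
      API_TOKEN: ${{ secrets.FETCH_TOKEN }}  # injected gateway-side only
```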
Why This Matters
Prompt-injected agents with shell access could read config files, SSH keys, /proc state, and workflow logs to discover credentials. The API Proxy and MCP Gateway eliminate this entire attack surface by keeping secrets outside the agent’s isolation boundary.
Security Scanners
Three security scanners integrate into the compilation pipeline, catching issues before workflows ever execute.
actionlint
Workflow linting — catches syntax errors, type mismatches, and best-practice violations in GitHub Actions workflow definitions.
zizmor
Security auditing — detects script injection vulnerabilities, excessive permissions, and unsafe GitHub context expressions that could enable privilege escalation.
poutine
Supply-chain security — identifies unpinned action references, dangerous pull request event triggers, and other supply chain attack vectors.
Compilation Commands
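All three scanners run as part of workflow compilation. Typical CLI usage — the base `gh aw compile` command is documented, but treat the flags as illustrative and check `gh aw compile --help` for the exact set:

```shell
# Compile agentic workflow markdown into hardened .lock.yml files;
# actionlint, zizmor, and poutine run as part of this step.
gh aw compile

# Illustrative: enable strict mode for additional compile-time constraints
gh aw compile --strict
```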
Real-World Results
From the December 2025 static analysis report scanning 119 agentic workflows:
The report breaks findings down by scanner (actionlint, poutine, zizmor, plus permissions checks) and by severity.
Content Sanitization Pipeline
User-generated content is sanitized before the agent ever sees it, neutralizing injection vectors at the activation boundary.
| Mechanism | Example | Protection |
|---|---|---|
| @mention Neutralization | @user → `@user` | Prevents unintended notifications |
| Bot Trigger Protection | fixes #123 → `fixes #123` | Prevents automatic issue linking |
| XML/HTML Tag Conversion | <script> → (script) | Prevents injection via HTML tags |
| URI Filtering | http://evil.com → (redacted) | Restricts to HTTPS from trusted domains |
| Unicode Normalization | Homoglyphs → normalized forms | Prevents visual spoofing attacks |
| Content Limits | Large payloads → truncated | Enforces 0.5 MB max, 65k lines max |
| Control Character Removal | ANSI escapes → stripped | Removes terminal manipulation codes |
GitHub’s 6 Rules for Agentic Security
From Rahul Zhade’s blog post (November 2025) — six rules that apply across all of GitHub’s hosted agentic products.
Ensure All Context Is Visible
Display context source files and strip invisible Unicode/HTML before passing to the agent. Users must be able to see exactly what the agent sees.
Firewall the Agent
AWF limits network access with configurable domain allowlists. Agents cannot reach arbitrary endpoints — all traffic is proxied and filtered.
Limit Access to Sensitive Information
Only provide information absolutely necessary for the task. CI secrets are excluded. Authentication tokens are revoked after each session completes.
Prevent Irreversible State Changes
Agents create pull requests — not direct commits. CI doesn’t auto-run on agent PRs. A human must validate the code and manually trigger workflows.
Consistently Attribute Actions
Actions are co-committed by the initiating user. The agent identity (Copilot) is clearly marked on all PRs. Accountability chains are always visible.
Only Gather Context from Authorized Users
Agents can only be assigned by users with write access. In public repos, agents only read comments from users with write access — preventing context poisoning.
Enterprise Governance
Centralized policy management, comprehensive audit trails, and role-based controls for organizations operating agents at scale.
Centralized Policy via Imports
Organizations maintain shared agents, skills, and security policies in a central repository and distribute them via version-pinned imports across all repos.
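A version-pinned import might be declared as in the sketch below — the cross-repo path syntax and the `@v1.2.0` pin are illustrative assumptions about the import format:

```yaml
# Hypothetical central-policy import, pinned to a tag so every
# consuming repo gets the same vetted version.
imports:
  - org/policy-repo/workflows/shared/security-policy.md@v1.2.0
```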
Role-Based Activation
Pre-activation validates that the triggering user has appropriate permissions. Integrates with GitHub’s roles (read, triage, write, maintain, admin) and can be further constrained with roles: restrictions.
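A `roles:` restriction on top of the built-in permission check might look like this — the accepted role values shown are assumptions:

```yaml
# Hypothetical frontmatter: only maintainers and admins may trigger this workflow.
roles: [maintainer, admin]
```

Activation by any user below the listed roles fails pre-activation validation before the agent ever starts.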
Complete Audit Trail
Every workflow run produces a downloadable artifact trail enabling post-incident forensics, policy compliance validation, cost monitoring, and anomaly detection.
Repository Ruleset Integration
All existing rulesets and branch protections apply. Required reviewers, status checks, CODEOWNERS, and merge queue policies gate agent PRs exactly as human PRs.
No Auto-Merge, No Auto-CI
Pull requests from agents are never auto-merged. CI does not auto-run on agent PRs. A human must inspect, approve, and manually trigger — preserving the human-in-the-loop boundary.
Observability & Auditing
| Boundary | What Is Logged |
|---|---|
| Network Firewall | All HTTP/HTTPS requests: destination, status, bytes transferred |
| API Proxy | LLM request/response metadata, token usage, authenticated request details |
| MCP Gateway | Every tool invocation: tool name, arguments, result, timing |
| MCP Servers | Individual tool call details and responses |
| Agent Container | Environment variable accesses, file operations, process execution |
Saved Artifacts per Run
agent_output.json
Agent decisions & buffered actions
prompt.txt
Generated prompts sent to AI engine
aw.patch
Git diff of code changes
Complete Security Layers
Every layer from kernel isolation to AI-powered detection working together to ensure automation scales without scaling risk.
| Layer | Mechanism | Protects Against |
|---|---|---|
| Substrate | Actions runner VM (kernel, hypervisor) | Memory corruption, privilege escalation, host escape |
| Substrate | Docker container isolation | Process isolation bypass, shared state access |
| Substrate | AWF network controls (iptables + Squid) | Data exfiltration, unauthorized API calls |
| Substrate | MCP sandboxing (per-server containers) | Cross-server attacks, unauthorized tool access |
| Config | Schema validation, expression allowlist | Invalid configurations, secret exposure |
| Config | Action SHA pinning | Supply-chain attacks, tag hijacking |
| Config | Security scanners (actionlint, zizmor, poutine) | Privilege escalation, misconfigurations |
| Plan | Content sanitization & Lockdown Mode | @mention abuse, injection, context poisoning |
| Plan | Secret redaction (pre-upload) | Credential leakage in logs/artifacts |
| Plan | AI-powered threat detection | Malicious patches, secret leaks |
| Plan | SafeOutputs permission separation | Direct write access abuse |
“AI agents cannot be trusted the way deterministic automation can. Rather than attempting to make agents trustworthy, the system makes them containable.” — GitHub Engineering, Agentic Workflows Security Architecture
References & Further Reading
1. Cox, L. & Zhou, J. (2026). “Under the hood: Security architecture of GitHub Agentic Workflows.” GitHub Blog.
2. Zhade, R. (2025). “How GitHub’s agentic security principles make our AI agents as secure as possible.” GitHub Blog.
3. GitHub. github/gh-aw-firewall — Open-source network firewall for agentic workflows.
4. GitHub. GitHub Agentic Workflows — Official Documentation.
5. GitHub. Static Analysis Report — December 2025.