We Need gcc for Markdown: The Case for an Agent Config Compiler

The Problem

Most software teams in 2026 rely on layered quality controls:

Compilers catch type errors before runtime
Linters enforce style and best practices
Tests verify behavior before deployment
CI/CD gates prevent broken code from shipping

Now look at the files that control how your AI agent behaves:

CLAUDE.md — no validation
AGENTS.md — no compilation
SKILL.md — no testing
lessons.md — no linting

These files are configuration code. They determine agent behavior as surely as source code determines program behavior. But they get zero quality assurance.

We wouldn't ship C without gcc. We shouldn't ship agent configs without mdcc.

Shannon Meets Agents

Claude Shannon's information theory gives us the framework. Every communication channel has a capacity, and useful information competes with noise.

The context window is a communication channel:

Channel capacity: The token limit (for example, 200K or 128K tokens)
Signal: Instructions that improve agent output
Noise: Redundant rules, contradictions, vague guidance, stale lessons
Encoding: How instructions are phrased — concise vs. verbose

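Shannon proved that reliable communication is possible only at rates below channel capacity, and that noise lowers that capacity. For agents, the analogy runs like this: the more of the context window that noise consumes, the less room remains for signal, and past some point reliable behavior is out of reach regardless of model quality.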
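
For reference, the best-known form of Shannon's capacity result is the Shannon-Hartley theorem for a bandwidth-limited channel with Gaussian noise. It is included here only to anchor the analogy; the context window is not literally such a channel, and the SNR used in this article is a loose borrowing of the term.

```latex
\[
C = B \log_2\!\left(1 + \frac{S}{N}\right)
\]
```

Here C is the capacity in bits per second, B is the bandwidth in hertz, and S/N is the ratio of signal power to noise power. Capacity rises with S/N, which is the intuition this article carries over to agent configs.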

The Entropy Problem

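Agent configs tend to have low entropy per token: many tokens, little information. ("Bits" here is a loose count of distinct actionable facts, not a measured Shannon entropy.)

"Always make sure to carefully validate all API inputs" = about 10 tokens, roughly 3 bits of info ("validate API inputs")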
"Write clean, maintainable code" — what does "clean" mean? Zero actionable signal.
"Follow best practices" — which ones? Zero signal.

Every low-entropy token wastes channel capacity. A 3K-token CLAUDE.md might carry only 200 bits of real information.

Signal-to-Noise Ratio for Agent Configs

SNR = Unique, actionable instruction bits / Total instruction tokens

High SNR (> 0.5): Crisp, specific instructions
Medium SNR (0.2–0.5): Some verbosity
Low SNR (< 0.2): Bloated, contradictory, or vague

By this measure, many agent configs land around SNR 0.1–0.2: verbose, redundant, and full of "motherhood statements" that carry no signal.
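
The ratio above can be approximated with a few lines of Python. This is a crude sketch under stated assumptions: the `FILLER` word list is hypothetical, a lowercase word split stands in for a real model tokenizer, and "distinct non-filler words" stands in for "actionable instruction bits". It gives a rough ranking of phrasings, not a measurement of Shannon information.

```python
import re

# Hypothetical filler list (an assumption); tune it for your own configs.
FILLER = {
    "always", "make", "sure", "to", "carefully", "all", "the", "a", "an",
    "please", "very", "really", "of", "and", "that", "is", "be",
}

def tokenize(text: str) -> list[str]:
    """Lowercase word split; a stand-in for a real model tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())

def snr_proxy(text: str) -> float:
    """Distinct non-filler words divided by total words."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    informative = {t for t in tokens if t not in FILLER}
    return len(informative) / len(tokens)

verbose = "Always make sure to carefully validate all API inputs"
terse = "Validate API inputs"
print(round(snr_proxy(verbose), 2))  # 3 informative words out of 9
print(round(snr_proxy(terse), 2))    # every word carries the instruction
```

Counting distinct words means a rule repeated three times scores lower than the same rule stated once, which matches the "unique" part of the definition.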

The Compiler Analogy

A compiler does four things agent configs desperately need:

Parsing: Verify valid structure (headings, sections)
Semantic Analysis: Detect contradictions, redundancies, ambiguities
Optimization: Remove dead code (stale rules), reduce verbosity
Code Generation: Produce optimized, high-SNR output

gcc made C dependable enough to ship at scale. mdcc would aim to do the same for Markdown agent configs.
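
As an illustration of the first stage only, here is a minimal sketch of a structure check for a Markdown config. It assumes ATX-style `#` headings and flags two things: heading levels that skip a step and sections with no body. The function name `check_structure` and the message format are hypothetical, since mdcc has no implementation to copy from.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # fenced code block marker

def check_structure(text: str) -> list[str]:
    """Report skipped heading levels and sections that have no body."""
    problems = []
    prev_level = 0
    pending = None    # (line, title) of a heading not yet followed by content
    in_fence = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            pending = None  # a code block counts as section content
            continue
        match = None if in_fence else HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            if pending and level <= prev_level:
                problems.append(f"line {pending[0]}: empty section '{pending[1]}'")
            if level > prev_level + 1:
                problems.append(
                    f"line {lineno}: heading jumps from level {prev_level} to {level}"
                )
            prev_level, pending = level, (lineno, title)
        elif line.strip():
            pending = None
    if pending:
        problems.append(f"line {pending[0]}: empty section '{pending[1]}'")
    return problems

doc = "# Project\n## Style\n## Testing\nRun tests before commit\n#### Deep\nnote\n"
for problem in check_structure(doc):
    print(problem)
```

A heading followed directly by a deeper heading is treated as a parent section rather than an empty one, so only same-level or shallower neighbors trigger the empty-section message.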

mdcc: The Spec

Lint Pass (Static Analysis)

$ mdcc lint CLAUDE.md

CLAUDE.md:12 WARNING  Vague instruction: "write clean code"
  → Suggestion: Specify measurable criteria
CLAUDE.md:24 ERROR    Contradiction with line 8
  → Line 8: "Use TypeScript strict"
  → Line 24: "Follow existing convention" (codebase has JS)
CLAUDE.md:31 WARNING  Redundant with line 15
CLAUDE.md:45 INFO     Low information density (0.12 bits/token)

4 issues (1 error, 2 warnings, 1 info)
SNR: 0.18 (target: > 0.5)
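
A sketch of how the vague-instruction warning might be produced. The pattern list and the `Issue` type are hypothetical, and the output only imitates the mock-up above; contradiction, redundancy, and density checks would need separate passes.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for "motherhood statements"; extend to taste.
VAGUE_PATTERNS = [
    r"\bclean(,| and)? maintainable\b",
    r"\bbest practices\b",
    r"\b(write|produce) (good|clean) code\b",
]

@dataclass
class Issue:
    line: int
    severity: str
    message: str

def lint_vague(text: str) -> list[Issue]:
    """Emit one WARNING per line that matches a vague-phrase pattern."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in VAGUE_PATTERNS:
            match = re.search(pattern, line, flags=re.IGNORECASE)
            if match:
                issues.append(
                    Issue(lineno, "WARNING", f'Vague instruction: "{match.group(0)}"')
                )
                break  # one warning per line is enough
    return issues

sample = "Use TypeScript strict mode\nWrite clean code\nFollow best practices\n"
for issue in lint_vague(sample):
    print(f"CLAUDE.md:{issue.line} {issue.severity:<8} {issue.message}")
```

Pattern matching like this only catches phrasings someone has already listed; it will miss vague rules worded in new ways.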

Compile Pass (Optimization)

$ mdcc compile CLAUDE.md --target optimized

Input:  3,241 tokens (SNR: 0.18)
Output: 891 tokens  (SNR: 0.67)
Compression: 72.5%

Removed: 12 redundant rules
Merged:  5 overlapping rules
Flagged: 2 contradictions (manual resolution needed)
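
The compression figure in this mock-up follows directly from the two token counts:

```latex
\[
\text{compression} = 1 - \frac{\text{output tokens}}{\text{input tokens}}
                   = 1 - \frac{891}{3241} \approx 0.725
\]
```

As a sanity check using the article's own SNR definition, the implied information content is roughly unchanged: 3241 × 0.18 ≈ 583 before and 891 × 0.67 ≈ 597 after. In other words, the mock-up describes removing noise rather than removing instructions.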

Test Pass (Behavioral Verification)

$ mdcc test CLAUDE.md --scenario fixtures/

Running 15 behavioral scenarios...
✓ TypeScript strict applied to new .ts files
✓ Input validation present on API endpoints
✗ FAIL: Line 24 conflicts with strict mode
✗ FAIL: Line 31 too broad (false positives)

13/15 passed (86.7%)

What mdcc Would Catch

Contradictions (Three-Body Conflicts)

Instructions that conflict. The AGENTS.md paper showed these cause oscillating behavior.
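
The simplest class of contradiction, one rule requiring what another forbids, can be found with string normalization. The sketch below is an assumption about how a first pass might work, and the function names are hypothetical. It will not catch semantic conflicts such as "Use TypeScript strict" against "Follow existing convention", which depend on knowing the codebase.

```python
import re

NEGATIVE = re.compile(r"^(never|do not|don't|avoid)\s+", re.IGNORECASE)
POSITIVE = re.compile(r"^(always|do|must)\s+", re.IGNORECASE)

def normalize(rule: str) -> tuple[bool, str]:
    """Return (is_negative, action) for a single rule line."""
    text = rule.strip().rstrip(".")
    if NEGATIVE.match(text):
        return True, NEGATIVE.sub("", text).lower()
    return False, POSITIVE.sub("", text).lower()

def find_contradictions(rules: list[str]) -> list[tuple[int, int]]:
    """Pairs of 1-based rule numbers that require and forbid the same action."""
    seen: dict[str, tuple[bool, int]] = {}
    conflicts = []
    for number, rule in enumerate(rules, start=1):
        negative, action = normalize(rule)
        if action in seen and seen[action][0] != negative:
            conflicts.append((seen[action][1], number))
        seen.setdefault(action, (negative, number))
    return conflicts

rules = [
    "Always use default exports.",
    "Validate API inputs.",
    "Never use default exports.",
]
print(find_contradictions(rules))  # [(1, 3)]
```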

Redundancy (Entropy Waste)

Multiple rules saying the same thing. Each duplicate wastes context tokens and invites inconsistency when one copy is edited and the other is not.
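
Near-duplicate rules can be surfaced with a word-overlap score. This sketch uses Jaccard similarity over word sets; the 0.6 threshold is an arbitrary assumption, and a real tool would more likely compare embeddings or normalized clauses.

```python
import re
from itertools import combinations

def words(rule: str) -> set[str]:
    """Lowercased word set for one rule."""
    return set(re.findall(r"[a-z0-9]+", rule.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def near_duplicates(rules: list[str], threshold: float = 0.6) -> list[tuple[int, int, float]]:
    """Return (rule i, rule j, score) for pairs at or above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(rules, start=1), 2):
        score = jaccard(words(a), words(b))
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs

rules = [
    "Validate all API inputs",
    "Always validate API inputs",
    "Use TypeScript strict mode",
]
print(near_duplicates(rules))  # [(1, 2, 0.6)]
```

Flagged pairs are candidates for merging, not automatic deletions: two rules can share most of their words and still differ in the one word that matters.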

Vagueness (Zero-Signal Tokens)

"Write good code" is not a specification — it's a wish. A compiler would flag it like an untyped variable.

Scope Bleeding

Rules meant for one domain applied everywhere — the autoimmune drift from Part 2.

Staleness

Rules referencing deprecated APIs, fixed bugs, old patterns. Dead code that confuses the system.

Building Toward mdcc

mdcc doesn't exist yet. But you can implement its principles today:

Manual Lint Checklist

Token count: CLAUDE.md under 2K tokens? If not, audit.
Contradiction scan: For each rule, does any other rule conflict with it?
Vagueness check: "Could a junior dev implement this unambiguously?"
Scope check: Is every rule limited to the domain it was written for?
Staleness check: Last validated >60 days ago? Review or remove.
Redundancy check: Two rules saying the same thing? Merge.
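
The staleness item is the easiest one in this checklist to automate, provided each rule records when it was last validated. The sketch below assumes a hypothetical convention that is not part of any existing tool: a rule line ends with `[validated: YYYY-MM-DD]`. Unstamped rules are reported as well, since their age is unknown.

```python
import re
from datetime import date, timedelta

# Hypothetical annotation: a rule line ends with "[validated: YYYY-MM-DD]".
STAMP = re.compile(r"\[validated:\s*(\d{4}-\d{2}-\d{2})\]")

def stale_rules(text: str, today: date, max_age_days: int = 60) -> list[tuple[int, str]]:
    """Return (line number, reason) for rules that are unstamped or too old."""
    cutoff = today - timedelta(days=max_age_days)
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and headings
        match = STAMP.search(line)
        if match is None:
            findings.append((lineno, "never validated"))
        elif date.fromisoformat(match.group(1)) < cutoff:
            findings.append((lineno, f"last validated {match.group(1)}"))
    return findings

config = """# Rules
Use TypeScript strict mode [validated: 2026-01-10]
Call the v1 billing API [validated: 2025-06-01]
Prefer named exports
"""
for lineno, reason in stale_rules(config, today=date(2026, 2, 1)):
    print(f"CLAUDE.md:{lineno} {reason}")
```

Passing `today` explicitly keeps the check reproducible in CI and in tests.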

Manual Test Protocol

Create 5 representative scenarios
Run the agent on each 3 times
Check behavior matches instructions
Check consistency across runs
If runs disagree, suspect conflicting instructions
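
The repeat-and-compare part of this protocol can be scripted. In the sketch below, `run_agent` is a hypothetical callable you would supply: it takes a scenario description and returns a short label for what the agent did. The sketch covers only consistency across runs; checking that behavior matches the instructions still needs scenario-specific assertions.

```python
from collections import Counter
from typing import Callable

def check_consistency(
    run_agent: Callable[[str], str],
    scenarios: list[str],
    runs: int = 3,
) -> dict[str, bool]:
    """Map each scenario to True when every run produced the same outcome."""
    report = {}
    for scenario in scenarios:
        outcomes = Counter(run_agent(scenario) for _ in range(runs))
        report[scenario] = len(outcomes) == 1
    return report

# Stub agent for demonstration only: it flips its answer on one scenario.
calls: Counter = Counter()

def fake_agent(scenario: str) -> str:
    calls[scenario] += 1
    if scenario == "add endpoint":
        return "validated" if calls[scenario] % 2 else "unvalidated"
    return "strict"

print(check_consistency(fake_agent, ["new .ts file", "add endpoint"]))
# {'new .ts file': True, 'add endpoint': False}
```

A scenario that comes back `False` is the cue to reread the rules that apply to it and look for a pair that pull in different directions.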

The Future

We believe mdcc will become as essential as ESLint is for JavaScript. Teams that treat agent configs as first-class code will outperform those that don't.

Shannon's information theory tells us the limit: your agent can only be as reliable as its signal-to-noise ratio allows. mdcc is how you raise the signal and cut the noise.

Part 4 of the Eureka Series. Previous: The Three-Body Problem. Next: Kessler Syndrome.
