We Need gcc for Markdown: The Case for an Agent Config Compiler

Imagine shipping production code with no compiler, no linter, no tests. That is how most teams ship their AI agent configuration today. It's time for mdcc.

February 25, 2026 · 10 min read

The Problem

Every software team in 2026 has rigorous quality controls:

  • Compilers catch type errors before runtime
  • Linters enforce style and best practices
  • Tests verify behavior before deployment
  • CI/CD gates prevent broken code from shipping

Now look at the files that control how your AI agent behaves:

  • CLAUDE.md — no validation
  • AGENTS.md — no compilation
  • SKILL.md — no testing
  • lessons.md — no linting

These files are configuration code. They determine agent behavior as surely as source code determines program behavior. But they get zero quality assurance.

We wouldn't ship C without gcc. We shouldn't ship agent configs without mdcc.

Shannon Meets Agents

Claude Shannon's information theory gives us the framework. Every communication channel has a capacity, and useful information competes with noise.

The context window is a communication channel:

  • Channel capacity: The token limit (200K, 128K)
  • Signal: Instructions that improve agent output
  • Noise: Redundant rules, contradictions, vague guidance, stale lessons
  • Encoding: How instructions are phrased — concise vs. verbose

Shannon proved that reliable communication is possible only when the information rate stays below the channel's capacity, and that capacity falls as noise rises. The analogy for agents: if instruction noise crowds out signal beyond what the model can process, reliable behavior is impossible regardless of model quality.
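For reference, the classical statement of this relationship for a band-limited channel with Gaussian noise is the Shannon–Hartley theorem, where C is capacity in bits per second, B is bandwidth, and S/N is the signal-to-noise power ratio. Applying it to a context window is an analogy, not a derivation: a token limit is not literally a bandwidth.

```latex
C = B \log_2\!\left(1 + \frac{S}{N}\right)
```

The qualitative point carries over: for a fixed budget, the more of it noise consumes, the less usable capacity remains.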

The Entropy Problem

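Agent configs tend to have low entropy per token (many tokens, little information). In the examples below, "bits" is shorthand for distinct, actionable units of instruction rather than formally measured Shannon bits: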

  • "Always make sure to carefully validate all API inputs" = 10 tokens, 3 bits of info ("validate API inputs")
  • "Write clean, maintainable code" — what does "clean" mean? Zero actionable signal.
  • "Follow best practices" — which ones? Zero signal.

Every low-entropy token wastes channel capacity. A 3K-token CLAUDE.md might carry only 200 bits of real information.
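One crude way to approximate this is to measure what fraction of an instruction's words are not filler. The sketch below assumes a tiny hand-picked filler list (`FILLER`, hypothetical) and simple word splitting instead of a real tokenizer. It is a rough proxy for density, not a Shannon entropy calculation.

```python
import re

# Hypothetical filler vocabulary; a real tool would need a larger, tuned list.
FILLER = {
    "always", "make", "sure", "to", "carefully", "all",
    "please", "the", "a", "an", "very", "really",
}


def information_density(instruction: str) -> float:
    """Return the fraction of words that are not filler (0.0 for empty input)."""
    words = re.findall(r"[a-z']+", instruction.lower())
    if not words:
        return 0.0
    content = [w for w in words if w not in FILLER]
    return len(content) / len(words)


# 3 of the 9 words ("validate", "api", "inputs") carry the instruction.
print(round(information_density(
    "Always make sure to carefully validate all API inputs"), 2))
# The concise phrasing has no filler at all.
print(information_density("Validate API inputs"))
```

Rewriting the verbose rule as "Validate API inputs" keeps the same content words while dropping the rest.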

Signal-to-Noise Ratio for Agent Configs

SNR = Unique, actionable instruction bits / Total instruction tokens

High SNR (> 0.5): Crisp, specific instructions
Medium SNR (0.2–0.5): Some verbosity
Low SNR (< 0.2): Bloated, contradictory, or vague

Most agent configs run at SNR 0.1–0.2: verbose, redundant, and full of "motherhood statements" that carry no signal.
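The ratio and its three bands can be written down directly. In this minimal sketch the function names `snr` and `snr_band` are illustrative, and counting "actionable units" is assumed to happen elsewhere, by hand or by some other tool.

```python
def snr(actionable_units: int, total_tokens: int) -> float:
    """Unique, actionable instruction units divided by total instruction tokens."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return actionable_units / total_tokens


def snr_band(value: float) -> str:
    """Map an SNR value onto the bands above: > 0.5, 0.2 to 0.5, < 0.2."""
    if value > 0.5:
        return "high"
    if value >= 0.2:
        return "medium"
    return "low"


# The earlier example: about 200 units of information in a 3K-token file.
print(snr_band(snr(200, 3000)))
```

With those numbers the ratio is roughly 0.07, which lands in the low band.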

The Compiler Analogy

A compiler does four things agent configs desperately need:

  1. Parsing: Verify valid structure (headings, sections)
  2. Semantic Analysis: Detect contradictions, redundancies, ambiguities
  3. Optimization: Remove dead code (stale rules), reduce verbosity
  4. Code Generation: Produce optimized, high-SNR output

gcc made C far safer to ship by catching whole classes of errors at build time. mdcc would do the same for Markdown agent configs.

mdcc: The Spec

Lint Pass (Static Analysis)

$ mdcc lint CLAUDE.md

CLAUDE.md:12 WARNING  Vague instruction: "write clean code"
  → Suggestion: Specify measurable criteria
CLAUDE.md:24 ERROR    Contradiction with line 8
  → Line 8: "Use TypeScript strict"
  → Line 24: "Follow existing convention" (codebase has JS)
CLAUDE.md:31 WARNING  Redundant with line 15
CLAUDE.md:45 INFO     Low information density (0.12 bits/token)

4 issues (1 error, 2 warnings, 1 info)
SNR: 0.18 (target: > 0.5)
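Since mdcc is a proposal rather than a real tool, the following is only a sketch of how the vague-instruction warning might work. It scans each line for phrases from a hypothetical pattern list (`VAGUE_PATTERNS`) and reports the line number. Contradiction and redundancy detection are not covered here.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    line: int
    severity: str
    message: str


# Hypothetical patterns, drawn from the vague phrases quoted in this article.
VAGUE_PATTERNS = [
    r"\bclean code\b",
    r"\bgood code\b",
    r"\bbest practices\b",
    r"\bmaintainable\b",
]


def lint_vague(text: str) -> list[Finding]:
    """Return one WARNING per line that contains a known vague phrase."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for pattern in VAGUE_PATTERNS:
            match = re.search(pattern, line, flags=re.IGNORECASE)
            if match:
                findings.append(Finding(
                    number, "WARNING",
                    f'Vague instruction: "{match.group(0)}"'))
                break  # one finding per line is enough
    return findings


config = "Use TypeScript strict mode\nWrite clean code\nFollow best practices"
for f in lint_vague(config):
    print(f"CLAUDE.md:{f.line} {f.severity}  {f.message}")
```

A phrase list like this will miss vague wording it has never seen, so it is a floor for the lint pass, not a substitute for review.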

Compile Pass (Optimization)

$ mdcc compile CLAUDE.md --target optimized

Input:  3,241 tokens (SNR: 0.18)
Output: 891 tokens  (SNR: 0.67)
Compression: 72.5%

Removed: 12 redundant rules
Merged:  5 overlapping rules
Flagged: 2 contradictions (manual resolution needed)

Test Pass (Behavioral Verification)

$ mdcc test CLAUDE.md --scenario fixtures/

Running 15 behavioral scenarios...
✓ TypeScript strict applied to new .ts files
✓ Input validation present on API endpoints
✗ FAIL: Line 24 conflicts with strict mode
✗ FAIL: Line 31 too broad (false positives)

13/15 passed (86.7%)
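A behavioral test pass could be as simple as pairing each prompt with a check on the agent's output. In the sketch below, `stub_agent` is a stand-in for a real agent call and the scenario format is an assumption, not part of any existing tool.

```python
from typing import Callable

# (name, prompt, check): the check decides whether the output is acceptable.
Scenario = tuple[str, str, Callable[[str], bool]]


def run_scenarios(agent: Callable[[str], str],
                  scenarios: list[Scenario]) -> tuple[int, int]:
    """Run every scenario once and return (passed, total)."""
    passed = 0
    for name, prompt, check in scenarios:
        ok = check(agent(prompt))
        print(("PASS: " if ok else "FAIL: ") + name)
        passed += int(ok)
    return passed, len(scenarios)


def stub_agent(prompt: str) -> str:
    """Hypothetical agent: returns canned output instead of calling a model."""
    if "tsconfig" in prompt:
        return '{"compilerOptions": {"strict": true}}'
    return "function handler(req) { return req.body }"


scenarios = [
    ("TypeScript strict applied", "Create a tsconfig",
     lambda out: '"strict": true' in out),
    ("Input validation present", "Write an API handler",
     lambda out: "validate" in out),
]
passed, total = run_scenarios(stub_agent, scenarios)
print(f"{passed}/{total} passed")
```

Checks written as plain predicates keep the fixtures readable, at the cost of only catching what each predicate looks for.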

What mdcc Would Catch

Contradictions (Three-Body Conflicts)

Instructions that conflict. The AGENTS.md paper showed these cause oscillating behavior.
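Most real contradictions are semantic and need judgment to spot, but the narrowest case, one rule saying "always X" and another saying "never X", can be found mechanically. The sketch below covers only that pattern; the function name and the rule format are assumptions.

```python
import re


def find_polarity_conflicts(rules: list[str]) -> list[tuple[int, int]]:
    """Return 1-based index pairs where 'always X' meets 'never X'."""
    always: dict[str, list[int]] = {}
    never: dict[str, list[int]] = {}
    for index, rule in enumerate(rules, start=1):
        match = re.match(r"\s*(always|never)\s+(.+?)\.?\s*$", rule,
                         flags=re.IGNORECASE)
        if match:
            polarity = match.group(1).lower()
            action = match.group(2).lower()
            bucket = always if polarity == "always" else never
            bucket.setdefault(action, []).append(index)
    conflicts = []
    for action, always_lines in always.items():
        for a in always_lines:
            for n in never.get(action, []):
                conflicts.append((min(a, n), max(a, n)))
    return sorted(conflicts)


rules = ["Always use semicolons.", "Prefer const over let", "Never use semicolons"]
print(find_polarity_conflicts(rules))
```

A conflict like "Use TypeScript strict" versus "Follow existing convention" in a JavaScript codebase would not be caught this way, because it depends on the state of the repository.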

Redundancy (Entropy Waste)

Multiple rules saying the same thing. Each duplicate wastes context tokens, and duplicates that drift apart over time create inconsistencies.

Vagueness (Zero-Signal Tokens)

"Write good code" is not a specification — it's a wish. A compiler would flag it like an untyped variable.

Scope Bleeding

Rules meant for one domain applied everywhere — the autoimmune drift from Part 2.

Staleness

Rules referencing deprecated APIs, fixed bugs, old patterns. Dead code that confuses the system.

Building Toward mdcc

mdcc doesn't exist yet. But you can implement its principles today:

Manual Lint Checklist

  1. Token count: Is CLAUDE.md under 2K tokens? If not, audit it.
  2. Contradiction scan: For each rule, does any other rule conflict with it?
  3. Vagueness check: Could a junior dev implement this unambiguously?
  4. Scope check: Is every rule limited to the domain it was written for?
  5. Staleness check: Was the rule last validated more than 60 days ago? Review or remove it.
  6. Redundancy check: Do two rules say the same thing? Merge them.
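Two items on this checklist, the token budget and the staleness check, are easy to script. The sketch below assumes each rule carries a `<!-- validated: YYYY-MM-DD -->` comment, which is a hypothetical convention, and it uses whitespace-separated words as a rough stand-in for model tokens.

```python
import re
from datetime import date

# Hypothetical annotation format for recording when a rule was last checked.
VALIDATED = re.compile(r"<!--\s*validated:\s*(\d{4}-\d{2}-\d{2})\s*-->")


def stale_lines(text: str, today: date, max_age_days: int = 60) -> list[int]:
    """Return 1-based line numbers whose validation date is too old."""
    stale = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = VALIDATED.search(line)
        if match:
            age = (today - date.fromisoformat(match.group(1))).days
            if age > max_age_days:
                stale.append(number)
    return stale


def over_budget(text: str, budget: int = 2000) -> bool:
    """Word count is only an approximation; real tokenizers will differ."""
    return len(text.split()) > budget


config = (
    "- Use pnpm <!-- validated: 2026-02-01 -->\n"
    "- Pin Node 18 <!-- validated: 2025-10-01 -->"
)
print(stale_lines(config, today=date(2026, 2, 25)))
print(over_budget(config))
```

Lines with no annotation are skipped rather than flagged, so an unannotated rule still needs a manual look.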

Manual Test Protocol

  1. Create 5 representative scenarios
  2. Run the agent on each 3 times
  3. Check behavior matches instructions
  4. Check consistency across runs
  5. If behavior varies across runs, suspect conflicting or ambiguous instructions
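The consistency step can be scored as the share of runs that agree with the most common result. This is a minimal sketch: `agent` is any callable you supply, and the optional `key` function is an assumption for reducing free-form output to the property being compared, since raw model output rarely matches exactly.

```python
from collections import Counter
from typing import Callable


def consistency(agent: Callable[[str], str], prompt: str, runs: int = 3,
                key: Callable[[str], str] = lambda s: s) -> float:
    """Share of runs matching the most common (keyed) output; 1.0 is fully consistent."""
    outputs = [key(agent(prompt)) for _ in range(runs)]
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / runs


# Hypothetical flaky agent that answers differently on its second call.
answers = iter(["tabs", "spaces", "tabs"])
score = consistency(lambda prompt: next(answers), "Which indentation?")
print(round(score, 2))
```

A score below 1.0 on a scenario is a cue to look for rules that pull in different directions on that task.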

The Future

We believe mdcc will become as essential as eslint for JavaScript. Teams that treat agent configs as first-class code will outperform those that don't.

Shannon's information theory tells us the limit: your agent can only be as reliable as its signal-to-noise ratio allows. mdcc is how you raise the signal and cut the noise.

Part 4 of the Eureka Series. Previous: The Three-Body Problem. Next: Kessler Syndrome.
