Two zero-human AI companies. Same product. Different strategies. One winner.

Gladiator pits Blitz (growth-first, 4 agents) against Craft (quality-first, 5 agents) to maximize GitHub stars on identical starter repos. Nine Claude Sonnet agents run autonomously via Paperclip orchestration and Hermes Agent workers. A live dashboard visualizes the battle in real time.

Built for the Nous Research Hermes Agent Hackathon (March 2026). Demo video.

What you’ll see

  1. Landing page: the product (llm-judge), the rules and the two rival companies
  2. Live dashboard: scoreboard, task boards, code comparison, audit trail, learning evidence, merge controls
  3. Competition: 9 agents complete 10 tasks in ~6 minutes, creating skills, growing memory and writing code
  4. Winner announcement: auto-detects completion, declares winner by projected GitHub stars
  5. Merge: companies unite, skills transfer across teams, proving Hermes learning is real

Architecture

| Component | Tech | Port |
| --- | --- | --- |
| Paperclip | Node.js + PostgreSQL 16 | 3100 |
| Hermes Agent | Python CLI + Anthropic API | - |
| Dashboard | FastAPI + SSE + vanilla JS | 4000 |
| Evidence DB | SQLite (WAL mode) | - |
| Watcher | Python daemon | - |

Hermes features used

| Feature | How it’s used | Evidence |
| --- | --- | --- |
| Skills | Agents create reusable SKILL.md files after complex tasks | skill_snapshots table, version diffs |
| Memory | Agents save strategies and learnings to MEMORY.md | memory_snapshots table, char growth |
| Sessions | Session IDs chain across heartbeats | heartbeat_metrics.session_id |
| Skill usage | Agents reference and apply their learned skills | skill_usage_events table |
| Cross-agent learning | Post-merge: agents use skills from rival team | learning_milestones type=cross_agent |
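
As a rough illustration of the cross-agent learning row, the sketch below flags skill usage events where the agent's team differs from the team that authored the skill. The `SkillUse` type, the two lookup dictionaries and the detection rule itself are assumptions made for this example; the project's actual milestone detection may work differently.

```python
from typing import Dict, List, NamedTuple


class SkillUse(NamedTuple):
    """Hypothetical record of one agent applying one skill."""
    agent: str
    skill: str


def cross_agent_events(
    uses: List[SkillUse],
    agent_team: Dict[str, str],
    skill_author_team: Dict[str, str],
) -> List[SkillUse]:
    """Return the usage events where an agent applied a skill written by the other team."""
    events = []
    for use in uses:
        user_team = agent_team.get(use.agent)
        author_team = skill_author_team.get(use.skill)
        # Skip events we cannot attribute to a team on both sides.
        if user_team is None or author_team is None:
            continue
        if user_team != author_team:
            events.append(use)
    return events
```

Each returned event would correspond to one candidate row of type `cross_agent`; events with unknown agents or skills are ignored rather than guessed at.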

Paperclip orchestration

  • Companies define budget and organizational structure
  • Agents are autonomous workers with roles, models and heartbeat intervals
  • Issues are tasks assigned to agents (checkout, work, mark done)
  • Heartbeats trigger agent execution at configured intervals
  • The hermes_local adapter spawns hermes chat -q "prompt" -Q as a subprocess
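
The subprocess call in the last bullet can be sketched in Python as follows. Only the command shape (`hermes chat -q "prompt" -Q`) comes from the description above; the function names, the timeout and the choice to return stdout are assumptions for illustration, not the adapter's real code.

```python
import subprocess
from typing import List


def build_hermes_command(prompt: str) -> List[str]:
    """Build the argv list for one heartbeat, mirroring `hermes chat -q "prompt" -Q`."""
    # Passing a list (not a shell string) means the prompt needs no manual quoting.
    return ["hermes", "chat", "-q", prompt, "-Q"]


def run_heartbeat(prompt: str, timeout_s: float = 300.0) -> str:
    """Run one heartbeat as a subprocess and return its stdout (hypothetical helper)."""
    result = subprocess.run(
        build_hermes_command(prompt),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # assumed limit so a stuck agent cannot block the scheduler
        check=True,         # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Keeping command construction separate from execution makes the argv easy to inspect or log without the `hermes` CLI installed.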

The merger

After the competition, a merge script combines both companies into “Gladiator United.” It deduplicates Hermes skill libraries, keeps the highest-versioned copy of each skill and copies everything into a shared HERMES_HOME. A post-merge task assigns a former Blitz engineer to load and apply a Craft-authored skill, producing a verifiable cross-agent learning event on the dashboard.
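
The deduplication rule described above (keep the highest-versioned copy of each skill) might look roughly like this. The `Skill` dataclass, integer versions and the tie-breaking behavior are assumptions for the sketch; copying files into the shared HERMES_HOME is left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Skill:
    """Hypothetical view of one SKILL.md file: its name, version and authoring team."""
    name: str
    version: int
    team: str


def merge_skill_libraries(*libraries: Iterable[Skill]) -> List[Skill]:
    """Combine skill libraries, keeping the highest-versioned copy of each skill name."""
    best: Dict[str, Skill] = {}
    for library in libraries:
        for skill in library:
            current = best.get(skill.name)
            # On a version tie the first copy seen is kept (an arbitrary choice here).
            if current is None or skill.version > current.version:
                best[skill.name] = skill
    return sorted(best.values(), key=lambda s: s.name)
```

For example, if both teams have a skill with the same name and Craft's copy has the higher version, only Craft's copy survives in the merged library, while skills unique to either team are carried over unchanged.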

Evidence and tracing

A Python watcher daemon polls Paperclip’s heartbeat API every 10 seconds and snapshots all Hermes learning artifacts into SQLite. Five tables track skill creation and versioning, memory growth per heartbeat, token usage and cost per agent, skill usage events and auto-detected learning milestones. The evidence database is the single source of truth for the dashboard control plane, where every panel (scoreboard, task boards, learning evidence, cost tracking) queries it directly. A typical run produces 19 skills, 49 learning milestones and 387 evidence records.
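
To make the snapshot idea concrete, here is a minimal sketch of recording memory size per heartbeat and computing growth with Python's built-in `sqlite3` module. The table name `memory_snapshots` comes from the text above; the column names, function names and growth definition are assumptions, and the 10-second polling loop against Paperclip is omitted.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create a simplified memory_snapshots table (column layout is assumed)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memory_snapshots (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               agent TEXT NOT NULL,
               heartbeat INTEGER NOT NULL,
               char_count INTEGER NOT NULL
           )"""
    )
    conn.commit()


def record_memory_snapshot(
    conn: sqlite3.Connection, agent: str, heartbeat: int, memory_text: str
) -> None:
    """Store the length of an agent's MEMORY.md contents at a given heartbeat."""
    conn.execute(
        "INSERT INTO memory_snapshots (agent, heartbeat, char_count) VALUES (?, ?, ?)",
        (agent, heartbeat, len(memory_text)),
    )
    conn.commit()


def memory_growth(conn: sqlite3.Connection, agent: str) -> int:
    """Return the character growth between an agent's first and latest snapshot."""
    rows = conn.execute(
        "SELECT char_count FROM memory_snapshots WHERE agent = ? ORDER BY heartbeat",
        (agent,),
    ).fetchall()
    if len(rows) < 2:
        return 0  # growth is undefined with fewer than two snapshots
    return rows[-1][0] - rows[0][0]
```

A dashboard panel could then call `memory_growth` (or an equivalent query) directly against the evidence database, which matches the single-source-of-truth design described above.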

Total API cost: $5-6 per competition.