đŸ—ī¸ Getting Started

Architecture Overview

CodeSleuth AI is built around four core concepts: the RARV cycle, versioned handoffs, a structured artifact directory, and a compounding knowledge base. Understanding these makes every agent's behavior predictable.

The RARV Cycle

Every agent follows the same four-step micro-loop before producing any output:

R Reason — analyze inputs, form a plan
A Act — execute the plan
R Reflect — check the output against rules
V Verify — confirm acceptance criteria are met

No agent produces output that skips the Reflect or Verify step. This prevents silent failures and half-complete tasks.
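The loop can be sketched as a single function. This is a minimal illustration, not CodeSleuth AI's implementation: the names `rarv`, `RarvError`, and the four callables are hypothetical, chosen only to show that output is released solely after Reflect and Verify both pass.

```python
from typing import Callable, List, TypeVar

Plan = TypeVar("Plan")
Output = TypeVar("Output")


class RarvError(Exception):
    """Raised when Reflect or Verify rejects an output (hypothetical name)."""


def rarv(
    inputs: dict,
    reason: Callable[[dict], Plan],
    act: Callable[[Plan], Output],
    reflect: Callable[[Output], List[str]],
    verify: Callable[[Output], bool],
) -> Output:
    """Run one Reason -> Act -> Reflect -> Verify pass and return the output."""
    plan = reason(inputs)            # R: analyze inputs, form a plan
    output = act(plan)               # A: execute the plan
    violations = reflect(output)     # R: check the output against rules
    if violations:
        raise RarvError(f"reflect failed: {violations}")
    if not verify(output):           # V: confirm acceptance criteria are met
        raise RarvError("verify failed: acceptance criteria not met")
    return output
```

Raising instead of returning a partial result is one way to model the "no silent failures" rule: a caller cannot receive output that skipped either check.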

Complexity Tiers

Every project is automatically assigned a complexity tier at Discovery. The tier governs how many security domains are reviewed, how many test contracts are required, and which performance budget thresholds apply.

| Tier | Criteria | Security Domains | TC Requirement |
| --- | --- | --- | --- |
| Simple | Single platform, no auth, no payments, no external APIs | 5 (LOW only) | 1 TC per task minimum |
| Standard | Auth, database, 1–2 platforms, standard API integrations | HIGH+ domains | 2 TC per task minimum |
| Complex | Multi-platform, payments, multi-tenant, ML/AI components | All 20 domains | Full test contract suite |
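One way to read the criteria above is as a "highest marker wins" rule. The sketch below is an assumption about how such a classifier could look, not the actual Discovery logic: `ProjectTraits`, its field names, and `assign_tier` are hypothetical, and the cutoff for "multi-platform" (more than two platforms) is a guess, since the table does not define it.

```python
from dataclasses import dataclass


@dataclass
class ProjectTraits:
    """Hypothetical Discovery summary; field names are illustrative only."""
    platforms: int = 1
    has_auth: bool = False
    has_database: bool = False
    has_payments: bool = False
    has_external_apis: bool = False
    is_multi_tenant: bool = False
    has_ml_components: bool = False


def assign_tier(t: ProjectTraits) -> str:
    # Any Complex marker wins. Treating "multi-platform" as more than two
    # platforms is an assumption; the table does not state the cutoff.
    if t.has_payments or t.is_multi_tenant or t.has_ml_components or t.platforms > 2:
        return "Complex"
    # Any Standard marker lifts the project out of Simple.
    if t.has_auth or t.has_database or t.has_external_apis or t.platforms == 2:
        return "Standard"
    return "Simple"
```

For example, `assign_tier(ProjectTraits(has_auth=True))` yields "Standard", while adding `has_payments=True` moves the same project to "Complex".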

Two-Phase Planning

Agent 2 (Technical Planning) produces all implementation artifacts in a single pass triggered by !plan.

| Output | Human Action | Purpose |
| --- | --- | --- |
| TDD, INTERFACES, SCHEMA, TASK-GRAPH, contracts/ | !build | Full implementation blueprint with test contracts, Zod schemas, and OpenAPI stubs. |
ℹ️ Why two phases?

Phase A catches scope disagreements early — when they're cheap to fix. Without it, the Planner could spend thousands of tokens on full contracts before the human realizes the task count is wrong or a platform was missed.

Spec Change Protocol

Mid-build scope changes go through a formal review before any implementation changes. This prevents silent drift between the spec and the codebase.

  1. Human issues !change [description]
     Describes what needs to change: a new feature, a removed feature, or a design pivot.
  2. Orchestrator produces Blast Radius Report
     Lists every affected task, file, contract, and test. Estimates scope as Small / Medium / Large.
  3. Human approves with !change-approve or cancels with !change-cancel
     No implementation changes happen until approved.
  4. Affected tasks marked STALE; CHANGE_LOG.md updated
     Builder detects STALE tasks and re-plans them before executing. The changelog is a permanent record.
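The four steps above amount to a small state machine: a change stays pending until a human decision, and only approval touches the task graph. The sketch below illustrates that flow under stated assumptions; `ChangeRequest`, `PipelineState`, the `DONE` and `PENDING` status strings, and the log line format are all hypothetical, and whether cancelled changes are also logged is not specified by the protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChangeRequest:
    """One !change request plus its Blast Radius Report (simplified)."""
    description: str
    affected_tasks: List[str]
    scope: str = "Small"          # Small / Medium / Large
    status: str = "PENDING"       # PENDING -> APPROVED or CANCELLED


@dataclass
class PipelineState:
    task_status: Dict[str, str]
    change_log: List[str] = field(default_factory=list)

    def approve(self, change: ChangeRequest) -> None:
        """Model !change-approve: mark affected tasks STALE and log the change."""
        self._require_pending(change)
        change.status = "APPROVED"
        for task_id in change.affected_tasks:
            self.task_status[task_id] = "STALE"
        self.change_log.append(f"APPROVED ({change.scope}): {change.description}")

    def cancel(self, change: ChangeRequest) -> None:
        """Model !change-cancel: task statuses are left untouched."""
        self._require_pending(change)
        change.status = "CANCELLED"

    @staticmethod
    def _require_pending(change: ChangeRequest) -> None:
        # A change can be decided only once.
        if change.status != "PENDING":
            raise ValueError(f"change already {change.status}")
```

Because `approve` is the only method that writes to `task_status`, no task becomes STALE before the human decision.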

Artifact Directory Structure

All pipeline artifacts are written relative to the active project directory, never the agents directory itself.

artifacts/
├── discovery/
│   ├── feature-spec.md            # discovery output
│   └── design-contract.md         # web design system spec
├── build/
│   ├── TDD.md                     # technical design document
│   ├── INTERFACES.md              # API and component contracts
│   ├── SCHEMA.md                  # database schema
│   ├── TASK-GRAPH.md              # ordered implementation tasks
│   ├── FILE_OWNERSHIP_MAP.md      # file → owning task mapping
│   └── contracts/
│       ├── [module].contracts.ts  # Zod schemas per module
│       └── openapi.yaml           # OpenAPI stubs
├── security/
│   └── SECURITY_REPORT.md
├── verification/
│   ├── VERIFICATION_REPORT.md
│   ├── PERF_REPORT.md             # performance budget results
│   └── screenshots/
│       └── baselines/             # visual regression baselines
├── critique/
│   └── CRITICISM.md
└── pipeline/
    ├── HANDOFF.json               # progressively versioned (v1–v6)
    ├── CHECKPOINT.md              # builder progress snapshots
    ├── DECISIONS.md               # architectural decisions log
    ├── CHANGE_LOG.md              # spec change history
    ├── PATTERN_LIBRARY.md         # reusable PATTERN-NNN entries
    └── KNOWN_ERRORS.md            # error memory for Builder

Pattern Library

The Pattern Library is a compounding knowledge base. Every time an agent solves a non-obvious problem, it captures the solution as a PATTERN-NNN entry. Future projects query the library before writing new code, preventing the same problem from being solved twice.

| Field | Purpose |
| --- | --- |
| domain | Technology area (e.g., auth, database, payments) |
| stack | Specific libraries involved |
| context | When this pattern applies |
| pattern | The implementation: code, config, or approach |
| pitfall_avoided | What goes wrong without this pattern |
| reuse_signal | Keywords that should trigger lookup |
| evidence | Source project or commit reference |

Use !pattern-add to capture the current implementation as a pattern. The Planner automatically queries PATTERN_INDEX.md at build start.
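To make the entry shape concrete, the sketch below models the seven fields as a record and shows a keyword lookup driven by `reuse_signal`. It is only an illustration: the `Pattern` class, the `pattern_id` field, the `find_patterns` helper, and the substring-matching rule are assumptions, and the Planner's real query mechanism is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pattern:
    pattern_id: str            # e.g. "PATTERN-001" (hypothetical identifier field)
    domain: str                # technology area
    stack: List[str]           # specific libraries involved
    context: str               # when this pattern applies
    pattern: str               # the implementation: code, config, or approach
    pitfall_avoided: str       # what goes wrong without this pattern
    reuse_signal: List[str]    # keywords that should trigger lookup
    evidence: str              # source project or commit reference


def find_patterns(library: List[Pattern], task_description: str) -> List[Pattern]:
    """Return entries whose reuse_signal keywords appear in the task text."""
    text = task_description.lower()
    return [
        entry
        for entry in library
        if any(signal.lower() in text for signal in entry.reuse_signal)
    ]
```

A task described as "Add JWT refresh token rotation" would match an entry whose `reuse_signal` contains "refresh token", regardless of letter case.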

Machine-Readable Contract Layer

The Planner emits two types of contracts that serve as ground truth for the Builder and Verifier:

  • Zod schemas at artifacts/build/contracts/[module].contracts.ts — runtime-validated type shapes for every major data model and API response
  • OpenAPI stubs at artifacts/build/contracts/openapi.yaml — endpoint signatures, request/response shapes, and authentication requirements

The Builder validates its output against these contracts after every task. The Verifier re-runs contract conformance independently in Phase 6C.
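The real contracts are Zod schemas and OpenAPI stubs, so the following is only a language-neutral stand-in written in Python to show what a conformance check does. The `contract_violations` function and the dict-of-types "shape" are hypothetical and far simpler than either format.

```python
from typing import Any, Dict, List


def contract_violations(payload: Dict[str, Any], shape: Dict[str, type]) -> List[str]:
    """Compare a payload against a flat field -> type shape and list mismatches."""
    problems: List[str] = []
    for name, expected in shape.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            actual = type(payload[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    # Fields the contract does not declare are reported as well.
    for name in payload:
        if name not in shape:
            problems.append(f"unexpected field: {name}")
    return problems
```

An empty list means the payload conforms. A non-empty list gives the Builder or Verifier a concrete set of failures to report.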

HANDOFF.json — Versioned State

Agents communicate via a single HANDOFF.json file that is progressively enriched as the pipeline advances. Each agent reads the current version and writes the next.

| Version | Written by | Key additions |
| --- | --- | --- |
| v1 | Discovery | spec_hash, platforms, complexity_tier, tech_stack |
| v2 | Planning | task_count, contracts_path, schema_hash, stack_versions, component_library |
| v3 | Builder | tasks_completed, tasks_stale, patterns_used, perf_budget_path |
| v4 | Critic | scorecard_mean, scorecard_verdict, critical_flaws, next_action |
| v5 | Security | security_tier, domains_reviewed, critical_findings, blockers |
| v6 | Verifier | ship_verdict, perf_verdict, contract_verdict, visual_baseline |
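The read-current, write-next rule can be sketched as a function that refuses out-of-order writers and carries earlier fields forward. This is an illustration under assumptions: the `version` key, the `write_next_version` helper, and the in-memory dict are hypothetical, and the real file layout is defined by the HANDOFF.json schema rather than by this sketch.

```python
import copy
from typing import Any, Dict, Optional

# Writer order taken from the version table: v1 through v6.
WRITERS = ["Discovery", "Planning", "Builder", "Critic", "Security", "Verifier"]


def write_next_version(
    current: Optional[Dict[str, Any]],
    writer: str,
    additions: Dict[str, Any],
) -> Dict[str, Any]:
    """Return the next HANDOFF version, keeping every field written so far."""
    current_version = current["version"] if current else 0
    next_version = current_version + 1
    if next_version > len(WRITERS):
        raise ValueError("HANDOFF is already at its final version")
    expected = WRITERS[next_version - 1]
    if writer != expected:
        raise ValueError(f"v{next_version} must be written by {expected}, not {writer}")
    enriched = copy.deepcopy(current) if current else {}
    enriched.update(additions)          # new fields are layered on top
    enriched["version"] = next_version
    return enriched
```

Copying before updating keeps the previous version intact, which makes it easy to diff two consecutive versions when debugging a handoff.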

Parallel Task Execution

The Planner annotates every task with four parallelism fields that the Builder uses to schedule concurrent work:

| Field | Values | Purpose |
| --- | --- | --- |
| task_type | api, auth, data, infra, ui, config, feature, cicd, or obs | Determines which scripts and Pattern Library queries apply |
| depends_on | [TASK-NNN] or "none" | Explicit dependency edges; drives batch ordering |
| can_parallel | true or false | Whether this task may run concurrently with sibling tasks in the same batch |
| output_files | [list of files] | Used for write-conflict detection between parallel tasks |

The TASK-GRAPH ends with a Parallelism Map that groups tasks into dependency-ordered batches. Tasks with no shared output_files and can_parallel: true are placed in the same batch.
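The batching rule can be sketched as layered scheduling over `depends_on`, with a per-layer check on `can_parallel` and `output_files`. The function below is an assumption about how such a map could be derived, not the Planner's algorithm: `build_batches` is a hypothetical name, `"none"` is represented as an empty list, and tasks that cannot share a batch are simply given a batch of their own.

```python
from typing import Dict, List, Set


def build_batches(tasks: Dict[str, dict]) -> List[List[str]]:
    """Group tasks into dependency-ordered batches.

    Each task dict carries depends_on (list), can_parallel (bool),
    and output_files (list).
    """
    done: Set[str] = set()
    remaining = dict(tasks)
    batches: List[List[str]] = []
    while remaining:
        # A task is ready once all of its dependencies have been scheduled.
        ready = sorted(
            task_id
            for task_id, spec in remaining.items()
            if set(spec["depends_on"]) <= done
        )
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        shared: List[str] = []
        solo: List[str] = []
        claimed: Set[str] = set()
        for task_id in ready:
            files = set(remaining[task_id]["output_files"])
            if remaining[task_id]["can_parallel"] and not (files & claimed):
                shared.append(task_id)
                claimed |= files
            else:
                solo.append(task_id)   # runs alone, after the shared batch
        if shared:
            batches.append(shared)
        batches.extend([task_id] for task_id in solo)
        done.update(ready)
        for task_id in ready:
            del remaining[task_id]
    return batches
```

With five tasks where TASK-002, TASK-003, and TASK-004 all depend on TASK-001, and TASK-002 and TASK-004 both write the same file, the first two share a batch and TASK-004 is scheduled on its own afterwards.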

Method A — Go Goroutines (parallel_runner.go)

go run scripts/parallel_runner.go \
    -tasks TASK-003,TASK-004,TASK-005 \
    -task-graph artifacts/build/TASK-GRAPH.md \
    -output artifacts/telemetry/batch_results.json
ℹ️ Built-in write-conflict detection

Before launching goroutines, parallel_runner.go compares output_files across the batch. Tasks that write to the same file are automatically split into sequential sub-batches — no manual intervention required.

Method B — Parallel Sub-Agents

Multiple Agent(...) calls in a single Claude Code response execute concurrently. The Builder uses this for architecture-class tasks that benefit from independent context windows. Results are merged back into the main task graph on completion.

Pipeline Scripts Integration

Eighteen Python and Go scripts are wired directly into agent workflows. Each script is called at a specific point in the pipeline; agents do not call them ad hoc.

| Script | Called by | Pipeline point |
| --- | --- | --- |
| bootstrap.py | Agent 3 | !build activation |
| spec_validate.py | Agent 2 | Before !build unlocks |
| staged_write.py | Agent 3 | Before every file write |
| secret_scan.py | Agent 3 | Pre-write (blocks on detection) |
| write_log.py | Agent 3 | After each approved write |
| sast_scan.py | Agent 3 | After each write is applied |
| check_coverage.py | Agent 3, Agent 6 | After TC tests pass; Phase 3A |
| parallel_runner.go | Agent 3 | Multi-task batches |
| semantic_search.py | Agent 3 | Before every arch-class task |
| migration_safety.py | Agent 2, Agent 3 | Schema changes |
| pipeline_replay.py | Agent 0 (!replay) | On demand |
| session_fork.py | Agent 0 (!fork) | On demand |
| pipeline_improve.py | Agent 3, Agent 0 | Post-build, post-run |
| container_verify.py | Agent 6 | Phase 12B (before final SHIP) |
| contract_test.py | Agent 6 | Phase 6D (live endpoint validation) |
| commit_by_task.py | Agent 6 | Post-SHIP |
| register_project.py | Agents 3–6, Agent 7 | Build completion and each stage advancement |
| token_tracker.py | All agents | After every turn |

See Pipeline Scripts for the full reference including parameters, exit codes, and integration notes.

HANDOFF Version Evolution

Each agent enriches HANDOFF.json with fields its downstream consumers require. The full v1–v6 field progression:

| Version | Written by | Key additions |
| --- | --- | --- |
| v1 | Discovery | project, platform, user_stories, capability_intelligence, design_contract |
| v2 | Planning | stack_versions, task_graph_path, parallelism_map, bootstrap, component_library |
| v3 | Builder | build_status, task_checkpoints, coverage_pct, secret_blocks, semantic_index_path |
| v4 | Critic | critique_verdict, scorecard_mean, market_context |
| v5 | Security | security_verdict, finding_counts, dependency_intelligence_summary |
| v6 | Verifier | verification_verdict, coverage_verdict, runtime_contract_testing, container_verification, review_rubric_score |
ℹ️ Full schema reference

See HANDOFF.json Schema for the complete field-by-field specification for all six versions.

Performance Budget

Every project template ships with .agent/PERF_BUDGET.json defining thresholds:

{
  "bundle": { "total_kb": 500, "initial_chunk_kb": 200 },
  "api":    { "p50_ms": 200, "p95_ms": 800, "timeout_ms": 3000 },
  "build":  { "max_warnings": 0 }
}

The Verifier runs scripts/perf.mjs to measure actual values and compare them against the budget. A budget violation is a NO-SHIP blocker. Use the --update flag to rebaseline after deliberate changes.
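The comparison itself is simple to picture. The sketch below is a Python illustration, not the contents of scripts/perf.mjs: `budget_violations` is a hypothetical helper, and it assumes that every threshold is an upper bound and that measured values are reported under the same keys as the budget.

```python
import json
from typing import Dict, List

# The thresholds from .agent/PERF_BUDGET.json shown above.
BUDGET = json.loads("""
{
  "bundle": { "total_kb": 500, "initial_chunk_kb": 200 },
  "api":    { "p50_ms": 200, "p95_ms": 800, "timeout_ms": 3000 },
  "build":  { "max_warnings": 0 }
}
""")


def budget_violations(budget: Dict[str, dict], measured: Dict[str, dict]) -> List[str]:
    """List every measured metric that exceeds its budgeted threshold."""
    violations: List[str] = []
    for section, limits in budget.items():
        for metric, limit in limits.items():
            actual = measured.get(section, {}).get(metric)
            # Metrics that were not measured are skipped, not failed.
            if actual is not None and actual > limit:
                violations.append(f"{section}.{metric}: {actual} > {limit}")
    return violations
```

A value equal to its threshold passes in this sketch. Any non-empty result would correspond to a NO-SHIP verdict.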