
Pipeline Scripts Reference

18 Python and Go infrastructure scripts that power the CodeSleuth AI pipeline. These scripts cover environment bootstrapping, file-write safety, parallel execution, codebase intelligence, database migration safety, pipeline operations, and post-build verification.

⚠️

Script path convention

All script paths are under /home/user/codesleuth/scripts/, in the agents directory rather than the project directory. Always use absolute paths when calling scripts from within a project. For readability, examples in this reference use the relative `python scripts/...` form; resolve these to absolute paths in practice.

Environment & Bootstrap

bootstrap.py

Validates toolchain, installs dependencies, and generates .env.example. Run at Agent 3 activation before any build tasks proceed.

Detail	Value
Usage	`python scripts/bootstrap.py --project-dir . --tdd artifacts/build/TDD.md`
Exit 0	Environment validated, deps installed, `.env.example` generated
Exit 1	Blocked — missing toolchain component; error message identifies what is missing
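The toolchain check behind the Exit 1 path can be sketched as a `PATH` lookup for each required command. This is a minimal sketch under stated assumptions: the list of required tools is supplied by the caller (the real script derives its requirements from `TDD.md`, which is not parsed here), and only presence on `PATH` is checked, not versions.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def find_missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in input order."""
    return [tool for tool in tools if which(tool) is None]


def bootstrap_exit_code(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> int:
    """0 if every tool is present, 1 (blocked) if anything is missing."""
    missing = find_missing_tools(tools, which)
    for tool in missing:
        # Mirror the documented behavior: name each missing component.
        print(f"BLOCKED: required tool not found: {tool}")
    return 1 if missing else 0
```

For example, `bootstrap_exit_code(["git", "go", "node"])` (a hypothetical tool list) returns 1 and names any tool absent from `PATH`. The injectable `which` parameter exists only to make the check testable.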

spec_validate.py

Validates TASK-GRAPH.md consistency before the !build gate opens. Catches structural issues before the Builder begins execution.

Check	Description
task_type	Every task has a valid `task_type` field (api, auth, data, infra, ui, test, docs)
output_files	Every task declares at least one `output_files` entry
TC status	Test contracts reference valid task IDs
Circular deps	Dependency graph has no cycles
Parallelism map	Every task in the parallelism map exists in the task graph

python scripts/spec_validate.py --task-graph artifacts/build/TASK-GRAPH.md

Exit 0 = pass. Exit 1 = fail with specific error messages per failing check.
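The circular-dependency check is a standard cycle search over the task dependency graph. A minimal sketch using Kahn's topological sort, assuming the dependencies have already been parsed out of TASK-GRAPH.md into a dict that maps each task ID to the IDs it depends on (the parsing step is not shown):

```python
from collections import deque
from typing import Dict, List


def find_cycle_members(deps: Dict[str, List[str]]) -> List[str]:
    """Return task IDs on, or downstream of, a dependency cycle (empty if acyclic)."""
    tasks = set(deps) | {d for ds in deps.values() for d in ds}
    # indegree[t] = number of dependencies of t not yet resolved
    indegree = {t: 0 for t in tasks}
    dependents: Dict[str, List[str]] = {t: [] for t in tasks}
    for task, ds in deps.items():
        for d in ds:
            indegree[task] += 1
            dependents[d].append(task)
    # Repeatedly resolve tasks whose dependencies are all resolved.
    queue = deque(t for t in tasks if indegree[t] == 0)
    while queue:
        t = queue.popleft()
        for nxt in dependents[t]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Anything never resolved is blocked by a cycle.
    return sorted(t for t in tasks if indegree[t] > 0)
```

In this sketch, a non-empty result would correspond to an Exit 1 for the "Circular deps" check.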

File Write Pipeline

Every file write by the Builder passes through a 5-step pipeline: secret scan → staged write → hot compilation → SAST scan → write log. Each step is a separate script.

staged_write.py

Stage a file write with a terminal diff and an Apply / Edit / Reject prompt. The human sees exactly what will change before any file is touched.

Detail	Value
Usage	`python scripts/staged_write.py stage --task TASK-NNN --file path --content /tmp/proposed.txt`
Exit 0	Write applied — user chose Apply or Edit-then-Apply
Exit 1	Write rejected — repair context printed to stdout; Builder re-enters implementation step
Exit 2	Error — file lock, permissions issue, or malformed arguments
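The terminal diff shown before the Apply / Edit / Reject prompt can be produced with Python's standard `difflib`. A minimal sketch, assuming the proposed content is already in memory; the interactive prompt and file locking of the real script are not shown:

```python
import difflib
from pathlib import Path


def staged_diff(path: str, proposed: str) -> str:
    """Unified diff between the file on disk (empty if new) and the proposed content."""
    target = Path(path)
    current = target.read_text() if target.exists() else ""
    diff = difflib.unified_diff(
        current.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)
```

Nothing is written to disk here; in the documented flow, the file is touched only after the human chooses Apply.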

secret_scan.py

Pre-write secret scanner. Runs before every file write and hard-blocks the write if any secret pattern is found. Repair context is returned to the Builder.

Detail	Value
Usage	`python scripts/secret_scan.py <target> [--log path]`
Patterns	14 regex patterns including OpenAI keys, Anthropic keys, AWS credentials, GitHub tokens, Stripe keys, Twilio tokens, and generic high-entropy strings
Exit 0	Clean — no secrets detected
Exit 1	Found — secret location and pattern name reported; write is blocked
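The scanning approach, named regex patterns plus an entropy test for generic secrets, can be sketched as follows. The three patterns and the 4.5 bits-per-character threshold are illustrative assumptions, not the script's actual 14 patterns or tuning:

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Illustrative patterns only; the real scanner ships its own list.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "stripe_secret_key": re.compile(r"\bsk_live_[A-Za-z0-9]{24,}\b"),
}
# Long runs of key-like characters are candidates for the entropy check.
TOKEN = re.compile(r"[A-Za-z0-9+/=_-]{32,}")


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character of s."""
    if not s:
        return 0.0
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())


def scan(text: str, entropy_threshold: float = 4.5) -> List[Tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspected secret."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
        for token in TOKEN.findall(line):
            if shannon_entropy(token) >= entropy_threshold:
                findings.append((lineno, "high_entropy_string"))
    return findings
```

Any non-empty result would map to Exit 1 (write blocked). Entropy checks trade false positives against misses, which is why they complement rather than replace the named patterns.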

write_log.py

Records every approved write to WRITE_LOG.jsonl. This log is consumed by commit_by_task.py to group commits by task.

Sub-command	Description
`record`	Append a write entry: task ID, file path, timestamp, hash
`list`	Show all writes recorded in this session
`task`	Show all writes for a specific task ID

python scripts/write_log.py record --task TASK-NNN --file path/to/file.py
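Because WRITE_LOG.jsonl is a JSON Lines file (one JSON object per line), the `record` and `task` sub-commands reduce to an append and a filter. A minimal sketch; the field names (`task`, `file`, `timestamp`, `sha256`) and the choice of SHA-256 are assumptions based on the entry contents listed in the table:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def record_write(log_path: str, task: str, file_path: str) -> dict:
    """Append one entry for an approved write and return it."""
    entry = {
        "task": task,
        "file": file_path,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(Path(file_path).read_bytes()).hexdigest(),
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


def writes_for_task(log_path: str, task: str) -> List[dict]:
    """Equivalent of the `task` sub-command: all entries for one task ID."""
    with open(log_path, encoding="utf-8") as log:
        entries = [json.loads(line) for line in log if line.strip()]
    return [e for e in entries if e["task"] == task]
```

Append-only JSON Lines keeps each write independent, so a crash mid-session loses at most the entry being written.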

sast_scan.py

Post-write SAST scan. Runs after each approved write and never blocks it. Uses gosec, bandit, or semgrep when available, with an inline regex fallback. Findings accumulate in SAST_FINDINGS.jsonl for the Security agent to review.

Detail	Value
Usage	`python scripts/sast_scan.py <target> --task TASK-NNN`
Exit	Always exits 0 (non-blocking). Findings written to `SAST_FINDINGS.jsonl`.
Backends	gosec (Go), bandit (Python), semgrep (multi-language), inline regex fallback
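Backend selection with a fallback can be sketched as a lookup over installed tools. This sketch assumes the language-specific tool is preferred, then semgrep, then the regex fallback; that precedence is an assumption, and the real script's order may differ. Only the selection logic is shown, not the scanner invocation:

```python
import os
import shutil
from typing import Callable, Optional

# Language-specific scanners keyed by file extension.
LANGUAGE_TOOLS = {".go": "gosec", ".py": "bandit"}


def choose_backend(
    filename: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Pick a SAST backend for a file, falling back to inline regex checks."""
    suffix = os.path.splitext(filename)[1]
    specific = LANGUAGE_TOOLS.get(suffix)
    if specific and which(specific):
        return specific
    if which("semgrep"):
        return "semgrep"  # multi-language backend
    return "regex-fallback"
```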

check_coverage.py

Enforces a minimum coverage threshold after test contracts pass. A result below the threshold blocks the task from being marked DONE.

Detail	Value
Usage	`python scripts/check_coverage.py --threshold 70 --task TASK-NNN`
Supported stacks	go (go test -cover), node (nyc/c8), python (pytest-cov), rust (cargo-tarpaulin)
Exit 0	Coverage at or above threshold
Exit 1	Coverage below threshold — current % and required % reported
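For the Go stack, `go test -cover` prints a line containing `coverage: NN.N% of statements` for each package. A sketch of the threshold decision, assuming the figure is parsed from the first such line; multi-package aggregation and the other stacks' report formats are not handled here:

```python
import re
from typing import Optional

COVERAGE_LINE = re.compile(r"coverage:\s+([0-9.]+)% of statements")


def parse_go_coverage(output: str) -> Optional[float]:
    """Return the first coverage percentage in `go test -cover` output, if any."""
    match = COVERAGE_LINE.search(output)
    return float(match.group(1)) if match else None


def coverage_exit_code(percent: float, threshold: float) -> int:
    """0 when coverage meets the threshold, 1 otherwise (blocks DONE)."""
    if percent >= threshold:
        return 0
    print(f"Coverage {percent:.1f}% is below the required {threshold:.1f}%")
    return 1
```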

Parallel Execution

parallel_runner.go

Go batch executor. Runs multiple tasks concurrently using goroutines and detects write conflicts before execution begins.

Detail	Value
Usage	`go run scripts/parallel_runner.go -tasks TASK-001,TASK-002 -cmd "..." -task-graph artifacts/build/TASK-GRAPH.md`
Conflict detection	Reads `FILE_OWNERSHIP_MAP.md`; refuses to run tasks that write to the same file in the same batch
Exit 0	All tasks in the batch passed
Exit 1	One or more tasks failed; per-task output captured and reported
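The conflict check amounts to finding any file claimed by more than one task in the same batch. A Python sketch of that logic (the script itself is written in Go), assuming FILE_OWNERSHIP_MAP.md has already been parsed into a dict from task ID to the files that task writes:

```python
from collections import defaultdict
from typing import Dict, List


def write_conflicts(batch: List[str], outputs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each file written by two or more tasks in the batch to those task IDs."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for task in batch:
        # set() so a task listing the same file twice does not conflict with itself
        for path in set(outputs.get(task, [])):
            owners[path].append(task)
    return {path: tasks for path, tasks in owners.items() if len(tasks) > 1}
```

Under this sketch, a non-empty result means the batch is refused before any goroutine starts.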

ℹ️

Two parallel execution methods

Method A (Go goroutines): parallel_runner.go. Best for independent build/test commands.
Method B (Agent tool calls): Agent 3 issues multiple parallel tool calls in a single response. Best for independent file writes across non-overlapping task groups.

Intelligence

semantic_search.py

Embedding-based RAG search over the codebase. Agent 3 queries this before each architecture-class task to find relevant prior code and avoid duplication.

Sub-command	Description
`index`	Build semantic index from source files; writes to `SEMANTIC_INDEX.json`
`search`	Query the index; returns top-N relevant file snippets
`update`	Incrementally update the index after new files are written

python scripts/semantic_search.py search --query "validate user input" --index artifacts/build/SEMANTIC_INDEX.json

Uses the nomic-embed-text embedding model through Ollama when available; falls back to TF-IDF when Ollama is not running.
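The TF-IDF fallback ranks snippets by cosine similarity between term-weight vectors. A dependency-free sketch of that ranking, assuming simple lowercase word tokenization and whole documents as the unit of retrieval; the real index format in SEMANTIC_INDEX.json is not shown:

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


def tfidf_search(docs: Dict[str, str], query: str, top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank documents by cosine similarity to the query under TF-IDF weighting."""
    tokenized = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    df = Counter(term for counts in tokenized.values() for term in counts)
    # Smoothed IDF so terms present in every document still get a small weight.
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}

    def vector(counts: Counter) -> Dict[str, float]:
        # Query terms unseen in the corpus get weight 0.
        return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

    def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    q = vector(Counter(tokenize(query)))
    scores = [(name, cosine(q, vector(counts))) for name, counts in tokenized.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]
```

Unlike embeddings, TF-IDF only matches shared terms, so a query for "validate user input" will not find code that says "sanitize request payload"; this is the main quality gap the Ollama path closes.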

Database

migration_safety.py

Schema delta detection, bidirectional SQL generation, and data-loss risk assessment. HIGH and DATA-LOSS risk migrations block the pipeline until !migration-approve is issued.

Sub-command	Description
`generate`	Diff current schema against previous; generate UP and DOWN SQL
`check`	Classify risk level (LOW / MEDIUM / HIGH / DATA-LOSS) and print report
`dry-run`	Apply migration to a shadow database and verify it succeeds without data loss

⚠️

HIGH risk blocks pipeline

When migration_safety.py check returns a HIGH or DATA-LOSS risk rating, the pipeline halts and prints a risk report. The human must issue !migration-approve to proceed or !migration-cancel to discard the schema change.
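One way to picture the risk classification is a set of rules applied to the generated UP statements, with the overall rating being the most severe rule triggered. The rules below are hypothetical examples for illustration, not the script's actual classification table:

```python
import re
from typing import List

RISK_ORDER = ["LOW", "MEDIUM", "HIGH", "DATA-LOSS"]

# Hypothetical rules, listed most severe first.
RULES = [
    (re.compile(r"\bDROP\s+(TABLE|COLUMN)\b", re.I), "DATA-LOSS"),
    (re.compile(r"\bALTER\s+COLUMN\b.*\bTYPE\b", re.I), "HIGH"),
    (re.compile(r"\bRENAME\b", re.I), "HIGH"),
    (re.compile(r"\bNOT\s+NULL\b", re.I), "MEDIUM"),
]


def classify(statements: List[str]) -> str:
    """Return the highest risk level triggered by any statement."""
    level = "LOW"
    for stmt in statements:
        for pattern, risk in RULES:
            if pattern.search(stmt):
                if RISK_ORDER.index(risk) > RISK_ORDER.index(level):
                    level = risk
                break  # first (most severe) matching rule wins for this statement
    return level


def blocks_pipeline(level: str) -> bool:
    """HIGH and DATA-LOSS migrations wait for !migration-approve."""
    return level in ("HIGH", "DATA-LOSS")
```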

Pipeline Operations

pipeline_replay.py

Restore the codebase to the git checkpoint created after a specific task, then reset all later tasks to PENDING. Requires commit_by_task.py to have run first, so that each task has a corresponding git commit.

python scripts/pipeline_replay.py --task TASK-NNN --project-dir .

Invoked by the !replay TASK-NNN Orchestrator command. See Command Reference.
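The status-reset half of a replay can be sketched as follows, assuming tasks are held in a single linear execution order with one status string each (a hypothetical representation). Locating and checking out the task's git commit is not shown:

```python
from typing import Dict, List


def reset_after(order: List[str], status: Dict[str, str], target: str) -> Dict[str, str]:
    """Return a new status map with every task after `target` set to PENDING."""
    if target not in order:
        raise ValueError(f"unknown task: {target}")
    cutoff = order.index(target)
    updated = dict(status)  # leave the caller's map untouched
    for task in order[cutoff + 1:]:
        updated[task] = "PENDING"
    return updated
```

The target task itself keeps its status, since the checkpoint is the state right after that task completed.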

session_fork.py

Fork the pipeline into two parallel branches for side-by-side comparison of alternative implementations or architectures.

Sub-command	Description
`create`	Fork the project directory into two branch directories (fork-a, fork-b)
`compare`	Diff the two branches and summarize structural differences
`merge`	Copy the winning branch back to the project directory and remove both fork dirs
`status`	Show current fork state: which branch is ahead, task counts, test results

Invoked by !fork and !merge Orchestrator commands.
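A structural comparison in the spirit of `compare` can be approximated with the standard library's `filecmp.dircmp`, which reports files unique to each side and files present in both whose contents differ. A sketch assuming the two forks are ordinary directories such as `fork-a` and `fork-b`; `dircmp` checks file signatures first and only compares contents when those differ:

```python
import filecmp
import os
from typing import Dict, List


def compare_forks(a: str, b: str) -> Dict[str, List[str]]:
    """Summarize files only in a, only in b, and present in both but different."""
    summary: Dict[str, List[str]] = {"only_a": [], "only_b": [], "changed": []}

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        summary["only_a"] += [os.path.join(prefix, n) for n in cmp.left_only]
        summary["only_b"] += [os.path.join(prefix, n) for n in cmp.right_only]
        summary["changed"] += [os.path.join(prefix, n) for n in cmp.diff_files]
        # Recurse into subdirectories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(a, b), "")
    return {key: sorted(paths) for key, paths in summary.items()}
```

By default `dircmp` skips directories such as `.git` and `__pycache__`, which suits a comparison of source trees.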

register_project.py

Registers or updates a project entry under `registered_projects` in config/context.json. Agents 3–6 call it at each stage completion to keep the registry current, and Agent 7 calls it after every maintenance cycle. This enables the always-resident pipeline: registered projects auto-resume on the next "Start agent 0" without requiring re-activation.

Detail	Value
Usage	`python scripts/register_project.py --project-dir /path --project-name "MyApp" --stage built --last-agent 3 --handoff artifacts/build/HANDOFF.json`
Stages	`built` → `critiqued` → `secured` → `shipped` → `maintenance`
Stage protection	Refuses to regress a stage — calling with an earlier stage than the current value is a no-op with a warning
Safety check	Blocks registration if `--project-dir` resolves inside the agents directory
Exit 0	Project registered or updated successfully
Exit 1	Safety violation (path inside agents dir) or config file missing/corrupt
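The stage-protection and safety rules can be sketched as two small checks. The stage order comes from the table above; the agents directory is passed in as a parameter rather than hard-coded, and the function names are hypothetical:

```python
from pathlib import Path
from typing import Optional

STAGES = ["built", "critiqued", "secured", "shipped", "maintenance"]


def is_inside(child: str, parent: str) -> bool:
    """True if child resolves to parent or a path beneath it (symlinks followed)."""
    return Path(child).resolve().is_relative_to(Path(parent).resolve())


def next_stage(current: Optional[str], requested: str) -> str:
    """Apply stage protection: never move backwards in the stage list."""
    if current is not None and STAGES.index(requested) < STAGES.index(current):
        print(f"warning: refusing to regress stage {current!r} -> {requested!r}")
        return current  # no-op
    return requested
```

Resolving both paths before comparing means a symlink inside a project that points into the agents directory is still caught. `Path.is_relative_to` requires Python 3.9 or later.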

pipeline_improve.py

Post-run telemetry analyzer. After a pipeline completes, analyzes timing, failure rates, and retry patterns to generate PROP-NNN improvement proposals.

Invocation	Description
`python scripts/pipeline_improve.py`	Default: analyze current session and emit proposals to `PROP_LOG.md`
`python scripts/pipeline_improve.py --health`	Show existing proposals ranked by impact score
`python scripts/pipeline_improve.py --promote PROP-NNN`	Apply a specific proposal to the agents directory

Invoked by !pipeline-health and !pipeline-promote commands.
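The analysis step can be pictured as aggregating per-agent telemetry into failure and retry statistics. A sketch assuming a hypothetical record shape of `(agent, succeeded, retries, seconds)`; the actual telemetry format and the impact score used to rank PROP-NNN proposals are not documented here, so ranking is left out:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (agent, succeeded, retries, seconds): hypothetical record shape
Record = Tuple[str, bool, int, float]


def summarize(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Per-agent run count, failure rate, mean retries, and mean duration."""
    grouped = defaultdict(list)
    for agent, ok, retries, seconds in records:
        grouped[agent].append((ok, retries, seconds))
    summary = {}
    for agent, runs in grouped.items():
        n = len(runs)
        summary[agent] = {
            "runs": n,
            "failure_rate": sum(1 for ok, _, _ in runs if not ok) / n,
            "mean_retries": sum(r for _, r, _ in runs) / n,
            "mean_seconds": sum(s for _, _, s in runs) / n,
        }
    return summary
```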

token_tracker.py

Session token usage tracking and !budget reporting. Reads per-model pricing from config/pricing.json.

Sub-command	Description
`record`	Record token counts for a completed agent turn
`budget`	Print session summary: total tokens, context window %, estimated cost, per-agent breakdown
`breakdown`	Per-task token breakdown with cache hit/miss analysis

Pricing data from config/pricing.json supports multiple Claude models with separate input/output/cache rates.
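The estimated-cost figure in `budget` is each token category multiplied by its per-model rate. A sketch assuming `config/pricing.json` maps model names to USD rates per million tokens under hypothetical keys such as `input`, `output`, and `cache_read`; the model name and rates in the test are placeholders, not real prices:

```python
import json
from typing import Dict


def load_pricing(path: str = "config/pricing.json") -> Dict[str, Dict[str, float]]:
    """Load {model: {token_kind: usd_per_million_tokens}} (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def estimate_cost(pricing: Dict[str, Dict[str, float]], model: str, usage: Dict[str, int]) -> float:
    """USD cost of one turn: sum over token kinds of tokens * rate / 1,000,000."""
    rates = pricing[model]
    return sum(usage.get(kind, 0) * rate / 1_000_000 for kind, rate in rates.items())
```

With an input rate of 3.0 and an output rate of 15.0, a turn with 1,000,000 input and 100,000 output tokens costs 3.00 + 1.50 = 4.50 USD.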

Post-Build

container_verify.py

Build the Docker image and run the full test suite inside the container. Auto-generates a Dockerfile.test if none exists. If the suite passes locally but fails inside the container, the failure points to a genuine environment mismatch.

Detail	Value
Usage	`python scripts/container_verify.py --project-dir .`
Exit 0	Image built and all tests pass inside container
Exit 1	Tests fail in container — container logs and diff vs. local results printed

Invoked by the Verifier agent in Phase 12B. See Verifier.
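Auto-generating a Dockerfile.test can be sketched as writing a template only when the file is absent, so a user-provided file always wins. The Python-stack template below (base image tag, `requirements.txt`, `pytest` command) is a hypothetical example; other stacks would need their own templates:

```python
from pathlib import Path

# Hypothetical template for a Python project.
PYTHON_TEMPLATE = """\
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt pytest
CMD ["python", "-m", "pytest", "-q"]
"""


def ensure_dockerfile_test(project_dir: str, template: str = PYTHON_TEMPLATE) -> bool:
    """Write Dockerfile.test if missing; return True when a new file was created."""
    target = Path(project_dir) / "Dockerfile.test"
    if target.exists():
        return False  # never overwrite an existing file
    target.write_text(template)
    return True
```

The generated file could then be exercised with `docker build -f Dockerfile.test -t app-test .` followed by `docker run --rm app-test`, where `app-test` is an arbitrary image tag.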

contract_test.py

Start the application, call every endpoint declared in INTERFACES.md with minimal requests, and validate status codes and response fields. Provides runtime API contract verification beyond static type checking.

python scripts/contract_test.py --interfaces artifacts/build/INTERFACES.md --base-url http://localhost:8080

Invoked by the Verifier agent in Phase 6D.
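The per-endpoint check compares the observed status code and JSON body against what INTERFACES.md declares. A sketch of that comparison, assuming each declaration has already been parsed into an expected status and a list of required top-level fields; the INTERFACES.md format and the HTTP calls themselves are not shown:

```python
import json
from typing import List


def check_response(expected_status: int, required_fields: List[str],
                   status: int, body: str) -> List[str]:
    """Return contract violations for one endpoint response (empty list = pass)."""
    problems = []
    if status != expected_status:
        problems.append(f"status {status}, expected {expected_status}")
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        return problems + ["response body is not valid JSON"]
    if not isinstance(payload, dict):
        return problems + ["response body is not a JSON object"]
    problems += [f"missing field: {name}" for name in required_fields if name not in payload]
    return problems
```

Collecting every violation rather than stopping at the first gives the Verifier a complete report per endpoint.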

commit_by_task.py

Read WRITE_LOG.jsonl, group all writes by task ID, create one conventional git commit per task, and generate PR_BODY.md with a structured pull-request description.

python scripts/commit_by_task.py --project-dir .

Output	Description
Git commits	One conventional commit per task (feat:, fix:, test:, etc.) in task order
`PR_BODY.md`	Structured PR description with task list, test summary, and change log

Also required by pipeline_replay.py — each task needs a commit reference for checkpoint restoration.
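The grouping step can be sketched as building a commit plan from WRITE_LOG.jsonl lines without touching git. This assumes each log line has hypothetical `task` and `file` fields, that tasks sort correctly by zero-padded ID (TASK-001, TASK-002, ...), and that the caller supplies a conventional-commit type and summary per task; the `chore` default is also an assumption:

```python
import json
from typing import Dict, List, Tuple


def commit_plan(log_lines: List[str],
                summaries: Dict[str, Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Return (commit_message, files) per task, ordered by task ID."""
    files_by_task: Dict[str, List[str]] = {}
    for line in log_lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        files = files_by_task.setdefault(entry["task"], [])
        if entry["file"] not in files:  # a file rewritten twice is committed once
            files.append(entry["file"])
    plan = []
    for task in sorted(files_by_task):
        kind, summary = summaries.get(task, ("chore", "update files"))
        plan.append((f"{kind}: {summary} ({task})", files_by_task[task]))
    return plan
```

Each plan entry could then be applied with `git add` on its files followed by `git commit -m` with its message.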