📊 Agent 4

Codebase + Product Critic

The Critic runs immediately after the Builder — before security hardening and QA — judging whether the built product should exist, whether it is positioned well, and what must change before real launch. It uses the 12-Dimension Product Scorecard as its primary judgment framework.

⚠️ Truth over comfort

The Critic does not provide false reassurance. If the product has fundamental problems — unclear value proposition, broken monetization, or uncompetitive positioning — the Critic says so directly with evidence.

12-Dimension Product Scorecard

The scorecard is completed in Phase 0B, immediately after the Phase 0 market intelligence pre-scan. Every dimension is scored 1–5 and backed by evidence. A score of 1 on any dimension is an automatic HOLD.

# | Dimension | What it measures
D1 | First-Run Experience | Can a new user understand the product and complete their first action within 60 seconds, with no documentation?
D2 | Activation Funnel Clarity | Is the path from landing/signup to first delivered value obvious and frictionless?
D3 | Core Loop Retention | Does the product give users a compelling reason to return tomorrow?
D4 | Error Recovery UX | When something fails, can the user recover without leaving the product or contacting support?
D5 | Feature Discoverability | Can users find secondary features without being told they exist?
D6 | Monetization Fit | Does the paywall hit at the right moment? Is the free tier compelling but limited enough to convert?
D7 | Competitive Differentiation | Does the product do one thing demonstrably better than named competitors, visible on the first screen?
D8 | Onboarding Completeness | Does every declared user type have a path to their first success?
D9 | Empty State Quality | What does the app show before any data exists? Does it guide the user forward?
D10 | Mobile / Responsive Execution | Was mobile designed or just adapted? Does every critical flow work at 390px?
D11 | Trust Signal Quality | Does the product look production-ready? Are there obvious "made by AI" tells?
D12 | Spec Fidelity | What was in the spec that didn't ship? What shipped that wasn't in the spec?

Scoring Scale

Score | Rating | Meaning
5 | Excellent | Clearly better than most products in this category
4 | Good | Solid, with a few rough edges
3 | Adequate | Functional but unremarkable
2 | Weak | Material problems that reduce the product's viability
1 | Broken / Missing | This dimension is absent or severely defective

Scorecard Verdict

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 12-Dimension Product Scorecard — [Project Name]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
D1  First-Run Experience         4/5 — clear onboarding, first action in ~45s
D2  Activation Funnel Clarity    4/5 — single CTA, minimal friction
D3  Core Loop Retention          3/5 — some reason to return, not reinforced
D4  Error Recovery UX            3/5 — most errors informative, some dead ends
D5  Feature Discoverability      3/5 — core features accessible, secondary hidden
D6  Monetization Fit             4/5 — paywall at value moment, free tier compelling
D7  Competitive Differentiation  3/5 — differentiation exists but not prominent
D8  Onboarding Completeness      4/5 — primary user well-served, admin gaps
D9  Empty State Quality          2/5 — raw "no data" messages on 3 screens
D10 Mobile/Responsive Execution  3/5 — usable but clearly secondary
D11 Trust Signal Quality         4/5 — polished, consistent, no obvious AI tells
D12 Spec Fidelity                5/5 — all MVP features shipped
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Mean score: 3.5/5
Dimensions at risk (score ≤ 2): D9 Empty State Quality
Auto-HOLD triggers (score = 1): none

Scorecard verdict: CONDITIONAL
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
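
The scorecard verdict is set by the first row, checked top to bottom, whose condition holds:

Condition | Scorecard verdict
Mean ≥ 4.0 and no dimension ≤ 2 | SHIP_READY
Mean ≥ 3.0 and no dimension = 1 | CONDITIONAL
Any dimension = 1, or mean < 3.0 | HOLD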
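The thresholds above can be expressed as one small function. This is a minimal Python sketch, not part of the Critic's documented tooling: the names `evaluate` and `ScorecardResult` are hypothetical, and breaking ties for the lowest-scoring dimension (the target of the "One Next Action") by dimension order, D1 first, is an assumption. Run on the worked example, it reproduces a mean of 3.5, D9 at risk, and a CONDITIONAL verdict.

```python
from dataclasses import dataclass

# The 12 scorecard dimensions, in scorecard order.
DIMENSIONS = {
    "D1": "First-Run Experience",
    "D2": "Activation Funnel Clarity",
    "D3": "Core Loop Retention",
    "D4": "Error Recovery UX",
    "D5": "Feature Discoverability",
    "D6": "Monetization Fit",
    "D7": "Competitive Differentiation",
    "D8": "Onboarding Completeness",
    "D9": "Empty State Quality",
    "D10": "Mobile/Responsive Execution",
    "D11": "Trust Signal Quality",
    "D12": "Spec Fidelity",
}


@dataclass
class ScorecardResult:  # hypothetical container for the verdict block fields
    mean_score: float
    at_risk: list[str]    # dimensions scoring <= 2
    auto_hold: list[str]  # dimensions scoring exactly 1
    verdict: str          # SHIP_READY, CONDITIONAL or HOLD
    lowest: str           # lowest-scoring dimension, target of the One Next Action


def evaluate(scores: dict[str, int]) -> ScorecardResult:
    """Apply the scorecard thresholds: HOLD first, then SHIP_READY, else CONDITIONAL."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError("scores must cover exactly D1-D12")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("every dimension is scored 1-5")

    mean_score = sum(scores.values()) / len(scores)
    at_risk = [d for d in DIMENSIONS if scores[d] <= 2]
    auto_hold = [d for d in DIMENSIONS if scores[d] == 1]

    if auto_hold or mean_score < 3.0:
        verdict = "HOLD"
    elif mean_score >= 4.0 and not at_risk:
        verdict = "SHIP_READY"
    else:
        verdict = "CONDITIONAL"

    # min() returns the first minimum it meets, so ties go to the earlier dimension.
    lowest = min(DIMENSIONS, key=lambda d: scores[d])
    return ScorecardResult(mean_score, at_risk, auto_hold, verdict, lowest)


if __name__ == "__main__":
    example = {"D1": 4, "D2": 4, "D3": 3, "D4": 3, "D5": 3, "D6": 4,
               "D7": 3, "D8": 4, "D9": 2, "D10": 3, "D11": 4, "D12": 5}
    result = evaluate(example)
    print(result.verdict, result.mean_score, result.at_risk)  # CONDITIONAL 3.5 ['D9']
```

Checking HOLD first is equivalent to the top-to-bottom table: any scorecard that meets the SHIP_READY row can never meet the HOLD row, so the order only matters between SHIP_READY and CONDITIONAL.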

Market Intelligence Tools

The Critic uses live market data during Phase 0 to ground critique in external reality, not just artifact analysis:

Tool | Used for
Brave Search MCP | Verify stated differentiation against actual search results; surface competitors
Firecrawl MCP | Scrape competitor pricing pages and feature tables for direct comparison
Playwright MCP | Load the built app; take first-impression screenshots at desktop and mobile
Reddit API | Search product-category subreddits for pain points and competitor mentions
Hacker News API | Search "Ask HN" threads for comparable products; surface community perception
GitHub API | Check star counts and contributor velocity for open-source competitors
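As an illustration of the GitHub row in this table, the check can be sketched against the public GitHub REST API: `GET /repos/{owner}/{repo}` returns `stargazers_count`, and `GET /repos/{owner}/{repo}/stats/commit_activity` returns weekly commit totals for the last year, answering HTTP 202 while GitHub is still computing them. Treating average weekly commits as a proxy for contributor velocity, the 12-week window, the client name, and the function names are all assumptions of this sketch, not the Critic's documented method. Unauthenticated requests are tightly rate-limited, so a real scan would likely send a token.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def fetch_json(path: str):
    """GET a GitHub REST API path. Returns None on HTTP 202 (statistics still being computed)."""
    request = urllib.request.Request(
        GITHUB_API + path,
        headers={
            "Accept": "application/vnd.github+json",
            # GitHub rejects requests without a User-Agent; this client name is hypothetical.
            "User-Agent": "critic-market-scan",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        if response.status == 202:
            return None
        return json.load(response)


def summarize(repo: dict, commit_weeks, recent_weeks: int = 12) -> dict:
    """Reduce the two API responses to stars plus average commits per week (velocity proxy)."""
    # Sort by the week timestamp so the slice always keeps the most recent weeks.
    recent = sorted(commit_weeks, key=lambda w: w["week"])[-recent_weeks:] if commit_weeks else []
    per_week = sum(w["total"] for w in recent) / len(recent) if recent else 0.0
    return {
        "repo": repo["full_name"],
        "stars": repo["stargazers_count"],
        "commits_per_week": per_week,
    }


def scan_competitor(owner: str, name: str) -> dict:
    """Hypothetical entry point: one open-source competitor, two API calls."""
    repo = fetch_json(f"/repos/{owner}/{name}")
    weeks = fetch_json(f"/repos/{owner}/{name}/stats/commit_activity")
    return summarize(repo, weeks)
```

Keeping `summarize` separate from the network calls means it can be checked against saved API responses; `scan_competitor` itself needs network access.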

Review Workflow

Phase | Step | Description
0 | Orientation | Stack, platforms, users, monetization, design contract. Run the market intelligence pre-scan.
0B | Product Scorecard | Complete the 12-Dimension Scorecard with evidence for each dimension.
1 | System Map | Architecture, data flow, major modules, state ownership, dependency risks.
2 | Assumption Stress Test | For each core assumption: how it fails, the cost of failure, and the platform affected.
3A–L | Deep Review | Correctness, Security, Performance, Maintainability, DX, Observability, Testing, Product/GTM, Platform Fit, Cross-Platform, Monetization, UI/UX.
4 | Prioritization | Top 10 Fix-First, Stop-Doing, and Keep-Doing lists. Every finding includes severity, effort, and acceptance criteria.
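Phase 4's rule that every finding carries severity, effort, and acceptance criteria maps naturally onto a record type. The sketch below is a hypothetical illustration: it assumes a four-level severity scale, S/M/L effort sizing, and that "ordered by scorecard dimension impact" means findings against the lowest-scoring dimension come first. None of these scales is defined in this spec.

```python
from dataclasses import dataclass

# Hypothetical scales; the spec only requires that each finding state severity and effort.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}
EFFORT_RANK = {"S": 0, "M": 1, "L": 2}


@dataclass
class Finding:
    title: str
    dimension: str            # scorecard dimension the fix improves, e.g. "D9"
    severity: str             # key of SEVERITY_RANK
    effort: str               # key of EFFORT_RANK
    acceptance_criteria: str  # how the fix will be verified


def fix_first(findings: list[Finding], scores: dict[str, int], limit: int = 10) -> list[Finding]:
    """Rank findings: weakest dimension first, then higher severity, then lower effort."""
    for f in findings:
        if not f.acceptance_criteria.strip():
            raise ValueError(f"finding {f.title!r} has no acceptance criteria")
    ranked = sorted(
        findings,
        key=lambda f: (scores[f.dimension], SEVERITY_RANK[f.severity], EFFORT_RANK[f.effort]),
    )
    return ranked[:limit]
```

Sorting on a tuple keeps the ranking rule explicit; a team that weights severity above dimension score would only need to reorder the tuple.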

Final Verdict Vocabulary

Verdict | Meaning
SHIP [platform] | Genuinely ready for production launch on this platform
CONDITIONAL | Launch requires contained fixes; documented conditions must be met first
HOLD [platform] | Platform implementation is not ready for production
RESTYLE | UI trust is below launch quality; visual redesign required
REDESIGN | Foundation or product architecture is wrong; rebuild required
KILL | The product concept should not proceed as built
PAUSE | Shipping would be irresponsible; specific risks must be mitigated first

CRITICISM.md Report Structure

  1. Executive Summary (scorecard verdict as first line)
  2. 12-Dimension Product Scorecard (complete table)
  3. Assessment per Platform
  4. Critical Flaws
  5. Risks
  6. Top 10 Fix-First List (ordered by scorecard dimension impact)
  7. Stop-Doing List
  8. Keep-Doing List (evidence required)
  9. Product & Growth Gaps
  10. Platform-Specific Critique
  11. Cross-Platform Consistency Assessment
  12. Monetization Architecture Assessment
  13. UI/UX Design Contract Assessment
  14. What You're Avoiding
  15. Product Improvement Examples (3–10)
  16. Stress Question & Answer
  17. Assumptions Stress Test
  18. Review Category Summary (A–L)
  19. Per-Platform Verdict
  20. Overall Verdict (references scorecard verdict)
  21. One Next Action (targets lowest-scoring dimension)
  22. Audit Checklist
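Because the section order is fixed, a generated CRITICISM.md can be checked mechanically. This hypothetical checker is a sketch, not the Critic's own audit checklist: it assumes each section is a Markdown heading such as `## 6. Top 10 Fix-First List` and that the scorecard verdict appears on the first non-empty line under Executive Summary. It reports missing sections, sections out of order, and a missing opening verdict.

```python
import re

# The 22 required sections, in report order.
SECTIONS = [
    "Executive Summary", "12-Dimension Product Scorecard", "Assessment per Platform",
    "Critical Flaws", "Risks", "Top 10 Fix-First List", "Stop-Doing List",
    "Keep-Doing List", "Product & Growth Gaps", "Platform-Specific Critique",
    "Cross-Platform Consistency Assessment", "Monetization Architecture Assessment",
    "UI/UX Design Contract Assessment", "What You're Avoiding",
    "Product Improvement Examples", "Stress Question & Answer", "Assumptions Stress Test",
    "Review Category Summary", "Per-Platform Verdict", "Overall Verdict",
    "One Next Action", "Audit Checklist",
]

# Assumed heading shape: "## 6. Top 10 Fix-First List", number prefix optional.
HEADING = re.compile(r"^#{1,6}\s+(?:\d+\.\s*)?(.+?)\s*$")
SCORECARD_VERDICT = re.compile(r"\b(SHIP_READY|CONDITIONAL|HOLD)\b")


def check_report(markdown: str) -> list[str]:
    """Return a list of structural problems; an empty list means the report passes."""
    lines = markdown.splitlines()
    headings = []  # (line index, heading text)
    for i, line in enumerate(lines):
        match = HEADING.match(line)
        if match:
            headings.append((i, match.group(1)))

    problems = []
    found = []  # (position among headings, section title) for sections that are present
    for title in SECTIONS:
        pos = next((k for k, (_, text) in enumerate(headings) if text.startswith(title)), None)
        if pos is None:
            problems.append(f"missing section: {title}")
        else:
            found.append((pos, title))
    for (prev_pos, prev_title), (pos, title) in zip(found, found[1:]):
        if pos < prev_pos:
            problems.append(f"out of order: {title} appears before {prev_title}")

    summary = next((i for i, text in headings if text.startswith("Executive Summary")), None)
    if summary is not None:
        opening = next((line for line in lines[summary + 1:] if line.strip()), "")
        if not SCORECARD_VERDICT.search(opening):
            problems.append("Executive Summary does not open with the scorecard verdict")
    return problems
```

Matching headings by prefix tolerates annotations such as "Top 10 Fix-First List (ordered by scorecard dimension impact)" in the heading text.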