
SystemAudit: Shipping a Codebase Intelligence SaaS That Doesn't Hallucinate

How I built a production SaaS that turns any GitHub repo into a full architecture, security, dependency, and AI-readiness report in under 3 minutes — using a two-layer deterministic-then-AI engine that eliminates the hallucination problem other code-review tools have. Live at systemaudit.dev.

By Nic Chin · 9 min read

Most AI code review tools have the same fundamental flaw: they ask a language model to read your code and tell you what's wrong. That produces confident-sounding prose with invented line numbers, fabricated CVEs, and architectural claims the model pattern-matched from training data rather than observed in your repository. It looks impressive until a reviewer actually opens the file and the "finding" isn't there. For a product that investors, engineering directors, and founding teams rely on for real diligence decisions, that failure mode is unacceptable.

SystemAudit is my answer: systemaudit.dev, a production SaaS that turns any GitHub repository into a full system health report in under three minutes, starting at $49 versus the $5,000–$16,000 traditional audits cost. The core innovation is not the AI summary on top — it's the two-layer architecture underneath that makes the AI layer structurally incapable of hallucinating. This case study explains how that works and why it matters for production AI products in general.

The Problem: AI Code Review is a Trust Crisis

The category SystemAudit competes in has two broken halves. Traditional technical due diligence is expensive, slow, and unavailable to anyone without a budget and a network. A proper engineering audit runs one to three weeks at $5K–$16K — fine for Series A diligence, useless for a founder on a Thursday afternoon trying to understand whether they've accumulated enough technical debt to warrant a rewrite. The other half — automated AI-based code reviewers — solves the speed and price problem but introduces a worse one: outputs you can't actually trust.

When an AI code reviewer tells me "there's a SQL injection on line 42 of auth.ts," I need that to be true. If I open auth.ts and there's no such code, the whole tool is dead to me — not just for that report, but forever. This is why technical buyers bounce off most AI code review products: one hallucination and the trust is gone.

The design problem is therefore not "build a better prompt". It's structural. You have to make hallucination architecturally impossible for the claims that matter.

The Core Innovation: Two-Layer Architecture

SystemAudit splits analysis into two layers with a hard separation of responsibilities.

Layer 1 — Deterministic Analysis (No AI)

Before any language model sees the code, a comprehensive static analysis pass runs over the repository. This layer produces hard evidence, not opinions:

  • Exact vulnerability patterns with file and line references (hardcoded API keys, SQL injection vectors, innerHTML sinks, sensitive data exposure).
  • Full import dependency graph with proven circular dependencies, single points of failure, and dead code.
  • Dependency health flags (outdated, unmaintained, known-CVE packages).
  • Structural quality metrics (modularity, nesting depth, file-size distribution, type coverage).
  • Verified configuration state — what's actually in package.json, CI files, Docker, lint configs, test configs.

Layer 1 runs at zero AI cost. These are facts extracted by deterministic code, and they're what downstream claims must anchor to.
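To make the "hard evidence, not opinions" idea concrete, here is a minimal sketch of what an evidence-bearing deterministic pass looks like. This is illustrative, not SystemAudit's actual scanner: the rule names, regexes, and output shape are my own simplified stand-ins, and a real rule set would be far larger and more precise.

```python
import re

# Hypothetical, deliberately tiny rule set -- a real scanner has many more
# rules and far more careful patterns (entropy checks, AST-based matching, etc.).
RULES = [
    ("hardcoded-api-key", re.compile(r"""(?i)api[_-]?key\s*=\s*['"][\w-]{16,}['"]""")),
    ("innerhtml-sink", re.compile(r"\.innerHTML\s*=")),
]

def scan_file(path: str, text: str) -> list[dict]:
    """Return findings anchored to exact file and 1-based line numbers.

    Every finding carries the matched line as evidence, so a reviewer can
    open the file and see the problem where the report says it is.
    """
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule_id, pattern in RULES:
            if pattern.search(line):
                findings.append({
                    "rule": rule_id,
                    "file": path,
                    "line": lineno,
                    "evidence": line.strip(),
                })
    return findings

sample = 'const API_KEY = "sk_live_abcdefghijklmnop"\nel.innerHTML = userInput\n'
findings = scan_file("src/auth.ts", sample)
```

The key property: every claim this layer emits is reproducible by re-running a regex (or AST query) against the file, which is exactly what makes it usable as ground truth for the AI layer.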

Layer 2 — AI Interpretation, Constrained to Layer 1

The AI layer receives the source code alongside every Layer 1 finding as immutable constraints. Its job is not to discover vulnerabilities — Layer 1 has already done that deterministically. Its job is to synthesize patterns, identify architectural concerns, and translate technical findings into business language for non-engineer readers (investors, boards, CEOs). It cannot contradict Layer 1 evidence; the prompting and post-processing enforce this.
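One way to picture "Layer 1 findings as immutable constraints" is in how the Layer 2 context gets assembled. The sketch below is an assumption about the general shape, not SystemAudit's actual prompting: the function name, finding schema, and instruction wording are hypothetical.

```python
import json

def build_layer2_context(source_files: dict[str, str],
                         layer1_findings: list[dict]) -> str:
    """Assemble the Layer 2 input: source code plus Layer 1 evidence.

    The deterministic findings are presented as ground truth the model must
    build on; the instruction restricts it to synthesis and explanation,
    not discovery of new security claims.
    """
    evidence = json.dumps(layer1_findings, indent=2)
    files = "\n\n".join(f"--- {path} ---\n{src}"
                        for path, src in source_files.items())
    return (
        "The findings below were produced by deterministic static analysis "
        "and are ground truth. Do not invent new security findings and do "
        "not contradict this evidence. Your job is to synthesize patterns, "
        "flag architectural concerns, and explain the findings in business "
        "language.\n\n"
        f"VERIFIED FINDINGS:\n{evidence}\n\n"
        f"SOURCE:\n{files}"
    )
```

Prompting alone is guidance, not a guarantee, which is why a post-processing check (dropping any output that can't be anchored to Layer 1 evidence) has to enforce the constraint as well.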

This is why SystemAudit reports cite specific files and line numbers for every security finding (those come from Layer 1, not the LLM). It's why cost estimates are calibrated to actual complexity metrics rather than guessed. It's why the architecture analysis describes what's really in the repo rather than pattern-matched from training data on a similar-sounding project.

What the Report Delivers

Every SystemAudit report includes:

  • System Architecture Map — a visual diagram of how every component connects, generated from the actual import graph.
  • Security & Vulnerability Scan — hardcoded secrets, SQL injection, XSS vectors, sensitive data exposure, with exact file and line evidence.
  • Dependency Graph Analysis — circular dependencies, single points of failure, and dead code identified deterministically rather than estimated by an LLM.
  • Risk Assessment — each risk ranked by severity with "cost to fix" and "cost if ignored" framed in business language a non-engineer can act on.
  • Feature Verification — cross-references feature claims against test files, so "we have auth" only passes if there's actually auth code with tests.
  • AI Readiness Score — a 5-dimension assessment (Code Clarity, Test Coverage, Modularity, Documentation, Type Safety) with an A-F grade per dimension.
  • Health Score — a single number across 20+ checks (tests, security, CI/CD, documentation, dependency health, structural metrics).
  • Prioritized Fix Plan — a week-by-week remediation roadmap with investment tiers, ROI projection, and hiring guidance.
  • Exportable PDF — the whole report as a single document suitable for investors, boards, or dev teams.
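As an example of what "identified deterministically rather than estimated by an LLM" means for the dependency analysis: circular imports are a pure graph property, so they can be found with an ordinary depth-first search over the import graph, no model involved. A minimal sketch (my own illustration, not SystemAudit's implementation):

```python
def find_cycles(graph: dict[str, list[str]]) -> list[list[str]]:
    """Detect circular imports in a module dependency graph via DFS.

    graph maps each module to the modules it imports. Returns each cycle
    as a path that starts and ends at the same module. Illustrative only:
    a production analyzer would also deduplicate rotated cycles.
    """
    cycles: list[list[str]] = []
    state: dict[str, int] = {}  # 1 = on the current DFS stack, 2 = finished
    stack: list[str] = []

    def dfs(node: str) -> None:
        state[node] = 1
        stack.append(node)
        for dep in graph.get(node, []):
            if state.get(dep) == 1:
                # Back edge to a module on the stack: that's a cycle.
                cycles.append(stack[stack.index(dep):] + [dep])
            elif dep not in state:
                dfs(dep)
        stack.pop()
        state[node] = 2

    for node in graph:
        if node not in state:
            dfs(node)
    return cycles

# a -> b -> c -> a is circular; d is independent.
cycles = find_cycles({"a": ["b"], "b": ["c"], "c": ["a"], "d": []})
```

Because the answer falls out of the graph itself, the report can show the exact cycle path rather than a hedge like "there may be circular dependencies".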

Language and Ecosystem Support

SystemAudit supports 50+ languages across 9 major ecosystems: JavaScript/TypeScript (Next.js, React, Express), Python (Django, FastAPI, Flask), Java/Kotlin (Spring Boot), Go (Gin, Echo, Chi), Rust (Actix, Axum, Rocket), C#/.NET (ASP.NET Core), PHP (Laravel, Symfony), Ruby (Rails), plus Docker, CI/CD, serverless, and monorepo configurations. The deterministic analyzers are where the per-language work lives; the AI layer and report format are shared across all of them.

Quality Assurance

A tool that claims "near-zero hallucination" has to prove it, continuously. SystemAudit is tested against a standing corpus of 10 real public codebases ranging from 800 lines to 600K+ lines. Each run is graded across four dimensions: factual accuracy, free-tier reliability, full-analysis quality, and business-translation clarity. The current benchmark is 10/10 across all four. Every AI finding is cross-checked against Layer 1 evidence; any finding the AI layer produces that can't be anchored to deterministic evidence is dropped rather than shown to the user.
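The last sentence describes a filter, and the filter itself is simple enough to sketch. This is my assumed shape, not the production code: the idea is that an AI-layer claim survives only if it cites a (file, line) pair that Layer 1 actually produced.

```python
def anchor_filter(ai_findings: list[dict],
                  layer1_findings: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split AI-layer findings into anchored (kept) and unanchored (dropped).

    An AI finding is anchored only if it cites a file/line pair that exists
    in the deterministic Layer 1 evidence; everything else is discarded
    before the user ever sees it.
    """
    anchors = {(f["file"], f["line"]) for f in layer1_findings}
    kept, dropped = [], []
    for finding in ai_findings:
        target = kept if (finding.get("file"), finding.get("line")) in anchors else dropped
        target.append(finding)
    return kept, dropped

layer1 = [{"rule": "sql-injection", "file": "auth.ts", "line": 42}]
ai = [
    {"file": "auth.ts", "line": 42, "claim": "SQL injection in login handler"},
    {"file": "db.ts", "line": 7, "claim": "possible XSS"},  # no Layer 1 anchor
]
kept, dropped = anchor_filter(ai, layer1)
```

The design choice worth noting is that the failure mode is silence, not invention: an unanchored claim disappears from the report instead of reaching the reader with fabricated specifics.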

The Economic Argument

Traditional technical due diligence: $5,000–$16,000 and one to three weeks. SystemAudit: under three minutes, starting at $49. That's not a marginal improvement — it's a different category of product. It changes who can afford technical diligence. A founder pitching investors on Friday can run their own repo through SystemAudit on Thursday night and walk into the meeting already knowing what the tech-literate board member will find. A VC screening 40 startups for a thesis round can afford to audit all of them instead of picking three.

The scanner engine is open source (MIT license), which lets technically sophisticated teams inspect exactly how the deterministic layer works and verify it for themselves before trusting the report. For a product whose whole value proposition is trust, that transparency is itself a feature.

Why This Pattern Matters Beyond SystemAudit

The two-layer architecture is the generalizable lesson. Whenever you're shipping an AI product whose output needs to be trustworthy — not just plausible-sounding — the pattern is the same: do the deterministic work first, hand the LLM both the raw input and the deterministic findings, and constrain it to synthesizing and explaining rather than discovering. This is how I build RAG systems (see DocsFlow — retrieval is deterministic, the LLM only generates on top), how I build agentic systems, and how I evaluate vendors for clients who are wondering whether to build or buy an AI layer on top of their own product.

The specific shape changes per domain. The underlying principle doesn't: if you let the LLM do the work of both finding and interpreting, you're going to ship something that looks impressive in demos and falls apart with real users. Move the finding into deterministic code; let the AI do the interpretation. That bright line is what turns a cool demo into a product people trust.

Related reading: For how the same "deterministic first, AI second" discipline applies to retrieval, see DocsFlow. For the strategic version of this conversation — "should we build an AI layer in our product at all?" — see Build vs Buy AI Systems and Why AI Projects Fail: Architecture. I work with a small number of founders and operators each quarter as a fractional AI CTO, including clients in Malaysia and Singapore.

Ready to discuss your AI project?

Book a free 30-minute discovery call to explore how AI can transform your business. Or if you already have a codebase, get an instant architecture report at SystemAudit.dev — no technical knowledge needed, results in 3 minutes.

About the Author

Nic Chin is an AI Architect and Fractional CTO who helps companies design and deploy production AI systems including RAG pipelines, multi-agent systems, and AI automation platforms. He has delivered enterprise AI solutions across the UK, US, and Europe, and provides AI consulting in Malaysia and Singapore.