Why We Built This
Most cybersecurity assessments follow the same pattern. A consultant collects documents, reads through them, maps findings to a framework, identifies gaps, writes interview questions, conducts interviews, and produces a report. The knowledge required to do this well is significant. The process itself is largely manual.
We wanted to know: how much of this process could AI handle reliably, and where does it fall short?
Not as a thought experiment. We built the platform, tested it against real assessment workflows, and shipped working software. What follows is a walkthrough of what the platform does, how it works, and what we learned building it.
Document Ingestion and Client Context
The first thing any assessor needs is context. Who is the client? What does their estate look like? What have they told us so far?
The platform ingests client documents (interview transcripts, policy documents, incident logs, internal notes) and synthesises an AI-generated client summary. This is not a simple extraction. The system reads across all uploaded material and builds a contextual picture of the organisation: their size, their technology stack, their security posture, and critically, where the gaps are.
Client Overview
From this summary, the platform generates contextually-aware follow-up questions. These target areas where the evidence is thin or missing entirely. In the example above, the system identified four significant gaps:
- How cybersecurity is governed at the organisational level
- How security logging and monitoring works (or whether it exists at all)
- The current network architecture and segmentation approach
- Whether the organisation holds cyber insurance
Why This Matters
A consultant reviewing the same documents manually might miss the governance gap. The platform surfaces it automatically because it cross-references every document against what a complete assessment requires.
Alongside the narrative summary, the platform extracts structured client attributes from the source material: industry, headcount, primary identity provider, operating system estate, whether they use a managed service provider, and whether they have dedicated security resources.
Structured Data Extraction
This structured extraction serves two purposes. First, it gives the assessor an immediate factual baseline without manually combing through transcripts. Second, and more usefully, it highlights what we do not yet know. The gaps in structured data become the starting point for follow-up interviews.
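The "what we do not yet know" step can be sketched as a simple completeness check over the extracted profile. The field names below are illustrative, not the platform's actual schema:

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical shape of the extracted client profile; every field is
# optional because the source documents may never mention it.
@dataclass
class ClientProfile:
    industry: Optional[str] = None
    headcount: Optional[int] = None
    identity_provider: Optional[str] = None
    os_estate: Optional[str] = None
    uses_msp: Optional[bool] = None
    has_security_team: Optional[bool] = None

def unknown_attributes(profile: ClientProfile) -> list[str]:
    """Return the attributes extraction could not populate --
    these become the starting points for follow-up interviews."""
    return [f.name for f in fields(profile)
            if getattr(profile, f.name) is None]

profile = ClientProfile(industry="Manufacturing", headcount=240, uses_msp=True)
print(unknown_attributes(profile))
```

Here the unpopulated fields (identity provider, OS estate, dedicated security resources) would surface directly as interview topics.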
Claim Extraction and Document Analysis
Once documents are uploaded, the platform processes every file and extracts individual claims: discrete factual assertions made within the source material.
In a typical engagement, uploading five documents produced 788 individual claims. Each claim is tagged by severity (High, Medium, Low) and linked back to its source location within the original document.
Document Explorer
The claim extraction works inline within the original transcript text. Hover over a highlighted passage and the platform shows you the structured claim it extracted, along with its reasoning.
Inline Claim Extraction
This is where the platform starts saving serious time. In a traditional assessment, a consultant reads a 40-page transcript and takes notes. They might miss a claim buried on page 23 that contradicts something stated on page 6. The platform reads everything, extracts every claim, and makes all of them searchable and traceable.
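The traceability described above comes down to each claim carrying a pointer back into its source text. A minimal sketch, with an invented record shape (the platform's real schema may differ):

```python
from dataclasses import dataclass

# Illustrative claim record: the character offsets are what make the
# claim traceable to an exact passage in the original transcript.
@dataclass(frozen=True)
class Claim:
    text: str            # the discrete factual assertion
    severity: str        # "High" | "Medium" | "Low"
    source_doc: str      # which uploaded file it came from
    char_start: int      # offset into the source text...
    char_end: int        # ...marking the highlighted passage

def highlight(source: str, claim: Claim) -> str:
    """Return the exact passage the claim was extracted from."""
    return source[claim.char_start:claim.char_end]

transcript = "We rolled out MFA last year, but it is bypassed on internal IPs."
claim = Claim("MFA is bypassed on internal IPs", "High",
              "interview_1.txt", 33, 63)
print(highlight(transcript, claim))  # "it is bypassed on internal IPs"
```

Because the offsets survive alongside the claim, hovering over a highlight can show both the structured claim and the verbatim source it came from.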
What This Looks Like in Practice
An assessor uploads five files on a Monday morning. By the time they have made a coffee, the platform has extracted nearly 800 claims, categorised them by severity, and linked each one to its source. The assessor spends their time reviewing and validating claims rather than hunting for them.
Framework Mapping
Claims on their own are useful. Claims mapped to a compliance framework are actionable.
The platform takes the extracted claims and maps them against the target framework (in this example, CMMC v2.0). The result is a clear picture of evidence coverage: which controls have supporting evidence, which do not, and how many claims support each control.
Framework Mapping (CMMC v2.0)
Each control card shows the control ID, a description, and the number of claims supporting it. The assessor can filter by domain (Access Control, Audit and Accountability, and so on), by level, or toggle between controls with evidence and those without.
This is where the platform's value becomes most tangible for consultancies. The mapping step in a traditional assessment is painstaking. An experienced consultant might spend days cross-referencing interview notes against a framework spreadsheet. The platform does this in minutes and provides full traceability back to source documents.
The assessor still makes the judgement call. The platform maps evidence to controls. It does not decide whether the evidence is sufficient. That decision, whether a control is adequately met, requires human expertise. The platform positions the consultant to make that judgement faster and with better information.
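The coverage picture itself is a straightforward grouping once claim-to-control mappings exist. A toy sketch (the control IDs follow CMMC v2.0 naming, but the mappings are invented for illustration):

```python
from collections import defaultdict

# Given claim -> control mappings, compute which controls have
# supporting evidence and which remain gaps.
controls = ["AC.L1-3.1.1", "AC.L1-3.1.2", "AU.L2-3.3.1"]
mappings = [
    ("Separate admin accounts exist", "AC.L1-3.1.1"),
    ("MFA enforced for remote access", "AC.L1-3.1.1"),
]

coverage = defaultdict(list)
for claim_text, control_id in mappings:
    coverage[control_id].append(claim_text)

with_evidence = {c: len(coverage[c]) for c in controls if coverage[c]}
gaps = [c for c in controls if not coverage[c]]

print(with_evidence)  # claim counts per evidenced control
print(gaps)           # controls with no supporting evidence
```

The claim counts feed the control cards; the `gaps` list is what drives the interview question generation described next. Sufficiency, as the text notes, stays with the assessor: a count of two says nothing about whether those two claims are adequate.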
Targeted Interview Questions
Evidence gaps are only useful if you know what to do about them. The platform generates interview questions for every area where supporting evidence was missing or insufficient.
These are not generic questions pulled from a template. Each question is generated in context: the platform knows what evidence already exists, what the client has already told us, and where the specific gaps sit. The result is questions that are immediately usable in a stakeholder interview.
Contextually-Aware Interview Questions
Each question comes with assessor guidance notes. These explain what the platform already knows (for example, that separate admin accounts exist and MFA is bypassed on internal IPs with 60-day reauthentication), and what the assessor should probe further. This is the kind of preparation that typically takes a senior consultant hours. The platform produces it as a byproduct of its analysis.
Design Decision
We deliberately chose to generate guidance notes alongside questions, rather than just the questions themselves. A question without context forces the interviewer to go back and re-read source material. A question with context lets them walk into the room ready.
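One way to picture why the questions come out specific rather than generic: the generation prompt carries the existing evidence and the gap description with it. A hedged sketch of that context assembly (the prompt shape and function name are illustrative, not the platform's actual prompt):

```python
def build_question_prompt(control_id: str, known_evidence: list[str],
                          gap_description: str) -> str:
    """Assemble the context a language model needs to generate one
    targeted interview question plus assessor guidance notes."""
    evidence = "\n".join(f"- {e}" for e in known_evidence) or "- (none)"
    return (
        f"Control under assessment: {control_id}\n"
        f"Evidence already gathered:\n{evidence}\n"
        f"Identified gap: {gap_description}\n"
        "Write one interview question targeting the gap, then guidance "
        "notes summarising what is already known and what to probe."
    )

prompt = build_question_prompt(
    "AU.L2-3.3.1",
    ["Logs retained for 30 days on the firewall"],
    "No evidence of centralised log aggregation or review",
)
print(prompt)
```

Because the known evidence travels with the request, the model cannot ask about things the client has already answered, and the guidance notes fall out of the same context for free.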
Semantic Search
Framework assessments are rarely linear. An assessor reviewing access controls might suddenly need to know what the client said about endpoint detection three interviews ago. Traditional approaches mean searching through multiple documents manually.
The platform provides semantic search across all uploaded material. Search by concept, not just keyword. Type "edr" and the system returns every relevant claim across every document, ranked by relevance.
Semantic Search
Each search result shows the matched claim, its source document, the effective date, and a relevance score. Claims carry their severity badges (High, Medium) through to the search results, so the assessor can prioritise what to look at first.
This is one of the most powerful features in the platform. It turns the entire document corpus into a queryable knowledge base. When a client says something in an interview that contradicts earlier evidence, the assessor can verify it in seconds.
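Mechanically, searching by concept rather than keyword means ranking claims by similarity of embedding vectors. A minimal sketch using toy 3-dimensional vectors (real embeddings come from a model and have hundreds of dimensions; the numbers here are invented to make the ranking concrete):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Claim texts paired with (toy) embedding vectors.
claims = {
    "CrowdStrike deployed on all laptops": [0.9, 0.1, 0.0],
    "Office lease renewed in March":       [0.0, 0.1, 0.9],
    "No EDR coverage on servers":          [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]  # pretend: the embedding of the query "edr"

ranked = sorted(claims, key=lambda c: cosine(query_vec, claims[c]),
                reverse=True)
print(ranked[0])  # the EDR-related claims rank above the lease renewal
```

This is why typing "edr" can surface a claim about CrowdStrike that never contains the letters "EDR": nearness in embedding space, not string matching, drives the ranking.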
What We Learned Building This
Twelve months of building, testing, and iterating produced a platform that works. It also produced a set of hard-won insights that we did not expect at the outset.
What AI Is Genuinely Good At
The strongest results came from tasks that involve volume and cross-referencing. Reading hundreds of pages of transcripts and extracting structured claims is exactly the kind of work AI handles well. It does not get tired on page 38 and it does not forget what was said on page 4. Mapping those claims to framework controls across multiple documents and hundreds of data points is tedious for humans and fast for machines.
Gap identification was similarly strong. Once the system knows what "complete" looks like (a framework with full evidence coverage), identifying what is missing is straightforward. We initially tried to have the model also assess the severity of each gap, but the results were inconsistent. Severity depends on business context that the model does not have. We stripped that out and left severity assessment to the human assessor.
Contextual question generation surprised us. We expected the output to be generic, but because the model holds the full evidence set in context, the questions were specific and usable. The assessor guidance notes were an iteration on the first version, which generated bare questions with no context. Those were close to useless in practice.
Where We Hit Walls
Sufficiency is the hardest problem. A control might have three supporting claims. Whether those claims constitute adequate evidence requires professional judgement that the model cannot replicate. We tried several approaches to automated sufficiency scoring, including confidence thresholds and claim-count heuristics. None of them were reliable enough to ship. The assessor makes this call.
We also found that the model's framework mapping was only as good as the claim extraction. Early versions extracted too many low-quality claims, which created noise in the mapping. We spent significant time tuning the extraction pipeline to favour precision over recall. Fewer, higher-quality claims produced better mapping results than a larger volume of uncertain ones.
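The precision-over-recall tuning can be illustrated as a simple confidence cut on the extractor's output. The threshold value and score field below are assumptions for the sketch, not the platform's actual numbers:

```python
# Keep only claims the extractor scored above a confidence threshold.
# Dropping uncertain claims here reduces noise in the downstream
# framework mapping, at the cost of missing some true claims (recall).
raw_claims = [
    {"text": "Backups run nightly to offsite storage", "confidence": 0.93},
    {"text": "Someone mentioned maybe using a firewall", "confidence": 0.41},
    {"text": "MFA bypassed on internal IP ranges", "confidence": 0.88},
]

CONFIDENCE_THRESHOLD = 0.8  # illustrative; tuned empirically in practice

kept = [c for c in raw_claims if c["confidence"] >= CONFIDENCE_THRESHOLD]
print([c["text"] for c in kept])
```

Raising the threshold trades recall for precision, which matched what we observed: fewer, higher-quality claims produced better mapping than a larger volume of uncertain ones.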
The other area where AI falls short is narrative. Translating technical findings into board-level recommendations that drive action is a fundamentally human skill. We experimented with report generation and the output was technically accurate but tonally flat. It read like a compliance document, not a strategic recommendation. We stopped pursuing automated report generation entirely.
The Broader Insight
This platform sits at the intersection of cybersecurity, AI, and software engineering. Building it required deep expertise in all three. We had to understand the assessment workflow intimately to know what to automate. We had to understand AI's strengths and limitations to avoid building something that produced confident nonsense. We had to understand software engineering to ship a product that actually works.
The Takeaway
AI can meaningfully augment cybersecurity assessment delivery. The technology works. But the harder problem, the one most organisations underestimate, is redesigning the workflow around what AI is actually good at. That is an engineering challenge, not a procurement one.