How FirstReader Actually Works (Under the Hood)
Most AI writing tools work the same way. You paste your text in, the tool sends it to an LLM with a prompt that says something like "analyze this for craft issues," and you get back whatever the model feels like saying. Maybe it's useful. Maybe it's hallucinated. You have no way to know.
That's not how FirstReader works. And since I'm the one who built it, I figured I'd show you what actually happens between the moment you upload your manuscript and the moment you get your report. Not the polished marketing version. The real pipeline, with all the ugly plumbing showing.
If you're the kind of writer who respects engineering when it's done in service of craft... pull up a chair.
Before the AI touches a single word
This is the part most people don't expect.
When you upload your manuscript, the first thing that happens is... nothing AI. No LLM calls. No prompts. Just code. Deterministic code that measures your text the way a lab measures a blood sample.
Sentence length distributions. Readability scores. Passive voice density. Adverb frequency. Dialogue-to-narration ratios. These are NUMBERS. Measured, not interpreted. And they matter because they get injected directly into the analysis prompts later, so when the AI finally does show up, it's working from facts. Not vibes. Not guesses. Measured properties of your actual text.
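To make "measured, not interpreted" concrete, here's a minimal sketch of what a deterministic metrics pass can look like. This is an illustration, not FirstReader's production code: the function name `text_metrics`, the naive sentence splitter, and the "-ly" adverb proxy are all assumptions I'm making to keep the example short.

```python
import re
from statistics import mean

WORD = re.compile(r"[A-Za-z']+")

def text_metrics(text: str) -> dict:
    """Compute a few deterministic surface metrics (hypothetical helper)."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = WORD.findall(text)
    # Crude adverb proxy: words ending in "-ly". It will miscount words like "family".
    adverbs = [w for w in words if len(w) > 3 and w.lower().endswith("ly")]
    # Words inside straight double quotes are treated as dialogue.
    dialogue_words = sum(len(WORD.findall(q)) for q in re.findall(r'"([^"]*)"', text))
    lengths = [len(WORD.findall(s)) for s in sentences]
    return {
        "sentence_count": len(sentences),
        "mean_sentence_length": mean(lengths) if lengths else 0.0,
        "adverbs_per_100_words": 100 * len(adverbs) / len(words) if words else 0.0,
        "dialogue_ratio": dialogue_words / len(words) if words else 0.0,
    }
```

Same input, same numbers, every time. That repeatability is the whole point of doing this step in plain code before any model gets involved.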
There's also a character extraction step using an open-source natural language processing pipeline called BookNLP. It scans your full manuscript and identifies characters, attributes dialogue to speakers, and maps aliases and coreference chains. It figures out that "Mom," "Catherine," and "Mrs. Brennan" are all the same person. Not because of simple name-matching (any find-and-replace can do that), but because of context, dialogue patterns, and how your narrative refers to people across 80,000 words.
Then it builds a structural skeleton. Chapter boundaries, scene breaks, word counts, compressed summaries of narrative events. All before any craft analysis begins.
You get to review the character map before anything else happens. Confirm characters, merge aliases, correct mistakes. This whole prep phase is free. Getting the character data right makes everything downstream more accurate, and charging you for setup would be... well, tacky.
You tell it what it needs to know
Before the analysis runs, you make what I call ground-truth declarations. You tell FirstReader your point of view (first person, third limited, omniscient), your primary tense (past or present), and your POV character names exactly as they appear on the page.
Why? Because without it, the tool is guessing. And a guessing tool is a WRONG tool.
Without those declarations, a present-tense novel gets flagged for tense inconsistency. An omniscient narrator gets flagged for head-hopping. A first-person narrator who can't know what's in another character's head gets treated like a third-person narrator who can. Every one of those is a false positive. Every one wastes your time and erodes your trust in the report.
Other tools skip this step entirely. They analyze blind and let you figure out which flags are artifacts of the tool not understanding your basic setup. FirstReader asks because the alternative is giving you a report stuffed with phantom problems.
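Here's a rough sketch of how declarations can knock out that class of false positive. Treat it as an assumption-laden illustration: the `Declarations` fields, the finding dictionaries, and the "Tense Inconsistency" label are names I've invented for the example, and a post-hoc filter like this is only one place the declarations could be applied.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Declarations:
    """Author-supplied ground truth (hypothetical field names)."""
    pov: str                          # "first", "third_limited", or "omniscient"
    tense: str                        # "past" or "present"
    pov_characters: tuple[str, ...]   # names exactly as they appear on the page

def suppress_false_positives(findings: list[dict], decl: Declarations) -> list[dict]:
    """Drop findings that contradict what the author declared."""
    kept = []
    for finding in findings:
        # An omniscient narrator is allowed to move between heads.
        if finding["principle"] == "Head-Hopping" and decl.pov == "omniscient":
            continue
        # A tense flag is an artifact if the passage matches the declared tense.
        if (finding["principle"] == "Tense Inconsistency"
                and finding.get("observed_tense") == decl.tense):
            continue
        kept.append(finding)
    return kept
```

An omniscient, present-tense novel run through this filter loses its head-hopping and present-tense flags and keeps everything else.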
Three levels, not one prompt
This is where it gets interesting (at least, I think so... your mileage may vary).
Most AI analysis is a single call. Paste the chapter in, get feedback out. FirstReader runs a three-level hierarchical pipeline.
Level 1: Chapter-level craft analysis. Each chapter gets analyzed independently for each craft dimension in your selected tier. The analysis has the full chapter text, the deterministic metrics from prep, the character data, and your ground-truth declarations. It identifies specific findings... places where a craft principle is being executed well or violated... and cites the exact excerpt from your manuscript. Each finding gets tagged with a severity and mapped to a named craft principle. "Head-Hopping." "Value Turn." "Said-Bookisms." Named principles you can look up in any craft reference, not vague impressions.
Level 2: The review pass. A separate, adversarial analysis examines every finding from Level 1. It reads the same chapter and asks: is this finding real? Is the cited excerpt accurate? Is the severity appropriate? Findings that fail get quarantined. Demoted or dropped. This isn't a rubber stamp. It's a quality gate that exists specifically because I don't trust the AI to get it right the first time. (Nobody should.)
Level 3: Book-level analysis. After all chapters are processed individually, the tool steps back and looks at the manuscript as a whole. Recurring issues across many chapters get consolidated into manuscript-wide patterns instead of repeating the same flag chapter after chapter. Continuity gets checked: character details, timeline consistency, factual contradictions. Full-arc pacing gets assessed. And revision priorities get ranked so you know where to start.
If your tier includes developmental analysis, you also get an editorial letter (the kind a human developmental editor would write) and chapter-by-chapter notes covering what each chapter accomplishes structurally, where it stalls, and how it connects to the larger arc.
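The Level 3 consolidation step is the easiest one to picture in code. A minimal sketch, assuming each finding is a dictionary with `principle` and `chapter` keys and that "recurring" means showing up in at least three chapters (the threshold and the field names are illustrative assumptions, not the real configuration):

```python
from collections import defaultdict

def consolidate(findings: list[dict], min_chapters: int = 3) -> tuple[list[dict], list[dict]]:
    """Split findings into manuscript-wide patterns and chapter-local findings."""
    by_principle = defaultdict(list)
    for finding in findings:
        by_principle[finding["principle"]].append(finding)

    patterns, local = [], []
    for principle, group in by_principle.items():
        chapters = sorted({f["chapter"] for f in group})
        if len(chapters) >= min_chapters:
            # Recurs across enough chapters: report once, as a pattern.
            patterns.append({
                "principle": principle,
                "chapters": chapters,
                "instances": len(group),
            })
        else:
            # Too isolated to call a pattern: leave it at chapter level.
            local.extend(group)
    return patterns, local
```

So four "Said-Bookisms" flags spread over chapters 1, 2, and 5 become one pattern with an instance count, while a lone "Head-Hopping" flag in chapter 3 stays where it is.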
The argument with itself
I wrote about this in "Your Editing Pipeline Has a Blind Spot," but the short version goes like this.
LLMs are inconsistent. Run the same analysis twice on the same input and you'll get different findings. Some are stable observations. Some are one-offs that the model generated because of the particular way the tokens happened to fall. Run it once and you can't tell the difference.
So for mid-tier and above, FirstReader runs each chapter analysis multiple times with slight variation. Findings that show up consistently across runs stay. Findings that appear once and vanish get dropped.
This is called self-consistency, and it's one of the most effective accuracy mechanisms in the entire pipeline. It costs more (more API calls, more processing time), but it's the difference between a report full of noise and one where every finding earned its place.
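The voting logic behind that is simple enough to sketch. Assume each run returns a list of findings with a `principle` and an `excerpt`, and that a finding has to appear in at least 60% of runs to survive. Both the data shape and the 60% cutoff are assumptions for this example, not FirstReader's actual settings.

```python
from collections import Counter

def stable_findings(runs: list[list[dict]], min_share: float = 0.6) -> list[tuple[str, str]]:
    """Keep (principle, excerpt) pairs that recur in at least min_share of runs."""
    counts = Counter()
    for run in runs:
        # Normalize case and whitespace, and count each key once per run.
        keys = {
            (f["principle"], " ".join(f["excerpt"].lower().split()))
            for f in run
        }
        counts.update(keys)
    needed = min_share * len(runs)
    return sorted(key for key, n in counts.items() if n >= needed)
```

With three runs, a finding that shows up in all three stays, and one that shows up once is gone. The cost is exactly what you'd expect: three runs means roughly three times the API calls.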
Then there's the adversarial review from Level 2. The research on LLM self-correction is clear: asking a model to verify its own work DOESN'T WORK. It confirms its mistakes with the same confidence it confirms correct findings. I tested this in my own system early on. Asked the model to verify its citations against the source text. It confirmed fabricated quotes as accurate. Repeatedly.
So FirstReader doesn't ask the model to check itself. It runs a separate pass that's specifically framed to DISPROVE findings. Different prompt, different framing. The research shows that adversarial framing avoids the accuracy degradation you get from confirmatory self-review.
Two mechanisms, both designed to catch the AI when it's wrong. Because it will be wrong sometimes. The only question is whether you find out before or after you've revised based on bad feedback.
Everything else catching mistakes
Self-consistency and adversarial review are the big two. But there are five more accuracy mechanisms running in the background.
Finding validator. Deterministic code that compares each AI finding against the measured text properties from prep. If the AI claims your dialogue ratio is low but the actual number says otherwise... that finding gets flagged. The AI's opinion doesn't override the math.
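A stripped-down sketch of that kind of check, assuming a finding can carry a structured `metric_claim` (metric name, direction, threshold). That structure is hypothetical and exists only to make the comparison explicit:

```python
import operator

# Map a claim's direction to the comparison that would make it true.
COMPARATORS = {"below": operator.lt, "above": operator.gt}

def validate_claim(finding: dict, metrics: dict) -> str:
    """Return 'ok', 'contradicted', or 'unchecked' for a finding's numeric claim."""
    claim = finding.get("metric_claim")
    if claim is None or claim["metric"] not in metrics:
        return "unchecked"  # nothing measurable to compare against
    compare = COMPARATORS[claim["direction"]]
    measured = metrics[claim["metric"]]
    return "ok" if compare(measured, claim["threshold"]) else "contradicted"
```

If the measured dialogue ratio is 0.45 and the finding claims it's below 0.2, the result is "contradicted" and the finding gets flagged.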
Citation verification. Three-tier matching that confirms cited excerpts actually exist in your manuscript. I wrote about the paraphrasing problem in the Pipeline Blind Spot post... left unchecked, about 30% of LLM "verbatim" quotes are actually reconstructed from memory. Subtle word swaps, pronoun changes, sometimes entire fabricated phrases. Citation verification dropped that to near zero.
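I haven't spelled out what the three tiers are, so take this as one plausible arrangement rather than the real one: exact substring match first, then a match after normalizing quotes, case, and whitespace, then a fuzzy match using Python's standard `difflib`. The function names and the 0.9 threshold are assumptions.

```python
from difflib import SequenceMatcher

# Map curly quotes to straight ones before comparing.
QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})

def normalize(s: str) -> str:
    """Unify quote marks, lowercase, and collapse whitespace."""
    return " ".join(s.translate(QUOTES).lower().split())

def verify_citation(excerpt: str, manuscript: str, fuzzy_threshold: float = 0.9) -> str:
    """Return 'exact', 'normalized', 'fuzzy', or 'unverified'."""
    if not excerpt.strip():
        return "unverified"
    if excerpt in manuscript:                      # tier 1: verbatim
        return "exact"
    ex, ms = normalize(excerpt), normalize(manuscript)
    if ex in ms:                                   # tier 2: formatting differences only
        return "normalized"
    # Tier 3: slide a window of the excerpt's word count across the manuscript.
    ms_words = ms.split()
    n = len(ex.split())
    for i in range(max(1, len(ms_words) - n + 1)):
        window = " ".join(ms_words[i:i + n])
        if SequenceMatcher(None, ex, window).ratio() >= fuzzy_threshold:
            return "fuzzy"
    return "unverified"
```

Anything that comes back "unverified" is a quote the model made up or mangled beyond recognition. The word-by-word window is slow on a full manuscript, which is fine for a sketch and not fine for production.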
Cross-dimension dedup. If two different craft dimensions flag the same passage for the same issue (say, Prose Quality and Showing vs. Telling both catch a chunk of exposition), you see it once. Not twice with slightly different wording filling up your report.
Near-duplicate detection. Mathematical fingerprinting that catches when the AI generates the same observation in slightly different words across different parts of the analysis. LLMs do this more than you'd think.
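For a feel of how this works, here's a small stand-in: break each observation into overlapping three-word shingles and compare the sets with Jaccard similarity. This is a simpler cousin of fingerprinting schemes like MinHash, and I'm using it here only because it fits in a few lines. The function names and the 0.4 threshold are assumptions.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in the text."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def near_duplicates(observations: list[str], threshold: float = 0.4) -> list[tuple[int, int]]:
    """Return index pairs of observations whose shingle sets overlap heavily."""
    sigs = [shingles(o) for o in observations]
    return [
        (i, j)
        for i in range(len(sigs))
        for j in range(i + 1, len(sigs))
        if jaccard(sigs[i], sigs[j]) >= threshold
    ]
```

Two observations that differ by a couple of inserted words score high and get paired. Two observations about different problems share almost no shingles and don't.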
Tic scanner. Flags recurring patterns in your prose... a pet phrase, a habitual construction, a default sentence opener... that might not rise to the level of a craft violation individually but become noticeable when they show up every other chapter.
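A bare-bones version of that idea: count which three-word phrases show up in several different chapters. The name `find_tics` and both thresholds are assumptions for the sketch, and a real scanner would also need to filter out ordinary function-word phrases like "out of the."

```python
import re
from collections import defaultdict

def find_tics(chapters: list[str], n: int = 3, min_chapters: int = 3) -> dict[str, int]:
    """Return n-word phrases mapped to the number of chapters they appear in."""
    chapter_hits = defaultdict(set)
    for idx, text in enumerate(chapters):
        words = re.findall(r"[a-z']+", text.lower())
        for i in range(len(words) - n + 1):
            # Record which chapter this phrase appeared in, not how often.
            chapter_hits[" ".join(words[i:i + n])].add(idx)
    return {
        phrase: len(hits)
        for phrase, hits in chapter_hits.items()
        if len(hits) >= min_chapters
    }
```

Counting chapters instead of raw occurrences is deliberate. A phrase used five times in one scene is a local choice. A phrase used once in each of twelve chapters is a habit.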
And there's a cost ceiling. If you're running FirstReader with your own API key (the BYOK option), this one matters to you directly. A hard cap prevents a runaway analysis from burning through your credits. If some unusual chapter triggers an explosion of findings and the processing cost spikes, the analysis stops gracefully instead of racking up charges. Your costs stay predictable. Always.
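The shape of a hard cap looks something like this. It's a sketch under assumptions: `CostGuard`, `BudgetExceeded`, and the per-chapter cost estimates are names and inputs I've made up for the example. The one design choice worth noticing is that the check happens before the spend, not after.

```python
class BudgetExceeded(Exception):
    """Raised when the next call would push spending past the ceiling."""

class CostGuard:
    def __init__(self, ceiling_usd: float):
        self.ceiling_usd = ceiling_usd
        self.spent_usd = 0.0

    def charge(self, estimated_usd: float) -> None:
        # Check BEFORE spending so the cap is never crossed.
        if self.spent_usd + estimated_usd > self.ceiling_usd:
            raise BudgetExceeded(
                f"next call ({estimated_usd:.2f}) would exceed {self.ceiling_usd:.2f}"
            )
        self.spent_usd += estimated_usd

def run_with_ceiling(chapter_costs: list[float], ceiling_usd: float) -> tuple[int, float]:
    """Process chapters in order; stop cleanly when the budget runs out."""
    guard = CostGuard(ceiling_usd)
    completed = 0
    for cost in chapter_costs:
        try:
            guard.charge(cost)
        except BudgetExceeded:
            break  # stop gracefully and keep the work already finished
        completed += 1
    return completed, guard.spent_usd
```

With a $2.00 ceiling and chapters estimated at $0.50, $0.50, $2.00, and $0.50, the run finishes two chapters, spends $1.00, and stops before the expensive one.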
What you actually get back
The report is not a wall of text.
(If you've used Inkshift, you know the wall-of-text experience. Ten thousand words of AI essay dumped on you with no navigation, no structure, no way to find the thing you care about without reading all of it. FirstReader does not do that.)
The report is an interactive web document. Overall scores for each analyzed dimension on a 5-point scale. Pattern cards that group recurring craft issues by principle, with instance counts and severity levels. Drill-down navigation... click a pattern and see every instance across your manuscript. A split-panel view with the finding on the left and your highlighted excerpt in the manuscript text on the right, so you can see EXACTLY what the tool is talking about without hunting through your draft.
(And down the road... the plan is to let you revise directly inside the report and rescan at a discount. Fix the problem, re-run the analysis on the updated section, see if the finding clears. That's coming.)
Chapter rankings show which chapters scored highest and lowest on each dimension. And there's a prioritized revision list ranking the most impactful changes you could make, so you know where to spend your time first.
You can download the whole thing as a branded PDF if you want a portable copy.
There's also an AI Perception Score. A deterministic scan (no LLM involved) that flags passages whose surface patterns resemble common AI-generated text. It doesn't claim to detect AI authorship. It flags stylistic patterns that readers and reviewers ASSOCIATE with AI writing. Whether you used AI to write those passages or not is beside the point. If they read like AI to human eyes, you probably want to know.
What it doesn't do
FirstReader doesn't generate prose. It'll show you examples of what a fix COULD look like alongside a finding, but it's not rewriting your book for you. The examples are there to clarify the principle, not to replace your sentences with its sentences. If you're looking for a tool that'll rewrite chapter three... this isn't it. (Sudowrite is over there. I won't tell anyone.)
It doesn't replace a human developmental editor, either. A great editor brings taste, intuition, and an understanding of your specific creative vision that no tool matches. What FirstReader gives you is a structured craft analysis BEFORE you spend $3,000 on that editor, so when you do hire one, you're not paying them to catch stuff you could've caught yourself two drafts ago.
And... it's not perfect. I said this in the Pipeline post and I'll say it here. Accuracy sits around 95%. One in twenty findings might miss the mark. I know that number because I check. Constantly. That 5% is what keeps me iterating on this thing, and I'd rather tell you about it upfront than pretend it doesn't exist.
The boring part that matters
I've written somewhere north of 30,000 lines of code around the LLM calls in this tool. The prompts matter. The model matters. But the architecture matters more than either one. How the input gets chunked. How the output gets verified. How findings get validated against real measurements. How the tool argues with itself before it shows you anything.
You can't replicate that by pasting a really good prompt into ChatGPT. Believe me... I tried. For months. That's how this project started. And that's why it became a product instead of a prompt.
FirstReader went live on May 18th, 2026. Everything I described above is running right now, on real manuscripts, for real writers.
If you want to see what that architecture produces on your own manuscript, try FirstReader. One free chapter, full analysis depth.
If you enjoyed this, please leave a comment below to let me know. If you DIDN'T enjoy it, well, I'd like to hear from you too.