
Your Novel Might Read Like AI Even If You Wrote Every Word

A Big Five publisher pulled a debut horror novel in March 2026. Hachette cancelled the US release, pulped the UK edition, and did it all within 24 hours of the New York Times running the story. The book was called Shy Girl. The author was Mia Ballard. And the AI detectors couldn't even agree on what happened.

Pangram Labs scanned the manuscript: 78% AI-generated. Originality.ai scanned the same text: 91% human. One of them was wrong. Probably both. Doesn't matter. The book was already dead.

Because the detectors didn't kill Shy Girl. Readers did.

What the readers counted

A self-described veteran book editor posted a Reddit thread in January 2026 itemizing the tells. A BookTuber named Frankie's Shelf followed with a three-hour video dissection that pulled 1.2 million views. And the things they pointed to weren't statistical models or token-probability distributions. They were things you could count on your fingers.

The word "sharp" appeared 159 times across 209 pages. Not sharp knives. Not sharp edges in a physical sense. "Sharp anticipation." "Sharp silence." "Sharp certainty." One abstract adjective doing the work of an entire emotional vocabulary, page after page after page.

The word "edge" showed up 84 times.

Nearly every noun carried an adjective in front of it. Rule-of-three constructions saturated the prose (lists of three descriptors, three actions, three sensations repeated across scenes until the pattern became impossible to unsee). Weather and elemental similes kept showing up for emotions. "His amusement curled like smoke." "Anticipation blooming sharp and fast." A small stock of images (smoke, fog, fire, glass, knives, wire) recycled across the entire book.

Slate's verdict was blunt: "There was no phrase, word choice, grammar, structure, syntax, or punctuation mark that ChatGPT would not have done."

And that's the sentence that should keep you up at night. Not because your book was written by AI. Because it doesn't have to be.

The folk consensus is real, and it has teeth

What happened after Shy Girl was a crystallization. Patterns that had been floating around Reddit threads and writing forums since 2023 suddenly locked into a canon. Before March 2026, "AI tells" were scattered observations. After March 2026, they became a checklist.

Here's what readers, editors, and agents are now actively scanning for when they read a manuscript (or a submission, or a self-published book on Kobo, or a BookTok recommendation that seems too polished):

The vocabulary cluster. "Delve." "Tapestry." "Navigate." "Landscape." "Realm." "Testament." "Leverage." "Seamless." "Pivotal." "Unwavering." "Nestled." "Bustling." "Vibrant." These words aren't wrong. They're just... suspicious now. "Delve" peaked as a tell in 2023-24 before OpenAI trained it down. Some of these are already fading. Others are hardening. The list drifts every quarter.

The structural patterns. Contrastive negation is the big one. "It's not X, it's Y." "Not just X, but Y." LessWrong ran a whole analysis on why LLMs default to this construction. It's the single most recognized ChatGPT sentence skeleton in writing communities. Rule-of-three saturation. Symmetric clause pairs written for balance rather than meaning. Paragraph-ending summary sentences that restate what the paragraph just said (you know the ones... "And in that moment, she knew").
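Contrastive negation is also one of the easier folk tells to check mechanically. Here's a rough Python sketch, assuming plain-text input. The pattern names, the function name, and both regular expressions are hypothetical approximations written for illustration, not the rules any particular detector uses, and they will miss plenty of variants (em-dash splices, or "It isn't X. It's Y." split across two sentences).

```python
import re

# Illustrative approximations of the two skeletons described above.
PATTERNS = {
    # "It's not X, it's Y" / "That wasn't X; it was Y"
    "not_x_its_y": re.compile(
        r"\b(?:it|this|that)(?:'s|\s+is|\s+was)\s+not\b[^.!?]{1,80}?[,;]\s*"
        r"(?:it|this|that)(?:'s|\s+is|\s+was)\b"
        r"|\b(?:it|this|that)\s+(?:isn't|wasn't)\b[^.!?]{1,80}?[,;]\s*"
        r"(?:it|this|that)(?:'s|\s+is|\s+was)\b",
        re.IGNORECASE,
    ),
    # "Not just X, but Y" / "not only X, but also Y"
    "not_just_x_but_y": re.compile(
        r"\bnot\s+(?:just|only|merely)\b[^.!?]{1,80}?,\s*but\b",
        re.IGNORECASE,
    ),
}


def find_contrastive_negation(text):
    """Return (pattern_name, matched_text) pairs, grouped by pattern."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


sample = "It's not fear, it's hunger. She was not just tired, but empty."
for name, snippet in find_contrastive_negation(sample):
    print(name, "->", snippet)
```

A hit is only a pointer to a sentence worth rereading. A handful per chapter may be perfectly ordinary; dozens per chapter is the kind of density readers notice.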

The emotional formulas. Body-sensation-as-emotion at industrial scale. "Her breath caught." "His jaw tightened." "A chill ran down her spine." "She steeled herself." These aren't AI inventions. Romance and thriller writers have used them for decades. But AI uses them CONSTANTLY, and now they're on the list.

The register leaks. This one is subtle, and probably the most dangerous for human writers who don't see it coming. Therapy voice: "give yourself grace," "hold space for," "sit with the discomfort," "honor your truth." RLHF (reinforcement learning from human feedback, the training process that makes chatbots polite) baked this language into every model. It leaks into fiction narration when the narrator suddenly sounds like a wellness podcast instead of a storyteller. Corporate SaaS register does the same thing in a different direction: "optimize," "leverage," "unlock," "streamline." One of those in a fantasy novel and a reader's antenna goes up.

The punctuation. Em dashes became a meme in early 2025. Washington Post, Rolling Stone, Slate, NPR all ran pieces on the "ChatGPT hyphen." OpenAI actually added a toggle to suppress them. The irony is that em dashes have been a legitimate punctuation mark for centuries. Writers who've used them their whole careers are now self-censoring. That's the power of folk consensus. Accuracy is beside the point.

The folk consensus and the statistical evidence disagree

This is where it gets interesting (and where most coverage of AI tells falls apart).

Academic researchers have been studying AI-generated text since GPT-2. They've identified real, measurable, statistically validated features that separate AI prose from human prose. And those features are MOSTLY NOT the same things readers complain about.

The academic tells are things like paragraph-length coefficient of variation (human prose varies wildly in paragraph length; AI prose is eerily uniform). Sentence burstiness (humans write in unpredictable rhythms; AI flatlines). Lexical diversity metrics like MTLD and HD-D, which measure vocabulary richness in ways designed to be far less sensitive to document length than a raw type-token ratio. Fronted participial phrases (Reinhart et al. 2025 found GPT-4o uses them at 5.3 times the human rate). Function-word distributions that can be compared against a precomputed human-fiction centroid using Cosine Delta.
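For the curious, here's roughly what the first two measurements look like. This is a minimal Python sketch, not how any particular detector computes them. It assumes paragraphs are separated by blank lines, splits sentences naively after terminal punctuation (so "Mr. Smith" will break it), and uses the coefficient of variation of sentence lengths as a simple stand-in for burstiness; published burstiness measures vary. All function names are hypothetical.

```python
import re
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean.

    Returns 0.0 when there are fewer than two values or the mean is zero,
    since variation is undefined or meaningless in those cases.
    """
    if len(values) < 2:
        return 0.0
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / mean


def paragraph_lengths(text):
    """Word counts per paragraph, assuming blank lines separate paragraphs."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [len(p.split()) for p in paragraphs]


def sentence_lengths(text):
    """Word counts per sentence, using a naive split after . ! or ?"""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def rhythm_report(text):
    """Lower values mean more uniform paragraphs and sentences."""
    return {
        "paragraph_length_cv": coefficient_of_variation(paragraph_lengths(text)),
        "sentence_length_cv": coefficient_of_variation(sentence_lengths(text)),
    }


sample = "Short.\n\nThis paragraph is a good deal longer than the first one was."
print(rhythm_report(sample))
```

On its own, a single value tells you very little. It only becomes meaningful next to a baseline from comparable human-written books in the same genre.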

These are real. They work. And almost no reader has ever consciously noticed any of them.
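MTLD, one of the lexical diversity measures above, is the least intuitive of the bunch, so here's a compact sketch of the idea as McCarthy and Jarvis (2010) describe it. Walk through the text while tracking the running type-token ratio (unique words divided by total words). Every time it drops to a threshold (0.72 is the conventional default), count one "factor" and start over. Credit the leftover stretch at the end as a partial factor. Divide word count by factor count, then average a forward pass with a backward pass. The tokenizer and the fallback for texts that never hit the threshold are assumptions of this sketch, and the function names are hypothetical.

```python
import re


def _mtld_one_direction(tokens, threshold=0.72):
    """One MTLD pass: total tokens divided by the number of TTR 'factors'."""
    factors = 0.0
    types = set()
    count = 0
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count <= threshold:
            factors += 1  # running TTR fell to the threshold: close a factor
            types = set()
            count = 0
    if count > 0:
        # Leftover segment counts as a partial factor.
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    if factors == 0:
        # Fallback for this sketch: TTR never dropped, so treat the whole
        # text as its own length. Texts this short are unreliable anyway.
        return float(len(tokens))
    return len(tokens) / factors


def mtld(text, threshold=0.72):
    """Average of forward and backward passes; higher means more varied vocabulary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    forward = _mtld_one_direction(tokens, threshold)
    backward = _mtld_one_direction(tokens[::-1], threshold)
    return (forward + backward) / 2


print(mtld("the cat the cat the cat the cat"))
```

Real MTLD scores are only stable on passages of at least a few hundred words, which is one reason they work on manuscripts and not on single paragraphs.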

Meanwhile, the folk tells (em dashes, "delve," rule of three, "sharp" 159 times) are statistically WEAKER as discriminators. Em-dash usage varies enormously across human writers. "Delve" has been trained out of newer models. Rule-of-three is a legitimate rhetorical device humans have used since ancient Greece. Some of these tells have high false-positive rates and mediocre true-positive rates.

But the folk tells are what get books cancelled.

There's a gap between what readers perceive as AI and what statistical analysis identifies as AI. And that gap is where careers get destroyed. A manuscript can be statistically indistinguishable from human writing and STILL trigger every folk-consensus alarm. A different manuscript can light up every statistical detector and sail through reader scrutiny because it doesn't hit the patterns readers have been trained to look for.

The perception gap is the story. And if you're a working author in 2026, the perception side is the one that matters to you.

The false-positive problem: why this isn't just about AI users

This is where the detectors truly fall apart, and where the conversation gets uncomfortable.

Liang et al. (2023, Stanford) tested seven commercial AI detectors on 91 TOEFL essays written by non-native English speakers. (TOEFL, the Test of English as a Foreign Language, is the standardized English-proficiency exam for non-native speakers.) Human writers. Every word their own. On average, the detectors flagged 61.3% of those essays as AI-generated. Nearly all of them (89 of 91) were flagged by at least one detector, and 18 were flagged unanimously by all seven. The same detectors correctly identified 90%+ of essays by native English-speaking eighth graders.

Why? Because non-native writers tend to use more predictable phrasing, more conventional grammar, smaller active vocabularies, fewer idiomatic constructions. LLMs produce the same profile. The detectors aren't measuring "AI authorship." They're measuring "distance from stylized native English." And they're punishing anyone who doesn't write like a native speaker.

It gets worse. Research has documented elevated false-positive rates for autistic writers (who often write in a more formal, rule-based, low-idiom register). For translated prose (translation smoothing strips the idiosyncratic tics detectors look for). For heavily copy-edited manuscripts (each editorial pass makes the text more uniform, more "clean," more... AI-looking).

And for entire genres. Commercial romance and epic fantasy reward high adjective density, sensory stacking, body-language tags, and rule-of-three constructions as CONVENTIONS. A romance writer using "her breath hitched" isn't channeling ChatGPT. She's writing to genre. A scanner that doesn't know the difference is just another bad detector with a different paint job.

This is why a single "percent AI" score is both intellectually bankrupt and genuinely dangerous. You can't reduce this to one number. The features overlap. The contexts matter. The genres shift the baselines. Anyone selling you a probability is selling you confidence they don't have.

So what do you actually do about this?

You could ignore it. Hope your prose doesn't trip any wires. Hope nobody counts your adjectives or screenshots your similes. That's a strategy. Not a great one in 2026, but it's a strategy.

Or you could get ahead of it.

The useful question isn't "did I use AI?" You know the answer to that. The useful question is "will a reader, an agent, a Kobo reviewer, a slush reader THINK I used AI?" That's a different question. And it's answerable.

Know what the folk tells are. Not because they're right (they're often wrong), but because they're what people look for. If "sharp" is your go-to adjective and it shows up 80 times in your manuscript, that's worth knowing before you submit. Not because it proves anything. Because it's what someone will screenshot.
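If you want to do that counting yourself before you submit, it takes a few lines of code. Here's a minimal Python sketch, assuming the manuscript has been exported to plain text. The function name and the watch list are hypothetical, assembled from words mentioned in this article; swap in your own suspected crutch words.

```python
import re
from collections import Counter

# Illustrative watch list built from words named in this article;
# not a canonical or complete list of "AI tells".
WATCH_WORDS = {
    "sharp", "edge", "delve", "tapestry", "testament",
    "unwavering", "nestled", "bustling", "vibrant", "seamless",
}


def watch_word_report(text, watch_words=WATCH_WORDS):
    """Map each watch word found to (count, occurrences per 1,000 words).

    Exact word matches only: "sharply" or "edges" are not counted.
    An empty text returns an empty dict (no division is attempted).
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(token for token in tokens if token in watch_words)
    total = len(tokens)
    return {word: (n, 1000 * n / total) for word, n in counts.most_common()}


print(watch_word_report("A sharp silence. Sharp certainty, sharp edge."))
```

To run it on a whole manuscript, read the exported text file into a string and pass it in. The per-1,000-words rate matters more than the raw count: 80 uses of "sharp" means something different in a 60,000-word novel than in a 200,000-word epic.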

Know what the statistical tells are, too. Not because a reader will notice your paragraph-length uniformity, but because an editor might. Because a contest judge running your submission through GPTZero might. Because understanding BOTH sides of the perception gap gives you a clearer picture of where your manuscript actually stands.

And know your genre baselines. A paranormal romance that flags for body-sensation language and adjective density isn't necessarily reading as AI. It's reading as paranormal romance. Context matters. A literary novel that flags for the same patterns has a different problem.

This is what I built FirstReader's perception scanner to do. Not to play AI cop. Not to spit out a meaningless probability score. To show you, pattern by pattern, excerpt by excerpt, where your manuscript overlaps with the things readers and editors have been trained to flag. Folk tells and statistical tells, weighted separately, with genre context and false-positive caveats surfaced by name. So you can make informed decisions about your own prose before someone else makes those decisions for you.

Because in 2026, the question isn't whether your book was written by AI. The question is whether it READS like it was. And that's a craft question, not a technology question. It always has been.


Want to check your own manuscript? FirstReader's AI Perception Check scans your prose for all 17 pattern families and 4 statistical metrics discussed in this article. Two-track scoring, genre-aware baselines, false-positive caveats. Free, instant, no AI used in the analysis. Run your free perception check here.


If you found this useful, leave a comment below to let me know. If you DIDN'T find it useful, well, I'd like to hear from you too.
