My team has a hard rule: a specific sentence pattern can appear at most once per chapter. The rule is written where everyone can see it. Every assignment brief includes a reminder. The writer is asked to self-audit and report the count upon submission.
Chapter six. The writer reported: zero instances.
I ran a search myself and found ten.
I rewrote nine, keeping only the most effective one. After my pass, I sent the chapter to the editor for review. The editor found two more — variants I’d missed, restructured with different punctuation but identical in substance.
Writer reports zero. I catch ten. Editor catches two more. Without the second and third layers, the chapter would have shipped with twelve instances of a pattern that is allowed once.
I wrote the numbers down. Then chapter seven arrived. Writer self-reported one; I found five. Chapter eight: self-reported one, actual five. Chapter nine: self-reported two, actual six.
Four consecutive chapters, same pattern every time: the self-reported count drastically understates reality. The gap wasn’t random either: the writer consistently “saw” at most two instances, with everything else sitting squarely in a blind spot.
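To make the gap concrete, here is a minimal Python sketch that tabulates the counts from chapters six through nine. The names `counts` and `undercount` are illustrative, and the chapter six total assumes my ten and the editor's two are added together.

```python
# Counts from the chapters described above: (self-reported, actually found).
# Assumption: chapter 6's "found" is my ten plus the editor's two.
counts = {
    6: (0, 12),
    7: (1, 5),
    8: (1, 5),
    9: (2, 6),
}


def undercount(reported: int, found: int) -> int:
    """Number of instances the self-report missed."""
    return found - reported


gaps = {chapter: undercount(r, f) for chapter, (r, f) in counts.items()}

for chapter, gap in gaps.items():
    reported, found = counts[chapter]
    print(f"chapter {chapter}: reported {reported}, found {found}, missed {gap}")
```

On these numbers the self-report misses all twelve instances in chapter six and four in each of the next three chapters.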
By chapter eight, I was sure: this wasn’t an attitude problem. The brief was explicit enough — it even asked the writer to run a search and report line numbers before submission. The search results and the self-report still didn’t match. The writer wasn’t refusing to follow the rule. It couldn’t see itself breaking it.
The same tool wrote a line of dialogue where a character “deliberately chose word A instead of word B” — then used word B in the second half of the same sentence. During analysis it understood the distinction between the two words perfectly. During generation, it didn’t. There’s a gap between comprehension and execution.
This made me rethink what “rules” actually mean.
In a human team, rules are enforced through memory, self-discipline, and colleagues reminding each other. You can reasonably assume that if a rule is clearly written and everyone has read it, compliance will be decent. Someone forgets occasionally; a quick reminder fixes it.
That assumption doesn’t hold in my team. If the person doing the work has a structural blind spot about their own output, then the existence of a rule and the enforcement of that rule are two entirely different things. I can’t assume “written means followed.” I have to assume “written means probably not followed,” and design a process to catch the gaps.
The three-layer filter is that process. The writer’s self-check is layer one — nearly useless, but it can’t be skipped because it at least makes the writer aware the rule exists. My search-based review is layer two — catches the bulk. The editor’s cross-review is layer three — catches what I miss. Stack all three, and violations get pushed into an acceptable range.
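The search-based layer can be sketched roughly as follows, in Python. This is only an illustration under loud assumptions: the two regexes in `VARIANTS` are placeholders (a "not X, but Y" construction and a semicolon variant of it), not the actual pattern my team restricts, and `find_instances` and `audit` are hypothetical helper names.

```python
import re

# Placeholder regexes: stand-ins for one sentence pattern and a
# punctuation variant of it. The real pattern is not shown here.
VARIANTS = [
    re.compile(r"\bnot\b[^.!?;]*,\s+but\b", re.IGNORECASE),
    re.compile(r"\bnot\b[^.!?,]*;\s+it was\b", re.IGNORECASE),
]


def find_instances(text, variants=VARIANTS):
    """Return (line_number, line) for every line matching any variant."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(v.search(line) for v in variants):
            hits.append((number, line))
    return hits


def audit(text, self_reported, limit=1):
    """Compare the writer's self-reported count with what the search finds."""
    hits = find_instances(text)
    return {
        "lines": [n for n, _ in hits],
        "found": len(hits),
        "missed_by_self_report": max(0, len(hits) - self_reported),
        "over_limit": max(0, len(hits) - limit),
    }


sample = "\n".join([
    "It was not fear, but anger.",
    "She closed the door.",
    "It was not the rain; it was the silence.",
])
report = audit(sample, self_reported=0)
print(report)
```

The sketch counts at most one hit per line, and it only catches variants someone has already thought to write a regex for. That limitation is the reason a third layer, a second reader, still matters.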
There was a side discovery worth noting. In chapter six, the writer composed a reflective passage where the character said, “I thought back to that night in chapter one.” The character directly referenced a file label. A character can’t possibly know they live inside chapter one of a novel. This is a different kind of blind spot: while writing a retrospective passage, the tool leaked a marker from its own working environment into the narrative.
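A leak like this is also searchable. The sketch below, in Python, flags chapter labels appearing inside narrative text. It assumes leaked labels look like "chapter one" or "chapter 12"; the regex and the name `find_leaked_labels` are illustrative, and a real manuscript may need a wider net (file names, scene numbers, and so on).

```python
import re

# Assumption: leaked labels look like "chapter one" or "chapter 12".
NUMBER_WORDS = "one|two|three|four|five|six|seven|eight|nine|ten"
CHAPTER_LABEL = re.compile(
    rf"\bchapter\s+(?:\d+|{NUMBER_WORDS})\b", re.IGNORECASE
)


def find_leaked_labels(text):
    """Return (line_number, matched_label) pairs for chapter references."""
    leaks = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in CHAPTER_LABEL.finditer(line):
            leaks.append((number, match.group(0)))
    return leaks


passage = "She set the cup down.\nI thought back to that night in chapter one."
leaks = find_leaked_labels(passage)
print(leaks)
```

Every hit still needs a human look, since a character could legitimately talk about a chapter of a book within the story.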
These observations, accumulated over chapters, sketch a profile of the tool. At the macro level (scene arcs, emotional progression, character motivation) it’s solid; across these chapters it didn’t falter there. The blind spots cluster entirely in micro-level execution: self-awareness of linguistic tics, self-auditing rule compliance, and the boundary between workspace context and narrative world.
Macro is good. Micro needs supervision. That’s my quality-control model now.