Academic Integrity · Posted by Martina S · 2mo ago

How accurate is Turnitin AI detection really? Tested with 20 essays

I’m a TA and I’ve been skeptical of Turnitin’s AI detection since my department started requiring us to check every submission. So I did my own informal study with 20 essays (with student permission, anonymized).

My test setup:

Group 1: 10 essays I know were written entirely by the student (I watched them write in class)
Group 2: 5 essays I generated with ChatGPT on the same prompt
Group 3: 5 essays where students used ChatGPT for outlining/brainstorming but wrote the final draft themselves

Results:

Group 1 (100% human-written):

Average AI score: 4.2%. Range: 0% to 18%. One student scored 18%, which would have raised flags, but I know for a fact she wrote it herself. She’s an international student whose English is very precise and structured.

Group 2 (100% ChatGPT):

Average AI score: 94.6%. Range: 89% to 99%. Turnitin caught all of these clearly.

Group 3 (AI-assisted brainstorming, human-written):

Average AI score: 11.4%. Range: 2% to 31%. This is the interesting group. One student who used ChatGPT to generate an outline but wrote everything herself still scored 31%. Her writing style naturally aligns with AI patterns (she writes very clean, structured prose).
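For anyone who wants to play with the numbers, here is a rough Python sketch that uses only the group summaries reported above (essay counts, averages, ranges), not the individual scores. The names (`GroupSummary`, `pooled_mean`, `FLAG_THRESHOLD`) are made up for illustration, and the 20% cutoff is a hypothetical value chosen for the example, not a documented Turnitin setting.

```python
from typing import NamedTuple


class GroupSummary(NamedTuple):
    label: str
    n_essays: int
    mean_score: float   # average AI score, in percent
    low: float          # lowest score in the group, in percent
    high: float         # highest score in the group, in percent
    human_written: bool  # True if the final text was written by the student


# Summary figures as reported in the post (no per-essay scores are known).
GROUPS = [
    GroupSummary("Group 1 (100% human-written)", 10, 4.2, 0, 18, True),
    GroupSummary("Group 2 (100% ChatGPT)", 5, 94.6, 89, 99, False),
    GroupSummary("Group 3 (AI-assisted brainstorming, human-written)", 5, 11.4, 2, 31, True),
]


def pooled_mean(groups):
    """Average score across groups, weighted by the number of essays in each."""
    total = sum(g.n_essays for g in groups)
    return sum(g.n_essays * g.mean_score for g in groups) / total


human = [g for g in GROUPS if g.human_written]
generated = [g for g in GROUPS if not g.human_written]

human_mean = pooled_mean(human)
highest_human = max(g.high for g in human)
lowest_generated = min(g.low for g in generated)

# Hypothetical cutoff for illustration only; not a documented Turnitin value.
FLAG_THRESHOLD = 20

# A group contains at least one flagged essay if its highest score reaches the cutoff.
flagged_human_groups = [g.label for g in human if g.high >= FLAG_THRESHOLD]

print(f"Pooled mean, human-written essays: {human_mean:.1f}%")
print(f"Highest human-written score: {highest_human}%")
print(f"Lowest ChatGPT score: {lowest_generated}%")
print(f"Human-written groups with a score at or above {FLAG_THRESHOLD}%: {flagged_human_groups}")
```

Across the 15 essays where the student wrote the final text, the pooled average works out to 6.6%, and there is a 58-point gap between the highest human-written score (31%) and the lowest ChatGPT score (89%). With only 20 essays that gap says little about how the tool behaves in general, and at the hypothetical 20% cutoff the student in Group 3 would still be flagged.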

My takeaways:

Turnitin is quite good at catching fully AI-generated text.
It’s unreliable for “partially AI-assisted” work, which is where most real use cases fall.
False positives are real and disproportionately affect certain writing styles.
No professor should use a Turnitin score alone to accuse a student of cheating.

I’ve started telling students in my sections to keep Google Docs version history enabled as evidence of their writing process. It’s the best protection against false accusations.

Any other TAs or professors have similar findings?

3 replies

2mo ago

As a fellow TA this is incredibly valuable data. The 18% false positive on the international student is exactly the kind of thing that worries me about relying on these tools.

2mo ago

Can you share what subject area this was? I wonder if the false positive rates vary between, say, a history essay and a chemistry lab report.

2mo ago

The fact that 'AI-assisted brainstorming, human-written' scored up to 31% is alarming. That means students who use AI ethically can still get flagged.