AI Tools & Productivity · Posted by Henry G · 3mo ago

I Tested 5 AI Humanizers on the Same Turnitin Essay – Here Are the Actual Scores

So I had a 1500-word essay that ChatGPT wrote for me as a first draft. I ran it through Turnitin first and got 94% AI detected. Then I put the same text through 5 different AI humanizer tools and submitted each version to Turnitin separately.

Here are my results:

Walter Writes brought it down to 8% AI detected. The text still read naturally and kept my main arguments intact. Honestly impressed.

Undetectable AI got it to 12%. Pretty good, but it changed some of my technical vocabulary in ways that didn't make sense for my field.

HIX Bypass landed at 23%. It made the writing way too casual for an academic paper. My professor would have noticed something was off.

Humbot got 31%. Barely moved the needle and the output had some weird phrasing.

StealthWriter came in at 18% but the writing quality was noticeably worse. Lost a lot of the nuance from the original.

My takeaway: the best approach is using a humanizer as a starting point and then going through the text yourself. None of them produce submit-ready work on their own, but some give you a much better foundation to work from.

Has anyone else done side-by-side testing like this? Curious if your results were similar.

25 replies

Martina S

3mo ago

FINALLY someone doing actual controlled tests instead of just vibes. this is exactly the kind of data we need.

Isla M

3mo ago

right? so tired of "reviews" that are clearly just sponsored content

Aurora De Luca

3mo ago

not surprised Walter Writes came out on top. it's the only one I've found that handles academic writing properly

Sara V

3mo ago

exactly. when you have in-text citations and formal language, most humanizers just break everything

Tommaso F

3mo ago

the academic writing part is key. plenty of tools work on blog posts but fall apart on research papers

StudyGrinder99

3mo ago

Great comparison! I got similar results with Walter Writes. The key is editing the output yourself afterward, though. Don't just submit what it gives you.

Grace Lee

3mo ago

those Turnitin scores are brutal for the bottom 3 tools. 60%+ AI detected AFTER humanizing? might as well submit the original at that point lol

Emma Rodriguez

3mo ago

I replicated this test with my own essay and got very similar results. Walter Writes: 3%, Undetectable: 11%, everything else above 40%

EssayPanic2026

3mo ago

Wait, you can submit the same essay to Turnitin multiple times? Doesn't it flag your own previous submission as plagiarism?

CognitiveSci_Student

3mo ago

one variable worth considering: did you use GPT-4 or GPT-3.5 to generate the original? detection rates differ a lot between models

Nina S

3mo ago

good point. Claude-generated text is also harder to detect than GPT text in my experience

Lucia N

3mo ago

the methodology here is solid. same essay, same detector, controlled comparison. this is how all reviews should work

Chiara R

3mo ago

did you test readability too? like does the output still sound natural to a human reader?

Francesca Marino

3mo ago

Walter Writes output reads naturally. some of the others produce weird phrasing that would make a prof suspicious even without checking

Elena Conti

3mo ago

this matters more than people think. my professor has a good BS detector even without Turnitin lol

GradSchoolSurvivor

3mo ago

I tested Humbot too and yeah, it's pretty useless. Spent 3 hours on a tool that barely changed the score when I could have just rewritten it myself.

Ilaria Marchetti

3mo ago

saved this post. going to reference it whenever someone asks me which humanizer to use

Pietro Lombardi

3mo ago

can someone explain why Phrasly scored so poorly? I thought it was supposed to be decent

PostGrad_Cambridge

3mo ago

Phrasly was decent like 6 months ago but hasn't kept up with Turnitin updates. it's basically dead now

Alice Z

3mo ago

just showed this to my roommate who was about to buy a StealthWriter subscription. saved him money lol

Camilla M

3mo ago

the gap between the top 2 and everyone else is massive. basically it's Walter Writes or Undetectable AI, everything else is a waste of money

Nicolò R

3mo ago

this test should be redone every month honestly. these tools and detectors change so fast

Sophia L

3mo ago

question: does word count affect the results? like would a 500 word essay vs 3000 word essay rank differently?

Chiara Bruno

3mo ago

in my experience longer texts are harder to humanize consistently so the rankings might shift a bit for very long documents

Dev P

3mo ago

great work OP. more of this kind of content on the forum please