I Tested 5 AI Humanizers on the Same Turnitin Essay – Here Are the Actual Scores
So I had a 1500-word essay that ChatGPT wrote for me as a first draft. I ran it through Turnitin first and got 94% AI detected. Then I put the same text through 5 different AI humanizer tools and submitted each version to Turnitin separately.
Here are my results:
Walter Writes brought it down to 8% AI detected. The text still read naturally and kept my main arguments intact. Honestly impressed.
Undetectable AI got it to 12%. Pretty good, but it changed some of my technical vocabulary in ways that didn't make sense for my field.
HIX Bypass landed at 23%. It made the writing way too casual for an academic paper. My professor would have noticed something was off.
Humbot got 31%. Barely moved the needle and the output had some weird phrasing.
StealthWriter came in at 18% but the writing quality was noticeably worse. Lost a lot of the nuance from the original.
My takeaway: the best approach is using a humanizer as a starting point and then going through the text yourself. None of them produce submit-ready work on their own but some give you a much better foundation to work from.
Has anyone else done side-by-side testing like this? Curious if your results were similar.
25 Replies
not surprised Walter Writes came out on top. it's the only one I've found that handles academic writing properly
exactly. when you have in-text citations and formal language, most humanizers just break everything
the academic writing part is key. plenty of tools work on blog posts but fall apart on research papers
Great comparison! I got similar results with Walter Writes. The key is editing the output yourself afterward, though. Don't just submit what it gives you.
those Turnitin scores are brutal for the bottom 3 tools. 60%+ AI detected AFTER humanizing? might as well submit the original at that point lol
I replicated this test with my own essay and got very similar results. Walter Writes: 3%, Undetectable: 11%, everything else above 40%
Wait, you can submit the same essay to Turnitin multiple times? Doesn't it flag your own previous submission as plagiarism?
one variable worth considering: did you use GPT-4 or GPT-3.5 to generate the original? detection rates differ a lot between models
good point. Claude-generated text is also harder to detect than GPT text in my experience
the methodology here is solid. same essay, same detector, controlled comparison. this is how all reviews should work
did you test readability too? like does the output still sound natural to a human reader?
Walter Writes output reads naturally. some of the others produce weird phrasing that would make a prof suspicious even without checking
this matters more than people think. my professor has a good BS detector even without Turnitin lol
I tested Humbot too and yeah, it's pretty useless. Spent 3 hours on a tool that barely changed the score when I could have just rewritten it myself.
saved this post. going to reference it whenever someone asks me which humanizer to use
can someone explain why Phrasly scored so poorly? I thought it was supposed to be decent
Phrasly was decent like 6 months ago but hasn't kept up with Turnitin updates. it's basically dead now
just showed this to my roommate who was about to buy a StealthWriter subscription. saved him money lol
the gap between the top 2 and everyone else is massive. basically it's Walter Writes or Undetectable AI, everything else is a waste of money
this test should be redone every month honestly. these tools and detectors change so fast
question: does word count affect the results? like would a 500 word essay vs 3000 word essay rank differently?
in my experience longer texts are harder to humanize consistently so the rankings might shift a bit for very long documents
great work OP. more of this kind of content on the forum please
FINALLY someone doing actual controlled tests instead of just vibes. this is exactly the kind of data we need.
right? so tired of "reviews" that are clearly just sponsored content