A Curious Glitch in XSS Sanitizing

Rodolfo Assis (Brute)
Feb 6, 2022

While looking for ways to bypass XSS sanitizing (sanitizing, not filtering), I found something interesting but almost useless, because the conditions it needs are very, very unlikely. But who knows: given the complexity of some of the code we face, this might not be… impossible?

I hope one of you readers makes good use of it someday, in the real world or in a CTF challenge, since what you are going to see here is probably unknown to most people out there (it was to me).

I was fighting against PHP’s classic twin sanitizing functions, htmlentities and htmlspecialchars. If you are a developer or a bug hunter, you know they are very effective against XSS in the HTML context, which seems to be the most common scenario you deal with.

Let’s get into practical examples and explain what’s happening, but first check this little poll I created on Twitter a few days ago for this post.

The 2 scenarios from the poll can be found online at the links below, so you can check how they behave.

Sanitization Scenario 1

https://brutelogic.com.br/lab/sanitization1.php?p=any

Sanitization Scenario 2

https://brutelogic.com.br/lab/sanitization2.php?p=any

If you like a challenge, stop reading now, play with those 2 URLs and see if you can work out the right answer to the poll above!

Let’s check the differences between those 2.

The 1st one checks whether the input has up to 49 characters before displaying it in the response. But it runs the input through htmlentities() first, so what gets checked and displayed is already sanitized. There is no way to inject a full HTML element and, consequently, no XSS.

The 2nd one uses htmlentities() only for the check: if the sanitized input has up to 9 characters, the original input is output as is, without any sanitizing.

But 9 characters are not enough for any XSS vector known at the time of writing, and probably for years to come. And while the 49 characters of the 1st scenario are more than enough for an XSS payload, we already know injection is not possible there.
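To make the difference easier to see, here is a minimal sketch of what the two checks could look like. This is a hypothetical reconstruction, not the lab's actual source: the function names, the use of strlen() for counting, the empty response for rejected input and the explicit flag set are all assumptions.

```php
<?php
// Hypothetical reconstruction of the two lab scripts; the real source
// code is not shown in this post.

// Assumption: flags are passed explicitly (no ENT_IGNORE, no ENT_SUBSTITUTE)
// so the behavior does not depend on the PHP version's default flags.
const SANITIZE_FLAGS = ENT_QUOTES | ENT_HTML401;

// Scenario 1: sanitize first, check the length of the sanitized value,
// then output the sanitized value.
function scenario1(string $p): string
{
    $safe = htmlentities($p, SANITIZE_FLAGS, 'UTF-8');
    return strlen($safe) <= 49 ? $safe : '';
}

// Scenario 2: check the length of the sanitized value,
// then output the ORIGINAL input, untouched.
function scenario2(string $p): string
{
    $safe = htmlentities($p, SANITIZE_FLAGS, 'UTF-8');
    return strlen($safe) <= 9 ? $p : '';
}
```

In both functions the length limit is applied to the sanitized string. The only difference is which value goes into the response.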

So what’s the conclusion here? Option number 1 of the poll, both secure? It seems so.

But there’s something about those PHP sanitizing functions that’s not well known: when the input they handle contains byte sequences that are invalid for the target character encoding, the whole input is considered invalid, and there’s even a flag to handle that.

According to the PHP manual, this flag (ENT_IGNORE) silently discards invalid code unit sequences instead of returning an empty string. In other words, without it the function returns an empty string for such input. And that’s an important clue to figure out how to bypass one of the cases above.

By injecting a character that is invalid for the htmlentities function, we make it return an empty string. Normally this would be useless, because there would be nothing left to reflect and achieve XSS. But our 2nd scenario is very particular: it performs the check on the sanitized value first and then reflects the original input.

In real-world scenarios with more complex code, this might be possible to find, although it is very unlikely. But if so, it would be such an awesome bypass that the owners of the vulnerable application would be shocked! Here’s how it happens:
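The empty-string behavior can be checked with a few lines of PHP. This is only a sketch under stated assumptions: UTF-8 as the encoding, illustrative variable names, and flags passed explicitly. The explicit flags matter because since PHP 8.1 the default flag set includes ENT_SUBSTITUTE, so a call without a flags argument no longer returns an empty string for invalid input on those versions.

```php
<?php
// Assumptions: UTF-8 as the encoding and an explicit flag set without
// ENT_IGNORE or ENT_SUBSTITUTE. Variable names are just for illustration.
$flags = ENT_QUOTES | ENT_HTML401;

$valid   = "<b>";     // plain ASCII, which is also valid UTF-8
$invalid = "<b>\x80"; // a lone 0x80 byte is not a valid UTF-8 sequence

// Valid input is sanitized as usual: "&lt;b&gt;"
$fromValid = htmlentities($valid, $flags, 'UTF-8');

// One invalid byte makes the whole result an empty string: ""
$fromInvalid = htmlentities($invalid, $flags, 'UTF-8');

// With ENT_IGNORE only the invalid byte is dropped: "&lt;b&gt;"
$withIgnore = htmlentities($invalid, $flags | ENT_IGNORE, 'UTF-8');

var_dump($fromValid, $fromInvalid, $withIgnore);
```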

https://brutelogic.com.br/lab/sanitization2.php?p=%3CSvg+OnLoad=confirm(1)%3E%80

By appending any single byte above the ASCII range (%00 to %7F), like %80, we make htmlentities return an empty string, since a lone byte like that is not a valid sequence in the encoding the function expects (UTF-8 by default). That way, the character count of the sanitized input drops to 0 and the check passes no matter how long the input is. The input then gets reflected unmodified in the response.
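The numbers behind the payload in the URL above can be sketched as follows. The 9-character limit comes from the description of the 2nd scenario. The variable names, the use of strlen() and the explicit flags are assumptions, since the lab's source code is not shown here.

```php
<?php
// Value of the "p" parameter from the URL above, still URL-encoded.
$query = '%3CSvg+OnLoad=confirm(1)%3E%80';

// What PHP would see in $_GET['p']:
// "<Svg OnLoad=confirm(1)>" followed by the raw byte 0x80.
$p = urldecode($query);

// Assumption: flags without ENT_IGNORE / ENT_SUBSTITUTE, UTF-8 encoding.
$sanitized = htmlentities($p, ENT_QUOTES | ENT_HTML401, 'UTF-8');

$rawLength       = strlen($p);            // 24 bytes of payload
$sanitizedLength = strlen($sanitized);    // 0, the sanitized string is empty
$passesCheck     = $sanitizedLength <= 9; // true, so the raw input is reflected

var_dump($rawLength, $sanitizedLength, $passesCheck);
```

The length check only ever sees the empty sanitized string, while the value that reaches the response still holds the full payload.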

That’s it. The right answer to the poll was option number 2.

#hack2learn

Also check my XSS online stuff:

KNOXSS

Brute XSS Blog

Brute XSS Cheat Sheet

Thanks for your attention,

Rodolfo.

Artist, free thinker. Computer hacker known as Brute (@brutelogic). Follow me on Twitter @rodoassis.