URL Encoder / Decoder
There's a persistent belief among developers that URL encoding is just "turning spaces into %20." Slap encodeURIComponent on everything, ship it, done. But URLs break in production anyway, API responses come back garbled, and someone opens a ticket. The problem isn't that percent-encoding is hard; it's that most developers never properly learned which characters need encoding, where, and why the rules differ depending on where in the URL you're working.
The Myth: One Function Encodes Everything Correctly
Ask a developer to encode a URL and they'll reach for encodeURIComponent without hesitation. Run this on https://example.com/search?q=hello world and you'll get https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dhello%20world, a string that's completely useless as a navigable link. The slashes got encoded. The colon disappeared into %3A. Every delimiter that gives a URL its meaning has been eaten alive.
This happens because encodeURIComponent was never meant for full URLs. It's designed for encoding the value inside a query parameter, not the URL itself. The sibling function encodeURI handles full URIs by deliberately preserving structural characters: :, /, ?, &, #, and a handful of others that define a URL's anatomy. Using the wrong function is one of the most common URL-related bugs in frontend and backend code alike.
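A minimal sketch of the difference, using the same example URL. The variable names are illustrative only; both functions are standard JavaScript built-ins.

```javascript
// The same input passed through both built-in encoders.
const fullUrl = "https://example.com/search?q=hello world";

// Component encoding escapes the delimiters too, so the result is no longer a usable link.
const asComponent = encodeURIComponent(fullUrl);

// Full-URI encoding leaves :, /, ?, = and similar delimiters alone and escapes only the space.
const asFullUri = encodeURI(fullUrl);

console.log(asComponent); // https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dhello%20world
console.log(asFullUri);   // https://example.com/search?q=hello%20world
```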
What RFC 3986 Actually Says
Every URL lives under RFC 3986, the specification that defines what a Uniform Resource Identifier is. It draws a hard line between reserved characters and unreserved characters. Reserved characters (like /, ?, #, =, &) carry structural meaning and must only be percent-encoded when they appear as literal data, not as delimiters. Unreserved characters (letters, digits, -, _, ., ~) never need encoding and shouldn't be encoded.
Everything else is a candidate for percent-encoding: a percent sign followed by two hexadecimal digits representing the byte value. RFC 3986 recommends uppercase digits, though decoders treat %2f and %2F as equivalent. A space becomes %20 because the ASCII decimal value of a space is 32, and 32 in hexadecimal is 20. An at-sign (@) becomes %40. An ampersand used as literal data in a query parameter value becomes %26, so it doesn't get mistaken for the delimiter between parameters.
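To make the byte arithmetic concrete, here is a rough sketch of a strict encoder that keeps only the RFC 3986 unreserved set and escapes every other byte. The helper name percentEncode is hypothetical, not a built-in. It is also stricter than encodeURIComponent, which additionally leaves ! * ' ( ) unescaped.

```javascript
// RFC 3986 unreserved set: letters, digits, "-", "_", ".", "~".
const UNRESERVED = /^[A-Za-z0-9\-_.~]$/;
const utf8 = new TextEncoder(); // always produces UTF-8 bytes

// Hypothetical helper: escape everything outside the unreserved set.
function percentEncode(text) {
  let out = "";
  for (const ch of text) { // iterates by code point, not by UTF-16 unit
    if (UNRESERVED.test(ch)) {
      out += ch;
      continue;
    }
    // Each byte becomes "%" plus two uppercase hex digits (32 -> "20").
    for (const byte of utf8.encode(ch)) {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode(" "));         // %20
console.log(percentEncode("@"));         // %40
console.log(percentEncode("&"));         // %26
console.log(percentEncode("a-b_c.d~e")); // a-b_c.d~e
```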
Full URI Mode vs. Component Mode: The Core Distinction
This is the distinction most tutorials gloss over. When you have a complete URL like https://api.example.com/v1/search?q=café au lait&lang=fr, you want to encode only the parts that need it (the spaces and the accented character) while leaving the protocol, host, path separators, and query delimiters untouched. That's Full URI mode, which maps directly to JavaScript's encodeURI / decodeURI.
When you're building a URL programmatically and you have a raw parameter value (say, the user typed Tom & Jerry into a search box), you need every special character encoded before you concatenate it into the query string. If you left the & raw, the server would read it as a parameter delimiter and split your value in half. That's Component mode, using encodeURIComponent / decodeURIComponent, which encodes the reserved characters too.
The practical rule: use Component mode for values, Full URI mode for complete addresses. Never swap them.
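The Tom & Jerry case can be sketched as follows. The base URL and the parameter name q are assumptions for illustration; the standard URL class stands in for whatever parser the receiving server uses.

```javascript
const term = "Tom & Jerry"; // raw value, e.g. typed into a search box
const base = "https://example.com/search?q="; // assumed endpoint

// Concatenating the raw value lets the & act as a parameter delimiter.
const broken = base + term;

// Component mode escapes the & (and the spaces) before concatenation.
const safe = base + encodeURIComponent(term);
console.log(safe); // https://example.com/search?q=Tom%20%26%20Jerry

// Reading the parameter back the way a query-string parser would:
const fromBroken = new URL(broken).searchParams.get("q"); // value is cut off at the &
const fromSafe = new URL(safe).searchParams.get("q");
console.log(fromSafe); // Tom & Jerry
```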
The Plus Sign Problem Nobody Talks About
HTML forms submitted with the default encoding type encode spaces as + rather than %20. This is the application/x-www-form-urlencoded format, a variant of percent-encoding with its own rules. When a server returns name=John+Doe and you decode the value with decodeURIComponent, you get John+Doe: the plus stays a plus, not a space. You'd need to replace + with %20 first, or use a library aware of form encoding.
This silently corrupts data in countless apps. A user with a + in their name or a search query containing "C++" will find their input mangled. If you're consuming query strings from classic HTML form submissions, always check which encoding the form used before blindly decoding.
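A small sketch of both approaches mentioned above. The helper name decodeFormValue is hypothetical; URLSearchParams is a standard built-in that parses the form-encoded format.

```javascript
// Hypothetical helper: decode a single form-encoded value.
// Turn "+" into "%20" first, then let decodeURIComponent do the rest.
function decodeFormValue(value) {
  return decodeURIComponent(value.replace(/\+/g, "%20"));
}

console.log(decodeURIComponent("John+Doe")); // John+Doe  (plus survives)
console.log(decodeFormValue("John+Doe"));    // John Doe
console.log(decodeFormValue("C%2B%2B"));     // C++       (literal plus arrives as %2B)

// URLSearchParams applies the same form-encoding rules when parsing.
const params = new URLSearchParams("name=John+Doe&lang=C%2B%2B");
console.log(params.get("name")); // John Doe
console.log(params.get("lang")); // C++
```

The order matters: replacing + after decoding would also turn a genuine %2B into a space.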
Unicode Characters and Multi-Byte Encoding
Spaces are simple. Unicode characters are where things get genuinely interesting. The café example above: the letter é is U+00E9. In UTF-8, it encodes to two bytes: 0xC3 and 0xA9. As a percent-encoded sequence, it becomes %C3%A9. Both encodeURI and encodeURIComponent handle this correctly in every modern browser and Node.js runtime: they always use UTF-8 byte representation. The danger comes from older systems that used Latin-1 or other encodings, where é would encode as just %E9. Mix the two and your server will read mojibake instead of text.
Chinese, Arabic, Hindi, and emoji characters follow the same UTF-8 pattern but expand to more bytes. The emoji 😀 is U+1F600, encoded as four UTF-8 bytes: F0 9F 98 80, giving you %F0%9F%98%80 in a URL. Legitimate, valid, and increasingly common as URLs carry richer content.
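The byte sequences above can be checked directly. This sketch writes the characters as Unicode escapes so the file's own encoding cannot interfere; the variable names are illustrative.

```javascript
const eAcute = "\u00E9";     // é, U+00E9
const grinning = "\u{1F600}"; // 😀, U+1F600

// UTF-8 bytes, as produced by the standard TextEncoder.
const eBytes = Array.from(new TextEncoder().encode(eAcute));       // [0xC3, 0xA9]
const emojiBytes = Array.from(new TextEncoder().encode(grinning)); // [0xF0, 0x9F, 0x98, 0x80]

console.log(encodeURIComponent(eAcute));   // %C3%A9
console.log(encodeURIComponent(grinning)); // %F0%9F%98%80

// Full URI mode on the article's example URL: only the é and the spaces change.
const encodedUrl = encodeURI("https://api.example.com/v1/search?q=caf\u00E9 au lait&lang=fr");
console.log(encodedUrl); // https://api.example.com/v1/search?q=caf%C3%A9%20au%20lait&lang=fr
```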
Where Decoding Goes Wrong
Decoding has its own failure modes. The most frequent: a malformed percent sequence. If someone passes %ZZ, there's no valid hex representation, and both decodeURI and decodeURIComponent throw a URIError: URI malformed exception. Code that calls these functions without try/catch will crash. This is exactly the kind of bug that doesn't show up in development (where inputs are controlled) but surfaces immediately when real users paste URLs from sources you didn't anticipate.
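One way to guard against this is a small wrapper. The name safeDecode and the choice to return a fallback value are assumptions for illustration; an application might prefer to surface the error to the user instead.

```javascript
// Hypothetical helper: decode a component, returning a fallback on malformed input.
function safeDecode(value, fallback = null) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    // Malformed percent sequences raise URIError; anything else is rethrown.
    if (err instanceof URIError) return fallback;
    throw err;
  }
}

console.log(safeDecode("hello%20world")); // hello world
console.log(safeDecode("%ZZ"));           // null
console.log(safeDecode("100%"));          // null (a bare % is also malformed)
```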
Double-encoding is another trap. A URL that's been encoded twice, with %2520 instead of %20 (because %25 is the encoding for %), will decode to %20 on the first pass, not to a space. Servers see a literal percent-20 in the path, routing fails, and developers spend an afternoon in confusion.
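The effect is easy to reproduce. A short illustration with made-up variable names:

```javascript
const raw = "hello world";

const once = encodeURIComponent(raw);   // hello%20world
const twice = encodeURIComponent(once); // hello%2520world (the % itself became %25)

// A single decode only undoes the outer layer.
const decodedOnce = decodeURIComponent(twice);        // hello%20world
const decodedTwice = decodeURIComponent(decodedOnce); // hello world

console.log(twice, "->", decodedOnce, "->", decodedTwice);
```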
The Fragment Identifier Edge Case
URL fragments (the #section part) are never sent to the server. The browser strips them before making the HTTP request. This means encoding a fragment identifier is pointless from a server-routing perspective, but it matters for client-side JavaScript that reads window.location.hash. If your single-page app uses hash-based routing and the fragment contains special characters, you need to handle encoding and decoding on the client yourself, because the server never sees it.
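A sketch of that client-side round trip. The section name and URL are invented for illustration, and the standard URL class stands in for window.location so the snippet also runs outside a browser.

```javascript
const sectionName = "Q&A / notes #1"; // assumed route or section label

// Encode the value before placing it after the "#".
const pageUrl = "https://example.com/app#" + encodeURIComponent(sectionName);
console.log(pageUrl); // https://example.com/app#Q%26A%20%2F%20notes%20%231

// In a browser this would be window.location.hash; it includes the leading "#".
const hash = new URL(pageUrl).hash;

// Drop the "#" and decode to recover the original value.
const decoded = decodeURIComponent(hash.slice(1));
console.log(decoded); // Q&A / notes #1
```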
A Quick Field Guide
Before reaching for an encoder, ask three questions. First: is this a complete URL or a piece of one? Complete URL: use Full URI mode. Raw value going into a parameter: use Component mode. Second: where did this string come from? Form submission may mean plus-encoded spaces. Third: has this already been encoded? If the input already contains percent-encoded sequences, decoding first before re-encoding prevents the double-encoding trap.
Percent-encoding exists because URLs were designed for ASCII and the web became global. The rules are precise but learnable, and once you internalize the Full URI vs. Component distinction, most URL-related bugs stop being mysterious. They become predictable, catchable, and preventable, which is the best thing any encoding scheme can aspire to be.