What's the difference between named and numeric entities?

Named: &amp; (human-readable, defined by the HTML spec). Numeric: &#38; (decimal) or &#x26; (hex), which work for any Unicode character.

Does encoding prevent all XSS?

HTML encoding prevents HTML context XSS. You also need JavaScript encoding for JS contexts, CSS encoding for style attributes, and URL encoding for href values.

&nbsp; is a non-breaking space: a space that won't wrap to the next line and won't collapse with adjacent spaces.

HTML Encoder / Decoder

Processed locally · Never leaves your browser

Encode special HTML characters to entities (&amp; &lt; &gt;) or decode entities back to characters. Named and numeric modes.


Common HTML Entities

& → &amp;

< → &lt;

> → &gt;

" → &quot;

' → &#39;

© → &copy;

® → &reg;

™ → &trade;

€ → &euro;

£ → &pound;

¥ → &yen;

° → &deg;


What Is HTML Encoding?

HTML encoding (also called HTML escaping) replaces characters that have special meaning in HTML - the angle brackets that mark up tags, the ampersand that begins entities, and the quote characters that delimit attribute values - with named or numeric entity references. The browser then displays them as literal characters instead of treating them as syntax.

This encoder uses a small in-house mapping (ENTITY_MAP in the source above) for the most common entities and falls back to numeric references via charCodeAt for anything outside the named set. The five mandatory escapes for a safe HTML encoder are &amp; for the ampersand (in a sequential replace it must come first - escaping it last would double-escape everything else), &lt;, &gt;, &quot;, and &#39; for the apostrophe. Without &#39;, attribute values single-quoted in HTML can be broken by injected single quotes - a real XSS vector.
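A minimal sketch of the approach described above, assuming a lookup table plus a numeric fallback. The table `NAMED` and the function `encodeHtml` are hypothetical names for illustration; the tool's actual ENTITY_MAP and regex may differ.

```javascript
// Hypothetical mapping for illustration; the tool's real ENTITY_MAP may differ.
const NAMED = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
  '\u00A9': '&copy;',
  '\u20AC': '&euro;',
};

// Single pass over the input: each matched character is replaced exactly once,
// so ampersands produced by a replacement are never escaped a second time.
function encodeHtml(text) {
  return text.replace(/[&<>"']|[^\x00-\x7F]/g, (ch) =>
    // Use the named entity when one is mapped, otherwise a decimal reference.
    NAMED[ch] ?? `&#${ch.charCodeAt(0)};`
  );
}

console.log(encodeHtml(`<a href='x'>Tom & "Jerry" \u00A9</a>`));
```

Because this sketch uses charCodeAt, it is only reliable for characters in the Basic Multilingual Plane.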

Two output modes are exposed. Named entities like &copy; are human-readable and standardised in the HTML5 spec - there are about 2,000 of them, covering almost every glyph you'd realistically need. Numeric entities like &#169; (decimal) or &#xA9; (hex) work for any Unicode code point and are the only safe choice for characters outside the named set. The decoder accepts all three styles - named, decimal numeric, and hex numeric - so a round-trip never loses data.
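As an illustration of a decoder that accepts all three styles, here is a small sketch. The names `NAMED_ENTITIES` and `decodeHtml` are assumptions, and the table holds only a handful of the named references the HTML spec defines.

```javascript
// Hypothetical subset of named entities; the HTML spec defines many more.
const NAMED_ENTITIES = new Map([
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
  ['quot', '"'],
  ['apos', "'"],
  ['copy', '\u00A9'],
  ['nbsp', '\u00A0'],
]);

function decodeHtml(text) {
  // One alternation covers hex (&#xA9;), decimal (&#169;) and named (&copy;) forms.
  return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g, (match, body) => {
    if (body[0] !== '#') {
      // Named reference: unknown names are left untouched.
      return NAMED_ENTITIES.has(body) ? NAMED_ENTITIES.get(body) : match;
    }
    const isHex = body[1] === 'x' || body[1] === 'X';
    const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
    // String.fromCodePoint throws above U+10FFFF, so keep such references as-is.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeHtml('&copy; &#169; &#xA9;'));
```

The replace call does not rescan its own output, so &amp;amp; decodes to &amp; rather than to a bare ampersand.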

Where this matters most: server-side rendering. Frameworks like React (auto-escapes children), Vue (auto-escapes interpolations), and Django (auto-escapes via {{ }} unless you mark a string as |safe) handle this for you. Old-school string-concatenation rendering (PHP echo, raw template literals into innerHTML, dangerouslySetInnerHTML in React) does not - and that's where most XSS vulnerabilities live. Run any user-controlled string through HTML encoding before it touches innerHTML or its equivalent.
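For string-concatenation rendering, one way to make escaping the default is a tagged template that encodes every interpolated value. This is a sketch under that assumption; `escapeHtml` and `html` are hypothetical helpers, not part of any framework named above.

```javascript
function escapeHtml(value) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Tagged template: literal parts pass through unchanged,
// interpolated values are HTML-encoded before being joined in.
function html(strings, ...values) {
  return strings.reduce(
    (out, part, i) => out + part + (i < values.length ? escapeHtml(values[i]) : ''),
    ''
  );
}

const comment = '<img src=x onerror=alert(1)>';
const markup = html`<p class="comment">${comment}</p>`;
console.log(markup);
```

The resulting string is safe to assign to innerHTML for this element-body context only; it does not cover script, URL, or CSS contexts.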

HTML encoding is context-specific. Inside an HTML element body, encoding the five characters above is sufficient. Inside an HTML attribute, the value must also be quoted, because an unquoted value can be broken by a space or other characters that the five escapes do not cover. Inside a <script> block, HTML encoding does nothing useful - you need JavaScript escaping. Inside a URL attribute (href, src), you need URL encoding for the untrusted parts, and the scheme should be checked as well, because a javascript: URL contains none of the five characters. Inside a CSS context (<style>, style=), you need CSS escaping. Treating HTML encoding as a universal "sanitizer" is the most common security mistake - it isn't, it's only one of four context-specific encodings.
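The sketch below contrasts two of the other contexts with plain HTML encoding, using only built-in functions. The helper names (`escapeHtml`, `searchLink`, `inlineScriptValue`) and the /search path are assumptions for illustration.

```javascript
function escapeHtml(value) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// URL context: percent-encode the untrusted component, then HTML-encode the
// whole attribute value because it still sits inside HTML.
function searchLink(query) {
  const url = `/search?q=${encodeURIComponent(query)}`;
  return `<a href="${escapeHtml(url)}">results</a>`;
}

// JavaScript context: JSON.stringify produces a valid string literal; the
// extra replace keeps "<" from forming "</script>" inside an inline script.
function inlineScriptValue(value) {
  return JSON.stringify(value).replace(/</g, '\\u003c');
}

console.log(searchLink('a&b "c"'));
console.log(inlineScriptValue('</script>'));
```

Each helper covers one context only, which is the point: the encoding has to be chosen by where the value lands, not by where it came from.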

The decoder handles a more interesting edge case: numeric references like &#x1F600; can encode characters outside the BMP (the grinning face emoji is at U+1F600). String.fromCharCode works on 16-bit code units and truncates anything larger, so for code points above 0xFFFF you need String.fromCodePoint, which emits the correct surrogate pair. The current implementation handles standard cases well; for full emoji round-trip safety, decode with String.fromCodePoint and encode with codePointAt.
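The difference between the two built-ins is easy to see with the code point from the example above. This short demonstration uses only standard String methods; the variable names are illustrative.

```javascript
// Code point taken from the numeric reference &#x1F600;
const codePoint = parseInt('1F600', 16);

// fromCharCode keeps only the low 16 bits, so U+1F600 collapses to U+F600.
const truncated = String.fromCharCode(codePoint);

// fromCodePoint emits the surrogate pair that represents U+1F600.
const correct = String.fromCodePoint(codePoint);

console.log(truncated.length, truncated.charCodeAt(0).toString(16)); // 1 'f600'
console.log(correct.length, correct.codePointAt(0).toString(16));    // 2 '1f600'
```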

What this tool deliberately does not do: it does not parse or sanitize HTML. If you paste <script>evil()</script>, the tool encodes the angle brackets but does not reach in and remove the script element. Encoding makes it safe to display as text; if you want to remove dangerous tags entirely while keeping safe ones (a paste-from-Word workflow, say), use a real HTML sanitizer like DOMPurify, which runs in-browser and handles attribute filtering, URL scheme allowlists, and namespace coercion.

Common Use Cases

XSS-safe output

Encode user comments before inserting them into a server-rendered HTML page when your framework doesn't auto-escape.

Displaying code samples

Encode an HTML snippet so it renders as text inside a documentation page instead of being interpreted as markup.

Email template safety

Encode dynamic merge-fields in HTML email templates so a user's name with an angle bracket doesn't break the layout.

Decoding API responses

Reverse double-encoded HTML returned from legacy APIs that wrapped already-escaped output in another encoding pass.
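For the double-encoded API case, a rough sketch is to decode repeatedly until the text stops changing. `decodeOnce` and `decodeFully` are hypothetical helpers that handle only the five basic entities, and the pass limit is an arbitrary safeguard.

```javascript
// Decodes one layer of the five basic entities (hypothetical helper).
function decodeOnce(text) {
  const map = { amp: '&', lt: '<', gt: '>', quot: '"', '#39': "'" };
  return text.replace(/&(amp|lt|gt|quot|#39);/g, (_, name) => map[name]);
}

// Peels encoding layers until a pass changes nothing or the limit is reached.
function decodeFully(text, maxPasses = 5) {
  let current = text;
  for (let i = 0; i < maxPasses; i++) {
    const next = decodeOnce(current);
    if (next === current) break;
    current = next;
  }
  return current;
}

console.log(decodeFully('&amp;lt;b&amp;gt;Tom &amp;amp; Jerry&amp;lt;/b&amp;gt;'));
```

Repeated decoding is lossy when the original text legitimately contained an entity such as &amp;lt;, so this suits cleanup of known-bad data rather than general input. The decoded result is markup again and needs encoding or sanitizing before it is rendered.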

Frequently Asked Questions

Named entities (&copy;, &mdash;) use a memorable mnemonic from the HTML spec - there are around 2,000 in HTML5. Numeric entities (&#169; decimal, &#xA9; hex) work for any Unicode code point and don't depend on the browser knowing the name. Numeric is universal; named is more readable.

Encoding prevents only HTML-context XSS. Encoding the five characters & < > " ' makes user content safe inside an element body and properly-quoted attributes. JavaScript contexts need JS string escaping, URL attributes need URL encoding, CSS contexts need CSS escaping. Use a context-aware framework or a library like DOMPurify for defense in depth.

&nbsp; is the non-breaking space (U+00A0). Browsers treat it as a space character but won't wrap a line at it and won't collapse it with adjacent whitespace, which is why it's used to keep numbers and units together (10&nbsp;kg) or as a layout shim in tables.

The ampersand has to be escaped first because every other HTML entity starts with &. If you escape < into &lt; first and then escape & into &amp;, your < becomes &amp;lt; - double-encoded. In a chain of sequential replacements, always escape the ampersand first, then the other characters. A single-pass encoder, like the regex-based one here, avoids the problem altogether because each input character is replaced exactly once.
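The ordering problem can be reproduced in a few lines. This is an illustrative comparison of three strategies, not the tool's code; the variable names are assumptions.

```javascript
const input = '<b> & </b>';

// Wrong order: the ampersands introduced by &lt; and &gt; get escaped again.
const wrong = input
  .replace(/</g, '&lt;')
  .replace(/>/g, '&gt;')
  .replace(/&/g, '&amp;');

// Right order for sequential replaces: ampersand first.
const right = input
  .replace(/&/g, '&amp;')
  .replace(/</g, '&lt;')
  .replace(/>/g, '&gt;');

// Single pass: order is irrelevant because each character is visited once.
const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;' };
const singlePass = input.replace(/[&<>]/g, (ch) => map[ch]);

console.log(wrong);      // &amp;lt;b&amp;gt; &amp; &amp;lt;/b&amp;gt;
console.log(right);      // &lt;b&gt; &amp; &lt;/b&gt;
console.log(singlePass); // &lt;b&gt; &amp; &lt;/b&gt;
```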

The named mapping covers the five HTML-mandatory characters (& < > " ') plus a curated set of common typographic and currency symbols (© ® ™ € £ ¥ ° ± × ÷ — – non-breaking space). Anything else falls through to a numeric reference if the character matches the encode regex.

Round-tripping works for Basic Multilingual Plane characters (most scripts, most languages): encode to numeric and decode back. Characters above U+FFFF (emoji, ancient scripts) are stored as surrogate pairs in JavaScript strings. The encoder's charCodeAt reads one 16-bit code unit at a time and String.fromCharCode writes one, so neither works with the full code point. For 100% emoji-safe round-tripping, encode and decode using the codePointAt / String.fromCodePoint pair.
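A sketch of the codePointAt / String.fromCodePoint pairing, assuming hex references for every non-ASCII character. `encodeNumeric` and `decodeNumeric` are hypothetical names, and the sketch leaves ASCII (including &) untouched, so it is not a complete HTML encoder.

```javascript
// The "u" flag makes the regex match whole code points instead of single
// UTF-16 code units, so an emoji arrives in the callback as one match.
function encodeNumeric(text) {
  return text.replace(/[^\x00-\x7F]/gu, (ch) =>
    `&#x${ch.codePointAt(0).toString(16).toUpperCase()};`
  );
}

function decodeNumeric(text) {
  return text.replace(/&#x([0-9a-fA-F]+);/g, (match, hex) => {
    const codePoint = parseInt(hex, 16);
    // Keep out-of-range references as-is instead of letting fromCodePoint throw.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

const original = 'caf\u00E9 \u{1F600}';
const encoded = encodeNumeric(original);
console.log(encoded); // caf&#xE9; &#x1F600;
console.log(decodeNumeric(encoded) === original); // true
```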

This tool is a good fit for one-off encoding while writing template strings or debugging. For runtime user-content rendering, prefer your framework's built-in escaping (React's {} interpolation, Vue's {{ }}, Django's autoescape) - those are battle-tested across many context combinations and are less likely to miss edge cases.

Encoding turns dangerous characters into safe text representations - the page displays <script> as literal text. Sanitizing parses the input as HTML, removes dangerous elements and attributes, and re-serializes only the safe parts. Use encoding when you want the input shown as text; use a sanitizer like DOMPurify when you want to allow some HTML (bold, links) but block others (script, iframe).

The encoder does not handle CDATA sections - it operates on character-level escapes only. CDATA is a parser-level concept used in XML and XHTML to disable entity interpretation inside a block; if you're generating XHTML you may need to wrap content in <![CDATA[...]]> instead of escaping. Modern HTML5 parsing doesn't use CDATA outside SVG and MathML.

No data is sent to a server. The encoder and decoder are pure string operations running in JavaScript inside your tab. There is no network call during encoding or decoding, so user content stays in your browser.