Unicode Converter

Convert between Unicode escape sequences and readable text online. Free Unicode converter supporting UTF-8, UTF-16, and code points.

Encoding / Decoding
Instant results

About Unicode Converter

Convert between text and Unicode code points. Supports multiple output formats including hex (U+XXXX), HTML entities, CSS escape sequences, and JavaScript notation.

How to Use Unicode Converter

1. Enter text or codepoint

Type a character to see its Unicode codepoint, or paste a codepoint like U+1F600 to see the matching character. The conversion runs in both directions from the same input field.
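The two directions described above can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not the converter's actual code; the function names `toCodepointLabel` and `fromCodepointLabel` are made up for this example.

```javascript
// Format the first codepoint of a string as U+XXXX (uppercase hex, at least four digits).
function toCodepointLabel(char) {
  const cp = char.codePointAt(0);
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// Parse "U+1F600" (or bare hex such as "1F600") back into the matching character.
function fromCodepointLabel(label) {
  const match = /^(?:U\+)?([0-9A-F]{1,6})$/i.exec(label.trim());
  if (!match) throw new SyntaxError(`not a codepoint: ${label}`);
  // String.fromCodePoint throws a RangeError for values above 0x10FFFF.
  return String.fromCodePoint(parseInt(match[1], 16));
}

console.log(toCodepointLabel("A"));         // U+0041
console.log(fromCodepointLabel("U+4E2D"));  // 中
```

Using `codePointAt` rather than `charCodeAt` matters here: the latter would return only the first half of a surrogate pair for characters above U+FFFF.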

2. View all representations

You'll see the codepoint, the UTF-8 hex bytes, the JavaScript escape \u{...}, the HTML entity &#NNN;, the URL-encoded percent form %XX, and several other formats side by side.
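To see how these representations relate, here is a minimal sketch that derives several of them from one character. The function name `describeCodepoint` and the shape of the returned object are assumptions for illustration; the converter may compute and label them differently.

```javascript
// Build several common representations of the first codepoint in a string.
function describeCodepoint(char) {
  const cp = char.codePointAt(0);
  const hex = cp.toString(16).toUpperCase();
  // TextEncoder always produces UTF-8; format each byte as two hex digits.
  const utf8 = Array.from(new TextEncoder().encode(String.fromCodePoint(cp)), (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  );
  return {
    codepoint: "U+" + hex.padStart(4, "0"),
    utf8: utf8.join(" "),
    javascript: "\\u{" + hex + "}",
    html: "&#" + cp + ";",                     // decimal numeric character reference
    url: utf8.map((b) => "%" + b).join(""),    // percent-encode every UTF-8 byte
  };
}

console.log(describeCodepoint("\u{1F600}"));
```

Note that this sketch percent-encodes every byte, including ASCII letters. That is still valid percent encoding, but `encodeURIComponent` would leave unreserved ASCII characters as they are.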

3. Copy the format you need

Pick the right format for your destination. Source code wants the language-specific escape, HTML templates want the entity form, and URLs need percent encoding. The converter covers each common case so you don't have to translate by hand.

4. Inspect emoji structure

Compound emoji like 👨‍👩‍👧‍👦 decompose into multiple codepoints linked by zero-width joiners. The converter shows the full breakdown, which is the easiest way to see why these characters take so much storage and behave oddly with naive length functions.
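A rough sketch of that breakdown, using the family emoji as input. The helper name `inspectSequence` is hypothetical; it simply lists each codepoint and reports how large the string is under different measures.

```javascript
// List the codepoints in a string and measure it in UTF-16 units and UTF-8 bytes.
function inspectSequence(text) {
  const codepoints = Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
  return {
    codepoints,
    joiners: codepoints.filter((cp) => cp === "U+200D").length, // zero-width joiners
    utf16Units: text.length,
    utf8Bytes: new TextEncoder().encode(text).length,
  };
}

// Man + ZWJ + woman + ZWJ + girl + ZWJ + boy, written with escapes so the source stays ASCII.
const familyEmoji = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(inspectSequence(familyEmoji));
// 7 codepoints, 3 joiners, 11 UTF-16 code units, 25 UTF-8 bytes
```

One glyph on screen therefore costs 25 bytes in UTF-8, and `familyEmoji.length` reports 11.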

When to Use Unicode Converter

Embedding special characters in code

Some editors and build pipelines still flinch at raw Unicode in source files. Escape sequences sidestep the problem (\u{1F600} for JavaScript, \U0001F600 for Python, and so on), letting you keep emoji, symbols, and translated strings in plain ASCII while the runtime restores them to their full glory.
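As an illustration, the following sketch rewrites every non-ASCII codepoint as an escape sequence in either style. The function names are invented for this example, and the sketch deliberately ignores quotes and backslashes, which a real string-literal escaper would also need to handle.

```javascript
// Replace each non-ASCII codepoint with a JavaScript \u{...} escape.
function escapeForJavaScript(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    if (cp < 0x80) return ch; // ASCII passes through unchanged
    return "\\u{" + cp.toString(16).toUpperCase() + "}";
  }).join("");
}

// Python uses \uXXXX for the BMP and \UXXXXXXXX (eight digits) above it.
function escapeForPython(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    if (cp < 0x80) return ch;
    const hex = cp.toString(16).toUpperCase();
    return cp > 0xffff ? "\\U" + hex.padStart(8, "0") : "\\u" + hex.padStart(4, "0");
  }).join("");
}

console.log(escapeForJavaScript("hi \u{1F600}")); // hi \u{1F600}
console.log(escapeForPython("hi \u{1F600}"));     // hi \U0001F600
```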

Debugging encoding issues

When a curly apostrophe (’) shows up as â€™, the bytes are correct but the program reading them picked the wrong encoding. Looking at the raw codepoints exposes that mismatch quickly and usually points the finger at UTF-8 being interpreted as Latin-1 (or its Windows-1252 superset) somewhere along the path.
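The mismatch can be reproduced, and often reversed, in a few lines. The sketch below assumes the wrong decoder was Windows-1252 and uses a small hand-written table for the 0x80 to 0x9F range so that it does not depend on which legacy encodings the runtime ships. All names here are hypothetical.

```javascript
// Windows-1252 differs from Latin-1 only in bytes 0x80-0x9F. This string holds the
// 32 characters those bytes map to (C1 controls fill the unassigned slots).
const CP1252_HIGH =
  "\u20AC\u0081\u201A\u0192\u201E\u2026\u2020\u2021\u02C6\u2030\u0160\u2039\u0152\u008D\u017D\u008F" +
  "\u0090\u2018\u2019\u201C\u201D\u2022\u2013\u2014\u02DC\u2122\u0161\u203A\u0153\u009D\u017E\u0178";

function decodeCp1252(bytes) {
  return Array.from(bytes, (b) =>
    b >= 0x80 && b <= 0x9f ? CP1252_HIGH[b - 0x80] : String.fromCharCode(b)
  ).join("");
}

function encodeCp1252(text) {
  return Uint8Array.from(Array.from(text), (ch) => {
    const high = CP1252_HIGH.indexOf(ch);
    if (high !== -1) return 0x80 + high;
    const code = ch.codePointAt(0);
    if (code > 0xff) throw new RangeError(`not representable in Windows-1252: ${ch}`);
    return code;
  });
}

// Simulate the bug: correct UTF-8 bytes read with the wrong decoder.
function makeMojibake(text) {
  return decodeCp1252(new TextEncoder().encode(text));
}

// Undo it: recover the original bytes, then decode them as UTF-8 (throws if invalid).
function repairMojibake(garbled) {
  return new TextDecoder("utf-8", { fatal: true }).decode(encodeCp1252(garbled));
}

console.log(makeMojibake("\u2019"));               // â€™
console.log(repairMojibake("\u00E2\u20AC\u2122")); // ’
```

The repair only works when the garbling happened exactly once and no bytes were lost along the way, so treat it as a diagnostic aid rather than a guaranteed fix.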

Embedding emoji in HTML

Modern HTML accepts Unicode directly, but some templating layers and legacy systems mangle anything outside ASCII. Falling back to numeric character references like &#128512; for 😀 routes around those filters and keeps your output readable across environments.
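A minimal sketch of that fallback: every codepoint outside ASCII becomes a decimal numeric character reference. The name `toNumericReferences` is made up for this example, and the function does not escape `&`, `<`, or `>`, which still need normal HTML escaping.

```javascript
// Replace each non-ASCII codepoint with a decimal reference such as &#128512;.
function toNumericReferences(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    return cp < 0x80 ? ch : "&#" + cp + ";";
  }).join("");
}

console.log(toNumericReferences("Hi \u{1F600}")); // Hi &#128512;
```

The hexadecimal form (&#x1F600;) is equivalent; browsers decode both to the same character.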

Working with internationalized domain names

Domains containing non-ASCII characters travel through DNS in Punycode form, so café.com becomes xn--caf-dma.com on the wire. Translating between the two views matters whenever you're configuring an IDN, validating user input, or auditing how a domain renders in different clients.
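In JavaScript, one way to get the ASCII view is to let the standard `URL` parser do the work, since it converts hostnames to their Punycode form. This sketch assumes a runtime with the WHATWG `URL` class (browsers and Node.js); the wrapper name `toAsciiHostname` is hypothetical.

```javascript
// Convert a Unicode hostname to the ASCII (Punycode) form used in DNS.
function toAsciiHostname(hostname) {
  // The scheme is only there so the parser accepts the input; it is discarded.
  return new URL("http://" + hostname).hostname;
}

console.log(toAsciiHostname("caf\u00E9.com")); // xn--caf-dma.com
```

The parser also lowercases the host and throws a `TypeError` on invalid input, which makes it a convenient first check when validating user-supplied domains.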

Unicode Converter Examples

Emoji to codepoint

Input
😀 (grinning face)
Output
Codepoint: U+1F600
UTF-8: F0 9F 98 80 (4 bytes)
JavaScript: \u{1F600}
HTML: &#128512;

One emoji, four equivalent identifiers. Reach for the codepoint when writing documentation, the UTF-8 bytes when you're estimating storage, the JavaScript escape when patching source files, and the HTML entity when templating engines won't pass raw Unicode through.

Surrogate pair issue

Input
JS String.length of '😀'
Output
String.length = 2 (not 1!)
Reason: surrogate pair

JavaScript measures strings in UTF-16 code units, and any codepoint above U+FFFF needs two of them, known as a surrogate pair. That's why most emoji return a length of 2 instead of 1, and why Array.from(str).length, which iterates by codepoint, is the safer way to count them.
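The arithmetic behind a surrogate pair is short enough to work through by hand. The sketch below follows the standard UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into two halves); the function name `toSurrogatePair` is just for illustration.

```javascript
// Split a supplementary-plane codepoint into its high and low surrogates.
function toSurrogatePair(cp) {
  if (cp < 0x10000 || cp > 0x10ffff) {
    throw new RangeError("codepoint is not in a supplementary plane");
  }
  const offset = cp - 0x10000;            // 20 bits remain
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [high, low];
}

const grin = "\u{1F600}";
console.log(toSurrogatePair(0x1f600).map((u) => u.toString(16))); // [ 'd83d', 'de00' ]
console.log(grin.length);             // 2
console.log(Array.from(grin).length); // 1
```

For U+1F600 the offset is 0xF600, so the high surrogate is 0xD800 + 0x3D = 0xD83D and the low surrogate is 0xDC00 + 0x200 = 0xDE00, which matches what `grin.charCodeAt(0)` and `grin.charCodeAt(1)` return.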

Chinese character

Input
中 (Chinese 'middle')
Output
Codepoint: U+4E2D
UTF-8: E4 B8 AD (3 bytes)
UTF-16: 4E 2D (2 bytes)

BMP characters (anything below U+10000) take one to three bytes in UTF-8, with CJK ideographs like this one at the three-byte end, but always two bytes in UTF-16. Since JavaScript uses UTF-16 internally, String.length reports the expected 1 for this character.
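A quick way to compare the two encodings for any string is to measure both sizes side by side. This is a small sketch with a hypothetical helper name, `encodedSizes`; it relies on `TextEncoder` for UTF-8 and on the fact that each UTF-16 code unit is two bytes.

```javascript
// Report how much space a string takes in UTF-8 and UTF-16.
function encodedSizes(text) {
  return {
    utf8Bytes: new TextEncoder().encode(text).length,
    utf16Bytes: text.length * 2, // String.length counts 16-bit code units
    codeUnits: text.length,
  };
}

console.log(encodedSizes("A"));         // { utf8Bytes: 1, utf16Bytes: 2, codeUnits: 1 }
console.log(encodedSizes("\u4E2D"));    // { utf8Bytes: 3, utf16Bytes: 2, codeUnits: 1 }
console.log(encodedSizes("\u{1F600}")); // { utf8Bytes: 4, utf16Bytes: 4, codeUnits: 2 }
```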

Tips & Best Practices for Unicode Converter

  1. Treat UTF-8 as the default for any new file or API you build. It's the web standard, it handles every codepoint, and it sidesteps the parade of incompatibilities that Latin-1 and other legacy encodings keep alive.
  2. Excel and Word sometimes save text and CSV files with a leading byte-order mark (U+FEFF) even though UTF-8 doesn't require one. Strip that BOM at the boundary, otherwise the first column header in your import quietly carries a phantom character.
  3. In JavaScript, Array.from(str).length counts codepoints rather than UTF-16 code units, so a single emoji like 😀 reports 1 instead of the 2 that String.length returns. It still counts every codepoint in a joined sequence separately, so for user-perceived characters use a grapheme segmenter such as Intl.Segmenter.
  4. Encoding bugs are almost always a three-way mismatch. The file is saved one way, the Content-Type header or meta charset declares another, and your code reads it as a third. Spot which step disagrees and the mojibake usually clears up.
  5. For IDNs, decide on a canonical form and store everything that way. Punycode is great for databases and DNS lookups, while Unicode is friendlier in user-facing displays, so pick one for storage and convert at the edges.
  6. Emoji art varies wildly across platforms. Apple, Google, Microsoft, and Twitter all draw the same codepoint differently, so any test that depends on a specific rendering is fragile. The codepoint itself is what's portable.
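For the BOM tip, stripping the mark at the boundary can look roughly like this. Both helper names are hypothetical; one works on text that has already been decoded, the other on raw UTF-8 bytes, where the BOM appears as EF BB BF.

```javascript
// Remove a leading U+FEFF from an already-decoded string.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Remove a leading UTF-8 BOM (EF BB BF) from raw bytes.
function stripBomBytes(bytes) {
  const hasBom =
    bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf;
  return hasBom ? bytes.subarray(3) : bytes;
}

const header = stripBom("\uFEFFname,age").split(",")[0];
console.log(header === "name"); // true
```

Without the strip, the first header would be "\uFEFFname", which looks identical on screen but fails an equality check against "name".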

Frequently Asked Questions

Unicode is the standard that gives every character in every writing system (Latin, Cyrillic, Chinese, Arabic, math symbols, emoji) its own unique number called a codepoint. The standard currently defines more than 149,000 characters. UTF-8 is the most common encoding and uses one to four bytes per codepoint, while UTF-16 is what JavaScript uses internally.
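To make the "one to four bytes" rule concrete, here is a hand-rolled sketch of the UTF-8 bit layout. In practice you would use `TextEncoder`; the function name `encodeUtf8Codepoint` is invented for this example and exists only to show where the byte counts come from.

```javascript
// Encode a single codepoint as UTF-8 bytes by following the standard bit layout.
function encodeUtf8Codepoint(cp) {
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError("not a Unicode scalar value");
  }
  if (cp < 0x80) return [cp];                                   // 1 byte: 0xxxxxxx
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 2 bytes: 110xxxxx 10xxxxxx
  if (cp < 0x10000) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

const toHex = (bytes) => bytes.map((b) => b.toString(16).toUpperCase()).join(" ");
console.log(toHex(encodeUtf8Codepoint(0x4e2d)));  // E4 B8 AD
console.log(toHex(encodeUtf8Codepoint(0x1f600))); // F0 9F 98 80
```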