What does the Unicode converter do?

It converts between text and several Unicode representations: codepoints like U+1F600, HTML entities like &#128512;, JavaScript escapes like \u{1F600}, UTF-8 hex bytes (F0 9F 98 80), URL-encoded percent forms (%F0%9F%98%80), and more. These conversions help with debugging Unicode issues, teaching character encoding, and embedding special characters in source code that doesn't accept them directly.

What's the difference between codepoint and UTF-8 bytes?

A codepoint is the abstract Unicode number assigned to a character — U+1F600 for 😀, for example. The UTF-8 bytes are how that codepoint is actually serialized for storage or transmission, which for that emoji is the four-byte sequence F0 9F 98 80. One codepoint corresponds to anywhere from one to four UTF-8 bytes, and seeing both representations side by side makes the storage cost obvious.
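To see where those four bytes come from, here is a minimal TypeScript sketch of the UTF-8 bit layout, assuming any ES2017+ runtime. The names utf8Bytes and toHex are illustrative, not part of any library. In practice TextEncoder does this job, and the two should always agree.

```typescript
// Encode one Unicode scalar value as UTF-8 by hand, using the bit layouts
// from RFC 3629. Shown for illustration; TextEncoder does the same job.
function utf8Bytes(cp: number): number[] {
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError(`Not a Unicode scalar value: ${cp}`);
  }
  if (cp <= 0x7f) return [cp]; // 0xxxxxxx
  if (cp <= 0x7ff) {
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 110xxxxx 10xxxxxx
  }
  if (cp <= 0xffff) {
    // 1110xxxx 10xxxxxx 10xxxxxx
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Format bytes as space-separated uppercase hex.
const toHex = (bytes: number[]): string =>
  bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(toHex(utf8Bytes(0x1f600))); // F0 9F 98 80
console.log(toHex(utf8Bytes(0x4e2d))); // E4 B8 AD
```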

What's a surrogate pair?

JavaScript stores strings as UTF-16, and any codepoint above U+FFFF — most emoji, plenty of historical scripts — requires two UTF-16 code units to represent. Those two paired units are called a surrogate pair. That's why 😀 (U+1F600) returns 2 from String.length even though it's a single Unicode character, and why character counting in JavaScript needs care.
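The split follows a fixed formula: subtract 0x10000, then place the top 10 bits of the remaining 20-bit value in a high surrogate (base 0xD800) and the bottom 10 bits in a low surrogate (base 0xDC00). A minimal TypeScript sketch; the function names are illustrative.

```typescript
// Split a codepoint above U+FFFF into a UTF-16 surrogate pair.
function toSurrogatePair(cp: number): [number, number] {
  if (cp < 0x10000 || cp > 0x10ffff) {
    throw new RangeError("Only U+10000..U+10FFFF need a surrogate pair");
  }
  const offset = cp - 0x10000; // a 20-bit value
  const high = 0xd800 + (offset >> 10); // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

// Combine a high and low surrogate back into one codepoint.
function fromSurrogatePair(high: number, low: number): number {
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}

const [hi, lo] = toSurrogatePair(0x1f600);
console.log(hi.toString(16), lo.toString(16)); // d83d de00
```

The pair matches what the engine stores: String.fromCodePoint(0x1F600).charCodeAt(0) and charCodeAt(1) return the same two values.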

How do I escape Unicode in code?

Each language has its own form. Modern JavaScript supports \u{1F600}, or you can write the surrogate pair \uD83D\uDE00 the old way. Python 3 uses \U0001F600, with an uppercase U and exactly eight hex digits (lowercase \u with four digits only reaches the BMP). Java string literals only have the four-digit \uXXXX form, so characters above U+FFFF must be written as the surrogate pair \uD83D\uDE00. The converter shows the right escape format for the language you're targeting.
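A sketch of how these per-language escapes can be generated from one codepoint. The escapeFor function and Target type are hypothetical names for illustration, not the converter's actual code.

```typescript
type Target = "javascript" | "python" | "java";

// Uppercase hex, left-padded to the requested width.
const hex = (n: number, width: number): string =>
  n.toString(16).toUpperCase().padStart(width, "0");

// Build a string-literal escape for one codepoint in the target language.
function escapeFor(cp: number, target: Target): string {
  if (target === "javascript") {
    return `\\u{${hex(cp, 1)}}`; // ES2015+ codepoint escape
  }
  if (target === "python") {
    // \uXXXX covers the BMP; \UXXXXXXXX (eight digits) covers everything.
    return cp <= 0xffff ? `\\u${hex(cp, 4)}` : `\\U${hex(cp, 8)}`;
  }
  // Java: string literals only have \uXXXX, so characters above U+FFFF
  // are written as a UTF-16 surrogate pair.
  if (cp <= 0xffff) return `\\u${hex(cp, 4)}`;
  const offset = cp - 0x10000;
  return `\\u${hex(0xd800 + (offset >> 10), 4)}\\u${hex(0xdc00 + (offset & 0x3ff), 4)}`;
}

for (const target of ["javascript", "python", "java"] as const) {
  console.log(target, escapeFor(0x1f600, target));
}
// javascript \u{1F600}
// python \U0001F600
// java \uD83D\uDE00
```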

Why does my emoji look different on different platforms?

Every platform draws emoji in its own style. Apple, Google, Microsoft, Twitter (Twemoji), and Facebook each maintain separate sets, so the same codepoint renders as visually different art on each. The codepoint itself never changes — only the picture each system paints to represent it.

What's the largest Unicode codepoint?

The maximum is U+10FFFF, which gives 1,114,112 possible codepoints (17 planes of 65,536 each). More than 149,000 of them are assigned to characters spanning the world's writing systems, emoji, and mathematical symbols, while separate ranges are set aside for private use and for UTF-16 surrogates. Most of the code space is still unassigned, so there is plenty of room for future additions.
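The ceiling comes from UTF-16's design: the BMP is addressed directly, and surrogate pairs carry 10 bits each, reaching 2^20 further codepoints. The arithmetic, including the 2,048 surrogate codepoints that can never stand for characters on their own:

```latex
\[
\begin{aligned}
\mathtt{0x10FFFF} + 1 &= 2^{16} + 2^{20} = 17 \times 2^{16} = 1{,}114{,}112 \\
1{,}114{,}112 - \underbrace{2{,}048}_{\text{U+D800 to U+DFFF}} &= 1{,}112{,}064 \text{ Unicode scalar values}
\end{aligned}
\]
```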

Is the data sent anywhere?

No. The conversion runs entirely in your browser, which is what you want for sensitive text or proprietary content you'd rather not hand to a server.

Unicode Converter

Convert between Unicode escape sequences and readable text online. Free Unicode converter supporting UTF-8, UTF-16, and code points.

About Unicode Converter

Convert between text and Unicode code points. Supports multiple output formats including hex (U+XXXX), HTML entities, CSS escape sequences, and JavaScript notation.

How to Use Unicode Converter

Enter text or codepoint

Type a character to see its Unicode codepoint, or paste a codepoint like U+1F600 to see the matching character. The conversion runs in both directions from the same input field.
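A rough sketch of bidirectional conversion from a single input field, assuming the codepoint side uses U+XXXX notation separated by spaces. convertInput is a hypothetical name, and a real tool likely accepts more notations than this.

```typescript
// If the input looks like U+XXXX codepoints, turn them into text;
// otherwise list the codepoints of the text. Hypothetical helper.
function convertInput(input: string): string {
  const trimmed = input.trim();
  const codepointPattern = /^(U\+[0-9A-F]{4,6})(\s+U\+[0-9A-F]{4,6})*$/i;
  if (codepointPattern.test(trimmed)) {
    // Codepoints -> text. fromCodePoint throws RangeError above U+10FFFF.
    return trimmed
      .split(/\s+/)
      .map((token) => String.fromCodePoint(parseInt(token.slice(2), 16)))
      .join("");
  }
  // Text -> codepoints, iterating by codepoint rather than UTF-16 unit.
  return Array.from(input)
    .map((ch) => "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"))
    .join(" ");
}

console.log(convertInput("U+1F600")); // 😀
console.log(convertInput("Hi")); // U+0048 U+0069
```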

View all representations

You'll see the codepoint, the UTF-8 hex bytes, the JavaScript escape \u{...}, the HTML entity &#NNN;, the URL-encoded percent form %XX, and several other formats side by side.
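A minimal sketch of producing several of these formats for one character with standard web APIs (TextEncoder, encodeURIComponent). The field names and the exact set of formats are assumptions; the converter's own list may differ.

```typescript
interface Representations {
  codepoint: string;
  utf8: string;
  javascript: string;
  htmlDecimal: string;
  htmlHex: string;
  css: string;
  url: string;
}

// Build several representations of the first codepoint of `ch`.
// Assumes well-formed input (no lone surrogates).
function representations(ch: string): Representations {
  const cp = ch.codePointAt(0);
  if (cp === undefined) throw new Error("empty input");
  const char = String.fromCodePoint(cp);
  const hexCp = cp.toString(16).toUpperCase();
  const bytes = Array.from(new TextEncoder().encode(char));
  return {
    codepoint: "U+" + hexCp.padStart(4, "0"),
    utf8: bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" "),
    javascript: `\\u{${hexCp}}`,
    htmlDecimal: `&#${cp};`,
    htmlHex: `&#x${hexCp};`,
    css: `\\${hexCp}`, // add a space after it if a hex digit follows
    url: encodeURIComponent(char), // percent-encodes the UTF-8 bytes
  };
}

console.log(representations("\u{1F600}"));
```

The CSS form needs care: a CSS escape absorbs up to six hex digits plus one following space, so when the next character is itself a hex digit, a space must separate them.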

Copy the format you need

Pick the right format for your destination. Source code wants the language-specific escape, HTML templates want the entity form, and URLs need percent encoding. The converter covers each common case so you don't have to translate by hand.

Inspect emoji structure

Compound emoji like 👨‍👩‍👧‍👦 decompose into multiple codepoints linked by zero-width joiners. The converter shows the full breakdown, which is the easiest way to see why these characters take so much storage and behave oddly with naive length functions.
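A small sketch that lists each codepoint in the family sequence. The string is built from escapes so the joiners are visible in source; breakdown is an illustrative name.

```typescript
const ZWJ = 0x200d; // ZERO WIDTH JOINER

// One line per codepoint, with joiners labeled explicitly.
function breakdown(text: string): string[] {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0)!;
    const label = "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
    return cp === ZWJ ? `${label} (ZERO WIDTH JOINER)` : `${label} ${ch}`;
  });
}

// man + ZWJ + woman + ZWJ + girl + ZWJ + boy
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(breakdown(family).join("\n"));
console.log(breakdown(family).length); // 7 codepoints
console.log(family.length); // 11 UTF-16 code units
console.log(new TextEncoder().encode(family).length); // 25 UTF-8 bytes
```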

When to Use Unicode Converter

Embedding special characters in code

Some editors and build pipelines still choke on raw Unicode in source files. Escape sequences sidestep the problem (\u{1F600} for JavaScript, \U0001F600 for Python, and so on), letting you keep emoji, symbols, and translated strings in plain ASCII source while the runtime turns them back into the real characters.
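One way to produce such ASCII-only strings is to rewrite every non-ASCII codepoint as a \u{...} escape. asciiSafeJs is a hypothetical helper, and this sketch assumes the result goes inside a string literal whose quotes and backslashes are already handled.

```typescript
// Replace each non-ASCII codepoint with a JavaScript \u{...} escape.
// The /u flag makes the regex match whole codepoints, so an emoji becomes
// one escape instead of two surrogate halves.
function asciiSafeJs(text: string): string {
  return text.replace(/[^\x00-\x7F]/gu, (ch) =>
    `\\u{${ch.codePointAt(0)!.toString(16).toUpperCase()}}`
  );
}

console.log(asciiSafeJs("Caf\u00E9 \u{1F600}")); // Caf\u{E9} \u{1F600}
```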

Debugging encoding issues

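When a curly apostrophe (’, U+2019) shows up as â€™, the bytes are correct but the program reading them picked the wrong encoding: the three UTF-8 bytes E2 80 99 were decoded one byte at a time as Windows-1252 (often loosely called Latin-1), yielding â, €, and ™. Looking at the raw bytes and codepoints exposes that mismatch quickly and usually points to UTF-8 being misread as a single-byte legacy encoding somewhere along the path.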
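The failure is easy to reproduce: encode text as UTF-8, then decode the bytes as Windows-1252. This sketch assumes a runtime whose TextDecoder supports windows-1252, as browsers do and as Node.js does in its default full-ICU builds. The WHATWG Encoding Standard also maps the latin1 and iso-8859-1 labels to windows-1252, which is one reason the two names get mixed up.

```typescript
// it’s, with U+2019 RIGHT SINGLE QUOTATION MARK
const original = "it\u2019s";
const bytes = new TextEncoder().encode(original);

// Misread the UTF-8 bytes as a single-byte legacy encoding.
const misread = new TextDecoder("windows-1252").decode(bytes);

console.log(Array.from(bytes, (b) => b.toString(16).toUpperCase()).join(" "));
// 69 74 E2 80 99 73
console.log(misread); // itâ€™s
```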

Embedding emoji in HTML

Modern HTML accepts Unicode directly, but some templating layers and legacy systems mangle anything outside ASCII. Falling back to numeric character references like &#128512; for 😀 routes around those filters and keeps your output readable across environments.
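A sketch of both directions for numeric character references, assuming only non-ASCII text needs converting; toHtmlEntities and fromHtmlEntities are hypothetical names. Characters such as <, >, and & still need ordinary HTML escaping, and named entities like &eacute; are not decoded here.

```typescript
// Encode: each non-ASCII codepoint becomes a decimal reference (&#NNN;).
function toHtmlEntities(text: string): string {
  return text.replace(/[^\x00-\x7F]/gu, (ch) => `&#${ch.codePointAt(0)};`);
}

// Decode decimal (&#NNN;) and hex (&#xHHH;) references back to characters.
// String.fromCodePoint throws RangeError for values above 0x10FFFF.
function fromHtmlEntities(html: string): string {
  return html.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (_, num: string) => {
    const isHex = num[0] === "x" || num[0] === "X";
    return String.fromCodePoint(isHex ? parseInt(num.slice(1), 16) : parseInt(num, 10));
  });
}

console.log(toHtmlEntities("Hi \u{1F600}")); // Hi &#128512;
console.log(fromHtmlEntities("&#x1F600; &#128512;")); // 😀 😀
```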

Working with internationalized domain names

Domains containing non-ASCII characters travel through DNS in Punycode form, so café.com becomes xn--caf-dma.com on the wire. Translating between the two views matters whenever you're configuring an IDN, validating user input, or auditing how a domain renders in different clients.
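In JavaScript, the WHATWG URL parser (available in browsers and Node.js) already performs this conversion when it parses a hostname, so a sketch can lean on it. toAsciiHostname is an illustrative wrapper.

```typescript
// Convert a (possibly non-ASCII) domain to its ASCII/Punycode form by
// letting the URL parser normalize the hostname. Throws TypeError if the
// domain is not a valid host.
function toAsciiHostname(domain: string): string {
  return new URL(`https://${domain}/`).hostname;
}

console.log(toAsciiHostname("caf\u00E9.com")); // xn--caf-dma.com
console.log(toAsciiHostname("example.com")); // example.com
```

The parser also lowercases and maps the hostname per UTS #46, so CAFÉ.com ends up as the same xn--caf-dma.com. For the reverse direction, Node.js offers url.domainToUnicode() in the node:url module; browsers have no standard equivalent, so a library is typically needed there.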

Unicode Converter Examples

Emoji to codepoint

Input

😀 (smiling face)

Output

Codepoint: U+1F600
UTF-8: F0 9F 98 80 (4 bytes)
JavaScript: \u{1F600}
HTML: &#128512;

One emoji, four equivalent identifiers. Reach for the codepoint when writing documentation, the UTF-8 bytes when you're estimating storage, the JavaScript escape when patching source files, and the HTML entity when templating engines won't pass raw Unicode through.

Surrogate pair issue

Input

JS String.length of '😀'

Output

String.length = 2 (not 1!)
Reason: surrogate pair

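JavaScript measures strings in UTF-16 code units, and any codepoint above U+FFFF needs two of them, a surrogate pair. That's why most emoji return a length of 2 instead of 1. Array.from(str).length counts codepoints instead, which fixes single emoji like this one; sequences joined with zero-width joiners still count as several codepoints, so counting what users perceive as one character requires grapheme segmentation (Intl.Segmenter).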
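The three counts can be compared side by side. This sketch assumes an ES2022 runtime with Intl.Segmenter (modern browsers, Node.js 16+); lengths is a hypothetical helper name.

```typescript
// Three notions of "length" for the same string.
function lengths(text: string) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    utf16Units: text.length, // what String.length reports
    codepoints: Array.from(text).length, // iterates by codepoint
    graphemes: Array.from(segmenter.segment(text)).length, // user-perceived characters
  };
}

console.log(lengths("\u{1F600}"));
// { utf16Units: 2, codepoints: 1, graphemes: 1 }
console.log(lengths("\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}"));
// { utf16Units: 11, codepoints: 7, graphemes: 1 }
```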

Chinese character

Input

中 (Chinese 'middle')

Output

Codepoint: U+4E2D
UTF-8: E4 B8 AD (3 bytes)
UTF-16: 4E 2D (2 bytes)

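This character sits in the BMP (below U+10000), so UTF-16 stores it in a single two-byte code unit, while UTF-8 needs three bytes, as it does for every codepoint from U+0800 to U+FFFF (ASCII takes one byte, and U+0080 to U+07FF take two). Because JavaScript strings are UTF-16, String.length reports the expected 1 here.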
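A quick way to check these sizes in JavaScript: TextEncoder gives the UTF-8 byte count, and String.length times 2 gives the UTF-16 byte count (without a BOM). sizes is an illustrative helper.

```typescript
// Storage cost of one character in UTF-8 versus UTF-16 (no BOM).
function sizes(ch: string): { utf8: number; utf16: number } {
  return {
    utf8: new TextEncoder().encode(ch).length, // UTF-8 bytes
    utf16: ch.length * 2, // each UTF-16 code unit is 2 bytes
  };
}

for (const ch of ["A", "\u00E9", "\u4E2D", "\u{1F600}"]) {
  const cp = "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0");
  const { utf8, utf16 } = sizes(ch);
  console.log(`${cp}  UTF-8: ${utf8}  UTF-16: ${utf16}`);
}
// U+0041  UTF-8: 1  UTF-16: 2
// U+00E9  UTF-8: 2  UTF-16: 2
// U+4E2D  UTF-8: 3  UTF-16: 2
// U+1F600  UTF-8: 4  UTF-16: 4
```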

Tips & Best Practices for Unicode Converter

1. Treat UTF-8 as the default for any new file or API you build. It's the web standard, it handles every codepoint, and it sidesteps the parade of incompatibilities that Latin-1 and other legacy encodings keep alive.
2. Excel and Word sometimes save text and CSV files with a leading byte-order mark (U+FEFF, stored as EF BB BF) even though UTF-8 doesn't require one. Strip that BOM at the boundary; otherwise the first column header in your import quietly carries a phantom character.
3. In JavaScript, Array.from(str).length counts codepoints, while plain String.length counts UTF-16 code units, so emoji and other non-BMP characters report inflated lengths. Neither matches what users see for zero-width-joiner sequences such as family emoji; to count visible characters, count grapheme clusters with Intl.Segmenter.
4. Encoding bugs are almost always a three-way mismatch: the file is saved one way, the Content-Type header or meta charset declares another, and your code reads it as a third. Find the step that disagrees and the mojibake usually clears up.
5. For IDNs, decide on a canonical form and store everything that way. Punycode is convenient for databases and DNS lookups, while Unicode is friendlier in user-facing displays; pick one for storage and convert at the edges.
6. Emoji art varies widely across platforms. Apple, Google, Microsoft, and Twitter all draw the same codepoint differently, so any test that depends on a specific rendering is fragile. The codepoint itself is what's portable.
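For the BOM tip, a minimal sketch of stripping the mark either from raw bytes (EF BB BF) or from text that has already been decoded (U+FEFF). Both helper names are hypothetical. TextDecoder already drops a leading BOM by default (its ignoreBOM option defaults to false), so these helpers matter mainly when some other layer decoded the text first.

```typescript
// Remove a UTF-8 byte-order mark (EF BB BF) from the start of raw bytes.
function stripBomBytes(bytes: Uint8Array): Uint8Array {
  const hasBom =
    bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf;
  return hasBom ? bytes.subarray(3) : bytes;
}

// Remove a decoded BOM (U+FEFF) from the start of a string.
function stripBomText(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

const header = "\uFEFFname,age";
console.log(header.split(",")[0] === "name"); // false: phantom U+FEFF
console.log(stripBomText(header).split(",")[0] === "name"); // true
```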

Frequently Asked Questions

What is Unicode?

Unicode is the standard that gives every character in every writing system (Latin, Cyrillic, Chinese, Arabic, math symbols, emoji) its own unique number called a codepoint. The standard currently defines more than 149,000 characters. UTF-8 is the most common encoding and uses one to four bytes per codepoint, while JavaScript strings are sequences of UTF-16 code units.

