Why bytes vs. characters?

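Many systems impose limits in bytes, not characters. Depending on the database and its settings, VARCHAR(100) can mean 100 bytes rather than 100 characters (for example, SQL Server's varchar, or Oracle's VARCHAR2 under its default byte semantics). A single SMS carries a 140-byte payload: 160 characters in 7-bit GSM-7, or 70 characters in UCS-2 when the message contains emoji or other characters outside the GSM-7 alphabet. HTTP headers, cookies, and file metadata all have byte limits. Byte counting catches these.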

How does the counter calculate UTF-8 bytes?

ASCII characters (a-z, 0-9, basic punctuation) are 1 byte each. Code points from U+0080 to U+07FF are 2 bytes; this range covers Latin-1 extensions (é, ñ, ü) as well as the basic Greek, Cyrillic, Hebrew, and Arabic letters. The rest of the BMP, including CJK, is 3 bytes. Supplementary plane characters (most emoji, ancient scripts) are 4 bytes. The counter uses new TextEncoder().encode(text).length for accurate UTF-8 byte counting.
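The ranges above can be sketched as a small function and checked against the platform encoder. This is a minimal illustration, not the counter's actual source; the helper names utf8Width, utf8BytesManual, and utf8Bytes are made up for the example.

```typescript
// Byte width of one Unicode code point in UTF-8, by range.
function utf8Width(codePoint: number): number {
  if (codePoint <= 0x7f) return 1;   // ASCII
  if (codePoint <= 0x7ff) return 2;  // Latin-1 extensions, basic Greek, Cyrillic, Hebrew, Arabic
  if (codePoint <= 0xffff) return 3; // rest of the BMP, including CJK
  return 4;                          // supplementary planes, including most emoji
}

// Manual count: sum the width of every code point.
function utf8BytesManual(text: string): number {
  let total = 0;
  for (const ch of Array.from(text)) {
    total += utf8Width(ch.codePointAt(0) ?? 0);
  }
  return total;
}

// The same number from the built-in encoder.
function utf8Bytes(text: string): number {
  return new TextEncoder().encode(text).length;
}

console.log(utf8Bytes("Hello, World!")); // 13
console.log(utf8Bytes("é"), utf8Bytes("Ж"), utf8Bytes("你"), utf8Bytes("😀")); // 2 2 3 4
```

In practice the TextEncoder version is the one to use; the manual version is only there to show where the numbers come from.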

What about UTF-16 vs UTF-8?

JavaScript exposes strings as sequences of UTF-16 code units (each character takes 2 or 4 bytes). UTF-8 is the web standard for transmission and storage. The counter's main figure is UTF-8 bytes, the most common limit context. For the UTF-16 byte count, count 2 bytes for each BMP character and 4 bytes for each supplementary plane character; in JavaScript this equals text.length * 2.
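As a rough sketch of the UTF-16 rule, and of the fixed-width UTF-32 case for comparison, the sizes can be derived from the string itself. The function names are illustrative, and the figures exclude any byte order mark.

```typescript
// UTF-16: String.length counts 2-byte code units, so a surrogate pair
// (one supplementary plane character) contributes 4 bytes.
function utf16Bytes(text: string): number {
  return text.length * 2;
}

// UTF-32: every code point takes 4 bytes.
function utf32Bytes(text: string): number {
  return Array.from(text).length * 4;
}

const sample = "a你😀";
console.log(utf16Bytes(sample)); // 8  (2 + 2 + 4)
console.log(utf32Bytes(sample)); // 12 (3 code points * 4)
```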

When do bytes matter most?

Database fields (column size limits in bytes), HTTP headers (typical 8KB limit), cookies (4KB per cookie), URL path/query (~2000 byte typical), email subject line (~78 character/byte recommendation), file format metadata (ID3 tags, XMP), embedded scripts, JSON Web Tokens (size affects everything that includes them).

Are emoji counted differently?

Yes. A simple emoji like 😀 is 4 bytes in UTF-8 (1 character). Compound emoji such as the family emoji 👨‍👩‍👧‍👦 take many more bytes (25 in this case) because they join several emoji with zero-width joiners (ZWJ). The byte counter reflects what storage will use. The character counter shows the visual character count, which is usually what users perceive.
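To see where the bytes in a compound emoji come from, the family emoji can be built from its parts. This is only an illustration of the arithmetic; the variable names are arbitrary.

```typescript
const ZWJ = 0x200d; // zero-width joiner, 3 bytes in UTF-8

// man + ZWJ + woman + ZWJ + girl + ZWJ + boy
const family = String.fromCodePoint(
  0x1f468, ZWJ, 0x1f469, ZWJ, 0x1f467, ZWJ, 0x1f466
);

const familyBytes = new TextEncoder().encode(family).length;
const familyCodePoints = Array.from(family).length;

console.log(familyBytes);      // 25 (4 emoji * 4 bytes + 3 joiners * 3 bytes)
console.log(familyCodePoints); // 7
console.log(family.length);    // 11 UTF-16 code units
```

None of these three numbers is the "1" a reader perceives; counting perceived characters requires grapheme segmentation (for example Intl.Segmenter where it is available).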

Why is my SMS message hitting limits early?

SMS uses GSM-7 encoding (7-bit), which allows 160 characters per message. Including ANY character outside the GSM-7 alphabet (emoji, em-dashes, accented letters such as á or ê that the alphabet lacks) switches the whole message to UCS-2 encoding (16-bit), reducing the limit to 70 characters per message. The byte counter helps explain why a message you thought was short is exceeding limits.
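A sketch of that switch, assuming the GSM 03.38 default alphabet and its basic extension table. Carriers and gateways may apply other tables, so treat the result as an estimate; estimateSms and SmsEstimate are hypothetical names.

```typescript
// GSM 03.38 default alphabet: one septet per character.
const GSM7_BASIC =
  "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡" +
  "ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà";
// Extension table: escape + character, so two septets each.
const GSM7_EXTENDED = "^{}\\[~]|€\f";

interface SmsEstimate {
  encoding: "GSM-7" | "UCS-2";
  units: number;       // septets for GSM-7, UTF-16 code units for UCS-2
  singleLimit: number; // units that fit in one unsegmented message
}

function estimateSms(text: string): SmsEstimate {
  let septets = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (GSM7_BASIC.indexOf(ch) !== -1) {
      septets += 1;
    } else if (GSM7_EXTENDED.indexOf(ch) !== -1) {
      septets += 2;
    } else {
      // One character outside GSM-7 forces the whole message into UCS-2.
      return { encoding: "UCS-2", units: text.length, singleLimit: 70 };
    }
  }
  return { encoding: "GSM-7", units: septets, singleLimit: 160 };
}

console.log(estimateSms("Café at 5?"));   // GSM-7, 10 units
console.log(estimateSms("See you 😀"));   // UCS-2, 10 units
```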

Is the data saved anywhere?

No. All counting happens in your browser. Your text never leaves your device — safe for any sensitive content.

Byte Counter

Calculate string byte size in UTF-8, UTF-16, and other encodings online. Free byte counter for measuring text payload and storage.

Encoding Notes:

UTF-8: 1-4 bytes per character, efficient for ASCII
UTF-16: 2 or 4 bytes per character, used by JavaScript/Java
UTF-32: Fixed 4 bytes per character, simple indexing
In UTF-8, emoji (usually 4 bytes) and CJK characters (3 bytes) use more bytes than ASCII (1 byte)

About Byte Counter

Calculate the byte size of text in different character encodings. Useful for understanding string storage requirements and optimizing data transmission.

How to Use Byte Counter

Paste your text

Paste or type the text you want to measure. Byte size updates instantly as you type, calculated as UTF-8 encoding.

Compare bytes vs characters

The counter shows both: 'characters' (visual count, what users see) and 'bytes' (storage/transmission size). In UTF-8, most single emoji are 4 bytes, and accented Latin characters are typically 2 bytes (3 when written as a base letter plus a combining accent).

Apply to your context

Use the byte count when working with byte-limited contexts: SMS, database VARCHAR limits, HTTP headers, cookies, JWT tokens, file metadata.

Optimize if needed

If exceeding limits, reduce by: removing emoji (most save 4 bytes each, ZWJ sequences far more), simplifying accented chars (Café → Cafe saves 1 byte), removing unnecessary spaces, using ASCII-only when possible.

When to Use Byte Counter

Database field size validation

VARCHAR(N) and TEXT fields have byte limits (not character limits) in some databases and configurations. PostgreSQL and MySQL count varchar(255) in characters, but a byte-based column such as Oracle's VARCHAR2(255 BYTE) or SQL Server's varchar(255) holds 255 bytes. In UTF-8 that is only 85 Chinese characters, or 63 four-byte emoji. Before saving multi-language or emoji-containing text, verify it fits in the allocated byte space.
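A minimal sketch of such a check, plus a truncation that never cuts a code point in half. The helper names and the 10-byte limit are illustrative only; real column limits come from your schema.

```typescript
const utf8 = new TextEncoder();

// True when the UTF-8 encoding of `text` fits in `maxBytes`.
function fitsInBytes(text: string, maxBytes: number): boolean {
  return utf8.encode(text).length <= maxBytes;
}

// Longest prefix of `text` that fits in `maxBytes` without splitting a code point.
function truncateToBytes(text: string, maxBytes: number): string {
  let used = 0;
  let result = "";
  for (const ch of Array.from(text)) {
    const size = utf8.encode(ch).length;
    if (used + size > maxBytes) break;
    used += size;
    result += ch;
  }
  return result;
}

console.log(fitsInBytes("你好世界", 10));     // false (12 bytes)
console.log(truncateToBytes("你好世界", 10)); // "你好世" (9 bytes)
```

The truncation works on code points, so it can still separate the parts of a ZWJ emoji sequence or a base letter from its combining accent.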

HTTP header and cookie size

Servers commonly cap total request header size at around 8 KB, and browsers limit cookies to about 4 KB each. When storing user preferences, session data, or auth tokens in cookies, calculating the exact byte size prevents unexpected request failures or cookie rejections by browsers.
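One way to check a cookie before setting it is to measure the name, value, and attributes together. The 4096-byte budget below is an assumption based on the common browser limit; cookieBytes and cookieFits are hypothetical helpers.

```typescript
// Assumed budget: 4096 bytes for name, value, and attributes combined.
const COOKIE_BUDGET_BYTES = 4096;

// UTF-8 size of the cookie string "name=value; attributes".
function cookieBytes(cookieName: string, value: string, attributes = ""): number {
  const pair = `${cookieName}=${value}`;
  const full = attributes ? `${pair}; ${attributes}` : pair;
  return new TextEncoder().encode(full).length;
}

function cookieFits(cookieName: string, value: string, attributes = ""): boolean {
  return cookieBytes(cookieName, value, attributes) <= COOKIE_BUDGET_BYTES;
}

console.log(cookieBytes("theme", "dark", "Path=/; Secure")); // 26
console.log(cookieFits("session", "x".repeat(5000)));        // false
```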

SMS encoding optimization

GSM-7 encoding (basic Latin alphabet plus some symbols and accented letters) gives 160 chars per SMS: 160 × 7 bits = 140 bytes. UCS-2 encoding (used when the message contains any character outside GSM-7, such as emoji) gives only 70 chars in the same 140 bytes. The byte counter helps optimize SMS content to stay in GSM-7 when possible, which fits more text into each message.

JWT token size tracking

JWTs used as bearer tokens are sent with every request in the Authorization header, so each byte adds to request size. Tokens with many claims, custom data, or namespace-prefixed claims grow quickly. Tracking byte size helps balance feature richness vs. request overhead.
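A back-of-the-envelope estimate of token size, assuming the compact header.payload.signature form with unpadded base64url parts and a 32-byte HMAC-SHA256 signature. The function names are illustrative, and other algorithms produce longer signatures.

```typescript
// Length of unpadded base64url output for `byteCount` input bytes.
function base64UrlLength(byteCount: number): number {
  return Math.ceil((byteCount * 4) / 3);
}

// Estimated length of a compact JWT: header.payload.signature
function estimateJwtLength(
  header: object,
  payload: object,
  signatureBytes = 32 // assumption: HS256
): number {
  const enc = new TextEncoder();
  const headerBytes = enc.encode(JSON.stringify(header)).length;
  const payloadBytes = enc.encode(JSON.stringify(payload)).length;
  return (
    base64UrlLength(headerBytes) + 1 +
    base64UrlLength(payloadBytes) + 1 +
    base64UrlLength(signatureBytes)
  );
}

const tokenLength = estimateJwtLength(
  { alg: "HS256", typ: "JWT" },
  { sub: "1234567890", name: "John Doe", iat: 1516239022 }
);
console.log(tokenLength); // 155
```

Because base64url output is ASCII, the character length of the token is also its byte length in UTF-8. Every 3 bytes of claims add 4 bytes to the token.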

Byte Counter Examples

ASCII text

Input

Hello, World!

Output

Bytes: 13
Characters: 13

Simple ASCII text: each character is 1 byte in UTF-8. Bytes equal characters for plain English text. Standard messaging, code identifiers, URL paths typically use only ASCII.

Accented characters

Input

Café résumé

Output

Bytes: 14
Characters: 11

Café and résumé contain three é characters in total, each 2 bytes in UTF-8. So 11 visible characters require 14 bytes. Important when storing in fields with byte limits or transmitting via byte-limited channels.

Emoji and CJK

Input

Hello 👋 你好

Output

Bytes: 17
Characters: 10

Emoji 👋 is 4 bytes; Chinese characters 你好 are 3 bytes each; the remaining 7 ASCII characters (including the two spaces) are 1 byte each. The 10-character message requires 17 bytes total. Critical for SMS encoding decisions, database field sizing, and any byte-limited context.

Tips & Best Practices for Byte Counter

1. Use byte count, not character count, when limits are specified in bytes. Cookies, HTTP headers, many file format metadata fields, and some database columns all use byte limits.
2. For SMS, stay within the GSM-7 alphabet when possible. A message with ANY non-GSM character (emoji, many accented letters) switches to UCS-2, which cuts the single-message limit from 160 to 70 characters.
3. Watch for character normalization differences. 'é' precomposed (U+00E9) is 1 code point (2 bytes in UTF-8); 'é' decomposed (e + combining acute U+0301) is 2 code points (3 bytes in UTF-8). The two are visually identical but differ in bytes.
4. ZWJ (zero-width joiner) emoji like 👨‍👩‍👧‍👦 (family) compose multiple emoji with joiners, reaching 25 bytes or more. They appear as single 'characters' visually but are surprisingly large in storage.
5. When designing database schemas with byte-based limits, account for the UTF-8 worst case: 3 bytes per BMP character and 4 bytes for emoji and other supplementary plane characters. A 100-byte column holds about 33 Chinese characters but 100 ASCII characters. Plan for the worst case.
6. For cookies/headers approaching limits, consider compressing values. Gzipped, Base64-encoded JSON can be smaller than raw JSON for large, repetitive values, but Base64 adds about 33% and gzip adds fixed overhead, so short values may grow. Measure before adopting it.
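The normalization tip can be checked directly with String.prototype.normalize. This is a small sketch, and the variable names are arbitrary.

```typescript
function byteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

const composed = "\u00E9";    // é as a single code point
const decomposed = "e\u0301"; // e followed by a combining acute accent

console.log(byteLength(composed));                     // 2
console.log(byteLength(decomposed));                   // 3
console.log(composed === decomposed);                  // false
console.log(byteLength(decomposed.normalize("NFC")));  // 2
console.log(composed === decomposed.normalize("NFC")); // true
```

Normalizing input to NFC before measuring or storing it keeps byte counts consistent for text that looks the same.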

Frequently Asked Questions

Byte Counter counts the byte size of text: how many bytes the text occupies when stored as UTF-8 (the standard web encoding). This differs from character count: 'café' is 4 characters but 5 bytes (the é is 2 bytes in UTF-8). Critical when working with byte-limited contexts: HTTP headers, file size limits, byte-based database column limits.
