
Characters, Bytes, and Code Points: Why String Length Is Never Simple
Pop quiz. What does "hello".length return in JavaScript? Five, obviously. Now what does "cafe\u0301".length return? If you said 5, you're right. The string looks like "cafe" with an accent on the e, rendering as "café". But its length is 5, not 4, because the accent is a separate combining character. And "caf\u00e9".length returns 4, even though it looks identical on screen. Two strings that look the same, render the same, and compare as equal in some contexts have different lengths. Welcome to Unicode. This is why building a character counter -- the kind you'd use for checking tweet length or meta description limits -- is surprisingly non-trivial once you step outside ASCII.

Characters vs. code points vs. grapheme clusters

The word "character" is ambiguous in computing. There are at least three things it can mean:

Code units are the individual values in a string's underlying encoding. In JavaScript (UTF-16), each code unit is 16 bits. String.length returns the number of UTF-16 code units.
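
To make the distinction concrete, here is a minimal sketch of the three counts side by side, assuming a modern JavaScript engine (Intl.Segmenter requires Node 16+ or a recent browser):

```js
// A minimal sketch of the three ways to count "characters" in JavaScript.
// Assumes a modern engine; Intl.Segmenter needs Node 16+ or a recent browser.

const precomposed = "caf\u00e9";  // 'é' as one code point (U+00E9)
const combining   = "cafe\u0301"; // 'e' followed by a combining acute accent (U+0301)

// .length counts UTF-16 code units
console.log(precomposed.length); // 4
console.log(combining.length);   // 5

// The two strings render identically but only compare equal after normalization
console.log(precomposed === combining);                  // false
console.log(precomposed === combining.normalize("NFC")); // true

// Spreading a string iterates code points, not code units
const thumbsUp = "\u{1F44D}";      // one code point, two code units (a surrogate pair)
console.log(thumbsUp.length);      // 2
console.log([...thumbsUp].length); // 1

// Intl.Segmenter counts grapheme clusters -- what users perceive as characters
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const countGraphemes = (s) => [...segmenter.segment(s)].length;
console.log(countGraphemes(combining)); // 4
console.log(countGraphemes("\u{1F469}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}")); // 1 (family emoji: 7 code points, 1 cluster)
```

Which count matters depends on the limit you're enforcing: code units reflect UTF-16 storage, code points are what many APIs iterate over, and grapheme clusters are closest to what a user would call "a character".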




