Base64 encoding explained: what it is, how it works, and when to use it

Base64 turns arbitrary binary data into plain text. Here's the 3-bytes-to-4-characters trick behind it, the 33% overhead, data URIs, URL-safe variants, and why it is not encryption.

By Muhammad Tahir · 7 min read · dev, encoding, explainer

Base64 is one of those things you use constantly without ever stopping to look at. It's in your email attachments, your JSON Web Tokens, your data-URI images, your SSH keys. And almost everyone who uses it has, at some point, mistaken it for encryption. It isn't. Here's what Base64 actually is, the surprisingly tidy math that makes it work, and when reaching for it is the right call.

The problem Base64 solves

Computers store everything as bytes — sequences of 8 bits, each byte a number from 0 to 255. An image, a ZIP file, a font: all just byte sequences. The trouble is that a lot of the systems we move data through were designed for text, not arbitrary bytes.

Email is the classic example. The original SMTP specification assumed messages were 7-bit ASCII text. Bytes above 127, or control characters like a null byte (0) or a line-feed in the wrong place, could get mangled, stripped, or interpreted as protocol commands. Send a raw JPEG through a text-only channel and it arrives corrupted.

Base64 is the fix. It takes any binary data and re-expresses it using only 64 safe, printable characters: the uppercase letters A–Z, the lowercase letters a–z, the digits 0–9, and two symbols, + and /. Every one of these survives transit through systems that only expect text. The data is now "binary-safe": you can paste it into an email body, embed it in JSON, or drop it into an XML attribute, and it comes out the other side unchanged. (URLs are a partial exception, because + and / have special meanings there; the URL-safe variant covered below exists for exactly that reason.)

The cost is size — Base64 output is bigger than the input. We'll get to exactly how much bigger, because it falls straight out of how the encoding works.

How the 3-to-4 mapping works

Here's the core idea. Base64 chews through your data 3 bytes at a time and emits 4 characters for each group.

Why 3 and 4? It's about bits. Three bytes is 3 × 8 = 24 bits. Each Base64 character represents exactly 6 bits (because 2⁶ = 64, and we have 64 characters). And 24 bits divides evenly into four 6-bit chunks: 4 × 6 = 24. So three input bytes map cleanly onto four output characters. No remainder, no waste — when the input length is a multiple of three.

Let's encode the word Cat by hand. It's three ASCII characters, which is perfect — exactly one group.

First, the ASCII byte values:

C = 67
a = 97
t = 116

Now write each as 8 bits:

C = 01000011
a = 01100001
t = 01110100

Concatenate all 24 bits into one stream:

01000011 01100001 01110100

Now regroup the same bits into chunks of 6 instead of 8:

010000 110110 000101 110100

Convert each 6-bit chunk back to a decimal number (0–63):

010000 = 16
110110 = 54
000101 = 5
110100 = 52

Finally, look each number up in the Base64 alphabet. The alphabet is indexed A=0, B=1, … Z=25, a=26, … z=51, 0=52, … 9=61, +=62, /=63. So:

16 = Q
54 = 2
5  = F
52 = 0

Putting it together, Cat encodes to Q2F0. Three bytes in, four characters out. You can verify this against the Base64 Encoder — type Cat and you'll get Q2F0 back.
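The same bit regrouping can be written in a few lines of code. Here is a minimal Python sketch that handles exactly one 3-byte group: it packs the bytes into a 24-bit integer, then reads it back out 6 bits at a time. The names `ALPHABET`, `six_bit_indices`, and `encode_group` are illustrative, not from any library; only `base64.b64encode` comes from Python's standard library, and it is used purely as a cross-check.

```python
import base64

# Standard Base64 alphabet, indexed A=0 ... Z=25, a=26 ... z=51, 0=52 ... 9=61, +=62, /=63.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def six_bit_indices(group: bytes) -> list[int]:
    """Split one 3-byte group into four 6-bit numbers (0-63)."""
    if len(group) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Concatenate the 24 bits into one integer; the first byte is most significant.
    n = (group[0] << 16) | (group[1] << 8) | group[2]
    # Read the same 24 bits back out 6 at a time, from the left.
    return [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]

def encode_group(group: bytes) -> str:
    """Look each 6-bit index up in the Base64 alphabet."""
    return "".join(ALPHABET[i] for i in six_bit_indices(group))

print(six_bit_indices(b"Cat"))   # [16, 54, 5, 52]
print(encode_group(b"Cat"))      # Q2F0
print(base64.b64encode(b"Cat"))  # b'Q2F0'
```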

What about padding?

That clean 3-to-4 split only happens when your input length is a multiple of 3. Most data isn't. So Base64 has a rule for the leftovers, and that rule is the = character.

  • If 1 byte is left over (8 bits), it is filled out with zero bits to 12 bits, which encode as 2 characters, followed by ==.
  • If 2 bytes are left over (16 bits), they are filled out with zero bits to 18 bits, which encode as 3 characters, followed by a single =.

For example, the single letter C (one byte) encodes to Qw==. The pair Ca encodes to Q2E=. The = characters aren't data — they're a signal to the decoder about how many real bytes the final group represents, so it can discard the bits that were padding. That's why Base64 strings so often end in one or two equals signs.
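Putting the grouping and the padding rule together gives a complete, if slow, encoder. This is a hypothetical teaching sketch (`b64_encode` is an illustrative name) that assumes standard Base64 with = padding and no line wrapping: it zero-fills the last group, keeps one more character than there are real bytes, and replaces the rest with =. For real work, Python's `base64.b64encode` does the same job.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_encode(data: bytes) -> str:
    """Minimal standard Base64 encoder with = padding (no line wrapping)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        real = len(chunk)                      # 1, 2, or 3 real bytes in this group
        padded = chunk + b"\x00" * (3 - real)  # fill the group out with zero bits
        n = (padded[0] << 16) | (padded[1] << 8) | padded[2]
        chars = [ALPHABET[(n >> s) & 0x3F] for s in (18, 12, 6, 0)]
        # 1 real byte -> 2 meaningful chars, 2 -> 3, 3 -> 4; the rest become "=".
        keep = real + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

for sample in (b"C", b"Ca", b"Cat"):
    print(sample, b64_encode(sample))
# b'C' Qw==
# b'Ca' Q2E=
# b'Cat' Q2F0
```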

The 33% size overhead

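Now the math makes the overhead obvious. Every 3 bytes become 4 characters, and in ASCII or UTF-8 each of those characters takes one byte. So 3 bytes of input become 4 bytes of output: a ratio of 4/3, roughly 1.33. Base64 inflates your data by about 33%. (Precisely, n input bytes produce 4 × ceil(n/3) characters once padding is included, so very short inputs fare even worse: a single byte becomes four characters.)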

A 900 KB image becomes about 1.2 MB once Base64-encoded. This is the single most important practical fact about Base64: it is not free. People sometimes reach for it as if it were a neutral transformation, then wonder why their JSON payloads ballooned. If you're moving large binaries and the channel actually supports binary, send the raw bytes — don't pay the 33% tax for nothing.
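The size rule is easy to check empirically. This sketch assumes padded output with no line breaks (MIME email additionally wraps Base64 at 76 characters per line, which adds a little more); `encoded_length` is an illustrative helper, compared against the standard library's actual output on random input.

```python
import base64
import os

def encoded_length(n: int) -> int:
    """Padded Base64 length for n input bytes, with no line breaks."""
    # Every started 3-byte group costs 4 output characters, padding included.
    return 4 * ((n + 2) // 3)

for n in (1, 2, 3, 100, 900_000):
    actual = len(base64.b64encode(os.urandom(n)))
    print(f"{n:>7} bytes -> {actual:>9} chars (predicted {encoded_length(n)}, {actual / n:.3f}x)")
# For 900,000 input bytes both numbers are 1,200,000: a ratio of exactly 4/3.
```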

Data URIs: embedding images directly in code

One genuinely useful place that 33% is worth paying is the data URI. A data URI lets you embed a file's contents directly inside HTML or CSS, no separate network request needed. The format looks like this:

data:[<mime-type>];base64,<base64-data>

A tiny red dot PNG might look like:

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAAB...">

This is handy for small assets — icons, a 1×1 tracking pixel, a small logo in an email template — because it removes an HTTP round trip and keeps everything self-contained. The tradeoffs: the 33% size increase, and the fact that data URIs can't be cached separately by the browser the way a real image file can. Use them for small, rarely-changing assets, not for your hero image. If you want to generate one from a picture, the Image to Base64 tool produces a ready-to-paste data URI right in your browser.
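Building a data URI is just string formatting around the encoded bytes. In this sketch `to_data_uri` and `from_data_uri` are hypothetical helpers, and the sample input is only the 8-byte PNG file signature rather than a complete image (which is why every PNG data URI, including the example above, starts with iVBORw0KGgo). In practice you would read the real file's bytes with `open(path, "rb")`.

```python
import base64

def to_data_uri(data: bytes, mime_type: str) -> str:
    """Build a data URI of the form data:[<mime-type>];base64,<base64-data>."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

def from_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI back into its MIME type and raw bytes."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)

png_signature = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file, not a full image
uri = to_data_uri(png_signature, "image/png")
print(uri)                 # data:image/png;base64,iVBORw0KGgo=
print(from_data_uri(uri))  # ('image/png', b'\x89PNG\r\n\x1a\n')
```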

URL-safe Base64

Remember those two symbol characters, + and /? They cause trouble in URLs. In a URL, / is a path separator, and in form-encoded query strings + is decoded as a space (while = separates a parameter name from its value). So standard Base64 can break if you drop it into a query string or a path segment.

The fix is a small variant called URL-safe Base64 (specified in RFC 4648, which names it "base64url"). It's identical to regular Base64 except:

  • + is replaced with - (hyphen)
  • / is replaced with _ (underscore)
  • The trailing = padding is often omitted entirely, since the length can be inferred
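Python's standard library exposes both alphabets, which makes the difference easy to see. The two input bytes below are an arbitrary choice for illustration, picked so that their 6-bit chunks land on indices 62 and 63. Note that `base64.urlsafe_b64encode` only swaps the two symbols and keeps the = padding, so dropping the padding is a separate step.

```python
import base64

raw = b"\xfb\xff"  # bits 111110 111111 1111(00) -> indices 62, 63, 60

standard = base64.b64encode(raw).decode("ascii")          # uses + and /
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")  # swaps in - and _
unpadded = url_safe.rstrip("=")                           # JWT-style: drop the padding

print(standard)  # +/8=
print(url_safe)  # -_8=
print(unpadded)  # -_8
```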

This is the variant you'll see in JSON Web Tokens (JWTs). Crack open a JWT and you'll notice that its three dot-separated sections may contain - and _ but never +, /, or =. That's URL-safe Base64 with the padding stripped, so the token can travel safely in an Authorization header or a URL.
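Because JWT segments drop their padding, decoding one by hand usually means restoring the = characters first. This is a minimal sketch with illustrative helper names (`b64url_encode`, `b64url_decode`, `peek_jwt`) and a made-up token assembled in place. It only decodes the header and payload and does not verify the signature, which a real JWT library must do before trusting anything inside.

```python
import base64
import json

def b64url_encode(data: bytes) -> str:
    """URL-safe Base64 with the trailing = padding stripped, JWT style."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    """Decode unpadded URL-safe Base64 by restoring = to a multiple of 4 chars."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def peek_jwt(token: str) -> tuple[dict, dict]:
    """Decode a JWT's header and payload. Does NOT verify the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))

# Hypothetical token built here; the third segment is a placeholder, not a real signature.
token = ".".join([
    b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8")),
    b64url_encode(json.dumps({"sub": "1234567890"}).encode("utf-8")),
    "placeholder-signature",
])

print(token)            # contains no +, / or =
print(peek_jwt(token))  # ({'alg': 'HS256', 'typ': 'JWT'}, {'sub': '1234567890'})
```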

Base64 is not encryption, and not compression

This is the part worth tattooing somewhere. Base64 is an encoding, not encryption.

Encoding means transforming data into a different representation using a public, reversible scheme. There's no key, no secret. Anyone who sees Q2F0 can decode it back to Cat in seconds — the algorithm is open and tools to reverse it are everywhere, including the encoder linked above. So Base64 provides zero confidentiality. If you "hid" a password by Base64-encoding it, you hid nothing. It's the equivalent of writing a secret in a different alphabet that everyone can read.
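To make the point concrete: the "hidden" password below (a hypothetical value) comes straight back with a single standard-library call and no key of any kind.

```python
import base64

# "Hiding" a password by Base64-encoding it (hypothetical value, for illustration).
hidden = base64.b64encode(b"hunter2").decode("ascii")
print(hidden)  # aHVudGVyMg==

# Anyone can reverse it: no key, no secret, just the public algorithm.
print(base64.b64decode(hidden))  # b'hunter2'
print(base64.b64decode("Q2F0"))  # b'Cat'
```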

It's also not compression. Compression makes data smaller by finding and removing redundancy. Base64 does the opposite — it makes data 33% larger. The two are sometimes combined (compress first, then Base64-encode the compressed bytes for safe transport), but Base64 itself never reduces size.
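The compress-then-encode pipeline looks like this with Python's `zlib` and `base64` modules. The sample input is deliberately repetitive, an assumption that flatters compression; on data that is already compressed (a JPEG, a ZIP) the first step saves little, while the Base64 step still adds its 33%. Either way, Base64 makes its own input bigger, never smaller.

```python
import base64
import zlib

original = b"Base64 is not compression. " * 200  # deliberately redundant sample

compressed = zlib.compress(original)   # step 1: remove redundancy (smaller)
packed = base64.b64encode(compressed)  # step 2: make it text-safe (about 4/3 the size of step 1)

print(len(original), len(compressed), len(packed))

# Undo the pipeline in reverse order: Base64-decode first, then decompress.
restored = zlib.decompress(base64.b64decode(packed))
assert restored == original
```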

So the mental model is clean:

  • Encryption = make data unreadable without a key (confidentiality).
  • Compression = make data smaller (efficiency).
  • Base64 = make binary data survive text-only channels (compatibility), at a 33% size cost.

When to actually use it

Reach for Base64 when you need to carry binary data through something that only speaks text:

  • Embedding small images or fonts in CSS, HTML, or email via data URIs.
  • Putting binary payloads inside JSON or XML, which have no native binary type.
  • Encoding tokens, signatures, or keys that need to live in headers or URLs (use the URL-safe variant).
  • Pasting a file's contents somewhere a file upload isn't available.
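For the JSON case, Python's `json.dumps` raises a `TypeError` on raw `bytes`, so the usual pattern is to store a Base64 string in the document and decode it on the other side. The field names `filename` and `data` below are hypothetical; JSON itself has no convention for them.

```python
import base64
import json

payload_bytes = bytes([0x00, 0xFF, 0x10, 0x80])  # arbitrary binary, including a null byte

# Encode the bytes to a Base64 string so the JSON encoder accepts them.
message = json.dumps({
    "filename": "blob.bin",
    "data": base64.b64encode(payload_bytes).decode("ascii"),
})
print(message)  # {"filename": "blob.bin", "data": "AP8QgA=="}

# Receiving side: parse the JSON, then Base64-decode the field back to bytes.
received = json.loads(message)
assert base64.b64decode(received["data"]) == payload_bytes
```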

Avoid it when the channel already supports binary (just send the raw bytes), when the data is large (the overhead adds up fast), or when you actually need secrecy (use real encryption).

Base64 is a small, elegant trick: 24 bits, regrouped into four 6-bit chunks, mapped onto 64 printable characters. Once you've encoded Cat to Q2F0 by hand, it stops being magic and becomes one of the most predictable tools in your kit.