Comparing Two YAML Files: A Guide to YAML Diffing
A line-by-line text diff lies about YAML. Here's why, what a structure-aware diff shows instead, and how to compare two YAML documents reliably.
Why a text diff is the wrong tool
The diff in Git, your editor, or the diff command compares text line by line. YAML is unusually flexible about how the same data can be written, so a text diff lights up with changes that aren't really changes:
- Mapping keys can be reordered freely — a mapping is unordered — yet a text diff flags every reorder.
- The same value can be written as a plain scalar, quoted, or in flow style ([a, b] vs a block list), all identical to a parser but different as text.
- Comments and blank lines aren't data, but a text diff treats them as content.
- Re-indenting from 2 to 4 spaces, or otherwise normalizing indentation, changes every indented line textually while changing nothing semantically.
The result is a noisy diff that buries the one real change among a dozen cosmetic ones.
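To make the point concrete, here is a small sketch that assumes PyYAML is installed. The two documents are made up for illustration: they differ in key order, indentation, quoting, list style, and a comment, yet they load to the same data.

```python
import yaml

# Two spellings of the same data: keys reordered, a comment added,
# 4-space indent, a quoted string, and a flow-style list.
text_a = """\
name: demo
tags:
  - a
  - b
server:
  host: localhost
"""

text_b = """\
# cosmetic rewrite of the same config
server:
    host: "localhost"
tags: [a, b]
name: demo
"""

data_a = yaml.safe_load(text_a)
data_b = yaml.safe_load(text_b)

print(text_a == text_b)  # False: the text differs
print(data_a == data_b)  # True: the parsed data is identical
```

A line-based diff of text_a and text_b would flag nearly every line, even though nothing a program reading the config cares about has changed.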
What a structure-aware diff does instead
A proper YAML diff parses both documents into data — maps, sequences, and scalars — and compares the data, not the text. Because it works on structure, it normalizes away everything that doesn't matter and reports only genuine differences:
- A key that exists on one side but not the other (added or removed).
- A value that changed — and, importantly, whether its type changed (a string that became a number).
- List items that were added, removed, or reordered (sequence order is significant).
Consider these two configs. A text diff shows several changed lines; a structure-aware diff reports exactly one change — the port.
```yaml
server:
  host: localhost
  port: 8080
debug: true
```

```yaml
# app config
debug: true
server:
  port: 9090
  host: localhost
```
The reordered keys, the added comment, and the rearranged block are correctly ignored; only port: 8080 → 9090 is a real difference.
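As a rough sketch of how such a comparison can work, the function below walks two already-parsed documents and records added, removed, changed, and type-changed values. The name diff_data and the tuple format are hypothetical, chosen for this illustration. Lists are compared position by position, which is a simplification: an item inserted at the front of a list would show up as several changes.

```python
def diff_data(old, new, path=""):
    """Return a list of (kind, path, old_value, new_value) tuples."""
    changes = []
    if isinstance(old, dict) and isinstance(new, dict):
        # Mappings are unordered: visit the union of keys, ignoring order.
        for key in sorted(old.keys() | new.keys(), key=str):
            child = f"{path}.{key}" if path else str(key)
            if key not in new:
                changes.append(("removed", child, old[key], None))
            elif key not in old:
                changes.append(("added", child, None, new[key]))
            else:
                changes.extend(diff_data(old[key], new[key], child))
    elif isinstance(old, list) and isinstance(new, list):
        # Sequence order is significant, so compare position by position.
        for i in range(max(len(old), len(new))):
            child = f"{path}[{i}]"
            if i >= len(new):
                changes.append(("removed", child, old[i], None))
            elif i >= len(old):
                changes.append(("added", child, None, new[i]))
            else:
                changes.extend(diff_data(old[i], new[i], child))
    elif type(old) is not type(new):
        changes.append(("type_changed", path, old, new))
    elif old != new:
        changes.append(("changed", path, old, new))
    return changes


# The two configs from the example above, as a YAML parser would load them.
old = {"server": {"host": "localhost", "port": 8080}, "debug": True}
new = {"debug": True, "server": {"port": 9090, "host": "localhost"}}

for change in diff_data(old, new):
    print(change)  # ('changed', 'server.port', 8080, 9090)
```

The comment never reaches the function because the parser has already discarded it, and the key order is irrelevant because dictionaries are matched by key.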
A pragmatic pre-step: normalize, then diff
If you only have a text diff, you can get much of the benefit by canonicalizing both files first. Parse each and re-emit it with sorted keys and a fixed indent (for example, load with PyYAML and dump with sort_keys=True), then run the text diff on the normalized output. That removes key-order, indentation, and quoting noise, and it drops comments too, leaving only real changes to keys and values. Converting both to JSON and diffing that is another quick way to strip YAML's surface variability (see the YAML-to-JSON guide).
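The normalize-then-diff step might look like the following sketch, which assumes PyYAML for parsing and uses the standard library's difflib for the text diff. The helper name canonicalize and the inline sample documents are illustrative; in practice you would read the two files from disk.

```python
import difflib
import yaml


def canonicalize(text):
    """Parse YAML text and re-emit it with sorted keys and block style."""
    data = yaml.safe_load(text)
    return yaml.safe_dump(data, sort_keys=True, default_flow_style=False, indent=2)


old_text = """\
server:
  host: localhost
  port: 8080
debug: true
"""

new_text = """\
# app config
debug: true
server:
    port: 9090
    host: "localhost"
"""

# Diff the canonical forms rather than the raw text.
diff = list(difflib.unified_diff(
    canonicalize(old_text).splitlines(),
    canonicalize(new_text).splitlines(),
    fromfile="old.yaml", tofile="new.yaml", lineterm="",
))
print("\n".join(diff))
```

After canonicalization, the only lines marked as removed or added are the two port lines. The trade-off is that the diff refers to the normalized text, so its line numbers no longer match the original files.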
Diffing YAML programmatically
In Python, parse both files and compare the resulting structures, or use a library like deepdiff for a detailed, type-aware report:
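```python
import yaml
from deepdiff import DeepDiff

# Parse both files into plain Python dicts, lists, and scalars.
with open("old.yaml") as f:
    a = yaml.safe_load(f)
with open("new.yaml") as f:
    b = yaml.safe_load(f)

print(DeepDiff(a, b))  # added, removed, and changed values with paths
```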
Because safe_load produces plain dicts and lists, this comparison is naturally key-order-insensitive and reports type changes, which is exactly the structure-aware behavior you want. Comments and formatting are discarded at parse time; to keep them available for comparison, parse with ruamel.yaml's round-trip loader instead.
Compare two YAML files online
Paste both documents and get a structure-aware comparison — added, removed, and changed values only, with key order and formatting normalized. Runs in your browser.
Open YAML Diff →
Frequently Asked Questions
Why not just use a normal text diff for YAML?
A text diff compares lines, so reindenting, reordering map keys, adding comments, or rewriting a value in flow style all appear as differences even when the parsed data is identical. A structure-aware YAML diff parses both documents and compares the data, reporting only real changes.
Does the order of keys matter when comparing YAML?
For mapping keys, no — a YAML mapping is an unordered set of pairs, so reordering keys doesn't change the data and a correct diff ignores it. For sequence (list) items, order is significant, so a reordered list is a real change.
Do comments show up in a YAML diff?
It depends on the tool. Comments aren't part of the parsed data, so a structure-aware diff comparing parsed values ignores comment-only changes. To track comments, run a text diff on the original files, or round-trip them through a comment-preserving parser like ruamel.yaml before diffing; canonicalizing with a plain parser such as PyYAML discards them.
How can I compare YAML when I only have a text diff?
Canonicalize both files first — parse and re-emit each with sorted keys and a fixed indent, or convert both to JSON — then run the text diff on the normalized output. That removes key-order and formatting noise so only value changes remain.