Implementing Merkle Trees: A Comparative Guide in TypeScript vs. Rust — Part I
Rout Chaitanya4 min read·Just now--
This is a multi part series on Merkle Trees. A Merkle Tree, also known as a binary hash tree, is a fundamental data structure in cryptography and computer science used to efficiently verify the integrity and consistency of large sets of data.
The Problem
Imagine you’re sending a massive, 1,000-page digital manuscript to an editor. You want to make sure that not a single comma was changed during the transfer.
The Magic of Hashing
In a perfect world, you’d read both copies side-by-side, but that’s exhausting. Instead, you use Hashing. Think of a hash function as a high-tech blender: you throw in your 1,000 pages, and it spits out a unique, fixed-length string of characters — a “digital fingerprint.”
If even one tiny letter is changed in that manuscript, the blender will spit out a completely different fingerprint. By comparing your fingerprint with the editor’s, you can verify the data’s integrity in seconds. If the fingerprints match, the data is identical.
The “Big Data” Problem
This works great for one file. But what if you aren’t sending one manuscript? What if you’re a bank sending one million individual transaction records every hour?
If the editor finds a fingerprint mismatch, they have to tell you, “Something is wrong somewhere in these million records.” To find the error, you’d have to re-send every single record or hash them one by one all over again. It’s slow, it’s expensive, and it wastes massive amounts of bandwidth.
Enter the Merkle Tree
This is where the Merkle Tree saves the day. Instead of tossing the whole pile into one blender, you hash each transaction individually. Then, you pair those hashes up and hash them together. You keep pairing the results until you are left with just one “Master Hash” at the top (the Merkle Root).
The beauty of this structure is its surgical precision. If one record is tampered with, it only changes the hashes in its direct path to the top. This allows the receiver to say, “I see the Master Hash is wrong, and I can trace the ‘broken’ branch down the tree to find the exact record that failed.”
The Search Process: A Binary Investigation
Imagine you have a tree with 8 pieces of data at the bottom (the leaves), and the Master Hash (the Root) tells you there is an error. Here is how you find it:
- Start at the Top: You compare the Root Hash you have with the Root Hash the sender provided. They don’t match.
2. Check the Next Level Down: The Root was created by hashing two “Branch Hashes” together (Left Branch and Right Branch). You check these two. If the Left Branch hash matches the sender’s record but the Right Branch doesn’t, you know with 100% certainty that the error is hidden somewhere in the Right half of the data. You can now ignore the entire Left half of the dataset.
3. Repeat the Split: You move down to the “children” of that broken Right Branch. Again, you check the two hashes that formed it.
4. Pinpoint the Leaf: You keep following the “mismatched” hashes down the levels.
Why this is fast: The “Power of 2”
Because the tree splits in two at every level, the number of steps you have to take is incredibly small compared to the size of the data. This is known as Logarithmic complexity, or O(log n).
If you have 1000 transactions, you only need about 10 checks to find the broken one.
If you have 1M transactions, you only need about 20 checks.
Real life scenarios
1. Blockchain Light Clients
You want to know if your specific transaction (Transaction A) is actually in a block, but you don’t want to download 5,000 other transactions. You ask a Full Node (a server that stores everything). You say: “Hey, is Transaction A in Block #800,000?” The Full Node doesn’t send the whole tree. It sends you a Merkle Proof — a tiny list of sibling hashes. You hash your transaction, then hash it with those siblings. If the result matches the Merkle Root (which you already have from the trusted block header), you are done.
Wait, what if the Full Node lies? They can’t. If they give you a fake proof or a fake transaction, your final calculation won’t match the Merkle Root. The integrity being solved here is Authentication: “Is this specific piece of data part of the set I trust?”
2. P2P Databases
BitTorrent or a distributed database like Cassandra. You are downloading a 10GB file. The file is split into 1,000 chunks. You finish the download, check the Root Hash, and it doesn’t match. Something is corrupted. Instead of re-downloading 10 GB, you “trace the broken branch”: You ask a peer for the two hashes just below the root (Left Child and Right Child). You calculate the hashes of your local data for the left half and right half. Your left half matches, but your right half doesn’t. Bingo!! The error is in the right 5GB. You repeat this for the children of the Right Child. Within 10 “rounds” of asking for hashes (2^¹⁰ = 1024), you find the exact 10MB chunk that is corrupted. The integrity being solved here is Efficiency: “I know my data is broken; help me find exactly which 1% is wrong so I don’t have to fix the other 99%.”
Stay tuned for the next article, we will do a simple implementation of merkle trees, proof generation and validation in both rust and typescript. Please reach out, I’m happy to help always.