How I Built a Distributed Payment System from Scratch (And What It Taught Me)
Abubakkar Siddhiq
An honest account of building something I had no business building, and finishing it anyway.
Why I Built This
I’ve always been fascinated by how money moves on the internet.
When you tap “Pay” on GPay, something happens in milliseconds. Money leaves your account, reaches someone else’s, a notification fires, a fraud check runs. All simultaneously, all reliably, even when things go wrong. I never really understood how.
So I decided to stop wondering and start building.
This isn’t a tutorial. It’s the story of building a distributed payment processing system, the kind of infrastructure that sits underneath apps like Razorpay or PayPal, and the hard lessons I learned along the way.
What I Set Out to Build
Not a UPI clone. Not a wallet app. The engine underneath.
A system that could:
- Move money reliably between accounts
- Never process the same payment twice
- Detect and respond to fraud in real time
- Survive crashes without losing a single transaction
Six microservices. Five Kafka topics. Three databases. One month.
The Architecture (And Why I Made These Choices)
Why microservices?
Honestly? I could have built this as a monolith. For a personal project, a monolith would have been faster.
But I wanted to understand something specific: what happens when services need to communicate without calling each other directly? What happens when one service goes down? How do you guarantee a payment notification fires even if the notification service was offline for an hour?
These are questions you can only answer by building distributed systems.
The Services (What Each One Actually Does)
Most microservices tutorials show you five boxes in a diagram and call it architecture.
Here’s what each service actually does and why it exists as its own thing.
API Gateway — the front door. Every request hits here first. JWT
validated, rate limited, role checked, then routed. Internal services
never see an unauthenticated request. They trust the X-User-Id header
the Gateway injects because only the Gateway can set it.
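A minimal sketch of that header rule, using only the JDK rather than a real gateway framework. The class name, the method name, and the plain `Map` representation of headers are assumptions made for illustration; `verifiedUserId` is assumed to come from a JWT that has already been validated.

```java
import java.util.Map;
import java.util.TreeMap;

class GatewayHeaderSketch {
    static final String USER_HEADER = "X-User-Id";

    // Builds the header set forwarded to internal services.
    // `verifiedUserId` is assumed to come from an already-validated JWT.
    static Map<String, String> forwardHeaders(Map<String, String> incoming, String verifiedUserId) {
        // HTTP header names are case-insensitive, so compare them that way.
        Map<String, String> outgoing = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        outgoing.putAll(incoming);
        outgoing.remove(USER_HEADER);              // discard anything the client supplied
        outgoing.put(USER_HEADER, verifiedUserId); // inject the verified identity
        return outgoing;
    }

    public static void main(String[] args) {
        // A client trying to impersonate another user with a lowercase header name.
        Map<String, String> spoofed = Map.of("x-user-id", "someone-else", "Accept", "application/json");
        System.out.println(forwardHeaders(spoofed, "user-7"));
    }
}
```

The rule only holds if internal services are unreachable except through the Gateway. Otherwise anyone who can reach them directly can set the header themselves.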
Auth Service — issues identity. Register, login, JWT. That’s it.
When a user registers, it publishes a `user-registered` event to Kafka.
Payment Service picks that up and creates an account automatically.
No HTTP call between services. No coupling. Auth Service doesn’t know
Payment Service exists.
Payment Service — the brain. Every interesting problem lives here.
Double-entry ledger, idempotency, pessimistic locking, outbox pattern,
saga compensation. If this service gets something wrong, money is lost
or duplicated. There’s no margin for error here.
Fraud Service — the watchdog. Consumes every payment event and runs
it through a rules engine. High-value transaction? Flag it. Round number?
Flag it. Self-transfer? Flag it. Flagged transactions go into a review
queue, where admins approve or reject them. Rejection triggers a compensation
event back to Payment Service: reverse the transaction, freeze the account.
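The three rules named above can be sketched as a small pure function. The event fields, the rule names, and both thresholds (50000 for "high value", multiples of 1000 for "round number") are assumptions chosen for illustration, not values from the project.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical event shape; field names are assumptions, not the project's schema.
record PaymentEvent(String fromAccount, String toAccount, BigDecimal amount) {}

class FraudRulesSketch {
    // Assumed thresholds, chosen only for illustration.
    static final BigDecimal HIGH_VALUE = new BigDecimal("50000");
    static final BigDecimal ROUND_STEP = new BigDecimal("1000");

    // Returns the name of every rule the payment trips; an empty list means "not flagged".
    static List<String> evaluate(PaymentEvent event) {
        List<String> flags = new ArrayList<>();
        if (event.amount().compareTo(HIGH_VALUE) >= 0) {
            flags.add("HIGH_VALUE");
        }
        if (event.amount().remainder(ROUND_STEP).signum() == 0) {
            flags.add("ROUND_NUMBER");
        }
        if (event.fromAccount().equals(event.toAccount())) {
            flags.add("SELF_TRANSFER");
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(new PaymentEvent("ACC-1", "ACC-2", new BigDecimal("60000"))));
        System.out.println(evaluate(new PaymentEvent("ACC-1", "ACC-2", new BigDecimal("249.50"))));
    }
}
```

Keeping each rule as an independent check means a transaction can carry several flags into the review queue, which gives the reviewing admin more context than a single yes/no.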
Notification Service — the simplest service and the most
underappreciated. One job: consume payment events and notify the user.
It’s separate from Payment Service for one reason. If your email
provider is having a bad day, payments should still work. Decoupling
failure domains is the point.
Why Kafka over RabbitMQ?
Early on I asked myself this question. Both are message brokers. Both could work.
The difference came down to one scenario: what if my Fraud Service goes down for 30 minutes? With a RabbitMQ fanout exchange and no durable queue bound for Fraud Service, those payment events are gone. With Kafka’s message retention, Fraud Service resumes from its last committed offset on restart. Every event processed. Nothing lost.
For a payment system, losing events isn’t an option. Kafka won.
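A toy model of why retention helps, assuming the outage is shorter than the topic's retention period. This is not the Kafka client API: the class and its methods are hypothetical stand-ins for an append-only log and a consumer's committed offset.

```java
import java.util.ArrayList;
import java.util.List;

class RetainedLogSketch {
    private final List<String> log = new ArrayList<>(); // reading never removes an event

    void publish(String event) {
        log.add(event);
    }

    // A consumer asks for everything at or after its last committed offset.
    List<String> readFrom(int committedOffset) {
        return List.copyOf(log.subList(committedOffset, log.size()));
    }

    public static void main(String[] args) {
        RetainedLogSketch topic = new RetainedLogSketch();
        topic.publish("payment-1");
        int fraudOffset = 1;        // the consumer processed payment-1, then went down
        topic.publish("payment-2"); // published during the outage
        topic.publish("payment-3");
        System.out.println(topic.readFrom(fraudOffset)); // the two events it missed
    }
}
```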
The Hard Problems (This Is The Good Part)
Problem 1: What if the same payment request is sent twice?
Network timeouts are real. A client sends a payment request, the network drops, the client retries. Without protection you charge someone twice.
The solution is idempotency. Every payment request carries a client-generated key. Before processing, I check Redis:
- Key exists with SUCCESS? Return same response. Don’t process again.
- Key exists with PROCESSING? Another request is in flight. Return 409.
- Key doesn’t exist? Process it. Store the key as PROCESSING, then update it to SUCCESS or FAILED.
Simple concept. Surprisingly tricky to implement correctly. The atomic setIfAbsent operation in Redis was the key. Without it, two concurrent requests could both pass the check simultaneously.
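A minimal sketch of the check, with a `ConcurrentHashMap` standing in for Redis so it runs without a server: `putIfAbsent` plays the role of `setIfAbsent`. The class, the `Result` type, the single in-flight state named PROCESSING, and the 422 returned for a previously failed key are all assumptions for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class IdempotencySketch {
    enum Status { PROCESSING, SUCCESS, FAILED }
    record Entry(Status status, String response) {}
    record Result(int httpStatus, String body) {}

    // In-memory stand-in for Redis; putIfAbsent is atomic, like setIfAbsent.
    private final ConcurrentHashMap<String, Entry> store = new ConcurrentHashMap<>();

    Result handle(String idempotencyKey, Supplier<String> processPayment) {
        // Claim the key before doing any work. Exactly one caller gets null back.
        Entry existing = store.putIfAbsent(idempotencyKey, new Entry(Status.PROCESSING, null));
        if (existing != null) {
            return switch (existing.status()) {
                case SUCCESS -> new Result(200, existing.response()); // replay the stored response
                case PROCESSING -> new Result(409, "request already in flight");
                case FAILED -> new Result(422, "previous attempt failed"); // assumed policy
            };
        }
        try {
            String response = processPayment.get();
            store.put(idempotencyKey, new Entry(Status.SUCCESS, response));
            return new Result(200, response);
        } catch (RuntimeException ex) {
            store.put(idempotencyKey, new Entry(Status.FAILED, null));
            throw ex;
        }
    }

    public static void main(String[] args) {
        IdempotencySketch service = new IdempotencySketch();
        System.out.println(service.handle("key-1", () -> "txn-1")); // processed
        System.out.println(service.handle("key-1", () -> "txn-2")); // replayed, supplier not run
    }
}
```

A real Redis entry would also carry a TTL, so that a key left in PROCESSING by a crashed request does not block retries forever.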
Problem 2: What if two payments hit the same account at the same time?
Imagine User A has ₹1000. Two payment requests arrive simultaneously. Both want to deduct ₹800. Both read the balance as ₹1000. Both see enough funds. Both process. User A’s balance goes to -₹600.
Pessimistic locking fixed this. When I fetch an account for payment processing, I lock that DB row. The second request waits. When the first commits, the second reads the updated balance, ₹200, and correctly fails.
One annotation. @Lock(LockModeType.PESSIMISTIC_WRITE). Weeks of potential bugs prevented.
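The effect of the row lock can be sketched without a database. Here a `ReentrantLock` stands in for the lock the annotation asks the database to take (typically a `SELECT ... FOR UPDATE`); the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class LockedAccountSketch {
    private final ReentrantLock rowLock = new ReentrantLock(); // stand-in for the DB row lock
    private BigDecimal balance;

    LockedAccountSketch(BigDecimal openingBalance) {
        this.balance = openingBalance;
    }

    // Read, check, and write all happen while holding the lock, so a concurrent
    // caller always sees the balance left behind by the previous debit.
    boolean debit(BigDecimal amount) {
        rowLock.lock();
        try {
            if (balance.compareTo(amount) < 0) {
                return false; // insufficient funds
            }
            balance = balance.subtract(amount);
            return true;
        } finally {
            rowLock.unlock();
        }
    }

    BigDecimal balance() {
        rowLock.lock();
        try {
            return balance;
        } finally {
            rowLock.unlock();
        }
    }

    // Fires two debits at once and reports how many succeeded.
    static int raceTwoDebits(LockedAccountSketch account, BigDecimal amount) {
        AtomicInteger successes = new AtomicInteger();
        Runnable attempt = () -> { if (account.debit(amount)) successes.incrementAndGet(); };
        Thread a = new Thread(attempt);
        Thread b = new Thread(attempt);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return successes.get();
    }

    public static void main(String[] args) {
        LockedAccountSketch account = new LockedAccountSketch(new BigDecimal("1000"));
        int succeeded = raceTwoDebits(account, new BigDecimal("800"));
        System.out.println(succeeded + " debit(s) succeeded, balance " + account.balance());
    }
}
```

The cost is throughput: every payment touching the same account is serialized, which is the intended behaviour here.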
Problem 3: What if my service crashes after saving the transaction but before publishing to Kafka?
This one kept me up at night.
Payment saved to DB. Then crash. Kafka event never fires. Fraud Service and Notification Service never know the payment happened. Silent data loss.
The Outbox Pattern solved this. Instead of publishing to Kafka directly, I write the event to an outbox table in the same database transaction. A separate scheduler reads unpublished events and publishes them. Since the DB write and the outbox write happen in the same transaction, either both succeed or neither does.
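The scheduler handles Kafka. If Kafka is down, the scheduler retries. No events lost. Ever. The trade-off is at-least-once delivery: an event can go out twice if the scheduler crashes after publishing but before marking the row as sent, so consumers have to tolerate duplicates.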
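An in-memory sketch of the two halves: the write path that stores the payment and its outbox row together, and the relay that the scheduler would run. The lists stand in for database tables, the `synchronized` method stands in for a DB transaction, and `send` stands in for the Kafka producer; every name here is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class OutboxSketch {
    record OutboxRow(long id, String payload, boolean published) {}

    private final List<String> transactions = new ArrayList<>(); // stand-in for the payments table
    private final List<OutboxRow> outbox = new ArrayList<>();    // stand-in for the outbox table
    private long nextId = 1;

    // Stand-in for one DB transaction: both rows are written together.
    synchronized void savePayment(String transactionId) {
        transactions.add(transactionId);
        outbox.add(new OutboxRow(nextId++, "payment-completed:" + transactionId, false));
    }

    // One scheduler tick. `send` returns false when the broker is unavailable.
    // Returns how many rows were published on this tick.
    synchronized int relay(Predicate<String> send) {
        int sent = 0;
        for (int i = 0; i < outbox.size(); i++) {
            OutboxRow row = outbox.get(i);
            if (row.published()) {
                continue;
            }
            if (!send.test(row.payload())) {
                break; // stop here to preserve order; the next tick retries this row
            }
            outbox.set(i, new OutboxRow(row.id(), row.payload(), true));
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        OutboxSketch service = new OutboxSketch();
        service.savePayment("txn-1");
        System.out.println(service.relay(payload -> false)); // broker down: nothing sent
        System.out.println(service.relay(payload -> true));  // broker back: row published
    }
}
```

The gap between a successful send and marking the row as published is where a crash produces a duplicate, which is why consumers of these events should be idempotent.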
Problem 4: How do you reverse a payment after it’s already processed?
Fraud detection is async. By the time I detect a suspicious transaction, the money has already moved.
You can’t just delete the transaction. Financial systems never delete. Instead, I implemented the Saga pattern. When Fraud Service confirms fraud, it publishes a compensation event. Payment Service consumes it and creates a compensating transaction: credit the sender back, debit the receiver, freeze the account.
Two transactions. Books balance. Full audit trail. No data deleted.
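A minimal sketch of compensation on an append-only ledger. The entry fields, the `REV-` id prefix, and the duplicate-compensation guard are assumptions for illustration; account freezing is left out because the text does not say which account is frozen.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

class CompensationSketch {
    // One ledger line: money leaves debitAccount and arrives in creditAccount.
    // Rows are only ever appended, never updated or deleted.
    record LedgerEntry(String id, String debitAccount, String creditAccount,
                       BigDecimal amount, String reversalOf) {}

    private final List<LedgerEntry> ledger = new ArrayList<>();

    void transfer(String id, String sender, String receiver, BigDecimal amount) {
        ledger.add(new LedgerEntry(id, sender, receiver, amount, null));
    }

    // Appends a mirror-image entry that points back at the original.
    // Returns false if the original was already reversed (duplicate event).
    boolean compensate(String originalId) {
        boolean alreadyReversed = ledger.stream().anyMatch(e -> originalId.equals(e.reversalOf()));
        if (alreadyReversed) {
            return false;
        }
        LedgerEntry original = ledger.stream()
                .filter(e -> e.id().equals(originalId))
                .findFirst()
                .orElseThrow();
        ledger.add(new LedgerEntry("REV-" + originalId, original.creditAccount(),
                original.debitAccount(), original.amount(), originalId));
        return true;
    }

    // Net movement for an account, derived from the entries in this ledger.
    BigDecimal balanceOf(String account) {
        BigDecimal total = BigDecimal.ZERO;
        for (LedgerEntry e : ledger) {
            if (e.creditAccount().equals(account)) total = total.add(e.amount());
            if (e.debitAccount().equals(account)) total = total.subtract(e.amount());
        }
        return total;
    }

    int entryCount() {
        return ledger.size();
    }

    public static void main(String[] args) {
        CompensationSketch books = new CompensationSketch();
        books.transfer("T1", "sender", "receiver", new BigDecimal("500"));
        books.compensate("T1");
        System.out.println(books.balanceOf("receiver") + " across " + books.entryCount() + " entries");
    }
}
```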
What I Got Wrong (And Fixed)
Using float for money. Floating point arithmetic gives you things like 0.1 + 0.2 = 0.30000000000000004. In a payment system, that's unacceptable. Switched to BigDecimal everywhere. Lesson: never use floating point for money.
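A short demonstration of the difference, plus one `BigDecimal` trap worth knowing. The class and helper names are only for illustration.

```java
import java.math.BigDecimal;

class MoneySketch {
    static BigDecimal add(String a, String b) {
        // Construct from strings: new BigDecimal(0.1) would inherit the double's binary error.
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);           // 0.30000000000000004
        System.out.println(add("0.10", "0.20")); // 0.30

        // equals() also compares scale, so use compareTo() for numeric equality.
        System.out.println(new BigDecimal("0.30").equals(new BigDecimal("0.3")));         // false
        System.out.println(new BigDecimal("0.30").compareTo(new BigDecimal("0.3")) == 0); // true
    }
}
```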
Publishing directly to Kafka from within @Transactional. The Kafka send is not part of the DB transaction, so if the transaction rolls back, the event is already out. The Outbox Pattern fixed this, but I had to understand the problem first by making the mistake.
Checking idempotency after building the transaction object. The early version built the transaction, then checked idempotency, then processed. Wrong order. The idempotency check goes first, before any work is done.
Returning null instead of Optional. Classic Java mistake. Switched every repository method to return Optional and forced myself to handle the empty case explicitly.
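A small sketch of the difference it makes at the call site. The in-memory map stands in for a repository, and the `Account` record and method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

class AccountLookupSketch {
    record Account(String number, String owner) {}

    // Stand-in for a database table.
    private final Map<String, Account> rows = Map.of("ACC-1", new Account("ACC-1", "user-42"));

    // The return type tells callers that "not found" is a normal outcome.
    Optional<Account> findByNumber(String number) {
        return Optional.ofNullable(rows.get(number));
    }

    // The caller must decide what an absent account means; here it is an error.
    String ownerOf(String number) {
        return findByNumber(number)
                .map(Account::owner)
                .orElseThrow(() -> new IllegalArgumentException("unknown account: " + number));
    }

    public static void main(String[] args) {
        AccountLookupSketch repo = new AccountLookupSketch();
        System.out.println(repo.ownerOf("ACC-1"));
        System.out.println(repo.findByNumber("ACC-404").isPresent());
    }
}
```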
The Security Thinking
Building a payment system forces you to think about security differently.
Every design decision becomes a security question:
- Who owns this account? (Ownership validation on every write)
- Can someone fake the user identity header? (Strip untrusted headers at Gateway, inject verified ones)
- What if someone brute forces login? (Rate limiting: 5 requests/minute per IP)
- What if someone decodes the JWT and changes the user ID? (JWT signature validation: a token can’t be forged without the secret)
- Can someone see other users’ balances? (Account number abstraction: UUIDs stay internal)
Security isn’t a feature you add at the end. It’s a question you ask at every step.
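To make the JWT point concrete, here is a sketch of why editing the payload breaks the token. It assumes HS256 (HMAC-SHA256 with a shared secret), which the text does not confirm, and it checks the signature only. The class and method names are hypothetical, and a real service would use a JWT library that also validates expiry and the `alg` header.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class JwtSignatureSketch {
    static String base64Url(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // HMAC-SHA256 over "<header>.<payload>", which is what an HS256 signature covers.
    static String sign(String headerAndPayload, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return base64Url(mac.doFinal(headerAndPayload.getBytes(StandardCharsets.US_ASCII)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a token for the demo; payloadJson is taken as-is.
    static String issue(String payloadJson, byte[] secret) {
        String header = base64Url("{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = base64Url(payloadJson.getBytes(StandardCharsets.UTF_8));
        String signed = header + "." + payload;
        return signed + "." + sign(signed, secret);
    }

    // Signature check only: recompute the HMAC and compare it to the presented one.
    static boolean hasValidSignature(String token, byte[] secret) {
        int lastDot = token.lastIndexOf('.');
        if (lastDot <= 0) {
            return false;
        }
        String expected = sign(token.substring(0, lastDot), secret);
        String presented = token.substring(lastDot + 1);
        // Constant-time comparison avoids leaking how many characters matched.
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.US_ASCII),
                presented.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        byte[] secret = "demo-secret-not-for-production".getBytes(StandardCharsets.UTF_8);
        String token = issue("{\"sub\":\"user-1\"}", secret);
        String[] parts = token.split("\\.");
        String forged = parts[0] + "." + base64Url("{\"sub\":\"user-2\"}".getBytes(StandardCharsets.UTF_8)) + "." + parts[2];
        System.out.println(hasValidSignature(token, secret));
        System.out.println(hasValidSignature(forged, secret));
    }
}
```

Anyone can decode and edit the payload, but the signature was computed over the original bytes, so the edited token no longer verifies.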
What I Learned
Distributed systems fail in interesting ways. Not “server is down” failures. Subtle failures. Message delivered twice, event lost between two healthy services, race condition that only appears under load. Building for failure is the job.
Complexity compounds. Every pattern I added (Outbox, Saga, Idempotency) solved a real problem but added surface area. There’s a reason experienced engineers say “start simple.” I added complexity deliberately, one problem at a time.
The questions matter more than the answers. The best moments weren’t when I solved problems. They were when I asked the right question.
“What happens if this service crashes here?” led to the Outbox Pattern.
“What if two requests arrive simultaneously?” led to pessimistic locking.
Learning to ask better questions is the real skill.
Understanding beats memorizing. I could have copied a microservices tutorial. Instead I built it from the problem up. Understood why Kafka, why Redis, why separate databases. That understanding is what I can take into any system, any language, any problem.
Thanks for Reading 💖
If this post resonated, or you have questions about any of the patterns, please reach out. I’m always happy to talk distributed systems.
GitHub: https://github.com/Abubakkar-Siddhiq/payment-system/
Tech: Spring Boot 4.x · Kafka · Redis · PostgreSQL · Docker · Grafana LGTM