A Microservice That Teaches Everything Not to Do

By Mobin Shaterian · Published May 3, 2026 · 7 min read · Source: DataDrivenInvestor

Every engineer eventually encounters a system that becomes a living catalog of anti-patterns. Recently, I reviewed a microservice responsible for parsing large measurement files and storing the extracted results in a database. The problem itself is straightforward: read files, extract measurements, enrich the data, and persist it. Unfortunately, the implementation demonstrates how a relatively simple pipeline can become fragile, inefficient, and operationally dangerous when basic distributed-systems principles are ignored.

This article walks through several architectural mistakes found in this service and explains what better alternatives would look like.

The Problem the Service Was Supposed to Solve

The pipeline’s intended workflow is simple:

  1. Large measurement files are uploaded.
  2. A microservice reads the files.
  3. The service parses measurements.
  4. Data is enriched with reference tables.
  5. Results are stored in ClickHouse for analytics.

The service runs as multiple pods in Kubernetes to scale horizontally. In theory, this architecture should allow parallel processing of files and high ingestion throughput.

In practice, the system fights its own design.

1. Random Sleep Instead of Concurrency Control

The first red flag appears immediately when the service starts processing files. Each worker begins with a random delay measured in seconds. The intention was to reduce the chance that multiple pods would pick the same file simultaneously.

This approach reveals a misunderstanding of concurrency in distributed systems. Random delays are not synchronization mechanisms. They only reduce the probability of collisions; they never eliminate them.

In distributed systems, coordination must be explicit. Common solutions include distributed locks (backed by the database, etcd, or ZooKeeper), message queues that deliver each file to exactly one consumer, and atomic work claiming in a shared database.

For example, a simple and reliable pattern is to maintain a job table in which each worker atomically claims one pending file before touching it.
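A minimal sketch of that claim pattern, using SQLite for illustration (a real deployment would use a shared database such as PostgreSQL; the table and column names here are hypothetical):

```python
import sqlite3

# Job table: one row per uploaded file, claimed atomically by workers.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        file_name TEXT PRIMARY KEY,
        status    TEXT NOT NULL DEFAULT 'pending',
        worker_id TEXT
    )
""")
conn.execute("INSERT INTO jobs (file_name) VALUES ('measurements_001.dat')")
conn.commit()

def try_claim(conn, worker_id):
    """Atomically claim one pending job; returns its file name or None."""
    row = conn.execute(
        "SELECT file_name FROM jobs WHERE status = 'pending' LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    file_name = row[0]
    # Conditional update: only succeeds if the row is still pending,
    # so two racing workers can never both claim the same file.
    cur = conn.execute(
        "UPDATE jobs SET status = 'processing', worker_id = ? "
        "WHERE file_name = ? AND status = 'pending'",
        (worker_id, file_name),
    )
    conn.commit()
    return file_name if cur.rowcount == 1 else None

# Two workers race for the same job: exactly one wins.
first = try_claim(conn, "pod-a")
second = try_claim(conn, "pod-b")
```

The atomicity lives in the conditional `UPDATE`, not in timing: no sleeps, no filename tricks, and the outcome is deterministic.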

Without deterministic coordination, race conditions are guaranteed eventually.

2. Using NFS as a Coordination Mechanism

Instead of using object storage, the system relies on a shared NFS mount for file management.

NFS can work for simple shared storage, but it is poorly suited for distributed event pipelines. It provides weak guarantees around concurrent file operations and becomes a bottleneck under heavy parallel workloads.

Modern systems typically use object storage, such as Amazon S3, Google Cloud Storage, Azure Blob Storage, or a self-hosted option like MinIO.

Object storage provides durability, versioning, event triggers, and scalability that NFS simply cannot match. It also integrates naturally with event-driven architectures.

Using NFS for a distributed ingestion pipeline is a common source of race conditions, performance degradation, and operational complexity.

3. File Renaming as a Locking Strategy

The service tries to prevent multiple workers from processing the same file by renaming it. When a pod starts processing, it appends a suffix like .zumbolize to the filename. Once processing finishes, the file is deleted.

This creates several problems:

First, the rename operation itself is not a reliable distributed lock. Multiple pods can still read the file list simultaneously and race to rename it.

Second, deleting the file after processing removes any ability to audit or reprocess data.

Third, there is no traceability. If processing fails halfway through, there is no reliable record of what happened.

A proper ingestion pipeline maintains clear state transitions, such as pending → processing → completed, with an explicit failed state for errors.

These states are persisted in a durable store. Files remain immutable artifacts rather than temporary coordination tools.
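One way to model those transitions explicitly, sketched with hypothetical state names:

```python
from enum import Enum

# Hypothetical ingestion states; the exact names are illustrative.
class FileState(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"

# Legal transitions: a file is claimed, then either finishes or fails;
# failed files may be retried. Completed is terminal.
TRANSITIONS = {
    FileState.PENDING: {FileState.PROCESSING},
    FileState.PROCESSING: {FileState.COMPLETED, FileState.FAILED},
    FileState.FAILED: {FileState.PROCESSING},  # retry
    FileState.COMPLETED: set(),
}

def advance(current, target):
    """Validate and apply a state transition."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target

state = advance(FileState.PENDING, FileState.PROCESSING)
state = advance(state, FileState.COMPLETED)
```

Persisting each transition in a durable store gives you an audit trail for free, which the rename-and-delete approach destroys.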

4. Data Modeling Contradictions

Another problematic area is the database schema.

The system stores measurements in a massive table that approaches 1 terabyte. However, enrichment is performed by joining with normalized reference tables designed using the third normal form (3NF).

Normalization is useful in transactional systems to eliminate redundancy. But analytical databases like ClickHouse are optimized for denormalized, columnar data.

Mixing OLTP normalization concepts with OLAP storage leads to two major issues: joins become expensive at query time, and the columnar engine loses the compression and scan efficiency it gets from wide, denormalized tables.

In analytical pipelines, denormalization is often intentional. It improves query performance and simplifies downstream processing.

ClickHouse is especially optimized for wide tables with many columns. Trying to force a highly normalized schema into a columnar analytics database defeats its design advantages.

5. Storing Structured Data as Strings

The most painful design decision appears in how measurement metadata is stored.

The system builds a dictionary of key-value pairs, serializes it as a string, and then stores it in ClickHouse. Every downstream query must:

  1. Convert the string into JSON
  2. Query the fields
  3. Convert the result back to a string
  4. Store it again

This is computationally wasteful and unnecessary.
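The round trip described above looks roughly like this in practice (a sketch; the field names are hypothetical):

```python
import json

# Anti-pattern sketch: metadata stored as an opaque string forces a
# parse/serialize round trip on every read-modify-write.
stored = json.dumps({"sensor_id": "A12", "unit": "mV", "calibrated": True})

def update_field(serialized, key, value):
    metadata = json.loads(serialized)   # 1. string -> dict on every access
    metadata[key] = value               # 2. touch a single field
    return json.dumps(metadata)         # 3. dict -> string again

stored = update_field(stored, "calibrated", False)
# With native ClickHouse types (e.g. Map(String, String)) or flattened
# columns, the same update is a plain column write with no parsing.
```

Every query pays the full parse cost even when it needs one field, and the database can neither index nor compress the individual keys.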

ClickHouse already supports multiple structured data types, including Map, Nested, Tuple, Array, and JSON.

Serializing structured data into opaque strings eliminates type safety, columnar compression, and the ability to filter or aggregate on individual fields.

It also dramatically increases CPU overhead during query execution.

In a columnar database designed for analytical workloads, flattening structured data into proper columns usually produces the best performance.
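Flattening can be as simple as promoting known metadata keys to real columns before insert (a sketch; the key set and row shape are hypothetical):

```python
# Sketch: flatten a metadata dict into explicit columns before insert,
# so ClickHouse stores typed, compressible columns instead of a string.
KNOWN_KEYS = ("sensor_id", "unit", "calibrated")

def flatten(measurement):
    row = {"value": measurement["value"]}
    meta = measurement.get("metadata", {})
    for key in KNOWN_KEYS:
        row[f"meta_{key}"] = meta.get(key)
    return row

row = flatten({"value": 3.7, "metadata": {"sensor_id": "A12", "unit": "mV"}})
# row now maps directly onto a wide table, e.g.:
#   value Float64, meta_sensor_id String, meta_unit String,
#   meta_calibrated Nullable(Bool)
```

Queries then filter and aggregate on native columns with no JSON parsing at all.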

6. Database Access Anti-Pattern: Per-Row Queries and Connection Exhaustion

One of the most critical failures in the system is not architectural at a high level, but operational at the code level — and it directly impacts reliability.

A production incident exposed a severe flaw: the service opens a new ClickHouse connection and executes a full table scan for every single cell processed.

The failure manifests as connection exhaustion: the database runs out of available connections, queries pile up, and ingestion grinds to a halt under load.

What’s happening under the hood?

Inside the parsing pipeline, the code opens a fresh ClickHouse connection for every cell, issues a query that scans the entire reference table, and then closes the connection.

This happens inside a loop over all cells.

So for a file with 1,000 cells, the service opens 1,000 connections and runs 1,000 full table scans, per file, per pod.

This is not just inefficient; it is catastrophic under load.

Why does this fail in practice?

Databases are not designed for this access pattern. Connection setup is expensive, connection pools are finite, and a full table scan repeated for every row multiplies that cost by the size of the input.

This is a textbook violation of a core principle:

Never perform external I/O inside a tight processing loop.

7. The Missing Abstraction: In-Memory Lookup

The fix is trivial, which makes the mistake more costly.

Instead of querying the database per cell, the system should:

  1. Fetch the reference data once per job
  2. Transform it into an in-memory structure
  3. Perform constant-time lookups during parsing

Concretely: load the reference table with a single query at job start, build a dictionary keyed by cell identifier, and resolve each cell from that dictionary while parsing.

This transforms thousands of queries per file into a single query, and per-cell database round trips into O(1) in-memory lookups. It also eliminates connection exhaustion.
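A sketch of the fix, with SQLite standing in for ClickHouse and hypothetical table and column names:

```python
import sqlite3

# Reference data lives in the database, as in the original service.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cell_reference (cell_id TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO cell_reference VALUES (?, ?)",
    [("c1", "north"), ("c2", "south")],
)

# 1. Fetch the reference data once per job.
rows = conn.execute("SELECT cell_id, region FROM cell_reference").fetchall()

# 2. Transform it into an in-memory structure.
region_by_cell = dict(rows)

# 3. Constant-time lookups during parsing -- no I/O inside the loop.
cells = ["c1", "c2", "c1"]
enriched = [(cell, region_by_cell.get(cell)) for cell in cells]
```

One query and one connection per job, regardless of how many cells the file contains.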

8. Secondary Issues Amplifying the Problem

Several additional flaws made the situation worse. These are not independent issues; they compound each other.

9. Lack of Observability and Idempotency

The architecture also lacks two key properties required for reliable pipelines:

Idempotency:
If a file is processed twice, the system should produce the same result without duplicating data.
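One common way to get idempotency is to key each ingest on a deterministic identifier, such as a hash of the file contents (a sketch; in a real pipeline the processed-key set would live in a durable store, or deduplication would be delegated to something like ClickHouse's ReplacingMergeTree engine):

```python
import hashlib

# In-memory stand-ins for durable storage; illustrative only.
processed_keys = set()
stored_rows = []

def ingest(file_bytes):
    """Process a file at most once; returns True if work was done."""
    key = hashlib.sha256(file_bytes).hexdigest()
    if key in processed_keys:
        return False  # replay: same result, no duplicate rows
    stored_rows.append({"file_key": key, "size": len(file_bytes)})
    processed_keys.add(key)
    return True

data = b"measurement payload"
first = ingest(data)
second = ingest(data)  # reprocessing the same file is a no-op
```

With a deterministic key, retries and accidental double-processing become safe instead of corrupting the dataset.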

Observability:
Operators should be able to answer simple questions such as: Which files have been processed? When? Did any fail, and why? Can a failed file be safely reprocessed?

Deleting files and avoiding job tracking make these questions impossible to answer reliably.

Final Thoughts

None of these mistakes individually would necessarily break a system. But together they create a pipeline that is fragile, inefficient, and difficult to operate.

The most striking part is that the system does not fail because the problem is complex. It fails because fundamental engineering principles are ignored: explicit coordination, immutable inputs, durable state tracking, batched I/O, idempotency, and observability.

When those fundamentals are violated, engineers compensate with random sleeps, filename tricks, and excessive database calls, until the system collapses under its own weight.

And that is often the most expensive mistake of all.


A Microservice That Teaches Everything Not to Do was originally published in DataDrivenInvestor on Medium, where people are continuing the conversation by highlighting and responding to this story.

