There is a particular kind of frustration that comes from a well-designed system that still underperforms. The architecture is sound. The code is clean. Everything scales. And yet — the numbers tell a different story. In 2010, I found out why.
Everything Looked Right on Paper
The system was a core Java-based online transaction processing (OLTP) platform handling 300 transactions per second at peak load. The architecture was multi-tiered, with each layer capable of scaling independently based on demand. It handled spikes. It was resilient. From a design standpoint, there was very little to fault.
But system speed and stability were consistently below expectation. Response times were erratic under load. Transaction processing would stall intermittently — not long enough to trigger an outage, but long enough to accumulate into a real performance problem in a system where every millisecond of delay compounds across hundreds of concurrent transactions.
When I looked deeper, the answer was both obvious in retrospect and completely invisible in the moment. The JVM configuration and garbage collection settings had never been touched. The system was running entirely on defaults. In a high-performance application processing hundreds of transactions per second, this is roughly equivalent to precision-engineering a racing engine and leaving the fuel mixture at factory spec.
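A quick way to spot this kind of oversight is to ask the running JVM what it was started with and which collectors it is actually using. The following is a minimal sketch using the standard `java.lang.management` API, not code from the original system; the class name `GcDefaultsCheck` and the flag-matching heuristic in `selectsCollector` are illustrative assumptions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

class GcDefaultsCheck {

    // Heuristic (an assumption of this sketch): HotSpot collector flags
    // look like -XX:+UseG1GC, -XX:+UseParallelGC, -XX:+UseSerialGC.
    static boolean selectsCollector(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:+Use") && arg.endsWith("GC")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with (heap sizes, -XX flags, and so on).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Collector chosen explicitly: " + selectsCollector(jvmArgs));

        // The collectors actually in use, whatever selected them.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Active collector: " + gc.getName());
        }
    }
}
```

An empty argument list combined with collector names you never chose is a reasonable sign that the process is running on defaults.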
Switching to G1GC
The first meaningful decision was switching to the G1 Garbage Collector (G1GC). It was relatively new at the time — introduced as experimental in Java 6 and not yet mainstream — but its design philosophy was exactly right for an OLTP workload.
Earlier collectors managed the young and old generations as large contiguous spaces, and reclaiming the old generation could halt all application threads for one long sweep. G1GC takes a different approach: it divides the heap into equal-sized regions, marks live objects concurrently with the application, and reclaims the regions holding the most garbage in short, controlled pauses. Rather than triggering one large Full GC that stops the world, it cleans sections of the old generation incrementally, often deferring the need for a Full GC entirely.
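To make "equal-sized regions" concrete, here is a rough sketch of how a default region size relates to heap size. It assumes the commonly documented HotSpot behaviour of aiming for roughly 2048 regions, with region sizes that are powers of two between 1 MB and 32 MB; the exact heuristic differs between JDK versions, so treat `approximateRegionSize` as an approximation, not the JVM's actual code.

```java
class G1RegionSizeSketch {

    static final long MB = 1024L * 1024L;
    static final long TARGET_REGIONS = 2048;  // assumed target region count
    static final long MIN_REGION = 1 * MB;    // assumed lower bound
    static final long MAX_REGION = 32 * MB;   // assumed upper bound for the default

    // Divide the heap into about TARGET_REGIONS pieces, round down to a
    // power of two, then clamp to the allowed range.
    static long approximateRegionSize(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGIONS, 1);
        long powerOfTwo = Long.highestOneBit(raw);
        return Math.min(Math.max(powerOfTwo, MIN_REGION), MAX_REGION);
    }

    public static void main(String[] args) {
        long[] heapsGb = {1, 8, 64};  // hypothetical heap sizes
        for (long gb : heapsGb) {
            long region = approximateRegionSize(gb * 1024 * MB);
            System.out.println(gb + " GB heap -> " + (region / MB) + " MB regions");
        }
    }
}
```

Under these assumptions an 8 GB heap lands on 4 MB regions, which is why region size becomes a tuning concern once objects approach a few megabytes each.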
For a system where even a 200ms pause means dozens of delayed transactions, this was the right trade-off. The catch: more tuning knobs is not the same as simpler tuning. G1GC gave us more control and more complexity in equal measure. Finding the right configuration took sustained effort across four distinct dimensions.
Four Dimensions of the Challenge
Latency vs. Throughput
OLTP systems demand that each individual transaction completes quickly. Any pause — even a brief one — compounds across hundreds of concurrent transactions, creating backlogs that ripple through the entire system. Maximising throughput and minimising latency pull in opposite directions. Tuning for one degrades the other. Finding the right balance for a specific workload profile requires measurement, not intuition.
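The trade-off can be put in numbers. The sketch below compares two made-up pause profiles over the same one-minute window: a few long pauses versus many short ones. The class `PauseImpact`, its method names, and all pause values are hypothetical; only the 300 TPS figure comes from the system described here.

```java
import java.util.Arrays;

class PauseImpact {

    // Transactions that arrive while application threads are stopped.
    static long arrivalsDuringPause(int tps, long pauseMillis) {
        return tps * pauseMillis / 1000;
    }

    // Fraction of the observation window spent in GC pauses (lower means more throughput).
    static double pauseOverhead(long[] pausesMillis, long windowMillis) {
        return (double) Arrays.stream(pausesMillis).sum() / windowMillis;
    }

    // Longest single pause in the window (drives worst-case latency).
    static long worstPause(long[] pausesMillis) {
        return Arrays.stream(pausesMillis).max().orElse(0L);
    }

    static void report(String label, long[] pauses, long windowMillis, int tps) {
        long worst = worstPause(pauses);
        System.out.printf("%s: overhead=%.2f%%, worst pause=%d ms, arrivals during worst pause=%d%n",
                label, 100 * pauseOverhead(pauses, windowMillis), worst,
                arrivalsDuringPause(tps, worst));
    }

    public static void main(String[] args) {
        long windowMillis = 60_000;       // one minute of observation (hypothetical)
        long[] fewLong = {200, 200};      // hypothetical: two 200 ms pauses
        long[] manyShort = new long[40];  // hypothetical: forty 20 ms pauses
        Arrays.fill(manyShort, 20);

        report("few long pauses", fewLong, windowMillis, 300);
        report("many short pauses", manyShort, windowMillis, 300);
    }
}
```

In this illustration the many-short profile spends more total time in GC, so it costs throughput, yet its worst pause delays far fewer transactions. At 300 TPS a single 200 ms pause means 60 transactions arrive while nothing is being processed.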
Memory Management at Scale
At 300 TPS, the application required large heap sizes to hold the working set of concurrent transactions. Large heaps make garbage collection more expensive — more memory to scan, more objects to evaluate. Without precise tuning, extended pause times during GC cycles would periodically halt transaction processing entirely. Getting the young-to-old generation ratio right was critical and non-obvious.
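As a worked example of what a young-to-old ratio means in bytes, the sketch below assumes the ratio is expressed with HotSpot's `-XX:NewRatio=N` flag, where the old generation is N times the size of the young generation. The article does not say which flag was used on this system, and the 12 GB heap is a hypothetical figure. Note also that G1 normally resizes the young generation itself to meet its pause target, so pinning the ratio trades some of that adaptivity away.

```java
class GenerationSplit {

    static final long GB = 1024L * 1024L * 1024L;

    // With -XX:NewRatio=N, old = N * young, so young gets 1/(N+1) of the heap.
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    // Whatever the young generation does not take belongs to the old generation.
    static long oldBytes(long heapBytes, int newRatio) {
        return heapBytes - youngBytes(heapBytes, newRatio);
    }

    public static void main(String[] args) {
        long heap = 12 * GB;  // hypothetical heap size
        for (int ratio = 1; ratio <= 3; ratio++) {
            System.out.println("NewRatio=" + ratio
                    + " -> young " + youngBytes(heap, ratio) / GB + " GB"
                    + ", old " + oldBytes(heap, ratio) / GB + " GB");
        }
    }
}
```

A larger young generation means fewer but longer young collections and less premature promotion; a smaller one means the opposite. That is why the ratio had to be found by measurement.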
Predicting Load Changes
OLTP systems do not run at constant load. Traffic varies by time of day, day of week, month-end cycles, promotional events. GC settings that perform well at average load may degrade severely under peak load — or waste resources during quiet periods. The configuration needed to remain stable and responsive across the full load spectrum, which took significant trial and error to achieve.
There Is No Universal Configuration
Garbage collection has no one-size-fits-all solution. Every system has a unique object allocation pattern, memory profile, and latency requirement. With G1GC specifically, the number of configurable parameters — pause time targets, region sizes, initiating heap occupancy thresholds — means that configuration from another system is only a starting point. The right settings have to be found through continuous monitoring and evidence-based adjustment.
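For reference, the three parameters named above map onto HotSpot flags `-XX:MaxGCPauseMillis`, `-XX:G1HeapRegionSize`, and `-XX:InitiatingHeapOccupancyPercent`. The sketch below only assembles a launch command from them. The values passed in `main` are placeholders, not the settings this system ended up with, and `app.jar` is a hypothetical artifact name.

```java
import java.util.ArrayList;
import java.util.List;

class G1LaunchArgs {

    // Builds the G1-related JVM flags from three tunable values.
    static List<String> g1Args(int pauseTargetMillis, int regionSizeMb, int ihopPercent) {
        List<String> args = new ArrayList<>();
        args.add("-XX:+UseG1GC");
        args.add("-XX:MaxGCPauseMillis=" + pauseTargetMillis);           // pause time target
        args.add("-XX:G1HeapRegionSize=" + regionSizeMb + "m");          // region size
        args.add("-XX:InitiatingHeapOccupancyPercent=" + ihopPercent);   // marking threshold
        return args;
    }

    public static void main(String[] argv) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.addAll(g1Args(200, 8, 45));  // placeholder values, not a recommendation
        command.add("-jar");
        command.add("app.jar");              // hypothetical application jar
        System.out.println(String.join(" ", command));
    }
}
```

Keeping the values as explicit parameters makes each tuning iteration a one-line change that can be recorded next to the measurements it produced.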
Minor GC vs. Full GC: The Distinction That Matters
To tune effectively, you have to understand the two fundamentally different collection events that G1GC — like all generational collectors — performs. They are not just different in scale. They are different in cause, behaviour, predictability, and impact on a running system.
Young Generation Collection
Old Generation Collection
With G1GC, good configuration can significantly reduce the frequency of Full GC, but it can never eliminate it entirely.
G1GC's core contribution to this problem is its ability to defer Full GC by collecting the old generation incrementally: background threads mark live objects concurrently with the application, and the regions holding the most garbage are then reclaimed a few at a time in short pauses. By working through small regions of old-generation memory this way, it keeps the heap clean enough that a Full GC becomes an infrequent last resort rather than a regular occurrence. In our OLTP system, shifting from frequent Full GCs to rare ones was where most of the performance improvement came from.
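One way to watch that shift happen is to sample collection counts per collector and compare successive snapshots. This is a minimal sketch using `GarbageCollectorMXBean`, not the monitoring that was used on the original system. On HotSpot with G1 the beans are typically named "G1 Young Generation" and "G1 Old Generation", with the latter reflecting full collections, but names vary by JVM and version, so treat them as an assumption. The class name and the five-second interval are illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

class GcCountWatcher {

    // Cumulative collection count per collector since JVM start.
    static Map<String, Long> snapshot() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    // Collections that happened between two snapshots, per collector.
    static Map<String, Long> delta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> diff = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            diff.put(e.getKey(), e.getValue() - before.getOrDefault(e.getKey(), 0L));
        }
        return diff;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Long> before = snapshot();
        Thread.sleep(5_000);  // hypothetical sampling interval
        System.out.println("Collections in the last interval: " + delta(before, snapshot()));
    }
}
```

A steady stream of young collections with the old-generation count staying flat is the pattern the tuning was aiming for.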
Finding the Balance
After sustained monitoring, adjustment, and evidence-based tuning — iterating on young-to-old generation ratios, pause time targets, region sizes, and heap occupancy thresholds — the system stabilised. The improvement was significant and consistent. Transaction processing became smooth. The erratic behaviour that had plagued the system disappeared.
But the deeper lesson was about process, not configuration. JVM tuning is not a one-time exercise you complete and move on from. The right configuration today may need revisiting when transaction volume grows, when the data model changes, when a new release alters the object allocation pattern. Monitoring is not an afterthought — it is the ongoing work.
What this experience made clear is that architectural correctness and runtime correctness are separate concerns. A well-designed system running on misconfigured infrastructure is still a poorly performing system. The two have to be right together.
In high-performance systems, the most expensive problems accumulate quietly, beneath the architecture. The work of a performance engineer is to find them before the system does.
Understand your garbage collector. Choose it deliberately. Monitor it continuously. The defaults were designed for the general case. Your system is not the general case.