When memory usage grows, Java garbage collection slows. This historic weakness of the platform was a sticking point for Terracotta’s Ehcache distributed in-memory cache project. So the company opened the doors yesterday to a public beta test of BigMemory for Enterprise Ehcache, which aims to smooth the final wrinkles in a solution the company hopes to have ready for a mid-October release.
Amit Pandey, CEO of Terracotta, said that Java garbage collection was a recurring problem for Ehcache customers. Rather than spend its time helping each customer tune an application server, the Ehcache team decided to fix the problem at its root.
“Our distributed cache is used by customers who have applications that need a fair amount of scale,” said Pandey. “One of the things we almost always run into in the field is that they want to make the caches as big as possible. Of course, when our heap exceeds a certain size—six or eight gigabytes—we start to run into the garbage collection tuning issues just like any other Java process.
“Rather than spending time tuning garbage collection, our team said, ‘Since it’s a hash map in the cache, it would not be that hard for us to write our own memory manager.’ It turned out that it took them a whole year.”
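The article does not describe how BigMemory works internally, so the following is only a minimal sketch of the general idea Pandey alludes to: keep cached values in memory the garbage collector does not scan, and leave only a small index on the heap. It assumes a single direct `ByteBuffer` as the store and a simple bump allocator that never reclaims space. The class name `OffHeapCacheSketch` and its methods are hypothetical and are not part of Ehcache or BigMemory.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Terracotta's implementation.
final class OffHeapCacheSketch {
    // Direct buffers live outside the garbage-collected Java heap.
    private final ByteBuffer slab;
    // On-heap index: key -> {offset, length} of the value inside the slab.
    private final Map<String, int[]> index = new HashMap<>();
    // Bump allocator: next unused byte in the slab. Space is never reclaimed.
    private int nextFree = 0;

    OffHeapCacheSketch(int capacityBytes) {
        this.slab = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Copies the value into the slab; returns false if it does not fit.
    // Overwriting a key leaks the old bytes (a real manager would free them).
    boolean put(String key, byte[] value) {
        if (value.length > slab.capacity() - nextFree) {
            return false;
        }
        ByteBuffer view = slab.duplicate(); // shares content, independent position
        view.position(nextFree);
        view.put(value);
        index.put(key, new int[] {nextFree, value.length});
        nextFree += value.length;
        return true;
    }

    // Copies the value back onto the heap, or returns null if the key is absent.
    byte[] get(String key) {
        int[] location = index.get(key);
        if (location == null) {
            return null;
        }
        byte[] out = new byte[location[1]];
        ByteBuffer view = slab.duplicate();
        view.position(location[0]);
        view.get(out);
        return out;
    }

    int size() {
        return index.size();
    }
}
```

In this sketch the collector only has to trace the index entries, not the cached bytes themselves, which is the property that lets a store of this kind grow well beyond the heap sizes Pandey mentions. A production memory manager would also need to free and reuse space, handle concurrent access, and serialize objects, which may help explain why the real project took a year.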
Despite its label as a distributed cache, Ehcache can now scale across large swaths of memory in a single machine, said Pandey. This non-distributed usage model is more compelling to some developers, he said. While the world outside screams for cloud services hosted across hundreds of machines, he said that many Terracotta customers are still scared of committing to such architectures. For those users, scaling across 100GB of memory on a single box can save a great deal of development time.
“Generally there is certainly a massive push for trying to get—not necessarily distributed—but get as much data into memory as possible, and to get as much hardware density under that as possible,” said Pandey.
“We have customers who say, ‘I’d love to put it on one or two boxes and be done with it.’ We see that as a fantastic use case for BigMemory. You can put it on two boxes and you don’t have to run 200 instances of VMware. If each instance can take 100 to 200GB of memory, what more do you need?”
That means developers looking to scale don’t have to be experts in distributed systems, said Pandey.
“It takes a fair bit of effort to get a person across the line from ‘I have a monolithic server, now you’re asking me to share data between instances of my application? Will I run into concurrency issues? Is my network fast enough?’ Those are the questions of distributed computing,” he said.
“A lot of the trends around big data are pushing people to get big and distributed. That’s wonderful and that’s our bread and butter, but the reason we’re so excited about BigMemory is we felt this could be used by people who are afraid of being distributed.”