Monitoring solutions have been around for some time, but I still haven't found the perfect one. I first implemented Zabbix for Lucidchart in late 2011, and, just a few months ago, I installed Graphite. I'd like to take you through my decision process so you can find the right monitoring tool for your needs.
Comparing these two products is not easy, because they were designed to do different things. Zabbix was meant to be a server monitoring solution, while Graphite is more of a data collection and reporting tool. What I'd really like to see is a merger of the two tools, but that probably won't happen anytime soon.
My experience, guesses, and blunders making highly available products and services.
Friday, August 23, 2013
Friday, August 2, 2013
PDF Service Memory Leaks
One of the most attractive features of Lucidchart is the direct mapping of pixels from screen to page. An essential part of this process is our PDF generator. JSON render data goes in and a PDF or an image comes out. Though it sounds simple, it contains 13k lines of Scala code, heavily uses Akka actors to gather and render fonts, images, and pages, depends on 8 internally maintained jars and 83 others, and is responsible for generating 50k PDFs and images a day (1.5M per month, 18.25M per year) at its current load. This is anything but a simple service.
Keeping this service running smoothly is a high priority. On July 8, a code release to the Lucidchart editor uncovered several issues with the PDF service. More specifically, the new image manager allowed users to retrieve images from Facebook, Flickr, and Dropbox. With these changes, our robust system fell on its face. PDF JVMs were crashing hundreds of times a day, causing those servers to be terminated and replaced with new ones. It wreaked havoc on our users and our uptime.
Keeping this service running smoothly is a high priority. On July 8, a code release to the Lucidchart editor uncovered several issues with the PDF service. More specifically, the new image manager allowed users to retrieve images from Facebook, Flickr, and Dropbox. With these changes, our robust system fell on its face. PDF JVMs were crashing hundreds of times a day, causing those servers to be terminated and replaced with new ones. It wreaked havoc on our users and our uptime.
Labels:
Java,
JVM,
Lucidchart,
Memory Leak,
Scala
Location:
Herriman, UT 84096, USA
Subscribe to:
Posts (Atom)