Monday, April 22, 2013

Storing UUIDs with Anorm

I am not a big fan of Object Relational Mappers (ORMs). It is too easy to turn a simple query into a massive CPU-eating hydra - complete with 7 outer joins and no index usage. When I write code, I try to write it in such a way that the worst things to do are the hardest. That way, when I'm in a rush, I still write good code.

Enough about my rant: enter Anorm. I've been developing software on the Scala / Play stack for over a year now. Anorm is currently the supported database connector. It is a thin layer on top of JDBC that takes out all of the ugliness, but leaves the manual SQL writing goodness.

One of the things that Anorm does not do is read and write UUIDs. In 200 lines of copy-paste code, you'll have Anorm reading and writing UUIDs like a pro. A side effect of this code is that Anorm will also be able to handle byte arrays.

Wednesday, April 17, 2013

"In" Clause with Anorm

At Lucidchart, two coding styles have been adopted because Anorm doesn't support the 'in' clause.

  1. Code that allows for, or is conducive to, SQL injection.

    Example:
    SQL(""" SELECT * FROM users WHERE id IN (%s) """.format(userIds.mkString(",")))

  2. Code that is hard to follow and duplicate.

    Example
    val params = List(1, 2, 3) 
    val paramsList = for ( i <- 0 until params.size ) yield ("userId" + 
    i) // results in List("userId0", "userId1", "userId2") 
    SQL(""" SELECT * FROM users WHERE id IN ({%s}) """.format(paramsList.mkString("},{")) // 
    produces "id in ({userId0},{userId1},{userId2})" 
    .on(paramsList.zip(params)) // produces Map("userId0" -> 1, 
    "userId1" -> 2, ...)
Ugly much? The madness ends here with code samples.

Wednesday, March 27, 2013

UUIDs in MySQL

Auto incremented numbers can be so nice for unique IDs in a database; they're guaranteed to be unique, can range to fit your needs (tinyint to unsigned bigint), and easy to use. When working in a sharded environment, they may not fit the bill anymore; you either lose global uniqueness or you lose ease of use (depending on your sharding mechanism). One very good alternative is to use UUIDs.

UUIDs solve the global uniqueness and ease of use problem at the same time, but introduces another problem in the database layer - memory footprint. A UUID is a 36 byte ASCII string of characters representing a 16 byte value. I've seen them stored as CHAR(36), VARCHAR(36), etc. The problem with storing them in their ASCII form is that the index will grow beyond the capacity of the database server faster than when they're stored in a binary form. Memory will fill faster, queries will hit the disk more frequently, and it will ultimately cost more money to save the exact same information. It's not worth it.

Monday, February 25, 2013

Best AWS re:Invent Session

I recently came across a post about James Hamilton, the star AWS employee who keeps the internet alive while living on a boat, and I thought I would write a blurb about his sage wisdom bestowed upon me during the best session I attended at the AWS re:Invent conference.

Thursday, February 21, 2013

OpenVPN Configuration for VPC on Ubuntu 12.04

One of my more popular posts so far has been OpenVPN with Amazon VPC. I'm guessing that the lack of a step by step process or, at the very least, the configuration files, has left you annoyed. In this post, I'll give step by step instructions and configuration files to you, so that you can have all the goodness that is a private VPN on VPC.

While the servers that I ran this on are Ubuntu 12.04, it wouldn't surprise me if the configuration and setup worked on various versions of Ubuntu, Debian, Fedora, CentOS, and more.

Tuesday, January 29, 2013

VPC Migration: Post Mortem

All done! Every last one of the servers is running inside of Amazon's VPC. For the most part, everything went as expected. There are just a few loose ends I'd like to note.

Friday, January 25, 2013

OpenVPN with Amazon VPC

Although I did not initially plan to setup a VPN between Lucidchart's office and the newly setup VPC, I changed my mind before I even migrated the first server.

The reasoning is simple. I don't want our services to be publicly accessible; however, our office needs access to those services. The services I'm talking about include git, chef, apt, jenkins, and more. 

These services are not the only issue. Imagine a problem in production that requires manual debugging. I would have to tunnel through the NAT instance manually just to debug the problem server. When I'm having any issue in production, the last thing I want is an extra step.