Dynamic Images and Performance Implications

posted on 05/07/12 at 11:03:14 am by Joel Ross

When we first started writing Tourneytopia, we made a lot of decisions without really thinking about the performance implications. The choices we made were fine for the time, but as we grew, we noticed slowdowns and eliminated them as they cropped up.

Well, one cropped up just recently, and I like the solution we came up with, so I thought I'd share it.

Before we get to the solution, let's talk about the problem. When a user creates a new pool, they can upload their own logo, which we then display to every user who comes to the pool. We chose to store the image in our database, so we need to be able to get it out of the database and into a format that we can use on the page.

Our initial solution served us well for a few years, really (or at least it wasn't so bad that there were complaints). We pulled the image out of the database and served it up through an HTTP Handler. I'll leave that code out of this post, but there are resources that show something similar to what we did. The solution isn't horrible, but what we found was we had a lot of trouble getting browsers to cache the request, even when we set the necessary HTTP headers.

There are two problems with this solution. First, it requires that we pull the image out of the database every time we need it. The second problem magnifies the first. Since the image is not being cached by the browser, we're going back to the database even more: every request for every user. Now, we probably could have solved the caching problem, but we'd still be hitting the database to get the image once per user, and if the cache expired, we'd have to go back to the database again.

So what does the solution look like? Well, we should talk about the ideal solution first. From a performance standpoint, the best would be to never hit the database. When a user uploads a new image, we could just store it on the file system and we'd never have to hit the database. This isn't ideal for us because it doesn't work well in a web farm and it makes it harder to move the application to a new server. Those two problems are why we chose to store the image in the database in the first place.

So if zero database hits is out of the question, what's the next best solution? One database hit. Which is pretty close to what we ended up with. Technically, it's one database hit per server in the farm, but that's acceptable too. If we worked out the caching issue I mentioned above, the best we could do is get to one database hit per user, and since we have more users than servers in the web farm, this is definitely better.

How did we get it down to 1 hit? When the image is requested, we write it out to the file system, store the path to the image in the database, and return the path as the URL. Subsequent requests just return the path as the URL. The result: One hit to the database no matter how many users request the image! And, since it's just an image, IIS handles setting the cache headers for us so the browser caches it just as it would with any other image.

What does the code look like?

   1: public string GetLogoPathFor(Pool pool)
   2: {
   3:     if (pool.LogoPath.HasValue()) return pool.LogoPath;
   4:  
   5:     var image = pool.GetPoolImage();
   6:  
   7:     if (image != null)
   8:     {
   9:         if (!image.LocalPath.HasValue())
  10:         {
  11:             SaveLocalImageFor(image);
  12:         }
  13:         else if (!File.Exists(TranslateToLocalPath(image.LocalPath)))
  14:         {
  15:             image.ImageBitmap.SaveJPG100(TranslateToLocalPath(image.LocalPath));
  16:         }
  17:         return image.LocalPath;
  18:     }
  19:     
  20:     return GetDefaultLogoUrl();
  21: }
  22:  
  23: private void SaveLocalImageFor(PoolImage image)
  24: {
  25:     var guid = Guid.NewGuid().ToString();
  26:  
  27:     var imageName = "logo-{0}.jpg".FormatUsing(guid);
  28:  
  29:     image.LocalPath = "~/Content/Logos/{0}".FormatUsing(imageName);
  30:     image.ImageBitmap.SaveJPG100(TranslateToLocalPath(image.LocalPath));
  31: }

This is in an image service class we have that handles our image manipulation. When a page renders, it calls the GetLogoPathFor() method. We have a way for a pool to have an external logo, so that's what line 3 is doing. After that, we have 3 possible states:

  • The image hasn't been cached: This'll be the case right after a logo is uploaded. We write the file out, storing the path with the image, then return the path.
  • The image is already on the file system: This'll be the case when a logo has already been cached to the file system. We just return the path.
  • The image has a local path set, but the image doesn't exist on the file system: This'll be the case when the file was cached on server 1 and a subsequent request lands on server 2. In this case, we write the image out to the file system at the local path that's already stored, then return the path.
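
To make the three states easier to try out in isolation, here is a minimal, self-contained sketch. It is not our production code: CachedImage, LogoFileCache, EnsureOnDisk, and the siteRoot parameter are hypothetical names, the image is kept as raw bytes instead of a bitmap, and the body of TranslateToLocalPath is only a guess at what such a helper might do.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for PoolImage: raw bytes instead of a bitmap.
public class CachedImage
{
    public byte[] Bytes { get; set; }      // image data as loaded from the database
    public string LocalPath { get; set; }  // app-relative path, e.g. "~/Content/Logos/logo-<guid>.jpg"
}

public class LogoFileCache
{
    private readonly string _siteRoot; // physical folder that "~/" maps to on this server

    public LogoFileCache(string siteRoot) { _siteRoot = siteRoot; }

    // Assumed behavior: map "~/a/b.jpg" to <siteRoot>/a/b.jpg.
    public string TranslateToLocalPath(string appRelativePath)
    {
        var relative = appRelativePath.TrimStart('~', '/')
                                      .Replace('/', Path.DirectorySeparatorChar);
        return Path.Combine(_siteRoot, relative);
    }

    public string EnsureOnDisk(CachedImage image)
    {
        if (string.IsNullOrEmpty(image.LocalPath))
        {
            // State 1: never cached. Pick a unique name and remember it.
            image.LocalPath = "~/Content/Logos/logo-" + Guid.NewGuid() + ".jpg";
            Write(image);
        }
        else if (!File.Exists(TranslateToLocalPath(image.LocalPath)))
        {
            // State 3: cached by another server; recreate the file here.
            Write(image);
        }
        // State 2 falls through: the file is already on this server.
        return image.LocalPath;
    }

    private void Write(CachedImage image)
    {
        var physicalPath = TranslateToLocalPath(image.LocalPath);
        Directory.CreateDirectory(Path.GetDirectoryName(physicalPath));
        File.WriteAllBytes(physicalPath, image.Bytes);
    }
}
```

Calling EnsureOnDisk twice on the same image returns the same path both times. Deleting the file between calls simulates the "other server" case, and the second call simply writes the file again.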

Also notice that we use a Guid in the file path. This is for cache-busting purposes. If a pool admin uploads a new logo, it'll get a different file name when it's saved (since Guids are effectively unique), and when the browser sees the new path, it will re-download the file instead of using the cached image.

The above solution handles both problems we talked about in our original solution: minimal hits to the database, and the browser caches the image without any special coding on our part.

The final part of this is cleanup. When a pool admin changes the logo, we have an orphaned file on the file system. This is fairly minor, but can be handled a few different ways:

  • Do nothing at all. Disk space is cheap.
  • Delete the image when the logo is changed. This can be done, but could be tougher in a server farm because you have to delete files from multiple servers based on a request from one server. This could be mitigated by using file system replication for the folder where dynamic images are stored.
  • Run a cron job that cleans up images. Look at all of the valid paths in the database, and delete any image that doesn't match a valid path.
  • Delete all of the images periodically. The code will recreate the images for you, so deleting them all won't cause problems. The downside is you get a database hit again. The upside is that you only cache images that are being used. If we had a pool that hadn't been accessed in 3 years, the image wouldn't be on our file system any more.
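
The "compare against valid paths" option could look something like the sketch below. It is only an illustration under assumptions: LogoCleanup and DeleteOrphans are hypothetical names, the list of valid file names is assumed to come from a query over the stored local paths, and the "logo-*.jpg" pattern simply mirrors the naming used in the code above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LogoCleanup
{
    // Deletes every logo-*.jpg in logoDirectory whose file name is not in
    // validFileNames. Returns the number of files deleted.
    public static int DeleteOrphans(string logoDirectory, IEnumerable<string> validFileNames)
    {
        var keep = new HashSet<string>(validFileNames, StringComparer.OrdinalIgnoreCase);
        var deleted = 0;

        foreach (var file in Directory.GetFiles(logoDirectory, "logo-*.jpg"))
        {
            // Skip files that still match a path stored in the database.
            if (keep.Contains(Path.GetFileName(file))) continue;

            File.Delete(file);
            deleted++;
        }

        return deleted;
    }
}
```

Because the lookup code recreates any missing file on the next request, deleting a file that turns out to be needed costs one extra database hit and nothing more. In a farm, the job would have to run on each server, since each one has its own copy of the folder.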

We put this in place during the recent Aerosmith Song Madness Challenge, and we could see a noticeable difference. We were hitting what appeared to be a maximum number of concurrent users based on bandwidth limitations. Just by implementing the above change, we noticed about a 15% increase in traffic. We didn't really notice a CPU difference because we were being limited by bandwidth at the time, but since we're no longer dynamically writing an image out on every request, it's reasonable to conclude that CPU usage would be better as well.

Categories: Develomatic, Development, C#