Performance Tweaks For Your Cache

posted on 06/13/08 at 08:00:00 pm by Joel Ross

Over the past few weeks, I've been dealing with a performance issue on Tourneytopia. The site has been working as expected, but it's been a bit slower than I think is acceptable. So we decided to take a look and see if there's a better way to do things that would speed it up.

One of the areas we found was in how we cache data. We cache results fully scored and ranked, which is fine. The problem lies in how we were populating the cache. It's something we would never notice in development, because when you're just firing up the debugger, you aren't running multiple threads against that code. It's only when multiple threads hit the same code that you notice the issue. Here's how our cache was working:

   1:  public List<ScoredEntry> GetScoredEntries()
   2:  {
   3:    if(HttpContext.Current.Cache["ScoredEntries"] == null)
   4:    {
   5:      HttpContext.Current.Cache["ScoredEntries"] = Pool.GetScoredEntries();
   6:    }
   7:    return HttpContext.Current.Cache["ScoredEntries"] as List<ScoredEntry>;
   8:  }

At first glance, this looks just fine. The first time this code is exercised, the cache is null, so we populate it. Then we return it. Nothing wrong there, right? Well, actually, there is. Pool.GetScoredEntries() takes time to complete. While it's running, what happens if another thread hits this code? The cache is still null, so that thread tries to populate it too, meaning it calls Pool.GetScoredEntries() again. Now imagine 50 threads coming through. See the problem?
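To see the cost concretely, here's a minimal console sketch of the same race (hypothetical names - this isn't the actual Tourneytopia code, and a plain static field stands in for HttpContext.Current.Cache). With no lock, most or all of the threads that see a null cache pay for the expensive load:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class CacheRaceDemo
{
    // Stand-in for HttpContext.Current.Cache in a console app.
    static object _cached;
    static int _loadCount;

    // Simulates the expensive Pool.GetScoredEntries() call.
    static List<int> LoadEntries()
    {
        Interlocked.Increment(ref _loadCount);
        Thread.Sleep(200); // pretend scoring takes a while
        return new List<int> { 1, 2, 3 };
    }

    static List<int> GetEntries()
    {
        if (_cached == null)
        {
            // Every thread that saw null pays the full cost.
            _cached = LoadEntries();
        }
        return (List<int>)_cached;
    }

    static void Main()
    {
        var threads = new List<Thread>();
        for (int i = 0; i < 10; i++)
        {
            var t = new Thread(() => GetEntries());
            threads.Add(t);
            t.Start();
        }
        threads.ForEach(t => t.Join());

        // Without a lock, this usually prints a count well above 1.
        Console.WriteLine("Expensive load ran {0} time(s)", _loadCount);
    }
}
```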

So we needed a way to signal that Pool.GetScoredEntries() was already running, so no other thread would run it. As it turns out, that's exactly what the lock keyword is for. After implementing it, here's what our updated code looks like:

   1:  private static object _cacheLock = new object();
   2:   
   3:  public List<ScoredEntry> GetScoredEntries()
   4:  {
   5:    List<ScoredEntry> entries = HttpContext.Current.Cache["ScoredEntries"] as List<ScoredEntry>;
   6:    if(entries == null)
   7:    {
   8:      lock(_cacheLock)
   9:      {
  10:        entries = HttpContext.Current.Cache["ScoredEntries"] as List<ScoredEntry>;
  11:        if(entries == null)
  12:        {
  13:          entries = Pool.GetScoredEntries();
  14:          HttpContext.Current.Cache["ScoredEntries"] = entries;
  15:        }
  16:      }
  17:    }
  18:    return entries;
  19:  }

A couple of things to note. First, we declare a private static object to lock on. Second, we only take the lock when the cache appears empty. The second null check, inside the lock(), covers the race where two threads both see an empty cache: the first thread acquires the lock, populates the cache, and releases the lock, and by the time the second thread obtains the lock, the re-check on line 10 finds the cache populated and returns the correct results. No need to repopulate the cache - which is exactly the situation we were in before!

Also, one last thing about this code. I introduced a local variable (entries) that wasn't there before. I probably should have had that in the original as well, because there's a slight possibility that between checking whether the cache is null and grabbing the value back out of it, the entry could be evicted from the cache. By reading into a local variable once, I ensure that if that happens, I still return the data I just loaded.

Now, this code isn't exactly what we have for Tourneytopia. I don't use magic strings, nor are my methods that simple to call. And we don't have just one cache lock object. Because scoring (and thus, cache) is by pool, I have a method that gets the correct object to lock, based on a key, which is in turn based on a pool. That actually has its own lock object:

   1:  private static object _cacheSetter = new object();
   2:  private static Dictionary<string, object> _cacheLockList = new Dictionary<string, object>();
   3:          
   4:  private object GetCacheLockObject(string key)
   5:  {
   6:    if(_cacheLockList.ContainsKey(key))
   7:    {
   8:      return _cacheLockList[key];
   9:    } 
  10:    else
  11:    {
  12:      lock(_cacheSetter)
  13:      {
  14:        if(!_cacheLockList.ContainsKey(key))
  15:        {
  16:          _cacheLockList.Add(key, new object());
  17:        }
  18:        return _cacheLockList[key];
  19:      }
  20:    }
  21:  }

GetCacheLockObject hands back one lock object per key. If there's already an object in the dictionary for that key, it's returned immediately. Otherwise, the global _cacheSetter object is locked just long enough to create the new lock object and add it to the dictionary - with another check inside the lock, for the same reason as before. (Strictly speaking, reading a Dictionary outside the lock while another thread might be adding to it isn't guaranteed safe; in practice the lock objects all get created early on.)
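The contract this gives you can be sketched in a few lines (the pool keys here are made up for illustration): the same key always yields the same lock object, and different keys yield different ones.

```csharp
using System.Diagnostics;

object a = GetCacheLockObject("pool-42");
object b = GetCacheLockObject("pool-42");
object c = GetCacheLockObject("pool-99");

Debug.Assert(ReferenceEquals(a, b));   // same key, same lock object
Debug.Assert(!ReferenceEquals(a, c));  // different pools lock independently
```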

Now, the same lines of code can be run by multiple threads for different pools, and not lock each other out, which is what we want, because in reality, they're actually setting different caches. This makes my GetScoredEntries look a little different:

   1:  public List<ScoredEntry> GetScoredEntries(string key)
   2:  {
   3:    List<ScoredEntry> entries = HttpContext.Current.Cache[key] as List<ScoredEntry>;
   4:    if(entries == null)
   5:    {
   6:      lock(GetCacheLockObject(key))
   7:      {
   8:        entries = HttpContext.Current.Cache[key] as List<ScoredEntry>;
   9:        if(entries == null)
  10:        {
  11:          entries = Pool.GetScoredEntries();
  12:          HttpContext.Current.Cache[key] = entries;
  13:        }
  14:      }
  15:    }
  16:    return entries;
  17:  }

Instead of locking on just one global object, we call GetCacheLockObject to get the actual object to lock.
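One related tweak worth mentioning: assigning through the cache's indexer stores the entry with no expiration. If you want scored entries to refresh on their own, Cache.Insert takes expiration arguments. A sketch, assuming a five-minute sliding window is acceptable staleness (the window length is my own choice, not something from our code):

```csharp
// Same population step as line 12 above, but with a sliding
// expiration so idle entries eventually fall out of the cache.
HttpContext.Current.Cache.Insert(
    key,
    entries,
    null,                                           // no cache dependency
    System.Web.Caching.Cache.NoAbsoluteExpiration,
    TimeSpan.FromMinutes(5));                       // sliding expiration
```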

Lots of tweaks involve making code run faster. This particular one has nothing to do with optimizing the code itself - it's all about making the code smarter, so there aren't multiple instances of the same expensive call running at the same time.


Categories: Develomatic, Development, C#