When optimizing code there are a number of techniques that are frequently used – tried and true methods to speed up our code. These optimization patterns occur again and again. In this series we’ll look at various performance patterns that we can apply in numerous situations.

> Almost all programming can be viewed as an exercise in caching. – Terje Mathisen

The first pattern we’ll look at is **caching**. Caching is simply storing the results of calculations for later use. It’s a classic space-time trade-off, where we use a limited amount of memory to avoid recalculating functions, trading memory for speed.

## Pattern: Once and Only Once

This is the most basic of optimization techniques – only calculate a result once. The simplest example is pulling constant terms from inside a loop to outside it:

```cpp
for (int i = 0; i < n; i++)
{
    c = Calculate();
    Process(i, c);
}
```

If Calculate doesn’t depend on the loop variable or on anything Process changes, then you can simply pull it out of the loop:

```cpp
c = Calculate();
for (int i = 0; i < n; i++)
{
    Process(i, c);
}
```

Now, in theory, we’ve reduced the execution time of this code by about (n-1) * cost(Calculate). This optimization is so basic that you could almost consider the original version badly written, even broken. Sometimes the compiler will pull constants out of the loop for you, but since the optimized loop is simpler, you’re better off changing the code yourself.

## Pattern: Short Term Memory

An extension of the previous pattern: loop iterations don’t always have constant calculations to remove, but they often rely on results from the previous iteration (or iterations). Take this sample loop, which uses the values for the current and previous counters each iteration:

```cpp
for (int i = 1; i < n; i++)
{
    c0 = Calculate(i - 1);
    c1 = Calculate(i);
    Process(i, c0 + c1);
}
```

Clearly we’re calling Calculate(i) twice, once at iteration i, and then again at i+1. Again, assuming Calculate is only dependent on i, we can fix this up a bit:

```cpp
c0 = Calculate(0);
for (int i = 1; i < n; i++)
{
    c1 = Calculate(i);
    Process(i, c0 + c1);
    c0 = c1;
}
```

So we’ve just halved the number of calls to Calculate, with a simple re-arrangement. Really, this is just an extension of the once-and-only-once pattern above. It’s also pretty trivial to see how this can be extended to loops that need more than one prior iteration’s value. Of course it may not be so obvious that you’re making redundant calculations in real code, but it’s common enough that you should look if you have a slow loop.
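To make that extension concrete, here’s a sketch (the squaring Calculate and the three-value window are my illustrative placeholders, not from the article) of a loop that needs the last three values but still calls Calculate only once per index:

```cpp
#include <cassert>

// Hypothetical stand-in for the article's Calculate function.
static int Calculate(int i) { return i * i; }

// Sums the value each Process call would receive, so the result is checkable.
long long SumOfWindows(int n)
{
    long long sum = 0;
    // Keep the two most recent results so each Calculate(i) runs exactly once,
    // even though every iteration uses three values.
    int c0 = Calculate(0);
    int c1 = Calculate(1);
    for (int i = 2; i < n; i++)
    {
        int c2 = Calculate(i);
        sum += c0 + c1 + c2; // stands in for Process(i, c0 + c1 + c2)
        c0 = c1;             // slide the window forward
        c1 = c2;
    }
    return sum;
}
```

The same sliding-window idea scales to any fixed number of prior values – for larger windows a small circular buffer is the usual choice.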

## Pattern: Fixed Size Cache

The previous caching patterns work well for speeding up loops that have calculations dependent on the loop counter. We can generalize this technique to cache any result. To generalize our caching we simply need a way to store and retrieve previously calculated values in a more flexible manner.

Here’s a pseudo-code implementation of an (admittedly pretty limited) cache using an associative array.

```cpp
template <int Size, typename T, typename Key>
class GenericCache
{
public:
    bool TryGetValue(const Key &key, T &value)
    {
        for (int i = 0; i < Size; i++)
        {
            if (mValues[i].key == key)
            {
                value = mValues[i].value;
                return true;
            }
        }
        return false;
    }

    void InsertValue(const Key &key, const T &value)
    {
        mValues[mNextEntry].key = key;
        mValues[mNextEntry].value = value;
        mNextEntry = (mNextEntry + 1) % Size; // circular cache
    }

private:
    struct Entry
    {
        Key key;
        T value;
    };

    Entry mValues[Size] = {}; // note: entries start zeroed; a real cache would track valid slots
    int mNextEntry = 0;
};
```

We’d then simply check our cache before calling a function. So to take our sample loop, we’d cache values something like so:

```cpp
// declare a cache that stores the result of an "int F(int)" function
GenericCache<4, int, int> cache;

for (int i = 0; i < n; i++)
{
    int v = SomeRandomValue(i);
    int c;
    if (!cache.TryGetValue(v, c))
    {
        c = Calculate(v);
        cache.InsertValue(v, c);
    }
    Process(i, c);
}
```

Not the most interesting loop, but it shows the difference between this general cache and the previous techniques – we aren’t dependent on the loop counter anymore to perform caching. All we require is that our Calculate function depends only on its parameters (the key).

I’ve chosen to use the most naive approach of retrieving values – brute force. We could be clever and use a smarter structure – like a hash map. The best choice really depends on what we’re caching and how we retrieve it. A small cache may be best implemented as above, a large one will probably need faster retrieval. Implementation also depends on the key we use, which may be non-trivial if we’re caching a function with many parameters. We need our implementation to be fast enough that the cost of checking the cache is less than the cost of recalculating values – ideally much less.
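As one possible smarter structure (the class name and interface here are my own sketch, not from the article), the same try-then-insert interface can sit on top of a std::unordered_map, trading the fixed-size array scan for average O(1) retrieval:

```cpp
#include <cassert>
#include <unordered_map>

// Illustrative hash-based cache. Unlike the fixed-size circular version,
// this one grows without bound, so a real implementation would also need
// an eviction policy to cap memory use.
template <typename Key, typename T>
class HashCache
{
public:
    bool TryGetValue(const Key &key, T &value) const
    {
        auto it = mValues.find(key);
        if (it == mValues.end())
            return false;
        value = it->second;
        return true;
    }

    void InsertValue(const Key &key, const T &value)
    {
        mValues[key] = value;
    }

private:
    std::unordered_map<Key, T> mValues;
};
```

Because the interface matches the array-based version, the sample loop above works unchanged with either cache – which one wins depends on the cache size and how expensive the cached function is.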

There are a great many variables to consider when using a generic cache:

- Eviction policy – we need to be aware of how key values occur – randomly? sequentially? – to create an ideal replacement policy
- Storage structure – hash table, list, array, etc.
- Retrieval time – it needs to be much faster than the function we’re caching
- Memory requirements – consider total memory, hardware memory cache issues, etc

You need to balance all of the above on a case-by-case basis, which is why I chose a trivial cache mechanism in the sample above – it’s the general pattern we’re identifying, not a specific implementation.

Extended further – caching all previous results – we start to diverge from the caching pattern into memoization. The main difference is that caching uses a limited amount of memory to store recent calculations, where memoization stores every calculated result.

You could consider caching a limited form of memoization. However, full memoization requires referential transparency – meaning F(x) depends only on x and is not affected by global state. Caching only requires that F(x) be constant for the lifetime of the cache, since we can re-populate the cache on the next use.
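A minimal memoization sketch (the Fibonacci example is mine, chosen because it is referentially transparent) shows the difference – every result is kept forever, so each distinct argument is computed exactly once:

```cpp
#include <cstdint>
#include <unordered_map>

// Memoized Fibonacci: an unbounded map keeps every result ever computed.
// This is only safe because fib(n) depends on nothing but n.
std::uint64_t fib(int n)
{
    static std::unordered_map<int, std::uint64_t> memo;
    if (n < 2)
        return static_cast<std::uint64_t>(n);
    auto it = memo.find(n);
    if (it != memo.end())
        return it->second; // previously computed, no recursion needed
    std::uint64_t result = fib(n - 1) + fib(n - 2);
    memo[n] = result;
    return result;
}
```

With memoization the naive exponential recursion collapses to linear time, at the cost of memory that grows with every distinct argument ever seen – exactly the trade-off that separates it from a fixed-size cache.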
