Systematic Gaming

August 19, 2008

Memory Management: Design

Filed under: game programming, memory management — Tags: , , — systematicgaming @ 12:29 pm

We’ve reviewed the different types of memory in part one and looked at issues specific to consoles in part two. Now we’re ready to start designing our own memory manager. Lets review our requirements:

  • Our memory manager must be fast and efficient. We don’t want to waste CPU cycles or extra bytes of RAM managing memory.
  • We need to allow memory to be allocated with arbitrary alignment.
  • We need to support managing multiple physical memory heaps as well as the ability to support both internal and external heap managers.

These are the main requirements we have above and beyond the basic functionality of allocating and releasing memory.

The Allocator

As a starting point let’s create a simple interface we can use as a starting point to build our memory manager from. Lets go back to the basics of memory allocation, we have two basic operations Allocate and Free. With that we simply define an abstract interface

class IAllocator
{
public:
   IAllocator() {}
   virtual ~IAllocator() {}

   virtual void * Alloc(size_t size) = 0;
   virtual void Free(void *ptr) = 0;
};

From this simple interface we can start designing a better memory manager.

Fast and Efficient

I’ve mentioned more than once that general heap allocation isn’t very good. Using a list of blocks means we have to search this list each time we allocate or free memory. Also because our list is embedded in the memory blocks we manage it’s very spread out in memory, this means we cause many cache misses traversing this list. Cache misses are very costly on modern CPUs. Block based heap management also has overhead – a few bytes to memory used to manage each block. Let’s look at some faster and more efficient ways of managing memory.

Fixed Block Allocator

A fixed block allocator manages memory in fixed size blocks. A fixed block scheme has some obvious limitations, you can’t allocate memory larger than the block size. But has some nice advantages, we can quickly allocate a specific block without traversing a list and we don’t have to maintain the size of a block for management purposes. So a fixed block allocator is more efficient, with little or no per-block overhead. We only need to know if a block is allocated or free. Here’s a simple design:

template<int BlockSize, int NumBlocks>
class FixedBlockAllocator : public IAllocator
{
public:
   FixedBlockAllocator();
   virtual ~FixedBlockAllocator() {}

   virtual void *Alloc(size_t size);
   virtual void Free(void *ptr);

private:
   uint32_t mBlocks[ NumBlocks / 32 ];    // bit-vector indicating if a block is allocated or not
   uint8_t mHeap[ BlockSize * NumBlocks ]; // array of block data
};

FixedBlockAllocator::FixedBlockAllocator()
{
   // by default all blocks are free
   for ( int i = 0; i < NumBlocks / 32; i++ )
   {
      mBlocks[i] = 0;
   }
} 

void *FixedBlockAllocator::Alloc(size_t size)
{
   // we can't handle allocations larger than our block size
   if ( size > BlockSize )
   {
      return NULL;
   }

   // search for a free block
   int block = -1;
   for (int i = 0; i < NumBlocks / 32; i++ )
   {
      // any free blocks in this chunk?
      if ( 0xffffffff != mBlocks[i] )
      {
         // find a free entry in this chunk
         // then allocate it and set the proper block index
         for ( int j = 0; j < 32; j++ )
         {
            if ( 0 == (mBlocks[i] & (1 << j)) )
            {
               mBlocks[i] |= (1 << j);
               block = i * 32 + j;
               break;
            }
         }

         break;
      }
   }

   // no free block found
   if ( -1 == block || block >= NumBlocks )
   {
      return NULL;
   }

   // return our block
   return mHeap + (block * BlockSize);
}

void FixedBlockAllocator::Free(void *ptr)
{
   uint8_t *pointer = static_cast<uint8_t *>(ptr);

   // is this pointer in our heap?
   assert( pointer > mHeap && pointer < mHeap + BlockSize * NumBlocks );

   // convert the pointer into a block index
   int block = static_cast<int>(pointer - mHeap) / BlockSize;

   // reset the bit in our block management array
   int chunkIndex = block / 32;
   int bitIndex = block % 32;

   mBlocks[chunkIndex] &= ~(1 << bitIndex);
}

Our allocator is fairly straightforward, we maintain a bit vector with one bit per block indicating if the block is allocated or not. Then we have a simple array of bytes as our heap. This design is pretty fast – we still have to search for free blocks when allocating, but the search is an efficient linear scan of an array, so it’s cache friendly. Once we have a block index returning the appropriate data is trivial. Freeing a block is also trivial, using some simple pointer arithmetic we simply clear the proper bit. However the fixed block allocator is much more efficient than a general heap allocator. We only need 1 bit per allocation to manage this heap. That at least 8x more efficient than the simple heap allocator we looked at in part one.

You probably noticed that I made this allocator a template. The reason is this makes the allocator flexible – we can have any combination of sizes and blocks, but more importantly it gives our allocator a fixed memory footprint that is defined at compile time. This gives us the ability to declare the allocator as a global variable which means we can allocate it from global memory if we want. It also keeps everything simple, having our memory manager need to allocate memory to maintain itself can get messy so it’s better to avoid it.

However as fast and efficient as the fixed block allocator is, it’s pretty limited. If we allocate sizes smaller than the block size we are wasting memory and we can’t allocate larger blocks. So this allocator doesn’t make for a good general allocator. It is a very useful allocator however, and we’ll certainly find some good uses for it later.

Stack Allocator

Lets look at another specialized allocator. This one is inspired by the callstack, which we reviewed in part one. If we limit allocation and freeing to the end of the stack we can create a very fast and fairly efficient allocator. Let’s look at some code:

template <int HeapSize>
class StackAllocator : public IAllocator
{
public:
   StackAllocator();
   virtual ~StackAllocator() {}

   virtual void *Alloc(size_t size);
   virtual void Free(void *ptr);

   void *Push(size_t size);
   void Pop();

private:
   struct HeapAlloc
   {
      size_t size;          // size of the data block
      HeapAlloc *prev;      // pointer to the previous allocation
   };

   HeapAlloc *mTop;         // pointer to the top (last) allocation
   uint8_t mHeap[HeapSize]; // heap data
};

StackAllocator::StackAllocator()
{
   // reset the stack
   mTop = NULL;
}

void *StackAlllocator::Alloc(size_t size)
{
   return Push(size);
}

void StackAllocator::Free(void *ptr)
{
   // check that we're freeing the top of the stack, any other pointer is invalid
   void *topData = reinterpret_cast<uint8_t *>(mTop) + sizeof(HeapAlloc);
   assert( ptr == topData );

   Pop();
}

void StackAllocator::Push(size_t size)
{
   HeapAlloc *prevTop = mTop;
   if ( mTop )
   {
      // do we have enough space in our heap?
      size_t bytesFree = (reinterpret_cast<uint8_t *>(mTop) - mHeap) - mTop->size;

      if ( bytesFree < size + sizeof(HeapAlloc) )
      {
         return NULL;
      } 

      // allocate our next block from the top of the stack
      mTop = reinterpret_cast<HeapAlloc *>(reinterpret_cast<uint8_t *>(mTop) + mTop->size + sizeof(HeapAlloc));
   }
   else
   {
      // alloc from the start of the heap if large enough
      if ( HeapSize < size + sizeof(HeapAlloc) )
      {
         return NULL;
      }

      mTop = reinterpret_cast<HeapAlloc *>(mHeap);
   }

   mTop->size = size;
   mTop->prev = prevTop;
   return reinterpret_cast<uint8_t *>(mTop) + sizeof(HeapAlloc);
}

void StackAllocator::Pop(void *ptr)
{
   void *topData = reinterpret_cast<uint8_t *>(mTop) + sizeof(HeapAlloc);
   assert( ptr == topData );

   mTop = mTop->prev;
}

Our stack allocator just allocates memory from the top (or end) of its heap, and only allows freeing the block from the top. Blocks are managed similarly to our simple general heap allocator from part one, with a small header at the beginning of each allocated chunk. Looking at the implementation you can see that this allocator is about as fast as you can get – no searching, allocation is a trivial pointer incrementation. Also we’ll never have heap fragmentation, since we only free memory from the top of the stack.

As fast as this allocator is, it is severely limited because we can only properly use it when our allocation matches the push/pop pattern of the callstack. This StackAllocator does show that a more specialized allocation scheme can overcome the shortcomings of general heap management. However, it’s not so clear how we can use it in memory management, we can still use this allocator when we need a fast allocator with an algorithm that follows the push/pop pattern.

Alignment

Alignment requires we be able to specify not only how much memory we want to allocate, but also what the memory’s address should be aligned to. Looking at our IAllocator interface, you’ll notice that we haven’t provided a means to do so. Simplest way is to just allocate a slightly larger block of memory and round up the address to the requested alignment

// allocate 16 byte aligned memory
// the worst case is we're 15 bytes from our desired alignment
uint8_t *data = reinterpret_cast<uint8_t *>myAllocator.Alloc(size + 15);

// create an aligned pointer by shifting the pointer to the next aligned address
int offset = reinterpret_cast<uint32_t>(data) % 16;
uint8_t *alignedAddress = data + (16 - offset);

That will give us a pointer aligned to 16 bytes every time. It can easily be extended to support arbitrary alignment values. However we now have a problem, when we Free this aligned pointer we need to pass the original unaligned pointer to the Free call. It isn’t really practical to keep two pointers around, it makes more sense to just add this functionality to our IAllocator interface. Here’s our updated interface:

class IAllocator
{
public:
   IAllocator() {}
   virtual ~IAllocator() {}

   virtual void * Alloc(size_t size, size_t alignment) = 0;
   virtual void Free(void *ptr) = 0;
};

That’s not all, if you remember all the restrictions caused by alignment and take a look back at the allocators we’ve designed, you may see a problem. They’re all broken! Oops!

Take the StackAllocator for example, if we allocate a block of size 3, then our mTop pointer will have an alignment of 3. That’s bad, many systems won’t work properly with that alignment. This means that not only must we align our allocated memory we must be careful to ensure that our internal management data is also properly aligned.

Our FixedBlockAllocator has another problem: we have no control over alignment. Each block has a fixed address, and therefore a fixed alignment. One quick fix is to align the heap to the same address as the block size, since most data types have alignment requirements equal to their size. If someone requests a larger alignment, we can just return NULL.

I’ll leave updating or sample allocators to handle alignment properly as an exercise for the reader.

Physical Memory

We also need to handle different physical types of memory. The simple approach is to use multiple allocators:

void *myData = gMainMemory.Alloc(1024);
void *myVideoData = gVideoMemory.Alloc(1024*1024);

Simple, effective, so what’s the problem? Well for a single platform with a fixed memory configuration is can work. But if you’re writing portable code is doesn’t work too well. Say we’re writing a game both the PS3 and Xbox 360. One main difference is the PS3 has separate video and main memory whereas the Xbox has a single unified set of memory. The above code makes sense on the PS3, but not so much on the Xbox. So a better way is to inform the memory manager what type of memory you want and let it handle the different memory types.

void *myData = gAllocator.Alloc(1024, MEM_MAIN);
void *myVideoData = gAllocator.Alloc(1024*1024, MEM_VIDEO);

Think of this a providing a hint to the allocator as to how to allocate our memory. On the PS3 we can then redirect any MEM_VIDEO hint to the proper heap allocator, while on the Xbox both allocations could come from the same heap. If we want we can also expand the hints to include lifetime or other allocation hints (like MEM_TEMPORARY, MEM_HIGH), but it’s probably not necessary and can expose too many internal implementation details. Let’s update our IAllocator interface to include this:

class IAllocator
{
public:
   IAllocator() {}
   virtual ~IAllocator() {}

   virtual void * Alloc(size_t size, size_t alignment, uint32_t type) = 0;
   virtual void Free(void *ptr) = 0;
};

Hierarchal Memory Manager

We’ve designed an allocator interface that meets our basic requirements. We’ve seen how we can use specialized allocation patterns to increase speed and efficiently. Now let’s put it all together into a system we can used to manage memory of our entire game. This memory manager will act as the entry point for all memory allocations in our game. By routing all allocations through a single point we can better manage memory resources at a global level.

We’ll design what I call a hierarchical memory manager, since it uses a hierarchy of allocators to manage our memory. Basically it’s just another IAllocator implementation that forwards allocations to the appropriate allocator.

class MainAllocator : public IAllocator
{
public:
   MainAllocator() {}
   virtual ~MainAllocator() {}

   virtual void * Alloc(size_t size, size_t alignment, uint32_t type);
   virtual void Free(void *ptr);

private:
   HeapAllocator mMainMemory;
   HeapAllocator mVideoMemory;
};

So this is just another allocator that manages two heaps, like the one suggested to earlier. Depending on the type of memory requested we forward it allocation to the appropriate sub-allocator:

void *MainAllocator::Alloc(size_t size, size_t alignment, uint32_t type)
{
   switch (type)
   {
   case MEM_MAIN:
      return mMainMemory.Alloc(size, alignment, type);

   case MEM_VIDEO:
      return mVideoMemory.Alloc(size, alignment, type);
   }

   assert(false); // invalid type requested
   return NULL;
}

Simple, enough right? But what about Free? How do we know which heap to free from? We could just forward the Free to all our sub-allocators, but that’s hardy efficient. I think we’ll have to update our allocator again:

class IAllocator
{
public:
   IAllocator() {}
   virtual ~IAllocator() {}

   virtual void * Alloc(size_t size, size_t alignment, uint32_t type) = 0;
   virtual void Free(void *ptr) = 0;

   virtual bool ContainsPointer(void *ptr);
};

This new ContainsPointer class member is straightforward: it returns true if the pointer was allocated by this allocator. Fortunately this is almost always a trivial check, we just need to make sure the pointer is in the range [ HeapStart, HeapStart + HeapSize ). For more complex allocators, like our hierarchical we simply check each sub-allocator until we find one that returns true, and return false if none of our sub-allocators contains the pointer, like so:

bool MainAllocator::ContainsPointer(void *ptr)
{
   return mMainMemory.ContainsPointer(ptr) || mVideoAllocator.ContainsPointer(ptr);
}

Now we can implement our hierarchical allocator’s Free:

bool MainAllocator::Free(void *ptr)
{
   if ( mMainMemory.ContainsPointer(ptr) )
   {
      mMainMemory.Free(ptr);
   }
   else if ( mVideoAllocator.ContainsPointer(ptr) )
   {
      mVideoAllocator.Free(ptr);
   }
}

So that’s the idea behind our hierarchical allocator. It’s a simple example, and is more of a pattern to follow than a specific implementation. There are many things we can do once we’re allocating memory through allocator interfaces, one nice way to extend our hierarchical example is to add some fixed block allocators to speed up and make small allocations more efficient.

class MainAllocator : public IAllocator
{
public:
   MainAllocator() {}
   virtual ~MainAllocator() {}

   virtual void * Alloc(size_t size, size_t alignment, uint32_t type);
   virtual void Free(void *ptr);

private:
   HeapAllocator mMainMemory;
   HeapAllocator mVideoMemory;

   FixedBlockAllocator<4, 1024> mFixedAllocator4;
   FixedBlockAllocator<8, 1024> mFixedAllocator8;
   FixedBlockAllocator<16, 1024> mFixedAllocator16;
};

void *MainAllocator::Alloc(size_t size, size_t alignment, uint32_t type)
{
   switch (type)
   {
   case MEM_MAIN:
      {
         void *ptr = NULL;
         if ( size <= 4 )
            ptr = mFixedAllocator4(size, alignment, type);
         if ( !ptr && size <= 8 )
            ptr = mFixedAllocator8(size, alignment, type);
         if ( !ptr && size <= 16 )
            ptr =  mFixedAllocator16(size, alignment, type);
         if ( !ptr )
            ptr = mMainMemory.Alloc(size, alignment, type);
         return ptr;
      }
   case MEM_VIDEO:
      return mVideoMemory.Alloc(size, alignment, type);
   }

   assert(false); // invalid type requested
   return NULL;
}

Now when we allocate from main memory we’ll try to allocate small allocations out of fixed block allocators. If the allocators are full, then we’ll allocate from the main heap. This will reduce the overhead and fragmentation issues of allocating small blocks of memory from a general heap allocator. By hiding the fixed block allocators behind a single hierarchical we can manage memory more efficiently and at a global scope.

Summary

We gone over a variety of allocation patterns, and combined with our knowledge of console memory issues have come up with a simple allocation interface that meets our needs. We also saw how to combine allocators to simplify allocation and increase portability. However up until now we haven’t really been discussing memory management only memory allocation. While allocating memory is an important part of memory management, it’s not the entire picture. Now that we’ve examined the issues of memory allocation in detail we need to look at the entire system of memory management. That’s part four of this series.

6 Comments »

  1. I notice you have the following:
    in MainAllocator

    private:
    HeapAllocator mMainMemory;
    HeapAllocator mVideoMemory;

    however you don’t discuss a class called HeapAllocator. Is this either the FixedBlockAllocator or the StackAllocator instead or is HeapAllocator missing from the discussion?

    Later you reference mMainAllocator which looks like it could be mMainMemory. Can you clarify these points for me please?

    Comment by Christopher Clark — March 12, 2012 @ 10:21 pm

  2. Thank you, will make the changes and try it out.

    Comment by Christopher Clark — March 13, 2012 @ 12:08 am

  3. I noticed that in your FixedBlockAllocator::Free(void*) function, you are calling assert on ( pointer > mHeap && pointer = mHeap && pointer < mHeap + BlockSize * NumBlocks )? As I understand it, the first allocated block starts at the same address as mHeap.

    Comment by Hew Jun-Wei — May 5, 2012 @ 4:40 am

  4. Gah, the above didn’t format correctly- shouldn’t it be ( pointer >= mHeap && pointer < mHeap + BlockSize * NumBlocks )?

    Comment by Hew Jun-Wei — May 5, 2012 @ 4:42 am

  5. This is what I have in FixedBlockAllocator.h and it compiles.
    assert( pointer > mHeap && pointer < mHeap + BlockSize * NumBlocks );

    Comment by Anonymous — May 5, 2012 @ 3:29 pm


RSS feed for comments on this post. TrackBack URI

Leave a reply to Hew Jun-Wei Cancel reply

Blog at WordPress.com.