Systematic Gaming

September 30, 2008

Game Profiling 101

Filed under: game programming, profiling. Posted by systematicgaming @ 8:21 am

On this blog I keep mentioning efficiency, resource usage and optimization as key requirements for any well designed game system. When discussing memory, I put a lot of emphasis on tracking and profiling memory usage.

This time we’ll look at the various ways we can profile our game. We’ll also look at how to instrument our game for custom profiling, focusing on CPU usage.

The most basic way of profiling a game is calculating the frame rate. Most console games target 30 or 60 frames per second (FPS), and the frame rate is simple to calculate and easy for everyone to understand. Here’s one way to calculate the FPS from your main loop:

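float lastTime = GetTimeInMs();   // GetTimeInMs(): your platform's millisecond timer
float fps = 0.0f;
while (true)
{
   float time = GetTimeInMs();
   float frameTime = time - lastTime;   // ms spent on the previous frame
   lastTime = time;

   // convert from ms per frame to fps (skip if no time has passed)
   if (frameTime > 0.0f)
      fps = 1000.0f / frameTime;

   // ... update and render the frame ...
}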
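The loop above can also be written against the C++ standard library. Below is a minimal sketch, assuming C++11 or later. It uses `std::chrono::steady_clock`, a monotonic clock, in place of the hypothetical `GetTimeInMs()`. `FrameTimer` and `FpsFromFrameMs` are illustrative names, not part of any engine API.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: converts a frame time in ms to frames per second.
// Returns 0 when no time has elapsed, instead of dividing by zero.
inline double FpsFromFrameMs(double frameMs)
{
    assert(frameMs >= 0.0);
    return frameMs > 0.0 ? 1000.0 / frameMs : 0.0;
}

// Hypothetical frame timer built on std::chrono::steady_clock, a monotonic
// clock standing in for GetTimeInMs(). Call Tick() once per main-loop
// iteration; it returns the time since the previous Tick() in milliseconds.
class FrameTimer
{
    using Clock = std::chrono::steady_clock;

public:
    FrameTimer() : mLast(Clock::now()) {}

    double Tick()
    {
        Clock::time_point now = Clock::now();
        std::chrono::duration<double, std::milli> frame = now - mLast;
        mLast = now;
        return frame.count();
    }

private:
    Clock::time_point mLast;
};
```

Inside the loop, `double frameMs = timer.Tick();` followed by `FpsFromFrameMs(frameMs)` replaces the manual bookkeeping. Keeping the time in `double` also avoids a subtle problem with a `float` millisecond counter. Once such a counter passes 2^23 ms (about 2.3 hours of uptime), it can no longer represent differences smaller than 1 ms.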


However, the frame rate alone isn’t very useful for profiling a game; it’s more of a goal than useful profiling data. If the frame rate is too low we know something is wrong, but not what, so we need more detailed information.

FPS is also awkward to work with as a measurement, mostly because it’s the inverse of the time it takes to render a frame. When timing code we generally measure the actual execution time, usually in milliseconds.
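A short worked example shows why frame times are easier to reason about than frame rates. Frame time in milliseconds is the reciprocal of the frame rate. As a result, the same 10 FPS drop means very different amounts of extra work depending on where it happens:

```latex
\begin{aligned}
t_{\text{frame}} &= \frac{1000\ \text{ms}}{\text{FPS}} \\
60 \to 50\ \text{FPS}: \quad \Delta t &= \frac{1000}{50} - \frac{1000}{60} \approx 20.00 - 16.67 = 3.33\ \text{ms} \\
30 \to 20\ \text{FPS}: \quad \Delta t &= \frac{1000}{20} - \frac{1000}{30} \approx 50.00 - 33.33 = 16.67\ \text{ms}
\end{aligned}
```

Frame times also add directly. If rendering takes 10 ms and game logic takes 6 ms, the frame takes 16 ms. You can’t get that result by combining the equivalent rates of 100 FPS and roughly 167 FPS.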

So if our frame rate is too low, how do we figure out what’s slow? We need more detailed profiling, and there are several different ways to get it.

Types of Profiling

There are a number of ways to get timing information from a program. Most of them fall into one of two approaches: instrumented or non-instrumented.

Instrumented means we alter the program so that it generates its own profiling data. Manually adding a timer, as in the FPS calculation above, is an example of instrumentation. While flexible, instrumentation does change the behaviour of our program, and in some cases it can add substantial overhead.
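One way to see the cost of instrumentation is to time the timer itself. Below is a rough sketch, assuming `std::chrono::steady_clock` is the timer being used; `AverageClockReadNs` is a hypothetical name. Every instrumented scope pays for at least two clock reads. Multiplying this figure by the number of instrumented calls per frame therefore gives a lower bound on the overhead. The result varies widely between platforms and clock sources, so it’s worth measuring on the target hardware.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: estimates the average cost of one clock read in
// nanoseconds by reading std::chrono::steady_clock many times in a row.
// The loop itself adds a little to the figure, so treat it as approximate.
double AverageClockReadNs(int iterations)
{
    using Clock = std::chrono::steady_clock;
    assert(iterations > 0);

    Clock::time_point first = Clock::now();
    Clock::time_point last = first;
    for (int i = 0; i < iterations; ++i)
        last = Clock::now();   // each call is one "timer read"

    std::chrono::duration<double, std::nano> total = last - first;
    return total.count() / iterations;
}
```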

Non-instrumented profiling runs an existing program unmodified and still extracts profiling data from it. Intel’s VTune works like this: it can analyze a program without any alteration. Non-instrumented profilers often work by sampling, meaning the profiler records the current instruction address (or other data) of the running program at a high frequency. The collected samples form a statistical profile of the program’s execution.

So if 10% of the samples land in the function Foo(), we can assume that Foo() takes about 10% of the execution time. Sampling profilers are convenient because they work on unmodified programs. However, because they only take periodic snapshots, they may miss parts of a program, especially small or very fast functions that have little chance of being hit by a sample.
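The chance of missing a small function can be estimated with a simple model. Assume N samples are taken at independent, uniformly random times; real profilers usually sample at a fixed interval, so this is an idealization. A function that accounts for a fraction p of the execution time is then expected to receive pN samples, and the probability that it receives none at all is:

```latex
\begin{aligned}
E[\text{hits}] &= pN \\
P(\text{no hits}) &= (1 - p)^{N} \approx e^{-pN} \\
p = 10^{-4},\ N = 10^{4}: \quad P(\text{no hits}) &\approx e^{-1} \approx 0.37
\end{aligned}
```

So with 10,000 samples, a function taking 10% of the time is expected to collect about 1,000 samples. A function taking 0.01% of the time has roughly a 37% chance of not showing up at all.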

In this series we’ll be focusing on instrumented profiling, and comparing different ways to instrument our program.

What to Profile

The first thing we need to figure out is what type of data we need. The most common metric is time: how long an operation or sequence of operations takes. We also need to profile resources such as memory, which we examined in another article. There are many other kinds of data we can profile, and many of them have a large impact on performance: cache misses, idle time or stalls, memory bandwidth, etc.

Once we’ve decided what type of data to collect, we have to decide on its scope. Do we want data for a single frame, or for multiple frames? Only for a single function call? Or the time of all function calls over multiple frames? Different scopes often need different ways of capturing information, and some data has little meaning at certain scopes. For example, the total number of cache misses over 100 frames isn’t an interesting statistic, but the number of cache misses inside a single function can be valuable information.
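The scope usually decides how the data is stored. For a multi-frame scope, one common approach is to keep a fixed-size window of recent frame times. We then report both the average and the worst frame in that window, since an occasional slow frame can hide behind a healthy average. Below is a minimal sketch, assuming frame times in milliseconds; `FrameHistory` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: keeps the last `capacity` frame times (in ms) in a
// ring buffer and reports statistics over that window.
class FrameHistory
{
public:
    explicit FrameHistory(std::size_t capacity) : mTimes(capacity, 0.0)
    {
        assert(capacity > 0);
    }

    void Add(double frameMs)
    {
        mTimes[mNext] = frameMs;                // overwrite the oldest entry
        mNext = (mNext + 1) % mTimes.size();
        mCount = std::min(mCount + 1, mTimes.size());
    }

    std::size_t Count() const { return mCount; }

    // Mean frame time over the window (0 if empty).
    double Average() const
    {
        if (mCount == 0) return 0.0;
        double sum = 0.0;
        for (std::size_t i = 0; i < mCount; ++i) sum += mTimes[i];
        return sum / static_cast<double>(mCount);
    }

    // Worst (longest) frame in the window (0 if empty).
    double Max() const
    {
        if (mCount == 0) return 0.0;
        return *std::max_element(mTimes.begin(), mTimes.begin() + mCount);
    }

private:
    std::vector<double> mTimes;
    std::size_t mNext = 0;    // slot the next Add() will write
    std::size_t mCount = 0;   // number of valid entries (<= capacity)
};
```

You can feed it the frame time once per frame, for example from the FPS calculation above, to get a running view over the last N frames. The same structure works for any per-frame counter, not just time.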

How to Instrument Code

Some compilers support automatic per-function instrumentation. When you compile with profiling enabled (-pg), gcc inserts a call to an mcount function into each function, and you can override this function to add your own instrumentation. While very handy, we don’t always want to profile every function, or even profile at function scope. Inline functions can also be missed, and not every platform or compiler offers automatic instrumentation, so we’ll have to do some instrumentation manually.
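For comparison, gcc and clang also offer -finstrument-functions, a separate mechanism from the -pg/mcount support mentioned above. With this flag, the compiler calls two hooks that you define, `__cyg_profile_func_enter` and `__cyg_profile_func_exit`, on entry to and exit from each instrumented function. The gcc documentation states that this is also done for functions expanded inline, which covers one of the gaps mentioned above. The sketch below only counts calls; the counter names and `PrintCallCounts` are hypothetical. A real profiler would record the function addresses with a timestamp and map them to names afterwards. The hooks are marked `no_instrument_function` so they aren’t instrumented themselves.

```cpp
// Build with: g++ -finstrument-functions hooks.cpp   (gcc or clang only)
#include <cassert>
#include <cstdio>

// Hypothetical call counters updated by the hooks. They are plain globals,
// so the hooks never call other (possibly instrumented) code.
static unsigned long gEnterCount = 0;
static unsigned long gExitCount = 0;

extern "C" {

// Called by the compiler on entry to every instrumented function.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* thisFn, void* callSite)
{
    (void)thisFn;     // address of the function being entered
    (void)callSite;   // address it was called from
    ++gEnterCount;
}

// Called by the compiler just before every instrumented function returns.
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* thisFn, void* callSite)
{
    (void)thisFn;
    (void)callSite;
    ++gExitCount;
}

} // extern "C"

// Prints the counters; not instrumented, so calling it doesn't change them.
__attribute__((no_instrument_function))
void PrintCallCounts()
{
    assert(gExitCount <= gEnterCount);   // every exit has a matching enter
    std::printf("entered %lu, exited %lu\n", gEnterCount, gExitCount);
}
```

After building with `g++ -finstrument-functions`, calling `PrintCallCounts()` at the end of `main` shows the counts. Without the flag, the hooks are never called and both counters stay at zero.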

Manually instrumenting code is pretty straightforward; we did it above, although in a simplistic manner. The basic flow is always the same:

  1. Store an initial value from some counter (current time, current free memory, etc.)
  2. Perform one or more operations
  3. Get the current value (as in 1)
  4. Calculate the difference
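The four steps above don’t depend on what is being counted, and a small generic helper can make that explicit. This sketch assumes C++14 for the deduced return type; `MeasureDelta` and `ReadTime` are hypothetical names. Any callable that returns a value supporting subtraction can serve as the counter.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical generic helper implementing the four steps above.
// readCounter: any callable returning the current counter value
//              (time, free memory, a hardware counter, ...).
// work:        the operations being measured.
template <typename ReadCounter, typename Work>
auto MeasureDelta(ReadCounter readCounter, Work&& work)
{
    auto start = readCounter();    // 1. store the initial value
    std::forward<Work>(work)();    // 2. perform the operations
    auto end = readCounter();      // 3. read the same counter again
    return end - start;            // 4. calculate the difference
}

// Example counter: the current time from a monotonic clock.
inline std::chrono::steady_clock::time_point ReadTime()
{
    return std::chrono::steady_clock::now();
}
```

With `ReadTime`, the result is a `std::chrono` duration. With an integer counter, such as a hypothetical cache-miss counter, the result is a plain count.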

Sometimes we don’t need a difference at all. When we care about when an event occurred rather than how long something took, we only need to record the current value.
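Recording an event only needs step 1 of the flow above, plus a label. Below is a minimal sketch, assuming `std::chrono::steady_clock` as the counter; `EventLog` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event log: records *when* named events happen (in ms since
// the log was created) instead of how long something took.
class EventLog
{
    using Clock = std::chrono::steady_clock;

public:
    struct Event
    {
        std::string name;
        double timeMs;
    };

    EventLog() : mStart(Clock::now()) {}

    // Store the current counter value (here, elapsed time) with a label.
    void Record(std::string name)
    {
        std::chrono::duration<double, std::milli> now = Clock::now() - mStart;
        mEvents.push_back(Event{std::move(name), now.count()});
    }

    const std::vector<Event>& Events() const { return mEvents; }

private:
    Clock::time_point mStart;
    std::vector<Event> mEvents;
};
```

A shipping engine would likely preallocate storage and avoid building a `std::string` per event, but this keeps the sketch short.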

If we want to instrument a single function, we can simply do:

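void myFunction()
{
   float start = GetValue();                  // 1. initial counter value
   // ... stuff ...
   float difference = GetValue() - start;     // 3 & 4. read again, take the difference
   // report or store difference here
}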

Here GetValue can be any incrementing value, such as time, cache misses or whatever other data is available on our platform. Let’s look at how to make this simple instrumentation easier. From the flow above, we only need to sample two data points, at the start and end of a scope. In C++, a constructor and a destructor run at exactly those two points, so it’s trivial to write a small class that does the work for us. Here’s a simple timer class:

#include <cstdio>

class AutoTimer
{
public:
   // get the current clock time from the OS (GetCurrentTime is platform-specific)
   AutoTimer() { mStartTime = GetCurrentTime(); }

   ~AutoTimer()
   {
      // take the time difference
      float timeElapsed = GetCurrentTime() - mStartTime;
      printf( "Time Elapsed: %f\n", timeElapsed );
   }
private:
   float mStartTime;
};

Now we can time a function like so:

void myFunction()
{
   AutoTimer timer;   // prints the elapsed time when it goes out of scope
   // do stuff
}
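For reference, the same pattern can be written against the C++ standard library instead of a platform timer. This sketch assumes C++11 or later and uses `std::chrono::steady_clock` in place of the hypothetical `GetCurrentTime()`. `ScopedTimer` and its optional result pointer are illustrative additions, not part of the original class.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical RAII timer: the AutoTimer idea using std::chrono.
// If resultMs is given, the elapsed time is written there instead of printed.
class ScopedTimer
{
    using Clock = std::chrono::steady_clock;

public:
    explicit ScopedTimer(const char* label, double* resultMs = nullptr)
        : mLabel(label), mResultMs(resultMs), mStart(Clock::now()) {}

    ~ScopedTimer()
    {
        std::chrono::duration<double, std::milli> elapsed = Clock::now() - mStart;
        if (mResultMs)
            *mResultMs = elapsed.count();   // hand the value back to the caller
        else
            std::printf("%s: %.3f ms\n", mLabel, elapsed.count());
    }

    // A copy would report the same scope twice, so forbid copying.
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    const char* mLabel;
    double* mResultMs;
    Clock::time_point mStart;
};
```

It’s used exactly like AutoTimer: declare `ScopedTimer timer("myFunction");` as the first statement of the function. Passing a pointer instead of printing is handy when the measured value should be collected rather than logged.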

That’s a pretty simple pattern, but it’s tedious if you have hundreds or thousands of functions to instrument. It also calculates only a single data point: how long a single call of a single function takes. That’s useful if the function is only called once per frame, but not very useful when timing a function that is called often.
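One way to address the second limitation is to accumulate rather than print. Each call adds its time to a per-function record, so a function called many times per frame yields a call count and a total. Below is a minimal sketch, assuming C++14 and single-threaded use; `ProfileZone`, `ZoneTimer` and `AverageMs` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical per-function totals, accumulated across many calls.
struct ProfileZone
{
    const char* name;
    std::uint64_t calls = 0;
    double totalMs = 0.0;
};

// Like AutoTimer, but adds each call's time into a ProfileZone instead of
// printing it. Not thread-safe.
class ZoneTimer
{
    using Clock = std::chrono::steady_clock;

public:
    explicit ZoneTimer(ProfileZone& zone) : mZone(zone), mStart(Clock::now()) {}

    ~ZoneTimer()
    {
        std::chrono::duration<double, std::milli> elapsed = Clock::now() - mStart;
        mZone.totalMs += elapsed.count();
        ++mZone.calls;
    }

    ZoneTimer(const ZoneTimer&) = delete;
    ZoneTimer& operator=(const ZoneTimer&) = delete;

private:
    ProfileZone& mZone;
    Clock::time_point mStart;
};

// Average cost of one call, or 0 if the zone was never entered.
inline double AverageMs(const ProfileZone& zone)
{
    return zone.calls ? zone.totalMs / static_cast<double>(zone.calls) : 0.0;
}
```

A function would typically declare `static ProfileZone zone{"myFunction"};` followed by `ZoneTimer timer(zone);`. The totals would then be reported and reset once per frame. Making this thread-safe would need atomics or per-thread zones.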

In Summary…

We’ve discussed some very basic profiling concepts and the various ways of profiling a program, and we saw how to instrument some code by hand. There’s a lot more about profiling to cover. In the following articles we’ll look at how to profile your whole program, and how to analyze profiling data to find slow spots and figure out what to optimize.

