Systematic Gaming

October 23, 2008

Game Optimization 101

Filed under: game programming, optimization — systematicgaming @ 6:23 am

In this article we’ll look at the process of optimization.  There’s no single way to optimize a game, but there are methodologies we can follow that will help us prioritize our efforts.

In previous articles we looked at how we can profile parts of our game.  At this point we only know how long sections of our game code are taking.  We haven’t looked into how to use this information to make our game faster.  We need to know how to target the slow parts of our game.

The process of optimization is two parts methodology, one part knowledge and one part intuition.  We’ll start with a simple process and show how it can help us optimize effectively.

Methodology of Optimization

The basic process is very simple:

  1. Measure performance
  2. Make a change
  3. Measure again, keeping the change only if it helped
  4. Repeat until we reach our goal

If our goal is to run at 60 FPS in all parts of the game, we simply follow the above process until our game is running at 60 FPS.  It’s as simple as that.

Well, okay, not really.  The process is pretty straightforward.  Measurement is critical, since I can’t count the number of times I’ve made a change that should have been an improvement, but turned out not to help, or even to hurt, performance.  The real trick is deciding what to change.

Remember when I said that optimization is two parts methodology, one part knowledge and one part intuition?

Well, this is where the knowledge and intuition come in.  This is where optimization becomes more of an art than a science.  Guided by our process, we can iterate until our game is fast enough, but without the intuition and experience to guide our changes, we’ll be making changes blindly and we may never find our goal.

So we’ve made some measurements that show we’re not at 60 FPS.  Okay, what next?  The frame rate alone tells us almost nothing, so we need more detailed information.  We can take more and more measurements until we get a better picture of our game’s performance, as we discussed in Profiling 102.  But the measurements only show us what is taking time, not why.  More importantly, the measurements don’t directly tell us how to make things faster.  We need to use our knowledge and intuition to interpret the data.

Knowledge

When we use our knowledge to interpret our profiling data we rely on recognizing patterns we’ve encountered before.  We rely on our domain knowledge of the task at hand. We rely on all the previous games we’ve optimized before, as well as our education, knowledge from teammates or ramblings on the web.

Knowledge will tell us that our AI sub-system probably shouldn’t be using over 50% of a frame’s time, because it never took more than 10% on our last couple of games.

Knowledge will tell us that our GetValue class members shouldn’t be showing up in our profile data, because we know they should have been inlined by the compiler.

Knowledge will also tell us that our list of particles really should be an array, or at least a single pooled heap of memory, because we understand computer architecture and know how the cache works.
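As a minimal sketch of that idea (the Particle fields here are placeholders, not from any particular engine), updating particles stored in a single contiguous std::vector walks memory linearly, where a linked list of individually allocated particles would bounce all over the heap:

    #include <cstddef>
    #include <vector>

    // Hypothetical particle data – the exact fields are an assumption;
    // the point is that every particle lives in one contiguous allocation.
    struct Particle
    {
        float x, y, z;      // position
        float vx, vy, vz;   // velocity
        float life;         // remaining lifetime in seconds
    };

    // Walking a contiguous pool touches memory linearly, so cache lines
    // and hardware prefetching work in our favor.  A linked list of
    // separately heap-allocated particles scatters these loads instead.
    void UpdateParticles(std::vector<Particle>& particles, float dt)
    {
        for (std::size_t i = 0; i < particles.size(); ++i)
        {
            Particle& p = particles[i];
            p.x += p.vx * dt;
            p.y += p.vy * dt;
            p.z += p.vz * dt;
            p.life -= dt;
        }
    }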

Our knowledge, of course, also helps us come up with solutions to our problems.  We know that the C++ inline keyword is a hint to the compiler, not a requirement.  But we also know that most compilers have an extension, like forceinline.  We know, upon further investigation, that our AI’s collision detection shouldn’t be using a naive O(n^2) comparison loop – there are better algorithms.
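For instance, a small wrapper macro like the following (the FORCE_INLINE name is our own convention, but __forceinline on MSVC and always_inline on GCC are real extensions) lets us force the issue when the compiler ignores the hint:

    // inline is only a hint; these compiler-specific extensions are far
    // stronger requests.  The FORCE_INLINE name is our own convention.
    #if defined(_MSC_VER)
        #define FORCE_INLINE __forceinline
    #elif defined(__GNUC__)
        #define FORCE_INLINE inline __attribute__((always_inline))
    #else
        #define FORCE_INLINE inline
    #endif

    // Hypothetical class: a trivial accessor like this should never show
    // up in a profile – if it does, forcing the inline is one fix.
    class Material
    {
    public:
        FORCE_INLINE float GetValue() const { return m_value; }

    private:
        float m_value;
    };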

Simply put, the more we know the better we can optimize.  Like most parts of programming, optimization is a skill and we get better at it the more we practice it.  The best way to improve our optimization knowledge is to continually learn, study and apply what we know.

Intuition

Knowledge is certainly important, but often we don’t really know the answer.  Sometimes we don’t even know where to look.  Our timing data doesn’t show any large time sinks, and we don’t recognize any slow areas we’ve encountered before.  How do we decide where to start optimizing?

This is when we rely on intuition.  After working with a code base, or a specific hardware architecture, you start to get a feel for performance.  When you look at a profile of the game, certain times will stick out.  7% for AI updates – that feels fine.  12% for setting material parameters – hmm, that’s odd.

Our intuition won’t always be right – but it gives us a starting point, a place to look deeper.  Our intuition, tempered with experience, can be very accurate.  People are very good at recognizing patterns, and our intuition will often find the gap in the sequence or the bump that sticks out.  Trust your intuition to guide you.

However, intuition must be guided by our methodology.  The core of optimization is still a science: measure, experiment, measure.  If we let our intuition run wild we’ll be in trouble, but if we allow it to guide us we’ll be stronger for it.

Process of Elimination

Once we’ve isolated a piece of code we want to make faster, we need to figure out how to make it faster, and we also need a reasonable estimate of how much time it should take.  In general, the performance requirements, and thus the CPU usage, are determined by the following, in order of impact:

  1. Game Design
  2. Architecture
  3. Algorithms
  4. Data Structures
  5. Code Implementation

The most important influence on performance is the game design.  This should be obvious, but people often forget that one way to reach your performance goal is to alter the design.  Too many characters on screen?  Put fewer characters in the level.  Of course you need to be careful: altering the design has drastic effects not just on the programmers but on everyone on the project.  This also means you must design your code to support the game design.  The game design requires 100 independently animated characters on screen?  Well, you need to design the code for that, assuming it’s even possible.

The design also gives you a feel for how much time parts of the game should require.  If you’re working on an RTS with hundreds of units active at a time, spending 50% of your CPU time updating them may be reasonable.  However, if you’re spending 50% updating the five characters on screen in your FPS, you’re probably doing something wrong.

Architecture has the next largest influence.  I’m not really distinguishing between the hardware architecture and the software architecture here: both can have equal influence, and the software architecture, if well designed, will match the hardware architecture.  This layer deals with how the game makes use of threads, the GPU and other system-wide resources.  Is the game single threaded?  Multiple single-task threads?  Job-based parallelism?

These architectural choices will limit how you can solve performance problems.  Single processor hardware?  Well, you can’t speed up by threading your game code.  Processors have limited memory, like the PS3’s SPUs?  You might need to avoid algorithms that rely on large data tables, or random access to memory.  In many ways your architecture, both hardware and software, will determine which solutions are available to you.
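As a rough sketch of one such choice, here’s what a minimal fork/join update might look like (Entity and its Update method are hypothetical stand-ins, and std::thread is used just to keep the sketch portable).  It assumes each entity can be updated independently – shared state would need synchronization:

    #include <algorithm>
    #include <cstddef>
    #include <thread>
    #include <vector>

    // Hypothetical entity; Update only touches its own data.
    struct Entity
    {
        float x = 0.0f, vx = 1.0f;
        void Update(float dt) { x += vx * dt; }
    };

    // Minimal fork/join: split the entity array into one slice per worker
    // and update the slices in parallel, then wait for all of them.
    void ParallelUpdate(std::vector<Entity>& entities, float dt,
                        unsigned workerCount)
    {
        if (workerCount == 0)
            workerCount = 1;

        std::vector<std::thread> workers;
        const std::size_t chunk =
            (entities.size() + workerCount - 1) / workerCount;

        for (unsigned w = 0; w < workerCount; ++w)
        {
            const std::size_t begin = w * chunk;
            const std::size_t end =
                std::min(begin + chunk, entities.size());
            if (begin >= end)
                break;

            workers.emplace_back([&entities, begin, end, dt] {
                for (std::size_t i = begin; i < end; ++i)
                    entities[i].Update(dt);
            });
        }

        for (std::thread& t : workers)
            t.join();
    }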

Algorithms are the classic starting point for optimization.  Anyone with a computer science background, or any interest in optimization, should be intimately familiar with Big-O notation and asymptotic complexity.  A wide knowledge of computer science and related fields also helps when you’re trying to select an algorithm.  There’s not much to say here, since the performance of most algorithms is generally well known, other than: choose an efficient algorithm that fits within your architecture.
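To make the earlier collision example concrete, here’s a sketch of replacing the naive O(n^2) all-pairs test with a uniform grid broad phase (the Object type and the cell hashing constants are illustrative, not a production implementation):

    #include <cstddef>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    // Hypothetical 2D object for a broad-phase sketch.
    struct Object
    {
        float x, y;
        float radius;
    };

    // A uniform grid replaces the naive O(n^2) all-pairs test: bin each
    // object by cell, then only test objects that share a cell.  With a
    // cell size near the largest radius this is close to O(n) for evenly
    // spread objects.  (A real version would also test neighboring cells
    // for objects near borders; that's omitted to keep the sketch short.)
    std::vector<std::pair<std::size_t, std::size_t> >
    FindCandidatePairs(const std::vector<Object>& objects, float cellSize)
    {
        std::unordered_map<long long, std::vector<std::size_t> > grid;

        for (std::size_t i = 0; i < objects.size(); ++i)
        {
            const long long cx = (long long)(objects[i].x / cellSize);
            const long long cy = (long long)(objects[i].y / cellSize);
            grid[(cx * 73856093LL) ^ (cy * 19349663LL)].push_back(i);
        }

        std::vector<std::pair<std::size_t, std::size_t> > pairs;
        for (auto& cell : grid)
            for (std::size_t a = 0; a + 1 < cell.second.size(); ++a)
                for (std::size_t b = a + 1; b < cell.second.size(); ++b)
                    pairs.push_back(std::make_pair(cell.second[a],
                                                   cell.second[b]));
        return pairs;
    }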

Closely following the algorithm in importance are the data structures we’re working with.  Often the data structure goes hand in hand with the algorithm, but not always.  Algorithms usually require a certain data structure to guarantee their run time, but often aren’t terribly particular about how that structure is implemented.  Take an algorithm that needs a binary tree: you could implement the tree as a traditional struct with pointers to two children, or as an array.  Both have different performance characteristics and restrictions, and you need to choose the one that best fits your algorithm.
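A quick sketch of both representations – the array version uses the implicit layout familiar from binary heaps, where node i’s children sit at 2i+1 and 2i+2:

    #include <cstddef>
    #include <vector>

    // Pointer-based node: flexible shape, but every node is a separate
    // allocation and traversal chases pointers across the heap.
    struct TreeNode
    {
        int       value;
        TreeNode* left;
        TreeNode* right;
    };

    // Array-based complete binary tree (the binary-heap layout): node i's
    // children live at 2i+1 and 2i+2.  No pointers, contiguous memory,
    // but the tree must stay complete or waste slots.
    struct ArrayTree
    {
        std::vector<int> nodes;

        static std::size_t Left(std::size_t i)  { return 2 * i + 1; }
        static std::size_t Right(std::size_t i) { return 2 * i + 2; }
    };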

Last of all comes the actual code implementation.  This is the piece of code you’ve written – or more specifically, the machine code generated from your code.  That’s a very important point to remember: not all compilers are equal, and not all CPUs are equal.  An implementation that performs well on one platform may not perform well on another.  Maybe the cache latency is much greater on a new CPU.  Maybe the compiler doesn’t inline properly, doesn’t pull constants out of loops, or doesn’t unroll loops when needed.  There are any number of optimizations a compiler can do in theory, but won’t, or can’t, for a variety of reasons.

What you think a compiler can do, what the language allows a compiler to do and what a compiler actually does are rarely the same.  To understand how your code performs you really need to look at the generated machine code.  Once you understand the generated code, you can start writing faster code.  This doesn’t mean you start implementing everything in assembly – that’s usually a last resort.  It does mean you need to understand when passing variables by reference, value or pointer matters.  It also means you should investigate how well your compiler understands aliasing and the restrict keyword.
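As a small example of the latter (spelled __restrict here, since most C++ compilers provide restrict as an extension under that name): by promising the compiler that dst and src never overlap, we free it to keep values in registers and vectorize the loop instead of re-reading src after every store:

    #include <cstddef>

    // __restrict promises the compiler that dst and src never alias, so
    // it can keep src values in registers and vectorize the loop, rather
    // than reloading src[i] after every store in case the store changed it.
    void Scale(float* __restrict dst, const float* __restrict src,
               float s, std::size_t count)
    {
        for (std::size_t i = 0; i < count; ++i)
            dst[i] = src[i] * s;
    }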

Exceptions

There are of course exceptions to the above ranking.  Sometimes the code implementation will have more influence than the algorithm – this is often the case when the data set is small.  Similarly, a slower algorithm, say an O(n^2) one that uses n units of space, may perform better than an O(n log n) algorithm that uses twice the memory, simply because it’s much more cache efficient.  Sure, we know from O-notation that eventually the O(n log n) algorithm will be faster, but given the speed of memory on today’s consoles, it may take a very large data set before it actually is.

Methodology Revisited

Now that we have a better understanding of where we can make changes to improve performance, let’s revise our process.

This isn’t very different from our first process; we’ve just opened up the “Make a Change” step.  Now we try to improve the algorithm, if we can.  Then we improve the implemented code, if we can.  Finally, we need to decide whether what we’re doing is even practical.

Of course these decisions are rules of thumb: we may decide a feature is impractical first, or that we need a new architecture.  But the above is a pretty good process to follow most of the time, as long as we stick to the core measure, change, measure loop.

Summary

So we now have a methodology for approaching game optimization.  A lot of what we discussed may seem somewhat vague or even philosophical, but that’s because we’re discussing very general approaches to a very general problem.  It’s difficult to get into the specifics, especially in optimization where the specifics are very important.

In the future we’ll look at more specific, yet still fairly general, patterns that often emerge when optimizing, which we can use to help in many situations.
