Flap
During playtesting of Flap it became obvious that Flap had some issues. Random glitches would occur in the framerate.
Steve from StevePro Studios suggested that I investigate garbage collection. He also pointed me towards two excellent articles that he wrote. I also investigated a few others. See the links on the right hand side.
To determine if there are conventional memory leaks simply watch what the process does over time. However, creating temporary strings, managed objects or boxed values will not lead to memory leaks but they will lead to garbage collection events occuring.
Other useful resources include a good presentation ny Shawn
hargreaves on the subject of how managed code runs on the Xbox.
http://www.microsoft.com/downloads/en/details.aspx?FamilyId=B11AD912-4158-44CC-A771-A5E044F7E3BB&displaylang=en
I also particularly like this simple description of how to use idlasm to locate boxing and unboxing. Some of them can be a bit hard to spot.
With XNA Remote Performance Monitor I can see that the main "start menu"
"press start" is creating lots of strings and boxed values.
Ok, lets dig into what I'm doing wrong.
The xbox remote performance monitor is very useful because you can
see the GC work occuring. As the collections occur I get glitching
on the framerate. Its very small and subtle but it is definately
there.
Build, deploy, start the XNA framework performance monitor, start
flap.
Flap starts and then "sits" on the start menu (i.e the menu where it
scans for the controller to bind player the to).
The delta column is the important one. I have 200 managed strings
being allocated, 550 managed objects allocated and 200 boxed value
types.
Woah. this screen just sits there and displays a string such as
"Press A"
Every few seconds the GC will come along and clean up. Each time it
does that it collects about a 1MB of memory. This is not good.
If I display the screen with a highscore table (10 rows with format
number, name, number) I have:
3400 strings, 6350 managed objects and 700 boxed value types!
That is a large scale problem. At that level of allocatios the GC
kicks in very frequently. This is not a bad place to start
investigations though. It is a well defined "local" problem - i.e. I
can make changes in one isolated location that I can directly
measure.
The highScore table has a CompleteScoreRecord that in turn has three
List<ScoreRecord>.
CompleteScorerecord and Scorerecord are both value types as they are
defined as structs.
Ahh. I bet we are boxing and unboxing quite a bit here.
Ok, simple change; just make them both classes.
Run it again, measure. What effect has that had?
Very little!
That wasn't it - ok, it is almost certainly the strings and
string.format function.
Should I carry on guessing or should in fact use ildasm to work it
out?
Ok, I know I build the string every single time I render the frame. I
do this because to be honest performance is not an issue in this bit
and it doesn't cause problems. I could use the same argument to not
bother fixing it but this is a good localised problem so I can gain
a few insights here that i can then apply elsewhere.
My score Record looks like this:
public class ScoreRecord
{
public string Name;
public int Score;
public int Level;
}
This was a dumb struct (which I was using in a C++ sort of way even
though C# is very different to C++ treatment of structs).
On screen this is displayed as [int] position [string] Name [int] score. The posiiton and name are left aligned and the score is right aligned.
If I add a data member to cache the display line instead of
building it everyframe:
public class ScoreRecord
{
public string Name;
public int Score;
public int Level;
[XmlIgnore]
public string displayLine;
}
During construction or allocation of a new score record I shall
pre-build the displayLine. I won't persist it so I mark it as
ignored. The displayLine will be a combination of the position and
the player's tag.
After making this change and remeasuring:
The managed string count is now down to 1900 and the number of boxed
value types is down to 200. (which are probably attributable to the
rest of the screen).
To work out exactly which are ours I shall tweak the menu settings
to remove the highscore display from this menu.
All of flap is data driven so it is a small change in an XML file.
Ok base numbers for the menu screen without a high score table are 1400 for the strings and 200 for the boxed
values. We can conclude exactly 500 string allocations are down to
the high score table. How can make those other 500 string allocations
go away? I suspect pre-rolling the other string work we do will be
enough.
Final changes: Preroll the invariant culture score and through in
the size calculation (to work out positioning) and the whole score
record should now be rendered off cached data with no "on the fly"
string creation.
Great, re-examination shows that the string allocations and boxed
value types is back to 1400 and 200. This is the result of "other
things" going on.
My menus built on the fly every frame are a bit pants but to be
honest but also not an issue that will lead to glitches. Lets go to
work on those now as they are much more interesting!
If I examine the main rendering loop of the game in the remote
performance monitor first to see what we are aiming for and "how bad
is it".
Managed string objects 150, managed objects 1200 and boxed value
types 200. Remember these are only the deltas! The values can all
rise dramatically during game play but I think that is the steady
state. Time to see if we can eliminate all of those.
The score is formatted everyframe and rendered. I shall keep a
pre-formatted string that is updated when the score changes and use
that.
internal void setScore( int n )
{
_score = n;
_scoreString=string.Format("Score: {0}", n );
}
And that's all it needed to get rid of the boxed value types and the
managed strings.
There is the odd one that you can see pop in but not the 150/200
that we were seeing.
More worryingly though the managed object count is 4000 or so per
delta. What is going on there then - because those are still leading
to GC events firing? These are explicit new calls that I am making.
The number is directly proportional to the number of actors on
screen. What do actors do every frame? They are rendered. Time to
examine the actor rendering code by eye first and then maybe use
some of the other tools at our disposal.
That many calls must be something to do with the main actor class -
especially as the number of allocations rises smartly as the amount
going on screen also rises.
What's in the main render function?
If there is a Effect for this actor then use the effect and set up
parameters. It could be that but the effects are all
shared off a singleton manager class.
Then the actor itself is drawn with a simple Spritebatch.Draw call.
The other thng that actors do everyframe is get updated. What's in the update method of an actor then?
public override void Update(GameTime gameTime)
{
removeChildren();
updateSourceRectangle(gameTime);
if (!Engine.IsDisplayingMenu() && _isAlive)
{
if (isUserControlled())
{
checkForInput(gameTime);
}
determineVelocity(gameTime);
_position += Vector2.Multiply( _velocity,(float)
gameTime.ElapsedGameTime.TotalSeconds);
checkAndConstrainPosition();
updateVisibility();
updateParticleEmitters(gameTime);
updateAnimations(gameTime);
updateArrowPorts(gameTime);
base.Update(gameTime);
}
}
Start at the top and work down. The removeChildren() fucntion will never be called on early levels. It only applies when knights position on walls or the bosses. This doesn't occur on level 1.
The next line is the updateSourceRectangle().
Ah ha. A possible culprit.
Most of the actors in the game use Sprite sheets. A single texture
with lots of pictures on. This cuts down on texture switches and
other expensive render state changes.
Each Actor has a source rectangle property that has, not
surprisingly, where to get the source rectangle on the sprite sheet.
Everytime time the animation is due for an update the rectangle is
moved by replacing it with a new rectangle.
SourceRectangle = new Rectangle(spriteColumn * width, spriteLine *
height, width, height);
Ok, what happens if we replace that with a simple set of assignments
instead?
Not great. Very little difference. What now?
determineVelocity is a possible. It has a number of overrides in most
derived types.
Lets look at the arrow class:
public override void determineVelocity(GameTime gt)
{
Vector2 v = Velocity;
switch (_gravity)
{
case GravityAffected.FALLS_TO_EARTH:
v += new Vector2(0f, 0.375f * Actor.SCALE_FACTOR *
(float)gt.ElapsedGameTime.TotalSeconds);
Velocity = v;
break;
... other details elided
}
}
Ah. Well that's not required. The code is written as conceptually
that's what we want to do. Lets refactor out that new statement.
The Velocity is a property that wraps a member variable of the base
class.
We have a double assignment and the new call. Vector2 is a struct
hence lots of copying going on here.
Move the declaration of member variable velocity to be
protected. This is rubbish and really we should have a by ref
setters. We'll go back and add that in a minute.
That had a tiny impact.
We're starting to struggle a little. Time for the profiler.
The CLR profile can profile allocations. Simply start the profiler
and ask it to start flap. Play through a level or so and see what we
have.
The number of allocations rather than the actual size of memory is
important here.
Most of the raw memory is spent in texture allocations.
But if I look at the "histogram of allocations" I can see what has
been allocating the most.
Of 400,000 total instances, 124,000 are Vector3, 62,000 lists,
62,000 PossibleLight collections and 62,000 LightValues.
Ah hah. The real time lighting that is used on many sprites is
starting to look a bit guilty! Conclusion. I should have used the
profiler earlier. This is a truism when profilers are involved and
one I should have remembered. Don't guess; measure properly with the
correct tools and don't assume you know what is really going on.
The function is quite simple - every frame, for every possible scene
light it determines which three are lighting the actor. It sorts and
allocates local collections for the actor.
A couple of strategies spring to mind:
The light list should belong to the actor, be passed in by reference
and reset each frame rather than a new one been allocated
constantly.
That's the first place to start.
The maximum lighting to be applied to a single actor is three
lights. I'm newing in a List of three lights each time rather than
assigning in.
These two changes alone may be enough.
Each actor now has an instance of LightValues which is passed by
reference to the lighting engine (which works out which scene lights
are in effect for any given actor).
The Vectors are initialised up front and then assigned into. So we
still have copy overheads going on but many less "news".
Lets see if we can see the effects in the XNA Framework Remote
performance Monitor.
Wow. That's a significant impact. We've just cut it in half. We're
down to 2000 or so managed object allocations per delta or less when
little going on the screen.
Ok, back to the profiler. Find and then knock off the next guilty party!
I run level one, play it properly and activate as many spells as
possible.
Rerun the allocation profile and, again, look at the histogram of
allocations first. This will tell us what types of things are being
allocated and for how much memory.
I have 360,000 total allocations.
149,995 are particles. Ah ha. This was one of my "if I had to bet"
candidates even though the particles are designed to be reused and
not many should be allocated dynamically. Well that obviousy isn't
working quite as expected!
Work out what the issue is and then come up with an appropriate
strategy. Perhaps offloading to the GPU with a GPU based particle
system would be best but then again that may be overegging it?
Right click on the allocation you are interested in and "show whos
allocated".
90% of these allocations are the "Dragon::fireDragonsFire" method
that launches the rocket powered bombs horizontally.
Ok, that should be fixable pretty easily.
The bombs are reusable. They are allocated "upfront" and then there
is a list of active and inactive bombs (for when you need the next
bomb). This is to prevent "newing" the bombs. This bit of the design
works perfectly.
The small change is that each bomb needs to *keep* its particle
systems and simply turn them on and off. What happens at the moment
is the bomb changes its behaviour at firing time by adding dynamic
particle systems.
Popping back into the performance monitor and see what impact that
has had. I'm hoping that this will also have fixed some of the last
few string allocations as well (as the particle system, like many
other items in the game, are data driven with XML files and the XML
no longer requires deserialisation - which in turn builds strings).
That has cut it in half again. The managed object allocations is now
down to less than a 1000. We're really moving in the right direction
now.
Whilst we're at it lets do the same for arrows and also the smoke
damage. I know they work in the same fashion and don't really
require a profile to see it.
That's now cut it down even further with a few hundred. Back to the
profiler to work out where to go next.
We're now down to the PossibleLight function that builds sorted
lists on the fly. we can do better there.
The list that holds the sorted set of lights is newed for each actor
per frame. No need for that as we're not holding for deferred
lighting - it can be one common list used by everybody.
There are also some Vector assignments that are using new vectors to
go from vector2 to vector3 - those can be re-engineered as
assignment operators.
The steady state on a frame is now down to a small handful of
assignments. This is from a starting point of a few thousand. A
couple of easy to use tools has given enough clues to solve the
glitching frames.
A final run through on the xbox and what do we have left?
a few tens to a few hundred when it is particularly busy. The
allocations are becoming harder to solve. Now the profiler needs to
be activated just for periods and the application terminated. This
"snapshot" of allocations removes all of the level load allocations
and other issues that we simply aren't interested in.