Jun 18, 2012
How to use a profiler

All of us programmers have at some point tried to speed up a large program. We remember "measure before optimizing" and profile it, and end up (a few hours later) with something intimidating like this and… what next? If you're like me, you scratch your head at the prospect of optimizing StringAppend, and the call-graph seems to tell you what you already know: Your program spends most of its time in the main loop, divided between the main subtasks.

I used to imagine the optimization process like this:

1. Run a profiler
2. Select a hot spot
3. ...
4. Speedup!

But the details were hazy. Especially in step 3. Michael Abrash was clearly doing a lot more than this. What was it?

Worse, I kept forgetting to use the profiler. I'd have a split-second idea and blunder around for hours before remembering the wisdom of "measure before optimizing." I was forgetting to measure because I was getting so little out of it, because I'd never learned to do it right.

After a lot of trial and error in the last few months, I think I have a better sense of the process. Optimization is like science. You can't start with experiments. You have to start with a hypothesis. "My program is spending too much time in _." Fill in the blanks, then translate the sentence for a profile. "I expect to see more time spent in function A than B." Then run the profile and check your results. Skip the low-level stuff, look for just A and B in the cumulative charts. Which takes more time? Is one much more of a bottleneck? Keep an eye out for a peer function that you hadn't considered, something that's a sibling of A and B in the call-graph, about the same level of granularity.

Do this enough times and you gain an intuition of what your program is doing, and where it's spending its time.

When you do find a function at a certain level of granularity that seems to be taking too long, it's time to focus on what it does and how it works. This is what people mean when they say, "look for a better algorithm." Can the data structures be better organized from the perspective of this function? Is it being called needlessly? Can we prevent it being called too often? Can we specialize a simpler variant for 90% of the calls?

If none of that generates any ideas, then it's time to bite the bullet and drop down to a lower level.

But remember: optimization is about understanding your program. Begin there, and profiling and other tools will come to hand more naturally.


Comments gratefully appreciated. Please send them to me by any method of your choice and I'll include them here.

RSS (?)
twtxt (?)
Station (?)