The Case of the Visual Studio Profiler
Sherlock Holmes movie publicity still circa 1930's - 1940's, Basil Rathbone as Sherlock.

After church, I took to working on performance while listening to my beloved Browns and Eagles inflict their own particular brands of football misery upon me. As I mentioned last night, I was going to fire up the profiler in Visual Studio Professional and have a look.

Visual Studio's C++ profiler can be readily invoked from the Debug menu. Once invoked, you have some choices. I have been looking at I/O and CPU.


Profiler options

Visual Studio will run your application, though more slowly than usual because it is instrumented. Once the run is done, you can check your I/O.


Physical I/O performance

What this graph says is that, from an I/O perspective, I have yet to flush or do any physical writes at all. So any performance problems at this point are on my end.

Now we can check our CPU tab. You get a tree of function and method calls, and the selected node in the tree gives you a function breakdown. We can look at this for a bit and think, gosh, we need to make "compare_node" faster and "peek.read" faster, and we do, making the same kind of observations all up and down the stack of calls.

But here is where we must observe, not merely see. That view is simplistic.

The most important question when tuning isn't "how do I make this go faster?" but rather "do I need to call this at all?"

What that means is:

You might be calling something more often than you have to, assuming that because there's some caching or the like, it will be "fast". Of course it is fast, but it is still work, and it is always better to do something that needs to be done once, just once.
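To make that concrete, here is a minimal sketch in C++. Everything in it is hypothetical: `CountingStore`, `count_greater_naive`, and `count_greater_hoisted` are names I made up for illustration, not code from my project. The store counts how often `read` is called, so you can see that the "fast" cached call still happens once per loop iteration unless you lift it out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a cached node store.
// It counts how often read() is called.
struct CountingStore {
    std::vector<int> values;
    std::size_t reads = 0;

    int read(std::size_t location) {
        ++reads;                      // even a cache hit is still a call
        return values[location];
    }
};

// Calls read() on every iteration, although the location never changes.
std::size_t count_greater_naive(CountingStore& store,
                                std::size_t pivot_location,
                                const std::vector<int>& keys) {
    std::size_t greater = 0;
    for (int key : keys) {
        if (key > store.read(pivot_location)) ++greater;
    }
    return greater;
}

// Reads the pivot once, then reuses the value inside the loop.
std::size_t count_greater_hoisted(CountingStore& store,
                                  std::size_t pivot_location,
                                  const std::vector<int>& keys) {
    const int pivot = store.read(pivot_location);
    std::size_t greater = 0;
    for (int key : keys) {
        if (key > pivot) ++greater;
    }
    return greater;
}
```

Both functions return the same answer. The only difference is how many times `read` gets called, and that is exactly the kind of thing the profiler's call tree will show you if you look at call counts rather than just at time per call.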

"And look what we have here, Mr. Watson!"

If we go up to find_node, which calls our find_advance, we find this.


Looking at caller of find_advance.

Here, we notice that find_advance returns a pointer to a node as a relative_ptr_type, nl. If we had observed, rather than merely seen, the details of find_advance earlier, we might remember that it also does a read in there.

Are we reading the same node twice?

We are! Look again! There it is. We read the node at location t, and then we return t. We don't need that read in our caller at all if find_advance reads into a node that is passed to it.


Witness the only read we need here.
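The shape of the problem and of the fix can be sketched roughly like this. This is not my actual code: only the names find_node, find_advance, and relative_ptr_type come from it. The `node` layout, the `node_file` type, the `null_row` marker, and the linked-list walk are simplifying assumptions so that the example stands on its own and the reads can be counted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using relative_ptr_type = std::int64_t;   // underlying type is an assumption
const relative_ptr_type null_row = -1;    // hypothetical "not found" marker

// Hypothetical node: a key and the location of the next node.
struct node {
    int key = 0;
    relative_ptr_type next = null_row;
};

// Hypothetical storage that counts reads. Assumes the chain starts at location 0.
struct node_file {
    std::vector<node> nodes;
    int reads = 0;

    node read(relative_ptr_type location) {
        ++reads;
        return nodes[static_cast<std::size_t>(location)];
    }
};

// Before: reads each node while walking, but hands back only the location.
relative_ptr_type find_advance_location_only(node_file& file, int key) {
    relative_ptr_type t = 0;
    while (t != null_row) {
        node n = file.read(t);
        if (n.key >= key) return t;
        t = n.next;
    }
    return null_row;
}

node find_node_double_read(node_file& file, int key) {
    relative_ptr_type nl = find_advance_location_only(file, key);
    if (nl == null_row) return node{};
    return file.read(nl);             // second read of the same node
}

// After: find_advance reads into a node supplied by the caller.
relative_ptr_type find_advance(node_file& file, int key, node& found) {
    relative_ptr_type t = 0;
    while (t != null_row) {
        found = file.read(t);
        if (found.key >= key) return t;
        t = found.next;
    }
    return null_row;
}

node find_node(node_file& file, int key) {
    node found;
    relative_ptr_type nl = find_advance(file, key, found);
    if (nl == null_row) return node{};
    return found;                     // no extra read needed
}
```

In this sketch, the saving is one read per lookup. That sounds small, but a lookup sits on the path of nearly every operation, so it adds up quickly.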

I'll make some changes here and check the result. But always, when profiling, check your algorithm first!

As for me, a bit of cleanup on the duplicate reads has increased performance significantly, up to around 3000 nodes a second, bursting to 5000 nodes a second, and that's double where I was on Friday.


Improved performance.



