VSPerfCmd and you!

I love writing fast code. There’s something so deeply satisfying about having something perform well that gives me a rush, and I hope the same is true for you, my little smuglords-in-training. Performance will probably continue to be a major theme for a lot of my posts here, so if you are at all interested in going fast, please read on. If you like your code shitty and slow, well, I guess go find another article, or eat paste or whatever it is you do, but you should probably read on anyway. Who knows, you might come across some student beauticians who aren’t used to geniuses; you might impress them! Plus, making something fast teaches you how to take advantage of all the resources you’ve got available, and there are plenty of useful things you can learn during that process.

The free lunch is over

In case you didn’t know it by now, you don’t get exponential software speed for free anymore. You can’t just rely on better hardware to make your code run faster. Sorry, your job just got harder. Writing performant code now requires a little something extra during your development cycle: profiling to find your hot-spots. Maybe it’s I/O-related, or your CPU isn’t taking advantage of cache, or you have an opportunity to parallelize something or make it asynchronous, or perhaps you’ve tried to apply parallelism and now you’ve got threads contending for a shared lock, causing your application to hang. In all of these scenarios (and many more), profiling tools can be invaluable in determining the root cause of the performance loss.

If possible, this should be done early on in the life of the application to establish ongoing metrics, but even for large-scale existing systems it can help to profile your application when debugging to get a general sense of what the performance characteristics of the app are, and whether or not you’ve accidentally introduced a feature which hurts performance.

Performance testing in development

This is by far the best possible way to do performance testing. You can isolate testing behavior more reliably, you get immediate feedback when you correct a bottleneck, and you may find that fixing one uncovers others that need fixing too. There are of course plenty of tutorials on MSDN about how to use the Performance Wizard (here, let me Google that for you), but if there is one thing my internal smug-sense won’t allow me to do, it’s use a GUI when there is a perfectly good command-line interface available.
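If you want that immediate feedback before (or alongside) a full profiler run, a crude before/after timing in your test harness can at least tell you whether a fix moved the needle. Here’s a minimal sketch using `System.Diagnostics.Stopwatch`. The `QuickTiming.AverageMilliseconds` helper is hypothetical and not part of any profiling tool. It measures wall-clock time only, so it tells you *that* something is slow, not *where*. That part is still the profiler’s job.

```csharp
using System;
using System.Diagnostics;

public static class QuickTiming
{
    // Hypothetical helper: runs `action` once untimed as a warm-up (JIT, caches),
    // then times `iterations` runs and returns the average wall-clock milliseconds per run.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        action(); // warm-up run, deliberately not measured

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Record the number, change one thing, and record it again. If the number doesn’t budge, it’s time to break out the profiler.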

Show me the code already! Save your pathetic prose for someone who cares!

Here I’m using PowerShell to help me automate the repetitive bits of instrumenting and running the profiler. Sampling is good, but if you really want to go deep, I find that just instrumenting the assembly from the start is the best approach. I would highly recommend adding this to your PowerShell profile for ease of use. This assumes that sn.exe is on your PATH.


function get-assemblies([string] $path, [regex] $expr) {
	# if you havin' regex problems I feel bad for you son
	# I got \d+ problems and matching text ain't one.
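	# -ExpandProperty returns plain file-name strings, not objects wrapping a Name property
	dir $path | where { $expr.IsMatch($_.Name) } | select -ExpandProperty Name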
}

function instrument-binaries ([string] $basePath, $assemblies) {
    $profilerPath = "C:\Program Files (x86)\Microsoft Visual Studio 10.0\Team Tools\Performance Tools"

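    if (-not ($env:Path.Contains($profilerPath))) {
        write-host "appending vs_profiler path to Path environment variable."
        $env:Path += (";" + $profilerPath);
    }

    push-location $basePath

    # get-command returns nothing (rather than failing) when VsInstr.exe isn't on the Path,
    # whereas test-path would error out on the empty output of where.exe.
    if (-not (get-command VsInstr.exe -ErrorAction SilentlyContinue)) {
        write-host "VsInstr.exe is not installed!"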
    } else {
        $flag = "is a delay-signed or test-signed assembly"

        foreach ($instr in $assemblies) {
            $path = [System.IO.Path]::Combine($basePath, $instr);

            VsInstr.exe $path /ExcludeSmallFuncs

            $output = (sn.exe -v $path)
            $resign = (($output -match $flag).Count -gt 0)
                
            if ($resign) {
                write-host "$instr needs to be re-signed..."
                # assuming your keyfile is in the project root.
                # adjust accordingly
                sn.exe -Ra "$path" ..\..\your-keyfile-here.snk
            }
        }
    }

    pop-location
}

Invoking the script then might look like this (and could itself be another script in your profile):


$path = "c:\dev\product\bin\debug"
$assemblies = get-assemblies $path "MyCompany.*\.dll"
instrument-binaries $path $assemblies

That’s it. Your assemblies are now ready to be poked and prodded by the profiler. Let’s see what that might look like.


function begin-profiling ( [string] $session, [string] $report_path ) {
    if (-not (test-path $report_path )) {
        mkdir "$report_path"| out-null
    }

    $env:_NT_SYMBOL_PATH = "srv*C:\mssymbols*http://msdl.microsoft.com/downloads/symbols"
    $profilerPath = "C:\Program Files (x86)\Microsoft Visual Studio 10.0\Team Tools\Performance Tools"

    if (-not ($env:Path.Contains($profilerPath))) {
        $env:Path += (";" + $profilerPath);
    }

    # make sure the profiler isn't already running.
    VsPerfCmd.exe /Status

    if ($LastExitCode -ne 0) {
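        # "HH" (24-hour clock) keeps 9 AM and 9 PM runs from overwriting each other;
        # join-path supplies the separator between the folder and the file name.
        $name = $session + "_" + [DateTime]::Now.ToString("MM-dd-yyyy-HH-mm")
        $report = join-path $report_path ($name + ".vsp")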

        VsPerfCmd.exe /user:Everyone /start:Trace /output:$report /CrossSession

        write-host "Profiling report will be stored in:" $report
    } else {
        write-host "Profiler already running. New session will not be started."
    }
}

I could probably go more in-depth about these arguments, but just trust me when I say that they were arrived at through much trial and error, and represent my best attempt at a general solution that works for both dev and production profiling. We finally invoke the real profiler, VsPerfCmd.exe, setting the output path to our desired report folder. The /user:Everyone is just for convenience in production. You can use this to restrict which users have access to write to the profiler report.


begin-profiling "my_app_startup" "c:\perf\reports"

# trace it all baby!
VSPerfClrEnv /TraceOn
VSPerfClrEnv /TraceGCLife
VSPerfClrEnv /InteractionOn

start-process your.application.here.exe -ArgumentList "/some /args /maybe:?"

Run your test scenario, whatever that looks like. Close your application (the profiler will block if you forget). Then shut down the profilers. This is actually really important. You won’t see any data written to your report until after the process being profiled terminates.


function end-profiling {
    VsPerfCmd.exe /GlobalOff
    VsPerfCmd /Shutdown
    VsPerfClrEnv /off

    write-host "Profilers detached."
}

You did it! /slowclap

You can now double-click your .vsp report (in whichever directory you specified in begin-profiling) and review your findings inside the comfort of Visual Studio. You’ve captured a lot of interesting data, I’m sure. That /InteractionOn flag enables Tier-Interaction Profiling, which keeps track of all the ADO.NET invocations executed under the profiler. It sure beats asking a DBA to run a trace on the database, eh? You’ve also got CPU metrics, and some default Windows performance monitor counters you can examine as well.

Okay, that’s all for now. Next time we’ll talk about production profiling, or whatever else I feel like talking about. Maybe the continuation of that other post. You’ll just have to check back and see. Comments help. If you like this, let me know. If you want hands-on labs, I could probably accommodate that too.

Anyway, until next time.

Yours smugly,
Ross

Ten shocking reasons to stop using LINQ. Some may surprise you.

Okay, the title is a total troll, I just thought it would be funny to create it as link-bait in light of a recent Scott Hanselman article about how these things are the death-throes of the internet. Sorry Scott!

When Microsoft first introduced these Haskell-inspired query operators in the 3.5 release of the .NET Framework, like so many others, I was ecstatic. I watched every video Erik Meijer put up on Channel 9, started reading about the functional concepts that underpinned the new features, learned about the compiler-generated state machines and deferred execution, but most prominently I used LINQ everywhere.

I call it the law of the instrument, and it may be formulated as follows: Give a small boy a hammer, and he will find that everything he encounters needs pounding.
–Abraham Kaplan

This is commonly referred to as the law of the instrument. It’s a bit cliché, but in this instance it was true: I had one hell of a hammer, and before me all I could see was a vast IEnumerable<Nail> begging to be sorted, and reversed, and selected. A few years and many a mangled and bent nail later, I have come to regard Language Integrated Query, much like its functional forebears, as perhaps too powerful for the average programmer. The allure of function composition, the terse yet readable query operators, and the syntactic sugar of the more SQL-like declarative syntax are all amazing advances to the language, and the minds behind it are truly remarkable. Sadly, not every advancement is to the betterment of all, and I’m of the opinion that LINQ is one such tool which has been abused for ill.

To be fair, many of these complaints stem from the fact that most .NET developers did not do their homework when 3.5 first came out, either out of laziness or because they were trapped in an environment with antiquated tooling, and now the educational content is simply harder to find. It’s not new and exciting anymore, so nobody writes about it.

  1. Seductively easy to not filter and sort at the database level.

This is a huge pet peeve of mine which is why it is at the top of my list. I cannot fathom the laziness that drives a person to write code like this:

    var products = FetchAllTheThings()
    	.Where(item => item.Price > 500.00m)
    	.OrderBy(item => item.Price)
    	.Take(10);
    

Seems harmless enough, but let’s break this down. Firstly, I’m assuming that FetchAllTheThings() is responsible for reading from the database (possibly over the network) but is not backed by LINQ to SQL or Entity Framework, because that would let this query actually be translated to SQL and run by the database. I’m talking specifically about technologies which are not LINQ-friendly, like direct reads from an ADO.NET SqlDataReader or some homegrown micro-ORM that your best and brightest devs put together 10 years ago to protect the less savvy from harming themselves or others.

So with those basic constraints in mind, let’s take a look at the above query in more detail. The result set is potentially huge, could continue to grow over time, and we have no control over it. We throw away most of the results between the Where and the Take, so all that I/O and ORM work was essentially a complete waste. Worst of all, none of this needs to be expressed as managed code; filtering, ordering, and limiting map directly onto WHERE, ORDER BY, and TOP in SQL Server. I will be the first to admit that T-SQL is far from an elegant language, but in the domain of sorting, filtering, projecting, aggregating, and indexing it remains the most efficient way to leverage the power of your relational database. LINQ to Objects abstracts the details away: the work this code triggers is far from trivial, yet it took only seconds to write. The performance implications are huge, and the amount of thought that needed to go into writing it was dangerously low.
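To make the waste concrete, here’s a minimal sketch with a hypothetical in-memory FetchAllTheThings standing in for a non-LINQ data layer (think a loop over a SqlDataReader mapping rows by hand). It counts every row it materializes. Because OrderBy has to see every matching element before it can yield the first one, the Take(10) at the end saves nothing: every row gets read. Pushing the same work down as something like `SELECT TOP (10) ... WHERE Price > 500 ORDER BY Price` lets SQL Server (and its indexes) do it instead.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class Product
{
    public int Id { get; set; }
    public decimal Price { get; set; }
}

public static class HedonismDemo
{
    // How many rows the fake data layer has materialized so far.
    public static int RowsRead;

    // Hypothetical stand-in for a non-LINQ data access call. In real life every row
    // yielded here costs network I/O plus hand-rolled mapping.
    public static IEnumerable<Product> FetchAllTheThings(int rowCount)
    {
        for (int i = 0; i < rowCount; i++)
        {
            RowsRead++;
            yield return new Product { Id = i, Price = i % 1000 };
        }
    }

    public static List<Product> TopTenOver500(int rowCount)
    {
        RowsRead = 0;
        return FetchAllTheThings(rowCount)
            .Where(item => item.Price > 500.00m)
            .OrderBy(item => item.Price) // buffers *every* match before yielding the first
            .Take(10)
            .ToList();
    }
}
```

Ask for ten products out of a million rows and RowsRead still ends up at a million. How decadent!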

This pattern of unapologetic disregard for coding decency reminds me a lot of Hedonism Bot from Futurama. In my head, this is how I imagine them.

Yes, bring me all the data, I shall pick what I like and discard the rest.

I trust the memory-orgy heap has been garbage collected and made ready for my next indulgence.

From one million records I need only one. How decadent!

  2. Complex expressions quickly become unreadable.

Repeat after me: “instructions are for processors, code is for humans”. If the next person who comes along after you cannot quickly comprehend what your code is doing then you have done him or her a disservice, and should feel bad about that, unless you are a sociopath, in which case, as always, you feel nothing.

There is something magical about LINQ that takes all rules about formatting and throws them out the window. The generally accepted pattern is to put each query operator on a separate line. This is seen as preferable to the epic one-liners, which happen just as often, but I would argue that both are code smells.

        var seq = setOfThings
                .Where(w => w.IsSpecial || w.HasValue || w.HighPriority)
                .OrderBy(x => x.Price)
                .ThenBy(t => t.Discount)
                .Skip(10)
                .Reverse()
                .Take(50)
                .Select(s =>
                    new
                    {
                        Name = s.Name,
                        Price = s.Price.ToString("D")
                    }).ToList();
    

It’s actually quite hard to produce knowingly ugly example code that gets this message across. It’s like trying to will yourself to spell something incorrectly…

At any rate, as readable and concise as the query operators are by themselves, composing them can quickly spiral out of control. Imagine what this would be like to debug (which I’ll touch on later).
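One way out, sketched under assumptions: the Thing and PriceListing types below are hypothetical stand-ins for whatever setOfThings actually contains, and the operator order (including the odd Skip-then-Reverse) is kept as-is so the behavior matches the query above. Giving each step a name gives the next reader, and the debugger, something to hold on to. The sketch formats Price with "F2" because the "D" format specifier only works on integral types and would throw a FormatException on a decimal.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical element type standing in for whatever setOfThings holds.
public sealed class Thing
{
    public string Name { get; set; }
    public decimal Price { get; set; }
    public decimal Discount { get; set; }
    public bool IsSpecial { get; set; }
    public bool HasValue { get; set; }
    public bool HighPriority { get; set; }
}

public sealed class PriceListing
{
    public string Name { get; set; }
    public string Price { get; set; }
}

public static class ThingQueries
{
    // Each helper is named for *why* the step exists, not how it does it.
    static bool IsWorthListing(Thing thing) =>
        thing.IsSpecial || thing.HasValue || thing.HighPriority;

    static IEnumerable<Thing> CheapestFirst(IEnumerable<Thing> things) =>
        things.OrderBy(thing => thing.Price).ThenBy(thing => thing.Discount);

    static PriceListing ToListing(Thing thing) =>
        new PriceListing
        {
            Name = thing.Name,
            Price = thing.Price.ToString("F2", CultureInfo.InvariantCulture)
        };

    public static List<PriceListing> BuildListingPage(IEnumerable<Thing> setOfThings)
    {
        var candidates = setOfThings.Where(IsWorthListing);
        var ordered = CheapestFirst(candidates);
        var page = ordered.Skip(10).Reverse().Take(50); // same operator order as before
        return page.Select(ToListing).ToList();
    }
}
```

Each intermediate variable can be inspected in the debugger on its own, which is a lot friendlier than one monolithic expression.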

I think a more hideous example is mixing the declarative “query syntax” with the extension methods that have no query-syntax equivalent, like Skip(), FirstOrDefault() and ToList(). You get these extra parentheses that just wrap around your weirdly out-of-place code haiku, as below:


   var seq = (from thing in setOfThings
              where thing.IsSpecial || thing.HasValue || thing.HighPriority
              orderby thing.Price, thing.Discount
              select thing)
              .Skip(10) // oops, we've run out of query support.
              .Reverse()
              .Take(50)
              .Select(s =>
                    new
                    {
                        Name = s.Name,
                        Price = s.Price.ToString("D")
                    }).ToList();

    
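A less haiku-like way to mix the two syntaxes, sketched with a hypothetical Item type: finish the query expression with its required select, give the result a name, and only then chain the operators that have no query keyword. The predicate is trimmed to IsSpecial to keep the sketch short.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type for the sketch.
public sealed class Item
{
    public string Name { get; set; }
    public decimal Price { get; set; }
    public decimal Discount { get; set; }
    public bool IsSpecial { get; set; }
}

public static class QuerySyntaxDemo
{
    public static List<string> PageOfNames(IEnumerable<Item> setOfThings)
    {
        // Query syntax for the part it supports; a query body must end in select (or group).
        var specialByPrice =
            from thing in setOfThings
            where thing.IsSpecial
            orderby thing.Price, thing.Discount
            select thing;

        // Operators with no query keyword are chained onto the named result,
        // rather than onto a parenthesized query expression.
        return specialByPrice
            .Skip(10)
            .Take(50)
            .Select(thing => thing.Name)
            .ToList();
    }
}
```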

Okay, this post has gotten a bit wordy. I had no idea I was bottling up all this nerd-rage about LINQ. I’m definitely not going to get to all 10 without people losing interest, so I’ll break it up into multiple posts.

Tune in next week for the next exciting installment: “More ways LINQ causes code-tumors.”

Confessions of a code-reviewer: What’s in a name?

There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton

ᔥ http://martinfowler.com/bliki/TwoHardThings.html

This is an honest plea, free of smugness or sarcasm: Please make your names count. When thinking about what to call something, think about the fact that your code may well outlive you, or at the very least your tenure with your current company, and make it something meaningful enough to stand the test of time. Your fellow developers will thank you, and who knows, you might even find that the life you save is your own. Maybe you aren’t too thrilled with your current team and you’ve decided to take revenge, or your pants don’t fit quite right, or you’ve got a bad marriage; whatever it is that compels you to write a function called fnCalcCurSal, just stop, put the keyboard down and read below. Good names improve understanding, and bad names, well, bad names make your code look terrible and people will point and laugh at you. Don’t become an object of ridicule. Rise above, and cast your scorn down upon those who use silly names.

This is a cobbled-together list (in no particular order) of official guidance from the Microsoft Framework Design Guidelines and my own personal opinions on what makes a good function name. Feel free to disagree.

Things to absolutely avoid in method names

(things get a bit dicey with variable names because opinions on those are more diverse)

  1. Acronyms
    1. Industry standard or otherwise, they really don’t belong in the method name. The most glaring reason is that the upper-case characters which make up the acronym break the alternation of upper- and lower-case letters that makes CamelCase so nicely readable.

      ex:

       
      // not a great name
      StartCPROnTheTestDummy();
      
      

      Here we’ve blended the acronym CPR with the first letter of the word On, and hampered readability, even if only slightly.

  2. Abbreviations or Phonetic Substitutions
    1. This isn’t LOLCODE, people. NoAbrvFuncNamsPls KTHXBYE
  3. Hungarian Notation

    Good advice from Uncle Bob.

    … nowadays HN and other forms of type encoding are simply impediments. They make it harder to change the name or type of a variable, function, member or class. They make it harder to read the code. And they create the possibility that the encoding system will mislead the reader.
    –Robert Martin

    And more here from Linus.

    Encoding the type of a function into the name (so-called Hungarian notation) is brain damaged—the compiler knows the types anyway and can check those, and it only confuses the programmer.
    –Linus Torvalds

    Hungarian Notation is a code smell. Forget you ever used it, and never use it again.

  4. and / or / with
    1. If your functions contain these words, you are probably over-describing what the function is doing, rather than how it should be used, or the reason it should be called.
  5. Implementation specifics

    I haven’t seen this in any publications, at least none that leap to mind, I’m just offering it as good advice if you want your code to remain self-documenting. If your function advertises itself as having certain behavior, and the underlying implementation changes, then the name of the function should change to remain consistent. Knowing this, avoid leaking information which the caller has no business knowing. Encapsulate your ideas and name your functions with a little detail about behavior, but mostly think about the reason someone might call your method, not exactly what it does.

    For instance, you wouldn’t want to choose a name like GetDataFromSqlServer because this includes information that the caller really shouldn’t be privy to. He shouldn’t care where you got the data from. If sometime later you decide to change the source of the data, this method (and all calls to it) would need to be changed. The inverse is almost worse in a way: if the implementation strategy changed but the callers still believed they were getting data from SQL Server, that might also be bad. Again, consider why, not how.
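Putting a few of these together, here’s a small sketch. Every name in it (Employee, IEmployeeDirectory, InMemoryEmployeeDirectory, Payroll.CalculateCurrentSalary) is hypothetical, and the salary rule in particular is made up purely to give the method a body. The point is that callers ask for *what* they need, no name encodes a type or a data source, and the in-memory directory could be swapped for a SQL Server-backed one without any method name becoming a lie.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class Employee
{
    public string Name { get; set; }
    public decimal BaseSalary { get; set; }
    public decimal Bonus { get; set; }
}

// Named for why a caller wants it, not where the data lives
// (compare GetDataFromSqlServer).
public interface IEmployeeDirectory
{
    IReadOnlyList<Employee> GetEmployees();
}

// One possible implementation. A database- or web-service-backed directory
// could replace it without touching callers or renaming anything.
public sealed class InMemoryEmployeeDirectory : IEmployeeDirectory
{
    private readonly List<Employee> employees;

    public InMemoryEmployeeDirectory(IEnumerable<Employee> employees)
    {
        this.employees = employees.ToList();
    }

    public IReadOnlyList<Employee> GetEmployees() => employees;
}

public static class Payroll
{
    // Instead of fnCalcCurSal: no "fn" type prefix, no abbreviations.
    // The rule itself (base salary plus bonus) is invented for illustration.
    public static decimal CalculateCurrentSalary(Employee employee) =>
        employee.BaseSalary + employee.Bonus;
}
```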

As programmers, we owe it to ourselves and our fellow geeks to write good code; code we can be proud of. I hope this list has been helpful, and if you have any additional tips I may have missed feel free to comment below.