Reading in the Stacks: Understanding StackProf

StackProf is an excellent tool for diagnosing and finding performance problems in Ruby applications. It’s not Rails specific or even web-app specific, and most importantly, it doesn’t suffer from the probe effect so it can easily be used in production environments.

I’ve been aggressively pushing people to use StackProf for over a year now, specifically through talks and a bit of short writing here and there. This past weekend I got to attend GoRuCo 2015 and saw my friend, neighbor, and general badass Eileen Uchitelle deliver a talk about using StackProf and other tools to diagnose performance problems in Rails core. She did a great job describing the tools and how to use them, but it made me realize that maybe the reason StackProf hasn’t really caught on in the Ruby community is that theres not a lot (read: any) good documentation about how to read or understand the information you get from it. So here I am, aiming to fix that.

Where does StackProf come from?

StackProf was created by Aman Gupta (@tmm1) at GitHub with the goal of creating a low-impact sampling profiler for Ruby. This was inspired by Google’s gperftools which was made popular again recently by Go’s implementation.

This is all possible because Ruby 2.1 shipped with an internal API (rb_profile_frames) that allows StackProf to grab stack samples with minimal CPU and memory overhead.

What is it good for?

StackProf is a sampling profiler with very low overhead which makes it ideal for diagnosing performance problems in production environments. That doesn’t mean its only useful for production, but because its only capturing samples every n milliseconds, its more useful on long running processes then quick bits of code. It doesn’t capture every frame or every call in your code (that’s the core tradeoff: lower resolution for lower impact) so there are some cases where your code is made up of many different quick calls which might escape its clutches.

What is a stack?

Stepping back a bit, when I first presented StackProf to some colleagues they didn’t really grasp how to understand the output because they didn’t understand the concept of a stack in this context. When we talk about “the stack” in ruby we’re talking about the method call stack. Most developers experience this most frequently as error backtraces when we hit an exception. A backtrace is a textual representation of the call stack (in reverse) from where the error took place. When we call a method that calls another method that calls another method, each of those calls in put onto a “stack” and then when the lower most method finishes, it is “popped off” the stack.

StackProf takes samples of the running code as individual “frames” which represent the stack at that given moment. Each frame gives us an ordered list of all the methods and lines of code that are on the stack, from the top most method (in a Rails app, probably your server/Unicorn) to the code on the bottom (the code currently running, like an ActiveRecord call).

StackProf Diagram

Here we can see a simple representation of the stack as time proceeds (left to right) and we descend and ascend the stack (up and down). When we take samples over time and then collect them into a single report, we get a simple way of representing what methods were on the stack the most and therefore taking up the most time (or CPU time) in the period that we were taking samples. 

How do I read the general output?

I created a simple gem called stackprof-remote that provides an easy way to collect dump’s from production systems. Maybe deceivingly, it also includes a simple interactive wrapper (based on pry) for navigating stackprof .dump files called stackprof-cli. Once you have a dump you can get into it via:

$ stackprof-cli last-run.dump

Get the top 5 methods:

stackprof> top 5
  Mode: cpu(1000)
  Samples: 5045 (3.28% miss rate)
  GC: 355 (7.04%)
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
       736  (14.6%)         707  (14.0%)     ActiveSupport::LogSubscriber#start
       379   (7.5%)         379   (7.5%)     block in ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#execute
      5248 (104.0%)         168   (3.3%)     Benchmark#realtime
       282   (5.6%)         117   (2.3%)     ActiveSupport::LogSubscriber#finish
        88   (1.7%)          88   (1.7%)     block (2 levels) in Sass::Importers::Filesystem#find_real_file

At the top of the output is some general info about the dump. We were running StackProf in CPU mode sampling every 1000 frames (the default). We took 5045 samples and 355 of those were during a GC pause and we can extrapolate that 7% of the time was spent doing GC.

Below the general info is a list of methods sorted by the samples column. The first two columns show the number and % of samples that this method was on the stack. The second two columns show the number and % of samples that this method was at the _top_ of the stack. By sorting by the samples column, we're showing the methods that our process was spending the most time in across all the samples. If we sorted by total (possible via total N in the CLI) we’re showing what methods were generally on the stack the most. This is useful too, but often just shows methods that are at the root of the stack (like server and middleware code that wrap everything else).

You can also dive into a specific method and see what the callers and callees (up/down the stack) were in the samples that were captured using method METHOD.

What’s a flame graph?

Flame graphs are a visualization of the same stack sample data popularized by Brendan Gregg. Flame graphs are awesome because they let us have an overview of an entire dump/captured session in a visual way that make it easier to pinpoint hot-points. Flame graphs display time left to right and the stack vertically. The colors in the StackProf Flame graphs are used to group different sources of code (i.e. different gems, controllers, models, etc) and are sorted with darkest red being the code that had the most TOTAL samples. In the example above, which is a profile of a Rails controller action you can see that ActionPack is at the top which makes sense because we’re spending most of our time dispatching/running a controller action. We can see that the stack moves up and down and there are clear sections where different parts of the action are happening. The lowest hanging fruit for optimization here are things that are wide and close to the top. Those are methods that are themselves taking up a lot of time.

Where to next?

If you set up stackprof-remote on your production app, you can make it easy to collect dumps from you’re running prod servers. You can do this at high traffic times, or just during a normal day, and get a good sense of where your app is spending time across actions/requests. This can be immensely useful as it can identify areas of code that are not in a single action, but common between actions that are taking up an unexpected amount of time or CPU. It can be useful locally, too, as Eileen showed in her optimization of the Rails test suite.


QuirkeyBlog is Aaron Quint's perspective on the ongoing adventure of Code, Life, Work and the Web.




QuirkeyBlog is proudly powered by WordPress