
Performance Analysis with Performance Co-Pilot, iPython and pandas

Introduction

One of the many reasons to love Performance Co-Pilot is that it is a fully fledged framework for performance analysis. It is very easy to extend and to build on top of. In this post we shall explore how simple it is to analyze your performance data using iPython and pandas.


Setup

To start, we need some PCP archives containing metrics collected from a system. In this post I will use the data I collect on my home firewall and analyze some of the metrics therein. To learn how to store performance metrics in an archive, take a look at pmlogger and the Quickstart guide. For this example I collected data over the course of a day at a one-minute interval.

iPython and PCP

First of all, you need to fetch a small Python module that bridges PCP and pandas/numpy:

git clone https://github.com/mbaldessari/pcpinteractive.git
cd pcpinteractive

Now let us start the iPython console, import the Python module and load the archive:

At this point the data is fully parsed in memory and we can start analyzing it, using all the standard tools like pandas and matplotlib. Let's start by looking at how many metrics are present in the archive:

Pandas and PCP

Now we can get a pandas object out of a metric. Let's take incoming and outgoing network traffic expressed in bytes over time.

We can now graph the data obtained with a simple:

[Figure: plot of incoming and outgoing network traffic over time]

And we can also explore the data with the use of the describe() method, but first let's force the output to be in non-scientific notation as it is more readable for network data:
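As a minimal sketch of this step in plain pandas, the snippet below switches float output to fixed notation and then calls describe(). The dataframe is a small synthetic stand-in for the archive data; the interface names and values are assumptions made for illustration only.

```python
import pandas as pd

# Synthetic stand-in for a per-interface byte-counter dataframe.
# Interface names and values are made up for illustration.
index = pd.date_range("2014-01-01 00:00", periods=4, freq="min")
net_in = pd.DataFrame(
    {
        "eth0": [1500000.0, 2500000.0, 3500000.0, 4500000.0],
        "tun0": [0.0, 1000.0, 2000.0, 3000.0],
    },
    index=index,
)

# Print floats with two decimals instead of scientific notation.
pd.set_option("display.float_format", lambda value: "%.2f" % value)

# describe() returns count, mean, std, min, quartiles and max per column.
summary = net_in.describe()
print(summary)
```

The float_format option only changes how values are displayed; the underlying data is left untouched.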

Manipulate the data

Now let's see what is possible in terms of data manipulation:

  • Drop columns we do not care about:

Or, alternatively:

  • Resample the data at a lower frequency (longer intervals):

  • Filter out the columns that contain only zeros:

  • Show last element:

  • Select a smaller timeframe:

  • Get one column with:

  • Apply a function on the whole dataframe:

  • Sum all values for each column:

  • Calculate the mean for each column:

  • Find the time of day when the max values are reached:

  • Select only the tun0 and eth0 devices:
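The operations in the list above can be sketched in plain pandas as follows. This is an illustration under stated assumptions, not the exact commands from the session: the dataframe is synthetic, and the lo and eth1 columns, the values and the timestamps are hypothetical.

```python
import pandas as pd

# Synthetic stand-in for net_in: one row per minute, one column per interface.
index = pd.date_range("2014-01-01 00:00", periods=6, freq="min")
net_in = pd.DataFrame(
    {
        "eth0": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
        "eth1": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        "lo": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
        "tun0": [5.0, 0.0, 5.0, 0.0, 25.0, 0.0],
    },
    index=index,
)

# Drop a column: drop() returns a new dataframe ...
no_lo = net_in.drop(columns=["lo"])
# ... or, alternatively, del removes it in place (done on a copy here).
trimmed = net_in.copy()
del trimmed["lo"]

# Resample into 2-minute buckets, summing the values in each bucket.
resampled = net_in.resample("2min").sum()

# Keep only the columns that have at least one non-zero value.
non_zero = net_in.loc[:, (net_in != 0).any(axis=0)]

# Last row of the dataframe.
last = net_in.iloc[-1]

# Smaller timeframe: label slices on a DatetimeIndex include both ends.
window = net_in.loc["2014-01-01 00:01":"2014-01-01 00:03"]

# A single column, returned as a Series.
eth0 = net_in["eth0"]

# Apply a function to every column (here: divide by 1024).
scaled = net_in.apply(lambda column: column / 1024.0)

# Per-column sum, mean, and timestamp of the maximum value.
totals = net_in.sum()
means = net_in.mean()
peak_times = net_in.idxmax()

# Select only the tun0 and eth0 devices.
subset = net_in[["tun0", "eth0"]]
```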

Merge and group dataframes

Now let's merge the net_in and the net_out dataframes into a single one, in order to try and do some analysis on both traffic directions at the same time.
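One way to do such a merge, sketched here with synthetic data, is DataFrame.join with suffixes that keep the two directions apart. The _in and _out suffixes and the sample values are assumptions for illustration.

```python
import pandas as pd

# Synthetic stand-ins for the two dataframes, sharing the same time index.
index = pd.date_range("2014-01-01 00:00", periods=3, freq="min")
net_in = pd.DataFrame(
    {"eth0": [10.0, 20.0, 30.0], "tun0": [1.0, 2.0, 3.0]}, index=index
)
net_out = pd.DataFrame(
    {"eth0": [5.0, 6.0, 7.0], "tun0": [0.5, 0.5, 0.5]}, index=index
)

# join() aligns rows on the index; the suffixes disambiguate the
# column names that exist in both dataframes.
merged = net_in.join(net_out, lsuffix="_in", rsuffix="_out")
print(merged)
```

Because the join is done on the index, both dataframes should cover the same timestamps; rows missing on the right side would show up as NaN.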

Another very interesting aspect is the plethora of statistical functions that come for free through the use of pandas. For example, to compute covariance and correlation we can use the cov() and corr() methods:
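A short sketch of both calls on a synthetic merged dataframe (the column names and values are hypothetical and chosen so that the result is easy to check by eye):

```python
import pandas as pd

# Synthetic merged dataframe: eth0_out rises with eth0_in, tun0_in falls.
index = pd.date_range("2014-01-01 00:00", periods=4, freq="min")
merged = pd.DataFrame(
    {
        "eth0_in": [10.0, 20.0, 30.0, 40.0],
        "eth0_out": [1.0, 2.0, 3.0, 4.0],
        "tun0_in": [4.0, 3.0, 2.0, 1.0],
    },
    index=index,
)

# Pairwise covariance and Pearson correlation between all columns.
covariance = merged.cov()
correlation = merged.corr()

# Correlation between two specific columns.
pair = merged["eth0_in"].corr(merged["tun0_in"])
print(correlation)
```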

We can also group columns like the following:
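The grouping used originally is not shown here, so the following is only one possible illustration: it groups the columns of a merged dataframe by interface name and sums incoming and outgoing traffic per interface. The interface_of helper and the _in/_out column naming are assumptions.

```python
import pandas as pd

# Synthetic merged dataframe with one column per interface and direction.
index = pd.date_range("2014-01-01 00:00", periods=3, freq="min")
merged = pd.DataFrame(
    {
        "eth0_in": [10.0, 20.0, 30.0],
        "eth0_out": [5.0, 6.0, 7.0],
        "tun0_in": [1.0, 2.0, 3.0],
        "tun0_out": [0.5, 0.5, 0.5],
    },
    index=index,
)


def interface_of(column_name):
    """Hypothetical helper: 'eth0_in' -> 'eth0'."""
    return column_name.split("_")[0]


# Transpose so the column names become the index, group them by
# interface, sum each group, then transpose back to time-indexed rows.
per_interface = merged.T.groupby(interface_of).sum().T
print(per_interface)
```

Grouping on the transposed dataframe avoids the axis argument of groupby, which newer pandas releases deprecate.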

Calculate the rolling mean of an interface and plot it:
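A minimal sketch with the rolling() API of current pandas, on a synthetic series; the window size of 3 samples and the values are arbitrary choices for illustration.

```python
import pandas as pd

# Synthetic stand-in for the traffic of a single interface.
index = pd.date_range("2014-01-01 00:00", periods=5, freq="min")
eth0 = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=index, name="eth0")

# Mean over a sliding window of 3 samples; the first two entries are
# NaN because the window is not yet full.
rolling = eth0.rolling(window=3).mean()
print(rolling)

# With matplotlib installed, rolling.plot() would draw the result.
```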

[Figure: plot of the rolling mean of an interface]

Export data

Save the data to a CSV file or in Excel format:
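A small sketch of the CSV case on synthetic data. Here to_csv() is called without a path so that it returns the CSV text, which keeps the example free of file-system side effects; passing a file name writes to disk instead.

```python
import io

import pandas as pd

# Synthetic stand-in for the dataframe to export.
index = pd.date_range("2014-01-01 00:00", periods=3, freq="min")
net_in = pd.DataFrame(
    {"eth0": [10.0, 20.0, 30.0], "tun0": [1.0, 2.0, 3.0]}, index=index
)

# Without a path, to_csv() returns the CSV content as a string.
csv_text = net_in.to_csv()

# Read it back, using the first column as a parsed datetime index.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(csv_text)

# net_in.to_excel("net_in.xlsx") is the Excel equivalent; it needs an
# Excel writer engine such as openpyxl to be installed.
```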

Other output formats such as LaTeX, SQL, clipboard, HDF5 and more are supported.

Conclusions

The versatility of PCP allows anyone to use many currently available frameworks (numpy, pandas, R, scipy) to analyze and display the collected performance data. Some work remains to make this process simpler with an out-of-the-box PCP installation.