R for SysAdmins
Working with sysstat Data
Author: Kwan Lowe
Date: 2015.09.08
The previous introduction to R for SysAdmins received a good response, and several readers asked about using R to graph sysstat data, as mentioned in the conclusion. This tutorial walks through using R to produce graphs of that data.
Keep in mind that R is not traditionally considered a language for system administration; we normally turn to bash, awk, Python, or Ruby for most tasks. However, R is particularly well suited to working with data and is arguably the best tool for this job.
This tutorial will walk through graphing the output from the sysstat/sar utility. There are other tools, such as kSar and Cacti, that perform similar functions. R, however, can automate some of the analysis that is missing from most (all?) of the freely available tools. For example, we can run Bayesian analyses on our data to alert on resource trends, or run an anomaly detection function to find hotspots in activity. Creating interactive applications for end users, though not trivial, is facilitated by the many web packages available for R.
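For instance, once the sar samples are in a data frame (as we build below), a crude hotspot flag takes only a line or two. A minimal sketch, assuming the numeric User column we create later in this tutorial:
# Flag samples whose user CPU is more than 3 standard deviations above the mean.
# Assumes sar_dat$User exists (built later in this tutorial).
hot <- subset(sar_dat, User > mean(User) + 3 * sd(User))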
First, we generate some data using the sar utility.
sar > sar.out
This generates a file containing this output:
Linux 3.10.0-229.11.1.el7.x86_64 (cerberus.digitalhermit.com) 09/08/2015 _x86_64_ (2 CPU)
12:00:01 AM CPU %user %nice %system %iowait %steal %idle
12:10:01 AM all 0.11 0.00 0.17 0.02 0.00 99.70
12:20:01 AM all 0.08 0.00 0.14 0.01 0.00 99.77
[...]
02:10:01 PM all 1.17 0.00 0.30 0.06 0.00 98.48
02:20:01 PM all 0.86 0.00 0.21 0.02 0.00 98.92
Average: all 0.16 0.00 0.12 0.03 0.00 99.70
There are many options to sar, but they are beyond the scope of this tutorial.
Now we can start constructing our R script. The first step is to load a few libraries to handle dates, data manipulation, and graphics (dplyr also provides the %>% pipe operator used below). Check the resources section below for more information on these packages.
library(lubridate)
library(dplyr)
library(lattice)
Next, we read the input into a raw character vector. This will make it easier to deal with the non-data lines later. The sar output from my machine contains three lines of header and a final line with averages.
sar_raw <- readLines("sar.out")
The first line of our input is useful to keep, so we pull it into its own variable for later use. We use the strsplit() and unlist() functions to parse the character vector (a "string" in normal Linux parlance). We also use the %>% operator to chain calls, much like the pipe (|) character does on the command line.
sar_title <- sar_raw[1] %>% strsplit("\t") %>% unlist()
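For comparison, the same parse written without the pipe, as nested calls:
# Equivalent to the piped version above:
sar_title <- unlist(strsplit(sar_raw[1], "\t"))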
From the sar_title header we can extract some useful bits, such as the date of the report. This is important because the date is kept only in the header. To extract it we use the lubridate::mdy (month-day-year) function, which parses a date from a character string:
report_date <- mdy(sar_title[2])
This allows us to pull the year, month, day, etc. with
year(report_date)
Next, we read the data into a proper data frame (the above was just a raw read into a character vector). The skip and nrows arguments let us skip the header lines and drop the trailing Average line.
sar_dat <- read.table(text=sar_raw, skip=2, nrows=(length(sar_raw) - 4), header=TRUE)
At this point we can delete the temporary sar_raw vector:
rm("sar_raw")
Now we rename the columns to make the data look prettier when graphed. There are other ways of labeling, but this works in a pinch.
names(sar_dat) <- c("Time", "MM", "CPU", "User", "Nice", "System", "IOWait", "Steal", "Idle")
The main issue with using the names() function is that the input column order may change. Not a big worry here, but keep this in mind if you are reading in other types of data.
To avoid this, you can rename columns explicitly by name using the setnames() function from the data.table package (note that it takes the old name before the new one):
# library(data.table)
# setnames(mydata, "oldname", "newname")
Then we add a new column, pTime, that combines the Time and MM columns. Though this character representation of the date/time is perfectly readable for humans, to use it properly in R we need to convert it to POSIXct date format. A couple of notes:
- We use %I to indicate 12-hour time. If %H were used, the parser would assume a 24-hour clock and ignore the %p (AM/PM) specifier; see the quick check after this list.
- By default, the parsed POSIXct timestamps get the current date. We clean this up using the date info we saved earlier in report_date.
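We can see the %I/%H difference with base R's strptime, which uses the same %-codes (treat this as a sketch: whether %p is ignored alongside %H depends on the platform's C library):
# %I honors the AM/PM marker:
format(strptime("02:20:01 PM", "%I:%M:%S %p"), "%H:%M:%S")   # "14:20:01"
# %H assumes a 24-hour clock, so the PM marker has no effect:
format(strptime("02:20:01 PM", "%H:%M:%S %p"), "%H:%M:%S")   # "02:20:01"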
We use within() to paste Time and MM together into pTime, then dplyr::mutate to add a pDate column that parses pTime into a POSIXct date. Finally, we correct the year, month, and day to those of the report date:
sar_dat <- within(sar_dat, pTime <- paste(Time, MM))
sar_dat <- mutate(sar_dat, pDate = parse_date_time(pTime, "%I%M%S %p", tz="America/New_York"))
year(sar_dat[,"pDate"]) <- year(report_date)
month(sar_dat[,"pDate"]) <- month(report_date)
day(sar_dat[,"pDate"]) <- day(report_date)
Now, say we're only interested in the CPU information. We can create a new data frame containing just that information using dplyr::select:
sar_cpu <- select(sar_dat, pDate, User, System)
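A quick sanity check that the new frame contains only what we selected:
head(sar_cpu)   # should show the first few rows of pDate, User, System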
Finally, graph it using the lattice graphics package. Here we plot User against pDate using the data in sar_cpu.
xyplot(User ~ pDate, data=sar_cpu, type="l")
To make it complete, we add a title using the main= parameter, reusing the header line we saved in sar_title.
sar_plot <- xyplot(User ~ pDate, data=sar_cpu, type="l", main=sar_title[1])
print(sar_plot)
This produces a time-series line graph of User CPU utilization, titled with the sar header line.
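If this script runs unattended (from cron, say), you may want the plot written to a file rather than a display. A minimal sketch, with an illustrative filename and size:
# Render the lattice plot to a PNG file instead of the screen.
png("sar_cpu.png", width=800, height=400)   # filename and dimensions are arbitrary
print(sar_plot)
dev.off()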
For these examples I deliberately added a few extra steps for clarity. Even so, this is less than 20 lines of code; my actual R script is under ten lines. It would be difficult to do this as concisely in any other language. Even in Excel, which is notoriously difficult to automate, it takes multiple steps just to format the data properly.
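For reference, here is the whole pipeline consolidated into one short script (the same code as above, just condensed):
library(lubridate)
library(dplyr)
library(lattice)

# Read the raw sar output and keep the header line for the title and report date.
sar_raw <- readLines("sar.out")
sar_title <- sar_raw[1] %>% strsplit("\t") %>% unlist()
report_date <- mdy(sar_title[2])

# Parse the data lines, skipping the header and dropping the trailing Average line.
sar_dat <- read.table(text=sar_raw, skip=2, nrows=(length(sar_raw) - 4), header=TRUE)
names(sar_dat) <- c("Time", "MM", "CPU", "User", "Nice", "System", "IOWait", "Steal", "Idle")

# Build a proper POSIXct timestamp and stamp it with the report date.
sar_dat <- mutate(sar_dat, pDate = parse_date_time(paste(Time, MM), "%I%M%S %p", tz="America/New_York"))
year(sar_dat$pDate) <- year(report_date)
month(sar_dat$pDate) <- month(report_date)
day(sar_dat$pDate) <- day(report_date)

print(xyplot(User ~ pDate, data=sar_dat, type="l", main=sar_title[1]))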
Anyhow, I hope this whets your appetite for learning more about R.
Resources:
https://cran.r-project.org/web/packages/lubridate/lubridate.pdf
https://cran.r-project.org/web/packages/lattice/lattice.pdf
https://cran.r-project.org/web/packages/dplyr/dplyr.pdf