Tracking Progress in R
@goumbik Lukas Blazek - Via Unsplash

It sure does seem like "a watched pot never boils" when waiting for loops or mapped functions to complete many iterations. Adding a progress bar or other indicator to your code can give the user some peace of mind - making the wait seem more reasonable.

Clear expectations = Happy users

With analyses that are frequently repeated, the input data can vary in size. This means that we might not be able to predict how long a particular manipulation may take. Similarly, when writing new code, it is nice to get an indication of the expected processing time of a loop or function call.

Seeing a line of code processing but not having any measure of progress makes time slow... to... a... crawl.

Let's take a simple for loop, where we run through every row of a matrix. For the sake of simplicity, we will just have our action be a short pause for each iteration:

m <- 1:150 # Possible lengths the data might have
data <- matrix(rnorm(sample(m, 1), mean = 1, sd = 0.5)) # Random one-column data matrix

for(i in seq_len(nrow(data))){
  Sys.sleep(0.2) # the action
}

Depending on how many rows the data has and how demanding the action is, the loop can take anywhere from a fraction of a second to half a minute (up to 150 rows at 0.2 seconds each). As the code runs, we have no way to see which iteration we are on or how much remains.

Option 1: cat

cat() can be used to print from within a loop (see ?cat if you are not familiar with it). It can report many things, such as the current contents of a variable, the number of the iteration, or even the percent completion.

Printing a dot (.) each iteration shows that your code is working and how quickly it is running, but not how much time is left. It is still useful when you print the current item at each iteration and want to see the progress for that specific variable, file, etc.:

for(i in seq_len(nrow(data))){
   cat("\nNow working on", i, "") # start a new line for each item
   for(j in 1:10){
      Sys.sleep(0.2) # the action
      cat(".") # one dot per completed step
   }
}
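
The dots do not say how much time is left, but a rough estimate can be printed with cat() as well. The following is a minimal sketch, assuming every iteration takes roughly the same time; eta_seconds() is a hypothetical helper written for this illustration, not a base R function, and n is an illustrative iteration count.

```r
# Hypothetical helper: estimate the seconds remaining from the time elapsed
# so far, assuming each iteration takes about the same time.
eta_seconds <- function(elapsed, done, total) {
  (elapsed / done) * (total - done)
}

n <- 20              # illustrative number of iterations
start <- Sys.time()  # when the loop started

for (i in seq_len(n)) {
  Sys.sleep(0.05)    # the action
  elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
  cat("Iteration", i, "of", n, "- about",
      round(eta_seconds(elapsed, i, n), 1), "seconds left\n")
}
```

The estimate is only as good as the equal-time assumption, so expect it to jump around when some iterations are much slower than others.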


Option 2: cat with modulus

Using the modulus operator %% (the remainder after division) lets us print our progress every 10 iterations instead of every time. If you have a large number of iterations, try every 100 or every 1000 iterations.

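n <- nrow(data) # total number of iterations
for(i in seq_len(n)){
  Sys.sleep(0.2) # the action
  # Report after every 10th completed iteration, one message per line
  if(i %% 10 == 0) {cat(round(100 * i / n), "% completed...\n", sep = "")}
}
cat("Done!\n")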
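
To see why the modulus check fires only on every 10th iteration, it can help to look at the remainders directly. This is a small sketch with an illustrative iteration count (n = 35 is an assumption, not a value from the loop above):

```r
n <- 35     # illustrative number of iterations

10 %% 10    # remainder is 0, so iteration 10 prints a message
13 %% 10    # remainder is 3, so iteration 13 stays silent

# Iterations at which a progress message would be printed
report_at <- which(seq_len(n) %% 10 == 0)
report_at   # 10 20 30

# Percent completed that would be reported at those iterations
pct <- round(100 * report_at / n)
pct         # 29 57 86
```

Because 35 is not a multiple of 10, the last five iterations never trigger a message, which is why a final "Done!" after the loop is worth keeping.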


Option 3: progress bar

Progress bars are available in base R through txtProgressBar() in the utils package. They can be used with or without a modulus, and the bar grows as progress is made. With style = 3, the percent of completion shows at the right side of the bar.

The bar must be initialized with a min and a max that define the range of values it covers (a larger max means each update adds a smaller increment to the bar). Setting max to the number of iterations lets the loop counter drive the bar directly. The bar comes in 3 styles (style = 1, 2, or 3).

We control the progress of the bar with setTxtProgressBar() inside the loop. The first argument is the bar you created, the second is the current value of your iteration counter (in this case, i).

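# Number of iterations (must match the loop, or the bar stops updating early)
imax <- nrow(data)
# Initiate the bar
bar <- txtProgressBar(min = 0, max = imax, style = 3)

for(i in seq_len(imax)){
  Sys.sleep(0.2) # the action
  # Update the progress bar
  setTxtProgressBar(bar, i)
}
close(bar) # Close the bar once the loop is done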
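
The same bar can also be used with a mapped function rather than a for loop. The following is a minimal sketch, assuming we map over the positions of a vector so that the position can drive the bar; values and roots are illustrative names, and sqrt() stands in for a slower action.

```r
values <- c(4, 9, 16, 25)  # illustrative input

# One bar step per element of the input
bar <- txtProgressBar(min = 0, max = length(values), style = 3)

roots <- sapply(seq_along(values), function(k) {
  Sys.sleep(0.1)             # stand-in for slow work
  setTxtProgressBar(bar, k)  # advance the bar to item k
  sqrt(values[k])            # the value returned for this item
})
close(bar)                   # finish the bar's line

roots
```

Mapping over seq_along(values) instead of values itself is a design choice here: it gives the function an index to pass to setTxtProgressBar().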

Where does it belong?

Place your progress bar, cat(), or other indicator at a spot in your loop where it will advance once each time the main action is run.

Most of the time, this will be at the end of the outer loop, before the closing bracket. In the case of our cat statement where we wanted to add dots to each thing that was done to an item in the main loop, we placed the progress indicator in a nested loop. Play around with placement to get progress displayed in the way that is most meaningful to your situation!
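
As one illustration of placement, here is a sketch of a nested loop where the main action sits in the inner loop, so the bar is updated there and its max is the total number of inner steps. The names n_items, n_steps, and done are assumptions made for this example.

```r
n_items <- 3  # illustrative number of items in the outer loop
n_steps <- 4  # illustrative number of steps per item

# The action runs n_items * n_steps times, so that is the bar's max
bar <- txtProgressBar(min = 0, max = n_items * n_steps, style = 3)
done <- 0     # running count of completed actions

for (i in seq_len(n_items)) {
  for (j in seq_len(n_steps)) {
    Sys.sleep(0.05)               # the action
    done <- done + 1
    setTxtProgressBar(bar, done)  # one update per completed action
  }
}
close(bar)
```

If the action lived in the outer loop instead, the update would move to the end of the outer loop and max would simply be n_items.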


HAPPY PROGRAMMING!
