Tracking Progress in R
Samantha Bell
It sure does seem like "a watched pot never boils" when waiting for loops or mapped functions to complete many iterations. Adding a progress bar or other indicator to your code can give the user some peace of mind - making the wait seem more reasonable.
Clear expectations = Happy users
With analyses that are frequently repeated, the input data can vary in size. This means that we might not be able to predict how long a particular manipulation may take. Similarly, when writing new code, it is nice to get an indication of the expected processing time of a loop or function call.
Seeing a line of code processing but not having any measure of progress makes time slow... to... a... crawl.
Let's take a simple for loop, where we run through every row of a matrix. For the sake of simplicity, we will just have our action be a short pause for each iteration:
m <- 1:150 # Possible lengths the data might have
data <- matrix(rnorm(sample(m, 1), 1, .5)) # Random data matrix
for(i in 1:dim(data)[1]){
  Sys.sleep(0.2) # the action
}
Depending on how many rows the data has on a given run and how heavy the action is, the loop can take more or less time. As the code runs, we have no way to see which iteration we are on or how much remains.
Option 1: cat
cat() can be used to print from within a loop (Look here or here if not familiar with cat). This can let us know many things, such as the current contents of a variable, the number of the iteration, or even the percent completion.
Printing a dot (.) each iteration shows that your code is working and how quickly it is running, but not how much time is left. It is also useful if you print an item each iteration and want to see the progress for that specific variable, file, etc.:
for(i in 1:dim(data)[1]){
  cat("Now working on", i, ".")
  for(j in 1:10){
    Sys.sleep(0.2) # the action
    cat(".")
  }
  cat("\n") # start a new line for the next item
}
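As a small sketch of the other things cat() can report, the helper below builds a message with the iteration number and the percent complete. The name progress_message is hypothetical (it does not come from any package), and the pause is shortened so the example finishes quickly.

```r
# Hypothetical helper: builds a one-line status message for iteration i of n
progress_message <- function(i, n) {
  sprintf("Iteration %d of %d (%d%% complete)", i, n, round(100 * i / n))
}

n <- 5 # example number of iterations
for (i in seq_len(n)) {
  Sys.sleep(0.05)                   # the action (shortened pause)
  cat(progress_message(i, n), "\n") # one status line per iteration
}
```

Keeping the message in its own function makes it easy to reuse the same wording in several loops.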
Option 2: cat with modulus
Using a modulus (explanation here) will let us print our progress every 10 iterations instead of each time. If you have a large number of iterations, try every 100 or every 1000 iterations.
for(i in 1:dim(data)[1]){
  if(i %% 10==0) {
    # print the percent completed on its own line every 10th iteration
    cat(round((i/dim(data)[1])*100, digits=0), "% completed...\n")
  }
  Sys.sleep(0.2) # the action
}
cat("Done!\n")
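Rather than hard-coding 10, 100, or 1000, the reporting interval can be derived from the number of iterations. The sketch below assumes we want roughly ten updates per run; the variable names n and step are illustrative, and the action itself is left out.

```r
n <- 250                 # number of iterations (example value)
step <- max(1, n %/% 10) # assumption: aim for about 10 updates in total

for (i in seq_len(n)) {
  # the action would go here
  if (i %% step == 0) {
    cat(round(100 * i / n), "% completed...\n")
  }
}
cat("Done!\n")
```

The max(1, ...) guard keeps the interval at 1 when there are fewer than ten iterations, so the modulus never divides by zero.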
Option 3: progress bar
Progress bars (available in base R as shown here) can be used with or without a modulus and will grow in size as progress is made. With style = 3, the percent of completion shows at the right side of the bar.
The bar must be initialized with a min and a max, which set the range of values it represents (a larger max means more, smaller additions to the bar as it loads). Setting max to the number of iterations lets the bar reach 100% on the last one. The bar comes in 3 styles.
We control the progress of the bar by calling setTxtProgressBar() in the loop. The first argument is the bar you created, the second is how you are counting your iterations (in this case, i).
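To compare the three styles, the sketch below draws each one half full using the initial argument of txtProgressBar(). Styles 1 and 2 show a plain line of characters (style 2 redraws the whole line on each update), while style 3 marks the end of the range with | and adds a percentage. The max of 10 is just an example value.

```r
for (s in 1:3) {
  cat("Style", s, "\n")
  # initial = 5 with max = 10 draws the bar at the halfway point
  bar <- txtProgressBar(min = 0, max = 10, initial = 5, style = s)
  close(bar) # ends the bar's line so the next label starts cleanly
}
```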
# Number of iterations (must match the loop so the bar can reach 100%)
imax <- nrow(data)
# Initiate the bar
bar <- txtProgressBar(min = 0, max = imax, style = 3)
for(i in 1:imax){
  Sys.sleep(0.2) # the action
  # Update the progress bar
  setTxtProgressBar(bar, i)
}
close(bar) # finish the bar and move to a new line
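The bar can also be combined with a modulus, which may help when the loop body is very fast and redrawing on every iteration is unnecessary. This is a minimal sketch with an example count of 200 iterations and an update every 10th one; the extra i == n check makes sure the bar still finishes at 100% when n is not a multiple of 10.

```r
n <- 200 # number of iterations (example value)
bar <- txtProgressBar(min = 0, max = n, style = 3)
for (i in seq_len(n)) {
  # the action would go here
  if (i %% 10 == 0 || i == n) {
    setTxtProgressBar(bar, i) # redraw every 10th iteration and on the last one
  }
}
close(bar)
```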
Where does it belong?
Place your progress bar, cat(), or other indicator at a spot in your loop where it will advance once each time the main action runs.
Most of the time, this will be at the end of the outer loop, before the closing bracket. In the case of our cat statement where we wanted to add dots to each thing that was done to an item in the main loop, we placed the progress indicator in a nested loop. Play around with placement to get progress displayed in the way that is most meaningful to your situation!
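The same placement rule can be applied to mapped functions. In the sketch below, which assumes a simple sapply() over example values, the bar is updated inside the mapped function right after the action, so it advances once per element.

```r
values <- c(4, 9, 16, 25) # example input
bar <- txtProgressBar(min = 0, max = length(values), style = 3)
roots <- sapply(seq_along(values), function(i) {
  Sys.sleep(0.05)           # stand-in for slower work
  result <- sqrt(values[i]) # the action
  setTxtProgressBar(bar, i) # advance once per element
  result                    # value returned to sapply()
})
close(bar)
```

Mapping over seq_along(values) rather than the values themselves gives the function an index to pass to the bar.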