Performing Iterative Processing
Sunayana Pati
Senior Associate @ PwC || Snowflake || Databricks || Power BI || dbt || MS SQL Server
Many programming tasks require that blocks of code be run more than once. SAS provides several ways to accomplish this. This article covers DO groups and DO loops.
DO Groups
To demonstrate a DO group, we start with the data file grade.txt containing some information on students: their age, gender, midterm grade, quiz grade, and final exam grade.
You want to read values from this file and compute two new variables—age group (Agegrp) and a value (Grade) computed from the midterm and final exam grades. If the age is less than or equal to 39, you want to set Agegrp equal to the Younger group and the grade to be computed as a weighted average of the midterm grade (40%) and the final exam grade (60%). If the age is greater than 39, you want to set Agegrp equal to the Older group and compute the grade as a simple average of the midterm and final exam grades.??We can achieve this with IF-ELSE statements, but let's use DO and END statements to make the code more efficient and easier to read:?
data grades;?
length Gender $ 1?
? ? ? ?Quiz $ 2?
? ? ? ?AgeGrp $ 13;?
infile '/home/u41108928/Ron Cody Data/grades.txt' missover;?
input Age Gender Midterm Quiz FinalExam;?
if missing(Age) then delete;?
if Age le 39 then do;?
?Agegrp = 'Younger group';?
?Grade = .4*Midterm + .6*FinalExam;?
end;?
else if Age gt 39 then do;?
?Agegrp = 'Older group';?
?Grade = (Midterm + FinalExam)/2;?
end;?
run;?
title "Listing of GRADES";?
proc print data=grades noobs;?
run;
All the statements between DO and END form a DO group. When the IF condition is true, all the statements in the DO group execute. A good way to think of this structure is “If the condition is true, do the following statements until you reach the end.” It is standard practice to indent all the statements in the DO group as shown here.?The DO group coding is not only more efficient than multiple IF statements, but it is also easier to read.
The Sum Statement?
There are two primary uses for a sum statement: one is to accumulate totals such as a month-to-date total and the other is to create a counter—a variable that is incremented by a fixed amount on each iteration of the DATA step.
Suppose you have a data set with one observation for each day of the week, and you want a program that will read in these values and compute a cumulative sum. The following program creates a test data set (Revenue) and attempts to create the cumulative sum.
data revenue;?
input Day : $3.?
Revenue : dollar6.;?
Total + Revenue;?
format Revenue Total dollar8.;?
datalines;?
?Mon $1,000?
?Tue $1,500?
?Wed .?
?Thu $2,000?
?Fri $3,000?
;
run;
title 'Using a sum statement to create a cumulative total';
proc print data= revenue;
run;
Here is the output:
A sum statement takes the following form: variable + increment;
Notice there is a plus sign and no equal sign in this statement. That’s what identifies this as a sum statement to SAS. This statement does the following:
Another very common use of a sum statement is to create counters, for example:?
data counter;?
input x;?
if missing(x) then MissCounter + 1;?
datalines;?
?2?
?.?
?7?
?.?
;
run;
title 'Using a sum statement to create a counter';
proc print data= counter;
run;
As you can see from the output, MissCounter is counting the number of missing values for x.?
The Iterative DO Loop?
Although it is useful to execute a group of codes when a condition is true, there are times when you would like to execute a group of SAS statements multiple times.
You want to compute the total amount of money you will have if you start with $100 and invest it at a 3.75% interest rate for 3 years. (Yes, there are formulas for compound interest as well as SAS functions, but this makes for a good example.)
The OUTPUT statement is an instruction for SAS to write out an observation to the output data set. An output usually occurs automatically at the bottom of the DATA step. But here, you want to output an observation each time you compute a new Total. When you include an OUTPUT statement anywhere in a DATA step, SAS does not execute an automatic output at the bottom of the DATA step.?
data compound;?
Interest = .0375;?
Total = 100;?
do Year = 1 to 3;?
?Total + Interest*Total;?
?output;?
end;?
format Total dollar10.2;?
run;
title 'Total amount every year';
proc print data= compound;
run;
When this program executes, Year is first set to 1, the lower limit in the iterative DO range. All the statements up to the END statement are executed and Year is incremented by 1 (the default increment value). SAS then tests if the new value of the Year is between the lower and the upper limit (the value after the keyword TO). If it is, the statements in the DO group execute again; if not, the program continues at the first line following the END statement.?
Here is the output:
领英推荐
One form of an iterative DO statement follows:
do index-variable = start to stop by increment;
If you leave off the increment, it defaults to 1.?
Another example to understand the DO loop: Suppose you want to generate a table of the integers from 1 to 10, along with their squares and square roots. An iterative DO loop makes simple work of this, as follows:?
data table;?
do n = 1 to 10;?
?Square = n*n;?
?SquareRoot = sqrt(n);?
?output;?
end;?
run;?
title "Table of Squares and Square Roots";?
proc print data=table noobs;?
run;
Notice that this program does not have any input data. It generates the value of n in the DO loop, computes the squares and square roots (SQRT is a square root function—it returns the square root of its argument), and outputs an observation to the data set. This continues for all the values from 1 to 10. Here is the output:?
What if you want a table where n has values of 0, 10, 20, 30, and so forth up to 100. The following DO statement does the trick:
do n = 0 to 100 by 10;
DO loops can also count backward. For example, the following statement produces values of 10, 8, 6, 4, and 2 for the index variable:
do index = 10 to 1 by -2;
When you use a negative increment value, the index value is decremented by this value for each iteration of the DO loop. When the value of the index variable is less than the stop value, the loop stops.
Here is another example. You have an equation relating X and Y and want a graph of Y versus X for values of X from –10 to +10. Again, an iterative DO loop makes this very easy:?
data equation;?
do X = -10 to 10 by .01;?
? ?Y = (2*X**3)-(5*X**2)+(15*X)-8;?
?output;?
end;?
run;?
symbol value=none interpol=sm width=2;?
title "Plot of Y versus X";
?
proc gplot data=equation;?
?plot Y * X;?
run;
The DO loop starts X at –10 and increments the value by .01 until X reaches +10. The OUTPUT statement inside the loop writes an observation out to the data set for each iteration of the loop.?PROC GPLOT is used to plot the line.
Other Forms of an Iterative DO Loop
SAS provides several other methods of specifying how a DO loop operates. You can provide a list of numeric or character values following the index variable. Here are some examples:
do x = 1,2,5,10; (values of x are: 1, 2, 5, and 10)
do month = 'Jan','Feb','Mar'; (values of month are: 'Jan', 'Feb', and 'Mar')
do n = 1,3, 5 to 9 by 2, 100 to 200 by 50; (values of n are: 1, 3, 5, 7, 9, 100, 150, and 200)??
If you use character values in the DO statement, the length of the first character value determines the storage length of the index variable. Therefore, you may need to use a LENGTH statement to set the storage length for the index variable. Using character values for DO loop indices can be especially useful, as illustrated by this example.?
data easyway;?
do Group = 'Placebo','Active';?
do Subj = 1 to 5;?
?input Score @;?
?output;?
end;?
end;?
datalines;?
?250 222 230 210 199?
?166 183 123 129 234?
;
run;
title 'Nesting of Loops';
proc print data= easyway;
run;
This program demonstrates two things: first, you can use character values in a DO loop, and second, you can nest one DO loop inside another. The outer loop first sets Group equal to Placebo.
Next, the inner loop iterates five times, reading score values and outputting an observation each time. When the inner loop is finished, control returns to the top of the outer loop where Group is set to Active. Five more?values of Score are read and five observations are written to the SAS data set. Here is a listing of data set Easyway:?
In the next article on Iterative Processing, I discuss DO WHILE statements, and DO UNTIL statements.?
Till then happy learning!
Senior SDET Engineer @ Delta Dental | Selenium | Cypress | Playwright | Java | JavaScript | TypeScript | BDD Cucumber | AWS | Azure | MySQL | RestAssured | Postman | Appium | JMeter
3 年Please also share daily posts of Python ????