SAS Programming
Darshika Srivastava
Associate Project Manager @ HuQuo | MBA,Amity Business School
Fundamentals Of SAS Programming
SAS Windows
Large organisations and training institutes prefer using SAS Windows. SAS Windows has many utilities that help reduce the time required to write code.
The following image shows the different parts of SAS Windows.
Log Window: It is an execution window. Here, you can check the execution of your program. It also displays errors, warnings and notes.
Code Window: This window is also known as the editor window. Consider it a blank sheet of paper or a notepad where you can write your SAS code.
Output Window: As the name suggests, this window displays the output of the program or code that you write in the editor.
Result Window: It is an index that lists all the outputs of the programs that are run in one session. Since it holds the results of a particular session, the result window will be empty if you close the software and restart it.
Explorer Window: It holds the list of all the libraries in the system. You can also browse the system-supported files here.
A few organisations use SAS on Linux. However, without a graphical user interface, you have to write code for every query, which makes it less convenient to use.
SAS Data Sets
SAS data sets are also called data files. A data file consists of rows and columns. Rows hold observations and columns hold variables.
SAS Variables
SAS has two types of variables:
Numeric variables: This is the default variable type. These variables are used in mathematical expressions.
Character variables: Character variables are used for values that are not used in mathematical expressions. They are treated as text or strings. A variable is read as a character variable when a ‘$’ sign is added after the variable name in the INPUT statement.
SAS Libraries
A SAS library is a collection of SAS files that are stored in the same folder or directory on your computer. There are two types of libraries:
Temporary Library: In this library, the data set gets deleted when the SAS session ends.
Permanent Library: Data sets are saved permanently. Hence, they are available across sessions.
Users can also create or define a new library, known as a user-defined library, by using the LIBNAME statement. These are also permanent libraries.
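As a minimal sketch of the difference between the two kinds of library, the code below assigns a user-defined library, stores one data set in it and stores another in the temporary library. The libref mylib, the folder path and the data set names are assumptions made for illustration. The DATA step itself is explained in the next section.

```sas
/* Hypothetical folder path: replace it with a directory that exists on your system */
LIBNAME mylib 'C:\sasdata';

/* Two-level name (libref.dataset): the data set is saved permanently in mylib */
DATA mylib.Employee_Info;
    input Emp_ID Emp_Name$ Emp_Vertical$;
    datalines;
101 Mak SQL
102 Rama SAS
;
RUN;

/* One-level name: the data set goes to the temporary WORK library
   and is deleted when the SAS session ends */
DATA Employee_Temp;
    set mylib.Employee_Info;
RUN;
```

In a later session, running the same LIBNAME statement again should make mylib.Employee_Info available, while Employee_Temp would have to be recreated.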
SAS Programming: SAS Code Structure
SAS programming is based on two building blocks:
DATA Step: The DATA step creates a SAS data set, which can then be passed on to a PROC step
PROC Step: The PROC step processes the data
A SAS program should follow the rules mentioned below:
Almost every program begins with either a DATA step or a PROC step
Every SAS statement ends with a semicolon
A step ends with the RUN or QUIT keyword
SAS keywords and names are not case sensitive (character data values are)
You can write a statement across several lines, or you can write multiple statements on one line
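The layout rules above can be sketched with two DATA steps that differ only in case and line breaks. The data set names Demo_One and Demo_Two are hypothetical. Both steps should create a data set with the same two observations.

```sas
/* Conventional layout: one statement per line */
DATA Demo_One;
    INPUT Emp_ID Emp_Name$;
    DATALINES;
101 Mak
102 Rama
;
RUN;

/* Same logic with lowercase keywords, several statements on one line
   and the INPUT statement split across two lines */
data demo_two; input Emp_ID
    Emp_Name$; datalines;
101 Mak
102 Rama
;
run;
```

The data lines themselves are not statements, so each observation still sits on its own line and does not end with a semicolon.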
Now that we have seen a few basic terms, let us get started with SAS programming with this basic code:
DATA Employee_Info;
input Emp_ID Emp_Name$ Emp_Vertical$;
datalines;
101 Mak SQL
102 Rama SAS
103 Priya Java
104 Karthik Excel
105 Mandeep SAS
;
Run;
In the above code, we created a data set called Employee_Info. It has three variables: one numeric variable, Emp_ID, and two character variables, Emp_Name and Emp_Vertical. The RUN statement executes the DATA step, and the data set is created in the temporary WORK library, where you can open it to view its contents.
The image below shows the output of the above code.
Suppose you want to see the result in print view. You can do that by adding a PROC PRINT step; the rest of the code remains the same.
DATA Employee_Info;
input Emp_ID Emp_Name$ Emp_Vertical$;
datalines;
101 Mak SQL
102 Rama SAS
103 Priya Java
104 Karthik Excel
105 Mandeep SAS
;
Run;
PROC PRINT DATA=Employee_Info;
Run;
The image below shows the output of the above code.
We just created a data set and saw how the PRINT procedure works. Now, let us take the above data set and use it for further programming. Let’s say we want to add each employee’s date of joining to the data set. So we create a variable called DOJ, add it to the INPUT statement and print the result.
DATA Employee_Info;
input Emp_ID Emp_Name$ Emp_Vertical$ DOJ;
datalines;
101 Mak SQL 18/08/2013
102 Rama SAS 25/06/2015
103 Priya Java 21/02/2010
104 Karthik Excel 19/05/2007
105 Mandeep SAS 11/09/2016
;
Run;
PROC PRINT DATA=Employee_Info;
Run;
The image below shows the output of the above code. The variable DOJ was created, but its values were not printed. Instead, dots have replaced the date values.
Why did this happen? The DOJ variable has no ‘$’ suffix, so by default SAS reads it as a numeric variable. But the data we entered contains the special character ‘/’, so it is not purely numeric and SAS stores a missing value, which is printed as a dot. If you check the log window, you will see a note such as ‘Invalid data for DOJ’.
Now, how do we solve this problem? One way is to add the suffix ‘$’ to the DOJ variable. This converts DOJ to a character variable, and you will be able to print the date values. Let us make the changes to the code and see the output.
DATA Employee_Info;
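/* A bare $ gives DOJ the default length of 8 and would cut the 10-character dates short, so :$10. is used to read up to 10 characters */
input Emp_ID Emp_Name$ Emp_Vertical$ DOJ :$10.;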
datalines;
101 Mak SQL 18/08/2013
102 Rama SAS 25/06/2015
103 Priya Java 21/02/2010
104 Karthik Excel 19/05/2007
105 Mandeep SAS 11/09/2016
;
Run;
PROC PRINT DATA=Employee_Info;
Run;
The output screen will display the following output.
You can see that the date values are displayed once DOJ is converted to a character variable. However, this is only a temporary solution. Let me explain why.
Imagine a bank has a similar data set. The data set has account holder details such as loan amount, installments, and the due date for each loan installment. Suppose the holder has missed the deadline to pay an installment and the bank wants to calculate the delay. The bank will have to calculate the difference between the deadline date and the current date.
But if the bank’s data set stores dates as character values, the bank won’t be able to perform mathematical operations on them. This issue may affect our data set too. So how do we solve this problem?
The next concept will help you overcome this issue.
Informats And Formats In SAS
It is important that you understand this topic well if you want to be good at SAS programming. If you recall, I mentioned earlier that SAS has two standard variable types:
Numeric
Character
When SAS comes across non-standard data, such as dates, it will either report a problem in the log or fail to give the desired output. To overcome this problem, SAS uses informats and formats.
Informat
Informats are typically used to read or input data from external files or flat files (like text files or sequential files). The informat instructs SAS on how to read data into SAS variables. SAS has three types of informats: character, numeric, and date/time. Informats are named according to the following syntax structure:
Character Informat: $INFORMATw.
Numeric Informat: INFORMATw.d
Date/Time Informat: INFORMATw.
The ‘$’ indicates a character informat. INFORMAT refers to the SAS informat name, which is sometimes optional. The ‘w’ indicates the width (bytes or number of columns) of the variable. The ‘d’ is used for numeric data to specify the number of digits to the right of the decimal place. All informats must contain a period (.) so that SAS can differentiate an informat from a SAS variable.
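To make the three informat classes concrete, here is a small sketch that applies one informat of each class with the INPUT function and writes the results to the log. The values and variable names are made up for illustration, and DATA _NULL_ is used so that no data set is created.

```sas
DATA _NULL_;
    /* Character informat $w.: read 5 characters as text */
    name = input('Priya', $5.);

    /* Numeric informat w.d: read a number from a 7-column field */
    amount = input('1234.56', 7.2);

    /* Date informat DDMMYYw.: read a day/month/year date from 10 columns */
    doj = input('18/08/2013', ddmmyy10.);

    /* Write the three values to the log as name=value pairs */
    put name= amount= doj=;
RUN;
```

Note that doj is written to the log as a plain number rather than as a date, which is the behaviour discussed in the rest of this section.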
Let us go back to our previous code and see if a date/time informat helps us. Let’s change the code accordingly and add a date informat to it as follows:
DATA Employee_Info;
input Emp_ID Emp_Name$ Emp_Vertical$ DOJ;
INFORMAT DOJ ddmmyy10.;
datalines;
101 Mak SQL 18/08/2013
102 Rama SAS 25/06/2015
103 Priya Java 21/02/2010
104 Karthik Excel 19/05/2007
105 Mandeep SAS 11/09/2016
;
Run;
PROC PRINT DATA=Employee_Info;
Run;
Line number 3 in the code, the INFORMAT statement, instructs SAS to read the variable ‘date of joining’ (DOJ) using the date informat DDMMYYw. Because each date field occupies 10 columns, the width ‘w’ is set to 10.
The output of the code looks as follows.
The result shows we still don’t have the desired output: the DOJ column holds numeric values and not the dates we specified. Why is that? Once a date is read with a date informat, SAS stores the date as a number, namely the number of days between the date and January 1, 1960 (for example, 3/15/1994 is stored as 12492).
The reason behind this is that SAS has three separate counters that keep track of dates and times: date values count days from January 1, 1960, time values count seconds from midnight, and datetime values count seconds from midnight on January 1, 1960. The date counter started at zero on January 1, 1960. Hence dates before 1/1/1960 have negative values, and any date after it has a positive value. Every day at midnight, the date counter is incremented by one.
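The date counter can be checked with a short sketch that uses SAS date literals and prints their stored values to the log. The variable names are hypothetical.

```sas
DATA _NULL_;
    base   = '01JAN1960'd;   /* the reference date */
    before = '31DEC1959'd;   /* one day before the reference date */
    sample = '15MAR1994'd;   /* the example date from the text */

    /* No format is attached, so the raw day counts are written */
    put base= before= sample=;
RUN;
/* Log: base=0 before=-1 sample=12492 */
```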
One story has it that the founders of SAS wanted to use the approximate birth date of the IBM 370 system, and they chose January 1, 1960 as an easy to remember approximation.
Now that you know why the DOJ column displayed those numbers, let us try to solve this problem. To overcome it, we use a format.
Format
Informats are the instructions for reading data, whereas formats are the instructions used to display or output data. Defining a format for a variable is how you tell SAS to display the values of that variable. Formats are grouped into the same three classes as informats (character, numeric, and date/time), and they also always contain a period.
The general form of a format statement is:
FORMAT variable-name FORMAT-NAME.;
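As a small sketch of what a format changes, the step below writes the same stored date value to the log without a format and then with two different date formats. The variable name doj and the date are chosen only for illustration. The stored value stays the same throughout; only its display changes.

```sas
DATA _NULL_;
    doj = '18AUG2013'd;

    put doj=;              /* no format: the stored day count */
    put doj= ddmmyy10.;    /* day/month/year with slashes */
    put doj= date9.;       /* day, three-letter month, year */
RUN;
/* Log:
   doj=19588
   doj=18/08/2013
   doj=18AUG2013
*/
```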
Let us go back to our code with the data set Employee_Info to see if we can display the date correctly using the FORMAT statement.
DATA Employee_Info;
input Emp_ID Emp_Name$ Emp_Vertical$ DOJ;
INFORMAT DOJ ddmmyy10.;
FORMAT DOJ ddmmyy10.;
datalines;
101 Mak SQL 18/08/2013
102 Rama SAS 25/06/2015
103 Priya Java 21/02/2010
104 Karthik Excel 19/05/2007
105 Mandeep SAS 11/09/2016
;
Run;
PROC PRINT DATA=Employee_Info;
Run;
We have used the FORMAT statement in line number 4 of the above code. The following output screen gives us the desired output.
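Because DOJ is stored as a number of days, it can now be used in calculations, which is what the bank example earlier needed. The sketch below computes the number of days between each joining date and a reference date. The data set name Employee_Tenure, the variables Ref_Date and Days_Employed, and the reference date itself are assumptions made for illustration. It also reads the date with the informat written directly in the INPUT statement, an alternative to a separate INFORMAT statement.

```sas
DATA Employee_Tenure;
    /* The colon modifier applies the DDMMYY10. informat during list input */
    input Emp_ID Emp_Name$ DOJ :ddmmyy10.;

    /* Hypothetical reference date, chosen only for illustration */
    Ref_Date = '01JAN2017'd;

    /* Both dates are day counts, so subtraction gives a number of days */
    Days_Employed = Ref_Date - DOJ;

    /* Display the two dates as dates; Days_Employed stays a plain number */
    format DOJ Ref_Date ddmmyy10.;
    datalines;
101 Mak 18/08/2013
102 Rama 25/06/2015
;
RUN;

PROC PRINT DATA=Employee_Tenure;
RUN;
```

The same subtraction on a character version of DOJ would not give a usable result, which is why the character workaround was only a temporary solution.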