How to Analyze a Survey using SAS and Data from Qualtrics:  Fourth in a Series

How to Analyze a Survey using SAS and Data from Qualtrics: Fourth in a Series

Alice Feldesman, Assistant Director, Ph.D.

U.S. Government Accountability Office; 2024

The opinions expressed in this article are my own and do not necessarily reflect the view of the Government Accountability Office or the United States government.

?The fourth in a series of papers addressing how to use SAS to analyze and report on data from survey data. In this paper I describe how to analyze data with SAS that was collected using Qualtrics and uses the SPSS sav file.? Unlike the third paper in this series, this fourth paper simplifies the coding so that cutting and pasting is largely eliminated.

2019 Paper

In 2019 I delivered a presentation at the annual American Society of Public Administration and the SAS Global Conference titled “Surveys:? Finding the Message in the Tables”.? This paper is posted on my LinkedIn page.? In that presentation I made the point that while surveys can generate a lot of tables with numbers, it is often overwhelming to identify the message.? I observed many skilled survey researchers and executives spending days going over tables, sometimes copying numbers into spreadsheets, sometimes making graphs hoping that the message would just jump out. ?It always takes more time than anticipated. In this paper I introduced a technique for collapsing, ordering and color coding the results, and consolidating subgroups so that the message would be more accessible and apparent.

2021 Paper

In 2021I wrote a second paper titled “Sometimes a Good Idea Needs Polishing! “ that improved on this approach which you will also find on my LinkedIn page, This paper addressed 5 problematic areas? These areas were (1) demographic variables have different numbers of levels, (2) response scales have different numbers of levels, (3) sparse code in SAS does not work with proc frequency, (4) using proc means is also not a good alternative, and (5) you also need analyses that include all of the categories for all of the demographics and response scales.? I provided the solution and offered SAS code to demonstrate its implementation.

Adapting the Process for Qualtrics

In 2022 we transitioned from our custom software package called Questionnaire Programming Language (QPL) to the off-the-shelf package Qualtrics to conduct surveys. This transition created a need for a new approach to writing the SAS code to analyze the data. While there are other software packages that can be used to analyze data, my expertise was in SAS and so I needed a SAS solution to get the survey content and the data structure from Qualtrics into SAS.

This paper demonstrates my solution.? While alternative approaches may also prove effective, I have successfully implemented this method across more than 50 surveys. Moreover, I found it straightforward to train inexperienced SAS programmers to utilize this approach effectively. The fundamental principle is if the survey content is in machine readable format and can be exported, then software should convert the content into the appropriate format for the software analysis programming language.? Moreover, since I am repurposing the text that comes from Qualtrics into SAS code, some types of review will not be needed since there is no manual processing.

Qualtrics SAV versus TSV File

Qualtrics has several options for exporting data, one is called SAV which is a SPSS file and one is TSV (a tab delimited type of excel file).? In my earlier paper, I describe a process that was somewhat risky because it involved some manual cutting and pasting the TSV file as well as cutting the pasting the results of using the TSV file from SAS.? I eventually contacted the SAS help desk and they offered a solution involving writing out a file and including the file in SAS code that circumvented that manual process.? The code I will provide minimizes the need for editing these output files. However, there may be instances where minor adjustments are required due to specific words or characters used in the survey content. ?For example, converting ‘don't know’ to ‘don''t know’ (with a double apostrophe) might be necessary, although it's generally not ideal to use double apostrophes in survey instruments.

Qualtrics SPSS File

Exporting the SPSS SAV file offers the significant advantage of including formats for the response categories for your survey.? The disadvantage of using the SAV file is that you cannot edit the content in a simple editing package.? Instead you need to read the SAV file into SAS and then have SAS code to make any adjustments.? This might be necessary if you want to modify the content of formats. For example, the format might include a lengthy example and that is not needed for the report tables.

When exporting a file from Qualtrics, I select two options.? First, , I increase to the default character variables length to the maximum 32,000 characters to prevent the truncation.? Second, I select the option to code missing to the default -99 so I can assess the impact of missing values versus skip patterns.? Otherwise, I keep all of the defaults.? It also should be noted that each time you export a file from Qualtrics, the software creates a unique name to the file which includes the date.? So you need to be careful to update your SAS program so it uses the most recent exported file. ?And the defaults you select need to be selected again each time you export the file.

My SAS code and comments are as follows,

/*NOTE:? In bold are comments that describe what the SAS code will be executing.? In italics is where you would enter your file names. */

PROC IMPORT DATAFILE="your filename.sav" OUT=yourfilename DBMS=SAV replace; run;

/*I drop all of the unique Qualtrics variables so that I am only working with the survey variables.? Then the proc contents creates a file I called newone that has the variable names, formats, question text, and a sequence number so that the tables that are produced match the order of the original questionnaire.*/

Data yourfilename; set yourfilename; drop DistributionChannel Duration__in_seconds_ EndDate ExternalReference Finished IPAddress LocationLatitude LocationLongitude Progress RecipientFirstName RecipientLastName RecordedDate ResponseId StartDate Status UserLanguage; run;

?proc contents data=yourfilename out=newone; run;

data newone; set newone; keep varnum label format name; if substr(name,1,1)='Q'; run;

/*This sort keeps the tables that are produced to match the order of the original questionnaire.? This is done so that the tables that summarize the survey are in the same order as they appeared in the survey.*/

proc sort data=newone; by varnum; run;

/*While the contents of newone2 (see below) could be made in fewer steps, I chose not to do that because sometimes your survey might need an adjustment to the creation of the macro invocation.? By splitting it up in parts, you can have the flexibility to make adjustments, if needed.? The check variable is set up so that the macro an1 is for numeric variables(proc tabulate) and an2 is for character variables (proc report)—see below for more information about the check variable.? The "0D0A"X here addresses an unintended line feed in my dataset.*/

/*You might find the need to check for single and double quotes since I can’t anticipate whether your questionnaire text might use quotes. */

?/*Dropping the Finish Question.? Having a finish question allows the respondents to indicate that they want to submit their survey as final and should be used in the analysis.? This is a practice generally used in our surveys.? Sometimes the finish question has a variable name like Q10 rather than finish as the variable name, this is arbitrary and depends on how the survey was coded into Qualtrics.

Note: SAS has a function index that looks for a string in a variable name.? I use check variable below to see if the format has a $ which means that the variable is character.? Check takes on the value 0 when $ is not found and is greater than 0 when $ is found.? By using this approach I can create the appropriate macro invocation that will make a table using proc tabulate for numeric questions and a macro invocation that will make a listing for narrative questions using proc report. */

data newone2; set newone; if label='Finished?' then delete;

check=index(format,'$');

if check=0 then do;

pt1="%an1 (var=";

pt2=strip(pt1)||left(trim(name));

pt3=strip(pt2)||', title="';

pt3a=strip(pt3)||left(trim(name))||'. ' ;

PT4=strip(pt3a)||' '||left(trim(label))||' ';

PT5=strip(pt4)||'"); run; ';

pt5x=compress(compbl(pt5),"0D0A"X);

end;

?

if check gt 0 then do;

pt1="%an2 (var=";

pt2=strip(pt1)||left(trim(name));

pt3=strip(pt2)||', title="';

pt3a=strip(pt3)||left(trim(name))||'. ' ;

PT4=strip(pt3a)||' '||left(trim(label))||' ';

PT5=strip(pt4)||'"); run; ';

pt5x=compress(compbl(pt5),"0D0A"X);

end;

run;

data newfin; set newone2; keep pt5x; run;

/*This proc printto creates a file that has all of the invocations to the macros.? Depending on the content of your survey, you might need to edit the file pt5x. (I created the variable name pt5x, but you can name it anything you want. ) ??So I usually create it one time and do whatever editing is needed and then comment out the code so I don’t override the code. Using a large linesize and width below, minimizes truncation. ?Nodate and nonumber options removes the date and page number being displayed.*/

options nodate nonumber;

options linesize=236;

proc printto print=’yourfilepath \pt5x.txt' new; run;

proc report data=newfin noheader; column pt5x;

define pt5x/display width=230 flow;?

run;

proc printto; run;

?

?

/*this is my style that I like to use for tables, but it is arbitrary.? And the options that appear get rid of extraneous warnings, etc.*/

options nothreads;

OPTIONS SOURCE NOSTIMER NOCENTER PAGESIZE = 60 LINESIZE = 132 NOQUOTELENMAX;

options validvarname=v7;

options MSGLEVEL=I;

?proc template;

define style styles.test;

parent=styles.minimal;

style systemtitle from systemtitle /

?? font_face=helvetica font_size=3

?? font_weight=bold just=l;

style systemfooter from systemfooter /

? font_face=helvetica font_size=3;

style header from header/

? font_face=helvetica

?? just=l vjust=b;

style data from data/

? font_face=helvetica;

style rowheader from rowheader/

?font_face=helvetica;

?style table from table /

??? just=left;

end;?

run;

?/*We also needed the formats which associate the variable names to the SPSS generated statements for the response categories.? To do that we need to have the statement begin with the word format which is created in dataset form1.? Then, we need an ending that is a semicolon (in data set form2).? And we need to only keep the variables that are not narratives so that if the format='$' then delete those (in dataset form). Finally combo.txt (the dataset that gets created and proc printto writes out) associates the variable names with the appropriate format.? The sav file from SPSS does the matching.*/

?

data form1; length combo $30.; set newone; keep combo; combo='format'; if N=1;? run;

data form2; length combo $30.; set newone; keep combo; combo=';'; if N=1; run;?

data form; length combo $30.; set newone; keep combo; if format='$' then delete;

?combo=strip(name)||' '||strip(format)||'.'; run;

data both; set form1 form form2; run; ?

?/*This proc printto puts out the format statement into a file.? Again I comment out this code after running it the first time, just in case I edited the file so that I don’t re-write over it.? */

?options nodate nonumber;

proc printto print='’yourfilepath \combo.txt' new; run;

proc report data=both noheader; column combo;

define combo/display width=100 flow;?

run;

proc printto; run;

?/*This code puts the formats from the SAV file into a data set and adds a missing option to response categories. The cntlout options creates a data set that stores information about formats and informats in a catalog.*/

?proc format cntlout=fmt_list;

run;

?/*Get list of formats.*/

?proc summary nway missing data=fmt_list;

class fmtname;

output out=add_fmt;

run;

?

/*Add missing.*/

?data add;

set add_fmt;

start='.';

label='Missing';

run;

?/*Add missing to data and convert start to numeric;*/

?data new_fmt (keep=fmtname startn label rename=(startn=start));

set fmt_list add;

startn=start*1;

run;

?/*sort.*/

proc sort;

by fmtname start;

run;

?

/*I use the proc contents to get a list of variables so I can convert the -99 which was selected in the export from Qualtrics and make it missing if the variable is a numeric or blank if the variable is character.*/

?proc contents data= yourfilepath ?short; run;

?data all; set yourfilepath;

array ch

?[here is where you would list all of the numeric variables]

?do over ch;

if ch=-99 then ch=.; end

?array nar

?[here is where you list of the character variables];

?do over nar;

if strip(nar)='-99' then nar=' '; end;

run;

?/*This is where the dataset for the analysis is associated with the formats that were created in the combo.txt earlier.? The ods statement is used to create the report.? The options listed are the ones I use, but they are arbitrary. */

?libname out ' yourfilepath;';

data out.all; set all;

keep [list whatever variables you want to have in the analysis.? You might not need this keep statement if you want to have everything in the analysis.]

?%include "yourfilepath\combo.txt"; run;? ??????

?ods html body=" yourfilepath.analysis1. html" (Title='enter your title here that appears at the top of the file')

headtext="<style> hr {page-break-after:always} thead {display:table-header-group}

?@media all {font {font-size=80%}} </style>" style=styles.test;

?/*Here are two macros.? an1 is for the numeric variables.? It assumes a 6-point scale and includes both percentages and counts.*/

?%macro an1 (var=var, title=title);

proc tabulate data=out.all missing; class [enter your demographics here if you have them]; class &var/? preloadfmt mlf order=data;? where .<=&var<=6;

tables all='Total' [list your demographics here if you have them]', (&var=''*(pctn<&var>='%'*f=6.1 n='Number of Cases'*f=comma8.) all*n='Total Cases'*f=6.0) /

rts=55 printmiss misstext='0';

title &title;? RUN;

%mend an1;?

?/*An2 is for the narratives.? You might want to add to the column statement

if there is a variable that you want to display with the narrative.? For example, a demographic variable is often useful in interpreting the narrative.*/

?%macro an2 (var=var, title=title);

proc sort data=out.all; by [list your demographics here if you have them];

proc report data=out.all nowindows spacing=1 pspace=1 split='*' missing headline;

where &var ne ' ';

?columns [list your demographics here if you have them] &var;

?define [demographic1 would appear here]/display flow 'title of the variable' format=[list the format for demographic1 here and repeat for other demographics];

define &var /display flow 'Comment' format=$10000.?? style ={just=left cellwidth=6 in};

title1 &title;

run;?

?%mend an2;

?/*Here is where I am including all of the macro invocations, which were created in pt5x.txt*/

?%include " yourfilepath.\pt5x.txt"; run;

ods html close; run;

?Conclusion:

? /*The above information is all you need to translate a SAV from Qualtrics into SAS code to generate a summary of the numeric variables and a display of the narrative variables.?

?However, sometimes I want to collapse the numeric variables so, for example, all of the response categories are displayed while also collapsing and displaying the top two categories and bottom two categories.? This will enable me to modify the tabulate statement to use multilabel formats. (proc tabulate data=yourfile missing; class yourvariable/ preloadfmt mlf order=data).

?What I have found the easiest way is to read the SAV file in again (SAS seems to have a memory of the original formats)? and then process it with this code so that you create a file that can be edited.? Since every survey is unique, you will have to edit the newformat,txt file that gets created using the code below.? It, however, always uses the work.new_fmt file that comes directly from the SAV file exported from Qualtrics.

?Note:? When SAS writes a proc printto it produces a listing in the results window in addition to the file written to the directory.? You will see that the results window has a format without any extra spacing in comparison to the file written to the directory that has a lot of extra spaces.

?I have learned that the string variable that is being written has 132 byes length variable and that cannot be changed.? So you have one of two choices since you will be editing the file to collapse the categories (and this is always custom work depending on the survey content). (1) Edit the file that is written to the directory or (2) copy and paste the file that is written to the results window into the SAS program and edit that content.*/

?data makeformat1; set? WORK.NEW_FMT;

pt0='Proc format;';

keep pt0; run;

data makeformat1x; set makeformat1; if N=1; run;

?data t1; length list $132.;set makeformat1x; list=pt0; keep list; run;

?data makeformat1;set work.new_fmt;

?if substr(FMTNAME,1,1)='Q';

if start=1 then d?data makeformat1; set? WORK.NEW_FMT;

pt0='Proc format;';

keep pt0; run;

data makeformat1x; set makeformat1; if N=1; run;

?data t1; length list $132.;set makeformat1x; list=pt0; keep list; run;

?data makeformat1;set work.new_fmt;

?if substr(FMTNAME,1,1)='Q';

if start=1 then do;

pt1='; value'||' '||FMTNAME;

pt2='(multilabel notsorted)'o;

pt1='; value'||' '||FMTNAME;

pt2='(multilabel notsorted)'||start||'="';

pt3=pt1||pt2||label||'"';

keep pt3; end;

else if start ne 1 then do;

?pt1='; value'||' '||FMTNAME;

pt2=start||'="';

pt3=pt1||pt2||label||'"';

keep pt3; end;

?data t2; length list $132.; set makeformat1; list=strip(pt3); keep list; run;

?data t3; length list $132.; set form; list=combo; keep list; run;

?data try1; set t1 t2 t3; keep list; run;

?options nodate nonumber;

proc printto print=' yourfilepath.\newformat.txt' new; run;

proc report data=try1 noheader; column list;

define list/display width=130 flow;?

run;

proc printto; run;

?

要查看或添加评论,请登录

Alice Feldesman的更多文章

社区洞察

其他会员也浏览了