High-Frequency Checks: what they are and how they help
Zaeem-Al Ehsan
PhD-ing @ Duke University Public Policy (Economics concentration) | Prev: The World Bank, YRISE | working to make the world a better place
Of the myriad complexities a field survey brings, ensuring data quality takes the cake. High-frequency checks (HFCs), backchecks and spot-checks are a few of the verification mechanisms an RA can use to monitor data collection. In this short write-up, I will go over the HFC pipeline we used to monitor data quality. There is definitely no one-size-fits-all for HFCs, so what I did may not be relevant for you. It is also important to remember that each incoming data point encapsulates a life. HFCs try to bring conformity to the incoming data and may not reflect the circumstances of every household, which can lead you to flag a household erroneously. It is important to complement HFCs with field notes, since enumerators' notes add life to the data points and help ground the entire process.
We had two stages of looking at the incoming data. First, a live dashboard on Google Data Studio that was updated multiple times an hour to reflect new submissions. Second, a more comprehensive set of indicators computed in Stata that fed into an Excel dashboard; this was done at the end of data collection each day to provide feedback to the field team. Going into a bit more detail on each:
1. The live Google Data Studio dashboard (Fig 1): When the survey is in the field, it is very helpful to have an outlet that allows the field team and the RA overseeing data collection to see sample progress. We used SurveyCTO for our survey, which allows you to publish the live data to Google Sheets, which can then be imported into Data Studio. Data Studio lets you use that data to create indicators of your own to monitor progress. We used the following (a sketch of how the same indicators can be computed offline appears after this list):
a. # of Cases Attempted: This is the total number of cases that were attempted by the entire pool of enumerators. You can create an indicator on Data Studio that is a unique count of IDs submitted.
b. # of Cases Closed: If a survey is submitted as a complete interview or as a consent refusal, you can classify that as a closed case. You can create an indicator on Data Studio that is the sum of total cases closed by the entire pool of enumerators.
c. # of Cases Refused: It is important to monitor survey non-response. Hence, it is ideal to track the total number of cases where consent to conduct the interview was not received.
d. # of HH Found & # of Adults Found: Attrition of your sample is an extremely important indicator to monitor. For our survey, we interviewed households as well as selected adults. Thus, we had an indicator that computed the number of households and adults found. A household/adult "found" means we were able to confirm the location/presence of the HH/adult.
e. % Completed: To appease the anxiety of the RA, % completed calculates the percentage of cases closed out of the cases attempted. A healthy percentage indicates that the field team is performing well and has not succumbed to unforeseen hiccups (yet!).
f. % of Sample Progress: The % of sample progress shows how many of the total cases to be completed had in fact been closed by that date.
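These indicators were built as computed fields in Data Studio, but the same logic is easy to reproduce offline if you want to sanity-check the dashboard. Below is a minimal sketch in Stata working off a raw SurveyCTO export; the variable names (caseid, survey_status, consent, hh_found) and the target sample size of 1,500 are placeholders for illustration, not our actual instrument.

```stata
* Minimal sketch of the live-dashboard indicators, computed offline in Stata.
* caseid, survey_status, consent and hh_found are placeholder variable names;
* the target sample size (1,500) is also just an illustration.
import delimited "submissions.csv", clear

* # of cases attempted: unique count of case IDs submitted
* (keeps one submission per case if a case was visited more than once)
duplicates drop caseid, force
count
local attempted = r(N)

* # of cases closed: completed interviews or recorded refusals
count if inlist(survey_status, "complete", "refused")
local closed = r(N)

* # of cases refused: consent to interview not obtained
count if consent == 0
local refused = r(N)

* # of households found: location/presence confirmed
count if hh_found == 1
local n_hh_found = r(N)

* % completed and % of sample progress
local pct_completed = 100 * `closed' / `attempted'
local pct_progress  = 100 * `closed' / 1500

display "Attempted: `attempted'   Closed: `closed'   Refused: `refused'   HH found: `n_hh_found'"
display "% completed: " %4.1f `pct_completed' "   % of sample progress: " %4.1f `pct_progress'
```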
2. Excel dashboard: The live dashboard is not exhaustive enough to ensure data quality; it mainly helps show the relative speed of data collection. The actual HFC is done offline after data collection ends for the day. We downloaded the raw SurveyCTO data as CSV files and performed the necessary data cleaning and analysis in Stata, after which the HFCs were published to Excel (sketches of these checks in Stata appear after the indicator list below). We structured the Excel workbook as follows:
a. Snapshot (Fig 2): We had one sheet where we could check overall progress up until that day. It showed:
i. Overall sample progress to date: The total cases attempted.
ii. % Found: Out of the total cases attempted, the % of HH/adults found.
b. Overall Enumerator/Supervisor Dashboard (Fig 3): We had an overall enumerator/supervisor dashboard which served more as an individual enumerator/supervisor diagnostic report. Surveys bring with them countless unforeseen hiccups; however, it is good to see whether they disproportionately affect a particular enumerator or team. On this sheet we had:
i. Cases attempted: The total number of cases attempted by that particular enumerator.
ii. HH/Adults found: The total number of HH/adults found by that particular enumerator.
iii. % Refused: The % of cases where the enumerator did not get consent to do the survey.
iv. Average survey time: The average survey time is a very good indicator to have since it helps show 1) whether the enumerator is well versed in the instrument, 2) whether they are going through the questionnaire properly, and 3) the complexity of the questionnaire and any need to fine-tune it.
c. Detailed indicator list by enumerator (Fig 4): On this sheet, we had a detailed set of indicators that allowed us to evaluate the quality of submissions by each enumerator. We had the following:
i. HH/Adults found: The total number of HH/adults found by that particular enumerator.
ii. % Completed: The % of cases that were closed out of the cases attempted.
iii. % Refused: The % of cases where the enumerator did not get consent to do the survey.
iv. # Old forms: Although not ideal, e-questionnaires are sometimes updated to add new questions or fix bugs. For the update to take effect on the tablets enumerators use to administer the survey, they have to get the new form loaded onto their tablets. SurveyCTO allows you to compare form definition versions, so it is possible to know which enumerators are still working on an old form. It is crucial to identify this as soon as possible so that the enumerator can update and collect data on the new form.
v. Average time taken per module: This computes the average time taken per module by each enumerator. It lets you see if any enumerator is struggling with a particular module. Any indication of struggle might mean the enumerator needs more training on that module, or that respondents are having trouble understanding it.
vi. Length of job descriptions: For our survey, we collected open-ended job descriptions which we coded after data collection. To make sure we were collecting good job descriptions, one HFC we used was the average length of the job descriptions collected. A good job description should capture the nature of the job and the exact duties performed.
vii. Food consumption outliers: We had a consumption module for which we monitored whether we were getting unusual submissions. We computed this as observations outside 2 standard deviations from the sample average.
viii. Income outliers: For the income data collected, we did the same HFC, flagging any entries outside 2 SDs.
ix. Anthropometric outliers: For our anthropometric module, height-for-age, weight-for-height and weight-for-age z-scores were used to flag improbable submissions.
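To give a sense of the offline step described above, here is a minimal sketch of what the Stata-to-Excel pipeline could look like: import the day's raw export, collapse to one row per enumerator, and publish to a sheet of the Excel dashboard. The variable names (enum_id, consent, survey_status, hh_found) are placeholders, duration is assumed to be SurveyCTO's submission-duration metadata in seconds, and the file names are illustrative; this is the broad shape of the step, not our exact code.

```stata
* Sketch of the daily offline step: raw SurveyCTO export -> enumerator dashboard in Excel.
* enum_id, consent, survey_status and hh_found are placeholder variable names;
* duration is assumed to be SurveyCTO's submission-duration metadata, in seconds.
import delimited "submissions.csv", clear

gen byte attempted   = 1
gen byte closed      = inlist(survey_status, "complete", "refused")
gen byte refused     = (consent == 0)
gen      duration_mn = duration / 60

* One row per enumerator with the dashboard indicators
collapse (sum)  cases_attempted = attempted   ///
                cases_closed    = closed      ///
                hh_found                      ///
         (mean) pct_refused     = refused     ///
                avg_duration    = duration_mn, by(enum_id)

replace pct_refused = 100 * pct_refused
gen pct_completed   = 100 * cases_closed / cases_attempted

* Publish to the Excel dashboard
export excel using "hfc_dashboard.xlsx", sheet("Enumerator") firstrow(variables) sheetreplace
```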
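For the old-form check (iv), SurveyCTO's exports carry a formdef_version field with each submission, so flagging outdated forms can be a simple comparison against the newest version seen in the data. A sketch, with enum_id again a placeholder name:

```stata
* Count submissions made on an outdated form definition.
* formdef_version is SurveyCTO submission metadata (assumed numeric here);
* enum_id is a placeholder variable name.
quietly summarize formdef_version
local latest = r(max)
gen byte old_form = formdef_version < `latest'

* Old-form submissions by enumerator -- these tablets need the updated form
tab enum_id if old_form == 1
```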
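Per-module timing (v) depends on how your form captures timestamps. One option is to record a start and an end timestamp for each module with calculate fields; the sketch below assumes hypothetical string fields mod_a_start and mod_a_end exported in day-month-year hour-minute-second format.

```stata
* Average time per module, assuming hypothetical timestamp fields captured in the form
* (mod_a_start, mod_a_end as strings like "15/09/2022 14:03:21"); enum_id is a placeholder.
gen double mod_a_start_t = clock(mod_a_start, "DMY hms")
gen double mod_a_end_t   = clock(mod_a_end, "DMY hms")
gen mod_a_minutes        = (mod_a_end_t - mod_a_start_t) / 60000   // milliseconds -> minutes

tabstat mod_a_minutes, by(enum_id) statistics(mean)
```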
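The job-description length check (vi) is similarly short; job_desc is a hypothetical variable name for the open-ended response.

```stata
* Average length (in characters) of open-ended job descriptions, by enumerator.
* job_desc and enum_id are placeholder variable names.
gen desc_length = strlen(job_desc)
tabstat desc_length, by(enum_id) statistics(mean min max)
```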
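The 2-SD outlier flags (vii and viii) also reduce to a few lines; food_consumption and income are placeholder variable names, and anything more than two standard deviations from the sample mean gets flagged.

```stata
* Flag observations more than 2 standard deviations from the sample mean.
* food_consumption, income, caseid and enum_id are placeholder variable names.
foreach var in food_consumption income {
    quietly summarize `var'
    local mu = r(mean)
    local sd = r(sd)
    gen byte `var'_outlier = abs(`var' - `mu') > 2 * `sd' if !missing(`var')
}

* Review flagged submissions alongside the responsible enumerator
list caseid enum_id food_consumption income if food_consumption_outlier == 1 | income_outlier == 1, clean
```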
HFCs are usually created in parallel with CAPI/CATI testing so that a final version is ready before the pilot. We had the RA hand over the Excel dashboards to the field manager, who identified enumerators' problem areas and liaised with the survey firm to work through them. As mentioned at the start, HFCs should be used in tandem with field notes to avoid jumping to erroneous conclusions.