High-Frequency Checks: what they are and how they help
Zaeem-Al Ehsan
PhD-ing @ Duke University Public Policy (Economics concentration) | Prev: The World Bank, YRISE | working to make the world a better place
Of the myriad complexities a field survey brings, ensuring data quality takes the cake. High-frequency checks (HFCs), backchecks and spot-checks are a few of the verification mechanisms an RA can use to monitor data collection. In this short write-up, I will go over the HFC pipeline we used to monitor data quality. There is definitely no one-size-fits-all for HFCs, so what I did may not be relevant for you. It is also important to remember that each incoming data point encapsulates a life. HFCs try to bring conformity to the incoming data and may not reflect the circumstances of every household, which can lead you to flag a household erroneously. It is important to complement HFCs with field notes, since enumerators' notes add life to the data points and help ground the entire process.
We had two stages of looking at the incoming data. First, a live dashboard on Google Data Studio that was updated multiple times an hour to reflect new submissions. Second, a more comprehensive set of indicators computed in Stata that fed into an Excel dashboard; this was done at the end of data collection each day to provide feedback to the field team. Going into a bit more detail on each:
1. The live Google Data Studio dashboard (Fig 1): When the survey is in the field, it is very helpful to have an outlet that allows the field team and the RA overseeing data collection to see sample progress. We used SurveyCTO for our survey, which allows you to publish the live data to Google Sheets, which can then be imported into Data Studio. Data Studio lets you use that data to create indicators of your own to monitor progress. We used the following (a sketch of how the same indicators can be computed offline appears after this list):
a. # of Cases Attempted: This is the total number of cases that were attempted by the entire pool of enumerators. You can create an indicator on Data Studio that is a unique count of IDs submitted.
b. # of Cases Closed: If a survey is submitted as a complete interview or as a consent refusal, you can classify that as a closed case. You can create an indicator on Data Studio that is the sum of total cases closed by the entire pool of enumerators.
c. # of Cases Refused: It is important to monitor survey non-response. Hence, it is ideal to track the total number of cases where consent to conduct the interview was not received.
d. # of HH Found & # of Adults Found: Attrition of your sample is an extremely important indicator to monitor. For our survey, we interviewed households as well as selected adults. Thus, we had an indicator that computed the number of households and adults found. A household/adult "found" means we were able to confirm the location/presence of the HH/adult.
e. % Completed: To appease the anxiety of the RA, % completed calculates the percentage of cases closed out of the cases attempted. A healthy percentage indicates that the field team is performing well and has not succumbed to unforeseen hiccups (yet!).
f. % of Sample Progress: The % of sample progress shows how many of the total cases to be completed had in fact been closed by that date.
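These indicators were built as computed fields in Data Studio, but the same logic is easy to reproduce offline if you want to sanity-check the dashboard. Below is a minimal sketch in Stata working off a raw SurveyCTO export; the variable names (caseid, survey_status, consent, hh_found) and the target sample size of 1,500 are placeholders for illustration, not our actual instrument.

```stata
* Minimal sketch of the live-dashboard indicators, computed offline in Stata.
* caseid, survey_status, consent and hh_found are placeholder variable names;
* the target sample size (1,500) is also just an illustration.
import delimited "submissions.csv", clear

* # of cases attempted: unique count of case IDs submitted
* (keeps one submission per case if a case was visited more than once)
duplicates drop caseid, force
count
local attempted = r(N)

* # of cases closed: completed interviews or recorded refusals
count if inlist(survey_status, "complete", "refused")
local closed = r(N)

* # of cases refused: consent to interview not obtained
count if consent == 0
local refused = r(N)

* # of households found: location/presence confirmed
count if hh_found == 1
local n_hh_found = r(N)

* % completed and % of sample progress
local pct_completed = 100 * `closed' / `attempted'
local pct_progress  = 100 * `closed' / 1500

display "Attempted: `attempted'   Closed: `closed'   Refused: `refused'   HH found: `n_hh_found'"
display "% completed: " %4.1f `pct_completed' "   % of sample progress: " %4.1f `pct_progress'
```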
2. Excel dashboard: The live dashboard is not exhaustive enough to ensure data quality; it mainly helps show the relative speed of data collection. The actual HFC is done offline after data collection ends for the day. We downloaded the raw SurveyCTO data as CSV files and performed the necessary data cleaning and analysis in Stata, after which the HFCs were published to Excel (sketches of these checks in Stata appear after the indicator list below). We structured the Excel workbook as follows:
a. Snapshot (Fig 2): We had one sheet where we could check overall progress up until that day. It showed:
i. Overall sample progress to date: The total cases attempted.
ii. % Found: Out of the total cases attempted, the % of HH/adults found.
b. Overall Enumerator/Supervisor Dashboard (Fig 3): We had an overall enumerator/supervisor dashboard which served more as an individual enumerator/supervisor diagnostic report. Surveys bring with them countless unforeseen hiccups; however, it is good to see whether they disproportionately affect a particular enumerator or team. On this sheet we had:
i. Cases attempted: The total number of cases attempted by that particular enumerator.
ii. HH/Adults found: The total number of HH/adults found by that particular enumerator.
iii. % Refused: The % of cases where the enumerator did not get consent to do the survey.
iv. Average survey time: The average survey time is a very good indicator to have since it helps show 1) whether the enumerator is well versed in the instrument, 2) whether they are going through the questionnaire properly, and 3) the complexity of the questionnaire and any need to fine-tune it.
c. Detailed indicator list by enumerator (Fig 4): On this sheet, we had a detailed set of indicators that allowed us to evaluate the quality of submissions by each enumerator. We had the following:
i. HH/Adults found: The total number of HH/adults found by that particular enumerator.
ii. % Completed: The % of cases that were closed out of the cases attempted.
iii. % Refused: The % of cases where the enumerator did not get consent to do the survey.
iv. # Old forms: Although not ideal, e-questionnaires are sometimes updated to add new questions or fix bugs. For the update to take effect on the tablets enumerators use to administer the survey, they have to get the new form loaded onto their tablets. SurveyCTO allows you to compare form definition versions, so it is possible to know which enumerators are still working on an old form. It is crucial to identify this as soon as possible so that the enumerator can update and collect data on the new form.
v. Average time taken per module: This computes the average time taken per module by each enumerator. It lets you see if any enumerator is struggling with a particular module. Any indication of struggle might mean the enumerator needs more training on that module, or that respondents are having trouble understanding it.
vi. Length of job descriptions: For our survey, we collected open-ended job descriptions which we coded after data collection. To make sure we were collecting good job descriptions, one HFC we used was the average length of the job descriptions collected. A good job description should capture the nature of the job and the exact duties performed.
vii. Food consumption outliers: We had a consumption module for which we monitored whether we were getting unusual submissions. We computed this as observations outside 2 standard deviations from the sample average.
viii. Income outliers: For the income data collected, we did the same HFC, flagging any entries outside 2 SDs.
ix. Anthropometric outliers: For our anthropometric module, height-for-age, weight-for-height and weight-for-age z-scores were used to flag improbable submissions.
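To give a sense of the offline step described above, here is a minimal sketch of what the Stata-to-Excel pipeline could look like: import the day's raw export, collapse to one row per enumerator, and publish to a sheet of the Excel dashboard. The variable names (enum_id, consent, survey_status, hh_found) are placeholders, duration is assumed to be SurveyCTO's submission-duration metadata in seconds, and the file names are illustrative; this is the broad shape of the step, not our exact code.

```stata
* Sketch of the daily offline step: raw SurveyCTO export -> enumerator dashboard in Excel.
* enum_id, consent, survey_status and hh_found are placeholder variable names;
* duration is assumed to be SurveyCTO's submission-duration metadata, in seconds.
import delimited "submissions.csv", clear

gen byte attempted   = 1
gen byte closed      = inlist(survey_status, "complete", "refused")
gen byte refused     = (consent == 0)
gen      duration_mn = duration / 60

* One row per enumerator with the dashboard indicators
collapse (sum)  cases_attempted = attempted   ///
                cases_closed    = closed      ///
                hh_found                      ///
         (mean) pct_refused     = refused     ///
                avg_duration    = duration_mn, by(enum_id)

replace pct_refused = 100 * pct_refused
gen pct_completed   = 100 * cases_closed / cases_attempted

* Publish to the Excel dashboard
export excel using "hfc_dashboard.xlsx", sheet("Enumerator") firstrow(variables) sheetreplace
```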
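For the old-form check (iv), SurveyCTO's exports carry a formdef_version field with each submission, so flagging outdated forms can be a simple comparison against the newest version seen in the data. A sketch, with enum_id again a placeholder name:

```stata
* Count submissions made on an outdated form definition.
* formdef_version is SurveyCTO submission metadata (assumed numeric here);
* enum_id is a placeholder variable name.
quietly summarize formdef_version
local latest = r(max)
gen byte old_form = formdef_version < `latest'

* Old-form submissions by enumerator -- these tablets need the updated form
tab enum_id if old_form == 1
```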
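Per-module timing (v) depends on how your form captures timestamps. One option is to record a start and an end timestamp for each module with calculate fields; the sketch below assumes hypothetical string fields mod_a_start and mod_a_end exported in day-month-year hour-minute-second format.

```stata
* Average time per module, assuming hypothetical timestamp fields captured in the form
* (mod_a_start, mod_a_end as strings like "15/09/2022 14:03:21"); enum_id is a placeholder.
gen double mod_a_start_t = clock(mod_a_start, "DMY hms")
gen double mod_a_end_t   = clock(mod_a_end, "DMY hms")
gen mod_a_minutes        = (mod_a_end_t - mod_a_start_t) / 60000   // milliseconds -> minutes

tabstat mod_a_minutes, by(enum_id) statistics(mean)
```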
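The job-description length check (vi) is similarly short; job_desc is a hypothetical variable name for the open-ended response.

```stata
* Average length (in characters) of open-ended job descriptions, by enumerator.
* job_desc and enum_id are placeholder variable names.
gen desc_length = strlen(job_desc)
tabstat desc_length, by(enum_id) statistics(mean min max)
```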
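The 2-SD outlier flags (vii and viii) also reduce to a few lines; food_consumption and income are placeholder variable names, and anything more than two standard deviations from the sample mean gets flagged.

```stata
* Flag observations more than 2 standard deviations from the sample mean.
* food_consumption, income, caseid and enum_id are placeholder variable names.
foreach var in food_consumption income {
    quietly summarize `var'
    local mu = r(mean)
    local sd = r(sd)
    gen byte `var'_outlier = abs(`var' - `mu') > 2 * `sd' if !missing(`var')
}

* Review flagged submissions alongside the responsible enumerator
list caseid enum_id food_consumption income if food_consumption_outlier == 1 | income_outlier == 1, clean
```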
HFCs are usually created in parallel with CAPI/CATI testing so that a final version is ready before the pilot. We had the RA hand over the Excel dashboards to the field manager, who identified enumerators' problem areas and liaised with the survey firm to work through them. As mentioned at the start, HFCs should be used in tandem with field notes to avoid jumping to erroneous conclusions.