Design of Experiment: Data Collection
Anyone can collect data; some people can collect good data. The key theme of any good data collection is compliance. Good compliance leads to good data, and it shows up at every level of the process, so let's examine each level in turn.
Stages of a Data Collection:
Each stage requires compliance. During the pre-study tasks, people check that the different components of the pipeline comply with the protocol. At Ok2Study, everyone signs off that what has been done, and what people have committed to doing, complies with what they originally wanted.
Compliance List:
Collection Software Compliance
This is simply making sure the software will not have issues during collection. It seems simple, but bugs can cause data loss, and data loss is money. Running all study software through QA is key to catching those bugs before they cost you data.
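Even a lightweight automated smoke test helps here. Below is a minimal sketch; record_session and the output file layout are hypothetical stand-ins for whatever your collection app actually does. The point is to assert on the output artifacts, not just on a crash-free run.

    # Smoke-test sketch for a collection app. `record_session` is a
    # hypothetical stand-in for the app's real capture entry point.
    import os
    import tempfile

    def record_session(out_dir, duration_s):
        # Placeholder: the real app would capture sensor data here.
        with open(os.path.join(out_dir, "frames.bin"), "wb") as f:
            f.write(b"\x00" * 1024)

    def smoke_test():
        with tempfile.TemporaryDirectory() as out_dir:
            record_session(out_dir, duration_s=5.0)
            path = os.path.join(out_dir, "frames.bin")
            # A crash-free run that silently drops frames still loses
            # money, so check the artifact, not just the exit code.
            assert os.path.exists(path), "no output file written"
            assert os.path.getsize(path) > 0, "output file is empty"
            print("smoke test passed")

    smoke_test()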
Infrastructure Compliance
As part of the dry-run, you make sure the data goes all the way through the infrastructure so that you know it is saving properly. This is also a good time to remove unforeseen bottlenecks. When collecting gigabytes or terabytes of data per day, being able to ingest that data becomes just as important as collecting it.
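A quick back-of-the-envelope calculation makes the bottleneck concrete. The numbers in this sketch are illustrative, not from any particular study:

    # Back-of-the-envelope ingest check with illustrative numbers.
    TB = 1e12  # bytes

    daily_volume_bytes = 2 * TB    # assume 2 TB collected per day
    collection_hours = 8           # rigs only run during the workday
    ingest_window_hours = 24       # but ingest can also run overnight

    sustained = daily_volume_bytes / (ingest_window_hours * 3600) / 1e6
    burst = daily_volume_bytes / (collection_hours * 3600) / 1e6

    print(f"sustained ingest needed: {sustained:.1f} MB/s")  # ~23.1 MB/s
    print(f"burst if ingested live:  {burst:.1f} MB/s")      # ~69.4 MB/s

If your pipeline can only sustain the first number, data must queue somewhere during the day, and that buffer is another thing to verify in the dry-run.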
Hardware Compliance
Does the hardware collect what you want? Is it close? What are the caveats? Will there be any heating issues, charging issues, or issues in countries with different voltages/phases for their electricity?
Hardware Calibration Compliance
At Notre Dame, I didn't calibrate my setup every time because calibration took a long time. Some data paid the price later, especially after someone accidentally kicked my setup and I had to fix everything. Calibration had two steps: taking checkerboard images of all the image planes and ensuring the two laser planes were aligned. I always did the first step, but I didn't do the second step as often, and that was the step that had issues. There was a quick, iterative check I could do with a ball to see how well the two light screens were calibrated to one another, but I didn't figure that trick out until the end of my collection.
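The checkerboard step itself follows the standard OpenCV calibration flow, which conveniently gives you one number, the reprojection error, to track between sessions. This is only a sketch of that flow; the board size and image directory are assumptions, and the laser-plane alignment check was a separate step not shown here.

    # Checkerboard calibration sketch using OpenCV; board size and
    # image paths are assumptions.
    import glob
    import cv2
    import numpy as np

    BOARD = (9, 6)  # inner corners per row/column of the checkerboard
    objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2)

    obj_points, img_points = [], []
    for path in glob.glob("calib_images/*.png"):  # hypothetical directory
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, BOARD)
        if found:
            obj_points.append(objp)
            img_points.append(corners)

    # Reprojection error is the number to watch: if it jumps between
    # sessions (say, after someone kicks the rig), recalibrate before
    # collecting more data.
    rms, K, dist, _, _ = cv2.calibrateCamera(
        obj_points, img_points, gray.shape[::-1], None, None)
    print(f"RMS reprojection error: {rms:.3f} px")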
Safety Compliance
If your rig could cause health issues, you have a problem. In grad school, my setup used Class 3R lasers spread out into a line using 5mW of power. The effective class was probably lower because the beam was spread with a beam spreader, but I never had it tested. At the time, the safety office wasn't concerned unless a laser was over 5mW. That changed right after I graduated, but laser safety is a constant issue with prototype hardware because the mechanisms that keep the lasers safe are usually still in development alongside the hardware itself.
Your protocol also shouldn’t put anyone’s health at risk. Even if you’re willing to take a chance, there is a large liability if someone gets hurt.
Legal Compliance
Some countries allow data like face images to be collected, but the laws vary. The aim is to not be in any gray zone about data. Face images are considered Personally Identifiable Information (PII), and in the past few years, especially since GDPR, governments have paid particular attention to privacy.
China, for example, doesn't allow PII data to be exported. In the US, you can generally collect PII data in public, but in Europe, you cannot. Unlike in the US, in France and Germany an employee is not considered able to consent to a user study run by their employer that collects PII data, because the employee/employer relationship is itself a form of coercion.
Usually, there is some compensation for a user study, but keep in mind, too much compensation could also be seen as financial coercion.
Recruitment Compliance
Recruitment is quite an interesting bit of the process. You can’t discriminate, nor do you want to for a good dataset, but you also have to pay attention to special groups like pregnant women, the elderly, and young children.
On your end, though, you should also check that the demographics you are looking for are actually being collected. If they aren't, your dataset will be deficient. Don't assume your algorithm is age-, ethnicity-, or gender-invariant; always check.
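A running tally against your recruitment targets makes deficiencies visible while you can still fix them. A minimal sketch, with made-up age bands and quotas:

    # Rolling demographics check; bands and quotas below are made up,
    # so pull the real ones from your recruitment plan.
    from collections import Counter

    TARGETS = {"18-29": 25, "30-49": 35, "50-64": 25, "65+": 15}  # percent

    def coverage_report(participants):
        counts = Counter(p["age_band"] for p in participants)
        total = sum(counts.values())
        for band, target in TARGETS.items():
            pct = 100.0 * counts.get(band, 0) / total if total else 0.0
            flag = "  <-- under target" if pct < target else ""
            print(f"{band:>6}: {pct:5.1f}% (target {target}%){flag}")

    coverage_report([
        {"age_band": "18-29"}, {"age_band": "18-29"},
        {"age_band": "30-49"}, {"age_band": "65+"},
    ])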
Dry-Run Compliance
The dry-run is key because it gives you a chance not only to check for operator compliance but also to make corrections to the protocol where it didn't quite account for the subject or the actual data being collected. This is usually when the protocol gets tweaked quite a bit.
In my first month at Apple, I designed my first user study for wrist detection. I felt pressure and signed off on the dry run before I had fully looked at all the data. As soon as I did, I realized one of the settings in the app was off, and the app had to go through another revision before the study could continue. I learned not to rush; otherwise, I sacrifice quality.
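A mechanical settings diff against the protocol spec is cheap insurance before sign-off. A sketch, with hypothetical keys and values; the real check should read the app's actual config:

    # Diff app settings against the protocol's expected values before
    # signing off a dry run. Keys and values here are hypothetical.
    EXPECTED = {"frame_rate": 30, "exposure_mode": "manual",
                "depth_enabled": True}

    def diff_settings(actual):
        return [f"{k}: expected {v!r}, got {actual.get(k)!r}"
                for k, v in EXPECTED.items() if actual.get(k) != v]

    for m in diff_settings({"frame_rate": 30, "exposure_mode": "auto",
                            "depth_enabled": True}):
        print(m)  # exposure_mode: expected 'manual', got 'auto'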
Moderator Compliance
The other key is moderator compliance with the protocol, because if the moderator can't comply, neither can the subject. I've seen this happen both ways. In one study at Apple, I found out that half the data had the wrong label because one of the two moderators swapped the order of collection. No data was lost, but I spent a fair bit of time validating data and fixing labels until I had a complete dataset I could trust.
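Swapped-order mistakes are hard to catch from metadata alone, since the labels come from the very order that was wrong, so periodic human spot checks are a practical guard. One approach, sketched below with made-up session IDs, is to sample sessions per moderator for review, since protocol mistakes tend to be systematic to one person:

    # Spot-check sampler: pull a few sessions per moderator for human
    # review. Session IDs and the sample size are illustrative.
    import random

    def sample_for_review(sessions, per_moderator=3, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the audit repeatable
        by_mod = {}
        for s in sessions:
            by_mod.setdefault(s["moderator"], []).append(s["id"])
        return {mod: rng.sample(ids, min(per_moderator, len(ids)))
                for mod, ids in by_mod.items()}

    sessions = [{"id": f"S{i:03d}", "moderator": "A" if i % 2 else "B"}
                for i in range(40)]
    print(sample_for_review(sessions))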
At Notre Dame, I didn't think through or pay attention to the dry-run data for my user studies with people walking through my 3D scanner. During the collection, participants started off standing and then began walking. What I didn't consider is that people open their mouths slightly when they start to walk. It is unintentional, but I definitely didn't consider that people might want to breathe while walking. People also walked through too quickly. In both cases, I tried things like taking multiple scans or giving people directions, but they were not as effective as I had hoped.
Participant Compliance
When doing data collection at Digital Signal Corp, we wanted to capture different head poses. In the dry run, it appeared people were only moving their eyes to the appropriate marker and not their heads, as that is the more natural human motion. The fix was to put down foot markers and ask people to move their feet to that position. This greatly reduced the errors and made the collection more intuitive for the participants.
At Notre Dame, I had issues with my data because people walked through too quickly. I needed them to go through at less than 1/4th of full walking pace (3 mph), but most went through too fast. I didn't think far enough outside the box to figure out how to get people to slow down naturally, but a sign, perhaps a blinking one, might have helped.
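Pace is also something you can flag automatically during collection. A hedged sketch that turns the protocol's limit into a speed check; the scanner length and timestamps are assumptions:

    # Flag too-fast walkthroughs from entry/exit timestamps. The 3 mph
    # baseline is from the protocol; the capture length is an assumption.
    MPH_TO_MS = 0.44704
    MAX_SPEED = 3.0 * MPH_TO_MS / 4   # quarter walking pace, ~0.34 m/s
    SCAN_LENGTH_M = 2.0               # assumed length of capture volume

    def check_pass(t_enter, t_exit):
        speed = SCAN_LENGTH_M / (t_exit - t_enter)
        if speed > MAX_SPEED:
            print(f"too fast: {speed:.2f} m/s "
                  f"(limit {MAX_SPEED:.2f} m/s), ask for a redo")
            return False
        return True

    check_pass(t_enter=0.0, t_exit=2.5)  # 0.80 m/s -> flagged
    check_pass(t_enter=0.0, t_exit=7.0)  # ~0.29 m/s -> ok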
Random Compliance Checks
During the lead-up to the launch of Face ID, my team went out and collected a large set of potential aggressors to see if we were missing anything in our larger data collections, things that would be normal to a regular user.
We used a script that cycled through a bunch of settings, and a month into our data exploration, the firmware was updated. We noticed some strange issues, and I'm not sure why we didn't correlate them to the firmware update, but a few weeks later, I filed a bug because it was clear some of the data was way outside of expectation. It was a condition one would get into only by cycling through the settings as we were doing.
The bug was caught by a few others at about the same time, and it was fixed a week later. For us, it meant the loss of a full week's worth of data, but half of the past month's data was salvageable. The bug would have been caught much earlier if we had been doing regular compliance checks during our data collections.
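Regular compliance checking can be as simple as comparing each day's data against a dry-run baseline. A sketch with a made-up metric and thresholds; a firmware-sized shift should trip a check like this within a day, not a month:

    # Daily drift check against a dry-run baseline. The baseline numbers,
    # metric, and sigma threshold here are all hypothetical.
    import statistics

    BASELINE_MEAN, BASELINE_STD = 0.52, 0.06

    def daily_check(values, n_sigma=4.0):
        day_mean = statistics.fmean(values)
        drift = abs(day_mean - BASELINE_MEAN) / BASELINE_STD
        if drift > n_sigma:
            print(f"ALERT: daily mean {day_mean:.3f} is {drift:.1f} sigma "
                  f"from baseline; check firmware/settings first")
            return False
        return True

    daily_check([0.51, 0.55, 0.49, 0.53])  # within baseline -> ok
    daily_check([0.90, 0.88, 0.93])        # fires the alert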
Finally, the Data I’ve always wanted!
Usually, you don't know the exact data you need on the first data collection. Data collection is part of an iterative loop of algorithm requirements, data collection, and failure analysis.