Dirty Data Done Dirt Cheap
Chris Feola
Author, Perfecting Equilibrium: For a brief, shining moment Web1 democratized data. Then Web2 came along and made George Orwell look like an optimist. Now Web3 is Perfecting John Nash’s Information Equilibrium.
Here's what you gotta do
Pick up the phone, I'm always home
Call me anytime
Just ring, 3-6-2-4-3-6,
hey I lead a life of crime
Comment of the Week:
The great Steve Ross on Feed AIs Enterprise Data For The Win (Editor’s Note: Steve and the great Jim Brown taught me All Data Is Dirty when I first started Computer-Assisted Reporting and Research): Even corporate data is riddled with errors... often systematic errors. Far better than, say, X, but dangerous nonetheless. I would not rely on people or machines to totally get things right ALL of the time.
Just finished a weeklong job looking at 10 million medical records, culled from 40 million. The data runs, using neural network routines at MIT, identified just over 20,000 corrupted records. There are certainly more. LLMs use neural networking to get the job done. As a QA/QC guy, I was specifically looking for them and specifically wanted them culled for further examination. An error rate around 0.002 (0.2%, or 1 in 500) is far lower than is typical in corporate or government data, but it could have a particular effect when searching for a rare phenomenon, as we have been on this project.
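Steve's point about rare phenomena is easiest to see with numbers. Below is a quick Python sketch. The 10 million records and 20,000 corrupted ones come from his comment; the prevalence and "mimic rate" figures are invented purely for illustration, and the sketch assumes corrupted records are scattered at random. It is not how his MIT project actually worked.

```python
# Record counts from Steve Ross's comment.
TOTAL_RECORDS = 10_000_000
CORRUPTED_RECORDS = 20_000


def error_rate(total: int, corrupted: int) -> float:
    """Share of records flagged as corrupted."""
    return corrupted / total


def spurious_share(total: int, corrupted: int,
                   prevalence: float, mimic_rate: float) -> float:
    """Share of apparent 'hits' that actually come from corrupted records.

    prevalence: HYPOTHETICAL fraction of clean records that truly show
        the phenomenon being searched for.
    mimic_rate: HYPOTHETICAL fraction of corrupted records that happen
        to look like a hit.
    """
    clean = total - corrupted
    true_hits = clean * prevalence
    false_hits = corrupted * mimic_rate
    return false_hits / (true_hits + false_hits)


if __name__ == "__main__":
    rate = error_rate(TOTAL_RECORDS, CORRUPTED_RECORDS)
    print(f"error rate: {rate} ({rate:.1%}, or 1 in {round(1 / rate)})")
    # Same made-up 1% mimic rate; one common and one rare made-up prevalence.
    for prevalence in (0.05, 0.00001):
        share = spurious_share(TOTAL_RECORDS, CORRUPTED_RECORDS, prevalence, 0.01)
        print(f"prevalence 1 in {round(1 / prevalence):,}: "
              f"{share:.2%} of hits are spurious")
```

With those made-up inputs, roughly two-thirds of the apparent hits in the rare case come from bad records, versus about 0.04% in the common case. Same error rate, wildly different damage.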
Look at Boeing's predicament right now. Airframe manufacture in theory has very tight specs (design standards to design-in quality) and more-than-typical inspection regimes. But in any factory that isn't 100% automated, lots of what happens on the factory floor is never documented in any corporate record.
Typical protection against that is close-at-hand management (management at a level high enough to instantly allocate significant funds to fix an observed shortcoming). The 737 MAX 9 airframes are made in St. Louis, and the final plane assembly is in Washington State. And where has Boeing top management been since 2001? Chicago.
The door plugs were supposed to be secured with four high-strength bolts, which are usually quite brittle. Drop them onto a hard floor and they may crack internally. A typical "fix" would be to have the floor below the plug-install area padded and STILL throw away dropped bolts. Or "cue-up" the bolts in the assembly. Every once in a while a bad bolt would sneak through, but the other three would do the job. The extra pad costs MONEY. Not much, but just a little. And it requires LABOR, which is in short supply.
BTW, I shot an episode of Invisible for Oprah Winfrey Network years ago at the Tucson aircraft boneyard. Worth the trip.
Well said, as always. I was gently pointing to this at the end of the piece, but have no fear: I'm going for the full beatdown in this week's piece on the UK Post Office/Fujitsu Horizon disaster.
Perfecting Equilibrium Stories
Please help me grow Perfecting Equilibrium
I’d really appreciate it if you would be so kind as to invite friends to subscribe and read with us. Of course, there will be rewards! And a leaderboard!
Here’s how it works: Share Perfecting Equilibrium using the link available here. When you use this referral link, you'll get credit for any new subscribers. Simply send the link in a text or email, or share it on social media with friends. When your friends use your referral link to subscribe (free or paid), you’ll receive complimentary access to the Perfecting Equilibrium paid archives. (The newsletter remains free.) Here’s the leaderboard and the details.
Next on Perfecting Equilibrium
Friday, January 19th: Foto.Feola.Friday
Sunday, January 21st: The Reader: Death by dirty data
At least four suicides. Hundreds of centuries-old family businesses bankrupted. Communities torn apart. More than 700 innocent people convicted of serious crimes: fraud. Embezzlement. Theft.
Except the crimes were imaginary; they only existed in the twisted memory of a bug-ridden computer system.
Dirty data sounds like one of those late-night TV ailments, where some D-list star you don’t remember tells you about the heartbreak of a disease you’ve never heard of... and then, of course, offers to sell you the cure. It isn’t. Corrupt data is everywhere, and it’s deadly.