When Robots Silence Smiling Children, the End Is Near
Fair Observer
Fair Observer is an independent, reader-funded nonprofit that engages in citizen journalism and civic education.
Dear FO° Reader,
This morning I finally got to “meet,” via Zoom videoconference, a group of ten school-aged children in a tropical country ten time zones away. Their mentor and I had been talking about this encounter for weeks, over the same mobile phone he used for the call. I so wanted to hear the children’s voices!
Perhaps due to a rainstorm, the internet connection was worse than usual. Every time the camera moved, the sound and image cut out, froze or garbled. After five minutes, not a single full sentence got through. I’ve been an alpha algorithm jock for decades, so I already understood this particular problem: The phone’s extra-smart AI camera worked best if it only had to keep track of the motions of small faces, not of big bodies or a whole room. So, my kids and their mentor took five minutes to assemble a table on which to prop up the camera. As soon as the camera was still, utterly untouched, the video worked fine.
But only the video. Weirdly, the audio still misbehaved. Of the dozen different people who talked to me during that hour and a half, almost all of their voices took seconds to “lock in” once they started to talk, so I missed the first words, and their voices faded in and out. I couldn’t make sense of a thing, despite their speaking as loudly and clearly as possible. The only person I could understand was the mentor.
There was another problem too: The back-and-forth delay (called roundtrip latency) from the moment my lips spoke to when I saw a wrinkled brow or puzzled look was way too long, half a second at least. That meant that lifelong conversational habits, such as instinctively interrupting when you don’t hear or when you think the other person didn’t, instead caught us up in accidental did-you-hear-me loops. The students were patient and even laughed, still listening carefully. They knew they were shy and not good at speaking in the loud, consistent phrases that the AI wanted to hear.
Since the whole point of the lesson was for me to teach them the science of human vibrational communication in all its forms, I invited them on the spot to invent, in their native language in their group, some quick and easy tricks to compensate for these unexpected technological hurdles. The task, as I declared, was to treat me as a hard-of-hearing baby, neither hearing nor understanding much, but capable of seeing them on video and eager to interact. I had already asked my students to speak slowly and loudly; what else might help?
In that part of the world, children respect their elders, teachers especially. They are loath to interrupt, to speak loudly or sometimes even to speak at all. I was now asking them to reverse their lifelong classroom training in an instant, through a phone.
My students took up the task and delivered some good suggestions. For example, I should nod my head gently but continuously as long as I could hear and understand, yet make a face or gesture if the sound stops making sense. They should do the same for me. Their idea wound up working wonderfully, as long as I was doing the talking, while watching them nod back at me. Judging from their faces and reactions, they really learned a lot.
But I didn’t get much back. Over the hour I learned all their names, but not much more, because (again) every voice I heard but one would fade in and out like ships in fog, not connecting long enough to work. My algorithm officer instincts told me yet again that this strange, physically unnatural acoustic fading profile had AI fingerprints all over it.
Do phones get to know their owners?
Each time the students spoke, the phone’s super-smart language processor waited as long as possible to gather as much (presumably noisy) voice data as it could before committing to send me the sound it found. Its algorithms then did not synthesize the final sonic result directly from the speakers’ actual voices; they also “cleaned up” the sound in complex and unpredictable ways, using longer-term acoustic cues like room noise, echoes, speaker volume etc. At this point, having ironed out as much “noise” as it could, the AI reconstituted the speakers’ voices without that distracting noise, but also without the key subtle sonic cues that it missed. In short, the AI blurred the sound artificially. That’s what I heard, and what I think happened.
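To make that suspicion concrete, here is a toy sketch in Python. It is not the phone’s actual algorithm, which is unknown; the function name `gate_with_slow_lock_in`, its parameters and the loudness numbers are all invented for illustration. It assumes only that a voice filter refuses to pass sound until it has heard several consecutive loud frames, which would clip the first words of each sentence and drop the voice again after any brief dip.

```python
def gate_with_slow_lock_in(frames, threshold=0.5, lock_in_frames=3):
    """Toy voice gate (hypothetical, for illustration only).

    `frames` is a list of loudness levels, one per short slice of audio.
    A frame is passed through only after more than `lock_in_frames`
    consecutive frames at or above `threshold`; otherwise it is
    replaced by silence (0.0).
    """
    output = []
    loud_run = 0  # how many loud frames in a row we have seen so far
    for level in frames:
        if level >= threshold:
            loud_run += 1
        else:
            loud_run = 0  # any quiet frame makes the gate start over
        output.append(level if loud_run > lock_in_frames else 0.0)
    return output


# A shy speaker: starts talking, dips briefly, then resumes.
speech = [0.1, 0.9, 0.8, 0.9, 0.7, 0.9, 0.2, 0.9]

# Unfamiliar voice: the gate needs three loud frames before it "locks in".
stranger = gate_with_slow_lock_in(speech, lock_in_frames=3)

# Assumption: for a familiar voice the gate needs no warm-up at all.
owner = gate_with_slow_lock_in(speech, lock_in_frames=0)

print(stranger)
print(owner)
```

In this toy model the unfamiliar speaker loses the opening frames and everything after the brief dip, while the speaker who needs no warm-up loses only the quiet frames. That matches the pattern I heard, though it proves nothing about what the phone really does.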
I saved the most important clue for last: Out of that whole hour and a half, talking with people a world away, who was the one person whose sound didn’t fade in and out? Who was the person the phone could hear well enough to send? In retrospect the answer is obvious: the owner of the phone himself.
The super-smart AI must already have hundreds of hours of high-resolution data of this one man’s voice because he talks on that phone all the time. Of course that phone hears him better than it hears a random stranger! The reason product managers put the AI inside the phone in the first place is to gather such long-term data, then use it to compress his voice yet further in the never-ending quest for lower costs. Every bit of compression saves the carrier money.
So the carrier saves obvious money, and creates non-obvious confusion. Optimization run amok.
It is mathematically provable that such short-term optimizing systems always fall into ruts. The rut-forming process is universal, based on self-reinforcing expectations and information flows: the more one keeps track of short-term benefits, the less one tracks long-term damage. Examples are literally everywhere. On the internet, such ruts are called “filter bubbles”; in a university class, “invincible ignorance”; in media, “cartoonified discourse”; in economics, “tulip bubbles”; in government, “regulatory capture.” Although the general idea came before Socrates, the most general rut-forming explanation puts ancient wine in the newest bottle by naming it “leading indicator dependency” (see the appendix on page 42 of this paper I co-wrote with Criscillia Benford).
When robots silence smiling children, the end of humanity is near. Only governments can stop it from getting worse, but for now, we have to find smaller ways to help.
At Fair Observer, we know firsthand how important meaningful connections are for teaching and for working towards a better future. We are thus striving to bring people together and create the conditions that make teaching, learning and connecting as enriching as possible.
As we develop more online training programs, we’ll explore software solutions and in-person venues that prioritize group dynamics and conversation fluidity. We aim to build a digital ecosystem that values open dialogue and critical thinking. If you have any recommendations or ideas, we’d love to hear from you!
Best,
William Softky
Biophysicist, inventor and humanist