THE SPEECH CHAIN. HOW WE SPEAK, HOW WE HEAR, HOW WE UNDERSTAND
I've become increasingly interested in linguistics in recent years, even though I'm poor at languages. Although my initial directions of approach were in the areas of knowledge representation and accents/dialects, linguistics can become quite a rabbit hole.
So I now have books on sociolinguistics, syntax, semantic theory, historical linguistics (which is the history of linguistic study, not the history of languages), the history of languages, phonetics, and language variation.
My latest book on the matter, however, was something of a departure. It's the "classic" book by Denes & Pinson on "The Speech Chain".
This is more science than normal linguistics, covering (mainly) the biology of speech, of hearing, and the physics of sound. It was, therefore, way outside my comfort zone, and in places beyond my abilities. But it was well worth reading.
My main takeaway from the book was that the human ability to convert sound waves into meaning is just staggering. I mean, when you think about it, it's ridiculous how the ear and the brain, working in tandem, can distinguish musical sounds, voices, meaning in those voices, the sound of a motorbike outside from a person in the room making a sound like a motorbike, and so on, and on, and on.
By comparison, the production of sound, while still incredibly complex, is simplicity itself. I mean, I can see how the brain could learn how to control the muscles and air output to make the sounds of language. There's a lot of mechanics involved, but, then again, most muscle control has a lot of mechanics involved.
I think that The Speech Chain was first written in the 1960s and was massively updated in 1993. It could do with an update again. The book does well in predicting the implications of the "digital age" in terms of text to speech and speech to text, machine comprehension of speech and machine speech of meaning (which I would guess no longer has to go through the sequence of meaning to text to speech). But the way in which things have developed digitally would now make up as much of the book as the bit on analogue communication.
I went for a hunt online for "synthetic speech" and I might have guessed that Google's offering would appear first in Chrome. The Denes and Pinson book does well in explaining the complexities of speech (and a read of this chapter would give you a strong hint of why satnav voices will fool no one into thinking that a real person is talking). The variables are enormous in terms of pitch, intonation, stress, and context. Now, all of these *can* be processed into an algorithm, and that's what Google has done. While I don't think anyone (yet) would be fooled into thinking that they were speaking to a real human when in fact they were speaking to a machine, I can't see it being far off where this could be achieved in at least a significant percentage of cases.
The Denes and Pinson book also refers (obliquely) to a couple of things which disconcert me when other people speak. The first is a common bugbear: the rising inflection at the end of a sentence when a statement is being made, rather than only when a question is being asked. Denes and Pinson, I am pleased to say, state quite categorically that the end of a statement sentence should be indicated by a lowering of tone and volume on the final unstressed syllable. Unfortunately, the book doesn't address the habit of some people not to obey this rule, or (therefore) try to explain why it happens. I think that it's fairly obvious why, but that's not part of this article.
However, did you know that not all questions end on a rising inflection? Try it. If you say "Are you coming?" then you get a rising inflection. But if you ask something like "Are we jumping that wall?" the emphasis will probably be on the "that" rather than on the "wall". Indeed, I suspect that the majority of questions uttered in speech don't have a rising inflection either. This could be why using a rising inflection to end a statement sounds even more "off".
The second bugbear of mine also relates to the end of sentences. Denes & Pinson say that, on average, people need to pause every 2.5 seconds to take in air, in order to be able to carry on speaking (since making a noise requires the expulsion of air). This, they write, happens at the end of sentences, and within sentences if those sentences are longer than average.
But there are some people who seem to break this rule deliberately. Three categories of person have cropped up in my experience.
1) Politicians who think they are orators. Michael Foot was the worst in recent times. His pauses were quite deliberately inserted only in the middle of sentences. This was because he was thinking of his next sentence while only half-way through the current one. This meant that the gap "required" by a full stop (or, as the Americans in this case call it more accurately, a "period") was non-existent.
2) Lawyers (frequently general counsel) speaking at conferences. I don't know if this is a habit that they have learned in court. But it seems a remarkably common flaw. And I call it a flaw deliberately, because it makes things very difficult for journalists who are trying (against a deadline) to put together an article based on the presentation. Not a few times I have given up taking notes on a speaker with a legal background, because the pauses at the "wrong" points in the speech (in the middle of sentences) had made it impossible for me to gauge the precise meaning of what was being said.
3) Interviewees on the Today programme. These are the worst of the worst, but I guess that they are only playing the game. If you are running out of time and you have a lot more that you want to say (a common occurrence), by never pausing at the end of a sentence, you make it far harder for the interviewer to bring the interview to an end. As a listener, it drives me mad, because you can feel the interviewer desperately wanting to bring the whole rant to an end, but unwilling to break the unwritten rule against interrupting another speaker in the middle of a sentence.
Fortunately, the interviewers have got wise to this game now and tend to just interrupt at the start of a sentence with "thank you very much, I am afraid we will have to leave it there".