When people speak, they engage nearly 100 muscles, continuously moving the lips, jaw, tongue, and throat to shape their breath into the fluent sequences of sounds that form words and sentences. New research reveals how these complex articulatory movements are coordinated in the brain.

The new study shows that the brain's speech centers are organized more according to the physical needs of the vocal tract as it produces speech than according to how the resulting speech sounds (its "phonetics"). Linguists divide speech into abstract units of sound called "phonemes," and consider the /k/ sound in "keep" the same as the /k/ in "coop."

But in reality, the mouth forms these two sounds differently in order to prepare for the different vowels that follow, and this physical distinction now appears to matter more to the brain regions responsible for producing speech than the phonemes' theoretical sameness.

The findings, which extend previous studies on how the brain interprets the sounds of spoken language, could help guide the creation of a new generation of prosthetic devices for those who are unable to speak: brain implants could monitor neural activity related to speech production and rapidly translate those signals directly into synthetic spoken language.

A neural code for vocal tract movements

One of the researchers, a neurosurgeon, specializes in surgeries to remove brain tissue that causes seizures in patients with epilepsy. In some cases, to prepare for these operations, he places high-density arrays of tiny electrodes onto the surface of the patients' brains, both to help identify the location triggering the patients' seizures and to map out other important areas, such as those involved in language, to make sure the surgery avoids damaging them.

In the new study, Chartier and Anumanchipalli asked five volunteers awaiting surgery, each with electrocorticography (ECoG) electrodes placed over the ventral sensorimotor cortex, a key center of speech production, to read aloud a collection of 460 natural sentences. The comprehensiveness of this sentence set was crucial for capturing the complete range of "coarticulation," the blending of phonemes that is essential to natural speech.
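
The article does not describe how the recorded ECoG signals were processed before analysis. Purely as an illustration of the kind of preprocessing commonly applied to such cortical recordings, the sketch below reduces each electrode's raw signal to a smooth activity envelope; the frequency band, sampling rate, and variable names are assumptions, not details from the study.

```python
# Illustrative only: a generic ECoG preprocessing step, not the study's pipeline.
# Assumed: `ecog` is a (n_samples, n_electrodes) array sampled at `fs` Hz.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def activity_envelope(ecog: np.ndarray, fs: float,
                      band: tuple = (70.0, 150.0)) -> np.ndarray:
    """Return a per-electrode amplitude envelope within `band` (Hz)."""
    # Band-pass filter each electrode's signal to the chosen frequency band.
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, ecog, axis=0)
    # The magnitude of the analytic signal gives the instantaneous amplitude.
    envelope = np.abs(hilbert(filtered, axis=0))
    # Z-score per electrode so activity is comparable across channels.
    return (envelope - envelope.mean(axis=0)) / envelope.std(axis=0)

# Example with synthetic stand-in data (5 electrodes, 10 s at 1 kHz):
fs = 1000.0
ecog = np.random.randn(int(10 * fs), 5)
env = activity_envelope(ecog, fs)
print(env.shape)  # (10000, 5)
```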

It was not possible for the research team to simultaneously record the volunteers' neural activity and their tongue, mouth, and larynx movements. Instead, they recorded only audio of the volunteers speaking and developed a novel deep learning algorithm to estimate which vocal tract movements were being made during specific speaking tasks.
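
The article does not specify how that algorithm works. As a minimal sketch of the general idea, known as acoustic-to-articulatory inversion, the example below trains a recurrent network to map a sequence of audio features to a sequence of estimated articulator positions; the feature sizes, architecture, and stand-in data are assumptions, not the team's model.

```python
# Sketch of acoustic-to-articulatory inversion. Assumes audio features are
# already extracted as (batch, time, n_audio_features) tensors and paired with
# reference articulator trajectories of shape (batch, time, n_articulators).
import torch
import torch.nn as nn

class ArticulatoryInverter(nn.Module):
    def __init__(self, n_audio_features=80, n_articulators=12, hidden=256):
        super().__init__()
        # A bidirectional LSTM reads the whole utterance for temporal context.
        self.rnn = nn.LSTM(n_audio_features, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        # Linear readout to articulator coordinates at every time step.
        self.out = nn.Linear(2 * hidden, n_articulators)

    def forward(self, audio_features):
        hidden_states, _ = self.rnn(audio_features)
        return self.out(hidden_states)

# Toy training step on random stand-in data.
model = ArticulatoryInverter()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

audio = torch.randn(4, 200, 80)         # 4 utterances, 200 frames each
articulators = torch.randn(4, 200, 12)  # reference movement trajectories

optimizer.zero_grad()
loss = loss_fn(model(audio), articulators)
loss.backward()
optimizer.step()
print(float(loss))
```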

Regarding coarticulation, the researchers discovered that the brain's speech centers coordinate different muscle movement patterns based on the context of what is being said and the order in which different sounds occur.

The researchers found that neurons in the ventral sensorimotor cortex were highly attuned to this and other coarticulatory features of English, suggesting that these brain cells are tuned to produce fluid, context-dependent speech rather than to read out discrete speech segments in serial order.
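
One common way to quantify this kind of tuning is an encoding model: predict each electrode's activity from time-lagged articulator movements and see how much of the activity it explains. The ridge-regression sketch below illustrates that idea in generic form; the lag window, data shapes, and random stand-in data are assumptions rather than details from the paper.

```python
# Sketch of an encoding model: predict one electrode's activity envelope from
# time-lagged articulator trajectories. Shapes and lag window are assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def lagged_design_matrix(kinematics: np.ndarray, lags: range) -> np.ndarray:
    """Stack shifted copies of the kinematic features, one block per lag
    (wrap-around at the edges is ignored for brevity)."""
    blocks = [np.roll(kinematics, lag, axis=0) for lag in lags]
    return np.hstack(blocks)

rng = np.random.default_rng(0)
kinematics = rng.standard_normal((5000, 12))   # 12 articulator trajectories
electrode = rng.standard_normal(5000)          # one electrode's activity

X = lagged_design_matrix(kinematics, range(-10, 11))  # +/-10 time-step lags
X_train, X_test, y_train, y_test = train_test_split(
    X, electrode, test_size=0.2, shuffle=False)

model = Ridge(alpha=10.0).fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))
```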

Path to a Speech Prosthetic

"We know now that the sensorimotor cortex encodes vocal tract movements, so we can use that knowledge to decode cortical activity and translate that via a speech prosthetic," said Chartier. This would give voice to people who can't speak but have intact neural functions.
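
Chartier is describing a two-stage idea: decode cortical activity into vocal tract movements, then turn those movements into audible speech. The sketch below shows that pipeline in its simplest possible form, with linear decoders and acoustic features standing in for a real synthesizer; every component, shape, and model here is an illustrative assumption, not the team's implementation.

```python
# Sketch of a two-stage speech-prosthesis decoder:
#   cortical activity -> estimated vocal tract movements -> acoustic features.
# All models and shapes are stand-ins, not the study's implementation.
import numpy as np
from sklearn.linear_model import Ridge

class TwoStageSpeechDecoder:
    def __init__(self):
        self.neural_to_kinematics = Ridge(alpha=1.0)
        self.kinematics_to_acoustics = Ridge(alpha=1.0)

    def fit(self, neural, kinematics, acoustics):
        # Stage 1: learn to read articulator movements out of cortical activity.
        self.neural_to_kinematics.fit(neural, kinematics)
        # Stage 2: learn to map those movements to acoustic features,
        # which a vocoder would then turn into an audible waveform.
        self.kinematics_to_acoustics.fit(kinematics, acoustics)
        return self

    def decode(self, neural):
        estimated_movements = self.neural_to_kinematics.predict(neural)
        return self.kinematics_to_acoustics.predict(estimated_movements)

# Toy run on random stand-in data.
rng = np.random.default_rng(1)
neural = rng.standard_normal((2000, 64))      # 64 electrodes
kinematics = rng.standard_normal((2000, 12))  # 12 articulator trajectories
acoustics = rng.standard_normal((2000, 25))   # e.g. spectrogram frames

decoder = TwoStageSpeechDecoder().fit(neural, kinematics, acoustics)
predicted = decoder.decode(rng.standard_normal((100, 64)))
print(predicted.shape)  # (100, 25)
```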

Ultimately, the study could represent a new research avenue for Chartier and Anumanchipalli's team at UCSF. It has made the researchers think twice about how phonemes fit in: in a sense, these units of speech, which so much of their research is pinned on, may be just byproducts of a sensorimotor signal.