In an incredible breakthrough, UC San Francisco researchers last month revealed the first results from a clinical trial that successfully enabled a paralyzed person to communicate using advanced computational algorithms and a device that recorded signals from the surface of his brain. The technology is the first of its kind to translate speech-related brain activity directly into words and phrases as the user attempts to speak, rather than requiring the user to spell things out letter by letter.
As the story made headlines nationwide, the team of neuroscientists at the UCSF Chang Lab took to the science-savvy Reddit r/AskScience community to answer questions from the public.
Here are some of the top questions from Reddit, and the answers from Chang Lab members David A. Moses, PhD, postdoctoral engineer; Sean L. Metzger, MS, doctoral student; and Jessie R. Liu, BS, doctoral student.
Very exciting work! Is it only effective for those that possessed then lost the ability to speak normally? Would it be effective for those that suffered damage to their motor cortex at birth (e.g. brain hemorrhages caused by premature birth) which interferes with their ability to control speech?
Jessie Liu: This is an excellent question! Let’s start with some background. What we are decoding here are the cortical activity patterns related to attempting to speak. The reason we can decode these is that the patterns are different when a person (BRAVO1, in this case) is attempting to say different words, like "hello" versus "hungry". This means those neural activity patterns are "discriminable". At the moment, most of our knowledge of speech motor neuroscience comes from people who can speak, or could speak at some point, and we can see that there is still speech-related activity in their motor cortex. In these cases, if someone has lost the ability to speak, it seems likely that their motor cortex has not been severely damaged.
In principle, as long as someone is "cognitively intact," meaning that they are able to understand and formulate language in their head, then they may still have discriminable cortical activity patterns that would allow us to differentiate between different words when they are trying to say them. I want to clarify that I'm talking about intentional attempts to speak, and not the "inner speech" or "inner thoughts" that people experience.
Damage to the motor cortex is another interesting question! If there are discriminable patterns in other areas of the brain, then it seems like it could be possible, and there is some research on purely imagined speech that implicates other areas of the brain like the superior temporal gyrus and the inferior frontal gyrus. Certainly, this could be investigated with non-invasive methods like functional magnetic resonance imaging (fMRI).
Will the enormity of words in the human language ultimately be a limiting step in this technology? (Also, some words having similar meanings, etc., and the complexity of language.)
Sean Metzger: This could potentially be an issue. As you add more words to the device's vocabulary, it would become easier for a decoder to confuse them. For example, it would be hard to discriminate the neural activity associated with 'cow' and 'cows' as two separate words since they are very similar save for the final 's'. However, language modeling is extremely helpful, as it uses the rules of English and the context of each word to improve the predictions, as we did in our paper. For the 'cow' vs 'cows' example, a language model would change 'I saw two cow' to 'I saw two cows'.
What is promising is that you don't need too many more words to make the device useful in practical applications. According to this article published in 2018, you only need 800 words to understand 75% of the language used in daily life. There are also alternative approaches to increasing the vocabulary that have been demonstrated in people who can speak normally, like decoding subword units such as phonemes, as discussed in our lab’s 2019 study. That approach can be generalized to any size vocabulary, and it's what most artificial speech recognition systems use today.
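The rescoring idea Metzger describes can be sketched in a few lines. The probabilities and bigram table below are invented purely for illustration, and this brute-force toy is a stand-in for the actual language model used in the study; it shows how a language model can override a decoder's slight preference for an ungrammatical word.

```python
from itertools import product

# Hypothetical sketch of language-model rescoring, in the spirit of the
# 'cow' vs 'cows' example. All probabilities below are made up.

# Decoder output: candidate words and their neural-decoder probabilities
# at each position in the sentence.
decoder_probs = [
    {"I": 0.9, "eye": 0.1},
    {"saw": 0.8, "so": 0.2},
    {"two": 0.7, "too": 0.3},
    {"cow": 0.55, "cows": 0.45},  # the decoder slightly prefers 'cow'
]

# Toy bigram model: P(word | previous word). "<s>" marks sentence start.
bigram = {
    ("<s>", "I"): 0.5, ("<s>", "eye"): 0.01,
    ("I", "saw"): 0.3, ("I", "so"): 0.01,
    ("saw", "two"): 0.2, ("saw", "too"): 0.05,
    ("two", "cows"): 0.3, ("two", "cow"): 0.01,  # 'two cow' is ungrammatical
    ("too", "cows"): 0.01, ("too", "cow"): 0.01,
}

def best_sentence(decoder_probs, bigram, floor=1e-4):
    """Brute-force search over every candidate word sequence, scoring each
    by decoder probability times language-model probability."""
    best, best_score = None, 0.0
    for words in product(*(d.keys() for d in decoder_probs)):
        score, prev = 1.0, "<s>"
        for pos, word in enumerate(words):
            score *= decoder_probs[pos][word] * bigram.get((prev, word), floor)
            prev = word
        if score > best_score:
            best, best_score = " ".join(words), score
    return best

print(best_sentence(decoder_probs, bigram))  # the LM corrects 'cow' to 'cows'
```

Real systems search incrementally (e.g. with a beam) rather than enumerating every sequence, but the scoring principle is the same.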
Do you think the algorithms developed would be applicable between people?
David Moses: This is a great question, and there is definitely some evidence to suggest that this is possible, to an extent. This concept is known in the field as "transfer learning", where knowledge/model parameters can be transferred from one scenario to a slightly different one. Here, this can be from one person to another. Right now, we see that some model parameters can be learned using data from multiple participants, but some parameters of the model do best when they are learned using data from individual participants. In our lab, we have published some of these findings in a previous paper that involved participants who could speak.
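One common way to realize the split Moses describes is to share early model stages across people and refit only a small participant-specific readout. The sketch below is a made-up illustration on simulated data, not the lab's actual model: `W_shared` stands in for a feature-extraction stage pretrained on other participants, and only the final readout is fit to the "new" person.

```python
import numpy as np

# Hypothetical transfer-learning sketch: shared parameters reused across
# participants, plus a per-participant readout. Shapes and data are invented.
rng = np.random.default_rng(0)
n_channels, n_features, n_words = 128, 16, 4

# Pretend this shared stage was learned from pooled multi-participant data.
W_shared = rng.standard_normal((n_features, n_channels)) / np.sqrt(n_channels)

def extract_features(X):
    """Shared stage: identical parameters for every participant (frozen)."""
    return np.tanh(X @ W_shared.T)

def fit_readout(X, y_onehot):
    """Participant-specific stage: least-squares readout on frozen features."""
    F = extract_features(X)
    W_read, *_ = np.linalg.lstsq(F, y_onehot, rcond=None)
    return W_read

def predict(X, W_read):
    return (extract_features(X) @ W_read).argmax(axis=1)

# Simulate one new participant: each "word" evokes a distinct neural
# template plus noise (entirely synthetic, for illustration only).
templates = rng.standard_normal((n_words, n_channels))
labels = rng.integers(0, n_words, size=200)
X_new = templates[labels] + 0.3 * rng.standard_normal((200, n_channels))
y_onehot = np.eye(n_words)[labels]

W_read = fit_readout(X_new, y_onehot)   # only this small piece is refit
preds = predict(X_new, W_read)
```

The design point is that the new participant contributes only enough data to fit the small readout, while the bulk of the parameters come pre-learned from others.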
Do you think it would be possible for your team to make a device that could turn the "inner voice" into audible sounds? I always thought it would be amazing if you could create music this way.
Moses: This is very nostalgic for me, as I remember wondering the same thing when I was first starting my PhD program! I was very interested in learning if someone could compose music simply by imagining the parts of individual instruments, etc.
Knowing what I know now, I think this would be extremely challenging. There is no definitive evidence that decoding an "inner voice" is even possible. This is a notion that we take seriously: we don't want our work to be viewed as mind-reading. Our technology works by trying to interpret the neural activity associated with volitional attempts to speak. The brain is incredibly complex, and we simply do not understand how things like "inner voice" or imagined music are represented.
Would multilingual subjects be able to seamlessly transition from one language to another with this technology (assuming language models existed for their known languages)? Also, for those who originally had speech impediments or tend to mix up words, would their verbal errors be evident in the translation?
Liu: From an engineering perspective, you could imagine that the model starts to figure out which language you are trying to speak, much like Google Translate can autodetect the likely language as you type. Implementing something like this would be extremely fascinating.
From a neuroscience perspective, there is a lot of interesting research on whether multilingual people use shared representations of language. Depending on how shared those representations are, it might be possible to detect the intended language as soon as the person attempts the first word.
We’re not experts on speech impairments, but, taking stuttering as an example, the person knows exactly what they want to say but just has trouble getting it out. So, we’d still expect there to be neural activity corresponding to their intended speech that doesn’t contain the errors.
For other speech errors, like saying the wrong word, this is very interesting. In speech neuroscience, we don’t yet understand how those errors happen or at what stage they occur. My best guess is that, since we believe we are tapping into the intended motor commands for speech, such errors might show up in the decoded output! But to be clear, this is just a guess, and you have certainly gotten our minds thinking about this!
Is the ultimate goal to be able to have some sort of transcranial apparatus that would not have to be embedded directly into brain tissue?
Metzger: A transcranial apparatus would be extremely nice, but the signals acquired from this kind of technology are typically noisier than what you can get with implanted devices. One of our goals for the future is to have a fully implanted neural interface that can wirelessly transmit data outside of the skull, which should cosmetically look better than our current approach and wouldn't need to be wired to a computer. This should also reduce the required amount of medical care associated with the device because part of the device would no longer be embedded in the skull.