Does Language Shape What We See?
At this very moment, your eyes and brain are performing an astounding series of coordinated operations.
Light rays from the screen are hitting your retina, the sheet of light-sensitive cells that lines the back wall of each of your eyes. Those cells, in turn, are converting light into electrical pulses that can be decoded by your brain.
The electrical messages travel down the optic nerve to your thalamus, a relay center for sensory information in the middle of the brain, and from the thalamus to the visual cortex at the back of your head. In the visual cortex, the message jumps from one layer of tissue to the next, allowing you to determine the shape and color and movement of the thing in your visual field. From there the neural signal heads to other brain areas, such as the frontal cortex, for yet more complex levels of association and interpretation. All of this means that in a matter of milliseconds, you know whether this particular combination of light rays is a moving object, say, or a familiar face, or a readable word.
That explanation is far too pat, of course. It makes it seem like the whole process of visual perception has been figured out, when in fact the way our mind sees and interprets reality is in large part a mystery.
This post is about a question that’s long been debated among scientists and philosophers: At what point in that chain of operations does the visual system begin to integrate information from other senses, like touch, taste, smell, and sound? What about even more complex inputs, like memories, categories, and words?
We know the integration happens at some point. If you see a lion running toward you, you will respond to that sight differently depending on whether you are roaming alone in the Serengeti or visiting the zoo. Even if the two sights are exactly the same, presenting identical optical input to your retinas, your brain will use your memories and knowledge to put the image in context and help you interpret the lion as threatening or cute. Here’s a less far-fetched example. In 2000, researchers showed that hearing simple sounds can drastically change how you perceive flashing circles. (If you’re up for a fun 44 seconds, go watch the video those researchers used to demonstrate the effect.)
Some experts argue that our brains integrate information from other systems only after processing the basic visual information. So in the example above, they’d argue that the visual cortex processes the sight of the circles first, and then cells at some later, ‘higher-order’ stage of neural processing — in the frontal cortex or temporal cortex, for example — handle integrating the sound information with the visual information. I’ll call this the ‘modular’ camp, because these experts believe that the visual cells of the brain are encapsulated from other types of cells.
Other scientists, though, say that the brain integrates information from other systems at the same time that it processes the visual part. A study published yesterday in the Proceedings of the National Academy of Sciences provides some of the strongest evidence to date for this idea. Gary Lupyan of the University of Wisconsin–Madison found that language — one of our most sophisticated cognitive abilities — affects not only what we see with our eyes, but whether we see anything at all.
“We don’t imagine reality in any way we want — perception is still highly constrained,” Lupyan says. “But there are many cases where you want to augment or even override input from one modality with input from another.”
Lupyan’s study is notable for the clever way it tapped into our ‘lower-level’ visual processing. The researchers showed participants different images in their right and left eyes at the same time. In one eye, they’d see a familiar picture, such as a kangaroo or a pumpkin; in the other, they’d see ugly visual noise: a rapidly changing mess of lines. When the two images are presented at the same time, we consciously perceive only the flashing noise and remain completely unaware of the static, familiar picture. Previous experiments have shown that this so-called ‘continuous flash suppression’ disrupts the early stages of visual perception, “before it reaches the levels of meaning”, Lupyan says.
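To make the paradigm concrete, here is a minimal sketch (in Python, standard library only) of how the trial structure of a continuous-flash-suppression experiment along these lines might be organized. The class and function names, the object list, and the condition proportions are illustrative assumptions of mine, not Lupyan’s actual code or design.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# A schematic description of one continuous-flash-suppression (CFS) trial:
# one eye receives rapidly flashing noise, the other eye may (or may not)
# receive a static picture, and a spoken label may be played.
@dataclass
class CFSTrial:
    image: Optional[str]   # e.g. "kangaroo", or None on picture-absent trials
    image_eye: str         # which eye would get the static picture: "left" or "right"
    cue: Optional[str]     # spoken word played on the trial, or None

    @property
    def cue_matches_image(self) -> bool:
        return self.image is not None and self.cue == self.image

def make_trials(n: int, objects=("kangaroo", "pumpkin")) -> List[CFSTrial]:
    """Build a toy trial list: the picture is present on roughly half the trials,
    and the spoken cue is congruent, incongruent, or absent (illustrative mix)."""
    trials = []
    for _ in range(n):
        image = random.choice(objects) if random.random() < 0.5 else None
        cue_kind = random.choice(["congruent", "incongruent", "none"])
        if cue_kind == "none":
            cue = None
        elif cue_kind == "congruent" and image is not None:
            cue = image
        else:
            cue = random.choice([o for o in objects if o != image])
        trials.append(CFSTrial(image=image,
                               image_eye=random.choice(["left", "right"]),
                               cue=cue))
    return trials

if __name__ == "__main__":
    for t in make_trials(5):
        print(t, "| cue matches image:", t.cue_matches_image)
```

The point of a structure like this is simply that, across trials, the suppressed picture can be present or absent and the spoken cue can match it, mismatch it, or be omitted, which is what lets the researchers compare detection across cue conditions.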
In Lupyan’s study, participants sometimes heard the name of the static object — like the word ‘kangaroo’ or ‘pumpkin’ — played into their ears. And on these trials, the previously invisible object would pop into their conscious visual perception. If they heard a different word, though, they would not see the hidden object. “So it’s not that they are hallucinating or imagining a dog being there,” Lupyan says. “If they hear the label, they become more sensitive to inputs that match that label.”
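One natural way to read “more sensitive” is in the signal-detection sense: hearing a matching label should raise the rate of correctly detecting pictures that are really there, without raising the rate of false alarms on picture-absent trials, whereas hallucinating or guessing would inflate both. The snippet below shows the standard d′ (d-prime) calculation with made-up numbers, purely to illustrate the measure rather than the paper’s reported data.

```python
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Signal-detection sensitivity: z(hit rate) minus z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Purely illustrative numbers (not from the paper): a matching label shows up
# as a higher hit rate with no corresponding rise in false alarms.
print("matching label:   d' =", round(d_prime(hit_rate=0.70, false_alarm_rate=0.10), 2))
print("no label:         d' =", round(d_prime(hit_rate=0.50, false_alarm_rate=0.10), 2))
print("mismatched label: d' =", round(d_prime(hit_rate=0.35, false_alarm_rate=0.10), 2))
```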
Because continuous flash suppression is thought to act on lower-level visual processing, Lupyan says these data bolster the idea that even these lower levels are susceptible to inputs from outside the visual system.
I reached out to several other scientists to see what they thought of the study. The two who responded were largely positive. Michael Spivey, a cognitive scientist at the University of California, Merced, said the study makes an important contribution to the literature, adding to already overwhelming evidence against the modular theory. “I’m constantly amazed that modular theorists of vision fight so hard,” says Spivey (who was Lupyan’s post-doc advisor). He points to anatomical studies showing that the brain has oodles of feedback loops between the frontal and visual cortices. “We don’t have a clear understanding yet of how those signals work, but they’re there.”
David Cox, a neuroscientist at Harvard, also thought the study’s methods were sound, though he wondered whether the effects were due to language per se. “I wonder if you could get a similar effect in an animal — say, a monkey — by pre-cueing with a picture of the target of interest. There is a lot of evidence that performance can be increased in a variety of difficult detection scenarios if you know what you are looking for.” Whether the results are due to language or not, though, he says the study demonstrates that non-visual inputs affect early visual processing: “None of this takes away from the result; it’s more a question of interpretation.”
Lupyan is interested in this question, too. In future studies, he plans to repeat the experiment using not only word labels but also non-verbal cues, and associate them not only with familiar objects, but unfamiliar ones. “We’re interested in the origin of these effects, and how much training is necessary,” Lupyan says. “My prediction is you probably don’t need very much experience with a label in order [for it] to modulate visual processing.”
The scientists I really want to hear from, of course, are those in the “vision is modular” camp. Unfortunately none of them responded to my inquiries. If any of you are reading this now, please leave a comment and tell us what you think.