Speech Perception Lecture
Acoustics
• Acoustics = physics of sound
• Sound = moving air particles
• Frequency of motion is measured in Hz (= hertz = cycles/sec)
• Complex sounds consist of many different frequencies simultaneously
  – lowest frequency = fundamental frequency (F0)
    • determines pitch
  – other higher frequencies = harmonics = overtones
    • determine timbre
• The voice is a complex sound
09/01/10
Psyc / Ling / Comm 525 Fall 2010
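The idea of a complex sound built from a fundamental plus harmonics can be sketched in a few lines of Python. This is only an illustration: the function name `complex_tone`, the 100 Hz fundamental, the harmonic amplitudes, and the 8000 Hz sample rate are all assumptions chosen for the example, not values from the lecture.

```python
import math

def complex_tone(f0_hz, amplitudes, sample_rate_hz=8000, duration_s=0.05):
    """Sum sinusoids at F0 and its integer multiples (harmonics).

    amplitudes[0] scales the fundamental, amplitudes[1] the 2nd harmonic, etc.
    All parameter values used below are illustrative assumptions.
    """
    n_samples = round(sample_rate_hz * duration_s)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        value = sum(
            amp * math.sin(2 * math.pi * f0_hz * k * t)
            for k, amp in enumerate(amplitudes, start=1)
        )
        samples.append(value)
    return samples

# A 100 Hz fundamental with two weaker harmonics at 200 Hz and 300 Hz.
tone = complex_tone(100.0, [1.0, 0.5, 0.25])

# The waveform repeats once per cycle of the fundamental:
# 8000 samples/sec divided by 100 cycles/sec = 80 samples per cycle.
period_in_samples = 8000 // 100
```

Changing the harmonic amplitudes changes the shape of each cycle (timbre) but not the repetition rate, which stays tied to F0 (pitch).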
Some Different Ways to Depict Sound
Acoustics of Speech
• Fundamental Frequency (F0)
  – basic pitch of voice
  – rate at which whole vocal cords vibrate
• Plus harmonics (= overtones)
  – other higher frequencies in voice
  – faster rates at which parts of vocal cords & other structures vibrate
• Resonance (= sympathetic vibration)
  – rest of vocal tract enhances some frequencies & inhibits others
  – which frequencies are enhanced or inhibited depends on vocal tract shape
  – which depends on positions of articulators
  – Produces formants
    • enhanced frequency bands
    • usually 3-4 formants in speech: F1, F2, F3, & sometimes F4
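A rough sketch of how resonance turns a flat set of harmonics into formants, in Python. The resonance curve here is a toy function invented for illustration (not a physical vocal tract model), and the F0, formant center frequencies, and bandwidth are assumed round numbers, not measurements from the lecture.

```python
def harmonics(f0_hz, max_hz):
    """Harmonic frequencies: integer multiples of F0 up to max_hz."""
    return [f0_hz * k for k in range(1, int(max_hz // f0_hz) + 1)]

def resonance_gain(freq_hz, formants_hz, bandwidth_hz=100.0):
    """Toy resonance curve (an assumption): one peak per formant.

    Gain is highest when freq_hz sits on a formant center and falls off
    as the frequency moves away from it.
    """
    return sum(
        1.0 / (1.0 + ((freq_hz - center) / bandwidth_hz) ** 2)
        for center in formants_hz
    )

# Illustrative values only.
F0_HZ = 100.0
FORMANTS_HZ = (300.0, 2300.0, 3000.0)  # hypothetical F1, F2, F3

# Relative strength of each harmonic after the vocal tract "filter".
spectrum = {h: resonance_gain(h, FORMANTS_HZ) for h in harmonics(F0_HZ, 4000.0)}
```

Moving the values in `FORMANTS_HZ` stands in for moving the articulators: the same harmonics are present, but a different set of them is enhanced.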
Speech & Hearing Frequencies
• Human hearing
  – 20 - 20,000 Hz
  – Most sensitive at 500 - 5,000 Hz
• Human voice fundamental frequency
  – Average for men = 80 - 200 Hz
  – Average for women = up to 400 Hz
• Telephone:
  – Cuts off at ~3000 Hz
  – Crucial information for identifying some sounds (fricatives) is lost
From Carroll (2004), The psychology of language, 4th Ed.
English Spelling
A Dreadful Language
I take it you already know
Of tough and bough and cough and dough.
Others may stumble, but not you,
On hiccough, thorough, touch, and through;
Well done! And now you wish perhaps
To learn of less familiar traps?
Beware of heard, a dreadful word
That looks like beard and sounds like bird.
And dead: it's said like bed, not bead
For goodness sake, don't call it "deed".
Watch out for meat and great and threat
(They rhyme with suite and straight and debt).
A moth is not a moth in mother,
Nor both in bother, nor broth in brother.
And here is not a match for there,
Nor dear and fear for bear and pear.
And then there's dose and rose and lose
Just look them up - and goose and choose,
And cork and work and word and sword,
And do and go and thwart and cart.
Come, come I've hardly made a start.
A dreadful language? Man alive
I mastered it when I was five!
International Phonetic Alphabet (IPA)
• 1 sound = 1 symbol
• Symbols for all speech sounds in all languages
• Phonetic writing makes pronunciation completely unambiguous
  – Some languages have writing systems that are close to phonetic (Korean, Italian)
  – Some other languages have writing systems that indicate less about pronunciation (Mandarin?)
(“Standard” American)
From Carroll (2004), The psychology of language, 4th Ed.
From Carroll (2004), The psychology of language, 4th Ed.
Coarticulation
• Each sound is partially shaped by the sounds before & after it
  – keel vs kill vs cool
  – /kil/ vs /kɪl/ vs /kul/
  – place of articulation and rounding on the k differ a lot
  – so, different versions of “the same sound” in different contexts
  – and from different speakers
• This is what allows us to talk so fast
Coarticulation Across Languages
• How different can different versions of a sound be & still be heard as “the same sound”?
  – Different for different languages
  – A back rounded k and a front unrounded k sound like “the same sound” to English speakers
    • but that same difference is enough to make them sound like 2 different sounds in some other languages
Phonemes
• In English, a difference in voicing makes 2 sounds “different sounds”
  – pill vs bill
  – /pɪl/ vs /bɪl/
  – p = voiceless
  – b = voiced
• Can find many other minimal pairs of English words where the only difference is whether or not one sound is voiced
  – rip vs rib
  – bat vs bad
  – tip vs dip
  – cap vs cab
  – back vs bag
• Therefore, voicing is a distinctive feature in English
  – and 2 sounds that differ only in voicing are different phonemes
  – phoneme = sound that can signal a meaning difference
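The minimal-pair test can be sketched as a small Python search: two words form a minimal pair if their transcriptions have the same length and differ in exactly one sound. The transcriptions below are broad and simplified (for example, plain "r" for the English r sound), and the names `LEXICON`, `VOICING_CONTRASTS`, `minimal_pairs`, and `voicing_pairs` are made up for this example.

```python
from itertools import combinations

# Broad transcriptions of the example words; the symbol choices are
# simplifying assumptions made for this sketch.
LEXICON = {
    "pill": ("p", "ɪ", "l"),
    "bill": ("b", "ɪ", "l"),
    "rip": ("r", "ɪ", "p"),
    "rib": ("r", "ɪ", "b"),
    "bat": ("b", "æ", "t"),
    "bad": ("b", "æ", "d"),
    "tip": ("t", "ɪ", "p"),
    "dip": ("d", "ɪ", "p"),
    "cap": ("k", "æ", "p"),
    "cab": ("k", "æ", "b"),
    "back": ("b", "æ", "k"),
    "bag": ("b", "æ", "g"),
}

# Stop pairs that differ only in voicing.
VOICING_CONTRASTS = {frozenset("pb"), frozenset("td"), frozenset("kg")}

def minimal_pairs(lexicon):
    """Return (word1, word2, (sound1, sound2)) for words differing in one sound."""
    pairs = []
    for (w1, t1), (w2, t2) in combinations(lexicon.items(), 2):
        if len(t1) != len(t2):
            continue
        diffs = [(a, b) for a, b in zip(t1, t2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, diffs[0]))
    return pairs

def voicing_pairs(lexicon):
    """Keep only the minimal pairs whose single difference is a voicing contrast."""
    return [
        (w1, w2)
        for w1, w2, (a, b) in minimal_pairs(lexicon)
        if frozenset((a, b)) in VOICING_CONTRASTS
    ]
```

Calling `voicing_pairs(LEXICON)` picks out the voicing pairs from the slide, while `minimal_pairs(LEXICON)` also finds pairs such as rip vs tip that differ in something other than voicing.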
Phonemes vs Allophones
• There’s another difference between pill and bill in English
  – The p in pill is aspirated, but the b in bill is not
    • /pʰɪl/ vs /bɪl/
    • aspiration = air puff when stop consonant is released
• But, there are no minimal pairs of English words that differ only in whether or not one sound is aspirated
  – So, aspiration is a non-distinctive feature in English
  – 2 sounds that differ only in aspiration are allophones of the same phoneme
  – allophones = different versions of the “same sound”
• But in Korean, it’s the opposite of English
  – aspiration is phonemic
  – voicing is allophonic
Another Cross-Linguistic Example
• In English, there is a minimal pair rip and lip
  – & many other pairs that differ in just r vs l
  – so r and l are different phonemes in English
• In Japanese, there are no minimal pairs that differ only in r vs l
  – Instead, there’s a single phoneme that’s somewhere between the English r and l
  – and it has different pronunciations in different contexts
    • sometimes it sounds more like English r
    • and sometimes like English l
    • r and l are both allophones of a single phoneme
• Makes it very difficult for Japanese speakers to hear the difference in English
  – Japanese speakers have learned to categorize all the allophones as “the same sound”
Distinctive Features Across Languages
• There are many kinds of differences between speech sounds
  – Some are important (= distinctive) & some are not
  – Which is which varies across languages
• So, have to learn which are the important ones for your language
• For English consonants, the distinctive features are:
  – Voicing (Voice Onset Time)
  – Place of articulation
  – Manner of articulation
Speech Perception is Hard!
• Coarticulation
  – allows us to talk fast
  – which leads to lack of invariance in acoustic signal
Variability in Vowel Production
From Kuhl, et al. (2004), Nat Rev Neurosci
Speech Perception is Hard!
• Coarticulation
  – allows us to talk fast
  – which leads to lack of invariance
  – a series of musical notes changing as fast as speech sounds do would sound like a blur
  – we would not be able to perceive individual notes
  – yet we have the impression that we hear each speech sound
• This has led some researchers to propose that:
  – speech perception requires a hard-wired uniquely human ability that evolved specifically for speech
• What sort of evidence would support this idea?
Evidence about special status of speech perception
• Categorical Perception
  – Inability to hear differences between members of a category
  – where category = phoneme
  – e.g., variants of /p/ with different VOTs
  – Together with ability to hear differences of the same size when the 2 sounds are members of different categories
  – e.g., /p/ vs /b/
• Adults can easily hear only the differences that are important in their language
  – e.g., English speakers easily hear difference between /r/ & /l/
    • i.e., they sound like “different sounds”
  – while Japanese speakers find it very hard to hear the same difference
    • i.e., they sound like “the same sound”
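An idealized version of categorical perception can be sketched in Python: a listener labels each sound by which side of a VOT boundary it falls on, and two sounds are predicted to be discriminable only if they get different labels. The +25 msec boundary is an assumed illustrative value, and the all-or-none rule is a simplification (real listeners are not perfectly categorical); the function names are made up for this sketch.

```python
# Assumed illustrative boundary between /b/ and /p/ for English listeners.
ENGLISH_BOUNDARY_MS = 25.0

def identify(vot_ms, boundary_ms=ENGLISH_BOUNDARY_MS):
    """Label a stop as voiced /b/ or voiceless /p/ from its voice onset time."""
    return "p" if vot_ms >= boundary_ms else "b"

def predicted_discriminable(vot_a_ms, vot_b_ms, boundary_ms=ENGLISH_BOUNDARY_MS):
    """Idealized rule: sounds are told apart only if they get different labels."""
    return identify(vot_a_ms, boundary_ms) != identify(vot_b_ms, boundary_ms)

# Two pairs with the same 20 msec physical difference:
across_boundary = predicted_discriminable(10.0, 30.0)  # /b/ vs /p/
within_category = predicted_discriminable(40.0, 60.0)  # both /p/
```

The two pairs differ by the same amount physically, but only the pair that straddles the boundary is predicted to sound like two different sounds.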
Categorical Perception
• Categorical perception is strongest for voicing & place of articulation for consonants
  – Weaker effect for vowels, called a “magnet effect”
• Adults show categorical perception for the differences that are distinctive in their language
  – So, it depends on learning
  – How early is it learned?
From Carroll (2004), The psychology of language, 4th Ed.
From Carroll (2004), The psychology of language, 4th Ed.
From Carroll (2004), The psychology of language, 4th Ed.
Testing Infant Speech Perception
• Use a habituation paradigm to test perception
  – Infants suck on a pacifier with a transducer in it
  – Measure how hard & how often they suck
  – Whenever something interesting happens, they suck more
• Play synthetic speech syllables that vary on some feature
  – e.g., VOT
  – Keep playing same syllable over & over until they’re bored with it and their sucking rate decreases (= habituation)
  – Then change the syllable
  – If sucking rate goes up, they must have heard the change
  – If rate does not go up, either they couldn’t hear the change, or it wasn’t interesting enough
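The decision logic of the habituation paradigm can be sketched in Python. Everything numeric here is hypothetical: the 20% drop criterion for habituation, the 10% increase criterion for noticing a change, and the sucking rates themselves are invented for illustration and are not taken from any study.

```python
def mean(values):
    return sum(values) / len(values)

def has_habituated(sucks_per_min, drop_fraction=0.2):
    """True once the latest rate has dropped well below the peak rate.

    drop_fraction is a hypothetical criterion chosen for this sketch.
    """
    peak = max(sucks_per_min)
    return sucks_per_min[-1] <= peak * (1 - drop_fraction)

def noticed_change(pre_change, post_change, min_increase=0.1):
    """True if sucking speeds up after the syllable changes.

    A False result is ambiguous: the infant may not have heard the
    change, or may not have found it interesting.
    """
    return mean(post_change) >= mean(pre_change) * (1 + min_increase)

# Invented sucking rates (sucks per minute) for one infant.
familiar_syllable = [30, 45, 50, 38]
ready_to_switch = has_habituated(familiar_syllable)
heard_new_category = noticed_change([38, 36], [48, 50])
heard_same_category = noticed_change([38, 36], [37, 35])
```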
Categorical Perception in Infants
• For VOT
  – Play a clear pa over and over
  – If then change to one with a different VOT, but that adults would call ba
    • English-hearing infants will speed up sucking rate
    • Therefore, they hear the difference
  – If instead change to one with a VOT that’s just as different from the first one, but it’s one adults would still call pa
    • Infants don’t speed up
    • Therefore, they didn’t hear the change (or it’s not interesting)
• Suggests infants cannot discriminate between different versions of pa, but can discriminate between pa and ba
  – Just like English-speaking adults
  – So, English-hearing infants already have categorical perception
From Eimas et al. (1971), Science
Infant Speech Perception Across Languages
• Infants easily hear many differences that adults don’t
  – they start out able to hear differences that are not important in the language spoken around them
• Japanese-hearing infants start out being able to hear the difference between r and l just as well as English-hearing infants
  • but by ~1 year old, they no longer hear that difference
• All children start out able to hear (most of) the differences that are important in any human language
  – But over their 1st year, they lose the ability to hear differences that are not important in the language they’re hearing
  – the speech perception system gets tuned to hear only the differences that are important for the language being learned
• Why by 1 year?
  • Maybe because that’s when they start to say words? (Werker)
Video segment from PBS series The Mind (1989)
Are there limits to the differences infants can hear?
• Yes: Lasky et al. (1975)
  – Voicing is distinctive for stop consonants in English, Spanish, & Thai
  – But the boundary between voiced & voiceless is at different VOT values
  [Figure: Thai, Spanish, & English boundaries marked on a VOT axis running from -60 to +60 msec]
• The Thai & English boundary values are common to many languages
• The Spanish one is unusual
  – Spanish-hearing infants less than 1 year old
    • hear the difference between pairs of sounds that straddle both the Thai & English category boundaries
    • but not ones that straddle the Spanish boundary
• So, infants hear most, but not all, differences used in any language
Categorical Perception, cont’d
• The same synthesized stimuli can be perceived as speech or not
  – Play formant transition to one ear (sounds like a chirp)
  – and steady-state part to other ear (sounds like vowel)
• If you tell people it’s speech, they integrate the 2 ears & hear it as speech
  – but if you don’t tell them, they don’t hear it as sounding like speech
• When they do hear it as speech, get categorical perception
  – but not when they don’t hear it as speech
• CP effects much stronger for consonants than for vowels
• What seems to be critical is:
  – a short rapidly changing sound (e.g., consonant)
  – followed by a longer slower-changing sound (e.g., vowel)
  – where both are heard as part of a single input
Is categorical perception unique to humans? (i.e., Is it evidence that speech perception is special?)
• No
  – Many other animals show results like human infants in habituation paradigms
  – They can discriminate between sounds that humans would call different phonemes
  – and cannot discriminate between sounds that humans would call the same phoneme
• So, human speech takes advantage of properties of auditory system
  – by generally using the differences that are easy to hear to signal important contrasts in the language
What GOOD is categorical perception???
• Categorical Perception = a failure to discriminate speech sounds any better than you can identify them
• How can it be desirable to lose the ability to hear differences???
  – Speech is hugely variable
    • coarticulation
    • different speech rates
    • different speakers with different voices & accents
    • ...
  – The auditory system learns to attend to the differences that are important and to ignore the ones that are not
  – Lets us tune out a lot of irrelevant variability
• Can adults re-learn to hear differences they’ve learned to ignore?
  – Yes, but it requires a particular kind of training
McGurk Effect
Visual cues in speech perception
• Conflicting acoustic and visual cues can lead to blended perception of sound
  – If there’s a sound in the language that’s
    • close enough to the acoustic signal
    • & fits with the visual cues
More on Visual Context Effects (Gilbert, Lansing, & Garnsey, in prep)
• Participants heard either /ba/ or /ga/ (50-50)
• Task = Did you hear /ba/? (50-50)
• Syllables embedded in several levels of noise as well as in quiet
• Simultaneous visual cue
  – Static rectangle
  – Static smiling face
  – Chewing face (irrelevant motion)
  – Speaking face (relevant motion)
[Figure: Accuracy (d' sensitivity, axis from 0.0 to 3.5) by visual cue type (Rect, Smile, Chew, Speak) at four noise levels (Quiet, 0 dB SNR, -9 dB SNR, -18 dB SNR); presentation conditions labeled AR, ASF, ADF, AV, VO]
– Informative facial motion completely compensates for noise
– Other facial cues have no effect on accuracy
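For readers unfamiliar with the two measures in this figure, here is a minimal Python sketch of one common way to compute d' for a yes/no task ("Did you hear /ba/?") and of what a negative SNR in dB means. The slides do not say how the authors computed d'; the correction that adds 0.5 to each count is an assumption of this sketch, and the trial counts are invented.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate).

    Adds 0.5 to each count so that rates of exactly 0 or 1 do not
    produce infinite z-scores (an assumption of this sketch).
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

def noise_to_signal_power_ratio(snr_db):
    """How many times stronger the noise power is than the signal power."""
    return 10 ** (-snr_db / 10)

# Invented counts: 50 /ba/ trials and 50 /ga/ trials per listener.
guessing = d_prime(25, 25, 25, 25)   # "yes" equally often to /ba/ and /ga/
moderate = d_prime(30, 20, 20, 30)
accurate = d_prime(45, 5, 5, 45)
```

A d' of 0 means the listener says "yes" equally often whether /ba/ or /ga/ was played; larger values mean the two syllables are easier to tell apart. At -18 dB SNR the noise carries roughly 63 times the power of the speech.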
Event-Related Brain Potentials (ERPs)
N100 component
[Figure: ERP waveforms showing the N100 for the Speak, Chew, and Smile conditions]
– Earlier & smaller when speech easy to identify
– Irrelevant face motion speeds up N100 just as much as relevant motion
– But doesn’t reduce its amplitude
– Maybe potentially relevant face motion serves an alerting function?
Phoneme Restoration
• Replace one phoneme in an utterance with noise
  – If the phoneme is predictable from context, people “hear” the missing sound (e.g., legi*lature)
  – If you tell them a sound has been replaced, they’re not accurate at identifying which sound it is
  – Warren & Warren (1970)
• Stimuli (acoustically identical except for last word)
  – It was found that the *eel was on the orange.
  – It was found that the *eel was on the axle.
  – It was found that the *eel was on the shoe.
  – It was found that the *eel was on the table.
• People believed they had heard the phoneme that made sense given the final word
  – But the final word comes after *eel, so it can’t have influenced what they heard at *eel itself