Answer by Marc Ettlinger, Ph.D., linguistics, U.C. Berkeley:
English speakers don't actually differentiate "th" and "f" all that well. Indeed, in certain speech perception tests, native English speakers can perform as poorly as random guessing in distinguishing "th" and "f" because it's one of the most difficult contrasts in English.
That should be clear when you look at the spectrograms:
While "s" and "sh" have pretty clear differences in the amount of energy in the mid- to upper part of the spectrum, "th" and "f" are barely distinguishable, corroborating what we find in perception tests. The nature of some of these tests gives us insight into how this contrast is perceived.
First of all, people have done tests comparing purely auditory stimuli with auditory-plus-visual stimuli. You'll notice that although this is one of the most difficult perceptual distinctions in the language, it is also among the easiest visual distinctions. Both sounds are made at the front of the mouth, with the lips, which we can see, but in different positions. Indeed, seeing the lips accounts for about a 20 to 30 percent difference in performance, all other things being equal.
Second, people are particularly bad at this contrast when any noise is present or when they have any hearing loss. This is because the acoustic differences lie primarily in the upper part of the speech spectrum (see figure above), and the upper part of the spectrum is where noise and hearing loss are particularly problematic. So your typical elderly person with mild hearing loss will also perform at around guessing level for out-of-context "th" and "f."
Given those perceptual challenges, we English speakers clearly use an appreciable amount of context in differentiating these sounds.
Luckily (but not coincidentally), the English language facilitates that. The functional load of this contrast is relatively low compared with other contrasts, meaning that, aside from thin and fin, there aren't many words that critically rely on differentiating these sounds.
So, the answer to your question of how? Not all that well. And when we do, it's often due to context or visual cues. Otherwise, it's that small difference in the upper part of the speech spectrum, around 8 kHz, that serves as the differentiator.
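The acoustic point above, that the "th"/"f" difference comes down to a small amount of energy high in the spectrum, can be made concrete by measuring what fraction of a sound's energy falls inside a frequency band. The Python sketch below does this with a plain discrete Fourier transform. It is only a toy under stated assumptions: the signals are synthetic sine tones, not recorded fricatives, and the 32 kHz sample rate, the 6 to 10 kHz band placed around the 8 kHz figure, and the helper names (`band_energy_fraction`, `tone`) are hypothetical choices made for illustration.

```python
import math

# Hypothetical analysis settings (not taken from any study).
SAMPLE_RATE = 32000  # Hz; high enough to represent content around 8 kHz
N = 320              # samples per frame (10 ms), giving 100 Hz per DFT bin


def band_energy_fraction(samples, sample_rate, low_hz, high_hz):
    """Return the share of spectral energy between low_hz and high_hz (inclusive).

    Uses a direct O(N^2) DFT over the positive-frequency bins, which is
    fine for a single short frame like the ones below.
    """
    n = len(samples)
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2 + 1):  # skip the DC bin, keep positive frequencies
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(samples))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(samples))
        power = re * re + im * im
        freq = k * sample_rate / n  # center frequency of bin k in Hz
        total += power
        if low_hz <= freq <= high_hz:
            in_band += power
    return in_band / total if total else 0.0


def tone(freq_hz, amplitude):
    """Hypothetical synthetic test signal: one sine wave, N samples long."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(N)]


# Two toy "sounds" that share a 2 kHz component; only the second one
# also has weaker energy at 8 kHz.
low_only = tone(2000, 1.0)
with_high = [a + b for a, b in zip(tone(2000, 1.0), tone(8000, 0.5))]

print(round(band_energy_fraction(low_only, SAMPLE_RATE, 6000, 10000), 3))   # 0.0
print(round(band_energy_fraction(with_high, SAMPLE_RATE, 6000, 10000), 3))  # 0.2
```

Adding the weaker 8 kHz component moves the high-band share from essentially zero to 0.2, and that band-limited difference is the kind of cue that noise or high-frequency hearing loss can mask. A real analysis would use windowed, overlapping frames and a library FFT, as a spectrogram does; the hand-rolled DFT here just keeps the sketch dependency-free.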