The clip below from PBS demonstrates the McGurk Effect: When you hear a sound (like "ba") that conflicts with how you're seeing someone "produce" it (like "ga"), your mind tries to reconcile them by making you think you're hearing something more consistent with your visual input (like "da").
If you want to prove to yourself that both versions in the video above are the same sound, try rewatching the first part of the video with your eyes closed. Sure enough, you'll hear "ba ba ba" instead of "da da da."
For another example of the McGurk Effect, check out this clip from the BBC, which shows how "ba ba ba" can sound like "fa fa fa" when accompanied by the appropriate visuals. It also includes a brief interview with the psychologist Lawrence Rosenblum, who's been studying the McGurk Effect for decades.
The McGurk Effect is surprisingly robust: According to this list from Rosenblum, it persists with babies as young as 4 to 5 months, with speakers of every language that's been tested, when the audio and video are from people of different genders, when viewers don't realize they're looking at a face, when viewers touch a face instead of looking at it, and when the audio and video aren't precisely synched. It does work better with certain consonant pairs than others, and less well with vowels or non-speech sounds, such as plucking versus bowing sounds on a cello. But it even happens when the viewer knows perfectly well to expect it, as Rosenblum himself does!