I don’t know whether this is the case today, but my understanding is that while movies don’t, sadly, ship with the voice track separate, it is apparently surprisingly common for the voice track in the mix to be mono, meaning it is identical in the left and right channels. That means that with some clever processing, it’s possible to mostly isolate the voice from background sound.
I’d bet that fancier processing could do a better job, and searching turns up stuff like https://vocalremover.org/ .
If one can isolate the voice, then one can boost its volume relative to other audio.
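As a rough illustration of that idea, here is a minimal sketch assuming the dialogue really is mixed identically into both channels of a plain stereo signal. It splits each sample pair into a "mid" part (what the channels share) and a "side" part (what differs), then boosts only the mid. The function name `boost_center` and the `gain` parameter are made up for this example, and anything else panned to the center (bass, some effects) gets boosted along with the voice, so this is nowhere near what a dedicated vocal-isolation tool does.

```python
def boost_center(left, right, gain=2.0):
    """Boost the center (mid) component of a stereo signal.

    left, right: equal-length sequences of float samples.
    gain: hypothetical linear gain applied to the mid component.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = (l + r) / 2.0   # content common to both channels (e.g. centered dialogue)
        side = (l - r) / 2.0  # content that differs between the channels
        # Recombine with the mid scaled up and the side left untouched.
        out_left.append(gain * mid + side)
        out_right.append(gain * mid - side)
    return out_left, out_right
```

With `gain=1.0` the output equals the input, which is a handy sanity check before trying larger values.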
Voice is a single source of audio (i.e. mono), so it’s typically recorded with mono mics. There can be several of them (a lav on the body and a boom just out of frame is the usual setup), but each source is mono and will indeed be mixed right down the middle unless the filmmakers want the viewer to understand where the speaker is. For example, imagine you’re watching the main character from behind while they’re in their room using the computer, and you hear their mom talk to them off camera. The voice comes from one side, and in the next shot you see that the mom was standing on that side.
Another method home sound systems use is to boost the EQ band where voice sits (somewhere in the midrange), or to apply compression to reduce the dynamic range. Sonos, for example, offers both of these options in their home theater line, calling them “speech enhancement” and “night mode” respectively.
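To make the compression part concrete, here is a minimal sketch of a static compressor, assuming samples are floats in the range -1.0 to 1.0. Anything louder than a threshold gets scaled down by a ratio, so loud explosions end up closer in level to quiet dialogue. The names `compress`, `threshold`, and `ratio` are hypothetical, and this is not how Sonos or any particular product implements its night mode; real compressors also smooth the gain over time with attack and release settings.

```python
def compress(samples, threshold=0.5, ratio=4.0):
    """Reduce dynamic range by attenuating samples above a threshold.

    samples: sequence of floats, assumed to lie in [-1.0, 1.0].
    threshold: level above which attenuation starts (hypothetical default).
    ratio: how strongly the excess above the threshold is reduced.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            # Keep the part below the threshold, shrink the excess by the ratio.
            level = threshold + (level - threshold) / ratio
        out.append(level if s >= 0 else -level)
    return out
```

After compressing, the whole signal can be turned up without the loud parts becoming overwhelming, which is what makes quiet dialogue easier to hear at low listening volumes.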