It seems obvious, but it's actually much more difficult than it sounds (no pun intended). The human ear and brain are magically capable of separating out different harmonics. We can hear two distinctly different sounds, like the "crash" of a cymbal and the "jurrr" of an electric guitar chord, and they clearly sound different to us, even when heard simultaneously.
But that's because our brains are magic. Doing the same thing (separating two sounds) in software is extremely difficult. When you have just a single complex sound wave, it is very difficult to know which harmonics belong to which "sound", especially since you don't even know how many "sounds" are in the wave to begin with.
For example, that "jurrr" electric guitar sound probably has dozens of harmonics forming it, each of varying frequency and amplitude. You don't even want to know how many the cymbal crash has.
"Wave subtraction" really only works if you have the exact copy of the wave you want to remove. If the wave is even just a little off, subtraction won't work. You might only muffle the sound, or possibly even amplify or echo it depending on how far off you are.
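To make the "a little off" point concrete, here is a minimal sketch. It assumes a pure 440 Hz sine tone at an 8000 Hz sample rate (both numbers are arbitrary picks for illustration, and a real instrument is far messier than one sine). It subtracts an exact copy, a slightly shifted copy, and a copy shifted by half a cycle:

```python
import math

SAMPLE_RATE = 8000   # assumed sample rate, chosen only for this illustration
N = 800              # number of samples to generate

def sine(freq_hz, phase=0.0):
    """Generate N samples of a sine tone with the given phase offset (radians)."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE + phase)
            for i in range(N)]

def subtract(a, b):
    """Sample-by-sample wave subtraction: a - b."""
    return [x - y for x, y in zip(a, b)]

def peak(samples):
    """Largest absolute sample value, a rough stand-in for loudness."""
    return max(abs(s) for s in samples)

original = sine(440)

exact = subtract(original, sine(440))                          # exact copy
slightly_off = subtract(original, sine(440, phase=0.1))        # a little off
half_cycle_off = subtract(original, sine(440, phase=math.pi))  # way off

print(peak(exact))           # silence
print(peak(slightly_off))    # quieter than the original, but still audible
print(peak(half_cycle_off))  # louder than the original
```

Only the exact copy cancels completely. The slightly shifted copy leaves a quieter residue (the "muffle" case), and the half-cycle shift makes the tone louder instead of removing it.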
Case in point: have you ever seen any of those karaoke programs which try to remove vocals from songs? They all suck. Conceptually it's simple: find the vocals and subtract them out. But finding the vocals is crazy hard.
For an extremely contrived example...
Say you want to mix two strings of samples. One guitar and one cymbal:
guitar = 5, 3, 8
cymbal = 2, 9, 0
mix = 7, 12, 8 (the sum of both strings)
Simple enough, right?
Now try to do that in reverse. Here's another arbitrary mix:
mix = 15, 3, 7
Now try to subtract the guitar samples from that to leave only the cymbal samples. (Read: you can't without having an exact copy of the guitar wave)
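The same contrived example as a rough Python sketch. The candidate splits for the second mix are made up purely for illustration; the point is only that many different pairs add up to the same mix:

```python
def mix(a, b):
    """Mix two sample strings by adding them sample by sample."""
    return [x + y for x, y in zip(a, b)]

def unmix(mixed, known):
    """Remove a known wave from a mix. Only works with an exact copy."""
    return [m - k for m, k in zip(mixed, known)]

guitar = [5, 3, 8]
cymbal = [2, 9, 0]

mixed = mix(guitar, cymbal)        # [7, 12, 8]
recovered = unmix(mixed, guitar)   # [2, 9, 0], the cymbal, because guitar is exact

# Now the arbitrary mix, where we do NOT have the guitar samples.
mystery = [15, 3, 7]

# Hypothetical (guitar, cymbal) pairs, invented for this example.
candidates = [
    ([15, 3, 7], [0, 0, 0]),
    ([10, 1, 4], [5, 2, 3]),
    ([7, -2, 9], [8, 5, -2]),
]

# Every one of them produces the same mix, so the mix alone can't tell them apart.
all_valid = all(mix(g, c) == mystery for g, c in candidates)
print(all_valid)
```

Going forward there is one answer; going backward there are endless answers unless you already hold one of the two waves.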
|And another more complex idea: You know how movies play in multiple languages? Technically, if you have the sound of the movie in each language, you should theoretically be able to eliminate the vocals completely and isolate just the ambiance and music. I don't normally see this either, though.|
It wouldn't make sense to do this.
Let's say you have a dual audio film. English and Spanish.
The way most movies do it is they have 2 audio tracks: one for each language. Each audio track also has bgm and sound effects and all that other stuff.
What you're proposing would require 3 audio tracks:
1) English audio only
2) Spanish audio only
3) Bgm and sound effects
It would also require that at least two of these audio tracks be mixed during playback, which is extra (unnecessary) work.
It would be better/simpler to just have the BGM as its own audio track without having to mix them:
1) English audio (with bgm + sound effects)
2) Spanish audio (with bgm + sound effects)
3) Just bgm + sound effects
But not a whole lot of people want just bgm, so it's not worth it to put that on the movie.
Did I misunderstand the movie example? I read it again and it sounds like the movie would just have the normal 2 audio tracks (each language with combined bgm) and you could use those waves to "cancel out" the bgm?
That doesn't work either. This can be illustrated with some algebra:
let B = bgm only
let E = English + bgm
let P = Spanish + bgm
let e = english only
let p = spanish only
// we know the following:
e = E - B
p = P - B
B = E - e
B = P - p
// so given E and P, how can we find B?
// hint: you can't
// we can try cancelling out the BGM to isolate just the vocals:
nobgm = E - P   // = (e + B) - (p + B) = e - p, so the BGM cancels
e ?= nobgm - E  // you'd think subtracting again would leave just the English vocals
// but this doesn't work, because nobgm still contains the Spanish vocals (inverted):
nobgm - E = (E - P) - E = -P
-P != e
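For anyone who wants to check the algebra with numbers, here is a small sketch using made-up sample values (all the sample lists are invented for illustration). It also shows why B can't be found from E and P alone: two different sets of sources give exactly the same two tracks.

```python
def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

# Made-up source waves (assumed values, not real audio).
B = [4, 1, 6]   # bgm only
e = [2, 0, 3]   # English only
p = [1, 5, 2]   # Spanish only

E = add(e, B)   # English + bgm
P = add(p, B)   # Spanish + bgm

nobgm = sub(E, P)
print(nobgm == sub(e, p))   # True: the bgm cancelled, but both vocals remain

attempt = sub(nobgm, E)
print(attempt == [-x for x in P])   # True: we ended up with -P, not e

# A completely different set of sources that yields the very same E and P.
B2 = [3, 1, 5]
e2 = [3, 0, 4]
p2 = [2, 5, 3]
print(add(e2, B2) == E and add(p2, B2) == P)   # True: E and P can't pin down B
```

Two equations (E and P) with three unknowns (e, p, B) is the whole problem: there is always another bgm that fits.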