Combining audio signals

What's an algorithm that can be used to combine two or more audio signals (preferably an arbitrary number) into a single signal? I've searched on Google but I haven't found anything useful.

What I'm specifically trying to do is have my PC, laptop, Xbox and any other devices send audio output to my Raspberry Pi, so that I can plug my speakers or earphones into the Pi and hear all of the devices at once. I'm going to use 3.5mm-to-USB adaptors (I checked, they do exist). The Raspberry Pi only has two USB ports, but I can just use a hub to attach more. I'm writing a program that will run on the Raspberry Pi, process the audio from all of the USB devices and send it to the line-out, which will be connected to my speakers. I'm also planning to extend it to do other things. For example, I've written high-pass and low-pass filters, and I'll have a configuration file that tells the program what to do to each input signal before combining them and then what to do to the output signal.
Hmm, maybe something similar to CDMA? Are you wanting to be able to separate it back into the original signals?
ResidentBiscuit wrote:
maybe something similar to CDMA

Thanks, I'll take a look. [edit] I don't see how CDMA would achieve what I'm trying to do.

Are you wanting to be able to separate it back into the original signals

I can't imagine I would ever need to do that.
Ermm... I might be misunderstanding the question... but you join audio signals by adding them together:

for(int i = ...)
{
    sample_out[i] = sampleA_in[i] + sampleB_in[i];
}


With bounds checking/clipping of course... and maybe some normalization to reduce the amount of clipping but that probably wouldn't be necessary unless you are dealing with very loud signals.
@Disch
But if I add two samples together, won't I just get a higher pitch sample?
Thanks Grey Wolf, that looks perfect!
But if I add two samples together, won't I just get a higher pitch sample?


No. The sample value itself does not represent pitch.
Disch wrote:
No. The sample value itself does not represent pitch.

Oh, okay. I need to do more reading, clearly!
PCM basics:


Complex audio is a combination of "harmonics" (basically sine waves). Therefore the simplest tone is a single sine wave, because it consists of only one harmonic.



The pitch of a sine wave is determined by how many times the wave completes a full cycle per second, i.e. its frequency. For example, if the sine wave repeats ~262 times per second (262 Hz), the tone is middle C. Faster rates produce higher tones.

The volume of a sine wave is determined by its amplitude, or "height".

Basically the "taller" the wave, the louder it is... and the "wider" it is, the lower pitch it is.

Complex audio works on the same principle, only it is the combination of thousands/millions of different harmonics, all of varying length/pitch/amplitude.


This data is stored digitally as a sequence of "samples". Each sample represents one point on the sound wave, so the actual sound wave is constructed by stringing together many samples. One sample on its own is meaningless.

In other words... take a 2D grid where the X axis is time (the position of the sample) and the Y axis is the amplitude (the value of the sample). Plot out all your sample data on that grid. Then "connect the dots"... and that is your sound wave.

Example... if you have the following string of samples:

2, 2, 2, 2, -2, -2, -2, -2, 2, 2, 2, 2, -2, -2, -2, -2

That would look something like this:

+4
+3
+2  ****    ****
+1      |   |   |
 0  ....|...|...|...
-1      |   |   |
-2      ****    ****
-3
-4


To change the volume, you'd make the wave "taller":
4, 4, 4, 4, -4, -4, -4, -4, 4, 4, 4, 4, -4, -4, -4, -4
+4  ****    ****
+3      |   |   |
+2      |   |   |
+1      |   |   |
 0  ....|...|...|...
-1      |   |   |
-2      |   |   |
-3      |   |   |
-4      ****    ****

This would play the same tone, but it would be louder.


If you want to alter the pitch, you'd have to make it wider/narrower:
2, 2, -2, -2, 2, 2, -2, -2, 2, 2, -2, -2, 2, 2, -2, -2
+4
+3
+2  **  **  **  **
+1    | | | | | | |
 0  ..|.|.|.|.|.|.|.
-1    | | | | | | |
-2    **  **  **  **
-3
-4


This would be a higher pitch (specifically, one octave higher, since it's twice the frequency).



EDIT: I'm probably misusing the term "amplitude" here.... "magnitude" would be a better word.
@Disch
Thanks a lot for all that information. I didn't know what the samples actually represented: I thought it was a sequence of frequencies, not amplitudes, but I should have realised based on what I already know about acoustic waves. I understand now. Thanks.

On an unrelated note, my brain keeps telling me that sequence of 4s and -4s is in a maroon coloured font; at first I was wondering how you had coloured it since this forum has no [color] tags. Strange.
You are correct in using the term amplitude for the volume of the wave. Magnitude could work but is a bit ambiguous (I think so at least).
Amplitude is definitely the correct word; the amplitude of a wave is the distance from the equilibrium position to the highest peak. The distance from the highest peak to the lowest trough is the peak-to-peak amplitude (twice the amplitude for a symmetric wave).
I didn't know what the samples actually represented


As an oversimplified view of analog audio:

A microphone may contain a membrane made of some piezoelectric material. A piezoelectric material is one that generates a voltage when it is stressed or "pushed" (and hence also a current, though it is the voltage that is important). The induced voltage is proportional to the displacement of the membrane perpendicular to its plane. As air molecules strike the membrane (vibrating in the complex way Disch described above), they induce a fluctuating voltage in the wire from the microphone whose frequency content is ideally the same as that of the original sound wave. The signal is then amplified through various stages, ideally changing only the amplitude of the electrical signal and not its frequency content. The final stage outputs the signal to the speakers, which are driven to vibrate in response to the fluctuating voltage. The moving cones in the speakers force the air molecules to vibrate, reproducing the original sound wave.

What is sampled in digital audio is the fluctuating voltage in the electrical signal. The sampling rate is the number of samples per second (the CD rate is 44.1 kHz), and the sampling depth (or bit depth) is the number of bits used to represent each voltage value (8-bit, 16-bit, etc.). Hope this helps a little.