How to generate sounds of mixed waveforms

I have used the beep() and sound() functions, but both of them generate sounds of only a fixed frequency. Is there any way I can generate a sound of mixed waveforms?
I am currently using this code, but it isn't reliable at all times or for all frequencies:
for(i=0;i<500;i++)
{
   sound(500);
   delay(2);
   sound(250);
   delay(2);
}
Somebody help, please.
Why not use something like SFML: http://www.sfml-dev.org/

Then you can use all kinds of sounds: http://www.sfml-dev.org/tutorials/2.0/audio-sounds.php
Beep and sound are not designed for complex output. They are designed for quick and easy feedback -- like to report an error or abnormal behavior to the user.

Outputting complex audio is a significantly larger task.


The easiest way is to do something like Mats is suggesting. Rather than playing individual tones... simply generate a .wav or .ogg file that has the sounds you want to play... then get a lib to load and play it. SFML can do this... but comes bundled with a lot of other non-audio stuff that you may or may not want (though you can specifically use only the audio portion, as it's in a separate module).

Other libs which focus solely on audio output include BASS, and FMOD... the latter of which is pretty complicated... but the former of which isn't so bad.

Or you can use built-in functionality of the OS's API. On Windows there are functions to load and play wave files (PlaySound).. and I think there are even higher level functions which can playback compressed formats like ogg or mp3, but I'm not familiar with them.



If you want to generate raw waveforms tone by tone and do not want to simply play back a file... this can certainly be done (and I have done it many times in the past). I don't mind giving you a crash course on how to do it, but I don't want to spend the time explaining it if it's not what you're looking for. Also be warned that it's pretty involved and playing back an existing audio file is significantly easier.
I was able to generate sounds like those in Pac-Man, and it is just as you said, 'generating raw waveforms' by quickly switching between two frequencies. I definitely do not intend to play back an mp3 file (I saw the format of an mp3 file on Wikipedia and it's rather complicated). I'd like to have your crash course.
I definitely do not intend to play back an mp3 file (I saw the format of an mp3 file on Wikipedia and it's rather complicated).


Well... you would not have to deal with the mp3 file format. The lib takes care of that.

Typically playing an mp3 with an audio lib is a simple ~2 lines of code:

Foo myAudioFile = lib::LoadFile("audio.mp3");
myAudioFile.play();


There may be a few more lines involved with setting up the lib... but really it usually boils down to something that simple.

So I recommend reconsidering.


That said... I'm at work now so I can't dedicate the time to an explanation of generating raw PCM and streaming it. When I get home (6-7 hours from now) I'll try to remember to come back to this thread and give a proper response.
Here is a simple program I wrote for a C workshop I taught as a tutor last year. It just does sine waves, and doesn't play them, but writes them to a file. It depends on libsndfile.

For more interesting sounds, try generating sawtooth or square waves.

The buffer should be on the heap, but the students hadn't learned about dynamic memory allocation yet.

Also, you would probably want to make sure each added waveform ends with 0. You could just write each waveform to a temporary buffer, then work backwards zeroing out until you hit 0, then add that to the main buffer.

The program is intended to accept a text file piped in, containing the information describing a musical piece, similar to a MIDI file but much more basic. Each line has three whitespace-separated values: frequency, offset from the start of the file (in samples), and duration of the tone (in samples).

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <sndfile.h>

#define BUFFER_SIZE 1000000
#define SAMPLE_RATE 44100.0
#define PI 3.141592654

void writeAudioFile(double *buffer) {

    SF_INFO sndInfo;

    sndInfo.samplerate = SAMPLE_RATE;
    sndInfo.channels   = 1;
    sndInfo.frames     = BUFFER_SIZE;
    sndInfo.format     = SF_FORMAT_PCM_16 | SF_FORMAT_WAV;

    SNDFILE *outFile = sf_open("sin.wav", SFM_WRITE, &sndInfo);
    if (!outFile) {
        fprintf(stderr, "could not open sin.wav for writing\n");
        return;
    }

    sf_writef_double(outFile, buffer, BUFFER_SIZE);
    sf_close(outFile);
}

int main() {

    double buffer[BUFFER_SIZE];

    for (int s = 0; s < BUFFER_SIZE; ++s)
        buffer[s] = 0;

    // sine wave: y(t) = amplitude * sin(2 * PI * frequency * time), time = s / sample_rate

    /*
       A  : 440
       A# : 466.16
       B  : 493.88
       C  : 523.25
       C# : 554.37
       D  : 587.33
       D# : 622.25
       E  : 659.26
       F  : 698.46
       F# : 739.99
       G  : 783.99
       G# : 830.61
    */

    double frequency;
    int offset;
    int duration;

    printf("\nSine Wave Generator\n\n");
    while (scanf("%lf %d %d", &frequency, &offset, &duration) == 3) {

        double amplitude = 1.0;

        printf("%lf Hz\t%d offset\t%d duration\n", frequency, offset, duration);

        for (int s = offset; s < BUFFER_SIZE && s < offset + duration; ++s)
            buffer[s] += amplitude * sin( (2.0 * PI * frequency) * (s / SAMPLE_RATE));
    }

    double max = 1.0;
    for (int s = 0; s < BUFFER_SIZE; ++s)
        if (fabs(buffer[s]) > max)   // track the peak magnitude; negative peaks clip too
            max = fabs(buffer[s]);

    // normalize
    for (int s = 0; s < BUFFER_SIZE; ++s)
        buffer[s] /= max;

    writeAudioFile(buffer);
    return 0;
}
Basics

So I don't know how much you know about digital audio. I'm going to assume it's not a lot... so let's start with some basic terms.

Multi-purpose digital audio is typically stored in a format known as "PCM" (pulse-code modulation). I'll get into the details of it later, but basically you have a series of "samples" that join together to form an audible wave, much like how "pixels" join together to form a picture.

The "sample rate" term that is thrown around determines how many samples are used to represent 1 second worth of audio. So a 44100 Hz sample rate dictates that if you have 44100 samples, then you have 1 second worth of audio. For another image analogy... you can think of this as being similar to "megapixels".

The "bits per sample" (bps) term implies the size of each sample. Higher = better quality. For yet another image analogy, bps can be compared to bits per pixel (ie: 24-bit "true" color vs. 8-bit color with only 256 possible colors).

The key difference between the two... is that unlike image data, which is typically 2 dimensional (along both the X and Y axes, forming a rectangle)... PCM data is 1 dimensional (the only dimension is time).


PCM

So what exactly does each sample represent? You might think that the sample data contains things like tone and volume... but you'd be wrong! Or at least you would if that's what you're thinking.

Samples are just evenly spaced 'snapshots' (or... "samples"... the name is very apt) of a sound wave taken at different points in time. Each sample is just a numerical value with an upper bound and a lower bound, usually determined by the bps. For example, a 16-bit signed sample has a range of -32768 (low) to 32767 (high). A single sample would just be a number somewhere in that range.

For example... let's say we have the below series of samples:

5, 7, 8, 8, 9, 9, 8, 8, 7, 6, 4, 3, 2, 1, 1, 0, 0, 1, 1, 2, 3


If you take each of these samples and plot them on a grid... where the sample value is the Y axis and the X axis represents time (or the position of the sample)... you can just 'connect the dots' to map out a sound wave. In this case... it's a very crude imitation of a sine wave:

 9  |        **
 8  |      **  **
 7  |     *      *
 6  |             *
 5  |    *                    
 4  |              *          
 3  |               *        *
 2  |                *      *
 1  |                 **  **
 0  |                   **



This sound wave is ultimately fed to your speakers, and is basically the pattern in which they will vibrate to produce the sound you hear.

"taller" waves are louder. "wider" waves are lower pitch. (This is a very simplistic and basic way to look at it, but it'll suffice for your purposes).



So how do you generate this audio?

htirwin shows how to generate a basic sine wave... so that's a good code example to start with. But let's try to explain what's going on behind it.

The most fundamental type of sound wave is a sine wave. There are reasons for this, but I only vaguely understand them myself. A sine wave has no edges (it is perfectly round) and therefore is the 'softest' and 'least complex' sound wave possible. So that's typically where people start. You'll find that 'sharper' shaped waves tend to sound 'rough', whereas rounded ones tend to sound 'smooth'.

The "pitch" of a generated tone is measured in Hertz... or "times per second". The above illustrated pseudo-sine wave would be one cycle of that sound wave. You can repeat the pattern again and again:

 9  |        **                   **                
 8  |      **  **               **  **              
 7  |     *      *             *      *             
 6  |             *                    *            
 5  |    *                    *                    
 4  |              *                    *           
 3  |               *        *           *        * 
 2  |                *      *             *      *  
 1  |                 **  **               **  **   
 0  |                   **                   **     


The number of times you complete the full pattern in one second determines the "frequency" or "pitch" of the generated tone. If you complete this pattern 440 times in 1 second... then that produces a 440 Hz tone (concert A).


Remember that samples are a function of time... so if you have a sequence that is the right width to play at 440 Hz but you do not loop it... then it will only play for a fraction of a second before stopping.

In that same vein... you can't "hold" a note simply by outputting the same sample over and over. By doing that, you'll "flatline" your output and will not have motion. No motion = speakers stay still = no audio.



Mixing multiple sounds together

Broadly speaking, there are 2 ways to do this: In software and in hardware.

Hardware mixing is pretty easy. You just open two audio buffers (using whatever audio API you're using) and give it two independent sound waves. The sound card (hardware) will do the work of combining them together so that the user hears both of them at the same time.

Software mixing is when you combine the sound waves together into one complex wave before sending it to the sound card. Naive software mixing is incredibly simple. To do it, you just add each wave's sample together. Really. That's it:

// combine wave1 and wave2 into a single 'outputwave'
for( ... each sample in the waves ... )
{
   outputwave[i] = wave1[i] + wave2[i];
}


Now of course there are some caveats.
- both sound waves must be the same samplerate
- you have to be careful not to exceed the min/max allowed values for any given sample.


But whether you want to do software or hardware mixing depends on what you're doing.


Streaming audio over time

-- I'll get into this more tomorrow if there's still interest in the topic. But right now I'm tired and exhausted. And I'm not even sure if this info is going to be used or not.


But whatever. Let me know if you're still interested and I can keep going.
You have probably explained things I already know, like the sampling frequency, sine waves, etc. I mentioned that I have already read about the mp3 format on the wiki (which covered almost everything). But I appreciate the effort. My code (at the very beginning of this thread) produces a sound which is formed by the superposition of 2 frequencies (this is what happens in real-life audio). I am still interested in producing some sounds. Or do I have to only play a .wav file? I am still interested.
Err....

My code [snip] produces a sound [snip]. I am still interested in producing some sounds.


If you're already producing sounds... then what exactly is your question?
Interesting post Disch!
The sounds produced by beep and sound seem to be more electronic and digital. I am interested in 'how audio is streamed over time' (just where you left off). I mean, there must be something that the library and the OS itself might be using to turn on the speaker at a certain frequency.
the sounds produced by beep and sound seem to be more electronic and digital.


They're probably producing different waves than a basic sine wave.

Instead of a sine wave then... experiment with other waveforms. Here are some other basic waveforms you can tinker with:

Triangle:
=====================


 9  |        *
 8  |       * *
 7  |      *   *
 6  |     *     *                    
 5  |    *       *                             
 4  |             *       *
 3  |              *     *
 2  |               *   *
 1  |                * *
 0  |                 *
 
 
Sawtooth:
=====================

 9  |    *         *
 8  |     *        |*
 7  |      *       | *
 6  |       *      |  *           
 5  |        *     |   *                     
 4  |         *    |    *
 3  |          *   |     *
 2  |           *  |      *
 1  |            * |       *
 0  |             *         *
 
 
50% Pulse  (Square):
=====================
 9  |   ********        ********
 8  |           |       |       |
 7  |           |       |       |
 6  |           |       |       |
 5  |           |       |       |
 4  |           |       |       |
 3  |           |       |       |
 2  |           |       |       |
 1  |           |       |       |
 0  |           ********        ********
 
 
25% Pulse:
=====================
 9  |   ****            ****
 8  |       |           |   |
 7  |       |           |   |
 6  |       |           |   |
 5  |       |           |   |
 4  |       |           |   |
 3  |       |           |   |
 2  |       |           |   |
 1  |       |           |   |
 0  |       ************    ************


These are the waveforms typically used by "chiptune" and retro video game systems like the NES, Gameboy, etc.


I am interested in 'how audio is streamed over time


Judging from the rest of your post, I'm not so sure this is really what you're interested in. Or at least maybe you think I mean something other than what I do.

For now... don't worry about streaming audio... but instead just generate a wave file and play that in whatever media player you want (similar to what htirwin did in his example). You seem to be more interested in generating the audio than actually playing it.


I mean there must be something that the library and os itself might be using to turn on the speaker to a certain frequency.


You don't "turn the speaker to a frequency". Reread my earlier post a bit more closely. The information the speaker gets is just sample data. Each sample is a "position" that the speaker moves to. Then 1/44100th of a second later, it moves to the position of the next sample... then after another 1/44100th of a second it goes to the next position, etc, etc. That rapid movement is what generates audio.

The "sound"/instrument/whatever, the frequency, the volume.... none of that is set by a single sample. But rather is set by the wave as a whole.
I need to be a bit clear. The most used sampling frequency is 44100 Hz, which corresponds to about 0.0000227 seconds per sample. If you look at my code, the sampling frequency is only 500 Hz, i.e. 0.002 seconds. This is what results in the unrealistic audio. But delay() doesn't allow me to give a value less than 1 millisecond, like say 0.2 ms. Is the solution related to audio, or is it completely different?
No, see... nowhere in your original code are you specifying the sample rate. You are specifying the tone. There are two different "Hz" here.. and I think you are mixing them up.

The sample rate in Hz is the number of samples per second.

The tone in Hz is the number of repetitions per second.

So a 400 Hz tone is some waveform which repeats itself 400 times every second. But the waveform itself must consist of several samples.



I think you are still misunderstanding the big picture here. You seem to have it in your head that you can just call a function to play a specific tone. While this is true for the beep/sound functions... this is not how most audio output works, and you need to put this out of your head and move past it.

beep/sound are generating the waveform internally. If you are doing your own audio generation.. then YOU have to generate the waveform. This means you do not simply tell it "play a tone that is 400 Hz".. it means you have to construct a PCM wave which contains the samples necessary to play a 400 Hz tone. It's significantly more complicated.
The upper frequency range of an audio wave that can be reproduced digitally is dependent on the sample rate.

You can reproduce a wave with a frequency of up to half the sample rate (the Nyquist frequency). So for 44100, you can get up to 22050 Hz sound waves. This is a bit past the normal human hearing range of 20000.

With a sample rate of 500 Hz, you would only be able to produce up to 250 Hz, which is fairly low.

Edit
I'm not sure this applies to what you're doing.

@Disch
You should write an article.
*seconds the idea of Disch doing an article on this*
@letscode

If you are still here, here's an example program I slapped together to give you sort of an idea of what Disch and the others are talking about:

#define NOMINMAX
#include <windows.h>
#include <iostream>
#include <cmath>
#include <cstring>
#include <limits>
#include <fstream>
#include <string>

typedef signed short BitDepth;     // 16-bit audio
typedef unsigned long long QWORD;

const double pi = 3.141592653589;
const DWORD samplerate = 44100;
const WORD channels = 2;
const unsigned short SOUND_DURATION = 1;  // 1 second, for example
const QWORD NUM_SAMPLES = SOUND_DURATION * samplerate * channels;

void sineWave(BitDepth buffer[], double freq) {
	BitDepth amplitude = std::numeric_limits<BitDepth>::max() * 0.5;
	QWORD c = 0;
	double d = samplerate / freq;          // samples per cycle
	for (QWORD i = 0; i < NUM_SAMPLES; i += channels) {
		double deg = 360.0 / d;
		// write the same sample to both channels
		buffer[i] = buffer[i + (channels - 1)] = sin((c++ * deg) * pi / 180) * amplitude;
	}
}

template<typename T> void write(std::ofstream& stream, const T& t) {
	stream.write((const char*)&t, sizeof(T));
}

void writeWaveFile(const char* filename, BitDepth* buffer) {
	std::ofstream stream(filename, std::ios::binary);

	stream.write("RIFF", 4);
	::write<int>(stream, 36 + (NUM_SAMPLES * sizeof(BitDepth)));
	stream.write("WAVEfmt ", 8);
	::write<int>(stream, 16);                                       // fmt chunk size
	::write<short>(stream, 1);                                      // PCM format
	::write<unsigned short>(stream, channels);
	::write<int>(stream, samplerate);
	::write<int>(stream, samplerate * channels * sizeof(BitDepth)); // byte rate
	::write<short>(stream, channels * sizeof(BitDepth));            // block align
	::write<short>(stream, sizeof(BitDepth) * 8);                   // bits per sample
	stream.write("data", 4);
	::write<int>(stream, NUM_SAMPLES * sizeof(BitDepth));
	stream.write((const char*)&buffer[0], NUM_SAMPLES * sizeof(BitDepth));
	stream.close();
}

int main(int argc, char** argv) {
	SetConsoleTitleA("PCM Audio Example");

	std::string filename = "sine";

	BitDepth* buffer = new BitDepth[NUM_SAMPLES];
	memset(buffer, 0, NUM_SAMPLES * sizeof(BitDepth));

	sineWave(buffer, 440.0);

	writeWaveFile((filename + ".wav").c_str(), buffer);
	delete[] buffer;

	std::cout << filename << ".wav written!" << std::endl;
	std::cin.get();
	return 0;
}


This program writes an RIFF .wav file with PCM sine wave data.
You can make this prettier in a lot of ways, like with RIFF header structs and less hardcoding. It also only supports writing little-endian, canonical .wav files with no additional chunks. Here's a good reference:
https://ccrma.stanford.edu/courses/422/projects/WaveFormat/
@letscode
I think what Disch, and others as well, are getting at is that it's technically possible to do what you're trying to do, which is to write a function that can generate frequencies of sound that you specify. While this is possible, you have to know a LOT about trigonometry, wave propagation, calculus, C++, your computer's specific hardware, and your operating system's API and background functions, plus how to integrate your program into the timing cycle of your processor.

...or you can just use already built libraries to do all of that for you, which is what they're trying to get you to do.
(Excuse the poor formatting of the post, I am typing on my cellphone.) As I understand it, there are 2 ways to create a sound. First way: if you know about the superposition principle, it says that each of two or more waves produces its own displacement irrespective of the other waves. Conversely, any waveform can be resolved into pure sine waves of different amplitudes and frequencies. This is what I tried to do by switching between 2 frequencies instead of playing them simultaneously. But here's where my code fails: it keeps on switching between the two sounds quickly. The second way (which I now understand is what you were trying to say) is to output the amplitude (or the samples) itself with respect to time. Is my concept right now, so that we can proceed? Does channel mean the number of input waves? Also, @xismn and htirwin, once we have written the samples to our .wav file, can a standard .wav player or library play it, because it is missing some things like headers and other chunks as you had said?