Audio Sampling in C++

I'm trying to read from a WAVE file for a school assignment. The project is to sample from the file in order to get enough points to perform a DFT and isolate each of the frequencies in the sample. I can use a library for audio sampling, but I would rather not, just out of personal interest. Can anyone point me in the direction of some source code for doing this? I have already figured out the DFT, I just need the sampling.

A couple of points:
- I'm using Windows, but can change to Linux if necessary
- I don't need to use WAVE files, but they are what my teacher recommended
- I can use a library if necessary or if this will take a very long time

Thanks
If I understand your problem, you just need to parse WAV files. As https://en.wikipedia.org/wiki/WAV says, they contain headers and data in Linear Pulse-Code Modulation format, which is just an array of audio samples. These are the samples you need for DFT.
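To make the "headers plus an array of samples" layout concrete, here is a minimal sketch that picks the interesting fields out of the first 44 bytes of a file. It assumes the common "canonical" layout where "RIFF", "WAVE", a 16-byte "fmt " chunk and "data" follow each other directly, which is not guaranteed for every WAV file. WavInfo and parseCanonicalHeader are made-up names for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Fields of interest from a canonical 44-byte PCM WAV header.
struct WavInfo {
    bool     valid = false;
    uint16_t channels = 0;
    uint32_t sampleRate = 0;
    uint16_t bitsPerSample = 0;
    uint32_t dataBytes = 0;   // size of the sample data that follows the header
};

// Decode little-endian integers byte by byte, so the result does not
// depend on the byte order of the machine running the code.
static uint16_t le16(const unsigned char* p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}
static uint32_t le32(const unsigned char* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical helper: assumes "RIFF", "WAVE", "fmt " (16 bytes) and "data"
// appear back to back, which is common but not guaranteed.
WavInfo parseCanonicalHeader(const std::vector<unsigned char>& h) {
    WavInfo info;
    if (h.size() < 44) return info;
    if (std::memcmp(&h[0],  "RIFF", 4) != 0) return info;
    if (std::memcmp(&h[8],  "WAVE", 4) != 0) return info;
    if (std::memcmp(&h[12], "fmt ", 4) != 0) return info;
    if (std::memcmp(&h[36], "data", 4) != 0) return info;
    info.channels      = le16(&h[22]);
    info.sampleRate    = le32(&h[24]);
    info.bitsPerSample = le16(&h[34]);
    info.dataBytes     = le32(&h[40]);
    info.valid = true;
    return info;
}
```

Everything after byte 44 is then the sample array itself: dataBytes bytes of PCM, two bytes per sample for 16-bit mono.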
So I've been able to read the header of the wave file, but not much beyond that.
The file I'm using is 30 seconds of the 'A' note (440 Hz), and the header reading is fine. This is what I get from reading the header:

chunk size 16
format type 1
channels 1
sample rate 44100
average bytes per second 88200
bytes per sample 2
bits per sample 16

which, to my inexperienced eyes, looks okay. What confuses me is what comes next.
I put the following code snippet in an infinite loop, hoping to read the data that comes next.
int data; // unsure of type
fread(&data, bytes_per_sample, data_size / bytes_per_sample, file_pointer);
std::cout << data << std::endl;
Sleep(100);


What I get is the number 107675648 repeated over and over. I tried changing the type to long long, and then the number I get is 1380085881232883712 repeated. It's in an infinite loop, so I get why it's repeating, but what is the significance of those numbers? Am I doing something wrong?
Thanks.

EDIT:

I've also tried an 880 Hz file (one octave higher) that is otherwise the same. The numbers I get are 215023615 and 2710913431087677439, if that helps.
Instead of int which is typically 32 bits, or long long, typically 64 bits, try short, which should be 16 bits.

More dependably, include the C++ header <cstdint>
and use type int16_t which will be a 16-bit integer.
http://www.cplusplus.com/reference/cstdint/
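As for what the mystery numbers mean: a 32-bit int filled from a 16-bit PCM stream ends up holding two consecutive samples side by side. A small sketch to pull them apart, assuming a little-endian machine (as on a typical Windows PC); splitSamples is a made-up helper name.

```cpp
#include <cstdint>
#include <cstring>
#include <utility>

// Reinterpret a 16-bit pattern as a signed sample without relying on
// implementation-defined narrowing conversions.
static int16_t toSigned16(uint16_t bits) {
    int16_t s;
    std::memcpy(&s, &bits, sizeof s);
    return s;
}

// Split a 32-bit value that was filled by reading 4 bytes of little-endian
// 16-bit PCM into the two samples it actually contains.
std::pair<int16_t, int16_t> splitSamples(uint32_t word) {
    uint16_t first  = static_cast<uint16_t>(word & 0xFFFFu);         // earlier sample
    uint16_t second = static_cast<uint16_t>((word >> 16) & 0xFFFFu); // later sample
    return { toSigned16(first), toSigned16(second) };
}
```

With this, splitSamples(107675648) gives the samples 0 and 1643, and splitSamples(215023615) gives -1 and 3280. Both look like the start of a sine wave, and the 880 Hz file rises about twice as fast as the 440 Hz one, which is what you would expect.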

When using the fread() function, check the return value.

int16_t data; 
size_t elements_read = fread(&data, bytes_per_sample, 1, file_pointer);
std::cout << "requested: " << 1 << "  actually read: " << elements_read << std::endl;
std::cout << data << std::endl;

http://www.cplusplus.com/reference/cstdio/fread/
Return Value
The total number of elements successfully read is returned.
If this number differs from the count parameter, either a reading error occurred or the end-of-file was reached while reading.
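The same idea extends to reading many samples at once, as long as the buffer is big enough for the count you pass to fread(). A minimal sketch, assuming the file position is already at the start of the sample data; readSamples is a hypothetical helper, not something from the original code.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Read up to `wanted` 16-bit samples from an already-positioned FILE*.
// The vector is sized first, so fread never writes past the buffer, and
// it is then trimmed to the number of elements fread reports.
std::vector<int16_t> readSamples(std::FILE* fp, std::size_t wanted) {
    std::vector<int16_t> buffer(wanted);
    std::size_t got = 0;
    if (wanted > 0) {
        got = std::fread(buffer.data(), sizeof(int16_t), wanted, fp);
    }
    buffer.resize(got);   // got < wanted means end-of-file or a read error
    return buffer;
}
```

If the returned vector is shorter than requested, feof() and ferror() on the same FILE* tell you which of the two cases happened.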



edit: The code snippet is ambiguous. What is data_size here? It could be the size in bytes of the data chunk in the file. Or it could be the size in bytes of the variable int data;
So I've tried a few things. Chervil is right about several points; using int16_t gave me some (somewhat) sensible results. Basically, after changing that snippet to
bool keep_going = true;

while (keep_going)
{
	int16_t data;
	int read_size = chunk_size / bits_per_sample;  
	size_t elements_read = fread(&data, bytes_per_sample, read_size, file_pointer);
	cout << read_size << " " << elements_read << "\t" << (int)data << endl;
	if ((int)elements_read == 0) keep_going = false;		
}


So I get 19 pieces of data with this, and that's wrong. I should be getting a lot more (it's a 30-second clip with a sampling rate of 44100 Hz),
so I'm pretty sure my issue is with the fread() function, specifically the element size and number of elements arguments (I currently have bytes_per_sample and read_size respectively). Can anyone provide insight on the arguments I should use? The variables I'm pulling from the wave header are:
chunk size
sample rate
average bytes per second
bytes per sample
bits per sample
data size
Thanks
I suggested you use 1 as the number of elements to read at a time, because that's all the space you have in a single int16_t data;

I don't know what is the resulting value of
 
int read_size = chunk_size / bits_per_sample; 

but if it is any bigger than 1, the rest of the data being read must overwrite (corrupt) some other area of memory. The likelihood of a program crash or unpredictable behaviour seems great.
Yeah, so I've basically tried exactly that, with my fread() reading 2 bytes at a time (16 bits, should be fine), and I get 19 pieces of data, which doesn't make sense. Turns out that chunk size and bits per sample are basically the same thing. I should be getting upwards of 1 million samples, so this doesn't make a whole lot of sense to me.

30 seconds of audio * 44100 samples / sec = ~1.3 Million samples
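That estimate can be written down as a quick compile-time check. A small sketch; the function names are made up, and the numbers are the ones from the header printed earlier in the thread (mono, 16-bit, 44100 Hz, 30 seconds).

```cpp
#include <cstdint>

// Number of sample frames in a clip: duration times sample rate.
constexpr uint64_t expectedFrames(uint32_t seconds, uint32_t sampleRate) {
    return static_cast<uint64_t>(seconds) * sampleRate;
}

// Size in bytes of the PCM data for those frames.
constexpr uint64_t expectedDataBytes(uint32_t seconds, uint32_t sampleRate,
                                     uint16_t channels, uint16_t bitsPerSample) {
    return expectedFrames(seconds, sampleRate) * channels * (bitsPerSample / 8);
}

static_assert(expectedFrames(30, 44100) == 1323000, "30 s at 44100 Hz");
static_assert(expectedDataBytes(30, 44100, 1, 16) == 2646000, "mono, 16-bit");
```

So a correct reader should report a data size close to 2646000 bytes and about 1.3 million samples, not 19.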
"Turns out that chunk size and bits per sample are basically the same thing"
Which chunk is that?
When I looked at a simple WAV file, there were at least 3 chunks: the "RIFF" chunk, the format chunk and the data chunk. That last one should contain the samples, and its size should be something like 2646000 bytes for a 30-second mono 16-bit audio clip at a 44100 Hz sample rate.
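Since a file can contain other chunks between the format chunk and the data chunk, one option is to walk the chunk list instead of assuming fixed positions. A rough sketch over a file already loaded into memory; ChunkRef and findChunk are hypothetical names, and the code assumes a well-formed RIFF file.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct ChunkRef {
    bool        found = false;
    std::size_t offset = 0;  // offset of the chunk payload within the buffer
    uint32_t    size = 0;    // payload size in bytes, as stored in the file
};

static uint32_t le32(const unsigned char* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Walk the chunks that follow the 12-byte "RIFF....WAVE" preamble and
// return the first one whose 4-character id matches `id`.
ChunkRef findChunk(const std::vector<unsigned char>& file, const char* id) {
    ChunkRef ref;
    if (file.size() < 12 || std::memcmp(&file[0], "RIFF", 4) != 0 ||
        std::memcmp(&file[8], "WAVE", 4) != 0) {
        return ref;
    }
    std::size_t pos = 12;
    while (pos + 8 <= file.size()) {
        const uint32_t size = le32(&file[pos + 4]);
        if (std::memcmp(&file[pos], id, 4) == 0) {
            ref.found = true;
            ref.offset = pos + 8;
            ref.size = size;
            return ref;
        }
        // Chunks are padded to an even number of bytes.
        pos += 8 + static_cast<std::size_t>(size) + (size & 1u);
    }
    return ref;
}
```

Calling findChunk(bytes, "data") then gives the offset and size of the samples even if, say, a "LIST" chunk sits in front of them.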


It would help if you showed the complete code, otherwise there are too many holes where things have to be guessed at.

So here's the code. I'm only trying to get the job done, so the code's a bit messy:

// Reads Header and Data from WAVE file
// 2017-01-03

/*
 *	wave file header format:
 *		(4 byte) type = "RIFF"
 *		(4 byte) size (data, not file)
 *		(4 byte) type = "WAVE"
 *		(4 byte) type = "fmt "
 *
 *		(32 bit) chunk_size
 *		(16 bit) format_type = 1 (1 = PCM, uncompressed)
 *		(16 bit) number_of_channels (1 = mono, 2 = stereo, can go on to several channels)
 *		(32 bit) sample_rate (samples per second)
 *		(32 bit) average_bytes_per_second
 *		(16 bit) bytes_per_sample
 *		(16 bit) bits_per_sample (bytes_per_sample * 8) (quality of sound, 8 bit, 16 bit)
 *
 *		(4 byte) data = "data"
 *		(32 bit) size (size of data)
 *
 */

#include "stdafx.h" // Visual Studio only
#include <iostream>

using namespace std;

int main()
{
	// create a file pointer
	FILE *file_pointer = NULL;

	// open file
	fopen_s(&file_pointer, "440.wav", "r");

	if (!file_pointer)
	{
		cout << "Error: missing / bad file" << endl;
		return 1;
	}

	// declare variables
	// TODO - make wave header class
	char type_1[4]; // "RIFF"
	char type_2[4]; // "WAVE"
	char type_3[4]; // "fmt "
	DWORD size, chunk_size;
	short format_type, channels;
	DWORD sample_rate, avg_bytes_per_sec;
	short bytes_per_sample, bits_per_sample;
	char type_4[4]; // "data"
	DWORD data_size;

	// check for RIFF format
	fread(type_1, sizeof(char), 4, file_pointer);
	if (!strcmp(type_1, "RIFF"))
	{
		cout << "Error: not RIFF format";
		return 1;
	}

	// data size
	fread(&size, sizeof(DWORD), 1, file_pointer);

	// check for WAVE format
	fread(type_2, sizeof(char), 4, file_pointer);
	if (!strcmp(type_2, "WAVE"))
	{
		cout << "Error: not WAVE format";
		return 1;
	}

	// check for "fmt " string
	fread(type_3, sizeof(char), 4, file_pointer);
	if (!strcmp(type_3, "fmt "))
	{
		cout << "Error: missing format string";
		return 1;
	}

	// chunk size
	fread(&chunk_size, sizeof(DWORD), 1, file_pointer);

	// format type
	fread(&format_type, sizeof(short), 1, file_pointer);

	// number of channels
	fread(&channels, sizeof(short), 1, file_pointer);

	// sample rate
	fread(&sample_rate, sizeof(DWORD), 1, file_pointer);

	// average bytes per second
	fread(&avg_bytes_per_sec, sizeof(DWORD), 1, file_pointer);

	// bytes per sample
	fread(&bytes_per_sample, sizeof(short), 1, file_pointer);

	// bits per sample
	fread(&bits_per_sample, sizeof(short), 1, file_pointer);

	// check for "data" string
	fread(type_4, sizeof(char), 4, file_pointer);
	if (!strcmp(type_4, "data"))
	{
		cout << "Error: missing data string";
		return 1;
	}

	// data size
	fread(&data_size, sizeof(DWORD), 1, file_pointer);

	// output header info
	cout << "chunk size " << chunk_size << endl;
	cout << "format type " << format_type << endl;
	cout << "channels " << channels << endl;
	cout << "sample rate " << sample_rate << endl;
	cout << "average bytes per second " << avg_bytes_per_sec << endl;
	cout << "bytes per sample " << bytes_per_sample << endl;
	cout << "bits per sample " << bits_per_sample << endl;
	cout << "data size " << data_size << endl;

	Sleep(100);

	// attempt data collection for wave file
	cout << "attempting data collection" << endl;

	bool keep_going = true;

	while (keep_going)
	{
		int16_t data;
		size_t elements_read = fread(&data, sizeof(int16_t), 1, file_pointer);
		cout << read_size << " " << elements_read << "\t";
		
		if ((int)elements_read == 0)
		{
			keep_going = false;
			continue;
		}
		cout << (int)data << endl;
	}

	fclose(file_pointer);
	return 0;
}


The "440.wav" file in the code is just 30 seconds of a 440 Hz sine wave.
Thanks for posting the code.

I've tried to run it, think I got it figured out now. There are a number of problems in the use of strcmp(). Remember c-strings must have a null terminator, so you need to allocate an array of 5 characters and set the last one to zero.
 
	char type_1[5] = { 0 }; // "RIFF" 


	if ( strcmp(type_1, "RIFF")  )  // remove the ! not operator
	{
		cout << "Error: not RIFF format";
		return 1;
	}
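An alternative that avoids the terminator issue altogether is to compare exactly four bytes with memcmp(). A minimal sketch; tagEquals is a made-up helper name, not part of the original program.

```cpp
#include <cstring>

// Compare a 4-byte RIFF tag, read straight from the file into a char[4],
// against an expected id. memcmp looks at exactly four bytes, so the
// buffer does not need a null terminator.
bool tagEquals(const char (&tag)[4], const char* expected) {
    return std::strlen(expected) == 4 && std::memcmp(tag, expected, 4) == 0;
}
```

With that, the check reads `if (!tagEquals(type_1, "RIFF")) { /* error */ }`, and the `!` goes back to meaning what it looks like it means.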


The other problem is at line 35, the fopen_s() call:
 
    fopen_s(&file_pointer, "440.wav", "rb");  // open in binary mode 
Thanks a lot! The binary mode was what screwed me up. The strcmp() worked fine without the char type_1[5] = { 0 }; bit; when I changed the code it didn't work. I changed the "r" to "rb" and it looks like it's working (I need to test the data to make sure it's right).
Thanks, that really helps!
"The strcmp() worked fine without the char type_1[5] = { 0 }; bit"
No, it didn't.

What actually happened was instead of stopping the compare after the first four characters, strcmp() carried on searching for a null terminator, until it eventually stopped and reported that the strings were different (not least because one was longer than the other).

"when I changed the code it didn't work."
Yes, because strcmp found that the strings were the same and returned a value of 0 as expected. After applying the ! (logical NOT operator) the result changed from false to true, hence the error code was executed and the program terminated.

Below: using a vector to read the entire audio sample data in one go.

// Reads Header and Data from WAVE file
// 2017-01-03

/*
 *  wave file header format:
 *      (4 byte) type = "RIFF"
 *      (4 byte) size (data, not file)
 *      (4 byte) type = "WAVE"
 *      (4 byte) type = "fmt "
 *
 *      (32 bit) chunk_size
 *      (16 bit) format_type = 1 (1 = PCM, uncompressed)
 *      (16 bit) number_of_channels (1 = mono, 2 = stereo, can go on to several channels)
 *      (32 bit) sample_rate (samples per second)
 *      (32 bit) average_bytes_per_second
 *      (16 bit) bytes_per_sample
 *      (16 bit) bits_per_sample (bytes_per_sample * 8) (quality of sound, 8 bit, 16 bit)
 *
 *      (4 byte) data = "data"
 *      (32 bit) size (size of data)
 *
 */

//#include "stdafx.h" // Visual Studio only
#include <iostream>
#include <iomanip>
#include <cstdio>
#include <cstring>   // strcmp
#include <vector>
#include <windows.h> // DWORD, Sleep

using namespace std;


void fopen_s(FILE ** file_pointer, const char * fname, const char * mode)
{
    *file_pointer = fopen(fname, mode);
}

int main()
{
    // create a file pointer
    FILE *file_pointer = NULL;

    // open file
    fopen_s(&file_pointer, "440.wav", "rb"); // binary mode

    if (!file_pointer)
    {
        cout << "Error: missing / bad file" << endl;
        return 1;
    }

    // declare variables
    // TODO - make wave header class
    char type_1[5] = { 0 }; // "RIFF"
    char type_2[5] = { 0 }; // "WAVE"
    char type_3[5] = { 0 }; // "fmt "
    DWORD size, chunk_size;
    short format_type, channels;
    DWORD sample_rate, avg_bytes_per_sec;
    short bytes_per_sample, bits_per_sample;
    char type_4[5] = { 0 }; // "data"
    DWORD data_size;

    // check for RIFF format
    fread(type_1, sizeof(char), 4, file_pointer);
    if (strcmp(type_1, "RIFF"))
    {
        cout << "Error: not RIFF format";
        return 1;
    }

    // data size
    fread(&size, sizeof(DWORD), 1, file_pointer);

    // check for WAVE format
    fread(type_2, sizeof(char), 4, file_pointer);
    if (strcmp(type_2, "WAVE"))
    {
        cout << "Error: not WAVE format";
        return 1;
    }

    // check for "fmt " string
    fread(type_3, sizeof(char), 4, file_pointer);
    if (strcmp(type_3, "fmt "))
    {
        cout << "Error: missing format string";
        return 1;
    }

    // chunk size
    fread(&chunk_size, sizeof(DWORD), 1, file_pointer);

    // format type
    fread(&format_type, sizeof(short), 1, file_pointer);

    // number of channels
    fread(&channels, sizeof(short), 1, file_pointer);

    // sample rate
    fread(&sample_rate, sizeof(DWORD), 1, file_pointer);

    // average bytes per second
    fread(&avg_bytes_per_sec, sizeof(DWORD), 1, file_pointer);

    // bytes per sample
    fread(&bytes_per_sample, sizeof(short), 1, file_pointer);

    // bits per sample
    fread(&bits_per_sample, sizeof(short), 1, file_pointer);

    // check for "data" string
    fread(type_4, sizeof(char), 4, file_pointer);
    if (strcmp(type_4, "data"))
    {
        cout << "Error: missing data string";
        return 1;
    }

    // data size
    fread(&data_size, sizeof(DWORD), 1, file_pointer);

    // output header info
    cout << "chunk size " << chunk_size << endl;
    cout << "format type " << format_type << endl;
    cout << "channels " << channels << endl;
    cout << "sample rate " << sample_rate << endl;
    cout << "average bytes per second " << avg_bytes_per_sec << endl;
    cout << "bytes per sample " << bytes_per_sample << endl;
    cout << "bits per sample " << bits_per_sample << endl;
    cout << "data size " << data_size << endl;

    Sleep(100);

    // attempt data collection for wave file
    cout << "attempting data collection" << endl;
    unsigned long number_of_samples = data_size/bytes_per_sample;
    std::cout << "number of samples = " << number_of_samples << '\n';
    
    std::vector<short> samples(number_of_samples);
    size_t elements_read = fread(samples.data(), sizeof(short), number_of_samples, file_pointer);    
    

    fclose(file_pointer);

    cout << "samples read = " << elements_read << "\n";
    
//  for (size_t i = 0; i<samples.size(); ++i)  // this works but could take a long time
    for (size_t i = 0; i<50; ++i)
    {
        cout << setw(10) << i << setw(10) << samples[i] << '\n';
    }

    return 0;
}
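Since the samples are headed for a DFT, it can be convenient to convert them to floating point first. A small sketch, assuming signed 16-bit PCM as in the file above (the listing stores them as short, which is 16 bits on the usual platforms); toDoubles is a hypothetical helper name.

```cpp
#include <cstdint>
#include <vector>

// Convert signed 16-bit PCM samples to doubles in the range [-1.0, 1.0).
std::vector<double> toDoubles(const std::vector<int16_t>& pcm) {
    std::vector<double> out;
    out.reserve(pcm.size());
    for (int16_t s : pcm) {
        out.push_back(static_cast<double>(s) / 32768.0);
    }
    return out;
}
```

The scaling does not change which frequencies the DFT finds, it only keeps the magnitudes in a predictable range.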
Yeah, you're right, my mistake. I actually can't remember why I put the not operator there... Thanks a lot!
Out of interest, here's my version. It doesn't work any better, but it's more C++ than plain C.

main.cpp
#include <iostream>
#include "wave.h"

int main() 
{
    Wave wave("note_a.wav");
    std::cout << wave.status() << '\n';
    wave.showRiff();
    wave.showFormat();
    wave.showData();
    
    std::cout << "Number of samples = " << wave.numberOfSamples() << '\n';
    
    for (int i=0; i<20; ++i)
        std::cout << wave.samples[i] << '\n';
        
}


wave.h
#ifndef WAVE_H
#define WAVE_H

#include <cstdint>
#include <fstream>
#include <vector>

//-------------------------------------------------------------------------//
//              Wave file stuff                                            //
//-------------------------------------------------------------------------//

// Three "chunks", 'RIFF', 'fmt' and 'data'.

////////////////
// RIFF chunk //
////////////////
struct riffChunk {
    char     id[4];
    uint32_t size;
    char     fmt[4];
};

///////////////////
// fmt subchunk  //
///////////////////
struct fmtChunk {
    char id[4];
    uint32_t size;
    uint16_t audioFormat;
    uint16_t numChannels;
    uint32_t sampleRate;
    uint32_t byteRate;
    uint16_t blockAlign;
    uint16_t bitsPerSample;
};


///////////////////
// data subchunk //
///////////////////
struct dataChunk {
    char id[4];
    uint32_t size;
    // Actual sample data should be appended here...
};


//----------------------------------------------------------------------------

class Wave {

private:
    riffChunk rchunk;
    fmtChunk  fchunk;
    dataChunk dchunk;
    bool stat;
    
public:
    
    std::vector<int16_t> samples;   // 16-bit PCM samples are signed
        
    Wave(const char * fname);
    ~Wave();

    bool status() const { return stat;  }
    void showRiff() const;
    void showFormat() const ;
    void showData() const ;
    size_t numberOfSamples() const { return samples.size(); }
};

//----------------------------------------------------------------------------

#endif 


wave.cpp
#include "wave.h"
#include <iostream>
#include <iomanip>

Wave::Wave(const char * fname)
{
    std::ifstream infile(fname, std::ios::binary);
    infile.read(reinterpret_cast<char *>(& rchunk ), sizeof(rchunk));
    infile.read(reinterpret_cast<char *>(& fchunk ), sizeof(fchunk));
    infile.read(reinterpret_cast<char *>(& dchunk ), sizeof(dchunk));
    
    samples.resize(dchunk.size/2 );
    infile.read(reinterpret_cast<char *>(samples.data()), dchunk.size );
    stat = bool(infile);
}

Wave::~Wave()
{

}

void Wave::showRiff() const
{
    std::cout.write(rchunk.id, sizeof(rchunk.id)   ) << ' ';
    std::cout.write(rchunk.fmt, sizeof(rchunk.fmt)   ) << ' ';
    std::cout << rchunk.size << '\n';
}

void Wave::showFormat() const
{
    std::cout.write(fchunk.id, sizeof(fchunk.id)   ) << '\n';
    std::cout << std::setw(16) << "size"          << std::setw(16) << fchunk.size          << '\n'; 
    std::cout << std::setw(16) << "audioFormat"   << std::setw(16) << fchunk.audioFormat   << '\n'; 
    std::cout << std::setw(16) << "numChannels"   << std::setw(16) << fchunk.numChannels   << '\n'; 
    std::cout << std::setw(16) << "sampleRate"    << std::setw(16) << fchunk.sampleRate    << '\n'; 
    std::cout << std::setw(16) << "byteRate"      << std::setw(16) << fchunk.byteRate      << '\n'; 
    std::cout << std::setw(16) << "blockAlign"    << std::setw(16) << fchunk.blockAlign    << '\n'; 
    std::cout << std::setw(16) << "bitsPerSample" << std::setw(16) << fchunk.bitsPerSample << '\n'; 
}

void Wave::showData() const
{
    std::cout.write(dchunk.id, sizeof(dchunk.id)   ) << '\n';
    std::cout << std::setw(16) << "size"          << std::setw(16) << dchunk.size          << '\n'; 
}
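One caveat with reading the structs straight from the file: it only works if the compiler adds no padding and the machine is little-endian, like the file format. A sketch of how those assumptions could be checked; the structs are repeated from wave.h, while the static_asserts and hostIsLittleEndian are my additions.

```cpp
#include <cstdint>

struct riffChunk {
    char     id[4];
    uint32_t size;
    char     fmt[4];
};

struct fmtChunk {
    char     id[4];
    uint32_t size;
    uint16_t audioFormat;
    uint16_t numChannels;
    uint32_t sampleRate;
    uint32_t byteRate;
    uint16_t blockAlign;
    uint16_t bitsPerSample;
};

struct dataChunk {
    char     id[4];
    uint32_t size;
};

// Compile-time checks: fail the build if the compiler inserted padding.
static_assert(sizeof(riffChunk) == 12, "riffChunk must match the 12-byte RIFF preamble");
static_assert(sizeof(fmtChunk)  == 24, "fmtChunk must match id + size + 16-byte PCM format block");
static_assert(sizeof(dataChunk) == 8,  "dataChunk must match the 8-byte chunk header");

// Runtime check for byte order: WAV fields are stored little-endian.
bool hostIsLittleEndian() {
    const uint16_t probe = 1;
    return *reinterpret_cast<const unsigned char*>(&probe) == 1;
}
```

On a big-endian host the multi-byte fields and the samples would need their bytes swapped after reading.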

I actually had better luck using ffmpeg :)
ffmpeg -re -i assets/440hz-puretone.9615hz.wav -r 9615 -f s16le -acodec pcm_s16le - 2> /dev/null | ./analyze-in-realtime bin-size

This trick saved me quite a bit of time recently.