Convert 16-bit Scaled PCM Data to Floating-Point PCM Data

I cannot convert between 16 bit Scaled PCM Data and Floating Point PCM Data in C++. I think I must be close because the output audio somewhat resembles what I expect, but it is distorted.

I am doing this because I am running ScummVM games in the browser. The ScummVM code runs on the server and my custom code posts audio and images up to a website. I am using the Web Audio API to play sound in JavaScript on the front end.

I am trying to provide raw PCM data segmented by channel to the JavaScript. My logic here is that there will be less latency if no decoding is required.

I know that the audio data must be good, because I have successfully converted this into a wave file format and played it.

I am trying to get a multidimensional array of floats. The first dimension represents the channel and the second is the sample data for that channel. The JavaScript Web Audio API uses floats between -1 and 1.

I ask that we not discuss the usefulness of the project. This is a learning project for me, not an attempt to build a widely used product.

Here is my algorithm, with comments describing my reasoning.

typedef unsigned char byte;
double** OSystem_Cli::mixCallback(uint len)
{
	const int NO_CHANNELS = 2;
	double** result = new double*[NO_CHANNELS];
	int sampleSize = len;
	byte* samples = new byte[len];

	_mixerImpl->mixCallback(samples, len);  //Get the 16-bit PCM audio from Scumm VM. eg. [91,11,91,11 . . .] and stores it in samples

	for (int channel = 0; channel < NO_CHANNELS; channel++)
	{
		for (int byteNo = channel * 2, channelByteNo = 0; byteNo < sampleSize - 1; byteNo = byteNo + NO_CHANNELS * 2, channelByteNo++)
		{
			if (channelByteNo == 0)
			{
				result[channel] = new double[sampleSize / NO_CHANNELS / 2];
			}
			unsigned short unsignedCombination = (static_cast<unsigned short>(samples[byteNo]) << 8) + samples[byteNo + 1]; //Join two bytes together to get 1 sixteen bit number. 
			short signedCombination;

			memcpy(&signedCombination, &unsignedCombination, sizeof(signedCombination));

			double signedFloat = static_cast<double>(signedCombination);
			signedFloat = signedFloat / (float)32768;  //Divide it to get the floating point representation, as https://stackoverflow.com/questions/15087668/how-to-convert-pcm-samples-in-byte-array-as-floating-point-numbers-in-the-range states.

			if (signedFloat > 1)
			{
				signedFloat = 1;
			}
			else if (signedFloat < -1)
			{
				signedFloat = -1;
			}
			result[channel][channelByteNo] = signedFloat;

		}
	}
	delete[] samples;

	return result;
}


JavaScript side (in TypeScript):
   pushOntoAudioQueue(pcmBytesByChannel: number[][]) {
        const streamer: WebAudioStreamer = this;
        const samplePartsPerChannel = pcmBytesByChannel[0].length;
        const noChannels = pcmBytesByChannel.length;

        if (this.audioStack.length > MaxQueueLength) {
            this.audioStack = [];
        }

        const buffer = this.context.createBuffer(noChannels, samplePartsPerChannel, 16000); // Length is the number of sample frames per channel, not a byte count

        for (let channel = 0; channel < noChannels; channel++) {
            let float32ChannelData = new Float32Array(pcmBytesByChannel[channel].length); 
            float32ChannelData.set(pcmBytesByChannel[channel]);
            buffer.copyToChannel(float32ChannelData, channel);
        }

       streamer.audioQueue.push(buffer);
    }
Are you pasting the 16-bit value together in the right order?
It looks like you're assuming big-endian.
Are you sure it's not little-endian?
And should you be creating 64-bit (double) floats or just 32-bit float?

And what does "scaled" mean?
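
To illustrate the byte-order question, here is a minimal sketch of how the same two bytes turn into different samples depending on which one is treated as the high byte. The function names are made up for this example, and the byte pair 91, 11 is taken from the comment in the question's code; whether the mixer really emits little-endian data is an assumption that depends on the platform it runs on.

```cpp
#include <cstdint>

// Combine two bytes into a signed 16-bit sample, low byte first (little-endian).
// (Hypothetical helper, not part of ScummVM.)
int16_t sampleFromLittleEndian(uint8_t first, uint8_t second)
{
    return static_cast<int16_t>(static_cast<uint16_t>(first | (second << 8)));
}

// Combine two bytes into a signed 16-bit sample, high byte first (big-endian).
// This is the order the question's code assumes.
int16_t sampleFromBigEndian(uint8_t first, uint8_t second)
{
    return static_cast<int16_t>(static_cast<uint16_t>((first << 8) | second));
}

// Scale a 16-bit sample into the range [-1.0, 1.0).
float toFloat(int16_t sample)
{
    return sample / 32768.0f;
}
```

With the bytes 91, 11 the little-endian reading gives 2907 and the big-endian reading gives 23307, so picking the wrong order produces audio that is recognisable in rhythm but badly distorted.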

EDIT: Maybe try something like this:

float** OSystem_Cli::mixCallback(uint len)
{
    auto result = new float*[2];
    result[0] = new float[len / 2];  // may as well keep the data in one block
    result[1] = result[0] + len / 4;
    auto samples = new int16_t[len/2];

    _mixerImpl->mixCallback((uint8_t*)samples, len);

    for (uint32_t i = 0; i < len; i += 2)
    {
        result[0][i/2] = samples[i]   / float(1 << 15);
        result[1][i/2] = samples[i+1] / float(1 << 15);
    }

    delete[] samples;

    return result;
}

Yes, your solution worked, thank you. I think that you were right about the endianness.

Thank you.

I changed the code slightly. Please correct me if I have made any silly mistakes. I am new to C++ coming from a C# background.

I changed line 5 (the result[1] assignment) to say: result[1] = new float[len / 2];
Surely we need to use len / 2 here as that is the correct buffer length. Also why didn't you allocate any memory on that line?

Also, I updated the 'loop end condition' (is that the right term?) on line 10 (the for loop) to be len / 2. As we are working in 16-bit, the length of the samples array is half the length of the 8-bit data used by mixCallback.

My understanding is that scaled PCM means that the lowest value represents the lowest pitch and the highest value the highest pitch. I haven't been able to verify this; I got the term from the answer to this question: https://stackoverflow.com/questions/6010708/html5-web-audio-api-porting-from-javax-sound-and-getting-distortion

float** OSystem_Cli::mixCallback(uint len)
{
	auto result = new float* [2];
	result[0] = new float[len / 2];  // may as well keep the data in one block
	result[1] = new float[len / 2]; 
	auto samples = new int16_t[len / 2];

	_mixerImpl->mixCallback((uint8_t*)samples, len);

	for (uint32_t i = 0; i < len / 2; i += 2)
	{
		result[0][i / 2] = samples[i] / float(1 << 15);
		result[1][i / 2] = samples[i + 1] / float(1 << 15);
	}

	delete[] samples;

	return result;
}
Okay, just for the record I'm kind of drunk so I'm not entirely sure of this, but... (I started editing it to say that maybe you were right but now I think I was right again, so ...)

I updated the 'loop end condition' (is that the right term) on line 10 to be len /2.

I used len as the limit since in the update part of the loop I'm adding 2. So the loop will only iterate len/2 times. i will be 0, 2, 4, 6 ... in the loop, so i/2 gives the correct index for the channels and i and i+1 give the correct indices to access the left/right interleaved samples.
EDIT: Okay, no, I am wrong here. It should be len/2 even though I'm adding 2, because samples only holds len/2 int16_t elements. You're right.

result[1] = new float[len / 2]; Surely we need to use len / 2 here as that is the correct buffer length. Also why didn't you allocate any memory on that line?

Actually, if you're going to do it that way you should allocate len / 4 elements to each channel (assuming len is the number of bytes, not samples, in the audio). Each input sample is 2 bytes, so len bytes hold len/2 samples, and each channel gets half of them, so that's len/4 floats per channel.

But what I was actually doing was using a common idiom in which you allocate all of the floats of the two-dimensional array in one block. That is the "natural" way that a static 2D array would be allocated and it takes up less space. But since your 2D array only has two rows, it's not that much of a savings in this case.

The idea is to allocate two one-dimensional arrays. One is an array of "row pointers", which has a size of 2 in your case. The other is for all of the data, the pointer to which we initially store in the first row pointer (result[0]). Then we set the remaining row pointers to point at the beginning of their rows in the data block. In your case there's only one other row pointer, so we make that one point halfway into the data block (len/4 elements in, since the block holds len/2 floats in total).
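
As a generic sketch of that idiom (not tied to the mixer code), the allocation and the matching clean-up could look roughly like this. The names allocBlock2D and freeBlock2D are invented for the example, and it assumes rows is at least 1.

```cpp
#include <cstddef>

// Allocate `rows` row pointers that all point into one contiguous block
// of rows * cols floats. (Hypothetical helper; assumes rows >= 1.)
float** allocBlock2D(std::size_t rows, std::size_t cols)
{
    float** table = new float*[rows];
    table[0] = new float[rows * cols]();   // one zero-initialised block for every element
    for (std::size_t r = 1; r < rows; ++r)
        table[r] = table[0] + r * cols;    // row r starts r * cols floats into the block
    return table;
}

// Release both allocations made by allocBlock2D.
void freeBlock2D(float** table)
{
    delete[] table[0];  // the data block
    delete[] table;     // the row pointers
}
```

Because there are only ever two allocations, there are only ever two delete[] calls, regardless of how many rows the table has.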

To delete the data you just need two statements:

delete[] result[0];
delete[] result;

Here's a bit of a readability rewrite:

float** OSystem_Cli::mixCallback(uint32_t bytes)
{
    using SampleType = int16_t;
    const uint32_t NumChannels = 2;
    const uint32_t NumSamples = bytes / sizeof(SampleType);

    auto result = new float*[NumChannels];
    result[0] = new float[NumSamples];
    result[1] = result[0] + NumSamples / NumChannels;

    auto samples = new SampleType[NumSamples];
    _mixerImpl->mixCallback((uint8_t*)samples, bytes);

    for (uint32_t i = 0; i < NumSamples; i += NumChannels)
    {
        result[0][i / 2] = samples[i    ] / float(1 << 15);
        result[1][i / 2] = samples[i + 1] / float(1 << 15);
    }

    delete[] samples;
    return result;
}

BTW, if you really want doubles just change all the "float"s to "double"s.
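
If you would rather not manage new/delete by hand, the same de-interleaving can be sketched with std::vector. This is only an illustration: the function name deinterleave is made up, it takes samples that are already in an int16_t vector rather than calling _mixerImpl, and it assumes numChannels is nonzero.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Split interleaved 16-bit samples (L, R, L, R, ...) into one float vector
// per channel, scaled into [-1.0, 1.0). (Hypothetical helper.)
std::vector<std::vector<float>> deinterleave(const std::vector<int16_t>& interleaved,
                                             std::size_t numChannels)
{
    const std::size_t frames = interleaved.size() / numChannels;  // samples per channel
    std::vector<std::vector<float>> channels(numChannels, std::vector<float>(frames));

    for (std::size_t frame = 0; frame < frames; ++frame)
        for (std::size_t ch = 0; ch < numChannels; ++ch)
            channels[ch][frame] = interleaved[frame * numChannels + ch] / 32768.0f;

    return channels;
}
```

The vectors free themselves when they go out of scope, at the cost of one allocation per channel instead of a single block.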
See drunken edits above.
New code works :). Thanks for your help and for the C++ primer.