C++ AMP parallel_for how to define pointer complex to real

I have a section of C++ AMP code that I have been expanding to load multiple FFT windows onto the GPU, run an FFT on each window, and then transfer all the results back.

1. I want to continue to use Concurrency::array and parallel_for to move each window's data into and out of the processing chain through temporary buffers.

2. I am using all_in_array and all_out_array to preserve speed through large CPU<=>GPU transfers and larger workloads, and window-sized buffers to
send data into the FFT transform call (in_array) and get each transformed window out (transformed_array).

3. Everything seems to work, but the small output array from the transform (transformed_array) holds complex values and appears to need a pointer defined in order to work. You can see this in the main code below, in the area I marked (Line 425 to Line 428).

*******************************************

This section of code does not compile and produces the following errors:


425: parallel_for(0, 1024, [&transformed_array, &all_out_array](int i)
426: {
427: all_out_array[i] = transformed_array[i].real;
428: });

Both errors point at line 427:

Error 21 error C3867: 'std::_Complex_base<float,std::_Fcomplex_value>::real': function call missing argument list; use '&std::_Complex_base<float,std::_Fcomplex_value>::real' to create a pointer to member

Error 22 error C2568: '=' : unable to resolve function overload

******************************************

From other examples online that use parallel_for, and from these errors, I believe I most likely need to define a pointer for transformed_array.

The two arrays I am copying from (transformed_array) and to (all_out_array) have different element types, but simply dereferencing the .real value of each complex element in transformed_array is not the solution. It must need a reference instead.

array<float, 1> all_out_array(windowCount*windowLength);
array<std::complex<float>, 1> transformed_array(e);

This is a simple C++ AMP task. I believe parallel_for would be faster for copying the windowed data into and out of arrays on the GPU side, given the small kernel on line 427. By building up a final window and processing several windows of FFT data at a time before transferring into and out of the GPU, I should reach better performance, that is, real-time FFT transforms for 44,000 samples per second of audio data. Currently, processing one window at a time is about a quarter slower than real time.
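
The window bookkeeping described above can be sketched on the CPU side as follows. This is only a minimal sketch in plain C++, not the AMP code itself: gather_windows is a hypothetical helper, the output offset windowIndex * windowLength is an assumption based on the windowCount*windowLength size of all_out_array, and zero padding past the end of the input is also an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the posted code): copies each analysis
// window out of one flat input buffer into one flat output buffer.
// Window w starts at input offset w * windowSkipCount, as in the posted loop.
// The output offset w * windowLength is an assumption based on the
// windowCount * windowLength size of all_out_array.
std::vector<float> gather_windows(const std::vector<float>& input,
                                  std::size_t windowLength,
                                  std::size_t windowSkipCount,
                                  std::size_t windowCount)
{
    std::vector<float> output(windowCount * windowLength, 0.0f);
    for (std::size_t w = 0; w < windowCount; ++w)
    {
        const std::size_t inBase = w * windowSkipCount;  // start of window w in the input
        const std::size_t outBase = w * windowLength;    // start of window w in the output
        for (std::size_t i = 0; i < windowLength; ++i)
        {
            // Guard against reading past the end of the input; samples
            // beyond the end stay zero (zero padding is an assumption).
            if (inBase + i < input.size())
                output[outBase + i] = input[inBase + i];
        }
    }
    return output;
}
```

With windowLength 4 and windowSkipCount 2, consecutive windows overlap by two samples, and each window occupies its own four-element slot in the output instead of overwriting the first one.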

*************** Code Example *****************

extern "C" __declspec (dllexport) const bool __stdcall Marshall_Window1024(float* input_data, int sampleCount, int windowSkipCount, float* output_data, int windowCount)
{
    const int windowLength = 1024;
    const int y_FREQOUT_Resolution = windowLength;
    accelerator accl = accelerator();

    extent<1> e;
    e[0] = windowLength;

    // Create the FFT transformation object
    fft<float, 1> transform(e);

    array<float, 1> all_in_array(sampleCount);
    array<float, 1> all_out_array(windowCount*windowLength);

    array<std::complex<float>, 1> transformed_array(e);

    array<float, 1> in_array(e);

    copy(input_data, all_in_array);

    for (int windowIndex = 0; windowIndex < windowCount; windowIndex++)
    {
        for (int inSampleIndex = 0; inSampleIndex < windowLength; inSampleIndex++)
        {
            in_array[inSampleIndex] = input_data[(windowIndex * windowSkipCount) + inSampleIndex];
        }

        transform.forward_transform(in_array, transformed_array);

Line 425: parallel_for(0, 1024, [&transformed_array, &all_out_array](int i) restrict(amp)
Line 426: {
Line 427: all_out_array[i] = transformed_array[i].real;
Line 428: });

    }
    copy(all_out_array, &output_data[0]);

    return true;
}
all_out_array[i] = transformed_array[i].real;

std::complex::real is a function.

http://en.cppreference.com/w/cpp/numeric/complex/real

I think you need to add the parentheses.

all_out_array[i] = transformed_array[i].real();
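
To show the difference in isolation, here is a minimal sketch in ordinary host-side C++. The helper real_parts is hypothetical and only mirrors what line 427 is trying to do. It does not check how the call behaves inside a restrict(amp) lambda.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies the real part of each complex sample into a
// plain float buffer, which is what the body on line 427 intends to do.
std::vector<float> real_parts(const std::vector<std::complex<float>>& spectrum)
{
    std::vector<float> out(spectrum.size());
    for (std::size_t i = 0; i < spectrum.size(); ++i)
    {
        // real is a member function, so it needs the call parentheses.
        // Writing spectrum[i].real (no parentheses) names the function
        // itself, which is what error C3867 complains about.
        out[i] = spectrum[i].real();
        // The free function std::real(spectrum[i]) returns the same value.
    }
    return out;
}
```

No pointer or reference to transformed_array is needed for this. The call returns the float value directly, so the assignment to a float element type-checks.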