Mutexes vs Atomic Operations

Hi everyone
Thanks for all your previous help. I am actually looking at writing a float value in thread A and reading it in thread B. I used mutexes and it was really really slow.
I believe that atomic operations are only for integers and not floating point variables.
I am using a single-core ARM processor (Raspberry Pi), hence spin locks won't work. Please suggest a better option to get faster multi-threading.

Chandra
You should have continued your previous thread, rather than started a new one.

Committing a float to memory can be reduced to committing an integer to memory.
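//global scope or equivalent
volatile float shared_float;

//function scope
float float_with_new_data;
int32_t new_bits;
//copy the float's bit pattern into a 32-bit integer (assumes a 32-bit float; needs <cstring>)
memcpy(&new_bits, &float_with_new_data, sizeof new_bits);
//interlocked_exchange() stands for the platform's atomic exchange, e.g. InterlockedExchange() on Windows; the new value is passed by value
interlocked_exchange((volatile int32_t *)&shared_float, new_bits);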
I believe that atomic operations are only for integers and not floating point variables.


std::atomic can be used with any trivially copyable type, not just integers. If the hardware can't operate on a type atomically, the implementation may use a lock underneath. You can check whether that is the case with the is_lock_free() member function:

#include <atomic>
#include <iostream>

std::atomic<float> var(0.0f);

int main()
{
    // prints true if std::atomic<float> is implemented without a hidden lock on this platform
    std::cout << std::boolalpha << var.is_lock_free() << '\n';
}



Please suggest me a better option to get faster multi-threading.


My first instinct is to avoid multi-threading entirely.
Or... if you need to multi-thread... then don't synchronize so much. Every synchronization is a performance ding, and if you're doing it all the time, it'll slow you down considerably.


Multi-threading works best when you have multiple threads doing independent work that they can complete without having to constantly poll each other.



EDIT:

@helios:
volatile != atomic

Just having a variable be volatile will not protect you from race conditions.
That's why I used interlocked_exchange(). I only made the variable volatile because Microsoft's version of the operation (InterlockedExchange) takes a pointer to volatile.
Hi guys
Thanks for the help. I am actually reading sensor data from an IMU and a camera and fusing them together. The IMU sends an update about every 5 ms over UDP (Ethernet) and the camera updates about every 30 ms. I thought multi-threading was the only way to get data from both the IMU and the camera. If not, please suggest other methods.

Chandra
@helios:

Ah, okay. =)

The IMU sends an update about every 5 ms over UDP (Ethernet) and the camera updates about every 30 ms. I thought multi-threading was the only way to get data from both the IMU and the camera. If not, please suggest other methods.


It depends on the interface you have.

If you have non-blocking calls that you can use to poll whether or not data is ready, then you do not need additional threads. You can just have your main thread periodically poll for more data.
@ Disch
Can you explain how to do that, i.e. how to make the main function periodically check whether new UDP data has been received while the main loop runs continuously?

If I check only once per iteration (let's say at the start of the while loop), I lose IMU data, because the camera processing takes 30 ms.

I want to update my variables as soon as data arrives from the IMU (without delay). It is OK if one or two IMU packets (UDP packets) are missed while being blocked, but I do not want the vision code to ever stop its work.

I think try_lock should work, but I have never tried it before. Do you think it will be fast?

Please note a timing delay of 2-3 ms matters a lot here.

Chandra
Can you explain how to do that, i.e. how to make the main function periodically check whether new UDP data has been received while the main loop runs continuously?


I can't give you specifics without seeing some reference docs on whatever API you're using. Though conceptually it's as simple as this:

while(whatever)
{
    if( isIMUDataAvailable() )
    {
        processIMUData();
    }
    if( isCameraDataAvailable() )
    {
        processCameraData();
    }
}


The only thing is that the isXDataAvailable functions must be non-blocking. That's really the crux of it.

If I check only once per iteration (let's say at the start of the while loop), I lose IMU data, because the camera processing takes 30 ms.


You're calling a function to check to see if the camera has data, right? Is that function blocking?

A "blocking" function will wait until data is available, which prevents your program from doing other things.

A "non-blocking" function will return immediately even if no data was available, which will allow you to do something else.

If you want to do this in a single thread, you will need a non-blocking interface.

I think try_lock should work, but I have never tried it before. Do you think it will be fast?


try_lock is a non-blocking mutex lock. If it can't immediately lock the mutex, it won't wait... it'll just return immediately (but the mutex won't be locked). So yes it might be faster... but it'll only be faster when it fails.

Though it might still have to interrupt the pipeline and do memory sync stuff... so it might not even be faster. I dunno... try it and see.





EDIT:

I just want to clarify --- I'm not saying that multithreading is a bad choice here. It might be the right way to go. But if you don't want it to be slow you'll have to cut back on how frequently you synchronize threads.
@ Disch: The OP stated that this was running on a raspberry pi which traditionally uses a *nix kernel. Instead of poking around with blocking this and non-blocking that, what do you think about suggesting handling data with an interrupt service routine? I'm not a *nix guy so I don't know how involved this would be for that OS but on Windows I know that it's not some insurmountable task.
@Computergeek:

That works for me. I'm not really familiar with Raspberry Pi so I'm trying to keep my replies more general.
@ OP: My idea would involve writing a stub into whatever passes for a driver chain manager in your current flavor of Linux. This would mean that your application (there may end up being two of them) would no longer have to worry about the race conditions that are present in a normal executable. Instead, your routines would sit in the kernel and handle the IRQs as they come in. If you're interested in this approach, then we need to know which kernel you are running and which API you are using.
@Disch
If I use your code, what actually happens is that if isIMUDataAvailable() becomes true while processCameraData() is running, processIMUData() doesn't get called until processCameraData() has completed. To avoid this, I am using multi-threading.

@Computergeek01
I have heard that interrupt-based programming is much slower than multi-threading with mutexes. That is why I went with multi-threading. Correct me if I am wrong.


Interrupt handlers can have a potentially crippling effect on user-mode applications because of their much higher priority, and of course the rest of the driver chain will contribute to performance hits as well. But for an ASIC device like this I don't think that would be an issue. There are a number of other factors, of course, which is why I asked which kernel and API you are using in this project.
@ Computergeek01
I am not sure what kernels and APIs are, but I am using Raspbian (a form of Debian Linux) and GNU g++ for compiling. Hope that helps.

Chandra