their trade off is that they are slower than lock-based algorithms.
Only on single core.
Just four examples that contradict your theory:
1. MongoDB vs Cassandra. MongoDB uses global db lock. Cassandra uses mostly lock-free approach (core structures are immutable, zero locking on reads, a very limited number of CAS on writes). It is not hard to find that Cassandra blows MongoDB out of the water in *throughput* on multicore machines. MongoDB does not scale at all, Cassandra scales to millions of ops per second.
2. Hadoop vs Spark. Hadoop codebase is heavily threaded, lock on lock on lock (acually terribly messy). Spark code is based on Akka (async message passing, immutable structures, no locking). Spark is not only much more "real-time", it also has a much higher throughput, because there are almost no moments when CPU is idling. Hadoop is very hard to setup in a way that during the whole job all your processing power is used fully.
3. Oracle vs DB/2. Oracle uses MVCC (similar to lockless in database world). DB/2 uses row-level locks. Guess which one scales better? Guess why most of serious modern database systems moved to MVCC?
4. Synchronized Map vs ConcurrentHashMap (sorry for Java example, but C++ does not have anything of this kind yet in standard lib). Guess which one has higher throughput on multicore?
The main problem of locking is that under severe load, most of your processing power tends to being spent in two ways:
1. idling (CPU cores just waiting for other things, often needlessly)
2. executing system code (context switching, OSes are pretty bad at scheduling, much worse than custom application level schedulers)
If you do coarse grained locking, you make problem 1. more severe. If you do fine grained locking, you make problem 2. worse. You can't avoid both at the same time.
There are also some other problems like inversion of priority, but they affect mostly latency, not throughput.
The only problem of lockless algos is sometimes wasted effort (on contention) - you have to throw away an update and retry. This is however much less severe than in locking, because it happens less often, as contention is only on WW (write-write) collisions. In lockfull algorithms you've got contention typically on WW and RW and very often even RR.