Any Tips On Writing More Efficient Code?

Disch wrote:
And certain operations (like obtaining string length) are faster with std::string

Oh yeah, I hadn't thought of that, since it's effectively just accessing a variable.

Semi-off-topic, how expensive is std::string::c_str()? I keep using it when passing strings to OpenGL.
I looked it up; on most implementations it just returns a pointer to the internal buffer, so there's next-to-no overhead.
std::string knows where the end of the string data is. Finding the length is as easy as finding the difference between two pointers... O(1).

strlen does not know where the end of the string is. It has to iterate over the entire string. O(n). strcat has a similar problem.

EDIT:
chrisname wrote:
I looked it up; on most implementations [c_str] just returns a pointer to the internal buffer, so there's next-to-no overhead.


With compiler optimizations turned on and function inlining, it's frequently zero overhead. That's why std::string can perform just as well as char arrays.
Thanks, I'll convert my resource manager to use std::strings.

Still waiting on your battle system, Mats.
Well, I think I have resolved the problems in my battle system. It's written in a game engine, but it essentially comes down to this:

All units that can be involved in melee combat have a bool incombat, which is set to true only while they are currently in combat, and a bool moving, which is set to true while the unit is moving somewhere. I also take advantage of an engine feature called linked objects. You could, however, link two objects using plain variables by giving each an ID and making the target of each unit the ID of the other.

My combat code works on the idea that:
-Only one non-ranged unit can fight one non-ranged unit.
-Enemies that are targeted stop where they are and the player units move to them to fight them.
-Other than that, it's standard RPG fighting.

Here is a pseudo-code explanation:
What my units do:

If moving == 0 && incombat == 0
loop a: iterate through each enemy and check whether the distance is below playerunit.range (an int variable)
If we found an enemy in range, check whether the enemy's incombat == 0
If incombat == 0, link the objects enemy and playerunit
Create a combatsquare at (some place near the enemy unit), link the square and the player unit, and move the player unit towards the combatsquare
playerunit.moving = 1
enemyunit.incombat = 1
break from loop a


Every 0.05s
If playerunit is in collision with combatsquare
If playerunit is linked to combatsquare
Stop playerunit
playerunit.incombat = 1

Every 0.1s
if playerunit.incombat == 1
++playerunit.combatcounter
same for enemy units...

Every 0.1s
if playerunit.combatcounter >= playerunit.combatspeed
then we do the attack action. (reduce enemy health, play animation)
same for enemy units.

What happens if a ranged unit kills someone in the middle of battle?

Check if a unit is linked to the unit which just died.
If there is a linked unit, set its combatcounter and incombat variables to 0.
In fact all deaths are treated like this.

There is another piece of code which looks for units which are not incombat or moving and deals with them.

This same thing happens if you move a unit away from combat.

Ranged units

My ranged units do this:

Every 0.1s
++rangedunit.combatcounter

Every 0.1s
If rangedunit.combatcounter >= rangedunit.combatspeed
-Look for enemies in range
-If we find one...
rangedunit.combatcounter = 0
create object shot
link shot and rangedunit
shot.damage = rangedunit.damage
shot.movespeed = rangedunit.shotspeed
move the shot to enemy unit

When the shot reaches ANY enemy unit:
enemyunit.health = enemyunit.health - shot.damage
delete shot


Anyway, this stuff appears to work. I don't know if it will help you or not, nor do I know if this is the best way to do this, but there it is anyway.

Edit: Typos
You didn't say you wanted Diablo 3; you just said a Diablo-ish view.
I should have been more specific, then. Mats, is this battle system turn-based, tactical, or real-time?
Real time.

Where I put 'every <some number>s', I mean that all of that stuff is checked every that-many seconds. So basically the whole battle system is just a bunch of timers running, with functions that get executed only when each timer reaches its time. Of course, the timers are also reset when they reach their time. ;)
Some microoptimization performance tips:

- don't pass large objects by value if you only need to read, not modify, nor copy
- don't pass by (const) reference if you need to obtain a copy
- put large fields first, e.g. pointers before shorts or booleans, in order to save space on padding - C++ does not allow reordering fields, so the compiler won't do it for you
- pad objects to multiple of cache-line size, if they are used by multiple threads
- use initialization lists instead of assignment in constructors
- don't use shared_ptr (multithreading scalability killer)
- don't use locking/mutexes (another multithreading scalability killer)
- use the STL with care (in theory the STL is very fast; in practice it often is not)
- use exceptions only for really exceptional cases
- avoid virtual calls if you can (C++ compilers are not very smart at optimizing them out)
- preallocate memory, use pools / arena allocators (new/delete are slow and nondeterministic)
- beware of pointer aliasing
- beware of memory fragmentation

Of course all of above have little value if the architecture / algorithms / data structures of your program suck. So go and fix them first.
rapidcoder wrote:
- don't use shared_ptr (multithreading scalability killer)
- don't use locking/mutexes (another multithreading scalability killer)

What do you recommend instead?
closed account (S6k9GNh0)
He was talking about lockless programming. http://preshing.com/20120612/an-introduction-to-lock-free-programming/

What do you recommend instead?


Well, lockless programming. But if you really need to do hardcore lockless/wait-free programming, then C++ is really not your friend. It doesn't even have any lockless data structures in the standard library.
> What do you recommend instead?

Very strong recommendation:
One measurement is worth a hundred opinions. A million bigoted, uninformed opinions.

When discussing the performance of non-blocking data structures, one has to distinguish between amortized and worst-case costs. The definition of 'lock-free' and 'wait-free' only mention the upper bound of an operation. Therefore lock-free data structures are not necessarily the best choice for every use case. In order to maximise the throughput of an application one should consider high-performance concurrent data structures.

Lock-free data structures will be a better choice in order to optimize the latency of a system or to avoid priority inversion, which may be necessary in real-time applications. In general we advise to consider if lock-free data structures are necessary or if concurrent data structures are sufficient. In any case we advise to perform benchmarks with different data structures for a specific workload. - http://www.boost.org/doc/libs/1_54_0/doc/html/lockfree.html


I just skipped reading all of the previous posts, which is not recommended, so I am sure I am repeating something that has already been said. But it is what needs to be said, so it might as well be said more than once: favor clarity, generality, and elegance over optimization, except where optimizing actually counts. That is, don't go out of your way to save every nanosecond here and there with every possible micro-optimization. Think about where the bottleneck is; where the majority of the time is spent. Then think about whether a more efficient algorithm can be used in the implementation. While you are thinking about the algorithm, also consider whether the problem is easily and effectively done in parallel. Then, lastly, if you need to squeeze out every last bit of performance possible, think about the micro-optimizations that save you minuscule amounts of time.

When discussing the performance of non-blocking data structures, one has to distinguish between amortized and worst-case costs. The definition of 'lock-free' and 'wait-free' only mention the upper bound of an operation. Therefore lock-free data structures are not necessarily the best choice for every use case. In order to maximise the throughput of an application one should consider high-performance concurrent data structures.

Lock-free data structures will be a better choice in order to optimize the latency of a system or to avoid priority inversion, which may be necessary in real-time applications. In general we advise to consider if lock-free data structures are necessary or if concurrent data structures are sufficient.


This is obviously not the whole story. The main problem with standard blocking is not latency but lack of scalability. Blocking serializes access to shared resources, which limits the effective level of parallelism. Even if only 1% of the critical path needs to be blocked (single-threaded), you'll notice sharply diminishing returns after going beyond 100 cores (see Amdahl's law). And that is the ideal case; in reality it would be much worse because of the context switching and cache thrashing caused by the synchronization points.
> then you'll notice very diminishing returns after going beyond 100 cores

Repeat: Measure. Measure for the typical use case of the program in question.

If your program is not going to run on machines where going beyond 100 cores is even a theoretical possibility, this 'after going beyond 100 cores ...' remains just one more political statement.


> see Amdahl's law

Amdahl's law is not the one and only inviolable law which must hold universally. See Gustafson's law.
There's this law and that law, but in real life I just watch the performance logs, and if they start to look shaky, I go and sort out some optimization, starting with the biggest functions that are called most often.
I think that's the best way to do it. Personally, I don't worry too much about performance because I've never worked on a project that was performance-critical. The closest I've come are games and operating systems, but neither of them is very far into development, and both are learning projects; I think it's a mistake to really optimise before a project gets to its first stable, feature-complete version. Plus, for learning projects, it's far more important to produce clean, correct code that you can refer to later, unless your goal is to learn optimisations.

Speaking of code correctness, I'm currently reading this interesting article on compiler optimisations and code that subtly, and unintentionally, exploits undefined behaviour (and subsequently gets optimised away by compilers that assume code will not exploit undefined behaviour). If you've ever had a program that worked with optimisations disabled, but failed when you turned optimisations on, you should read this: http://pdos.csail.mit.edu/~xi/papers/stack-sosp13.pdf
rapidcoder wrote:
The main problem with standard blocking is not latency but lack of scalability. Blocking causes serializing access to shared resources, which limits the effective parallelism level. Even if only 1% of the critical path needs to be blocked (single-threaded)

It's kind of fun watching this train wreck of a sentence.

Just to clear things up, the main difference between lock-free and lock-based algorithms is that lock-free algorithms have progress guarantees which are important for hard real-time applications (my former line of work, incidentally), and their trade off is that they are slower than lock-based algorithms. Everything else (scalability etc) is similar and depends on the algorithm itself.

their trade off is that they are slower than lock-based algorithms.


Only on single core.

Just four examples that contradict your theory:

1. MongoDB vs Cassandra. MongoDB uses global db lock. Cassandra uses mostly lock-free approach (core structures are immutable, zero locking on reads, a very limited number of CAS on writes). It is not hard to find that Cassandra blows MongoDB out of the water in *throughput* on multicore machines. MongoDB does not scale at all, Cassandra scales to millions of ops per second.

2. Hadoop vs Spark. The Hadoop codebase is heavily threaded, lock upon lock upon lock (actually terribly messy). The Spark code is based on Akka (async message passing, immutable structures, no locking). Spark is not only much more "real-time", it also has much higher throughput, because there are almost no moments when the CPU is idling. Hadoop is very hard to set up in a way that uses all your processing power fully for the whole job.

3. Oracle vs DB/2. Oracle uses MVCC (similar to lockless in the database world). DB/2 uses row-level locks. Guess which one scales better? Guess why most serious modern database systems moved to MVCC?

4. Synchronized Map vs ConcurrentHashMap (sorry for Java example, but C++ does not have anything of this kind yet in standard lib). Guess which one has higher throughput on multicore?

The main problem with locking is that under severe load, most of your processing power tends to be spent in two ways:
1. idling (CPU cores just waiting for other things, often needlessly)
2. executing system code (context switching, OSes are pretty bad at scheduling, much worse than custom application level schedulers)

If you do coarse grained locking, you make problem 1. more severe. If you do fine grained locking, you make problem 2. worse. You can't avoid both at the same time.

There are also some other problems like inversion of priority, but they affect mostly latency, not throughput.

The only problem with lockless algorithms is the occasional wasted effort on contention: you have to throw away an update and retry. This is, however, much less severe than with locking, because it happens less often; contention arises only on WW (write-write) collisions. In lock-based algorithms you typically get contention on WW and RW, and very often even on RR.
ConcurrentHashMap is a fairly typical (from my, C++ programmer's POV) lock-based concurrent data structure, an example of what the boost doc quote earlier in the thread mentioned as more appropriate when lockfree guarantees are not needed. You seem to be arguing for concurrent programming instead of serialized, using erroneous terms.