C++ versus Java

@chrisname On some microbenchmarks I did for myself -- yes, but I don't usually work with tasks where JIT compilation has any benefit.

chrisname (7395)

Thanks. I'm sure rapidcoder will chime in about the potential for scalability.

rapidcoder (1010)

@chrisname: yes

See this:
http://lemire.me/blog/archives/2012/07/23/is-cc-worth-it/

And especially read the comments. Here is one of them (higher numbers are better):

Interestingly, here is what I get for the basic sum implementation:

Java: ~1470
gcc 4.7 with -O3: ~960
clang 3.1 with -O3: ~1350

optimized C++ version:
gcc 4.7: ~1350
clang: ~1470

Also, if you replace your code with the std::partial_sum function, you get ~1100 with gcc 4.7, but it gets slower with clang …
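Judging by the JIT assembly later in the thread, the benchmark appears to compute an in-place running (prefix) sum. Here is a minimal sketch of a "basic sum" and its std::partial_sum replacement; the function names are hypothetical and this is not the blog's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical "basic sum": in-place running (prefix) sum with an index loop.
void basic_sum(std::vector<int>& data) {
    for (std::size_t i = 1; i < data.size(); ++i)
        data[i] += data[i - 1];  // element i becomes the sum of elements 0..i
}

// Same result via the standard algorithm; std::partial_sum allows the
// output iterator to equal the input's first iterator (in-place use).
void stl_sum(std::vector<int>& data) {
    std::partial_sum(data.begin(), data.end(), data.begin());
}
```

Both functions turn {1, 2, 3, 4, 5} into {1, 3, 6, 10, 15}. Which one runs faster depends on the compiler, as the numbers above show.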

It looks like the JVM outperformed most of the C++ compilers at loop unrolling / vectorisation in this particular microbenchmark.

As for "Faster execution speeds (because it's fully compiled)":

The biggest performance advantage C++ has over Java right now is its support for value types plus move semantics. Java is missing this, and it can result in a huge performance penalty if you're not aware of it: an array of objects in Java is an array of references to separately allocated objects, each with its own header. So, e.g., coding something like Point2D as a class in Java is a performance no-go.

There are plans to change this, though, and IBM already has a prototype of PackedObjects, which let you control memory layout manually (and even interface with native code without copying). I hope it gets into mainstream JVMs soon.
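To make the Point2D point concrete, here is a C++ sketch (all names hypothetical) contrasting a contiguous std::vector of a value type with a vector of individually heap-allocated points, which is roughly how a Java Point2D[] is laid out, plus an O(1) move of a whole buffer. It only illustrates the layout difference; no timings are claimed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// A small value type: std::vector<Point2D> stores the points contiguously,
// with no per-object header and no separate heap allocation per point.
struct Point2D {
    double x;
    double y;
};

// Roughly what a Java Point2D[] looks like: an array of pointers to
// individually heap-allocated objects (Java adds an object header as well).
using BoxedPoints = std::vector<std::unique_ptr<Point2D>>;

std::vector<Point2D> make_points(std::size_t n) {
    std::vector<Point2D> pts;
    pts.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        pts.push_back({static_cast<double>(i), 2.0 * i});
    return pts;  // moved or elided, never deep-copied
}

BoxedPoints make_boxed(std::size_t n) {
    BoxedPoints pts;
    pts.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        pts.push_back(std::make_unique<Point2D>(Point2D{static_cast<double>(i), 2.0 * i}));
    return pts;
}

double sum_x(const std::vector<Point2D>& pts) {
    double s = 0.0;
    for (const Point2D& p : pts) s += p.x;  // sequential, cache-friendly reads
    return s;
}

double sum_x(const BoxedPoints& pts) {
    double s = 0.0;
    for (const auto& p : pts) s += p->x;  // one pointer chase per element
    return s;
}

// Move semantics: handing the whole buffer to a new owner is O(1).
std::vector<Point2D> take_ownership(std::vector<Point2D>&& src) {
    return std::move(src);
}
```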

Another misconception about Java is that slow startup is caused mainly by JIT compilation and by slow execution until the code gets JIT-compiled. In most cases it is not: it is caused by lazy class loading and by an object model that requires classes to be loaded fully. E.g. a basic Swing application needs to load thousands (!) of classes from rt.jar, which is ~50MB; this is a huge I/O impact.

Cheraphy (1730)

Completely off topic, but Cubbi, I found you on another forum when googling tail recursion.
http://answers.yahoo.com/question/index?qid=20120207154219AAynk4d

closed account (N36fSL3A)

Can I get the files to benchmark on my computer?

Cubbi (4774)

@Lumpkin the benchmark files are on the blog rapidcoder linked to, although you might want to fix the out-of-bounds access in the iterator-based test (it segfaulted when I tried running it) and add std::partial_sum, which is the actual default C++ approach.
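The blog's iterator-based test isn't reproduced here, so this is only a hypothetical sketch of an iterator-based in-place running sum that stays in bounds. The previous element is read through a second iterator that trails by one, and an empty vector is handled before std::next(begin()) could step past end().

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical iterator-based in-place running sum. `prev` always trails
// `it` by one, so nothing before begin() or at end() is ever dereferenced.
void iterator_sum(std::vector<int>& data) {
    if (data.empty()) return;  // begin() == end(): std::next would overshoot
    auto prev = data.begin();
    for (auto it = std::next(prev); it != data.end(); ++it, ++prev)
        *it += *prev;
}
```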

Daniel Lemire wrote:
Of course, from a sample of 3 compilers on a single problem, I only provide an anecdote

As a matter of anecdote, on Oracle's own M5000 (32x2.5 GHz), I got:

Java (1.6.0_22): best out of 50 runs was 352.11 (run as java -server -d64)

C++ (Sun Studio 12, compiled as CC -m64 -xO3) gave (best of 5)

straight sum (C-like)           381.679
basic sum (C++-like)            413.223
iterator-based sum (C++-like)   413.223  <- had to fix this one
std::partial_sum                409.836  <- added this one
...the "smart" sums were all much slower than Java

(it was hard to find a dev box which had java on it)

As for Intel, I looked at the assembly produced by the JIT (1.7.0_45) on my old Core i7 920; the main loop went like this:

  0x00007f26b505fb01: mov    0x10(%rbx,%rbp,4),%r8d
  0x00007f26b505fb06: add    0xc(%rbx,%rbp,4),%r8d 
  0x00007f26b505fb0b: mov    %r8d,0x10(%rbx,%rbp,4)
  0x00007f26b505fb10: movslq %ebp,%r11
  0x00007f26b505fb13: add    0x14(%rbx,%r11,4),%r8d 
  0x00007f26b505fb18: mov    %r8d,0x14(%rbx,%r11,4) 
  0x00007f26b505fb1d: add    0x18(%rbx,%r11,4),%r8d 
  0x00007f26b505fb22: mov    %r8d,0x18(%rbx,%r11,4)
  0x00007f26b505fb27: add    0x1c(%rbx,%r11,4),%r8d 
  0x00007f26b505fb2c: mov    %r8d,0x1c(%rbx,%r11,4)
  0x00007f26b505fb31: add    0x20(%rbx,%r11,4),%r8d
  0x00007f26b505fb36: mov    %r8d,0x20(%rbx,%r11,4)
  0x00007f26b505fb3b: add    0x24(%rbx,%r11,4),%r8d
  0x00007f26b505fb40: mov    %r8d,0x24(%rbx,%r11,4)
  0x00007f26b505fb45: add    0x28(%rbx,%r11,4),%r8d
  0x00007f26b505fb4a: mov    %r8d,0x28(%rbx,%r11,4)
  0x00007f26b505fb4f: add    %r8d,0x2c(%rbx,%r11,4)
  0x00007f26b505fb54: add    $0x8,%ebp
  0x00007f26b505fb57: cmp    %r10d,%ebp
  0x00007f26b505fb5a: jl     0x00007f26b505fb01

and yes, as the blog pointed out, gcc (I tried 4.8.2) doesn't unroll this loop at -O3 without -funroll-loops; this test is indeed simple enough for the JIT to be competitive (ignoring startup, etc.)
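Roughly what the JIT did, written out by hand in C++. This is a sketch (the function name is hypothetical) that unrolls the running sum by 8 like the assembly above and carries the total in a local variable, with a scalar loop for the leftover elements. Whether a given compiler keeps `acc` in a register is up to its optimizer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hand-unrolled in-place running sum, modelled loosely on the JIT output:
// the running total is kept in a local and 8 elements are handled per
// iteration. (With GCC, -O3 -funroll-loops asks the compiler to unroll.)
void unrolled_sum(std::vector<int>& data) {
    const std::size_t n = data.size();
    if (n == 0) return;
    int acc = data[0];
    std::size_t i = 1;
    for (; i + 8 <= n; i += 8) {
        acc += data[i];     data[i] = acc;
        acc += data[i + 1]; data[i + 1] = acc;
        acc += data[i + 2]; data[i + 2] = acc;
        acc += data[i + 3]; data[i + 3] = acc;
        acc += data[i + 4]; data[i + 4] = acc;
        acc += data[i + 5]; data[i + 5] = acc;
        acc += data[i + 6]; data[i + 6] = acc;
        acc += data[i + 7]; data[i + 7] = acc;
    }
    for (; i < n; ++i) {  // remainder: fewer than 8 elements left
        acc += data[i];
        data[i] = acc;
    }
}
```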

...now, whatever happened to the great language shootout website?

Mats (1398)

Best of 50 runs and best of 5 runs? Whatever happened to taking proper statistics and doing the same number of runs for each?

Cubbi (4774)

@Mats that's why it's just another anecdote, like the original post.
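Mats' objection can be addressed with a harness that gives every contender the same number of runs and reports more than the single best figure. A minimal sketch, assuming std::chrono::steady_clock is precise enough for the kernel being timed; `time_runs` and `RunStats` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

struct RunStats {
    double best_ms;
    double median_ms;
};

// Hypothetical harness: run `kernel` exactly `runs` times (use the same
// count for every contender) and report both the best and the median time.
RunStats time_runs(const std::function<void()>& kernel, std::size_t runs) {
    assert(runs > 0);
    std::vector<double> ms;
    ms.reserve(runs);
    for (std::size_t r = 0; r < runs; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        kernel();
        auto t1 = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::sort(ms.begin(), ms.end());
    double median = (runs % 2 == 1) ? ms[runs / 2]
                                    : (ms[runs / 2 - 1] + ms[runs / 2]) / 2.0;
    return {ms.front(), median};
}
```

Wrap each implementation in a lambda and pass the same `runs` value to all of them; the median is far less sensitive to a single lucky run than the best time.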
