When to use what language and why

Well, I tried Ruby (1.9.2), Vala, C, C++, Fortran 95 (gfortran and ifort), Java and C# (Mono and .NET) with DGEMM implementations (indices set up correctly for each case) for matrices up to 2500x2500.

C, C++ and Fortran came out **clearly** on top from around 1000x1000 up.

Fortran is **still the king** with around 10% faster runs.

Vala also reaches the top after setting some specific flags.

Java and C# are around 2-3x slower,

and Ruby is orders of magnitude slower.

This is also the conclusion of our friends at
http://shootout.alioth.debian.org/u32/which-programming-languages-are-fastest.php

For high-performance computing, compiled languages are obviously better.

Ruby **is** convenient as long as there is no need to debug, the code is small and doesn't perform any relevant calculations.

Basically, C++ is the universal language.
There are three kinds of lies:
a lie, a big lie, and a computer language shootout benchmark.

Seriously, a benchmark with no context, no source code and no information about how it was run and measured is just a piece of rubbish. I can tell you, I saw C code that, after being translated to Ruby, ran faster than the original version. Without my quoting the source code, would you believe me?

BTW: The shootout benchmarks are of very poor quality, and they benchmark only a single JVM, a single .NET implementation and a single C++ compiler, none of which is known as "the fastest you can get". E.g. here someone reran those benchmarks on Excelsior JET: http://www.stefankrause.net/wp/?p=9. Java and GCC were equal here within 5% in most benchmarks, and in one of them JET 6.4 significantly outperformed GCC.

Let me be direct: Unless you can show some "wonder implementation", up to this point in time Java doesn't stand a chance. With the matrix multiplication algorithm without "tricks" like skipping operations, you cannot make a Java implementation faster than C++ or Fortran for matrices above 1000x1000.

Java appears to be close to C and C++, but only for very small matrices.

However, IFF you are actually calling BLAS or a similar library behind the scenes, then you can make those claims. Python fans say the same, with a LAPACK/BLAS (either ATLAS, GotoBLAS or MKL) library doing the actual work. It is often very clear.

Commands and options:

gfortran (gcc 4.5) -O3
ifort (ifort 10.1) -O3
mcs (mono 2.8) -optimize+
.net (VS 2010 Express) optimize flag
ruby (1.9.2) (no options)
java - did it in eclipse months ago and doesn't stand a chance.

Fortran95:
subroutine cronom(ijob,nseg)
  real,save::tempo1=0.0
  real::tempo2
  select case(ijob)
  case(1)
     call cpu_time(tempo1)
  case(2)
     call cpu_time(tempo2)
     nseg=nint(tempo2-tempo1)
  endselect
end subroutine cronom
integer function id2d(n,i,j)
  id2d=i+(j-1)*n
end function id2d
program teste
  integer,parameter::sizem=2500
  real(8),dimension(:),allocatable::a,b,c
  allocate(a(sizem**2),b(sizem**2),c(sizem**2))
  do j=1,sizem
     do k=1,sizem
        do i=1,sizem
           c(id2d(sizem,i,j))=c(id2d(sizem,i,j))+a(id2d(sizem,i,k))*b(id2d(sizem,k,j))
        enddo
     end do
  enddo

end program teste


Csharp:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static int id2d(int m, int i, int j)
        {
            return(i+j*m);
        }
        static void Main(string[] args)
        {
            const int dim=2500;
            double[] a = new double[dim * dim];
            double[] b = new double[dim * dim];
            double[] c = new double[dim * dim];
            DateTime t1 = DateTime.Now;
            for (int j = 0; j < dim; j++)
            {
                for (int k = 0; k < dim; k++)
                {
                    for (int i = 0; i < dim; i++)
                    {
                        c[id2d(dim,i,j)]+= a[id2d(dim, i, k)] * b[id2d(dim, k, j)];
                    }
               
                }
            }
        }
    }
}


C++
// testemult.cpp : Defines the entry point for the console application.
//
#include <iostream>
#include <vector>
#include <list>
using namespace std;
struct mtx
{
  friend mtx& operator*(const mtx&,const mtx&);
public:
  mtx(){}
  inline mtx(int dim1,int dim2):m(dim1),n(dim2)
  {
    M.resize(m*n);
  }
  inline ~mtx()
  {
    M.clear();
  }
  inline double& val(int i,int j){return M[i+j*m];}
  inline const double& val(int i,int j) const {return M[i+j*m];}
private:
  int m,n;
  vector<double> M;
};
inline mtx& operator*(const mtx& a,const mtx& b)
{
  mtx* c=new mtx(a.m,b.n);
  for (int j = 0; j < c->n; j++)
	{
		for (int k = 0; k < a.n; k++)
	  {
	    for (int i = 0; i < c->m; i++)
			{
				c->val(i,j) +=a.val(i,k)*b.val(k,j);
			}	    
	  }
	}
  return(*c);
}
int main(int argc, char* argv[])
{
	const int dim=2500;
	mtx a(dim,dim);
	mtx b(dim,dim);
	mtx c;
	c=a*b;
	return 0;
}


Vala:
   public class Program : Object
    {
        static int id2d(int m, int i, int j)
        {
            return(i+j*m);
        }
        public static void main(string[] args)
        {
            int dim=2500;
            double[] a = new double[dim * dim];
            double[] b = new double[dim * dim];
            double[] c = new double[dim * dim];
            for (int j = 0; j < dim; j++)
            {
                for (int k = 0; k < dim; k++)
                {
                    for (int i = 0; i < dim; i++)
                    {
                        c[id2d(dim,i,j)]+= a[id2d(dim, i, k)] * b[id2d(dim, k, j)];
                    }
                }
            }
        }
    }


Ruby (forgot where I put my last code... so it is actually ripped-off and I suppose it is not really optimized):


size = 2500

def mkmatrix(rows, cols)
    count = 1
    mx = Array.new(rows)
    for i in 0 .. (rows - 1)
        row = Array.new(cols, 0)
        for j in 0 .. (cols - 1)
            row[j] = count
            count += 1
        end
        mx[i] = row
    end
    mx
end

def mmult(rows, cols, m1, m2)
    m3 = Array.new(rows)
    for i in 0 .. (rows - 1)
        row = Array.new(cols, 0)
        for j in 0 .. (cols - 1)
            val = 0
            for k in 0 .. (cols - 1)
                val += m1[i][k] * m2[k][j]
            end
            row[j] = val
        end
        m3[i] = row
    end
    m3
end

m1 = mkmatrix(size, size)
m2 = mkmatrix(size, size)
mm = Array.new
n.times do
    mm = mmult(size, size, m1, m2)
end


Java: please insert yours.
So you are benchmarking multiplication of zero-filled matrices and can't give exact compiler options. The -O3 option for GCC is also not the best you can set for this kind of benchmark. Holy cow! And you think we should trust YOU and not, e.g., the scientists from IBM Research who benchmarked BLAS Java multiplication and found that it reaches 90% of the performance of optimised Fortran[1]? You seem to be a troll, sir. :D

[1] http://www.research.ibm.com/ninja/
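
The zero-filled-input objection can be made concrete. What follows is only a minimal sketch, not anyone's actual benchmark from this thread: the names `Matrix`, `makeMatrix`, `multiply`, `checksum` and `timedRun` are made up for illustration, and the fill pattern is arbitrary. It fills the inputs with position-dependent values, times only the multiplication, and returns a checksum so the result is actually used.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Column-major dim x dim matrix stored in a flat vector (hypothetical helper type).
using Matrix = std::vector<double>;

// Fill with position-dependent values so the product is not trivially zero.
Matrix makeMatrix(std::size_t dim, double scale) {
    Matrix m(dim * dim);
    for (std::size_t idx = 0; idx < m.size(); ++idx)
        m[idx] = scale * static_cast<double>(idx % 7 + 1);
    return m;
}

// Same j-k-i loop order as the posted kernels: c += a * b, all column-major.
void multiply(const Matrix& a, const Matrix& b, Matrix& c, std::size_t dim) {
    assert(a.size() == dim * dim && b.size() == dim * dim && c.size() == dim * dim);
    for (std::size_t j = 0; j < dim; ++j)
        for (std::size_t k = 0; k < dim; ++k)
            for (std::size_t i = 0; i < dim; ++i)
                c[i + j * dim] += a[i + k * dim] * b[k + j * dim];
}

// Summing the result makes the computation observable to the caller.
double checksum(const Matrix& m) {
    double sum = 0.0;
    for (double v : m) sum += v;
    return sum;
}

// Times one multiplication; returns elapsed seconds and writes the checksum to out.
double timedRun(std::size_t dim, double& out) {
    Matrix a = makeMatrix(dim, 1.0);
    Matrix b = makeMatrix(dim, 0.5);
    Matrix c(dim * dim, 0.0);
    auto start = std::chrono::steady_clock::now();
    multiply(a, b, c, dim);
    auto stop = std::chrono::steady_clock::now();
    out = checksum(c);
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `timedRun(2500, sum)` several times and printing both the time and `sum` would give a run that can at least be compared across compilers, since every build has to produce the same checksum.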




I was just thinking, it doesn't really seem to be a fair test. And even if it was, it doesn't prove one is faster than the other, just that one is faster than the other in some specific case.
@chrisname: exactly!

Anyway, just out of curiosity. Here it is:

public class MatrixMultiplication {

    public static void main(String[] args) {
        for (int l = 0; l < 5; l++)
            performTest();
    }

    public static void performTest() {
        long start = System.currentTimeMillis();

        final int dim = 2500;
        double[][] a = new double[dim][dim];
        double[][] b = new double[dim][dim];
        double[][] c = new double[dim][dim];

        for (int i = 0; i < dim; i++)
            for (int k = 0; k < dim; k++)
                for (int j = 0; j < dim; j++)
                    c[i][j] = a[i][k] * b[k][j];

        long end = System.currentTimeMillis();
        System.out.println("Elapsed: " + (end - start) / 1000.0 + "s");
    }
}


Hardware: Intel Core 2 Duo 2.2 GHz T6670, DDR2 @ 800 MHz.

GCC:
gcc.exe (TDM-2 mingw32) 4.4.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Java:
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) Client VM (build 17.1-b03, mixed mode, sharing)

GCC options: -O3 (left as in original benchmark, although I think it can be improved)
Java options: -server -XX:+AggressiveOpts

Results:
Java code output:
Elapsed: 30.749s
Elapsed: 30.997s
Elapsed: 30.734s
Elapsed: 29.734s
Elapsed: 30.836s


C++ code output (manually executed five times under code::blocks):
Process returned 0 (0x0)   execution time : 39.198 s
Process returned 0 (0x0)   execution time : 38.154 s
Process returned 0 (0x0)   execution time : 38.586 s
Process returned 0 (0x0)   execution time : 38.702 s
Process returned 0 (0x0)   execution time : 39.011 s



Don't tell simplas2002 where the "hack" is, if you see this :D
If I use the same optimisation in the C++ code, then both versions run at practically the same speed (within ~3%). So no, simplas2002, you have to think harder to find a case where C++ outperforms Java by a factor of 2-3. Matrix multiplication is definitely not this case.
Dear RapidCoder:

1) Concerning BLAS, I mentioned before, it is different, since if you are using GotoBLAS, for example, in any language you can obtain extraordinary speed. And you can wrap it for Ruby.

2) Test of JAVA and C++ (gcc 4.6.0): C++ 42% faster (using JAVA as denominator- see below)

3) Have you read the report from IBM? It has (FIG.11) plain java with 21.4 MFLOPS and Fortran with 119.6 MFLOPS. How do you know who I am?


Your Java Code:
java -server -XX:+AggressiveOpts MatrixMultiplication 
Elapsed: 22.096s
Elapsed: 21.311s
Elapsed: 22.099s
Elapsed: 21.036s
Elapsed: 21.126s

java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)

The previous C++ code:
g++ -O3 -o matrixmultiplication matrixmultiplication.cpp

time ./matrixmultiplication:

real    0m12.902s
user    0m12.786s
sys     0m0.099s

g++ (GCC) 4.6.0 20100703 (experimental)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Machine: macbookpro 2.66

I can send you a printscreen if you want.

Still flawed:
1. An experimental, alpha quality version of GCC compared to a stable version of Java. Compare to Java 7 or some beta of Excelsior JET. How can you assume that GCC created even correct code? You don't check it. Knowing the history of GCC, it is quite probable.
2. Java for Mac is not officially supported.
3. Still not even a 2x difference (= negligible in most applications).
4. Such microbenchmark shows only that for that particular pointless task optimisations in this particular Java HotSpot VM for that particular architecture are not up to the optimisations in that particular experimental GCC[1]. So its practical value is near 0. Just like most shootout benchmarks.
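
The "you don't check it" objection in point 1 is cheap to address in any of these languages. As a rough sketch only (the function names `multiplyFast`, `multiplyReference` and `resultsAgree` and the tolerance are invented here and are not taken from any posted code), the optimised loop order can be compared against a plain textbook loop on a small input before any timing is trusted:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Kernel under test: j-k-i order on column-major n x n data, as in the thread.
std::vector<double> multiplyFast(const std::vector<double>& a,
                                 const std::vector<double>& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t i = 0; i < n; ++i)
                c[i + j * n] += a[i + k * n] * b[k + j * n];
    return c;
}

// Reference: one dot product per output element, written for clarity, not speed.
std::vector<double> multiplyReference(const std::vector<double>& a,
                                      const std::vector<double>& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i + k * n] * b[k + j * n];
            c[i + j * n] = sum;
        }
    return c;
}

// True when every pair of elements agrees within a relative tolerance.
bool resultsAgree(const std::vector<double>& x, const std::vector<double>& y,
                  double relTol) {
    if (x.size() != y.size()) return false;
    for (std::size_t idx = 0; idx < x.size(); ++idx) {
        double scale = std::max(1.0, std::max(std::fabs(x[idx]), std::fabs(y[idx])));
        if (std::fabs(x[idx] - y[idx]) > relTol * scale) return false;
    }
    return true;
}
```

A check like this does not prove the compiler is bug-free, but a miscompiled kernel that produces wrong numbers would be caught before its timing is reported.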

On a different architecture (Intel Core 2 Duo), for which Java is officially supported and optimised, Java wins your benchmark by more than 25%. Which means that Java-the-language is not the limiting factor. The limiting factor is the compiler and its optimisation set. There is no compiler optimisation done in C++ that could not be done in the Java compiler. On the other hand there are lots of optimisations that cannot be done in a C++ compiler, and can be (some are) employed in Java.

The final conclusion is: Java, C#, C, C++ are in the same league of performance. All are compiled, all are statically typed. Performance differences are just too small (and much more dependent on circumstances) to be a reason for choosing one over another.

[1] Assuming the good performance result of GCC is not caused by a bug in GCC that makes the generated code do something other than what is expected.

BTW: Another reason why Java can be considered the same performance league as C++ are some high-performance-computing contests. E.g. this one: http://sortbenchmark.org/
3. Still not even a 2x difference (= negligible in most applications).
Excuse me, but a 70% difference is hardly negligible.

4. Such microbenchmark shows only that for that particular pointless task optimisations in this particular Java HotSpot VM for that particular architecture are not up to the optimisations in that particular experimental GCC[1]. So its practical value is near 0. Just like most shootout benchmarks.
So, basically what you're saying is that there's no benchmark that can be used as evidence that C++ is faster, correct? *

The final conclusion is: Java, C#, C, C++ are in the same league of performance.
Could I see the premises, then? At least simplas2002 is producing some empirical evidence.



*Though I suspect you'd gladly accept such a benchmark if it gave Java the edge.

Excuse me, but a 70% difference is hardly negligible.


For the creator of the compiler - yes, it is not negligible. It is probably a bug in one of the compilers (some optimisations not applied correctly).

For someone who decides which language to pick and for the final user of the product - except if it is an AAA game, it is negligible. Most of the C/C++ software out there could be significantly sped up by better algorithms, or hand coded assembly, etc. But almost nobody cares, except some CS students that have just picked C or C++ and want to show how smart they are. I've yet to see some software (in any language) that really uses all the power of my computer. What I really care for, is that there are no suboptimalities of magnitude 1000 times, e.g. like there are in the recent OpenOffice release (written in C++), or hangups like in recent Firefox (if the download list is too long - it freezes for seconds, ofc, also in C++), or crashes like in KDE, or delays like in Eclipse 3.6 (caused not by GC, but by sloppy coding). It just has to have acceptable performance, not maximum performance.


*Though I suspect you'd gladly accept such a benchmark if it gave Java the edge.


I've already shown the exactly same code running on Java@Core2Duo significantly faster.
But no, I would not accept it as evidence. It would show only that there is still room for improvement in compiler technology, on both sides, but particularly in VMs, which are younger than static compilers. But these are compiler implementations, not languages. Excelsior shows that Java can be compiled statically to code as efficient as C++ code can be.

(Actually I wanted to say 64 bit Java on MacBook is performance-wise crappy and that is all - they have to work harder. But GCC several years ago was also crappy and produced extremely suboptimal machine code - yet no-one claimed they have to write in assembly instead of C++, because of that. See: http://stackoverflow.com/questions/1834607/64-bit-java-vm-runs-app-10x-slower).


At least simplas2002 is producing some empirical evidence.


The only evidence he produced is that some alpha version of a C++ compiler outperformed Java on some niche hardware platform. But on the most popular hardware/OS platform, provided the suboptimalities in his code are corrected, there is a tie.

On the other hand, I can show you some niche hardware platforms where you cannot write C++ code (because there is no native API and no publicly available compiler), but Java runs perfectly. Would it be any evidence that "Java is better"? No.


So, basically what you're saying is that there's no benchmark that can be used as evidence that C++ is faster, correct? *


First define what does it mean "C++ is faster?". If it means that the best C++ compilers produce sometimes better code than some not-so-good Java VMs, then yes, I agree. However, the opposite is also true. Want to compare VC++ with Excelsior JET?

alpha version of a C++ compiler
This isn't a valid complaint. Sure, you could argue that output is wrong. I could also argue that the optimization algorithm has a bug and missed some possible optimizations that slowed down the generated code.

some niche hardware platform.
x86-64 is niche hardware?

Would it be any evidence that "Java is better"? No.
No, it would be evidence of artificial scarcity.

First define what does it mean "C++ is faster?".
You're dodging the question. The point is that for any benchmark, you're just going to produce an argument to supposedly invalidate it (only if it disagrees with your preconceptions, obviously).

OpenOffice release (written in C++), or hangups like in recent Firefox (if the download list is too long - it freezes for seconds, ofc, also in C++), or crashes like in KDE, or delays like in Eclipse 3.6 (caused not by GC, but by sloppy coding)
Remember, kids: bad program behavior in C++ is C++'s fault. Bad program behavior in Java is your own fault.

x86-64 is niche hardware?


I meant hardware + software. Java on Mac is not supported and the 64-bit desktop is still a niche.
OK, the hardware has been capable of 64 bits for a very long time, yet most people in my neighbourhood use 32-bit OSes. 64-bit Java is optimised mostly for Linux and Solaris, for big iron servers. No-one uses PowerBooks for such applications, and 32-bit is still sufficient for the desktop, even if one installs a memory hog like Vista (oh, yes, also C and C++ - you see, programs don't automatically get faster and snappier only because they were written in these languages - the much more important factors are programmers, time and budget).


Remember, kids: bad program behavior in C++ is C++'s fault. Bad program behavior in Java is your own fault.


I have nowhere said it was a C++ fault. Seems, you read just what you want to read, and not what is really written. Most software has bugs and is suboptimal - because of poor programming, not the platform. Just analyse the differences between various implementations of the same benchmark in the great language shootout. These things affect overall performance much more than a 70% better compiler. If they compiled OpenOffice with a 70% better compiler, the problem would not go away. It would hang for a second instead of two whenever I edit text in a text frame in Impress - still unacceptable and irritating. In this case, discussing if one compiler can get you a 2x speedup or slowdown is an academic dispute.



You're dodging the question. The point is that for any benchmark, you're just going to produce an argument to supposedly invalidate it (only if it disagrees with your preconceptions, obviously).


I'm not invalidating __the benchmark__. I'm invalidating the conclusions he has drawn from his benchmark. Notice that I'm not claiming Java to be 30% faster than C++, as came out of my benchmark. I've seen lots of such benchmarks and most of them, when made by both good Java and C++ programmers, resulted in performance differences < 20% in __both__ directions. Especially when it comes to large practical applications, not some artificial microbenchmarks doing nothing. So any statements that Java/C# and other languages using a similar execution model are not suitable for high-performance applications are simply uninformed FUD.

BTW: Something that produces wrong machine code sometimes is alpha-quality:
http://archives.free.net.ph/message/20100705.062720.5d776623.pt-BR.html



Programming is fun, do it all the time. Truthfully in my experience I've never used an application that I thought performed exceptionally well on my hardware--except for two video games, the first being Assassin's Creed II, and the second being Mass Effect 2. Both astounded me by how smooth they ran.

That's my opinion, and I'm tired of these crap benchmarks. One last thing, in this code:
inline mtx& operator*(const mtx& a,const mtx& b)
{
  mtx* c=new mtx(a.m,b.n);//<-- Am I an idiot, or do you usually leak memory for benchmarking?
  for (int j = 0; j < c->n; j++)
	{
		for (int k = 0; k < a.n; k++)
	  {
	    for (int i = 0; i < c->m; i++)
			{
				c->val(i,j) +=a.val(i,k)*b.val(k,j);
			}	    
	  }
	}
  return(*c);
}


Am I mistaken, or is it actually acceptable to leak memory for a benchmark?
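
For what it's worth, the leak is easy to avoid without changing the loop. The following is a hedged sketch of one possible fix, not the original poster's code: `Mtx`, `rows()` and `cols()` are names made up here, and the only real change is that `operator*` builds a local object and returns it by value, so nothing is allocated with `new`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical leak-free variant of the posted mtx type (column-major storage).
class Mtx {
public:
    Mtx() : m_(0), n_(0) {}
    Mtx(int rows, int cols)
        : m_(rows), n_(cols),
          data_(static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols), 0.0) {}

    double& val(int i, int j) {
        return data_[static_cast<std::size_t>(j) * m_ + i];
    }
    const double& val(int i, int j) const {
        return data_[static_cast<std::size_t>(j) * m_ + i];
    }
    int rows() const { return m_; }
    int cols() const { return n_; }

private:
    int m_, n_;
    std::vector<double> data_;  // the vector owns the memory and frees it itself
};

// Returns the product by value; the result is moved or elided, never leaked.
Mtx operator*(const Mtx& a, const Mtx& b) {
    assert(a.cols() == b.rows());
    Mtx c(a.rows(), b.cols());
    for (int j = 0; j < c.cols(); j++)
        for (int k = 0; k < a.cols(); k++)
            for (int i = 0; i < c.rows(); i++)
                c.val(i, j) += a.val(i, k) * b.val(k, j);
    return c;
}
```

With this shape, `Mtx c = a * b;` involves no manual allocation at all, and returning a local object by value should not add a meaningful cost for a 2500x2500 result, since the underlying vector is moved rather than copied.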
Yes, I agree it was a *niche system*. And I also agree that GCC is not the absolute best C++ compiler available.

With this in mind I found out that "...Java always initializes arrays when you create them..".

That of course explains why the performance, although worse than C++, is still better than expected. In Linux you're right, results are worse. However, in C++ we can also use standard arrays and initialize them... Please see the C++ code similar to JAVA:
inline int id2d(int m, int i, int j)
{
  return(i+j*m);
}
int main(int argc, char* argv[])
{
  const int dim=2500;
  double *a,*b,*c;
  a=new double[dim*dim];
  b=new double[dim*dim];
  c=new double[dim*dim];
  for(int i=0;i<dim*dim;++i)
    {
      a[i]=2.0e00;
      b[i]=4.0e00;
      c[i]=0.0e00;
    }
  for(int j=0;j<dim;j++)
    {
      for (int k = 0; k < dim; k++)
	{
	  for (int i = 0; i < dim; i++)
	    { 
	      c[id2d(dim,i,j)]+= a[id2d(dim, i, k)] * b[id2d(dim, k, j)];
	    }
	}
    }
  return 0;
}



Let us now try in a less-niche system and 2 c++ compilers for the following system:

Intel Core(TM)2 Duo CPU T9600 @ 2.80 GHz

Linux Ubuntu 10.10 Maverick
Kernel Linux 2.6.35-23-generic-pae

g++:
real 0m25.125s
user 0m24.818s
sys 0m0.288s


gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

icc:
real 0m22.100s
user 0m21.989s
sys 0m0.092s


icc (ICC) 11.1 20100414
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.


JAVA:
Elapsed: 28.087s
Elapsed: 27.169s
Elapsed: 28.4s
Elapsed: 27.781s
Elapsed: 28.048s


java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.1) (6b20-1.9.1-1ubuntu3)
OpenJDK Server VM (build 17.0-b16, mixed mode)


The conclusion is that JAVA is slower than C++, whether using gcc 4.6, gcc 4.4.5 or icc 11.1, on Linux or Mac. **For this specific test**
OMG! Please shut the fuck up!

Edit:If you want to continue this futile discussion, take it to the Lounge

If you want a real challenge for your speed tests, implement a deterministic algorithm for primality and test the following number:

1645742825183467619487091143114742813111578265693364754500359
0347255103912018989569978272825902654731502101121217503326338
9300898343647261648503777405076868923505466907255695313192782
3038824414275347022148320732189275923
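
To pin down what "deterministic" means here, below is a minimal sketch of a deterministic test by trial division. It is an assumption-laden illustration only: it handles just 64-bit inputs, whereas the number above has over 200 digits and would need an arbitrary-precision integer type and a far better algorithm. The function name `isPrime` is chosen for illustration.

```cpp
#include <cassert>   // used only by the example checks mentioned below
#include <cstdint>

// Deterministic primality test by trial division over candidates 6k - 1 and 6k + 1.
// Always gives the right answer for 64-bit input, but needs about sqrt(n) / 3 divisions.
bool isPrime(std::uint64_t n) {
    if (n < 2) return false;
    if (n < 4) return true;                      // 2 and 3 are prime
    if (n % 2 == 0 || n % 3 == 0) return false;
    // "d <= n / d" is the overflow-safe form of "d * d <= n".
    for (std::uint64_t d = 5; d <= n / d; d += 6) {
        if (n % d == 0 || n % (d + 2) == 0) return false;
    }
    return true;
}
```

For instance, `assert(isPrime(97))` and `assert(!isPrime(91))` both hold. The cost grows with the square root of the input, which is exactly why this approach is hopeless for a number of the size posted and why the choice of algorithm matters far more than the choice of language.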

Not a real challenge. Not prime and it took 1 sec.:

PrimeQ[1645742825183467619487091143114742813111578265693364754500359
0347255103912018989569978272825902654731502101121217503326338
9300898343647261648503777405076868923505466907255695313192782
3038824414275347022148320732189275923]
false
Hmm...I don't see that you have implemented a deterministic algorithm for primality, all I see is a dubious use of Mathematica.
enum Answer { YES, NO, I_DONT_KNOW };

Answer isPrime ( some very large number type )
{
    return I_DONT_KNOW; // Not much precise but deterministic 
}
:0)
Nice.

BTW:

"...Java always initializes arrays when you create them..".


Array initialization in this microbenchmark is negligible, because it is O(n^2), while the multiplication algorithm is O(n^3). Additionally, the statement is not generally true. Some Java compilers optimize array initialization away if they are sure it is safe. Sun's JVM probably does not (it could do much better in many cases).
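
The O(n^2) versus O(n^3) argument can be checked with plain arithmetic. The sketch below (function names are hypothetical) counts the writes needed to zero three n x n matrices against the multiply-add steps of the naive triple loop; the ratio works out to n / 3.

```cpp
#include <cassert>
#include <cstdint>

// Writes needed to zero-initialise three n x n matrices (a, b and c).
std::uint64_t initWrites(std::uint64_t n) { return 3 * n * n; }

// Multiply-add steps executed by the naive triple loop.
std::uint64_t multiplyAdds(std::uint64_t n) { return n * n * n; }

// Multiply-add steps per initialisation write: n^3 / (3 n^2) = n / 3.
double workRatio(std::uint64_t n) {
    assert(n > 0);
    return static_cast<double>(multiplyAdds(n)) / static_cast<double>(initWrites(n));
}
```

For n = 2500 this gives 18,750,000 initialisation writes against 15,625,000,000 multiply-add steps, a ratio of roughly 833, so the zeroing is a small fraction of a percent of the loop work.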

The performance differences in this code may probably come from:
- loop unrolling
- vectorisation (SIMD instructions)
- array bound checking (JVM has to "think" hard here to eliminate it - C++ compiler throws this responsibility at the programmer)
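
The bounds-checking point has a rough C++ analogue, sketched below under the assumption that `std::vector::at` stands in for a checked array access. It says nothing about how HotSpot actually implements or eliminates its checks, and the function names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Checked access: at() validates the index on every call and throws
// std::out_of_range on a bad index.
double sumChecked(const std::vector<double>& v, std::size_t count) {
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) sum += v.at(i);
    return sum;
}

// Unchecked access: operator[] does no validation, so staying in range is
// the caller's responsibility (the assert disappears in NDEBUG builds).
double sumUnchecked(const std::vector<double>& v, std::size_t count) {
    assert(count <= v.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) sum += v[i];
    return sum;
}

// Reports whether the checked version rejects the requested element count.
bool checkedRejects(const std::vector<double>& v, std::size_t count) {
    try {
        sumChecked(v, count);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Whether the checked loop is measurably slower depends on whether the compiler can prove the index is always in range and drop the check, which is the same kind of analysis a JVM has to do for its array accesses.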

Finally, now that you have made more checks (though still with just a single JVM, which is a flaw in this test; you should certainly check Excelsior JET), it seems that Java is not 2-3x slower for this test, but about 25% slower than icc and 12% slower than gcc 4.4.5.
