c++ factorial implementation

Pages: 12
C has lots of syntax inconsistencies, and with assembly it depends on the architecture. Personally I think the x86 instruction set is overly complex, but the syntax is very simple. At least, if you use an Intel-style ("mov dest, src") assembler rather than GNU as "(movblq ???$(({[src]}))\, $(({[dest]})))???" /* what is this i don't even lolololololololololol */").
That's the impression I got from the fact that so far, I haven't seen you make such a list.

-Albatross
it only proves my point that you are one of those C++ programmers who think they write fast code only because they wrote it in C++.
Wow. What a dick.
@helios lol
Wow, the times you cite mean I really do things wrongly. Anyways, I was just wondering by how much is my code off from the python implementation: I will try to test right now.

[Edit]: Indeed, it is off by an order of magnitude... what am I doing wrong?
[Edit]: Here is the python equivalent of my c++ code (versus the simplest possible python code pointed in the previous posts):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
def findAllPrimes(n):
	x=[0]*n
	y=[]
	for i in range(2,n):
		if x[i]==0:
			for j in range(2*i,n,i):
				x[j]=1
			y.append(i)
	return y

def factorial(n):
	allprimesBelowN=findAllPrimes(n)
	result=1
	for i in range(0, len(allprimesBelowN)):
		thePowerOfThePrime=0
		currentPower=allprimesBelowN[i]
		thePowerOfThePrime+=n/currentPower
		while (currentPower<=n):
			currentPower*= allprimesBelowN[i]
			thePowerOfThePrime+=n/currentPower
		result=result*(allprimesBelowN[i]**thePowerOfThePrime)
	print(result)
	return result

factorial(100000)

Still haven't figured out how to time things out though ...
I raced the two implementations and this is indeed somewhat faster, but not by that much (less than 1.5 times) than the straightforward multiplication in a cycle. This means the optimization is probably not worth the effort... I thought the effect should be larger.

As far as my c++ large integer/power raising functionality: well, clearly I wrote it pretty badly. Maybe I should google around on how to speed things up ...
Last edited on
Two hotspots you have are the call to std::vector::resize() in LargeUI::AddShiftedUIntSmallerThanCarryOverBound() and everything involving long longs. I believe you should see an improvement if you limit yourself to things that fit in CPU registers (unsigned) and switch to raw arrays. Perhaps trying to estimate the digit count would help.
In python:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
import time
def RaiseToPower(x,thePower):
	if thePower<4:
		return x**thePower
	result=1
	squares=x;
	
	while thePower>0 :
		if (thePower%2)==1:
			result=squares*result
		squares=squares*squares
		thePower = thePower/2
	return result
def findAllPrimes(n):
	x=[0]*n
	y=[]
	for i in range(2,n):
		if x[i]==0:
			for j in range(2*i,n,i):
				x[j]=1
			y.append(i)
	return y
def factorial(n):
	allprimesBelowN=findAllPrimes(n)
	result=1
	for i in range(0, len(allprimesBelowN)):
		thePowerOfThePrime=0
		currentPower=allprimesBelowN[i]
		thePowerOfThePrime+=n/currentPower
		while (currentPower<=n):
			currentPower*= allprimesBelowN[i]
			thePowerOfThePrime+=n/currentPower
		result=result*(RaiseToPower(allprimesBelowN[i],thePowerOfThePrime))
	return result
x=50000


t1=time.time()
print x, "! equals: ", factorial(x)
t2=time.time()
print "total running time raising powers by primes: ", t2-t1, " seconds"

output:
total running time raising powers by primes: 20.7800540924 seconds

and the straighforward method:
1
2
3
4
5
6
7
8
9
10
11
import time
x=50000
def factorial_straighforward(n):
	result=1
	for i in range(1, n+1):
		result=result*i
	return result
t1=time.time()
print x, "! equals: ", factorial_straighforward(x)
t2=time.time()
print "total running time using the straightforward method: ", (t2-t1), " seconds"

output:
total running time using the straightforward method: 27.1603929996 seconds

Alright, that was an equivalent of what I wrote in C++ in Python (this is my first Python program hehe - my mistake made me embarass myself, but also learn "Hello world" in Python :). The second factorial is the straightforward implementation, and the first factorial is the implementation with sorting out by primes and raising to powers like I did it originally.

However, I think that x**y does the same as my by-hand implementation RaiseToPower(x,thePower), so I doubt I gain any speed there.
[Edit:] Indeed, the two programs
1
2
3
4
5
import time
t1=time.time()
print 7, "^ 100000 equals: ", 7**100000
t2=time.time()
print "total running time: ", t2-t1, " seconds"

and
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
import time
def RaiseToPower(x,thePower):
	if thePower<4:
		return x**thePower
	result=1
	squares=x;
	
	while thePower>0 :
		if (thePower%2)==1:
			result=squares*result
		squares=squares*squares
		thePower = thePower/2
	return result
t1=time.time()
print 7, "^ 100000 equals: ", RaiseToPower(7,100000)
t2=time.time()
print "total running time: ", t2-t1, " seconds"


showed practically the same time.

So, the simpler version of the factorial computed and printed to the screen 100 000! in 1 minute and 40 seconds 50000! in . The one based on the same tricks as my original c++ program runs in 1 minute 10 seconds.

The two factorials practically run at the same speed (maybe some 20-30% better by sorting out the prime powers first ), so for so much code, I guess it is not worth it at all.

Oh well, this taught me a lesson to first check out the easy stuff :). Also, when trying to translate the monstrous RaiseToPower from C++ to python , I figured out all my code distills down to the above short and sweet realization (will translate back to C++ now) .... oh well :)
Last edited on
In high frequency trading systems, even 1 second difference make a lot of difference cuz that 1 second could have a buy/sell trade that is huge in quantity of lots and imagine the queueing system implemented to determine who is first or second.

Of cuz in general, 1 second doesn't matter but when we are talking about niche systems like trading or other betting related systems that has time component as high priority, 1 second difference is not acceptable. In fact they talk about sub-second in their systems.

It is these niche systems (there are others maybe) that will still depend on C/C++ for pure execution speed. The other would be our more familiar graphics intensive fast moving games. It need C/C++ to give the edge. Besides trading/betting/gaming, I am still wondering what kind of systems need to be developed in C/C++. Hmmmm.....
Topic archived. No new replies allowed.
Pages: 12