Need optimisation guidelines

Hello
I'm preparing for OIG (a Polish programming competition) and while doing exercises from previous years I encountered a problem.
Here is a rough translation of the task:

TASK:

The evil dragon Bitol invaded the country of dwarfs and enslaved its inhabitants. He assigned one workplace to each of the n dwarfs and started to supervise their work lying on a pile of stolen goods.

Being an extremely lazy dragon, Bitol does not look at all the slaves constantly. Instead, he only guards dwarfs working at a group of workplaces at any given time. During that period all the other dwarfs can meet and swap their workplaces (the dragon does not remember which dwarf works where). The subset of observed dwarfs changes every hour.

The dwarf Bayazil, working at workplace 1, wants to rise the others to fight Bitol. In order to do so he needs to meet another dwarf, Baytozil, who works at workplace n. Swapping places with the others, he needs to reach such a situation where neither his nor Baytozil's workplace n is observed.

Your task is to find out how soon he is able to succeed.
You know the dragon will go to sleep in m hours and then no workplace is observed.

INPUT:

The first line of standard input contains the numbers n, m (1<=n,m<=1000000).
The next m lines contain the numbers of observed workplaces at consecutive hours. Every one of these lines consist of number Ki marking the number of observed workspaces and Ki numbers marking the workspace numbers (in increasing order). All the numbers in a line are separated by a single space.

OUTPUT:

The only line of the output should consist of the number of hours until the dwarfs meet.


I wrote a solution in C++ (the allowed languages are C, C++, pascal):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
#define _CRT_SECURE_NO_WARNINGS
#include<cstdio>
#include<vector>
#include<algorithm>
#include<cstdlib>
#include<cmath>
#include<cstring>
#include<stdint.h>

using namespace std;

static inline void setFalse(uint64_t *a, int pos)
{
	a[pos / 64] &= ~((uint64_t) 1 << (pos % 64));
}
static inline int getState(uint64_t *a, int pos)
{ 
	return (a[pos / 64] & (uint64_t)1 << (pos % 64)) >> pos % 64;
}
static inline void setTrue(uint64_t *a, int pos)
{
	a[pos / 64] |= (uint64_t) 1 << (pos % 64);
}

int main(void)
{
	int n, m;
	scanf("%i", &n);
	scanf("%i", &m);
	uint64_t *safe = new uint64_t[n / 64 + 1]();
	for (int a = 0; a < n / 64 + 1; a++)
		safe[a] = 0;
	safe[0] = 1;
	uint64_t *safeT = new uint64_t[n / 64 + 1]();
	for (int i = 0; i < m; i++)
	{
		memset(safeT,255,n / 8);
		for(int z = 0; z < n % 8; z++)
			setTrue(safeT,n - (n % 8) + z);
		int length;
		scanf("%i", &length);
		for (int j = 0; j < length; j++)
		{
			int temp;
			scanf("%i", &temp);
			setFalse(safeT, temp - 1);
		}
		for (int z = 0; z < n / 64 + 1; z++)
		{
			if (safe[z] & safeT[z])
			{
				for (int p = 0; p < n / 64 + 1; p++)
					safe[p] |= safeT[p];
				break;
			}
		}
		if (getState(safe, n - 1))
		{
			printf("%i\n", i);
			//system("pause");				//DEBUG
			return 0;
		}
	}
	printf("%i\n", m);
	//system("pause");						//DEBUG
}

But it is about 2 times too slow.
Help?
Last edited on
Honestly your task is too complex for me.

So all I can give you is general guidelines and advice, hoping that other members will contribute to this thread further.

One small suggestion would be: provide an example input for the program.

First, you have a lot of for() loops in your code.
Nested for() loops are especially bad for speed. You even have a third level for() loop.
http://en.wikipedia.org/wiki/Time_complexity

You probably need to rethink your method in order to use fewer loops.
Maybe you can think of this problem as some kind of optimum path search?
http://en.wikipedia.org/wiki/Pathfinding

Second, you are using uint64_t.
Be advised that working with 64-bit types on a 32-bit computer is slow. If you have a 64-bit processor and running a 64-bit operating system, by compiling the code as a 32-bit executable you will very likely negate that advantage.

Edit: layout, grammar.
Last edited on
http://www.cplusplus.com/reference/bitset/bitset/
http://codeforces.com/blog/entry/925 (input)

It would be nice if you describe your approach.
WARNING:
Everything I state is specific to the task, tested and should be treated as such.
//That's because people on a PL forum sad what I write is bad because it gives a wrong lesson to new programmers.
@Catfish4
I actually did provide input.
You have a bigger sample here:
https://www.dropbox.com/s/kvqfhwi14tunouz/skr5l.in
The answer is 638002.
Here is a valgrind log for the code with the input above:
http://pastebin.com/vSYyNsCb
And here comes the controversial part:
On a 32bit system 64bit is faster than 32.
Why?
Less loop jumps and less bitwise operators.
strange but true.
@ne555
Why notvector/whatever:
Because of my approach (code is nice but will explain anyway):
I read the observed places and create a 0 (observed) / 1 (not) array of them (64 / int64_t) (safeT).
Check if it overlaps with places where our dwarf could already be (safe).
If yes OR the tables.
There is the advantage of 64 dwarfs/int.
I do 1 & and 1 | insteaf of 64 = and 64 ==.
Bitset looks promising though, will look into it tomorrow.
And (see warning) scanf is faster than cin even without sync.
Last edited on
> I actually did provide input.
¿where? All I see is the format of the input, but not an example.

> You even have a third level for() loop.
Actually, he doesn't. When that inner loop completes, it breaks the outer one.
So at most it does 2n iterations

> I read the observed places and create a 0 (observed) / 1 (not) array of them (safeT)
> Check if it overlaps with places where our dwarf could already be (safe).
> If yes OR the tables.
The problem is that you are traversing all the `n' elements every time you make an observation.
This leads to O(n^2) which is too big. As n<=1000000 you'll need O(n) or maybe O(n lg n)

The easiest way to reduce that is to traverse only the places that are observed, as the others are '0' would not trigger the condition or affect the OR.
Sorry, you are right, I only wanted to post it.
Anyways here it is:

INPUT:
5 4
3 1 3 4
2 3 5
3 1 2 3
2 1 2

OUTPUT:
2

INPUT:
6 2
4 2 3 4 5
6 1 2 3 4 5 6

OUTPUT:
0

INPUT:
10 4
1 1
2 9 10
7 1 3 4 7 8 9 10
2 1 10

OUTPUT:
4

Really LOVE bitsets.
The problem is the solution is much slower than the original.
Last edited on
Topic archived. No new replies allowed.