Tell me how crazy this idea sounds...

careful Helios :) counting sort...
even so, that is kind of a cheat, and outside the theory/math. It only works in general if you have infinite RAM, which we do not, or on small integer ranges ("small" keeps getting bigger, though; it can handle a range of millions on a home PC).
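For reference, a minimal counting-sort sketch, assuming non-negative ints with a known maximum; the O(range) count buffer is exactly the small-integer-range restriction:

#include <vector>

// Counting sort: O(n + range) time, O(range) extra memory.
// Only viable when max_value is small enough that the counts
// vector fits comfortably in RAM.
void counting_sort(std::vector<int>& a, int max_value) {
    std::vector<int> counts(max_value + 1, 0);
    for (int x : a) ++counts[x];              // tally each value
    int i = 0;
    for (int v = 0; v <= max_value; ++v)      // rewrite in sorted order
        while (counts[v]-- > 0) a[i++] = v;
}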

While O(n log n) is provably the best a comparison-based sort can do, how you reach that bound has been tweaked at least 3 times in my lifetime, maybe more. You can improve the implementations, but not how much work you have to do to get there, in other words.
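For the curious, the proof is the classic decision-tree argument: a comparison sort must distinguish all n! input orderings, so its decision tree has n! leaves and height at least log2(n!); in symbols:

\[
\text{comparisons} \;\ge\; \log_2(n!) \;=\; \sum_{k=1}^{n} \log_2 k \;\ge\; \sum_{k=\lceil n/2\rceil}^{n} \log_2 k \;\ge\; \frac{n}{2}\log_2\frac{n}{2} \;=\; \Omega(n \log n)
\]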

I don't expect to be replacing introsort anytime soon, unless it's a threaded tool, but playing is fun. I never expected to beat std::sort with my shell sort either, but I have played with it off and on for 30+ years.
@seeplus

TheIdeasMan.IIRC = false;

You are right, I meant std::unordered_map.

Regarding the rehashing, my argument only applies if rehashing is required:

https://en.cppreference.com/w/cpp/container/unordered_map/insert

cppreference wrote:
If rehashing occurs due to the insertion, all iterators are invalidated. Otherwise iterators are not affected. References are not invalidated. Rehashing occurs only if the new number of elements is greater than max_load_factor()*bucket_count(). If the insertion is successful, pointers and references to the element obtained while it is held in the node handle are invalidated, and pointers and references obtained to that element before it was extracted become valid. (since C++17)


So if one had 8 million items in a std::unordered_map, and then added sufficient items to require rehashing, then one is still better off with std::vector.
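As an aside, a minimal sketch assuming the final element count is known up front: reserve() avoids the rehash entirely, because rehashing only happens when an insert would push the load factor past max_load_factor():

#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<int, std::string> m;
    // Pre-allocate buckets for 8 million elements: no insertion will
    // push the load factor past max_load_factor(), so no rehash occurs
    // and iterators stay valid across all the inserts. (References and
    // pointers to elements survive rehashing in any case.)
    m.reserve(8'000'000);
    for (int i = 0; i < 8'000'000; ++i)
        m.emplace(i, "value");
}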


@markyrocks

I didn't intend to put you off so much that you would play cards instead of coding. There are plenty of challenging things to do, like Project Euler for instance, that ought to get the brain cells working :+) Also, there is nothing stopping one from trying ideas, testing them, and seeing how the results go. That, after all, is the basis of science :+)

https://projecteuler.net/
One typically doesn't optimize an algorithm by adding arbitrary levels of indirection.


Quote from David Wheeler: "All problems in computer science can be solved by another level of indirection"
> Quote from David Wheeler: "All problems in computer science can be solved by another level of indirection"

Here's another frequently cited quote:
"Most performance problems in computer science can be solved by removing a layer of indirection" [unknown]
Long story short, what I was attempting to accomplish through this process is to find a cheap way to do a type of insertion sort. I haven't looked at what has been said here, but this is what I've been working on. It's not perfect, but I'm continuing to make progress.

Since working on this, I have figured out that I can avoid the for loops that shift the array by just swapping out the next number and doing a reverse on the section that's been written to.

For example, if the array was {5,4,3,2,1}, the comparisons would write the array as {4,3,2,1,5}; then, after the reverse operation, it would end up {1,2,3,4,5} (a sketch of this reverse step follows below).

Again, this isn't perfect because of one small error, and I plan on adding an == end section as well, but I just haven't got there yet.
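A minimal sketch of that reverse step, assuming the comparison pass has already produced {4,3,2,1,5}:

#include <algorithm>  // std::reverse

void reverse_run_demo() {
    int a[] = {4, 3, 2, 1, 5};   // state after the comparison pass on {5,4,3,2,1}
    std::reverse(a, a + 4);      // one reverse of the written-to section
    // a is now {1, 2, 3, 4, 5}: sorted without the shifting for loops
}

Anyway, here's the work-in-progress code: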

#include <utility>  // std::swap

void runsort(int* ar, int size) {

	int* begin = ar;
	int* end = ar + size - 1;
	int t;
	int l_count = 0;   // length of the low run collected at the front
	int h_count = 0;   // length of the high run collected at the back
	int* cc_a = ar + 1;
	int* cc_end = end;

	while (begin < end) {
		l_count = 0;
		h_count = 0;
		cc_a = begin;
		cc_end = end;
		while (cc_a < cc_end + 1) {

			// *cc_a belongs before the current front: move it there,
			// shifting the front run right and sending the evicted
			// element back to cc_a's slot.
			if (*cc_a < *begin) {
				if (l_count) {
					t = *(begin + l_count);
					for (auto i = begin + l_count; i > begin; i--) {
						*i = *(i - 1);
					}
					*begin = *cc_a;
					*cc_a = t;
					l_count++;
				}
				else {
					std::swap(*cc_a, *begin);
					l_count++;
				}
			}
			// Duplicate of *begin: meant to slot it in just after the
			// front run.
			else if (*cc_a == *begin && cc_a != begin) {
				t = *(begin + (l_count ? l_count : 1));
				*(begin + 1) = *cc_a;
				for (auto i = begin + (l_count ? l_count : 1); i > begin; i--) {
					*i = *(i - 1);
				}
				*cc_a = t;
				l_count++;
			}
			// *cc_a belongs after the current back: mirror of the low case.
			if (*cc_a > *end) {
				if (h_count) {
					t = *(end - h_count);
					for (auto i = end - h_count; i < end; i++) {
						*i = *(i + 1);
					}
					*end = *cc_a;
					*cc_a = t;
					h_count++;
				}
				else {
					std::swap(*cc_a, *end);
					h_count++;
					cc_end--;
				}
			}
			cc_a++;
		}
		begin++;
		end--;
	}
}


That's very different from what you described earlier.

AFAICT, that takes O(n^2) time. I haven't analyzed how often the inner for loops need to run, so it might even be O(n^3).

"All problems in computer science can be solved by another level of indirection"
"Except the problem of too many levels of indirection."
Notice how I said earlier that I figured out how to get rid of those inner for loops. Also, every time a number gets pushed to the end, cc_end is supposed to get smaller; I've been working out the bugs, so that aspect hasn't been fully implemented. The idea is that every time a run of numbers is placed at either end, the beginning and end shrink by the same amount. I'm not an expert on time complexity, but the whole array shrinks by at least 2 every pass, and I'm trying to make it shrink by big chunks every time. I'm thinking it could possibly shrink by 25% or more each pass, unless a pass is really unlucky and the low and high numbers are plucked right away. The chances of that happening at both ends on the same pass seem very slim.

The idea is very similar to before. The only difference is that the array that's shifting around is at the ends, which, given the first post in this thread, is clearly what I was trying to avoid.
Notice how I said earlier that I figured out how to get rid of those inner for loops.
What you described before is largely equivalent. One way or another you'll have an extra n factor on the complexity, unless I'm misunderstanding something.

I'm trying to make it shrink by big chunks every time. I'm thinking it could possibly shrink by 25% or more each pass, unless a pass is really unlucky
I think you're overestimating how often you'll get lucky. A randomly shuffled sequence (what's known as the average case for sorting algorithms) will rarely have discernible patterns like long runs of consecutive numbers.

The idea is very similar to before.
If you say so. I really don't see what's similar about them.
OMG, I'm not basing it off of getting lucky with long runs... it makes the runs.

As it currently stands, the above code does the following.

If you have an array {5,4,6,8,4,3,6}, it looks at the first position, 5, and checks it against the end (6) to see if it's greater: no, so it moves on. The current position is 4; it checks whether it's < begin (5): yes, so it swaps, giving {4,5,6,8,4,3,6}. Current position 5: checks vs. end, greater? No. Current position 8: checks vs. end, greater? Yes, so it swaps, giving {4,5,6,6,4,3,8}. The current position becomes 6: checks vs. 4, less? No.

Current position 4: checks vs. begin, equal? Yes, so an array shift makes it look like
{4,4,5,6,6,3,8} (we're still on the first pass, mind you). Current position (the 2nd 6): check > 8? No. Current position 3: checks < 4? Yes; an array shift ends up with {3,4,4,5,6,6,8}. Obviously at this point it's sorted...

I'm not sure what the problem is with what I'm doing here. Like I said, I figured out how to eliminate the shifts: the beginning of the array would have ended up looking like 4,4,3..., and I would just do a reverse on those 3 digits in one pass to get them in order. Now, obviously you can't say that every pass is going to be perfect, but regardless, depending on how many times this operation happens, I think a certain number of the last digits found as the lowest are guaranteed to be in order and to be the lowest digits. Obviously I'll know more with testing. But whatever that number is, the array can be shrunk and those digits ignored. And mind you, I'm doing the same operation at both ends.
When measuring time complexity, big-O conventionally describes the worst-case scenario.

What happens if you start with a correctly-sorted array with no duplicate values:

{3, 4, 5, 6, 7, 8, 9, 10}

You end up checking 3 against all of the other values. Then you check 4 against all of the rest of the values. In no loop does the value being checked get moved at all.

When you count the comparisons you make, you get n-1 for the first element, n-2 for the next, and so on: (n-1) + (n-2) + ... + 1 = n(n-1)/2, a constant multiple of n^2 (where n is the number of elements in the array). So the big-O notation for this algorithm is O(n^2).
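To see that growth concretely, a minimal sketch that just tallies the check-against-all-remaining-values pattern described above:

#include <cstdio>

// Tally the comparisons made when each element is checked against
// every element after it, as happens with the sorted input above.
long long count_comparisons(int n) {
    long long c = 0;
    for (int i = 0; i < n - 1; ++i)
        for (int j = i + 1; j < n; ++j)
            ++c;
    return c;   // equals n*(n-1)/2
}

int main() {
    int sizes[] = {1000, 2000, 4000};
    for (int n : sizes)
        std::printf("n = %d -> %lld comparisons\n", n, count_comparisons(n));
    // Doubling n quadruples the count: 499500, 1999000, 7998000.
}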
A'ight.
Just FYI, the implementation above processes 100'000 ints in slightly under 3 seconds, while std::sort() does it in less than 10 ms. Doubling the input size roughly quadruples the run time, which is consistent with a quadratic-time algorithm.

(For the time being I'm ignoring that your algorithm doesn't sort properly.)

EDIT: After thinking about it, this just looks like a poorly-implemented selection sort. Each time through the outer loop you're searching for the least and greatest elements in the subarray and putting them at the front and back of the subarray. You also shift some elements around, but that's basically pointless, since you're still moving them into the unsorted region. I can fix your function by commenting out most of it:
void runsort(int* ar, int size) {
    int* begin = ar;
    int* end = ar + size-1;
    int t;
    int l_count = 0;
    int h_count = 0;
    int* cc_a = ar + 1;
    int* cc_end = end;

    while (begin < end) {
        l_count = 0;
        h_count = 0;
        cc_a = begin;
        cc_end = end;
        while (cc_a < cc_end+1) {

            if (*cc_a < (*begin)) {
                /*if (l_count) {
                    t = *(begin + l_count);
                    for (auto i = begin + l_count; i > begin; i--) {
                        *i = *(i - 1);
                    }
                    *begin = *cc_a;
                    *cc_a = t;
                    l_count++;
                }
                else {*/
                    std::swap(*cc_a, *begin);
                    //l_count++;
                //}
            }
            /*else if (*cc_a == *begin && cc_a != begin ) {
                
                
                t = *(begin + (l_count?l_count:1));
                *(begin+1) = *cc_a;
                for (auto i = begin + (l_count?l_count:1) ; i > begin; i--) {
                    *i = *(i - 1);

                }
                *cc_a = t;
                l_count++;
            }*/
            /*if (*cc_a > (*end)) {

                if (h_count) {
                    t = *(end - h_count);
                    for (auto i = end - h_count; i < end; i++) {
                
                        *i = *(i + 1);
                    }
                    *end = *cc_a;
                    *cc_a = t;
                    h_count++;
                }
                else {
                    std::swap(*cc_a, *end);
                    h_count++;
                    cc_end--;
                }
            }*/
            cc_a++;
        }
        begin++;
        //end--;
    }
}
Now it sorts correctly, although it's a fair bit slower than a naive selection sort:
#include <utility>  // std::swap

void selection_sort(int *a, int n){
    for (int i = 0; i < n - 1; i++){
        int min = i;
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[min])
                min = j;
        if (min != i)
            std::swap(a[i], a[min]);
    }
}
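A quick sanity check, using the example array from earlier in the thread:

#include <cstdio>

int main() {
    int a[] = {5, 4, 6, 8, 4, 3, 6};
    selection_sort(a, 7);
    for (int x : a) std::printf("%d ", x);   // prints: 3 4 4 5 6 6 8
}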
Dude, I'm not sure why you keep cutting me down. You even said stuff is being moved around because you don't know why. If you aren't even going to bother figuring out what it's supposed to do, then don't bother replying. I could also make some lame swap bubble sort in seconds; that's not the point of this exercise. I SAID THAT IT WASN'T WORKING CORRECTLY and that it was a work in progress. I just posted it so maybe you could have a better understanding of what I was trying to accomplish. I was in a hurry this morning and I had to go do that thing... what do they call it? Oh, work. Like seriously, dude, I couldn't care less about your condescending attitude. I've been pretty polite during this entire conversation and you just don't quit. This is just a hobby for me; I'm successful in every other aspect of my life. I'm pretty sure that I'm an expert in quite a few things that you don't have the slightest clue about. If what you wrote above is supposed to be impressive or something, and you consider yourself to be some master codesmith, excuse me while I go yawn...

Did I make any claims about the above code being anything special or super fast or anything along those lines? No, so thanks for running a timer on a piece of code that you, me, and everyone else knows isn't going to break any speed records. Not sure why you say it like you opened the curtain on some magical piece of information.

Like, I'm not even mad; I'm just annoyed that you keep going this route. This is exactly why I come and go from these forums: there's always some wannabe know-it-all dickweed. If you were so fantastic you'd be working at Google or something, not wasting your life away in some corner of a basement internet forum. Clearly you have been sorted, and... this is where you were placed. You made it all the way to the top of the beginners forum... #lifegoals
You're only making yourself look foolish with that infantile tirade.
I'm only going to respond to this because you're insulting other people by implication:
If you were so fantastic you'd be working at Google or something, not wasting your life away in some corner of a basement internet forum.
Are you implying everyone who browses this forum is, to use the terminology I imagine you'd use, a loser? I'm confused why you're bothering to post here if that's what you think.
snip
markyrocks wrote:
Did I make any claims about the above code being anything special or super fast or anything along those lines?


Here is the first paragraph of the first post, emphasis mine:

markyrocks wrote:
Ok so I've been playing around with pointers and sorting A LOT, almost too much. An idea that keeps reoccurring in my mind is whether it's possible to find a way to slide pointers down an array without doing all the swapping... hence efficiency.


It's not that anyone is trying to cut you down; it's just that we are pointing out how inefficient your code is, with constructive criticism. Ideally, you need to see this constructive criticism for what it is, accept it, and learn from it.

helios has provided some timing (for the first time in this topic) that showed your code to be roughly 300 times slower than std::sort on 100'000 ints, with quadratic complexity. You say your code isn't complete yet, but do you really think you can make a marked improvement on that result?

Like I said earlier, science is all about doing some experiments with controls, getting data, and analysing that data. Not everything scientists do works, but they see the bad results and move on with lessons learnt.

Part of what is going on here is what I call small data syndrome; it's a common thing for beginners, IMO. At school, they are given assignments: sort these 50 numbers; or put them in a linked list; or put them in a binary tree. This is all well and good, but in the real world computers deal with millions or billions of data items. So the problem is that people try out their new ideas on a very small data set and don't realise just how bad things get with a larger data set.

I am guessing that you haven't yet done a course on Algorithms & Data Structures? In that course you will learn that a bad algorithm can take billions of years to do something, while a good one can do the same job in less than 1 second. An example is Project Euler problem 67, and that has a small amount of data (a tree with a depth of only 100).

So my advice is to learn more about the good sorting algorithms that are out there, like quicksort for example. In my mind this is way better than starting with a bad algorithm and trying to "fix it up".

@OP
I ran the code from your latest post, but modified main() to time various array sizes, with the following results. I am in the process of trying a billion data items using std::sort and will get back to you with the timings. In the meantime, thanks for having a go.
PS: I would be surprised if a dataset of a billion items is ever sorted as a single exercise.

#include <algorithm>   // std::sort
#include <chrono>
#include <cstdlib>     // rand
#include <iomanip>
#include <iostream>

using Clock = std::chrono::steady_clock;

int main()
{
    int LIMIT{10};

    for(int j = 1; j < 5; j++)
    {
        LIMIT *= 10;
        int* a = new int[LIMIT]{};
        int* b = new int[LIMIT]{};

        for (int i = 0; i < LIMIT; i++) {
            a[i] = rand() % LIMIT;
            b[i] = a[i];
        }

        auto t1 = Clock::now();
        newrunsort(a, LIMIT);   // the OP's latest version of runsort
        auto t2 = Clock::now();
        std::cout
        << std::setw(10) << LIMIT << " time diff "
        << std::setw(12) << std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count()
        << " nanoseconds, ";


        auto t3 = Clock::now();
        std::sort(b, b + LIMIT);
        auto t4 = Clock::now();
        std::cout
        << std::setw(12) << std::chrono::duration_cast<std::chrono::nanoseconds>(t4 - t3).count()
        << " nanoseconds\n";

        delete[] a;
        delete[] b;
    }
}



       100 time diff        10567 nanoseconds,         4537 nanoseconds
      1000 time diff       801741 nanoseconds,        28542 nanoseconds
     10000 time diff     61704220 nanoseconds,       390369 nanoseconds
    100000 time diff   6054458822 nanoseconds,      4824304 nanoseconds
Program ended with exit code: 0
@agantry I have to get back to it. There are still optimizations in my code where I have yet to work out all the kinks. You are right; I'm thinking I need to do a couple of different things to get it in line. It looks like the results you're getting are different from mine; it appears that std::sort is faster working with memory on the heap. I also figured out another fatal flaw I was making last night. But after all, this is about learning.

@TheIdeasMan: Posting my code again just for the sake of commenting it all out is just a dick move. I'm all for constructive criticism, but that's just sucking up space on the thread. I appreciate what you're saying, but the comments about timing, controls, etc. are just the basics of how to run an experiment... no doubt. I realize people don't know how much I know, but when I hear stuff like that, I assume whoever is saying it thinks I'm an idiot, so I usually just respond with snarky comments. I know how to run an experiment. This thread may have started off poorly because I do get some wild ideas sometimes. Outside-the-box thinking is how new stuff gets made. I'm sure many great people were looked down on by others because of their wild ideas... like the guy who figured out the earth was round... what a quack. Don't worry, guys, just keep coding inside the lines and you won't ever have to worry about looking foolish.
It appears that std::sort is faster working with memory on the heap.
Unless you're on some really weird architecture, memory is memory. The only difference in access time could be if one region is in cache and the other isn't; hence what I said earlier about controlling for cache warming.

Posting my code again just for the sake of commenting it all out is just a dick move.
How is that a "dick move"? I don't get it. I didn't say anything that wasn't objectively true. Your function was not sorting before; after I modified it, it worked correctly. Would you prefer it if, when you made mistakes, people never said anything and let you continue in your error?
markyrocks wrote:
It appears that std::sort is faster working with memory on the heap.


Like I mentioned earlier, there are 3 things: move semantics; concurrency (multi-threading); and cache effects.

But one has to have a good algorithm first; otherwise one is like a one-armed bricklayer in Berlin in 1946 :+)

The trouble with putting all the data on the stack is, of course, stack overflow.

The other thing for you to realise is that if one puts stuff on the internet, and there are others with more knowledge/experience who think it is wrong, then they will say so, even if it is a minor thing.

By all means have a go, but also realise that people will have much more respect for you if you respond with some acceptance and constructive questions of your own.
The things you said were true, but as I have said before, I had already said that it wasn't working correctly... so how is it constructive for you to repeat what I already said and know? It's just a waste of time. I wasn't asking for help; when I need it, I'll let you know.