[Mode Function] Help needed

Hello Community!

I am once again hoping for some help with the following problem. I have a function that should find the mode in an array. Here is my initial function (please ignore the name of the first array, virus; it is just the theme of my program, think of the GBA game Rockman.exe):

int getMode(int *virus, int *frequency, int numVir)
{
   int count = 0,
       index = 0,
       mode = 0;

   for (index = 0; index < numVir; index++)
   {
	  count = 1;

	  while (index + 1 < numVir && *(virus + index) == *(virus + index + 1))
	  {
		 ++count;
		 ++index;
	  }
	  
	  *(frequency + index) = count;
   }

   for (index = 0; index < numVir; index++)
   {
	  if (*(frequency + index) > 0)
		 cout << *(virus + index) << setw(17) << right << *(frequency + index) << " \n";

   }
		 

   return mode;
}


Above is my initial function. The array is compared to itself: count tracks the frequency of the current value, and index iterates through the array. The second for loop is just for testing purposes and is not needed beyond checking that the count is correct. Here is my second version, which determines the mode and returns it:

int getMode(int *virus, int *frequency, int numVir)
{
   int count = 0,
	   index = 0,
	   mode = 0,
	   frLow = 0;

   for (index = 0; index < numVir; index++)
   {
	  count = 1;

	  while (index + 1 < numVir && *(virus + index) == *(virus + index + 1))
	  {
		 ++count;
		 ++index;
	  }
	  
	  *(frequency + index) = count;

	  if (*(frequency + index) > frLow)
	  {
		 frLow = *(frequency + index);
		 mode = *(virus + index);
	  }
   }

   return mode;
}


Now here is my problem, which I can't figure out for the life of me, and believe me, I have been working on this single condition, and on the function as a whole, for over a week now. All I need is a second condition in the if statement that takes into account that there can be more than one mode. And it has to work without ill side effects, such as suddenly reporting that 1 2 2 2 4 4 7 has no mode. Here is an example of what I tried (one of dozens of attempts ...):

 
    if (frLow == 1 || frLow % 2 == 1)
    {
       return -1;
    }


The above works for sequences such as 2 2 2 3 3 3, but for anything beyond that, even if one value does occur more often than the others, no mode is detected. It is maddening to know that the solution is probably very simple, yet I can't figure it out ... So, please help me with this!
Hello Misenna

I can only speak for myself, but without the whole code it is difficult to test any suggestions. Give people what they need to help you, and then someone might be able to do just that ;)

Have a great Sunday.
Hello xxvms!

Thanks, the same to you, and thank you for taking an interest in this topic && my problem. You are probably right, so here goes. And don't look too closely: most of my source comments are missing, but things should hopefully be clear, my somewhat strange naming convention aside. ;)

/Code removed/

So I finally solved it (I hope nobody was going through the trouble of trying to help me in the meantime ...). Here is my solution, in the hope that it'll help someone else out there. :)

int getMode(int *virus, int *frequency, int numVir)
{
   int count = 0,
	   index = 0,
	   mode = 0,
	   frLow = 0,
	   total = 0;

   for (index = 0; index < numVir; index++)
   {
	  count = 1;

	  while (index + 1 < numVir && *(virus + index) == *(virus + index + 1))
	  {
		 ++count;
		 ++index;
	  }
	  
	  *(frequency + index) = count;
	  
	  if (*(frequency + index) > frLow)
	  {
		 frLow = *(frequency + index);
		 total += *(frequency + index);
		 mode = *(virus + index);
	  }

	  else if (frLow == 1 || frLow == *(frequency + index))
	  {
		 return -1;
	  }
   }

    return mode;
}
@Misenna,

Suppose your data is
1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5

What do you suppose the mode(s) is(are)?
lastchance, thank you for pointing out the mistake I made ... (It made me feel terribly stupid reading your question and considering the answer, which should be 2, 3, and 4, making this tri-modal, correct?)

I've been thinking this through and trying to solve the problem that I now have, which has changed just a little. And again I ran my head against a wall, failing to see the obvious (short) answer that there has to be.

Consider a set of numbers like this: 3 3 3 4 4 4 (frequencies 3, 3) == no mode.

I just can't seem to manage to express that the frequencies are equal (3, 3). Considering that my counter counts correctly, and that the function does what it should as far as the mode(s) go, this should be child's play, yet I fail at it ...

So, any hint as to how to go about solving it so I can finally move on to the next project my book has to offer, would be highly appreciated!
@Misenna

The mode is a useful statistic. It is how democratic elections work (usually; the USA seems to have produced a counter-example recently). It is also the only one of the measures of central tendency - mean, median and mode - that isn't necessarily numerical, and also the only one that must take a feasible value for the data (as opposed to fractional values in between).

A set of data can have multiple modes. In the extreme case, where all the data are different, there could be as many modes as there are data points. In the example I gave there are 3 modes ("tri-modal", as you say.)

Because you don't know in advance how many modes there are, the appropriate vehicle for returning them (as opposed to just calculating and printing them out) is a vector (vector<int> in your case).

I have had a go at something similar recently:
http://www.cplusplus.com/forum/beginner/210951/#msg989022
This is templated (i.e. can work for other data types than ints) and you would have to adapt it for your own use and type of dataset. However, I have tried to comment it appropriately.

If you don't want this, there are other ways of computing modes; e.g.
- for small sets of possible values, like days of the week, you might create an array and set up the tally, then print out the maximum;
- my example is a little inefficient, insofar as it keeps emptying a vector when it finds a new mode and putting that single value back in; an alternative would be to parse the array twice: the first to identify the modal count (i.e. highest frequency) and the second to either push_back() the values into a vector (if you want to return them) or just print them out (if you don't).
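The two-pass alternative in the last bullet could look roughly like the sketch below. It is not the code from the linked post: the name findModes is hypothetical, and the sketch assumes the array is already sorted in ascending order so that equal values sit next to each other.

```cpp
#include <vector>

// Hypothetical helper (an assumption, not the linked code): returns every
// mode of a SORTED array by scanning it twice.
std::vector<int> findModes(const int *data, int size)
{
   std::vector<int> modes;
   int highest = 0;   // modal count, i.e. the highest frequency

   // Pass 1: find the length of the longest run of equal values.
   for (int i = 0; i < size; )
   {
      int j = i;
      while (j < size && *(data + j) == *(data + i))
         ++j;
      if (j - i > highest)
         highest = j - i;
      i = j;   // jump to the start of the next run
   }

   // Pass 2: collect every value whose run length equals the modal count.
   for (int i = 0; i < size; )
   {
      int j = i;
      while (j < size && *(data + j) == *(data + i))
         ++j;
      if (j - i == highest)
         modes.push_back(*(data + i));
      i = j;
   }

   return modes;
}
```

Called on 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5 this returns the three values 2, 3 and 4; an empty array gives an empty vector.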

Hope this helps.
Let me thank you for the explanation and for pointing me to your code! It is indeed helpful, and I will keep it for future reference - you never know! (And also thanks for the laughs - referencing some recent elections *haha*)

Now, back to the problem ... I think it makes sense to explain what my book asks, which isn't much:

Find a mode in an array (check)
Use pointers (check)
Return it if one is found (check), return -1 if there is none.

All of the following:

- Histogram, returning all modes etc. is unasked for. But as far as histogram goes, I wish to implement this. (Which is no problem, the virus array is sorted, and my frequency holds the correct count). Returning all modes, also no problem, for the same reason.

My real problem is this, which - I think, is not addressed by your code example, nor any of the others I saw in the topic you so kindly provided:

If every value in a set occurs equally often (with a frequency of 1 or more), meaning that no value occurs more often than any of the others:

1 1 1 1 1 1 (1, 1)
3 3 3 6 6 6 (3, 3)
1 1 2 2 3 3 (2, 2, 2)
5 5 5 5 1 1 1 1 (4, 4)

My program should say - no, there is no mode - or simply - return -1;

This is what I can't manage to get across in code ... In the meantime, before reading and considering your latest answer, I tried sorting my frequency array, playing on the idea that:

- If the largest frequency, and/or equally distributed frequencies are at the front, I could say:

...
if (*(frequency + index) == *(frequency + index + 1) || *(frequency + 1) < (*frequency + index))
{
    return -1;
}


I also tried making use of the function that sorts my virus array - making it a dual-sort, having the smallest values at the front, which would be 0, saying:

if x < x + 1 - return - 1;

Or simply something as simple as:

if (frLow == 2) 
{
   return - 1;
}


No matter which way, all I manage to produce is one of these two outcomes:

- Either a mode is detected in a set such as 1 4 4 4 5 5 5 - or no mode is detected in a set like 5 5 5 6 6 6 6 2 2.

This is where it stands right now, and I am unable to get past this very problem. I'm out of ideas, and the only thing I know is that there must be some simple if statement that would solve it without introducing an additional sort function, which is what I'm asking for.

-

Maybe it is just my habit of creating a problem out of nothing (or wishing to do more than I'm able to ... I guess at my age I should know better ... but, *sigh*)

What I'm getting at is this: I just simulated a search for the mode in 50,000 random numbers in the range of 1 to 65, and it finds the mode without fail ... And while some values do occur equally often, the highest frequencies never end up tied at 3 / 3 (or whatever). So I can't help but think that I should just ignore this problem, as entering numbers manually is not asked for in the first place. It only occurs if a user enters the numbers, yet I still can't help feeling that this problem should have an answer (and a very short one at that ...). Any thoughts?
hey lastchance, could you pls explain a bit further what you meant by ...

The mode is a useful statistic ... It is also the only one of the measures of central tendency - mean, median and mode - that isn't necessarily numerical, ...

thanks
And I'll allow myself to chime in with one more (very short) question, so that perchance both yours, gunnerfunner, and mine will find an answer in one go.

I was trying to educate myself about the mode a little more, as this is the central question. As was established, a set can have more than one mode. In all of my above examples, I was under the impression that there is no mode, because no one value appears more often than any of the others.

Yet, I was consulting http://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php (An invaluable website for all sorts of calculations)

When entering 3, 3, 3, 4, 4, 4 - it gives me Mode: 3, 4

So, I simply don't understand, what is the logic behind this? On the one hand explaining that:

Mode
The value or values that occur most frequently in the data set.


on the other hand giving me two modes as a result? (And this is not the only trustworthy website that gives me this exact result.) If this and the other examples above, which I thought had no mode, do in fact have modes, with only a set like 1, ..., n - 1 having no mode, then what role does the frequency play? What makes something like 5, 5, 5, 5, 6, 6, 6, 6 a set with two modes, 5 and 6? (I tried finding the answer to that one but couldn't, so ...)
hey lastchance, could you pls explain a bit further what you meant by ...

The mode is a useful statistic ... It is also the only one of the measures of central tendency - mean, median and mode - that isn't necessarily numerical, ...

thanks


Hi @GunnerFunner,

Consider the following dataset:
{ red, green, blue, blue, green, blue }
The mode is: blue
The mean and median are meaningless (no numbers or order attached to each data item), but the mode still exists.


Or consider a representative set of voters in a recent election.
{ Trump, Clinton, Clinton, Trump, Clinton }
No mean or median definable. The Americans will have to explain how they came up with the mode.


I'll have to look at your question when I get home from work, @Misenna - apologies.
so simple, yet so elegant. cheers lastchance
Thank you for being so considerate, but there is really no need to apologize for anything! Whenever you find some time to answer my question is fine, no matter it being today or tomorrow. :)

I must also thank you yet again for helping me. Without your input I would still try to fix things not broken in the first place, meaning my code. So, as I already mentioned, whenever you find the time. :-)
Hello @Misenna

I think you are finding the all-too-common fact that different sources use different definitions. We often encounter this in real life. To our HR department "positive feedback" is a good thing; to a control engineer it is usually a disaster!

The definition that I would always work to is that a mode (and there may be several) is a value in a given dataset with the highest frequency of occurrence. On this basis, all non-empty datasets will have at least one mode, and many will have several. Unfortunately, your book (you don't say which it is) seems to be saying that if there are two (or more) values with the highest frequency then there is no mode. I'm afraid that I (and all my colleagues) would say that this situation simply had several modes. We quite regularly talk about bimodal distributions, for example.

I'm afraid that I would struggle to write sample code in your style (and I'm afraid I don't fully follow your version with pointers as it is unclear to me whether frequencies have already been counted or are being calculated in the routine). If you want to reject cases with more than one occurrence of the highest frequency then simply keep a count of items with the current highest frequency. For my sample code, simply test whether m.size() > 1, where m is my returned vector of "modes".
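As a rough illustration of "keep a count of items with the current highest frequency", here is a minimal sketch in the pointer style used earlier in the thread. The name getSingleMode and the variables highest and ties are hypothetical, the array is assumed to be sorted in ascending order, and -1 is assumed never to occur as a data value.

```cpp
// Hypothetical variant of getMode (name and tie rule are assumptions):
// expects a SORTED array and returns -1 when the highest frequency is shared.
int getSingleMode(const int *virus, int numVir)
{
   int mode = -1,     // value with the highest frequency so far
       highest = 0,   // that highest frequency
       ties = 0;      // how many values share the highest frequency

   for (int index = 0; index < numVir; index++)
   {
      int count = 1;

      // Count the run of equal values without reading past the last element.
      while (index + 1 < numVir && *(virus + index) == *(virus + index + 1))
      {
         ++count;
         ++index;
      }

      if (count > highest)
      {
         // A new highest frequency: this value is the only candidate so far.
         highest = count;
         mode = *(virus + index);
         ties = 1;
      }
      else if (count == highest)
      {
         // Another value reaches the same highest frequency.
         ++ties;
      }
   }

   return (ties == 1) ? mode : -1;
}
```

With this rule, 1 2 2 2 4 4 7 gives 2, while 3 3 3 4 4 4 and 1 4 4 4 5 5 5 both give -1. Because a later, longer run resets ties to 1, an early tie between shorter runs does not cause a false -1.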

One further point related to modes of non-numerical data (like red/green/blue or c++/fortran/python etc.) is that value + frequency pairs can quite usefully be stored in a map.
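A minimal sketch of the map idea, assuming string data such as the colour example earlier in the thread; the name modesByTally is hypothetical and the code is an illustration only.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: tallies each value in a map (value -> frequency),
// then returns every value that reaches the highest tally.
std::vector<std::string> modesByTally(const std::vector<std::string> &data)
{
   std::map<std::string, int> tally;
   for (const std::string &item : data)
      ++tally[item];   // operator[] creates a missing entry with count 0

   // Find the highest frequency.
   int highest = 0;
   for (const auto &entry : tally)
      if (entry.second > highest)
         highest = entry.second;

   // Collect every value with that frequency.
   std::vector<std::string> modes;
   for (const auto &entry : tally)
      if (entry.second == highest)
         modes.push_back(entry.first);

   return modes;   // alphabetical order, because std::map keeps keys sorted
}
```

For { red, green, blue, blue, green, blue } this returns the single value blue. Unlike the run-counting approach, the input does not need to be sorted first.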

Enjoy your coding.
Hello lastchance!

My book (Starting Out With C++, From Control Structures to Objects, a fantastic book!) says, in essence, the following in Chapter 9:

The function should determine the mode of the array. That is, it should determine which value in the array occurs most often. The mode is the value the function should return.

There is no mention of more than one mode and how to handle the situation (Maybe the author thought this is too difficult? Who knows). So when I started working on this one, I did my "homework": which functions to use, what each should do, writing a small driver program to test things on a small scale, and so on and so forth. What I didn't do this time around, as things seemed clear enough, was some research on the topic. This, I guess, is what I like most about learning to code in C++ (which, of course, I don't learn for the sake of learning it, but for a purpose) - I learn a thousand things besides writing code, making it all the more worthwhile to solve every problem the book has to offer. But I digress.

Of course I did some sort of research, meaning that I soon found out about the possibility of there being more than one mode. So I wanted to take this fact into account, because without it my program would just feel incomplete. This is also where my problem started ... The reason being that, for whatever reason, I started thinking that if all numbers in a set are equally distributed, there can be no mode. The first question you asked is the best example: What are the mode(s)?

The answer came easy: 1 2 2 3 3 4 4 5 has three modes, because 1 and 5 are both in there only once, while 2, 3, and 4 are present twice each, making it three modes. In the same vein I was looking on my problem, which led to my conclusion that:

A set consisting of the numbers 3, 3, 2, 2, 1, 1 has no mode at all, because they are equally distributed and no number occurs more or less often than the others. The long and short of it is: no number more or less frequent, no mode, return -1, get rid of it all (which I failed to do). Now you are saying, and I allow myself to quote:

I'm afraid that I (and all my colleagues) would say that this situation simply had several modes. We quite regularly talk about bimodal distributions, for example.


And this is exactly the answer I was looking for, solving all my problems! And just to make absolutely sure that we are on the same page on this: 2, 2, 3, 3 = bimodal, 2 modes? That is also what your code returns when given the input 3, 3, 3, 2, 2, 2 (modes are 3, 2), as do calculator-soup and some other online calculator sites offering to find median, mode, mean, ...

-

I'm afraid that I would struggle to write sample code in your style (and I'm afraid I don't fully follow your version with pointers as it is unclear to me whether frequencies have already been counted or are being calculated in the routine)


And I would never ask for that, or for any code for that matter, as it is my goal to solve the problems on my own, going by the hints offered when I struggle and have to ask. That is, after all, the purpose of learning. I published the code not only to maybe get an answer, but also to show that I did all the work and am not simply asking: "Please solve a problem, give me code ..." But I would still like to explain my function, what it does, and how it does it.

Before the function to determine the mode(s) is entered, the virus array is sorted in ascending order. The while loop compares each element of the virus array to its neighbour and increments count each time the two are equal. Take 1, 2, 2, 2, 3, 3, 3, 5: upon exiting the while loop, count is stored in the frequency array at the index of the last element of the run, so (assuming the array starts out zeroed) frequency ends up looking like this: 1, 0, 0, 3, 0, 0, 3, 1.
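To make the layout of the frequency array concrete, here is a minimal sketch of just the counting step. The name countRuns is hypothetical, and zeroing the array inside the function is an assumption; the posted getMode appears to rely on the caller for that.

```cpp
// Hypothetical helper isolating the counting step of getMode. It zeroes the
// frequency array first (an assumption about what the caller would do).
void countRuns(const int *virus, int *frequency, int numVir)
{
   for (int index = 0; index < numVir; index++)
      *(frequency + index) = 0;

   for (int index = 0; index < numVir; index++)
   {
      int count = 1;

      // Advance through the run of equal values, staying inside the array.
      while (index + 1 < numVir && *(virus + index) == *(virus + index + 1))
      {
         ++count;
         ++index;
      }

      // index now points at the LAST element of the run.
      *(frequency + index) = count;
   }
}
```

For the sorted input 1, 2, 2, 2, 3, 3, 3, 5 the frequency array ends up as 1, 0, 0, 3, 0, 0, 3, 1: each count sits at the position where its run ends, and every other slot stays 0.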

In the if statement that follows, the rest of the work is done: determining the highest frequency, if there is one. Whenever frequency[index], the count of the run that just ended, exceeds the highest count seen so far, frLow is assigned that count and mode is assigned virus[index], the value the count belongs to. (I hope this explanation makes sense. :))

frequency in this function, and on the whole, should serve the following purposes:

1: To keep track of count
2: To find the mode
3: To determine whether there is A mode. (This comparison should happen in a second if-statement.) And if there is no mode (going on the belief that, as mentioned, 1, 1, 2, 2, 3, 3 == no mode), the function should simply return -1.
4: To create a histogram in another function

Once all this is done, the correct mode should be returned.

It finds the mode, or modes, and the results have been correct so far. If there is any more than one, this isn't taken into account by the function. It only ever returns the most frequent one. I will alter this, so that if there is A mode, then this is returned and displayed accordingly, and if there is more than one, I use the frequency array to display all of them in a separate function.

In essence I do not wish to discount anything but the highest mode. If there is more than one the output to screen should reflect this fact. Anything else I would consider making my program seem incomplete.

As to your suggestion about using a map, I haven't learned that yet. The farthest I have come is pointers, the next lesson being *lemme think* Characters, C-Strings, and I believe what is called the String class. But there are some more problems to be solved first, among which are finding the median and the average. ;-)

After this wall of text, I'd like to sincerely thank you for answering my question, which has helped me get rid of a problem which never was. (Until I managed to turn it into one ....) You can be sure to be given proper credit for your invaluable help!