Algorithm for data row decimation

Forum

Forum
General C++ Programming
Algorithm for data row decimation

Algorithm for data row decimation

Hello,
my problem is how to show many points in graph. Many points mean over milions points. I need some kind of decimation method with minimum/maximum value detection.
For example I have 10E6 data points and my gprah has 1E3 points. So each 10E3 from data points must be reduced into one pixel in graph. It is simple for this divisible values. Question is what algorithm use for non commensurable values? Any examples or ideas for my problem?
Thank you

Duthomhas (13147)

What kind of graph are you displaying? An x/y plot?

tcs (519)

I would try using a Convex Hull algorithm (see http://en.wikipedia.org/wiki/Convex_hull). With the convex hull it would be easier to find the min/max rectangle.

spiritus (5)

Iam using "normal" graph with one continous curve where is time on X axis and power on Y axis. My data is one dimensionl array which I need to decimate into output array. Output array has same length like pixel count in graph.

I think that algoritm for convex envelop is too robust for my purpose. I cant imagine the speed of it for millions points.

Duthomhas (13147)

Convex hull is not the answer here.

How scattered are your data? Can you simply sample it to get a representative graph?

spiritus (5)

Scattered? It means what variation is? It is not predictable. It could be random. One value is not depand on previous.
Simple sample is not possible. Beacuse in representativ graph you must show min and max value. This min and max valueas are important to consider the original curve.
I thing sampling function must take in care not only one point but range around certain point and compute min/max and rhis min/max value show in representatuve graph.

Duthomhas (13147)

Which of these does your data look like?

 | .
 |         .        .
 |  .   .     .
 |
 |     .  .       .
 |  .                .
 |              .
 +-----------------------
 
 
 |
 |
 |         .   .
 |    .  .  . .  .
 |   . .    .     .   .
 |  .               .  .
 |
 +-----------------------

Because if your data looks like the first (because it's random) then there is no representative graph.

The way you represent your data really does have a lot to do with what the data is and what kind of information you want to get from it. Your OP specified neither, and so far the best answer I've gotten about it is that it is a "normal" graph and a vague description of an X-Y plot.

With so much data, if you want to decimate it in some way for purposes of quickly graphing something that represents it fairly well, then random sampling is actually your best choice.

So, again, do you want to best-fit it to a curve?
Are you interested in outliers? (Like maximum power output?)
Is there any bias to the data?

spiritus (5)

For my purpose both kinds data sets are possible and decimation algorithm must working for each of them. I understand that for the second type you can aproximate it by flat curve. But in my ouptut Iam not interested in flat curve beacuse I need to see all local maximum and mimimum in original set.

I try it to better describe on this example:
I have original long input number sequence (in real case i have 1E6 points here):
23, 12, 45, 67, 34, 23, 56, 04, 34, -20, 45, 67, 98, 23, 44, 875, 875, 76, 87, 12, 87, 56, 35, 98, 12, 34, 98, -39, 23, 45.

And I have output graph which have 5 pixels in width (in real case graph has aprox. 1000pixels width. 5pixel is only for example and it is too small and it is nonsense).

So input sequence has 30points and output graphs must have 5 pixels. Sop decimation factor is 30/5 = 6. Than I can use easy algorithm to find local maximum and minimum:
Min1 = minimum(23, 12, 45, 67, 34, 23);
Max1 = maximum(23, 12, 45, 67, 34, 23);
Min2 = minimum(56, 04, 34, -20, 45, 67);
Max2 = maximum(56, 04, 34, -20, 45, 67);
Min3 = minimum(98, 23, 44, 875, 875, 76);
Max3 = maximum(98, 23, 44, 875, 875, 76);
Min4 = minimum(87, 12, 87, 56, 35, 98);
Max4 = maximum(87, 12, 87, 56, 35, 98);
Min5 = minimum(12, 34, 98, -39, 23, 45);
Max5 = maximum(12, 34, 98, -39, 23, 45);

In output graph on the first pixel i draw vertiacal line from Min1 to Max1. For the seconds pixel I draw vertical line from Min2 to Max2 ... and so on..... And this graph constructed by this way represent the input sequence and showing all minimums and maximums from input sequence. Up to this point everything is clear for me and I wrote this algorithm.

Problem starts when length of input set cannot be devided by number of pixels in output graph without remainder...
Or exists some better way how to achieve this result?

Now is my illustration better? Is it more clear what I expected from deciamtion algorithm?

Last edited on

vaportrack (20)

I'm not understanding the end game of what your graph would be.
Do you want a streaming graph that redraws as input progresses or a graph from a total set?

The later is much easier since all you have to do is divide total data by pixels of graph and just average each subset. That gives you a line graph.

For something like a blocky fat looking graph you would want to draw a line from min to max value for each subset in that anything in between will be drawn over anyway.

Like graphics programming, a streamed graph will require a redraw for each addition to the set. This will need a state machine type of algorithm due to the unknown facet of whats the next data going to be.

You could at a lot of overhead re-sample/calculate the entire set periodically for each update or more smartly utilize a statistical analysis algorithm in resizing the graph you have already drawn applicable to the new data that arrives.

The simplest I can think of is just a plain old scalar; a scalar matrix based upon max Y and new Z cords that are always changing.

Smac89 (1727)

When you say "point", the first thing I think of is points on an X-Y plane. To me, this means that each point is a tuple consisting of an x and a y value. But in the example you have shown, you are showing decimal numbers and calling them points. Do those numbers represent some sort of calculation made by using the "points"? If not, then what do they represent?

Also when you are talking about vertical lines from min to max, I thought of least squares lines which are lines that are drawn on a graph so that each line approximates/hits as many points as possible. Is this what you wanting to achieve?

Also found something on mesh simplification
http://herakles.zcu.cz/~skala/PUBL/PUBL_2002/2002_Mesh-Simplification-ICCS2002.pdf

Is this related to what you are wanting to do?

Last edited on

spiritus (5)

The later is much easier since all you have to do is divide total data by pixels of graph and just average each subset. That gives you a line graph.

Yes. But Iam not sure how to solve problem of indivisible total data length and pixel count. And Iam not calculate avg of subset but minimum/maximum of subset.

When you say "point", the first thing I think of is points on an X-Y plane

My task can be simplyfy to 1D problem. In real case I have points with XY coordinates. But algorithm which i need to find can working only with Y values. It is corespnding to my latest example...

....approximates/hits as many points as possible Is this what you wanting to achieve?

May be Minimum Maximum little complicate my case. In first step there can be find maximum only for each subset. And it is enought for me (all rest of task like plotting, finding X coordinates, repainting, zooming... are ready and working).
I wrote it on my previos letter:

One million numbers must be reduced to 1000 numbers. Where this 1000 numbers describing the maximum value of each subset from million numbers. When the lengths are divisible algorithm is very simple (i try it describe in my previous text).
This is very simple resampling algorithm I think.

JLBorges (13770)

> When the lengths are divisible algorithm is very simple

Since the orders of magnitude are not even close, just make the lengths divisible.

For instance, if the number of points in the input sequence is 1284342 and the number of pixels are 1000, break up the first 1284000 points into 1284 intervals of 1000 points each.

There are three simple options to deal with the tail containing the residue of 342 points:
a. discard the tail
b. extrapolate the tail to make 1000 points and add 1 more interval (1284+1)
c. discard if number of points in the tail is less than 500, extrapolate otherwise. (This is probably what I would do.)

Topic archived. No new replies allowed.

C++

Forum

Algorithm for data row decimation