Bilinear interpolation in C++ for an RGBA image?

I'm currently using this function to perform bilinear interpolation during emulation:

//Divide data for fuse_pixelarea!
#define DIVIDED(v,n) (byte)SAFEDIV((double)v,(double)n)
#define AVGRGB(r,g,b,n) RGB(DIVIDED(r,n),DIVIDED(g,n),DIVIDED(b,n))

inline uint_32 fuse_pixelarea(word x1, word y1, word x2, word y2) //Fuse an area together into 1 pixel!
{
	uint_32 numpixels = (((x2-x1)+1)*((y2-y1)+1)); //Amount of pixels!
	uint_32 r = 0, g = 0, b = 0;
	if (numpixels==0) return 0; //No pixels = no data!

	word y; //Current coordinates!
	for (y=y1;y<=y2;)
	{
		uint_32 *pixel = &GPU.emu_screenbuffer[y][x1]; //Start X to begin with!
		uint_32 *lastpixel = &GPU.emu_screenbuffer[y][x2]; //End X!
		for (;;) //Loop for each pixel!
		{
			uint_32 thepixel = *pixel; //Get the pixel!
			r += (thepixel&0xFF); //Red!
			g += ((thepixel>>8)&0xFF); //Green!
			b += ((thepixel>>16)&0xFF); //Blue!
			if (pixel==lastpixel) break; //Next row when needed!
			++pixel; //Next pixel!
		}
		++y; //Next Y!
	}
	
	return AVGRGB(r,g,b,numpixels); //RGB value of all pixels combined!
}


Is there any way to make this faster? It currently takes about 1.75 seconds just to render mode 3 (see the interrupt 10h video modes) from VGA (720x400 pixels, 24-bit color). SAFEDIV is just a safe divide function that makes a division by 0 yield 0: #define SAFEDIV(x,y) ((!y)?0:(x/y))
I think I know how.

Basically, you need to run across the red of all the pixels, sum it up, and then divide by the number of pixels, right? The same goes for blue and green.

If we can assume that each color takes exactly one byte, then you can create three Uint8 pointers and increment them separately, something like this:

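Uint32 red = 0, green = 0, blue = 0; //the sums must start at zero
int w = x2 - x1 + 1;
int h = y2 - y1 + 1;
Uint32 pixel_num = w * h; //number of pixels in the area
const int bpp = 4; //bytes per pixel (assuming 32-bit pixels)

//now we sum using all three pointers.
for(int i=0; i<h; ++i){

   //assumes R, G, B are the first three bytes of each pixel in memory
   Uint8 * r = (Uint8 *)&GPU.emu_screenbuffer[y1 + i][x1];
   Uint8 * g = r + 1;
   Uint8 * b = g + 1;

   for(int j=0; j<w; ++j){
      red += *r;
      blue += *b;
      green += *g;

      r += bpp;
      b += bpp;
      g += bpp;
   }

}

red = red/pixel_num;
green = green/pixel_num;
blue = blue/pixel_num;

return red|(green << 8)|(blue << 16); //combine with OR, not AND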


I didn't test any of this.
But I believe it will be faster, because you don't need to break each pixel into its separate colors: each iteration of the inner loop lands immediately on the red of the next pixel.
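
For reference, here is a minimal self-contained sketch of the same idea. The function name average_area, the flat row-major buffer, and the stride parameter are all made up for illustration (they are not the asker's GPU.emu_screenbuffer), and it assumes a little-endian machine with pixels stored as 0x00BBGGRR, so the bytes in memory are R, G, B, A:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: averages R, G and B over the inclusive rectangle
// (x1,y1)-(x2,y2) of a row-major buffer of 32-bit pixels.
// Assumption: little-endian 0x00BBGGRR pixels, i.e. bytes R, G, B, A in memory.
std::uint32_t average_area(const std::uint32_t* buffer, std::size_t stride,
                           unsigned x1, unsigned y1, unsigned x2, unsigned y2)
{
    assert(x1 <= x2 && y1 <= y2); // the area must hold at least one pixel

    const std::uint32_t w = x2 - x1 + 1;
    const std::uint32_t h = y2 - y1 + 1;
    std::uint32_t red = 0, green = 0, blue = 0;

    for (unsigned y = y1; y <= y2; ++y) {
        // Byte view of the first pixel of this row of the area.
        const std::uint8_t* p =
            reinterpret_cast<const std::uint8_t*>(buffer + y * stride + x1);
        for (std::uint32_t j = 0; j < w; ++j, p += 4) { // 4 bytes per pixel
            red   += p[0];
            green += p[1];
            blue  += p[2];
        }
    }

    const std::uint32_t n = w * h;
    return (red / n) | ((green / n) << 8) | ((blue / n) << 16);
}
```

Calling average_area(buf, width, x1, y1, x2, y2) returns the averaged pixel in the same 0x00BBGGRR layout. Whether this actually beats the shift-and-mask version depends on the compiler and the CPU, so it is worth timing both.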

What graphics library are you using?
Unfortunately, accurate linear down-scaling is an inherently costly procedure. Over a second is far too slow, though. Are you sure you have optimizations turned on?
@TinyTeeTree: I'm not using any library atm (OSLib is installed afaik).

I've rewritten the code a little; it now runs at about 4 fps (about 240-250 ms per frame on average, varying between 230 ms and 250 ms, mostly ~245 ms).
Is it possible to make this run at 60 fps (since the VGA I'm emulating runs at 60 fps too, afaik)?

#define DIVIDED(v,n) (byte)SAFEDIV((double)v,(double)n)
#define AVGRGB(r,g,b,n) RGB(DIVIDED(r,n),DIVIDED(g,n),DIVIDED(b,n))
#define CONVERTREL(src,srcfac,dest) SAFEDIV((double)src,(double)srcfac)*(double)dest

void GPU_fullRenderer()
{
	if (!(GPU.GPU_screenxsize|GPU.GPU_screenysize)) //Nothing to render?
	{
		memset(GPU.psp_screenbuffer,0,PSP_SCREENBUFFERSIZE*4); //Unknown resolution to buffer!
		return; //Done rendering!
	}

	word x1,y1,x2,y2; //Buffer start&end!
	word y,x; //X&Y in the emulated screen!
	uint_32 r,g,b; //RGB separated sum values!
	uint_32 pixels; //Amount of pixels to process
	//Since we have something to render, render it!
	int pspy,pspx; //psp x&y!
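	double xfactor = (double)GPU.GPU_screenxsize/(double)PSP_SCREEN_COLUMNS; //Divide as doubles: 1/PSP_SCREEN_COLUMNS would be 0 if the constant is an integer!
	double yfactor = (double)GPU.GPU_screenysize/(double)PSP_SCREEN_ROWS;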
	byte *pixel; //Current pixel A, B, G, R data!
	for (pspy=0;pspy<PSP_SCREEN_ROWS;pspy++) //Process row!
	{
		for (pspx=0;pspx<PSP_SCREEN_COLUMNS;pspx++) //Process column!
		{
			//Calculate our buffer min/max coordinates!
			x1 = pspx*xfactor; //Current x
			y1 = pspy*yfactor; //Current y
			x2 = (pspx+1)*xfactor; //Next x
			y2 = (pspy+1)*yfactor; //Next y
			pixels = (y2-y1)*(x2-x1); //Amount of pixels we count for!
			r=g=b=0; //Reset RGB!
			//Sum the RGB values of all pixels in the area!
			for (y=y1;y<y2;) //Process all rows we represent in our pixel!
			{
				for (x=x1;x<x2;) //Loop for each pixel!
				{
					pixel = ((byte *)&EMU_BUFFER(x,y))+1; //Load our pixel, skip A!
					b += *pixel++; //Add B!
					g += *pixel++; //Add G!
					r += *pixel; //Add R, don't increase, because we're the final byte!
					++x; //Next pixel!
				}
				++y; //Next row!
			}
			PSP_BUFFER(pspx,pspy) = RGB(r/pixels,g/pixels,b/pixels); //RGB value of all pixels combined!
		}
	}
}


Makefile parts:
OPTIMIZATIONFLAG = -O10
...
CFLAGS += $(OPTIMIZATIONFLAG) -G0 -Wall
CXXFLAGS += $(CFLAGS) -fno-exceptions -fno-rtti
CXXFLAGS += -fexpensive-optimizations
ASFLAGS = $(CFLAGS)
Three nested loops is a no-go; that's O(n^3). I don't know how to help you with bilinear interpolation: I can barely use, understand, and implement regular interpolation.
Two things to speed up:
(1) copy the entire GPS buffer out to main memory before sampling. Every access to the GPS is slower than accesses to your CPU's RAM.
(2) get rid of the FPU divisions. You can accomplish the same using integer arithmetic, and much faster.

I haven't given your code much more thought than that quick glance. Hope this helps.
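
As a rough sketch of point (2), the whole downscale can be written with integer arithmetic only. Everything here is hypothetical (the downscale name, the std::vector buffers, the 0x00BBGGRR pixel layout), and it assumes the source is at least as large as the destination in both directions, so every box contains at least one pixel:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical box-filter downscale using only integer arithmetic.
// src is srcW x srcH, the result is dstW x dstH, both row-major 0x00BBGGRR.
std::vector<std::uint32_t> downscale(const std::vector<std::uint32_t>& src,
                                     unsigned srcW, unsigned srcH,
                                     unsigned dstW, unsigned dstH)
{
    assert(dstW > 0 && dstH > 0 && srcW >= dstW && srcH >= dstH);
    assert(src.size() == std::size_t(srcW) * srcH);

    // Compute the box edges once per frame: edge[i] = i * src / dst.
    std::vector<unsigned> xe(dstW + 1), ye(dstH + 1);
    for (unsigned i = 0; i <= dstW; ++i) xe[i] = i * srcW / dstW;
    for (unsigned i = 0; i <= dstH; ++i) ye[i] = i * srcH / dstH;

    std::vector<std::uint32_t> dst(std::size_t(dstW) * dstH);
    for (unsigned dy = 0; dy < dstH; ++dy) {
        for (unsigned dx = 0; dx < dstW; ++dx) {
            std::uint32_t r = 0, g = 0, b = 0;
            for (unsigned y = ye[dy]; y < ye[dy + 1]; ++y) {
                for (unsigned x = xe[dx]; x < xe[dx + 1]; ++x) {
                    const std::uint32_t p = src[std::size_t(y) * srcW + x];
                    r += p & 0xFF;
                    g += (p >> 8) & 0xFF;
                    b += (p >> 16) & 0xFF;
                }
            }
            // Number of source pixels in this box (never 0 given the asserts).
            const std::uint32_t n =
                (xe[dx + 1] - xe[dx]) * (ye[dy + 1] - ye[dy]);
            dst[std::size_t(dy) * dstW + dx] =
                (r / n) | ((g / n) << 8) | ((b / n) << 16);
        }
    }
    return dst;
}
```

The edge tables replace the per-pixel double multiplications, and each source pixel is read exactly once per frame, so the total work is proportional to the size of the source image.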
@Duoas:
(1) By GPS, do you mean the emulated screen buffer or the real device's VRAM? This function only works with data allocated with malloc (in my case zalloc, which is malloc followed by a memset to 0 over the size of the emulated or real VRAM).
(2) I'm averaging the R/G/B values of all source pixels for each pixel of the target device (the PSP). I'm only using double values to preserve color accuracy when rendering. How would you suggest doing this with integer arithmetic while keeping the same accuracy in the color blending?
(1)
s/GPS/GPU - sorry, typo
Yes, transferring data to/from the graphics card's memory is costly. Put everything you can on the graphics card and let it do as many operations on the data as possible. I'm pretty sure the Xn cards can do pretty good image rotation for you.

(2)
You are not increasing accuracy by using double over int, just spending a lot more time per operation. The total difference in color accuracy when performing a mean calculation for a 6-bit/channel video display (VGA) will be 1/64th per channel, which will not be noticed by anyone playing the game.
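
To make that concrete, here is a small sketch with made-up function names comparing the two approaches. Because the double quotient is truncated when it is cast to a byte, plain integer division should give the same value for channel sums of this size; the rounded variant is only there in case rounding to nearest is preferred:

```cpp
#include <cassert>
#include <cstdint>

// Divide as doubles, then truncate to a byte (what the DIVIDED macro does).
std::uint8_t mean_double(std::uint32_t sum, std::uint32_t n)
{
    assert(n > 0);
    return static_cast<std::uint8_t>(static_cast<double>(sum) /
                                     static_cast<double>(n));
}

// Plain integer division, which also truncates.
std::uint8_t mean_int(std::uint32_t sum, std::uint32_t n)
{
    assert(n > 0);
    return static_cast<std::uint8_t>(sum / n);
}

// Integer division rounded to the nearest value instead of truncated.
std::uint8_t mean_int_rounded(std::uint32_t sum, std::uint32_t n)
{
    assert(n > 0);
    return static_cast<std::uint8_t>((sum + n / 2) / n);
}
```

Here sum is the total of one channel over n pixels, so it never exceeds 255 * n and the result always fits in a byte.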