### Optimizing bilinear interpolation?

I've written a function that does some bilinear interpolation on rendered VGA memory (just an array, `GPU.emu_screenbuffer[y][x]`, where each item is a pixel to be shown).

```c
//Divide!
#define DIVIDED(v,n) (byte)((double)v/(double)n)
#define AVGRGB(r,g,b,n) RGB(DIVIDED(r,n),DIVIDED(g,n),DIVIDED(b,n))

inline uint_32 fuse_pixelarea(word x1, word y1, word x2, word y2) //Fuse an area together into 1 pixel!
{
	uint_32 numpixels = (((x2-x1)+1)*((y2-y1)+1)); //Amount of pixels!
	uint_32 r = 0, g = 0, b = 0;
	word y; //Current coordinates!
	for (y=y1;y<=y2;++y)
	{
		uint_32 *pixel = &GPU.emu_screenbuffer[y][x1]; //Start X to begin with!
		uint_32 *lastpixel = &GPU.emu_screenbuffer[y][x2]; //End X!
		for (;;) //Loop for each pixel!
		{
			uint_32 thepixel = *pixel; //Get the pixel!
			r += (thepixel&0xFF); //Red!
			thepixel >>= 8;
			g += (thepixel&0xFF); //Green!
			thepixel >>= 8;
			b += (thepixel&0xFF); //Blue!
			if (pixel==lastpixel) break; //Next row when needed!
			++pixel;
		}
	}
	return AVGRGB(r,g,b,numpixels); //RGB value of all pixels combined!
}
```

Does anybody know why this is so slow? It takes about 5 seconds for the PSP to render 1 VGA frame this way (with the CPU running at 333MHz). (It should take at most 0.125 seconds per frame for full VGA speed, preferably less.)
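For anyone who wants to reproduce this off-device, here is a minimal self-contained sketch of the same averaging that swaps the three `double` divisions per call for integer divisions. On a CPU without hardware double-precision support (which I believe is the case for the PSP), `double` arithmetic is emulated in software, so this may be one cost worth measuring. The typedefs, the `RGB` byte layout (red in the low byte, matching the extraction order in the function above), the small `sketch_screenbuffer` array and the name `fuse_pixelarea_int` are all assumptions made for illustration; they are not the emulator's real definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins for the emulator's own types. */
typedef uint32_t uint_32;
typedef uint16_t word;

/* Assumed pixel layout: red in bits 0-7, green in 8-15, blue in 16-23. */
#define RGB(r,g,b) ((uint_32)(r) | ((uint_32)(g) << 8) | ((uint_32)(b) << 16))

/* Hypothetical small buffer so the sketch compiles on its own. */
#define SKETCH_WIDTH 8
#define SKETCH_HEIGHT 8
static uint_32 sketch_screenbuffer[SKETCH_HEIGHT][SKETCH_WIDTH];

/* Average the inclusive rectangle (x1,y1)-(x2,y2) using integer math only. */
static uint_32 fuse_pixelarea_int(word x1, word y1, word x2, word y2)
{
	uint_32 numpixels;
	uint_32 r = 0, g = 0, b = 0;
	word x, y;

	/* The rectangle must be non-empty and inside the sketch buffer. */
	assert(x1 <= x2 && y1 <= y2 && x2 < SKETCH_WIDTH && y2 < SKETCH_HEIGHT);

	numpixels = (uint_32)(x2 - x1 + 1) * (uint_32)(y2 - y1 + 1);
	for (y = y1; y <= y2; ++y)
	{
		const uint_32 *row = sketch_screenbuffer[y]; /* Current row */
		for (x = x1; x <= x2; ++x)
		{
			uint_32 thepixel = row[x];
			r += thepixel & 0xFF;         /* Red */
			g += (thepixel >> 8) & 0xFF;  /* Green */
			b += (thepixel >> 16) & 0xFF; /* Blue */
		}
	}
	/* Unsigned integer division truncates, like the (byte) cast of the
	   double quotient in the original macro. */
	return RGB(r / numpixels, g / numpixels, b / numpixels);
}
```

This only removes the floating-point work; the per-pixel loop is unchanged, and no timings are claimed here. If the call is made once per output pixel, the number of source pixels summed per call is likely to matter just as much.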
