Thanks for the reply.
Maybe what you said about 1, 2 and 3 is correct. But if I replace 1 with `*(uchar_ptr + *int_ptr1)` (no assignment), it behaves the same as 3. If the pointers were doing extra work, it should take more time than 3.
In the case of 4 and 5, both write the same amount of memory (sizeof(int)).
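
Writing the same number of bytes does not mean doing the same work. One likely explanation for the st 4 / st 5 gap (an assumption, not checked against the generated assembly): st 4 stores `n`, which is always 8 after the inner loop, so `sum` is never used and an optimizing compiler such as GCC at `-O2` may delete some or all of the loads that feed it. A discarded expression like `*(uchar_ptr + *int_ptr1);` can be dropped the same way. With st 5 the sum is stored, so every load has to run. The two hypothetical functions below isolate that difference; comparing their output from `gcc -O2 -S` would show whether the loads survive.

```c
/* Hypothetical helpers mirroring st 4 and st 5 for one row of 8 taps. */

/* Like st 4: sum is computed but never used; only n (always 8) is stored.
 * An optimizing compiler may remove the loads from img entirely. */
void row_store_n(const unsigned char *img, const int *offs, int *out)
{
    int sum = 0, n;
    for (n = 0; n < 8; n++)
        sum += img[offs[n]];
    (void)sum;  /* dead value: nothing observable depends on it */
    *out = n;
}

/* Like st 5: the sum is stored, so every load must actually happen. */
void row_store_sum(const unsigned char *img, const int *offs, int *out)
{
    int sum = 0;
    for (int n = 0; n < 8; n++)
        sum += img[offs[n]];
    *out = sum;
}
```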
Below is my working code:

```c
int ht = 504, wt = 474, H = 60, W = 56;
int i, j, m, n, k, sum;
unsigned char *i_ptr = NULL, *g_ptr_u = NULL;
int *g_ptr = NULL, *r_ptr = NULL;

for (k = 0; k < 3360; k++)
{
    i_ptr = iData + boffset[k];      // unsigned char iData[wt * ht]; int boffset[3360]
    for (i = 0; i < 16; i++)
    {
        r_ptr = rs;                  // int rs[24]
        g_ptr = grids[i];            // int grids[16][24*24]
        //g_ptr_u = grids_u[i];
        for (m = 0; m < 24; m++)
        {
            //rs[i] = 0;
            sum = 0;
            g_ptr += 8;              // skip the first 8 offsets of this row
            //g_ptr_u += 8;
            for (n = 0; n < 8; n++)
            {
                //sum += *(i_ptr + n);        // st 1
                sum += *(i_ptr + *g_ptr);     // st 2
                //sum += *(i_ptr + *g_ptr_u); // st 3
                g_ptr++;
                //g_ptr_u++;
            }
            g_ptr += 8;              // skip the last 8 offsets of this row
            //g_ptr_u += 8;
            //*r_ptr++ = n;          // st 4
            *r_ptr++ = sum;          // st 5
        }
    }
}
```
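
The loop only ever reads the middle 8 of every 24 entries in `grids[i]` (skip 8, read 8, skip 8), and it reads them as 4-byte ints, while the uchar table (e3) ran several times faster. One possible optimization, sketched below under the assumption that every offset fits in 0..65535, is to pack just the used offsets once, outside the `k` loop, into a dense `uint16_t` table: 24 × 8 × 2 = 384 bytes per grid instead of 24 × 24 × 4 = 2304. `pack_grid`, `grid_sums` and the macro names are hypothetical; the code needs C99 (`-std=c99` or `-std=gnu99` on older GCC) for `restrict` and loop-scoped declarations.

```c
#include <stdint.h>

#define ROWS       24  /* the m loop: 24 sums per grid */
#define TAPS       8   /* the n loop: 8 offsets summed per row */
#define ROW_STRIDE 24  /* each row of grids[i] holds 24 ints: skip 8, use 8, skip 8 */

/* Hypothetical helper: copy only the offsets the original loop reads,
 * grid[m*24 + 8] .. grid[m*24 + 15], into a dense uint16_t table.
 * ASSUMPTION: every offset fits in 0..65535. */
void pack_grid(const int *grid, uint16_t *packed)
{
    for (int m = 0; m < ROWS; m++)
        for (int n = 0; n < TAPS; n++)
            packed[m * TAPS + n] = (uint16_t)grid[m * ROW_STRIDE + 8 + n];
}

/* Hypothetical replacement for the m and n loops (st 2 + st 5): same sums,
 * but reading 2-byte offsets with no gaps. restrict promises the compiler
 * that storing to rs cannot modify blk or packed. */
void grid_sums(const unsigned char *restrict blk,
               const uint16_t *restrict packed,
               int *restrict rs)
{
    for (int m = 0; m < ROWS; m++) {
        const uint16_t *o = packed + m * TAPS;
        /* 8-tap sum written out by hand (manual unrolling). */
        rs[m] = blk[o[0]] + blk[o[1]] + blk[o[2]] + blk[o[3]]
              + blk[o[4]] + blk[o[5]] + blk[o[6]] + blk[o[7]];
    }
}
```

Usage: declare `uint16_t packed[16][24 * 8];`, call `pack_grid(grids[i], packed[i])` once for each `i` before the `k` loop, then replace the `m` and `n` loops with `grid_sums(iData + boffset[k], packed[i], rs);`. The `restrict` qualifiers matter here because `unsigned char` may alias any object, so without them the compiler has to assume a store to `rs` could change the image bytes. Whether this actually beats e2 on the ARM9 can only be settled by measuring.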

This is the code I want to optimize.
I am testing it on an embedded system (ARM9 processor, Linux 2.6.30).
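
One way to take such measurements, sketched here as an assumption rather than the method behind the numbers below, is to read a monotonic clock around the `k` loop with POSIX `clock_gettime(CLOCK_MONOTONIC, ...)`, which is available on Linux 2.6 (older glibc versions need `-lrt` at link time). `elapsed_ms` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Milliseconds between two timestamps taken with clock_gettime(). */
double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1000.0
         + (double)(end->tv_nsec - start->tv_nsec) / 1.0e6;
}

/* Typical use around the k loop:
 *
 *     struct timespec t0, t1;
 *     clock_gettime(CLOCK_MONOTONIC, &t0);
 *     for (k = 0; k < 3360; k++) { ... }
 *     clock_gettime(CLOCK_MONOTONIC, &t1);
 *     printf("%.1f ms\n", elapsed_ms(&t0, &t1));
 */
```

A monotonic clock is not affected if the system time is adjusted during a run, which wall-clock functions such as `gettimeofday` can be.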
The execution times (e1 to e3 select the read in the inner loop, e4 and e5 select the store):

| Inner-loop read | e4: st 4, `*r_ptr++ = n` | e5: st 5, `*r_ptr++ = sum` |
|---|---|---|
| e1: st 1, `sum += *(i_ptr + n)` (uchar pointer + int) | 36 ms | 183 ms |
| e2: st 2, `sum += *(i_ptr + *g_ptr)` (uchar pointer + int pointer) | 159 ms | 1878 ms |
| e3: st 3, `sum += *(i_ptr + *g_ptr_u)` (uchar pointer + uchar pointer) | 34 ms | 296 ms |
From this you can see the effect of st 4 versus st 5.
In terms of the earlier numbering: 1 behaves like st 2, 2 behaves like st 3, and 3 behaves like st 3.
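
Some back-of-the-envelope arithmetic, assuming 4-byte `int` and the array sizes given in the code comments, shows how much more offset data e2 walks through than e3 and what each tap costs in the two st 5 cases:

$$
\begin{aligned}
\text{taps per run} &= 3360 \times 16 \times 24 \times 8 = 10\,321\,920\\
\text{int offset table (e2)} &= 16 \times 576 \times 4\ \text{B} = 36\,864\ \text{B}\\
\text{uchar offset table (e3)} &= 16 \times 576 \times 1\ \text{B} = 9\,216\ \text{B}\\
\text{e2 with st 5} &: 1878\ \text{ms} / 10\,321\,920 \approx 182\ \text{ns per tap}\\
\text{e3 with st 5} &: 296\ \text{ms} / 10\,321\,920 \approx 29\ \text{ns per tap}
\end{aligned}
$$

ARM9 data caches are small (a few to a few tens of kilobytes, depending on the chip), so a 36 KB int table that is re-walked for each of the 3360 blocks may not stay cached while a 9 KB uchar table has a better chance. That would be consistent with the gap, but it is a guess that only profiling on the target could confirm.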
My actual case is e2 with e5 (1878 ms), and I want to reduce that time.
Please help me optimize it.
Thanks