Linear Regression

I need help with my linear regression formula. I think I have most of the code correct but my result is off by a bit. It's reading from a text file. So here are my (x,y) coordinates...

0 0
1 6.88
1.5 15.48
2 27.52
2.5 43
3 61.92
3.27 73.56
3.496 84.08
3.5 84.28
3.524 85.44
3.732 95.82
4 110.1
4.5 139.3
5 172

Can someone please explain what I'm doing wrong? Thanks in advance!!

  //regressiondata.txt

using namespace std;

#include <iostream>
#include <fstream>
#include <cmath>

//global variables
ifstream Infile;
int i, n;
float x[15],y[15], sumXY, sumX2; 
float m, b, A, B, C, D;

//functional prototypes
float sumX();
float sumY();
float slope();

int main()
{
Infile.open("regressionData.txt");
while(!Infile.eof())
{
Infile>>x[i]; Infile>>y[i];
cout<<x[i]<<"  "<<y[i]<<endl<<endl;
i++;
}
Infile.close();
n=i-1;

cout<<"The number of data points is "<<n<<endl;

cout<<"The value of sumX is "<<sumX()<<endl;

cout<<"The value of sumY is "<<sumY()<<endl;

sumXY = sumX() * sumY();
cout<<"The value of sumXY is "<<sumXY<<endl;

sumX2 = sumX2 + sumX() * sumX();
cout<<"X2 "<<sumX2<<endl;

cout<<"The linear regression equation is "<<"y="<<slope()<<"x-"; cout<<b<<endl;
   
return 0;
}

float sumX()
{
  int i;  
  float sumX=0;
  
  for(i=0; i<=n; i++)
       sumX = sumX + x[i];
 
	return sumX;
}

float sumY()
{
  int i;  
  float sumY=0;
  
  for(i=0; i<=n; i++)
       sumY = sumY + y[i];
 
 	return sumY;
}

float slope()
{
	int i; 
	float slope=0;
	
	A=sumX();
	B=sumY();
	C=sumX2;
	D=sumXY;


	for(i=0; i<=n; i++)
	{
		
	A +=x[i];//sumX
        B +=y[i];//sumY          
        C +=x[i]*x[i];
        D +=x[i]*y[i];//sumXY
	}
	
          slope = (n*D-A*B) / (n*C-A*A);
	  b = (B-slope*A) / n;
	
return slope;
	
}
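
The first thing I would correct is the loop condition:
for(i=0; i<=n; i++)
in lines 54 and 65 (inside sumX() and sumY()); the loop in slope() at line 82 has the same condition. I'm pretty sure this should be:
for(i=0; i<n; i++)
This is probably not the mathematical problem, as the extra element you are adding is most likely 0.0.
Note that this error is masked by your arrays being 15 long and your data being 14 long, so any extra element read is still inside the array.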

There is also a "future problem" / style issue in lines 10 to 13, where everything is declared as a global variable:
See http://www.cplusplus.com/forum/general/33612/#msg180771
which describes declaring variables as close as possible to where they are used.

Note that i is never explicitly initialized to zero before it is used from line 25 onwards. However, this is not a problem here, as global variables are zero-initialized at startup.

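Your main problem is that you don't calculate sumX2 and sumXY correctly. In main() you compute them as sumX() * sumX() and sumX() * sumY(), which are products of the totals. They should be done term-by-term: sumX2 is the sum of x[i]*x[i] and sumXY is the sum of x[i]*y[i], so presumably each in a loop like the ones in sumX() and sumY().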
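
A minimal sketch of the term-by-term version, keeping the same slope and intercept formulas as lines 91 and 92 of the posted code. The names Line and fitLine are hypothetical, and I have used double accumulators as a personal choice. Every accumulator starts at zero here; in the posted slope(), A and B start from sumX() and sumY() and then have every term added again, which looks like it doubles them.

```cpp
#include <cstddef>

// Hypothetical result type for this sketch (not from the original post).
struct Line
{
    double slope;
    double intercept;
};

// Least-squares fit of y = slope * x + intercept over the first n points.
// Assumes n >= 2 and that the x values are not all identical; otherwise the
// denominator of the slope is zero.
Line fitLine(const float x[], const float y[], std::size_t n)
{
    double sumX = 0, sumY = 0, sumX2 = 0, sumXY = 0;

    for (std::size_t i = 0; i < n; ++i)
    {
        const double xi = x[i];
        const double yi = y[i];
        sumX  += xi;
        sumY  += yi;
        sumX2 += xi * xi;   // square each x, then add
        sumXY += xi * yi;   // multiply each (x, y) pair, then add
    }

    Line fit;
    fit.slope = (n * sumXY - sumX * sumY) / (n * sumX2 - sumX * sumX);
    fit.intercept = (sumY - fit.slope * sumX) / n;
    return fit;
}
```

As a quick check, the four points (0,1), (1,3), (2,5), (3,7) lie on y = 2x + 1, and fitLine returns a slope of 2 and an intercept of 1 for them.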

Also, as a computational issue, you call sumX() and sumY() a number of times, and each call loops over the whole array again. Call each one once, assign the result to a variable, and use that variable thereafter, e.g. float SumX = sumX();
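
To illustrate calling each sum once, here is a small sketch. sumOf stands in for the posted sumX() and sumY() but takes the array as a parameter, and Totals and summarize are hypothetical names used only for this example. Each total is computed once, stored in a local variable, and reused for the mean.

```cpp
#include <cstddef>

// Stand-in for the posted sumX()/sumY(): sums the first n values.
float sumOf(const float values[], std::size_t n)
{
    float total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += values[i];
    return total;
}

// Hypothetical holder for the results of this sketch.
struct Totals
{
    float sumX;
    float sumY;
    float meanX;
    float meanY;
};

// Assumes n > 0. Each array is summed exactly once, and the stored totals
// are reused for the means instead of calling sumOf again.
Totals summarize(const float x[], const float y[], std::size_t n)
{
    const float totalX = sumOf(x, n);
    const float totalY = sumOf(y, n);

    Totals t = { totalX, totalY, totalX / n, totalY / n };
    return t;
}
```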