Model Building

Y    X1   X2   X3   X4
27   20   50   75   15
23   27   55   60   20
18   22   62   68   16
26   27   55   60   20
23   24   75   72    8
27   30   62   73   18
30   32   79   71   11
23   24   75   72    8
22   22   62   68   16
24   27   55   60   20
16   40   90   78   32
28   32   79   71   11
31   50   84   72   12
22   40   90   78   32
24   20   50   75   15
31   50   84   72   12
29   30   62   73   18
22   27   55   60   20

 

Regression

Notes
Output Created:           30-Nov-2020 14:44:26
Comments:
Input:                    Active Dataset: DataSet0; Filter: <none>; Weight: <none>; Split File: <none>; N of Rows in Working Data File: 18
Missing Value Handling:   Definition of Missing: user-defined missing values are treated as missing. Cases Used: statistics are based on cases with no missing values for any variable used.
Syntax:                   REGRESSION
                            /MISSING LISTWISE
                            /STATISTICS COEFF OUTS R ANOVA
                            /CRITERIA=PIN(.05) POUT(.06)
                            /NOORIGIN
                            /DEPENDENT y
                            /METHOD=STEPWISE x1 x2 x3 x4.
Resources:                Processor Time: 00:00:00.125; Elapsed Time: 00:00:00.108; Memory Required: 2524 bytes; Additional Memory Required for Residual Plots: 0 bytes

 

[DataSet0] 

 

Variables Entered/Removed (a)
Model   Variables Entered   Variables Removed   Method
1       x4                  .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .060).
2       x1                  .                   Stepwise (same criteria).
3       x2                  .                   Stepwise (same criteria).
a. Dependent Variable: y

 

 

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .522 (a)  .273       .227                3.699
2       .711 (b)  .506       .440                3.150
3       .811 (c)  .657       .584                2.716
a. Predictors: (Constant), x4
b. Predictors: (Constant), x4, x1
c. Predictors: (Constant), x4, x1, x2

 

 

ANOVA (d)
Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression     82.164          1     82.164      6.004   .026 (a)
   Residual      218.947         16     13.684
   Total         301.111         17
2  Regression    152.289          2     76.144      7.675   .005 (b)
   Residual      148.822         15      9.921
   Total         301.111         17
3  Regression    197.856          3     65.952      8.942   .001 (c)
   Residual      103.255         14      7.375
   Total         301.111         17
a. Predictors: (Constant), x4
b. Predictors: (Constant), x4, x1
c. Predictors: (Constant), x4, x1, x2
d. Dependent Variable: y

 

 

 

 

 

 

 

Coefficients (a)
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient)
Model            B        Std. Error   Beta    t        Sig.
1  (Constant)    30.253   2.399                12.613   .000
   x4            -.324    .132         -.522   -2.450   .026
2  (Constant)    24.456   2.988                 8.186   .000
   x4            -.383    .115         -.617   -3.336   .005
   x1             .225    .084          .492    2.659   .018
3  (Constant)    30.736   3.608                 8.519   .000
   x4            -.401    .099         -.647   -4.042   .001
   x1             .435    .112          .951    3.896   .002
   x2            -.181    .073         -.598   -2.486   .026
a. Dependent Variable: y

 

 

 

 

 

 

Excluded Variables (d)
Model    Beta In     t        Sig.   Partial Correlation   Tolerance
1  x1     .492 (a)    2.659   .018    .566                  .963
   x2     .112 (a)     .510   .618    .131                  .990
   x3     .079 (a)     .360   .724    .093                  .996
2  x2    -.598 (b)   -2.486   .026   -.553                  .423
   x3    -.062 (b)    -.318   .755   -.085                  .917
3  x3     .224 (c)    1.166   .265    .308                  .648
a. Predictors in the Model: (Constant), x4
b. Predictors in the Model: (Constant), x4, x1
c. Predictors in the Model: (Constant), x4, x1, x2
d. Dependent Variable: y

 

 

 

 

Developing multiple linear regression

Table 1 – Correlation matrix between the response variable and explanatory variables

                              Y        X1       X2       X3      X4
Y   Pearson Correlation      1         .373     .059     .048    -.522*
    Sig. (2-tailed)                    .127     .815     .852     .026
X1  Pearson Correlation       .373    1         .758**   .288     .192
    Sig. (2-tailed)           .127              .000     .247     .444
X2  Pearson Correlation       .059     .758**  1         .555*    .099
    Sig. (2-tailed)           .815     .000              .017     .697
X3  Pearson Correlation       .048     .288     .555*   1         .060
    Sig. (2-tailed)           .852     .247     .017              .813
X4  Pearson Correlation      -.522*    .192     .099     .060    1
    Sig. (2-tailed)           .026     .444     .697     .813

*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).


Results in the table indicate the following:

·       Of the four explanatory variables, only the pairs X1 & X2 (r = .758, p = .000) and X2 & X3 (r = .555, p = .017) are significantly correlated.

·       The response variable is significantly correlated only with X4 (r = -.522, p = .026).
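As a quick numerical check of Table 1, the key correlation r(Y, X4) can be recomputed from the 18 observations listed at the top of these notes. This is a pure-Python sketch (no SPSS needed); `pearson` is just an illustrative helper name:

```python
import math

# The 18 observations (Y, X1, X2, X3, X4) from the data table above.
data = [
    (27, 20, 50, 75, 15), (23, 27, 55, 60, 20), (18, 22, 62, 68, 16),
    (26, 27, 55, 60, 20), (23, 24, 75, 72,  8), (27, 30, 62, 73, 18),
    (30, 32, 79, 71, 11), (23, 24, 75, 72,  8), (22, 22, 62, 68, 16),
    (24, 27, 55, 60, 20), (16, 40, 90, 78, 32), (28, 32, 79, 71, 11),
    (31, 50, 84, 72, 12), (22, 40, 90, 78, 32), (24, 20, 50, 75, 15),
    (31, 50, 84, 72, 12), (29, 30, 62, 73, 18), (22, 27, 55, 60, 20),
]

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

Y = [row[0] for row in data]
X4 = [row[4] for row in data]
r_y_x4 = pearson(Y, X4)  # Table 1 reports -.522
```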

 

Table 2 – Useful statistical indicators of observed variables

      Minimum   Maximum   Mean    Std. Error
Y     16        31        24.78    .992
X1    20        50        30.22   2.172
X2    50        90        68.00   3.278
X3    60        78        69.89   1.425
X4     8        32        16.89   1.598
 

·       The response variable varies between 16 (min) and 31 (max) with a mean of 24.8 and SE of the mean of .992; an approximate 95% confidence interval for the mean is 24.8 ± 2 × .992, i.e. (22.8, 26.8).

·       X1 varies between 20 (min) and 50 (max) with a mean of 30.2 and SE of the mean 2.2.

·       X2 varies between 50 (min) and 90 (max) with a mean of 68.0 and SE of the mean 3.3.

·       X3 varies between 60 (min) and 78 (max) with a mean of 69.9 and SE of the mean 1.4.

·       X4 varies between 8 (min) and 32 (max) with a mean of 16.9 and SE of the mean 1.6.
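The Table 2 entries for Y can be reproduced with the standard library alone (mean, and SE of the mean = sample standard deviation / √n):

```python
import math
import statistics

# Y values (first column of the data table above, n = 18).
Y = [27, 23, 18, 26, 23, 27, 30, 23, 22, 24, 16, 28, 31, 22, 24, 31, 29, 22]

mean_y = statistics.mean(Y)                       # Table 2 reports 24.78
se_y = statistics.stdev(Y) / math.sqrt(len(Y))    # Table 2 reports .992
```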

 

Model Building

(We normally start with the explanatory variable most highly correlated with the response.)

As X4 has the highest correlation with Y (in absolute value), the first model was developed with X4. The ANOVA table is shown below.

Table 3 – ANOVA table for the linear model with X4
Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression     82.164          1     82.164      6.004   .026
   Residual      218.947         16     13.684
   Total         301.111         17

(R2 = 27.3%)

The model explains only about 27% of the observed variability; that is, about 73% has not been explained by the linear model with X4.
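This first step can be checked by hand with simple-regression formulas; the slope, intercept, and R2 should match the SPSS output above (a pure-Python sketch using the (Y, X4) pairs from the data table):

```python
# (Y, X4) pairs from the data table above.
pairs = [(27, 15), (23, 20), (18, 16), (26, 20), (23, 8), (27, 18),
         (30, 11), (23, 8), (22, 16), (24, 20), (16, 32), (28, 11),
         (31, 12), (22, 32), (24, 15), (31, 12), (29, 18), (22, 20)]

n = len(pairs)
my = sum(y for y, _ in pairs) / n
mx = sum(x for _, x in pairs) / n
sxy = sum((x - mx) * (y - my) for y, x in pairs)
sxx = sum((x - mx) ** 2 for _, x in pairs)
syy = sum((y - my) ** 2 for y, _ in pairs)

b = sxy / sxx                  # slope: SPSS reports -.324
a = my - b * mx                # intercept: SPSS reports 30.253
r2 = sxy ** 2 / (sxx * syy)    # R^2: Model Summary reports .273
```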

(We then select the variable with the next highest correlation with the response.)

Now we include X1 in the model and fit a linear model. The ANOVA table is shown in Table 4 and the properties of the estimators are shown in Table 5.

Table 4 – ANOVA for the linear model of Y with X4 & X1
Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression    152.289          2     76.144      7.675   .005
   Residual      148.822         15      9.921
   Total         301.111         17

(R2 = 50.6%)

 

 

Table 5 – Properties of the estimators of the fitted model
Model            B        Std. Error   Beta    t        Sig.
1  (Constant)    24.456   2.988                 8.186   .000
   X4            -.383    .115         -.617   -3.336   .005
   X1             .225    .084          .492    2.659   .018

Results in Table 5 indicate that both parameters are significant.

SS(X4) = 82
SS(X4, X1) = 152
Sequential SS of X1 when the model already has X4 = 152 - 82 = 70

H0: adding X1 to the model having X4 is not significant.

Test statistic: F = 70 / MSE of the full model = 70 / 9.92 = 7.06 ~ F(1, 15)

This is significant. It can be concluded with 95% confidence that the inclusion of X1 into the model having X4 is significant.
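The sequential (partial) F test above can be written as a small helper. The SS and MSE values come from Tables 3 and 4; the 5% critical value F(0.95; 1, 15) = 4.54 is the standard tabulated value, and `partial_f` is just an illustrative name:

```python
def partial_f(ss_reduced, ss_full, mse_full):
    """Partial (sequential) F statistic for adding one variable:
    F = (SS_full - SS_reduced) / MSE_full  ~  F(1, df_error_full)."""
    return (ss_full - ss_reduced) / mse_full

# Adding X1 to the model that already contains X4:
f_x1 = partial_f(ss_reduced=82.164, ss_full=152.289, mse_full=9.921)
# f_x1 is about 7.07, which exceeds the tabulated F(0.95; 1, 15) = 4.54,
# so the inclusion of X1 is significant at the 5% level.
```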

 

 

Now we include X2.

The ANOVA for the model with X4, X1, X2 is shown in Table 6 and the properties of the parameters are shown in Table 7.

Table 6 – ANOVA for the linear model of Y with X4, X1, X2
Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression    197.856          3     65.952      8.942   .001
   Residual      103.255         14      7.375
   Total         301.111         17

(R2 = 65.7%)

 

 

Table 7 – Coefficients
Model            B        Std. Error   Beta    t        Sig.
1  (Constant)    30.736   3.608                 8.519   .000
   X4            -.401    .099         -.647   -4.042   .001
   X1             .435    .112          .951    3.896   .002
   X2            -.181    .073         -.598   -2.486   .026

 

The model is significant and all 3 parameters are significant.

H0: the inclusion of X2 into the model having X4 and X1 is not significant.

Partial F test:
Test statistic = (197.856 - 152.289) / 7.375 = 6.18 ~ F(1, 14). This one is also significant.

 

When we include X3:

Table 8 – ANOVA for the linear model of Y with X4, X1, X2, X3
Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression    207.625          4     51.906      7.218   .003
   Residual       93.486         13      7.191
   Total         301.111         17

(R2 = 69%)

 

The model is significant.

 

 

Table 9 – Properties of the estimators of the model
Model            B        Std. Error   Beta     t        Sig.
1  (Constant)    22.625   7.819                  2.894   .013
   X4            -.407    .098         -.656    -4.150   .001
   X1             .468    .114         1.024     4.112   .001
   X2            -.235    .086         -.777    -2.747   .017
   X3             .156    .134          .224     1.166   .265

Partial F test:

H0: the inclusion of X3 into the model having X4, X1 and X2 is not significant.

Test statistic = (207.625 - 197.856) / 7.191 = 1.36 < 2 ~ F(1, 13)

Not significant; H0 is accepted.

The model with 4 explanatory variables is not accepted (even though the overall model is significant).
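The same partial-F calculation confirms that X3 adds nothing useful; the SS values come from Tables 6 and 8, and the 5% critical value F(0.95; 1, 13) = 4.67 is the standard tabulated value:

```python
# Partial F for adding X3 to the model already containing X4, X1, X2.
ss_three = 197.856   # Regression SS with X4, X1, X2 (Table 6)
ss_four = 207.625    # Regression SS with X4, X1, X2, X3 (Table 8)
mse_four = 7.191     # Residual mean square of the four-variable model

f_x3 = (ss_four - ss_three) / mse_four
# f_x3 is about 1.36, well below the tabulated F(0.95; 1, 13) = 4.67,
# so H0 is accepted: adding X3 is not significant.
```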

 

The best model is Y with X4, X1, X2.

The order of inclusion of the variables is immaterial.

Model            B        Std. Error   Beta    t        Sig.
1  (Constant)    30.736   3.608                 8.519   .000
   X4            -.401    .099         -.647   -4.042   .001
   X1             .435    .112          .951    3.896   .002
   X2            -.181    .073         -.598   -2.486   .026

 

(When we fit the model with the variables entered in a different order, it does not differ from our best model.)

 

 

 

 

The Coefficients tables for the refits with entry orders (X1, X2, X4) and (X2, X4, X1) contain exactly the same estimates as the table above, only with the rows listed in a different order.

 

Likewise, refitting the four-variable model with the predictors entered in different orders (e.g. X2, X1, X3, X4) reproduces the Table 9 estimates exactly.

 
The final (best fitted) model is:

Model            B        Std. Error   Beta    t        Sig.
1  (Constant)    30.736   3.608                 8.519   .000
   X4            -.401    .099         -.647   -4.042   .001
   X1             .435    .112          .951    3.896   .002
   X2            -.181    .073         -.598   -2.486   .026

 

Interpretation of the model

Y = 30.736 + 0.435(X1) - 0.181(X2) - 0.401(X4)

The three slope estimates mean:

·       A unit increase of X1 increases Y by 0.435 units when X2 & X4 are fixed.

·       A unit increase of X4 decreases Y by 0.401 units when X1 & X2 are fixed.

·       A unit increase of X2 decreases Y by 0.181 units when X1 & X4 are fixed.

(These are based on the unstandardized coefficients.)
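The fitted equation can be wrapped in a small prediction function (`predict_y` is just an illustrative name; the coefficients are those from the final model above):

```python
def predict_y(x1, x2, x4):
    """Fitted three-variable model: Y = 30.736 + 0.435*X1 - 0.181*X2 - 0.401*X4."""
    return 30.736 + 0.435 * x1 - 0.181 * x2 - 0.401 * x4

# First observation of the data set (X1=20, X2=50, X4=15; observed Y = 27):
y_hat = predict_y(20, 50, 15)  # fitted value, about 24.37
```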

 

Standardized coefficients: X4 = -.647, X1 = .951, X2 = -.598

We can compare the relative impact of these variables. Comparison of the standardized coefficients shows that, of the 3 significant variables, X1 is more influential on Y than X4 & X2.

(These comparisons are based on the standardized coefficients.)

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .811   .657       .584                2.716

 

 

 

 

 

 

R2 = 65.7%: the model explains 65.7% of the observed variability.

Adj. R2 = 58.4%.

In a good model R2 should be close to Adj. R2. Adj. R2 does not itself give the % of the observed variability explained.

(In multiple regression we use both.)
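The adjusted R2 in the summary table follows directly from R2, the sample size, and the number of explanatory variables (a sketch of the standard formula):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p explanatory variables:
    1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Three-variable model: R^2 = .657, n = 18, p = 3.
adj = adjusted_r2(r2=0.657, n=18, p=3)  # Model Summary reports .584
```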

Model diagnostics — the errors should be:

·       Random

·       Constant variance

·       Normally distributed

·       Mean zero

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .811   .657       .584                2.716                        2.513

 

 

 

 

 

 

Coefficients
Model            B        Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)    30.736   3.608                 8.519   .000
   X4            -.401    .099         -.647   -4.042   .001   .958        1.044
   X1             .435    .112          .951    3.896   .002   .411        2.434
   X2            -.181    .073         -.598   -2.486   .026   .423        2.367

 

VIF – Variance Inflation Factor

VIF is used to assess the impact of multicollinearity (significant correlation among the explanatory variables). Note: when VIF < 10, the impact of multicollinearity on the fitted model can be ignored.

Conclusions:

DW is close to 2; thus the errors are random.

VIF for all 3 variables in the model is less than 10; thus there is no significant multicollinearity effect from the variables on the model.
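Since VIF is simply the reciprocal of the tolerance reported by SPSS, the check is one line per variable (a sketch using the tolerances from the table above):

```python
# VIF = 1 / tolerance, using the tolerances from the coefficients table.
tolerances = {"x4": 0.958, "x1": 0.411, "x2": 0.423}
vif = {name: 1 / tol for name, tol in tolerances.items()}

# All three VIFs are below 10, so the impact of multicollinearity
# on the fitted model can be ignored.
ok = all(v < 10 for v in vif.values())
```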

 

 

The plot of predicted values vs residuals looks random in nature; it confirms that the errors have constant variance.

 

Tests of Normality
                          Kolmogorov-Smirnov           Shapiro-Wilk
                          Statistic   df   Sig.        Statistic   df   Sig.
Unstandardized Residual   .168        18   .193        .949        18   .405

 

The Shapiro–Wilk test statistic is not significant, as the corresponding p-value is greater than 5%. It confirms that the distribution of the errors does not deviate significantly from the normal distribution.

The 95% confidence interval of the error mean contains zero. It confirms that the mean of the errors does not deviate significantly from zero.

Since all four conditions are satisfied by the errors of the fitted model, we can say the errors are white noise. Thus the fitted model can be accepted.

(To draw conclusions about forecasting, it is better to also check the % error.)

Developing a nonlinear model

 

x    Year   y
1    1989    3
2    1990    4.2
3    1991    5
4    1992   10
5    1993   14
6    1994   28
7    1995   30
8    1996   45
9    1997   58
10   1998   60.1
11   1999   84.3
12   2000   87

 



 

The plot is not linear; it looks exponential. (We can assume y = a·e^(b·t).)

ln(y) = ln(a) + b·t

So we have to create the ln(y) values. Then it becomes a linear model.
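The whole transform-and-fit step can be sketched in pure Python: take logs, run ordinary least squares on (t, ln y), and back-transform the intercept. The slope and intercept should match the SPSS coefficients 0.328 and 0.925 reported further below:

```python
import math

t = list(range(1, 13))
y = [3, 4.2, 5, 10, 14, 28, 30, 45, 58, 60.1, 84.3, 87]

ln_y = [math.log(v) for v in y]  # create the ln(y) values

# Ordinary least squares for ln(y) = ln(a) + b*t.
n = len(t)
mt, ml = sum(t) / n, sum(ln_y) / n
b = sum((ti - mt) * (li - ml) for ti, li in zip(t, ln_y)) / \
    sum((ti - mt) ** 2 for ti in t)
ln_a = ml - b * mt
a = math.exp(ln_a)  # back-transform: y = a * exp(b * t)
```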

 

·      Now we compare both ANOVA tables.

 

1st case: y = a + b·t

ANOVA for y = a + b·t
Model           Sum of Squares   df   Mean Square   F         Sig.
1  Regression     9783.328        1    9783.328     158.653   .000
   Residual        616.649       10      61.665
   Total         10399.977       11

 

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .970   .941       .935                7.85270                      .969

 

Here R2 is very high, but the DW is very low, which means the errors are not random.

2nd case: ln(y) = ln(a) + b·t

ANOVA for ln(y) = ln(a) + b·t
Model           Sum of Squares   df   Mean Square   F         Sig.
1  Regression    15.382           1    15.382       237.466   .000
   Residual        .648          10      .065
   Total         16.030          11

 

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .980   .960       .956                .25451                       .838

 

·      In this case also we see a low DW, though R2 is high.

·      Note: these are time series data. Most time series data are not independent, whereas in regression we assume the Y values are independent.

·      In a time series the y's are dependent: there is a dependence structure in {y1, y2, …, yt}, where yt depends on yt-1. This is known as autocorrelation.

 

 

Coefficients
Model            B      Std. Error   Beta   t        Sig.
1  (Constant)    .925   .157                 5.905   .000
   t             .328   .021         .980   15.410   .000

 

ln y = 0.925 + 0.328·t

Since Y = a·e^(b·t), i.e. ln y = ln(a) + b·t:

ln(a) = 0.925, b = 0.328
a = exp(0.925) = 2.522

Y = 2.522·e^(0.328·t)

(Do not compare the R2 values of these two fits; they are two different models, fitted on different response scales.)

 

EXAMPLES

4.3.1 Example 1

Price   Sales (observed)
70        37
65        70
60       110
55       250
50       288
45       460
40       742
35      1220
30      1800
25      3340
20      5200

 

*     We plot these data.

According to the plot we can assume an exponential model: Sales = a·e^(b·Price).

It transforms into ln(Sales) = ln(a) + b·Price.

Then we can fit a linear model (ln(Sales) vs Price).
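This fit can be reproduced in pure Python; the slope and intercept should match the SPSS coefficients -0.096 and 10.471 in the output below, and back-transforming the intercept gives the multiplier for the exponential model:

```python
import math

price = [70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20]
sales = [37, 70, 110, 250, 288, 460, 742, 1220, 1800, 3340, 5200]

ln_sales = [math.log(s) for s in sales]

# Ordinary least squares for ln(sales) = ln(a) + b*price.
n = len(price)
mp = sum(price) / n
ml = sum(ln_sales) / n
b = sum((p - mp) * (l - ml) for p, l in zip(price, ln_sales)) / \
    sum((p - mp) ** 2 for p in price)
ln_a = ml - b * mp
a = math.exp(ln_a)  # multiplier in sales = a * exp(b * price)
```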

 

 

Outputs,

 

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .997   .994       .994                .12481                       2.040

 

ANOVA
Model           Sum of Squares   df   Mean Square   F          Sig.
1  Regression    25.239           1    25.239       1620.0     .000
   Residual        .140           9      .016
   Total         25.379          10

 

Coefficients
Model            B        Std. Error   Beta    t         Sig.
1  (Constant)    10.471   .114                  92.235   .000
   Price (x)     -.096    .002         -.997   -40.251   .000

 

Based on the results, the fitted model is:

ln(Sales) = 10.471 - 0.096·Price

Transforming this back into the exponential model: Sales = 35277.5·e^(-0.096·Price), since e^10.471 ≈ 35277.5.

 

 

4.3.2 Example 2

This example plots KMPL vs HP. The scatter plot shows a curved, decreasing pattern. According to this pattern we cannot fit an exponential model, so to obtain a linear model we plot KMPL vs 1/HP (KMPL = a + b·(1/HP)).

Then it becomes:


 


Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .895   .800       .799                2.931                        1.345

 

 

ANOVA
Model           Sum of Squares   df    Mean Square   F         Sig.
1  Regression    4986.987          1    4986.987     580.478   .000
   Residual      1245.721        145       8.591
   Total         6232.707        146

 

Coefficients
Model            B          Std. Error   Beta   t        Sig.
1  (Constant)      13.631      .649             20.994   .000
   1/hp          2692.467   111.753      .895   24.093   .000

 

So the fitted model is: KMPL = 13.631 + 2692.467·(1/HP)

If we had fitted a straight line to the raw data we would have found:

KMPL = 38.73 - 0.048·HP
[R2 = 59%, SE of the estimate = 4.175, DW = 1.4]
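The reciprocal model is straightforward to use for prediction (`kmpl` is just an illustrative name; the 100 hp input is a made-up example, not from the data set):

```python
def kmpl(hp):
    """Fitted reciprocal model: KMPL = 13.631 + 2692.467 * (1/HP)."""
    return 13.631 + 2692.467 * (1.0 / hp)

# e.g. a hypothetical 100 hp engine:
pred = kmpl(100)  # about 40.56 km per litre under the fitted model
```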

 

 

4.3.3 Example 3

Country   Imports (IMP)   GDP
1          20.3            391
2          68              528
3           1.5             21.4
4          57.7           1340
5         229              923
6           4.8             25.9
7          47.9            155.5
8         164              258
9          31.8            136.2
10        303.7           1540
11         31.4            201.1
12          0.98            12
13         30.8            122
14          3.1              9.8
15        292.1           3550
16          0.17             3.6
17         76.9            200
18          2               12.9
19        201.1            434
20         13.7            105.9
21          6.7             16.9
22          0.9              0.62





Scatter Plot of IMP vs GDP

Based on this scatter plot we cannot fit a linear model directly; therefore we consider a power model for these data. (This is usually tried after other model types have been considered.)

So, Y = a·x^b; by transforming, log(y) = log(a) + b·log(x). (In econometric studies this is known as the log-log model.)

By taking log values of both x and y we can draw a scatter plot and fit a linear model to it.



 

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .915   .836       .828                .38503                       1.647

 

 

ANOVA
Model           Sum of Squares   df   Mean Square   F         Sig.
1  Regression    15.166           1    15.166       102.302   .000
   Residual       2.965          20      .148
   Total         18.131          21

 

Coefficients
Model            B       Std. Error   Beta   t        Sig.
1  (Constant)    -.537   .195                -2.751   .012
   Log(GDP)      .900    .089         .915   10.114   .000

 

So we have the log-log model:

Log(IMP) = -0.537 + 0.9·Log(GDP)

By back-transforming this we obtain the model for forecasting:

IMP = 0.2904·GDP^0.9, since 10^(-0.537) ≈ 0.2904.
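The back-transform and the resulting forecasting function can be sketched directly (`imports_hat` is just an illustrative name; the GDP value in the usage line is a made-up example):

```python
# Back-transforming log(IMP) = -0.537 + 0.9*log(GDP) (base-10 logs):
a = 10 ** (-0.537)  # about 0.2904
b = 0.9

def imports_hat(gdp):
    """Power-model forecast: IMP = a * GDP**b."""
    return a * gdp ** b

# e.g. a hypothetical economy with GDP = 100:
forecast = imports_hat(100)  # about 18.3
```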