Model Building

Y    X1   X2   X3   X4
27   20   50   75   15
23   27   55   60   20
18   22   62   68   16
26   27   55   60   20
23   24   75   72    8
27   30   62   73   18
30   32   79   71   11
23   24   75   72    8
22   22   62   68   16
24   27   55   60   20
16   40   90   78   32
28   32   79   71   11
31   50   84   72   12
22   40   90   78   32
24   20   50   75   15
31   50   84   72   12
29   30   62   73   18
22   27   55   60   20

 

Regression

Notes
Output Created:           30-Nov-2020 14:44:26
Comments:
Input:                    Active Dataset: DataSet0; Filter: <none>; Weight: <none>; Split File: <none>; N of Rows in Working Data File: 18
Missing Value Handling:   Definition of Missing: user-defined missing values are treated as missing. Cases Used: statistics are based on cases with no missing values for any variable used.
Syntax:                   REGRESSION
                            /MISSING LISTWISE
                            /STATISTICS COEFF OUTS R ANOVA
                            /CRITERIA=PIN(.05) POUT(.06)
                            /NOORIGIN
                            /DEPENDENT y
                            /METHOD=STEPWISE x1 x2 x3 x4.
Resources:                Processor Time: 00:00:00.125; Elapsed Time: 00:00:00.108; Memory Required: 2524 bytes; Additional Memory Required for Residual Plots: 0 bytes

 

[DataSet0] 

 

Variables Entered/Removed (a)
Model   Variables Entered   Variables Removed   Method
1       x4                  .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .060).
2       x1                  .                   Stepwise (same criteria).
3       x2                  .                   Stepwise (same criteria).
a. Dependent Variable: y

 

 

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .522 (a)  .273       .227                3.699
2       .711 (b)  .506       .440                3.150
3       .811 (c)  .657       .584                2.716
a. Predictors: (Constant), x4
b. Predictors: (Constant), x4, x1
c. Predictors: (Constant), x4, x1, x2

 

 

ANOVA (d)
Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression     82.164          1     82.164      6.004   .026 (a)
   Residual      218.947         16     13.684
   Total         301.111         17
2  Regression    152.289          2     76.144      7.675   .005 (b)
   Residual      148.822         15      9.921
   Total         301.111         17
3  Regression    197.856          3     65.952      8.942   .001 (c)
   Residual      103.255         14      7.375
   Total         301.111         17
a. Predictors: (Constant), x4
b. Predictors: (Constant), x4, x1
c. Predictors: (Constant), x4, x1, x2
d. Dependent Variable: y

 

 

 

 

 

 

 

Coefficients (a)
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient)
Model            B        Std. Error   Beta    t        Sig.
1  (Constant)    30.253   2.399                12.613   .000
   x4            -.324    .132         -.522   -2.450   .026
2  (Constant)    24.456   2.988                 8.186   .000
   x4            -.383    .115         -.617   -3.336   .005
   x1             .225    .084          .492    2.659   .018
3  (Constant)    30.736   3.608                 8.519   .000
   x4            -.401    .099         -.647   -4.042   .001
   x1             .435    .112          .951    3.896   .002
   x2            -.181    .073         -.598   -2.486   .026
a. Dependent Variable: y

 

 

 

 

 

 

Excluded Variables (d)
Model    Beta In     t        Sig.   Partial Correlation   Tolerance
1  x1     .492 (a)    2.659   .018    .566                  .963
   x2     .112 (a)     .510   .618    .131                  .990
   x3     .079 (a)     .360   .724    .093                  .996
2  x2    -.598 (b)   -2.486   .026   -.553                  .423
   x3    -.062 (b)    -.318   .755   -.085                  .917
3  x3     .224 (c)    1.166   .265    .308                  .648
a. Predictors in the Model: (Constant), x4
b. Predictors in the Model: (Constant), x4, x1
c. Predictors in the Model: (Constant), x4, x1, x2
d. Dependent Variable: y

 

 

 

 

Developing multiple linear regression

Table 1 – Correlation matrix between the response variable and explanatory variables

                              Y        X1       X2       X3      X4
Y   Pearson Correlation      1         .373     .059     .048    -.522*
    Sig. (2-tailed)                    .127     .815     .852     .026
X1  Pearson Correlation       .373    1         .758**   .288     .192
    Sig. (2-tailed)           .127              .000     .247     .444
X2  Pearson Correlation       .059     .758**  1         .555*    .099
    Sig. (2-tailed)           .815     .000              .017     .697
X3  Pearson Correlation       .048     .288     .555*   1         .060
    Sig. (2-tailed)           .852     .247     .017              .813
X4  Pearson Correlation      -.522*    .192     .099     .060    1
    Sig. (2-tailed)           .026     .444     .697     .813

*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).


Results in the table indicate the following:

·       Of the four explanatory variables, only the pairs X1 & X2 (r = .758, p = .000) and X2 & X3 (r = .555, p = .017) are significantly correlated.

·       The response variable is significantly correlated only with X4 (r = -.522, p = .026).
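As a quick numerical check of Table 1, the key correlation r(Y, X4) can be recomputed from the 18 observations listed at the top of these notes. This is a pure-Python sketch (no SPSS needed); `pearson` is just an illustrative helper name:

```python
import math

# The 18 observations (Y, X1, X2, X3, X4) from the data table above.
data = [
    (27, 20, 50, 75, 15), (23, 27, 55, 60, 20), (18, 22, 62, 68, 16),
    (26, 27, 55, 60, 20), (23, 24, 75, 72,  8), (27, 30, 62, 73, 18),
    (30, 32, 79, 71, 11), (23, 24, 75, 72,  8), (22, 22, 62, 68, 16),
    (24, 27, 55, 60, 20), (16, 40, 90, 78, 32), (28, 32, 79, 71, 11),
    (31, 50, 84, 72, 12), (22, 40, 90, 78, 32), (24, 20, 50, 75, 15),
    (31, 50, 84, 72, 12), (29, 30, 62, 73, 18), (22, 27, 55, 60, 20),
]

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

Y = [row[0] for row in data]
X4 = [row[4] for row in data]
r_y_x4 = pearson(Y, X4)  # Table 1 reports -.522
```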

 

Table 2 – Useful statistical indicators of observed variables

      Minimum   Maximum   Mean    Std. Error
Y     16        31        24.78    .992
X1    20        50        30.22   2.172
X2    50        90        68.00   3.278
X3    60        78        69.89   1.425
X4     8        32        16.89   1.598
 

·       The response variable varies between 16 (min) and 31 (max) with a mean of 24.8 and SE of the mean of .992; an approximate 95% confidence interval for the mean is 24.8 ± 2 × .992, i.e. (22.8, 26.8).

·       X1 varies between 20 (min) and 50 (max) with a mean of 30.2 and SE of the mean 2.2.

·       X2 varies between 50 (min) and 90 (max) with a mean of 68.0 and SE of the mean 3.3.

·       X3 varies between 60 (min) and 78 (max) with a mean of 69.9 and SE of the mean 1.4.

·       X4 varies between 8 (min) and 32 (max) with a mean of 16.9 and SE of the mean 1.6.
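The Table 2 entries for Y can be reproduced with the standard library alone (mean, and SE of the mean = sample standard deviation / √n):

```python
import math
import statistics

# Y values (first column of the data table above, n = 18).
Y = [27, 23, 18, 26, 23, 27, 30, 23, 22, 24, 16, 28, 31, 22, 24, 31, 29, 22]

mean_y = statistics.mean(Y)                       # Table 2 reports 24.78
se_y = statistics.stdev(Y) / math.sqrt(len(Y))    # Table 2 reports .992
```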

 

Model Building

(We normally start with the explanatory variable most highly correlated with the response.)

As X4 has the highest correlation with Y (in absolute value), the first model was developed with X4. The ANOVA table is shown below.

Table 3 – ANOVA table for the linear model with X4
Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression     82.164          1     82.164      6.004   .026
   Residual      218.947         16     13.684
   Total         301.111         17

(R2 = 27.3%)

The model explains only about 27% of the observed variability; that is, about 73% has not been explained by the linear model with X4.
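This first step can be checked by hand with simple-regression formulas; the slope, intercept, and R2 should match the SPSS output above (a pure-Python sketch using the (Y, X4) pairs from the data table):

```python
# (Y, X4) pairs from the data table above.
pairs = [(27, 15), (23, 20), (18, 16), (26, 20), (23, 8), (27, 18),
         (30, 11), (23, 8), (22, 16), (24, 20), (16, 32), (28, 11),
         (31, 12), (22, 32), (24, 15), (31, 12), (29, 18), (22, 20)]

n = len(pairs)
my = sum(y for y, _ in pairs) / n
mx = sum(x for _, x in pairs) / n
sxy = sum((x - mx) * (y - my) for y, x in pairs)
sxx = sum((x - mx) ** 2 for _, x in pairs)
syy = sum((y - my) ** 2 for y, _ in pairs)

b = sxy / sxx                  # slope: SPSS reports -.324
a = my - b * mx                # intercept: SPSS reports 30.253
r2 = sxy ** 2 / (sxx * syy)    # R^2: Model Summary reports .273
```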

(We then select the variable with the next highest correlation with the response.)

Now we include X1 in the model and fit a linear model. The ANOVA table is shown in Table 4 and the properties of the estimators are shown in Table 5.

Table 4 – ANOVA for the linear model of Y with X4 & X1
Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression    152.289          2     76.144      7.675   .005
   Residual      148.822         15      9.921
   Total         301.111         17

(R2 = 50.6%)

 

 

Table 5 – Properties of the estimators of the fitted model
Model            B        Std. Error   Beta    t        Sig.
1  (Constant)    24.456   2.988                 8.186   .000
   X4            -.383    .115         -.617   -3.336   .005
   X1             .225    .084          .492    2.659   .018

Results in Table 5 indicate that both parameters are significant.

SS(X4) = 82
SS(X4, X1) = 152
Sequential SS of X1 when the model already has X4 = 152 - 82 = 70

H0: adding X1 to the model having X4 is not significant.

Test statistic: F = 70 / MSE of the full model = 70 / 9.92 = 7.06 ~ F(1, 15)

This is significant. It can be concluded with 95% confidence that the inclusion of X1 into the model having X4 is significant.
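The sequential (partial) F test above can be written as a small helper. The SS and MSE values come from Tables 3 and 4; the 5% critical value F(0.95; 1, 15) = 4.54 is the standard tabulated value, and `partial_f` is just an illustrative name:

```python
def partial_f(ss_reduced, ss_full, mse_full):
    """Partial (sequential) F statistic for adding one variable:
    F = (SS_full - SS_reduced) / MSE_full  ~  F(1, df_error_full)."""
    return (ss_full - ss_reduced) / mse_full

# Adding X1 to the model that already contains X4:
f_x1 = partial_f(ss_reduced=82.164, ss_full=152.289, mse_full=9.921)
# f_x1 is about 7.07, which exceeds the tabulated F(0.95; 1, 15) = 4.54,
# so the inclusion of X1 is significant at the 5% level.
```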

 

 

Now we include X2.

The ANOVA for the model with X4, X1, X2 is shown in Table 6 and the properties of the parameters are shown in Table 7.

Table 6 – ANOVA for the linear model of Y with X4, X1, X2
Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression    197.856          3     65.952      8.942   .001
   Residual      103.255         14      7.375
   Total         301.111         17

(R2 = 65.7%)

 

 

Table 7 – Coefficients
Model            B        Std. Error   Beta    t        Sig.
1  (Constant)    30.736   3.608                 8.519   .000
   X4            -.401    .099         -.647   -4.042   .001
   X1             .435    .112          .951    3.896   .002
   X2            -.181    .073         -.598   -2.486   .026

 

The model is significant and all 3 parameters are significant.

H0: the inclusion of X2 into the model having X4 and X1 is not significant.

Partial F test:
Test statistic = (197.856 - 152.289) / 7.375 = 6.18 ~ F(1, 14). This one is also significant.

 

When we include X3:

Table 8 – ANOVA for the linear model of Y with X4, X1, X2, X3
Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression    207.625          4     51.906      7.218   .003
   Residual       93.486         13      7.191
   Total         301.111         17

(R2 = 69%)

 

The model is significant.

 

 

Table 9 – Properties of the estimators of the model
Model            B        Std. Error   Beta     t        Sig.
1  (Constant)    22.625   7.819                  2.894   .013
   X4            -.407    .098         -.656    -4.150   .001
   X1             .468    .114         1.024     4.112   .001
   X2            -.235    .086         -.777    -2.747   .017
   X3             .156    .134          .224     1.166   .265

Partial F test:

H0: the inclusion of X3 into the model having X4, X1 and X2 is not significant.

Test statistic = (207.625 - 197.856) / 7.191 = 1.36 < 2 ~ F(1, 13)

Not significant; H0 is accepted.

The model with 4 explanatory variables is not accepted (even though the overall model is significant).
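The same partial-F calculation confirms that X3 adds nothing useful; the SS values come from Tables 6 and 8, and the 5% critical value F(0.95; 1, 13) = 4.67 is the standard tabulated value:

```python
# Partial F for adding X3 to the model already containing X4, X1, X2.
ss_three = 197.856   # Regression SS with X4, X1, X2 (Table 6)
ss_four = 207.625    # Regression SS with X4, X1, X2, X3 (Table 8)
mse_four = 7.191     # Residual mean square of the four-variable model

f_x3 = (ss_four - ss_three) / mse_four
# f_x3 is about 1.36, well below the tabulated F(0.95; 1, 13) = 4.67,
# so H0 is accepted: adding X3 is not significant.
```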

 

The best model is Y with X4, X1, X2.

The order of inclusion of the variables is immaterial.

Model            B        Std. Error   Beta    t        Sig.
1  (Constant)    30.736   3.608                 8.519   .000
   X4            -.401    .099         -.647   -4.042   .001
   X1             .435    .112          .951    3.896   .002
   X2            -.181    .073         -.598   -2.486   .026

 

(When we fit the model with the variables entered in a different order, it does not differ from our best model.)

 

 

 

 

The Coefficients tables for the refits with entry orders (X1, X2, X4) and (X2, X4, X1) contain exactly the same estimates as the table above, only with the rows listed in a different order.

 

Likewise, refitting the four-variable model with the predictors entered in different orders (e.g. X2, X1, X3, X4) reproduces the Table 9 estimates exactly.

 
The final (best fitted) model is:

Model            B        Std. Error   Beta    t        Sig.
1  (Constant)    30.736   3.608                 8.519   .000
   X4            -.401    .099         -.647   -4.042   .001
   X1             .435    .112          .951    3.896   .002
   X2            -.181    .073         -.598   -2.486   .026

 

Interpretation of the model

Y = 30.736 + 0.435(X1) - 0.181(X2) - 0.401(X4)

The three slope estimates mean:

·       A unit increase of X1 increases Y by 0.435 units when X2 & X4 are fixed.

·       A unit increase of X4 decreases Y by 0.401 units when X1 & X2 are fixed.

·       A unit increase of X2 decreases Y by 0.181 units when X1 & X4 are fixed.

(These are based on the unstandardized coefficients.)
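The fitted equation can be wrapped in a small prediction function (`predict_y` is just an illustrative name; the coefficients are those from the final model above):

```python
def predict_y(x1, x2, x4):
    """Fitted three-variable model: Y = 30.736 + 0.435*X1 - 0.181*X2 - 0.401*X4."""
    return 30.736 + 0.435 * x1 - 0.181 * x2 - 0.401 * x4

# First observation of the data set (X1=20, X2=50, X4=15; observed Y = 27):
y_hat = predict_y(20, 50, 15)  # fitted value, about 24.37
```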

 

Standardized coefficients: X4 = -.647, X1 = .951, X2 = -.598

We can compare the relative impact of these variables. Comparison of the standardized coefficients shows that, of the 3 significant variables, X1 is more influential on Y than X4 & X2.

(These comparisons are based on the standardized coefficients.)

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .811   .657       .584                2.716

 

 

 

 

 

 

R2 = 65.7%: the model explains 65.7% of the observed variability.

Adj. R2 = 58.4%.

In a good model R2 should be close to Adj. R2. Adj. R2 does not itself give the % of the observed variability explained.

(In multiple regression we use both.)
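The adjusted R2 in the summary table follows directly from R2, the sample size, and the number of explanatory variables (a sketch of the standard formula):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p explanatory variables:
    1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Three-variable model: R^2 = .657, n = 18, p = 3.
adj = adjusted_r2(r2=0.657, n=18, p=3)  # Model Summary reports .584
```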

Model diagnostics — the errors should be:

·       Random

·       Constant variance

·       Normally distributed

·       Mean zero

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .811   .657       .584                2.716                        2.513

 

 

 

 

 

 

Coefficients
Model            B        Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)    30.736   3.608                 8.519   .000
   X4            -.401    .099         -.647   -4.042   .001   .958        1.044
   X1             .435    .112          .951    3.896   .002   .411        2.434
   X2            -.181    .073         -.598   -2.486   .026   .423        2.367

 

VIF – Variance Inflation Factor

VIF is used to assess the impact of multicollinearity (significant correlation among the explanatory variables). Note: when VIF < 10, the impact of multicollinearity on the fitted model can be ignored.

Conclusions:

DW is close to 2; thus the errors are random.

VIF for all 3 variables in the model is less than 10; thus there is no significant multicollinearity effect from the variables on the model.
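Since VIF is simply the reciprocal of the tolerance reported by SPSS, the check is one line per variable (a sketch using the tolerances from the table above):

```python
# VIF = 1 / tolerance, using the tolerances from the coefficients table.
tolerances = {"x4": 0.958, "x1": 0.411, "x2": 0.423}
vif = {name: 1 / tol for name, tol in tolerances.items()}

# All three VIFs are below 10, so the impact of multicollinearity
# on the fitted model can be ignored.
ok = all(v < 10 for v in vif.values())
```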

 

 

The plot of predicted values vs residuals looks random in nature; it confirms that the errors have constant variance.

 

Tests of Normality
                          Kolmogorov-Smirnov           Shapiro-Wilk
                          Statistic   df   Sig.        Statistic   df   Sig.
Unstandardized Residual   .168        18   .193        .949        18   .405

 

The Shapiro–Wilk test statistic is not significant, as the corresponding p-value is greater than 5%. It confirms that the distribution of the errors does not deviate significantly from the normal distribution.

The 95% confidence interval of the error mean contains zero. It confirms that the mean of the errors does not deviate significantly from zero.

Since all four conditions are satisfied by the errors of the fitted model, we can say the errors are white noise. Thus the fitted model can be accepted.

(To draw conclusions about forecasting, it is better to also check the % error.)

Developing a nonlinear model

 

x    Year   y
1    1989    3
2    1990    4.2
3    1991    5
4    1992   10
5    1993   14
6    1994   28
7    1995   30
8    1996   45
9    1997   58
10   1998   60.1
11   1999   84.3
12   2000   87

 



 

The plot is not linear; it looks exponential. (We can assume y = a·e^(b·t).)

ln(y) = ln(a) + b·t

So we have to create the ln(y) values. Then it becomes a linear model.
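The whole transform-and-fit step can be sketched in pure Python: take logs, run ordinary least squares on (t, ln y), and back-transform the intercept. The slope and intercept should match the SPSS coefficients 0.328 and 0.925 reported further below:

```python
import math

t = list(range(1, 13))
y = [3, 4.2, 5, 10, 14, 28, 30, 45, 58, 60.1, 84.3, 87]

ln_y = [math.log(v) for v in y]  # create the ln(y) values

# Ordinary least squares for ln(y) = ln(a) + b*t.
n = len(t)
mt, ml = sum(t) / n, sum(ln_y) / n
b = sum((ti - mt) * (li - ml) for ti, li in zip(t, ln_y)) / \
    sum((ti - mt) ** 2 for ti in t)
ln_a = ml - b * mt
a = math.exp(ln_a)  # back-transform: y = a * exp(b * t)
```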

 

·      Now we compare both ANOVA tables.

 

1st case: y = a + b·t

ANOVA for y = a + b·t
Model           Sum of Squares   df   Mean Square   F         Sig.
1  Regression     9783.328        1    9783.328     158.653   .000
   Residual        616.649       10      61.665
   Total         10399.977       11

 

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .970   .941       .935                7.85270                      .969

 

Here R2 is very high, but the DW is very low, which means the errors are not random.

2nd case: ln(y) = ln(a) + b·t

ANOVA for ln(y) = ln(a) + b·t
Model           Sum of Squares   df   Mean Square   F         Sig.
1  Regression    15.382           1    15.382       237.466   .000
   Residual        .648          10      .065
   Total         16.030          11

 

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .980   .960       .956                .25451                       .838

 

·      In this case also we see a low DW, though R2 is high.

·      Note: these are time series data. Most time series data are not independent, whereas in regression we assume the Y values are independent.

·      In a time series the y's are dependent: there is a dependence structure in {y1, y2, …, yt}, where yt depends on yt-1. This is known as autocorrelation.

 

 

Coefficients
Model            B      Std. Error   Beta   t        Sig.
1  (Constant)    .925   .157                 5.905   .000
   t             .328   .021         .980   15.410   .000

 

ln y = 0.925 + 0.328·t

Since Y = a·e^(b·t), i.e. ln y = ln(a) + b·t:

ln(a) = 0.925, b = 0.328
a = exp(0.925) = 2.522

Y = 2.522·e^(0.328·t)

(Do not compare the R2 values of these two fits; they are two different models, fitted on different response scales.)

 

EXAMPLES

4.3.1 Example 1

Price   Sales (observed)
70        37
65        70
60       110
55       250
50       288
45       460
40       742
35      1220
30      1800
25      3340
20      5200

 

*     We plot these data.

According to the plot we can assume an exponential model: Sales = a·e^(b·Price).

It transforms into ln(Sales) = ln(a) + b·Price.

Then we can fit a linear model (ln(Sales) vs Price).
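This fit can be reproduced in pure Python; the slope and intercept should match the SPSS coefficients -0.096 and 10.471 in the output below, and back-transforming the intercept gives the multiplier for the exponential model:

```python
import math

price = [70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20]
sales = [37, 70, 110, 250, 288, 460, 742, 1220, 1800, 3340, 5200]

ln_sales = [math.log(s) for s in sales]

# Ordinary least squares for ln(sales) = ln(a) + b*price.
n = len(price)
mp = sum(price) / n
ml = sum(ln_sales) / n
b = sum((p - mp) * (l - ml) for p, l in zip(price, ln_sales)) / \
    sum((p - mp) ** 2 for p in price)
ln_a = ml - b * mp
a = math.exp(ln_a)  # multiplier in sales = a * exp(b * price)
```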

 

 

Outputs,

 

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .997   .994       .994                .12481                       2.040

 

ANOVA
Model           Sum of Squares   df   Mean Square   F          Sig.
1  Regression    25.239           1    25.239       1620.0     .000
   Residual        .140           9      .016
   Total         25.379          10

 

Coefficients
Model            B        Std. Error   Beta    t         Sig.
1  (Constant)    10.471   .114                  92.235   .000
   Price (x)     -.096    .002         -.997   -40.251   .000

 

Based on the results, the fitted model is:

ln(Sales) = 10.471 - 0.096·Price

Transforming this back into the exponential model: Sales = 35277.5·e^(-0.096·Price), since e^10.471 ≈ 35277.5.

 

 

4.3.2 Example 2

This example plots KMPL vs HP. The scatter plot shows a curved, decreasing pattern. According to this pattern we cannot fit an exponential model, so to obtain a linear model we plot KMPL vs 1/HP (KMPL = a + b·(1/HP)).

Then it becomes:


 


Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .895   .800       .799                2.931                        1.345

 

 

ANOVA
Model           Sum of Squares   df    Mean Square   F         Sig.
1  Regression    4986.987          1    4986.987     580.478   .000
   Residual      1245.721        145       8.591
   Total         6232.707        146

 

Coefficients
Model            B          Std. Error   Beta   t        Sig.
1  (Constant)      13.631      .649             20.994   .000
   1/hp          2692.467   111.753      .895   24.093   .000

 

So the fitted model is: KMPL = 13.631 + 2692.467·(1/HP)

If we had fitted a straight line to the raw data we would have found:

KMPL = 38.73 - 0.048·HP
[R2 = 59%, SE of the estimate = 4.175, DW = 1.4]
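The reciprocal model is straightforward to use for prediction (`kmpl` is just an illustrative name; the 100 hp input is a made-up example, not from the data set):

```python
def kmpl(hp):
    """Fitted reciprocal model: KMPL = 13.631 + 2692.467 * (1/HP)."""
    return 13.631 + 2692.467 * (1.0 / hp)

# e.g. a hypothetical 100 hp engine:
pred = kmpl(100)  # about 40.56 km per litre under the fitted model
```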

 

 

4.3.3 Example 3

Country   Imports (IMP)   GDP
1          20.3            391
2          68              528
3           1.5             21.4
4          57.7           1340
5         229              923
6           4.8             25.9
7          47.9            155.5
8         164              258
9          31.8            136.2
10        303.7           1540
11         31.4            201.1
12          0.98            12
13         30.8            122
14          3.1              9.8
15        292.1           3550
16          0.17             3.6
17         76.9            200
18          2               12.9
19        201.1            434
20         13.7            105.9
21          6.7             16.9
22          0.9              0.62





Scatter Plot of IMP vs GDP

Based on this scatter plot we cannot fit a linear model directly; therefore we consider a power model for these data. (This is usually tried after other model types have been considered.)

So, Y = a·x^b; by transforming, log(y) = log(a) + b·log(x). (In econometric studies this is known as the log-log model.)

By taking log values of both x and y we can draw a scatter plot and fit a linear model to it.



 

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .915   .836       .828                .38503                       1.647

 

 

ANOVA
Model           Sum of Squares   df   Mean Square   F         Sig.
1  Regression    15.166           1    15.166       102.302   .000
   Residual       2.965          20      .148
   Total         18.131          21

 

Coefficients
Model            B       Std. Error   Beta   t        Sig.
1  (Constant)    -.537   .195                -2.751   .012
   Log(GDP)      .900    .089         .915   10.114   .000

 

So we have the log-log model:

Log(IMP) = -0.537 + 0.9·Log(GDP)

By back-transforming this we obtain the model for forecasting:

IMP = 0.2904·GDP^0.9, since 10^(-0.537) ≈ 0.2904.
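The back-transform and the resulting forecasting function can be sketched directly (`imports_hat` is just an illustrative name; the GDP value in the usage line is a made-up example):

```python
# Back-transforming log(IMP) = -0.537 + 0.9*log(GDP) (base-10 logs):
a = 10 ** (-0.537)  # about 0.2904
b = 0.9

def imports_hat(gdp):
    """Power-model forecast: IMP = a * GDP**b."""
    return a * gdp ** b

# e.g. a hypothetical economy with GDP = 100:
forecast = imports_hat(100)  # about 18.3
```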