Regression

● Regression I: Simple Regression
  ○ introduction
    ■ linear pattern
    ■ residual plot
    ■ coefficient of determination
  ○ simple regression model
    ■ linear model
    ■ inference
    ■ prediction
  ○ regression diagnostics

TSTA602 Week 7

Linear Pattern

A linear pattern relates a factor x to a response y through a fitted line

  ŷ = b0 + b1·x

where b0 is the intercept, b1 is the slope, and ŷ is the fitted value, i.e. the value of y estimated by the model. The vertical distance from an observed point to the fitted line is its residual, e = y − ŷ: points above the line have positive residuals, and points below have negative residuals.

How do we find the fitted line? We minimize the sum of squared residuals,

  Σᵢ (yᵢ − ŷᵢ)²,  i = 1, …, n,

where n is the number of data points. The minimizing coefficients, called the least squares estimates, are

  b1 = r·(s_y / s_x),  b0 = ȳ − b1·x̄,

where r is the correlation between x and y, and s_x, s_y are the sample standard deviations of x and y, respectively. The line ŷ = b0 + b1·x with these coefficients is the linear regression line.

Residual Plot

Is there a pattern? If the model captures the relationship between x and y, there should be no pattern in the residual plot. If a linear model is appropriate, we should see a random scatter.

Coefficient of Determination

R-squared (R²) measures the fraction of the variation in the response accounted for by the (least squares) regression line. If R² = 0, the regression line explains none of the variation in the data; if R² = 1, it explains all of the variation. Remember: correlation is not causation.
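The least squares formulas can be checked numerically. Below is a minimal Python sketch (the data points are made up for illustration): it computes r, s_x, and s_y from scratch, forms b1 = r·s_y/s_x and b0 = ȳ − b1·x̄, and confirms that for simple regression R² equals r².

```python
import math

# Made-up data for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 6.2, 8.1, 9.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample standard deviations and sample correlation
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
r = sum((xi - x_bar) * (yi - y_bar)
        for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

# Least squares estimates: slope b1 = r * s_y / s_x, intercept b0 = y_bar - b1 * x_bar
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

# R-squared: fraction of variation accounted for by the fitted line
fitted = [b0 + b1 * xi for xi in x]
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(round(b1, 4), round(b0, 4), round(r_squared, 4))  # slope ≈ 1.94, intercept ≈ 0.30
```

For these made-up points the slope works out to exactly 19.4/10 = 1.94, and R² coincides with r², as it always does in simple regression.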
Simple Regression Model

● A simple regression model formulates a relationship between two datasets.
  ○ (e.g.) a linear relationship – the Simple Linear Regression Model, or (simply) the Linear Model
● The response variable is represented by the factor and the (fitted) parameters.
● The equation for the linear model, given X = x, is

  Y = β0 + β1·x + ε,

where ε is the error term; β0 and β1 are estimated by least squares.

The assumptions on the error term:
1. For a fixed X = x, ε is a random variable and ε ~ N(0, σ²).
2. For different x, the errors ε are independent.
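One way to internalize the error assumptions is to simulate from the model and re-fit it. A sketch under assumed parameters (β0 = 3, β1 = 2.5, σ = 0.4, all chosen arbitrarily): independent Normal errors are added to a linear trend, and the least squares estimates land close to the true coefficients.

```python
import random

random.seed(0)  # reproducible simulation

# Hypothetical true parameters, chosen arbitrarily for this sketch
beta0, beta1, sigma = 3.0, 2.5, 0.4

# Generate (x, y) pairs: y = beta0 + beta1 * x + eps, with eps ~ N(0, sigma^2)
# drawn independently for each x, matching the two error assumptions.
x = [i / 10 for i in range(1, 201)]
y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]

# Least squares fit: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(round(b0, 3), round(b1, 3))  # should be close to (3.0, 2.5)
```

With 200 points the standard errors are small, so the estimates recover the true parameters closely; re-running with a different seed gives slightly different estimates, which is exactly the sampling variability that inference quantifies.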
Under the model assumptions, the population average response at X = x is E[Y | X = x] = β0 + β1·x.

Inference

● Confidence interval

For j = 0, 1, a confidence interval for βj has the form

  bj ± t(α/2, n−2)·SE(bj),

where bj is the least squares estimate and SE(bj) is its standard error. (e.g.) a 95% confidence interval uses α = 0.05.

● Hypothesis testing

For j = 0, 1, we test H0: βj = 0 against H1: βj ≠ 0. R reports the test in the model summary:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.2134     0.2044   15.72 2.68e-07 ***
x             2.5973     0.1789   14.52 4.95e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4158 on 8 degrees of freedom
Multiple R-squared: 0.9634, Adjusted R-squared: 0.9589
F-statistic: 210.9 on 1 and 8 DF, p-value: 4.955e-07
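The t value column is simply Estimate divided by Std. Error; a quick check against the printed summary:

```python
# t statistic for each coefficient: t = estimate / standard error.
# Numbers are taken from the R summary output above.
coefs = {"(Intercept)": (3.2134, 0.2044), "x": (2.5973, 0.1789)}

for name, (est, se) in coefs.items():
    t = est / se
    print(f"{name}: t = {t:.2f}")  # matches the t value column: 15.72 and 14.52
```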
Reading the output: the Estimate column gives the estimated coefficients, Std. Error gives the standard errors of the coefficients, Pr(>|t|) gives the p-values, and Multiple R-squared is R². If the significance level is 0.05, then we conclude that both estimates are significant, since both p-values are less than 0.05.
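Using the slope row of the summary above (estimate 2.5973, standard error 0.1789, 8 degrees of freedom), a 95% confidence interval can be formed by hand. The critical value t(0.025, 8) ≈ 2.306 is hard-coded from a t table rather than computed:

```python
# 95% confidence interval for the slope: b1 ± t(0.025, 8) * SE(b1).
# Estimate and standard error come from the R summary above;
# t_crit is t(0.025, 8) ≈ 2.306, taken from a t table.
b1, se_b1, t_crit = 2.5973, 0.1789, 2.306

lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1
print(f"95% CI for slope: ({lower:.4f}, {upper:.4f})")
```

The interval excludes zero, which agrees with the tiny p-value for the slope: we reject H0: β1 = 0 at the 5% level.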
Prediction

● Once we have the fitted model, we may be able to predict the response for a new factor value x_new as ŷ_new = b0 + b1·x_new.

Regression Diagnostics

● Regression diagnostics involves testing the model assumptions of a regression.
● A common way to do regression diagnostics is through analysis of the residuals, i.e., using a residual plot.
● The residual plot is the plot of the residuals versus the factor.
● The residuals can often be seen as estimates of the errors.

● Healthy residual plot: a patternless, random scatter of residuals against the factor.
● Linearity: if there is an obvious pattern in the residual plot, the model does not capture all of the relationship, and we do not have a linear relation.
● Constant variance: if the variability of the residuals is changing, the variance is not constant. Non-constant variance is often referred to as heteroscedasticity.
● Normality: checked with a quantile–quantile (Q–Q) plot. The kth q-quantile of a random variable X is the smallest number x such that P(X ≤ x) ≥ k/q; in other words, quantiles divide the dataset into equal parts, (e.g.) quartiles divide it into four. If the residuals fall roughly on a straight line along the diagonal of the Q–Q plot, the error terms do come from a Normal distribution.
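A Q–Q plot pairs the sorted (standardized) residuals with theoretical Normal quantiles; if the points sit near the 45° diagonal, the normality assumption looks reasonable. A minimal sketch with made-up residuals, using only the standard library:

```python
from statistics import NormalDist, mean, stdev

# Made-up residuals from a hypothetical fitted regression
residuals = [-0.52, 0.31, -0.14, 0.45, 0.08, -0.37, 0.22, -0.03]

n = len(residuals)
sample = sorted(residuals)

# Theoretical standard-Normal quantiles at plotting positions (i - 0.5)/n
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Standardize the residuals so both axes are on the same scale
m, s = mean(residuals), stdev(residuals)
standardized = [(r - m) / s for r in sample]

# Each pair (theoretical[i], standardized[i]) is one point of the Q-Q plot;
# near-Normal residuals give points close to the diagonal line z = t.
for t, z in zip(theoretical, standardized):
    print(f"{t:6.3f}  {z:6.3f}")
```

In practice one would plot these pairs (e.g. R's qqnorm, or matplotlib) rather than print them; the printed columns are just the coordinates of the Q–Q points.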