Regression is the first technique you’ll learn in most analytics books. It is a very useful and simple form of supervised learning used to predict a quantitative response.

By building a regression model to predict the value of Y, you’re trying to get an equation like this for an output Y, given inputs x1, x2, x3…:

Y = b0 + b1.x1 + b2.x2 + b3.x3 + …

Sometimes there may be terms of the form b4.x1.x2 + b5.x1^2… that add to the accuracy of the regression model. The trick is to apply some intuition as to what terms could help determine Y and then test the intuition. Scatter plots can help you tease out these relationships, as we will show in the R section below.

Where is it applicable?

A very common use case is predicting sales from advertising spend on various media. Another common one is predicting house prices based on inputs like the sqm/sqft area of the house, the location, the number of rooms etc.

Predicting Miles per Gallon from Auto Specifications

I couldn’t find a public dataset for the advertising use case even though I tried for a while. If you know of any such dataset with media-specific advertising spend and sales for the corresponding period, with over 40 or so rows, do share in the comments. I found a dataset on mpg (miles per gallon) and other car data on the UCI Machine Learning Repository, and running a regression on that was quite fun.

>auto colnames(auto) auto$horsepower auto pairs(~mpg + cylinders + displacement + horsepower + weight + acceleration + model_year+origin)

See how quickly a scatter plot helps you see the relationships between the variables. Mpg decreases with an increase in the number of cylinders, displacement, weight and horsepower, and increases with acceleration (the variable acceleration represents the time taken to accelerate from 0 to 60 mph, so the higher the acceleration value, the worse the actual acceleration). Your plot would also show relationships among the mpg, model_year and origin variables. I excluded them here because the plot image above would become too large to be easily intelligible.

Since mpg clearly depends on all the variables, let’s derive a regression model, which is simple to do in RStudio.

Multiple R-squared: 0.8571, Adjusted R-squared: 0.8548

The Adjusted R-squared is the highest so far. Another thing to note is that even though the p-value for horsepower^3 is very small (the relationship is statistically significant), the coefficient is tiny. So we should consider removing it unless horsepower^3 has an intuitive or business meaning to us in the given context.

While creating models we should always take business understanding into consideration. If the context dictates that a particular variable is important to explaining the outcome, we will retain it in the model even if its coefficient is very small. If the effect is small and we are not able to explain why the independent variable should affect the dependent variable in a particular way, we may be risking overfitting to our particular sample of data.

Let’s try just a few more combinations: auto.fit9

Multiple R-squared: 0.8506, Adjusted R-squared: 0.8487

The Adjusted R-squared is the second highest. The coefficient for weight is quite small compared to the others, but the weight values are in the thousands, so the effect in reality is quite significant. Also, intuition dictates that mpg should depend on the weight of the vehicle. So even if the effect were small, we would have kept it.

Let’s use the model obtained to answer a few questions. These are based on the excellent analysis in the “An Introduction to Statistical Learning” textbook.

Is there a relationship between mpg and other variables?

There seems to be a strong relationship between mpg (the dependent variable above) and the independent variables model_year, weight, origin and horsepower. The relationship is quite strong, as 85% of the variance in the dependent variable is explained by these independent variables.

Which specifications contribute to raising or lowering mpg?

We see that weight and horsepower^1 lower mpg.

How accurately can we estimate the effect of each specification on mpg?

We are able to estimate the effect of model_year, weight, origin and horsepower on mpg with high accuracy in the auto.fit10 model, as the p-values for all the included independent variables are much less than 0.05. The variables not included had p-values larger than 0.05 or had a very small coefficient compared to the others. We could not estimate or isolate the effect of cylinders, displacement and acceleration accurately.

How accurately can we predict mpg from the given data?

We’re able to explain 85% of the variation in mpg with the auto.fit10 regression model derived from the given data.
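For readers who want to try the workflow end to end, here is a minimal sketch in R of the load, plot and fit steps described in this post. It rests on several assumptions that are not from the original listing: that the UCI file has been saved locally as auto-mpg.data (a hypothetical path), that missing horsepower values are coded as "?", and that the columns are named as shown. The model formula is only an illustration built from the variables named in the post. It is not necessarily the formula behind auto.fit10, so your R-squared values may differ.

```r
# Minimal sketch; the file path and column names are assumptions, not the
# post's original code.
# Assumes the UCI Auto MPG file was saved as "auto-mpg.data" in the working
# directory.
auto <- read.table("auto-mpg.data", header = FALSE, na.strings = "?",
                   stringsAsFactors = FALSE)
colnames(auto) <- c("mpg", "cylinders", "displacement", "horsepower",
                    "weight", "acceleration", "model_year", "origin",
                    "car_name")

# Make sure horsepower is numeric and drop rows where it is missing.
auto$horsepower <- as.numeric(auto$horsepower)
auto <- auto[!is.na(auto$horsepower), ]

# Scatter plot matrix of mpg against the main specifications.
pairs(~ mpg + cylinders + displacement + horsepower + weight + acceleration,
      data = auto)

# Illustrative linear model using the variables the post reports as
# significant (an assumed formula, not necessarily auto.fit10).
fit <- lm(mpg ~ model_year + weight + origin + horsepower, data = auto)

# Prints coefficients, p-values, Multiple R-squared and Adjusted R-squared.
summary(fit)
```

To experiment with polynomial terms such as horsepower^2, wrap them in I(), for example I(horsepower^2). Inside an R formula a bare ^ is read as a formula operator and not as arithmetic.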