Sunday, November 6, 2016

GIS 4930: Special Topics; Project 3: Stats Analyze Week

Welcome to the continuation of our look at statistical analysis with ArcMap.  Recall that the theme being explored is methamphetamine lab busts around Charleston, West Virginia.  The past few weeks of analysis have made up the bulk of the work for this project.  The objectives of the analysis portion are to review the basics of regression analysis and a couple of key techniques; define the dependent and independent variables for the study; run multiple iterations of an Ordinary Least Squares (OLS) regression model; and, finally, complete six statistical sanity checks based on the OLS model outcomes.

In the previous post, we took a broad overview of the area being analyzed.  There are 54 lab busts from the 2004-2008 time frame, taken from the DEA's National Clandestine Laboratory data.  Decennial census data from 2000 and 2010 at the census tract level was spatially joined to these 54 lab busts.  The data was then normalized into percentages by census tract, across 31 categories, for analysis in the OLS model.  These 31 categories were fed into the model and then systematically removed while analyzing their effect on the model.  Ultimately, as good a model as possible was arrived at, with some results shown below.
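The normalization step can be sketched in a few lines of Python. This is only an illustration of turning raw tract counts into percentages; the field names and numbers are made up, and using total population as the denominator is an assumption rather than the exact method used in the project.

```python
def to_percent(count, total):
    """Return count as a percentage of total, or 0.0 when the tract total is zero."""
    if total == 0:
        return 0.0
    return 100.0 * count / total


# Hypothetical tract record; field names and values are invented for illustration.
tract = {"total_pop": 4000, "renters": 1000, "age_18_24": 500}

# Express every count as a share of the tract's total population.
normalized = {
    name: to_percent(value, tract["total_pop"])
    for name, value in tract.items()
    if name != "total_pop"
}
print(normalized)
```

Working in percentages rather than raw counts keeps large and small tracts comparable when they go into the same model.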




This is a generated output depicting the OLS results created in ArcMap.  The key thing to note from the table is that only nine of the original 31 variables are incorporated into the OLS model.  How were variables removed, you might ask?  There are six checks, or questions, to answer to determine whether a variable belongs in the OLS model: does an independent variable help or hurt the model; is the relationship to the dependent variable as expected; are there redundant explanatory variables; is the model biased; are there variables missing or unexplained residuals; and how well does the model predict the dependent variable?  The first three of these were generally grouped into one solid check for determining whether a variable should stay or go.  The remaining checks were applied to the model results as a whole.  As long as a variable had a coefficient that wasn't near zero, a probability lower than 0.4, and a VIF (Variance Inflation Factor) less than 7.5, it could stay.  After looking at this data table, it's time to transition to the visual interpretation, shown below.
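The stay-or-go rule for a single variable can be written as a small Python check. This is a minimal sketch, not ArcMap's own logic: the variable names and statistics are made up, and the cutoff for "near zero" (`coef_tol`) is an assumption, since the right value depends on the units of each variable.

```python
def keep_variable(coefficient, probability, vif,
                  coef_tol=1e-6, max_probability=0.4, max_vif=7.5):
    """Apply the three per-variable checks: non-zero coefficient, low probability, low VIF."""
    return (
        abs(coefficient) > coef_tol          # coefficient is not near zero (assumed tolerance)
        and probability < max_probability    # probability below the cutoff used in this post
        and vif < max_vif                    # little redundancy with other variables
    )


# Hypothetical OLS summary rows: (coefficient, probability, VIF).
candidates = {
    "pct_renters": (0.042, 0.03, 2.1),
    "pct_age_18_24": (0.0000001, 0.02, 1.8),  # coefficient near zero
    "pct_vacant": (0.015, 0.65, 3.0),         # probability too high
    "pct_college": (-0.030, 0.10, 9.4),       # VIF too high
}

kept = [name for name, stats in candidates.items() if keep_variable(*stats)]
print(kept)
```

In practice the model is rerun after each removal, because dropping one variable changes the coefficients, probabilities, and VIF values of the ones that remain.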



This map depicts the standardized residuals for the OLS model summarized in the table.  It symbolizes areas by standard deviation.  However, rather than wanting a Gaussian-style spread that shows some of every color, you ideally want values to fall in the +/- 0.5 range, because that indicates the model's predictions were close to the observed values.  Darker browns indicate areas where the model predicted fewer meth labs than there actually were, whereas darker blues indicate areas where the model expected more meth labs than were actually present.
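To make the map classes concrete, here is a rough Python sketch of how a standardized residual is derived and read. It assumes the residual is observed minus predicted, divided by the sample standard deviation of all residuals; ArcMap's exact calculation may differ, and the counts below are made up.

```python
import statistics


def standardized_residuals(observed, predicted):
    """Residual (observed - predicted) divided by the standard deviation of all residuals."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    spread = statistics.stdev(residuals)
    return [r / spread for r in residuals]


def describe(std_resid, band=0.5):
    """Label a tract by which side of the +/- band its standardized residual falls on."""
    if std_resid > band:
        return "under-predicted"   # more labs than the model expected (browns on the map)
    if std_resid < -band:
        return "over-predicted"    # fewer labs than the model expected (blues on the map)
    return "close fit"


# Hypothetical lab counts per tract and the model's predictions for them.
observed = [3, 0, 1, 2]
predicted = [1.0, 1.5, 1.0, 2.5]

labels = [describe(z) for z in standardized_residuals(observed, predicted)]
print(labels)
```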

This week's focus was not to describe the data results, but to accomplish the analysis leading up to them.
