The first step of any quantitative analysis is computing some descriptive (summary) statistics. What I'm about to go over is generally how I approach things, but there's no single right way to do exploratory analysis. By the end of the process you should have a better grasp of the data you're working with, and the following are just some tools to guide you there. I'm going to stick to working with the State data for now, but everything should apply to the other data sets with some slight modifications.
Descriptive Statistics
Descriptive statistics give you a general overview of the numeric data. The goal is to get some idea of the center and the spread (variance) of your data. In other words, you want to know what a "typical" observation looks like and how the rest of the data compares to it.
You're probably already familiar with the measures of central tendency: mean, median and mode. Those, along with the percentiles and the standard deviation, generally form the backbone of our descriptive statistics for quantitative variables.
Calculating descriptive statistics is fairly simple. The more difficult question is which variables we want to calculate them for. The State data currently has 85 variables, 80 of which are numeric. We could theoretically calculate descriptive statistics for all of them, but that produces a lot of noise and not much useful information. Instead, we should start with a few "important" variables. Just kidding, that's probably not very helpful. If you already know something about your topic, you can generally start by verifying existing knowledge. If you're like me and know next to nothing about housing, try thinking intuitively and generating hypotheses based on what you already know. This is a time for creativity and for being wrong: if you already knew the answers to your questions, research would serve no purpose.
To start your data exploration, it can help to generate a list of your variables. In Excel, this is easy: just look at the top row. Some of the names are fairly intuitive, but others require the Data Dictionary, one of the files I downloaded earlier along with the datasets. There's nothing flashy about this data dictionary. The first column contains the variable names as reported in the Zillow dataset, while the other contains a description of each variable and how it was calculated.
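Listing the variables and computing descriptive statistics in Excel works fine. For anyone who prefers code, here is a minimal sketch of the same steps in Python using only the standard library. It reads the header row to list the variables, flags which columns look numeric, and computes the count, mean, median, mode, standard deviation and quartiles for a chosen few. It assumes the data is available as CSV text. The column names and values in `SAMPLE_CSV` are made up for illustration and are not Zillow figures. The helper names (`load_rows`, `numeric_columns`, `describe`, `summarize`) are hypothetical.

```python
import csv
import io
import statistics

# Made-up toy data standing in for the State CSV (hypothetical columns, not Zillow values).
SAMPLE_CSV = """region,price,inventory,note
Region A,100,5,x
Region B,200,7,y
Region C,200,9,z
Region D,300,3,w
Region E,400,6,v
"""


def load_rows(text):
    """Parse CSV text; return the header (the list of variables) and the rows as dicts."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames, rows


def is_number(value):
    """True if the cell parses as a float."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True


def present(value):
    """True for non-missing cells (csv gives '' for blanks and None for short rows)."""
    return value not in ("", None)


def numeric_columns(header, rows):
    """Heuristic: a column counts as numeric if every non-missing value parses as a number."""
    result = []
    for name in header:
        values = [row[name] for row in rows if present(row[name])]
        if values and all(is_number(v) for v in values):
            result.append(name)
    return result


def describe(values):
    """Count, central tendency, spread and quartiles for one variable (needs >= 2 values)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "75%": q3,
        "max": max(values),
    }


def summarize(rows, columns):
    """Describe only the chosen columns, skipping missing cells."""
    return {
        name: describe([float(row[name]) for row in rows if present(row[name])])
        for name in columns
    }


if __name__ == "__main__":
    header, rows = load_rows(SAMPLE_CSV)
    print("Variables:", header)
    numeric = numeric_columns(header, rows)
    print(f"{len(numeric)} of {len(header)} variables are numeric: {numeric}")
    # Start with a few "important" variables rather than all of them.
    for name, stats in summarize(rows, ["price"]).items():
        print(name, stats)
```

To use it on the real download, read the file's contents and pass them to `load_rows` in place of `SAMPLE_CSV`. The file name depends on what you downloaded. The quartiles use the `"inclusive"` method of `statistics.quantiles`. Other tools may define percentiles slightly differently, so small discrepancies with a spreadsheet are possible.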