Chapter 1 Introduction
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions and supporting decision-making. -Wikipedia
1.1 Principles and Practices
Nobody teaches Excel anymore. At least, that’s been my experience in public sector organizations. For many folks, data analysis and Excel are synonymous. And most often– again, in my experience– that consists of finding the “Excel Person” (usually one of two types of people– the young’un who is also the “Sharepoint/Teams/How-Do-I-Use-This-New-Technology?” guru, or long-timer who started using Lotus 1-2-3 when Ronald Reagan was president). Now, there is nothing wrong with being the “Excel person” (as you may have guessed, this is a role that I fill frequently). However, I strongly believe there is value in intentionally learning how to analyze data outside of just compiling survey results from your manager or making a chart from a canned report. And there is so much more out there than traditional Excel-based tools.
The public sector increasingly runs on data. Data-driven decision making. Objective. Based on the facts. Too often, these terms are meaningless platitudes thrown around to discard dissenting opinions. The truth is, the data can never answer your question. Only you can answer your question. Data can certainly help. But data can never speak for themselves. Data, and data analysis, is always interpreted through humans, and humans are inherently messy decisionmakers who weigh experience, intuition, and heuristics when making the call.
This may seem like a strange way to introduce a book on data analysis. But I would argue that the messiness of decisionmaking is what makes democratic data analysis all the more valuable. In this handbook, I will try to convince you that data analysis is worth doing purposely, especially if you are someone who does not consider yourself a data analyst.
1.2 What is data analysis?
Data analysis, in the way I am using the term, is the process of examining, transforming, and modeling collected facts and figures on a screen to insight into the real world. It’s looking data and gaining insight from it. Data analysis can be as simple as adding totals into a column to see cumulative effects, or as complicated as time-series forecasting. But fundamentally, all data analysis is taking inputs and applying those inputs to the real world to gain insight into the real world. It also may be helpful to think about what data analysis isn’t:
- Data analysis isn’t math.
- Calculations are great, but
a7 + b8
in Excel is deterministic. It gives you one answer. This book is not interested in data analysis that gives you the right answer, because there is no such thing. There are many answers to many questions, depending on how those questions are asked and how the data is analyzed.
- Data analysis isn’t statistics.
- This book is about reading and telling the story of your data in a way that can complement expertise and experience to make better decisions. Statistics are often used as a cheap stand-in for domain expertise and are often abused in favor of trusting the analyst or administrator to back up their assumptions with both quantitative and qualitative data.
- Data analysis isn’t research methods.
- No set of tools and practices can stand in for asking the right questions, and transforming data into information to answer that question. This book will give you the tools to work with your quantitative data to answer relevant questions, but all good analysis begins with a good question.
1.3 What is democratic data analysis?
I propose the following four principles of democratic data analysis. Democratic data analysis is:
- Tidy
- Pivot-able
- Reproducible,
- Uncertainty-oriented, and
- Audience-focused
Why focus on principles rather then specific tutorials? Data analysis is enabled by the technologies that we have access to. Whether it be the venerable pivot table, or a new-school dashboard platform, or a data-oriented programming language, the principles that I lay out in this handbook supersede specific technologies. Think of it like grammar. You may write by hand, on a computer, using text-to-speech. You may be writing a poem, a novel, an argument, or an instruction manual. But the basic rules of grammar are relevant in whatever medium you choose.
Similarly, this guide will teach you the basic “grammar” of democratic data analysis. This will allow you to apply this knowledge in whatever platform or technology you are interested in or have access to. But similar to learning language, it helps to practice. It isn’t much to use to study grammar without ever writing a sentence.
1.4 Tools and Techniques
The principles section of this guide will include examples in both Excel and R
. Government runs on Excel, so all of the examples and exercises will be Excel compatible. If you are comfortable with Excel1 and want to challenge yourself, boost your resume, and become a data superstar, I would highly recommend learning R
. But run what you brung, as they say.
And a note on Excel– if you are comfortable in Excel, use modern Excel tools! Institutional knowledge and inertia are strong in large organizations, and there are extremely capable tools sitting under your nose. The Microsoft “Power” (PowerPivot, PowerQuery, PowerBI) Excel stack integrate neatly into the principles that I outline here. Adopting these tools was how I started on the path of incorporating reproducible and democratic data analysis in public sector organizations. By adopting these tools, you will quickly learn why the principles outlined here are important, as you are forced to think about tabular data and calculated summaries. But more on that later. To start, here is a quick start guide to Power Query/Get and Transform and Power Pivot/DAX.
aka you use
vlookup
,index(match)
, pivot tables, or Get & Transform/PowerPivot on a somewhat regular basis↩︎