Students in this course will be expected to become familiar with the use of data in order to produce stories with impact, authority and distinction. Data – large sets of information and numbers – are increasingly available from public, private and social media sources. As a result, journalism organizations are looking to make use of this rich pool of information – and to hire those who are able to do so.
Students in this course will be introduced to the concept of data-driven journalism, with a strong focus on its use in investigative journalism. They will explore ways to obtain data, use tools to analyze it and learn how to deploy it in their work. They will be introduced to basic concepts in downloading data, making public record requests for electronic data, using optical character recognition scanning, and building data sets by hand. They will gain a basic command of spreadsheets and will be introduced to database management software.
They will also, importantly, be taught to avoid the many pitfalls of data, a kind of statistical version of Defense Against the Dark Arts. They will learn how to clean data, question its reliability and use it meaningfully. If we are exceptionally ambitious, we will touch upon future areas of study, including the use of statistical analysis, APIs, web-scraping and computer coding.
Students in this class must demonstrate flexibility. I will be teaching this course in an iterative, responsive, design-driven fashion, adapting to students’ needs, levels of competency and emerging news and new technology. Mid-course adjustments are to be expected. Audibles will be called. New directions will be enthusiastically embraced.
The focus will be emphatically on the practical art of reporting, especially the combination of narrative, street work and data analysis. We are interested in math, statistics, and whiz-bang software only as they improve our ability to report, raise hell and tell a great story. There is no escaping that data-driven journalism is a skill learned by practice, rather than theory. It is a field where, surprisingly, you must get your hands dirty.
- Huff, Darrell. How to Lie with Statistics. New York: Norton, 1993. Print.
- Meyer, Philip. Precision Journalism: A Reporter's Introduction to Social Science Methods. Lanham, MD: Rowman & Littlefield, 2002. Print.
The above two books are simply bibles in the field. There will be selected readings from them, and you may acquire them from Amazon in Kindle or hard copy. In addition, we will be distributing stories that use data in journalism from a variety of sources and genres – including investigative reporting, entertainment reporting and sports reporting. These will be discussed in class, with participation expected and the occasional cold call not out of the realm of possibility.
Students will be required to complete exercises involving the use of spreadsheets, databases and other tools both during class and outside of class. We will do a needs-finding exercise at our first class to determine math skills and software knowledge. I would prefer that you use Microsoft Excel. If that’s not possible, we will try to complete our assignments using free or open-source software such as Google Spreadsheets, MySQL and Navicat. We will also use tools including OpenRefine, DocumentCloud, the Excel PowerPivot add-on, and Fusion Tables. If we are ambitious, we will play with R, an open-source statistical language, or install a programming language such as Python or Ruby to get an idea of what coding looks like and how regular expressions, or regexes, can help in data cleaning and scraping.
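To give a flavor of what regular expressions can do for data cleaning, here is a minimal sketch in Python. The sample rows and the helper names `normalize_name` and `clean_amount` are made up for illustration. They are not part of any assigned exercise, and real data will need more careful handling.

```python
import re

def normalize_name(raw):
    """Collapse runs of whitespace and standardize capitalization."""
    return re.sub(r"\s+", " ", raw).strip().title()

def clean_amount(raw):
    """Turn a messy currency string such as ' $1,250.00 ' into a number."""
    # Keep only digits, the decimal point and a minus sign.
    digits = re.sub(r"[^0-9.\-]", "", raw)
    return float(digits) if digits else None

# Hypothetical rows: the same company typed two different ways.
rows = [
    ("  ACME   corp ", "$1,250.00"),
    ("acme Corp", " 980 "),
]

cleaned = [(normalize_name(name), clean_amount(amount)) for name, amount in rows]
print(cleaned)
```

After cleaning, both rows carry the same company name and numeric amounts, so they can be grouped and totaled in a spreadsheet or database.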
A student membership with the group Investigative Reporters & Editors is highly desirable. Membership is $25 per year, and provides access to 30 government databases, free software programs such as Tableau and Cometdocs, and detailed roadmaps to 30 years of investigative stories. Full disclosure: I’m on the Board of Directors, but get no compensation or kickbacks of any kind. I just think it’s a marvelous opportunity to connect with a marvelous organization.
Students will be assigned practice drills beginning in week three. These drills are important in building your ability to query, analyze and clean data on your own. There is no substitute for practice. I do not expect you to produce perfect SQL queries and elegant Excel functions. Failure is an option. What I want to see is your effort to understand and grapple with the exercises. I want you to go out into the wider world confident in your ability to grab data and incorporate it into your work. I want you to be able to tell your future employers, with a straight face, that you are a data ninja. Or at least a red belt. Collectively, these drills will contribute 50 percent of your grade.
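For a sense of the kind of SQL query a drill might involve, here is a rough sketch. It uses Python's built-in sqlite3 module as a stand-in for the database software we will use in class, and the `inspections` table and its rows are invented purely for illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inspections (restaurant TEXT, city TEXT, violations INTEGER)"
)

# Made-up sample rows, not real inspection records.
conn.executemany(
    "INSERT INTO inspections VALUES (?, ?, ?)",
    [
        ("Joe's Diner", "Springfield", 3),
        ("Cafe Uno", "Springfield", 5),
        ("Bay Grill", "Riverton", 2),
    ],
)

# Count inspections and total violations per city, worst city first.
query = """
    SELECT city, COUNT(*) AS inspections, SUM(violations) AS total_violations
    FROM inspections
    GROUP BY city
    ORDER BY total_violations DESC
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)

conn.close()
```

The same SELECT, GROUP BY and ORDER BY pattern should carry over to other database software with little or no change.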
Attendance and participation
Journalism is not a passive activity. It requires focus, curiosity and involvement. For that reason, I place great weight on your involvement in class and your collaboration with your fellow students. We will be handing out readings and discussing professional work, writings and data issues every week, and I expect your comments, questions and other contributions to our class. None of this can happen if you don’t show up. These factors constitute 25 percent of your final grade.
Each student will complete two written critiques of professional work that makes extensive use of government data. While you may not be able to replicate the reporters’ work, try to put yourself in their shoes and judge the decisions they made. Think of these as mini-book reviews of between 500 and 750 words each. Each critique will contribute 12.5 percent of your grade.
Students will be expected to strictly follow the university’s honor code, of course. But this is a journalism class preparing students for careers in journalism. In this particular career, plagiarism results not in student counseling or a lowered grade, but in public embarrassment and likely termination of employment. Suffice it to say, the honor code will be strictly enforced.
I expect that many of these skills will be taught in a hands-on way. Topics may include analysis of health care data, consumer safety data, corporate filings, Census statistics and other relevant data sets. The ideal would be to connect students with a long-term project at a large newsgathering organization in order to demonstrate the practical application of the techniques.