Jackie Cerretani, Fall 2007
In the fall of 2007, I took a two-course series on Data Retrieval and Analysis Techniques. This website houses all of my homework write-ups from those classes, including code, visualizations, and interpretations. It reflects the evolution of my skills and of the way I reason through the problems those skills can be applied to.
Here are the course descriptions from the University of Michigan School of Information website:
SI 601: Data Manipulation
Aims to help students get started with their own data harvesting, processing, and aggregation. Data analysis is crucial to evaluating and designing solutions and applications, as well as understanding users' information needs and uses. In many cases, the data we need to access is distributed online among many Web pages, stored in a database or available in a large text file. Often these data (e.g., Web server logs) are too large to obtain and/or process manually. Instead, we need an automated way to gather the data, parse it, and summarize it before we can do more advanced analysis. In this course, you will learn to use Perl and its modules to accomplish these tasks in a quick and easy yet useful and repeatable way. The companion half of this half-semester course, SI 618: "Exploratory Data Analysis," teaches how to further glean insights from the data through analysis and visualization.
SI 618: Exploratory Data Analysis
Aims to help students get started with their own data acquisition and analysis. Data analysis is crucial to evaluating and designing solutions and applications as well as to understanding information needs and use. Students in this course (who will have just completed SI 601: "Data Manipulation") will learn techniques of exploratory data analysis using scripting, text parsing, structured query language, regular expressions, graphing, and clustering methods to explore data. Students will be able to make sense of and see patterns in otherwise intractable quantities of data.
Session II: Exploratory Data Analysis (html)
Unix Utilities, Large Corpora
Session I: Data Manipulation (pdfs)
Parsing Large Text Files, Map Visualizations