Course Description: This course is about creating business insights from big data. The learning objective is to develop three abilities. The first is the ability to manipulate big data. This includes downloading, merging, appending, and reshaping data, and creating new variables. The second is the ability to analyze data. This includes exploratory data analysis, visualization, and predictive algorithms such as nearest neighbor, naive Bayes, decision trees, and regression. We will pay special attention to validating our predictions using a train-and-test regimen. The third is the ability to formulate questions that can be answered using big data and that lead to better business performance. This includes using data to improve marketing, pricing, capital investment, customer satisfaction, and costs. The data manipulation and analysis will be implemented by writing programs in R, a statistical software environment. The syllabus is here.

Readings: We will use three books. The primary texts are Machine Learning with R by Brett Lantz and R for Data Science by Garrett Grolemund and Hadley Wickham. You should buy these two books. As a helpful reference for writing your final project, we also recommend purchasing the ebook The Elements of Data Analytic Style by Jeff Leek.

Hands-on Exercises: This course is very hands-on. Students will write R Markdown code on day one. Each lab below has two parts: an illustration of a concept, and homework or in-class exercises. If you are an instructor and would like to see the solutions, please contact me at dvorakt@union.edu.