Instructor

Rick Watson
Email to set up a chat or video connection

Class: Wednesday 1-4

Course description

Provides students with entry-level knowledge of data science, along with experience of diverse methods and technologies related to common aspects of data science.

General

The course syllabus is a general plan for the course; deviations announced to the class by the instructor may be necessary.

Objectives

Students completing the course will have foundational skills in the use of common data science tools, including:

Text

Wickham, H., & Grolemund, G. (2017). R for data science. O’Reilly.

Great R packages for data import, wrangling and visualization

Assignment due date and time

The due time is 11:59pm on the Friday after class.

Readings

The class will read a variety of recent articles on data science and related issues. I will randomly call on students in class to identify the 2-3 key points made in each article. If you are not prepared, you could lose up to the full 5 points allocated to readings as part of the course grade.

Assignments

Assignments will be done in pairs because of the demonstrated value of pair learning. You are expected to help each other learn R. Please notify the instructor of the composition of your pair by 11:59pm on 20/7.

A1 (R)

  1. Download the data for solar radiation (a timestamp and solar radiation in watts/m2) and electricity prices (a timestamp and cost in cents per kWh) for a city in the southeastern US. The files record data at different intervals, one every 2 minutes and the other hourly. Assume the 'on the hour' measure of solar radiation is a good estimate for the 30 minutes either side. Merge the two files.
    Compute the correlation between solar radiation and electricity price. What do you conclude?
  2. Using the solar radiation data, compute the annual average and monthly averages of solar radiation. Athens is about 100 km east of Atlanta, so compare the data with solar radiation for Atlanta, GA, which is reported in kWh/m2/day. You will need to convert by multiplying the average power in watts/m2 by 24 and dividing by 1000.
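
The on-the-hour alignment described in step 1 could be sketched roughly as below. This is an illustration of the logic only, written in Python rather than R, and it is not a model answer. The readings are invented toy values, the timestamp format is a guess, and the sketch assumes the solar file is the 2-minute one and the price file is the hourly one.

```python
from datetime import datetime

# Hypothetical toy readings, invented purely for illustration.
# Assumption: solar radiation is recorded every 2 minutes, price hourly.
solar = [
    ("2018-07-01 10:00", 610.0),  # watts/m2
    ("2018-07-01 10:02", 615.0),
    ("2018-07-01 11:00", 720.0),
    ("2018-07-01 11:02", 718.0),
]
price = [
    ("2018-07-01 10:00", 5.2),  # cents per kWh
    ("2018-07-01 11:00", 6.1),
]


def parse(ts):
    """Parse a timestamp string (the format here is an assumption)."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


# Keep only the on-the-hour solar readings, keyed by timestamp.
hourly_solar = {
    parse(ts): watts for ts, watts in solar if parse(ts).minute == 0
}

# Inner join: one row per hour that appears in both files.
merged = [
    (parse(ts), hourly_solar[parse(ts)], cents)
    for ts, cents in price
    if parse(ts) in hourly_solar
]

for when, watts, cents in merged:
    print(when, watts, cents)
```

Filtering to the on-the-hour readings before joining keeps one solar value per price value, which is what a correlation between the two columns needs.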

Note: A watt is a unit of power, whereas a kilowatt-hour (kWh) is a unit of energy. You can compare a watt to how fast water is flowing out of a pipe. A kWh is equivalent to a power consumption of 1,000 watts for 1 hour.
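
As a rough check on the unit conversion in step 2, the arithmetic can be sketched as follows. The example is in Python only to show the calculation. The function name and the 200 watts/m2 input are hypothetical and are not taken from the course data.

```python
def watts_to_kwh_per_day(avg_watts_per_m2):
    """Convert an average power density (watts/m2) to daily energy (kWh/m2/day)."""
    hours_per_day = 24
    watts_per_kilowatt = 1000
    return avg_watts_per_m2 * hours_per_day / watts_per_kilowatt


# Hypothetical input: an average of 200 watts/m2 sustained over a full day.
daily_energy = watts_to_kwh_per_day(200.0)
print(daily_energy)  # 200 * 24 / 1000 = 4.8 kWh/m2/day
```

The factor of 24 turns an average rate into energy accumulated over a day, and dividing by 1000 converts watt-hours to kilowatt-hours.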

A2 (R)

Using the merged file created in the previous assignment, do the following:

  1. Graph the relationship between solar radiation and electricity price.
  2. Create a column (geom_col) chart showing average solar radiation for each month.
  3. Graph the daily maximum price of electricity for each day of August.

A3 (Exploratory)

A file contains details of CO2 emissions per capita for the four largest economies in the Americas. Use Exploratory to read the file and convert it into a format suitable for use with R. (1) Report the average CO2 per capita for each country in descending order, and (2) prepare a bar chart showing the average CO2 per capita for each country. Create a Word document with your results.

A4 (Exploratory)

Read the temperature data for Central Park. Compute the average temperature for each year, and create a scatter graph with a linear regression line. Create a Word document with your results.

Project

As there are 50 students in the class, we will have around 12 teams of 4 members for the project. Please notify the instructor of the composition of your group by 11:59pm on 20/7. Pairs can combine to create a group of four.

Identify a problem, use R to explore data related to the problem, and prepare a report on your analysis and related recommendations. You should explore some locally available open data sets, such as https://www.data.govt.nz/, http://opendata.canterburymaps.govt.nz, and https://opendata.ccc.govt.nz/public-portal/. Please discuss your proposed project with the instructor before starting major work on it. Presentations will be 10 minutes each.

Team  Members
1     Athira Nair, Amit Shah, Hakan Gulliksen, Megha Malhotra
2     Jabir Singh Baith, Shun Li, Varun Sabharwal, Kirti Lather
3     Jing Wu, Nicole Pearks, Yuting Yang, Zhexi Liang, Weichen Jiang
4     Nimmy Lloyd, Pooja Chindalur, Savika Gunasinghe, Sujith Sampath Kumar
5     Romalee Amolic, Prashant Islur, Piyush Rastogi, Jonathan Munro
6     Vee Chalermglin, Luke Chune, Byunggu Kang, Hiroyuki Nezu
7     Vaisakh Radhadkrishnan, Ankit Jaiswal
8     Sam Davidson, Aileen Medina, Pradeep Raja, Mengqi (Peter) Shi
9     Daniel Bentall, Blake List, Ben Faulks, Sid Bhatnagar
10
11
12

Grading

Item Points
Topic assignments 20
Research assignment 20
Project 30
Exam 25
Articles 5
Total 100

If you are unable to complete an assignment on time, please advise the instructor as soon as possible so that alternative arrangements can be made.

Schedule

Class 1, 18 Jul
Topics: Data & information (slides); Introduction to Data Science (slides); Introduction to R (slides); RStudio
Assignments: none
Package(s): tidyverse, readr, readxl, DBI, RMySQL, dplyr, lubridate, measures
Readings: none

Class 2, 25 Jul
Topics: Data visualization with R (slides)
Assignments: A1
Package(s): ggplot2
Readings:
Stakeholder-Driven Data Science at Warby Parker
How moneyball tactics built a basketball juggernaut (Wired)

Class 3, 1 Aug
Topics: Regression (slides); Decision trees (slides); Neural networks (slides)
Assignments: A2
Package(s): ctree, neuralnet
Readings:
*Getting Value from Machine Learning Isn’t About Fancier Algorithms — It’s About Making It Easier to Use (Harvard Business Review)
Manheim case
Manheim data

Class 4, 8 Aug
Topics: Time series forecasting (slides); Exploratory (slides): data wrangling, descriptive analytics
Assignments: A3
Package(s): lubridate, dygraphs, xts
Readings:
*Drillers turn to big data in the hunt for more, cheaper oil (Financial Times)
*Can big data revolutionise policymaking by governments? (Financial Times)

Class 5, 15 Aug
Topics: Exploratory: explanatory analytics, predictive analytics, prescriptive analytics
Assignments: A4
Package(s): none
Readings:
Top data science and machine learning methods used in 2017
*The joys of data hygiene: Europe’s tough new data-protection law (Economist)

Class 6, 22 Aug
Topics: Project presentations
Assignments: none
Package(s): none
Readings:
*Increasingly, hunting money-launderers is automated (Economist)
How Netflix’s Customer Obsession Created a Customer Obsession
Successful Analytics Leaders (Business Intelligence)
Should You Pursue a Career in BI/Analytics? (Business Intelligence)

* These readings are available via Moodle