Redo the tabulation example with temperatures in Celsius.
library(sparklyr)
library(tidyverse)
spark_install(version = '2.0.2')
sc <- spark_connect(master = "local", spark_home = spark_home_dir(version = "2.0.2"))
url <- "http://people.terry.uga.edu/rwatson/data/centralparktemps.txt"
t <- read_delim(url, delim = ',')
t_tbl <- copy_to(sc, t)
t_tbl %>%
  mutate(Celsius = round((temperature - 32) * 5 / 9, 0)) %>%
  group_by(Celsius) %>%
  summarize(Frequency = n()) %>%
  arrange(Celsius)
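The conversion and tabulation can be checked locally, without Spark, on a handful of values. The following is a minimal sketch in base R; the `fahrenheit` vector is made up for illustration and is not taken from the Central Park file.

```r
# Hypothetical Fahrenheit readings (illustrative only, not from the data file)
fahrenheit <- c(32, 50, 50, 68, 212)

# Same formula as the answer above: (F - 32) * 5/9, rounded to whole degrees
celsius <- round((fahrenheit - 32) * 5 / 9, 0)

# Count how often each rounded Celsius value occurs, ordered by temperature
freq <- as.data.frame(table(Celsius = celsius), responseName = "Frequency")
print(freq)
```

Each row of `freq` pairs a rounded Celsius value with the number of readings that map to it, which is the same shape of result the Spark pipeline returns.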
A file of hourly electricity costs for a major city contains a timestamp and cost separated by a comma. Compute the minimum, mean, and maximum costs.
library(sparklyr)
library(tidyverse)
spark_install(version = '2.0.2')
sc <- spark_connect(master = "local", spark_home = spark_home_dir(version = "2.0.2"))
url <- "http://people.terry.uga.edu/rwatson/data/electricityprices.csv"
e <- read_delim(url, delim = ',')
e_tbl <- copy_to(sc, e)
e_tbl %>%
  summarize(Min = min(cost), Mean = round(mean(cost), 2), Max = max(cost))
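As a quick sanity check of the summary logic, the same three statistics can be computed locally on a small vector. This is a sketch in base R under the assumption that `cost` is numeric; the values below are invented for illustration and do not come from the electricity prices file.

```r
# Hypothetical hourly costs (illustrative only, not from the data file)
cost <- c(10, 20, 30, 45)

# Minimum, mean (rounded to two decimals), and maximum, as in the answer above
summary_stats <- data.frame(
  Min  = min(cost),
  Mean = round(mean(cost), 2),
  Max  = max(cost)
)
print(summary_stats)
```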
This page is part of the promotional and support material for Data Management (open edition) by Richard T. Watson