forked from WinVector/PDSwR2
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
76 lines (59 loc) · 2.95 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
# Example code and data for "Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014.
* The book: ["Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014](http://www.manning.com/zumel/) (book copyright Manning Publications Co., all rights reserved)
* The support site: [GitHub WinVector/PDSwR2](https://github.com/WinVector/PDSwR2)
## The code and data in this directory supports examples from:
* Appendix A: Working with R And Other Tools
# SQL example in knitr Markdown
Material from [Practical Data Science with R examples GitHub archive](https://github.com/WinVector/PDSwR2/)
In support of the Hotel/SQL example in Appendix A of [Practical Data Science with R](http://www.manning.com/zumel/) by Nina Zumel and John Mount.
* start with README.Rmd and Workbook1.xlsx
* produce README.md, HotelRelation.pdf and figure/* by running "knit('README.Rmd')" in R
* produce README.html by running "pandoc README.md -o README.html" in bash shell
* or all in one shot at the bash shell: echo "library('knitr'); knit('README.Rmd')" | R --vanilla ; pandoc README.md -o README.html
```{r allsteps,tidy=F}
library('xlsx')
bookings <- read.xlsx('Workbook1.xlsx',1,startRow=3)
prices <- read.xlsx('Workbook1.xlsx',2,startRow=3)
library('reshape2')
bthin <- melt(bookings,id.vars=c('date'),
variable.name='daysBefore',value.name='bookings')
pthin <- melt(prices,id.vars=c('date'),
variable.name='daysBefore',value.name='price')
daysCodes <- c('day.of.stay', 'X1.before', 'X2.before', 'X3.before')
bthin$nDaysBefore <- match(bthin$daysBefore,daysCodes)-1
pthin$nDaysBefore <- match(pthin$daysBefore,daysCodes)-1
# prevent sqldf from triggering tcl/tk dependency
# see: https://code.google.com/p/sqldf/ Troubleshooting
options(gsubfn.engine = "R")
library('sqldf')
joined <- sqldf('
select
bCurrent.date as StayDate,
bCurrent.daysBefore as daysBefore,
bCurrent.nDaysBefore as nDaysBefore,
p.price as price,
bCurrent.bookings as bookingsCurrent,
bPrevious.bookings as bookingsPrevious,
bCurrent.bookings - bPrevious.bookings as pickup
from
bthin bCurrent
join
bthin bPrevious
on
bCurrent.date=bPrevious.date
and bCurrent.nDaysBefore+1=bPrevious.nDaysBefore
join
pthin p
on
bCurrent.date=p.date
and bCurrent.nDaysBefore=p.nDaysBefore
')
library('ggplot2')
plt <- ggplot(data=joined,aes(x=price,y=pickup)) +
geom_point() + geom_jitter() + geom_smooth(method='lm')
print(plt)
ggsave(filename='HotelRelation.pdf',plot=plt)
print(summary(lm(pickup~price,data=joined)))
```
## Code example license
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.