-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathR01_2 vector.Rmd
315 lines (207 loc) · 7.51 KB
/
R01_2 vector.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
---
title: "Basic vector"
author: "Jilung Hsieh"
date: "2018/7/3"
output:
html_document:
number_sections: true
highlight: textmate
theme: spacelab
toc: yes
editor_options:
chunk_output_type: inline
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Language
## Assignment
* 將右邊的算式或數值指派給左邊的變數,`<-`可以用`=`單等號來取代,但為了避免混淆,在R中一般都用`<-`。
* 在幾乎所有程式語言中,單等號`=`指的是assignment,把右方的算式、值或物件指給左方的變數。而比較兩者相不相等,則用雙等號`==`,例如`1==3-2`。
```{r}
a <- 1
b <- c(1, 2, 3, 4)
c <- c("1", "2", "3", "4")
d <- c(b, a)
e <- "abcd"
```
## comments 註解
* 在程式碼區塊若前面有`#`字號後面跟著空白的話,那代表那行被標示為註解,程式執行時會自動跳過註解不執行。
* 在RStudio中的快速鍵為選取該行程式碼後打`cmd(ctrl)-shift-c`。
```{r}
# df <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5))
```
# Vector
注意事項:
1. `a <- a[a %% 2 == 0]`當要篩選合乎條件的a時,如果要改變原本的a,就要用assign覆蓋之。
2. `a <- sort(a)`排序後的結果如果要改變原本的a,亦要覆蓋之。
3. 要篩選一個vector中的數值有兩種做法,直接用索引直接指定你要哪幾個,另一個方法是用一個等長的logical vector(TRUE or FALSE)就可以把TRUE的部分選出來。
4. `sort(a)`傳回的是a的數值排序後的結果,`order(a)`傳回的是a由小到大分別在哪幾個位置。
## V1: Creating a Vector
* 資料來源:[台灣出口進口貿易資料查詢](http://cus93.trade.gov.tw/FSC3040F/FSC3040F?menuURL=FSC3040F)
* 在程式碼中,只要是文字必用成對的雙引號或單引號包含其中,以區隔「文字」、「變數」和「數字」。如果看到沒有雙引號的「英文字母」必定是變數名稱,或函式名稱。如果看到有雙引號的數字,那也是文字。
```{r}
country <- c("CN", "US", "JP", "HK", "KR", "SG", "DE", "MY", "VN", "PH", "TH", "AU", "NL", "SA", "ID", "GB", "IN", "FR", "IT", "AE")
p.import <- c(26.142, 12.008, 7.032, 13.646, 4.589, 5.768, 2.131, 2.802, 3.428, 3.019, 1.976, 1.118, 1.624, 0.449, 0.983, 1.302, 1.027, 0.553, 0.670, 0.455)
p.export <- c(22.987, 12.204, 11.837, 7.739, 5.381, 4.610, 2.866, 2.784, 2.414, 2.092, 1.839, 1.788, 1.665, 1.409, 1.391, 1.075, 0.974, 0.899, 0.800, 0.728)
```
```{r}
a <- seq(11, 99, 11)
b <- 11:20
# create by distribution
x <- runif(1000, 1, 10) # uniform dist, n=1000
x <- rnorm(10000000, 1, 10) # normal dist, n=1000
```
## V2: Plotting
```{r}
plot(density(x))
```
## V3: Viewing
只要直接打該變數名稱,該變數的內容就會被列印出來。例如`df`或者是做完數學操作(如加減乘除),而沒有把他assign給新的變數,那就會被列印出來。
```{r}
country
p.import
head(country)
tail(country)
length(country)
# View(country)
```
## V4: Subsetting, slicing
```{r}
country[c(5, 3, 1)] # how about country[c(1, 3, 5)]
country[3:6] # is it equal to country[c(3, 4, 5, 6)]
a <- 11:19
a[3:length(a)]
a[length(a):3]
```
## V5: Deleting
Without assignment, deletion won't change original vectors
實際上像下面這樣只是用負號來篩掉不要的項目,原本變數沒變,要被覆蓋掉才變。
```{r}
b <- 11:20
b[-(3:5)]
b[-c(1, 3, 5)]
b
```
* (V) Deletion with assignment to replace original vector
(重要)當用負號篩選掉不要的那些項目後,必須要Assign以覆蓋掉原本的變數,不然原本的變數不會有所改變。
```{r}
b <- b[-(3:5)]
b
a <- seq(11, 99, 11)
a <- a[-c(1, 3, 5)]
a
```
## V6: Concatinating
```{r}
a <- 1:10
a <- c(a, 11)
a
b
a <- c(a, b)
a
a <- c(a, a, b)
a
```
## V7: Arithmetic operations
* `a <- a %% 2`除以二取餘數
* `a <- a %/% 2`除以二取商
```{r}
a <- 11:19
a + 3
a / 2
a %% 2
a %/% 2
a <- a %% 2 # modular arithmetic, get the reminder
a <- a %/% 2 # Quotient
```
## V8: Logic comparisons
```{r}
a %% 2 == 0 # deteting odd/even number
a %% 2 != 0
a[a%%2==0]
a > b
p.import > mean(p.import)
p.import > p.export
TRUE == T # == equal to,
TRUE != F # != Not equal to
any(a>11) # is there any element larger than 1
all(a>11) # are all elements larger than 1
```
## V9: Subsetting by logic comparisons
* two methods to filter data from vectors, by index vector or a logical vector with equal length.
* `a%% 2 == 0`意味在判斷a除以2會不會等於0。計算後的結果會變成TRUE和FALSE,因為邏輯判斷所產生的就是TRUE或FALSE。所以他會產生一整個TRUE或FALSE的Vector。把這樣的Vector塞給a後,它會留下TRUE的部分,FALSE的部分則不會留下。
```{r}
a <- seq(11, 55, 11)
a[c(T, F, T, F, T)]
a[a%%2==1]
a%%2
a%%2==1
a <- c("你好","你好棒棒","你好棒","你真的好棒")
a[nchar(a)>3]
# which will return "index-of"
a <- seq(11, 55, 11)
a[which(a%%2==1)]
which(a%%2==1)
```
## V10: Sorting
* 排序`sort(x)`的結果必須用`<-`覆蓋原本的`x`,此時的`x`才會是排序的結果。
* `order(x)`函式會傳回`x`數值由小到大的**索引**。這個例子的結果是`5, 4, 3, 6, 1, 2`,也就是`5`位置的那個數最小、`4`那個位置的數次小、接下來`3, 6, 1, 2`。
* `x[order(x)]`把`order(x)`結果(也就是`c(5, 4, 3, 6, 1, 2)`)傳給原本的`x`便會使得原本的`x`重新排序。通常`order()`的用途是,我們可以將兩個等長的variables例如var1和var2,依據var2來重新排序var1,例如var1[order(var2)]。
```{r}
x <- c(33, 55, 22, 13, 4, 24)
sort(x)
x <- sort(x) # assign to replace original x
order(x)
x[order(x)]
```
## V11: Using built-in math functions
```{r}
a <- 11:19
min(a); max(a); mean(a); median(a); sd(a)
log2(a)
log1p(a)
?log1p
```
## V12: Checking data type
```{r}
mode(country) # character
mode(p.import) # numeric
mode(p.import > mean(p.import)) # logical
p.importc <- c("26.142", "12.008", "7.032", "13.646", "4.589")
mode(p.importc) # character
```
## V13: Converting data type
* numeric vector可以用`as.character(x)`轉成`charcter`;logical vector可以用`as.numeric(x)`轉為`numeric`。概念上可以說是`character > numeric > logical`。
* 如果硬是在logical vector後附加一個numeric element的話,那就會整個vector被轉為numeric vector;相仿地,如果numeric vector後附加一個character element的話那整個vector就會被轉為character vector。
* 可以用`sum()`函式來計算logical vector有幾個`TRUE`值。例如`sum(a%%2==1)`就是計算`a`中有幾個奇數。`TRUE`可視為`1`、`FALSE`可視為`0`,所以加總起來就是`TRUE`有幾個。
```{r}
p.importc <- as.character(p.import)
p.importn <- as.numeric(p.importc)
a <- seq(11, 99, 11)
a <- c(a, "100")
a <- seq(11, 99, 11)
sum(a%%2==1)
```
## V14: Character operations
```{r}
a <- seq(11, 55, 11)
paste("A", a) # concatenate
paste0("A", a) # concatenate
```
# Practice
## Filtering data
```{r}
x.a <- rnorm(1000, 1, 10)
# 1.1 Filter out extreme values out of two standard deviations
# 1.2 Plotting the distribution of the remaining vector x.a
# 1.3 Calculate the 25% 50% and 75% quantile of vector x.a. You may google "quantile r"
# 1.4 Get the number between 25% to 75% and assign to x.a1
# 1.5 Plotting x.a1
```
## Practice II: Accessing characters
```{r}
x.b <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k")
# 2.1 Get only elements at odd positions and assign to x.b1
# 2.2 Truncate the first 2 elements and the last 2 elements and assign to x.b2
```