---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  eval = F,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# tiktokr <img src='man/figures/logo.png' align="right" height="139" />
<!-- badges: start -->
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
<!-- [![Codecov test coverage](https://codecov.io/gh/benjaminguinaudeau/tiktokr/branch/master/graph/badge.svg)](https://codecov.io/gh/benjaminguinaudeau/tiktokr?branch=master) -->
<!-- [![Travis build status](https://travis-ci.com/benjaminguinaudeau/tiktokr.svg?branch=master)](https://travis-ci.com/benjaminguinaudeau/tiktokr) -->
<!-- badges: end -->
**Disclaimer (January 7th, 2021)**
**At the beginning of December 2020, TikTok changed its API structure and its security measures to control the traffic of metadata. As a result, requests made with `tiktokr` are blocked very often, if not systematically (the symptom is an error when parsing the JSON data structure).**
**After trying minor patches, we concluded that `tiktokr` needs to be completely rewritten to fit TikTok's new infrastructure. Because none of the authors currently has the time to rewrite the package, we are putting it on hold for now and apologize for the resulting inconvenience. If you are interested in taking over the challenge, we are glad to share the knowledge that we have accumulated during the development of `tiktokr`.**
The goal of `tiktokr` is to provide a scraper for the video-sharing social networking service [TikTok](http://tiktok.com/).
While writing this library, we were broadly inspired by the Python module [davidteather/TikTok-Api](https://github.com/davidteather/TikTok-Api). You will need Python 3.6 or Docker to use `tiktokr`. If you want to use Docker, see the [Docker guide](https://github.com/benjaminguinaudeau/tiktokr#using-tiktokr-with-docker).
*Many thanks go to [Vivien Fabry](https://twitter.com/ViviFabrien) for creating the hexagon logo.*
**Overview**
+ [Installation](https://github.com/benjaminguinaudeau/tiktokr#installation)
+ [Authentification](https://github.com/benjaminguinaudeau/tiktokr#authentification)
+ [Using tiktokr with Docker](https://github.com/benjaminguinaudeau/tiktokr#using-tiktokr-with-docker)
+ [Examples](https://github.com/benjaminguinaudeau/tiktokr#examples)
## Installation
You can install the development version from [GitHub](https://github.com/) with:
```{r eval = F}
# install.packages("devtools")
devtools::install_github("benjaminguinaudeau/tiktokr")
```
Load library
```{r example}
library(tiktokr)
```
Make sure to use your preferred Python installation
```{r}
library(reticulate)
use_python(py_config()$python)
```
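The Python 3.6 requirement mentioned above can be checked before going further. The following is a minimal sketch and not part of `tiktokr`: `python_is_recent_enough()` is a hypothetical helper, and the commented usage line assumes that the `version` field of `reticulate::py_config()` holds a version string such as "3.8".

```r
# Hypothetical helper (not part of tiktokr): compare a Python version string
# against the minimum mentioned above. numeric_version() compares component
# by component, so "3.10" is correctly treated as newer than "3.6".
python_is_recent_enough <- function(version_string, minimum = "3.6") {
  numeric_version(version_string) >= numeric_version(minimum)
}

python_is_recent_enough("3.10")
python_is_recent_enough("2.7")

# With reticulate loaded, the check could look like this (assumes that
# py_config()$version is a version string):
# python_is_recent_enough(reticulate::py_config()$version)
```

Comparing the versions as plain numbers would get this wrong, since 3.10 is numerically smaller than 3.6.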
Install necessary Python libraries
```{r, eval = F}
tk_install()
```
## Authentification
In November 2020, TikTok tightened its security measures. It now frequently shows a captcha, which is easily triggered after a few requests. This can be solved by specifying the cookie parameter. To get a session cookie:
1. Open a browser and go to "http://tiktok.com".
2. Scroll down a bit to make sure that you do not get a captcha.
3. Open the JavaScript console (in Chrome: View > Developer > JavaScript Console).
4. Run `document.cookie` in the console. Copy the entire output (your cookie).
5. Run `tk_auth()` in R and paste the cookie.
Click on the image below for a screen recording of how to get your TikTok cookie:
[<img src="https://github.com/benjaminguinaudeau/tiktokr/raw/master/data/preview.png" width="100%">](https://youtu.be/kYMV2ugxacs)
The `tk_auth()` function saves the cookie (and user agent) as environment variables in your `.Renviron` file. You only need to run it once to use `tiktokr`, or again whenever you want to update your cookie or user agent.
```{r, eval = F}
tk_auth(cookie = "<paste here the output from document.cookie>")
```
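Before passing the copied string to `tk_auth()`, it can help to confirm that it looks like a cookie string, that is, a list of `name=value` pairs separated by semicolons, and that it was not truncated while copying. The sketch below uses only base R. `parse_cookie_string()` is a hypothetical helper that is not part of `tiktokr`, and the example value is made up.

```r
# Hypothetical helper (not part of tiktokr): split a "name=value; name=value"
# cookie string, as returned by document.cookie, into a named character vector.
parse_cookie_string <- function(cookie) {
  pairs <- strsplit(trimws(cookie), ";\\s*")[[1]]
  pairs <- pairs[nzchar(pairs)]
  # Split on the first "=" only, because values may contain "=" themselves.
  eq_pos <- as.integer(regexpr("=", pairs, fixed = TRUE))
  keep <- eq_pos > 0
  pairs <- pairs[keep]
  eq_pos <- eq_pos[keep]
  values <- substring(pairs, eq_pos + 1)
  names(values) <- trimws(substring(pairs, 1, eq_pos - 1))
  values
}

# Made-up example value, not a real TikTok cookie.
cookie <- "session_a=abc123; token_b=x=y; flag_c=1"
parsed <- parse_cookie_string(cookie)
names(parsed)
length(parsed)
```

An empty result or an unexpectedly short list of names suggests that the string was not copied in full.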
## Using `tiktokr` with Docker
TikTok requires API queries to be identified with a unique hash. To get this hash, `tiktokr` runs a `puppeteer-chrome` session in the background. Because `puppeteer` sometimes causes issues on some operating systems, we also created a Docker image that can be run on any computer with Docker installed. Note: if you run `tiktokr` with Docker, you won't need a Python installation.
To find out if you are experiencing `puppeteer` problems, run:
```{r, eval = F}
library(tiktokr)
Sys.setenv("TIKTOK_DOCKER" = "")
tk_auth(cookie = "<your_cookie_here>")
tk_init()
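# Request a signature through Puppeteer (Docker is disabled above)
out <- get_signature("test")
# Assumes that get_signature() returns the signature as a string;
# isTRUE() also covers an empty or missing result
if (isTRUE(stringr::str_length(out) > 16)) {
  message("Puppeteer works well on your computer")
} else {
  message("Puppeteer does not work, please consider using Docker")
}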
```
If you experience problems, try installing Docker as outlined in the steps below.
### Installing Docker
If you have a [Mac](https://docs.docker.com/docker-for-mac/install/), Linux (for example [Ubuntu](https://docs.docker.com/engine/install/ubuntu/)), or [Windows 10 Professional / Education / Enterprise](https://docs.docker.com/docker-for-windows/install/) operating system, simply install Docker by following the respective link.
If you only have Windows 10 Home, installing Docker requires more steps:
1. Follow the steps to [install Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10)
2. Follow the steps to [install Docker on Windows Home](https://docs.docker.com/docker-for-windows/install-windows-home/)
### Initialize Docker
To run `tiktokr` with Docker, call `tk_auth()` with `docker = TRUE`, which sets the necessary environment variable.
```{r}
tk_auth(docker = T)
```
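To see which mode the current session is in, the environment variable can be inspected directly. This is a minimal sketch under an assumption: it supposes that the variable in question is `TIKTOK_DOCKER` (the name used in the Puppeteer test above) and that any non-empty value means Docker mode. `docker_mode_enabled()` is a hypothetical helper, not a `tiktokr` function.

```r
# Hypothetical helper (not part of tiktokr). Assumes that Docker mode is
# signalled by a non-empty TIKTOK_DOCKER environment variable.
docker_mode_enabled <- function() {
  nzchar(Sys.getenv("TIKTOK_DOCKER"))
}

docker_mode_enabled()
```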
Now run `tk_init()` to set up the Docker container.
```{r}
tk_init()
```
You can check whether your Docker container is working correctly by running the following code:
```{r}
if (stringr::str_length(get_docker_signature("")) > 16) {
  message("Signature successful. Your Docker container is working.")
} else {
  message("Unable to get signature")
}
```
Now try running the examples below.
## Examples
For every session involving `tiktokr`, you will need to initialize the package with `tk_init()`. Once it is initialized, you can run as many queries as you want.
```{r}
tk_init()
```
### Get TikTok trends
Returns a tibble with trends.
```{r, eval = F}
# Trend
trends <- tk_posts(scope = "trends", n = 200)
```
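Since requests can be blocked (see the disclaimer at the top), a query may fail on one attempt and succeed on the next. The following is a minimal sketch of a retry wrapper in base R. `with_retries()` is a hypothetical helper, not part of `tiktokr`, and it assumes that a blocked request surfaces as an R error.

```r
# Hypothetical helper (not part of tiktokr): call `fn` until it succeeds or
# `max_tries` attempts have failed, pausing `wait` seconds between attempts.
with_retries <- function(fn, max_tries = 3, wait = 1) {
  stopifnot(max_tries >= 1)
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    if (attempt < max_tries) Sys.sleep(wait)
  }
  stop("All ", max_tries, " attempts failed. Last error: ",
       conditionMessage(last_error))
}

# Possible usage with the query above:
# trends <- with_retries(function() tk_posts(scope = "trends", n = 200), wait = 5)
```

Retrying does not help once a captcha has been triggered; in that case the cookie likely needs to be refreshed with `tk_auth()`.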
### Get TikToks from User
```{r, eval = F}
user_posts <- tk_posts(scope = "user", query = "willsmith", n = 50)
```
### Get TikToks from hashtag
Note: A hashtag query returns only about 2k hits. These are neither a random sample nor the most recent posts, but rather **some mix of recent and popular** TikToks.
```{r, eval = F}
hash_post <- tk_posts(scope = "hashtag", query = "maincharacter", n = 100)
```
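One way to work with this cap is to query several related hashtags and combine the results, dropping posts that appear more than once. The sketch below illustrates the combining step with toy data. `combine_posts()` is a hypothetical helper, not part of `tiktokr`, and it assumes that all result tables share the same columns and that the `id` column identifies a post.

```r
# Hypothetical helper (not part of tiktokr): stack several result tables and
# keep the first occurrence of each post. Assumes every table has the same
# columns and an `id` column identifying the post.
combine_posts <- function(results) {
  combined <- do.call(rbind, results)
  combined[!duplicated(combined$id), , drop = FALSE]
}

# Toy data standing in for two tk_posts() results.
first <- data.frame(id = c("1", "2"), desc = c("a", "b"),
                    stringsAsFactors = FALSE)
second <- data.frame(id = c("2", "3"), desc = c("b", "c"),
                     stringsAsFactors = FALSE)
combined <- combine_posts(list(first, second))
combined$id
```

The combined table is still not a random sample of posts, so the caveat above continues to apply.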
### Get TikToks from music id
Note: A hashtag query returns only about 2k hits. These are neither a random sample nor the most recent posts, but rather **some mix of recent and popular** TikToks.
```{r, eval = F}
user_posts <- tk_posts(scope = "user", query = "willsmith", n = 50)
music_post <- tk_posts(scope = "music", query = user_posts$music_id[1], n = 100)
```
### Download TikTok Videos
With `tk_dwnl` you can download videos from TikTok.
From Trends:
```{r, eval = F}
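library(magrittr) # provides %>%
# The target directory must exist first, e.g. fs::dir_create("video")
trends <- tk_posts(scope = "trends", n = 5)
# Split the tibble into one-row pieces and download each video to video/<id>.mp4
trends %>%
  split(seq_len(nrow(.))) %>%
  purrr::walk(~ tk_dwnl(.x$video_downloadAddr, paste0("video/", .x$id, ".mp4")))
# To clean up afterwards: fs::dir_delete("video")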
```
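For larger batches, it can be useful to skip files that already exist and to keep going when a single download fails. The following is a minimal sketch in base R. `download_posts()` is a hypothetical helper, not part of `tiktokr`; it assumes the `id` and `video_downloadAddr` columns used above and that a failed download raises an R error.

```r
# Hypothetical helper (not part of tiktokr): download each post to
# <dir>/<id>.mp4, skipping files that already exist and recording failures
# instead of stopping at the first one. `download_fun` is called as
# download_fun(url, path), matching how tk_dwnl() is used above.
download_posts <- function(posts, dir, download_fun) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  status <- character(nrow(posts))
  for (i in seq_len(nrow(posts))) {
    path <- file.path(dir, paste0(posts$id[i], ".mp4"))
    if (file.exists(path)) {
      status[i] <- "skipped"
      next
    }
    status[i] <- tryCatch({
      download_fun(posts$video_downloadAddr[i], path)
      "downloaded"
    }, error = function(e) "failed")
  }
  data.frame(id = posts$id, status = status, stringsAsFactors = FALSE)
}

# Possible usage with the objects above:
# report <- download_posts(trends, "video", tk_dwnl)
```

The returned table has one row per post, so failed downloads can be retried later by filtering on `status`.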