grobotstxt is a native Go port of [Google's robots.txt parser and matcher C++ library](https://github.com/google/robotstxt).

- Direct function-for-function conversion/port
- Preserves all behaviour of the original library
- Passes 100% of the original test suite
- Minor language-specific cleanups

The original package includes a standalone binary, but that has not yet been ported as part of this package.

## Installation

```bash
$ go get github.com/jimsmart/grobotstxt
```

```go
import "github.com/jimsmart/grobotstxt"
```

### Dependencies

- Standard library.
- [Ginkgo](https://onsi.github.io/ginkgo/) and [Gomega](https://onsi.github.io/gomega/), if you wish to run the tests.

## Examples

```go
import "github.com/jimsmart/grobotstxt"

// Fetched robots.txt file.
robotsTxt := `
# robots.txt with restricted area

User-agent: *
Disallow: /members/*

Sitemap: http://example.net/sitemap.xml
`

// User-agent of bot.
const userAgent = "FooBot/1.0"

// Target URI.
uri := "http://example.net/members/index.html"

// Is the bot allowed to visit this page?
ok := grobotstxt.AgentAllowed(robotsTxt, userAgent, uri)
```
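
The package also provides AgentsAllowed, mentioned in the Notes below, for checking several user-agent names in a single call. A minimal sketch of how it might be used, reusing the values above; consult the GoDocs for the exact signature:

```go
// Is any of the given bots allowed to visit this page?
ok := grobotstxt.AgentsAllowed(robotsTxt, []string{"FooBot/1.0", "BarBot/2.0"}, uri)
```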

Additionally, one can also extract all Sitemap URIs from a given robots.txt file:

```go
sitemaps := grobotstxt.Sitemaps(robotsTxt)
```
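
In practice, the robots.txt body is usually fetched over HTTP first. Below is a minimal end-to-end sketch using the standard library's net/http; the URL and the panic-style error handling are illustrative assumptions, not part of this package:

```go
package main

import (
	"fmt"
	"io"
	"net/http"

	"github.com/jimsmart/grobotstxt"
)

func main() {
	// Fetch the robots.txt file for the target site.
	resp, err := http.Get("http://example.net/robots.txt")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Read the response body: this is the robots.txt content.
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}

	// Ask whether FooBot may fetch the target page.
	ok := grobotstxt.AgentAllowed(string(body), "FooBot/1.0", "http://example.net/members/index.html")
	fmt.Println("allowed:", ok)
}
```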

See the GoDocs for further information.

## Documentation

GoDocs: [https://godoc.org/github.com/jimsmart/grobotstxt](https://godoc.org/github.com/jimsmart/grobotstxt)

## Testing

For a full coverage report, try:

```bash
$ go test -coverprofile=coverage.out && go tool cover -html=coverage.out
```

## Notes

Parsing of robots.txt files themselves is done exactly as in the production version of Googlebot, including how percent codes and unicode characters in patterns are handled. The user must ensure, however, that the URI passed to the AgentAllowed and AgentsAllowed functions, or to the URI parameter of the robots tool, follows the format specified by RFC 3986, since this library will not perform full normalization of those URI parameters. Only if the URI is in this format will the matching be done according to the REP specification.
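
For URIs of uncertain provenance, a basic pre-check with the standard library's net/url is one option. The helper below is hypothetical, not part of this package, and note that url.Parse alone does not perform full RFC 3986 normalization:

```go
import (
	"net/url"

	"github.com/jimsmart/grobotstxt"
)

// agentAllowedStrict is a hypothetical helper: it parses the URI with
// net/url and re-serialises it before asking grobotstxt for a verdict,
// rejecting URIs that do not parse at all. url.Parse does not fully
// normalise per RFC 3986 (e.g. dot-segment removal), so treat this as
// a sanity check only.
func agentAllowedStrict(robotsTxt, userAgent, rawURI string) (bool, error) {
	u, err := url.Parse(rawURI)
	if err != nil {
		return false, err
	}
	return grobotstxt.AgentAllowed(robotsTxt, userAgent, u.String()), nil
}
```
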
## License

Package grobotstxt is licensed under the terms of the