README.md: 1 addition, 1 deletion
@@ -36,7 +36,7 @@ Informal evaluation shows that the time for each request grows linearly in size,
36
36
## Launching server
37
37
The script *launch.sh* launches both the Stanford server and *accessor.py*. If *tmux* is installed, the script tries to start these processes in a new tmux session, so that the shell can be closed without stopping the processes the server needs. Run the script as follows:
38
38
39
-
`./launch.sh`
39
+
`./launch.sh` or `./launch.sh -v`
40
40
41
41
Alternatively, you can start the processes manually. From the *stanford* directory, the server can be launched as follows:
<a id="git-link" onmouseover="document.getElementById('git-img').src='../static/git-hover.png';" onmouseout="document.getElementById('git-img').src='../static/git-colored.png';" href="https://github.com/Aequivinius/dbcls">see on <img id="git-img" src="../static/git-colored.png" alt="GitHub" width="15px"/></a>
53
+
</div><!-- #title -->
54
+
</div><!-- #header -->
55
+
</div><!-- .blur-container -->
56
+
57
+
{% if error %}
58
+
<div id="error" class="blur-container">
59
+
<div class="pre-blur-box"></div>
60
+
<div class="blur-box error">
61
+
{{ error }}
62
+
63
+
{% if dump %}
64
+
<pre><code>{{dump}}</code></pre>
108
65
{% endif %}
109
-
110
-
<div id="explain" class="blur-container">
111
-
<div class="pre-blur-box"></div>
112
-
<div class="blur-box">
113
-
<h3>In a nutshell</h3>
114
-
<p>This is the web interface for a pipeline that delivers fast, accurate dependency parses using the <a href="http://nlp.stanford.edu/software/tagger.shtml">Stanford POS tagger</a> and the <a href="https://spacy.io/">spaCy</a> dependency parser. The result is returned as <a href="http://www.pubannotation.org/docs/annotation-format/">PubAnnotation JSON</a>, which is visualized using <a href="http://textae.pubannotation.org">TextAE</a>.</p>
115
-
116
-
117
-
118
-
119
-
<h3>Usage</h3>
120
-
<p>Use the service either through the web interface above or with <a href="https://curl.haxx.se/">cURL</a>, as shown below:</p>
121
-
<code>
122
-
curl -H "content-type:application/json" -d '{"text":"This is a sample sentence."}' http://spacy.dbcls.jp/spacy_rest/spacy_rest/</code><br/><br/>
123
-
124
-
<code>curl -d text="Induction of chromosome banding by trypsin/EDTA for gene mapping by in situ hybridization." http://spacy.dbcls.jp/spacy_rest/spacy_rest/</code>
125
-
126
-
<p>Any other client that can make RESTful requests should work as well. In particular, this web service was developed to be used in conjunction with <a href="http://pubannotation.org/">PubAnnotation</a>, which allows users to obtain annotations for collections of biomedical text automatically and to align them with the original publication.</p>
127
-
128
-
<h3>Parsing</h3>
129
-
<p>The pipeline combines the well-known <a href="http://nlp.stanford.edu/software/tagger.shtml">Stanford POS tagger</a> for tagging with <a href="https://spacy.io/">spaCy</a> for parsing.</p>
130
-
131
-
<p>spaCy is a Python library for NLP. Its main strength is its speed, owing to the underlying implementation in Cython. While it does offer tokenization and POS tagging, we found that its tagger does not perform well, especially in the biomedical domain.</p>
132
-
133
-
<p>We thus employ the Stanford tagger in conjunction with spaCy's parser to provide both high accuracy and speed. For more information about this approach, see the <a href="http://cs.aequivinius.ch/downloads/dependencyparsing.pdf">dissertation</a> that informed this project.</p>
<p>This is the web interface for a pipeline that delivers fast, accurate dependency parses using the <a href="http://nlp.stanford.edu/software/tagger.shtml">Stanford POS tagger</a> and the <a href="https://spacy.io/">spaCy</a> dependency parser. The result is returned as <a href="http://www.pubannotation.org/docs/annotation-format/">PubAnnotation JSON</a>, which is visualized using <a href="http://textae.pubannotation.org">TextAE</a>.</p>
102
+
103
+
<h3>Usage</h3>
104
+
<p>Use the service either through the web interface above or with <a href="https://curl.haxx.se/">cURL</a>, as shown below:</p>
105
+
<code>curl -H "content-type:application/json" -d '{"text":"This is a sample sentence."}' http://spacy.dbcls.jp/spacy_rest/spacy_rest/</code><br/><br/>
106
+
<code>curl -d text="Induction of chromosome banding by trypsin/EDTA for gene mapping by in situ hybridization." http://spacy.dbcls.jp/spacy_rest/spacy_rest/</code>
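For clients written in Python, the first cURL call above could be mirrored roughly as follows. This is a minimal sketch using only the standard library; the function names `build_json_request` and `parse_text` are illustrative and not part of the project, and the sketch assumes the service is reachable at the URL from the cURL examples and replies with JSON.

```python
import json
import urllib.request

# URL copied from the cURL examples above.
SERVICE_URL = "http://spacy.dbcls.jp/spacy_rest/spacy_rest/"


def build_json_request(text, url=SERVICE_URL):
    """Build a POST request equivalent to the JSON cURL call (hypothetical helper)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_text(text, url=SERVICE_URL, timeout=30):
    """Send the request and decode the JSON reply (requires network access)."""
    request = build_json_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

Calling `parse_text("This is a sample sentence.")` would then return the decoded reply as a Python dictionary, provided the service is up.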
107
+
<p>Any other client that can make RESTful requests should work as well. In particular, this web service was developed to be used in conjunction with <a href="http://pubannotation.org/">PubAnnotation</a>, which allows users to obtain annotations for collections of biomedical text automatically and to align them with the original publication.</p>
108
+
109
+
<h3>Parsing</h3>
110
+
<p>The pipeline combines the well-known <a href="http://nlp.stanford.edu/software/tagger.shtml">Stanford POS tagger</a> for tagging with <a href="https://spacy.io/">spaCy</a> for parsing.</p>
111
+
112
+
<p>spaCy is a Python library for NLP. Its main strength is its speed, owing to the underlying implementation in Cython. While it does offer tokenization and POS tagging, we found that its tagger does not perform well, especially in the biomedical domain.</p>
113
+
114
+
<p>We thus employ the Stanford tagger in conjunction with spaCy's parser to provide both high accuracy and speed. For more information about this approach, see the <a href="http://cs.aequivinius.ch/downloads/dependencyparsing.pdf">dissertation</a> that informed this project.</p>
115
+
135
116
<h3>Implementation</h3>
136
117
<p>The Stanford POS tagger is written in Java and can be run as a server using <a href="https://docs.oracle.com/javase/7/docs/api/java/net/ServerSocket.html">java.net.ServerSocket</a>. Sockets proved to be an easy and reliable way to communicate between Python and Java.</p>
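The Python side of such a socket exchange might look like the sketch below. The wire protocol is an assumption made for illustration (UTF-8 text is sent, the write side is closed to mark the end of the request, and the reply is read until the server closes the connection); the actual Stanford server may expect something different. The names `exchange` and `tag_via_socket`, as well as the host and port arguments, are hypothetical.

```python
import socket


def exchange(conn, text):
    """Send `text` over a connected socket and return the complete reply."""
    conn.sendall(text.encode("utf-8"))
    conn.shutdown(socket.SHUT_WR)  # assumed convention: half-close ends the request
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # peer closed its side, so the reply is complete
            break
        chunks.append(chunk)
    return b"".join(chunks).decode("utf-8")


def tag_via_socket(text, host, port, timeout=10.0):
    """Open a fresh client socket for one request, as the service does per request."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return exchange(conn, text)
```

Opening one short-lived connection per request keeps the client stateless, at the cost of a connection setup for every call.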
137
-
138
-
<p>The main script uses <a href="http://flask.pocoo.org/">Flask</a> to launch a web service that listens on <code>rest_spacy</code> and <code>rest_spacy/</code> for incoming requests. Since loading the spaCy models takes considerable time, a single object containing these models is kept in memory and reused for all requests.</p>
139
-
140
-
<p>For every request, a new client socket for the Stanford server socket is created, and Stanford's reply is read. The tokenized and tagged text is then passed to the spaCy object. The parses provided by spaCy are then realigned with the original text to facilitate using the results in PubAnnotation, converted into JSON and returned to the client.</p>
141
-
</div>
142
-
</div>
143
-
</div>
144
-
145
-
<div id="explain" class="blur-container">
146
-
<div class="pre-blur-box"></div>
147
-
<div class="blur-box">
148
-
149
-
<p>This web service was developed as a joint project of the <a href="http://www.cl.uzh.ch/de.html">Institute of Computational Linguistics</a> at the University of Zurich and the <a href="http://dbcls.rois.ac.jp/en/">DBCLS</a> in Japan. It is funded by a grant from <a href="https://www.jsps.go.jp/english/">JSPS</a>.</p>
150
-
151
-
<div class="img-box">
152
-
118
+
119
+
<p>The main script uses <a href="http://flask.pocoo.org/">Flask</a> to launch a web service that listens on <code>rest_spacy</code> and <code>rest_spacy/</code> for incoming requests. Since loading the spaCy models takes considerable time, a single object containing these models is kept in memory and reused for all requests.</p>
120
+
121
+
<p>For every request, a new client socket for the Stanford server socket is created, and Stanford's reply is read. The tokenized and tagged text is then passed to the spaCy object. The parses provided by spaCy are then realigned with the original text to facilitate using the results in PubAnnotation, converted into JSON and returned to the client.</p>
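The realignment step could be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: tokens are assumed to appear verbatim and in order in the original text (tokens the tagger has normalized would need to be mapped back first), the root token is assumed to be its own head, and the direction of each relation (dependent as `subj`, head as `obj`) is a choice made for this example. The functions `align_tokens` and `to_pubannotation` are hypothetical names.

```python
def align_tokens(text, tokens):
    """Return the (begin, end) character offsets of each token within `text`."""
    spans = []
    cursor = 0
    for token in tokens:
        begin = text.find(token, cursor)  # search only after the previous token
        if begin == -1:
            raise ValueError(f"token {token!r} not found after offset {cursor}")
        end = begin + len(token)
        spans.append((begin, end))
        cursor = end
    return spans


def to_pubannotation(text, tokens, tags, heads, labels):
    """Build a PubAnnotation-style dictionary from tokens, tags and dependency arcs."""
    spans = align_tokens(text, tokens)
    denotations = [
        {"id": f"T{i}", "span": {"begin": begin, "end": end}, "obj": tag}
        for i, ((begin, end), tag) in enumerate(zip(spans, tags))
    ]
    relations = [
        {"id": f"R{i}", "subj": f"T{i}", "pred": label, "obj": f"T{head}"}
        for i, (head, label) in enumerate(zip(heads, labels))
        if head != i  # assumption: the root is its own head and gets no relation
    ]
    return {"text": text, "denotations": denotations, "relations": relations}
```

The resulting dictionary can be passed to `json.dumps` before it is returned to the client. Searching from a moving cursor rather than from the start of the text keeps repeated tokens from all mapping to their first occurrence.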
122
+
</div><!--blur-box -->
123
+
</div><!-- #explain -->
124
+
125
+
<div id="credits" class="blur-container">
126
+
<div class="pre-blur-box"></div>
127
+
<div class="blur-box">
128
+
<p>This web service was developed as a joint project of the <a href="http://www.cl.uzh.ch/de.html">Institute of Computational Linguistics</a> at the University of Zurich and the <a href="http://dbcls.rois.ac.jp/en/">DBCLS</a> in Japan. It is funded by a grant from <a href="https://www.jsps.go.jp/english/">JSPS</a>.</p>