Commit b7423c1

author
Johannes Heinecke
committed
search sentences with (source file) line number
documentation updated
1 parent 82b8e53 commit b7423c1

16 files changed

+1614
-109
lines changed

CHANGES.md

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,9 @@
 # Changes

+## Version 2.30.0
+* search sentences with line number
+* new tests
+
 ## Version 2.29.4
 * minor change in CSS (to allow triple click to copy the current sentence)

Dockerfile

Lines changed: 4 additions & 4 deletions
@@ -1,11 +1,11 @@
 #FROM openjdk:17-alpine
 FROM amazoncorretto:24-jdk

-ARG VERSION=2.29.4
-# docker build --build-arg VERSION=2.29.4 -t jheinecke/conllueditor:2.29.4 .
-# docker build --build-arg VERSION=2.29.4 -t jheinecke/conllueditor:latest .
+ARG VERSION=2.30.0
+# docker build --build-arg VERSION=2.30.0 -t jheinecke/conllueditor:2.30.0 .
+# docker build --build-arg VERSION=2.30.0 -t jheinecke/conllueditor:latest .
 # docker run -t --rm --name conllueditor -p 5555:5555 --user 1000:1000 -v </absolute/path/to/datadir>:/data --env filename=tt.conllu jheinecke/conllueditor:latest
-# docker push jheinecke/conllueditor:2.29.4
+# docker push jheinecke/conllueditor:2.30.0
 # docker push jheinecke/conllueditor:latest

 # docker exec -it conllueditor /bin/sh

README.md

Lines changed: 15 additions & 4 deletions
@@ -9,23 +9,24 @@ The editor provides the following functionalities:
 * join/split/delete words (to correct tokenization errors)
 * join/split sentences (to correct segmentation errors)
 * undo/redo (partially)
-* search: forms, lemmas, UPOS, XPOS, deprels, sentences IDs and comments, sequences of any of these,
+* [search](#searching): forms, lemmas, UPOS, XPOS, deprels, sentences IDs and comments, sequences of any of these,
 searching for subtrees, importing subtrees from current sentence, sd-parse support
+* [searching by source file line numbers](#search-by-source-file-line-number)
 * edit non-CoNLL-U columns in a subset of [CoNLL-U plus files](http://universaldependencies.org/ext-format.html)
 * create multiword tokens from existing words or add a MWT to contract two ore more existing words
 * git support
 * export of dependency graphs as svg or LaTeX (for the [tikz-dependency](https://ctan.org/pkg/tikz-dependency) package or
 the [doc/deptree.sty](doc/deptree.sty) class, see [documentation](doc/deptree-doc.pdf))
 * prohibits invalid (cyclic) trees
 * Three edit modes: dependency trees, dependency «hedges» and a table edit mode
-* mass editing: modify tokens if a (complex) condition is satisfied
+* [mass editing](#mass-editing): modify tokens if a (complex) condition is satisfied
 * validation (using implications: _if conditions1 true then conditions2 must be true_)
 * sentence metadata editing
 * adding Translit= values to the MISC column (transliterating the FORM column) see section [Transliteration](#transliteration)
 * finding similar or identical sentence in a list of CoNLL-U files, see section [Find Similar Sentences](#find-similar-sentences)
-* configuring the UI on order to hide unneeded functionalities which otherwise clutter the UI
+* [configuring the UI](#ui-configuration) on order to hide unneeded functionalities which otherwise clutter the UI

-Current version: 2.29.4 (see [change history](CHANGES.md))
+Current version: 2.30.0 (see [change history](CHANGES.md))

 ConlluEditor can also be used as front-end to display the results of dependency parsing in the same way as the editor.
 * dependency tree/dependency hedge
@@ -390,6 +391,16 @@ Other search modes can be chosen with the search select bar (top rop right)
 * match like search (cf. [Grew Match](http://universal.grew.fr/[email protected]))
 * no search (to have less headers on top of the GUI)

+## Search by (source file) line number
+
+The search field on the top right lets you find a sentence by giving a line number of the edited `.conllu` file.
+The sentence in which this line number occurs is displayed. If the line is not a comment line, the word on that line is highlighted.
+This can be useful to find lines reported by the various UD validators.
+If sentences are modified in a way that changes their length (by splitting a word or joining two words, or by adding or deleting comment lines), the line numbers are adapted.
+Type the line number into the field on the top right and click `line number`:
+
+![Search by line number](doc/search_line_number.png)
+
 ## Complex search and search and replace

 This opens a search and search-and-display field. The search fields provides a simple language to find sentences with one or several nodes (see [Mass Editing](doc/mass_editing.md))
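
What "the word on that line" means follows from the CoNLL-U layout: comment lines start with `#`, token lines carry tab-separated columns with ID first and FORM second, and an empty line ends a sentence. The sketch below classifies a 1-based line number along those lines. It is an illustration only, not ConlluEditor's code; `LineKindSketch` and `describeLine` are hypothetical names.

```java
import java.util.List;

/** Illustration only: classifies a line of a CoNLL-U file (hypothetical helper). */
class LineKindSketch {

    /** Describes what sits on the 1-based line ln of a file given as a list of lines. */
    static String describeLine(List<String> lines, int ln) {
        if (ln < 1 || ln > lines.size()) {
            return "out of range";
        }
        String line = lines.get(ln - 1);
        if (line.isEmpty()) {
            return "sentence separator";
        }
        if (line.startsWith("#")) {
            return "comment";
        }
        // CoNLL-U token line: column 1 is ID, column 2 is FORM
        String[] cols = line.split("\t");
        return cols.length > 1 ? "word " + cols[0] + " (" + cols[1] + ")" : "malformed";
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "# sent_id = 1",
                "# text = Hello world",
                "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_",
                "2\tworld\tworld\tNOUN\t_\t_\t1\tvocative\t_\t_",
                "");
        System.out.println(describeLine(lines, 4)); // prints "word 2 (world)"
    }
}
```

A validator message that points at line 4 of this small file would therefore lead to the second word of the first sentence, while lines 1 and 2 only identify the sentence.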

gui/edit.js

Lines changed: 14 additions & 12 deletions
@@ -441,25 +441,25 @@ function getServerInfo() {
 } else {
 $('#save').hide();
 }
-
+
 // deactivate buttons as defined in uiconfig.json
 console.log("UI", data.uiconfig);
 if (data.uiconfig) {
-
+
 if (data.uiconfig.right2left_show === "hidden") {
 $("#r2l").hide();
-}
+}
 if (data.uiconfig.right2left_status === "active") {
 showr2l = true;
 $("#r2l").addClass('active');
 } else {
 showr2l = false;
 $("#r2l").removeClass('active');
 }
-
+
 if (data.uiconfig.features_show === "hidden") {
 $("#feat2").hide();
-}
+}
 if (data.uiconfig.features_status === "active") {
 showfeats = true;
 $("#feat2").addClass('active');
@@ -477,7 +477,7 @@ function getServerInfo() {
 showmisc = false;
 $("#misc2").removeClass('active');
 }
-
+
 if (data.uiconfig.display === "flat") {
 graphtype = 2;
 $("#flat3").val("flat");
@@ -488,7 +488,7 @@ function getServerInfo() {
 graphtype = 1;
 $("#flat3").val("tree");
 }
-
+
 if (data.uiconfig.searchmode === "simple") {
 $("#searchmode").val("simple");
 }
@@ -503,12 +503,12 @@ function getServerInfo() {
 }
 $("#searchmode").click();

-
+
 if (data.uiconfig.shortcuts === "show") {
 showshortcuthelp = false;
 ToggleShortcutHelp();
 }
-
+
 if (data.uiconfig.nodewidth_show === "hidden") {
 $("#adaptwidth").removeClass('onlyWithTree');
 $("#adaptwidth").hide();
@@ -521,8 +521,8 @@ function getServerInfo() {
 $("#adaptwidth").click();
 //$("#misc2").removeClass('active');
 }
-
-
+
+
 if (data.uiconfig.latex === "hidden") {
 $("#latex").removeClass('onlyWithTree');
 $("#latex").hide();
@@ -539,7 +539,7 @@ function getServerInfo() {
 $("#json").removeClass('onlyWithTree');
 $("#json").hide();
 }
-
+
 }
 // set version number to logo (shown if mouse hovers on the logo)
 //$('#logo').attr("title", data.version);
@@ -2547,6 +2547,8 @@ $(document).ready(function () {
 inputtext = "read last";
 } else if (this.id === "lire") {
 inputtext = "read " + ($("#sentid").val() - 1);
+} else if (this.id === "lireln") {
+inputtext = "line " + ($("#sentid_by_ln").val());
 } else if (this.id === "modifier") {
 var inputtext = "mod " + $("#mods").val();
 } else if (this.id === "valid") {

gui/index.html

Lines changed: 4 additions & 0 deletions
@@ -131,6 +131,10 @@
 <option value="grew">grew match</option>
 <option value="hide">hide search</option>
 </select>
+</span>
+
+<button class="editbuttons mybutton" id="lireln">line number</button>
+<input type="text" id="sentid_by_ln" class="inputfield" pattern="[1-9][0-9]*" size="5" value="1" >
 </td>
 </tr>
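
The `pattern` attribute on `#sentid_by_ln` does not stop a user from typing other characters; browsers apply it only through constraint validation (for example on form submission). As far as this diff shows, the `lireln` handler in `gui/edit.js` reads the field value directly, so a non-numeric value could still end up in the `line` command. A minimal sketch of checking the value before building the command follows, assuming the same regular expression. `LineCommandSketch` and `buildCommand` are hypothetical names, the trimming step is an addition, and how the server actually parses the command is not shown in this diff.

```java
import java.util.regex.Pattern;

/** Illustration only: validates a line-number field value (hypothetical helper). */
class LineCommandSketch {

    // same expression as the pattern attribute of the #sentid_by_ln input field
    private static final Pattern LINE_NUMBER = Pattern.compile("[1-9][0-9]*");

    /** Returns the "line <n>" command, or null if the value is not a positive integer. */
    static String buildCommand(String fieldValue) {
        String value = fieldValue.trim(); // assumption: surrounding blanks are harmless
        if (!LINE_NUMBER.matcher(value).matches()) {
            return null;
        }
        return "line " + value;
    }

    public static void main(String[] args) {
        System.out.println(buildCommand("42"));  // prints "line 42"
        System.out.println(buildCommand("0"));   // prints "null"
    }
}
```

Rejecting `0` matches the 1-based line numbering that the default field value `1` suggests.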

pom.xml

Lines changed: 2 additions & 2 deletions
@@ -32,13 +32,13 @@
 THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

 author Johannes Heinecke
-version 2.29.4 as of 21st February 2025
+version 2.30.0 as of 12th April 2025
 -->

 <modelVersion>4.0.0</modelVersion>
 <groupId>com.orange.labs</groupId>
 <artifactId>ConlluEditor</artifactId>
-<version>2.29.4</version>
+<version>2.30.0</version>
 <packaging>jar</packaging>

 <properties>

src/main/java/com/orange/labs/conllparser/ConllFile.java

Lines changed: 30 additions & 21 deletions
@@ -1,6 +1,6 @@
 /* This library is under the 3-Clause BSD License

-Copyright (c) 2018-2024, Orange S.A.
+Copyright (c) 2018-2025, Orange S.A.

 Redistribution and use in source and binary forms, with or without modification,
 are permitted provided that the following conditions are met:
@@ -28,7 +28,7 @@
 THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

 @author Johannes Heinecke
-@version 2.27.0 as of 28th September 2024
+@version 2.30.0 as of 12th April 2025
 */
 package com.orange.labs.conllparser;

@@ -82,36 +82,23 @@ public class ConllFile {
 * open CoNLL-U File and read its contents
 *
 * @param file CONLL file
-* @param ignoreSentencesWithoutAnnot ignore sentences which do not have any
-* information above columns 12
-* @param ignoreSentencesWithoutTarget ignore sentences which do not have
-* any target as annotation
 * @throws IOException
-* @throws com.orange.labs.nlp.conllparser.ConllWord.ConllWordException
+* @throws ConllException
 */
-public ConllFile(File file/*, boolean ignoreSentencesWithoutAnnot, boolean ignoreSentencesWithoutTarget*/) throws IOException, ConllException {
+public ConllFile(File file) throws IOException, ConllException {
 this.file = file;
 FileInputStream fis = new FileInputStream(file);
-parse(fis /*, ignoreSentencesWithoutAnnot, ignoreSentencesWithoutTarget*/);
+parse(fis);
 fis.close();
 }

 /**
-*
-* @param filecontents contenu du fichier COLL
-* @param ignoreSentencesWithoutAnnot ignore sentences which do not have any
-* information above columns 12
-* @param ignoreSentencesWithoutTarget ignore sentences which do not have
-* any target as annotation
+* @param file
+* @param cs class to use instead of ConllSentence (must be a subclass)
+
 * @throws ConllException
 * @throws IOException
 */
-// public ConllFile(String filecontents/*, boolean ignoreSentencesWithoutAnnot, boolean ignoreSentencesWithoutTarget*/) throws ConllException, IOException {
-// this.file = new File("__contents__");
-// InputStream inputStream = new ByteArrayInputStream(filecontents.getBytes(StandardCharsets.UTF_8));
-// parse(inputStream/*, ignoreSentencesWithoutAnnot, ignoreSentencesWithoutTarget*/);
-// }
-
 public ConllFile(File file, Class<? extends ConllSentence> cs) throws IOException, ConllException {
 this.file = file;
 conllsentenceSubclass = cs;
@@ -316,6 +303,28 @@ public List<ConllSentence> getSentences() {
 return sentences;
 }

+/** get the sentence which contains the linenumber ln (in the conllu file).
+* if any sentence is modified we recalculate. For long files this can take some time
+* @param ln the line number for which we search the sentence
+* @return an array of sentence number, position of the given line in the sentence, comments length
+*/
+public int[] getSentence_with_line(int ln) {
+int ends_after = 0;
+int first_line = 1;
+int sn = 0;
+for (ConllSentence csent : sentences) {
+ends_after += csent.get_source_length();
+if (ends_after >= ln) {
+int[] sn_offset = {sn, first_line, csent.get_comment_length()};
+
+return sn_offset;
+}
+first_line = ends_after + 1;
+sn++;
+}
+return null;
+}
+
 public void addSentences(List<ConllSentence> s) {
 sentences.addAll(s);
 }
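
The cumulative-length scan added in `getSentence_with_line` can be tried in isolation. The sketch below is a minimal stand-alone version: `Sent` is a hypothetical stand-in for `ConllSentence` that carries only the two lengths the lookup reads, and it assumes that `get_source_length()` counts every source line of a sentence, including comment lines and the separating empty line (the diff does not show that method).

```java
import java.util.List;

/** Illustration only: stand-alone version of the line-to-sentence scan. */
class LineLookupSketch {

    /** Hypothetical stand-in for ConllSentence; holds only the lengths used by the scan. */
    static final class Sent {
        final int sourceLength;   // assumed: all source lines of the sentence
        final int commentLength;  // number of comment lines of the sentence

        Sent(int sourceLength, int commentLength) {
            this.sourceLength = sourceLength;
            this.commentLength = commentLength;
        }
    }

    /**
     * Returns {sentence index, first source line of that sentence, comment length},
     * or null if ln lies beyond the last sentence.
     */
    static int[] sentenceWithLine(List<Sent> sentences, int ln) {
        int endsAfter = 0;  // last source line of the sentences seen so far
        int firstLine = 1;  // first source line of the current sentence
        int sn = 0;         // 0-based index of the current sentence
        for (Sent s : sentences) {
            endsAfter += s.sourceLength;
            if (endsAfter >= ln) {
                return new int[] {sn, firstLine, s.commentLength};
            }
            firstLine = endsAfter + 1;
            sn++;
        }
        return null;
    }

    public static void main(String[] args) {
        List<Sent> sents = List.of(new Sent(6, 2), new Sent(4, 1), new Sent(5, 2));
        int[] r = sentenceWithLine(sents, 8);
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "1 7 1"
    }
}
```

With sentence lengths 6, 4 and 5, line 8 falls in the second sentence (index 1), which starts at line 7. Like the committed method, the sketch returns the first sentence for any `ln` below 1, so callers may want to reject such values beforehand.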
