
Commit fd99fbe

more work

1 parent 1f052bd, commit fd99fbe

7 files changed (+215, -50 lines)

.DS_Store (binary file, 0 bytes, not shown)

README.md (+126, -2)

@@ -557,15 +557,139 @@ pip install https://huggingface.co/emiltj/da_multi_dupli_rater_1_onto/resolve/ma
 python src/predict_single/predict_rater_2-9.py
 ```
 
-# GOTTEN TO HERE
-
 - **Assess agreement between rater and model**
   - Make assessment fine-grained, and assess for each type of ent, in prodigy using the review recipe
+  For rater 3:
+  Cases where ents are same between predicted and model: 638
+  Cases where ents are NOT same between preds and model: 888
+  For rater 4:
+  Cases where ents are same between predicted and model: 1114
+  Cases where ents are NOT same between preds and model: 1363
+  For rater 5:
+  Cases where ents are same between predicted and model: 422
+  Cases where ents are NOT same between preds and model: 980
+  For rater 6:
+  Cases where ents are same between predicted and model: 1046
+  Cases where ents are NOT same between preds and model: 1213
+  For rater 7:
+  Cases where ents are same between predicted and model: 754
+  Cases where ents are NOT same between preds and model: 1148
+  For rater 8:
+  Cases where ents are same between predicted and model: 622
+  Cases where ents are NOT same between preds and model: 1076
+  For rater 9:
+  Cases where ents are same between predicted and model: 906
+  Cases where ents are NOT same between preds and model: 1203
+  Total cases where ents are same 5502
+  Total cases where ents are NOT same 7871
 ```bash
 # Go through script manually:
 # src/data_assessment/model_and_raters_agreement.ipynb
 ```
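The notebook behind these counts loops over raters, loads the prediction and annotation `.spacy` DocBins, and counts documents whose entity sets match exactly. A minimal pure-Python sketch of that document-level criterion, with made-up spans standing in for the repo's data:

```python
# Each doc's entities as (start_char, end_char, label) tuples; the data here is
# invented for illustration -- the real pipeline loads .spacy DocBin files.
preds = [
    [(0, 4, "PERSON"), (11, 17, "GPE")],  # model predictions, doc 1
    [(7, 10, "MONEY")],                   # model predictions, doc 2
]
annotations = [
    [(0, 4, "PERSON"), (11, 17, "GPE")],  # rater annotation, doc 1 (identical)
    [(7, 13, "MONEY")],                   # rater annotation, doc 2 (longer span)
]

# A doc counts as "same" only when every span (offsets and label) matches
# exactly; a single differing entity makes the whole doc count as "NOT same".
same = sum(p == a for p, a in zip(preds, annotations))
print(f"Cases where ents are same between predicted and model: {same}")
print(f"Cases where ents are NOT same between preds and model: {len(preds) - same}")
```

Note this is an all-or-nothing measure per document; it says nothing about which entity types disagree, which is why the fine-grained per-type review in Prodigy is the next step.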
 
+- **Add predictions to db**
+  - Creates in db:
+    - rater_"$i"_single_unprocessed_preds
+  - Creates in folders:
+    - ./data/single/unprocessed/rater_$i/rater_"$i"_preds.jsonl
+```bash
+# tools/raters_preds_to_db.sh
+prodigy drop rater_2_single_unprocessed_preds
+prodigy drop rater_10_single_unprocessed_preds
+```
+
+# GOTTEN TO HERE
+
+- **Review raters 3, 4, 5, 6, 7, 8, 9**
+```bash
+prodigy review rater_3_single_gold_all rater_3_single_unprocessed,rater_3_single_unprocessed_preds --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK\ OF\ ART,LANGUAGE,PRODUCT -S -A
+
+prodigy review rater_4_single_gold_all rater_4_single_unprocessed,rater_4_single_unprocessed_preds --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK\ OF\ ART,LANGUAGE,PRODUCT -S -A
+
+prodigy review rater_5_single_gold_all rater_5_single_unprocessed,rater_5_single_unprocessed_preds --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK\ OF\ ART,LANGUAGE,PRODUCT -S -A
+
+prodigy review rater_6_single_gold_all rater_6_single_unprocessed,rater_6_single_unprocessed_preds --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK\ OF\ ART,LANGUAGE,PRODUCT -S -A
+
+prodigy review rater_7_single_gold_all rater_7_single_unprocessed,rater_7_single_unprocessed_preds --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK\ OF\ ART,LANGUAGE,PRODUCT -S -A
+
+prodigy review rater_8_single_gold_all rater_8_single_unprocessed,rater_8_single_unprocessed_preds --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK\ OF\ ART,LANGUAGE,PRODUCT -S -A
+
+prodigy review rater_9_single_gold_all rater_9_single_unprocessed,rater_9_single_unprocessed_preds --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK\ OF\ ART,LANGUAGE,PRODUCT -S -A
+```
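The seven `prodigy review` invocations differ only in the rater number, so they can be generated instead of hand-edited. A sketch (quoting the whole label list is shell-equivalent to the backslash-escaped spaces above):

```python
# Print one `prodigy review` command per rater instead of maintaining seven
# near-identical lines by hand.
LABELS = ("PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,"
          "PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK OF ART,LANGUAGE,PRODUCT")

for r in [3, 4, 5, 6, 7, 8, 9]:
    print(
        f"prodigy review rater_{r}_single_gold_all "
        f"rater_{r}_single_unprocessed,rater_{r}_single_unprocessed_preds "
        f"--label '{LABELS}' -S -A"
    )
```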
+
+- **Split the rater_{r}_single_gold_all datasets**
+  - Creates new files:
+    - ./data/single/gold/rater_{r}/rater_{r}_single_gold_all.jsonl
+  - Creates new in db:
+    - rater_{r}_single_gold_accepted
+    - rater_{r}_single_gold_ignored
+    - rater_{r}_single_gold_rejected
+```bash
+python src/preprocessing/split_by_answer_rater_3_9_single_gold.py
+```
+
+- **Resolve ignored cases in rater_{r}_single_gold_ignored**
+  - Creates in db:
+    - rater_{r}_single_gold_ignored_resolved
+```bash
+prodigy mark rater_3_single_gold_ignored_resolved dataset:rater_3_single_gold_ignored --view-id review --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK\ OF\ ART,LANGUAGE,PRODUCT
+
+prodigy mark rater_4_single_gold_ignored_resolved dataset:rater_4_single_gold_ignored --view-id review --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK\ OF\ ART,LANGUAGE,PRODUCT
+
+prodigy mark rater_5_single_gold_ignored_resolved dataset:rater_5_single_gold_ignored --view-id review --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK\ OF\ ART,LANGUAGE,PRODUCT
+
+prodigy mark rater_6_single_gold_ignored_resolved dataset:rater_6_single_gold_ignored --view-id review --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK\ OF\ ART,LANGUAGE,PRODUCT
+
+prodigy mark rater_7_single_gold_ignored_resolved dataset:rater_7_single_gold_ignored --view-id review --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK\ OF\ ART,LANGUAGE,PRODUCT
+
+prodigy mark rater_8_single_gold_ignored_resolved dataset:rater_8_single_gold_ignored --view-id review --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK\ OF\ ART,LANGUAGE,PRODUCT
+
+prodigy mark rater_9_single_gold_ignored_resolved dataset:rater_9_single_gold_ignored --view-id review --label PERSON,NORP,FACILITY,ORGANIZATION,LOCATION,EVENT,LAW,DATE,TIME,PERCENT,MONEY,QUANTITY,ORDINAL,CARDINAL,GPE,WORK\ OF\ ART,LANGUAGE,PRODUCT
+```
+
+- **Dump the rater_{r}_single_gold_ignored**
+```bash
+prodigy db-out rater_{r}_single_gold_ignored data/single/gold
+```
+
+- **Merge the rater_{r}_single_gold_ignored and the rater_{r}_single_gold_accepted**
+```bash
+prodigy db-merge rater_{r}_single_gold_accepted,rater_{r}_single_gold_ignored rater_{r}_single_gold
+```
+
+# Only written out steps to here(!)
+# Below, add:
+- Merge all single gold for all raters
+- Add language and product predictions to single gold combined
+- Resolve them
+- Overwrite the resolved cases in single gold combined (see above way of doing it)
+- Merge the single-gold-combined with extra lang+prod into the gold-multi-and-gold-rater-1-single
+- Have it be NER manual instead (see above way of doing it)
+...???
+
+- **Add Language and Product predictions on the gold-multi dataset**
+  - Use tner/roberta-large-ontonotes5
+  - Only adds one (wrong) label, so I'll skip it
+  - Perhaps it makes sense to mention it in methods regardless
+```bash
+# gold-multi-training/datasets/lang_product_predict_gold_multi.py
+```
+
+- **Merge all gold datasets in db**
+
 - **Potentially. Make appropriate changes on gold-standard-multi data based on the assessment between rater and model**
 
 - **Potentially. Re-train model on new gold-standard-multi data**

src/data_assessment/model_and_raters_agreement.ipynb (+31, -45)

@@ -2,7 +2,7 @@
 "cells": [
 {
 "cell_type": "code",
-"execution_count": 56,
+"execution_count": 1,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -15,7 +15,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 51,
+"execution_count": 2,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -24,60 +24,42 @@
 },
 {
 "cell_type": "code",
-"execution_count": 52,
+"execution_count": 4,
 "metadata": {},
 "outputs": [
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"Cases where ents are same between predicted and model: 638\n",
-"Cases where ents are NOT same between preds and model: 888\n"
-]
-}
-],
-"source": [
-"# testing for single rater\n",
-"preds = list(db.from_disk(\"data/single/unprocessed/rater_3/rater_3_preds.spacy\").get_docs(nlp.vocab))\n",
-"annotations = list(db.from_disk(\"data/single/unprocessed/rater_3/train.spacy\").get_docs(nlp.vocab))\n",
-"counter = sum(\n",
-" preds[i].ents == annotations[i].ents for i in range(len(preds))\n",
-")\n",
-"print(f\"Cases where ents are same between predicted and model: {counter}\")\n",
-"print(f\"Cases where ents are NOT same between preds and model: {len(preds) - counter}\")"
-]
-},
-{
-"cell_type": "code",
-"execution_count": 55,
-"metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"For rater 1:\n",
-"Cases where ents are same between predicted and model: 508\n",
-"Cases where ents are NOT same between preds and model: 904\n",
 "For rater 3:\n",
 "Cases where ents are same between predicted and model: 638\n",
-"Cases where ents are NOT same between preds and model: 888\n"
-]
-},
-{
-"ename": "IndexError",
-"evalue": "list index out of range",
-"output_type": "error",
-"traceback": [
-"… (ANSI-escaped IndexError traceback from the failed In[55] run) …"
+"Cases where ents are NOT same between preds and model: 888\n",
+"For rater 4:\n",
+"Cases where ents are same between predicted and model: 1114\n",
+"Cases where ents are NOT same between preds and model: 1363\n",
+"For rater 5:\n",
+"Cases where ents are same between predicted and model: 422\n",
+"Cases where ents are NOT same between preds and model: 980\n",
+"For rater 6:\n",
+"Cases where ents are same between predicted and model: 1046\n",
+"Cases where ents are NOT same between preds and model: 1213\n",
+"For rater 7:\n",
+"Cases where ents are same between predicted and model: 754\n",
+"Cases where ents are NOT same between preds and model: 1148\n",
+"For rater 8:\n",
+"Cases where ents are same between predicted and model: 622\n",
+"Cases where ents are NOT same between preds and model: 1076\n",
+"For rater 9:\n",
+"Cases where ents are same between predicted and model: 906\n",
+"Cases where ents are NOT same between preds and model: 1203\n",
+"Total cases where ents are same 5502\n",
+"Total cases where ents are NOT same 7871\n"
 ]
 }
 ],
 "source": [
+"docs_not_same = 0\n",
+"docs_same = 0\n",
 "for r in raters:\n",
 " preds = list(db.from_disk(f\"data/single/unprocessed/rater_{r}/rater_{r}_preds.spacy\").get_docs(nlp.vocab))\n",
 " annotations = list(db.from_disk(f\"data/single/unprocessed/rater_{r}/train.spacy\").get_docs(nlp.vocab))\n",
@@ -86,7 +68,11 @@
 " )\n",
 " print(f'For rater {r}:')\n",
 " print(f\"Cases where ents are same between predicted and model: {counter}\")\n",
-" print(f\"Cases where ents are NOT same between preds and model: {len(preds) - counter}\")"
+" print(f\"Cases where ents are NOT same between preds and model: {len(preds) - counter}\")\n",
+" docs_not_same += len(preds) - counter\n",
+" docs_same += counter\n",
+"print(f\"Total cases where ents are same {docs_same}\")\n",
+"print(f\"Total cases where ents are NOT same {docs_not_same}\")"
 ]
 },
 {

src/predict_single/predict_rater_2-9.py (+2, -1)

@@ -6,12 +6,12 @@
 
 # Load rater_2-9 data
 db = DocBin()
-db2 = DocBin()
 raters = [3, 4, 5, 6, 7, 8, 9]
 nlp = spacy.blank("da")
 print("Loading model ...")
 nlp2 = spacy.load("da_multi_dupli_rater_1_onto")
 print("Model loaded, predicting on raters")
+
 # For each rater
 for r in [3, 4, 5, 6, 7, 8, 9]:  # raters:
     print(f"Predicting on rater {r} ...")
@@ -20,6 +20,7 @@
     texts = [doc.text for doc in r_docs]
     predicted_docs = [nlp2(text) for text in texts]
     savepath = f"data/single/unprocessed/rater_{r}/rater_{r}_preds.spacy"
+    db2 = DocBin()
     for doc in predicted_docs:
         db2.add(doc)
     db2.to_disk(savepath)
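Moving `db2 = DocBin()` inside the loop fixes an accumulation bug: with a single DocBin created outside the loop, each rater's output file would also contain every earlier rater's docs. The effect, sketched with plain lists standing in for DocBins:

```python
# Sketch of the bug: one shared accumulator across loop iterations means later
# "files" also contain earlier raters' docs. Lists stand in for spaCy DocBins.
docs_by_rater = {3: ["doc3a", "doc3b"], 4: ["doc4a"]}  # toy data

saved_buggy = {}
db2 = []                        # created once, outside the loop (the old code)
for r, docs in docs_by_rater.items():
    db2.extend(docs)
    saved_buggy[r] = list(db2)  # what would be written to rater_{r}_preds.spacy

saved_fixed = {}
for r, docs in docs_by_rater.items():
    db2 = []                    # fresh accumulator per rater (the committed fix)
    db2.extend(docs)
    saved_fixed[r] = list(db2)

print(saved_buggy[4])  # ['doc3a', 'doc3b', 'doc4a'] -- rater 3's docs leak in
print(saved_fixed[4])  # ['doc4a']
```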

src/preprocessing/split_by_answer_rater_1_single_gold.py (+2, -2)

@@ -6,11 +6,11 @@
 srsly.write_jsonl("./data/single/gold/rater_1/gold_all.jsonl", examples)
 
 accepted = [e for e in examples if e["answer"] == "accept"]
-srsly.write_jsonl("./data/single/gold/rater_1/gold_ignored.jsonl", accepted)
+srsly.write_jsonl("./data/single/gold/rater_1/gold_accepted.jsonl", accepted)
 db.add_examples(accepted, ["rater_1_single_gold_accepted"])
 
 ignored = [e for e in examples if e["answer"] == "ignore"]
-srsly.write_jsonl("./data/single/gold/rater_1/gold_accepted.jsonl", ignored)
+srsly.write_jsonl("./data/single/gold/rater_1/gold_ignored.jsonl", ignored)
 db.add_examples(ignored, ["rater_1_single_gold_ignored"])
 
 rejected = [e for e in examples if e["answer"] == "reject"]
src/preprocessing/split_by_answer_rater_3_9_single_gold.py (+47, new file)

@@ -0,0 +1,47 @@
+from prodigy.components.db import connect
+import srsly
+
+db = connect()
+
+raters = [3, 4, 5, 6, 7, 8, 9]
+
+for r in raters:
+    print(f"Splitting rater {r}")
+
+    examples = db.get_dataset(f"rater_{r}_single_gold_all")
+    srsly.write_jsonl(
+        f"./data/single/gold/rater_{r}/rater_{r}_single_gold_all.jsonl", examples
+    )
+    print(
+        f"New file has been created: ./data/single/gold/rater_{r}/rater_{r}_single_gold_all.jsonl"
+    )
+
+    accepted = [e for e in examples if e["answer"] == "accept"]
+    srsly.write_jsonl(
+        f"./data/single/gold/rater_{r}/rater_{r}_single_gold_accepted.jsonl", accepted
+    )
+    db.add_examples(accepted, [f"rater_{r}_single_gold_accepted"])
+    print(
+        f"New file has been created: ./data/single/gold/rater_{r}/rater_{r}_single_gold_accepted.jsonl"
+    )
+    print(f"New dataset has been added to db: rater_{r}_single_gold_accepted")
+
+    ignored = [e for e in examples if e["answer"] == "ignore"]
+    srsly.write_jsonl(
+        f"./data/single/gold/rater_{r}/rater_{r}_single_gold_ignored.jsonl", ignored
+    )
+    db.add_examples(ignored, [f"rater_{r}_single_gold_ignored"])
+    print(
+        f"New file has been created: ./data/single/gold/rater_{r}/rater_{r}_single_gold_ignored.jsonl"
+    )
+    print(f"New dataset has been added to db: rater_{r}_single_gold_ignored")
+
+    rejected = [e for e in examples if e["answer"] == "reject"]
+    srsly.write_jsonl(
+        f"./data/single/gold/rater_{r}/rater_{r}_single_gold_rejected.jsonl", rejected
+    )
+    db.add_examples(rejected, [f"rater_{r}_single_gold_rejected"])
+    print(
+        f"New file has been created: ./data/single/gold/rater_{r}/rater_{r}_single_gold_rejected.jsonl"
+    )
+    print(f"New dataset has been added to db: rater_{r}_single_gold_rejected")

tools/raters_preds_to_db.sh (+7, new file)

@@ -0,0 +1,7 @@
+for i in {1..10}
+do
+    echo "Exporting rater_"$i"_preds to jsonl"
+    python ./src/preprocessing/load_docbin_as_jsonl.py ./data/single/unprocessed/rater_$i/rater_"$i"_preds.spacy blank:da --ner > ./data/single/unprocessed/rater_$i/rater_"$i"_preds.jsonl
+    echo "Importing rater_"$i"_preds to db"
+    prodigy db-in rater_"$i"_single_unprocessed_preds ./data/single/unprocessed/rater_$i/rater_"$i"_preds.jsonl
+done
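The loop depends on `src/preprocessing/load_docbin_as_jsonl.py`, which this commit does not show. A hypothetical stand-in for that conversion step, with the record layout assumed from Prodigy's usual `text`/`spans` convention (not taken from the repo) and plain tuples in place of a `.spacy` DocBin:

```python
# Hypothetical sketch: turn (text, entity spans) pairs into Prodigy-style JSONL
# records. The real script reads a .spacy DocBin; plain tuples stand in here.
import json

def doc_to_record(text, ents):
    # ents: iterable of (start_char, end_char, label) tuples
    return {
        "text": text,
        "spans": [{"start": s, "end": e, "label": l} for s, e, l in ents],
    }

rec = doc_to_record("Anna bor i Aarhus", [(0, 4, "PERSON"), (11, 17, "GPE")])
print(json.dumps(rec))
```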
