# compsumm
Code, datasets, and supplementary appendix for the AAAI paper **Comparative Document Summarisation via Classification**
**Supplementary Appendix**: [pdf](/appendix.pdf)
## How to use this repository?
### Installing
If you have miniconda or anaconda, please use `install.sh` to create a new environment `compsumm` with all dependencies; otherwise, the dependencies are listed in `environment.yml`.
### 1. Dataset
The datasets are in the `dataset` directory in `HDF5` format. There are three files, one for each of the three news topics used in the paper. Each file has the following structure:
```
-- data: averaged GloVe vectors of the title and first 3 sentences, 300-dimensional
-- y: labels created by dividing time ranges into two groups
-- yn: labels created using month for beefban, and week for capital punishment and guncontrol
-- title: title of article
-- text: first three sentences
-- datetime: date of publication

The dataset was split 70-20-10 into train-test-val sets several times.
-- train_idxs: Matrix with each row i containing training indexes of split i.
-- test_idxs: Matrix with each row i containing test indexes of split i.
-- val_idxs: Matrix with each row i containing val indexes of split i.
```
Please see `news.py` for an example of loading this dataset.
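Pending a look at `news.py` itself, here is a minimal loading sketch with `h5py`. The filename, sizes, and random values below are placeholders, not the real data; only the field names follow the structure listed above.

```python
import numpy as np
import h5py

# Build a tiny synthetic file mirroring the layout above
# (the filename "toy_topic.h5" and the sizes are made up).
n, d = 10, 300
rng = np.random.default_rng(0)
with h5py.File("toy_topic.h5", "w") as f:
    f["data"] = rng.normal(size=(n, d)).astype("float32")  # averaged GloVe vectors
    f["y"] = rng.integers(0, 2, size=n)                    # two time-range groups
    # each row i holds the training indexes of split i (70% of n)
    f["train_idxs"] = np.stack([rng.permutation(n)[:7] for _ in range(3)])

# Load one split back out.
with h5py.File("toy_topic.h5", "r") as f:
    X, y = f["data"][:], f["y"][:]
    train0 = f["train_idxs"][0]          # training indexes of split 0
    X_train, y_train = X[train0], y[train0]

print(X_train.shape)  # (7, 300)
```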
### 2. Code
Please see the [demo notebook](/demo.ipynb) for example use of `subm.py` and `grad.py`.
- `subm.py` has utility functions and a greedy optimiser for discrete optimisation.
- `grad.py` has utility functions and an SGD optimiser for continuous optimisation. The SGD optimiser was not used in the end; L-BFGS from SciPy was used instead.
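For illustration only, the greedy pattern behind this kind of discrete optimisation can be sketched as follows; the function name, signature, and toy objective here are assumptions, not the actual API of `subm.py`.

```python
def greedy_select(items, score, k):
    """Greedily grow a summary set: at each step, add the item that most
    improves score(S). Near-optimal when score is monotone submodular."""
    S, remaining = [], list(items)
    for _ in range(k):
        best = max(remaining, key=lambda x: score(S + [x]))
        S.append(best)
        remaining.remove(best)
    return S

# Toy objective: how much of a ground set the chosen subsets cover.
universe = {1, 2, 3, 4, 5}
subsets = {"a": {1, 2}, "b": {2, 3}, "c": {4, 5}, "d": {1}}

def cover(S):
    covered = set().union(*(subsets[s] for s in S)) if S else set()
    return len(covered & universe)

print(greedy_select(subsets, cover, 2))  # ['a', 'c'] — covers 4 of 5 elements
```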
`models.py` has several models for summarisation as classification. The models are abstracted into a `Summ` class, which provides a common pattern for the different summariser methods and simplifies hyperparameter tuning. Please see the [models notebook](/models.ipynb) for a demo of `news.py` and `models.py`.
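The actual `Summ` abstraction is not reproduced here; as a sketch of the pattern it describes (method names and the toy subclass below are assumptions), a summariser-as-classifier interface might look like:

```python
from abc import ABC, abstractmethod
import numpy as np

class Summ(ABC):
    """Common interface: fit on labelled documents, then return the indexes
    of the examples selected as the comparative summary."""
    def __init__(self, k=5):
        self.k = k  # summary size per class

    @abstractmethod
    def fit(self, X, y): ...

    @abstractmethod
    def summarise(self): ...

class MeanProtoSumm(Summ):
    """Toy instance: pick the k points nearest each class mean."""
    def fit(self, X, y):
        self.X, self.y = np.asarray(X), np.asarray(y)
        return self

    def summarise(self):
        picks = []
        for c in np.unique(self.y):
            idx = np.where(self.y == c)[0]
            mu = self.X[idx].mean(axis=0)
            dist = np.linalg.norm(self.X[idx] - mu, axis=1)
            picks.extend(idx[np.argsort(dist)[: self.k]].tolist())
        return picks

X = np.array([[0.0], [1.0], [10.0], [11.0]])
y = np.array([0, 0, 1, 1])
summary = MeanProtoSumm(k=2).fit(X, y).summarise()
print(sorted(summary))  # [0, 1, 2, 3]
```

The payoff of the shared base class is that every summariser exposes the same `fit`/`summarise` calls, so a single tuning loop can sweep hyperparameters across methods.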
`utils.py` has common functions such as `balanced accuracy`, which is used for evaluation.
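Balanced accuracy itself is standard: the unweighted mean of per-class recall, which is robust to class imbalance. A minimal reference implementation (not the repository's code) is:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Class 0 recall is 1.0, class 1 recall is 0.0, so the mean is 0.5 —
# even though plain accuracy would be 0.75 on this imbalanced sample.
print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.5
```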
### 3. Crowd-sourced evaluation results
The crowd-sourced evaluation results are in the file `crowdflower.csv`. The design and settings for this experiment are explained in the paper.
### 4. Citing
If you use this dataset, please cite this work:
```
@inproceedings{bista2019compsumm,
title={Comparative Document Summarisation via Classification},