This repository contains the code and resources for our enhanced multi-document summarization project, building on the foundational work of the GLIMPSE paper [1]. Our extensions introduce several new mechanisms that improve on the original results and boost summarization performance for scholarly reviews.
The Rational Speech Act (RSA) framework is a probabilistic model of communication that interprets human language use as a process of inference [2].
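As a toy illustration of the RSA inference process (this is not code from this repository, and the game is invented for the example), a literal listener first interprets utterances by their truth conditions, and a pragmatic speaker then chooses utterances in proportion to how well they pick out the intended referent:

```python
import numpy as np

# Toy reference game: 2 utterances, 3 referents.
# truth[u, r] = 1 if utterance u is literally true of referent r.
truth = np.array([
    [1.0, 1.0, 0.0],  # "glasses" is true of referents 0 and 1
    [0.0, 1.0, 1.0],  # "hat" is true of referents 1 and 2
])

# Literal listener L0: normalize each utterance's row over referents.
L0 = truth / truth.sum(axis=1, keepdims=True)

# Pragmatic speaker S1: for each referent, choose utterances in
# proportion to L0(referent | utterance) (rationality alpha = 1).
S1 = L0 / L0.sum(axis=0, keepdims=True)
```

Here `S1[:, 0]` puts all speaker mass on "glasses" for referent 0, since "hat" never refers to it; referent 1 is ambiguous and gets a 50/50 split.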
Multi-Document Summarization (MDS) is the process of automatically generating a concise summary from multiple source documents that discuss the same or related topics. Unlike single-document summarization, which condenses information from a single text, MDS must handle redundant, complementary, and sometimes conflicting information from different sources.
- The increase in conference submissions (e.g., ICLR, ACL) has placed a burden on area chairs, who must read multiple reviews to make decisions.
- Existing multi-document summarization approaches focus on consensus but fail to highlight divergent opinions.
- GLIMPSE [1], the framework we build on, addresses this gap: an MDS approach based on RSA scoring mechanisms.
- GLIMPSE generates more informative and concise summaries than existing consensus-based summarization models, thanks to its formulation of summarization as an RSA reference game.
- The process starts by pre-processing a set of scholarly reviews.
- We then generate extractive and abstractive summaries for the selected set of documents.
- For the abstractive case, models such as BART [3] and PEGASUS [4] are used.
In parallel to this branch, a set of candidate models is introduced and the following steps are performed:
- A pool of 10 models is assembled.
- Intensive experimental studies were conducted.
- Evaluations were performed to select the top N (N=3) models for computing the conditional likelihood (text vs. summary).
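The conditional likelihood step reduces to averaging the per-token log-probabilities a model assigns to the summary given the source text and exponentiating. A minimal sketch (the per-token values here are made up; in the pipeline they come from the pretrained seq2seq models):

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a summary conditioned on its source text, from
    per-token natural-log probabilities: exp(-mean log p)."""
    avg = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-avg)

# Hypothetical per-token log-probs produced by one of the top-N models.
logps = [-0.5, -1.2, -0.3, -0.8]
score = perplexity(logps)  # lower = the model finds the summary more likely
```

Lower perplexity means the model considers the summary a more plausible continuation of the source reviews.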
For each of the produced datasets (1 extractive and 3 abstractive) we perform the following steps:
1. Compute the conditional likelihood (perplexity) using the top N models.
2. Aggregate the likelihoods for all k=3 combinations out of the N selected models.
3. Rank results based on evaluation metrics:
   - 3.a BERTScore
   - 3.b ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-LSum
   - 3.c UniEval [6]
   - 3.d SEAHORSE [5]
4. Select the top-performing ensemble combination for RSA scoring.
5. Use the GSpeaker and GUnique mechanisms from the RSA framework to generate the most unique and informative summaries.
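The aggregation and RSA-scoring steps above can be sketched as follows. This is a simplified illustration: the matrix layout and the alternating-normalization loop are assumptions, and the exact GSpeaker/GUnique definitions follow the GLIMPSE paper [1], not this toy code.

```python
from itertools import combinations
import numpy as np

def aggregate_combinations(mats, k=3):
    """Average likelihood matrices over every k-model combination;
    mats maps model name -> likelihood matrix (step 2 above)."""
    return {combo: np.mean([mats[m] for m in combo], axis=0)
            for combo in combinations(sorted(mats), k)}

def rsa_scores(lik, n_iter=2):
    """RSA-style iteration over lik[i, j] = P(summary j | review i):
    alternate speaker (row) and listener (column) normalizations."""
    S = lik / lik.sum(axis=1, keepdims=True)   # speaker: rows over summaries
    for _ in range(n_iter):
        L = S / S.sum(axis=0, keepdims=True)   # listener: columns over reviews
        S = L / L.sum(axis=1, keepdims=True)   # speaker update
    # A summary whose speaker mass concentrates on one review is "unique";
    # mass spread evenly across reviews indicates consensus content.
    uniqueness = S.max(axis=0)
    return S, uniqueness

lik = np.array([[0.8, 0.2, 0.1],
                [0.7, 0.1, 0.3],
                [0.1, 0.6, 0.2]])
S, u = rsa_scores(lik)
```

Summaries can then be ranked by `u` (most unique) or by how evenly their speaker column spreads across reviews (most consensual).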
-
To enrich our work, we extended the architecture with Vectorized-RSA. We first define a set of pairs, where each pair consists of key = evaluation metric and value = summarization model.
For each dataset, we then return a vector of size n, equal to the number of metrics of interest. Each element of this vector represents the alignment of the (text, summary) pair with that metric.
E.g. the first cell measures fluency of the generated summary, while the second cell may focus on its consistency.
The models used for each metric were selected through an extensive set of studies and experiments.
Finally, this vector is fed to the Vector-RSA framework to generate both GUnique and GSpeaker summaries.
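The metric-vector construction can be sketched as below. The metric names and scorer functions are placeholders: in the actual pipeline each metric is backed by the metric-specific evaluation model selected in our experiments.

```python
# Hypothetical per-metric scorers; real ones wrap the selected
# summarization-evaluation models.
def fluency(text, summary):
    return 0.9   # placeholder score in [0, 1]

def consistency(text, summary):
    return 0.7   # placeholder score in [0, 1]

# key = evaluation metric, value = scoring model (as described above)
METRICS = [("fluency", fluency), ("consistency", consistency)]

def metric_vector(text, summary):
    """Vector of size n = number of metrics; element i is the
    alignment of the (text, summary) pair with metric i."""
    return [scorer(text, summary) for _, scorer in METRICS]

vec = metric_vector("review text", "summary text")  # -> [0.9, 0.7]
```

This per-pair vector is what Vector-RSA consumes in place of a single scalar likelihood.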
This directory contains the data required to replicate this work.
data_to_process contains the Top-226 selection of document reviews.
candidates contains the generated extractive and abstractive summaries.
Set of scripts to replicate SOTA techniques.
Example usage:

python sumy_baselines.py \
    --input_folder data/candidates/ \
    --batch_size 32 \
    --device "cuda" \
    --output_folder data/candidates_sumy/

Set of scripts to load the original data, and to crawl required data from open_review.
Example usage:

python data_preprocessing.py

Set of scripts to evaluate a set of summaries.
Example usage:
# Returns the same datasets with an additional column containing the BERTScore
python evaluate_bartbert_metrics.py \
    --input_folder data/candidates/ \
    --output_folder data/candidates/

Set of scripts used for computing likelihood probabilities and RSA-based scores.
Example usage:

python compute_rsa.py \
    --input_folder {input_folder} \
    --output_folder {output_folder} \
    --model "facebook/bart-large-cnn" \
    --device "cuda"

MDS (multi-document summarization) contains a set of folders to generate GSpeaker and GUnique summaries. The directory also contains scripts for aggregating sets of likelihood matrices.
The RSA and Vect-RSA frameworks
We highly recommend following the steps in main_notebook.ipynb, where each step is documented and results can be easily visualized.
We would like to extend our sincere gratitude to Professor Luca Cagliero and Teaching Assistants Lorenzo Vaiani and Giuseppe Gallipoli for their invaluable guidance, support, and insights throughout the course of this project. Their expertise and encouragement have been instrumental in the successful completion of our work.
[1] Maxime Darrin, Ines Arous, Pablo Piantanida, and Jackie Cheung. 2024. GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
[2] Michael C. Frank and Noah D. Goodman. 2012. Predicting Pragmatic Reasoning in Language Games. Science 336, 998. DOI:10.1126/science.1218633
[3] Mike Lewis, et al. 2019. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv preprint arXiv:1910.13461.
[4] Jingqing Zhang, et al. 2020. PEGASUS: Pre-training with Extracted Gap-Sentences for Abstractive Summarization. In International Conference on Machine Learning. PMLR.
[5] Elizabeth Clark, et al. 2023. SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation. arXiv preprint arXiv:2305.13194.
[6] Ming Zhong, et al. 2022. Towards a Unified Multi-Dimensional Evaluator for Text Generation. arXiv preprint arXiv:2210.07197.

