
Domain Specific Academic Dataset - Extractive Text Summarization

This repository contains a collection of URLs that can be used to mine domain-specific academic datasets.

The data were used to train and test the supervised, machine-learning-based summarization approach presented in the paper:

Cagliero L. & La Quatra M., Extracting Highlights of Scientific Articles: a Supervised Summarization Approach, Expert Systems with Applications, 2020, 113659, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2020.113659.

Free full-text access (until August 26, 2020): https://authors.elsevier.com/a/1bMgg3PiGTFJ76

Keywords: Highlight extraction; Extractive summarization; Regression models; Text mining and analytics
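The keywords and the "GB Regressor" row in the example below point to a regression-based sentence scorer. The following is a rough illustration only and is not the authors' code. Each training sentence is labelled with its unigram overlap with the manually written highlights. A linear model is then fitted on a few features, and the top-scoring sentences of a new article are returned. Several parts are assumptions: ridge linear regression stands in for the gradient boosting regressor, and the features (relative position, scaled length, overlap with the abstract), the `lam` value and all function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_overlap(sentence, reference):
    """Share of the sentence's tokens also found in the reference (clipped counts)."""
    sent, ref = Counter(tokenize(sentence)), Counter(tokenize(reference))
    total = sum(sent.values())
    if total == 0:
        return 0.0
    return sum((sent & ref).values()) / total


def features(sentences, abstract):
    """Hypothetical features: bias, relative position, scaled length, overlap with abstract."""
    n = len(sentences)
    return [
        [1.0, i / max(n - 1, 1), len(tokenize(s)) / 50.0, unigram_overlap(s, abstract)]
        for i, s in enumerate(sentences)
    ]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X, y, lam=1e-6):
    """Weights w solving (X^T X + lam * I) w = X^T y (the bias is penalised too)."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def extract_highlights(sentences, abstract, weights, k=3):
    """Score sentences with the linear model; return the k best in document order."""
    scores = [sum(w * x for w, x in zip(weights, row)) for row in features(sentences, abstract)]
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

Moving closer to the paper's setup would mainly mean replacing `fit_ridge` and the scoring line with a gradient boosting model, for example scikit-learn's `GradientBoostingRegressor`. The labelling and top-k selection steps would stay the same.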

This repository is intended for research purposes only. If you use this collection in your research work, please cite:

@article{CAGLIERO2020113659,
title = "Extracting highlights of scientific articles: A supervised summarization approach",
journal = "Expert Systems with Applications",
volume = "160",
pages = "113659",
year = "2020",
issn = "0957-4174",
doi = "10.1016/j.eswa.2020.113659",
url = "http://www.sciencedirect.com/science/article/pii/S0957417420304838",
author = "Luca Cagliero and Moreno {La Quatra}",
keywords = "Highlight extraction, Extractive summarization, Regression models, Text mining and analytics"
}

Organization

The source files are organized as follows:

  • urls_test_ai.txt contains the URLs of the publications in the AIPubSumm test set.
  • urls_train_ai.txt contains the URLs of the publications in the AIPubSumm training set.
  • urls_test_bio.txt contains the URLs of the publications in the BioPubSumm test set.
  • urls_train_bio.txt contains the URLs of the publications in the BioPubSumm training set.
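The four lists can be loaded into a single dictionary keyed by collection and split. The minimal sketch below does this and adds a sanity check that no URL is shared between the train and test split of the same collection. It assumes one URL per line and a local checkout of the repository. The helper names are hypothetical.

```python
from pathlib import Path

# File names from the Organization section; keys are (collection, split).
URL_FILES = {
    ("AIPubSumm", "train"): "urls_train_ai.txt",
    ("AIPubSumm", "test"): "urls_test_ai.txt",
    ("BioPubSumm", "train"): "urls_train_bio.txt",
    ("BioPubSumm", "test"): "urls_test_bio.txt",
}


def parse_url_list(lines):
    """Strip whitespace and drop blank lines; assumes one URL per line."""
    return [line.strip() for line in lines if line.strip()]


def load_splits(root="."):
    """Read all four URL lists from a local checkout (the root path is an assumption)."""
    splits = {}
    for key, name in URL_FILES.items():
        with open(Path(root) / name, encoding="utf-8") as fh:
            splits[key] = parse_url_list(fh)
    return splits


def check_disjoint(splits):
    """Raise ValueError if a URL occurs in both train and test of the same collection."""
    for collection in sorted({c for c, _ in splits}):
        train = set(splits.get((collection, "train"), []))
        test = set(splits.get((collection, "test"), []))
        shared = train & test
        if shared:
            raise ValueError(f"{collection}: {len(shared)} URL(s) in both train and test")
```

For example, `check_disjoint(load_splits("path/to/checkout"))` returns silently when the splits are disjoint.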

This data format relies on specific scripts borrowed from the scientific-paper-summarisation repository, which accompanies the paper: Ed Collins, Isabelle Augenstein, Sebastian Riedel. A Supervised Approach to Extractive Summarisation of Scientific Papers. To appear in Proceedings of CoNLL, July 2017.
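Because the lists contain URLs rather than article text, the articles must be fetched before any processing. The minimal download sketch below assumes the URLs are plain HTTP(S) links to article pages. It caches each page once and pauses between requests. The cache layout, the `.html` suffix, the one-second delay and the User-Agent string are hypothetical choices. Parsing the downloaded pages is left to the scientific-paper-summarisation scripts and is not reproduced here.

```python
import hashlib
import time
import urllib.request
from pathlib import Path


def cache_path(url, cache_dir):
    """Deterministic file name for a URL: SHA-1 of the URL plus a hypothetical .html suffix."""
    return Path(cache_dir) / (hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html")


def fetch_all(urls, cache_dir="cache", delay=1.0):
    """Download each URL once, skipping already-cached files and pausing between requests."""
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = cache_path(url, cache_dir)
        if target.exists():
            continue  # already downloaded in an earlier run
        req = urllib.request.Request(url, headers={"User-Agent": "research-dataset-builder"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            target.write_bytes(resp.read())
        time.sleep(delay)  # be polite to the publisher's server
```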

Qualitative Example

Abstract: This study deals with a first step towards context adaptive functionality of a Driver Information System. Driving a car is a complex task for which the driver needs appropriate information to fulfil his or her goals. New technologies enable adaptability to driver state, task, personality etcetera and also to the context. The aim of this study was therefore to investigate what information people perceive that they need and want from the car in different contexts and to what extent there is consensus about the function. A new methodology was developed, and 33 private car drivers were interviewed and asked to rate a number of possible abstract functions in a car in different contexts. It was shown that people need and want different types of information in different contexts. It was furthermore indicated that there is sometimes a difference in drivers’ opinions about what should be presented by the car and that there is varying consensus over different functions in different contexts. The rating result was illustrated by an easily perceived Context Function Matrix. The results may be used in the design of a context adaptive driver information system.

Manually annotated highlights: As guideline for design of context adaptive driver information systems or for optimization of display space. As a weight when evaluating future adaptive information systems. When deciding whether a function should be activated automatically or manually

GB Regressor (ours): The study resulted in a context function matrix and a zoom metaphor useful for future context adaptive driver information. Not surprisingly, the results indicate that drivers want or need different functions in different contexts. The results can be used as a guideline for design of context adaptive driver information systems or for optimization of display space

RF Classifier (best ranking method): The purpose of this study was therefore to investigate: whether drivers want or have a perceived need for different functions in different contexts (Q1), what information different drivers perceive to be needed and wanted in different contexts (Q2), the extent to which there is consensus about each function (Q3) And to illustrate and make understandable the functions in the different contexts (Q4). The first research question was whether drivers want or have a perceived need for different functions in different contexts (Q1), The interviews, function grading and open end answers in the study gave an indication that drivers have different perceived needs and desires in different driving contexts. For instance, a tired driver and an alert driver, a daily trip to work and a holiday trip, a worn car and a new car and drivers with long and short response times may need different information.

CoreRank: There was a high consensus about the lowest grades: show engine coolant temperature, show oil level in engine, show engine oil temperature, measure time, show travel distance in total and show engine oil pressure. Lap time, show cruise control set speed, ability to watch movie,show when it is permitted to take over, show engine oil temperature and show average speed were given the lowest scores but had a high consensus. Before driving, functions of a more strategic character are graded high: warn for slippery road conditions, show outdoor temperature, warn for slippery road conditions on the way to the destination, show fuel level, show distance to empty tank, show alternative roads to the destination, show information about dangerous roads, show estimated time of arrival, show that there are queues on the way to the destination, show recommended speed due to road conditions, visibility and show tire pressure in the different tires. Ability to surf on the Internet, show free parking places, show engine oil temperature, show engine oil pressure, lap time, show start time for parking heater and remind that the car needs regular service received the lowest grades.
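A simplified ROUGE-1 score can be used to compare each system output above with the manually annotated highlights. The sketch below computes clipped unigram overlap and reports precision, recall and F1. It is an approximation, not the official ROUGE toolkit: it lower-cases the text, keeps alphanumeric tokens, and applies no stemming or stopword removal. Whether the paper's evaluation used these settings is not stated here.

```python
import re
from collections import Counter


def tokens(text):
    """Lower-case alphanumeric tokens (no stemming, no stopword removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(candidate, reference):
    """Simplified ROUGE-1 from clipped unigram overlap: returns (precision, recall, F1)."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())  # & keeps the minimum count per token
    if overlap == 0:
        return 0.0, 0.0, 0.0
    p = overlap / sum(cand.values())
    r = overlap / sum(ref.values())
    return p, r, 2 * p * r / (p + r)
```

For instance, passing the GB Regressor text as `candidate` and the manually annotated highlights as `reference` scores that row against the human highlights.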
