This tool retrieves Jira issue tracking data from the Apache Jira server.
It is based on the JIDownloader tool (available here) and is part of the paper "Semantically-enriched Jira Issue Tracking Data", which has been submitted to the IEEE/ACM 20th International Conference on Mining Software Repositories.
The dataset is available at this link.
The Python requirements are listed in the file requirements.txt and can be installed using the command
pip install -r requirements.txt
To run this tool, you must have a Jira account on the Apache Jira server.
Furthermore, you must set up a MongoDB instance, including its users (see the file instructions.md in this repo). These details must be set in the file properties.py.
To run the tool, you must first correctly assign the properties in the file properties.py.
After that, the tool can be executed by running
python jidownloader.py [jira_project_name_or_list_of_names]
where jira_project_name_or_list_of_names must be replaced by one of the following:
- a Jira project name (e.g. MyJiraProject)
- a list of Jira project names, given as a text file where each line is a Jira project name

If a project already exists, its data are updated (see the update_existing_projects parameter below).
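This README does not show how jidownloader.py tells the two argument forms apart. As a rough illustration only, the hypothetical helper parse_project_argument below treats the argument as a path when a file with that name exists (reading one project name per non-empty line) and as a single project name otherwise; this is an assumption, not the tool's actual code.

```python
from __future__ import annotations

from pathlib import Path


def parse_project_argument(arg: str) -> list[str]:
    """Hypothetical helper: turn the command-line argument into project names.

    If `arg` is the path of an existing text file, each non-empty line is
    treated as one Jira project name; otherwise `arg` itself is taken as a
    single project name.
    """
    path = Path(arg)
    if path.is_file():
        lines = path.read_text(encoding="utf-8").splitlines()
        # Strip surrounding whitespace and skip blank lines.
        return [line.strip() for line in lines if line.strip()]
    return [arg]
```

For example, a (hypothetical) file projects.txt containing one project name per line would be passed as python jidownloader.py projects.txt, while python jidownloader.py MyJiraProject downloads a single project.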
The main parameters are the following:
- JiraAPI: the API URL of the Jira installation; keep this as https://issues.apache.org/jira/rest/api/2/ for the Apache Jira installation
- JiraCredentials: the username and password of your Jira account, provided as a tuple, e.g. ('myusername', 'mypassword')
- JiraWaitTimeInSeconds: the time the tool waits between consecutive requests
- update_existing_projects: controls whether existing (already downloaded) projects are updated or skipped
- verbose: controls the messages in the standard output (0 for no messages, 1 for simple messages, and 2 for progress bars)
- always_write_to_disk: controls whether the project data are written (to the database, or to disk for debugging purposes) as they are downloaded, or only after they have been fully downloaded
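As a sketch of how these parameters might look in properties.py: the names come from the list above, while every value (including the one-second wait and the use of booleans for the two flags) is a placeholder assumption to be adapted to your setup.

```python
# properties.py: Jira-related settings (sketch; values are placeholders).

# API URL of the Jira installation; keep this value for the Apache Jira server.
JiraAPI = "https://issues.apache.org/jira/rest/api/2/"

# Username and password of your Jira account, given as a tuple.
JiraCredentials = ("myusername", "mypassword")

# Seconds to wait between consecutive requests (assumed value).
JiraWaitTimeInSeconds = 1

# Assumed boolean: True updates already-downloaded projects, False skips them.
update_existing_projects = True

# 0: no messages, 1: simple messages, 2: progress bars.
verbose = 1

# Assumed boolean: True writes data while downloading, False only after a
# project has been fully downloaded.
always_write_to_disk = False
```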
The tool supports two storage options: disk storage and MongoDB. MongoDB storage is the default and the supported option; disk storage exists only for debugging purposes. If disk storage is preferred, you must set the use_database parameter of the properties file to "disk" and the dataFolderPath parameter to the path where the data will be downloaded.
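A minimal sketch of the corresponding entries in properties.py for disk mode; "disk" is the value given above, while the folder path is only a placeholder.

```python
# properties.py: storage settings for disk mode (debugging only).
use_database = "disk"

# Placeholder: folder where the downloaded project data will be stored.
dataFolderPath = "data"
```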
For database storage, you have to download and set up MongoDB and then set the use_database parameter to "mongo". The database_host_and_port parameter must also be set and must include the credentials, the hostname, and the port of the database. See the file instructions.md in this repo for setting up the MongoDB instance. Finally, the num_bulk_operations parameter controls the number of operations sent to the database as a single bulk operation (an optimization parameter).
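The sketch below shows possible MongoDB settings together with two illustrative helpers. It assumes that database_host_and_port takes the form of a MongoDB connection string (mongodb://user:password@host:port/); check instructions.md for the exact value the tool expects. The hypothetical check_host_and_port function verifies that credentials, hostname, and port are all present, and the hypothetical batches function shows what grouping operations into bulks of num_bulk_operations means. Neither helper is part of the tool.

```python
from urllib.parse import urlsplit

# properties.py: storage settings for MongoDB mode (the default).
use_database = "mongo"
# Assumed connection-string format; placeholder credentials, host and port.
database_host_and_port = "mongodb://myuser:mypassword@localhost:27017/"
num_bulk_operations = 1000  # placeholder value


def check_host_and_port(uri: str) -> None:
    """Hypothetical check: raise ValueError unless uri has credentials, host, port."""
    parts = urlsplit(uri)
    if not (parts.username and parts.password):
        raise ValueError("missing credentials")
    if not parts.hostname:
        raise ValueError("missing hostname")
    # .port itself raises ValueError if the port is not a valid number.
    if parts.port is None:
        raise ValueError("missing port")


def batches(operations, size):
    """Yield successive lists of at most `size` operations (one bulk each)."""
    batch = []
    for op in operations:
        batch.append(op)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch
```

A larger num_bulk_operations means fewer round trips to the database at the cost of larger individual requests.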
If you use this tool or the corresponding dataset in your work, you can cite it using the following BibTeX entry:
@inproceedings{SemanticJiraDatasetMSR2023,
author = {Themistoklis Diamantopoulos and Dimitrios-Nikitas Nastos and Andreas Symeonidis},
title = {Semantically-enriched Jira Issue Tracking Data},
booktitle = {IEEE/ACM 20th International Conference on Mining Software Repositories},
year = {2023},
pages = {218-222},
address = {Melbourne, Australia},
doi = {10.1109/MSR59073.2023.00039}
}