This directory holds small data sets and insights found while exploring the data, reading papers, or conducting other research.
Security-related commits. The data set also includes reverted security commits, and hence commits leading to the security problem.
Commits dealing with performance issues (e.g. running time, memory).
A dataset of commits labeled by various language models. By comparing the different models' predictions on these commits, one can use active learning to identify informative commits and further improve the models.
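The idea of using model disagreement to pick informative commits can be sketched as query-by-committee with vote entropy, one common disagreement measure; the repository may use a different one. This is a minimal sketch, assuming the labels are available as a mapping from commit id to a list of labels (one per model). The function names, commit ids, and label values below are hypothetical and are not taken from the dataset.

```python
from collections import Counter
import math


def vote_entropy(labels):
    """Entropy (in bits) of the label distribution across models; 0 means full agreement."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def most_informative(predictions, k):
    """Return the k commit ids on which the models disagree the most.

    predictions: hypothetical mapping of commit id -> list of labels, one per model.
    """
    ranked = sorted(
        predictions,
        key=lambda cid: vote_entropy(predictions[cid]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical example: three models labeling three commits.
predictions = {
    "a1b2c3": ["bug", "bug", "bug"],            # full agreement
    "d4e5f6": ["bug", "feature", "refactor"],   # maximal disagreement
    "0718ab": ["bug", "bug", "feature"],        # partial disagreement
}
print(most_informative(predictions, 2))
```

The commits ranked highest are the natural candidates for manual labeling, since the models' disagreement suggests they are hard cases; adding their labels to the training data is what lets active learning further improve the models.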
A data set of 110k commits related to regular expressions.
This repository contains the database construction on top of the BigQuery GitHub schema. It was constructed as part of the
supplementary materials of the paper "The Corrective Commit Probability Code Quality Metric" by Idan Amit and Dror G. Feitelson.
Please cite as
@misc{amit2020corrective,
title={The Corrective Commit Probability Code Quality Metric},
author={Idan Amit and Dror G. Feitelson},
year={2020},
eprint={2007.10912},
archivePrefix={arXiv},
primaryClass={cs.SE}
}
It was later extended as part of the
supplementary materials of the paper "Follow Your Nose -- Which Code Smells are Worth Chasing?" by Idan Amit, Nili Ben Ezra, and Dror G. Feitelson.
Please cite as
@misc{amit2021follow,
title={Follow Your Nose -- Which Code Smells are Worth Chasing?},
author={Idan Amit and Nili Ben Ezra and Dror G. Feitelson},
year={2021},
eprint={2103.01861},
archivePrefix={arXiv},
primaryClass={cs.SE}
}
See here for the linguistic commit classification.
See here for the analysis utilities.
See here for the general database construction.
The research used the model as of commit d15d54e. The repository will keep advancing.
The live version is updated at https://github.com/evidencebp/sweets.