adhiraj141092/RNAinsecta

RNAinsecta: A tool for prediction of pre-microRNA in insects using machine learning algorithms.

Pre-microRNAs are the hairpin loops that produce microRNAs, which negatively regulate gene expression in several organisms. In insects, microRNAs participate in several biological processes, including metamorphosis, reproduction and immune response. In this work, we trained machine learning classifiers (Random Forest, Support Vector Machine, Logistic Regression and k-Nearest Neighbours) to predict pre-microRNA hairpin loops in insects, using the Synthetic Minority Over-sampling Technique and Near-Miss to handle the class imbalance. On our validation dataset, the Support Vector Machine model achieved an accuracy of 92.19%, while the Random Forest model attained an accuracy of 82.4%. These models are hosted online as a web application called RNAinsecta. RNAinsecta also provides target searching for predicted pre-microRNAs in the insect model organism Drosophila melanogaster, using miRanda at the backend; the target sites are experimentally validated microRNA-regulated genes collected from miRTarBase. RNAinsecta is currently hosted at https://rnainsecta.in.
This repository contains the source code for hosting the web server and for testing the machine learning models to replicate the results.

Prerequisites:

  • ViennaRNA Package
  • Python3
  • Virtual Environment:
  • python3 -m pip install --user virtualenv
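Before installing, it can help to confirm that the prerequisites are visible from the shell. The following is a minimal sketch, not part of the repository: the helper name `check_prerequisites` is hypothetical, and it assumes that the ViennaRNA program of interest is `RNAfold`, which this README does not state explicitly.

```python
import shutil
import sys


def check_prerequisites(tools=("RNAfold",), min_python=(3, 0)):
    """Return a list of problems found; an empty list means all checks passed.

    `tools` defaults to RNAfold as an assumption about which ViennaRNA
    executable is needed; adjust it to whatever the scripts actually call.
    """
    problems = []
    # Check the running interpreter against the minimum Python version.
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or newer is required")
    # Check that each external executable can be found on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("Missing prerequisite:", issue)
    if not issues:
        print("All prerequisite checks passed.")
```

An empty result only shows that the executables are discoverable; it does not verify their versions.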

    Installation:

  • Create a virtual environment with Python 3:
  • python3 -m venv my_project_env
  • Activate Virtual Environment:
  • source my_project_env/bin/activate
  • Install the required packages using requirements.txt:
  • pip install -r requirements.txt

    Testing:
    For testing, the sequences along with their secondary structures are provided in the Dataset directory. The test dataset consists of true insect pre-microRNAs (pos.fold) and pseudo insect pre-microRNAs (neg.fold), which are hairpin loops found in insects that closely resemble true pre-microRNAs. Each prediction for a true pre-microRNA gives either a True Positive (TP) or a False Negative (FN); likewise, each prediction for a pseudo pre-microRNA gives either a True Negative (TN) or a False Positive (FP). From these four counts, the accuracy, sensitivity, specificity, MCC and F1 score are calculated.
    To test the model, enter:

    python3 testing.py

    Results are stored in the newly created results directory.
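The metrics named in this section follow from the four confusion-matrix counts by their standard definitions. The sketch below illustrates those definitions only; it is not taken from testing.py, the function name `classification_metrics` is hypothetical, and the counts in the usage example are made up for illustration.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from TP, FP, TN, FN.

    Any ratio with a zero denominator is reported as 0.0.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    # Denominator of the Matthews correlation coefficient (MCC).
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    # Illustrative counts only, not results from the RNAinsecta models.
    metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Sensitivity is computed over the true pre-microRNAs (pos.fold) and specificity over the pseudo pre-microRNAs (neg.fold), so the two together show how the model behaves on each class separately.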

    Deploying the Web-Server
    To run the web-server locally, execute the following commands:

    cd web
    python3 app.py



    Acknowledgements:
    RNAinsecta uses:

  • Perl scripts developed by Kwang Loong Stanley Ng and Santosh K. Mishra for feature calculation.

    1. Kwang Loong Stanley Ng, Santosh K. Mishra, De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures, Bioinformatics, Volume 23, Issue 11, June 2007, Pages 1321–1330, https://doi.org/10.1093/bioinformatics/btm026


  • miRanda: A microRNA target searching tool.

    1. John,B., Enright,A.J., Aravin,A., Tuschl,T., Sander,C. and Marks,D.S. (2005) Correction: Human MicroRNA Targets. PLoS Biol., 3, e264.


    Pre-print

    RNAinsecta: A tool for prediction of pre-microRNA in insects using machine learning algorithms.

    Adhiraj Nath, Utpal Bora* bioRxiv 2022.03.31.486617; doi: https://doi.org/10.1101/2022.03.31.486617

    The web-server is created and maintained under GNU/GPL v3 license by:
    Adhiraj Nath
    IIT Guwahati
    PhD candidate
    email: [email protected]
    Mobile: +91 87230 13467
