    Repositories list

    • datatrove

      Public
      Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
      Python
      194000 Updated Jul 22, 2025
    • Toolkit to benchmark various speech recognition APIs (NeMo, Whisper...) and visualize the results
      Jupyter Notebook
      0200 Updated Jul 17, 2025
    • ssak

      Public
      SSAK contains helpers and tools to process data and train/infer ASR models.
      Python
      0500 Updated Jun 27, 2025
    • Python
      0000 Updated Jun 20, 2025
    • Data and code associated with the LREC 2024 paper
      Jupyter Notebook
      0400 Updated Jun 10, 2025
    • NeMo

      Public
      A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
      Python
      3k000 Updated Apr 16, 2025
    • Python
      0000 Updated Apr 8, 2025
    • Python
      0600 Updated Mar 12, 2025
    • Shell
      0600 Updated Sep 26, 2024
    • Robust Speech Recognition via Large-Scale Weak Supervision
      Python
      10k102 Updated Jul 22, 2024
    • FREDSum

      Public
      Corpus of political debates: transcriptions and summaries
      11100 Updated May 15, 2024
    • Whisper realtime streaming for long speech-to-text transcription and translation
      Python
      378200 Updated Apr 12, 2024
    • Jupyter Notebook
      613000 Updated Dec 15, 2023
    • This database contains panoramic images of the work meetings at LINAGORA.
      1100 Updated Nov 6, 2021