    Repositories list

    • Jupyter Notebook
      0000 Updated Jun 4, 2025
    • Code for FinerWeb-10BT – tools for cleaning web data line by line using LLMs
      Python
      MIT License
      2000 Updated Jun 4, 2025
    • Jupyter Notebook
      0000 Updated Jun 4, 2025
    • TCBLex (Public)
      Jupyter Notebook
      0000 Updated Jun 3, 2025
    • Jupyter Notebook
      0000 Updated Jun 3, 2025
    • Handwritten text recognition pipeline for table data
      Jupyter Notebook
      Apache License 2.0
      0000 Updated Jun 2, 2025
    • Python
      0000 Updated Jun 2, 2025
    • Python
      Other
      1400 Updated May 30, 2025
    • Python
      1100 Updated May 15, 2025
    • Introduction to Natural Language Processing
      Jupyter Notebook
      Other
      26500 Updated May 14, 2025
    • 0300 Updated May 9, 2025
    • Code for the paper "Analyzing register variation in web texts through automatic segmentation"
      Python
      Apache License 2.0
      0000 Updated May 2, 2025
    • HTML
      0460 Updated Apr 30, 2025
    • A Jekyll version of the "Editorial" theme by HTML5 UP.
      JavaScript
      Other
      155300 Updated Apr 15, 2025
    • Turku NLP list of publications
      TeX
      2000 Updated Apr 15, 2025
    • 0000 Updated Apr 2, 2025
    • Handwritten text recognition annotations
      Python
      0000 Updated Mar 25, 2025
    • Jupyter Notebook
      Apache License 2.0
      0400 Updated Mar 8, 2025
    • 0600 Updated Mar 5, 2025
    • 0000 Updated Jan 31, 2025
    • 0000 Updated Jan 30, 2025
    • FinCORE (Public)
      Finnish Corpus of Online REgisters
      Python
      0200 Updated Jan 29, 2025
    • Stuff for the Text Mining course
      Jupyter Notebook
      92800 Updated Jan 28, 2025
    • Code for the large LUMI run of ECCO OCR correction
      Python
      Apache License 2.0
      0000 Updated Jan 16, 2025
    • Clusters with keywords grouped based on their word embeddings
      0000 Updated Jan 14, 2025
    • Code to try out OCR post-correction with language models
      Jupyter Notebook
      0210 Updated Dec 16, 2024
    • Jupyter Notebook
      4400 Updated Dec 9, 2024
    • Python
      0000 Updated Nov 28, 2024
    • Different vLLM setups on different machines
      Python
      0000 Updated Nov 15, 2024
    • Materials for the University of Turku course TKO_8965 Deep Learning in Human Language Technology (previously named TKO_2101 Natural Language Processing)
      Jupyter Notebook
      Other
      112000 Updated Oct 15, 2024