The application automates the analysis of scientific and educational documents, such as research papers, using large language models (LLMs), reducing the time and intellectual effort required of teachers. The result is two LLMs, trained on specially collected data: one summarizes the text of a large document, and the other determines whether the stated goals and objectives of the work have been achieved.
- Algorithms for summarizing large texts and evaluating whether stated goals and objectives have been achieved;
- Fast APP, an application for interacting with the trained models;
- Separate models trained for summarization and for assessing the achievement of goals and objectives;
- Datasets prepared for model training, one for summarization and one for goals and objectives.
Please help us improve this project: share your feedback by opening an issue!
git clone https://github.com/LISA-ITMO/Edulytica.git
source ~/PyProject/Edulytica/api_venv/bin/activate
pip install -r requirements.txt
python3 src/edulytica_api/app.py
celery -A src.edulytica_api.celery.tasks worker --loglevel=info -E -P gevent
npm start
celery -A src.edulytica_api.celery.tasks flower
First, you can review examples, in JSON format, of the system's responses to a test sample of works.
Once the service is running, you can submit your own documents and review the results of their verification!
Detailed documentation is available at the links below:
- algorithms - part of the text-analysis task: measures how much an AI-written source text must be changed before AI-detection systems no longer recognize it as AI-generated;
- data_handling - an auxiliary module that stores parsers of data and documents for generating datasets;
- edulytica_api - this module stores the source code of the web service;
- extracting_rules - this module is devoted to an experiment on extracting design rules using an LLM;
- rag - a package for an experiment with semantic search that uses kNN and the mBERT model.
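To illustrate the idea behind the rag experiment, here is a minimal sketch of kNN semantic search over embedding vectors using cosine similarity. It is not the project's code: the function names `cosine_similarity` and `knn_search` are hypothetical, and the small hand-written vectors stand in for embeddings that a model such as mBERT would produce.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_search(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (index, similarity) for the k corpus vectors closest to the query."""
    scored = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors (assumption): real embeddings would come from an encoder model.
corpus_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
query_embedding = [1.0, 0.05, 0.0]

top = knn_search(query_embedding, corpus_embeddings, k=2)
for index, score in top:
    print(f"document {index}: similarity {score:.3f}")
```

A brute-force scan like this is enough for small collections; larger corpora would typically need an approximate nearest-neighbor index.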
Code documentation is available at the link.
For more information, see the file requirements.txt.
Our contacts:
- Martsinkevich Viacheslav, [email protected];
- Tereshchenko Vladislav, [email protected];
- Aminov Natig, [email protected].
- XIII Congress of Young Scientists, ITMO University:
- Dvornikov A.S., Strizhov D.A., Untila A.A., Fedorov D.A. Study of generated text for recognition of changes by artificial intelligence identification services - 2024;
- Mischenko M.Yu., Mustafin D.E., Untila A.A. Assessing the relevance of unstructured data for analysis and fine-tuning of LLMs - 2024;
- Marakulin A.A., Dedkova A.V., Aminov N.S., Fedorov D.A. Comparative analysis of PEFT methods for fine-tuning large language models - 2024;
- Bogdanov M.A., Nikiforov M.A., Aminov N.S., Tereshchenko V.V., Fedorov D.A. Analysis of large documents using large language models - 2024;
- 53rd Conference of the Academic Teaching Staff (PPS):
- Mustafin D.E., Krylov M.M., Tereshchenko V.V. Storage of heterogeneous data for subsequent processing - 2023;
- Bogdanov M.A., Tereshchenko V.V., Aminov N.S. Preliminary analysis of educational process documents for subsequent topic modeling - 2023;
- Sinyukov L.V., Laptev E.I., Tereshchenko V.V. Assessing the influence of academic disciplines on coursework results using topic modeling - 2023;
- Dvornikov A.S., Strizhov D.A., Aminov N.S. Development of an LLM text classification model for automatically identifying documents written by artificial intelligence - 2023.
Tereshchenko Vladislav
Martsinkevich Viacheslav
Aminov Natig
Mischenko Maxim
Bogdanov Maxim
Dvornikov Artem
Laptev Egor
Sinyukov Lev
Marakulin Andrew