In-vehicle AI-driven Digital Voice Assistant

Mobina Moeini ¹, Rouhollah Ahmadian ¹, Mehdi Ghatee ¹

¹Amirkabir University of Technology

Project Overview

This repository hosts the source code and resources for an in-car voice command recognition system. The system combines natural language processing (NLP) and machine learning (ML) techniques to recognize and classify car-related commands from audio recordings.
It distinguishes between various predefined commands, such as opening doors, windows, or Bluetooth connections, controlling the steering wheel, and more.

Dataset

This project uses the in-vehicle commands from this dataset.

Key Features

Speech Recognition: The system captures audio input from a microphone, which suits real-world scenarios where voice commands must be recognized immediately. The Vosk deep learning model transcribes the spoken input into text for further processing within the system.
Command Classification: Using machine learning models, the system classifies input text into predefined categories, enabling precise command recognition.
Unknown Command Handling: The system assigns unknown or ambiguous commands a dedicated "Unknown" label, so unrecognized input is not forced into one of the predefined categories.
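
One common way to produce an "Unknown" label is to reject predictions whose confidence falls below a threshold. The sketch below illustrates that idea only. The function name `label_or_unknown`, the example scores, and the 0.5 threshold are hypothetical and are not taken from this repository, which may handle unknown commands differently.

```python
from typing import Mapping

UNKNOWN_LABEL = "Unknown"


def label_or_unknown(scores: Mapping[str, float], threshold: float = 0.5) -> str:
    """Return the highest-scoring label, or "Unknown" if no score reaches the threshold.

    `scores` maps each candidate command label to a classifier confidence.
    Both the scores and the default threshold are illustrative assumptions.
    """
    if not scores:
        # Nothing to choose from, so the input cannot be matched to a command.
        return UNKNOWN_LABEL
    best_label = max(scores, key=scores.get)
    if scores[best_label] < threshold:
        # The classifier is not confident enough in any predefined command.
        return UNKNOWN_LABEL
    return best_label


if __name__ == "__main__":
    # Hypothetical confidence scores for two made-up command labels.
    print(label_or_unknown({"open_window": 0.91, "open_door": 0.06}))
    print(label_or_unknown({"open_window": 0.34, "open_door": 0.33}))
```

The threshold trades false rejections against false acceptances, so in practice it would be tuned on held-out data rather than fixed in advance.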

Methodology

Project Components

Speech Recognition: We use the open-source Vosk model to transcribe spoken user input into text. Vosk offers offline operation, high accuracy, multilingual support, and easy customization.
Data Preprocessing: Preprocessing includes lowercasing, tokenization, lemmatization, and the removal of numbers, contractions, empty strings, and stop words. These steps standardize the data format, filter noise, reduce sparsity, and remove irrelevant information that could hinder model performance. The uniformly preprocessed text serves as input for model training and evaluation in the classification phase.
Data Augmentation: To address our dataset's limitations, we employ augmentation techniques like random swapping and synonym replacement. These methods expand the training set and mitigate overfitting risks.
Model Training: Several machine learning models (MLP, Logistic Regression, Random Forest, Gradient Boosting, and SVM) are trained on our dataset and compared on their classification results. The Support Vector Machine (SVM) performed best and was selected as the final model, trained on the preprocessed data to learn the patterns associated with each command.
Real-Time Recognition: The system captures and processes audio in real time, predicting the spoken command using the trained models.
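
The preprocessing and augmentation steps above can be sketched with the Python standard library alone. This is a simplified illustration, not the code from the notebook: the stop-word list, the function names `preprocess` and `random_swap`, and the example sentence are assumptions, and lemmatization and synonym replacement are left out because they need a lexical resource such as WordNet.

```python
import random
import re
from typing import List, Optional

# Tiny illustrative stop-word list (an assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "please", "to", "of"}

# Matches a run of letters, optionally followed by an apostrophe suffix (e.g. "don't").
TOKEN_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def preprocess(text: str) -> List[str]:
    """Lowercase and tokenize `text`, then drop contractions and stop words.

    Numbers and empty strings never appear because the pattern only matches letters.
    """
    tokens = TOKEN_PATTERN.findall(text.lower())
    return [tok for tok in tokens if "'" not in tok and tok not in STOP_WORDS]


def random_swap(tokens: List[str], n_swaps: int = 1, seed: Optional[int] = None) -> List[str]:
    """Return a copy of `tokens` in which `n_swaps` randomly chosen pairs are exchanged."""
    rng = random.Random(seed)
    augmented = list(tokens)  # copy, so the caller's list is left untouched
    if len(augmented) < 2:
        return augmented
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(augmented)), 2)  # two distinct positions
        augmented[i], augmented[j] = augmented[j], augmented[i]
    return augmented


if __name__ == "__main__":
    tokens = preprocess("Please open the Window 2 now, don't wait")
    print(tokens)
    print(random_swap(tokens, n_swaps=1, seed=0))
```

Each augmented sentence keeps the label of the sentence it was derived from, which is how random swapping enlarges the training set without new recordings.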

Getting Started

Clone this repository to your local machine.
Install the required dependencies listed in requirements.txt (for example, with pip install -r requirements.txt).
Run the system from the Driver_Assistant.ipynb notebook.

Future Work

Enhanced Command Set: Expand the range of recognized commands to cover a wider spectrum of user interactions.
Multilingual Support: Extend the project to handle commands and instructions in multiple languages. This would require translating and preprocessing text in various languages while maintaining the model's accuracy.
Continuous Learning: Explore techniques to enable the system to learn new commands from user input, enhancing adaptability.
Collaborative Filtering: Implement collaborative filtering techniques to learn from the preferences of different users and adapt the model accordingly.
Pre-trained Large Language Models (LLMs): Explore transfer learning from pre-trained large language models to improve the text classification model, especially when labeled data is limited.

Contributions

Contributions are welcome! If you find issues or have suggestions for improvements, feel free to create pull requests or open issues in this repository.

Ruhallah93/digital-voice-assistant-in-car
