Repo for the final deliverable of Dialogue Systems (COSC-4463).
This paper introduces SmartSpeak, a dialogue system designed to enhance productivity in corporate meetings by integrating transcription, conversational synthesis, and real-time question answering. Leveraging advancements in large language models (LLMs) and conversational attributes drawn from linguistic research, SmartSpeak addresses critical challenges in dialogue systems, including grounding, turn-taking, and speaker recognition. OpenAI’s Whisper and Resemblyzer are used for speech transcription and speaker identification, while LLMs process the transcriptions to deliver contextually relevant responses. Through the implementation and evaluation of SmartSpeak, this study demonstrates the potential of integrating linguistic principles with AI technologies to build systems that effectively augment human conversations. Key findings reveal both the limitations of current transcription models in real-world scenarios and the promise of LLMs for generating actionable insights. The paper concludes with directions for future research, including multimodal enhancements, system optimization for conversational flow, and fine-tuning on domain-specific corpora to further bridge the gap between human dialogue and AI systems.
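The speaker-identification step described above can be illustrated with a minimal sketch. Resemblyzer produces fixed-length speaker embeddings, and matching an utterance to an enrolled speaker is typically done by cosine similarity against a library of known voices. The `identify_speaker` helper, the `threshold` value, the speaker names, and the tiny 3-dimensional vectors below are all illustrative assumptions, not SmartSpeak's actual implementation:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def identify_speaker(utterance_embedding, enrolled, threshold=0.75):
    """Match an utterance embedding against enrolled speaker embeddings.

    Returns the best-matching speaker name, or "unknown" when no
    enrolled embedding exceeds the similarity threshold.
    """
    best_name, best_score = "unknown", threshold
    for name, embedding in enrolled.items():
        score = cosine_similarity(utterance_embedding, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Synthetic 3-dimensional embeddings stand in for Resemblyzer's
# 256-dimensional d-vectors (illustration only).
enrolled = {"alice": [0.9, 0.1, 0.0], "bob": [0.1, 0.9, 0.1]}
print(identify_speaker([0.85, 0.15, 0.05], enrolled))  # → alice
```

In a real pipeline, each Whisper-transcribed segment would be embedded with Resemblyzer's `VoiceEncoder` and tagged with the matched speaker before being passed to the LLM.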