Dynamic Transcription Model Support, Settings Fix & Improved Clipboard Handling #1

Open: wants to merge 17 commits into base: main

Conversation

jpierzchala

This pull request introduces three key improvements:

  1. Dynamic Transcription Model Support
  • Updated the configuration schema (config_schema.yaml) to include a new transcription model option, gpt-4o-transcribe.
  • Modified transcription.py to extract the desired transcription model from the api_options configuration rather than hardcoding "whisper-1".
  • Updated the logging message to display the selected model during API requests.
  • This change enables dynamic selection of transcription models for the OpenAI API based on user configuration.
  2. Settings Window Bug Fix
  • Fixed an issue in the settings window (settings_window.py) where reading a file resulted in concatenating its content to the existing text in the text editor.
  • The fix ensures that the editor preserves its current text, preventing unintended modifications. File contents will now be used only where needed for API calls.
  3. Enhanced Clipboard Handling
  • Introduced a new static method, safe_open_clipboard, in input_simulation.py which attempts to open the clipboard multiple times (with configurable retries and delays) to handle transient access issues.
  • Replaced direct calls to win32clipboard.OpenClipboard in the _paste_with_clipboard_preservation method with safe_open_clipboard.
  • Updated win32clipboard.SetClipboardText to utilize the CF_UNICODETEXT flag for proper Unicode formatting and wrapped clipboard closing calls in try/except blocks to ensure cleanup even if errors occur.
  • Added error messages when the clipboard cannot be opened for preservation or restoration.
  • These improvements increase the robustness and reliability of clipboard operations during simulated input actions.

jpierzchala and others added 17 commits March 21, 2025 10:26
• Added a new static method safe_open_clipboard that attempts to open the clipboard repeatedly (with configurable retries and delay) to handle transient access issues.
• Replaced direct win32clipboard.OpenClipboard calls in _paste_with_clipboard_preservation with safe_open_clipboard to safely preserve and later restore clipboard data.
• Updated win32clipboard.SetClipboardText to use the CF_UNICODETEXT flag to ensure proper Unicode formatting.
• Wrapped the clipboard closing calls in a try/except block to guarantee cleanup even if errors occur.
• Added error messages if the clipboard cannot be opened for preservation or restoration.

These changes improve the robustness and reliability of clipboard operations during simulated input actions.
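
The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from input_simulation.py: the parameter names `open_fn`, `retries`, and `delay` and their defaults are assumptions, and the opener is injected so the sketch runs on any platform.

```python
import time
from typing import Callable


def safe_open_clipboard(open_fn: Callable[[], None],
                        retries: int = 5,
                        delay: float = 0.05) -> bool:
    """Call open_fn up to `retries` times; return True once it succeeds.

    Hypothetical signature: in the PR the method calls
    win32clipboard.OpenClipboard directly; here the opener is passed in.
    """
    for attempt in range(retries):
        try:
            open_fn()
            return True
        except Exception:
            # Another process may be holding the clipboard; wait and retry,
            # but do not sleep after the final attempt.
            if attempt < retries - 1:
                time.sleep(delay)
    return False
```

On Windows the caller would presumably pass `win32clipboard.OpenClipboard` as `open_fn`, write text with `win32clipboard.SetClipboardText(text, win32clipboard.CF_UNICODETEXT)`, and call `win32clipboard.CloseClipboard()` in a `finally` block.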
…tion

Body:
Previously, when a file was read in the settings window, its content was appended to the existing text in the text editor. This led to unexpected modifications of the displayed text. With this change, the editor retains its current text without appending content from the file. File contents are appended only for the API call.
…or gpt-4o-transcribe

Body:
• Update config_schema.yaml to include a new transcription model option, gpt-4o-transcribe.
• Modify transcription.py to obtain the model from api_options instead of hardcoding "whisper-1".
• Update the logging message to reflect the chosen model, ensuring transparency during API requests.

These changes enable dynamic selection of OpenAI transcription models based on the configuration.
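
A minimal sketch of the model lookup, assuming the option is stored under a `model` key in `api_options` (the key name and the helper names are assumptions, not taken from transcription.py):

```python
DEFAULT_MODEL = "whisper-1"  # the value that was previously hardcoded


def resolve_transcription_model(api_options: dict) -> str:
    """Return the configured model, falling back to the old default."""
    # "model" is an assumed key name; an empty or missing value falls back.
    return api_options.get("model") or DEFAULT_MODEL


def describe_request(api_options: dict) -> str:
    """Build a log line that names the model used for the request."""
    return f"Transcribing with model: {resolve_transcription_model(api_options)}"
```

Falling back to "whisper-1" keeps existing configurations working when the new option is absent.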
…transcription-with-fallback

Retry transcription on failure
This commit introduces a retry mechanism for the audio transcription process.

The ResultThread will now attempt to transcribe the audio up to 3 times if the initial attempt fails or returns an empty result. This improves the robustness of the transcription process by handling intermittent errors or empty transcriptions.

Key changes:

• Added a loop to retry transcription up to 3 times.
• Included a 1-second pause between retries.
• Logged detailed information for each transcription attempt.
• If all attempts fail, the audio is saved, and a transcription_failed signal is emitted.
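
The loop described above might look roughly like this. It is a sketch under assumptions: `transcribe`, `log`, and `on_failure` are hypothetical callables standing in for the real transcription call, the logger, and the "save audio and emit transcription_failed" step in ResultThread.

```python
import time
from typing import Callable, Optional


def transcribe_with_retry(transcribe: Callable[[bytes], Optional[str]],
                          audio: bytes,
                          max_attempts: int = 3,
                          pause: float = 1.0,
                          log: Callable[[str], None] = print,
                          on_failure: Optional[Callable[[bytes], None]] = None
                          ) -> Optional[str]:
    """Try to transcribe `audio` up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = transcribe(audio)
        except Exception as exc:
            log(f"Attempt {attempt}/{max_attempts} raised: {exc}")
            result = None
        if result:
            log(f"Attempt {attempt}/{max_attempts} succeeded")
            return result
        log(f"Attempt {attempt}/{max_attempts} returned no text")
        if attempt < max_attempts:
            time.sleep(pause)  # pause between retries, not after the last one
    if on_failure is not None:
        on_failure(audio)  # e.g. save the audio and signal the failure
    return None
```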
* Initial plan

* Enhance failed audio saving with validation and detailed logging

Co-authored-by: jpierzchala <[email protected]>

* Add implementation summary and complete failed audio saving improvements

Co-authored-by: jpierzchala <[email protected]>

* Enhance failed audio saving validation and add comprehensive unit tests

Co-authored-by: jpierzchala <[email protected]>

* fix: resolve test failures in audio saving and result thread tests

- Fix ConfigManager mock in test_result_thread.py by adding missing initialize() method
- Improve module mocking and cleanup in test_failed_audio_simple.py to prevent import conflicts
- Add proper module state management with backup/restore in tests
- Ensure clean import state for each test run to avoid cross-test contamination

All 7 tests now pass successfully. The fixes address:
- AttributeError: MockConfigManager missing 'initialize' attribute
- Message capture failures due to improper module mocking
- Module import system conflicts between tests

Tests now properly validate audio data validation, transcription retry logic,
error handling, and failed audio file saving functionality.

* feat: add comprehensive AI agent testing requirements and documentation

- Create AGENTS.md with mandatory test execution requirements for AI agents
- Add testing section to README.md Contributing guidelines
- Configure VS Code pytest integration in .vscode/settings.json
- Fix existing test failures in audio validation and result thread tests

Key changes:
- AGENTS.md: Establishes critical requirement that AI agents MUST run tests before completion
- README.md: Adds "Running Tests" section with clear pytest commands and AI agent notice
- VS Code: Enables automatic test discovery and pytest integration
- Tests: Resolve ConfigManager mock issues and module import conflicts

All 7 tests now pass successfully. This ensures AI agents (GitHub Copilot, OpenAI Codex,
VS Code Copilot Agent) will automatically validate changes before completing work,
preventing regressions in audio processing, transcription retry logic, and error handling.

Commands for AI agents:
- pytest tests/ -v (run all tests)
- All tests must pass before work completion

* Fix audio saving for failed transcriptions

- Modified condition in result_thread.py to check both empty and whitespace-only results
- Changed from `if not result:` to `if not result or not result.strip():`
- Replaced debug print statements with proper console logging in transcription.py
- Ensures failed audio files are saved when transcription returns empty strings after post-processing
- Addresses issue where quota exceeded errors weren't triggering audio file saves
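
The difference between the two conditions can be shown in isolation. The helper names below are hypothetical; only the two expressions come from the commit message.

```python
def is_failed_old(result):
    # Original check: only None and "" count as a failed transcription.
    return not result


def is_failed_new(result):
    # Updated check: whitespace-only strings also count as failed.
    return not result or not result.strip()


for sample in (None, "", "   ", "hello"):
    print(repr(sample), is_failed_old(sample), is_failed_new(sample))
```

A result such as "   " passes the old check as a success, so no audio file would be saved for it; the new check treats it as a failure.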

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: jpierzchala <[email protected]>
Co-authored-by: jpierzchala <[email protected]>
* Initial plan

* Add Azure OpenAI Whisper support and English language requirement

Co-authored-by: jpierzchala <[email protected]>

* Update AGENTS.md with comprehensive dependency installation instructions

Co-authored-by: jpierzchala <[email protected]>

* Remove implementation summary leftover from previous PR

* fix: add Azure OpenAI API key support in keyring system

- Fix missing azure_openai_api_key handling in settings_window.py
- Add Azure OpenAI key loading from keyring when displaying settings
- Add Azure OpenAI key saving to keyring when saving settings
- Add Azure OpenAI key removal from config.yaml after keyring save
- Create migrate_azure_key.py script to migrate existing keys from config to keyring
- Resolve "Azure OpenAI API key not found in keyring" error

Fixes issue where Azure OpenAI transcription failed despite API key being set in options
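
A rough sketch of what such a migration could do, assuming a flat config dictionary and a hypothetical service name "whisperwriter". The `set_password` argument is injected so the sketch runs without the keyring package; it is not the code from migrate_azure_key.py.

```python
from typing import Callable


def migrate_key_to_keyring(config: dict,
                           set_password: Callable[[str, str, str], None],
                           service: str = "whisperwriter",  # assumed name
                           name: str = "azure_openai_api_key") -> bool:
    """Move an API key from the config dict into a keyring-like store.

    Returns True if a key was migrated, False if there was nothing to move.
    """
    key = config.get(name)
    if not key:
        return False
    set_password(service, name, key)  # store the secret first
    config.pop(name)                  # then remove it from the plain config
    return True
```

With the keyring package installed, `keyring.set_password` has the matching `(service_name, username, password)` signature and could be passed as `set_password`.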

* feat(llm): add Azure OpenAI support for LLM post-processing

- Add azure_openai as a new API type option in config schema
- Implement Azure OpenAI LLM processor with endpoint and deployment support
- Add configuration fields for Azure OpenAI LLM credentials and settings
- Update settings UI to show/hide provider-specific options dynamically
- Add Azure OpenAI LLM API key handling in keyring management
- Support both cleanup and instruction modes with Azure OpenAI models

* test: Add comprehensive test coverage for Azure OpenAI features

- Add 31 new tests covering all Azure OpenAI functionality changes
- Test Azure OpenAI LLM processor initialization and text processing
- Test Azure OpenAI transcription provider integration
- Test keyring manager integration for API key storage
- Test Azure key migration script functionality
- Test UI integration for Azure OpenAI settings
- Test end-to-end workflows and error handling
- Resolve test isolation issues with proper mocking strategy
- Achieve 100% test coverage for branch changes (40/40 tests passing)

Tests added:
- test_azure_openai_llm.py (5 tests) - LLM processor core functionality
- test_azure_openai_llm_integration.py (5 tests) - Integration tests
- test_azure_key_migration.py (6 tests) - Key migration functionality
- test_azure_ui_integration.py (10 tests) - UI integration tests
- test_azure_end_to_end.py (6 tests) - End-to-end workflow tests

All tests pass successfully, ensuring robust coverage of Azure OpenAI
features including transcription, LLM processing, keyring integration,
and configuration management.

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: jpierzchala <[email protected]>
Co-authored-by: jpierzchala <[email protected]>
🎯 Summary

This PR introduces two major enhancements to WhisperWriter: Windows Autostart functionality and a comprehensive logging system with verbose mode. These features significantly improve user experience by enabling automatic application startup and providing better debugging capabilities.

✨ New Features

1. Windows Autostart Functionality

Automatic Startup: WhisperWriter can now automatically start when Windows boots up

GUI Integration: Added checkbox in Settings window to easily enable/disable autostart

Smart Executable Detection: Automatically detects whether to use run_project.bat or run.py for startup

Windows-Only Feature: Gracefully handles non-Windows systems with appropriate messaging

Shortcut Management: Creates and manages Windows shortcuts in the startup folder using PowerShell

2. Comprehensive Logging System

Verbose Mode: New -V or --verbose command-line flag for detailed debugging output

File Logging: Optional logging to file with configurable path (~/.whisperwriter/logs/whisperwriter.log by default)

Console Control: Configurable console output (can be disabled when using file logging)

LLM Debugging: Full logging of prompts, system messages, and API responses in verbose mode

Centralized Logging: All output goes through ConfigManager.console_print() for consistent handling
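
For illustration, the -V / --verbose flag could be parsed with argparse along these lines. The function name and help text are assumptions; only the flag names come from the description above.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(description="WhisperWriter (sketch)")
    parser.add_argument(
        "-V", "--verbose",
        action="store_true",
        help="print detailed debugging output",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"verbose mode: {args.verbose}")
```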

🔧 Technical Implementation

AutostartManager (autostart_manager.py)

Platform detection for Windows-only functionality

PowerShell integration for reliable shortcut creation

Robust error handling and user feedback

Working directory and executable path management

Configuration Schema Updates

autostart_on_login: Boolean setting for autostart preference

log_to_file: Boolean setting to enable file logging

log_file_path: Configurable path for log file (optional)

verbose_mode: Boolean setting for verbose output

print_to_terminal: Boolean setting to control console output

Enhanced Utils (utils.py)

New console_print() method with verbose filtering

File logging setup with proper encoding and formatting

Dynamic logging configuration on config changes

set_verbose_mode() for runtime verbose control
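
The verbose filtering could work along these lines. This is a sketch only: the class name `ConsoleOutput` is a hypothetical stand-in for ConfigManager, and the `verbose` keyword on `console_print()` is an assumed signature.

```python
import logging
from typing import Optional


class ConsoleOutput:
    """Hypothetical stand-in for the logging part of ConfigManager."""

    def __init__(self, verbose: bool = False, print_to_terminal: bool = True,
                 log_file_path: Optional[str] = None):
        self.verbose = verbose
        self.print_to_terminal = print_to_terminal
        self.file_logger = None
        if log_file_path:
            self.file_logger = logging.getLogger("whisperwriter_sketch")
            self.file_logger.setLevel(logging.INFO)
            self.file_logger.propagate = False
            handler = logging.FileHandler(log_file_path, encoding="utf-8")
            handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
            self.file_logger.addHandler(handler)

    def set_verbose_mode(self, enabled: bool) -> None:
        self.verbose = enabled

    def console_print(self, message: str, verbose: bool = False) -> bool:
        """Emit message; verbose-only messages are dropped unless enabled.

        Returns True if the message was emitted, False if filtered out.
        """
        if verbose and not self.verbose:
            return False
        if self.print_to_terminal:
            print(message)
        if self.file_logger is not None:
            self.file_logger.info(message)
        return True
```

Routing every message through one method means the terminal and file settings are applied in a single place.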

🧪 Testing

Comprehensive Test Suite: 234 lines of new test code

test_autostart.py: Tests all autostart functionality scenarios

test_autostart_checkbox.py: Tests GUI integration

Platform-specific test handling for Windows/non-Windows environments

Mock testing for PowerShell and file system operations

🔄 Updated Components

Main Application: Integration of verbose mode from command-line arguments

Settings Window: New autostart checkbox with proper state management

LLM Processor: Enhanced logging for debugging API interactions

Result Thread: Improved error logging and status reporting

Run Script: Verbose flag propagation from run.py to main application

📋 Configuration Changes

All new settings are backward-compatible with sensible defaults:
@Lord-Memester

Lord-Memester commented Jul 2, 2025

I was hoping there would be a way to start this up automatically. I was just going to do it by using pyenv and a startup script, but integrating it into the options is a much better idea! (I also never got around to seeing if my idea would work 😅)

@jpierzchala
Author

@Lord-Memester, I should have made a pull request from a branch instead of from main, because Tom hasn't touched my pull request here at all since March, and quite a lot has happened in the repository since then. My latest version starts automatically with the system, simply by adding a shortcut to the Windows Startup folder. Simple, but it works.

3 participants