Dynamic Transcription Model Support, Settings Fix & Improved Clipboard Handling #1

jpierzchala · 2025-03-21T13:21:32Z

This pull request introduces three key improvements:

Dynamic Transcription Model Support

Updated the configuration schema (config_schema.yaml) to include a new transcription model option - gpt-4o-transcribe.
Modified transcription.py to extract the desired transcription model from the api_options configuration rather than hardcoding "whisper-1".
Updated the logging message to display the selected model during API requests.
This change enables dynamic selection of transcription models for the OpenAI API based on user configuration.
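
As a rough illustration of the change described above, the model lookup might look like the following. This is a minimal sketch, not the code in transcription.py: the function name `resolve_transcription_model` and the `"model"` key inside `api_options` are assumptions made for the example.

```python
DEFAULT_MODEL = "whisper-1"  # the value that used to be hardcoded


def resolve_transcription_model(api_options):
    """Return the configured model name, or the old default if none is set.

    `api_options` is assumed to be a plain dict loaded from the user config;
    the "model" key is a hypothetical name chosen for this sketch.
    """
    model = (api_options or {}).get("model")
    return model if model else DEFAULT_MODEL


# Log the selected model before the request, as the PR describes.
selected = resolve_transcription_model({"model": "gpt-4o-transcribe"})
print(f"Sending transcription request using model: {selected}")
```

Falling back to "whisper-1" keeps existing configurations working when the option is absent.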

Settings Window Bug Fix

Fixed an issue in the settings window (settings_window.py) where reading a file resulted in concatenating its content to the existing text in the text editor.
The fix ensures that the editor preserves its current text, preventing unintended modifications. File contents will now be used only where needed for API calls.
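
One way to picture the fix: the file contents are combined with the editor text only when building the text for the API call, and nothing is written back to the editor. The helper below is a hypothetical sketch under that assumption; `build_api_text` and its parameters do not come from settings_window.py.

```python
from pathlib import Path


def build_api_text(editor_text, file_path=None):
    """Return the text to send to the API without touching the editor.

    The editor keeps `editor_text` unchanged; file contents (if any) are
    appended only to the returned value. Names here are illustrative.
    """
    if file_path is None:
        return editor_text
    file_text = Path(file_path).read_text(encoding="utf-8")
    return f"{editor_text}\n{file_text}"
```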

Enhanced Clipboard Handling

Introduced a new static method, safe_open_clipboard, in input_simulation.py which attempts to open the clipboard multiple times (with configurable retries and delays) to handle transient access issues.
Replaced direct calls to win32clipboard.OpenClipboard in the _paste_with_clipboard_preservation method with safe_open_clipboard.
Updated win32clipboard.SetClipboardText to utilize the CF_UNICODETEXT flag for proper Unicode formatting and wrapped clipboard closing calls in try/except blocks to ensure cleanup even if errors occur.
Added error messages when the clipboard cannot be opened for preservation or restoration.
These improvements increase the robustness and reliability of clipboard operations during simulated input actions.
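
The retry behaviour described above can be sketched roughly as follows. This is not the method from input_simulation.py: to keep the example runnable on any platform, the open call is passed in as `open_fn`, and the default values for `retries` and `delay` are assumptions.

```python
import time


def safe_open_clipboard(open_fn, retries=5, delay=0.05):
    """Call `open_fn` until it succeeds or `retries` attempts are used up.

    Returns True once the clipboard was opened and False if every attempt
    raised. On Windows, `open_fn` would be win32clipboard.OpenClipboard.
    """
    for attempt in range(1, retries + 1):
        try:
            open_fn()
            return True
        except Exception as exc:  # broad on purpose: this is only a sketch
            print(f"Clipboard open attempt {attempt}/{retries} failed: {exc}")
            if attempt < retries:
                time.sleep(delay)  # give the other process time to release it
    return False


# Intended use on Windows (shown as comments so the sketch stays portable):
#   if safe_open_clipboard(win32clipboard.OpenClipboard):
#       try:
#           win32clipboard.SetClipboardText(text, win32clipboard.CF_UNICODETEXT)
#       finally:
#           win32clipboard.CloseClipboard()
```

Returning a boolean instead of raising lets the caller print an error message and skip preservation or restoration when the clipboard stays locked.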

• Added a new static method safe_open_clipboard that attempts to open the clipboard repeatedly (with configurable retries and delay) to handle transient access issues.
• Replaced direct win32clipboard.OpenClipboard calls in _paste_with_clipboard_preservation with safe_open_clipboard to safely preserve and later restore clipboard data.
• Updated win32clipboard.SetClipboardText to use the CF_UNICODETEXT flag to ensure proper Unicode formatting.
• Wrapped the clipboard closing calls in a try/except block to guarantee cleanup even if errors occur.
• Added error messages if the clipboard cannot be opened for preservation or restoration.

These changes improve the robustness and reliability of clipboard operations during simulated input actions.

…tion Body: Previously, when reading a file in the settings window, the file's content was appended to the existing text in the text editor. This led to unexpected modifications of the displayed text. With this change, the editor retains its current text without appending content from the file. File contents are appended only for the API call.

…or gpt-4o-transcribe Body:
• Update config_schema.yaml to include a new transcription model option, gpt-4o-transcribe.
• Modify transcription.py to obtain the model from api_options instead of hardcoding "whisper-1".
• Update the logging message to reflect the chosen model, ensuring transparency during API requests.

These changes enable dynamic selection of OpenAI transcription models based on the configuration.

…transcription-with-fallback Retry transcription on failure

…t activation

This commit introduces a retry mechanism for the audio transcription process. The ResultThread will now attempt to transcribe the audio up to 3 times if the initial attempt fails or returns an empty result. This improves the robustness of the transcription process by handling intermittent errors or empty transcriptions.

Key changes:
- Added a loop to retry transcription up to 3 times.
- Included a 1-second pause between retries.
- Logged detailed information for each transcription attempt.
- If all attempts fail, the audio is saved, and a transcription_failed signal is emitted.
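
The retry loop described in this commit could be sketched as below. It is an illustration under stated assumptions, not the ResultThread code: `transcribe_fn`, `sleep_fn`, and the function name are hypothetical, and saving the audio and emitting transcription_failed are left to the caller.

```python
import time


def transcribe_with_retry(transcribe_fn, audio, max_attempts=3, pause=1.0,
                          sleep_fn=time.sleep):
    """Try `transcribe_fn(audio)` up to `max_attempts` times.

    Returns the first non-empty result, or None when every attempt either
    raised or returned an empty value. `sleep_fn` is injectable for tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = transcribe_fn(audio)
        except Exception as exc:
            print(f"Transcription attempt {attempt}/{max_attempts} failed: {exc}")
            result = None
        if result:
            print(f"Transcription succeeded on attempt {attempt}")
            return result
        if attempt < max_attempts:
            sleep_fn(pause)  # 1-second pause between retries by default
    return None
```

A caller would treat a None return as the signal to save the audio and report the failure.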

* Initial plan

* Enhance failed audio saving with validation and detailed logging

Co-authored-by: jpierzchala <[email protected]>

* Add implementation summary and complete failed audio saving improvements

Co-authored-by: jpierzchala <[email protected]>

* Enhance failed audio saving validation and add comprehensive unit tests

Co-authored-by: jpierzchala <[email protected]>

* fix: resolve test failures in audio saving and result thread tests

- Fix ConfigManager mock in test_result_thread.py by adding missing initialize() method
- Improve module mocking and cleanup in test_failed_audio_simple.py to prevent import conflicts
- Add proper module state management with backup/restore in tests
- Ensure clean import state for each test run to avoid cross-test contamination

All 7 tests now pass successfully. The fixes address:
- AttributeError: MockConfigManager missing 'initialize' attribute
- Message capture failures due to improper module mocking
- Module import system conflicts between tests

Tests now properly validate audio data validation, transcription retry logic, error handling, and failed audio file saving functionality.

* feat: add comprehensive AI agent testing requirements and documentation

- Create AGENTS.md with mandatory test execution requirements for AI agents
- Add testing section to README.md Contributing guidelines
- Configure VS Code pytest integration in .vscode/settings.json
- Fix existing test failures in audio validation and result thread tests

Key changes:
- AGENTS.md: Establishes critical requirement that AI agents MUST run tests before completion
- README.md: Adds "Running Tests" section with clear pytest commands and AI agent notice
- VS Code: Enables automatic test discovery and pytest integration
- Tests: Resolve ConfigManager mock issues and module import conflicts

All 7 tests now pass successfully. This ensures AI agents (GitHub Copilot, OpenAI Codex, VS Code Copilot Agent) will automatically validate changes before completing work, preventing regressions in audio processing, transcription retry logic, and error handling.

Commands for AI agents:
- pytest tests/ -v (run all tests)
- All tests must pass before work completion

* Fix audio saving for failed transcriptions

- Modified condition in result_thread.py to check both empty and whitespace-only results
- Changed from `if not result:` to `if not result or not result.strip():`
- Replaced debug print statements with proper console logging in transcription.py
- Ensures failed audio files are saved when transcription returns empty strings after post-processing
- Addresses issue where quota exceeded errors weren't triggering audio file saves

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: jpierzchala <[email protected]>
Co-authored-by: jpierzchala <[email protected]>
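
The condition change quoted in the last commit above is small enough to show in isolation. The wrapper name `is_failed_result` is hypothetical; only the boolean expression comes from the commit message.

```python
def is_failed_result(result):
    """Treat None, empty, and whitespace-only transcriptions as failures."""
    # Old check: `not result` alone, which let "   " through as a success.
    return not result or not result.strip()
```

With the old check, a result such as a single space would have counted as a successful transcription, so the failed audio would not have been saved.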

* Initial plan

* Add Azure OpenAI Whisper support and English language requirement

Co-authored-by: jpierzchala <[email protected]>

* Update AGENTS.md with comprehensive dependency installation instructions

Co-authored-by: jpierzchala <[email protected]>

* Remove implementation summary leftover from previous PR

* fix: add Azure OpenAI API key support in keyring system

- Fix missing azure_openai_api_key handling in settings_window.py
- Add Azure OpenAI key loading from keyring when displaying settings
- Add Azure OpenAI key saving to keyring when saving settings
- Add Azure OpenAI key removal from config.yaml after keyring save
- Create migrate_azure_key.py script to migrate existing keys from config to keyring
- Resolve "Azure OpenAI API key not found in keyring" error

Fixes issue where Azure OpenAI transcription failed despite API key being set in options

* feat(llm): add Azure OpenAI support for LLM post-processing

- Add azure_openai as a new API type option in config schema
- Implement Azure OpenAI LLM processor with endpoint and deployment support
- Add configuration fields for Azure OpenAI LLM credentials and settings
- Update settings UI to show/hide provider-specific options dynamically
- Add Azure OpenAI LLM API key handling in keyring management
- Support both cleanup and instruction modes with Azure OpenAI models

* test: Add comprehensive test coverage for Azure OpenAI features

- Add 31 new tests covering all Azure OpenAI functionality changes
- Test Azure OpenAI LLM processor initialization and text processing
- Test Azure OpenAI transcription provider integration
- Test keyring manager integration for API key storage
- Test Azure key migration script functionality
- Test UI integration for Azure OpenAI settings
- Test end-to-end workflows and error handling
- Resolve test isolation issues with proper mocking strategy
- Achieve 100% test coverage for branch changes (40/40 tests passing)

Tests added:
- test_azure_openai_llm.py (5 tests) - LLM processor core functionality
- test_azure_openai_llm_integration.py (5 tests) - Integration tests
- test_azure_key_migration.py (6 tests) - Key migration functionality
- test_azure_ui_integration.py (10 tests) - UI integration tests
- test_azure_end_to_end.py (6 tests) - End-to-end workflow tests

All tests pass successfully, ensuring robust coverage of Azure OpenAI features including transcription, LLM processing, keyring integration, and configuration management.

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: jpierzchala <[email protected]>
Co-authored-by: jpierzchala <[email protected]>
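
The key migration mentioned in these commits (save the key to the keyring, then remove it from the config) might be sketched as follows. Everything here is an assumption for illustration: `InMemoryKeyStore` stands in for a real keyring backend by mirroring the `set_password`/`get_password` call shape of the `keyring` package, and the service name `"whisper-writer"` is hypothetical. It is not the contents of migrate_azure_key.py.

```python
class InMemoryKeyStore:
    """Stand-in for a keyring backend so the sketch runs anywhere."""

    def __init__(self):
        self._secrets = {}

    def set_password(self, service, name, value):
        self._secrets[(service, name)] = value

    def get_password(self, service, name):
        return self._secrets.get((service, name))


def migrate_key(config, store, service="whisper-writer",
                key_name="azure_openai_api_key"):
    """Move `key_name` from the config dict into the key store.

    Returns True when a key was migrated and False when there was nothing
    to move. The key is removed from `config` only after it is stored.
    """
    api_key = config.get(key_name)
    if not api_key:
        return False
    store.set_password(service, key_name, api_key)
    config.pop(key_name, None)  # keep the secret out of config.yaml
    return True
```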

🎯 Summary

This PR introduces two major enhancements to WhisperWriter: Windows Autostart functionality and a comprehensive logging system with verbose mode. These features significantly improve user experience by enabling automatic application startup and providing better debugging capabilities.

✨ New Features

1. Windows Autostart Functionality

Automatic Startup: WhisperWriter can now automatically start when Windows boots up
GUI Integration: Added checkbox in Settings window to easily enable/disable autostart
Smart Executable Detection: Automatically detects whether to use run_project.bat or run.py for startup
Windows-Only Feature: Gracefully handles non-Windows systems with appropriate messaging
Shortcut Management: Creates and manages Windows shortcuts in the startup folder using PowerShell

2. Comprehensive Logging System

Verbose Mode: New -V or --verbose command-line flag for detailed debugging output
File Logging: Optional logging to file with configurable path (~/.whisperwriter/logs/whisperwriter.log by default)
Console Control: Configurable console output (can be disabled when using file logging)
LLM Debugging: Full logging of prompts, system messages, and API responses in verbose mode
Centralized Logging: All output goes through ConfigManager.console_print() for consistent handling

🔧 Technical Implementation

AutostartManager (autostart_manager.py)

Platform detection for Windows-only functionality
PowerShell integration for reliable shortcut creation
Robust error handling and user feedback
Working directory and executable path management

Configuration Schema Updates

autostart_on_login: Boolean setting for autostart preference
log_to_file: Boolean setting to enable file logging
log_file_path: Configurable path for log file (optional)
verbose_mode: Boolean setting for verbose output
print_to_terminal: Boolean setting to control console output

Enhanced Utils (utils.py)

New console_print() method with verbose filtering
File logging setup with proper encoding and formatting
Dynamic logging configuration on config changes
set_verbose_mode() for runtime verbose control

🧪 Testing

Comprehensive Test Suite: 234 lines of new test code
test_autostart.py: Tests all autostart functionality scenarios
test_autostart_checkbox.py: Tests GUI integration
Platform-specific test handling for Windows/non-Windows environments
Mock testing for PowerShell and file system operations

🔄 Updated Components

Main Application: Integration of verbose mode from command-line arguments
Settings Window: New autostart checkbox with proper state management
LLM Processor: Enhanced logging for debugging API interactions
Result Thread: Improved error logging and status reporting
Run Script: Verbose flag propagation from run.py to main application

📋 Configuration Changes

All new settings are backward-compatible with sensible defaults:
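
The verbose flag and the console_print() filtering described under "Comprehensive Logging System" could look roughly like the sketch below. It does not list the defaults referred to above, and it is not the code from utils.py: the `Console` class, its constructor arguments, and the boolean return value are assumptions, and the values passed in are illustrative.

```python
import argparse


def parse_args(argv=None):
    """Parse the -V/--verbose flag named in the PR description."""
    parser = argparse.ArgumentParser(description="verbose flag sketch")
    parser.add_argument("-V", "--verbose", action="store_true",
                        help="enable detailed debugging output")
    return parser.parse_args(argv)


class Console:
    """Hypothetical holder for the verbose_mode and print_to_terminal settings."""

    def __init__(self, verbose_mode=False, print_to_terminal=True):
        self.verbose_mode = verbose_mode
        self.print_to_terminal = print_to_terminal

    def set_verbose_mode(self, enabled):
        self.verbose_mode = bool(enabled)

    def console_print(self, message, verbose=False):
        """Emit `message`; verbose-only messages are dropped unless enabled.

        Returns True when the message passed the verbose filter.
        """
        if verbose and not self.verbose_mode:
            return False
        if self.print_to_terminal:
            print(message)
        return True


args = parse_args(["-V"])  # explicit argv so the sketch ignores sys.argv
console = Console(verbose_mode=args.verbose)
console.console_print("LLM prompt: ...", verbose=True)
```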

Lord-Memester · 2025-07-02T14:49:41Z

I was hoping there would be a way to start this up automatically. I was just going to do it by using pyenv and a startup script, but integrating it into the options is a much better idea! (I also never got around to seeing if my idea would work 😅)

jpierzchala · 2025-07-03T11:12:52Z

@Lord-Memester, I should have made a pull request from a branch instead of from main, because Tom hasn't touched my pull request here at all since March, and quite a lot has happened in the repository since then. My latest version starts automatically with the system, simply by adding a shortcut to the Windows startup. Simple, but it works.

jpierzchala and others added 17 commits March 21, 2025 10:26

fix: Conditionally apply temperature parameter for OpenAI models

4cedef9

Add transcription retry and failure handling

53b580a

Merge pull request #2 from jpierzchala/codex/add-retry-mechanism-for-…

a2d61c5

…transcription-with-fallback Retry transcription on failure

Merge branch 'main' of https://github.com/jpierzchala/whisper-writer

9eb1813

feat: add batch script to run the application with virtual environmen…

47ffd8e

…t activation

feat: add prompts file type to .gitignore

055de4a

Add unit tests for transcription retry logic

59ab9ae

feat: add prompts file type to .gitignore

6cf742c

Added .code-workspace to gitignore

79f8494

Remove Azure OpenAI API key migration script

49d5a68
