# Milestones

# Description

For some users, false wake-ups are too frequent. Ideally, a week of continuous use should produce no more than 5 false wake-ups, and this should be achieved without hurting the model's ability to detect true wake-ups.

## Production quality

Given a user who runs the model for a week continuously, with the model fine-tuned on user-specific data:

1. A user can run the model with 5 false wake-ups or fewer.
2. A user can trigger the wake-up in a non-noisy environment at least 95% of the time.
3. A user can trigger the wake-up in a noisy environment at least 80% of the time.

## Hypothesis

A wakeword model can be fine-tuned on user-collected data in a way that yields a production quality model.

# DoD (Definition of Done)

* data collection scheme, including a well-defined process and helper functions to make the fine-tuning process relatively simple
* production quality model without extensive amounts of user-specific data (e.g., <=10 positive samples should lead to a significant performance improvement)
* benchmarks demonstrating the fine-tuning process and representative performance improvements
* unit tests
* documentation

No due date
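The helper functions this milestone calls for are not specified here; as one possible shape they could take, below is a minimal sketch assuming a PyTorch setup where the heavy pre-trained embedding model is kept frozen and only a small classification head is trained on the user's few positive clips plus a pool of negatives. `WakewordHead`, `fine_tune_head`, the 96-dim embedding size, and the dummy backbone are illustrative assumptions, not existing project code.

```python
# Minimal sketch (assumption, not the project's actual API): fine-tune a small
# head on top of a frozen audio-embedding backbone using a handful of
# user-recorded positive clips and a pool of negative clips.
import torch
import torch.nn as nn

EMBED_DIM = 96  # assumed embedding size; the real backbone's output dim may differ


class WakewordHead(nn.Module):
    """Small trainable head on top of the frozen pre-trained embedding model."""

    def __init__(self, embed_dim: int = EMBED_DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(embed_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x)


def fine_tune_head(backbone: nn.Module,
                   positive_clips: torch.Tensor,
                   negative_clips: torch.Tensor,
                   epochs: int = 30,
                   lr: float = 1e-3) -> WakewordHead:
    """Train only the head; the heavy embedding backbone stays frozen."""
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad_(False)

    with torch.no_grad():
        pos_emb = backbone(positive_clips)   # (n_pos, embed_dim)
        neg_emb = backbone(negative_clips)   # (n_neg, embed_dim)

    x = torch.cat([pos_emb, neg_emb])
    y = torch.cat([torch.ones(len(pos_emb)), torch.zeros(len(neg_emb))])
    # Up-weight the few positives so <=10 user samples still carry weight.
    weights = torch.ones_like(y)
    weights[y == 1] = len(neg_emb) / max(len(pos_emb), 1)

    head = WakewordHead(embed_dim=pos_emb.shape[1])
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss(reduction="none")
    for _ in range(epochs):
        opt.zero_grad()
        loss = (loss_fn(head(x).squeeze(1), y) * weights).mean()
        loss.backward()
        opt.step()
    return head


# Usage with a stand-in backbone (a random projection) and random "clips":
dummy_backbone = nn.Linear(16000, EMBED_DIM)  # placeholder for the real embedding model
head = fine_tune_head(dummy_backbone, torch.randn(8, 16000), torch.randn(200, 16000))
```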
# Description

For some users, false wake-ups are too frequent. Ideally, a week of continuous use should produce no more than 5 false wake-ups, and this should be achieved without hurting the model's ability to detect true wake-ups.

## Production quality

Given a user who runs the model for a week continuously:

1. A user can run the model with 5 false wake-ups or fewer.
2. A user can trigger the wake-up in a non-noisy environment at least 90% of the time.
3. A user can trigger the wake-up in a noisy environment at least 60% of the time.

## Hypothesis

A wakeword model can be trained on generated data alone in a way that yields a production quality model.

# DoD (Definition of Done)

* a well-defined process for training wake word/phrase models on synthetic data only
* production quality model without user data
* benchmarks demonstrating performance on real-world data
* unit tests
* documentation

No due date
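The milestone does not prescribe how the generated data is produced. The sketch below is one hedged illustration, assuming clean utterances of the wake phrase come from a TTS engine and are then mixed with background noise at random signal-to-noise ratios; `synthesize`, `mix_at_snr`, and `build_positive_set` are hypothetical names, and the TTS call itself is left as a stub to be replaced by whatever backend is actually used.

```python
# Minimal sketch (assumption): build a synthetic-only positive set by passing
# the wake phrase through a TTS engine and mixing the clips with noise.
import numpy as np

SAMPLE_RATE = 16000  # assumed audio sample rate


def synthesize(text: str, voice: str) -> np.ndarray:
    """Hypothetical TTS call returning a mono waveform; swap in a real engine."""
    raise NotImplementedError("plug in whichever TTS backend the project uses")


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix speech with a random crop of background noise at the requested SNR."""
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    start = np.random.randint(0, len(noise) - len(speech) + 1)
    noise = noise[start:start + len(speech)]
    speech_power = np.mean(speech ** 2) + 1e-10
    noise_power = np.mean(noise ** 2) + 1e-10
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    mixed = speech + scale * noise
    return mixed / (np.max(np.abs(mixed)) + 1e-10)  # normalize to avoid clipping


def build_positive_set(wake_phrase, voices, noises, snr_range=(0.0, 20.0)):
    """Generate one noisy positive clip per (voice, noise) pair at a random SNR."""
    clips = []
    for voice in voices:
        clean = synthesize(wake_phrase, voice)
        for noise in noises:
            snr = float(np.random.uniform(*snr_range))
            clips.append(mix_at_snr(clean, noise, snr))
    return clips
```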
# Description

The audio features for the current models are computed from melspectrogram inputs, which are then passed through the pre-trained embedding model from Google. This embedding model is the heaviest component, about 85% of total inference time, but it is also difficult to reduce, as it is the core of the model.

To significantly increase efficiency, a lighter version of the model could be built that uses only spectrogram features, or only a portion of the layers in the pre-trained embedding model. This lighter model could serve as the first stage in a multi-stage detection model, where heavier models are run only when the lighter model is activated. This could significantly reduce idle CPU usage in deployment.

# DoD (Definition of Done)

* A model architecture that is at least 2x more efficient (with respect to CPU usage) compared to the current models, yet performs well enough to be used stand-alone, or as the first stage in a multi-stage model pipeline
* unit tests
* benchmarks
* documentation

No due date
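As a rough illustration of the multi-stage idea, the sketch below shows a cascade in which a cheap first-stage scorer runs on every audio frame and the heavy embedding-based model runs only on the frames that pass a permissive first threshold. The function names and threshold values are illustrative assumptions, not the project's API.

```python
# Minimal sketch (assumption): two-stage cascade where the heavy model only
# runs when the light model fires, keeping idle CPU usage low.
from typing import Callable
import numpy as np

Frame = np.ndarray
Scorer = Callable[[Frame], float]  # maps an audio frame to a score in [0, 1]


def cascade_detect(frame: Frame,
                   light_model: Scorer,
                   heavy_model: Scorer,
                   light_threshold: float = 0.3,
                   heavy_threshold: float = 0.7) -> bool:
    """Return True if the wakeword is detected for this frame.

    The light threshold is deliberately permissive: the first stage only has
    to rule out obvious negatives cheaply, while the heavy model makes the
    final call on the small fraction of frames that pass.
    """
    if light_model(frame) < light_threshold:
        return False  # the vast majority of frames stop here -> low idle CPU
    return heavy_model(frame) >= heavy_threshold
```

Keeping the first-stage threshold permissive preserves recall while still skipping the heavy embedding model on most frames, which is where the idle-CPU savings would come from.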