Description
After a thread on the audiogames.net forum about speech dispatcher not working, leaving the user with no way to use their computer in graphical mode, I concluded that, regardless of how good speech dispatcher may be from a Unix philosophy standpoint, it's made of many moving parts, too many imho, any of which can bring the entire system down simply by failing. In a world where speech is the only way for visually impaired people to use a computer, and therefore as critical as the GPU is for sighted users, having it fail on us is simply unacceptable. Dear readers with some kind of sight, would you like your screen to go black just because, say, the shader cache filled up? Also, graphics are integrated everywhere in the stack while speech isn't, but that's another discussion for another time. Since I'm sure you wouldn't put up with that, why should we have to?

I took this idea from screen readers like NVDA, and Fenrir for the TTY, where the speech abstraction lives inside the screen reader itself rather than in speech dispatcher. Different backends then provide the actual speech, possibly handing PCM wave data back to the screen reader, which processes it through internal systems, for example direct interaction with PipeWire. In our case, we would use a Rust trait to abstract away the concrete implementation of the backend speech provider, and the screen reader would then probably use Box<dyn SpeechProvider> as the interface through which to deliver speech to the user.

For now, this is the draft I want to propose. Feel free to modify it and suggest improvements, as this one will probably be here to stay once it's implemented:
pub trait SpeechProvider: Sized {
    type Configurator;
    type Error;
    type Buffer;

    fn init_speaker(cfg: Self::Configurator) -> Result<Self, Self::Error>;

    fn speak<T>(&self, text: T) -> Result<Self::Buffer, Self::Error>
    where
        T: AsRef<str>;

    fn pause(&self) -> Result<(), Self::Error>;

    fn stop(&self) -> Result<(), Self::Error>;

    // Configuration-specific methods like set_volume, get_volume, set_pitch, etc.
    // are not required because the configurator is backend specific and loads its
    // values in the init method; this method is meant to be called each time the
    // configuration is supposed to change.
    fn reload_configuration(self, cfg: Self::Configurator) -> Result<Self, Self::Error>;
}
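To show how a concrete backend would sit behind this trait, here is a minimal sketch. The EspeakProvider name, its config fields, the error variants and the Vec<f32> buffer are all hypothetical; they only illustrate the shape of an implementation, not an actual binding to any synthesizer.

// Hypothetical backend: all names and fields here are illustrative only.
pub struct EspeakConfig {
    pub voice: String,
    pub rate: u32,
}

pub enum EspeakError {
    InitFailed,
    SynthesisFailed,
}

pub struct EspeakProvider {
    cfg: EspeakConfig,
}

impl SpeechProvider for EspeakProvider {
    type Configurator = EspeakConfig;
    type Error = EspeakError;
    type Buffer = Vec<f32>; // raw PCM samples, one possible choice

    fn init_speaker(cfg: Self::Configurator) -> Result<Self, Self::Error> {
        // a real backend would set up its synthesizer here and fail with InitFailed
        Ok(Self { cfg })
    }

    fn speak<T>(&self, text: T) -> Result<Self::Buffer, Self::Error>
    where
        T: AsRef<str>,
    {
        let _text = text.as_ref();
        // a real backend would synthesize _text using self.cfg and return the samples
        Ok(Vec::new())
    }

    fn pause(&self) -> Result<(), Self::Error> {
        Ok(())
    }

    fn stop(&self) -> Result<(), Self::Error> {
        Ok(())
    }

    fn reload_configuration(self, cfg: Self::Configurator) -> Result<Self, Self::Error> {
        Ok(Self { cfg })
    }
}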
For now, here are a few things the current draft doesn't yet address:
- loading the right speech provider based on configuration and populating the middleware with it. We need to define a mechanism that fits inside Rust's type system, such as enum dispatch or dynamic dispatch (see the dispatch sketch after this list)
- loading the right provider specific configuration (see the configuration-loading sketch after this list)
  - where is that located?
  - should it be standardised, or can it live anywhere the config crate can find it?
  - how are changes to it tracked?
  - do we allow implementations to define their own format, or do we enforce TOML everywhere?
- buffer handling (see the buffer sketch after this list)
  - do we introduce Synthizer now?
  - how do we pass the buffer over? Rust isn't a dynamically typed language, so how do we interact with the buffer type? Is an associated type even a good mechanism, given what we want to do?
  - as to the functionality of the buffer itself, do we treat it like a vec, or do we make an additional buffer trait with a method that returns an iterator of float values?
  - or maybe we should implement PipeWire access directly
  - do we allow custom sample rates, or define a static one?
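To make the dispatch question more concrete, here is a minimal enum dispatch sketch. Note that the draft trait as written is not object safe (it has a Sized supertrait and a generic speak method), so Box<dyn SpeechProvider> would not compile without changes; enum dispatch sidesteps that. This reuses the hypothetical EspeakProvider from the sketch above, and the unified Vec<f32> buffer and shared error type are assumptions, not decisions.

// Hypothetical dispatch layer: variant and backend names are placeholders.
pub enum AnyProvider {
    Espeak(EspeakProvider),
    // one variant per supported backend would go here
}

impl AnyProvider {
    // pick a backend from a configuration value, e.g. a string in the SR config
    pub fn from_name(name: &str, cfg: EspeakConfig) -> Result<Self, EspeakError> {
        match name {
            "espeak" => Ok(Self::Espeak(EspeakProvider::init_speaker(cfg)?)),
            _ => Err(EspeakError::InitFailed),
        }
    }

    // forward to the inner provider; with several variants, each arm would convert
    // its backend's Buffer and Error into whatever unified types we settle on
    pub fn speak(&self, text: &str) -> Result<Vec<f32>, EspeakError> {
        match self {
            Self::Espeak(p) => p.speak(text),
        }
    }
}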
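For the provider specific configuration questions, here is one possible shape, assuming we enforce TOML and use the serde and toml crates; the function name and the idea that every backend's Configurator derives Deserialize are assumptions, and where the file lives and how changes are tracked stay open.

use serde::de::DeserializeOwned;
use std::path::Path;

// Generic loader: works for any backend whose Configurator implements Deserialize.
pub fn load_provider_config<C: DeserializeOwned>(path: &Path) -> Result<C, Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string(path)?;
    Ok(toml::from_str(&raw)?)
}

A backend's startup would then look something like let cfg: EspeakConfig = load_provider_config(Path::new("espeak.toml"))?; with EspeakConfig deriving Deserialize, and the path being whatever location we end up standardising, or not.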
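And for the buffer questions, here is a sketch of the "additional buffer trait" option: a small trait the screen reader consumes regardless of how each backend stores its audio, exposing samples as an iterator of floats plus a sample rate. The trait name, the fixed f32 sample type, and the 48 kHz placeholder are all assumptions.

// Hypothetical buffer abstraction; SpeechProvider::Buffer could then be
// bounded as `type Buffer: SpeechBuffer;`.
pub trait SpeechBuffer {
    // iterate over the PCM samples so the SR can hand them to PipeWire or any other sink
    fn samples(&self) -> Box<dyn Iterator<Item = f32> + '_>;
    // sample rate in Hz, if backends are allowed to pick their own
    fn sample_rate(&self) -> u32;
}

// A plain Vec<f32> can satisfy it, so backends that just return raw samples stay simple.
impl SpeechBuffer for Vec<f32> {
    fn samples(&self) -> Box<dyn Iterator<Item = f32> + '_> {
        Box::new(self.iter().copied())
    }

    fn sample_rate(&self) -> u32 {
        48_000 // placeholder: a static rate; custom rates are still an open question
    }
}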