Many usability thoughts #362
generrosity
started this conversation in
General
Hello!
I have some ideas from using the app. Would you prefer these as separate 'Issues'?
I'm hoping many of these things are not "structural" and are more "UI quality of life" stuff.
I tested on llava-hf/llava-v1.6-mistral-7b-hf as my second model as it was quick for testing.
Easy stuff:
GPU status - if the logs contain proof of whether the CPU or the GPU was detected, you could show it by adding it to the end of the line, displaying an icon or emoji in the button or draggable title, or just colouring something green vs red. 🎨𓇲
Pin the "start auto-captioning" button to the top of the panel (outside of the scroll area) so that it can't go offscreen and is always available to use. It's a minor irritation to keep scrolling back to the top when working with settings further down (or accidentally scrolling the whole panel when I meant to scroll a textarea).
Text boxes look the same - caption output and prompt in particular - but caption output can't be edited, and Ctrl+C doesn't work in it (right-click copy does work). The fix might just be changing the colour of the background and text (like HTML) or changing the border colour (like Windows does).
Expanding textboxes (caption output and prompt) - this might be as easy as turning on an option or adding a section slider, and it would save a lot of scrolling, especially when reading error logs.
Image selection scroll bar - it seems to only be able to show images perfectly aligned with the top of the box. It felt very unnatural to use until I realized one tick was one image; maybe it is a specific item-list component that was reused so that keyboard movement works right. My suggestion would be to use a scroll container here (and ignore the keyboard, or refocus after a keyboard event, depending on which way it goes).
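For the GPU status idea, here is a rough Python sketch of how the check might look. It assumes the app runs its models through PyTorch, which is a guess on my part; `detect_device` and `status_label` are made-up names and not anything from the app's code.

```python
def detect_device() -> str:
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        # Assumption: the app uses PyTorch. If it is not installed,
        # fall back to reporting CPU instead of crashing.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def status_label(device: str) -> str:
    """Turn the detected device into a short green/red label for the UI."""
    return "🟢 GPU" if device == "cuda" else "🔴 CPU"
```

Something like `status_label(detect_device())` could then be appended to the button text or the panel title.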
Fairly easy stuff:
Beep beep - when a model has finished downloading, or when Start Auto-Captioning finishes, play a low note or trigger a notification sound. It just helps with not knowing when it will finish.
I have ADHD and a poor memory. It would be amazing if the app could show in the Model dropdown menu;
Running other models is baffling. Some can't be run in 4-bit, and some require a specific prompt template. Could I suggest making wiki pages on GitHub named after each model, and having a button to open your website? Then we can read the latest person's edit on prompts or setup.
If prompts are very model-dependent, could the app remember the prompt per model? Otherwise I'm keeping a notepad on the side to maintain the configurations and settings.
Does the app scan subdirectories?
Does the app only output text files, or does it also apply the captions directly to the image file? If not, what exiftool.exe command would consume them?
My experience with Python says I can get the GPU working if I uninstall a module and reinstall it (at the end) - is there an easy way for me to dig into that? I'm only scratching the surface of Python on a local PC.
Is there a way to automatically "test" models against files and see the results? I was thinking of a PowerShell for-loop, but the program's syntax for all the models and settings was escaping me.
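To show what I mean by remembering the prompt per model, here is a minimal Python sketch that keeps one prompt per model ID in a small JSON file. The file location and the function names (`load_prompts`, `remember_prompt`, `recall_prompt`) are hypothetical; I don't know how the app actually stores its settings.

```python
import json
from pathlib import Path


def load_prompts(store: Path) -> dict:
    """Read the saved {model_id: prompt} mapping, or {} if nothing is saved yet."""
    if not store.exists():
        return {}
    return json.loads(store.read_text(encoding="utf-8"))


def remember_prompt(store: Path, model_id: str, prompt: str) -> None:
    """Save (or overwrite) the prompt for one model."""
    prompts = load_prompts(store)
    prompts[model_id] = prompt
    store.write_text(json.dumps(prompts, indent=2), encoding="utf-8")


def recall_prompt(store: Path, model_id: str, default: str = "") -> str:
    """Return the prompt last saved for this model, or the default."""
    return load_prompts(store).get(model_id, default)
```

The app could call `remember_prompt` whenever captioning starts and `recall_prompt` whenever the Model dropdown changes, so each model gets back the prompt it was last used with.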
Thanks for the read :) ~generrosity