
Conversation

@RandomGamingDev

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.

Fixes # (issue)

Checklist

Please ensure the following items are complete before submitting a pull request:

  • My code follows the code style of the project.
  • I have updated the documentation (if applicable).
  • I have added tests to cover my changes.

Type of Change

Please check the relevant option below:

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation update (non-breaking change which updates documentation)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Additional Notes

The demo is now capable of loading larger models (e.g. Google's Gemma 3n E2B & E4B) without crashing outright.
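As a rough illustration of streamed loading with the MediaPipe Tasks GenAI web API (not necessarily the exact code in this PR; the WASM CDN path and model filename below are placeholders), the idea is to hand the runtime a stream reader instead of one giant buffer:

```ts
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

// Load the GenAI WASM runtime (CDN path is a placeholder).
const genai = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
);

// Rather than reading the whole .task file into a single ArrayBuffer
// (which can exhaust browser memory for multi-GB models like Gemma 3n
// E2B/E4B), pass the fetch body's stream reader so the runtime can
// consume the model in chunks.
const response = await fetch('gemma-3n-E2B-it-int4.task'); // placeholder filename
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: {
    modelAssetBuffer: response.body!.getReader(),
  },
});
```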

…dels by implementing buffered loading system allowing for the loading of larger models like Gemma 3n E2B & E4B
@RandomGamingDev
Author

@tyrmullen
Apologies in advance for the ping, but it seems like a lot of people have struggled with the same issue without being able to get past it, and you're the most active contributor on the repository (which I appreciate; I love Google's work on AI efficiency, and especially edge AI :D).

@tyrmullen
Contributor

Thanks! For this particular sample/demo, we're hoping to keep things as simple as possible for first-time users. In particular, we wanted this demo to (a) be super easy to run and (b) keep the code pretty minimal.

To that end, we just submitted a few changes to simplify things. Instead of requiring users to serve models themselves locally with Python, we switched to a simple "file chooser" button, so users can now simply open the demo .html file in their browser and immediately try out any models they have (no Python required). As a side effect, the models skip the remote fetching, so the streaming load should be very fast (and we also ensure streaming loading occurs for the file upload by using modelAssetBuffer: modelStream.getReader() to create the LlmInference).
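In rough sketch form (the input element id and surrounding wiring here are just illustrative, not the exact demo code), that file-chooser flow looks something like:

```ts
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

// Assumes the demo .html has something like <input type="file" id="modelFile">.
const input = document.getElementById('modelFile') as HTMLInputElement;

input.addEventListener('change', async () => {
  const file = input.files?.[0];
  if (!file) return;

  const genai = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm' // placeholder CDN path
  );

  // File.stream() yields a ReadableStream over the local .task file, so the
  // model is streamed in chunks instead of being copied into one huge buffer.
  const modelStream = file.stream();
  const llm = await LlmInference.createFromOptions(genai, {
    baseOptions: { modelAssetBuffer: modelStream.getReader() },
  });

  console.log(await llm.generateResponse('Hello!'));
});
```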

For more advanced users, like those interested in local caching, the other demos (LLM chat and 3N) should be a better and more complete reference, so we're focusing our local caching examples there just to keep this code a little smaller (since local caching can be a bit tricky, is domain-specific, can be done in a few different ways, and isn't for everyone).

@RandomGamingDev
Author


That sounds great! This can be closed now, then.
