These new announcements of APUs that can address much more VRAM got me wondering. I doubt I'd enjoy their tok/s on their own, but what if I keep using my GPU to run a draft model while a (very) big model runs on an APU across the LAN? The network overhead may not be pretty, but there should still be a significant speedup over using just the APU.
And if all of my VRAM is free to spend on draft models, what about chaining them? I'd have enough to split between a big draft model plus a tiny draft model that drafts for the big draft model.
Maybe it's crazy, but I figured I'd put it out there.
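To make the chaining idea concrete, here's a toy sketch, not based on any real API; the `ToyModel` class and its `generate`/`verify` methods are made-up stand-ins for the tiny draft, the mid-size draft on my GPU, and the big model on the APU. It only does greedy prefix acceptance, but it shows how each level pays one verification pass per batch of drafted tokens:

```python
"""Toy sketch of chained (two-level) speculative decoding.
Everything here is hypothetical: each ToyModel just plays back a fixed
token sequence, standing in for a real tiny/mid/big model."""


class ToyModel:
    def __init__(self, answer):
        # 'answer' is the sequence this model would produce greedily.
        self.answer = answer

    def generate(self, context, n_draft):
        # Cheap autoregressive drafting: propose the next n_draft tokens.
        start = len(context)
        return self.answer[start:start + n_draft]

    def verify(self, context, proposal):
        # One batched "forward pass": accept the longest prefix of the
        # proposal that matches what this model would have produced,
        # then append one token of its own, as in speculative decoding.
        start = len(context)
        accepted = []
        for i, tok in enumerate(proposal):
            if start + i < len(self.answer) and self.answer[start + i] == tok:
                accepted.append(tok)
            else:
                break
        nxt = start + len(accepted)
        if nxt < len(self.answer):
            accepted.append(self.answer[nxt])
        return accepted


def chained_step(tiny, mid, big, context, n_draft=8):
    # Level 1 (local GPU): tiny drafts, the mid-size draft model verifies.
    mid_tokens = mid.verify(context, tiny.generate(context, n_draft))
    # Level 2 (over LAN): mid's accepted tokens become the draft that the
    # big model on the APU verifies in a single round trip.
    return big.verify(context, mid_tokens)


if __name__ == "__main__":
    target = [1, 2, 3, 4, 5, 6, 7, 8]
    tiny = ToyModel([1, 2, 3, 9, 9, 9, 9, 9])   # agrees on the first 3 tokens
    mid = ToyModel([1, 2, 3, 4, 5, 9, 9, 9])    # agrees on the first 5
    big = ToyModel(target)                       # the model we actually want
    context = []
    while len(context) < len(target):
        context += chained_step(tiny, mid, big, context)
    print(context)  # -> [1, 2, 3, 4, 5, 6, 7, 8]
```

In this picture, the expensive hop over the LAN to the big model happens once per batch of accepted tokens rather than once per token, which is where the win over APU-only decoding would have to come from.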