I recently bought the AI HAT 2 with its dedicated RAM for offloading.
I'm wondering what the implications of this design are for things such as AirLLM https://github.com/lyogavin/airllm
Specifically, claims such as:
> you can run 405B Llama3.1 on 8GB vram now.
I think this would have implications for tool calling and other properties of models.
All the pre-built models seem to be 1.5B, which feels undersized for the AI HAT.
I'm wondering if anyone else is working on this?
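For context, the "405B on 8GB VRAM" claim rests on layer-by-layer inference: only one transformer layer's weights are resident at a time, loaded from storage, applied, then freed before the next layer loads. Here is a toy Python sketch of that idea only; the function names are illustrative and are not AirLLM's actual API:

```python
# Toy sketch of layer-by-layer offloading (the idea behind AirLLM's claim).
# Instead of holding all layers in memory at once, load one layer,
# apply it to the activations, then free it before loading the next.

def load_layer(i):
    # Stand-in for reading layer i's weights from disk / flash.
    # Here each "layer" just adds its own index to the input.
    return lambda x: x + i

def run_layered(num_layers, x):
    for i in range(num_layers):
        layer = load_layer(i)   # only this layer is in memory
        x = layer(x)            # run it on the activations
        del layer               # free it before the next load
    return x

print(run_layered(4, 0))  # 0 + 1 + 2 + 3 = 6
```

Peak memory is then roughly one layer plus activations rather than the whole model, at the cost of re-reading weights from storage every token, so I/O bandwidth to the HAT's RAM would presumably be the bottleneck.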
Statistics: Posted by LewisCowles1986 — Sat Jan 24, 2026 9:00 pm