This looks incredible:
https://prismml.com/news/bonsai-8b
This could be the perfect model for phones: Qwen 3-level intelligence in about 1 GB of RAM, at reportedly 5x higher speeds.
As the demo shows, they have the 8B model running on an iPhone at very reasonable speeds.
They have a modified llama.cpp that supports it:
https://github.com/ggml-org/llama.cpp
Whitepaper here:
https://github.com/PrismML-Eng/Bonsai-demo/blob/main/1-bit-bonsai-8b-whitepaper.pdf
It's already available in AnythingLLM:
https://github.com/Mintplex-Labs/anything-llm
If it could be made to work with this app, it might even make sense as the default model, perhaps with options to switch to the even smaller models they've also released.
It would be amazing if this app were the first to ship this model to Android users. 😄