Making my local LLM voice assistant faster and more scalable with RAG

If you read my previous blog post, you probably already know that I like my smart home open-source and very local, and that certainly includes any voice assistant I may have. If you watched the video demo, you have probably also found out that it’s… slow. Trust me, I did too. Prefix caching helps, but it feels like cheating. Sure, it’ll look amazing in a demo, but as soon as I start using my LLM for other things (which I do, quite often), that cache is going to get evicted and that first prompt is still going to be slow.
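To make that eviction problem concrete, here's a toy sketch (all names hypothetical, not any real inference server's API): a single-slot stand-in for the KV cache a server keeps around. As long as every request shares the same long system prompt, you get cache hits and a fast first token, but one unrelated request evicts the prefix and the next assistant query pays the full prompt-processing cost again.

```python
import time


class PrefixCache:
    """Toy single-slot prefix cache, loosely mimicking server-side KV caching."""

    def __init__(self):
        self.cached_prefix = None

    def generate(self, prompt: str, prefix: str) -> None:
        if self.cached_prefix == prefix:
            print("cache hit: skip re-processing the system prompt (fast)")
        else:
            print("cache miss: re-process the full prefix (slow first token)")
            time.sleep(0.5)  # stand-in for prompt-processing latency
        self.cached_prefix = prefix  # the most recent prefix wins the slot


cache = PrefixCache()
ASSISTANT_PREFIX = "<long smart-home system prompt>"

cache.generate("turn on the lights", ASSISTANT_PREFIX)    # miss: slow
cache.generate("what's the weather?", ASSISTANT_PREFIX)   # hit: fast
cache.generate("summarize this article", "<other task>")  # other use evicts it
cache.generate("turn off the lights", ASSISTANT_PREFIX)   # miss again: slow
```

Real servers hold more than one slot, but the failure mode is the same: the cache only helps while the assistant is the sole tenant.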
