If you’re using the Home Assistant voice assistant mechanism (not Alexa/Google/etc.) how’s it working for you?
Given there are a number of knobs you can turn, what do you use and what works well?
- Wake word model. There are the default models and custom ones
- Conversation agent and model
- Speech to text models (e.g. speech-to-phrase or whisper)
- Text to speech models
The biggest challenge, in my experience, is finding hardware that hears you well and that you can hear well.
On my PC’s USB mic, when I’m sitting directly in front of it, everything works quite well. Once I start stepping away, things start to get funky.
We’ve been using the previews since they shipped. The Mycroft wake word has worked well enough for the whole family. We tried the chatbot fallback, but the intent parser’s syntax is strict enough that we were getting routed to the LLM way more than we wanted. For example, we’d ask it to turn on a light and Claude would tell us it couldn’t do that. It fails faster and more reliably with just the intent parser.
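For anyone wondering why a strict parser plus a fallback ends up sending so much to the LLM, here is a rough sketch of the routing idea. It is not how Home Assistant implements it: the patterns, `match_intent`, and `route` are all made up for illustration, and the `llm` argument is just any function that takes text and returns a reply.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical strict sentence patterns, standing in for a real intent parser.
PATTERNS = {
    "turn_on": re.compile(r"^turn on (?:the )?(?P<name>.+)$"),
    "turn_off": re.compile(r"^turn off (?:the )?(?P<name>.+)$"),
}


def match_intent(text: str) -> Optional[Tuple[str, str]]:
    """Return (intent, target name) if the text fits a pattern exactly, else None."""
    cleaned = text.strip().lower().rstrip(".!?")
    for intent, pattern in PATTERNS.items():
        found = pattern.match(cleaned)
        if found:
            return intent, found.group("name")
    return None


def route(text: str, llm: Optional[Callable[[str], str]] = None) -> str:
    """Try the strict parser first; only hand off to the LLM when nothing matches."""
    matched = match_intent(text)
    if matched:
        intent, name = matched
        return f"local:{intent}:{name}"
    if llm is not None:
        # Any phrasing the patterns don't cover lands here.
        return "llm:" + llm(text)
    # No fallback configured: fail immediately instead of waiting on a model.
    return "error:no intent matched"
```

With this toy version, "Turn on the kitchen light." is handled locally, while "could you turn the kitchen light on" matches no pattern and either goes to the fallback or fails right away, which is the trade-off described above.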
Our favorite use case is shopping lists. “Hey Mycroft add greens to groceries list” is great and won me some WAF. I also regularly use timers, some custom commands (hey Mycroft I fed the dog), and managing lights with scenes (hey Mycroft turn on Daytime).
I’m hoping to one day transition to a local LLM that’s fine-tuned for Home Assistant-specific tasks, and it looks like some good ones will arrive soon. The existing implementations haven’t won me over yet.
Dunno, I’m a big fan and the wife doesn’t hate them, so I’m really optimistic about the future of these. I think HA is going about them the right way and we’ll see good things in the future. It’s probably a little rough right now if you’re not willing to put up with the quirks, but I think it’s just going to keep getting better.
Forgot to add, something I really want to figure out is how to do reminders with it. I’m stuck using Gemini on my phone for that and I’d really love to find a way to do that in HA if anyone has any tips.
Me too, so I started looking around. https://community.home-assistant.io/t/calendar-notifications-actions/612326
I don’t have time to try it right now, but would this work?
HA Voice Preview is cool but it’s a toy compared to Alexa. It’s not loud enough and doesn’t pick out the wake word well enough. Incredibly cool pipeline and ability to tweak it, though. Once the hardware improves I’d love to replace all my Echos.
I only ever set timers and checked the weather when I had a Google Home Mini, and Voice Preview is able to do that pretty well.
I have speech to text with whisper working on my phone. HA is set to the default assistant on my phone so I can control lights, timers and scenes from 1 action on my phone or watch. Works well once I set friendly names for each light, room, device and scene.
Picked up a preview edition last year and it just kind of sits there.
I really need to get it running for basic automation tasks but finding the time to research good tutorials seems to be eluding me.
I also have a preview edition.
I moved HA from my server to an HA Green to separate it out for reliability (my server is a test bed and its uptime isn’t great, and home automation warrants better uptime than I was giving it).
The voice services don’t work as well directly on the Green. I view voice as part of the HA ecosystem and want it running on the same hardware, but the Green doesn’t seem like a great option for that. Even on my own hardware, it was a bit slower than I’d want and not always accurate. I definitely need to do a lot of tweaking (just like OP) to make it worthwhile.
I use the HA Voice Preview in two different rooms and got rid of my Alexa Dots. I’ve been trying both speech-to-phrase and whisper (medium.en, running on the GPU) for STT, and I’ve tried llama3.2 and granite4 for the LLM, with local command handling.
I’ve been trying to get it working better, but it’s been a struggle. The wake word responds to me, but not my girlfriend’s voice. I try setting timers, and it says done, but never triggers the timer.
I’d love to improve operating performance of my assistant, but want to know what options work well for others. I’ve been experimenting with an intermediary STT proxy to send it to both whisper and speech-to-phrase to see which one has more confidence.
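The selection step of a proxy like that could look roughly like the sketch below. Big assumptions: each engine is wrapped in a hypothetical function that takes the audio bytes and returns a `Transcript` with a confidence already normalized to 0.0-1.0. Whether whisper and speech-to-phrase actually expose scores that can be compared that way is exactly the open question, so treat `Transcript`, `Backend`, and `pick_best` as illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Transcript:
    engine: str        # label for the STT engine that produced the text
    text: str
    confidence: float  # assumed to be normalized to 0.0-1.0 by the wrapper


# Hypothetical wrapper signature: raw audio in, Transcript (or None) out.
Backend = Callable[[bytes], Optional[Transcript]]


def pick_best(
    audio: bytes,
    backends: Iterable[Backend],
    min_confidence: float = 0.0,
) -> Optional[Transcript]:
    """Run every backend on the same audio and keep the most confident result."""
    candidates = []
    for backend in backends:
        try:
            result = backend(audio)
        except Exception:
            # One engine failing should not take down the whole request.
            continue
        if result is None or not result.text.strip():
            continue
        if result.confidence >= min_confidence:
            candidates.append(result)
    # max() with default=None covers the case where nothing usable came back.
    return max(candidates, key=lambda r: r.confidence, default=None)
```

Calling the engines one after another adds their latencies together, so a real proxy would probably want to run them concurrently. The `min_confidence` cutoff gives you a way to return nothing rather than a bad guess.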
I have an S3-BOX-3 and it works, sort of. But without a local LLM and better local speech to text it’s not super useful.


