Trying Home Assistant’s “Year of Voice” AI

Home Assistant is arguably a staple of home labs for IoT devices, and this year the project has been running a “Year of Voice” campaign around its efforts to enable voice assistant functionality. So I decided to take a peek and see how well it works.

For those who aren’t familiar with Home Assistant, it’s a platform that you can run on a server, Raspberry Pi, or similar hardware for creating Internet of Things automations. For example, you can have it automatically turn on your lights at sunset, or send you a notification if a water leak is detected.

However, the functionality I mainly want to look at is the Assist tool. It started as a text chat within Home Assistant: you type what you want to do, and it runs the text through an intent parser to figure out which actions to take.

This year, as part of the “Year of Voice,” Home Assistant has been creating integrations to allow voice interaction with Assist. For example, you can use Google, Microsoft, or Amazon’s speech recognition and text-to-speech. It’s actually a clever use of AI in my opinion: instead of re-engineering the entire Assist tool, it simply transcribes your speech to text, feeds that text to the already-existing chat, and then converts the response to audio using text-to-speech.
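To make that design concrete, here is a minimal sketch of wrapping a text-only assistant with speech on both ends. Everything in it is an assumption for illustration: `run_voice_pipeline` and the three `fake_*` stand-ins are hypothetical names, not Home Assistant APIs, and the stand-ins just pass bytes and strings around so the example runs without any models installed.

```python
from typing import Callable


def run_voice_pipeline(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    handle_text: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> bytes:
    """Wrap an existing text-only assistant with speech on both ends."""
    text = speech_to_text(audio)   # stage 1: transcribe the audio
    reply = handle_text(text)      # stage 2: the existing text chat, unchanged
    return text_to_speech(reply)   # stage 3: turn the reply back into audio


# Toy stand-ins (hypothetical) so the sketch runs without real models.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_assist(text: str) -> str:
    if text.lower() == "turn on the lights":
        return "Turned on the lights"
    return "Sorry, I didn't understand that"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


result = run_voice_pipeline(b"Turn on the lights", fake_stt, fake_assist, fake_tts)
```

The point of the shape is that the middle stage never needs to know whether its input was typed or spoken, which is why the speech pieces can be swapped between cloud and local models.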

Even cooler, you can use locally hosted AI models, so you aren’t dependent on paying a cloud provider and sacrificing your privacy.

For local speech recognition, Home Assistant uses a derivative of OpenAI’s Whisper called faster-whisper that’s optimized to run faster. And for local text-to-speech, Home Assistant uses a model called Piper.

Since my ISP manages the routers in my apartment complex and doesn’t like IoT devices, I’m testing Home Assistant in a virtual machine. It took me two attempts to set up Home Assistant correctly. I originally set it up via Docker, but it turns out that this doesn’t work with the local voice assistant add-ons. I ended up having to scrap that plan and start over, this time using the virtual disk import.

By default, Assist doesn’t have any real capabilities. Even after exposing the default services (like weather), it wasn’t able to answer any questions.

Annoyingly, you have to specify every sentence you want Assist to be able to match so it knows which integration to run. However, after a bit of tinkering, I was able to set up the Microsoft Teams notification integration so that when I said “Hello World” it would send me a “Hello!” message in a Teams channel.
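To show why every sentence has to be spelled out, here is a toy matcher in the same spirit. It is a sketch under stated assumptions, not Home Assistant’s actual intent parser: the template strings, the intent names (`SendTeamsHello`, `GetWeather`), and the bracket-for-optional-word convention are all made up for this example.

```python
import re

# Hypothetical sentence templates mapped to hypothetical intent names.
# A word in [square brackets] is optional. This is a toy, not the real parser.
TEMPLATES = {
    "hello world": "SendTeamsHello",
    "what is the weather [forecast]": "GetWeather",
}


def template_to_regex(template: str) -> re.Pattern:
    """Turn a template into a regex where every word is preceded by one space."""
    pattern = ""
    for word in template.split():
        if word.startswith("[") and word.endswith("]"):
            pattern += rf"(?: {re.escape(word[1:-1])})?"  # optional word
        else:
            pattern += rf" {re.escape(word)}"             # required word
    return re.compile(pattern)


def match_intent(text: str):
    """Return the intent whose template matches the whole sentence, else None."""
    # Normalize: lowercase, drop punctuation, and prefix each word with a space.
    words = re.sub(r"[^a-z0-9 ]", "", text.lower()).split()
    normalized = "".join(f" {w}" for w in words)
    for template, intent in TEMPLATES.items():
        if template_to_regex(template).fullmatch(normalized):
            return intent
    return None
```

With these templates, `match_intent("Hello World!")` finds the hello intent, but `match_intent("What's the weather?")` returns `None` because that exact phrasing was never listed, which mirrors the limitation described above.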

After learning a little about specifying sentences, I made another attempt at the weather. I added aliases of “weather” and “forecast” to the weather integration (in addition to already having exposed it to Assist). This proved more promising:

However, the excitement was short-lived when I tried a follow-up – despite my adding “forecast,” Assist wasn’t able to read the forecast from the weather integration. Additionally, when I asked for the humidity, it didn’t associate that with the integration either:

Adding “humidity” as an alias showed what was happening – the weather integration seems to only expose the current weather as a single value, regardless of which other parameters are available:
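A rough way to picture the gap, as a sketch rather than a description of Assist’s internals: an entity has one main state plus a bag of extra attributes, and a responder that only reads the state can never answer a humidity question. The entity ID, the values, and both helper functions below are hypothetical and made up purely for illustration.

```python
# Hypothetical snapshot of a weather entity; the ID and values are invented.
weather_entity = {
    "entity_id": "weather.home",
    "state": "cloudy",  # the single main value
    "attributes": {"temperature": 18, "humidity": 72},  # extra details
}


def state_only_answer(entity: dict) -> str:
    """Naive responder: whatever you ask, it reports the main state."""
    return str(entity["state"])


def attribute_answer(entity: dict, name: str):
    """Responder that looks up a named attribute, or None if it is absent."""
    value = entity["attributes"].get(name)
    return None if value is None else str(value)
```

Asking the first responder about humidity still yields `cloudy`, while the second can return the humidity because it looks past the main state.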

I find this somewhere between ironic and frustrating since I can see the values on the dashboard:

So, what lessons were learned? I found that Home Assistant’s voice AI works well – once I figured out how to install Home Assistant in a way that supported local AI, it worked seamlessly with Assist. In some additional testing, I was able to get Assist to transcribe about a paragraph before it cut me off.

However, the utility of the Assist tool is severely limited by three things: the out-of-box readiness of integrations, the need to pre-define sentences, and integrations not exposing data in a way Assist can use, even when that data is otherwise available in Home Assistant.

Ultimately, I’m looking forward to seeing where Assist will go, and it’s neat to see how much on-device voice capabilities have improved in the past couple of years. And given efforts like Home Assistant Green, which are designed to improve the out-of-box experience, I hope that using Assist will get easier.