Hi all,
I’m trying to build my own aiOS that I can control with my voice from anywhere, at any time. On my computer it’s easy: I’ve built a speech-to-text setup that works exactly like Wispr Flow, but runs locally.
The pipeline is: speech -> transcription with a local model -> text cleanup with a local Gemma 4 and a custom Python script with my keywords. It works amazingly well and it’s FAST. Great for prompting with my voice at zero cost.
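For context, the keyword cleanup step looks roughly like the sketch below. The word list and the `fix_keywords` name are made up for illustration; this is not my actual script.

```python
import re

# Hypothetical examples: mis-heard form -> the spelling I actually want.
KEYWORDS = {
    "a i o s": "aiOS",
    "drop box": "Dropbox",
}

def fix_keywords(text: str, keywords: dict[str, str] = KEYWORDS) -> str:
    """Replace known mis-transcriptions, case-insensitively, on word boundaries."""
    for wrong, right in keywords.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        # A function replacement avoids backslash handling in the replacement text.
        text = re.sub(pattern, lambda m, r=right: r, text, flags=re.IGNORECASE)
    return text

print(fix_keywords("save it to drop box in my A I O S"))
```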
The next step is a bit more difficult but still pretty OK. I want to do the same with my iPhone. The recording automation is easy: Shortcut button -> record audio -> tap to end the recording -> save to Dropbox (or iCloud).
The computer picks up this audio file and my desktop system transcribes it at the same high level, with all my personal words. I speak Dutch, so Claude’s speech-to-text is not as good as my personal system and has limits. So far so good, but now I’m stuck…
Sometimes a voice prompt is a command like ‘write me this e-mail’ (it runs my email skill with its knowledge base). Sometimes it will just be a brain dump saved in my ideas, and sometimes I need the text back on my phone: ‘Honey, I’ll be home by 5 and yes, I will put out the garbage’. This would be amazing for when I’m on the road, and it will be the future of my system. All voice, all Apple devices connected to my aiOS, which knows what to do.
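The pickup on the desktop side can be as simple as polling the synced folder. A minimal sketch, assuming the recordings land in one inbox folder; the suffix list and the `find_new_recordings` name are placeholders, not my real setup.

```python
from pathlib import Path

# Assumed audio formats; adjust to whatever the Shortcut actually saves.
AUDIO_SUFFIXES = {".m4a", ".wav", ".mp3"}

def find_new_recordings(inbox: Path, seen: set[str]) -> list[Path]:
    """Return audio files in `inbox` not handled before, and mark them as seen."""
    new = sorted(
        p for p in inbox.iterdir()
        if p.is_file()
        and p.suffix.lower() in AUDIO_SUFFIXES
        and p.name not in seen
    )
    seen.update(p.name for p in new)
    return new
```

Calling this every few seconds and handing each returned path to the transcriber would be enough for a first version. A sync client may show a file before the upload has finished, so a real version probably needs a check that the file size has stopped changing.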
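To make the question concrete, here is a rough sketch of the routing I have in mind. The trigger phrases, the action names and the `route` function are all hypothetical, and a small local LLM classifier could replace the prefix lookup.

```python
from dataclasses import dataclass

@dataclass
class Route:
    action: str  # "email", "reply_to_phone", or "idea"
    text: str    # the transcript without the spoken trigger phrase

# Hypothetical spoken prefixes mapped to actions.
TRIGGERS = {
    "write me this e-mail": "email",
    "send to my phone": "reply_to_phone",
}

def route(transcript: str) -> Route:
    """Pick an action from the start of the transcript; default to saving an idea."""
    cleaned = transcript.strip()
    lowered = cleaned.lower()
    for phrase, action in TRIGGERS.items():
        if lowered.startswith(phrase):
            rest = cleaned[len(phrase):].lstrip(" ,:.-")
            return Route(action, rest)
    return Route("idea", cleaned)
```

With this, anything without a known trigger falls through to the ideas list, so a brain dump never gets lost. What I don’t know is how to best wire the "reply_to_phone" result back to the iPhone.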
Anyone got a good idea for this architecture?