Your Claude API call is eating 1.2 seconds. Here is when that stops being acceptable.
Every automation you build has a round trip baked in. Claude API, OpenAI call, n8n webhook, vector DB lookup. Data leaves the device, travels to a data center, gets processed, comes back. On a good day a cached Claude call runs 600ms to 1.2s. On a bad day you are watching a spinner. Where that round trip breaks: - Real-time perception in vehicles or robotics - Industrial control loops that cannot wait on a network - Anywhere connectivity is spotty or intermittent - High-volume sensor data where shipping everything to the cloud burns bandwidth and budget Edge changes the math. Compute runs where the data is generated. Local work stays local. Cloud only sees what needs cross-site context. The split that is emerging: - Edge: real-time inference, event detection, filtering, local control - Cloud: model training, cross-site analytics, long-term storage, heavy compute On-device SLMs are usable now. Llama 3.1 8B on an M-series Mac via Ollama hits sub-100ms first token. Haiku-class reasoning is running on phones with NPUs. If your automation touches physical systems, live audio, live video, or anything latency-sensitive, you have a real placement decision to make: edge, cloud, or hybrid. The pattern I keep reaching for: route by cost of latency. If a one-second delay breaks the experience, run it local. If the step needs cross-context memory or a frontier model, send it up. Cheap decisions at the edge, expensive ones in the cloud, with the edge doing the filter so the cloud only sees signal. The architecture question has changed. Not "which cloud model do I call," but "where does each step of this pipeline need to run." What is running in your stack right now that should not be making a cloud round trip? What is stopping you from moving it?