Are WebSockets enough for AI chat?

WebSockets are the right protocol for production AI chat. But that fact doesn’t prevent the failure most teams hit first. An enterprise load balancer closes the idle connection at 60 seconds during a tool execution wait. Your reconnect logic fires in under a second, the agent keeps running server-side, and the client receives nothing from the gap. No tokens, no tool call results, no context. The reconnected socket has no view of what happened while it was down.
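One common way to close that gap is to keep a server-side log of everything the agent emits and let the reconnecting client ask for whatever it missed. The sketch below is a minimal in-memory illustration of that idea, not Ably's implementation or any SDK's API: `SessionLog`, `ClientView`, `SessionEvent`, and the `seq` field are all hypothetical names, and a real system would persist the log and carry these events over an actual socket.

```typescript
// Hypothetical event shape: every message the agent emits gets a serial number.
interface SessionEvent {
  seq: number;
  kind: "token" | "tool_result";
  data: string;
}

// Server-side log of everything emitted in a session, independent of any socket.
class SessionLog {
  private events: SessionEvent[] = [];

  append(kind: SessionEvent["kind"], data: string): SessionEvent {
    const event: SessionEvent = { seq: this.events.length + 1, kind, data };
    this.events.push(event);
    return event;
  }

  // Everything emitted after the last serial the client confirmed.
  replayAfter(lastSeq: number): SessionEvent[] {
    return this.events.filter((e) => e.seq > lastSeq);
  }
}

// Client-side state that survives a dropped socket.
class ClientView {
  lastSeq = 0;
  received: string[] = [];

  deliver(event: SessionEvent): void {
    if (event.seq <= this.lastSeq) return; // ignore duplicates
    this.lastSeq = event.seq;
    this.received.push(event.data);
  }
}

const log = new SessionLog();
const client = new ClientView();

// Connected: events reach the client as they are produced.
client.deliver(log.append("token", "Looking that up"));

// Socket closed by the load balancer: the agent keeps appending, nothing is delivered.
log.append("tool_result", "{\"rows\": 3}");
log.append("token", "Found 3 rows.");

// Reconnect: the client sends its last serial and the server replays the gap.
for (const event of log.replayAfter(client.lastSeq)) {
  client.deliver(event);
}

console.log(client.received);
```

The design point is that the session state lives in the log rather than in the connection, so a reconnect becomes a replay request instead of a fresh start.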

We built a Custom Transport for Vercel's AI SDK

Ably is a realtime messaging platform: a pub/sub product where you publish messages to channels, and clients subscribed to those channels receive them in realtime. It turns out that the Ably realtime platform is really well suited to being the transport that sits between your AI models and the clients receiving the generated responses.

Conversation tree branching in @ably/ai-transport

Picture a developer pair-programming with an AI assistant. The model returns a function that almost works. The developer asks it to try again. The second attempt is worse. They want the first one back. In a linear chat, that history is gone, or it's a third bubble in the thread that pollutes context for every future turn.

The model is fine. The session is broken.

Take any AI agent demo from the last six months. It works. Now ship it to real users on real networks, real devices, real attention spans. A meaningful share of those users will never finish their first conversation cleanly. Not because the model gave a bad answer. Because the connection dropped, the tab refreshed, the phone took over from the laptop, or the spinner kept spinning forever.

Why we built a dedicated SDK for realtime AI streaming

If you've built a conversational AI feature, you know the pattern. Client sends a message, backend calls a model, response streams back over HTTP. SSE mostly, or WebSockets if you need bidirectional. For a single user on a single device, it works well. The trouble is the best AI products right now have moved well past that.

Why production AI needs a session layer, not just a stream

I spoke at AI Engineer Europe last week, and came away with a clearer picture of where the industry actually is right now. My talk was about why AI user experience breaks at the transport layer. But the bigger takeaway wasn't from my own session. It was from watching what the rest of the room was building, and what problems they were running into.

The Durable Sessions stack is forming

By Matt O'Riordan, CEO and Co-Founder

Across AI infrastructure right now, one word is doing a lot of work: durable. It is attached to execution. To agents. To workflows. To sessions. To streams. To transports. To memory. Every few weeks, another product ships with "durable" in the name. This is not branding noise. The underlying observation is the same in every case. AI systems are long-lived. They can fail at any layer. They need infrastructure that assumes failure rather than hopes against it.

Ably Python SDK v3: realtime for Python, built for AI

Python dominates AI development. It's where teams build their agents, orchestration layers, and the backend systems that turn LLM calls into products people actually use. Over the past year, those systems have matured rapidly. What used to live in notebooks and prototypes is now running in production, serving real users with real expectations around reliability and performance. That maturity brings infrastructure requirements. Tokens need to stream in order.