Background
Daniel and his co-founder had a clear product vision: an AI layer that deflects repetitive support tickets before they reach a human agent. The technical challenge wasn't the AI logic — it was the infrastructure around it. Streaming responses, graceful tool call handling, conversation memory with vector retrieval, and background job processing for async operations. Building that from scratch would have taken the entire first sprint. And it would have been brittle — production AI infrastructure has edge cases that only emerge under real load.
The challenge
The team needed AI infrastructure that was production-grade from day one: streaming chat with proper error boundaries, tool call patterns for routing tickets to the right knowledge base, vector memory with Qdrant for retrieving relevant support context, and BullMQ background jobs for async operations like ticket categorization. Each piece had to integrate cleanly with the others. Building from scratch risked two weeks of integration work before a single line of product logic.
How they built it
Three handlers adapted in an afternoon
ShipAI ships with 11 AI handlers covering streaming, tool use, multi-provider configuration, memory retrieval, and more. Daniel identified the three most relevant for his use case — the streaming chat handler, the tool call routing handler, and the vector retrieval handler — and adapted them to SupportLayer's domain. System prompts updated, tool schemas adjusted, Qdrant collection pointed at their support knowledge base. The underlying streaming and error handling infrastructure remained unchanged.
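As a rough illustration of the streaming-handler shape described above, here is a hypothetical sketch: an async generator that forwards text chunks and converts a mid-stream failure into a structured error chunk instead of a broken response. The `Chunk` type, function names, and the stub model stream are illustrative, not ShipAI's actual API.

```typescript
// A streamed response is a sequence of typed chunks; errors become chunks too.
type Chunk =
  | { type: "text"; text: string }
  | { type: "error"; message: string };

// Error boundary around the model stream: tokens pass through, failures
// surface as a final error chunk the client can render as a fallback.
async function* streamChat(
  source: AsyncIterable<string>
): AsyncGenerator<Chunk> {
  try {
    for await (const token of source) {
      yield { type: "text", text: token };
    }
  } catch (err) {
    yield {
      type: "error",
      message: err instanceof Error ? err.message : "stream failed",
    };
  }
}

// Demo: a fake model stream that fails after two tokens.
async function* fakeModel(): AsyncGenerator<string> {
  yield "Hello";
  yield " there";
  throw new Error("provider timeout");
}

async function main() {
  const chunks: Chunk[] = [];
  for await (const c of streamChat(fakeModel())) chunks.push(c);
  console.log(JSON.stringify(chunks));
}

main();
```

The point of the pattern is that product code only ever sees well-formed chunks, so the system prompt and tool schemas can change without touching the streaming plumbing.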
Qdrant vector memory for support context
SupportLayer needed to retrieve relevant documentation and past resolutions during a support conversation. The Qdrant integration in ShipAI was already wired to the AI SDK — Daniel ingested their support knowledge base into a named collection and pointed the retrieval handler at it. The vector search and embedding pipeline were already handled; he wrote the query logic and result formatting.
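The query-and-format step Daniel wrote might look something like the sketch below. An in-memory array stands in for the Qdrant collection and hand-made vectors stand in for the embedding model; only the top-k search and context formatting mirror the handler's job. All names (`SupportDoc`, `topK`, `formatContext`) are illustrative.

```typescript
// A stored support document with its embedding vector.
type SupportDoc = { id: string; text: string; vector: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Nearest-neighbor search: rank docs by similarity to the query vector.
function topK(query: number[], docs: SupportDoc[], k: number): SupportDoc[] {
  return [...docs]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k);
}

// Format the hits into a context block to inject ahead of generation.
function formatContext(hits: SupportDoc[]): string {
  return hits.map((d, i) => `[${i + 1}] ${d.text}`).join("\n");
}

const docs: SupportDoc[] = [
  { id: "kb-1", text: "Reset your password from the login page.", vector: [1, 0] },
  { id: "kb-2", text: "Invoices are emailed on the 1st of each month.", vector: [0, 1] },
];

const hits = topK([0.9, 0.1], docs, 1);
console.log(formatContext(hits));
```

In the real handler, the embedding call and the nearest-neighbor search run against the named Qdrant collection; the product-specific work is deciding what to index and how to present the hits to the model.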
BullMQ for async ticket processing
Some operations — categorizing tickets, updating external systems, sending escalation notifications — couldn't block the chat response. The BullMQ integration was pre-configured for background job processing. Daniel defined a ticket processing worker following the existing worker pattern and had async operations running in under an hour.
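A minimal in-process sketch of that worker pattern, for illustration only: real code would use BullMQ's `Queue` and `Worker` against Redis, but a plain array makes the enqueue/process/retry shape visible. The job payload, the `categorize` stub, and the category names are assumptions, not SupportLayer's actual logic.

```typescript
// A queued ticket-categorization job.
type TicketJob = { ticketId: string; body: string };

// Placeholder for the async AI categorization the real worker performs.
function categorize(body: string): string {
  return body.toLowerCase().includes("refund") ? "billing" : "general";
}

// Worker body with a simple retry loop, standing in for BullMQ's
// per-job attempts/backoff configuration.
function processJob(job: TicketJob, maxAttempts = 3): string {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return categorize(job.body);
    } catch {
      if (attempt === maxAttempts) {
        throw new Error(`job ${job.ticketId} failed after ${maxAttempts} attempts`);
      }
    }
  }
  throw new Error("unreachable");
}

// Enqueue work from the request path; the worker drains it off the
// critical path so the chat response is never blocked.
const queue: TicketJob[] = [
  { ticketId: "T-101", body: "I want a refund for last month" },
];

for (const job of queue) {
  console.log(`${job.ticketId} -> ${processJob(job)}`);
}
```

The value of following an established worker pattern is exactly what the case study describes: queue registration, error handling, and retry configuration are already decided, so a new worker is mostly the job body.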
OpenTelemetry traces for production monitoring
Once the bot was running in production, Daniel used the built-in OpenTelemetry tracing to monitor handler latency, identify slow retrieval operations, and trace the full path of problematic tickets. The trace viewer in the admin panel made it possible to debug production issues without instrumenting anything manually.
Outcomes
40% tier-1 ticket deflection
Within two weeks of deployment, the SupportLayer bot was handling 40% of inbound tier-1 support volume without any human intervention.
AI infrastructure built in 2 days
Streaming, tool use, vector retrieval, and background jobs — all production-grade, all operational in under 48 hours.
Zero streaming infrastructure code written
All streaming response handling, error boundaries, and retry logic came from the adapted handlers. Daniel wrote only product-specific logic.
Production issues debugged in minutes with traces
The OpenTelemetry trace viewer caught two latency issues in the first week that would have taken hours to diagnose without observability.
In their own words
“The AI handler architecture is the part I keep referencing when I talk to other founders. Most boilerplates give you a toy chat example. These are production patterns — streaming with proper error handling, tool call routing that actually works under load, memory retrieval that integrates cleanly with the chat flow. I built on them, I didn't fight them.”
“The AI handler architecture is the part I keep coming back to. Eleven pre-wired handlers covering streaming, tool use, and memory — I adapted three of them for our support deflection bot in an afternoon. We're now deflecting about 40% of tier-1 tickets without touching the support team's workflow.”
— Daniel Hoffmann
Frequently asked questions
How does the vector retrieval work in SupportLayer's case?
SupportLayer ingested their support documentation and resolved ticket history into a Qdrant collection. The retrieval handler embeds the incoming user question, runs a nearest-neighbor search, and injects the top results into the AI context before generating a response.
What made the BullMQ integration easy to adopt?
The worker pattern was already established in the codebase. Daniel defined his ticket processing worker following the existing pattern — same queue registration, same error handling structure, same retry configuration. He didn't need to learn BullMQ from scratch or design the worker architecture.
What does the 40% deflection mean in practice?
For every 100 inbound support requests, 40 are now fully handled by the AI bot — the user gets a resolution and never opens a ticket. The other 60 get routed to humans with context already attached, which also reduces human handling time.