How to Build an AI SaaS with Next.js in 2026
A practical breakdown of the architecture, stack decisions, and implementation steps for building a production-ready AI SaaS on Next.js—from first commit to first paying customer.
Building an AI SaaS in 2026 is cheaper and faster than ever. LLM APIs are commoditized, hosting is serverless, and payment rails are pre-built. What takes time is the infrastructure glue: auth, billing, rate limiting, observability, and keeping your AI costs from running away.
This guide covers the architecture decisions, exact stack, and implementation sequence for a production-ready AI SaaS—not a prototype, a shippable product.
What "AI SaaS" Actually Means
An AI SaaS is a subscription product where the core value is delivered by a language model or ML model. The business model requires you to:
- Manage user accounts and sessions.
- Gate features by plan tier.
- Track usage (tokens, requests, compute) and enforce limits.
- Monetize usage without losing margin to LLM costs.
Your product is not the model. Your product is the workflow, interface, and reliability around the model.
Stack Decisions
These choices are defensible on their merits, not cargo-culted:
| Layer | Choice | Why |
|---|---|---|
| Framework | Next.js 15+ (App Router) | Server Actions + Route Handlers cover 90% of AI patterns cleanly |
| Database | PostgreSQL + Drizzle ORM | Type-safe queries, migrations you control, no ORM magic |
| Auth | Auth.js v5 | OAuth + sessions with Drizzle adapter, zero custom backend |
| Billing | Stripe | Webhooks, metered billing, customer portal are first-class |
| LLM | OpenAI / Groq | OpenAI for reliability, Groq for speed-sensitive features |
| Vector DB | Qdrant | Self-hostable, fast, works well with Next.js via REST API |
| Cache | Redis (Upstash) | Rate limiting, session cache, background jobs |
| Observability | OpenTelemetry → Axiom | Traces across AI calls, latency visibility |
| Deployment | Vercel or Docker + Railway | Vercel for zero-config, Docker for cost predictability at scale |
Don't over-engineer on day 1
Skip the vector DB and Redis until you need them. PostgreSQL with pgvector handles embeddings fine for the first 10k users. Add Redis when rate limiting becomes a bottleneck.
Architecture Overview
A minimal AI SaaS has three planes:
User plane: Auth, dashboard, settings, billing portal.
AI plane: Route handlers that call LLM APIs, stream responses, persist results.
Admin plane: Usage dashboards, user management, revenue tracking.
```
src/
  app/
    (auth)/          # Login, signup pages
    (app)/           # Protected user-facing app
      dashboard/
      settings/
    api/
      ai/            # AI route handlers
      billing/       # Stripe webhooks
    (admin)/         # Admin panel
  lib/
    ai.ts            # LLM client wrappers
    db.ts            # Drizzle client
    billing.ts       # Stripe helpers
    limits.ts        # Plan limit enforcement
```
Implementation Sequence
Build in this order. Each step is shippable:
1. Database + schema: Create `users`, `subscriptions`, and `ai_requests` tables. Run migrations. This is your source of truth for everything else.
2. Auth: Wire Auth.js with at least one OAuth provider. Protect /dashboard and all app routes via middleware. Verify sign-in, sign-out, and session persistence before moving on.
3. First AI feature: Create one Route Handler that calls an LLM API and streams the response. No billing gates yet. Just prove the core feature works end-to-end.
4. Usage tracking: Every AI request logs tokens used to `ai_requests`. Add user ID, model, input tokens, output tokens, cost estimate. You need this before billing.
5. Plan limits: Enforce request/token quotas per plan tier. Check limits before calling the LLM. Return 402 Payment Required when exceeded.
6. Stripe billing: Create products and prices in Stripe. Wire checkout and webhooks. Sync subscription status to your DB.
7. Admin panel: Build a minimal dashboard showing daily active users, AI usage, and revenue. Required for operations, not just vanity.
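Steps 1 and 4 assume tables along these lines. A minimal Drizzle schema sketch — table and column names here are illustrative assumptions, not a prescribed layout:

```ts
// src/db/schema.ts — minimal sketch; adjust columns to your product
import { pgTable, text, integer, numeric, timestamp } from "drizzle-orm/pg-core";

export const users = pgTable("users", {
  id: text("id").primaryKey(),
  email: text("email").notNull().unique(),
  plan: text("plan").notNull().default("free"), // free | pro | enterprise
});

export const subscriptions = pgTable("subscriptions", {
  id: text("id").primaryKey(),
  userId: text("user_id").notNull().references(() => users.id),
  stripeSubscriptionId: text("stripe_subscription_id"),
  status: text("status").notNull(), // active | past_due | canceled
});

export const aiRequests = pgTable("ai_requests", {
  id: text("id").primaryKey(),
  userId: text("user_id").notNull().references(() => users.id),
  model: text("model").notNull(),
  inputTokens: integer("input_tokens").notNull(),
  outputTokens: integer("output_tokens").notNull(),
  costEstimate: numeric("cost_estimate"),
  createdAt: timestamp("created_at").notNull().defaultNow(),
});
```

Every later step queries these tables: `limits.ts` counts rows in `ai_requests`, billing syncs into `subscriptions`, and the admin panel aggregates both.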
Building the AI Route Handler
The core of your product. Here's a production-grade streaming handler:
```ts
// src/app/api/ai/chat/route.ts
import { auth } from "@/auth";
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";
import { checkUsageLimit } from "@/lib/limits";
import { logAIRequest } from "@/lib/ai-logger";

export async function POST(req: Request) {
  const session = await auth();
  if (!session?.user) {
    return new Response("Unauthorized", { status: 401 });
  }

  const { canProceed, reason } = await checkUsageLimit(session.user.id);
  if (!canProceed) {
    return new Response(reason, { status: 402 });
  }

  const { messages } = await req.json();

  const result = await streamText({
    model: openai("gpt-4o-mini"),
    messages,
    onFinish: async ({ usage }) => {
      await logAIRequest({
        userId: session.user.id,
        model: "gpt-4o-mini",
        inputTokens: usage.promptTokens,
        outputTokens: usage.completionTokens,
      });
    },
  });

  return result.toDataStreamResponse();
}
```
Usage Limit Enforcement
Don't call the LLM first and check limits after. Check first:
```ts
// src/lib/limits.ts
import { db } from "@/lib/db";
import { aiRequests, users } from "@/db/schema";
import { and, eq, gte, sql } from "drizzle-orm";

const PLAN_LIMITS = {
  free: { dailyRequests: 10, monthlyTokens: 50_000 },
  pro: { dailyRequests: 200, monthlyTokens: 2_000_000 },
  enterprise: { dailyRequests: Infinity, monthlyTokens: Infinity },
};

export async function checkUsageLimit(userId: string) {
  const user = await db.query.users.findFirst({
    where: eq(users.id, userId),
  });
  const plan = (user?.plan ?? "free") as keyof typeof PLAN_LIMITS;
  const limits = PLAN_LIMITS[plan];

  const today = new Date();
  today.setHours(0, 0, 0, 0);

  const dailyCount = await db
    .select({ count: sql<number>`count(*)` })
    .from(aiRequests)
    // Drizzle's .where() takes a single condition — combine with and()
    .where(and(eq(aiRequests.userId, userId), gte(aiRequests.createdAt, today)));

  if ((dailyCount[0]?.count ?? 0) >= limits.dailyRequests) {
    return {
      canProceed: false,
      reason: `Daily request limit reached for ${plan} plan. Upgrade to continue.`,
    };
  }

  // Monthly token quota: sum tokens since the first of the calendar month.
  const monthStart = new Date();
  monthStart.setDate(1);
  monthStart.setHours(0, 0, 0, 0);

  const monthly = await db
    .select({
      total: sql<number>`coalesce(sum(${aiRequests.inputTokens} + ${aiRequests.outputTokens}), 0)`,
    })
    .from(aiRequests)
    .where(and(eq(aiRequests.userId, userId), gte(aiRequests.createdAt, monthStart)));

  if ((monthly[0]?.total ?? 0) >= limits.monthlyTokens) {
    return {
      canProceed: false,
      reason: `Monthly token limit reached for ${plan} plan. Upgrade to continue.`,
    };
  }

  return { canProceed: true, reason: null };
}
```
Cost Management
LLM costs are the biggest risk to margins in an AI SaaS. Control them:
- Model tiers: Use `gpt-4o-mini` or Groq's Llama models for bulk requests. Reserve `gpt-4o` for features users explicitly pay a premium for.
- Prompt caching: OpenAI automatically caches prompt prefixes over 1,024 tokens. Keep system prompts long and stable so the cached prefix hits.
- Rate limiting: Redis token bucket per user ID prevents a single user from burning your monthly budget in an afternoon.
- Alerts: Set hard spend limits in the OpenAI dashboard and configure billing alerts at 50%, 80%, 100% of budget.
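The rate-limiting bullet above is a token bucket. In production the bucket state lives in Redis so limits hold across serverless instances (Upstash publishes a ready-made `@upstash/ratelimit` package); the in-memory version below just shows the algorithm:

```ts
// Minimal in-memory token bucket: refills at `refillPerSec`, caps at `capacity`.
type Bucket = { tokens: number; lastRefill: number };

export class TokenBucket {
  private buckets = new Map<string, Bucket>();
  constructor(private capacity: number, private refillPerSec: number) {}

  // Returns true if the request is allowed, false if the user is rate limited.
  take(userId: string, now: number = Date.now()): boolean {
    const b = this.buckets.get(userId) ?? { tokens: this.capacity, lastRefill: now };
    // Refill proportionally to elapsed time, capped at capacity.
    const elapsedSec = (now - b.lastRefill) / 1000;
    b.tokens = Math.min(this.capacity, b.tokens + elapsedSec * this.refillPerSec);
    b.lastRefill = now;
    if (b.tokens < 1) {
      this.buckets.set(userId, b);
      return false; // bucket empty: reject
    }
    b.tokens -= 1;
    this.buckets.set(userId, b);
    return true;
  }
}
```

For example, `new TokenBucket(10, 0.5)` allows a burst of 10 requests, then a sustained rate of one request every two seconds per user.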
Set a hard spend cap before launch
Configure an OpenAI hard limit under Settings → Billing → Usage limits. Without it, a single runaway request loop can generate thousands of dollars in costs before you notice.
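The cost-estimate column logged in the usage-tracking step can be computed from token counts. A sketch — the per-million-token prices below are illustrative assumptions, so verify them against your provider's current price sheet before relying on the numbers:

```ts
// Per-million-token prices in USD. Illustrative only — check your
// provider's current pricing; these are assumptions, not quotes.
const PRICES: Record<string, { input: number; output: number }> = {
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "gpt-4o": { input: 2.5, output: 10 },
};

export function estimateCostUSD(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const p = PRICES[model];
  if (!p) return 0; // unknown model: log zero rather than guess
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```

Summing this column per user per day is also the cheapest early-warning signal for a runaway loop.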
Tradeoffs
Server-side AI calls vs. client-side
Calling the OpenAI API server-side (Route Handler) hides your API key, lets you enforce limits, and lets you log usage. Never call AI APIs directly from the client with your production key.
Streaming vs. non-streaming
Streaming (via Vercel AI SDK's streamText) shows faster perceived response time. Non-streaming is simpler and easier to log completely. For chat interfaces, always stream. For background tasks, skip it.
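On the client, a streamed response body is consumed incrementally. The AI SDK's `useChat` hook handles this for you; the sketch below just shows the underlying mechanism with the standard `ReadableStream` reader API:

```ts
// Read a streamed Response body chunk by chunk, invoking onChunk as
// text arrives, and return the full accumulated text at the end.
export async function consumeTextStream(
  stream: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let full = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true }); // handles split UTF-8
    full += text;
    onChunk(text); // e.g. append to React state for incremental rendering
  }
  return full;
}
```

In a chat UI you would call this with `response.body` from `fetch("/api/ai/chat", ...)` and append each chunk to state.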
Self-hosted models vs. API
Self-hosted models (Ollama, vLLM) give cost predictability at scale but require GPU infrastructure and ops overhead. Not practical until you're generating significant revenue. Start with APIs.
Verification
- AI endpoint returns 401 for unauthenticated requests.
- AI endpoint returns 402 after daily limit is reached.
- `ai_requests` table records every call with correct token counts.
- Streaming responses render incrementally in the UI.
- No API keys visible in browser network tab.
Next Steps
- How to Handle Stripe Webhooks in Next.js — production-grade billing for your AI SaaS.
- Next-Auth Setup for SaaS — auth with roles and plan enforcement.
- ShipAI comes with 11 pre-built AI handlers, usage tracking, and Stripe billing wired together—skip the boilerplate and ship your feature.