Mar 2, 2026 · ShipAI Team

How to Build an AI SaaS with Next.js in 2026

A practical breakdown of the architecture, stack decisions, and implementation steps for building a production-ready AI SaaS on Next.js—from first commit to first paying customer.

AI · SaaS · Architecture · Next.js


Building an AI SaaS in 2026 is cheaper and faster than ever. LLM APIs are commoditized, hosting is serverless, and payment rails are pre-built. What takes time is the infrastructure glue: auth, billing, rate limiting, observability, and keeping your AI costs from running away.

This guide covers the architecture decisions, exact stack, and implementation sequence for a production-ready AI SaaS—not a prototype, a shippable product.

What "AI SaaS" Actually Means

An AI SaaS is a subscription product where the core value is delivered by a language model or ML model. The business model requires you to:

  1. Manage user accounts and sessions.
  2. Gate features by plan tier.
  3. Track usage (tokens, requests, compute) and enforce limits.
  4. Monetize usage without losing margin to LLM costs.

Your product is not the model. Your product is the workflow, interface, and reliability around the model.

Stack Decisions

These are defensible choices with good reasons, not cargo cult:

| Layer | Choice | Why |
| --- | --- | --- |
| Framework | Next.js 15+ (App Router) | Server Actions + Route Handlers cover 90% of AI patterns cleanly |
| Database | PostgreSQL + Drizzle ORM | Type-safe queries, migrations you control, no ORM magic |
| Auth | Auth.js v5 | OAuth + sessions with Drizzle adapter, zero custom backend |
| Billing | Stripe | Webhooks, metered billing, customer portal are first-class |
| LLM | OpenAI / Groq | OpenAI for reliability, Groq for speed-sensitive features |
| Vector DB | Qdrant | Self-hostable, fast, works well with Next.js via REST API |
| Cache | Redis (Upstash) | Rate limiting, session cache, background jobs |
| Observability | OpenTelemetry → Axiom | Traces across AI calls, latency visibility |
| Deployment | Vercel or Docker + Railway | Vercel for zero-config, Docker for cost predictability at scale |

Don't over-engineer on day 1

Skip the vector DB and Redis until you need them. PostgreSQL with pgvector handles embeddings fine for the first 10k users. Add Redis when rate limiting becomes a bottleneck.
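The pgvector route needs only a migration and an index. A minimal sketch — the table name, column sizes, and index choice here are illustrative, and the dimension must match your embedding model (1536 is correct for OpenAI's text-embedding-3-small):

```sql
-- Enable pgvector (available on Neon, Supabase, and RDS).
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE document_embeddings (
  id        bigserial PRIMARY KEY,
  user_id   text NOT NULL,
  content   text NOT NULL,
  embedding vector(1536) NOT NULL
);

-- Approximate nearest-neighbor index for cosine similarity.
CREATE INDEX ON document_embeddings
  USING hnsw (embedding vector_cosine_ops);

-- Top-5 most similar documents for a query embedding:
-- SELECT content FROM document_embeddings
-- ORDER BY embedding <=> $1 LIMIT 5;
```

The `<=>` operator is pgvector's cosine distance; swap the operator class and operator together if you index for L2 or inner product instead.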

Architecture Overview

A minimal AI SaaS has three planes:

User plane: Auth, dashboard, settings, billing portal.

AI plane: Route handlers that call LLM APIs, stream responses, persist results.

Admin plane: Usage dashboards, user management, revenue tracking.

src/
  app/
    (auth)/         # Login, signup pages
    (app)/          # Protected user-facing app
      dashboard/
      settings/
      api/
        ai/         # AI route handlers
        billing/    # Stripe webhooks
    (admin)/        # Admin panel
  lib/
    ai.ts           # LLM client wrappers
    db.ts           # Drizzle client
    billing.ts      # Stripe helpers
    limits.ts       # Plan limit enforcement

Implementation Sequence

Build in this order. Each step is shippable:

Database + schema: Create users, subscriptions, ai_requests tables. Run migrations. This is your source of truth for everything else.

Auth: Wire Auth.js with at least one OAuth provider. Protect /dashboard and all app routes via middleware. Verify sign-in, sign-out, and session persistence before moving on.
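"Protect /dashboard and all app routes" can be expressed as a pure path matcher that your `middleware.ts` calls. This is a sketch, not Auth.js's API: the prefix list is an assumption based on this guide's route groups, and the commented wrapper mirrors the Auth.js v5 `auth()` middleware pattern.

```typescript
// Hypothetical prefix list derived from this guide's route groups.
const PROTECTED_PREFIXES = ["/dashboard", "/settings", "/admin", "/api/ai"];

export function isProtectedPath(pathname: string): boolean {
  return PROTECTED_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + "/")
  );
}

// middleware.ts would look roughly like (Auth.js v5 pattern):
// export default auth((req) => {
//   if (!req.auth && isProtectedPath(req.nextUrl.pathname)) {
//     return Response.redirect(new URL("/login", req.nextUrl));
//   }
// });
```

Keeping the matcher as a plain function makes the "verify sign-in, sign-out, and session persistence" step easy to unit test without spinning up Next.js.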

First AI feature: Create one Route Handler that calls an LLM API and streams the response. No billing gates yet. Just prove the core feature works end-to-end.

Usage tracking: Every AI request logs tokens used to ai_requests. Add user ID, model, input tokens, output tokens, cost estimate. You need this before billing.
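The cost estimate column can be derived from the token counts at log time. A minimal sketch — the per-million-token prices below are illustrative and go stale; verify them against your provider's current pricing page:

```typescript
// Illustrative per-1M-token prices in USD (assumption — check current pricing).
const MODEL_PRICES: Record<string, { input: number; output: number }> = {
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "gpt-4o": { input: 2.5, output: 10 },
};

export function estimateCostUSD(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const price = MODEL_PRICES[model];
  if (!price) return 0; // unknown model: record 0 and alert, don't throw
  return (
    (inputTokens / 1_000_000) * price.input +
    (outputTokens / 1_000_000) * price.output
  );
}
```

Store the estimate alongside the raw token counts so you can re-price historical rows when rates change.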

Plan limits: Enforce request/token quotas per plan tier. Check limits before calling the LLM. Return 402 Payment Required when exceeded.

Stripe billing: Create products and prices in Stripe. Wire checkout and webhooks. Sync subscription status to your DB.
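In the webhook handler you'd normally call `stripe.webhooks.constructEvent`, which verifies the `Stripe-Signature` header for you. As a sketch of what that verification does under the hood (Stripe signs `"{timestamp}.{payload}"` with HMAC-SHA256 using your webhook secret) — this standalone version is for illustration, not a replacement for the SDK:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of Stripe's documented signature scheme; prefer the SDK in production.
export function verifyStripeSignature(
  payload: string,
  sigHeader: string, // e.g. "t=1712345678,v1=abc123..."
  secret: string,
  toleranceSeconds = 300,
  now = Math.floor(Date.now() / 1000)
): boolean {
  const parts = Object.fromEntries(
    sigHeader.split(",").map((kv) => kv.split("=") as [string, string])
  );
  const timestamp = Number(parts["t"]);
  // Reject stale timestamps to limit replay attacks.
  if (!timestamp || Math.abs(now - timestamp) > toleranceSeconds) return false;
  const expected = createHmac("sha256", secret)
    .update(`${parts["t"]}.${payload}`)
    .digest("hex");
  const given = parts["v1"] ?? "";
  if (given.length !== expected.length) return false;
  return timingSafeEqual(Buffer.from(given), Buffer.from(expected));
}
```

Two details matter in any version of this: verify against the raw request body (not a re-serialized JSON object), and use a constant-time comparison.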

Admin panel: Build a minimal dashboard showing daily active users, AI usage, and revenue. Required for operations, not just vanity.

Building the AI Route Handler

The core of your product. Here's a production-grade streaming handler:

// src/app/api/ai/chat/route.ts
import { auth } from "@/auth";
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";
import { checkUsageLimit } from "@/lib/limits";
import { logAIRequest } from "@/lib/ai-logger";

export async function POST(req: Request) {
  const session = await auth();

  if (!session?.user) {
    return new Response("Unauthorized", { status: 401 });
  }

  const { canProceed, reason } = await checkUsageLimit(session.user.id);

  if (!canProceed) {
    return new Response(reason, { status: 402 });
  }

  const { messages } = await req.json();

  const result = await streamText({
    model: openai("gpt-4o-mini"),
    messages,
    onFinish: async ({ usage }) => {
      await logAIRequest({
        userId: session.user.id,
        model: "gpt-4o-mini",
        inputTokens: usage.promptTokens,
        outputTokens: usage.completionTokens,
      });
    },
  });

  return result.toDataStreamResponse();
}

Usage Limit Enforcement

Don't call the LLM first and check limits after. Check first:

// src/lib/limits.ts
import { db } from "@/lib/db";
import { aiRequests, users } from "@/db/schema";
import { and, eq, gte, sql } from "drizzle-orm";

const PLAN_LIMITS = {
  free: { dailyRequests: 10, monthlyTokens: 50_000 },
  pro: { dailyRequests: 200, monthlyTokens: 2_000_000 },
  enterprise: { dailyRequests: Infinity, monthlyTokens: Infinity },
};

export async function checkUsageLimit(userId: string) {
  const user = await db.query.users.findFirst({
    where: eq(users.id, userId),
  });

  const plan = (user?.plan ?? "free") as keyof typeof PLAN_LIMITS;
  const limits = PLAN_LIMITS[plan];

  const today = new Date();
  today.setHours(0, 0, 0, 0);

  const dailyCount = await db
    .select({ count: sql<number>`count(*)::int` })
    .from(aiRequests)
    .where(
      and(
        eq(aiRequests.userId, userId),
        gte(aiRequests.createdAt, today)
      )
    );

  if ((dailyCount[0]?.count ?? 0) >= limits.dailyRequests) {
    return {
      canProceed: false,
      reason: `Daily request limit reached for ${plan} plan. Upgrade to continue.`,
    };
  }

  return { canProceed: true, reason: null };
}

Cost Management

LLM costs are the biggest risk to margins in an AI SaaS. Control them:

  • Model tiers: Use gpt-4o-mini or Groq's Llama models for bulk requests. Reserve gpt-4o for features users explicitly pay premium for.
  • Prompt caching: OpenAI caches prompts over 1024 tokens. Keep system prompts long and stable.
  • Rate limiting: Redis token bucket per user ID prevents a single user from burning your monthly budget in an afternoon.
  • Alerts: Set hard spend limits in the OpenAI dashboard and configure billing alerts at 50%, 80%, 100% of budget.
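The token bucket can be prototyped in memory before reaching for Redis. This class is a hypothetical sketch — in production, back it with Redis (e.g. Upstash's `@upstash/ratelimit`) so limits survive restarts and apply across serverless instances:

```typescript
// In-memory token bucket per user. Each user starts with a full bucket
// (burst capacity) that refills continuously at a sustained rate.
type Bucket = { tokens: number; lastRefill: number };

export class RateLimiter {
  private buckets = new Map<string, Bucket>();

  constructor(
    private capacity: number,        // max burst size
    private refillPerSecond: number  // sustained requests per second
  ) {}

  allow(userId: string, now = Date.now()): boolean {
    const bucket =
      this.buckets.get(userId) ?? { tokens: this.capacity, lastRefill: now };
    const elapsedSeconds = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSeconds * this.refillPerSecond
    );
    bucket.lastRefill = now;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(userId, bucket);
    return allowed;
  }
}
```

Call `allow(userId)` before `checkUsageLimit` in the route handler: the bucket catches bursts within seconds, while the plan quota catches abuse over days.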

Set a hard spend cap before launch

Configure an OpenAI hard limit under Settings → Billing → Usage limits. Without it, a single runaway request loop can generate thousands of dollars in costs before you notice.

Tradeoffs

Server-side AI calls vs. client-side

Calling the OpenAI API server-side (Route Handler) hides your API key, lets you enforce limits, and lets you log usage. Never call AI APIs directly from the client with your production key.

Streaming vs. non-streaming

Streaming (via Vercel AI SDK's streamText) shows faster perceived response time. Non-streaming is simpler and easier to log completely. For chat interfaces, always stream. For background tasks, skip it.

Self-hosted models vs. API

Self-hosted models (Ollama, vLLM) give cost predictability at scale but require GPU infrastructure and ops overhead. Not practical until you're generating significant revenue. Start with APIs.

Verification

  • AI endpoint returns 401 for unauthenticated requests.
  • AI endpoint returns 402 after daily limit is reached.
  • ai_requests table records every call with correct token counts.
  • Streaming responses render incrementally in the UI.
  • No API keys visible in browser network tab.

Next Steps

Ready to ship?

Stop rebuilding auth and billing from scratch.

ShipAI.today gives you a production-ready Next.js foundation. Every module pre-integrated — spend your time building your product, not plumbing.

Full source code · Commercial license · Lifetime updates