# LangChain vs Building from Scratch: When to Use a Framework
Last updated: April 14, 2026
## TL;DR
I've built AI applications both ways — with LangChain and from scratch using the Anthropic SDK. My recommendation: for most production apps, skip LangChain and build a thin custom layer directly on the Anthropic SDK. LangChain adds abstraction complexity that makes debugging harder, locks you into opinions you'll fight against, and changes its API so frequently that maintenance becomes a second job. The exceptions are rapid prototyping where you need to demo something in a day, and genuinely complex multi-provider pipelines where you're orchestrating five different models across three providers. For everything else — and that's 90% of real-world AI products — a clean TypeScript wrapper around the Anthropic SDK gives you more control, better debuggability, and less code to maintain.
## What LangChain Does
LangChain is a framework that abstracts LLM interactions into composable building blocks. Chains, agents, memory, tools, retrievers, output parsers — it gives you a vocabulary and a set of patterns for building AI applications.
At its core, LangChain does three things:
- Provider abstraction — Write once, swap between OpenAI, Anthropic, Cohere, and others without changing your application code.
- Composable chains — Link multiple LLM calls together with data transformations, branching logic, and tool use.
- Pre-built integrations — Vector stores, document loaders, text splitters, and retrieval patterns come out of the box.
The pitch is compelling. Why write boilerplate when a framework handles it? Why build your own RAG pipeline when LangChain has one ready?
I bought that pitch. I used LangChain in three production projects in 2023. Then I ripped it out of two of them.
Here's why.
## When LangChain Helps
I want to be fair. LangChain isn't useless. There are specific situations where it earns its place in your dependency tree.
### Rapid prototyping
If you need to demo a concept in a hackathon or a client pitch, LangChain gets you from zero to working prototype faster than anything else. The ConversationalRetrievalChain can wire up a RAG chatbot in under 50 lines. For throwaway code that proves a point, that speed matters.
### Multi-provider orchestration
If your architecture genuinely requires routing between Claude for reasoning, GPT-4 for code generation, and a local model for classification — all in the same pipeline — LangChain's provider abstraction saves real work. I've seen this in exactly one production system. But it does exist.
### Exploration and learning
If you're new to AI development, LangChain's patterns teach you how to think about chains, memory, and retrieval. The concepts translate even if you later ditch the framework. I'd rather a junior developer learn LangChain's mental model than try to invent their own agent architecture from first principles.
### Complex agent systems with many tools
If you're building an agent that orchestrates 15+ tools with complex routing logic, LangChain's agent framework gives you a structure that's hard to replicate quickly. The tool calling abstractions, the agent executor loop, the built-in error recovery — that's real engineering you'd otherwise build yourself.
## When LangChain Hurts
This is the longer section, because this is where I've burned the most time.
### Abstraction tax
LangChain wraps everything. The Anthropic SDK becomes a ChatAnthropic class. Your prompt becomes a ChatPromptTemplate. Your output becomes an AIMessage parsed through an OutputParser. Each layer adds indirection that makes it harder to understand what's actually happening.
When things work, you don't notice. When things break — and in production, things always break — you're debugging through five layers of abstraction to find out that the actual issue is a malformed API parameter that would've been obvious if you were calling the SDK directly.
I once spent four hours debugging a streaming issue that turned out to be LangChain's internal callback handler silently swallowing an error from the Anthropic API. Four hours. The equivalent code with the raw SDK would've surfaced the error immediately.
### Version instability
LangChain's API changes constantly. I'm not exaggerating. Between versions 0.1 and 0.2, they restructured the entire package into langchain-core, langchain-community, and provider-specific packages. Import paths changed. Class names changed. Constructor signatures changed.
I had a production app that broke on a minor version bump because a method was renamed. Not deprecated — renamed. No migration guide. Just a runtime error in production at 2 AM.
For a framework that wraps stable underlying APIs, LangChain introduces an extraordinary amount of instability.
### Performance overhead
LangChain adds latency. Not dramatic latency — maybe 50-200ms per chain invocation depending on complexity. But in a user-facing chat application where perceived responsiveness matters, those milliseconds add up. The overhead comes from internal event dispatching, callback processing, and the chain execution engine.
With the raw Anthropic SDK, your request goes directly to the API. No middleware. No event bus. No chain executor. Just your code and the model.
### Opinionated patterns that don't fit
LangChain has opinions about how memory should work, how retrieval should be structured, and how agents should execute. Those opinions are reasonable defaults, but production applications rarely match the defaults.
I built a customer support system where conversation memory needed to persist across sessions, prioritize recent messages, and include metadata from the CRM. LangChain's memory classes got me 60% of the way there. The last 40% required subclassing, monkey-patching, and fighting the framework's assumptions at every turn. I would've been faster building the memory system from scratch.
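For scale: the recency-prioritization part of such a memory system is only a few lines once you own it. A hypothetical sketch, where the `StoredMessage` shape, the `recentWindow` name, and a character budget (standing in for real token counting) are all illustrative, not from the project above:

```typescript
// Illustrative message shape; in practice this maps to a messages table row.
type StoredMessage = { role: 'user' | 'assistant'; content: string; createdAt: number };

// Keep the newest messages that fit a rough character budget,
// returned in chronological order for the API call.
function recentWindow(history: StoredMessage[], budgetChars: number): StoredMessage[] {
  const newestFirst = [...history].sort((a, b) => b.createdAt - a.createdAt);
  const window: StoredMessage[] = [];
  let used = 0;
  for (const msg of newestFirst) {
    if (used + msg.content.length > budgetChars) break;
    window.push(msg);
    used += msg.content.length;
  }
  return window.reverse();
}
```

Session persistence and CRM metadata bolt onto this the same way: small, explicit functions over your own storage, with no framework assumptions to fight.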
## Building Custom with the Anthropic SDK
Here's what building from scratch actually looks like. It's not as scary as the framework advocates suggest.
A basic conversation with Claude using the Anthropic SDK:
```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

async function chat(messages: Anthropic.MessageParam[]): Promise<string> {
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    messages,
  });

  const textBlock = response.content.find((block) => block.type === 'text');
  return textBlock?.text ?? '';
}
```

That's it. No chains. No templates. No output parsers. You call the API, you get a response, you handle it.
Now here's the same thing with LangChain:
```typescript
import { ChatAnthropic } from '@langchain/anthropic';
import { HumanMessage, SystemMessage } from '@langchain/core/messages';
import { StringOutputParser } from '@langchain/core/output_parsers';

const model = new ChatAnthropic({
  modelName: 'claude-sonnet-4-20250514',
  maxTokens: 1024,
});

const parser = new StringOutputParser();

async function chat(userMessage: string): Promise<string> {
  const response = await model.invoke([
    new HumanMessage(userMessage),
  ]);
  return parser.invoke(response);
}
```

More imports. More classes. More indirection. And for what? The same HTTP request to the same API.
### Adding tool use
The gap widens with tool use. Here's tools with the Anthropic SDK:
```typescript
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  tools: [
    {
      name: 'get_weather',
      description: 'Get current weather for a city',
      input_schema: {
        type: 'object' as const,
        properties: {
          city: { type: 'string', description: 'City name' },
        },
        required: ['city'],
      },
    },
  ],
  messages: [{ role: 'user', content: 'What is the weather in London?' }],
});

// Handle tool use directly
for (const block of response.content) {
  if (block.type === 'tool_use') {
    const result = await executeToolCall(block.name, block.input);
    // Send result back to Claude
  }
}
```

It's explicit. You see exactly what's being sent to the API. You see exactly what comes back. When something goes wrong, you know where to look.
With LangChain, you'd create a DynamicStructuredTool, wire it into an AgentExecutor with a specific agent type, configure the agent's prompt template, and hope the chain execution handles the tool call loop correctly. It works — until it doesn't, and then you're reading LangChain source code instead of building your product.
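For comparison, the loop an AgentExecutor wraps is about twenty lines when written by hand: append the assistant turn, answer each `tool_use` block with a `tool_result`, and call the API again until `stop_reason` changes. A minimal sketch, with the API call and tool dispatch injected so the control flow is visible on its own; the type names here are structural stand-ins, not the SDK's:

```typescript
// Structural stand-ins for the SDK's message types, just enough to show the loop.
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown }
  | { type: 'tool_result'; tool_use_id: string; content: string };

type Message = { role: 'user' | 'assistant'; content: string | ContentBlock[] };
type ModelResponse = { stop_reason: string; content: ContentBlock[] };

// The API call and tool implementations are injected, so the loop itself
// is plain control flow you can read and test in isolation.
async function runToolLoop(
  create: (messages: Message[]) => Promise<ModelResponse>,
  execute: (name: string, input: unknown) => Promise<string>,
  messages: Message[],
): Promise<string> {
  while (true) {
    const response = await create(messages);

    // No more tool calls: return the model's final text.
    if (response.stop_reason !== 'tool_use') {
      const text = response.content.find((b) => b.type === 'text');
      return text && text.type === 'text' ? text.text : '';
    }

    // Echo the assistant turn, then answer each tool call with a tool_result.
    messages.push({ role: 'assistant', content: response.content });
    for (const block of response.content) {
      if (block.type === 'tool_use') {
        const result = await execute(block.name, block.input);
        messages.push({
          role: 'user',
          content: [{ type: 'tool_result', tool_use_id: block.id, content: result }],
        });
      }
    }
  }
}
```

Every turn of that loop is yours to log, time, and break on. There is no executor to configure and no agent type to choose.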
### Adding RAG
Even RAG — LangChain's supposed killer feature — isn't that complex to build from scratch:
```typescript
import Anthropic from '@anthropic-ai/sdk';
import { createClient } from '@supabase/supabase-js';

async function ragQuery(query: string): Promise<string> {
  const supabase = createClient(
    process.env.SUPABASE_URL!,
    process.env.SUPABASE_KEY!,
  );

  // Generate embedding for the query
  const embedding = await generateEmbedding(query);

  // Vector similarity search
  const { data: documents } = await supabase.rpc('match_documents', {
    query_embedding: embedding,
    match_threshold: 0.7,
    match_count: 5,
  });

  // Build context from retrieved documents
  const context = documents
    ?.map((doc: { content: string }) => doc.content)
    .join('\n\n');

  // Ask Claude with context
  const anthropic = new Anthropic();
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    system: `Answer based on this context:\n\n${context}`,
    messages: [{ role: 'user', content: query }],
  });

  const textBlock = response.content.find((block) => block.type === 'text');
  return textBlock?.text ?? '';
}
```

That's a complete RAG pipeline. Embed the query, search the vector store, inject context into the prompt, get a response. No framework needed. Every line is readable. Every step is debuggable.
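The one helper the pipeline leaves undefined is `generateEmbedding`, and it's a single HTTP call. A sketch assuming the OpenAI embeddings REST API and the `text-embedding-3-small` model (a Voyage AI call would follow the same shape), with the fetch function injectable so it can be stubbed in tests:

```typescript
// One possible generateEmbedding, using the OpenAI embeddings REST API.
// fetchFn is injectable so the helper can be tested without network access.
async function generateEmbedding(
  text: string,
  fetchFn: typeof fetch = fetch,
): Promise<number[]> {
  const res = await fetchFn('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: 'text-embedding-3-small', input: text }),
  });
  if (!res.ok) throw new Error(`Embedding request failed: ${res.status}`);
  const json = (await res.json()) as { data: { embedding: number[] }[] };
  return json.data[0].embedding;
}
```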
## Complexity Comparison
Let me quantify this. Here's a comparison of a production chat application with RAG, tool use, and streaming:
| Aspect | LangChain | Custom (Anthropic SDK) |
|---|---|---|
| Dependencies | 12+ packages | 2 packages (SDK + vector store) |
| Lines of code | ~800 | ~400 |
| Abstraction layers | 5-7 | 1-2 |
| Time to debug API issues | 30-60 min | 5-15 min |
| Version upgrade effort | High (breaking changes) | Low (stable SDK) |
| Type safety | Partial (lots of any) | Full (SDK is well-typed) |
| Bundle size impact | ~2.5 MB | ~200 KB |
| Time to initial prototype | 2 hours | 4 hours |
| Time to production-ready | 3 weeks | 2 weeks |
The prototype advantage goes to LangChain. But production readiness flips — because you spend that extra time fighting the framework instead of building features.
## Maintenance and Debugging
This is where the real cost lives. Building is a one-time activity. Maintenance is forever.
### The LangChain maintenance tax
Every LangChain update is a potential breaking change. I track three production apps that use LangChain. In 2024, I spent approximately 40 hours across those three apps just keeping up with LangChain API changes. That's a full work week of my time — not building features, not fixing bugs, just updating import paths and constructor signatures because the framework decided to restructure.
The Anthropic SDK, by comparison, has had exactly zero breaking changes that affected my production code in the same period. The API evolves, but the SDK team handles backwards compatibility responsibly.
### Debugging through layers
When a LangChain app produces wrong output, the debugging process looks like this:
1. Check the final output. Wrong.
2. Check the chain's intermediate steps. Did the retriever return the right documents? Did the prompt template format correctly? Did the output parser extract the right fields?
3. Check the actual API request. What did LangChain actually send to Claude? You need to enable verbose logging or add callback handlers to see this.
4. Check the actual API response. What did Claude actually return? Again, hidden behind abstraction.
5. Find the mismatch. Usually it's in how LangChain transformed the data between steps.
With the raw SDK:
1. Check the input you sent.
2. Check the output you received.
3. Fix the issue.
Three steps versus five. And steps 3-5 in the LangChain flow often require reading framework source code, which is a codebase that changes every month.
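That two-step inspection is worth institutionalizing. A sketch of the kind of thin helper I mean (the names are mine, not from any library): wrap any request function so the exact input and output are always logged.

```typescript
// Log the exact request and response around any async call.
// With the raw SDK there is nothing else to inspect: this is the full picture.
async function loggedCall<Req, Res>(
  label: string,
  request: Req,
  call: (request: Req) => Promise<Res>,
): Promise<Res> {
  console.log(`[${label}] request:`, JSON.stringify(request, null, 2));
  try {
    const response = await call(request);
    console.log(`[${label}] response:`, JSON.stringify(response, null, 2));
    return response;
  } catch (err) {
    // Errors surface immediately instead of disappearing into a callback handler.
    console.error(`[${label}] error:`, err);
    throw err;
  }
}
```

Usage would be `loggedCall('chat', params, (p) => anthropic.messages.create(p))`; steps one and two of the debugging flow are then just reading the log.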
## My Decision Framework
After two years of building AI applications, I use a simple decision tree:
Use LangChain if ALL of these are true:
- You're prototyping, not building for production
- You need multi-provider support (not just Anthropic)
- You're comfortable with frequent dependency updates
- Your team has LangChain experience
Build from scratch if ANY of these are true:
- You're building for production
- You're using a single LLM provider (most apps)
- You value debuggability over convenience
- You want full control over prompts, retry logic, and error handling
- Bundle size matters (frontend or edge deployment)
- You have strong TypeScript standards (LangChain's types are... loose)
In practice, "build from scratch" doesn't mean "build everything from nothing." It means building a thin application-specific layer on top of a well-maintained SDK. You're not reinventing HTTP clients or retry logic. You're just skipping the framework that sits between you and the API.
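As an example of how thin that layer is, here is the retry-with-exponential-backoff utility in full. The names and defaults are mine, a sketch rather than a prescription:

```typescript
// Exponential backoff with full jitter: attempt 0 waits roughly baseMs,
// doubling each attempt and capped at maxMs.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 30_000): number {
  const capped = Math.min(baseMs * 2 ** attempt, maxMs);
  return capped / 2 + Math.random() * (capped / 2);
}

// Retry an async call, sleeping between attempts; rethrows the last error.
// The sleep function is injectable so tests don't actually wait.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) await sleep(backoffDelay(attempt));
    }
  }
  throw lastError;
}
```

Wrap your SDK calls in `withRetry(() => anthropic.messages.create(params))` and you have the behavior most frameworks bury behind configuration, in code you can read in one sitting.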
## What I Actually Use in Production
Here's my actual production stack for AI applications:
| Layer | Choice |
|---|---|
| LLM | Anthropic SDK (directly) |
| Streaming | Vercel AI SDK (for Next.js streaming UI) |
| Vector store | Supabase pgvector (self-hosted, no vendor lock-in) |
| Embeddings | Voyage AI or OpenAI embeddings API |
| Validation | Zod (for structured output parsing) |
| Retry logic | Custom with exponential backoff (10 lines of code) |
| Memory | PostgreSQL (just a messages table with user_id and session_id) |
| Monitoring | Helicone or custom logging middleware |

No LangChain. No LlamaIndex. No framework sitting between my code and the model.
The Vercel AI SDK is the one "framework" I use, and it earns its place because it solves a genuinely hard problem — streaming AI responses to React Server Components — with a minimal, stable API. It doesn't try to own your entire AI stack. It handles the transport layer and gets out of the way.
For structured output, I use Zod schemas with Claude's tool use feature. Define the schema, pass it as a tool, extract the structured response. No output parser chain needed:
```typescript
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const SentimentSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
});

async function analyzeSentiment(text: string) {
  const anthropic = new Anthropic();

  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    tools: [
      {
        name: 'sentiment_result',
        description: 'Return the sentiment analysis result',
        input_schema: zodToJsonSchema(
          SentimentSchema,
        ) as Anthropic.Tool.InputSchema,
      },
    ],
    tool_choice: { type: 'tool', name: 'sentiment_result' },
    messages: [
      {
        role: 'user',
        content: `Analyze the sentiment of this text: "${text}"`,
      },
    ],
  });

  const toolBlock = response.content.find(
    (block) => block.type === 'tool_use',
  );
  return SentimentSchema.parse(toolBlock?.input);
}
```

Full type safety. Zod validation on the output. No framework magic. If Claude returns something that doesn't match the schema, Zod throws a clear error with the exact field that failed. Try getting that level of clarity from a LangChain output parser error.
## Key Takeaways
- LangChain is a prototyping tool, not a production framework. Use it to explore ideas quickly, then replace it with direct SDK calls before you ship.
- The Anthropic SDK is already simple. The "boilerplate" that LangChain saves you is maybe 20 lines of code. That's not worth 12 extra dependencies and five layers of abstraction.
- Debugging is the hidden cost. Every hour you save on initial development with LangChain, you'll spend two hours on debugging and maintenance later.
- Build a thin custom layer. Create utility functions for your specific use case — retry logic, structured output parsing, conversation memory. These will be smaller, faster, and more maintainable than any framework.
- Version stability matters. The Anthropic SDK is stable. LangChain is not. In production, stability is a feature.
- The exception proves the rule. If you genuinely need multi-provider orchestration or you're building a complex agent system with 20+ tools, LangChain might save you enough time to justify the complexity. But be honest about whether you actually need that.
The AI framework space is maturing. LangChain was essential in 2023, when LLM APIs were inconsistent and the patterns hadn't been established yet. By 2024, the SDKs themselves were good enough. The abstractions LangChain provides are no longer worth the complexity they introduce.
Build simple. Build direct. Ship code you can debug at 2 AM without reading someone else's framework source code.
If you're building an AI product and want to talk architecture, check out my services. I've helped teams strip out framework complexity and ship faster with leaner stacks.
*Uvin Vindula is a Web3 and AI engineer based between Sri Lanka and the UK. He builds production AI systems, smart contracts, and full-stack applications at iamuvin.com. Find him on X/Twitter at @iamuvin.*