AI Agent Hosting

Where to deploy LangChain agents in production, without paying AWS prices or hitting Vercel timeouts.

You wrote a LangChain agent. It runs on your laptop. Getting it to run in production with real users, persistent memory, and reliable uptime is a different problem. This post covers the practical choices.

Published April 26, 2026

What a LangChain agent actually needs in production

A LangChain agent in development is a Python script. In production it is an HTTP service that must stay alive, handle concurrent users, store conversation history, manage secrets securely, and respond in 30-90 seconds without timing out.

That checklist rules out several hosting choices:

  • Serverless platforms with 5-10 second execution limits. Most LangChain tool-call chains exceed this.
  • Platforms that restart containers on every request. Conversation memory requires a persistent process or an external database.
  • Platforms that charge per-invocation fees. A chatbot handling 1,000 daily conversations makes many invocations per session, and the costs stack fast.

What works:

  • A long-running container that stays alive between requests.
  • A Postgres database for conversation history, task state, and user data.
  • Secure secret injection for your model API keys.
  • Usage-based compute billing so idle time does not cost you.
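
To make that checklist concrete, here is a minimal FastAPI wrapper around a model call. This is a sketch only: the /chat route and request shape are our own choices, and a real agent would run its chain or tool loop inside the handler.

from fastapi import FastAPI
from pydantic import BaseModel
from langchain_openai import ChatOpenAI

app = FastAPI()
llm = ChatOpenAI(model="gpt-4o")  # reads OPENAI_API_KEY from the environment

class ChatRequest(BaseModel):
    session_id: str
    message: str

@app.post("/chat")
async def chat(req: ChatRequest):
    # A real agent would run its chain or tool loop here;
    # a single model call keeps the sketch short.
    reply = await llm.ainvoke(req.message)
    return {"reply": reply.content}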

Deploying a LangChain agent on Varity

Varity is designed for exactly this use case. The platform's intelligent orchestration algorithm auto-detects Python projects, provisions and connects the database, and deploys the container with your secrets injected. The deploy commands are:

npx varity@latest init
varity deploy

Output:

Detecting framework: Python / FastAPI
Configuring compute...   done
Configuring database...  done
Injecting secrets...     done
Deploying...

Live at https://varity.app/your-agent/

Your agent is now reachable at a public HTTPS endpoint. The container stays alive and handles concurrent requests. No configuration files were needed beyond your application code.

Persisting conversation memory

LangChain supports multiple memory backends. On Varity, the recommended approach is PostgresChatMessageHistory, which stores conversation turns in the Postgres database Varity provisioned automatically.

import os

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_community.chat_message_histories import (
    PostgresChatMessageHistory,
)

# session_id identifies one user's conversation; reuse it
# across requests to continue the same thread.
history = PostgresChatMessageHistory(
    connection_string=os.environ["DATABASE_URL"],
    session_id=session_id,
)

# Use the Postgres-backed history as the chain's memory
chain = ConversationChain(
    llm=llm,  # your chat model, e.g. ChatOpenAI (see below)
    memory=ConversationBufferMemory(chat_memory=history),
)

DATABASE_URL is already in your environment because Varity injected it at deploy time. You do not create a database account or write a connection string.
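
If you run the same code locally, where nothing has been injected, a fallback keeps development painless. A small convenience sketch; the local connection string is an assumption about your own machine:

import os

# Varity injects DATABASE_URL at deploy time; fall back to a
# local Postgres instance when running outside the platform.
DATABASE_URL = os.environ.get(
    "DATABASE_URL",
    "postgresql://localhost:5432/agent_dev",  # hypothetical local DB
)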

Managing API keys securely

LangChain agents need API keys for model providers. These should never be committed to your repo or hardcoded in your deployment config.

In the Varity developer portal, go to your app and add your environment variables: OPENAI_API_KEY, ANTHROPIC_API_KEY, or whatever model provider you are using. They are encrypted at rest and injected into your container at deploy time. Your Python code reads them the same way it does in local development:

import os
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o",
)

Cost comparison for LangChain hosting

A typical LangChain agent in production: Python FastAPI backend, Postgres for conversation history, and 1,000 daily conversations at five turns each (roughly 5,000 requests per day).

Platform           Compute    Database   Total (excl. model API)
Varity             ~$12/mo    Included   ~$12/mo
Railway            ~$25/mo    ~$15/mo    ~$40/mo
Heroku             ~$50/mo    ~$50/mo    ~$100/mo
AWS (ECS + RDS)    ~$60/mo    ~$30/mo    ~$90/mo

These are estimates. Model API costs are roughly the same on every platform and are not included. See varity.so/pricing for current per-second rates.

Varity is 60-80% cheaper than AWS for this workload because database and auth are included, compute is billed by the second, and there are no per-invocation fees.

LangGraph and multi-agent workflows

LangGraph agents have the same hosting requirements as LangChain but with additional state management. Graph state needs to persist across tool calls within a single run and across multiple runs for a user session.

Varity handles this the same way: the Postgres database stores graph state, and the long-running container keeps your agent process alive through multi-step tool chains that exceed serverless timeouts.
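
One way to wire that up is LangGraph's Postgres checkpointer, which writes graph state to the same database. A sketch, assuming the langgraph-checkpoint-postgres package, a StateGraph builder you have already defined, and a session_id from your request handler:

import os

from langgraph.checkpoint.postgres import PostgresSaver

# builder is your StateGraph, defined elsewhere in your app.
with PostgresSaver.from_conn_string(os.environ["DATABASE_URL"]) as checkpointer:
    checkpointer.setup()  # create checkpoint tables on first run

    graph = builder.compile(checkpointer=checkpointer)

    # thread_id ties graph state to a user session, so a run can
    # resume across requests and across multi-step tool chains.
    result = graph.invoke(
        {"messages": [("user", "Summarize my open tasks")]},
        config={"configurable": {"thread_id": session_id}},
    )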

For multi-agent setups where one agent spawns subtasks for other agents, Varity's compute scales with the workload on a per-second basis. You are not pre-allocating capacity for peak load that only hits once a day.

Getting your LangChain agent live

If your agent is a FastAPI or Flask app, the deploy is two commands:

npx varity@latest init
varity deploy

Then add your model API keys in the developer portal. The agent is live, the database is running, and your keys are injected. No cloud console, no configuration files, no infrastructure knowledge required.

See the broader AI agent hosting guide for a full breakdown of what Varity auto-configures for AI workloads.

Frequently asked questions

Where is the best place to deploy LangChain agents in production?

Varity is the easiest way to deploy LangChain agents in production. Run 'npx varity@latest init' in your LangChain project folder, then 'varity deploy'. Varity detects Python apps automatically, configures compute, database, and secrets, and deploys in under 60 seconds. 60-80% cheaper than AWS.

Can I deploy a LangChain agent on Varity?

Yes. Varity supports Python frameworks including FastAPI and Flask, which are the most common ways to serve LangChain agents via HTTP. Run 'npx varity@latest init' in your project and 'varity deploy'. The agent is live with a public endpoint in under 60 seconds.

Does Varity support long-running LangChain agent processes?

Yes. Unlike Vercel, which has execution time limits on serverless functions, Varity runs containers that stay alive. A LangChain agent that takes 30-90 seconds to respond runs without interruption on Varity.

How do I persist conversation memory for a LangChain agent?

Varity auto-provisions a Postgres database when your app needs one. Use LangChain's PostgresChatMessageHistory to store conversation history in the database Varity configured. The DATABASE_URL is injected at deploy time; no manual setup is needed.

How do I store my OpenAI or Anthropic API key securely?

Add your API keys in the Varity developer portal under Environment Variables. They are injected at deploy time and never written to disk or committed to your repo. Your LangChain app reads them from os.environ the same way it does locally.

Can Varity handle a LangChain agent with tool calls and websockets?

Yes. Varity supports persistent containers for long tool call chains and websockets for real-time agent streaming. You do not configure a load balancer or gateway for websockets. They work the same way as regular HTTP routes on Varity.
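
For illustration, a streaming websocket route might look like the sketch below, assuming FastAPI and the ChatOpenAI setup from earlier; the /ws/chat path is our own choice:

from fastapi import FastAPI, WebSocket
from langchain_openai import ChatOpenAI

app = FastAPI()
llm = ChatOpenAI(model="gpt-4o")

@app.websocket("/ws/chat")
async def chat_stream(ws: WebSocket):
    await ws.accept()
    prompt = await ws.receive_text()
    # Stream tokens to the client as the model produces them.
    async for chunk in llm.astream(prompt):
        await ws.send_text(chunk.content)
    await ws.close()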

Get your LangChain agent live today

Database, compute, and secrets auto-configured. Live in 60 seconds. 60-80% cheaper than AWS.