Middleware

Middleware is the most impressive feature in this update. Many new features, such as human-in-the-loop (HITL), dynamic system prompts, and dynamic context injection, are implemented through middleware. A middleware is essentially a hook function: by embedding middleware in a workflow, the workflow can be extended and customized efficiently.

LangChain creates custom middleware through decorators.

Decorator Types

DECORATOR           DESCRIPTION
@before_agent       Execute logic before the agent runs
@after_agent        Execute logic after the agent finishes
@before_model       Execute logic before each model call
@after_model        Execute logic after each model call returns a response
@wrap_model_call    Control the model's calling process
@wrap_tool_call     Control the tool's calling process
@dynamic_prompt     Dynamically generate system prompts
@hook_config        Configure hook behavior

The decorator type determines where the middleware executes. For example, the @before_model decorator makes the decorated function run before each model call; the function itself supplies the logic to execute there. If this sounds abstract, don't worry: this section walks through four examples, after which using middleware should feel natural:

  • Budget Control

  • Message Truncation

  • Sensitive Word Filtering

  • PII Detection (Personally Identifiable Information Detection)
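Before the LangChain examples, the "middleware as hook" idea can be illustrated in plain Python with no framework at all. Everything below (with_middleware, trim, logged) is a hypothetical sketch whose names merely mirror the decorators listed above:

```python
# A minimal, framework-free sketch of the "middleware as hook" idea.
# Nothing here uses LangChain; with_middleware, trim and logged are
# hypothetical names that only mirror the decorators above.

from typing import Callable, Optional

def fake_model(messages: list[str]) -> str:
    """Stand-in for an LLM call."""
    return f"reply to: {messages[-1]}"

def with_middleware(
    model: Callable[[list[str]], str],
    before: Optional[Callable[[list[str]], list[str]]] = None,
    wrap: Optional[Callable[..., str]] = None,
) -> Callable[[list[str]], str]:
    """Compose optional hooks around a model call, in the spirit of
    @before_model (runs first) and @wrap_model_call (controls the call)."""
    def call(messages: list[str]) -> str:
        if before is not None:
            messages = before(messages)   # e.g. trim the history
        if wrap is not None:
            return wrap(model, messages)  # e.g. swap models, retry, log
        return model(messages)
    return call

def trim(messages: list[str]) -> list[str]:
    """A 'before'-style hook: keep only the last two messages."""
    return messages[-2:]

def logged(model: Callable, messages: list[str]) -> str:
    """A 'wrap'-style hook: log, then delegate to the real model."""
    print(f"calling model with {len(messages)} messages")
    return model(messages)

agent = with_middleware(fake_model, before=trim, wrap=logged)
print(agent(["a", "b", "c"]))
# calling model with 2 messages
# reply to: c
```

The agent code never changes; behavior is extended purely by composing hooks around it, which is exactly what the decorators below do inside LangChain.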

1. Budget Control

As the number of conversation rounds increases, the conversation history becomes longer, leading to higher request costs. To control the budget, you can set up automatic switching to a lower-cost model when the conversation rounds exceed a certain threshold. Below we implement this feature using custom middleware.

import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse
from langchain.messages import HumanMessage
from langgraph.graph import MessagesState

# Load model configuration
_ = load_dotenv()

# Low-cost model
basic_model = ChatOpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url=os.getenv("DASHSCOPE_BASE_URL"),
    model="qwen3-coder-plus",
)

# High-cost model
advanced_model = ChatOpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url=os.getenv("DASHSCOPE_BASE_URL"),
    model="qwen3-max",
)

Since our modification involves the model call itself, @before_model and @after_model are not sufficient here. We instead use the @wrap_model_call decorator, which can intercept the model call. The logic lives in the function dynamic_model_selection: once the conversation history exceeds 5 messages, it automatically switches to the lower-cost model.

@wrap_model_call
def dynamic_model_selection(request: ModelRequest, handler) -> ModelResponse:
    """Choose model based on conversation complexity."""
    message_count = len(request.state["messages"])

    if message_count > 5:
        # Use a basic model for longer conversations
        model = basic_model
    else:
        model = advanced_model

    print(f"message_count: {message_count}")
    print(f"model_name: {model.model_name}")

    return handler(request.override(model=model))

agent = create_agent(
    model=advanced_model,  # Default model
    middleware=[dynamic_model_selection]
)

From the example below, we can see that when the message count message_count exceeds 5, it indeed switches from the high-cost model qwen3-max to the low-cost model qwen3-coder-plus. We have successfully implemented the budget control feature!

state: MessagesState = {"messages": []}
items = ['car', 'airplane', 'motorcycle', 'bicycle']
for idx, i in enumerate(items):
    print(f"\n=== Round {idx+1} ===")
    state["messages"] += [HumanMessage(content=f"{i}: how many wheels does it have? Answer briefly.")]
    result = agent.invoke(state)
    state["messages"] = result["messages"]
    print(f'content: {result["messages"][-1].content}')

=== Round 1 ===
message_count: 1
model_name: qwen3-max
content: A car typically has 4 wheels.

=== Round 2 ===
message_count: 3
model_name: qwen3-max
content: Most airplanes have 3 wheels (two main wheels and one nose wheel), but larger aircraft can have more.

=== Round 3 ===
message_count: 5
model_name: qwen3-max
content: A motorcycle has 2 wheels.

=== Round 4 ===
message_count: 7
model_name: qwen3-coder-plus
content: A bicycle has 2 wheels.
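The counts in the transcript (1, 3, 5, 7) arise because each round appends one user message before the call and one assistant reply after it. The threshold branch can be reproduced in plain Python; pick_model below is a hypothetical stand-in for the middleware's selection logic:

```python
# Pure-Python check of the switch-over point: each round appends one
# user message before the call and one assistant reply after it, so the
# middleware sees 1, 3, 5, 7, ... messages. `> 5` first trips in round 4.

THRESHOLD = 5

def pick_model(message_count: int) -> str:
    # Mirrors the branch in dynamic_model_selection
    return "qwen3-coder-plus" if message_count > THRESHOLD else "qwen3-max"

history: list[str] = []
choices = []
for round_no in range(1, 5):
    history.append(f"user message {round_no}")      # user turn
    choices.append(pick_model(len(history)))        # model is picked here
    history.append(f"assistant reply {round_no}")   # assistant turn

print(choices)
# ['qwen3-max', 'qwen3-max', 'qwen3-max', 'qwen3-coder-plus']
```

Note that round 3 still uses qwen3-max: the history has exactly 5 messages at call time, and the condition is strictly greater than 5.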

2. Message Truncation

LLMs have context length limits. Once exceeded, the context needs to be compressed. Among the many processing solutions, message truncation is the simplest. Below we implement message truncation functionality through the @before_model decorator.

from langchain.messages import RemoveMessage
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.checkpoint.memory import InMemorySaver
from langchain.agents import create_agent, AgentState
from langchain.agents.middleware import before_model
from langgraph.runtime import Runtime
from langchain_core.runnables import RunnableConfig
from typing import Any

We try a truncation strategy: keep the most recent messages, but also always keep the first message. In the example below, since we told the agent "my name is bob" in the first message, it still remembers that I am bob.

@before_model
def trim_messages(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """Keep only the last few messages to fit context window."""
    messages = state["messages"]

    if len(messages) <= 3:
        return None  # No changes needed

    first_msg = messages[0]
    # Keep a recent window of 3 or 4 messages, depending on the parity
    # of the history length
    recent_messages = messages[-3:] if len(messages) % 2 == 0 else messages[-4:]
    new_messages = [first_msg] + recent_messages

    return {
        "messages": [
            RemoveMessage(id=REMOVE_ALL_MESSAGES),
            *new_messages
        ]
    }

agent = create_agent(
    basic_model,
    middleware=[trim_messages],
    checkpointer=InMemorySaver(),
)

config: RunnableConfig = {"configurable": {"thread_id": "1"}}

def agent_invoke(agent):
    agent.invoke({"messages": "hi, my name is bob"}, config)
    agent.invoke({"messages": "write a short poem about cats"}, config)
    agent.invoke({"messages": "now do the same but for dogs"}, config)
    final_response = agent.invoke({"messages": "what's my name?"}, config)
    
    final_response["messages"][-1].pretty_print()

agent_invoke(agent)
================================== Ai Message ==================================

Your name is Bob! You told me "hi, my name is bob" at the beginning of our conversation.
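The slicing inside trim_messages can be sanity-checked on plain strings, with no model involved; keep_first_plus_recent below is a hypothetical stand-alone copy of that logic:

```python
# Sanity-check the slicing used by trim_messages on plain strings
# (stand-ins for messages); no model call is needed.

def keep_first_plus_recent(messages: list[str]) -> list[str]:
    if len(messages) <= 3:
        return messages                       # short history: keep everything
    recent = messages[-3:] if len(messages) % 2 == 0 else messages[-4:]
    return [messages[0]] + recent             # first message is always kept

history = [f"m{i}" for i in range(1, 7)]      # m1 .. m6 (even length)
print(keep_first_plus_recent(history))        # ['m1', 'm4', 'm5', 'm6']

history.append("m7")                          # odd length
print(keep_first_plus_recent(history))        # ['m1', 'm4', 'm5', 'm6', 'm7']
```

Whatever the history length, m1 survives, which is why the agent above can still answer "what's my name?".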

Of course, this result alone does not prove that the truncation middleware works: if the middleware never took effect, the output would be the same. To prove it, we change the truncation strategy to keep only the last two messages. If the agent no longer remembers that I am bob, the truncation middleware is indeed doing its job.

@before_model
def trim_without_first_message(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """Keep only the last few messages to fit context window."""
    messages = state["messages"]

    return {
        "messages": [
            RemoveMessage(id=REMOVE_ALL_MESSAGES),
            *messages[-2:]
        ]
    }

agent = create_agent(
    basic_model,
    middleware=[trim_without_first_message],
    checkpointer=InMemorySaver(),
)

agent_invoke(agent)
================================== Ai Message ==================================

I don't have access to your personal information, so I don't know your name. This is because privacy is important, and I'm designed not to store or access any personal data about users.

If you'd like to tell me your name, feel free to share it! But please remember that if you're using a shared device or account, you might want to be cautious about sharing personal information. 

Is there something specific I can help you with today?

Now the agent doesn’t remember who I am, which means the middleware is indeed working!

3. Sensitive Word Filtering

Guardrails is a general term for the content-safety capabilities an agent provides. Large models have some built-in content risk controls, but these are easily bypassed; search for "jailbreaking LLM" to find tutorials on doing so. An agent can add an extra layer of protection outside the model through mandatory, engineering-level checks.

In LangGraph, guardrails can be easily implemented through middleware. Below we implement a simple guardrail: if the user’s latest message contains certain sensitive words, the agent will refuse to answer.

from typing import Any

from langchain.agents.middleware import before_agent, AgentState
from langgraph.runtime import Runtime

banned_keywords = ["hack", "exploit", "malware"]

@before_agent(can_jump_to=["end"])
def content_filter(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """Deterministic guardrail: Block requests containing banned keywords."""
    # Only check the latest message, and only if it comes from the user
    if not state["messages"]:
        return None

    last_message = state["messages"][-1]
    if last_message.type != "human":
        return None

    content = last_message.content.lower()

    # Check for banned keywords
    for keyword in banned_keywords:
        if keyword in content:
            # Block execution before any processing
            return {
                "messages": [{
                    "role": "assistant",
                    "content": "I cannot process requests containing inappropriate content. Please rephrase your request."
                }],
                "jump_to": "end"
            }

    return None

agent = create_agent(
    model=basic_model,
    middleware=[content_filter],
)

# This request will be blocked before any processing
result = agent.invoke({
    "messages": [{"role": "user", "content": "How do I hack into a database?"}]
})
for message in result["messages"]:
    message.pretty_print()
================================ Human Message =================================

How do I hack into a database?
================================== Ai Message ==================================

I cannot process requests containing inappropriate content. Please rephrase your request.
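The guardrail's core check can be exercised in isolation, outside any agent: lowercase the text, then look for any banned substring.

```python
# The keyword check from content_filter, on its own: lowercase the
# text, then look for any banned substring.

banned_keywords = ["hack", "exploit", "malware"]

def is_blocked(text: str) -> bool:
    content = text.lower()
    return any(keyword in content for keyword in banned_keywords)

print(is_blocked("How do I hack into a database?"))  # True
print(is_blocked("How do I back up a database?"))    # False
print(is_blocked("Sign up for the HACKathon!"))      # True: substring match
```

Note that substring matching is deliberately naive: it also flags harmless words such as "hackathon". Word-boundary regexes (e.g. r"\bhack\b") are a common refinement.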

4. PII Detection

Next, we continue to write guardrails. PII (Personally Identifiable Information) detection can discover personal privacy information such as emails, IPs, addresses, and bank cards in user input and take appropriate action.

The following example comes from everyday practice. We often paste error messages into LLMs for help with debugging, but error messages may contain private information, such as a local username in a file path. We handle this situation in one of two ways:

  1. Refuse to answer the question

  2. Mask the privacy information
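Both methods rely on an LLM to find the PII. For well-known patterns such as emails or home-directory usernames, a deterministic regex pass is a cheap complement; the patterns below are illustrative only, not an exhaustive PII detector:

```python
import re

# A deterministic complement to the LLM-based checks: mask a few common
# PII patterns with regexes. Illustrative, not exhaustive.

PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "*****"),   # email addresses
    (re.compile(r"(/home/)[^/\s]+"), r"\1*****"),        # Linux home dirs
    (re.compile(r"(/Users/)[^/\s]+"), r"\1*****"),       # macOS home dirs
]

def mask_pii(text: str) -> str:
    for pattern, repl in PATTERNS:
        text = pattern.sub(repl, text)
    return text

print(mask_pii('File "/home/luochang/proj/agent.py", line 53'))
# File "/home/*****/proj/agent.py", line 53
print(mask_pii("contact me at bob@example.com"))
# contact me at *****
```

Regexes are fast and predictable but only catch patterns you anticipated; the LLM-based detection below generalizes better, at the cost of latency and an extra model call.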

from textwrap import dedent
from pydantic import BaseModel, Field

# Trusted model, usually a local model; for convenience, we still use qwen here
trusted_model = ChatOpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url=os.getenv("DASHSCOPE_BASE_URL"),
    model="qwen3-coder-plus",
)

# Used to format agent output; returns True if sensitive info is found, False otherwise
class PiiCheck(BaseModel):
    """Structured output indicating whether text contains PII."""
    is_pii: bool = Field(description="Whether the text contains PII")

def message_with_pii(pii_middleware):
    agent = create_agent(
        model=basic_model,
        middleware=[pii_middleware],
    )

    # This request will be blocked before any processing
    result = agent.invoke({
        "messages": [{
            "role": "user",
            "content": dedent(
                """
                File "/home/luochang/proj/agent.py", line 53, in my_agent
                    agent = create_react_agent(
                ---
                Where is the error location?
                """).strip()
        }]
    })

    return result
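One detail worth keeping in mind when writing detection prompts like the ones below: structured output ultimately depends on the model emitting valid JSON, and JSON booleans are lowercase true/false, not Python's True/False. A quick stdlib check:

```python
import json

# JSON has no True/False, only true/false; capitalized booleans are
# rejected by any strict JSON parser.

valid = json.loads('{"is_pii": true}')
print(valid["is_pii"])  # True (a Python bool after parsing)

try:
    json.loads('{"is_pii": True}')   # Python-style capitalization
except json.JSONDecodeError:
    print("rejected: not valid JSON")
```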

Handling Method 1: If privacy information is detected, refuse to respond.

@before_agent(can_jump_to=["end"])
def content_blocker(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """LLM-based guardrail: Block requests containing PII."""
    # Only check the latest message, and only if it comes from the user
    if not state["messages"]:
        return None

    last_message = state["messages"][-1]
    if last_message.type != "human":
        return None

    content = last_message.content.lower()
    prompt = (
        "You are a privacy protection assistant. Please identify personally identifiable information (PII) in the following text, "
        "such as: name, ID number, passport number, phone number, email, address, bank card number, social media account, license plate, etc. "
        "Note that if code or file paths contain usernames, they should also be considered sensitive information. "
        "If sensitive information is found, return {\"is_pii\": true}, otherwise return {\"is_pii\": false}. "
        "Please strictly return in JSON format and only output the JSON. The text is:\n\n" + content
    )

    pii_agent = trusted_model.with_structured_output(PiiCheck)
    result = pii_agent.invoke(prompt)

    if result.is_pii is True:
        # Block execution before any processing
        return {
            "messages": [{
                "role": "assistant",
                "content": "I cannot process requests containing inappropriate content. Please rephrase your request."
            }],
            "jump_to": "end"
        }
    else:
        print("No PII found")

    return None
result = message_with_pii(pii_middleware=content_blocker)

for message in result["messages"]:
    message.pretty_print()
================================ Human Message =================================

File "/home/luochang/proj/agent.py", line 53, in my_agent
    agent = create_react_agent(
---
Where is the error location?
================================== Ai Message ==================================

I cannot process requests containing inappropriate content. Please rephrase your request.

Handling Method 2: If sensitive information is detected, use a series of ***** to mask the privacy information.

@before_agent(can_jump_to=["end"])
def content_filter(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """LLM-based guardrail: Mask PII in the request instead of blocking it."""
    # Only check the latest message, and only if it comes from the user
    if not state["messages"]:
        return None

    last_message = state["messages"][-1]
    if last_message.type != "human":
        return None

    content = last_message.content.lower()
    prompt = (
        "You are a privacy protection assistant. Please identify personally identifiable information (PII) in the following text, "
        "such as: name, ID number, passport number, phone number, email, address, bank card number, social media account, license plate, etc. "
        "Note that if code or file paths contain usernames, they should also be considered sensitive information. "
        "If sensitive information is found, return {\"is_pii\": true}, otherwise return {\"is_pii\": false}. "
        "Please strictly return in JSON format and only output the JSON. The text is:\n\n" + content
    )

    pii_agent = trusted_model.with_structured_output(PiiCheck)
    result = pii_agent.invoke(prompt)

    if result.is_pii is True:
        mask_prompt = (
            "You are a privacy protection assistant. Please replace all personally identifiable information (PII) in the following text with asterisks (*). "
            "Only replace sensitive fragments, keep other text unchanged. "
            "Only output the processed text, no explanations or additional content. The text is:\n\n" + last_message.content
        )
        masked_message = basic_model.invoke(mask_prompt)
        return {
            "messages": [{
                "role": "assistant",
                "content": masked_message.content
            }]
        }
    else:
        print("No PII found")

    return None
result = message_with_pii(pii_middleware=content_filter)

for message in result["messages"]:
    message.pretty_print()
================================ Human Message =================================

File "/home/luochang/proj/agent.py", line 53, in my_agent
    agent = create_react_agent(
---
Where is the error location?
================================== Ai Message ==================================

File "/home/******/proj/agent.py", line 53, in my_agent
    agent = create_react_agent(
---
Where is the error location?
================================== Ai Message ==================================

The error is occurring at **line 53** in the file `/home/luochang/proj/agent.py`, specifically within the `my_agent` function where the `create_react_agent()` function is being called.

However, the traceback you've shown only shows the beginning of the error - it's incomplete. To see the exact error message and understand what's going wrong, you need to look at the rest of the traceback that comes after this line.

The complete error traceback would typically show:
- The specific error type (like `TypeError`, `ValueError`, `ImportError`, etc.)
- The detailed error message
- The full stack trace showing all function calls leading to the error

To get the complete error information, you should:

1. **Check the full console output** - the actual error message should be below what you've shown
2. **Look for the error type and message** - it will tell you exactly what went wrong
3. **Common issues with `create_react_agent`** might include:
   - Missing required parameters
   - Incorrect parameter types
   - Import issues
   - Missing dependencies

Could you share the complete error message? That would help identify the exact problem at line 53.