Before building with Claude
Decide whether to use Claude for support chat
Here are some key indicators that you should employ an LLM like Claude to automate portions of your customer support process:
High volume of repetitive queries
Claude excels at handling a large number of similar questions efficiently, freeing up human agents for more complex issues.
Need for quick information synthesis
Claude can quickly retrieve, process, and combine information from vast knowledge bases, while human agents may need time to research or consult multiple sources.
24/7 availability requirement
Claude can provide round-the-clock support without fatigue, whereas staffing human agents for continuous coverage can be costly and challenging.
Rapid scaling during peak periods
Claude can handle sudden increases in query volume without the need for hiring and training additional staff.
Consistent brand voice
You can instruct Claude to consistently represent your brand’s tone and values, whereas human agents may vary in their communication styles.
Some considerations for choosing Claude over other LLMs:
- You prioritize natural, nuanced conversation: Claude’s sophisticated language understanding allows for more natural, context-aware conversations that feel more human-like than chats with other LLMs.
- You often receive complex and open-ended queries: Claude can handle a wide range of topics and inquiries without generating canned responses or requiring extensive programming of permutations of user utterances.
- You need scalable multilingual support: Claude’s multilingual capabilities allow it to engage in conversations in over 200 languages without the need for separate chatbots or extensive translation processes for each supported language.
Define your ideal chat interaction
Outline an ideal customer interaction to define how and when you expect the customer to interact with Claude. This outline will help to determine the technical requirements of your solution. Here is an example chat interaction for car insurance customer support:
- Customer: Initiates support chat experience
- Claude: Warmly greets customer and initiates conversation
- Customer: Asks about insurance for their new electric car
- Claude: Provides relevant information about electric vehicle coverage
- Customer: Asks questions related to unique needs for electric vehicle insurance
- Claude: Responds with accurate and informative answers and provides links to the sources
- Customer: Asks off-topic questions unrelated to insurance or cars
- Claude: Clarifies it does not discuss unrelated topics and steers the user back to car insurance
- Customer: Expresses interest in an insurance quote
- Claude: Asks a set of questions to determine the appropriate quote, adapting to their responses
- Claude: Sends a request to use the quote generation API tool along with necessary information collected from the user
- Claude: Receives the response information from the API tool use, synthesizes the information into a natural response, and presents the provided quote to the user
- Customer: Asks follow up questions
- Claude: Answers follow up questions as needed
- Claude: Guides the customer to the next steps in the insurance process and closes out the conversation
When you write this outline for your own use case, you might find it useful to write out the actual words of the interaction so that you also get a sense of the ideal tone, response length, and level of detail you want Claude to have.
Break the interaction into unique tasks
Customer support chat is a collection of multiple different tasks, from question answering to information retrieval to taking action on requests, wrapped up in a single customer interaction. Before you start building, break down your ideal customer interaction into every task you want Claude to be able to perform. This ensures you can prompt and evaluate Claude for every task, and gives you a good sense of the range of interactions you need to account for when writing test cases.

Customers sometimes find it helpful to visualize this as an interaction flowchart of possible conversation inflection points depending on user requests.
- Greeting and general guidance
  - Warmly greet the customer and initiate conversation
  - Provide general information about the company and interaction
- Product Information
  - Provide information about electric vehicle coverage (this requires that Claude have the necessary information in its context, and might imply that a RAG integration is necessary)
  - Answer questions related to unique electric vehicle insurance needs
  - Answer follow-up questions about the quote or insurance details
  - Offer links to sources when appropriate
- Conversation Management
  - Stay on topic (car insurance)
  - Redirect off-topic questions back to relevant subjects
- Quote Generation
  - Ask appropriate questions to determine quote eligibility
  - Adapt questions based on customer responses
  - Submit collected information to the quote generation API
  - Present the provided quote to the customer
Establish success criteria
Work with your support team to define clear success criteria and write detailed evaluations with measurable benchmarks and goals. Here are criteria and benchmarks that can be used to evaluate how successfully Claude performs the defined tasks:
Query comprehension accuracy
This metric evaluates how accurately Claude understands customer inquiries across various topics. Measure this by reviewing a sample of conversations and assessing whether Claude has the correct interpretation of customer intent, critical next steps, what successful resolution looks like, and more. Aim for a comprehension accuracy of 95% or higher.
Response relevance
This assesses how well Claude’s response addresses the customer’s specific question or issue. Evaluate a set of conversations and rate the relevance of each response (using LLM-based grading for scale). Target a relevance score of 90% or above.
Response accuracy
Assess the correctness of general company and product information provided to the user, based on the information provided to Claude in context. Target 100% accuracy in this introductory information.
Citation provision relevance
Track the frequency and relevance of links or sources offered. Target providing relevant sources in 80% of interactions where additional information could be beneficial.
Topic adherence
Measure how well Claude stays on topic, such as the topic of car insurance in our example implementation. Aim for 95% of responses to be directly related to car insurance or the customer’s specific query.
Content generation effectiveness
Measure how successful Claude is at determining when to generate informational content and how relevant that content is. For example, in our implementation, we would be determining how well Claude understands when to generate a quote and how accurate that quote is. Target 100% accuracy, as this is vital information for a successful customer interaction.
Escalation efficiency
This measures Claude’s ability to recognize when a query needs human intervention and escalate appropriately. Track the percentage of correctly escalated conversations versus those that should have been escalated but weren’t. Aim for an escalation accuracy of 95% or higher.
Sentiment maintenance
This assesses Claude’s ability to maintain or improve customer sentiment throughout the conversation. Use sentiment analysis tools to measure sentiment at the beginning and end of each conversation. Aim for maintained or improved sentiment in 90% of interactions.
Deflection rate
The percentage of customer inquiries successfully handled by the chatbot without human intervention. Typically aim for 70-80% deflection rate, depending on the complexity of inquiries.
Customer satisfaction score
A measure of how satisfied customers are with their chatbot interaction. Usually done through post-interaction surveys. Aim for a CSAT score of 4 out of 5 or higher.
Average handle time
The average time it takes for the chatbot to resolve an inquiry. This varies widely based on the complexity of issues, but generally, aim for a lower AHT compared to human agents.
How to implement Claude as a customer service agent
Choose the right Claude model
The choice of model depends on the trade-offs between cost, accuracy, and response time. For customer support chat, claude-opus-4-1-20250805 is well suited to balance intelligence, latency, and cost. However, for instances where your conversation flow involves multiple prompts, including RAG, tool use, and/or long-context prompts, claude-3-haiku-20240307 may be more suitable to optimize for latency.
Build a strong prompt
Using Claude for customer support requires giving Claude enough direction and context to respond appropriately, while leaving enough flexibility to handle a wide range of customer inquiries.

Let’s start by writing the elements of a strong prompt, beginning with a system prompt. While you may be tempted to put all your information inside a system prompt as a way to separate instructions from the user conversation, Claude actually works best with the bulk of its prompt content written inside the first User turn (with the only exception being role prompting). Read more at Giving Claude a role with a system prompt. In this example, the prompt content is kept in a separate file, config.py.
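A minimal sketch of what config.py might contain is shown below. The constant names (MODEL, IDENTITY, TASK_SPECIFIC_INSTRUCTIONS) and the wording of the prompt text are illustrative assumptions, not the exact prompts from this guide; the key point is that role prompting goes in the system prompt while the bulk of the content goes in the first User turn.

```python
# config.py
# Illustrative prompt content for a car insurance support chatbot.
# Constant names and wording here are assumptions for this sketch.

MODEL = "claude-opus-4-1-20250805"

# Role prompting belongs in the system prompt.
IDENTITY = """You are Eva, a friendly and knowledgeable AI assistant for Acme Insurance
Company. Your role is to warmly welcome customers and provide information on Acme's
car insurance offerings, including electric vehicle coverage. You can also help
customers get quotes for their insurance needs."""

# The bulk of the task content goes in the first User turn, not the system prompt.
TASK_SPECIFIC_INSTRUCTIONS = """
<static_context>
Acme Insurance offers car insurance, including dedicated coverage options for
electric vehicles. ...
</static_context>

<guidelines>
1. Only discuss topics related to car insurance.
2. If asked about something off-topic, politely steer the conversation back.
3. When the customer wants a quote, collect the information needed and use the
   quote generation tool.
</guidelines>
"""
```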
Add dynamic and agentic capabilities with tool use
Claude is capable of taking actions and retrieving information dynamically using client-side tool use functionality. Start by listing any external tools or APIs the prompt should utilize. For this example, we will start with one tool for calculating the quote. As a reminder, this tool will not perform the actual calculation; it will just signal to the application that a tool should be used with whatever arguments are specified.
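As a sketch, the quote tool definition could look like the following. The tool name (get_quote), its parameters, and the stub implementation are assumptions for illustration; your quote API will define the real inputs and outputs.

```python
# config.py (continued): client-side tool definition for quote generation.
# The tool name and parameters below are illustrative assumptions.
TOOLS = [
    {
        "name": "get_quote",
        "description": "Calculate an insurance quote based on user input. The returned value is the monthly premium.",
        "input_schema": {
            "type": "object",
            "properties": {
                "make": {"type": "string", "description": "The make of the vehicle."},
                "model": {"type": "string", "description": "The model of the vehicle."},
                "year": {"type": "integer", "description": "The model year of the vehicle."},
                "mileage": {"type": "integer", "description": "Expected annual mileage."},
                "driver_age": {"type": "integer", "description": "The age of the primary driver."},
            },
            "required": ["make", "model", "year", "mileage", "driver_age"],
        },
    }
]


def get_quote(make: str, model: str, year: int, mileage: int, driver_age: int) -> float:
    """Stub for the real quote API call; Claude only signals that this should run."""
    # In production this would call your pricing service.
    return 100.0
```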
Deploy your prompts
It’s hard to know how well your prompt works without deploying it in a test production setting and running evaluations, so let’s build a small application using our prompt, the Anthropic SDK, and Streamlit for a user interface. In a file called chatbot.py, start by setting up the ChatBot class, which will encapsulate the interactions with the Anthropic SDK. The class should have two main methods: generate_message and process_user_input.
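Here is one possible sketch of chatbot.py. It assumes the illustrative config.py constants and get_quote stub from the earlier sketches; the method names match the description above, but the error handling and message bookkeeping are illustrative choices rather than a prescribed implementation.

```python
# chatbot.py
from anthropic import Anthropic
from config import IDENTITY, TOOLS, MODEL, get_quote


class ChatBot:
    def __init__(self, session_state):
        self.anthropic = Anthropic()          # reads ANTHROPIC_API_KEY from the environment
        self.session_state = session_state    # e.g. Streamlit's st.session_state

    def generate_message(self, messages, max_tokens):
        """Call the Messages API and return the response (or an error dict)."""
        try:
            return self.anthropic.messages.create(
                model=MODEL,
                system=IDENTITY,
                max_tokens=max_tokens,
                messages=messages,
                tools=TOOLS,
            )
        except Exception as e:
            return {"error": str(e)}

    def process_user_input(self, user_input):
        """Append the user's message, handle any tool use, and return Claude's reply text."""
        self.session_state.messages.append({"role": "user", "content": user_input})

        response = self.generate_message(self.session_state.messages, max_tokens=2048)
        if isinstance(response, dict) and "error" in response:
            return f"An error occurred: {response['error']}"

        if response.stop_reason == "tool_use":
            tool_use = next(block for block in response.content if block.type == "tool_use")
            if tool_use.name != "get_quote":
                return "Unexpected tool requested."

            quote = get_quote(**tool_use.input)  # run the tool client-side

            # Return the tool result to Claude so it can phrase the quote naturally.
            self.session_state.messages.append({"role": "assistant", "content": response.content})
            self.session_state.messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": f"Quote: ${quote:.2f} per month",
                }],
            })
            followup = self.generate_message(self.session_state.messages, max_tokens=2048)
            response_text = followup.content[0].text
        else:
            response_text = response.content[0].text

        self.session_state.messages.append({"role": "assistant", "content": response_text})
        return response_text
```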
Build your user interface
Test deploying this code with Streamlit using a main method. This main() function sets up a Streamlit-based chat interface. We’ll do this in a file called app.py.
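A sketch of app.py, assuming the ChatBot class above. The seeded "Understood." assistant turn, the widget labels, and the decision to hide the first two prompt-seeding messages are illustrative choices.

```python
# app.py
import streamlit as st
from chatbot import ChatBot
from config import TASK_SPECIFIC_INSTRUCTIONS


def main():
    st.title("Chat with Eva, Acme Insurance Company's Assistant")

    # Seed the conversation: the bulk of the prompt lives in the first User turn.
    if "messages" not in st.session_state:
        st.session_state.messages = [
            {"role": "user", "content": TASK_SPECIFIC_INSTRUCTIONS},
            {"role": "assistant", "content": "Understood."},
        ]

    chatbot = ChatBot(st.session_state)

    # Render the conversation so far, skipping the seeded prompt turns.
    for message in st.session_state.messages[2:]:
        if isinstance(message["content"], str):
            with st.chat_message(message["role"]):
                st.markdown(message["content"])

    if user_msg := st.chat_input("Type your message here..."):
        with st.chat_message("user"):
            st.markdown(user_msg)

        with st.chat_message("assistant"):
            with st.spinner("Eva is thinking..."):
                st.markdown(chatbot.process_user_input(user_msg))


if __name__ == "__main__":
    main()
```

Run the app with `streamlit run app.py` to try the chatbot locally.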
Evaluate your prompts
Prompting often requires testing and optimization for it to be production ready. To determine the readiness of your solution, evaluate the chatbot performance using a systematic process combining quantitative and qualitative methods. Creating a strong empirical evaluation based on your defined success criteria will allow you to optimize your prompts.

The Claude Console now features an Evaluation tool that allows you to test your prompts under various scenarios.
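Outside the Console, one lightweight approach is an LLM-graded loop over hand-written test cases. The sketch below is illustrative: the TEST_CASES list and grader prompt are assumptions, and you would first run each input through your chatbot to obtain the response being graded.

```python
# evaluate.py: a minimal, illustrative LLM-graded evaluation (not the Console Evaluation tool).
from anthropic import Anthropic

client = Anthropic()

# Hypothetical test cases: a user message plus the behavior we expect.
TEST_CASES = [
    {"input": "Do you cover electric cars?",
     "expectation": "Provides accurate information about electric vehicle coverage."},
    {"input": "What's the best pizza place nearby?",
     "expectation": "Politely declines and steers back to car insurance."},
]

GRADER_PROMPT = """You are grading a customer support chatbot.
User message: {input}
Chatbot response: {response}
Expected behavior: {expectation}
Reply with only PASS or FAIL."""


def grade(user_input: str, response: str, expectation: str) -> str:
    """Ask Claude to grade a single chatbot response against the expected behavior."""
    result = client.messages.create(
        model="claude-opus-4-1-20250805",
        max_tokens=10,
        messages=[{"role": "user", "content": GRADER_PROMPT.format(
            input=user_input, response=response, expectation=expectation)}],
    )
    return result.content[0].text.strip()
```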
Improve performance
In complex scenarios, it may be helpful to consider additional strategies to improve performance beyond standard prompt engineering techniques and guardrail implementation strategies. Here are some common scenarios:

Reduce long context latency with RAG
When dealing with large amounts of static and dynamic context, including all information in the prompt can lead to high costs, slower response times, and reaching context window limits. In this scenario, implementing Retrieval Augmented Generation (RAG) techniques can significantly improve performance and efficiency. By using embedding models like Voyage to convert information into vector representations, you can create a more scalable and responsive system. This approach allows for dynamic retrieval of relevant information based on the current query, rather than including all possible context in every prompt. Implementing RAG for support use cases (see the RAG recipe) has been shown to increase accuracy, reduce response times, and reduce API costs in systems with extensive context requirements.
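A minimal sketch of the retrieval step is shown below. It assumes the voyageai Python client and a small in-memory list of knowledge base snippets; in practice you would use a vector database and the chunking strategy described in the RAG recipe, and the document contents here are purely illustrative.

```python
# Illustrative retrieval step for RAG, assuming the voyageai Python client.
import numpy as np
import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

# Hypothetical knowledge base snippets about the insurance products.
documents = [
    "Acme's electric vehicle coverage includes battery replacement protection.",
    "Standard car insurance policies cover collision and comprehensive damage.",
]
doc_embeddings = np.array(
    vo.embed(documents, model="voyage-2", input_type="document").embeddings
)


def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Return the documents most similar to the query, to be added to the prompt."""
    query_embedding = np.array(
        vo.embed([query], model="voyage-2", input_type="query").embeddings[0]
    )
    # Cosine similarity between the query and each document embedding.
    scores = doc_embeddings @ query_embedding / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
    )
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in top_indices]
```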
Integrate real-time data with tool use

When dealing with queries that require real-time information, such as account balances or policy details, embedding-based RAG approaches are not sufficient. Instead, you can leverage tool use to significantly enhance your chatbot’s ability to provide accurate, real-time responses. For example, you can use tool use to look up customer information, retrieve order details, and cancel orders on behalf of the customer. This approach, outlined in our tool use: customer service agent recipe, allows you to seamlessly integrate live data into Claude’s responses and provide a more personalized and efficient customer experience.

Strengthen input and output guardrails
When deploying a chatbot, especially in customer service scenarios, it’s crucial to prevent risks associated with misuse, out-of-scope queries, and inappropriate responses. While Claude is inherently resilient to such scenarios, here are additional steps to strengthen your chatbot guardrails:
- Reduce hallucination: Implement fact-checking mechanisms and citations to ground responses in provided information.
- Cross-check information: Verify that the agent’s responses align with your company’s policies and known facts.
- Avoid contractual commitments: Ensure the agent doesn’t make promises or enter into agreements it’s not authorized to make.
- Mitigate jailbreaks: Use methods like harmlessness screens and input validation to prevent users from exploiting model vulnerabilities or attempting to generate inappropriate content.
- Avoid mentioning competitors: Implement a competitor mention filter to maintain brand focus and not mention any competitor’s products or services.
- Keep Claude in character: Prevent Claude from changing its style or context, even during long, complex interactions.
- Remove Personally Identifiable Information (PII): Unless explicitly required and authorized, strip out any PII from responses.
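As one illustration of an output guardrail for the PII point above, a simple post-processing filter might redact common patterns before responses reach the user. The regexes and function name below are illustrative only and are not a complete PII solution.

```python
import re

# Illustrative output guardrail: redact common PII patterns before display.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scrub_pii(text: str) -> str:
    """Replace anything matching a known PII pattern with a redaction marker."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text
```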
Reduce perceived response time with streaming
When dealing with potentially lengthy responses, implementing streaming can significantly improve user engagement and satisfaction. In this scenario, users receive the answer progressively instead of waiting for the entire response to be generated. Here is how to implement streaming:
- Use the Anthropic Streaming API to support streaming responses.
- Set up your frontend to handle incoming chunks of text.
- Display each chunk as it arrives, simulating real-time typing.
- Implement a mechanism to save the full response, allowing users to view it if they navigate away and return.
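As a sketch of the first step, the Python SDK’s streaming helper could replace the single messages.create call used earlier; the function name and the reuse of the illustrative config constants are assumptions.

```python
# Illustrative streaming variant, reusing MODEL and IDENTITY from the config sketch.
from anthropic import Anthropic
from config import MODEL, IDENTITY

client = Anthropic()


def stream_reply(messages):
    """Yield text chunks as Claude generates them, so the UI can render progressively."""
    with client.messages.stream(
        model=MODEL,
        system=IDENTITY,
        max_tokens=2048,
        messages=messages,
    ) as stream:
        for text in stream.text_stream:
            yield text  # the frontend displays each chunk as it arrives
```

In recent Streamlit versions, a generator like this can be passed to st.write_stream to display the text as it arrives.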
Scale your chatbot
As the complexity of your chatbot grows, your application architecture can evolve to match. Before you add further layers to your architecture, consider the following non-exhaustive options:
- Ensure that you are making the most out of your prompts and optimizing through prompt engineering. Use our prompt engineering guides to write the most effective prompts.
- Add additional tools to the prompt (which can include prompt chains) and see if you can achieve the functionality required.
Integrate Claude into your support workflow
While our examples have focused on Python functions callable within a Streamlit environment, deploying Claude for a real-time support chatbot requires an API service. Here’s how you can approach this:
- Create an API wrapper: Develop a simple API wrapper around your chat function. For example, you can use Flask or FastAPI to wrap your code into an HTTP service (a minimal sketch follows this list). Your HTTP service could accept the user input and return the Assistant response in its entirety. Thus, your service could have the following characteristics:
- Server-Sent Events (SSE): SSE allows for real-time streaming of responses from the server to the client. This is crucial for providing a smooth, interactive experience when working with LLMs.
- Caching: Implementing caching can significantly improve response times and reduce unnecessary API calls.
- Context retention: Maintaining context when a user navigates away and returns is important for continuity in conversations.
- Build a web interface: Implement a user-friendly web UI for interacting with the Claude-powered agent.
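A minimal sketch of such a wrapper, assuming FastAPI and the earlier illustrative config constants; the endpoint name, the in-memory session handling, and the SSE framing are illustrative choices, not a prescribed design.

```python
# api.py: illustrative FastAPI wrapper that streams replies as Server-Sent Events.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from anthropic import Anthropic
from config import MODEL, IDENTITY

app = FastAPI()
client = Anthropic()

# In-memory conversation store keyed by session id (use a persistent store in production).
sessions: dict[str, list] = {}


class ChatRequest(BaseModel):
    session_id: str
    message: str


@app.post("/chat")
def chat(req: ChatRequest):
    """Stream the assistant's reply back to the client as Server-Sent Events."""
    history = sessions.setdefault(req.session_id, [])
    history.append({"role": "user", "content": req.message})

    def event_stream():
        with client.messages.stream(
            model=MODEL,
            system=IDENTITY,
            max_tokens=2048,
            messages=history,
        ) as stream:
            for text in stream.text_stream:
                yield f"data: {text}\n\n"  # SSE framing: one chunk per event
            # Retain the full reply so context survives when the user returns.
            history.append({"role": "assistant", "content": stream.get_final_text()})

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```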