
Executive Overview
A leading data services company partnered with Xenonstack to implement an agentic Retrieval Augmented Generation (RAG) solution on AWS. The client's legacy support systems led to delayed responses, fragmented knowledge silos, and inconsistent customer experiences.
By using open-source LLMs, including the DeepSeek-R1 and Llama 3.3 series, in a specialized multi-agent architecture, the new platform achieved near real-time WhatsApp-based customer interactions. Key outcomes include a sharp reduction in response time, a higher automated resolution rate, and substantial TCO savings, alongside a significant increase in customer satisfaction.
Customer Challenge & Technical Obstacles
Customer Information
- Industry: Data Services
- Location: Italy
- Company Size: Global operations with multi-region support team
Business Challenges
The client faced significant hurdles in modernizing their customer engagement platform:
- High volume of repetitive customer inquiries requiring extensive human agent involvement
- Inconsistent response quality and long wait times (15-20 minutes on average)
- Difficulty scaling support operations during peak periods
Technical limitations or challenges they encountered:
- Fragmented knowledge base with information scattered across documents and departmental silos
- Inability to process customer-uploaded documents efficiently
- Limited support for rich media interactions
Business goals they were trying to achieve:
- Enable real-time customer engagement across global operations
- Build a scalable support infrastructure with automated resolution
- Provide consistent, high-quality responses with minimal human intervention
Why their existing solution was inadequate:
- Traditional ticket-based system with rigid workflows
- Legacy tools couldn't support WhatsApp integration or real-time responses
- No centralized knowledge management or intelligent retrieval
Compliance, security, and performance requirements:
- Data privacy compliance across multiple regions
- Secure handling of customer documents and conversation history
- Response time requirements of under 2 minutes
Critical timelines or business pressures:
- Required rapid improvement in customer satisfaction metrics
- Business needed to support expansion into new markets with a consistent experience
Technical Challenges
Legacy systems and infrastructure challenges:
- The existing infrastructure relied on outdated ticket-based systems and lacked support for modern messaging platforms like WhatsApp.
Technical debt or architectural limitations:
- Fragmented knowledge architecture with siloed systems introduced inconsistencies and hindered unified response generation.
Integration requirements:
- Integration was needed with the WhatsApp Business API, document processing services, and existing knowledge repositories.
Scalability, reliability, or performance issues:
- The previous solution was not scalable for global operations and suffered from significant latency during peak usage.
Data challenges:
- Inconsistent knowledge formats, lack of semantic search capabilities, and minimal context awareness made personalized responses difficult.
Security and compliance requirements:
- Required implementation of end-to-end encryption, secure document handling, and compliance with regional data protection regulations.
The Xenonstack Solution: Multi-Agent RAG Architecture
Solution Overview
Xenonstack designed a four-layer modular architecture on AWS:
1. User Interaction & Message Processing Layer
This layer handles customer communication and initial message processing:
- Meta Webhooks receive user messages from WhatsApp and forward them to API Gateway
- API Gateway routes requests to the appropriate AWS Lambda functions
- Message Processor Lambda functions prepare messages for agent processing
- SQS queues ensure reliable message handling during high traffic
- User conversation history stored in DynamoDB maintains context across interactions
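As a rough sketch of this layer's message path, the webhook parsing and conversation-history shaping might look like the following. The payload shape follows Meta's Cloud API webhook format; the DynamoDB key schema (`pk`/`sk`) is an illustrative assumption, not the client's actual design:

```python
import json
import time

def parse_whatsapp_webhook(event_body: str) -> list[dict]:
    """Extract inbound text messages from a Meta WhatsApp webhook payload."""
    payload = json.loads(event_body)
    messages = []
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            value = change.get("value", {})
            for msg in value.get("messages", []):
                messages.append({
                    "wa_id": msg.get("from"),        # sender's WhatsApp ID
                    "message_id": msg.get("id"),
                    "text": msg.get("text", {}).get("body", ""),
                    "received_at": int(time.time()),
                })
    return messages

def to_dynamo_item(msg: dict) -> dict:
    """Shape a parsed message as a conversation-history item
    (partition key = user, sort key = timestamp + message id)."""
    return {
        "pk": f"USER#{msg['wa_id']}",
        "sk": f"MSG#{msg['received_at']}#{msg['message_id']}",
        "text": msg["text"],
        "direction": "inbound",
    }
```

In the actual flow, the parsed messages would be published to SQS and the history items written to DynamoDB by the Message Processor Lambda.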
2. Document Processing & Vectorization Layer
This layer processes uploaded files and transforms them into searchable knowledge:
- User-uploaded documents are stored in S3 with metadata in DynamoDB
- Amazon EventBridge triggers ECS jobs for document processing
- Content Extraction services parse different document formats
- Titan Embeddings converts extracted text into vector representations
- Embedding Indexing services add vectors to OpenSearch for retrieval
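The chunk-then-index step of this pipeline can be sketched as below. The chunk sizes, index name (`kb-chunks`), and field names are illustrative assumptions; the bulk-action shape follows OpenSearch's `_bulk` API convention:

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split extracted document text into overlapping chunks before
    embedding, so each piece stays within the embedding model's input limit."""
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

def to_bulk_actions(doc_id: str, chunks, vectors):
    """Pair each chunk with its embedding as OpenSearch bulk-index
    action lines (index and field names are assumptions)."""
    for i, (chunk, vec) in enumerate(zip(chunks, vectors)):
        yield {"index": {"_index": "kb-chunks", "_id": f"{doc_id}#{i}"}}
        yield {"text": chunk, "embedding": vec}
```

In production the vectors would come from Titan Embeddings calls (e.g. via Bedrock), and the actions would be sent to Amazon OpenSearch Service.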
3. Agentic Intelligence & Orchestration Layer
This layer provides agent-powered decision making and response generation:
- Orchestration Agent: Implemented in LangGraph, this agent controls conversation flow and coordinates the other agents
- Knowledge Base Agent (DeepSeek-R1): Specializes in retrieving and synthesizing information from the vector store
- Response Generation Agent (Llama 3.3): Creates natural, contextually appropriate responses tailored to the user profile
- Prompt Library: Provides optimized prompts for different conversation scenarios
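The production flow is wired in LangGraph; the underlying pattern — a router that picks the next agent based on shared conversation state until the turn is complete — can be sketched without the library as follows (agent names and the fake agent bodies are purely illustrative):

```python
from typing import Callable

AgentFn = Callable[[dict], dict]

def run_pipeline(state: dict, agents: dict, router: Callable[[dict], str]) -> dict:
    """Tiny stand-in for a LangGraph state graph: the router names the
    next agent until it returns "end"; each agent reads and updates the
    shared conversation state."""
    while (step := router(state)) != "end":
        state = agents[step](state)
    return state

def router(state: dict) -> str:
    if "context" not in state:
        return "knowledge"   # retrieval agent (DeepSeek-R1 in the case study)
    if "reply" not in state:
        return "respond"     # generation agent (Llama 3.3 in the case study)
    return "end"

# Stub agents standing in for real LLM invocations
agents: dict = {
    "knowledge": lambda s: {**s, "context": f"docs for: {s['query']}"},
    "respond":   lambda s: {**s, "reply": f"Answer using {s['context']}"},
}
```

In LangGraph proper, the same structure would be expressed as a `StateGraph` with nodes per agent and conditional edges implementing the router.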
4. Knowledge Storage & Retrieval Layer
This layer stores and retrieves the information needed to answer user queries:
- Vector Store serves as the central repository for searchable knowledge
- Amazon OpenSearch Service provides vector search capabilities
- Amazon Bedrock Knowledge Base stores structured information
- Meta Callback API returns generated responses to users via WhatsApp
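A semantic lookup against this layer amounts to a k-nearest-neighbor search over the stored embeddings. A minimal query builder might look like this; the `embedding` field name is an assumption, while the body shape follows the OpenSearch k-NN plugin's query syntax:

```python
def knn_query(vector: list, k: int = 5, field: str = "embedding") -> dict:
    """Build an OpenSearch k-NN search body that returns the k chunks
    whose embeddings are closest to the query vector."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": vector, "k": k}}},
    }
```

The Knowledge Base Agent would embed the user's question with Titan Embeddings, send this body to the OpenSearch index, and synthesize an answer from the returned chunks.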
AWS Services Used
- Amazon API Gateway: Routes incoming webhook requests from WhatsApp, secured via AWS WAF
- AWS Lambda: Processes messages and orchestrates agent workflows
- Amazon SQS: Ensures reliable message queuing during high traffic
- Amazon DynamoDB: Stores user conversation history and document metadata
- Amazon S3: Stores user-uploaded documents and files
- Amazon EventBridge: Detects new document uploads and triggers processing
- Amazon ECS: Runs document processing and embedding generation jobs
- Amazon OpenSearch Service: Provides vector search capabilities for semantic matching
- Amazon EKS: Hosts the containerized agents and exposes them to Lambda functions
- Amazon Bedrock Knowledge Base: Stores structured information for common queries
- Amazon Titan Embeddings: Converts extracted text into vector representations
Architecture Diagram
Implementation Details
The solution was implemented by Xenonstack using Agile methodology and DevOps automation. The team began with stakeholder workshops to identify key conversation flows and knowledge requirements.
- How the solution was implemented: WhatsApp messages were routed through Meta Webhooks to API Gateway and processed by Lambda functions. Multiple specialized AI agents powered by open-source LLMs (Llama 3.3 and DeepSeek-R1) were orchestrated through LangGraph. Document processing pipelines converted uploaded files into vector embeddings for semantic search through OpenSearch.
- Methodology used: Agile sprints guided iterative development and testing. DevOps best practices were followed, with GitOps pipelines for deployment automation.
- Migration approach: The legacy ticket system was replaced in phases, starting with knowledge vectorization, then agent development, and finally WhatsApp integration. This ensured continuity of customer support operations.
- Integration with existing systems: The platform was integrated with the WhatsApp Business API, existing knowledge bases, and document repositories. Conversation history was maintained for context awareness.
- Security and compliance considerations: End-to-end encryption, secure document handling, and compliance with regional data protection regulations were implemented throughout the system.
- Deployment and testing strategy: Components were deployed using Infrastructure as Code with the AWS CDK. Integration and load testing were performed using automated test suites, with monitoring through CloudWatch.
Innovative Approaches & AWS Best Practices
The solution adopted several AWS best practices including serverless architecture, event-driven design, and multi-agent AI orchestration.
How the solution used AWS best practices: Serverless services (Lambda, SQS, API Gateway) were used for scalability and cost efficiency. Event-driven architecture with EventBridge enabled asynchronous processing.
Innovative approaches or unique aspects of the implementation:
- Specialized Multi-Agent Architecture: Rather than using a single LLM for all tasks, the solution employs dedicated Bedrock foundation models (Llama 3.3 and DeepSeek-R1) for specific functions
- Model Selection Based on Strengths: Deliberate selection of Bedrock's DeepSeek-R1 for reasoning-heavy retrieval tasks and Llama 3.3 for orchestration and response generation
- RAG-Enhanced Open-Source LLMs: The combination of vector search and specialized prompts enabled open-source models to achieve performance comparable to proprietary solutions
Use of AWS Well-Architected Framework principles: Operational Excellence was achieved through observability and automation. Reliability and Performance Efficiency were addressed with queuing and stateful conversation management. Cost Optimization was accomplished via serverless components and open-source LLMs.
DevOps, CI/CD, and other modern practices implemented:
- CI/CD pipelines for infrastructure and agent deployment
- Automated testing for conversation flows and agent responses
- Prompt version control and optimization
- Agent performance monitoring and telemetry
- Specialized agents for different conversation aspects
- Vector search for semantic knowledge retrieval
- WhatsApp-based self-service for business users
Business Impact & ROI
Business Outcomes and Success Metrics
Cost savings:
- Achieved significant reduction in total cost of ownership by shifting to a serverless, open-source LLM-based architecture
- Reduced human agent requirements through a higher automated resolution rate
Revenue increases or new revenue streams:
- Enabled faster customer issue resolution and improved satisfaction, contributing to higher customer retention and lifetime value
Time-to-market improvements:
- Response time dramatically reduced
- Support for new products or services can now be added in days rather than weeks
Operational efficiencies:
- Reduction in support escalations to human agents
- Support team now focuses on complex cases requiring human judgment
Competitive advantages gained:
- Real-time WhatsApp interactions with document understanding capabilities
- Consistent support quality across multiple languages
- 24/7 availability, compared to the previous business-hours-only model
ROI and payback period:
- A significant reduction in operational costs and improved customer metrics
Technical Achievements & Performance Gains
- Performance improvements: The shift to a multi-agent architecture with specialized LLMs dramatically improved performance. Response time was reduced and the automated resolution rate increased.
- Scalability enhancements: By using serverless components and event-driven design, the platform achieved elastic scalability. Lambda functions, SQS, and the multi-agent system could handle varying loads without performance degradation.
- Reliability and availability improvements: The event-driven architecture with queuing introduced fault tolerance and high availability. Conversation state management in DynamoDB ensured context preservation even during service disruptions.
- Security posture strengthening: End-to-end encryption, secure document handling, and compliance with data protection regulations enhanced security. Access controls and monitoring were implemented at all layers.
- Reduced technical debt: Replacing legacy ticket systems with a modern serverless architecture and LLM-based agents reduced complexity, enhanced maintainability, and lowered long-term technical overhead.
- Improved development velocity: CI/CD pipelines and Infrastructure as Code enabled faster iteration cycles. Teams could deploy new agent capabilities, knowledge, or conversation flows with minimal manual intervention.
Overcoming Implementation Hurdles
Challenges Overcome
During implementation, the team encountered several complex challenges:
- Complex multi-turn conversations were difficult to manage consistently across agent handoffs
- Document processing latency created a poor user experience during initial uploads
- Occasional hallucinations reduced trust in automated responses
- Cross-language support quality varied significantly in early versions
How these challenges were addressed:
- Multi-turn conversation issues were resolved through improved conversation state management in DynamoDB and refinement of the Orchestration Agent's (Llama 3.3) context window management.
- Document processing latency was mitigated by implementing asynchronous processing with customer notifications.
- Hallucination was addressed by improving RAG precision with better embedding techniques and implementing a "citations required" protocol for the DeepSeek-R1 knowledge agent.
- Language support was enhanced through language-specific embeddings and detection mechanisms.
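The "citations required" check described above could be enforced with a simple guard like the one below. The bracketed `[source#chunk]` citation convention is an illustrative assumption; the real protocol may use a different marker format:

```python
import re

def has_citations(answer: str) -> bool:
    """Return True if a knowledge-agent answer cites at least one
    source chunk, e.g. "[manual.pdf#2]". Uncited answers would be
    routed back to the agent for regeneration rather than sent to
    the user."""
    return bool(re.search(r"\[[\w.-]+#\d+\]", answer))
```

Gating responses on verifiable citations pushes the retrieval agent toward grounded answers, since any claim without a retrievable source fails the check.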
Adjustments made to the original plan:
- Added more robust conversation state management
- Implemented asynchronous document processing with status notifications
- Enhanced prompt engineering to reduce hallucinations
- Added specialized language handling
Key Success Factors & Proven Methodologies
Key learnings from the implementation:
- Multi-agent architecture provides superior specialization compared to single-LLM approaches
- WhatsApp as a primary channel significantly increased customer engagement
- Event-driven architecture with queuing ensured reliable processing during peak periods
Practices that contributed to success:
- Specialized agent roles (orchestration, knowledge retrieval, response generation)
- Vector search dramatically improved information retrieval relevance
- Serverless approach reduced operational complexity and costs
Approaches that could benefit other implementations:
- Selecting open-source LLMs based on their strengths (DeepSeek-R1 for reasoning, Llama 3.3 for orchestration and generation)
- Using vector search to enhance knowledge retrieval
- Implementing conversation state management (DynamoDB) for context preservation
- Focusing on WhatsApp integration for markets where it's the dominant messaging platform
Roadmap for Future Enhancements
Next phases of the project:
- Adding specialized agents for complex domains like technical support and billing
- Implementing predictive support capabilities based on usage patterns
- Expanding to additional messaging platforms beyond WhatsApp
Additional AWS services to be implemented:
- Amazon Kendra for enhanced enterprise search capabilities
- AWS Lambda inference optimization for improved agent response times
Future optimization plans:
- Fine-tuning open-source models (DeepSeek and Llama) for domain-specific knowledge
- Implementing more sophisticated agent collaboration patterns
- Enhancing personalization through customer preference learning
Ongoing partnership activities:
- Regular evaluation of new open-source LLM releases for potential agent improvements
- Performance benchmarking and optimization
- Knowledge expansion and continuous prompt refinement
Next Steps: Real-Time Engagement with Agentic RAG & Multi-Agent AI
Talk to our experts about building real-time engagement platforms using Agentic RAG and Multi-Agent AI. Learn how industries and departments harness agentic workflows and decision intelligence to drive personalized interactions, automate operations, and enhance responsiveness—unlocking smarter, faster customer engagement at scale.