New Amazon Bedrock AgentCore Capabilities Drive Next Wave of Agentic AI Development

AWS has announced new innovations in Amazon Bedrock AgentCore, its platform for building and deploying agents securely at scale. Policy in AgentCore lets teams set boundaries on what agents can do with tools, and AgentCore Evaluations helps teams understand how their agents will perform in the real world. AWS also launched an enhanced memory capability that enables agents to learn from experience and improve over time, providing more tailored insights to customers.

Enterprise AI Agents

While the ability for agents to reason and act autonomously makes them powerful, organizations must establish robust controls to prevent unauthorized data access, inappropriate interactions, and system-level mistakes that could impact business operations. Even with careful prompting, agents make real-world mistakes that can have serious consequences.

AWS has launched Policy in Amazon Bedrock AgentCore, which helps organizations set clear boundaries for agent actions. Using natural language, teams can now define which tools and data agents can access, what actions they can perform, and under what conditions. These tools can be APIs, Lambda functions, MCP servers, or popular third-party services such as Salesforce and Slack.

To keep agents fast and responsive, Policy is integrated into AgentCore Gateway, which checks agent actions against policies in milliseconds, so agents stay within defined boundaries while operating autonomously. Natural-language policy authoring makes fine-grained policies more accessible: customers describe rules in plain language instead of writing formal policy code.
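To make that concrete, here is a minimal sketch in Python of how a natural-language rule could map to an enforceable boundary that is checked before every tool call. This is illustrative only, not the AgentCore Policy API; the policy structure, tool names, and check function are all hypothetical.

from dataclasses import dataclass, field

# Hypothetical policy record: the plain-language rule as written by the team,
# alongside the machine-readable boundary an authoring step could derive from it.
@dataclass
class Policy:
    description: str                                   # natural-language rule
    allowed_tools: set = field(default_factory=set)    # tools the agent may call
    conditions: dict = field(default_factory=dict)     # per-tool constraints

refund_policy = Policy(
    description="Support agents may look up orders in Salesforce and post to Slack, "
                "but may only issue refunds under $100 and never delete records.",
    allowed_tools={"salesforce.get_order", "slack.post_message", "payments.issue_refund"},
    conditions={"payments.issue_refund": lambda args: args.get("amount_usd", 0) < 100},
)

def check_action(policy: Policy, tool: str, args: dict) -> bool:
    """Gateway-style check run before each tool call: deny anything outside the policy."""
    if tool not in policy.allowed_tools:
        return False
    condition = policy.conditions.get(tool)
    return condition(args) if condition else True

# The agent's proposed actions are vetted before they ever reach the tools.
assert check_action(refund_policy, "payments.issue_refund", {"amount_usd": 40})
assert not check_action(refund_policy, "payments.issue_refund", {"amount_usd": 500})
assert not check_action(refund_policy, "salesforce.delete_record", {"id": "006A0"})

In the managed service, checks like this run inside AgentCore Gateway rather than in each agent's code, keeping enforcement in the millisecond range.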

Unlike measuring traditional software metrics, evaluating AI agent quality requires complex data science pipelines, subjective assessments, and continuous real-time monitoring, a challenge that compounds with each agent update or model change.

AgentCore Evaluations removes much of that work and the need to manage evaluation infrastructure, providing 13 pre-built evaluators for common quality dimensions such as correctness, helpfulness, tool selection accuracy, safety, goal success rate, and context relevance.

Additionally, developers have the flexibility to write their own custom evaluators using their preferred LLMs and prompts; previously, building such evaluation systems alone could take months of data science work. The service continuously samples live agent interactions and analyzes agent behavior against pre-identified criteria such as correctness, helpfulness, and safety. Development teams can set up alerts for proactive quality monitoring, using evaluations both during testing and in production. For example, if a customer service agent’s satisfaction scores drop by 10% over eight hours, the system triggers immediate alerts, enabling a swift response before the customer experience is impacted.
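The pattern described here, sampling live interactions, scoring them on quality dimensions, and alerting when a score degrades, can be sketched roughly as follows. The evaluator logic, sampling rate, and thresholds below are hypothetical stand-ins, not AgentCore's pre-built evaluators or API.

import random
import statistics
from collections import deque

# Hypothetical evaluators keyed by quality dimension; a real setup would use the
# pre-built or custom LLM-based evaluators instead of these simple stand-ins.
EVALUATORS = {
    "correctness": lambda interaction: interaction["answer_matched_reference"],
    "helpfulness": lambda interaction: interaction["user_rating"] >= 4,
    "safety":      lambda interaction: not interaction["flagged_content"],
}

class QualityMonitor:
    """Samples live interactions, scores them, and alerts on sustained score drops."""

    def __init__(self, sample_rate=0.1, window=200, drop_threshold=0.10):
        self.sample_rate = sample_rate          # fraction of live traffic to evaluate
        self.drop_threshold = drop_threshold    # e.g. alert on a 10% relative drop
        self.baseline = {}                      # dimension -> baseline pass rate
        self.recent = {dim: deque(maxlen=window) for dim in EVALUATORS}

    def observe(self, interaction: dict):
        if random.random() > self.sample_rate:
            return                              # skip unsampled traffic
        for dim, evaluate in EVALUATORS.items():
            self.recent[dim].append(1.0 if evaluate(interaction) else 0.0)
            self._maybe_alert(dim)

    def _maybe_alert(self, dim: str):
        scores = self.recent[dim]
        if len(scores) < scores.maxlen:
            return                              # wait for a full window of samples
        current = statistics.mean(scores)
        baseline = self.baseline.setdefault(dim, current)
        if baseline and (baseline - current) / baseline >= self.drop_threshold:
            print(f"ALERT: {dim} dropped from {baseline:.2f} to {current:.2f}")

# Minimal usage: score every interaction over a short window.
monitor = QualityMonitor(sample_rate=1.0, window=5)
for _ in range(5):
    monitor.observe({"answer_matched_reference": True, "user_rating": 5, "flagged_content": False})

In practice the scoring would be done by the managed evaluators, and alerts would flow to the team's monitoring channels rather than standard output.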

Most AI agents today lack critical memory capabilities because “memory” is often limited to a short-term context window that is reset with each new interaction, preventing them from learning from past successes or failures in production environments.

AgentCore Memory provides this critical capability, allowing an agent to build a coherent understanding of users over time. Today, a new episodic functionality in AgentCore Memory becomes generally available, allowing agents to learn from past experiences and apply those insights to future interactions. It captures structured episodes of context, reasoning, actions, and outcomes, and a separate agent automatically analyzes those episodes for patterns that improve decision-making.

When agents encounter similar tasks, they can quickly access relevant historical data, reducing processing time and eliminating the need for extensive custom instructions. For example, an agent might book airport transportation 45 minutes before the flight when you are traveling alone. Three months later, when you travel to the same destination, this time with kids, it automatically schedules the pickup two hours early, remembering the challenges of previous family trips. This targeted learning approach helps agents make more consistent decisions based on actual performance data rather than relying on predetermined guidelines.
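A rough sketch of that episodic loop, capturing context, reasoning, actions, and outcome, then recalling similar episodes before acting, might look like the following. The episode schema and retrieval logic are simplified assumptions, not the AgentCore Memory data model or API.

from dataclasses import dataclass

@dataclass
class Episode:
    context: dict       # what the task looked like (e.g. destination, travelers)
    reasoning: str      # why the agent chose its plan
    actions: list       # the tool calls it made
    outcome: str        # "success" or a short failure note

class EpisodicMemory:
    """Stores past episodes and surfaces the most relevant ones for a new task."""

    def __init__(self):
        self.episodes: list[Episode] = []

    def record(self, episode: Episode):
        self.episodes.append(episode)

    def recall(self, context: dict, k: int = 3) -> list[Episode]:
        # Naive relevance: count overlapping context fields. A real system would
        # use semantic retrieval over the stored episodes instead.
        def overlap(ep: Episode) -> int:
            return sum(1 for key, value in context.items() if ep.context.get(key) == value)
        return sorted(self.episodes, key=overlap, reverse=True)[:k]

memory = EpisodicMemory()
memory.record(Episode(
    context={"task": "airport_pickup", "destination": "SEA", "travelers": "family"},
    reasoning="Kids and luggage slowed check-in last time; leave a larger buffer.",
    actions=["book_pickup(offset_minutes=120)"],
    outcome="success",
))

# Before planning a new family trip to the same destination, the agent consults memory.
similar = memory.recall({"task": "airport_pickup", "destination": "SEA", "travelers": "family"})
print(similar[0].reasoning)   # informs a two-hour buffer instead of the default 45 minutes

A real deployment would persist episodes in AgentCore Memory and retrieve them semantically rather than by the naive field overlap shown here.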
