Media Summary: This video introduces a new series on testing AI ... This lecture discusses the critical shift from ... Jason Lopatecki, Co-Founder and CEO of Arize AI, dives into the world of ...

The Agent Evaluation Revolution - Detailed Analysis & Overview

The agent evaluation revolution

This video introduces a new series on testing AI

Agent Evaluation & Benchmarks - Agentic AI MOOC 2025 Lecture 4 Summary

This lecture discusses the critical shift from

AI Agent evaluation: A complete guide to measuring performance

Evaluating

Top 5 AI Agent Evaluation Tools (2025): Maxim AI, Langfuse, Arize | LLM Observability Comparison

The landscape of AI

Agent evaluation with ADK & Vertex AI | The Agent Factory Podcast

Learn how to effectively

Evaluating Agents and Assistants: The AI Conference

Jason Lopatecki, Co-Founder and CEO of Arize AI, dives into the world of

How to use Agent Evaluation in Microsoft Copilot Studio

In this step-by-step video walkthrough, we'll show you how to use

How to Evaluate AI Agents: Comprehensive Strategies for Reliable, High‑Quality Agentic Systems

Evaluating

Self-Improving Agents and Agent Evaluation With Arize & Databricks ML Flow

As autonomous

LLM as a Judge: Scaling AI Evaluation Strategies

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 8 - LLM Evaluation

... verbosity, self-enhancement bias
00:47:22 Best practices
00:54:06 Factuality
01:00:15

AWS re:Invent 2025 - Improve agent quality in production with Bedrock AgentCore Evaluations(AIM3348)

Amazon Bedrock AgentCore

Evaluating and Debugging Non-Deterministic AI Agents

Evaluate