Stephen Casper Powering Up Capability Evaluations Alignment Workshop

Media Summary: Soroush Pour, CEO of Harmony Intelligence, shares insights from “Third-Party Evals: Learnings from Harmony Intelligence,” ... Beth Barnes shares how METR is building scalable, interpretable metrics to reliably indicate when models become concerning. Who should learn the Personal Agility System with Peter Stevens? Why should you take it? What will happen in the

Stephen Casper – Powering Up Capability Evaluations [Alignment Workshop]

Stephen Casper - Powering up AI Capability Evaluations with Model Tampering Attacks [Alignment Works

Stephen Casper – Generalized Adversarial Training and Testing

Stephen Casper - ML Researchers as Policymakers [Alignment Workshop]

Stephen Casper: Problems with Evals (HAAISS 2024)

Stephen Casper: Problems with RLHF (HAAISS 2024)

Post-AGI Civilizational Equilibria | Stephen Casper

Soroush Pour – 3rd-Party Evals: Harmony Intelligence [Alignment Workshop]

Stephen Casper - Why do LLM Outputs Disagree with Internal Representations of Truthfulness?

Beth Barnes – METR Updates & Research Directions [Alignment Workshop]

Ep 14 - Interp, latent robustness, RLHF limitations w/ Stephen Casper (PhD AI researcher, MIT)

Peter Stevens Introduces PAS the Personal Agility System Online Workshop
