Sparse Autoencoders Find Highly Interpretable Features In Language Models - Detailed Analysis & Overview

A Window Into LLMs | Sparse Autoencoders Explained

This has been my favorite video so far to make! I think

Sparse Autoencoders Find Highly Interpretable Features in Language Models

The paper proposes a method to identify and interpret the directions in activation space of neural networks, addressing the issue ...

Hoagy Cunningham — Finding distributed features in LLMs with sparse autoencoders [TAIS 2024]

One of the core roadblocks to understanding the computation inside a transformer is the fact that individual neurons do not seem ...

[Lab Seminar] Sparse Autoencoders Find Highly Interpretable Features in Language Models

InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders

What Happened With Sparse Autoencoders?

Warning: This is an ad-libbed talk, and I'm sure I got some facts wrong. This is a talk I gave to my MATS 9.0 training program on ...

Introduction to Sparse AutoEncoders | ML@P Reading Group | Jinen Setpal

Slides: https://jinen.setpal.net/slides/sae.pdf.

Transcoders Beat Sparse Autoencoders for Interpretability

Sparse Autoencoders Unlearn Knowledge in LLMs | A Paper-Based Walkthrough

I made a video about one of my favorite papers! I hope you enjoy :) ===Summary=== "Applying

Demo: Gemma Scope: Sparse autoencoders on Gemma 2

Sparse Autoencoders in PyTorch: Learn Interpretable Neural Features in Python

Variational Autoencoders | Generative AI Animated

In this video you will learn everything about variational

Extract high-level features from the human action information by Deep Sparse Autoencoder
