Media Summary: This has been my favorite video so far to make! I think The paper proposes a method to identify and interpret the directions in activation space of neural networks, addressing the issue ... One of the core roadblocks to understanding the computation inside a transformer is the fact that individual neurons do not seem ...
Sparse Autoencoders Find Highly Interpretable Features In Language Models - Detailed Analysis & Overview
This has been my favorite video so far to make! I think The paper proposes a method to identify and interpret the directions in activation space of neural networks, addressing the issue ... One of the core roadblocks to understanding the computation inside a transformer is the fact that individual neurons do not seem ... Warning: This is an ad-libbed talk, and I'm sure I got some facts wrong. This is a talk I gave to my MATS 9.0 training program on ... I made a video about one of my favorite papers! I hope you enjoy :) ===Summary=== "Applying In this video you will learn everything about variational