LW - How ARENA course material gets made by CallumMcDougall

The Nonlinear Library: LessWrong

Content provided by The Nonlinear Fund. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.

5d ago 14:10

MP3•Episode home

Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How ARENA course material gets made, published by CallumMcDougall on July 3, 2024 on LessWrong. TL;DR In this post, I describe my methodology for building new material for ARENA. I'll mostly be referring to the exercises on IOI, Superposition and Function Vectors as case studies. I expect this to be useful for people who are interested in designing material for ARENA or ARENA-like courses, as well as people who are interested in pedagogy or ML paper replications. The process has 3 steps: 1. Start with something concrete 2. First pass: replicate, and understand 3. Second pass: exercise-ify Summary I'm mostly basing this on the following 3 sets of exercises: Indirect Object Identification - these exercises focus on the IOI paper (from Conmy et al). The goal is to have people understand what exploratory analysis of transformers looks like, and introduce the key ideas of the circuits agenda. Superposition & SAEs - these exercises focus on understanding superposition and the agenda of dictionary learning (specifically sparse autoencoders). Most of the exercises explore Anthropic's Toy Models of Superposition paper, except for the last 2 sections which explore sparse autoencoders (firstly by applying them to the toy model setup, secondly by exploring a sparse autoencoder trained on a language model). Function Vectors - these exercises focus on the Function Vectors paper by David Bau et al, although they also make connections with related work such as Alex Turner's GPT2-XL steering vector work. These exercises were interesting because they also had the secondary goal of being an introduction to the nnsight library, in much the same way that the intro to mech interp exercises were also an introduction to TransformerLens. The steps I go through are listed below. I'm indexing from zero because I'm a software engineer so of course I am. The steps assume you already have an idea of what exercises you want to create; in Appendix (1) you can read some thoughts on what makes for a good exercise set. 1. Start with something concrete When creating material, you don't want to be starting from scratch. It's useful to have source code available to browse - bonus points if that takes the form of a Colab or something which is self-contained and has easily visible output. IOI - this was Neel's "Exploratory Analysis Demo" exercises. The rest of the exercises came from replicating the paper directly. Superposition - this was Anthroic's Colab notebook (although the final version went quite far beyond this). The very last section (SAEs on transformers) was based on Neel Nanda's demo Colab). Function Vectors - I started with the NDIF demo notebook, to show how some basic nnsight syntax worked. As for replicating the actual function vectors paper, unlike the other 2 examples I was mostly just working from the paper directly. It helped that I was collaborating with some of this paper's authors, so I was able to ask them some questions to clarify aspects of the paper. 2. First-pass: replicate, and understand The first thing I'd done in each of these cases was go through the material I started with, and make sure I understood what was going on. Paper replication is a deep enough topic for its own series of blog posts (many already exist), although I'll emphasise that I'm not usually talking about full paper replication here, because ideally you'll be starting from something a it further along, be that a Colab, a different tutorial, or something else. And even when you are just working directly from a paper, you shouldn't make the replication any harder for yourself than you need to. If there's code you can take from somewhere else, then do. My replication usually takes the form of working through a notebook in VSCode. I'll either start from scratch, or from a downloaded Colab if I'm using one as a ...

1702 episodes

#The Nonlinear Fund #Podcasting Education #Of TexttoSpeech