
[JUNE 2022] Aran Komatsuzaki on Scaling, GPT-J and Alignment

1:17:21

Aran Komatsuzaki is an ML PhD student at GaTech and a lead researcher at EleutherAI, where he was one of the authors of GPT-J. In June 2022 we recorded an episode on scaling, following up on the first Ethan Caballero episode (where we mentioned Aran as an influence on how Ethan started thinking about scaling).

Note: For some reason I procrastinated on editing this podcast, then had a lot of in-person podcasts, so I left this one as something to edit later, until the date was so distant from June 2022 that I thought publishing no longer made sense. In July 2023 I'm trying that "one video a day" challenge (well, I missed some days, but I'm trying to get back on track), so I thought it made sense to release it anyway. After a second watch, it's somehow interesting to see how excited Aran was about InstructGPT, which turned out to be quite useful for things like ChatGPT.

Outline

(00:00) intro

(00:53) the legend of the two AKs, Aran's arXiv reading routine

(04:14) why Aran expects Alignment to be the same as some other ML problems

(05:44) what Aran means when he says "AGI"

(10:24) what Aran means by "human-level at doing ML research"

(11:31) software improvement happening before hardware improvement

(13:00) is scale all we need?

(15:25) how "Scaling Laws for Neural Language Models" changed the process of doing experiments

(16:22) how Aran scale-pilled Ethan

(18:46) why Aran was already scale-pilled before GPT-2

(20:12) Aran's 2019 scaling paper: "One epoch is all you need"

(25:43) Aran's June 2022 interest: T0 and InstructGPT

(31:33) encoder-decoder performs better than decoder-only when multi-task finetuned

(33:30) why the scaling law might be different for T0-like models

(37:15) the story behind GPT-J

(41:40) hyperparameters and architecture changes in GPT-J

(43:56) GPT-J's throughput

(47:17) 5 weeks of training using 256 TPU cores

(50:34) did publishing GPT-J accelerate timelines?

(55:39) how Aran thinks about Alignment, defining Alignment

(58:19) in practice: improving benchmarks, but deception is still a problem

(1:00:49) main difficulties in evaluating language models

(1:05:07) how Aran sees the future: AIs aligning AIs, merging with AIs, Aran's takeoff scenario

(1:10:09) what Aran thinks we should do given how he sees the next decade

(1:12:34) regulating access to AGI

(1:14:50) what might happen: preventing some AI authoritarian regime

(1:15:42) conclusion, where to find Aran
