Best BlueDot Impact Podcasts (2024)

Ep 14 - Interp, latent robustness, RLHF limitations w/ Stephen Casper (PhD AI researcher, MIT) 2:42:17

We speak with Stephen Casper, or "Cas" as his friends call him. Cas is a PhD student at MIT in the Computer Science (EECS) department, in the Algorithmic Alignment Group advised by Prof Dylan Hadfield-Menell. Formerly, he worked with the Harvard Kreiman Lab and the Center for Human-Compatible AI (CHAI) at Berkeley. His work focuses on better unders…

Ep 13 - AI researchers expect AGI sooner w/ Katja Grace (Co-founder & Lead Researcher, AI Impacts) 1:20:28

We speak with Katja Grace. Katja is the co-founder and lead researcher at AI Impacts, a research group trying to answer key questions about the future of AI: when certain capabilities will arise, what AI will look like, and how it will all go for humanity. We talk to Katja about: * How AI Impacts’ latest rigorous survey of leading AI researchers shows …

Intro to Brain-Like-AGI Safety 1:02:10

(Sections 3.1-3.4, 6.1-6.2, and 7.1-7.5) Suppose we someday build an Artificial General Intelligence algorithm using similar principles of learning and cognition as the human brain. How would we use such an algorithm safely? I will argue that this is an open technical problem, and my goal in this post series is to bring readers with no prior knowle…

Eliciting Latent Knowledge 1:00:27

In this post, we’ll present ARC’s approach to an open problem we think is central to aligning powerful machine learning (ML) systems: Suppose we train a model to predict what the future will look like according to cameras and other sensors. We then use planning algorithms to find a sequence of actions that lead to predicted futures that look good t…

Chinchilla’s Wild Implications 24:57

This post is about language model scaling laws, specifically the laws derived in the DeepMind paper that introduced Chinchilla. The paper came out a few months ago, and has been discussed a lot, but some of its implications deserve more explicit notice in my opinion. In particular: Data, not size, is the currently active constraint on language mode…

Deep Double Descent 8:27

We show that the double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time. This effect is often avoided through careful regularization. While this behavior appears to be fairly universal, we don’t yet fully unders…

Empirical Findings Generalize Surprisingly Far 11:32

Previously, I argued that emergent phenomena in machine learning mean that we can’t rely on current trends to predict what the future of ML will be like. In this post, I will argue that despite this, empirical findings often do generalize very far, including across “phase transitions” caused by emergent behavior. This might seem like a contradictio…

Gradient Hacking: Definitions and Examples 9:15

Gradient hacking is a hypothesized phenomenon where: A model has knowledge about possible training trajectories which isn’t being used by its training algorithms when choosing updates (such as knowledge about non-local features of its loss landscape which aren’t taken into account by local optimization algorithms). The model uses that knowledge to …

An Investigation of Model-Free Planning 8:11

The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More …

Discovering Latent Knowledge in Language Models Without Supervision 37:09

Abstract: Existing techniques for training language models can be misaligned with the truth: if we train models with imitation learning, they may reproduce errors that humans make; if we train them to generate text that humans rate highly, they may output errors that human evaluators can't detect. We propose circumventing this issue by directly fin…

Imitative Generalisation (AKA ‘Learning the Prior’) 18:14

This post tries to explain a simplified version of Paul Christiano’s mechanism introduced here (referred to there as ‘Learning the Prior’), and explain why a mechanism like this potentially addresses some of the safety problems with naïve approaches. First we’ll go through a simple example in a familiar domain, then explain the problems with the ex…

ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation 16:08

This paper presents a technique to scan neural-network-based AI models to determine if they are trojaned. Pre-trained AI models may contain back-doors that are injected through training or by transforming inner neuron weights. These trojaned models operate normally when regular inputs are provided, and misclassify to a specific output label when t…

Least-To-Most Prompting Enables Complex Reasoning in Large Language Models 16:08

Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks which require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most p…

Low-Stakes Alignment 13:56

Right now I’m working on finding a good objective to optimize with ML, rather than trying to make sure our models are robustly optimizing that objective. (This is roughly “outer alignment.”) That’s pretty vague, and it’s not obvious whether “find a good objective” is a meaningful goal rather than being inherently confused or sweeping key distinctio…

Two-Turn Debate Doesn’t Help Humans Answer Hard Reading Comprehension Questions 16:39

Using hard multiple-choice reading comprehension questions as a testbed, we assess whether presenting humans with arguments for two competing answer options, where one is correct and the other is incorrect, allows human judges to perform more accurately, even when one of the arguments is unreliable and deceptive. If this is helpful, we may be able …

Toy Models of Superposition 41:43

It would be very convenient if the individual neurons of artificial neural networks corresponded to cleanly interpretable features of the input. For example, in an “ideal” ImageNet classifier, each neuron would fire only in the presence of a specific visual feature, such as the color red, a left-facing curve, or a dog snout. Empirically, in models …

Compute Trends Across Three Eras of Machine Learning 13:50

This article explains key drivers of AI progress and how compute is calculated, and looks at how the amount of compute used to train AI models has increased significantly in recent years. Original text: https://epochai.org/blog/compute-trends Author(s): Jaime Sevilla, Lennart Heim, Anson Ho, Tamay Besiroglu, Marius Hobbhahn, Pablo Vill…

Worst-Case Thinking in AI Alignment 11:35

Alternative title: “When should you assume that what could go wrong, will go wrong?” Thanks to Mary Phuong and Ryan Greenblatt for helpful suggestions and discussion, and Akash Wasil for some edits. In discussions of AI safety, people often propose the assumption that something goes as badly as possible. Eliezer Yudkowsky in particular has argued f…

Public by Default: How We Manage Information Visibility at Get on Board 9:50

I’ve been obsessed with managing information and communications in a remote team since Get on Board started growing. Reducing the bus factor is a primary motivation, but another, just as important, is diminishing reliance on synchronicity. When what I know is documented and accessible to others, I’m less likely to be a bottleneck for anyone else in…

How to Get Feedback 7:30

Feedback is essential for learning. Whether you’re studying for a test, trying to improve at your work or wanting to master a difficult skill, you need feedback. The challenge is that feedback can often be hard to get. Worse, if you get bad feedback, you may end up worse than before. Original text: https://www.scotthyoung.com/blog/2019/01/24/how-to-ge…

Writing, Briefly 3:09

(In the process of answering an email, I accidentally wrote a tiny essay about writing. I usually spend weeks on an essay. This one took 67 minutes—23 of writing, and 44 of rewriting.) Original text: https://paulgraham.com/writing44.html Author: Paul Graham A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.…

Being the (Pareto) Best in the World 6:46

This introduces the concept of Pareto frontiers. The top comment by Rob Miles also ties it to comparative advantage. While reading, consider what Pareto frontiers your project could place you on. Original text: https://www.lesswrong.com/posts/XvN2QQpKTuEzgkZHY/being-the-pareto-best-in-the-world Author: John Wentworth A podcast by BlueDot Impact. Le…

How to Succeed as an Early-Stage Researcher: The “Lean Startup” Approach 15:16

I am approaching the end of my AI governance PhD, and I’ve spent about 2.5 years as a researcher at FHI. During that time, I’ve learnt a lot about the formula for successful early-career research. This post summarises my advice for people in the first couple of years. Research is really hard, and I want people to avoid the mistakes I’ve made. Origi…

Become a Person who Actually Does Things 5:14

The next four weeks of the course are an opportunity for you to actually build a thing that moves you closer to contributing to AI Alignment, and we're really excited to see what you do! A common failure mode is to think "Oh, I can't actually do X" or to say "Someone else is probably doing Y." You probably can do X, and it's unlikely anyone is doin…

Planning a High-Impact Career: A Summary of Everything You Need to Know in 7 Points 11:02

We took 10 years of research and what we’ve learned from advising 1,000+ people on how to build high-impact careers, compressed that into an eight-week course to create your career plan, and then compressed that into this three-page summary of the main points. (It’s especially aimed at people who want a career that’s both satisfying and has a signi…

Working in AI Alignment 1:08:44

This guide is written for people who are considering direct work on technical AI alignment. I expect it to be most useful for people who are not yet working on alignment, and for people who are already familiar with the arguments for working on AI alignment. If you aren’t familiar with the arguments for the importance of AI alignment, you can get a…

Computing Power and the Governance of AI 26:49

This post summarises a new report, “Computing Power and the Governance of Artificial Intelligence.” The full report is a collaboration between nineteen researchers from academia, civil society, and industry. It can be read here. GovAI research blog posts represent the views of their authors, rather than the views of the organisation. Source: https:…

AI Control: Improving Safety Despite Intentional Subversion 20:51

We’ve released a paper, AI Control: Improving Safety Despite Intentional Subversion. This paper explores techniques that prevent AI catastrophes even if AI instances are colluding to subvert the safety techniques. In this post, we summarize the paper and compare our methodology to the methodology of other safety papers. Source: https://www.alignmen…

Emerging Processes for Frontier AI Safety 18:20

The UK recognises the enormous opportunities that AI can unlock across our economy and our society. However, without appropriate guardrails, such technologies can pose significant risks. The AI Safety Summit will focus on how best to manage the risks from frontier AI such as misuse, loss of control and societal harms. Frontier AI organisations play…

AI Watermarking Won’t Curb Disinformation 8:05

Generative AI allows people to produce piles upon piles of images and words very quickly. It would be nice if there were some way to reliably distinguish AI-generated content from human-generated content. It would help people avoid endlessly arguing with bots online, or believing what a fake image purports to show. One common proposal is that big c…

Challenges in Evaluating AI Systems 22:33

Most conversations around the societal impacts of artificial intelligence (AI) come down to discussing some quality of an AI system, such as its truthfulness, fairness, potential for misuse, and so on. We are able to talk about these characteristics because we can technically evaluate models for their performance in these areas. But what many peopl…

Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small 24:48

Research in mechanistic interpretability seeks to explain behaviors of machine learning (ML) models in terms of their internal components. However, most previous work either focuses on simple behaviors in small models or describes complicated behaviors in larger models with broad strokes. In this work, we bridge this gap by presenting an explanatio…

Zoom In: An Introduction to Circuits 44:03

By studying the connections between neurons, we can find meaningful algorithms in the weights of neural networks. Many important transition points in the history of science have been moments when science “zoomed in.” At these points, we develop a visualization or tool that allows us to see the world in a new level of detail, and a new field of scie…

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning 8:53

Using a sparse autoencoder, we extract a large number of interpretable features from a one-layer transformer. Mechanistic interpretability seeks to understand neural networks by breaking them into components that are more easily understood than the whole. By understanding the function of each component, and how they interact, we hope to be able to …

Weak-To-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision 35:05

Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior—for example, to evaluate whether a model faithfully followed instructions or generated safe outputs. However, future superhuman models will behave in complex ways too difficult for humans to reliably…

Can We Scale Human Feedback for Complex AI Tasks? 20:06

Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique for steering large language models (LLMs) toward desired behaviours. However, relying on simple human feedback doesn’t work for tasks that are too complex for humans to accurately judge at the scale needed to train AI models. Scalable oversight techniques attempt …

Ep 12 - Education & advocacy for AI safety w/ Rob Miles (YouTube host) 1:21:26

We speak with Rob Miles. Rob is the host of the “Robert Miles AI Safety” channel on YouTube, the single most popular AI alignment video series out there — he has 145,000 subscribers and his top video has ~600,000 views. He goes much deeper than many educational resources out there on alignment, going into important technical topics like the orthogo…

Ep 11 - Technical alignment overview w/ Thomas Larsen (Director of Strategy, Center for AI Policy) 1:37:19

We speak with Thomas Larsen, Director of Strategy at the Center for AI Policy in Washington, DC, to do a "speed run" overview of all the major technical research directions in AI alignment. A great way to quickly learn broadly about the field of technical AI alignment. In 2022, Thomas spent ~75 hours putting together an overview of what everyone i…

Ep 10 - Accelerated training to become an AI safety researcher w/ Ryan Kidd (Co-Director, MATS) 1:16:58

We speak with Ryan Kidd, Co-Director of the ML Alignment & Theory Scholars (MATS) program, previously "SERI MATS". MATS (https://www.matsprogram.org/) provides research mentorship, technical seminars, and connections to help new AI researchers get established and start producing impactful research towards AI safety & alignment. Prior to MATS, Ryan comp…

Ep 9 - Scaling AI safety research w/ Adam Gleave (CEO, FAR AI) 1:19:12

We speak with Adam Gleave, CEO of FAR AI (https://far.ai). FAR AI’s mission is to ensure AI systems are trustworthy & beneficial. They incubate & accelerate research that's too resource-intensive for academia but not ready for commercialisation. They work on everything from adversarial robustness, interpretability, preference learning, & more. We t…

Ep 8 - Getting started in AI safety & alignment w/ Jamie Bernardi (AI Safety Lead, BlueDot Impact) 1:07:23

We speak with Jamie Bernardi, co-founder & AI Safety Lead at the not-for-profit BlueDot Impact, which hosts the biggest and most up-to-date courses on AI safety & alignment at AI Safety Fundamentals (https://aisafetyfundamentals.com/). Jamie completed his Bachelor’s (Physical Natural Sciences) and Master’s (Physics) at the University of Cambridge and worked as an ML E…

Ep 7 - Responding to a world with AGI w/ Richard Dazeley (Prof AI & ML, Deakin University) 1:10:05

In this episode, we speak with Prof Richard Dazeley about the implications of a world with AGI and how we can best respond. We talk about what he thinks AGI will actually look like, as well as the technical and governance responses we should put in place today and in the future to ensure a safe and positive future with AGI. Prof Richard Dazeley is the Dep…

Ep 6 - Will we see AGI this decade? Our AGI predictions & debate w/ Hunter Jay (CEO, Ripe Robotics) 1:20:58

In this episode, we welcome back Hunter Jay, CEO of Ripe Robotics and our co-host on Ep 1. We synthesise everything we've heard on AGI timelines from experts in Ep 1-5, take in more data points, and use this to give our own forecasts for AGI, ASI (i.e. superintelligence), and "intelligence explosion" (i.e. singularity). Importantly, we have diff…

Ep 5 - Accelerating AGI timelines since GPT-4 w/ Alex Browne (ML Engineer) 38:26

In this episode, we welcome back Alex Browne, ML Engineer, whom we heard on Ep 2. He got in touch after watching recent developments in the 4 months since Ep 2, which have accelerated his timelines for AGI. Hear why, and hear his latest prediction. Hosted by Soroush Pour. Follow me for more AGI content: Twitter: https://twitter.com/soroushjp Link…

Machine Learning for Humans: Supervised Learning 22:05

The two tasks of supervised learning: regression and classification. Linear regression, loss functions, and gradient descent. How much money will we make by spending more dollars on digital advertising? Will this loan applicant pay back the loan or not? What’s going to happen to the stock market tomorrow? Original article: https://medium.com/machin…

Intelligence Explosion: Evidence and Import 18:59

It seems unlikely that humans are near the ceiling of possible intelligences, rather than simply being the first such intelligence that happened to evolve. Computers far outperform humans in many narrow niches (e.g. arithmetic, chess, memory size), and there is reason to believe that similar large improvements over human performance are possible fo…

On the Opportunities and Risks of Foundation Models 15:46

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and …

Future ML Systems Will Be Qualitatively Different 12:47

In 1972, the Nobel prize-winning physicist Philip Anderson wrote the essay "More Is Different". In it, he argues that quantitative changes can lead to qualitatively different and unexpected phenomena. While he focused on physics, one can find many examples of More is Different in other domains as well, including biology, economics, and computer sci…

More Is Different for AI 6:34

Machine learning is touching increasingly many aspects of our society, and its effect will only continue to grow. Given this, I and many others care about risks from future ML systems and how to mitigate them. When thinking about safety risks from ML, there are two common approaches, which I'll call the Engineering approach and the Philosophy appro…

Biological Anchors: A Trick That Might Or Might Not Work 1:10:46

I've been trying to review and summarize Eliezer Yudkowsky's recent dialogues on AI safety. Previously in sequence: Yudkowsky Contra Ngo On Agents. Now we’re up to Yudkowsky contra Cotra on biological anchors, but before we get there we need to figure out what Cotra's talking about and what's going on. The Open Philanthropy Project ("Open Phil") is…
