Daniel Filan
 
AXRP (pronounced axe-urp) is the AI X-risk Research Podcast where I, Daniel Filan, have conversations with researchers about their papers. We discuss the paper, and hopefully get a sense of why it's been written and how it might reduce the risk of AI causing an existential catastrophe: that is, permanently and drastically curtailing humanity's future potential. You can visit the website and read transcripts at axrp.net.
 
Every two weeks we surprise Chana with a new guest from the EA/rationalist-sphere and sometimes even the buttoned-up AI Safet– ahem– AI Security field. Chana tells us how to live and think; Matt does all the work; occasionally a joke gets cracked; complaining is unceasing.
 
We resume our popular AI in Context video review series by talking about AIC’s second video on the MechaHitler débâcle. Matt actually thought it was good! It made an argument! But we do need to sort out our relationship to the AI omnicause and figure out what the channel is for. All before we go to the Ren Faire. LINKS: AI in Context MechaHitler Vi…
 
Matt worries he’s disqualified for being insufficiently util-simped and being too preoccupied with the ideas themselves. Julia comes on and tells cute but harrowing tales of her bright children learning about the world. Everyone tries to avoid osteoporosis. LINKS: Julia’s excellent (and excellently named) blog: https://juliawise.net/ I regret that …
 
Chana discovers anti-elitism is a current theme in US politics in this already extremely dated episode from our brief foray into conflict with Iran, a conflict which, like all others so far, we can now put in the win column of Peter Wildeford’s forecasting record. Oh, and the fire alarm goes off. As always, you can email us with questions and comme…
 
Matt is sad and grasping at topics, and thereby inevitably gets squeezed out of the conversation when two teachers come on to do what they do best: gripe about teaching... and parents and protests and workouts and forecasting. Alex's reflection on his joke paper going viral: https://lawsen.substack.com/p/when-yo... You can email us with questions and…
 
Could AI enable a small group to gain power over a large country, and lock in their power permanently? Often, people worried about catastrophic risks from AI have been concerned with misalignment risks. In this episode, Tom Davidson talks about a risk that could be comparably important: that of AI-enabled coups. Patreon: https://www.patreon.com/axr…
 
In this episode, I chat with Caspar Oesterheld about a relatively simple application of weird decision theory: evidential cooperation in large worlds, or ECL for short. The tl;dr is you think there's at least some small probability of a very large multiverse, so you try to follow something closer to the average of all the values of civilizations in…
 
Chana triumphs at work with the release of her AI 2027 video, but even after two million views and 85,000 subscribers, she still has to win over her biggest critic. But to get there, she’ll have to understand what the hell he’s saying first. Watch 80k’s AI 2027 video here: https://www.youtube.com/watch?v=5KVDDfAkRgc Read Matt’s lamentations here: h…
 
In this episode, I chat with Samuel Albanie about the Google DeepMind paper he co-authored called "An Approach to Technical AGI Safety and Security". It covers the assumptions made by the approach, as well as the types of mitigations it outlines. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axr…
 
In this episode, I chat with Alessandro (@polisisti on X/Twitter) about our respective experiences learning Latin (and in his case ancient Greek). The Ranieri-Roberts approach to learning ancient Greek: https://www.youtube.com/watch?v=2vwb1wVzPec We need to talk about Latinitas: https://foundinantiquity.com/2024/04/15/we-need-to-talk-about-latinita…
 
Are do-gooders getting corrupted by prestige, or just nerdsniped? We explore that question, and, with no hint of irony, what would make good merch for the podcast. Plus! We finally have an email where you can write in with questions: [email protected] Daniel's podcast here: https://axrp.net/ Daniel's other podcast here: https://thefilancabinet.com…
 
In this episode, I talk with Peter Salib about his paper "AI Rights for Human Safety", arguing that giving AIs the right to contract, hold property, and sue people will reduce the risk of their trying to attack humanity and take over. He also tells me how law reviews work, in the face of my incredulity. Patreon: https://www.patreon.com/axrpodcast K…
 
Matt can't resist talking politics and religion, but everything gets better when we pivot to Disney movies (which are allegedly based on books), featuring the great Conor Barnes of 80k Job Board fame! Conor's writing: https://parhelia.conorbarnes.com/ The Daily Show on Shrimp Welfare: https://www.youtube.com/watch?v=VNbIKtGMoaA…
 
In this episode, I talk with David Lindner about Myopic Optimization with Non-myopic Approval, or MONA, which attempts to address (multi-step) reward hacking by myopically optimizing actions against a human's sense of whether those actions are generally good. Does this work? Can we get smarter-than-human AI this way? How does this compare to approa…
 
Earlier this year, the paper "Emergent Misalignment" made the rounds on AI x-risk social media for seemingly showing LLMs generalizing from 'misaligned' training data of insecure code to acting comically evil in response to innocuous questions. In this episode, I chat with one of the authors of that paper, Owain Evans, about that research as well a…
 
What's the next step forward in interpretability? In this episode, I chat with Lee Sharkey about his proposal for detecting computational mechanisms within neural networks: Attribution-based Parameter Decomposition, or APD for short. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode…
 
Chana wasn't sure there'd be a guest three days after we decided (read: she agreed) to do this show, but there was! ...And she showed up late anyway, so much discussion of punctuality ensued. We also cover veganism, AI, and not alcohol. LINKS: Rumsfeld! The Musical ChatGPT on the lives of factory farmed pigs All found on Andy's great blog, The Weir…
 
How do we figure out whether interpretability is doing its job? One way is to see if it helps us prove things about models that we care about knowing. In this episode, I speak with Jason Gross about his agenda to benchmark interpretability in this way, and his exploration of the intersection of proofs and modern machine learning. Patreon: https://w…
 
In this episode, I chat with David Duvenaud about two topics he's been thinking about: firstly, a paper he wrote about evaluating whether or not frontier models can sabotage human decision-making or monitoring of the same models; and secondly, the difficult situation humans find themselves in in a post-AGI future, even if AI is aligned with human i…
 
The Future of Life Institute is one of the oldest and most prominent organizations in the AI existential safety space, working on such topics as the AI pause open letter and how the EU AI Act can be improved. Metaculus is one of the premier forecasting sites on the internet. Behind both of them lies one man: Anthony Aguirre, who I talk with in this …
 
Typically this podcast talks about how to avert destruction from AI. But what would it take to ensure AI promotes human flourishing as well as it can? Is alignment to individuals enough, and if not, where do we go from here? In this episode, I talk with Joel Lehman about these questions. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko…
 
Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look for to check whether this was happening? In this episode Adrià Garriga-Alonso talks about his work trying to answer this question. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com…
 
AI researchers often complain about the poor coverage of their work in the news media. But why is this happening, and how can it be fixed? In this episode, I speak with Shakeel Hashim about the resource constraints facing AI journalism, the disconnect between journalists' and AI researchers' views on transformative AI, and efforts to improve the st…
 
Lots of people in the AI safety space worry about models being able to make deliberate, multi-step plans. But can we already see this in existing neural nets? In this episode, I talk with Erik Jenner about his work looking at internal look-ahead within chess-playing neural networks. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.c…
 
The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies them to try to understand how the misalignment occurs and whether it can be somehow removed. In this episode, Evan Hubinger talks about two papers he's worked on at Anthropic under this agenda: "Sleeper Agents" and "Sycop…
 
You may have heard of singular learning theory, and its "local learning coefficient", or LLC - but have you heard of the refined LLC? In this episode, I chat with Jesse Hoogland about his work on SLT, and using the refined LLC to find a new circuit in language models. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast T…
 
In which I monologue about my experience learning Latin. Links: My friend's podcast episode: https://mutualunderstanding.substack.com/p/election-and-other-stuff-on-my-mind r/latin: https://www.reddit.com/r/latin/ Lingua Latina: https://www.amazon.com/Lingua-Latina-Illustrata-Pars-Familia/dp/1585104205 Playlist of someone reading out loud the chapte…
 
Road lines, street lights, and licence plates are examples of infrastructure used to ensure that roads operate smoothly. In this episode, Alan Chan talks about using similar interventions to help avoid bad outcomes from the deployment of AI agents. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast The transcript: https…
 
Do language models understand the causal structure of the world, or do they merely note correlations? And what happens when you build a big AI society out of them? In this brief episode, recorded at the Bay Area Alignment Workshop, I chat with Zhijing Jin about her research on these questions. Patreon: https://www.patreon.com/axrpodcast Ko-fi: http…
 
Epoch AI is the premier organization that tracks the trajectory of AI - how much compute is used, the role of algorithmic improvements, the growth in data used, and when the above trends might hit an end. In this episode, I speak with the director of Epoch AI, Jaime Sevilla, about how compute, data, and algorithmic improvements are impacting AI, an…
 
Sometimes, people talk about transformers as having "world models" as a result of being trained to predict text data on the internet. But what does this even mean? In this episode, I talk with Adam Shai and Paul Riechers about their work applying computational mechanics, a sub-field of physics studying how to predict random processes, to neural net…
 
How do we figure out what large language models believe? In fact, do they even have beliefs? Do those beliefs have locations, and if so, can we edit those locations to change the beliefs? Also, how are we going to get AI to perform tasks so hard that we can't figure out if they succeeded at them? In this episode, I chat with Peter Hase about his re…
 
How can we figure out if AIs are capable enough to pose a threat to humans? When should we make a big effort to mitigate risks of catastrophic AI misbehaviour? In this episode, I chat with Beth Barnes, founder of and head of research at METR, about these questions and more. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast The transcript:…
 
Reinforcement Learning from Human Feedback, or RLHF, is one of the main ways that makers of large language models make them 'aligned'. But people have long noted that there are difficulties with this approach when the models are smarter than the humans providing feedback. In this episode, I talk with Scott Emmons about his work categorizing the pro…
 
What's the difference between a large language model and the human brain? And what's wrong with our theories of agency? In this episode, I chat about these questions with Jan Kulveit, who leads the Alignment of Complex Systems research group. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast The transcript: axrp.net/episode/2024/05/30/epi…
 
What's going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory is a theory of Bayesian statistics broad enough in scope to encompass deep neural networks that may help answer these questions. In this episode, I speak with Daniel Murfet about this research program and what it tells …
 
Top labs use various forms of "safety training" on models before their release to make sure they don't do nasty stuff - but how robust is that? How can we ensure that the weights of powerful AIs don't get leaked or stolen? And what can AI even do these days? In this episode, I speak with Jeffrey Ladish about security and AI. Patreon: patreon.com/ax…
 
In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand. Earlier, in 2021, it was announced that neural networks sometimes 'grok': that is, when training them on certain tasks, they initially memorize their training …
 
In this episode, I give you updates from my trip with friends to see the 2024 total solar eclipse. Questions answered include: - Why are we bothering to go see it? - How many of us will fail to make it to the eclipse? - Does it actually get darker during a total solar eclipse, or is that just an optical illusion? - What moral dilemma will we face, …
 
How should the law govern AI? Those concerned about existential risks often push either for bans or for regulations meant to ensure that AI is developed safely - but another approach is possible. In this episode, Gabriel Weil talks about his proposal to modify tort law to enable people to sue AI companies for disasters that are "nearly catastrophic…
 
A lot of work to prevent AI existential risk takes the form of ensuring that AIs don't want to cause harm or take over the world - or in other words, ensuring that they're aligned. In this episode, I talk with Buck Shlegeris and Ryan Greenblatt about a different approach, called "AI control": ensuring that AI systems couldn't take over the world, e…
 
The events of this year have highlighted important questions about the governance of artificial intelligence. For instance, what does it mean to democratize AI? And how should we balance benefits and dangers of open-sourcing powerful AI systems such as large language models? In this episode, I speak with Elizabeth Seger about her research on these …
 
In this episode, I speak with Aaron Silverbook about the bacteria that cause cavities, and how different bacteria can prevent them: specifically, a type of bacterium that you can buy at luminaprobiotic.com. This podcast episode has not been approved by the FDA. Specific topics we talk about include: How do bacteria cause cavities? How can you creat…
 
Imagine a world where there are many powerful AI systems, working at cross purposes. You could suppose that different governments use AIs to manage their militaries, or simply that many powerful AIs have their own wills. At any rate, it seems valuable for them to be able to cooperatively work together and minimize pointless conflict. How do we ensu…
 
In this episode, I talk to Holly Elmore about her advocacy around AI Pause - encouraging governments to pause the development of more and more powerful AI. Topics we discuss include: Why advocate specifically for AI pause? What costs of AI pause would be worth it? What might AI pause look like? What are the realistic downsides of AI pause? How the …
 
Recently, OpenAI made a splash by announcing a new "Superalignment" team. Led by Jan Leike and Ilya Sutskever, the team would consist of top researchers, attempting to solve alignment for superintelligent AIs in four years by figuring out how to build a trustworthy human-level AI alignment researcher, and then using it to solve the rest of the pro…
 
Is there some way we can detect bad behaviour in our AI system without having to know exactly what it looks like? In this episode, I speak with Mark Xu about mechanistic anomaly detection: a research direction based on the idea of detecting strange things happening in neural networks, in the hope that that will alert us to potential treacherous tur…
 
What can we learn about advanced deep learning systems by understanding how humans learn and form values over their lifetimes? Will superhuman AI look like ruthless coherent utility optimization, or more like a mishmash of contextually activated desires? This episode's guest, Quintin Pope, has been thinking about these questions as a leading resear…