Xrisk public
 
AXRP (pronounced axe-urp) is the AI X-risk Research Podcast where I, Daniel Filan, have conversations with researchers about their papers. We discuss the paper, and hopefully get a sense of why it's been written and how it might reduce the risk of AI causing an existential catastrophe: that is, permanently and drastically curtailing humanity's future potential. You can visit the website and read transcripts at axrp.net.
 
Epoch AI is the premier organization that tracks the trajectory of AI - how much compute is used, the role of algorithmic improvements, the growth in data used, and when those trends might come to an end. In this episode, I speak with the director of Epoch AI, Jaime Sevilla, about how compute, data, and algorithmic improvements are impacting AI, an…
 
Sometimes, people talk about transformers as having "world models" as a result of being trained to predict text data on the internet. But what does this even mean? In this episode, I talk with Adam Shai and Paul Riechers about their work applying computational mechanics, a sub-field of physics studying how to predict random processes, to neural net…
 
How do we figure out what large language models believe? In fact, do they even have beliefs? Do those beliefs have locations, and if so, can we edit those locations to change the beliefs? Also, how are we going to get AI to perform tasks so hard that we can't figure out if they succeeded at them? In this episode, I chat with Peter Hase about his re…
 
How can we figure out if AIs are capable enough to pose a threat to humans? When should we make a big effort to mitigate risks of catastrophic AI misbehaviour? In this episode, I chat with Beth Barnes, founder of and head of research at METR, about these questions and more. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast The transcript:…
 
Reinforcement Learning from Human Feedback, or RLHF, is one of the main ways that makers of large language models make them 'aligned'. But people have long noted that there are difficulties with this approach when the models are smarter than the humans providing feedback. In this episode, I talk with Scott Emmons about his work categorizing the pro…
 
What's the difference between a large language model and the human brain? And what's wrong with our theories of agency? In this episode, I chat about these questions with Jan Kulveit, who leads the Alignment of Complex Systems research group. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast The transcript: axrp.net/episode/2024/05/30/epi…
 
What's going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory is a theory of Bayesian statistics broad enough in scope to encompass deep neural networks that may help answer these questions. In this episode, I speak with Daniel Murfet about this research program and what it tells …
 
Top labs use various forms of "safety training" on models before their release to make sure they don't do nasty stuff - but how robust is that? How can we ensure that the weights of powerful AIs don't get leaked or stolen? And what can AI even do these days? In this episode, I speak with Jeffrey Ladish about security and AI. Patreon: patreon.com/ax…
 
In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand. Earlier, in 2021, it was announced that neural networks sometimes 'grok': that is, when training them on certain tasks, they initially memorize their training …
 
How should the law govern AI? Those concerned about existential risks often push either for bans or for regulations meant to ensure that AI is developed safely - but another approach is possible. In this episode, Gabriel Weil talks about his proposal to modify tort law to enable people to sue AI companies for disasters that are "nearly catastrophic…
 
A lot of work to prevent AI existential risk takes the form of ensuring that AIs don't want to cause harm or take over the world - or in other words, ensuring that they're aligned. In this episode, I talk with Buck Shlegeris and Ryan Greenblatt about a different approach, called "AI control": ensuring that AI systems couldn't take over the world, e…
 
I spoke with Derek Wong about the adaptive markets hypothesis, macro investing, and investing in cryptocurrency markets. We talk about: why investors should consider the financial markets as a complex adaptive system; how investing in China is very different from investing in the West; how he invests in crypto without fundamental anchors…
 
The events of this year have highlighted important questions about the governance of artificial intelligence. For instance, what does it mean to democratize AI? And how should we balance benefits and dangers of open-sourcing powerful AI systems such as large language models? In this episode, I speak with Elizabeth Seger about her research on these …
 
Imagine a world where there are many powerful AI systems, working at cross purposes. You could suppose that different governments use AIs to manage their militaries, or simply that many powerful AIs have their own wills. At any rate, it seems valuable for them to be able to cooperatively work together and minimize pointless conflict. How do we ensu…
 
I spoke with Devashish Dhar, author of the excellent book India's Blind Spot, which talks about India's urbanisation crisis and solutions to it. We talk about: why India has a much lower reported rate of urbanisation than the rest of the world; the global bias against cities; “Extremely high levels of traffic is caused by poor lan…
 
I interviewed one of today's most interesting thinkers, Tyler Cowen. We talked about: why there are so few famous Singaporeans; what Singapore can do to get weirder; why he's sceptical of an AI-driven singularity; what happens to kids in a post-GPT world; what happens to public intellectuals in a post-GPT world; why he's optimistic on Kenyan…
 
I spoke to Rohit Krishnan, author of the blog Strange Loop Canon, about why he is sceptical of the idea that AI will kill us all. We talked about: why he’s sceptical of AI regulation proposals; why AI “timelines” are not as meaningful as you think; why AI deployment is harder than you think; the value of incrementalism in AI policy; why he thinks instr…
 
I spoke to Soham Sankaran, who runs PopVax, an Indian mRNA vaccine company. Their goal is to build low-cost, broadly protective vaccines to protect against the entire sarbecovirus species. Read Soham's experience here (https://chronicles.popvax.com/p/three-meetings-and-six-million-funerals) as a complement to this episode. Also check out their jobs p…
 
Recently, OpenAI made a splash by announcing a new "Superalignment" team. Led by Jan Leike and Ilya Sutskever, the team would consist of top researchers, attempting to solve alignment for superintelligent AIs in four years by figuring out how to build a trustworthy human-level AI alignment researcher, and then using it to solve the rest of the pro…
 
Is there some way we can detect bad behaviour in our AI system without having to know exactly what it looks like? In this episode, I speak with Mark Xu about mechanistic anomaly detection: a research direction based on the idea of detecting strange things happening in neural networks, in the hope that this will alert us to potential treacherous tur…
 
I talked to Dwarkesh Patel of the Lunar Society Podcast about many topics. We talked about: Why do AI researchers and rationalists disagree about existential risk? What would happen if Robert Moses ran San Francisco? Is localism overrated? What does Effective Altruism get right and wrong? Which politicians would he like to interview…
 
What can we learn about advanced deep learning systems by understanding how humans learn and form values over their lifetimes? Will superhuman AI look like ruthless coherent utility optimization, or more like a mishmash of contextually activated desires? This episode's guest, Quintin Pope, has been thinking about these questions as a leading resear…
 
I spoke to Kartik Akileswaran, who runs Growth Teams, an initiative that helps build state capacity for economic growth in developing countries. We talked about: why implementation is a binding constraint for economic policy; how industrial policy helps reduce information constraints for investors; underrated growth reforms…
 
Lots of people in the field of machine learning study 'interpretability', developing tools that they say give us useful information about neural networks. But how do we know if meaningful progress is actually being made? What should we want out of these tools? In this episode, I speak to Stephen Casper about these questions, as well as about a benc…
 
How should we scientifically think about the impact of AI on human civilization, and whether or not it will doom us all? In this episode, I speak with Scott Aaronson about his views on how to make progress in AI alignment, as well as his work on watermarking the output of language models, and how he moved from a background in quantum complexity the…
 
How good are we at understanding the internal computation of advanced machine learning models, and do we have a hope at getting better? In this episode, Neel Nanda talks about the sub-field of mechanistic interpretability research, as well as papers he's contributed to that explore the basics of transformer circuits, induction heads, and grokking. …
 
I spoke to Matt Korda, who works on nuclear weapons policy at the Federation of American Scientists. We have an exciting discussion about the role of nuclear weapons, their growth, and the dangerous arms races that are starting. Some highlights of the show: the advent of “exotic” nuclear weapon systems; China’s nuclear strategy has changed dramatically!…
 
I have a new podcast, where I interview whoever I want about whatever I want. It's called "The Filan Cabinet", and you can find it wherever you listen to podcasts. The first three episodes are about pandemic preparedness, God, and cryptocurrency. For more details, check out the podcast website (thefilancabinet.com), or search "The Filan Cabinet" in…
 
Concept extrapolation is the idea of taking concepts an AI has about the world - say, "mass" or "does this picture contain a hot dog" - and extending them sensibly to situations where things are different - like learning that the world works via special relativity, or seeing a picture of a novel sausage-bread combination. For a while, Stuart Armstr…
 
Sometimes, people talk about making AI systems safe by taking examples where they fail and training them to do well on those. But how can we actually do this well, especially when we can't use a computer program to say what a 'failure' is? In this episode, I speak with Daniel Ziegler about his research group's efforts to try doing this with present…
 
I talked to Anupam Manur, a professor of economics, about India's trade policy before 1991. We talked about: the scarcity mindset about foreign exchange reserves; the controversial 1966 devaluation; how the pre-1991 import licensing system worked; “The financial account was almost non existent”; “Hindustan Motors and Toyota were set up at the same ti…
 
Many people in the AI alignment space have heard of AI safety via debate - check out AXRP episode 6 (axrp.net/episode/2021/04/08/episode-6-debate-beth-barnes.html) if you need a primer. But how do we get language models to the stage where they can usefully implement debate? In this episode, I talk to Geoffrey Irving about the role of language model…
 
Why does anybody care about natural abstractions? Do they somehow relate to math, or value learning? How do E. coli bacteria find sources of sugar? All these questions and more will be answered in this interview with John Wentworth, where we talk about his research plan of understanding agency via natural abstractions. Topics we discuss, and timest…
 
I spoke to Steven Hamilton, professor of economics at George Washington University, about Australian economic policy and the country's upcoming elections. We talk about: why Australian COVID policy was so strict; Australia as a nation of prison guards; economic issues of the Australian election; “Australia is a mine with a parliament”; Dutch disease in Australi…
 
What is the labour market like? What are the largest barriers in the labour market? Nathan Young and I spoke to economist Bryan Caplan about his new book Labor Econ Versus The World. We also talk about: censorship and dictatorships; bets he is willing to take; Malengo and international migration; DALLE-2 and writing graphic novels; the literature on edu…
 