LessWrong Curated

Paging Gwern or anyone else who can shed light on the current state of the AI market—I have several questions. Since the release of ChatGPT, at least 17 companies, according to the LMSYS Chatbot Arena Leaderboard, have developed AI models that outperform it. These companies include Anthropic, NexusFlow, Microsoft, Mistral, Alibaba, Hugging Face, Go…
 
If you ask the internet if breastfeeding is good, you will soon learn that YOU MUST BREASTFEED because BREAST MILK = OPTIMAL FOOD FOR BABY. But if you look for evidence, you’ll discover two disturbing facts. First, there's no consensus about why breastfeeding is good. I’ve seen experts suggest at least eight possible mechanisms: Formula can’t fully…
 
Crossposted from https://williamrsaunders.substack.com/p/principles-for-the-agi-race Why form principles for the AGI Race? I worked at OpenAI for 3 years, on the Alignment and Superalignment teams. Our goal was to prepare for the possibility that OpenAI succeeded in its stated mission of building AGI (Artificial General Intelligence, roughly able t…
 
Two new articles from The Information contain insider information on OpenAI's next models and moves. They are paywalled, but here are the new bits of information: Strawberry is more expensive and slower at inference time, but can solve complex problems on the first try without hallucinations. It seems to be an application or extension of process supervision …
 
People often talk about “solving the alignment problem.” But what is it to do such a thing? I wanted to clarify my thinking about this topic, so I wrote up some notes. In brief, I’ll say that you’ve solved the alignment problem if you’ve: avoided a bad form of AI takeover, built the dangerous kind of superintelligent AI agents, gained access to the…
 
In the past two years there has been increased interest in formal verification-based approaches to AI safety. Formal verification is a sub-field of computer science that studies how guarantees may be derived by deduction on fully-specified rule-sets and symbol systems. By contrast, the real world is a messy place that can rarely be straightforwardl…
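The blurb is cut off, but the core notion is easy to show concretely. As a minimal illustration (my example, not from the episode), a proof assistant such as Lean accepts a statement only when it follows by deduction from a fully specified rule set:

    -- A trivial machine-checked guarantee: addition of natural numbers is commutative.
    -- The proof is accepted purely by deduction from Nat's rules; no testing is involved.
    theorem add_comm_example (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b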
 
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. I often talk to people who think that if frontier models were egregiously misaligned and powerful enough to pose an existential threat, you could get AI developers to slow down or undeploy models by producing evidence of their misalignment. I'm not so sure. As an …
 
For many products, we face a choice of who to hold liable for harms that would not have occurred if not for the existence of the product. For instance, if a person uses a gun in a school shooting that kills a dozen people, there are many legal persons who in principle could be held liable for the harm: The shooter themselves, for obvious reasons. T…
 
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. We wanted to share a recap of our recent outputs with the AF community. Below, we fill in some details about what we have been working on, what motivated us to do it, and how we thought about its importance. We hope that this will help people build off things we h…
 
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a link post. Is AI takeover like a nuclear meltdown? A coup? A plane crash? My day job is thinking about safety measures that aim to reduce catastrophic risks from AI (especially risks from egregious misalignment). The two main themes of this work are the d…
 
[This article was originally published on Dan Elton's blog, More is Different.] Cerebrolysin is an unregulated medical product made from enzymatically digested pig brain tissue. Hundreds of scientific papers claim that it boosts BDNF, stimulates neurogenesis, and can help treat numerous neural diseases. It is widely used by doctors around the world…
 
This work was produced at Apollo Research, based on initial research done at MATS. LayerNorm is annoying for mechanistic interpretability research (“[...] reason #78 for why interpretability researchers hate LayerNorm” – Anthropic, 2023). Here's a Hugging Face link to a GPT2-small model without any LayerNorm. The final model is only slightly worse t…
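The episode summary is truncated, but the basic surgery is simple to sketch. Here is a minimal illustration (not the authors' code, and it skips the fine-tuning the post relies on, so the resulting model is degraded) of where the LayerNorms sit in a Hugging Face GPT-2-small and how one might strip them out:

    # Minimal sketch, not the authors' method: swap every LayerNorm in GPT-2-small
    # for an identity map. Assumes the Hugging Face `transformers` library; the
    # module names (ln_1, ln_2, ln_f) follow GPT2LMHeadModel's layout. Without the
    # fine-tuning described in the post, this naive substitution hurts the model.
    import torch
    from transformers import GPT2LMHeadModel

    model = GPT2LMHeadModel.from_pretrained("gpt2")
    for block in model.transformer.h:
        block.ln_1 = torch.nn.Identity()
        block.ln_2 = torch.nn.Identity()
    model.transformer.ln_f = torch.nn.Identity()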
 
This is slightly old news at this point, but: as part of MIRI's recent strategy pivot, they've eliminated the Agent Foundations research team. I've been out of a job for a little over a month now. Much of my research time in the first half of the year was eaten up by engaging with the decision process that resulted in this, and later, applying to g…
 
This is a story about a flawed Manifold market, about how easy it is to buy significant objective-sounding publicity for your preferred politics, and about why I've downgraded my respect for all but the largest prediction markets. I've had a Manifold account for a while, but I didn't use it much until I saw and became irked by this market on the co…
 
Cross-posted from Substack. 1. And the sky opened, and from the celestial firmament descended a cube of ivory the size of a skyscraper, lifted by ten thousand cherubim and seraphim. And the cube slowly landed among the children of men, crushing the frail metal beams of the Golden Gate Bridge under its supernatural weight. On its surface were inscri…
 
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. What the heck is up with “corrigibility”? For most of my career, I had a sense that it was a grab-bag of properties that seemed nice in theory but hard to get in practice, perhaps due to being incompatible with agency. Then, last year, I spent some time revisiting…
 
Figure 1. Image generated by DALL-E 3 to represent the concept of self-other overlap. Many thanks to Bogdan Ionut-Cirstea, Steve Byrnes, Gunnar Zarnacke, Jack Foxabbott and Seong Hah Cho for critical comments and feedback on earlier and ongoing versions of this work. Summary: In this post, we introduce self-other overlap training: optimizing for similar…
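The summary cuts off mid-definition, so the following is my reading rather than the authors' description: self-other overlap training adds an auxiliary loss that pushes the model's activations on self-referential inputs toward its activations on matched other-referential inputs. A toy sketch under that assumption:

    # Toy sketch of a self-other overlap auxiliary loss (illustration only, not the
    # authors' implementation). hidden_self and hidden_other are hypothetical
    # hidden-state tensors from matched "self" and "other" prompts.
    import torch
    import torch.nn.functional as F

    def self_other_overlap_loss(hidden_self: torch.Tensor, hidden_other: torch.Tensor) -> torch.Tensor:
        # Penalize the distance between the two activation patterns so that
        # representations of self and other overlap more.
        return F.mse_loss(hidden_self, hidden_other)

    # total_loss = task_loss + overlap_weight * self_other_overlap_loss(h_self, h_other)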
 
TL;DR: Your discernment in a subject often improves as you dedicate time and attention to that subject. The space of possible subjects is huge, so on average your discernment is terrible, relative to what it could be. This is a serious problem if you create a machine that does everyone's job for them. See also: Reality has a surprising amount of de…
 
This is a link post. Content warning: About an IRL death. Today's post isn’t so much an essay as a recommendation for two bodies of work on the same topic: Tom Mahood's blog posts and Adam “KarmaFrog1” Marsland's videos on the 2010 disappearance of Bill Ewasko, who went for a day hike in Joshua Tree National Park and dropped out of contact. 2010 – B…
 
NB. I am on the Google DeepMind language model interpretability team. But the arguments/views in this post are my own, and shouldn't be read as a team position. “It would be very convenient if the individual neurons of artificial neural networks corresponded to cleanly interpretable features of the input. For example, in an “ideal” ImageNet classif…
 
This is a link post. Google DeepMind reports on a system for solving mathematical problems that is allegedly able to give complete solutions to four of the six problems on the 2024 IMO, putting it near the top of the silver-medal category. Well, actually, two systems for solving mathematical problems: AlphaProof, which is more general-purpose, and A…
 
This is a link post. What is an agent? It's a slippery concept with no commonly accepted formal definition, but informally the concept seems to be useful. One angle on it is Dennett's Intentional Stance: we think of an entity as being an agent if we can more easily predict it by treating it as having some beliefs and desires which guide its actions.…
 
(Crossposted from Twitter) I'm skeptical that Universal Basic Income can get rid of grinding poverty, since somehow humanity's 100-fold productivity increase (since the days of agriculture) didn't eliminate poverty. Some of my friends reply, "What do you mean, poverty is still around? 'Poor' people today, in Western countries, have a lot to legitim…
 
Eliezer Yudkowsky periodically complains about people coming up with questionable plans with questionable assumptions to deal with AI, and then either: Saying "well, if this assumption doesn't hold, we're doomed, so we might as well assume it's true." Worse: coming up with cope-y reasons to assume that the assumption isn't even questionable at all.…
 
This post was inspired by some talks at the recent LessOnline conference including one by LessWrong user “Gene Smith”. Let's say you want to have a “designer baby”. Genetically extraordinary in some way — super athletic, super beautiful, whatever. 6’5”, blue eyes, with a trust fund. Ethics aside[1], what would be necessary to actually do this? Fund…
 