LW - AI #74: GPT-4o Mini Me and Llama 3 by Zvi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #74: GPT-4o Mini Me and Llama 3, published by Zvi on July 26, 2024 on LessWrong.
We got two big model releases this week. GPT-4o Mini is covered here. Llama 3.1-405B (and 70B and 8B) is mostly covered in yesterday's post; this one has some follow-up.
Table of Contents
1. Introduction.
2. Table of Contents.
3. Language Models Offer Mundane Utility. All your coding are belong to us.
4. Language Models Don't Offer Mundane Utility. Math is hard. Can be expensive.
5. GPT-4o Mini Me. You complete me at lower than usual cost.
6. Additional Llama-3.1 Notes. Pricing information, and more rhetoric.
7. Fun With Image Generation. If you're confused why artists are so upset.
8. Deepfaketown and Botpocalypse Soon. Not surprises.
9. They Took Our Jobs. Layoffs at Activision and across gaming.
10. In Other AI News. New benchmarks, new chip variants, and more.
11. The Art of the Jailbreak. Pliny remains undefeated.
12. Quiet Speculations. Where will the utility be coming from?
13. The Quest for Sane Regulations. Public opinion continues to be consistent.
14. Openly Evil AI. Some Senators have good questions.
15. The Week in Audio. Dwarkesh in reverse, and lots of other stuff. Odd Lots too.
16. Rhetorical Innovation. What are corporations exactly?
17. Aligning a Smarter Than Human Intelligence is Difficult. So are evals.
18. People Are Worried About AI Killing Everyone. Roon warns you to beware.
19. The Sacred Timeline. Hype?
20. Other People Are Not As Worried About AI Killing Everyone. Older Joe Rogan.
21. The Lighter Side. It's on.
Language Models Offer Mundane Utility
Coding is seriously much faster now, and this is the slowest it will ever be.
Roon: pov: you are ten months from working for claude sonnet the new technical founder.
Garry Tan: Underrated trend.
It's happening.
Sully: 50% of our code base was written entirely by LLMs expect this to be ~80% by next year With sonnet we're shipping so fast, it feels like we tripled headcount overnight Not using Claude 3.5 to code? Expect to be crushed by teams who do (us).
Not only coding, either.
Jimmy (QTing Tan): It can also do hardware related things quite well too, and legal, and logistics (planning) and compliance even.
I've been able to put off hiring for months.
When I run out of sonnet usage I patch in gpt-4o, it's obviously and notably worse which is why I rarely use it as a primary anymore.
Claude 3.5 Sonnet becomes the first AI to crush the Lem Test to 'write an impossible poem.'
Laugh all you want, this is actually great.
Kache: dude hahahahahah i used so many tokens today on just formatting json logs
near: the just stop oil people are gonna come and spray paint you now
Compared to how much carbon a human coder would have used? Huge improvement.
Language Models Don't Offer Mundane Utility
IMO problems are still mostly too hard. The linked one, which GPT-4, GPT-4o and Claude 3.5 Sonnet failed on, seems unusually easy? Although a math Olympiad solver does solve it, predictably given the contests we've seen.
[EDIT: I didn't read this properly, but a reader points out this is the floor symbol, which means what I thought was an obvious proof doesn't actually answer the question, although it happens to get the right answer. Reader says the answers provided would actually also get 0/7, order has been restored].
Figure out what song Aella was talking about here. Found the obvious wrong answer.
Grok offers to tell you 'more about this account.' I haven't seen the button yet, probably it is still experimental.
Our price cheap. Llama 3.1-405B was a steal in terms of compute costs.
Seconds: "AI is expensive" its not even half the cost of a middling marvel movie.
Teortaxes: Pretty insane that the cost of producing llama-3-405B, this behemoth, is like 40% of *Ant-Man and the Wasp: Quantumania* movie at most If I were Zuck, I'd have open sourced a $...
