When Song Lyrics and British Lit Meet Tidy Text

Data Crunch

When Julia Silge's personal interests meet her professional proficiencies, she discovers new meaning in Jane Austen's literature, and she gauges the cultural influence of locations in pop songs. Even more impressive than these finds, though, is that she and her collaborator, Dave Robinson, have developed some new, efficient ways to mine text data. Check out the book they've written called Tidy Text Mining with R.

Below is a partial transcript. For the full interview, listen to the podcast episode by selecting the Play button above or by selecting this link, or you can also listen to the podcast through Apple Podcasts, Google Play, Stitcher, and Overcast.

Transcript

Julia Silge: “One that I worked on that was really fun was about song lyrics. The last 50 years or so of pop songs, we have all these lyrics, so all this text data, and I wanted to ask the question, what places are mentioned more or less often in these pop songs.”

Ginette: “I’m Ginette.”

Curtis: “And I’m Curtis.”

Ginette: “And you are listening to Data Crunch.”

Curtis: “A podcast about how data and prediction shape our world.”

Ginette: “A Vault Analytics production.”

Curtis: “Brought to you by data.world, the social network for data people. Discover and share cool data, connect with interesting people, and work together to solve problems faster at data.world. Whether you’re already a frequent dataset contributor or totally new to data.world, there are several resources you can use to stay in the loop on the latest features, learn new skills, and get support. Check out docs.data.world for up-to-date API documentation, tutorials on SQL, and other query techniques, and much more!”

Ginette: “We hope you’re enjoying some vacation time this summer. We just did, and now Data Crunch is back! To hear the latest from us, add us on Twitter, @datacrunchpod. Today we hear from an exciting guest: someone who is on the cutting edge of data science tool creation, someone exploring and developing new ways to slice and dice difficult data.”

Julia: “My name is Julia Silge, and I'm a data scientist at Stack Overflow. My academic background is in physics and astronomy, but I’ve worked in academia, teaching and doing research, I worked at an ed tech start up, and I've made a transition now into data science.”

Ginette: “Stack Overflow, where Julia works, is the largest online community for programmers to learn, share knowledge, and build their careers. It's a great resource when you need to solve a coding problem or develop new skills.”

Curtis: “Now there are basically two main camps in data science: people who program with R, a statistical programming language, and people who program with Python, a high-level, general purpose language. Both languages have devoted followers, and both do excellent work. Today, we’re looking at R, and Julia is a big name in this space, as is her collaborator Dave Robinson.”

Julia: “Text is increasingly a really important part of our work as people who are involved in data. Text is being generated all the time, at ever faster rates. This unstructured data is becoming a really important part of things that we do. I also am somebody that, my academic background is not in text or literature or natural language processing or anything like that, but I am somebody who's always been a reader and always been interested in language, and these sort of collection of circumstances kind of all came together to converge that me and Dave decided to develop some tools for making text mining something that people can do within this idiom of people who work using the R programming language. So we’ve developed a package called tidy text.”

Ginette: “Now this particular tool is based on tidy data principles, which is basically organizing data in a uniform way so it’s ready for you to ferret out insights.”

Julia: “There's a section of people who use tools that are built for dealing with tidy data principles,
