14: Building a Character-Based Text Classifier

The Data Life Podcast

Content provided by Sanket Gupta.

5y ago 23:20


Ever wonder how to automatically detect the language of a piece of text? How does Google do it?

Ever wonder how Amazon knows whether you are searching for a product or a SKU in its search bar?
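To make the product-versus-SKU question concrete, here is a minimal sketch of one of the bag-of-words approaches discussed in the episode: a multinomial Naive Bayes classifier over character n-grams (a "bag of characters"), written in plain Python. It is an illustration under stated assumptions, not the episode's implementation. The helper names (`char_ngrams`, `CharNaiveBayes`), the use of 1- to 3-grams, add-one smoothing, and the tiny training set are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_values=(1, 2, 3)):
    """Count character n-grams; spaces are added so word edges become features."""
    text = f" {text.lower()} "
    grams = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


class CharNaiveBayes:
    """Hypothetical multinomial Naive Bayes over character n-grams, add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class (for priors)
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(char_ngrams(text))
        self.vocab = {g for counts in self.feature_counts.values() for g in counts}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feature_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = self.totals[label] + self.alpha * len(self.vocab)
            score = math.log(n_label / n_docs)       # log prior
            for gram, freq in char_ngrams(text).items():
                if gram in self.vocab:               # skip n-grams never seen in training
                    score += freq * math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: ordinary product queries versus SKU-like codes.
texts = ["wireless mouse", "running shoes", "coffee maker", "usb c charger",
         "B07XJ8C8F5", "SKU-10423-BLK", "A1B2C3D4", "X9-442-77"]
labels = ["product"] * 4 + ["sku"] * 4

model = CharNaiveBayes().fit(texts, labels)
print(model.predict("coffee grinder"))   # product
print(model.predict("SKU-55821-WHT"))    # sku
```

Character n-grams let the model key on digits, hyphens, and short letter patterns rather than whole words, which suits short, word-like inputs such as SKUs. With real data you would train on far more labeled queries and would likely use a library implementation, for example scikit-learn's `CountVectorizer(analyzer="char", ngram_range=(1, 3))` followed by `MultinomialNB`.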

We look into character-based text classifiers in this episode, covering two types of models. First are bag-of-words models such as Naive Bayes, logistic regression, and a vanilla neural network. Second are sequence models such as LSTMs, along with how to prepare your characters for them: one-hot encoding, padding, creating character embeddings, and then feeding these into the LSTM. We also cover how to set up and compile these sequence models.
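The character preparation steps for sequence models described above (mapping characters to integer ids, padding to a fixed length, and one-hot encoding) can be sketched in plain Python. This is a minimal sketch under assumptions, not the episode's code: the helper names `build_char_vocab`, `encode`, `pad`, and `one_hot`, the reserved ids (0 for padding, 1 for unknown characters), right-padding, and all-zero padding rows are illustrative choices.

```python
# Hypothetical helpers for turning raw strings into fixed-length LSTM inputs.
PAD_ID = 0  # reserved for padding positions
UNK_ID = 1  # reserved for characters not seen when the vocabulary was built


def build_char_vocab(texts):
    """Assign every character seen in `texts` an integer id, starting at 2."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(text, vocab):
    """Map each character to its id, falling back to UNK_ID for unseen ones."""
    return [vocab.get(ch, UNK_ID) for ch in text]


def pad(ids, max_len):
    """Truncate to max_len, then right-pad with PAD_ID up to max_len."""
    ids = ids[:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def one_hot(ids, vocab_size):
    """Return a len(ids) x vocab_size 0/1 matrix; padding rows are all zeros."""
    return [[1 if (j == i and i != PAD_ID) else 0 for j in range(vocab_size)]
            for i in ids]


train = ["hello", "hi"]
vocab = build_char_vocab(train)       # {'e': 2, 'h': 3, 'i': 4, 'l': 5, 'o': 6}
vocab_size = len(vocab) + 2           # +2 for PAD_ID and UNK_ID
ids = pad(encode("hi", vocab), max_len=4)
print(ids)                            # [3, 4, 0, 0]
print(one_hot(ids, vocab_size))
```

When the network starts with an embedding layer, you would feed the padded integer ids directly and let that layer learn a dense vector per character instead of using one-hot rows. In Keras, for example, `Embedding(..., mask_zero=True)` treats id 0 as padding so the downstream LSTM can skip those positions, which is why this sketch reserves 0 for padding.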

Thanks for listening, and if you find this content useful, please leave a review and consider supporting this podcast from the link below.

Send in a voice message: https://podcasters.spotify.com/pod/show/the-data-life-podcast/message
Support this podcast: https://podcasters.spotify.com/pod/show/the-data-life-podcast/support

