Searching through all of the videos on the Internet is not a simple problem.
To search through all the videos, you need a search index, and to build that index you need a web crawler. Video files are large, and storing all of the actual files would cost far too much money. To build an index cost-efficiently, you need a way of storing information about a video without storing the entire video itself.
You might be thinking “hasn’t Google already solved video search? Why are we even talking about this?” Google has solved some aspects of video search, but a different set of challenges is being tackled by a video search company called Pex.
In order to explain what Pex is building, we should first explain the problem set they are trying to tackle.
Videos across the internet are consumed on a variety of platforms such as YouTube, Instagram, Facebook, and Vimeo. These videos are sliced up, bootlegged, and repurposed from one platform to another. For content creators who earn their living from their hosted video streams, this can be a nightmare.
Imagine you are a musician, and you make lots of money from music videos. You upload your cool new video to YouTube, and it instantly gets bootlegged by other users and shared across the internet in hundreds of different places. When people watch the stolen versions of your video, you are not getting compensated. If you could locate all of those stolen copies, you could have them taken down, or claim them so that you are paid for the views.
And here is the engineering problem: how can you find all those re-posted videos? By crawling the web and building a search index for every video on the web.
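To make the idea of indexing a video without storing it more concrete, here is a minimal sketch in Python. It is not a description of how Pex works; it swaps in a simple "average hash" per frame as a stand-in fingerprint. The names `frame_hash`, `fingerprint`, `similarity`, and `VideoIndex` are hypothetical, and frames are assumed to be tiny grayscale grids that some crawler has already decoded and downscaled.

```python
from typing import Dict, List

# Assumption: a frame is a small grid of grayscale values (0-255)
# that has already been decoded and downscaled elsewhere.
Frame = List[List[int]]


def frame_hash(frame: Frame) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def fingerprint(frames: List[Frame]) -> List[int]:
    """Compact stand-in for a video: one small integer per sampled frame."""
    return [frame_hash(f) for f in frames]


def similarity(fp_a: List[int], fp_b: List[int], max_bits_off: int = 1) -> float:
    """Fraction of hashes in fp_a that have a near match anywhere in fp_b."""
    if not fp_a:
        return 0.0
    matched = sum(
        1 for h in fp_a if any(hamming(h, g) <= max_bits_off for g in fp_b)
    )
    return matched / len(fp_a)


class VideoIndex:
    """Hypothetical in-memory index that keeps fingerprints, never the video files."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[int]] = {}

    def add(self, url: str, frames: List[Frame]) -> None:
        self._entries[url] = fingerprint(frames)

    def search(self, frames: List[Frame], threshold: float = 0.8) -> List[str]:
        """Return URLs whose stored fingerprint covers most of the query's frames."""
        query = fingerprint(frames)
        return [
            url
            for url, fp in self._entries.items()
            if similarity(query, fp) >= threshold
        ]
```

Because each frame hash depends only on which pixels are brighter than the frame's own average, a uniformly brightened or trimmed copy can still match the original. A real system would also have to cope with cropping, re-encoding, overlays, and audio, and would need a far more scalable lookup than this linear scan.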
Rasty Turek is the CEO of Pex, and in this episode he describes how to build a system that crawls the Internet and indexes videos. It’s a large-scale engineering challenge, and there are lots of tradeoffs to be made between financial cost, speed, accuracy, and engineering complexity.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.