Daniel Whitenack (@dwhitena): Data Science language GO, Containers, Reproducibility


Manage episode 181593433 series 1466098
By Data Podcast, Rajib Bahar, and Shabnam Khan. Discovered by Player FM and our community — copyright is owned by the publisher, not Player FM, and audio is streamed directly from their servers. Hit the Subscribe button to track updates in Player FM, or paste the feed URL into other podcast apps.
Daniel (@dwhitena) is a Ph.D. trained data scientist working with Pachyderm (@pachydermIO). Daniel develops innovative, distributed data pipelines which include predictive models, data visualizations, statistical analyses, and more. He has spoken at conferences around the world (ODSC, Spark Summit, Datapalooza, DevFest Siberia, GopherCon, and more), teaches data science/engineering with Ardan Labs (@ardanlabs), maintains the Go kernel for Jupyter, and is actively helping to organize contributions to various open source data science projects. Interviewer: Rajib Bahar Agenda: - Many of us may or may not be aware of "Jupyter Notebook", which is a web application to write codes in various Languages such is R, Python, Julia, node.js, GoLang, Ruby, & Scala. That appliation in turn creates separate process in the Kernel to receive output from the OS and return the output back to the web application. One of the coolest thing you do is to maintain the Kernela on GoLang aka Go. Currently, Data Scientists tend to gravitate toward either R, or Python as language. You're playing with a bit more modern languages in data science. Why Go? How is it more useful in statistical analysis or Data visualization? - How do you achive reproducibility in data science? - Most of us heard of Virtual Machine tools such as VMWare, Virtual PC, Virtual Box. This is the 1st time I heard of containers. What are some key benefits of it?Are there websites such as Turnkey hub where you can get some good images of various OS / software / DBMS platforms? - What are some best practices around deploying Data Science Models? Do you do something similar to DBAs or DataEngineers to run a job at certain frequencies in the day or hour? - How do you use data pipelines in your project? Is that something used in ETL like Data-Wrangling process? - Please tell us where we can find you in social media? Music: www.freesfx.co.uk

29 episodes