Summary

Jake Vanderplas is an astronomer by training and a prolific contributor to the Python data science ecosystem. In his current role he uses Python to teach the principles of data analysis and data visualization to students and researchers at the University of Washington. In this episode he discusses how he got started with Python, the challenges of teaching best practices for software engineering and reproducible analysis, and how easy-to-use tools for data visualization can help democratize access to, and understanding of, data.

Preface

Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 Gbit network in all of their datacenters.
If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcastinit.com/gocd. Professional support and enterprise plugins are available for added peace of mind.
Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or by email at hosts@podcastinit.com.
To help other people find the show please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.
Your host as usual is Tobias Macey, and today I’m interviewing Jake Vanderplas about data science best practices and applying them to the academic sciences.

Interview

Introductions
How did you get introduced to Python?
How has your astronomy background informed and influenced your current work?
In your work at the University of Washington, what are some of the most common difficulties that students face when learning data science?
How does that list differ for professional scientists who are learning how to apply data science to their work?
Where is the tooling still lacking in terms of enabling consistent and repeatable workflows?
One of the projects that you are spending time on now is Altair, which is a library for generating visualizations from Pandas dataframes. How does that work factor into your teaching?
What are some of the most novel applications of data science that you have been involved with?
What are some of the trends in data analysis that you are most excited about?

Keep In Touch

Website
@jakevdp
jakevdp on GitHub

Picks

Tobias: The Redwall Cookbook
Jake: Kevin M. Kruse, White Flight by Kevin Kruse

Links

UW eScience Institute
NumPy
SciPy
SciPy Conference
PyCon
Pandas
Sloan Digital Sky Survey
Spectroscopy
Software Carpentry
Data Carpentry
Git
Mercurial
Matplotlib
Altair
Conda
Xonsh
Jupyter
Jupyter Lab
Vega
Vega-Lite
Interactive Data Lab
D3
Mike Bostock
Brian Granger
Bokeh
Grammar of Graphics
ggplot2
Holoviews
Wikimedia
AstroPy
Podcast.__init__ Interview About AstroPy
LIGO
Wes McKinney
Feather

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA