Summary

Building a machine learning model is a process that requires a lot of iteration and trial and error. For certain classes of problems, a large portion of the searching and tuning can be automated. This frees data scientists to spend their time on more complex or valuable projects, and it opens the door for non-specialists to experiment with machine learning. Frustrated with some of the awkward or difficult-to-use tools for AutoML, Angela Lin and Jeremy Shih helped to create the EvalML framework. In this episode they share the use cases for automated machine learning, how they have designed the EvalML project to be approachable, and how you can use it to build and train your own models.

Announcements

- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Angela Lin and Jeremy Shih about EvalML, an AutoML library which builds, optimizes, and evaluates machine learning pipelines.

Interview

- Introductions
- How did you get introduced to Python?
- Can you describe what EvalML is and the story behind it?
- What do we mean by the term AutoML?
- What are the kinds of problems that are best suited to applications of automated ML?
- What does the landscape for AutoML tools look like?
- What was missing in the available offerings that motivated you and your team to create EvalML?
- Who is the target audience for EvalML?
- How is the EvalML project implemented?
- How has the project changed or evolved since you first began working on it?
- What is the workflow for building a model with EvalML? (See the sketch after this list.)
- Can you describe the preprocessing steps that are necessary and the input formats that it is expecting?
- What are the supported algorithms/model architectures?
- How does EvalML explore the search space for an optimal model?
- What decision functions does it employ to determine an appropriate stopping point?
- What is involved in operationalizing an AutoML pipeline?
- What are some challenges or edge cases that you see users of EvalML run into?
- What are the most interesting, innovative, or unexpected ways that you have seen EvalML used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on EvalML?
- When is EvalML the wrong choice?
- When is AutoML the wrong approach?
- What do you have planned for the future of EvalML?
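To make the workflow question above concrete, here is a minimal sketch of a typical EvalML run, based on the project's documented AutoMLSearch API. The demo dataset, objective, and search bounds are illustrative choices, and parameter details may differ between releases.

```python
# A minimal EvalML run: load data, split it, let AutoMLSearch build and
# rank candidate pipelines, then score the best one on held-out data.
import evalml
from evalml.automl import AutoMLSearch

# Load a bundled demo dataset (binary classification target).
X, y = evalml.demos.load_breast_cancer()

# Keep a held-out test set for final evaluation.
X_train, X_test, y_train, y_test = evalml.preprocessing.split_data(
    X, y, problem_type="binary", test_size=0.2
)

# Search over candidate pipelines (preprocessing steps plus estimators),
# tuning them against the chosen objective. max_batches bounds the search
# so this example finishes quickly.
automl = AutoMLSearch(
    X_train=X_train,
    y_train=y_train,
    problem_type="binary",
    objective="f1",
    max_batches=2,
)
automl.search()

# Leaderboard of evaluated pipelines, best first.
print(automl.rankings)

# Score the top-ranked pipeline on the held-out data.
best = automl.best_pipeline
print(best.score(X_test, y_test, objectives=["f1", "auc"]))
```

The fitted `best_pipeline` behaves like a single estimator, with `predict` and `score` methods, which is what makes it practical to persist and deploy once the search is done.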
Keep In Touch

- Angela
  - angela97lin on GitHub
  - LinkedIn
- Jeremy
  - jeremyliweishih on GitHub
  - LinkedIn

Picks

- Tobias
  - Gloryhammer
- Angela
  - Sarma Mediterranean restaurant
- Jeremy
  - Crucial Conversations by Kerry Patterson, Joseph Grenny, Ron McMillan, and Al Switzler (affiliate link)

Closing Announcements

- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

- EvalML
- FeatureLabs
- Alteryx
- Scheme
- NetLogo
- Flask
- AutoML
- Woodwork
- FeatureTools
- Compose
- Random Forest
- XGBoost
- Prophet
- GreyKite
- Shap

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA