Summary

The foundation of every ML model is the data that it is trained on. In many cases you will be working with tabular or unstructured information, but there is a growing trend toward networked, or graph, data sets. Benedek Rozemberczki has focused his research and career on graph machine learning applications. In this episode he discusses the common sources of networked data, the challenges of working with graph data in machine learning projects, and the libraries that he has created to help him in his work. If you are dealing with connected data then this interview will provide a wealth of context and resources to improve your projects.

Announcements

Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.

When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today.
Get started for free at pythonpodcast.com/hightouch.

Your host as usual is Tobias Macey and today I’m interviewing Benedek Rozemberczki about his work on machine learning for graph data, including a variety of libraries to support his efforts.

Interview

Introductions
How did you get introduced to Python?
Can you start by giving an overview of when you might want to do machine learning on networked/graph data?
How do networked data sets change the way that you approach machine learning tasks?
Can you describe the current state of the ecosystem for machine learning on graphs?
You have created a number of libraries to address different aspects of machine learning on graphs. Can you list them and share some of the stories behind their creation?
How do the different tools relate to each other?
Can you talk through some of the structural and user experience design principles that you lean on when building these libraries?
When you are working with networked data sets, what is your current workflow from idea to completion?
What are the most difficult aspects of working with networked data sets for machine learning applications?
What are the most interesting, innovative, or unexpected ways that you have seen graph ML used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on graph ML problems?
What are some examples of when you would choose not to use some or all of your own libraries?
What do you have planned for the future of your libraries/what new libraries do you anticipate needing to build?

Keep In Touch

benedekrozemberczki on GitHub
@benrozemberczki on Twitter
LinkedIn

Picks

Tobias
    Wrath of Man
Benedek
    Hunt for the Wilderpeople
    Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast, for the latest on modern data management.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

Karate Club
PyTorch Geometric Temporal
AstraZeneca
Budapest
University of Edinburgh
Matlab
R
Bipartite Graph
Node Classification
Graph Classification
PyTorch
    Podcast Episode
PyTorch Geometric
DGL (Deep Graph Library)
Parametric Machine Learning
graph-tool
Jax
NetworkX
Little Ball of Fur
GCN == Graph Convolutional Network
NetworKit
Gensim
    Podcast Episode
Nvidia cuGraph
Random Walk
scikit-learn
MalNet
Graph Representation Learning by William Hamilton

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA