Summary
A large portion of the software industry has standardized on Git as the version control system of choice. But have you thought about all of the information that you are generating with your branches, commits, and code changes? Davide Spadini created the PyDriller framework to simplify the work of mining software repositories to perform research on the technical and social aspects of software engineering. In this episode he shares some of the insights that you can gain by exploring the history of your code, the complexities of building a framework to interact with Git, and some of the interesting ways that PyDriller can be used to inform your own development practices.

Announcements
Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
Your host as usual is Tobias Macey and today I’m interviewing Davide Spadini about PyDriller, a framework for mining software repositories.

Interview
Introductions
How did you get introduced to Python?
Can you start by describing what PyDriller is and how the project got started?
How is PyDriller different from other Git frameworks?
What kinds of information can you discover by mining a software repository?
Where and how might the collected information be used?
What are the limitations of the capabilities offered by Git for investigating the repository?
What are the additional metrics that you are able to extract using PyDriller?
Can you describe how PyDriller itself is implemented?
How has the project evolved since you first began working on it?
I noticed that for testing PyDriller you crafted a set of repositories to serve as test cases. What has been the most complex or challenging aspect of writing meaningful tests to ensure a reasonable coverage of this problem domain?
What would be required to add support for other version control systems?
How have you used PyDriller in your own research?
What are some of the most interesting, unexpected, or innovative ways that you have seen PyDriller used?
What are some of the most interesting, unexpected, or challenging lessons that you have learned while working on and with PyDriller?
What do you have planned for the future of PyDriller?

Keep In Touch
Website
ishepard on GitHub
@DavideSpadini on Twitter

Picks
Tobias: pre-commit
Davide: Fall Guys

Closing Announcements
Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast, for the latest on modern data management.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links
PyDriller
Delft
Git
GitPython
PyGit2
RepoDriller
Mining Software Repositories Conference
Lizard
Hadoop
Mercurial (Podcast Episode)
Subversion
CVS
Neo4J
GraphRepo

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA