Summary
HDF5 is a file format that supports fast and space-efficient analysis of large datasets. PyTables is a project that wraps and expands on the capabilities of HDF5, making it easy to integrate with the larger Python data ecosystem. Francesc Alted explains how the project got started, how it works, and how it can be used to create shareable and archivable datasets.

Preface
Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.
When you're ready to launch your next project you'll need somewhere to deploy it. Check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. Linode has announced new plans, including a 1GB plan for $5, high memory plans starting at 16GB for $60/mo, and an upgrade in storage from 24GB to 30GB on their 2GB for $10 plan.
Visit our site to subscribe to the show, sign up for our newsletter, read the show notes, and get in touch.
To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers.
Your host as usual is Tobias Macey, and today I'm interviewing Francesc Alted about PyTables.

Interview
Introductions
How did you get introduced to Python?
To start with, what is HDF5, and what was the problem that motivated you to wrap Python around it to create PyTables?
Who are the most significant contributors to PyTables, and how have you interacted with them?
How is the project architected, and what are some of the design decisions that you are most proud of?
What are some of the typical use cases for PyTables, and how does it tie into the broader Python data ecosystem?
How common is it to use an HDF5 file as a data interchange format to be shared between researchers or between languages?
Given the ability to create custom node types, does that inhibit the ability to interact with the stored data using other libraries?
What are some of the capabilities of HDF5 and PyTables that can't reasonably be replicated in other data storage systems?
One of the more intriguing capabilities that I noticed while reading the documentation is the ability to perform undo and redo operations on the data. How might that be leveraged in a real-world use case?
What are some of the most interesting or unexpected uses of PyTables that you are aware of?

Keep In Touch
@FrancescAlted on Twitter
FrancescAlted on GitHub

Picks
Tobias
The Accountant
Francesc
Blosc: a high-speed compressor, specially designed for binary data
The Lego Batman Movie

Links
PyTables
PyTables – Optimization
Presentations and Videos about PyTables
Part of the story behind PyTables
HDF5
Pandas
SIMD
NumFOCUS

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA