SpaCy with Matthew Honnibal

The Python Podcast.__init__

Summary

As the amount of text available on the internet and in businesses continues to increase, the need for fast and accurate language analysis becomes more prominent. This week Matthew Honnibal, the creator of SpaCy, talks about his experiences researching natural language processing and creating a library to make his findings accessible to industry.

Brief Introduction

- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.
- When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
- You’ll want to make sure that your users don’t have to put up with bugs, so you should use Rollbar for tracking and aggregating your application errors to find and fix the bugs in your application before your users notice they exist. Use the link rollbar.com/podcastinit to get 90 days and 300,000 errors for free on their bootstrap plan.
- Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
- To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers.
- Join our community! Visit discourse.pythonpodcast.com for your opportunity to find out about upcoming guests, suggest questions, and propose show ideas.
- Your host as usual is Tobias Macey and today I’m interviewing Matthew Honnibal about SpaCy and Explosion.AI.

Interview with Matthew Honnibal

- Introductions
- How did you get introduced to Python?
- Can you start by sharing what SpaCy is and what problem you were trying to solve when you created it?
- Another project for natural language processing that has been part of the Python ecosystem for a number of years is the Natural Language Toolkit (NLTK). How does SpaCy differ from the NLTK and are there any cases where that would be the better choice?
- How much knowledge of NLP and computational linguistics is necessary to be able to use SpaCy?
- What does the internal design and architecture of SpaCy look like and what are the biggest challenges associated with its development to date and into the future?
- One of the projects that you have built around SpaCy which I think is really cool and caught my attention when I first found your project is the displaCy visualization tool. Can you explain what that is and why you think it is important?
- What are some kinds of applications where SpaCy would be useful which might not be obvious candidates for it?
- Why is speed such an important focus for an NLP library?
- One of the ways that you have been able to gain a speed boost is through releasing the GIL and allowing for true parallelism via Cython. How have you managed to ensure that this doesn’t lead to data races and program failures?
- Building on the success of SpaCy you founded a company called Explosion AI. Can you explain what your goals are for this endeavor and the kinds of services that you are offering?
- What are some of the most interesting uses of SpaCy that you have seen?
- What do you have planned for the future of SpaCy?

Keep In Touch

- Twitter
  - Matthew
  - SpaCy
- Explosion AI Mailing List
- Explosion AI Contact Form

Picks

- Tobias
  - Zoom H4N Pro
  - Shure SM58

Links

- Reddit sense2vec demo
- DisplaCy
- DisplaCy Entity Visualizer
- SpaCy Showcase
- NLTK
- Chartbeat
- Cytora

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
