Gnocchi: A Scalable Time Series Database For Your Metrics with Julien Danjou

The Python Podcast.__init__

Summary
Do you know what your servers are doing? If you have a metrics system in place, then the answer should be “yes”. One critical aspect of that platform is the time series database that allows you to store, aggregate, analyze, and query the various signals generated by your software and hardware. As your systems grow in size and complexity, so does the volume of data that you need to manage, which can put a strain on your metrics stack. Julien Danjou built Gnocchi during his time on the OpenStack project to provide a time-oriented data store that would scale horizontally and still provide fast queries. In this episode he explains how the project got started, how it works, how it compares to the other options on the market, and how you can start using it today to get better visibility into your operations.

Preface
Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand-new API, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. Podcast.__init__ listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a trial.
Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes.
And if you have any questions, comments, or suggestions, I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com.
To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.
Join the community in the new Zulip chat workspace at pythonpodcast.com/chat.
Your host as usual is Tobias Macey, and today I’m interviewing Julien Danjou about Gnocchi, an open source time series database built to handle large volumes of system metrics.

Interview
Introductions
How did you get introduced to Python?
Can you start by describing what Gnocchi is and how the project got started?
What was the motivation for moving Gnocchi out of the OpenStack organization and into its own top-level project?
The spaces of time series databases and metrics-as-a-service platforms are both fairly crowded. What are the unique features of Gnocchi that would lead someone to deploy it in place of other options?
What are some of the tools and platforms that are popular today which hadn’t yet gained visibility when you first began working on Gnocchi?
How is Gnocchi architected?
How has the design changed since you first started working on it?
What was the motivation for implementing it in Python, and would you make the same choice today?
One of the interesting features of Gnocchi is its support for resource history. Can you describe how that operates and the types of use cases that it enables?
Does that factor into the multi-tenant architecture?
What are some of the drawbacks of pre-aggregating metrics as they are being written into the storage layer (e.g. loss of fidelity)?
Is it possible to maintain the raw measures after they are processed into aggregates?
One of the challenging aspects of building a scalable metrics platform is support for high-cardinality data. What sort of labelling and tagging of metrics and measures is available in Gnocchi?
For someone who wants to implement Gnocchi for their system metrics, what is involved in deploying, maintaining, and upgrading it?
What are the available integration points for extending and customizing Gnocchi?
Once metrics have been stored, aggregated, and indexed, what are the options for querying and analyzing the collected data?
When is Gnocchi the wrong choice?
What do you have planned for the future of Gnocchi?

Keep In Touch
jd on GitHub
Website
@juldanjou on Twitter

Picks
Tobias
Marketplace Podcast
Julien
Mergify

Links
Gnocchi
Red Hat
OpenStack
Object Oriented Programming
O’Reilly
Debian
Ceilometer
Prometheus
Time Series
MySQL
Gerrit
Zuul (Podcast Episode)
GitHub
GitLab
Graphite (Podcast Episode)
DataDog
RabbitMQ
InfluxDB
Ceph (Podcast Episode)
S3
OpenStack Swift
Cassandra
Honeycomb Observability Service (Podcast Episode)
AMQP
Redis
DSL (Domain Specific Language)
Golang
RBAC (Role-Based Access Control)
CollectD
StatsD
Gnocchi Client
Telegraf
Grafana
TimescaleDB (Podcast Episode)
OpenStack Heat

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA