
The Python Podcast.__init__
Tobias Macey·389 episodes
The podcast about Python and the people who make it great
Why listen
The Python Podcast.__init__ is built for developers who want to understand how real Python projects, tools, and teams are made. Tobias Macey interviews maintainers, founders, and engineers about the architecture, tradeoffs, and practical use cases behind libraries, platforms, and production systems. It is especially useful if you like technical conversations that go beyond release notes and into why a tool exists and how people actually use it.
Episodes
Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning. Summary The majority of machine learning projects that you read about or work on are built around batch processes. The model is trained, and then validated, and then deployed, with each step being a discrete and isolated task. Unfortunately, the real world is rarely static, leading to concept drift and model failures. River is a framework for building streaming machine learning projects that can constantly adapt to new information. In this episode Max Halford explains how the project works, why you might (or might not) want to consider streaming ML, and how to get started building with River. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started! Your host is Tobias Macey and today I’m interviewing Max Halford about River, a Python toolkit for streaming and online machine learning Interview Introduction How did you get involved in machine learning? Can you describe what River is and the story behind it? What is "online" machine learning? What are the practical differences with batch ML? Why is batch learning so predominant? What are the cases where someone would want/need to use online or streaming ML? The prevailing pattern for batch ML model lifecycles is to train, deploy, monitor, repeat. What does the ongoing maintenance for a streaming ML model look like? Concept drift is typically due to a discrepancy between the data used to train a model and the actual data being observed. How does the use of online learning affect the incidence of drift? Can you describe how the River framework

Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning. Summary Deep learning is a revolutionary category of machine learning that accelerates our ability to build powerful inference models. Along with that power comes a great deal of complexity in determining what neural architectures are best suited to a given task, engineering features, scaling computation, etc. Predibase is building on the successes of the Ludwig framework for declarative deep learning and Horovod for horizontally distributing model training. In this episode CTO and co-founder of Predibase, Travis Addair, explains how they are reducing the burden of model development even further with their managed service for declarative and low-code ML and how they are integrating with the growing ecosystem of solutions for the full ML lifecycle. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great! When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host is Tobias Macey and today I’m interviewing Travis Addair about Predibase, a low-code platform for building ML models in a declarative format Interview Introduction How did you get involved in machine learning? Can you describe what Predibase is and the story behind it? Who is your target audience and how does that focus influence your user experience and feature development priorities? How would you describe the semantic differences between your chosen terminology of "declarative ML" and the "autoML" nomenclature that many projects and products have adopted? Another platform that launched recently with a promise of "declarat

Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning. Summary Machine learning has the potential to transform industries and revolutionize business capabilities, but only if the models are reliable and robust. Because of the fundamental probabilistic nature of machine learning techniques it can be challenging to test and validate the generated models. The team at Deepchecks understands the widespread need to easily and repeatably check and verify the outputs of machine learning models and the complexity involved in making it a reality. In this episode Shir Chorev and Philip Tannor explain how they are addressing the problem with their open source deepchecks library and how you can start using it today to build trust in your machine learning applications. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Do you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you. Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn mo

Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning. Summary Building an ML model is getting easier than ever, but it is still a challenge to get that model in front of the people that you built it for. Baseten is a platform that helps you quickly generate a full stack application powered by your model. You can easily create a web interface and APIs powered by the model you created, or a pre-trained model from their library. In this episode Tuhin Srivastava, co-founder of Basten, explains how the platform empowers data scientists and ML engineers to get their work in production without having to negotiate for help from their application development colleagues. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host is Tobias Macey and today I’m interviewing Tuhin Srivastava about Baseten, an ML Application Builder for data science and machine learning teams Interview Introduction How did you get involved in machine learning? Can you describe what Baseten is and the story behind it? Who are the target users for Baseten and what problems are you solving for them? What are some of the typical technical requirements for an application that is powered by a machine learning model? In the absence of Baseten, what are some of the common utilities/patterns that teams might rely on? What kinds of challenges do teams run into when serving a model in the context of an application? There are a number of

Summary Starting a new project is always exciting and full of possibility, until you have to set up all of the repetitive boilerplate. Fortunately there are useful project templates that eliminate that drudgery. PyScaffold goes above and beyond simple template repositories, and gives you a toolkit for different application types that are packed with best practices to make your life easier. In this episode Florian Wilhelm shares the story behind PyScaffold, how the templates are designed to reduce friction when getting a new project off the ground, and how you can extend it to suit your needs. Stop wasting time with boring boilerplate and get straight to the fun part with PyScaffold! Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great! When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Florian Wilhelm about PyScaffold, a Python project template generator with batteries included Interview Introductions How did you get introduced to Python? Can you describe what PyScaffold is and the story behind it? What is the main goal of the project? There are a huge number of templates and starter projects available (both in Python and other languages). What are the aspects of PyScaffold that might encourage someone to adopt it? What are the different types/categories of applications that you are focused on supporting with the scaffolding? For each category, what is your selection process for which dependencies to include? How do you approach the work of keeping the various components up to date with community "best practices"? Can you describe how PyScaffold is implemented? How have the design and goals of the project changed since you first started it? What is the user experience for someone bootstrapping a project with PyScaffold? How can you adapt an existing project into the structure

Summary Application configuration is a deceptively complex problem. Everyone who is building a project that gets used more than once will end up needing to add configuration to control aspects of behavior or manage connections to other systems and services. At first glance it seems simple, but can quickly become unwieldy. Bruno Rocha created Dynaconf in an effort to provide a simple interface with powerful capabilities for managing settings across environments with a set of strong opinions. In this episode he shares the story behind the project, how its design allows for adapting to various applications, and how you can start using it today for your own projects. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great! When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Bruno Rocha about Dynaconf, a powerful and flexible framework for managing your application’s configuration settings Interview Introductions How did you get introduced to Python? Can you describe what Dynaconf is and the story behind it? What are your main goals for Dynaconf? What kinds of projects (e.g. web, devops, ML, etc.) are you focused on supporting with Dynaconf? Settings management is a deceptively complex and detailed aspect of software engineering, with a lot of conflicting opinions about the "right way". What are the design philosophies that you lean on for Dynaconf? Many engineers end up building their own frameworks for managing settings as their use cases and environments get increasingly complicated. What are some of the ways that those efforts can go wrong or become unmaintainable? Can you describe how Dynaconf is implemented? How have the design and goals of the project evolved since you first started it? What is the workflow for getting started with Dynaconf on a new project? How does the usage scale with the complex

Summary Software is eating the world, but that code has to have hardware to execute the instructions. Most people, and many software engineers, don’t have a proper understanding of how that hardware functions. Charles Petzold wrote the book "Code: The Hidden Language of Computer Hardware and Software" to make this a less opaque subject. In this episode he discusses what motivated him to revise that work in the second edition and the additional details that he packed in to explore the functioning of the CPU. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great! When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Charles Petzold about his work on the second edition of Code: The Hidden Language of Computer Hardware and Software Interview Introductions How did you get introduced to Python? Can you start by describing the focus and goal of "Code" and the story behind it? Who is the target audience for the book? The sequencing of the topics parallels the curriculum of a computer engineering course of study. Why do you think that it is useful/important for a general audience to understand the electrical engineering principles that underly modern computers? What was your process for determining how to segment the information that you wanted to address in the book to balance the pacing of the reader with the density of the information? Technical books are notoriously challenging to write due to the constantly changing subject matter. What are some of the ways that the first edition of "Code" was becoming outdated? What are the most notable changes in the foundational elements of computing that have happened in the time since the first edition was published? One of the concepts that I have found most helpful as a software engineer is that of "mechanical sympathy". What are some of the ways that a better understanding of compu
Summary The generation, distribution, and consumption of energy is one of the most critical pieces of infrastructure for the modern world. With the rise of renewable energy there is an accompanying need for systems that can respond in real-time to the availability and demand for electricity. FlexMeasures is an open source energy management system that is designed to integrate a variety of inputs intelligently allocate energy resources to reduce waste in your home or grid. In this episode Nicolas Höning explains how the project is implemented, how it is being used in his startup Seita, and how you can try it out for your own energy needs. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great! When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Nicolas Höning about FlexMeasures, an open source project designed to manage energy resources dynamically to improve efficiency Interview Introductions How did you get introduced to Python? Can you describe what FlexMeasures is and the story behind it? What are the primary goals/objectives of the project? The energy sector is huge. Where can FlexMeasures be used? Energy systems are typically governed by a marketplace system. What are the benefits that FlexMeasures can provide for each side of that market? How do renewable sources of energy confuse/complicate the role that the different stakeholders represent? What are the different points of interaction that producers/consumers might have with the FlexMeasures platform? What are some examples of the types of decisions/recommendations that FlexMeasures might generate and how to they manifest in the energy systems? What are the types of information that FlexMeasures relies on for driving those decisions? Can you describe how FlexMeasures is implemented? How have the design and goals of the sy

Summary Your ability to build and maintain a software project is tempered by the strength of the team that you are working with. If you are in a position of leadership, then you are responsible for the growth and maintenance of that team. In this episode Jigar Desai, currently the SVP of engineering at Sisu Data, shares his experience as an engineering leader over the past several years and the useful insights he has gained into how to build effective engineering teams. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great! When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select Star’s data discovery platform solves that out of the box, with a fully automated catalog that includes lineage from where the data originated, all the way to which dashboards rely on it and who is viewing them every day. Just connect it to your dbt, Snowflake, Tableau, Looker, or whatever you’re using and Select Star will set everything up in just a few hours. Go to pythonpodcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan. Your host as usual is Tobias Macey and today I’m interviewing Jigar Desai about building effective engineering teams Interview Introductions How did you get introduced to Python? What have you found to be the central challenges involved in building an effective engineering team? What are the measures that you use to determine what "effective" means for a given team? how to establish mutual trust in an engineering team challenges introduced at different levels of team size/organizational complexity establishing and managing career ladders You have mostly worked in heavily tech-focused companies. How do industry ver

Summary Working on hardware projects often has significant friction involved when compared to pure software. Brian Pugh enjoys tinkering with microcontrollers, but his "weekend projects" often took longer than a weekend to complete, so he created Belay. In this episode he explains how Belay simplifies the interactions involved in developing for MicroPython boards and how you can use it to speed up your own experimentation. Announcements Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great! When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select Star’s data discovery platform solves that out of the box, with a fully automated catalog that includes lineage from where the data originated, all the way to which dashboards rely on it and who is viewing them every day. Just connect it to your dbt, Snowflake, Tableau, Looker, or whatever you’re using and Select Star will set everything up in just a few hours. Go to pythonpodcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan. Your host as usual is Tobias Macey and today I’m interviewing Brian Pugh about Belay, a python library that enables the rapid development of projects that interact with hardware via a micropython-compatible board. Interview Introductions How did you get introduced to Python? Can you describe what Belay is and the story behind it? Who are the target users for Belay? What are some of the points of friction involved in developing for hardware projects? What are some of the features of Belay that make that a smoother process? What are some of the ways that simplifying the develop/debug cycles can improve the overall experience of developing for hardware platforms?

Summary Static typing versus dynamic typing is one of the oldest debates in software development. In recent years a number of dynamic languages have worked toward a middle ground by adding support for type hints. Python’s type annotations have given rise to an ecosystem of tools that use that type information to validate the correctness of programs and help identify potential bugs. At Instagram they created the Pyre project with a focus on speed to allow for scaling to huge Python projects. In this episode Shannon Zhu discusses how it is implemented, how to use it in your development process, and how it compares to other type checkers in the Python ecosystem. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Shannon Zhu about Pyre, a type checker for Python 3 built from the ground up to support gradual typing and deliver responsive incremental checks Interview Introductions How did you get introduced to Python? Can you describe what Pyre is and the story behind it? There have been a number of tools created to support various aspects of typing for Python. How would you describe the various goals that they support and how Pyre fits in that ecosystem? What are the core goals and notable features of Pyre? Can you describe how Pyre is implemented? How have the design and goals of the project changed/evolved since you started working on it? What are the different ways that Pyre is used in the development workflow for a team or individual? What are some of the challenges/roadblocks that people run into when adopting type definitions in their Python projects? How has the evolution of type annotations and overall support for them affected your work on Pyre? As someone who is working closely with type systems, what are the strongest aspects of Python’s implementation and

Summary Every software project is subject to a series of decisions and tradeoffs. One of the first decisions to make is which programming language to use. For companies where their product is software, this is a decision that can have significant impact on their overall success. In this episode Sean Knapp discusses the languages that his team at Ascend use for building a service that powers complex and business critical data workflows. He also explains his motivation to standardize on Python for all layers of their system to improve developer productivity. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Sean Knapp about his motivations and experiences standardizing on Python for development at Ascend Interview Introductions How did you get introduced to Python? Can you describe what Ascend is and the story behind it? How many engineers work at Ascend? What are their different areas of focus? What are your policies for selecting which technologies (e.g. languages, frameworks, dev tooling, deployment, etc.) are supported at Ascend? What does it mean for a technology to be supported? You recently started standardizing on Python as the default language for development. How has Python been used up to now? What other languages are in common use at Ascend? What are some of the challenges/difficulties that motivated you to establish this policy? What are some of the tradeoffs that you have seen in the adoption of Python in place of your other adopted languages? How are you managing ongoing maintenance of projects/products that are not written in Python? What are some of the potential pitfalls/risks that you are guarding against in your investment in Python? What are the most interesting, innovative, or un

Summary Writing code is only one piece of creating good software. Code reviews are an important step in the process of building applications that are maintainable and sustainable. In this episode On Freund shares his thoughts on the myriad purposes that code reviews serve, as well as exploring some of the patterns and anti-patterns that grow up around a seemingly simple process. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing On Freund about the intricacies and importance of code reviews Interview Introductions How did you get introduced to Python? Can you start by giving us your description of what a code review is? What is the purpose of the code review? At face value a code review appears to be a simple task. What are some of the subtleties that become evident with time and experience? What are some of the ways that code reviews can go wrong? What are some common anti-patterns that get applied to code reviews? What are the elements of code review that are useful to automate? What are some of the risks/bad habits that can result from overdoing automated checks/fixes or over-reliance on those tools in code reviews? identifying who can/should do a review for a piece of code how to use code reviews as a teaching tool for new/junior engineers how to use code reviews for avoiding siloed experience/promoting cross-training PR templates for capturing relevant context What are the most interesting, innovative, or unexpected ways that you have seen code reviews used? What are the most interesting, unexpected, or challenging lessons that you have learned while leading and supporting engineering teams? What are some resources that you recommend for anyone who wants to learn more about code review strategies and

Summary Quality assurance in the software industry has become a shared responsibility in most organizations. Given the rapid pace of development and delivery it can be challenging to ensure that your application is still working the way it’s supposed to with each release. In this episode Jonathon Wright discusses the role of quality assurance in modern software teams and how automation can help. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Jonathon Wright about the role of automation in your testing and QA strategies Interview Introductions How did you get introduced to Python? Can you share your relationship with software testing/QA and automation? What are the main categories of how companies and software teams address testing and validation of their applications? What are some of the notable tradeoffs/challenges among those approaches? With the increased adoption of agile practices and the "shift left" mentality of DevOps, who is responsible for software quality? What are some of the cases where a discrete QA role or team becomes necessary? (or is it always necessary?) With testing and validation being a shared responsibility, competing with other priorities, what role does automation play? What are some of the ways that automation manifests in software quality and testing? How is automation distinct from software tests and CI/CD? For teams who are investing in automation for their applications, what are the questions they should be asking to identify what solutions to adopt? (what are the decision points in the build vs. buy equation?) At what stage(s) of the software lifecycle does automation live? What is the process for identifying which capabilities and interactions to target during the initial application

Summary The goal of every software team is to get their code into production without breaking anything. This requires establishing a repeatable process that doesn’t introduce unnecessary roadblocks and friction. In this episode Ronak Rahman discusses the challenges that development teams encounter when trying to build and maintain velocity in their work, the role that access to infrastructure plays in that process, and how to build automation and guardrails for everyone to take part in the delivery process. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Ronak Rahman about how automating the path to production helps to build and maintain development velocity Interview Introductions How did you get introduced to Python? Can you describe what Quali is and the story behind it? What are the problems that you are trying to solve for software teams? How does Quali help to address those challenges? What are the bad habits that engineers fall into when they experience friction with getting their code into test and production environments? How do those habits contribute to negative feedback loops? What are signs that developers and managers need to watch for that signal the need for investment in developer experience improvements on the path to production? Can you describe what you have built at Quali and how it is implemented? How have the design and goals shifted/evolved from when you first started working on it? What are the positive and negative impacts that you have seen from the evolving set of options for application deployments? (e.g. K8s, containers, VMs, PaaS, FaaS, etc.) Can you describe how Quali fits into the workflow of software teams? Once a team has established patterns for deploying their software, what are so

Summary Every startup begins with an idea, but that won’t get you very far without testing the feasibility of that idea. A common practice is to build a Minimum Viable Product (MVP) that addresses the problem that you are trying to solve and working with early customers as they engage with that MVP. In this episode Tony Pavlovych shares his thoughts on Python’s strengths when building and launching that MVP and some of the potential pitfalls that businesses can run into on that path. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Tony Pavlovych about Python’s strengths for startups and the steps to building an MVP (minimum viable product) Interview Introductions How did you get introduced to Python? Can you describe what PLANEKS is and the story behind it? One of the services that you offer is building an MVP. What are the goals and outcomes associated with an MVP? What is the process for identifying the product focus and feature scope? What are some of the common misconceptions about building and launching MVPs that you have dealt with in your work with customers? What are the common pitfalls that companies encounter when building and validating an MVP? Can you describe the set of tools and frameworks (e.g. Django, Poetry, cookiecutter, etc.) that you have invested in to reduce the overhead of starting and maintaining velocity on multiple projects? What are the configurations that are most critical to keep constant across projects to maintain familiarity and sanity for your developers? (e.g. linting rules, build toolchains, etc.) What are the architectural patterns that you have found most useful to make MVPs flexible for adaptation and extension? Once the MVP is built and launched, what are the next steps to validate the product and

Summary Application architectures have been in a constant state of evolution as new infrastructure capabilities are introduced. Virtualization, cloud, containers, mobile, and now web assembly have each introduced new options for how to build and deploy software. Recognizing the transformative potential of web assembly, Matt Butcher and his team at Fermyon are investing in tooling and services to improve the developer experience. In this episode he explains the opportunity that web assembly offers to all language communities, what they are building to power lightweight server-side microservices, and how Python developers can get started building and contributing to this nascent ecosystem. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Need to automate your Python code in the cloud? Want to avoid the hassle of setting up and maintaining infrastructure? Shipyard is the premier orchestration platform built to help you quickly launch, monitor, and share python workflows in a matter of minutes with 0 changes to your code. Shipyard provides powerful features like webhooks, error-handling, monitoring, automatic containerization, syncing with Github, and more. Plus, it comes with over 70 open-source, low-code templates to help you quickly build solutions with the tools you already use. Go to dataengineeringpodcast.com/shipyard to get started automating with a free developer plan today! Your host as usual is Tobias Macey and today I’m interviewing Matt Butcher about Fermyon and the impact of WebAssembly on software architecture and deployment across language boundaries Interview Introductions How did you get introduced to Python? For anyone who isn’t familiar with WebAssembly can you give your elevator pitch for why it matters? What is the current state of language support

Summary As your code scales beyond a trivial level of complexity and sophistication it becomes difficult or impossible to know everything that it is doing. The flow of logic and data through your software and which parts are taking the most time are impossible to understand without help from your tools. VizTracer is the tool that you will turn to when you need to know all of the execution paths that are being exercised and which of those paths are the most expensive. In this episode Tian Gao explains why he created VizTracer and how you can use it to gain a deeper familiarity with the code that you are responsible for maintaining. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Need to automate your Python code in the cloud? Want to avoid the hassle of setting up and maintaining infrastructure? Shipyard is the premier orchestration platform built to help you quickly launch, monitor, and share python workflows in a matter of minutes with 0 changes to your code. Shipyard provides powerful features like webhooks, error-handling, monitoring, automatic containerization, syncing with Github, and more. Plus, it comes with over 70 open-source, low-code templates to help you quickly build solutions with the tools you already use. Go to dataengineeringpodcast.com/shipyard to get started automating with a free developer plan today! Your host as usual is Tobias Macey and today I’m interviewing Tian Gao about VizTracer, a low-overhead logging/debugging/profiling tool that can trace and visualize your python code execution Interview Introductions How did you get introduced to Python? Can you describe what VizTracer is and the story behind it? What are the main goals that you are focused on with VizTracer? What are some examples of the types of bugs that profiling can h

Summary Analysis of streaming data in real time has long been the domain of big data frameworks, predominantly written in Java. In order to take advantage of those capabilities from Python requires using client libraries that suffer from impedance mis-matches that make the work harder than necessary. Bytewax is a new open source platform for writing stream processing applications in pure Python that don’t have to be translated into foreign idioms. In this episode Bytewax founder Zander Matheson explains how the system works and how to get started with it today. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select Star’s data discovery platform solves that out of the box, with a fully automated catalog that includes lineage from where the data originated, all the way to which dashboards rely on it and who is viewing them every day. Just connect it to your dbt, Snowflake, Tableau, Looker, or whatever you’re using and Select Star will set everything up in just a few hours. Go to pythonpodcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan. Need to automate your Python code in the cloud? Want to avoid the hassle of setting up and maintaining infrastructure? Shipyard is the premier orchestration platform built to help you quickly launch, monitor, and share python workflows in a matter of minutes with 0 changes to your code. Shipyard provides powerful features like webhooks, error-handling, monitoring, automatic containerization, syncing with Github, and more. Plus, it comes with over 70 open-source, low-code templates to help you quickly build solutions with the tools you already use. Go to <a href="https://www.dataengineeringpodcast.com/shipyard?utm_source=rss&utm_me

Summary Building a fully functional web application has been growing in complexity along with the growing popularity of javascript UI frameworks such as React, Vue, Angular, etc. Users have grown to expect interactive experiences with dynamic page updates, which leads to duplicated business logic and complex API contracts between the server-side application and the Javascript front-end. To reduce the friction involved in writing and maintaining a full application Sam Willis created Tetra, a framework built on top of Django that embeds the Javascript logic into the Python context where it is used. In this episode he explains his design goals for the project, how it has helped him build applications more rapidly, and how you can start using it to build your own projects today. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan. Need to automate your Python code in the cloud? Want to avoid the hassle of setting up and maintaining infrastructure? Shipyard is the premier orchestration platform built to help you quickly launch, monitor, and share python workflows in a matter of mi

Summary Virtually everything that you interact with on a daily basis and many other things that make modern life possible were designed and modeled in software called CAD or Computer-Aided Design. These programs are advanced suites with graphical editing environments tailored to domain experts in areas such as mechanical engineering, electrical engineering, architecture, etc. While the UI-driven workflow is more accessible, it isn’t scalable which opens the door to code-driven workflows. In this episode Jeremy Wright discusses the design, uses, and benefits of the CadQuery framework for building 3D CAD models entirely in Python. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan. Need to automate your Python code in the cloud? Want to avoid the hassle of setting up and maintaining infrastructure? Shipyard is the premier orchestration platform built to help you quickly launch, monitor, and share python workflows in a matter of minutes with 0 changes to your code. Shipyard provides powerful features like webhooks, error-handling, monitoring, automatic containerization, syncing

Summary Building any software project is going to require relying on dependencies that you and your team didn’t write or maintain, and many of those will have dependencies of their own. This has led to a wide variety of potential and actual issues ranging from developer ergonomics to application security. In order to provide a higher degree of confidence in the optimal combinations of direct and transitive dependencies a team at Red Hat started Project Thoth. In this episode Fridolín Pokorný explains how the Thoth resolver uses multiple signals to find the best combination of dependency versions to ensure compatibility and avoid known security issues. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Need to automate your Python code in the cloud? Want to avoid the hassle of setting up and maintaining infrastructure? Shipyard is the premier orchestration platform built to help you quickly launch, monitor, and share python workflows in a matter of minutes with 0 changes to your code. Shipyard provides powerful features like webhooks, error-handling, monitoring, automatic containerization, syncing with Github, and more. Plus, it comes with over 70 open-source, low-code templates to help you quickly build solutions with the tools you already use. Go to dataengineeringpodcast.com/shipyard to get started automating with a free developer plan today! Your host as usual is Tobias Macey and today I’m interviewing Fridolín Pokorný about Project Thoth, a resolver service that computes the optimal combination of versions for your dependencies Interview Introductions How did you get introduced to Python? Can you describe what Project Thoth is and the story behind it? What are some examples of the types of problems that can be introduced by mismanaged dependency versions? <l

Summary Most developers have encountered code completion systems and rely on them as part of their daily work. They allow you to stay in the flow of programming, but have you ever stopped to think about how they work? In this episode Meredydd Luff takes us behind the scenes to dig into the mechanics of code completion engines and how you can customize them to fit your particular use case. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Meredydd Luff about how code completion works and what it takes to build your own Interview Introductions How did you get introduced to Python? Most programmers are familiar with the idea of code completion, but can you just give the elevator pitch to get us all on the same page? You gave a presentation recently at PyCon about how to build a code completion system. What was your approach to identifying what fundamental concepts needed to be addressed and how to fit that lesson into the available time? In the presentation you mentioned that you had built a more full-featured completion engine into Anvil. Can you describe what possessed you to build your own code completion tool? What are the core components required to build a completion engine? What are the benefits that can be realized by customizing the completion engine for a given language or task? Can you describe the feature set and implementation details of the full-fledged completion engine that is available in Anvil? Beyond the toy example, there are a number of considerations to address if you want to make the completion engine "production grade". Can you talk through some of the obvious edge cases and how to solve for them? (e.g. handling parsing of incomplete code) What are the inputs that you use to build up the list of candidate tokens for completion? Once you have a functioning baseline for offering completions, what are some of the signals that you hook into for ranking suggestions? In your p

Summary Russell Keith-Magee is an accomplished engineer and a fixture of the Python community. His work on the Beeware suite of projects is one of the most ambitious undertakings in the ecosystem and unfailingly forward-looking. With his recent transition to working for Anaconda he is now able to dedicate his full focus to the effort. In this episode he reflects on the journey that he has taken so far, how Beeware is helping to address some of the threats to Python’s long term viability, and how he envisions its future in light of the recent release of PyScript, an in-browser runtime for Python. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Russell Keith-Magee about the latest status of the Beeware project, the state of Python’s black swans, and how the PyScript project ties into his ambitions for world domination Interview Introductions How did you get introduced to Python? For anyone who hasn’t been graced with the BeeWare vision, can you give the elevator pitch of what it is and why it matters? At PyCon US 2019 you presented a keynote about the various potential threats to the Python language community and its future viability. With the clarity of 3 years hindsight, how has the landscape shifted? What is PyScript and how does it fit into the venn diagram of BeeWare’s objectives and the portents of black swan events (and what is your involvement with it)? How does it differ from the dozens of other "Python in the browser" and "Python transpiled to Javascript" projects that have sprouted over the years? Now that you have been granted the opportunity to dedicate your full attention to BeeWare and build a team to support it, what new potential does that unlock? What are the current areas of focus/challenges that you are spending your time on for the BeeWare project? What are some of the efforts in the BeeWare suite that proved to be dead-ends? What are the most interesting, innovative, or unexpected ways t

Summary Digital cameras and the widespread availability of smartphones has allowed us all to generate massive libraries of personal photographs. Unfortunately, now we are all left to our own devices of how to manage them. While cloud services such as iPhotos and Google Photos are convenient, they aren’t always affordable and they put your pictures under the control of large companies with their own agendas. LibrePhotos is an open source and self-hosted alternative to these services that puts you in control of your digital memories. In this episode the maintainer of LibrePhotos, Niaz Faridani-Rad, explains how he got involved with the project, the capabilities that it offers for managing your image library, and how to get your own instance set up to take back control of your pictures. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! This episode is sponsored by Mergify. It’s an amazing tool to make you and your team way more productive with GitHub. Mergify is all about leveling up your pull requests with useful features that eliminate busy work. Automatic merges allow you define the conditions for acceptance and Mergify will take care of merging the pull request as soon as it’s ready. Automatic updates take care of merging your pull requests serially on top of each other, so there is no way to introduce a regression. With a merge queue you can merge your urgent pull request first, organize your Prs as you wish and Mergify will merge them in that order. Mergify’s backports feature will even copy the pull request into another branch once the pull request has been merged, shipping your bug fixes on multiple branches automatically. By saving time you and your team can focus on projects that matter. Mergify is coordinated with any CI and fully integrated into GitHub. They have a Startup Program that offers a 12 months credit to leverage Mergify (up to $21,000 of value). Start saving time; visit pythonpodcast.com/mergify today to sign up for a demo and get started! Or just click the link in the show notes. <li

Summary Investing effectively is largely a game of information access and analysis. This can involve a substantial amount of research and time spent on finding, validating, and acquiring different information sources. In order to reduce the barrier to entry and provide a powerful framework for amateur and professional investors alike Didier Rodrigues Lopes created the OpenBB Terminal. In this episode he explains how a pandemic project that started as an experiment has led to him founding a new company and dedicating his time to growing and improving the project and its community. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Didier Rodrigues Lopes about the OpenBB Terminal, a modern Python-based integrated environment for investment research Interview Introductions How did you get introduced to Python? Can you describe what OpenBB is and the story behind it? What is the problem that you are trying to address by creating the OpenBB project and providing it as open source? What are some of the use cases where someone might need to use this project? The elephant in the room for financial data research is the Bloomberg Terminal. What are the other tools or services available for that purpose? What are the differentiating features of the OpenBB Terminal? Can you describe how the OpenBB Terminal is implemented? How have the design and goals/scope of the project changed since you started working on it? Can you describe a typical workflow for someone who is using the OpenBB Terminal? How have you approached the user experience design, and what are you optimizing for? What kinds of utilities do you offer beyond raw data access? What are some examples of data sources that you rely on? What is involved in integrating a new data source? What are the extension points and integration capabilities for expanding the functionality

Summary The experimentation phase of building a machine learning model requires a lot of trial and error. One of the limiting factors of how many experiments you can try is the length of time required to train the model which can be on the order of days or weeks. To reduce the time required to test different iterations Rolando Garcia Sanchez created FLOR which is a library that automatically checkpoints training epochs and instruments your code so that you can bypass early training cycles when you want to explore a different path in your algorithm. In this episode he explains how the tool works to speed up your experimentation phase and how to get started with it. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Rolando Garcia about FLOR, a suite of machine learning tools for hindsight logging that lets you speed up model experimentation by checkpointing training data Interview Introductions How did you get introduced to Python? Can you describe what FLOR is and the story behind it? What is the core problem that you are trying to solve for with FLOR? What are the fundamental challenges in model training and experimentation that make it necessary? How do machine learning reasearchers and engineers address this problem in the absence of something like FLOR? Can you describe how FLOR is implemented? What were the core engineering problems that you had to solve for while building it? What is the workflow for integrating FLOR into your model development process? What information are you capturing in the log structures and epoch checkpoints? How does FLOR use that data to prime the model training to a given state when backtracking and trying a different approach? How does the presence of FLOR change the costs of ML experimentation and what is the long-range impact of that shift? Once a model has been trained and optimized, what is the long-term utility of FLOR?</l

Summary Programmers love to automate tedious processes, including refactoring your code. In order to support the creation of code modifications for your Python projects Jimmy Lai created LibCST. It provides a richly typed and high level API for creating and manipulating concrete syntax trees of your source code. In this episode Jimmy Lai and Zsolt Dollenstein explain how it works, some of the linting and automatic code modification utilities that you can build with it and how to get started with using it to maintain your own Python projects. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Zsolt Dollenstein and Jimmy Lai about LibCST, a concrete syntax tree parser and serializer library for Python Interview Introductions How did you get introduced to Python? Can you describe what LibCST is and the story behind it? How does a concrete syntax tree differ from an abstract syntax tree? What are some of the situations where the preservation of the exact structure is necessary? There are a few other libraries in Python for creating concrete syntax trees. What was missing in the available options that made it necessary to create LibCST? What are the use cases that LibCST is focused on supporting Can you describe how LibCST is implemented? How have the design and goals of the project changed or evolved since you started working on it? How might I use LibCST for something like restructuring a set of modules to move a function definition while maintaining proper imports? How do the capabilities of LibCST for codemodding compare to the Rope framework? What are some other workflows that someone might build with LibCST? What are some of the ways that LibCST is being used in your own work? What are the most interesting, innovative, or unexpected ways that you have seen LibCST used? What are the most interesting, unexpected, or challenging lessons that yo

Summary Communication is a fundamental requirement for any program or application. As the friction involved in deploying code has gone down, the motivation for architecting your system as microservices goes up. This shifts the communication patterns in your software from function calls to network calls. In this episode Idit Levine explains how the Gloo platform that she and her team at Solo have created makes it easier for you to configure and monitor the network topologies for your microservice environments. She also discusses what developers need to know about networking in cloud native environments and how a combination of API gateways and service mesh technologies allow you to more rapidly iterate on your systems. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Idit Levine about what developers need to know about service-oriented networking and her work at Solo on the Gloo project Interview Introductions How did you get introduced to Python? Can you describe what Solo is and the story behind it? How much should developers need to know about the ways that their applications and services are communicating? What is the current state of networking for applications across physical, cloud, and containerized environments? How do service mesh features influence the architectural decisions that software teams make while building their applications? What operational capabilities do they unlock? What are the aspects of application networking that are simplified or enhanced by service mesh platforms? In what ways has service mesh introduced new complexity to operating software systems? How can developers mirror the network topologies for production environments while working on new features? What are the most interesting, innovative, or unexpected ways that you have seen Gloo used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on

Summary Cloud native architectures have been gaining prominence for the past few years due to the rising popularity of Kubernetes. This introduces new complications to development workflows due to the need to integrate with multiple services as you build new components for your production systems. In order to reduce the friction involved in developing applications for cloud native environments Michael Schilonka created Gefyra. In this episode he explains how it connects your local machine to a running Kubernetes environment so that you can rapidly iterate on your software in the context of the whole system. He also shares how the Django Hurricane plugin lets your applications work closely with the Kubernetes process model. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan. Your host as usual is Tobias Macey and today I’m interviewing Michael Schilonka about Gefyra and what is involved with developing applications for Kubernetes environments Interview Introductions How did you get introduced to Python? Can you describe what Gefyra is and the story behind it? What are the challenges that Kubernetes introduces to the development process? What are some

Summary Science is founded on the collection and analysis of data. For disciplines that rely on data about the earth the ability to simulate and generate that data has been growing faster than the tools for analysis of that data can keep up with. In order to help scale that capacity for everyone working in geosciences the Pangeo project compiled a reference stack that combines powerful tools into an out-of-the-box solution for researchers to be productive in short order. In this episode Ryan Abernathy and Joe Hamman explain what the Pangeo project really is, how they have integrated a combination of XArray, Dask, and Jupyter to power these analytical workflows, and how it has helped to accelerate research on multidimensional geospatial datasets. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan. Your host as usual is Tobias Macey and today I’m interviewing Ryan Abernathy and Joe Hamman about Pangeo, a community platform for Big Data geoscience Interview Introductions How did you get introduced to Python? Can you describe what Pangeo is and the story behind it? What is your role in the project/community and how did you get involved? What

Summary A common piece of advice when starting anything new is to "begin with the end in mind". In order to help the engineers at Wayfair manage the complete lifecycle of their applications Joshua Woodward runs a team that provides tooling and assistance along every step of the journey. In this episode he shares some of the lessons and tactics that they have developed while assisting other engineering teams with starting, deploying, and sunsetting projects. This is an interesting look at the inner workings of large organizations and how they invest in the scaffolding that supports their myriad efforts. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan. Your host as usual is Tobias Macey and today I’m interviewing Joshua Woodward about how the application lifecycle team at Wayfair uses Python to Interview Introductions Josh Woodward, for the past year have been managing the application lifecycle team at Wayfair. Prior to that, IC on python platforms team. Embed with teams looking to decouple from monolith. See pain points first hand. How did you get introduced to Python? High school physics class, TI84 Calculator, friend wrote a program to solve v
Summary Kubernetes is a framework that aims to simplify the work of running applications in production, but it forces you to adopt new patterns for debugging and resolving issues in your systems. Robusta is aimed at making that a more pleasant experience for developers and operators through pre-built automations, easy debugging, and a simple means of creating your own event-based workflows to find, fix, and alert on errors in production. In this episode Natan Yellin explains how the project got started, how it is architected and tested, and how you can start using it today to keep your Python projects running reliably. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan. Your host as usual is Tobias Macey and today I’m interviewing Natan Yellin about Robusta, Interview Introductions How did you get introduced to Python? Can you describe what Robusta is and the story behind it? What are some of the challenges that teams face when running their systems in Kubernetes? How does Robusta help address those difficulties? How does Robusta compare to e.g. Rookout? What are some of the ways that Robusta is able to provide specific

Summary Building a machine learning application is inherently complex. Once it becomes necessary to scale the operation or training of the model, or introduce online re-training the process becomes even more challenging. In order to reduce the operational burden of AI developers Robert Nishihara helped to create the Ray framework that handles the distributed computing aspects of machine learning operations. To support the ongoing development and simplify adoption of Ray he co-founded Anyscale. In this episode he re-joins the show to share how the project, its community, and the ecosystem around it have grown and evolved over the intervening two years. He also explains how the techniques and adoption of machine learning have influenced the direction of the project. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Robert Nishihara about his work at Anyscale and the Ray distributed execution framework Interview Introductions How did you get introduced to Python? Can you describe what Anyscale is and the story behind it? How has the Ray project and ecosystem evolved since we last spoke? (2 years ago) How has the landscape of AI/ML technologies and techniques shifted in that time? What are the main areas where organizations are trying to apply ML/AI? What are some of the issues that teams encounter when trying to move from prototype to production with ML/AI applications? What are the features of Ray that help to mitigate those challenges? With the introduction of more widely available streaming/real-time technologies the viability of reinforcement learning has increased. What new challenges does that approach introduce? What are some of the operational complexities associated with managing a deployment of Ray? What are some of the specialized utilities that you have had to develop to maintain a large and multi-tenant platform for your customers? What is the governance mod

Summary As software projects grow and change it can become difficult to keep track of all of the logical flows. By visualizing the interconnections of function definitions, classes, and their invocations you can speed up the time to comprehension for newcomers to a project, or help yourself remember what you worked on last month. In this episode Scott Rogowski shares his work on Code2Flow as a way to generate a call graph of your programs. He explains how it got started, how it works, and how you can start using it to understand your Python, Ruby, and PHP projects. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Subsurface Live is the cloud data lake conference, a virtual conference where data engineers, data scientists, data architects, and data analysts can gather and hear about cloud data lakes and the data ecosystem. Subsurface Live Winter 2022 includes keynote talks from Bill Inmon, the father of the data warehouse, Author of Deep Work Cal Newport, and several more from companies such as Dremio, AWS, dbt, and more. Subsurface will also have many breakout sessions featuring Pandas creator Wes McKinney, Apache Superset & Airflow creator Maxime Beauchemin, and engineers from Apple, Uber, Adobe, Bloomberg, and more. Meet other data professionals and learn about the data technologies and practices helping companies meet their current and future data needs. Register today at pythonpodcast.com/subsurface Your host as usual is Tobias Macey and today I’m interviewing Scott Rogowski about Code2Flow, a utility for generating "pretty good" call graphs for dynamic languages Interview Introductions How did you get introduced to Python? Can you describe what Code2Flow is and the story behind it? What are some of the ways that a program’s call graph might be used? How does the visual representation generated by Code2Flow help with exploring the structure of a project? What are some of the alternative approaches/tools that might be us

Summary One of the most persistent challenges faced by organizations of all sizes is the recording and distribution of institutional knowledge. In technical teams this is exacerbated by the need to incorporate technical review feedback and manage access to data before publishing. When faced with this problem as an early data scientist at AirBnB, Chetan Sharma helped create the Knowledge Repo project as a solution. In this episode he shares the story behind its creation and growth, how and why it was released as open source, and the features that make it a compelling option for your own team’s knowledge management journey. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Chetan Sharma about Knowledge Repo, an open source framework for managing documentation for technical users Interview Introductions How did you get introduced to Python? EE + CS/AI + Stats degrees Airbnb working on ML models Knowledge Repo itself Can you describe what Knowledge Repo is and the story behind it? We started seeing interviewees use ipython notebooks, thought they were great Wanted to push more people to use notebooks, but they weren’t very shareable, vettable Existing notebook hosting services weren’t very good, and weren’t built for people who aren’t data stakeholders. It was especially poor with images, annoying cell blocks Made a simple post processor to remove cell blocks, push the images to s3, and host on flask Once we were pushing notebooks into a Github repo for hosting on a flask app, so many things became possible Review cycles Shareability / collaboration features Indexing / searching Concurrently, great work was happening on developing internal R packages / python libraries to provide consistent, branded aesthetics What are some of the approaches that teams typically take for recording and sharing institutional k

Summary Software development is a complex undertaking due to the number of options available and choices to be made in every stage of the lifecycle. In order to make it more scaleable it is necessary to establish common practices and patterns and introduce strong opinions. One area that can have a huge impact on the productivity of the engineers engaged with a project is the tooling used for building, validating, and deploying changes introduced to the software. In this episode maintainers of the Pants build tool Eric Arellano, Stu Hood, and Andreas Stenius discuss the recent updates that add support for more languages, efforts made to simplify its adoption, and the growth of the community that uses it. They also explore how using Pants as the single entry point for all of your routine tasks allows you to spend your time on the decisions that matter. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Building data integration workflows is time consuming and tedious, requiring an unpleasant amount of boilerplate code to do it right. Rivery is a managed platform for building our ELT pipelines that offers the industry’s first native integration with Python, allowing you to seamlessly load and export Pandas dataframes to and from all of your databases, services, and data warehouses with a few clicks and no extra code. Rivery is hosting a live demo of their first class Python support on February 22nd, and when you use the promo code "Python" during registration you will be entered to win a brand new series 7 apple watch. Go to pythonpodcast.com/rivery today to learn more and register. Your host as usual is Tobias Macey and today I’m interviewing Eric Arellano, Stu Hood, and Andreas Stenius about the Pants build tool and all of the work that has gone into it recently Interview Introductions How did you get introduced to Python? Can you describe what Pants is and the story behind it? What is the scope of concerns that Pants is focused on addressing?

Summary It doesn’t matter how amazing your application is if you are unable to deliver it to your users. Frustrated with the rampant complexity involved in building and deploying software Vlad A. Ionescu created the Earthly tool to reduce the toil involved in creating repeatable software builds. In this episode he explains the complexities that are inherent to building software projects and how he designed the syntax and structure of Earthly to make it easy to adopt for developers across all language environments. By adopting Earthly you can use the same techniques for building on your laptop and in your CI/CD pipelines. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Vlad A. Ionescu about Earthly, a syntax and runtime for software builds to reduce friction between development and delivery Interview Introductions How did you get introduced to Python? Can you describe what Earthly is and the story behind it? What are the core principles that engineers should consider when designing their build and delivery process? What are some of the common problems that engineers run into when they are designing their build process? What are some of the challenges that are unique to the Python ecosystem? What is the role of Earthly in the overall software lifecycle? What are the other tools/systems that a team is likely to use alongside Earthly? What are the components that Earthly might replace? How is Earthly implemented? What were the core design requirements when you first began working on it? How have the design and goals of Earthly changed or evolved as you have explored the problem further? What is the workflow for a Python developer to get started with Earthly? How can Earthly help with the challenge of managing Javascript and CSS assets for web application projects? What are some of the challenges (technical, conceptual, or organizat

Summary The process of getting software delivered to an environment where users can interact with it requires many steps along the way. In some cases the journey can require a large number of interdependent workflows that need to be orchestrated across technical and organizational boundaries, making it difficult to know what the current status is. Faced with such a complex delivery workflow the engineers at Ericsson created a message based protocol and accompanying tooling to let the various actors in the process provide information about the events that happened across the different stages. In this episode Daniel Ståhl and Magnus Bäck explain how the Eiffel protocol allows you to build a tooling agnostic visibility layer for your software delivery process, letting you answer all of your questions about what is happening between writing a line of code and your users executing it. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Daniel Ståhl and Magnus Bäck about Eiffel, an open protocol for platform agnostic communication for CI/CD systems Interview Introductions How did you get introduced to Python? Can you describe what Eiffel is and the story behind it? What are the goals of the Eiffel protocol and ecosystem? What is the role of Python in the Eiffel ecosystem? What are some of the types of questions that someone might ask about their CI/CD workflow? How does Eiffel help to answer those questions? Who are the personas that you would expect to interact with an Eiffel system? Can you describe the core architectural elements required to integrate Eiffel into the software lifecycle? How have the design and goals of the Eiffel protocol/architecture changed or evolved since you first began working on it? What are some example workflows that an engineering/product team might build with Eiffel? What are some of the challenges that teams encounter when integrating Eiffel in

Summary When we are creating applications we spend a significant amount of effort on optimizing the experience of our end users to ensure that they are able to complete the tasks that the system is intended for. A similar effort that we should all consider is optimizing the developer experience for ourselves and other engineers who contribute to the projects that we work on. Adam Johnson recently wrote a book on how to improve the developer experience for Django projects and in this episode he shares some of the insights that he has gained through that project and his work with clients to help you improve the experience that you and your team have when collaborating on software development. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Adam Johnson about optimizing your developer experience Interview Introductions How did you get introduced to Python? Can you describe what you mean by the term "developer experience"? How does it compare to the concept of user experience design? What are the main goals that you aim for through improving DX? When considering DX, what are the categories of focus for improvement? (e.g. the experience of a given software project, the developer’s physical environment, their editing environment, etc.) What are some of the most high impact optimizations that a developer can make? What are some of the areas of focus that have the most variable impact on a developer’s experience of a project? What are some of the most helpful tools or practices that you rely on in your own projects? How does the size of a development team or the scale of an organization impact the decisions and benefits around DX improvements? One of the perennial challenges with selecting a given tool or architectural pattern is the continually changing landscape of software. How have your choices for DX strategies changed or evolved over the years? What are the most interesting, innovative, or u

Summary Pandas has grown to be a ubiquitous tool for working with data at every stage. It has become so well known that many people learn Python solely for the purpose of using Pandas. With all of this activity and the long history of the project it can be easy to find misleading or outdated information about how to use it. In this episode Matt Harrison shares his work on the book "Effective Pandas" and some of the best practices and potential pitfalls that you should know for applying Pandas in your own work. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Matt Harrison about best practices for using Pandas for data exploration, manipulation, and analysis Interview Introductions How did you get introduced to Python? What motivated you to write a book about Pandas? There are a number of books available that cover some aspect of the Pandas framework or its application. What was missing from the available literature? Who is your target audience for this book? What are some of the most surprising things that you have learned about Pandas while working on this book? What are the sharp edges that you see newcomers to pandas run into most frequently? It is easy to use Pandas in a naive manner and get things done. What are some of the bad habits that you have seen people form in their work with Pandas? How and when do those habits become harmful? What are the most interesting, innovative, or unexpected ways that you have seen Pandas used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on this book? What are some of the projects that you are planning to work on in the near/medium term? Keep In Touch Website <a href="https://twitter.com/__mharrison__?lang=en&utm_source=rss&utm_medium=rss" target

Summary Developers hate wasting effort on manual processes when we can write code to do it instead. Cog is a tool to manage the work of automating the creation of text inside another file by executing arbitrary Python code. In this episode Ned Batchelder shares the story of why he created Cog in the first place, some of the interesting ways that he uses it in his daily work, and the unique challenges of maintaining a project with a small audience and a well defined scope. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Ned Batchelder about Cog, a tool for generating files or text from embedded Python logic Interview Introductions How did you get introduced to Python? Can you describe what Cog is and the story behind it? What are the use cases that you initially created Cog to address? What were the shortcomings or extraneous overhead that you encountered in tools such as Jinja, Mako, Genshi, etc. that led you to create a new tool? What was your path from a quick and dirty script that suited your own purposes to turning it into a niche open source project that was general and stable enough for the broader community? One of your claims to fame is your role as the maintainer for coverage.py. How has your experience managing such a widely used project translated to the relatively small and low traffic project like Cog? Can you describe how Cog is implemented? How did you approach the design of the syntactic elements for embedding Python code into a host file? What is the workflow for someone using Cog to generate all or parts of a file? How does the introduction of third party dependencies impact the viability and utility of Cog as compared to other templating systems? What are the most interesting, innovative, or unexpected ways that you have seen Cog used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Cog? <

Summary Statistical regression models are a staple of predictive forecasts in a wide range of applications. In this episode Matthew Rudd explains the various types of regression models, when to use them, and his work on the book "Regression: A Friendly Guide" to help programmers add regression techniques to their toolbox. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Matthew Rudd about the applications of statistical modeling and regression, and how to start using it for your work Interview Introductions How did you get introduced to Python? Can you start by describing some use cases for statistical regression? What was your motivation for writing a book to explain this family of algorithms to programmers? What are your goals for the book? Who is the target audience? What are some of the different categories of regression algorithms? What are some heuristics for identifying which regression to use? How have you approached the balance of using software principles for explaining the work of building the models with the mathematical underpinnings that make them work? What are some of the concepts that are most challenging for people who are first working with regression models? What are the most interesting, innovative, or unexpected ways that you have seen statistical regression models used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on your book? What are some of the resources that you recommend for folks who want to learn more about the inner workings and applications of regression models after they finish your book? Keep In Touch LinkedIn @MatthewBRudd on Twitte

Summary Every software project needs a tool for managing the repetitive tasks that are involved in building, running, and deploying the code. Frustrated with the limitations of tools like Make, Scons, and others Eduardo Schettino created doit to handle task automation in his own work and released it as open source. In this episode he shares the story behind the project, how it is implemented under the hood, and how you can start using it in your own projects to save you time and effort. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Eduardo Schettino about Doit, a flexible and low overhead task automation tool Interview Introductions How did you get introduced to Python? Can you describe what doit is and the story behind it? What are the main goals and use cases of doit? Can you describe how you approached the implementation of Doit? How has the design changed or evolved since you first began working on it? The realm of task automation tools for developers is an exceedingly crowded one, with each tool prioritizing certain use cases. How would you characterize the position of doit in the current ecosystem? How does it compare to e.g. Click, Invoke, Typer, etc.? What is your guiding philosophy for when and how to add new features? You have been running the project for ~13 years now. How has the evolution of the Python language and ecosystem influenced your approach to the development and maintenance of doit? What is the workflow for getting started with doit and integrating it into your development process? For every project there are some tasks that are identical and some that are bespoke for that application. What are the options for maintaining a standard set of tasks across repositories and composing them with per-project activites? What are some of the useful patterns that you and the community have established for designing tasks and execution graphs?<

Summary Whether we like it or not, advertising is a common and effective way to make money on the internet. In order to support the work being done at Read The Docs they decided to include advertisements on the documentation sites they were hosting, but they didn’t want to alienate their users or collect unnecessary information. In this episode David Fischer explains how they built the Ethical Ads network to solve their problem, the technical and business challenges that are involved, and the open source application that they built to power their network. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing David Fischer about the Ethical Ads marketplace and the technology that runs Interview Introductions How did you get introduced to Python? Can you describe what the Ethical Ads project is and the story behind it? What are the technical and organizational requirements involved in running an ad network? How have you approached the problem of kickstarting the flywheel for the two-sided marketplace? What are some of the challenges that you face in building an accurate profile of your audience without using detailed tracking methods? What are the benefits that you see in focusing exclusively on developers in your publisher relationships? Can you describe the design and implementation of the ad server? How has the architecture evolved since you first began working on it? If you were to start over today what might you do differently? How have you approached scaling for performance and geographic distribution? What mechanisms do you use for tracking impressions/measuring ad effectiveness? How can advertisers experiment with A/B testing of ad copy? If someone wants to run their own advertisements with the ethical ads server, what is involved in getting it deployed and integrated into their sites? What are the integration and extension points availabl

Summary Podcasts are one of the few mediums in the internet era that are still distributed through an open ecosystem. This has a number of benefits, but it also brings the challenge of making it difficult to find the content that you are looking for. Frustrated by the inability to pick and choose single episodes across various shows for his listening Wenbin Fang started the Listen Notes project to fulfill his own needs. He ended up turning that project into his full time business which has grown into the most full featured podcast search engine on the market. In this episode he explains how he build the Listen Notes application using Python and Django, his work to turn it into a sustainable business, and the various ways that you can build other applications and experiences on top of his API. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Wenbin Fang about the technology powering the Listen Notes podcast discovery platform Interview Introductions How did you get introduced to Python? Can you describe what Listen Notes is and the story behind it? What are some of the main goals that listeners have when searching for a podcast? What are the challenges that they commonly encounter when looking for information in a podcast? What are the different sources of information that you can use to extract useful details about a podcast? How do you identify and prioritize new features or product enhancements? Can you describe how the Listen Notes platform is architected? How has it changed or evolved since you first began working on it? How did you approach the technology selection for the initial version of Listen Notes? If you were to start over today, what might you do differently? What are the technical challenges that are posed by the ecosystem around podcasts? What are the biggest changes that have happened in the methods of production and consumption for podcasts

Summary Outer space holds a deep fascination for people of all ages, and the key principle in its exploration both near and far is orbital mechanics. Poliastro is a pure Python package for exploring and simulating orbit calculations. In this episode Juan Luis Cano Rodriguez shares the story behind the project, how you can use it to learn more about space travel, and some of the interesting projects that have used it for planning planetary and interplanetary missions. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Juan Luis Cano Rodriguez about Poliastro, an open source library for interactive Astrodynamics and Orbital Mechanics, with a focus on ease of use, speed, and quick visualization. Interview Introductions How did you get introduced to Python? Can you describe what Poliastro is and the story behind it? What are some of the simulations that Poliastro is designed to be used for? How much knowledge of orbital mechanics is necessary to get started with Poliastro? Can you describe how the project is implemented? How have the goals and design of the project changed or evolved since you first started it? What are some of the design philosophies that you focus on to make the package accessible to the range of users that you support? Can you talk through the workflow of using Poliastro to do something like track the path of the ISS and its traversal of the debris field from the recent satellite destruction? What are some of the other libraries or frameworks that are commonly used with Poliastro? How are you using Poliastro in your own work? What are some overlooked or underused aspects of the project that you would like to highlight? What are the most interesting, innovative, or unexpected ways that you have seen Poliastro used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Poliastro? When is Poliastro the

Summary Deep learning frameworks encourage you to focus on the structure of your model ahead of the data that you are working with. Ludwig is a tool that uses a data oriented approach to building and training deep learning models so that you can experiment faster based on the information that you actually have, rather than spending all of our time manipulating features to make them match your inputs. In this episode Travis Addair explains how Ludwig is designed to improve the adoption of deep learning for more companies and a wider range of users. He also explains how the Horovod framework plugs in easily to allow for scaling your training workflow from your laptop out to a massive cluster of servers and GPUs. The combination of these tools allows for a declarative workflow that starts off easy but gives you full control over the end result. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Travis Adair about building and training machine learning models with Ludwig and Horovod Interview Introductions How did you get introduced to Python? Can you describe what Horovod and Ludwig are? How do the projects work together? What was your path to being involved in those projects and what is your current role? There are a number of AutoML libraries available for frameworks such as scikit-learn, etc. What are the challenges that are introduced by applying that workflow to deep learning architectures? What are the use cases that Ludwig is designed to enable? Who are the target users of Ludwig? How do the workflows change/progress for the different personas? How is the underlying framework architected? What are the available extension points to provide a progressive exposure of complexity? How have the goals and design of the project changed or evolved as it has gained more widespread adoption beyond Uber? What was the motivation for migrating the core of Ludwig from Tensorflow to

Summary A lot of time and energy goes into data analysis and machine learning projects to address various goals. Most of the effort is focused on the technical aspects and validating the results, but how much time do you spend on considering the experience of the people who are using the outputs of these projects? In this episode Benn Stancil explores the impact that our technical focus has on the perceived value of our work, and how taking the time to consider what the desired experience will be can lead us to approach our work more holistically and increase the satisfaction of everyone involved. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Benn Stancil about the perennial frustrations of working with data and thoughts on how to improve the experience Interview Introductions How did you get introduced to Python? Can you start by discussing your perspective on the most frustrating elements of working with data in an organization? How might that compound when working with machine learning? What are the sources of the disconnect between our level of technical sophistication and our ability to produce meaningful insights from our data? There have been a number of formulations about a "hierarchy of needs" pertaining to data. When the goal is to bring ML/AI methods to bear on an organization’s processes or products how can thinking about the intended experience act to improve the end result? What are some failure modes or suboptimal outcomes that might be expected when building from a tooling/technology/technique first mindset? What are some of the design elements that we can incorporate into our development environments/data infrastructure/data modeling that can incentivize a more experience driven process for building data products/analyses/ML models? How does the design and capabilities of the Mode platform allow teams to progress along the journey from data discovery to descriptive analytics, to ML exp

Summary The true power of artificial intelligence is its ability to work collaboratively with humans. Nate Joens co-founded Structurely to create a conversational AI platform that augments human sales teams to help guide potential customers through the initial steps of the funnel. In this episode he discusses the technical and social considerations that need to be combined for a seamless conversational experience and how he and his team are tackling the problem. Announcements Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science. When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Your host as usual is Tobias Macey and today I’m interviewing Nate Joens about his work at Structurely to build conversational AI utilities that augment human sales interactions Interview Introductions How did you get introduced to Python? Can you describe what Structurely is and the story behind it? What are the elements that comprise a "conversational AI"? How is it distinct from the wave of chatbots that were popular in recent years? What lessons from that approach can we take forward into AI enabled conversational platforms? How are you applying AI to the sales process? How much domain expertise is necessary to make an effective and engaging conversational AI? (e.g. knowledge of sales techniques vs. knowledge of real estate, etc.) Can you describe how you have designed the Structurely platform? What are the biggest engineering challenges that you have had to work through? What challenges or complexities have been most persistent? What are the design complexities that you have to work through to make the AI accessible for end users? What are some of the advancements in AI/NLP/transfer learning that have been most beneficial for teams building conversational AI? What are the signals that you emphasize when monitoring the performance of your models? What is your approach for feeding real-world customer interactions back into your model development and training loop?

Reviews
No reviews yet.
If you like this...

Talk Python To Me
Same topic · Same format · Same audience

Python Bytes
Same topic · Same audience · Same tone

The Real Python Podcast
Same topic · Same audience · Same format

The Changelog: Software Development, Open Source
Same format · Same audience · Same vibe

Software Engineering Daily
Same format · Same audience · Same topic

The a16z Show
Same audience · Same format · Same topic

Decrypt News
Same audience · Same topic
Explore more like this
Listening context
Discussion (0)
No comments yet. Be the first to start the discussion!