Orchestrating Large and Small Projects With Apache Airflow
Have you worked on a project that needed an orchestration tool? How do you define the workflow of an entire data pipeline or a messaging system with Python? This week on the show, Calvin Hendryx-Parker is back to talk about using Apache Airflow and orchestrating Python projects.
Calvin is the co-founder and CTO of Six Feet Up and a Python Web Conference co-organizer. He’s recently been working on a massive project that requires thousands of jobs involving transferring and transforming data. Through his research into orchestration systems, he found Apache Airflow.
Airflow is an open-source tool to define, schedule, and monitor workflows. The platform is pure Python and integrates with a wide variety of services. We discuss how workflows are defined by creating directed acyclic graphs (DAGs).
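To make the DAG idea concrete, here is a minimal sketch that uses only the standard library's `graphlib` module (Python 3.9+), not Airflow's API. Each task maps to the set of upstream tasks it depends on. Because the graph is acyclic, we can derive a valid run order and group tasks that are able to run in parallel. The task names (`extract`, `transform`, `load_lake`, and so on) are hypothetical. In Airflow itself, you declare dependencies between tasks in a DAG file, for example with the `>>` operator.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline: dict[str, set[str]] = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "load_warehouse": {"transform"},
    "load_lake": {"transform"},
    "notify": {"load_warehouse", "load_lake"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


def parallel_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches whose upstream dependencies are all complete."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # validates the graph and raises CycleError on a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks with all dependencies done
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock its children
    return batches


if __name__ == "__main__":
    print(execution_order(pipeline))
    print(parallel_batches(pipeline))
```

Running `parallel_batches(pipeline)` gives `[['extract'], ['validate'], ['transform'], ['load_lake', 'load_warehouse'], ['notify']]`. The two load tasks share a batch because neither depends on the other. A cyclic graph such as `{"a": {"b"}, "b": {"a"}}` raises `CycleError`, which is why orchestrators require the "acyclic" part.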
Calvin talks about how a recent project outgrew the system and how his team built a clever solution using Python. We also discuss the upcoming Python Web Conference and what virtual attendees can expect.
Course Spotlight: Python Basics: Object-Oriented Programming
In this video course, you’ll get to know OOP, or object-oriented programming. You’ll learn how to create a class, use classes to create new objects, and instantiate classes with attributes.
Topics:
- 00:00:00 – Introduction
- 00:02:24 – Describing the large data pipeline
- 00:04:38 – What format was the data in?
- 00:06:04 – Was the format of the data changed for storage?
- 00:09:34 – Data engineering and describing sources and targets
- 00:11:29 – Apache Airflow orchestration and hitting limitations
- 00:18:12 – Sponsor: CData Software
- 00:18:54 – DAG: Directed acyclic graphs
- 00:22:29 – Streaming data and other tool choices
- 00:25:38 – Overcoming DAG Factory limitations
- 00:31:49 – Another industry example for Airflow
- 00:34:24 – Finding solutions as a consultancy
- 00:35:12 – Is there a minimum-size project for Airflow?
- 00:37:37 – Django under the hood
- 00:38:31 – Video Course Spotlight
- 00:39:58 – The Python Web Conference 2023
- 00:44:24 – Do you have any upcoming conference talks?
- 00:45:53 – How can people follow your work online?
- 00:46:52 – IndyPy talk by Mariatta Wijaya
- 00:48:01 – What are you excited about in the world of Python?
- 00:51:45 – What do you want to learn next?
- 00:53:22 – Thanks and goodbye
Show Links:
- Apache Airflow - Documentation
- Too Big for DAG Factories? — Six Feet Up
- Directed acyclic graph - Wikipedia
- DAGs — Airflow Documentation
- Dynamically generating DAGs in Airflow - Astronomer Documentation
- Data Lakehouse Architecture and AI Company - Databricks
- Episode #10: Python Job Hunting in a Pandemic – The Real Python Podcast
- Episode #124: Exploring Recursion in Python With Al Sweigart – The Real Python Podcast
- The Recursive Book of Recursion
- Episode #61: Scaling Data Science and Machine Learning Infrastructure Like Netflix – The Real Python Podcast
- IndyPy — Indiana Python User Group
- Contributing to Python - Mariatta Wijaya - Python Core Developer - YouTube
- Home Assistant
- Arturia - MicroFreak
- Arturia - Pigments
- CalvinHP (@calvinhp@fosstodon.org) - Fosstodon
- calvinhp - Twitter
- Six Feet Up - Blog
- Python Web Conference 2023