August 19-21 - Co-Located Events
August 21-23 - Conference
Hilton San Diego Bayfront - San Diego, CA
More information for Open Source Summit + Embedded Linux Conference North America 2019
Wednesday, August 21 • 11:30am - 12:05pm
Lessons Learned from the Migration to Apache Airflow - Radek Maciaszek, Skimlinks*

Apache Airflow is an open-source tool for orchestrating complex workflows and data processing pipelines.

In this talk, Radek Maciaszek will present his learnings from the migration of machine learning and big data processing pipelines to Apache Airflow.

Radek will discuss examples of how they are using Airflow to power their company's big data infrastructure, where they analyze hundreds of terabytes of data. Examples will cover building an ETL pipeline and using Airflow to manage a machine learning Spark pipeline workflow.

This talk will cover the basic Airflow concepts and show real-life examples of how to define your own workflows in Python code. The talk will finish with more advanced topics related to Apache Airflow, such as adding custom task operators, sensors and plugins, as well as best practices and the pros and cons of this tool.
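
The idea of a workflow defined in Python code can be illustrated without Airflow itself. The sketch below is not Airflow's API and is not taken from the talk: it uses only the Python standard library (`graphlib`, available from Python 3.9) to run a toy extract, transform, load pipeline in dependency order. All task names and the sample data are hypothetical.

```python
from graphlib import TopologicalSorter


# Hypothetical tasks: each one reads from and writes to a shared context dict.
def extract(ctx):
    ctx["raw"] = [" 3", "1 ", "2"]


def transform(ctx):
    ctx["clean"] = sorted(int(value.strip()) for value in ctx["raw"])


def load(ctx):
    ctx["loaded"] = list(ctx["clean"])


TASKS = {"extract": extract, "transform": transform, "load": load}

# Each key maps a task to the set of tasks that must finish before it starts.
DEPENDENCIES = {"transform": {"extract"}, "load": {"transform"}}


def run_pipeline(tasks, dependencies):
    """Run every task once, in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name](ctx)
    return order, ctx


order, ctx = run_pipeline(TASKS, DEPENDENCIES)
print(order)
print(ctx["loaded"])
```

A workflow orchestrator such as Airflow builds on the same dependency-graph idea and adds scheduling, retries and monitoring on top, which this sketch does not attempt to model.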

Speakers
Radek Maciaszek

Chief Architect, Skimlinks
Radek specialises in large-scale data number crunching and cloud computing. During his professional career, Radek has worked on building big data solutions for companies such as Skimlinks, where he currently works as Chief Architect, as well as OpenX, Orange, Kantar and more. He has...



Wednesday August 21, 2019 11:30am - 12:05pm
Sapphire H
  • Session Slides Included Yes
  • Session Recorded Yes