August 19-21 - Co-Located Events
August 21-23 - Conference
Hilton San Diego Bayfront - San Diego, CA
More information for Open Source Summit + Embedded Linux Conference North America 2019

Open AI
Wednesday, August 21

11:30am PDT

Lessons Learned from the Migration to Apache Airflow - Radek Maciaszek, Skimlinks*
Apache Airflow is an open-source tool for orchestrating complex workflows and data processing pipelines.

In this talk, Radek Maciaszek will present lessons learned from migrating machine learning and big data processing pipelines to Apache Airflow.

Radek will discuss examples of how his company uses Airflow to power its big data infrastructure, which analyzes hundreds of terabytes of data. Examples will cover building the ETL pipeline and using Airflow to manage the machine learning Spark pipeline workflow.

This talk will cover the basic Airflow concepts and show real-life examples of how to define your own workflows in Python code. The talk will finish with more advanced topics related to Apache Airflow, such as adding custom task operators, sensors and plugins, as well as best practices and the pros and cons of this tool.
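
To illustrate the core idea mentioned in the abstract (a workflow defined in Python as a graph of dependent tasks), here is a minimal sketch that uses only the Python standard library. It is not Airflow code and does not use the Airflow API; the task names and dependencies are hypothetical and chosen only to resemble an ETL plus model-training pipeline.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "train_model": {"transform"},
    "report": {"load", "train_model"},
}


def execution_order(dag):
    """Return one valid order in which the tasks can run."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

A scheduler such as Airflow adds much more on top of this ordering (scheduling, retries, operators, sensors), but the dependency graph is the underlying concept.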

Radek Maciaszek

Chief Architect, Skimlinks
Radek specialises in large-scale data number crunching and cloud computing. During his professional career, Radek has worked on building big data solutions for companies such as Skimlinks, where he currently works as Chief Architect, as well as OpenX, Orange, Kantar and more. He has...

Wednesday August 21, 2019 11:30am - 12:05pm PDT
Sapphire H
  Open AI
  • Session Slides Included Yes
  • Session Recorded Yes

12:20pm PDT

Machine Learning Made Easy on Kubernetes. DevOps for Data Scientists - Brian Redmond, Microsoft*
Though machine learning and AI are immensely powerful, these solutions are by no means easy. In many cases, there are many diverse components that are not designed to work together. Additionally, these models are most efficient when running on large-scale clusters, which can be more difficult to manage. Configuration and deployment are often left to data scientists, who end up spending time on infrastructure rather than on data science itself.

Kubernetes to the rescue! In this session I will talk about how machine learning can be greatly improved by implementing ML solutions on top of Kubernetes with containers. I will discuss each stage of a typical workflow, including data preparation/versioning, model training, testing and validation, monitoring, and CI/CD and automation. Demos will include tooling such as TensorFlow/Kubeflow, Pachyderm, Argo, etc.

This talk is for data scientists and infrastructure/SRE teams alike, and aims to help bring the benefits of DevOps to AI and machine learning.

Brian Redmond

Cloud Architect, Microsoft
I am a Cloud Architect on the Azure Global Black Belt team at Microsoft. I focus on containers, microservices, and cloud native applications in the Azure cloud platform. I have been working in technology for over 20 years and have a mixed background from application development to...

Wednesday August 21, 2019 12:20pm - 12:55pm PDT
Sapphire H
  Open AI
  • Session Slides Included Yes
  • Session Recorded Yes

2:25pm PDT

Open Source Tools for ML Experiments Management - Dmitry Petrov & Ruslan Kuprieiev, Iterative AI*
The rise of new AI and ML requires new workflows and new tools: data versioning, ML pipeline versioning, experiment metrics visualization and others that have not been formalized or even named yet.

The traditional software engineering toolset does not fully cover an ML team's needs. We will discuss the current practices of organizing an ML workflow using traditional open-source tools like Git and Git-LFS, as well as their limitations. These limitations motivate the development of new ML-specific experiment and data management systems.

An ML workflow differs from software engineering. The experimental, trial-and-error nature of ML projects and the need for more granular and efficient management of data artifacts require new sets of development tools. We will show the ideas behind the open source tool DVC (http://dvc.org), which focuses on working with ML experiments and managing large datasets and ML models.

Ruslan Kuprieiev

Software Engineer, Iterative AI
Ruslan is a Software Engineer at Iterative AI. Previously he worked on live container migration at Parallels, Linux Kernel live-patching at CloudLinux, and also in a few startups. Ruslan's career started by working in an open source project called CRIU and he continues to contribute...

Dmitry Petrov

Co-Founder & CEO, DVC
Dmitry is an ex-Data Scientist at Microsoft with a Ph.D. in Computer Science and an active open source contributor. He wrote and open sourced the first version of DVC.org, a machine learning workflow management tool. He also implemented a wavelet-based image hashing algorithm (wHash...

Wednesday August 21, 2019 2:25pm - 3:00pm PDT
Sapphire H
  Open AI
  • Session Slides Included Yes
  • Session Recorded Yes

3:15pm PDT

How Linux Foundation is Changing the (machine-learning) World! - Dr. Ofer Hermoni, Amdocs*
Open-source AI tools/solutions ARE available, but they’re not easy to implement, aren’t always compatible, & each solves only a small piece of the puzzle. That’s why, despite growing adoption, AI is still difficult to deploy. That’s also why the LF Deep Learning Foundation (LFDL) was established: to reduce solution fragmentation, encourage project, company & developer collaboration, & drive the effective use of AI tools/solutions to increase adoption/innovation. LFDL’s ground-breaking projects include Acumos AI (an open-source marketplace for machine learning models initiated by AT&T) & Horovod (a distributed training framework for TensorFlow, Keras, & PyTorch contributed by Uber). Here Dr. Ofer Hermoni explores LFDL projects & activities, including a new (very cool) AI open-source landscape tool. Ofer also presents the opportunities & benefits of actively participating in the LFDL community.

Dr. Ofer Hermoni

Chairperson, Technical Advisory Council, LF AI Foundation

Wednesday August 21, 2019 3:15pm - 3:50pm PDT
Sapphire H
  Open AI
  • Session Slides Included Yes
  • Session Recorded Yes

4:20pm PDT

Stream Processing and New Approaches: Edge Processing - Eduardo Silva, Arm / Treasure Data
What if it was possible to query your data using aggregation functions, windowing, and grouping results while the data was in motion and in-memory but on the edge side?

In data analysis, logging is one of the key components for collecting and pre-processing data. Usually, a logging mechanism collects, parses, filters and centralizes logs in a storage backend such as a database, so that data processing and analysis can be performed. This usually happens after the data has been aggregated and stored, but for real-time analysis, processing the data while it is still in motion brings many advantages. This approach is called Stream Processing.

In this presentation, we will go further and present an extended approach called 'Stream Processing on the Edge', where data is processed on the edge service or device in a lightweight mode, enabling features like anomaly detection (in the order of milliseconds) and machine learning in a distributed way using pure open source software.
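
As a rough illustration of the windowing and grouping the abstract describes, the sketch below averages in-memory records per key over fixed (tumbling) time windows using only the Python standard library. It is not taken from the talk or from any particular stream processor; the record layout `(timestamp, key, value)` and the function name are assumptions made for this example.

```python
from collections import defaultdict


def tumbling_window_avg(records, window_seconds):
    """Average values per (window start, key) over fixed-size time windows.

    `records` is an iterable of (timestamp, key, value) tuples; this layout
    is an assumption made for the example.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (window_start, key) -> [total, count]
    for ts, key, value in records:
        # Align the timestamp to the start of its window.
        window_start = int(ts // window_seconds) * window_seconds
        bucket = sums[(window_start, key)]
        bucket[0] += value
        bucket[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}


# Hypothetical metrics as they might be observed on an edge device.
records = [
    (0.5, "cpu", 10.0),
    (1.2, "cpu", 30.0),
    (4.9, "mem", 50.0),
    (5.1, "cpu", 70.0),
    (9.0, "cpu", 90.0),
]
result = tumbling_window_avg(records, 5)
print(result)
```

This version consumes records one at a time but only returns results at the end; a real stream processor would typically emit each window as soon as it closes.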

Eduardo Silva

Principal Engineer, Arm Treasure Data
Eduardo is a Principal Engineer at Arm Treasure Data. He currently leads the efforts to make logging and data processing more friendly and scalable in embedded and containerized systems such as Kubernetes. He is the maintainer of Fluent Bit, a lightweight log and stream processor. Besides his...

Wednesday August 21, 2019 4:20pm - 4:55pm PDT
Sapphire H
  Open AI
  • Session Recorded Yes

5:10pm PDT

Almond: Crowdsourcing an Open, Programmable Virtual Assistant - Giovanni Campagna, Stanford University*
Virtual assistants are fast becoming a proprietary platform duopoly that controls access to the web and has access to private information in all accounts and IoT devices. This talk will present Almond, an open, crowdsourced, privacy-preserving virtual assistant. Almond uses the crowdsourced Thingpedia skill library, currently containing over 100 services, that is open to all virtual assistants. Almond is unique in supporting event-driven commands that connect multiple skills. Almond is also federated, helping users share data at fine granularity without a third party.
Almond is built using Genie, an open-source tool that enables developers to bootstrap deep-learning natural language parsers in new domains quickly. Genie improves by over 20% on the previous state of the art. Genie is available as a web service and as a library. Almond can be run as a cloud service, a GNOME/Gtk app (on Flathub), and also a command line tool. Almond has attracted collaborations from 4 other groups to date.

Giovanni Campagna

Student, Stanford University
Giovanni is a 3rd year PhD student at the Stanford University Computer Science Department, advised by Prof. Monica Lam. His interests lie at the intersection of programming languages and natural language processing. He's the lead developer of the Almond project, an open, crowdsourced...

Wednesday August 21, 2019 5:10pm - 5:45pm PDT
Sapphire H
  Open AI
  • Session Slides Included Yes
  • Session Recorded Yes

Thursday, August 22

11:15am PDT

Federated AI in Future Digital Banking - Tianjian Chen, WeBank*
The digital banking industry has been booming in recent years. More than 600 million people in China, nearly half of the total population, can now access banking services online. Of these, 100 million have become customers of WeBank, an AI-driven, fully digital bank headquartered in Shenzhen. This talk reveals how this happened in less than five years and why WeBank is building momentum for federated AI based on open-source federated machine learning technology.


Tianjian Chen

Deputy General Manager of AI Department, WeBank
Tianjian Chen is Deputy GM of the AI Department at WeBank. Tianjian is responsible for building the Banking Intelligence Ecosystem based on Federated Learning Technology. Before joining WeBank, he was the Chief Architect of Baidu Finance and Principal Architect of Baidu. Tianjian has over...

Thursday August 22, 2019 11:15am - 11:50am PDT
Sapphire H
  Open AI
  • Session Slides Included Yes
  • Session Recorded Yes

12:05pm PDT

Introduction to Using and Use Cases of KubeFlow - Jonathan Gershater, Red Hat & Boris Lublinsky, Lightbend
Kubernetes is evolving to be the hybrid solution for deploying complex workloads on private and public clouds. Kubeflow is an open source project that provides Machine Learning (ML) resources on Kubernetes clusters.

This talk will provide an introduction to Kubeflow and its main components. Kubeflow is an open source platform for developing and running Kubernetes-native machine learning workloads. Then, we’ll walk through a small end-to-end example of machine learning using Jupyter notebooks, converting it to an MLJob and using the trained model for model serving, to demonstrate the power of Kubeflow components and its Kubernetes-native approach.

The session will include a demonstration of a machine learning model for a recommender, suggesting products based on customers’ prior purchases and on products that a company wants to promote.

Attendees will learn the basics of Kubeflow and machine learning, and how to get involved in the Kubeflow community. Code samples will be provided.

Boris Lublinsky

Principal Architect, Lightbend
Boris Lublinsky is a principal architect at Lightbend, where he specializes in big data, stream processing, and services. Boris has over 30 years’ experience in enterprise architecture. Over his career, he has been responsible for setting architectural direction, conducting architecture...

Jonathan Gershater

Senior Product Marketing Manager, Red Hat
Jonathan Gershater has lived and worked in Silicon Valley since 1996. At Red Hat, Jonathan leads market analysis for Red Hat’s cloud, container and Kubernetes solutions. Prior to Red Hat, Jonathan worked at Trend Micro, Sun Microsystems, Entrust Technologies and 3Com. Jonathan has...

Thursday August 22, 2019 12:05pm - 12:40pm PDT
Sapphire H
  Open AI
  • Session Recorded Yes

2:10pm PDT

Mindmeld: An Open-source, Deep-domain Conversational AI Toolkit to Build Advanced Enterprise Conversational Assistants - Vijay T Ramakrishnan, Cisco
Cisco's Mindmeld open-source platform allows developers to build enterprise voice and text-based AI assistants. The platform is unique in the industry due to its support for deep-domain knowledge bases, allowing developers to build rich agents that can serve complex use cases.


Vijay Ramakrishnan

Machine Learning Engineer, Cisco Inc.
Vijay Ramakrishnan is a machine learning researcher at Cisco. He is a core member of the Mindmeld team within Cisco, developing Artificial Intelligence (AI) and Natural Language Processing (NLP) applications for Cisco’s flagship products. He is an expert practitioner in developing...

Thursday August 22, 2019 2:10pm - 2:45pm PDT
Sapphire H
  Open AI
  • Session Recorded Yes

4:05pm PDT

Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark) - Trevor Grant, IBM & Holden Karau, Google
Data Science, Machine Learning, and Artificial Intelligence have exploded in popularity in the last five years, but the nagging question remains: “How to put models into production?” Engineers are typically tasked with building one-off systems to serve predictions, which must be maintained amid a quickly evolving back-end serving space that has moved from single machines, to custom clusters, to “serverless”, to Docker, to Kubernetes. In this talk, we present Kubeflow, an open source project which makes it easy for users to move models from laptop to ML rig to training cluster to deployment. We will discuss “What is Kubeflow?”, “Why is scalability so critical for training and model deployment?”, and other topics.

Kubeflow is a rapidly developing project, so this talk will include the most up-to-date information available as of conference time, including new features, recent changes, and the future roadmap.


Holden Karau

Developer Advocate, Google
Holden is a transgender Canadian open source developer advocate @ Google with a focus on Apache Spark, Beam, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer on and...

Trevor Grant

Open Source AI / IoT Evangelist, IBM
Trevor is an open source evangelist at IBM in Watson IoT. He is also a PMC member of the Apache Mahout, Apache Streams, and Apache Community Development projects. He has spoken at conferences and meetups internationally.

Thursday August 22, 2019 4:05pm - 4:40pm PDT
Sapphire H
  Open AI
  • Session Recorded Yes

4:55pm PDT

BoF: Angel 3.0: A Full Stack Machine Learning Platform - Fitz Wang, Tencent
A mature machine learning pipeline includes components such as feature engineering, model training, hyperparameter tuning, and model serving. Angel 2.x already supports huge recommendation models with sparse input data; the new Angel 3.0, which aims to be a full-stack machine learning platform, completes the other components. First, auto feature engineering (AFE) is supported. Second, we provide auto hyperparameter tuning based on Bayesian optimization. Third, we provide a cross-platform model serving system that can serve models from Angel, Spark, XGBoost, and PyTorch. Apart from completing the pipeline, a new PyTorch engine for Angel is introduced. PyTorch is used for forward and backward propagation to obtain gradients, while the Angel parameter server stores, synchronizes and updates parameters. Consequently, we provide a variety of graph embedding and GNN algorithms. Moreover, we have adapted Spark on Angel to Spark 2.4 and added Kubernetes support, so the DataFrame API and Spark Pipeline are supported.


Fitz Wang


Thursday August 22, 2019 4:55pm - 5:30pm PDT
Indigo D
  Open AI
  • Session Recorded Yes