Agenda

Explore the full two-day session lineup, and stay tuned for additional exciting sessions and speakers.

Day 1 - June 07

8:00 AM PDT
11:00 AM EDT

Opening Keynote: Bridging the Last Mile: Applying Foundation Models with Data-Centric AI

Alex Ratner

CEO and Co-founder
Snorkel AI

Today, large language models and other Foundation Models (FMs) represent one of the most powerful new ways to build AI models; however, they still struggle to achieve production-level accuracy out of the box on complex, high-value, and/or dynamic use cases, often “hallucinating” facts, propagating data biases, and misclassifying domain-specific edge cases. This “last mile” problem is always the hardest part of shipping real AI applications, especially in the enterprise, and while FMs provide powerful foundations, they do not “build the house”.

In this talk, I’ll provide an overview of how this last-mile adaptation is increasingly all about the data (not, e.g., the model architecture, hyperparameters, or algorithms). I’ll survey modern data-centric AI development approaches to this problem and preview new state-of-the-art techniques and tools for handling all stages of data-centric development for foundation models, from pre-training to instruction tuning and alignment to task-specific fine-tuning and distillation.

9:00 AM PDT
12:00 PM EDT

Fireside Chat: Alex Ratner and Gideon Mann on building BloombergGPT

Alex Ratner

CEO and Co-founder
Snorkel AI

Gideon Mann

Head of Machine Learning Product and Research / CTO Office
Bloomberg LP

Join us for a fireside chat with Gideon Mann, Head of Machine Learning Product and Research in Bloomberg’s CTO Office, about how his team built BloombergGPT, a domain-specific LLM.

9:30 AM PDT
12:30 PM EDT

Fireside chat: The role of data in building Stable Diffusion and Generative AI

Alex Ratner

CEO and Co-founder
Snorkel AI

Emad Mostaque

Founder and CEO
Stability AI

Discover the transformative power of data in developing Stable Diffusion and Generative AI as Emad Mostaque shares insights into the pivotal role data plays in creating these groundbreaking technologies. Explore the journey of leveraging data-driven approaches to drive innovation, unlock new possibilities, and shape the future of AI.

10:15 AM PDT
1:15 PM EDT

Panel – The Linux Moment of AI: Open Sourced AI Stack

Demetrios Brinkmann

Founder
MLOps Community

Ed Shee

Head of Developer Relations
Seldon

Julien Simon

Chief Evangelist
Hugging Face

Travis Addair

Chief Technology Officer
Predibase

In this panel, seasoned experts Julien, Ed, and Travis will delve into how open-source models and tools can revolutionize AI. Julien will shed light on projects like Big Science and explore how open-source projects can lead to a more adaptable AI stack, empowering developers to create use-case-specific solutions. With his vast experience in deploying and monitoring AI systems, Ed will discuss how open-source aids these processes and the challenges and potential solutions when scaling these systems. Meanwhile, Travis will share insights from his work with Ludwig, demonstrating how open-source innovation fosters faster, easier, and more collaborative development. As we witness the evolution of applications like ChatGPT, our panelists will discuss the open-source community’s crucial role in steering future developments and ensuring the ethical and responsible use of such technologies.

10:15 AM PDT
1:15 PM EDT

A Practical Guide to Data Centric AI – A Conversational AI Use Case

Daniel Lieb

Senior Director, Model Risk Management
Ally Financial

Samira Shaikh

Director, Data Science
Ally Financial

In this talk, we will provide real-world examples of how data-centric AI is being used to solve complex problems at Ally. We will dive deep into an innovative use of data-centric AI, specifically using Generative AI and LLMs to set up Conversational AI for Ally Auto customers. Overall, this talk will provide insights into how data-centric AI can be used in a practical sense to drive innovation and create value in industry.

10:45 AM PDT
1:45 PM EDT

Panel – Adopting AI: With Power Comes Responsibility

Aarti Bagul

Principal ML Solutions Engineer
Snorkel AI

Daniel Wu

Head of AI & Machine Learning, Commercial Banking
JPMorgan Chase & Co.

Vijay Janapa Reddi

John L. Loeb Associate Professor of Engineering and Applied Sciences
Harvard University

In our panel session, we’ll dissect the complexities inherent to responsibly leveraging Generative AI in the midst of an escalating ML arms race. We’ll probe into the ethical implications of large-scale AI experiments and the ongoing parameter wars, weighing the computational demand against potential fallout. As AI regulation efforts globally accelerate, we’ll discuss their influence on deep learning trajectories and the necessary proactive engagement from organizations. In anticipation of a rise in AI incidents due to rapid ML model scaling and hardware complexity, we’ll explore risk mitigation strategies. Lastly, we’ll delve into the increasing significance of data-centric ML systems, focusing on high-quality data acquisition and the role of data quality in minimizing risks associated with swift AI advancements. Join us as we navigate the technical intricacies of AI innovation and responsibility.

10:45 AM PDT
1:45 PM EDT

The Future is Neurosymbolic

Yoav Shoham

Cofounder
AI21 Labs

11:15 AM PDT
2:15 PM EDT

Generating Synthetic Tabular Data That’s Differentially Private

Lipika Ramaswamy

Senior Applied Scientist
Gretel AI

While generative models are able to produce synthetic datasets that preserve the statistical qualities of the training dataset without identifying any particular record in the training dataset, most generative models to date do not offer mathematical guarantees of privacy that can be used to facilitate information sharing or publishing. Without such mathematical guarantees, each adversarial attack on these models and the synthetic data they generate needs to be thwarted reactively. We can never be aware of adversarial attacks that might become feasible in the future. This is exactly the problem that differential privacy (DP) solves by bounding the probability that a compromising event occurs. By introducing calibrated noise into an algorithm, DP defends against all future privacy attacks with a high probability. In this session, we’ll explore approaches to applying differential privacy, including one that relies on measuring low dimensional distributions in a dataset combined with learning a graphical model representation. We'll end with a preview of Gretel's new generative model that applies this method to create high-quality synthetic tabular data that is differentially private.
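
To make the core idea concrete, here is a minimal, hypothetical sketch of the Laplace mechanism, the classic way of adding calibrated noise so that a released statistic satisfies epsilon-differential privacy. It is an illustration of the concept only (not Gretel's implementation), and the records and query are made up:

```python
import numpy as np

def laplace_count(records, predicate, epsilon):
    """Release a differentially private count.

    A counting query has sensitivity 1 (adding or removing one record changes
    the count by at most 1), so Laplace noise with scale 1/epsilon yields an
    epsilon-differentially-private answer.
    """
    true_count = sum(1 for r in records if predicate(r))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical usage: a private count of high-income records at epsilon = 0.5.
records = [{"income": 85_000}, {"income": 120_000}, {"income": 60_000}]
print(laplace_count(records, lambda r: r["income"] > 100_000, epsilon=0.5))
```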

11:15 AM PDT
2:15 PM EDT

Fireside Chat: The Building Blocks of Modern Enterprise AI

Aparna Lakshmi Ratan

VP Product
Snorkel AI

Marco Casalaina

Vice President of Products
Azure Cognitive Services

In this illuminating fireside chat, we dive into the heart of modern enterprise AI, exploring the dynamic intersection of data, models, and MLops platforms that define the new ML stack. We’ll investigate how factors such as model form factors, data types, use case variety, enterprise constraints, and the use of private data in AI applications shape this landscape, all while casting an anticipatory gaze towards the future of AI. As we decode the intricacies of today’s AI environment and predict tomorrow’s game-changers, this session offers a comprehensive insight into the building blocks of modern enterprise AI.

11:45 AM PDT
2:45 PM EDT

Panel: Navigating the LLM Labyrinth in a World of Rules

Chris Booth

Product Owner - Machine Learning
NatWest Group

Harshini Jayaram

Director Business Operations and Growth
Snorkel AI

Nadia Wood

Principal Product Manager
Mayo Clinic

In this session, we’ll dive into the intricacies of Large Language Models (LLMs) within regulated industries. Our expert panel will discuss strategies for tuning LLMs to reduce misinterpretations and errors in conversational AI applications, emphasizing the necessity of precision in such sectors. They’ll explore the challenges and potential solutions organizations might encounter when transitioning from rules-based approaches to LLMs. Further, they will shed light on strategies to ensure LLMs’ compliance and predictability, given their capacity for creativity, and discuss how to manage data privacy concerns in LLM training. Finally, our panelists will delve into the role of data-centric approaches and programmatic labeling in aligning LLMs’ behavior with industry-specific requirements and ethical norms.

11:45 AM PDT
2:45 PM EDT

DataComp: In search of the next generation of multimodal datasets

Ludwig Schmidt

Assistant Professor in Computer Science
University of Washington

Large multimodal datasets have been instrumental in recent breakthroughs such as CLIP, Stable Diffusion, and GPT-4. At the same time, datasets rarely receive the same attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a benchmark where the training code is fixed and researchers innovate by proposing new training sets. We provide a testbed for dataset experiments centered around a new candidate pool of 12.8B image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing on 38 downstream test sets. Our benchmark consists of multiple scales, which facilitates the study of scaling trends and makes the benchmark accessible to researchers with varying resources.

Our baseline experiments show that the DataComp workflow is a promising way of improving multimodal datasets. We introduce a new dataset DataComp-1B and show that CLIP models trained on this dataset outperform OpenAI’s CLIP model by 3.7 percentage points on ImageNet while using the same compute budget. Compared to LAION-5B, our data improvement corresponds to a 9x improvement in compute cost.
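
As a rough illustration of what a DataComp submission involves, the sketch below filters a candidate pool of image-text pairs by a precomputed CLIP similarity score. The field names and threshold are assumptions made for illustration, not DataComp's actual data format or API:

```python
def filter_by_clip_score(candidate_pool, threshold=0.3):
    """Keep image-text pairs whose precomputed CLIP image-text similarity
    exceeds a threshold: one simple family of filtering strategies a
    participant might evaluate with the fixed CLIP training code."""
    return [ex for ex in candidate_pool if ex["clip_score"] >= threshold]

# Hypothetical pool entries: an image URL, its caption, and a precomputed score.
pool = [
    {"url": "https://example.com/a.jpg", "caption": "a red bicycle", "clip_score": 0.41},
    {"url": "https://example.com/b.jpg", "caption": "IMG_0042", "clip_score": 0.12},
]
training_subset = filter_by_clip_score(pool)  # the kept pairs become the training set
```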

12:15 PM PDT
3:15 PM EDT

Mindful reset

Karla Arteaga

Operations Specialist
Snorkel AI

12:40 PM PDT
3:40 PM EDT

Poster Competition: Procedure-Aware Pretraining for Instructional Video Understanding

Honglu Zhou

PhD Student
Rutgers University

Instructional videos depict humans demonstrating how to perform multi-step tasks such as cooking and repairing. Building good video representations from instructional videos is challenging due to the small amount of video annotations available, which makes it difficult to extract procedural knowledge such as the identity of the task (e.g., ‘make latte’) and its steps (e.g., ‘pour milk’). Our insight is that instructions for procedures depict sequences of steps that repeat between instances of the same or different tasks, and that this structure can be well represented by a Procedural Knowledge Graph, where nodes are discrete steps and edges connect steps that occur sequentially in the instructional activities. This graph can then be used to generate pseudo labels to train a video representation that encodes the procedural knowledge in a more accessible form and generalizes to multiple procedure-understanding tasks. We call this Procedural Knowledge Graph-based pre-training method, and the resulting model, Paprika: Procedure-Aware PRe-training for Instructional Knowledge Acquisition. We evaluate Paprika on COIN and CrossTask for procedure-understanding tasks such as task recognition, step recognition, and step forecasting. Paprika yields a video representation that improves over the state of the art: up to 11.23% gains in accuracy across 12 evaluation settings.
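
As a toy sketch of the graph construction described above (with made-up step sequences, not the Paprika codebase), nodes are discrete steps and directed edges link steps that occur consecutively:

```python
import networkx as nx

# Hypothetical step sequences from two instructional videos.
sequences = [
    ["grind beans", "pour espresso", "steam milk", "pour milk"],
    ["steam milk", "pour milk", "pour latte art"],
]

# Nodes are discrete steps; a directed edge links steps that occur
# consecutively in any task, weighted by how often the transition appears.
pkg = nx.DiGraph()
for steps in sequences:
    for a, b in zip(steps, steps[1:]):
        if pkg.has_edge(a, b):
            pkg[a][b]["count"] += 1
        else:
            pkg.add_edge(a, b, count=1)

# The graph can then drive pseudo labels, e.g. plausible next steps.
print(list(pkg.successors("pour milk")))  # ['pour latte art']
```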

12:55 PM PDT
3:55 PM EDT

Poster Competition: JoinBoost: Tree Training with just SQL

Zachary Huang

PhD Student
Columbia University

Data and machine learning (ML) are crucial for enterprise operations. Enterprises store data in databases for management and use ML to gain business insights. However, there is a mismatch between the way ML expects data to be organized (a single table) and the way data is organized in databases (a join graph of multiple tables). Current specialized ML libraries (e.g., LightGBM, XGBoost) necessitate data denormalization, data export, and data import, as they operate as separate programs incompatible with databases. The existing method not only increases operational complexity but also faces scalability limitations, slower performance, and security risks. But what if there was a way to achieve competitive tree training performance with just SQL? We present JoinBoost, a lightweight Python library that transforms tree training algorithms over normalized databases into pure SQL queries. Compatible with any DBMS and data stack, JoinBoost is a simplified, all-in-one data stack solution that avoids data denormalization, export, and import. JoinBoost delivers exceptional performance and scalability tailored to the capabilities of the underlying DBMS. Our experiments reveal that JoinBoost is 3x (1.1x) faster for random forests (gradient boosting) when compared to LightGBM, and scales well beyond LightGBM in terms of features, DB size, and join graph complexity.
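
To illustrate the general idea of pushing training work into the database (a simplified sketch of the concept, not JoinBoost's actual API), the statistics a tree learner needs for split scoring can be computed with a single SQL query over the join graph, without ever materializing or exporting a denormalized table:

```python
import sqlite3

# Two normalized tables: an orders fact table and a customers dimension table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (cid INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (oid INTEGER PRIMARY KEY, cid INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'east'), (2, 'west');
INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.0), (12, 2, 3.0);
""")

# The per-split statistics a tree learner needs (count and sum of the target
# for each candidate split value) are computed inside the database, over the join.
rows = con.execute("""
    SELECT c.region, COUNT(*) AS n, SUM(o.amount) AS total
    FROM orders o JOIN customers c ON o.cid = c.cid
    GROUP BY c.region
""").fetchall()
print(rows)  # e.g. [('east', 2, 12.0), ('west', 1, 3.0)]
```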

1:10 PM PDT
4:10 PM EDT

Poster Competition: Data-IQ: Characterize & Audit your training data with 2 lines of code!

Nabeel Seedat

PhD Student
University of Cambridge

High model performance, on average, can hide that models may systematically underperform on subgroups of the data. To tackle this, we propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes, allowing users to audit their tabular, image or text data with just two lines of extra code!

We do this by analyzing the behavior of individual examples during training, based on their predictive confidence and, importantly, the aleatoric (data) uncertainty. Capturing the aleatoric uncertainty permits a principled characterization and then subsequent stratification of data examples into three distinct subgroups (Easy, Ambiguous, Hard). We show that Data-IQ's characterization of examples is most robust to variation across similarly performant (yet different models), compared to baselines. Since Data-IQ can be used with any ML model (including neural networks, gradient boosting etc.), this property ensures consistency of data characterization, while allowing flexible model selection. Taking this a step further, we demonstrate that the subgroups enable us to construct new approaches to both feature acquisition and dataset selection. Furthermore, we highlight how the subgroups can inform reliable model usage, noting the significant impact of the Ambiguous subgroup on model generalization.
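
The sketch below illustrates the stratification idea with hypothetical thresholds on per-example confidence and aleatoric uncertainty; it is a simplified illustration of the concept, not the Data-IQ library's API:

```python
import numpy as np

def stratify(confidence, aleatoric_uncertainty, conf_thresh=0.75, unc_thresh=0.2):
    """Illustrative stratification in the spirit of Data-IQ (not the library's API).

    confidence:            per-example mean predictive confidence over training checkpoints
    aleatoric_uncertainty: per-example mean aleatoric (data) uncertainty over the same checkpoints
    """
    groups = np.full(len(confidence), "Ambiguous", dtype=object)
    groups[(confidence >= conf_thresh) & (aleatoric_uncertainty < unc_thresh)] = "Easy"
    groups[(confidence <= 1 - conf_thresh) & (aleatoric_uncertainty < unc_thresh)] = "Hard"
    return groups

# Toy example: three training examples.
conf = np.array([0.95, 0.55, 0.10])
unc = np.array([0.05, 0.30, 0.08])
print(stratify(conf, unc))  # ['Easy' 'Ambiguous' 'Hard']
```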

1:30 PM PDT
4:30 PM EDT

LLMOps: Making LLM Applications Production-Grade

Matei Zaharia

Cofounder & Chief Technologist
Databricks

Large language models are fluent text generators, but they struggle at generating factual, correct content. How can we convert these capabilities into reliable, production-grade applications? In this talk, I'll cover several techniques to do this based on my work and experience at Stanford and Databricks. On the research side, we've been developing programming frameworks such as Demonstrate-Search-Predict (DSP) that reliably connect an LLM to factual information and automatically improve the app's performance over time. On the industry side, Databricks has been building a stack of simple yet powerful tools for "LLMOps" into the MLflow open source framework.
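
As a library-agnostic sketch of the retrieve-then-predict pattern that frameworks like DSP formalize (the `search` and `generate` callables below are hypothetical stand-ins for a retriever and an LLM, not DSP's API):

```python
def answer(question, search, generate, k=3):
    """Ground an LLM answer in retrieved passages before generating."""
    passages = search(question, k=k)
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)

# Toy usage with stub components standing in for a real retriever and LLM.
fake_search = lambda q, k=3: ["MLflow is an open source framework for managing the ML lifecycle."]
fake_generate = lambda prompt: "MLflow"
print(answer("Which open source framework manages the ML lifecycle?", fake_search, fake_generate))
```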

2:00 PM PDT
5:00 PM EDT

Data-Driven Government: A Fireside Chat with the Former U.S. Chief Data Scientist

Alex Ratner

CEO and Co-founder
Snorkel AI

DJ Patil

General Partner
GreatPoint Ventures

Join us for an engaging fireside chat as we delve into data science's history, impact, and challenges in the United States government. Our esteemed guest, the former U.S. Chief Data Scientist, will share insights into the origins of this vital role and their experiences in managing critical initiatives. Discover the strategies to drive data-driven decisions within the complex governmental landscape and explore the balance between data scientists and engineers. Gain valuable perspectives on the future of AI/ML, the ethical considerations in data science, and the transformative potential of leveraging data to better society. Don't miss this opportunity to unravel the fascinating data science journey at the highest levels of governance.

2:30 PM PDT
5:30 PM EDT

Day 1 Recap + Poster Competition Winners

Devang Sachdev

VP of Marketing
Snorkel AI

A look back at the highlights of day one, plus the announcement of the poster competition winners.

Day 2 - June 08

8:00 AM PDT
11:00 AM EDT

Opening Keynote: New introductions from Snorkel AI

Alex Ratner

CEO and Co-founder
Snorkel AI

Join Snorkel AI CEO, Alex Ratner, as he introduces our latest programmatic AI solutions. Designed to optimize foundation model development, enhance data development, and accelerate customization, these new solutions mark a leap in AI technology. Be the first to explore how Snorkel AI makes it practical for enterprises to leverage LLMs and foundation models.

8:30 AM PDT
11:30 AM EDT

Fireside Chat: Journey of Data: Transforming the Enterprise with Data-Centric Workflows

Alex Ratner

CEO and Co-founder
Snorkel AI

Nurtekin Savas

Head of Enterprise Data Science
Capital One

Join Nurtekin Savas, Head of Enterprise Data Science at Capital One, as he embarks on an insightful exploration of the data's journey across an enterprise. In this session, Nurtekin will unravel how data, from its creation to its ultimate insights, navigates and transforms within the complex enterprise stack. He will spotlight the power of data-centric workflows and their crucial role in driving business decisions, improving operational efficiency, and fueling AI innovation.

9:00 AM PDT
12:00 PM EDT

The Opportunity of Data Centric AI in Insurance

Alejandro Zarate Santovena

Lecturer and Managing Director
Columbia University / Marsh

9:00 AM PDT
12:00 PM EDT

Accelerate ML Adoption by Addressing Hidden Needs

Max Williams

AI Platform Product Manager
Wells Fargo

While there has been, and continues to be, substantial strategic investment in ML, that investment is rarely rewarded with an attractive return. The industry has seen rapid evolution in capabilities and dramatic improvements in cost efficiency, yet an attractive return on investment remains elusive. This discussion will focus on the hidden needs that must be addressed before ML can break free from strategic investment and enjoy broad adoption within the enterprise.

9:30 AM PDT
12:30 PM EDT

Transforming the Customer Experience with AI: Wayfair’s Data-Centric Way

Archana Sapkota

ML Manager
Wayfair

Vinny DeGenova

Associate Director of Machine Learning
Wayfair

In this talk, we will walk through the problems we solve at Wayfair using machine learning, which impact all aspects of a customer's journey. We will provide insights on how we use ML to understand our customers as well as the products in our catalog. We will also discuss some of the challenges we face in our space and how we are using ML best practices, state of the art foundation models, and data-centric approaches to solve these problems.

One way we help our customers find products is by cleaning and enriching our catalog. We do this by automating image tagging using a data-centric approach. We will provide insights on how we have accomplished this and share our findings.

Finally, we will touch on an important aspect of our approach: the collaboration between subject matter experts (SMEs) and data scientists (DS). By working closely together, we are able to quickly iterate on model development and testing, ultimately leading to a faster time-to-market for the models we develop.

9:30 AM PDT
12:30 PM EDT

Unleashing Human Potential with AI Augmentation

Bryan Wood

Data Science Executive
Bank of America

In this presentation, we will delve into the innovative ways artificial intelligence can augment and assist human capabilities, leading to novel applications in various domains. We will touch upon personal experience that demonstrates the power of AI in enhancing human creativity. Further, we will draw parallels to commercial work, showcasing how these techniques can be applied generically across multiple industries. By illustrating the vast potential of AI in both unique and professional contexts, this talk aims to inspire and inform attendees about the limitless possibilities AI offers in enhancing human potential.

10:15 AM PDT
1:15 PM EDT

Tackling advanced classification using Snorkel Flow

Angela Fox

Staff Product Designer
Snorkel AI

Vincent Chen

Director of Product / Founding Engineer
Snorkel AI

In this talk, we’ll discuss the key challenges and approaches for productionizing classification models in the age of foundation models. To start, we’ll highlight common but underrated challenges related to label schema definition, high cardinality, and multi-label problem formulations. We’ll dive into specific user experiences in Snorkel Flow to overcome these challenges, including ways to leverage foundation models, targeted error analysis, and supervision from subject matter experts. Finally, we’ll zoom out with a few case studies to describe how enterprise teams leverage data-centric workflows to build high-quality production models and unblock previously untenable problems in Snorkel Flow.

10:15 AM PDT
1:15 PM EDT

Combining domain knowledge with data to track and predict heavy-equipment service events

Davide Gerbaudo

Sr. Data Scientist
Caterpillar

In this talk, we will illustrate how a century-old company like Caterpillar combines its domain knowledge with data to develop modern analytics that provides value to the enterprise, its dealership network, and its customers. In particular, we will describe how domain expertise and data are used to classify and predict repair events of heavy equipment.

10:45 AM PDT
1:45 PM EDT

Accelerating information extraction with data-centric iteration

John Semerdjian

Tech Lead Manager, Applied Machine Learning
Snorkel AI

Vincent Chen

Director of Product / Founding Engineer
Snorkel AI

During this session, we’ll discuss practical workflows for building enterprise information extraction applications. We’ll start with an end-to-end deep dive into “sequence tagging” tasks in Snorkel Flow, where we’ll highlight how teams of data scientists and subject matter experts can rapidly build powerful, zero-to-one models. In doing so, we’ll cover the key annotation, error analysis, and model-guided iteration capabilities that have helped our customers unblock models that power high-value use cases in production. Finally, we’ll discuss exciting opportunities for even further acceleration of these workflows in an FM-first world.

10:45 AM PDT
1:45 PM EDT

Data Driven AI for Threat Detection

Debabrata Dash

Distinguished Data Scientist
Arista

Network security has been a complex area in which to apply traditional machine learning. The number of possible threats is vast, but the number of labeled attack samples is very small. Moreover, by the time enough sample data has been collected for a particular type of threat, the threat vector has changed.

While collecting samples of true positives is difficult, security analysts usually have good mental heuristics about how threats behave. They manually “execute” these heuristics to identify threats within massive volumes of network data, typically after unsupervised techniques have surfaced anomalies and outliers. While this works well in practice, the approach is computationally expensive, owing to the nature of the unsupervised algorithms, and its accuracy in the field is unpredictable.

Weak supervision provides an alternative approach to utilizing the heuristics to identify the threats. It allows us to push the heuristics to the raw data to help us build more efficient models with predictable accuracy. In this talk, I will discuss one prototype of using weak supervision in the cyber security domain with exciting results.
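
A minimal sketch of that pattern using Snorkel's open-source labeling-function API is shown below. The heuristics, thresholds, and feature names are hypothetical illustrations, not the prototype described in the talk:

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

THREAT, BENIGN, ABSTAIN = 1, 0, -1

# Hypothetical analyst heuristics encoded as labeling functions; the columns
# (bytes_out, dst_port, failed_logins) are an illustrative schema.
@labeling_function()
def lf_large_exfil(x):
    return THREAT if x.bytes_out > 50_000_000 else ABSTAIN

@labeling_function()
def lf_many_failed_logins(x):
    return THREAT if x.failed_logins > 20 else ABSTAIN

@labeling_function()
def lf_common_web_port(x):
    return BENIGN if x.dst_port in (80, 443) and x.failed_logins == 0 else ABSTAIN

df = pd.DataFrame({
    "bytes_out": [120_000_000, 4_000, 9_000],
    "dst_port": [4444, 443, 443],
    "failed_logins": [35, 0, 2],
})

# Apply the heuristics to raw network records, then combine their votes.
L = PandasLFApplier([lf_large_exfil, lf_many_failed_logins, lf_common_web_port]).apply(df)
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L, n_epochs=200, seed=0)
probs = label_model.predict_proba(L)  # probabilistic threat labels for model training
```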

11:15 AM PDT
2:15 PM EDT

Comcast SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale

Raphael Tang

Lead Research Scientist
Comcast Applied AI

End-to-end automatic speech recognition systems represent the state of the art, but they rely on thousands of hours of manually annotated speech for training, as well as heavyweight computation for inference. Of course, this impedes commercialization since most companies lack vast human and computational resources. In this paper, we explore training and deploying an ASR system in the label-scarce, compute-limited setting. To reduce human labor, we use a third-party ASR system as a weak supervision source, supplemented with Snorkel labeling functions derived from implicit user feedback. To accelerate inference, we propose to route production-time queries across a pool of CUDA graphs of varying input lengths, the distribution of which best matches the traffic's. Compared to our third-party ASR, we achieve a relative improvement in word-error rate of 8% and a speedup of 600%. Our system, called SpeechNet, currently serves 12 million queries per day on our voice-enabled smart television.

11:15 AM PDT
2:15 PM EDT

Applying weak supervision and foundation models for computer vision

Ravi Teja Mullapudi

Ravi Teja Mullapudi

ML Research Scientist
Snorkel AI

In this session, we will explore the latest advancements in computer vision that enable data-centric image classification model development. We will showcase how visual prompts and fast parameter-efficient models built on top of foundation models provide immediate feedback to rapidly iterate on data quality and model performance resulting in significant time-savings and performance improvements. Moreover, we will delve into the importance of adapting model representations via large-scale fine-tuning on weakly labeled data to address the limitations of fast but small models trained on fixed features. Finally, we will discuss the necessary scaling and model adaptations needed to transition from image-level classification to object-level detection and segmentation. Overall, this talk aims to provide insights into how computer vision data and models can be effectively improved in tandem and adjusted for downstream applications.

11:45 AM PDT
2:45 PM EDT

Mindful Reset

Karla Arteaga

Operations Specialist
Snorkel AI

12:15 PM PDT
3:15 PM EDT

AI and the Future of Tax

Ken Priyadarshi

EY Global Prompt Engineering Leader
EY

AI is transforming the tax services sector. Learn how organizations are adapting and leveraging generative AI and machine learning to prepare for the future.

12:15 PM PDT
3:15 PM EDT

Leveraging Data-centric AI for Document Intelligence and PDF Extraction

Ashwini Ramamoorthy

ML Engineer
Snorkel AI

Extracting entities from semi-structured documents is often a challenging task, requiring complex and time-consuming manual processes. In this session, we will explore how data-centric AI can be leveraged to simplify and streamline this process. We will start by discussing the challenges associated with extracting entities from PDFs and other semi-structured documents, and explore how they can be overcome using Snorkel’s data-centric approach. Finally, we will dive into how foundation models can be utilized to further accelerate the development of these extraction models.

12:45 PM PDT
3:45 PM EDT

Leveraging foundation models and LLMs for enterprise-grade NLP

Kristina Liapchin

Lead Product Manager
Snorkel AI

In recent years, large language models (LLMs) have shown tremendous potential in solving natural language processing (NLP) problems. However, deploying LLMs in the enterprise comes with its own set of challenges, especially when it comes to adapting the models to customer-specific data and incorporating domain knowledge. In this talk, we will explore how Snorkel AI can help address these challenges and enable businesses to leverage LLMs to extract insights from text data. We will walk through how Snorkel Flow can enable businesses to drive value from LLMs today, making the most of enterprise-grade NLP.

12:45 PM PDT
3:45 PM EDT

Bias Busters: Strategies for Monitoring, Managing, and Mitigating AI Bias

Nataraj Prakash

Vice President, Analytics Digital Platform & Program Delivery
Kaiser Permanente

Dive into the world of AI bias. This talk explores the pervasive issue of AI bias and its implications. Understand various forms of bias, from data and perception bias to survivorship and availability bias, and how they influence AI models. Learn practical strategies to counteract these biases, such as A/B testing, bias detection during model training, and comprehensive monitoring during model scoring. The talk concludes with a focus on action steps post-detection, including model retraining and the selection of challenger models, with the goal of achieving equitable outcomes, enhancing transparency, and meeting evolving regulatory requirements.
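
As one concrete example of the kind of monitoring check discussed here (an illustrative sketch, not Kaiser Permanente's pipeline), a disparate-impact ratio compares positive-outcome rates across groups during model scoring:

```python
import numpy as np
import pandas as pd

def disparate_impact(predictions, groups, protected, reference):
    """Ratio of positive-outcome rates between a protected group and a
    reference group; values far below 1.0 flag the model for review."""
    df = pd.DataFrame({"pred": predictions, "group": groups})
    rate = df.groupby("group")["pred"].mean()
    return rate[protected] / rate[reference]

# Toy scoring batch with hypothetical group labels.
preds = np.array([1, 0, 1, 1, 0, 1, 0, 0])
grp = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(disparate_impact(preds, grp, protected="B", reference="A"))  # 0.25 / 0.75 = 0.33...
```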

1:15 PM PDT
4:15 PM EDT

Lessons From a Year with Snorkel: Data-Centric Workflows with SMEs at Georgetown

James Dunham

NLP Engineer
Georgetown University Center for Security and Emerging Technology

When the Center for Security and Emerging Technology began experimenting with Snorkel, we had two high-level goals: to address recurring bottlenecks in our ML projects and to improve collaborative workflows between data scientists and subject-matter experts. In this talk, we share takeaways from the half-dozen project teams who used Snorkel in the past year. We identify friction points in adoption, summarize feedback from SMEs, and discuss which challenges Snorkel has helped us address and which remain.

1:15 PM PDT
4:15 PM EDT

The future of AI is hybrid

Jilei Hou

VP of Engineering & Head of AI Research
Qualcomm

As generative AI adoption grows at record-setting speeds and computing demands increase, hybrid processing is more important than ever. But just like traditional computing evolved from mainframes and thin clients to today’s mix of cloud and edge devices, AI processing must be distributed between the cloud and devices for AI to scale and reach its full potential.
In this talk, you’ll learn:
• Why on-device AI is key
• Which generative AI models can run on device
• Why the future of AI is hybrid
• Qualcomm Technologies’ role in making hybrid AI a reality

1:45 PM PDT
4:45 PM EDT

Fireside chat: Building RedPajama

Braden Hancock

Co-founder and Head of Technology
Snorkel AI

Ce Zhang

CTO
Together

Foundation models such as GPT-4 have driven rapid improvement in AI. However, the most powerful models are closed commercial models or only partially open. RedPajama is a project to create a set of leading, fully open-source models. In this session, Ce Zhang, CTO of Together, will discuss the data collection and training processes that went into building the RedPajama models.

2:15 PM PDT
5:15 PM EDT

Day 2 Recap

Devang Sachdev

VP of Marketing
Snorkel AI

4:00 PM PDT
7:00 PM EDT

In-person meetup: Snorkel AI Headquarters – Redwood City, California

Bay Area Data Science Community

Snorkel AI

Join us for an in-person meetup at Snorkel AI headquarters in Redwood City! Expect brews, artisan pizza, a stand-up comedy moment, and an unofficial cornhole tournament (we’re seeking a challenger for a Snorkel AI co-founder who has real game). Bring two friends!

Date: June 8
Time: 4 PM - 7 PM
Address: 55 Perry Street, Redwood City, CA
RSVP: events@snorkel.ai

Parking: Plentiful and close by. Metered at $1/hour.
Caltrain: Redwood City stop, 3-minute walk

Watch on demand

Watch all of the live sessions on-demand and discover the latest developments in data-centric AI.