Comcast SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale

John Marini

End-to-end automatic speech recognition systems represent the state of the art, but they rely on thousands of hours of manually annotated speech for training, as well as heavyweight computation for inference. Of course, this impedes commercialization since most companies lack vast human and computational resources. In this paper, we explore training and deploying an ASR system in the label-scarce, compute-limited …

Fireside chat: Building RedPajama

Jeanette Price

Foundation models such as GPT-4 have driven rapid improvement in AI. However, the most powerful models are either closed commercial models or only partially open. RedPajama is a project to create a set of leading, fully open-source models. In this session, Ce Zhang, CTO of Together, will discuss the data collection and training processes that went into building the RedPajama models.

Opening Keynote: New introductions from Snorkel AI

Jeanette Price

Join Snorkel AI CEO, Alex Ratner, as he introduces our latest programmatic AI solutions. Designed to optimize foundation model development, enhance data development, and accelerate customization, these new solutions mark a leap in AI technology. Be the first to explore how Snorkel AI makes it practical for enterprises to leverage LLMs and foundation models.

Lessons From a Year with Snorkel: Data-Centric Workflows with SMEs at Georgetown

Jeanette Price

When the Center for Security and Emerging Technology began experimenting with Snorkel, we had two high-level goals: to address recurring bottlenecks in our ML projects and to improve collaborative workflows between data scientists and subject-matter experts. In this talk, we share takeaways from the half-dozen project teams who used Snorkel in the past year. We identify friction points in …

AI and the Future of Tax

John Marini

AI is transforming the tax services sector. Learn how organizations are adapting to and leveraging generative AI and machine learning to prepare for the future.

Leveraging foundation models and LLMs for enterprise-grade NLP

Jeanette Price

In recent years, large language models (LLMs) have shown tremendous potential in solving natural language processing (NLP) problems. However, deploying LLMs in enterprise comes with its own set of challenges, especially when it comes to adapting the models to customer-specific data and incorporating domain knowledge. In this talk, we will explore how Snorkel AI can help address these challenges and …

Leveraging Data-centric AI for Document Intelligence and PDF Extraction

Jeanette Price

Extracting entities from semi-structured documents is often a challenging task that requires complex, time-consuming manual processes. In this session, we will explore how data-centric AI can be leveraged to simplify and streamline this process. We will start by discussing the challenges of extracting entities from PDFs and other semi-structured documents, then explore how they can be overcome using Snorkel’s data-centric approach. Finally, we …

Applying weak supervision and foundation models for computer vision

Jeanette Price

In this session, we will explore the latest advancements in computer vision that enable data-centric image classification model development. We will showcase how visual prompts and fast, parameter-efficient models built on top of foundation models provide immediate feedback for rapid iteration on data quality and model performance, resulting in significant time savings and performance improvements. Moreover, we will delve into the …

Tackling advanced classification using Snorkel Flow

John Marini

In this talk, we’ll discuss the key challenges and approaches for productionizing classification models in the age of foundation models. To start, we’ll highlight common but underrated challenges related to label schema definition, high cardinality, and multi-label problem formulations. We’ll dive into specific user experiences in Snorkel Flow to overcome these challenges, including ways to leverage foundation models, targeted error …

Accelerating information extraction with data-centric iteration

John Marini

During this session, we’ll discuss practical workflows for building enterprise information extraction applications. We’ll start with an end-to-end deep dive into “sequence tagging” tasks in Snorkel Flow, where we’ll highlight how teams of data scientists and subject matter experts can rapidly build powerful, zero-to-one models. In doing so, we’ll cover the key annotation, error analysis, and model-guided iteration capabilities that …