Building an LLM pipeline to extract insights from DataCamp's learning videos

Founded in 2013, DataCamp aims to make data science education accessible to all. They offer an online learning platform covering data science, engineering, and AI, serving over 10 million users and more than 2,500 companies globally. DataCamp asked us to assemble a product team to accelerate specific roadmap goals, including the development of an LLM pipeline that automatically extracts insights from the webinars on their learning platform.

Challenge

DataCamp was looking for a temporary product team to architect and build an AI solution to unlock insights from the video data on their learning platform (500+ webinars).

Solution

We set out on a six-month mission to engineer an LLM pipeline that extracts the narrative from videos, processes it, and automatically generates insightful summaries per video, integrated into the DataCamp platform.

Approach

We developed an AI pipeline that chains several models in sequence: Pyannote to extract speakers and timestamps from each video, Whisper-1 to transcribe the audio, and GPT-4o to summarize the result.
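The case study does not describe how the outputs of these three stages were combined, so the following is only a minimal sketch of one plausible glue step: attributing each transcribed segment to the speaker whose turn overlaps it most, then formatting a speaker-labeled transcript that could be passed to a summarization model. The `SpeakerTurn` and `TranscriptSegment` types, the function names, and the sample data are all hypothetical; in a real pipeline the turns would come from the diarization model and the segments from the transcription model.

```python
from dataclasses import dataclass


@dataclass
class SpeakerTurn:
    """One diarization result (hypothetical shape): who spoke, and when."""
    speaker: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class TranscriptSegment:
    """One transcription result (hypothetical shape): text with timestamps."""
    text: str
    start: float  # seconds
    end: float    # seconds


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(turns, segments):
    """Label each segment with the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            shared = overlap(turn.start, turn.end, seg.start, seg.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, seg.text))
    return labeled


def format_transcript(labeled):
    """Merge consecutive segments by the same speaker into one line each."""
    lines = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text)
        else:
            lines.append((speaker, [text]))
    return "\n".join(f"{speaker}: {' '.join(texts)}" for speaker, texts in lines)


# Illustrative data only; not taken from any DataCamp webinar.
turns = [
    SpeakerTurn("SPEAKER_00", 0.0, 5.0),
    SpeakerTurn("SPEAKER_01", 5.0, 9.0),
]
segments = [
    TranscriptSegment("Welcome to the webinar.", 0.2, 2.5),
    TranscriptSegment("Thanks for having me.", 4.5, 6.5),
    TranscriptSegment("Let's dive in.", 6.6, 8.0),
]

transcript = format_transcript(assign_speakers(turns, segments))
print(transcript)
```

The speaker-labeled transcript produced this way would then be the input to the summarization step. Picking the speaker by maximum overlap is a simple heuristic; segments that span a speaker change may need to be split instead.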

Peter Petermann

Engineering manager at DataCamp

"In less than a month, Panenco was able to set up an effective engineering team that quickly gained a solid understanding of our infrastructure, operating model and technology stack. A great combination of speed and qualitative delivery."

"We wish the whole DataCamp team the very best of luck on their rapid growth journey, educating people all around the world on data science and AI. We are proud to have made a significant contribution to the buildout of your platform. We'll be happy to keep supporting you in every way possible."

Koen Verschooten

Operations manager
