Deep Papers  

Author: Arize AI

Deep Papers is a podcast series featuring deep dives on today's most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.

Language: en-us

Genres: Mathematics, Science, Technology

LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection
Friday, 18 April, 2025

For this week's paper read, we actually dive into our own research.

We wanted to create a replicable, evolving dataset that can keep pace with model training so that you always know you're testing with data your model has never seen before. We also saw the prohibitively high cost of running LLM evals at scale, and have used our data to fine-tune a series of SLMs that perform just as well as their base LLM counterparts, but at 1/10 the cost. So, over the past few weeks, the Arize team generated the largest public dataset of hallucinations, as well as a series of fine-tuned evaluation models.

We talk about what we built, the process we took, and the bottom line results.

📃 Read the paper: https://arize.com/llm-hallucination-dataset/

Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

 
