The Domestic Yak  


Author: The Domestic Yak

The world is a wild place. Ajax & Ed help make the world a little less chaotic. Join us as we discuss popular topics & special interests.

Language: en

Genres: News, Tech News


Auditing LLMs & Hidden Objectives
Episode 34
Monday, 17 March, 2025

This episode summarizes "Auditing language models for hidden objectives" by Samuel Marks et al., submitted on March 14th, 2025 (https://arxiv.org/abs/2503.10965). The paper investigates the feasibility of alignment audits by training a language model with a hidden objective: to exhibit behaviors it believes reward models favor, even if those behaviors are undesirable. Three teams successfully identified the hidden objective using techniques such as interpretability tools, behavioral attacks, and training data analysis.

 
