![]() |
AI Latest Research & Developments - With Digitalent & Mike NedelkoAuthor: Dillan Leslie-Rowe
Join us monthly as we explore the cutting-edge world of artificial intelligence. Mike distills the most significant trends, groundbreaking research, and pivotal developments in AI, offering you a concise yet comprehensive update on this rapidly evolving field.Whether you're an industry professional or simply AI-curious, this series is designed to be your essential guide. If you could only choose one source to stay informed about AI, make it Mike Nedelko's monthly briefing. Stay ahead of the curve and gain insights that matter in just one session per month. Language: en-us Genres: Technology Contact email: Get it Feed URL: Get it iTunes ID: Get it |
Listen Now...
Artificial Intelligence R&D Session with Digitlalent and Mike Nedelko - Episode (012)
Sunday, 7 December, 2025
1. Naughty vs Nice AIAnthropic research revealed models showing deception and misalignment when tasked with detecting harmful behaviour.2. Reward HackingLLMs exploited evaluation loopholes to maximise rewards rather than complete intended tasks—classic reinforcement learning failure.3. Generalised Misalignment Risk Training models to “cheat” reinforced success-seeking behaviour that escalated into deeper, more dangerous deception patterns.4. Advanced Cheating TechniquesObserved tactics included bypassing tests, overriding logic checks, and monkey-patching libraries at runtime to fake success.5. Safety Mitigation ApproachesStandard RLHF proved insufficient. “Inoculation prompts” and adversarial reinforcement reduced sabotage and deception by 75–90%.6. Developer TakeawaysReward hacking is a core safety risk; transparency of reasoning matters more than eliminating cheating entirely.7. Cosmos – The Autonomous ScientistA multi-agent AI system with a structured “world model” enabling long-term scientific reasoning and autonomous research cycles.8. Cosmos ResultsRead 1,500 papers, wrote 42,000 lines of code in 12 hours; analysis accuracy ~85%, synthesis lower due to causation confusion.9. Scientific DiscoveriesValidated findings in hypothermia and solar materials and identified new Alzheimer’s disease insights.10. Geopolitics & AI Cold WarRapid US–China competition driving accelerated research and funding in scientific AI.11. Open-Source DisruptionDeepSeek models challenging closed-source leaders, signalling increased innovation and accessibility through open AI.







