Fine-tuning an LLM Judge to Reduce Hallucination
Webinar
July 17 at 8:00am PT
What to expect?
In this webinar, we explore how out-of-domain data can improve the fine-tuning of MistralAI language models for detecting factual inconsistencies, also known as hallucinations.
Inspired by Eugene Yan’s article on bootstrapping hallucination detection, we use the Factual Inconsistency Benchmark (FIB) dataset and initially fine-tune a MistralAI-based model on this dataset alone, with limited success. We then pre-finetune on Wikipedia summaries from the Unified Summarization Benchmark (USB) before applying task-specific fine-tuning on FIB, following the two-stage recipe sketched below.
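As a rough illustration only (not the training code covered in the webinar), the sketch below shows one way to structure the two stages with Hugging Face's trl; the data file names, the expected "text" column, and the training settings are placeholders you would replace with your own setup.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Stage 1: pre-finetune a Mistral checkpoint on related, out-of-domain data
# (Wikipedia-style summaries). The JSONL path is a placeholder.
usb = load_dataset("json", data_files="usb_summaries.jsonl", split="train")
stage1 = SFTTrainer(
    model="mistralai/Mistral-7B-v0.1",  # example open-weights checkpoint
    train_dataset=usb,                  # assumes a "text" column (prompt + target)
    args=SFTConfig(output_dir="stage1", num_train_epochs=1),
)
stage1.train()

# Stage 2: task-specific fine-tuning of the stage-1 model on FIB-style
# consistent/inconsistent examples. Again, the data file is a placeholder.
fib = load_dataset("json", data_files="fib_labels.jsonl", split="train")
stage2 = SFTTrainer(
    model=stage1.model,
    train_dataset=fib,
    args=SFTConfig(output_dir="stage2", num_train_epochs=1),
)
stage2.train()
stage2.save_model("hallucination-judge")
```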
This two-stage approach significantly improves performance. Our methodology uses Weights & Biases Weave to automate model evaluation, demonstrating that pre-finetuning on related but out-of-domain data can effectively bootstrap the detection of factual inconsistencies and reduce the need for extensive task-specific data collection.
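For a sense of what the automated evaluation looks like, here is a minimal Weave sketch: the project name, the Judge class, the single example row, and the exact_match scorer are illustrative placeholders rather than the webinar's exact setup, and it assumes a recent weave release in which scorers receive the model's prediction as an `output` keyword argument.

```python
import asyncio
import weave

weave.init("hallucination-judge")  # example project name

@weave.op()
def exact_match(label: str, output: str) -> dict:
    # Score 1/0 on whether the judge's verdict matches the gold label.
    return {"correct": output.strip().lower() == label.strip().lower()}

class Judge(weave.Model):
    @weave.op()
    def predict(self, source: str, summary: str) -> str:
        # Call the fine-tuned judge model here; hard-coded for illustration.
        return "consistent"

examples = [
    {"source": "The Eiffel Tower is in Paris.",
     "summary": "The Eiffel Tower is in Berlin.",
     "label": "inconsistent"},
]

evaluation = weave.Evaluation(dataset=examples, scorers=[exact_match])
asyncio.run(evaluation.evaluate(Judge()))
```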
This technique offers a promising strategy for enhancing the accuracy and applicability of natural language inference models in production environments.
Attend live for a Q&A with the speaker, or watch on-demand.