SVELA at EVALITA 2026: Overview of the Selective Verification of Erasure from LLM Answers Task

Savelli, Claudio; La Quatra, Moreno; Koudounas, Alkis; Giobergia, Flavio

This paper presents SVELA (Selective Verification of Erasure from LLM Answers), a shared task at EVALITA 2026. SVELA challenges participants to develop methods that verify whether a Large Language Model has successfully forgotten specific information. Given models that have undergone unlearning, participants must classify fictional identities or individual facts as retained, forgotten, or never seen during training. The task comprises two complementary subtasks: entity-level detection, where entire identities are classified, and instance-level detection, where individual question-answer pairs are evaluated. The task attracted eight registered teams, four of which submitted system description papers, and resulted in more than fifty valid submissions across the two subtasks. The evaluation highlights the intrinsic difficulty of unlearning verification, particularly at the instance level, where less aggregated information is available and finer-grained distinctions between retained, forgotten, and never-seen information are required.

2026-01-01
