Tasks

This year, the lab provides two evolving test collections: one for Web retrieval IR evaluation and one for scientific article retrieval evaluation.

Task 1. LongEval-Web Retrieval:

Objectives:

The collection aims to answer fundamental questions about the robustness and stability of Web search engines as the underlying data evolves. Regarding Web search evaluation, LongEval focuses on the following questions:

Compared to LongEval 2023 and 2024, this iteration enlarges the training and test collections with additional snapshots, enabling fine-grained analysis of how the collection changes from one snapshot to the next.

To assess an information retrieval system, we provide several datasets of changing Web documents and user queries:

In total, we will release 15 datasets, each containing the documents and queries of a specific monthly snapshot.
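To illustrate the kind of snapshot-to-snapshot analysis these monthly releases enable, the sketch below compares the document identifiers of two consecutive snapshots. The file names and the JSON-lines layout with an "id" field are assumptions for illustration only, not the official collection format.

```python
import json

def load_doc_ids(path):
    """Collect the document identifiers of one snapshot (assumed JSON-lines with an 'id' field)."""
    with open(path, encoding="utf-8") as f:
        return {json.loads(line)["id"] for line in f}

def snapshot_diff(ids_t, ids_t1):
    """Summarise how the collection changed from snapshot t to snapshot t+1."""
    added = ids_t1 - ids_t
    removed = ids_t - ids_t1
    kept = ids_t & ids_t1
    union = ids_t | ids_t1
    jaccard = len(kept) / len(union) if union else 0.0
    return {"added": len(added), "removed": len(removed),
            "kept": len(kept), "jaccard_overlap": jaccard}

if __name__ == "__main__":
    # Hypothetical snapshot files; replace with the released monthly snapshots.
    june = load_doc_ids("snapshot_2023-06.jsonl")
    july = load_doc_ids("snapshot_2023-07.jsonl")
    print(snapshot_diff(june, july))
```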

Task 2. LongEval-Sci Retrieval:

The second task of the LongEval 2025 Lab is similar to the first task, with the difference that the test collections contain scientific publications acquired from the CORE collection of open access scholarly documents.

As in Task 1, we will use click information to derive the relevance assessments for the test collections, which consist of two main components containing both search and click information:

Since this is the first edition of this task, the number of dataset snapshots is lower than in the first task.
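As a minimal sketch of how click information can be turned into graded relevance assessments, the snippet below aggregates a click log into per-query judgements. The log format (query_id, doc_id, clicked) and the click-through-rate thresholds are illustrative assumptions; the lab's actual relevance derivation may rely on a dedicated click model.

```python
from collections import defaultdict

def clicks_to_qrels(click_log, min_impressions=10):
    """Aggregate a click log into graded relevance judgements.

    click_log: iterable of (query_id, doc_id, clicked) tuples, where
    clicked is 1 if the document was clicked for that impression.
    Returns {query_id: {doc_id: grade}} with grades 0..2 derived from
    click-through rate (format and thresholds are illustrative only).
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for qid, did, clicked in click_log:
        impressions[(qid, did)] += 1
        clicks[(qid, did)] += clicked

    qrels = defaultdict(dict)
    for (qid, did), n in impressions.items():
        if n < min_impressions:
            continue  # not enough evidence for a judgement
        ctr = clicks[(qid, did)] / n
        grade = 2 if ctr >= 0.5 else 1 if ctr >= 0.1 else 0
        qrels[qid][did] = grade
    return dict(qrels)
```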

Evaluation

The submitted systems will be evaluated in two ways:

(1) nDCG scores calculated on the provided test sets. This classical evaluation measure is well suited to Web search, where the rank-based discount emphasises the ordering of the top results.

(2) Relative nDCG Drop (RnD), measured as the difference in nDCG between snapshot test sets. This measure captures the impact of data changes on the systems' results.
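The sketch below illustrates both measures on a single query: a standard nDCG@10 computed from graded relevance judgements, and RnD obtained as the difference between the nDCG scores a system achieves on two snapshot test sets. Variable names, the cutoff, and the sign convention (positive RnD = drop) are illustrative assumptions; official scores will be computed by the organisers on the released test sets.

```python
import math

def dcg(grades, k=10):
    """Discounted cumulative gain over the top-k graded results."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades[:k]))

def ndcg(ranking, qrels, k=10):
    """nDCG@k for one query: ranking is a list of doc ids, qrels maps doc id -> grade."""
    gains = [qrels.get(doc, 0) for doc in ranking]
    ideal_dcg = dcg(sorted(qrels.values(), reverse=True), k)
    return dcg(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def relative_ndcg_drop(ndcg_t, ndcg_t1):
    """RnD sketch: change in nDCG from snapshot t to snapshot t+1 (positive = drop)."""
    return ndcg_t - ndcg_t1

# Illustrative example on one query with made-up judgements and runs.
qrels_june = {"d1": 2, "d2": 1, "d3": 0}
qrels_july = {"d1": 1, "d4": 2}
run_june = ["d1", "d3", "d2"]
run_july = ["d1", "d2", "d4"]
score_june = ndcg(run_june, qrels_june)
score_july = ndcg(run_july, qrels_july)
print(f"nDCG@10 June: {score_june:.3f}, July: {score_july:.3f}, "
      f"RnD: {relative_ndcg_drop(score_june, score_july):.3f}")
```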