The data for this task is a sequence of web document collections and queries provided by Qwant.
Description of the Data
Queries:
The queries are extracted from Qwant’s search logs, based on a set of selected topics. The query set was created in French and automatically translated into English. To compensate for imperfect machine translation, four translations are provided for each query, sorted by their estimated translation probability.
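Because each query comes with four candidate English translations ranked by estimated probability, a retrieval system can either keep only the most probable one or pool all four. The sketch below is illustrative only: it assumes the translations have already been loaded as `(text, probability)` pairs in a hypothetical `Translation` type (the actual file format is not described on this page), and shows both options: picking the top translation, and building a probability-weighted bag of words.

```python
from collections import Counter
from typing import NamedTuple


class Translation(NamedTuple):
    """One candidate English translation of a French query (hypothetical structure)."""
    text: str
    probability: float


def best_translation(candidates: list[Translation]) -> str:
    """Return the candidate with the highest estimated translation probability."""
    return max(candidates, key=lambda t: t.probability).text


def weighted_terms(candidates: list[Translation]) -> dict[str, float]:
    """Pool all candidates into one weighted bag of words.

    Each distinct term receives the summed probability of every translation
    that contains it; weights are then normalized to sum to 1.
    """
    weights: Counter = Counter()
    for cand in candidates:
        # Count a term at most once per translation.
        for term in set(cand.text.lower().split()):
            weights[term] += cand.probability
    total = sum(weights.values())
    return {term: w / total for term, w in weights.items()}
```

Pooling trades some precision for robustness: a term that survives in several translations gets a higher weight than one produced by a single, possibly faulty, translation.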
Documents:
The document collection includes the documents that are candidates for retrieval for each query. The first step in creating the collection is to extract from the Qwant index the content of all documents that were displayed in SERPs for the selected queries. In addition to these documents, potentially non-relevant documents are randomly sampled from the Qwant index to better represent the nature of a Web test collection. This random sampling is intended to reduce bias and the prevalence of relevant documents in the collection. Filters were applied to exclude spam and adult content.
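The collection-building steps described above can be sketched as set operations: take every document displayed in a SERP for a selected query, add a uniform random sample of other documents from the index, and drop anything caught by the content filters. All names here are hypothetical: the index is modeled as a plain list of document IDs, `is_filtered` stands in for Qwant's spam and adult-content filters (which are not described here), and `n_random` is an arbitrary sample size.

```python
import random
from typing import Callable, Iterable


def build_collection(
    serp_docs: dict[str, list[str]],
    index_ids: Iterable[str],
    n_random: int,
    is_filtered: Callable[[str], bool],
    seed: int = 0,
) -> set[str]:
    """Return the document IDs of a LongEval-style collection (hypothetical sketch).

    serp_docs maps each selected query ID to the document IDs shown in its SERPs.
    """
    # Step 1: all documents displayed in SERPs for the selected queries.
    displayed = {doc for docs in serp_docs.values() for doc in docs}

    # Step 2: uniformly sample extra, potentially non-relevant documents from
    # the rest of the index (sorted so the seeded sample is reproducible).
    others = sorted(set(index_ids) - displayed)
    rng = random.Random(seed)
    sampled = set(rng.sample(others, min(n_random, len(others))))

    # Step 3: apply the content filters (spam, adult content) to the union.
    return {doc for doc in displayed | sampled if not is_filtered(doc)}
```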
Relevance estimates:
The relevance estimates for LongEval-Retrieval are obtained by automatically collecting implicit user feedback. This implicit feedback is derived with a click model based on Dynamic Bayesian Networks (DBN) trained on Qwant data. The output of the click model is an attractiveness probability, which is converted into a 3-level relevance score (0 = not relevant, 1 = relevant, 2 = highly relevant). These relevance estimates will be complemented with explicit relevance assessments after the submission deadline.
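To make the click-model step more concrete, the sketch below uses the simplified DBN (the DBN click model with its continuation parameter fixed at 1), for which attractiveness has a closed-form estimate: the number of clicks a document received for a query divided by the number of sessions in which it was ranked at or above the last click (that is, assumed examined). This is a named simplification for illustration, not Qwant's trained model. The session format is assumed, and the thresholds in `to_relevance_level` are hypothetical because the actual cut-offs for the 0/1/2 scale are not given here.

```python
from collections import defaultdict
from typing import Iterable


def sdbn_attractiveness(
    sessions: Iterable[tuple[str, list[str], Iterable[str]]],
) -> dict[tuple[str, str], float]:
    """Estimate attractiveness per (query, doc) with the simplified DBN.

    Each session is (query_id, ranked_doc_ids, clicked_doc_ids). Documents at
    or above the last click count as examined; sessions without any click are
    skipped (an assumption of this sketch).
    """
    clicks: dict[tuple[str, str], int] = defaultdict(int)
    examined: dict[tuple[str, str], int] = defaultdict(int)
    for query, ranking, clicked in sessions:
        clicked = set(clicked)
        click_ranks = [i for i, doc in enumerate(ranking) if doc in clicked]
        if not click_ranks:
            continue
        last = max(click_ranks)
        for doc in ranking[: last + 1]:
            examined[(query, doc)] += 1
            if doc in clicked:
                clicks[(query, doc)] += 1
    return {key: clicks[key] / n for key, n in examined.items()}


def to_relevance_level(prob: float, low: float = 0.1, high: float = 0.5) -> int:
    """Map an attractiveness probability to 0/1/2 (thresholds are hypothetical)."""
    if prob >= high:
        return 2
    if prob >= low:
        return 1
    return 0
```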
An overview of the data creation process is shown in the figure below:
Participants in LongEval 2024 may also use the LongEval 2023 data for training.
If you have any problems logging in to the Lindat/Clarin website, please first check the instructions and then contact the organizers.
More details about the collection can be found in a paper: P. Galuscakova, R. Deveaud, G. Gonzalez-Saez, P. Mulhem, L. Goeuriot, F. Piroi, M. Popel: LongEval-Retrieval: French-English Dynamic Test Collection for Continuous Web Search Evaluation.