Submissions

Task 1. LongEval-Retrieval:

Submissions will be done through the CodaLab competition; you can access the COMPETITION HERE. Each team will need to participate in the competition and submit their runs through it.

All participants need to provide results for each submitted system on both the Lag6 and Lag8 test sets. This allows the organizers to assess how each system’s performance changes over time. Participants can submit up to 10 systems. Participants also need to provide a short description of each submitted system.

We further denote the submission of a single system as a run. Individual runs need to be submitted in the TREC format. For each query in each run, up to 1000 documents may be returned.

Each system should be submitted as a single zipped file consisting of the following three files:

team_system.lag6 contains a run (a single TREC file) of the system obtained on the Lag6 test collection.
team_system.lag8 contains a run (a single TREC file) of the system obtained on the Lag8 test collection.
team_system.meta contains a short description of the approach. This file should state which indexing and ranking methods were applied, what type of training was performed, and which training data were used. Please specify whether you used statistical or neural approaches and whether you used sparse or dense retrieval methods. Also, please state whether the approach uses a single ranker, multiple ranking stages, or any other type of fusion (and which). Participants also need to describe whether they used the French data, the provided English translations (1-best or n-best), or their own translations, as well as the resources (memory, GPUs, CPUs) used. Participants should use the provided form to fill in all system details.

All files in a single zipped archive should thus correspond to a single system. The name of this system should contain the team name and a unique identifier of the system. The suffix of each file should be either lag6, lag8, or meta. For example, if the file contains the submission of the BM25 system of the RSA_TU_UGA team applied to the lag6 collection, the file name can be RSA_TU_UGA_BM25.lag6.
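
For illustration, a minimal packaging sketch in Python (the helper name is ours, not official tooling; it assumes the three per-system files already exist in the working directory):

import zipfile

def package_system(team_system):
    # e.g. team_system = "RSA_TU_UGA_BM25"; bundles the three per-system
    # files into the single zip archive required for one submission.
    with zipfile.ZipFile(f"{team_system}.zip", "w") as zf:
        for suffix in ("lag6", "lag8", "meta"):
            zf.write(f"{team_system}.{suffix}")

package_system("RSA_TU_UGA_BM25")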

Each system may be run on French data, English data, or a combination of both. Participants may also opt to use their own translation systems or even manual translations. However, if any manual intervention is used, even for translation, participants need to state this clearly in the system description.

The submitted runs (i.e. team_system.lag6 and team_system.lag8) should follow the TrecRun format:

qid Q0 docno rank score tag

where:

qid is the query identifier,
Q0 is a literal, unused constant field,
docno is the document identifier,
rank is the position of the document in the ranking (1 is the best match),
score is the similarity score assigned to the document by the system, and
tag is a unique identifier of the run (the system name).

Example:
1 Q0 nhslo3844_12_012186 1 1.73315273652 mySystem
1 Q0 nhslo1393_12_003292 2 1.72581054377 mySystem
1 Q0 nhslo3844_12_002212 3 1.72522727817 mySystem
1 Q0 nhslo3844_12_012182 4 1.72522727817 mySystem
1 Q0 nhslo1393_12_003296 5 1.71374426875 mySystem
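
For illustration, a minimal sketch (not official tooling) in Python that writes a run in this format. The function name and the ranking structure are our assumptions: ranking maps each query id to a list of (docno, score) pairs sorted by decreasing score.

def write_trec_run(ranking, tag, path):
    with open(path, "w") as out:
        for qid, docs in ranking.items():
            # At most 1000 documents may be returned per query.
            for rank, (docno, score) in enumerate(docs[:1000], start=1):
                out.write(f"{qid} Q0 {docno} {rank} {score} {tag}\n")

# Usage with the example values above:
write_trec_run(
    {"1": [("nhslo3844_12_012186", 1.73315273652),
           ("nhslo1393_12_003292", 1.72581054377)]},
    tag="mySystem",
    path="team_system.lag6",
)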

Task 2. LongEval-Classification:

The CodaLab competition will be shared with participants upon data completion. You need to register through the CLEF website to get all updates.

Participants are expected to propose temporally persistent classifiers based on state-of-the-art data-centric or architecture-centric computational methods. The goal is to achieve high weighted-F1 performance across the short and long temporally distant test sets while maintaining a reasonable Relative Performance Drop (RPD) compared to a test set from the same time period as the training data. We intend to use RoBERTa as the baseline classifier for this task because it has been demonstrated to be persistent over time.

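For concreteness, here is a minimal sketch of the two quantities in Python. The weighted-F1 call follows scikit-learn; the RPD formula is our assumption (the relative drop of the distant-time score with respect to the within-time score) and may differ from the official evaluation script.

from sklearn.metrics import f1_score

def weighted_f1(y_true, y_pred):
    # Per-class F1 averaged with class-support weights.
    return f1_score(y_true, y_pred, average="weighted")

def rpd(score_within, score_distant):
    # Assumed Relative Performance Drop: the fraction of within-time
    # performance lost on the temporally distant test set.
    return (score_within - score_distant) / score_within
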
Practice [Pre-Evaluation]

You can access the COMPETITION HERE and submit to Practice to evaluate your model and practice the submission process
You can download the training and practice sets from here: Training data with two temporal practice sets

Submission format
When submitting to CodaLab, please submit a single zip file containing a folder called “submission”. This folder must contain TWO files:
1. WithinPractice_predictions.txt (with within predictions - within_practice.csv)
2. ShortPractice_predictions.txt (with distant predictions - short_practice.csv)

Evaluation

You can access the COMPETITION HERE and submit to Evaluation to have your model evaluated and its performance ranked
You can download the evaluation set from here: Three temporal evaluation sets without gold labels

Submission format
When submitting to CodaLab, please submit a single zip file containing a folder called “submission”. This folder must contain THREE files:
1. WithinDev_predictions.txt (with within predictions - within_dev.csv)
2. ShortDev_predictions.txt (with distant/short predictions - short_dev.csv)
3. LongDev_predictions.txt (with distant/long predictions - long_dev.csv)
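
For illustration, a hypothetical packaging sketch in Python that produces a zip whose top-level “submission” folder holds the three required prediction files:

import zipfile

FILES = [
    "WithinDev_predictions.txt",
    "ShortDev_predictions.txt",
    "LongDev_predictions.txt",
]

with zipfile.ZipFile("submission.zip", "w") as zf:
    for name in FILES:
        # arcname places each file inside the required "submission" folder.
        zf.write(name, arcname=f"submission/{name}")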

Notes

Use the Format checking script to test your formatting and look at the examples provided here: with baseline model results

Submissions for each sub-task will be ranked based on the primary metric, macro-averaged F1. We encourage participants to contribute to both sub-tasks in order to be correctly placed on the joint leaderboard, as well as to enable better analysis of their system performance in both settings. Evaluate your model using CodaLab.