This paper presents an NLP research system designed to geolocate tweets within Italy, a country renowned for its diverse linguistic landscape. Our methodology consists of a two-step process involving pre-training and fine-tuning phases. In the pre-training step, we take a semi-supervised approach and introduce two additional tasks. The primary objective of these tasks is to provide the language model with comprehensive knowledge of language varieties, focusing on both the sentence and token levels. Subsequently, during the fine-tuning phase, the model is adapted explicitly for two subtasks: coarse- and fine-grained variety geolocation. To evaluate the effectiveness of our methodology, we participate in the GeoLingIt 2023 shared task and assess our model’s performance using standard metrics. Ablation studies demonstrate the crucial role of thepre-training step in enhancing the model’s performance on both tasks

DANTE at GeoLingIt: Dialect-Aware Multi-Granularity Pre-training for Locating Tweets within Italy

Moreno La Quatra;
2023-01-01

Abstract

This paper presents an NLP research system designed to geolocate tweets within Italy, a country renowned for its diverse linguistic landscape. Our methodology consists of a two-step process involving pre-training and fine-tuning phases. In the pre-training step, we take a semi-supervised approach and introduce two additional tasks. The primary objective of these tasks is to provide the language model with comprehensive knowledge of language varieties, focusing on both the sentence and token levels. Subsequently, during the fine-tuning phase, the model is adapted explicitly for two subtasks: coarse- and fine-grained variety geolocation. To evaluate the effectiveness of our methodology, we participate in the GeoLingIt 2023 shared task and assess our model’s performance using standard metrics. Ablation studies demonstrate the crucial role of thepre-training step in enhancing the model’s performance on both tasks
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11387/167846
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact