TASP: Topic-based abstractive summarization of Facebook text posts
La Quatra, Moreno
2024-01-01
Abstract
Summarizing trending topics in large collections of Facebook posts is particularly relevant for profiling social user activities and interests. However, automatically generating these summaries poses significant challenges due to the high heterogeneity of the input data, the limited fluency of extractive summaries, and the absence of abstractive summarization methods capable of handling multiple posts simultaneously. Existing abstractive models are either not suited to handling large post collections or disregard topic-level text relations. In this work, we present TASP, a novel tool for trending topic detection and summarization from Facebook posts written in English. It trains abstractive summarization models on multi-post collections by leveraging a shortlist of authoritative posts published by renowned newspapers. At inference time, TASP first creates clusters of semantically similar social posts, each one representing a distinct topic, using pre-trained transformer-based language models. Then, it generates abstractive summaries of the clusters for which authoritative information is missing. To the best of our knowledge, TASP is the first available tool suited to abstractive multi-post summarization. We test our approach on a large-scale dataset of real Facebook posts. The results show (1) the higher effectiveness of transformer-based approaches in generating topic-specific post clusters compared to traditional methods, and (2) the importance of attending to long pieces of text in multi-post abstractive summary generation.
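The inference-time flow described in the abstract (cluster similar posts, then summarize only the clusters that lack authoritative content) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not TASP's actual implementation: the bag-of-words `embed` function is a toy stand-in for a pre-trained transformer encoder, the greedy threshold clustering is a swapped-in simple technique, and the `authoritative` flag, the function names, and the threshold value are hypothetical. The abstractive summarization step itself is omitted.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a transformer sentence encoder (assumption): word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_posts(posts, threshold=0.5):
    """Greedy clustering (assumption): attach each post to the first cluster
    whose seed post is at least `threshold` similar, else start a new cluster."""
    vectors = [embed(p["text"]) for p in posts]
    clusters = []  # each cluster is a list of post indices; index 0 is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def clusters_needing_summary(posts, clusters):
    """Keep only clusters that contain no authoritative post."""
    return [c for c in clusters if not any(posts[i]["authoritative"] for i in c)]


# Hypothetical example data, invented for this sketch.
posts = [
    {"text": "city marathon route announced today", "authoritative": True},
    {"text": "marathon route announced for the city", "authoritative": False},
    {"text": "new phone battery lasts two days", "authoritative": False},
    {"text": "phone battery lasts two days wow", "authoritative": False},
]

clusters = cluster_posts(posts)
pending = clusters_needing_summary(posts, clusters)
print(clusters)  # topic clusters as lists of post indices
print(pending)   # clusters that would be passed to an abstractive summarizer
```

In this toy run, the marathon cluster already contains an authoritative post and is skipped, while the phone-battery cluster would be handed to the summarizer. In a realistic setting the encoder would be a pre-trained language model and the clustering algorithm would likely be more robust than this greedy pass.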