Dialetti al cinema: Annotated Dataset of Dialectal Representation in Italian Films (1944–2000)
Greta Mazzaggio
2026-01-01
Abstract
This dataset provides a JSON export of Dialetti al cinema, a curated corpus of Italian fiction feature films (1944–2000) annotated for dialectal representation in filmic speech. Each record includes film metadata (title, year, directors, production data), external identifiers, and structured linguistic annotations describing dialect type, narrative-pragmatic function (controlled vocabulary), and distribution across character roles (main, secondary, background). Developed within the CHANGES–Spoke 2 framework, the dataset models dialect use as a component of intangible cultural heritage and supports computational analysis, digital humanities research, and reuse in FAIR-compliant environments. This release constitutes a frozen, citable snapshot of the database at the time of deposit.

Description

This dataset contains a structured JSON export of the Dialetti al cinema database, a digital humanities initiative investigating dialectal variation in Italian cinema as a component of intangible cultural heritage. The project, developed within the CHANGES – Spoke 2 framework (PNRR), maps Italian fiction feature films according to the dialects represented, their narrative-pragmatic functions, their distribution across character roles, and their geographic anchoring.

The dataset reflects the corpus strategy and annotation model described in:

Idini, M., Ludovico, L. A., Vena, M. V., & Mazzaggio, G. Dialects in Italian Cinema as Intangible Cultural Heritage: A Geo-Referenced Platform and Annotation Model Compliant with FAIR Principles.

The project combines a carefully curated corpus of Italian-produced fiction feature films with a linguistically grounded annotation model centered on the concept of parlato filmico (filmic speech). The data are structured through a controlled vocabulary of functional descriptors that capture the narrative and pragmatic roles of dialects, and are made accessible through a FAIR-compliant web service designed to support findability, interoperability, and reuse.

The dataset covers the period from 1944 to 2000 and includes exclusively Italian-produced fiction feature films. The corpus has been defined according to explicit selection criteria, focusing on films recognized by major national and international awards, as detailed in the associated publication. The primary linguistic focus of the dataset is the representation of dialects in filmic speech, analyzed in relation to their narrative function and character distribution....
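The record structure described above (film metadata plus annotations for dialect, narrative-pragmatic function, and character role) could be explored roughly as in the following Python sketch. This record does not publish the export's schema, so every field name here (`title`, `year`, `directors`, `annotations`, `dialect`, `function`, `role`) is an assumption made for illustration, and the inline sample contains placeholder values only, not data from the corpus. The real key names should be checked against the deposited JSON file before reuse.

```python
import json
from collections import Counter

# Placeholder sample. All field names and values are hypothetical and
# only mimic the kind of record the description mentions.
SAMPLE_EXPORT = """
[
  {
    "title": "Placeholder Film A",
    "year": 1950,
    "directors": ["Placeholder Director"],
    "annotations": [
      {"dialect": "dialect-x", "function": "function-1", "role": "main"},
      {"dialect": "dialect-y", "function": "function-2", "role": "background"}
    ]
  },
  {
    "title": "Placeholder Film B",
    "year": 1975,
    "directors": ["Placeholder Director"],
    "annotations": [
      {"dialect": "dialect-x", "function": "function-1", "role": "main"}
    ]
  }
]
"""


def load_records(text):
    """Parse the JSON export text into a list of film records."""
    return json.loads(text)


def dialects_by_role(records):
    """Count annotations per (dialect, character role) pair."""
    counts = Counter()
    for film in records:
        # Films without annotations are skipped rather than raising.
        for ann in film.get("annotations", []):
            counts[(ann["dialect"], ann["role"])] += 1
    return counts


def films_in_period(records, start, end):
    """Return the films whose year lies in the inclusive range [start, end]."""
    return [film for film in records if start <= film["year"] <= end]


if __name__ == "__main__":
    records = load_records(SAMPLE_EXPORT)
    for (dialect, role), n in sorted(dialects_by_role(records).items()):
        print(dialect, role, n)
    print([film["title"] for film in films_in_period(records, 1944, 1960)])
```

For the deposited file, the inline sample would be replaced by the contents of the export read from disk, with the key names adapted to the actual schema.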


