Digital Commons and AI Day

Wikimedia Canada and INRS are organizing the Digital Commons and AI Day on November 1, 2024 at the Hôtel Concorde in Quebec City, with the support of the Consulat Général de France à Québec and the CRIHN. All sessions will be held in French only.

This day aims to open a discussion on the place of the knowledge commons as a training resource for AI in French-speaking territories. Large language models (LLMs, such as those behind services like ChatGPT) have an Anglo-Saxon bias, as they are mostly trained on English-language corpora. Training AI on corpora in other languages, not only French but also the other languages of French-speaking territories (indigenous languages, regional languages), is therefore crucial to maintaining cultural diversity when accessing knowledge via these new tools. This challenge is viewed as a sovereignty issue in French and Quebec public policy.

This event ties in with the WikiConvention francophone, which takes place on November 2 and 3 in the same venue.

The knowledge commons, and Wikimedia projects in particular (including the Wikipedia encyclopedia and Lingua Libre’s oral resources), provide invaluable data for training AI in languages with limited resources. Similarly, heritage collections (national libraries and archives, audiovisual collections) are coveted for training French-speaking AIs. In both cases (the knowledge commons and the public domain), the question arises of the types of relationships (economic, legal and ethical) to be built with AI industry players.

Dialogue between Wikimedia projects and public heritage institutions on these issues remains very limited to date. Yet it is crucial to deepen these exchanges at a time when the relationships and models that will shape the years to come are crystallizing. The aim of this day is to bring together players from the commons, the public sector and the AI industry to discuss and envisage new avenues for collaboration.

Two themes are proposed to explore these issues:
– Linguistic and cultural diversity in AI models: what contribution can the commons and the public domain make?
– Commons and the public domain: beyond the provision of data for model training, what sustainable relationships can be built?

Program (sessions in French only):

9:00 – 9:15 Opening of the day by Louis Germain, director general of Wikimédia Canada, and Maryana Iskander, CEO of the Wikimedia Foundation (video address)
9:15 – 10:15 Panel – What relationships between Wikimedia and large language models (LLMs)?
Moderated by Jenny Ebermann, executive director of Wikimédia Suisse
Speakers:
– Rémy Gerbet, executive director of Wikimédia France
– Anastasia Stasenko (via Zoom), co-founder of the start-up pleias
– Liam Wyatt, program manager at Wikimedia Enterprise
10:15 – 10:45 Break
10:45 – 11:45 Panel – French-language libraries and AI training corpora
Moderated by Nathalie Casemajor, professor at the Institut national de la recherche scientifique
Speakers:
– Jean-Philippe Moreux, head of the AI mission at the Bibliothèque nationale de France
– Viriya Thach, head of digital strategy and business intelligence at Bibliothèque et Archives nationales du Québec
11:45 – 12:30 Workshop – Shared issues between Wikimedia and public-sector players
Moderated by Nathalie Casemajor, professor at the Institut national de la recherche scientifique
12:30 – 14:00 Lunch break
14:00 – 15:00 Panel – Diversity in French-language AI
Moderated by Ayla Rigouts Terryn, professor at the Université de Montréal
Speakers:
– Lucie Gianola, policy officer for technologies, research and innovation at the Délégation générale à la langue française et aux langues de France, Ministère de la Culture, France
– Thomas Mboa, researcher in residence at the Centre d’expertise international de Montréal en intelligence artificielle
15:00 – 15:15 Break
15:15 – 16:15 Workshop – Joint courses of action for the future
16:15 – 16:30 Closing of the day