Presentation
This international meeting brings together experts in the fields of Computational Linguistics, Natural Language Processing (NLP), and Artificial Intelligence (AI) applied to language. The programme includes two keynote sessions, delivered by Dr. Carlos Periñán Pascual (Universitat Politècnica de València) and Dr. Ruslan Mitkov (Lancaster University), and nine papers distributed across three thematic blocks:
- Teaching NLP in the era of Artificial Intelligence.
- Research in Computational Linguistics and NLP with Large Language Models.
- Interdisciplinary experiences in the area of computational humanities.
The theme of this event aligns with the main axes of the Digital Economy of Language project: the various applications of NLP to language comprehension, interpretation, and generation tasks; advances in AI in the linguistic domain; the provision of specialised university training for professionals in this field; and activities, events, and courses that bring together students, graduates, university faculty, companies, collaborating organisations, and experts in NLP and AI applied to language. This international networking activity therefore aims to connect expert researchers and university instructors from a group of European and American universities who are actively working on the aforementioned lines, to share approaches around them, and to create a network of contacts that will open up lines of joint work in the future.
The language of all sessions is English.
Programme
9.00 to 11.00
Teaching NLP in the era of artificial intelligence
Teaching Digital Humanities to the students of Philology
Ondřej Tichý
Charles University in Prague
Building Competences of Computational Linguists in the AI Era: Objectives and Challenges at Complutense University of Madrid
Ana María Fernández-Pampillón Cesteros
Doaa Samy Khalil Shawer
Universidad Complutense de Madrid
Assessment Practice for Higher Education in the Era of Generative AI
Seun Ajao
Manchester Metropolitan University
Reflective experiences of teaching both NLP and research informed multimodal AI solutions
Kulvinder Panesar
University of Bradford
Computational linguistics is a relatively mature field which has previously been of interest largely to students with a background in either quantitative linguistics or computer science. But the rapid rise of generative artificial intelligence has created a tremendous interest in computational linguistics (i) from more traditional linguists, (ii) from researchers in entirely different fields, and (iii) from the general public. This interest creates a new challenge of how to teach students from such varied backgrounds about CL and NLP. In this talk I present some lessons learned from my own experiences in teaching computational linguistics to non-computational students, including experience designing a MOOC about NLP on edX.
The talk will cover our experience in teaching various techniques and methodologies of Digital Humanities to students of (especially English) philology. The methodologies range from digitisation and encoding (through projects such as An Anglo-Saxon Dictionary Online https://bosworthtoller.com/, the Database of MEdieval CZech textual sources in translation https://mecz.kreas.ff.cuni.cz/, and the Thesauri of Czech https://najdislovo.cz/, as well as summer schools), data mining and web scraping, and data wrangling and transformation (in the Data Processing course), through diachronic language corpus construction, transformation, querying, and analysis (Introduction to English Diachronic Corpora), to statistical analysis, visualisation, machine learning, and AI-powered classification (in Digital Methodologies and AI in Social Sciences and Humanities, as well as in Research Methods in Applied Linguistics). This year, the Faculty of Arts of Charles University has also founded a Center for Digital Humanities (https://cdh.ff.cuni.cz/), whose goal is to foster and coordinate not only DH research but also the teaching of digital methodologies to students of social sciences and humanities. The talk will also cover our initial experiences in this effort and our plans for the future.
Over the last few years, Large Language Models (LLMs) have not only made a leap forward in Natural Language Processing (NLP) but have also impacted other areas of Artificial Intelligence, such as knowledge representation and automatic reasoning (for tasks such as problem solving, planning, and learning). Such advances in linguistic capacity, thanks to LLMs, have made a huge difference in how intelligent these machines appear.
In this dynamic panorama, this presentation aims to open a multilateral discussion with the different stakeholders (academia, industry, public administration, civil society) to share experiences concerning the relevant role of the computational linguist in the AI era, highlighting three issues: 1) How should the future of computational linguists be shaped in this era? 2) What skills should they have in order to meet the growing needs of the NLP and AI fields? 3) What are the best practices and experiences in building such competences?
In this open discussion, we will share our experience at UCM regarding how we envisage the pivotal role of the computational linguist within the growing demand for language specialists and professionals. In this respect, we will assess the keys to successfully developing competences by training and teaching new generations of professionals capable of understanding, developing, evaluating, and improving these models across different NLP tasks. The Complutense University of Madrid has taken a stance in favour of building the capacities and competences of computational linguists, with wide and proven experience in this field since 1996.
This presentation aims to contribute insights on "Teaching NLP in the era of artificial intelligence" by raising key questions and challenges in training computational linguists and preparing them to integrate successfully into the professional and research field of NLP.
The higher education (HE) sector benefits every nation's economy and society at large. However, its contributions are being challenged by advanced technologies such as generative AI tools. We provide a comprehensive demonstration of the use of generative AI tools in assessment design and pedagogic practice, and evidence of their impact on learning. We conducted a detailed review of the use of generative AI tools in academia and experimented with three assessment instruments from the data science, data analytics, and construction management disciplines. Our findings are two-fold: first, generative AI tools possess subject knowledge, problem-solving, analytical, critical-thinking, and presentation skills, and can thus limit learning when used unethically. Second, the design of the assessments of certain disciplines reveals the limitations of generative AI tools. Based on our findings, we make recommendations on how AI tools can be utilised for teaching and learning in HE.
Artificial Intelligence is everywhere and transformative. The power of NLP, conversational AI, and generative AI has taken the world by storm from research, commercial, and public perspectives. Within two months of its launch in late 2022, OpenAI's ChatGPT had reached an estimated 100 million users. In 2023, it went mainstream in the business world. In 2024, this disruption, and AI more broadly, has the potential to reshape businesses, customer engagement, and other sectors globally, making it a game changer. For future employable multidisciplinary practitioners in NLP, AI, and computational linguistics, teaching, learning, research, and project work are all fundamental. To address some of this, I will talk about reflective experiences of teaching both NLP and research-informed multimodal AI solutions. I will cover: (1) Where are we with NLP and generative AI in 2024? (2) What technologies underpin the teaching of this, and how does it improve employability? (3) What is the nature of the multimodal AI research discussed in the talk? And, as a key takeaway from the session, (4) What strategies are used to enable research-informed teaching, based on the example discussed?
11.00 to 11.30
Coffee break
11.30 to 12.30
Keynote session: Constructing synthetic corpora with large language models
Carlos Periñán Pascual
Universitat Politècnica de València
Recent advances in pre-trained large language models (LLMs) have accelerated the development of natural language generation systems that produce high-quality (i.e. realistic, coherent, and fluent) texts, as in the tasks of translation, summarisation, creative writing, and question answering, among many others. In this regard, customising the content and style of the generated texts results from imposing task-specific constraints in the prompt, a set of natural-language statements in the form of instructions and examples provided to the LLM to produce the expected output. Therefore, prompting is considered a way of programming pre-trained LLMs without retraining. To this end, a good understanding of the language of prompting is required since the black-box nature of LLMs hinders the process of elaborating prompts that can achieve the desired effect while preventing unwanted text generations. In this context, prompt engineering plays a critical role in designing effective prompts, identifying errors generated by LLMs, devising strategies to fix the errors, and assessing model performance. One of the applications of the emerging field of controllable text generation using LLMs is the construction of synthetic corpora.
To illustrate, we present the process of synthetic corpora development from a prompt engineering approach in the framework of the ALLEGRO project, a multimodal system for reconstructing the state of society as interpreted by the collective intelligence of social media users. In other words, ALLEGRO analyses user-generated content to extract relevant knowledge for detecting problems in a given community, thus considering users as witnesses of society. One of the modules in ALLEGRO is DIAPASON, which can explore English and Spanish text messages in user-generated content units by integrating natural language processing, text mining, and knowledge engineering techniques. In the ontology of DIAPASON, each community problem type (e.g. animal control, parking, road surface, sewers, etc.) is described by a problem schema, a formal representation that contains semantic concepts and pragmatic functions aimed at dealing with the propositional and non-propositional dimensions of the meaning of each problem type. The challenge is to leverage problem schemas to automatically construct prompts whose semantic, pragmatic, and discourse constraints steer the creation of a synthetic training dataset that should be sufficiently robust to make DIAPASON perform well when classifying real-world texts about a wide range of community problem types. As the performance of supervised deep-learning models to classify texts closely depends on the quantity and quality of training data, preparing an extensive, manually crafted annotated dataset is time-consuming, error-prone, and expensive, so synthetically generated data can play a significant role in such systems.
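As a hedged illustration of this idea, the sketch below shows how a problem schema might be turned into a generation prompt for synthetic texts. The schema fields and the prompt template are invented for the example and do not reproduce the actual DIAPASON ontology or the ALLEGRO prompts.

```python
# A minimal, hypothetical sketch: deriving a synthetic-corpus prompt from a
# problem schema. The schema fields below are invented for illustration and
# do not reproduce the actual DIAPASON ontology.
from dataclasses import dataclass

@dataclass
class ProblemSchema:
    problem_type: str               # e.g. "road surface"
    semantic_concepts: list[str]    # propositional content the texts must cover
    pragmatic_functions: list[str]  # non-propositional cues (complaint, request...)

def build_prompt(schema: ProblemSchema, language: str = "English", n: int = 5) -> str:
    """Compose a prompt whose semantic and pragmatic constraints come from the schema."""
    return (
        f"Write {n} short social-media posts in {language} in which a citizen "
        f"reports a '{schema.problem_type}' problem in their community.\n"
        f"Each post must mention: {', '.join(schema.semantic_concepts)}.\n"
        f"Each post should express: {', '.join(schema.pragmatic_functions)}.\n"
        "Vary register and wording so the posts read like real user-generated content."
    )

schema = ProblemSchema(
    problem_type="road surface",
    semantic_concepts=["potholes", "a street name", "vehicle damage"],
    pragmatic_functions=["complaint", "request for repair"],
)
print(build_prompt(schema))  # the prompt is then sent to a pre-trained LLM
```

The generated texts, labelled with the schema's problem type, would then form part of the synthetic training dataset.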
12.30 to 13.30
Research in Computational Linguistics and NLP with Large Language Models
NLP in the era of LLMs: Stochastic parrots in a black box
Serge Sharoff
University of Leeds
AI generation results enriched with simplified explanations based on linguistic features (GRESEL)
Antonio Moreno Sandoval
Universidad Autónoma de Madrid
Large language models (LLMs), such as ChatGPT, have recently made a transformative impact on our lives by reaching or surpassing human performance in many areas considered to require human intelligence, such as education or law. However, despite their empirical success, we do not know whether these models make the right predictions for the right reasons. Future research in this area will greatly benefit from a better understanding of their behaviour, aligning them with human knowledge to expose their biases and hallucinations. The talk will explain the possibilities through examples of using such models to predict the genre and complexity of texts.
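By way of illustration, the following is a minimal sketch of how an LLM can be prompted to predict the genre and complexity of a text. It assumes an OpenAI-style chat API; the model name, label set, and rating scale are illustrative placeholders rather than the setup used in the talk.

```python
# Hypothetical sketch: prompting an LLM to predict the genre and complexity
# of a text, assuming an OpenAI-style chat API (openai>=1.0). The model name
# and label set are placeholders, not the setup used in the talk.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

GENRES = ["news", "fiction", "academic", "legal", "social media"]

def classify(text: str) -> str:
    prompt = (
        f"Classify the genre of the following text as one of {GENRES}, and "
        "rate its reading complexity on a 1-5 scale. Answer 'genre, complexity'.\n\n"
        + text
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,        # deterministic output eases comparison with human labels
    )
    return response.choices[0].message.content

print(classify("The defendant shall indemnify the claimant forthwith."))
```

Comparing such predictions against human judgements is one way to probe whether the model succeeds for the right reasons.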
We want to go a step beyond AI approaches based on language models by exploring Retrieval-Augmented Generation (RAG), a technique that combines natural language generation (NLG) with information retrieval (IR) to improve the quality of the answers produced by large language models (LLMs). This technology helps LLMs generate coherent and fluent text that is contextually accurate and grounded in real-world knowledge, using question-answering capabilities.
The central ideas are:
- Question-Answering based on RAG.
- Domains of exploration: financial reporting, fiction (novels), and communication (magazines and postcolonial literature in the Philippines).
We will start from our previous experience in the processing of financial narratives, and we will enter the world of digital humanities with Don Quixote and postcolonial literature.
This is a joint project between the UAM and UNED.
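As a minimal sketch of the retrieve-then-generate pattern behind RAG, the code below uses a toy TF-IDF retriever (standing in for a real document store over financial reports, novels, or magazines) to select the passages most relevant to a question, and builds a prompt that grounds the LLM's answer in them. The corpus, function names, and retriever choice are illustrative assumptions, not the project's implementation.

```python
# Minimal RAG sketch: retrieve supporting passages, then ground the answer in
# them. The toy corpus and TF-IDF retriever are illustrative stand-ins for a
# real document store; the final prompt is what gets sent to the LLM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Net revenue grew 12% in Q3, driven by the retail segment.",
    "Don Quixote mistakes windmills for giants in Chapter VIII.",
    "Philippine postcolonial magazines flourished after independence in 1946.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the question."""
    scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def build_rag_prompt(question: str) -> str:
    """Prepend retrieved context so the LLM answers from evidence, not memory."""
    context = "\n".join(retrieve(question))
    return f"Answer using ONLY the context below.\nContext:\n{context}\n\nQuestion: {question}"

print(build_rag_prompt("What did Don Quixote mistake for giants?"))
```

Grounding the prompt in retrieved passages is what lets the LLM answer from evidence rather than from its parametric memory alone.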
13.30 to 14.30
Lunch break
14.30 to 16.00
Interdisciplinary experiences in computational humanities
Crisis talk: NLP-informed discourse analysis of the public debate around the energy crisis and climate change
Tony Russell-Rose
Goldsmiths, University of London
Computational sociolinguistics through the lens of social media
David Sánchez
Universitat de les Illes Balears
Machine-assisted Approaches to Editing Chaucer
Michael Pidd
University of Sheffield
A prominent media topic in the UK in the early 2020s is the energy crisis affecting the UK and most of Europe. It brings into a single public debate issues of energy dependency, fair distribution of economic burdens and cost of living, as well as climate change, risk, and sustainability. In this talk, we investigate the public discourse around the energy crisis to identify how these pivotal and contradictory issues are reconciled in this debate, which social actors are involved, and the role they play. We analyse a document corpus retrieved from UK newspapers from January 2014 to March 2023. We apply a variety of natural language processing and data visualisation techniques to identify key topics, novel trends, and critical social actors and their role in the debate, along with the sentiment associated with those actors and topics. We combine automated techniques with manual discourse analysis to explore and validate the insights revealed in this study. The findings verify the utility of these techniques, providing a flexible and scalable pipeline for discourse analysis and critical insights for research on the climate change and energy crisis nexus.
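To give a flavour of such a pipeline, the sketch below extracts topics and document-level sentiment from a toy set of headlines. The talk's actual toolchain is not specified, so NMF topic extraction (scikit-learn) and VADER sentiment (NLTK) serve here as illustrative stand-ins.

```python
# Illustrative stand-in for a topic-and-sentiment pipeline: NMF topics from a
# TF-IDF matrix (scikit-learn) and VADER sentiment (NLTK). The headlines are
# invented; the talk's actual corpus and toolchain are not specified here.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("vader_lexicon", quiet=True)

articles = [
    "Energy bills soar as gas prices hit record highs across Europe.",
    "Government announces windfall tax on oil company profits.",
    "Climate activists urge a faster transition to renewable energy.",
    "Households struggle with the rising cost of living this winter.",
]

# Topic extraction: factorise the TF-IDF matrix into a few latent topics.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(articles)
nmf = NMF(n_components=2, random_state=0).fit(tfidf)
terms = vectorizer.get_feature_names_out()
for i, component in enumerate(nmf.components_):
    top_terms = [terms[j] for j in component.argsort()[::-1][:4]]
    print(f"Topic {i}: {top_terms}")

# Document-level sentiment, which can be aggregated per topic or social actor.
analyzer = SentimentIntensityAnalyzer()
for article in articles:
    print(f"{analyzer.polarity_scores(article)['compound']:+.2f}  {article}")
```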
Traditional methodological approaches to the study of language variation and change are typically based on interviews, which produce data of excellent quality but suffer from two serious limitations. First, data are scarce. This can be addressed with big data sources, which have produced an unprecedented avalanche of content, as in the case of social media corpora.
Second, the particular choice of informants and linguistic features is not completely free of ad hoc assumptions, biases, or prejudices. What is needed in this case is automatic analysis based on computational techniques. A great benefit of leveraging these modern approaches and data would be the integration of variational and social information into NLP systems.
We illustrate this methodology by discussing two works. In the first, we address the issue of cultural regions based on the premise that cultural affiliation can be inferred from the topics that people discuss among themselves. We find the regional hotspots of word usage on Twitter and, using principal component analysis and hierarchical clustering, obtain clear cultural areas in the US and the topics of discussion that define them. Interestingly, we find non-contiguous regional patterns, unlike traditional proposals, partly arising from a divide between rural and urban speech. In our second example, we analyse syntactic variation by mapping departures from standard English in seven thousand administrative areas of England and Wales. Employing income data and a geolocalised microblogging corpus, we find not only that the usage of standard forms depends on people's socioeconomic background, as expected, but, more significantly, that this correlation depends on the mixing of speakers from different socioeconomic classes. Our results suggest that the more different socioeconomic classes mix, the less interdependent the frequency of their departures from standard grammar and their income become.
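As a rough sketch of the first study's clustering step, under the assumption of a region-by-word frequency matrix (here random toy data in place of geolocated Twitter counts): PCA compresses correlated usage into a few dimensions, and hierarchical clustering then groups regions into candidate cultural areas.

```python
# Rough sketch of the clustering step: PCA on a region-by-word frequency
# matrix, then Ward hierarchical clustering of regions. The random matrix is
# a placeholder for real geolocated Twitter word counts.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_regions, n_words = 20, 100
freq = rng.random((n_regions, n_words))  # rows: regions; cols: word frequencies

# PCA compresses correlated word usage into a few "cultural" dimensions.
components = PCA(n_components=5).fit_transform(freq)

# Ward clustering groups regions with similar usage profiles; cutting the
# dendrogram into four clusters yields candidate cultural areas.
Z = linkage(components, method="ward")
labels = fcluster(Z, t=4, criterion="maxclust")
print(labels)  # cluster id per region, ready to be drawn on a map
```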
C21 Editions is a joint UK-Ireland digital humanities project funded by the Arts & Humanities Research Council and the Irish Research Council. The project aims to advance the state of the art in digital scholarly editing by 1) establishing the protocols and best practices for creating scholarly editions of born-digital content such as social media; and 2) exploring machine-assisted approaches to the editing of digital scholarly editions.
In this talk I will report on the second strand of the C21 Editions Project. We explored the use of generative AI and Machine Learning, alongside open linked data, in order to create an online teaching edition of The Pardoner's Prologue & Tale from Geoffrey Chaucer's The Canterbury Tales. The edition features a diplomatic transcription, a modern translation, scholarly commentaries and annotations, an automatic text-to-speech narration of the Middle English text, and teaching activities that are aimed at training students to engage more critically with AI. The results of these approaches will be demonstrated and discussed, and their implications for the future of scholarly editing will be considered.
16.00 to 17.00
Keynote session: The future of Natural Language Processing
Ruslan Mitkov
Lancaster University
Natural Language Processing (NLP) is undergoing dynamic and unprecedented changes. While we have always known that NLP is not a magic technology and has always been far from 100% accurate, the landscape of Language and Translation Technology is changing: first Deep Learning methods and now Large Language Models have taken the world by storm. This talk will seek to shed light on the future of Natural Language Processing.
The keynote will sketch the history of Natural Language Processing and Machine Translation and review the latest advances powered by Deep Learning and Large Language Models. It will then look critically at the employment of LLMs in Natural Language Processing and Machine Translation, reporting on the speaker's recent original research comparing LLMs, Deep Learning, and rule-based approaches for selected NLP tasks and applications. This research seeks to answer questions such as: (i) Are rule-based methods a thing of the past? (ii) Are LLMs always superior to Deep Learning methods? (iii) Where do Deep Learning and LLMs do well, and where do they fall short? (iv) Where is Natural Language Processing heading?
At the end of the presentation, the speaker will emphasise that he is not a clairvoyant but, on the basis of his experience in the field, will attempt to predict the likely future of artificial intelligence as compared to human intelligence, taking language as a testbed.
Registration and certificate
Admission to the event is free until full capacity is reached. The meeting programme is structured into two keynote sessions and three paper panels. Certificates of attendance will be issued to those who attend at least three of these blocks (e.g. one keynote session and two paper panels).
Registration deadline: 17/06/2024
Academic coordinators
Javier Martín Arista
Raquel Vea Escarza
Universidad de La Rioja
ecodigleng@unirioja.es
Transformation Plan
Digital Economy of Language and Artificial Intelligence
Universidad de La Rioja