Description
One of the outcomes of the Transformation Plan, through the Digital Economy of Language and Artificial Intelligence Line, has been the establishment of an international teaching and research network called ATLAS. ATLAS (Advanced Technologies for Language Analysis Systems) is a global consortium of researchers and educators dedicated to advancing the frontiers of language technology through collaborative innovation. Our network brings together experts in natural language processing, computational linguistics, and artificial intelligence to tackle the challenges of human-machine communication.
ATLAS fosters knowledge exchange, promotes cutting-edge research, and develops educational frameworks for the next generation of language technology specialists. Through international cooperation, we aim to bridge geographic and linguistic barriers as well as to enhance human-AI interaction across disciplines. ATLAS was established at the University of La Rioja in June 2024 with the objectives of initiating mobility and exchange actions, collaborating in the proposal of attractive educational offerings in the field of computational linguistics related to artificial intelligence, and carrying out actions aimed at obtaining joint funding from international agencies.
One of ATLAS's founding principles is interdisciplinarity. We are convinced that advances in understanding human language, its relationship with other cognitive sciences, and ethical and efficient interaction between humans and machines can only be addressed by interdisciplinary research and teaching teams composed of computing engineers, scientists, and linguists. The other founding principle of ATLAS is a commitment to efficient artificial intelligence: AI that requires only resources manageable by medium-sized organisations such as universities and small and medium enterprises, and that makes the most of data both from well-documented languages and from languages with fewer available treebanks and datasets.
Programme
Monday, July 7
09:00–09:30
Welcome
09:30–10:00
Opening session
10:00–11:00
Plenary lecture. Safeguarding Efficacy in Large Language Models: Evaluating Resistance to Human-Written and Algorithmic Adversarial Prompts
Seun Ajao
Manchester Metropolitan University
This work evaluates Large Language Model (LLM) security by comprehensively assessing model robustness and attack efficiency across four categories of jailbreaks: human-written, AutoDAN, Greedy Coordinate Gradient (GCG), and Tree-of-Attacks-with-Pruning (TAP). Four popular LLMs are evaluated against 200 prompts from each category to calculate attack success rates. Results show that human-written and AutoDAN attacks are the most successful against Phi-2, while Llama-2 demonstrates the greatest robustness. GCG and TAP attacks show limited efficiency against their target model (Llama-2) but demonstrate high transferability to GPT-4 and Phi-2. Areas of weakness are identified for each model: Phi-2 shows the least robustness to socioeconomic harms, information safety, and malicious use prompts; Llama-2 performs poorly with socioeconomic harms and information safety; GPT models show vulnerability to malicious use prompts.
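As a rough illustration of the metric mentioned in this abstract, an attack success rate can be read as the fraction of prompts in a category that succeed against a given model. The sketch below is not the speaker's evaluation code: the record layout `(model, category, succeeded)`, the function name, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict

def attack_success_rates(outcomes):
    """Return {(model, category): fraction of prompts judged successful}.

    `outcomes` is an iterable of (model, category, succeeded) tuples; this
    record layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for model, category, succeeded in outcomes:
        totals[(model, category)] += 1
        if succeeded:
            successes[(model, category)] += 1
    return {key: successes[key] / totals[key] for key in totals}

# Toy, made-up outcomes (not results from the talk).
toy_outcomes = [
    ("model-a", "human-written", True),
    ("model-a", "human-written", False),
    ("model-a", "GCG", False),
    ("model-b", "human-written", False),
]
rates = attack_success_rates(toy_outcomes)
```

How a response is judged "successful" is the hard part in practice and is deliberately left outside this sketch.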
11:00–12:00
Networking coffee for speakers and organisers
12:00–12:30
Reflections on Building and Deploying LLMs in Production
Tony Russell-Rose
Goldsmiths, University of London
This talk explores the use of language models to support academic search and systematic review, focusing on the generation of query suggestions through knowledge-based methods, context-free models, and large language models (LLMs). Drawing on two complementary studies - an offline evaluation using real-world search strategies, and an online user study - we compare the effectiveness and user perceptions of various approaches. We also reflect on the practical challenges of deploying NLP systems in production, and share insights from our ongoing migration from custom-built models to hosted LLM services, with implications for scalability, cost, and development practices.
12:30–13:00
ROBOT-TALK Project for the Recognition of the Robotic Origin of Texts: Methodologies and Results
Ana Fernández Pampillón
Doaa Samy Khalil Shawer
Universidad Complutense de Madrid
The detection of automatically generated texts—commonly referred to as Text Machine Generation (TMG)—has gained increasing attention due to the growing impact of Large Language Models (LLMs) in producing texts of high grammatical and semantic quality. These texts may, in many cases, be indistinguishable from those written by humans (Casal & Kessler, 2023; Uchendu et al., 2023; Jones & Bergen, 2024, among others). While the high quality of such content offers clear benefits, its potential misuse for malicious or criminal ends poses substantial threats to public safety. Documented uses of these texts encompass disinformation campaigns, reputational damage, fraudulent impersonation, and threatening communications (Pavlyshenko, 2022, among others). In such contexts, the availability of reliable methods and tools to distinguish automatically generated texts from those authored by humans becomes essential (Maloyan et al., 2022).
The automatic or semi-automatic detection of texts produced by LLMs remains an unresolved challenge. Consequently, the primary objective of our research is to examine how the methods for identifying machine-generated texts can be improved. This investigation focuses specifically on the Spanish language and is conducted within the framework of the project “ROBOT-TALK: Recognition of the Robotic Origin of Texts. Task Automation and Linguistic Knowledge,” PID2022-140897OB-I00, funded by the Spanish Ministry of Science and Innovation.
The initial hypothesis supporting this research relies on observing significant linguistic differences between texts generated by Large Language Models (LLMs) and those written by humans (Alonso Simón et al., 2023). Additionally, our hypothesis proposes that each LLM possesses a unique writing style, or “idiolect”.
Our participation in the ATLAS network will focus on presenting the results obtained from the analysis of potentially distinctive linguistic features of texts generated by Large Language Models (LLMs), using a specially created corpus, the ROBOT-TALK corpus. It is the first comparable corpus of Spanish texts (including scientific linguistics articles, news reports, and film reviews) authored by humans and by four different LLMs: OpenAI’s ChatGPT-3.5-turbo, OpenAI’s ChatGPT-4, Google’s Gemini, and Mistral AI’s Mixtral-8x7B-Instruct-v0.1.
In addition, we will present the results obtained from the automatic baseline classification of human-generated versus machine-generated texts using machine learning classifiers without any explicit linguistic features, as well as our current lines of research. Finally, we will conclude by sharing and discussing with the attendees our preliminary conclusions regarding the challenges of identifying AI-generated texts.
13:00–14:00
Business meeting
14:00
Networking lunch for speakers and organisers
21:30
Networking dinner for speakers and organisers
Tuesday, July 8
09:30–10:00
Lyrics information processing: The case of Flamenco
David Sánchez Martín
Universidad de las Islas Baleares
Automated analysis of lyrics can be applied to author classification, detection of music styles or categorization based on age, just to mention a few recent examples. On the other hand, computational folkloristics applies computational techniques to folklore with the aim of understanding oral traditions from a quantitative perspective. Here, we discuss a computational approach to Flamenco, a type of music that deeply represents the intangible cultural heritage of Andalusia.
Specifically, we analyze a corpus of 2000 Flamenco lyrics and find that Flamenco styles are characterized by unique lexical features. Our main result is based on a supervised Multinomial Naive Bayes algorithm that allows us to classify the main Flamenco genres employing their lexical variation. Further, we identify the semantic fields that are essential in each genre and propose a metric to quantify the inter-genre distance.
Using this, we shed light on the historical connections among Flamenco styles and the branches in which the different genres are structured.
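For readers unfamiliar with the classifier named in this abstract, the following is a minimal sketch of a Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, using only the Python standard library. It is not the speaker's implementation: the function names, the placeholder labels `genre_a` and `genre_b`, and the toy lyrics are invented for illustration, and a real study would use a much larger corpus and proper preprocessing.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class priors and smoothed word likelihoods, stored as logs."""
    vocab = {word for doc in docs for word in doc.split()}
    word_counts = defaultdict(Counter)   # label -> word frequencies
    class_counts = Counter(labels)       # label -> number of documents
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    model = {}
    for label, n_docs in class_counts.items():
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(docs))
        log_like = {w: math.log((word_counts[label][w] + alpha) / total)
                    for w in vocab}
        model[label] = (log_prior, log_like)
    return model

def predict(model, doc):
    """Return the label with the highest posterior log-probability."""
    scores = {}
    for label, (log_prior, log_like) in model.items():
        # Words never seen in training are ignored.
        scores[label] = log_prior + sum(
            log_like[w] for w in doc.split() if w in log_like)
    return max(scores, key=scores.get)

# Toy lyrics and placeholder genre labels, invented for illustration only.
train_docs = ["pena llanto muerte", "muerte pena cárcel",
              "fiesta baile alegría", "alegría fiesta palmas"]
train_labels = ["genre_a", "genre_a", "genre_b", "genre_b"]
model = train_multinomial_nb(train_docs, train_labels)
predicted = predict(model, "pena muerte")
```

Because the model reduces each genre to a smoothed word distribution, those distributions can also be inspected directly to see which vocabulary separates one genre from another.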
10:00–10:30
Teaching (with) AI in Humanities and Social Sciences
Ondřej Tichý
Charles University Prague
One thing is clear about the role of AI in higher education: it is unavoidable. However, many other aspects remain uncertain. This paper aims to provide illustrative examples, offer several suggestions, and, most importantly, foster a discussion about how and in what contexts AI should be both taught and used in the humanities and social sciences.
The official Recommendations regarding the use of generative artificial intelligence for university educators at Charles University advise educators to “Monitor developments in AI tools and spend some of your time exploring their capabilities. Check out what they can do, how they can benefit your work, and how reliable they are or aren’t... Actively use these tools where appropriate. Encourage students to use AI tools while respecting their varying levels of knowledge and skills.”
These recommendations are, perhaps necessarily, somewhat vague, particularly regarding questions such as: To what extent should teachers and students study the theory behind large language models to truly understand their capabilities and limitations? How should AI be studied and taught? In which areas is the use of AI most beneficial, and where might it pose the greatest challenges?
10:30–11:00
A semantic graph motivated and ethics by design strategy for AI solution development – a case study approach.
Kulvinder Panesar
University of Bradford
2025 is the year of the graph and of deploying ethically engaged and responsible practices in generative AI solutions, amongst other significant milestones in the transformative generative AI space. We take a case study approach, using a live project, to explore both goals. Firstly, knowledge graphs (KGs) can be used for semantic RAG (Retrieval-Augmented Generation), feeding into the creation of generative AI applications whose results have better accuracy. We discuss how KGs can achieve this and how they can be an impactful technology and a critical enabler for Composite AI, a synergy of data-driven AI and symbolic (meaning-driven) AI (Gartner, 2024), and we review a specific knowledge graph solution. Secondly, we explore aspects of ethically engaged and responsible practices in AI solutions, where an ethics-by-design strategy motivated by risk factors, privacy by design, AI ethics, data ethics, regulation and compliance comes into play. We review how an ethical toolkit can help to manage a developing AI solution with aligned tools such as guardrails and humans in the loop. Finally, we will provide key takeaways and recommendations for evolving AI solution types such as AI agents and agentic AI.
11:00–11:30
Michael J. Pidd
University of Sheffield
Rather than presenting the findings of a completed research project, this talk will outline a series of early-stage funding proposals currently in development. The aim is to seek feedback and insights from the audience and, ideally, to identify opportunities for partnership and collaboration as co-investigators. All proposals are being developed by the Digital Humanities Institute at the University of Sheffield and focus on texts, language, and machine learning models. The projects explore the use of AI to support scholarly editing workflows, correct dirty OCR, assist in literary interpretation, and analyse stylistic features in large language models. Potential funders include Horizon MSCA, the Leverhulme Trust, and UKRI’s Arts and Humanities Research Council. All discussion is expected to take place under the Chatham House Rule.
11:30–12:30
Networking coffee for speakers and organisers
12:30–13:00
Corpus Palaeography: Machine Learning, Scribal Profiling and the Dating and Localisation of Manuscripts Containing Old English, c. 800–1200
Mark Faulkner
Trinity College Dublin
Recent innovations in machine learning and digital typesetting offer the scope for a paradigm shift in philological data extraction, analysis and argumentation, where texts are compared not on the basis of generalisation and exemplification, but on the basis of millions of individual datapoints. A Handwritten Text Recognition (HTR) model was trained on c. 800 pages (c. 250,000 words) of Old English, copied between 800 and 1200, to recognise a character inventory of almost 600 letter-forms and marks of punctuation with a character error rate of just 4.15%. Through case studies based on the training data (one of an individual scribe’s handling of s, another of 15,500 instances of y across four centuries) and through dimensionality reduction techniques used to compare scribes on the basis of 800,000 letter-forms, I show that patterns exist which are invisible to more traditional methods. I further demonstrate that the uncorrected output of the model on an unseen manuscript is accurate enough to predict its relative date. I argue that this approach, applied to the c. 1,500 scribal stints, 7.5m words and 28m letters across the 45,000 manuscript pages containing Old English, has the potential to provide an entirely new understanding of the development of writing, of literary history and the history of the English language. The methodology is also readily applicable to other linguistic traditions.
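The abstract does not say how the character error rate was computed. A common definition, assumed here, is the character-level edit distance between the model output and a reference transcription, divided by the length of the reference. The sketch below illustrates that definition; the function names and the sample strings are chosen for illustration only.

```python
def levenshtein(reference, hypothesis):
    """Minimum number of character insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def character_error_rate(reference, hypothesis):
    """Edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)

# Illustrative strings: one wrong letter in a 15-character reference.
cer = character_error_rate("hwæt we gardena", "hwat we gardena")
```

Under this definition, a rate of 4.15% corresponds to roughly four character edits per hundred reference characters.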
13:00–13:30
Large Language Models and Inclusive Democratic Spaces: the case of iDem
Serge Sharoff
University of Leeds
People with intellectual or psychosocial disabilities, as well as migrants, are often excluded from deliberative and participatory political processes and from democratic participation in civic platforms and on the Web, because of the fairly complex language used by policy makers and institutions. Our vision in the iDem project is to improve this situation by developing Artificial Intelligence models to make democratic spaces more inclusive and accessible for all. In this talk I will focus on early steps in identifying complex linguistic phenomena in the specific context of the project in Catalan, Italian and Spanish.
13:30–14:00
LLM-driven critical analysis of affective polarisation
Carlos Periñán Pascual
Universidad Politécnica de Valencia
Affective polarisation has emerged as a key issue in contemporary democratic societies. Unlike ideological polarisation, which focuses on policy disagreements, affective polarisation involves emotional hostility and mistrust towards those holding opposing political views. Affective polarisation has typically been measured through survey instruments (e.g. the feeling thermometer), which have significantly advanced our understanding of the phenomenon. However, these tools present critical limitations, so there is an urgent need for new methodologies that can explore affective polarisation more dynamically. In this context, we can develop an AI-based assistant for LLM-driven critical discourse analysis grounded in a three-layered methodological framework: description (including background, thematic and linguistic analyses), interpretation, and explanation. This methodology is intended to provide a detailed rationale behind the generated analysis so that the user's confidence in the quality of the result can be increased. This talk aims to shed light on the strengths and weaknesses of this qualitative approach to affective polarisation.
14:00
Networking lunch for speakers and organisers
Who is this event intended for?
- Final-year undergraduate students
- Master's students
- PhD students
- Academic and research staff
- Postdoctoral researchers
- Independent scholars and researchers
- Professionals working in related fields
Registration
Registration deadline: July 3
Attendance
Attendance at the event is open and free of charge. Participants are welcome to join in person without prior registration or to follow the live stream freely on the YouTube channel of Universidad de La Rioja.
Certificate of Attendance
Registration is free but must be completed in advance to obtain the certificate of attendance. Only those who have registered and attend at least seven out of the ten sessions, either in person or via synchronous online participation, will be eligible to receive the certificate. Attendance will be monitored for registered participants only.