About

What Observatorio Lázaro is, how it works, and who is behind it.

What is Observatorio Lázaro?

Observatorio Lázaro is a project that automatically analyses and extracts the anglicisms appearing every day in the news published by some twenty Spanish press outlets, including elDiario.es, El País, El Mundo, ABC, La Vanguardia, El Confidencial, 20minutos, Agencia EFE, La Marea, El Economista, Marca, Fotogramas, Rolling Stone, Elle and El Mundo Today.

Every day, Lázaro reads the press, detects unassimilated borrowings (mostly anglicisms), records them in a database and publishes the data on this website, where they can be freely searched, compared and downloaded.

How does Lázaro work?

The core of the project is a machine learning model that detects possible foreign words (mostly anglicisms) in the Spanish-language press. Although the model was trained to extract anglicisms, it occasionally extracts borrowings from other languages too.

Lázaro's anglicism extraction model is a BiLSTM-CRF that uses embeddings trained on bilingual ES-EN text, as well as subword embeddings (BPE embeddings and character embeddings). Technical information about the model is available in this scientific paper. An earlier version of the observatory (live from April 2020 to August 2022) ran on a CRF model; the details of that earlier model can be read in this document.
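Models of this kind treat borrowing detection as sequence labelling: each token receives a tag, and consecutive tagged tokens are grouped into spans. The following is a minimal sketch of that grouping step only, not the observatory's actual code. It assumes BIO-style tags with an illustrative label name (ENG); the function name, the example sentence and its tags are hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (text, label) spans."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def close_span() -> None:
        # Store the span collected so far, if there is one.
        if current_tokens and current_label is not None:
            spans.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span.
            close_span()
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_label:
            # An I- tag with a matching label extends the open span.
            current_tokens.append(token)
        else:
            # "O" (or an I- tag with no matching open span) closes the span;
            # the token itself is not kept.
            close_span()
            current_tokens = []
            current_label = None
    close_span()
    return spans


# Hypothetical example: tags as a tagger might emit them (assumed label: ENG).
tokens = ["El", "streaming", "y", "las", "fake", "news", "dominan", "el", "debate"]
tags = ["O", "B-ENG", "O", "O", "B-ENG", "I-ENG", "O", "O", "O"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

Multiword borrowings such as "fake news" come out as a single span because the second token carries an I- tag that continues the span opened by the first.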

The observatory's code and the training corpus are available on GitHub. The trained, ready-to-use detection model is available through HuggingFace and the Python library pylazaro.

Since extraction is fully automatic, the data may contain errors: words wrongly labelled as anglicisms, or anglicisms that go unnoticed.
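The two kinds of error mentioned above correspond to false positives (words wrongly labelled as anglicisms) and false negatives (anglicisms that go unnoticed), which are commonly summarised as precision and recall. The sketch below illustrates the arithmetic on invented data; the word sets are hypothetical and do not reflect the model's real performance.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a set of predicted borrowings."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented example: what a human annotator marked vs. what a model extracted.
gold = {"streaming", "fake news", "podcast", "look"}
predicted = {"streaming", "fake news", "podcast", "Madrid"}

false_positives = predicted - gold   # wrongly labelled as anglicisms
false_negatives = gold - predicted   # anglicisms that went unnoticed
precision, recall = precision_recall(predicted, gold)

print("False positives:", false_positives)
print("False negatives:", false_negatives)
print("Precision:", precision, "Recall:", recall)
```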

This talk from the 2021 Trabalengua conference (in Spanish) explains the inner workings of the project.

How to cite

If Observatorio Lázaro or its data are used in research, they can be cited as follows:

@misc{observatoriolazaro,
  author    = {{\'A}lvarez Mellado, Elena},
  title     = {Observatorio L{\'a}zaro: observatorio del anglicismo
               en la prensa espa{\~n}ola},
  year      = {2020},
  url       = {https://observatoriolazaro.es},
  note      = {Accessed: 2026-06-23}
}

To cite the detection model, the reference is the ACL 2022 paper:

@inproceedings{alvarez-mellado-lignos-2022-detecting,
  title     = {Detecting Unassimilated Borrowings in {S}panish:
               {A}n Annotated Corpus and Approaches to Modeling},
  author    = {{\'A}lvarez Mellado, Elena and Lignos, Constantine},
  booktitle = {Proceedings of the 60th Annual Meeting of the
               Association for Computational Linguistics
               (Volume 1: Long Papers)},
  year      = {2022},
  publisher = {Association for Computational Linguistics},
  pages     = {3868--3888},
  doi       = {10.18653/v1/2022.acl-long.268}
}

Publications

Álvarez Mellado, E. Lexical borrowing detection as a sequence labeling task: Data, modeling and evaluation methods for anglicism retrieval in Spanish, PhD dissertation, UNED, 2025.
Álvarez Mellado, E., Lignos, C. Detecting Unassimilated Borrowings in Spanish: An Annotated Corpus and Approaches to Modeling, Proceedings of the 60th Annual Meeting of the ACL, 2022.
Álvarez Mellado, E. Extracting English Lexical Borrowings from Spanish Newswire, Proceedings of the Society for Computation in Linguistics: Vol. 4, Art. 41, 2021.
Álvarez Mellado, E. An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines, 4th Workshop on Computational Approaches to Code Switching, 2020.
Álvarez Mellado, E. Lázaro: An Extractor of Emergent Anglicisms in Spanish Newswire, MS thesis, Brandeis University, 2020.

Bot: @lazarobot

The new anglicisms Lázaro finds (those the model has not seen before) are posted daily on Twitter and BlueSky, together with their context of appearance and a link to the news article.
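One simple way to pick out new items is to compare each day's extractions against a set of already known forms. The sketch below is an assumption about how such a filter could look, not the bot's actual implementation: the function name, the case-insensitive comparison and the example words are all hypothetical.

```python
from typing import List, Set


def find_new_borrowings(todays: List[str], seen: Set[str]) -> List[str]:
    """Return today's borrowings that are not yet in `seen`, in order.

    `seen` is updated in place so that repeated forms are reported once.
    Comparison is case-insensitive (an assumption made for this sketch).
    """
    new_items: List[str] = []
    for borrowing in todays:
        key = borrowing.lower()
        if key not in seen:
            seen.add(key)
            new_items.append(borrowing)
    return new_items


# Hypothetical data: previously known forms and one day's extractions.
seen = {"streaming", "podcast"}
todays = ["Streaming", "ghosting", "podcast", "ghosting", "quiet quitting"]
new_items = find_new_borrowings(todays, seen)
print(new_items)
```

Keeping the known forms in a set makes each lookup constant-time on average, which matters little for one day's news but keeps the filter cheap as the list of known borrowings grows.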

What is Lázaro not?

The purpose of the project is to observe, describe and analyse anglicism usage in the Spanish press. Under no circumstances is the goal of the project to shame, point fingers at or criticise the use of anglicisms, or those who use them. Nor is it the purpose of this project to propose alternative translations.

The motivation behind Observatorio Lázaro is not to defend some supposed linguistic purity of Spanish, but to study the phenomenon of lexical borrowing in the press empirically, from a data-driven perspective.

Why Lázaro?

The project's name is a tribute to the Spanish philologist Lázaro Carreter, whose columns on linguistic prescription in the media (and in particular on the use of anglicisms) were very popular in Spain throughout the 1980s and 1990s.

Awards

Adam Kilgarriff Prize, awarded biennially to a linguist under 40 for projects in corpus linguistics, computational linguistics and lexicography.
Archiletras Research Award, granted by Archiletras magazine.
Generation Google Scholarship, awarded by Google.
HDH 2021 Award for best tool or resource, from the Hispanic Digital Humanities association.
Outstanding Corpus Thesis Award 2021 (MS level), from the Institute for Corpus Research at Incheon National University (South Korea).
Karen Spärck Jones 2020 Award for Outstanding Achievement in Natural Language Processing, from Brandeis University (Massachusetts).

In the media

Interview on Un idioma sin fronteras, RNE.
Radiografía del anglicismo en la prensa española, in Archiletras.
Interview on La Tarde, COPE.
20 anglicismos nuevos cada día, a review by Álex Grijelmo in El País.
Julia en la Onda, Onda Cero [minute 1:10:00].
Con la lengua fuera, a podcast by Macarena Gil and Nerea Fernández de Gobeo.
En la punta de la lengua, Cadena SER Burgos.

Research using Observatorio Lázaro

Luján-García, C. & Núñez Nogueroles, E. E. (2024), An Analysis of specialized sports-related Anglicisms: Their use in the European Spanish press nowadays, Revista de Estudos da Linguagem 31 (3), 1071–1115.
Luján-García, C. & Núñez Nogueroles, E. E. (2024), The use of nicknames to refer to Premier League English Football Teams in Spanish digital press, Lengua y Sociedad 23 (2), 535–556.
Luján-García, C. & Núñez Nogueroles, E. E. (2024), On Political dream teams and Financial killers: Sports Anglicisms and Metaphorical Uses in Spanish Digital Press, International Journal of English Studies 24 (1), 77–97.
De Hoyos, J. C. (2023), Anglicismos en la lengua de la economía: entre el préstamo crudo y la adaptación léxica, CLINA 9 (1), 113–134.
Luján-García, C. (2023), Adults only or pets welcome: Use of Anglicisms in the tourist domain in Spanish digital press, Lengua y Habla 27, 267–284.
Luján-García, C. (2023), 'Drink for thought': Anglicismos en el campo de la bebida en la prensa digital española, Borealis 12 (2), 343–360.
Luján-García, C. (2023), Anglicisms in Spanish gastronomy: new words for new eating habits, Sintagma 35, 51–69.
Lillo, A. (2022), Anglicismos coloquiales en la toponimia española: Colloquial Anglicisms in Spanish toponymy, Lebende Sprachen 67 (1), 133–167.
Núñez Nogueroles, E. E. & Luján-García, C. (2022), Percepciones y uso autodeclarado de anglicismos del campo de las TIC por parte de estudiantes universitarios españoles, Miscelánea 66, 41–67.

Credits

Observatorio Lázaro is a project by Elena Álvarez Mellado. The seed of the project was conceived at the BLT Lab (Broadening Linguistic Technologies) at Brandeis University (Massachusetts) under the supervision of Constantine Lignos, and it was developed as a PhD project in the Natural Language Processing and Information Retrieval research group at UNED under the supervision of Julio Gonzalo and Constantine Lignos.