AI Transcription and Translation in Journalism

The second briefing from the AI and Journalism Research Working Group finds that while journalists are using AI transcription and translation systems, accuracy and accessibility vary, making continued human oversight essential.


TL;DR

Journalists are actively using AI tools for both transcription and translation, but they experience varying levels of difficulty accessing the tools and varying accuracy of the outputs, due to geography, resources and other factors.

AI transcription and translation systems can save time, compared with a fully manual process. Still, human review of AI transcription and translation is critical for ensuring accuracy and identifying potential errors, missing information and language biases.

AI tools for transcription and translation are rapidly improving, but significant gaps remain for “low-resource” languages (i.e., languages with relatively little textual data online that can be used to train AI models).

Training data can produce inherent biases in AI translation and transcription tools, which can lead to inaccurate outputs for journalistic content.

AI tools are “epistemologically indifferent” to truth, meaning they are stochastic models that generate words based on probabilities and do not have a way to determine truth.


Introduction

CNTI’s AI and Journalism Research Working Group looked at 55 research studies and other articles from computer science, social science and linguistics disciplines to better understand how AI is shaping transcription and translation and what these developments mean for journalism. These studies include data representing a range of geographic contexts and languages.

About

This is the second in a series of reports from the AI and Journalism Research Working Group convened by the Center for News, Technology & Innovation (CNTI). The working group currently consists of 18 cross-industry members from around the world, bringing research, journalism and technology expertise to the discussions.

The goal of the working group is to offer succinct summaries of global research in specific topics at the intersection of journalism and AI. Each quarter, the working group will synthesize the state of research across two to three topics for journalism practitioners, researchers and industry leaders around the world, focusing on actionable recommendations for journalism — not other fields that are concerned with AI.

In each report, we lay out the general findings of the research to date, suggested considerations and/or actions for practitioners and areas where more or new research is needed. This report was prepared by the research and professional staff of CNTI in partnership with several external contributors who collectively authored this briefing. If you have ideas or research findings that are important for CNTI and the working group to include, please email them to info@cnti.org.

What do we mean by “AI”?

This report uses the OECD definition: “An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.”

Wherever possible, we try to use specific terms rather than “AI” to avoid conflation or confusion. Journalism has been adopting forms of automation for more than 50 years,1 but widespread use of the term “AI” is more recent — and may include both newer technologies and those that have been in use for quite some time.

Findings

The research suggests:

  • Journalists are actively using AI tools for both transcription and translation, but they experience varying levels of difficulty accessing the tools and varying accuracy of the outputs, due to geography, resources and other factors.2
  • AI transcription and translation systems can save time, compared with a fully manual process. Still, human review of AI transcription and translation is critical for ensuring accuracy and identifying potential errors, missing information and language biases.3 The most promising workflows make it easy for humans to review,4 and research suggests human review remains necessary for several reasons, especially in public-facing contexts.
  • AI tools for transcription and translation are rapidly improving, but significant gaps remain for “low-resource” languages (i.e., languages with relatively little textual data online that can be used to train AI models).5 Most of the languages spoken today are considered “low-resource” because there is not sufficient content available online, including languages spoken by tens or even hundreds of millions of people. Even among English speakers, only a limited variety of accents and dialects are transcribed mostly correctly by AI-mediated communication software.6
  • Training data can produce inherent biases in AI translation and transcription tools,7 which can lead to inaccurate outputs for journalistic content.
  • AI tools are “epistemologically indifferent”8 to truth, meaning they are stochastic models that generate words based on probabilities and do not have a way to determine truth. This is one reason many existing tools vary in the quality of their outputs for transcription and translation.9
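The “epistemologically indifferent” point can be made concrete with a toy sketch: a language model chooses each next word by sampling from a probability distribution, so a fluent but false output is always possible. The context, vocabulary and probabilities below are invented purely for illustration; a real model estimates such distributions from vast training data.

```python
import random

# Toy next-word probability table (invented numbers, purely illustrative).
NEXT_WORD_PROBS = {
    ("the", "minister"): {"said": 0.55, "announced": 0.30, "denied": 0.15},
}

def sample_next_word(context, rng=None):
    """Sample the next word weighted by probability: likely, not verified."""
    rng = rng or random.Random()
    dist = NEXT_WORD_PROBS[context]
    words = list(dist)
    weights = [dist[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

word = sample_next_word(("the", "minister"))
```

Nothing in this procedure consults the world: “said,” “announced” and “denied” are all plausible continuations, and the model has no mechanism to pick the true one.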

Artificial intelligence (AI) systems are increasingly being used for transcription — the process of converting audio to written form — and translation — the process of converting content in a source language to a target language. Journalist-facing transcription tools like Otter and Trint launched nearly ten years ago,10 and automated translation has been available to everyday users — at least for a limited set of languages — since the release of Google Translate in 2006.11 Advances in AI systems mean that Google Translate and similar programs continue to improve rapidly.12 However, journalists around the world do not uniformly experience the benefits of AI technologies in assisting with transcription and/or translation. Inconsistent access to technology and varying availability of high-quality and verified training data remain major challenges.

Overall, AI models are improving. However, transcription tools remain most accurate for a relatively narrow range of standard American English dialects and accents,13 and translation tools remain most accurate for only a few language pairs.14 There are significant gaps in performance for “low-resource” languages (i.e., languages with relatively little textual data online that can be used to train AI models), as well as concerns about accuracy in those languages. It should be noted that “low-resource” languages include a number of languages spoken by hundreds of millions of people. In fact, English accounts for about 50% of online content; no other language accounts for more than 6%.15 Thousands of languages make only a minuscule imprint on the internet.16 Some of the ongoing efforts to close the divide between “low-resource” and “high-resource” languages focus on creating and improving training data. For example, Nigerian start-up Goloka is working with Meta to collect data in five Nigerian languages17 as part of a larger Meta-UNESCO partnership.

AI translation and transcription are rapidly developing areas of research. Nearly all the research articles cited in this briefing are from the last five years, and machine translation and transcription are steadily improving. The working group expects to see continued progress in building AI tools for these uses, especially for languages that have not received as much attention. However, there is also evidence that unverified AI translations are creating misinformation on sites like Wikipedia, and those low-quality translations are being used to train the next generation of AI translation tools, leading to worse performance.18

Journalism Use Cases

Journalists regularly use AI tools for transcription and translation to assist in the production of news content, generally with human oversight over the process.19 Journalists also use transcription tools, such as the Houston Chronicle’s Meeting Monitor, to share summaries of government meetings20 with the public. Large-scale automated transcription and translation tools, including A European Perspective and Dubawa, are helping journalists report on topics that would otherwise be difficult and time-consuming to monitor. A European Perspective consists of 10 broadcasters across nine European countries that exchange content using AI transcription and translation.21 The automatic transcription of audio-visual content and translation between languages encourage greater coverage of European news topics, while editorial oversight corrects cultural or linguistic details in outputs. Dubawa, an AI fact-checking system in Ghana and Nigeria, was specifically trained on local dialects and accents. The tool transcribes radio broadcasts and checks for mis- and disinformation in several local languages.22 In a similar vein, Paraguayan news outlet El Surti is building a community-based Guaraní language dataset and AI tools.23

The degree to which AI systems have been implemented for transcription and translation depends on newsroom resources and languages in use. Through a series of semi-structured interviews in South Africa, one study finds that AI tools are mostly being integrated in larger newsrooms and used to aid in public-facing translation, particularly at public media outlets with mandates to publish in multiple official languages.24 Still, journalists — especially those in the Global South — are concerned about the utility of AI tools for transcription and translation, given reported challenges with accents and lower accuracy for local languages.25

Newsrooms find that using AI tools for translation saves time but can result in inaccuracies, necessitating human review, especially for audience-facing content. One study found that AI translations of international news in Tanzania were mostly accurate, but about 13% of sentences included mistranslations, minor ambiguities or inaccuracies that missed cultural details, such as incorrectly translating the English phrase “street food” word-for-word as “food of the road” in Kiswahili rather than using an idiomatic phrase.26 Similarly, a 2023 study benchmarking AI transcription and translation tools for journalists found that they can save considerable time even with human review, but that translations into English perform better than translations into other languages.27 In general, hybrid translation, in which machine translations are reviewed by human experts, is a promising approach.28

Translation and transcription tools can also be used to personalize news content for different audiences. For example, publishers can add widgets to their websites or apps that allow audiences to access an automatically generated transcript of video or audio content, or to access all content in the language of their choice. Publishers are increasingly interested in providing these features.29 However, publisher interest currently far outpaces audience interest: according to the 2025 Reuters Digital News Report, 65% of publishers were actively exploring AI-translated content, but only 24% of audiences said they were interested in using AI translation.30 Similarly, 75% of publishers were exploring making audio available in text format (and vice versa), but only 15% of audiences expressed an interest. The reasons for these gaps are beyond the scope of this report, but researchers have suggested they may be linked to a lack of awareness of what the features might look like, or to a broader audience distrust of AI in journalism.31 Even so, audiences are more comfortable with AI translation than with many other newsroom uses of AI.32

What Level of Accuracy is Good Enough?

One question the research has not yet addressed is what level of accuracy is good enough. For example, is it appropriate to use AI tools once translation and transcription reach a certain level of accuracy? Even a tool with 95% accuracy may still miss crucial cultural and language nuances. Thus, human review and revision are necessary to ensure accuracy and appropriateness for most public-facing content. Future research examining the value of different strategies for improving accuracy and appropriateness, such as diversifying sources of training data, adding metadata, or building language-specific models or algorithms, can help determine how to address accuracy standards. However, research can only inform what is, at its core, a value judgment.
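Questions like “is 95% accuracy good enough?” presuppose a measurement. For transcription, accuracy is commonly reported as word error rate (WER), the edit distance between reference and hypothesis divided by the reference length. The sketch below is a minimal illustration (the sample sentences are invented); it shows how a single substituted word, which could change what a quote says, registers as a “small” error.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: (substitutions + insertions + deletions) / reference word count,
    computed with a standard edit-distance dynamic program."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

wer = word_error_rate("the council approved the budget",
                      "the council approved a budget")
# one substituted word out of five reference words -> 0.2
```

A tool could report a low average WER while concentrating its errors exactly where they matter most, which is one reason a single accuracy number cannot settle the value judgment.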

Technical Evaluations of AI Transcription and Translation

Models for these types of tasks are rapidly improving,33 and translations between certain language pairs — like Spanish and English — generally perform well.34 Researchers find that while AI tools cannot currently handle cultural details and ambiguity as well as human experts can, they are promising in a number of other contexts, such as (1) translations of medical terms from English to German35 and (2) translations of legal documents between Arabic and English.36 Researchers also find that AI translations from Indonesian to English often rival those of students in translation education programs when it comes to implementing techniques like paraphrasing and structural transposition.37 Meanwhile, AI transcription is being used to increase coverage of government meetings.38 Summarizing these meetings may be a particularly fruitful use of AI transcription because participation follows a consistent structure and because figurative language and wordplay are rare. Progress in this field continues as research and development of novel techniques to build training datasets advances and as the development of specific translation and transcription models receives more attention.39

While AI transcription and translation technologies are improving, recent research also highlights limitations and shortcomings of these AI tools40 — including those particularly relevant to professional fields like journalism.41 These limitations include the tools’ (1) inability to fully handle language ambiguity and cultural nuance, (2) struggle to perform tasks at the level of human experts and (3) biases in outputs based on personal characteristics and attributes present in speech and text data. Evaluating the quality of these tools is challenging in itself: some metrics are overly simplistic, while more holistic methods for evaluating quality or accuracy are opaque and hard to interpret.42
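The evaluation difficulty noted above stems partly from how common metrics work. BLEU-style metrics score surface overlap of n-grams between a candidate translation and a reference; the minimal sketch below (sentences invented for illustration) shows how a translation with a meaning-reversing error can still score well on such a metric.

```python
from collections import Counter

def ngram_precision(reference: str, candidate: str, n: int = 1) -> float:
    """Clipped n-gram precision, the building block of BLEU: the fraction
    of candidate n-grams that also occur in the reference."""
    def ngrams(words):
        return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    ref_counts = ngrams(reference.split())
    cand_counts = ngrams(candidate.split())
    overlap = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
    total = sum(cand_counts.values())
    return overlap / total if total else 0.0

# A gender error that changes the meaning still scores 0.75 on unigrams.
score = ngram_precision("she is a doctor", "he is a doctor")
```

Word overlap is easy to compute and interpret but blind to meaning; the more holistic learned metrics that address this blindness are, as noted above, harder to interpret.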

AI translation tends to focus on words rather than meaning, but languages have a great deal of typological variation and do not necessarily have parallel sentence structure. Moreover, word-level translation often focuses overly on referential meaning (i.e., what something is about) and lacks attention to indexical meaning (i.e., the social functions of language). Because languages do not align one-to-one and words carry different degrees of formality and emotional meaning, AI tools may choose a word that misrepresents the perspective of the speaker.43 For example, professional speech is less formal in English than in Korean44 or Japanese,45 and a more literal translation into those two languages will often be socially inappropriate. The outputs these systems produce also change depending on how the translation is described through prompting, such as using specific requests to retain key themes from the source language versus merely asking for a translation into a given language.46

There is also evidence that AI translation models produce results with gender biases47 (though these biases are diminishing as models improve) by consistently assigning gender to professions (e.g., assuming doctors are men and nurses are women).48 A review of 133 studies finds that much research on this topic treats gender bias as a purely technical or linguistic problem, rather than examining its social dimensions; these authors also note that most of the studies used machines rather than people to evaluate bias.49 Given the social nature of bias, the authors raise concerns about this evaluation method. In practice, these studies suggest that journalistic content that is translated without careful review may inadvertently produce outputs that include biased pronouns, occupations or perspectives stemming from the AI tool’s training data.

AI transcription tools carry their own limitations, such as a tendency to add content that was never said by the source. This type of error appears more commonly when transcribing speech with longer gaps between words and phrases.50 Strikingly, these tools also show critical deficiencies when transcribing audio from people who speak any form of English outside a fairly narrowly defined set of standard American accents, including World Englishes51 and African American Vernacular English.52

Overall, a recurring theme in the literature reviewed by the working group was that AI tools for transcription and translation of “low-resource” languages are severely lacking.53 There are significant gaps between human-written and LLM outputs in languages other than English,54 with journalists in the Global South reporting less confidence in these tools than those in the Global North.55 Among “low-resource” languages, machine translation for signed languages lags even further behind spoken ones.56 The most used signed language data sets are small and frequently rely on interpreted data, which is likely to include considerable interference from spoken languages.57 It is not yet clear how best to represent signed languages computationally, nor how best to evaluate translation between signed and spoken languages.58

Although there are many challenges for AI transcription and translation, greater attention is being focused on “low-resource” languages than before. Two prominent examples, Masakhane and Dataphyte, seek to improve natural language processing (NLP) research across Africa. Yet more needs to be done, including designing tools for local settings — particularly in locations that have thus far received less attention from technology companies and AI developers.

Global perspectives

Which languages are in common use varies from country to country, as does the pervasiveness of multilingualism.

Working group members Joshua Olufemi and Oluseyi Olufemi share their perspective:

“The discussion in this edition centres around three critical issues. First, the limits to the accuracy of existing LLM applications in high-resource languages such as English and Mandarin. The second is the implication of demographic and cultural contexts — such as accent, parlance, and nuances — that determine the output of the AI tools. The third relates to local initiatives for innovation and access to data for training AI tools around transcription and translation of low-resource languages.

“In any case, it is important to expand research and practice beyond just the demand-side effects of journalism’s use of transcription and translation tools. This includes examining supply-side resources, such as Indigenous language content, particularly in broadcast media. In Nigeria, more than 20 Indigenous languages are used in broadcast journalism, representing valuable resources for training AI in low-resource media. Additionally, these languages offer opportunities for media innovation in contexts with limited resources.

“There is great potential to support both the technological development of AI and media’s practical multilingual reality. What is needed now is collaboration across the board — including linguists, media and communication practitioners, AI technologists and development policy actors.”

Where More Research Would Be Helpful

  • Day-to-day use: The research does not yet include a deep understanding of where and when journalists are using AI tools for transcription and translation, nor of where and when they would like to use them once they feel adequate tools are available. What specific issues are journalists finding? How might these issues be addressed?
  • Good-enough accuracy: What level of accuracy is good enough for journalistic content?59 For example, is it acceptable if AI translation tools achieve 95% accuracy when handling text between two languages? The incorrect 5% may provide essential cultural context for audiences in the target language. Research can help news organizations and journalism providers decide what they are comfortable with, as current tools are unable to achieve 100% accuracy in every context. Journalism-specific benchmarks for assessing AI transcription and translation tools are also worth developing to address industry-specific needs.
  • Languages studied: The research on bias in machine translation has been limited to relatively few languages — with a bias toward written texts — and has focused primarily on translation into English.60 It also primarily focuses on gender bias in isolation from other social identities and contexts. Expanding research in this area would help news workers make informed decisions about when AI translation is appropriate and when the risks are too high.
  • Downstream effects: There is little research about the downstream effects that bias in the tools might have on journalism. For example, if journalists are working under time pressure, are they less likely to interview sources whose voice automated tools do not transcribe as well? Are journalists identifying and editing gender bias in translation tools, or is it impacting their reporting, their audiences’ understanding or both?
  • Validated data in more languages: If the goal is to use large language models (LLMs) for transcription and translation, considerable attention and effort needs to be placed on building robust LLM training data in non-English and low-resource languages.61 We need more high-quality translation training data62 that has been validated in both the source and target languages. Further research should examine how to do this most effectively and efficiently to create inclusive AI models.
  • Third-party tools: Larger news organizations are developing in-house AI models, particularly adaptations of smaller language models that can be run entirely locally.63 Meanwhile, many smaller newsrooms are relying on pre-existing models from technology developers. More research needs to consider the potential impacts of relying on third-party tools,64 and explore how smaller newsrooms can (1) adapt (fine-tune) existing models to better fit their needs and/or (2) build custom AI tools that are financially viable for them.

Current working group members

A list of current working group members and their affiliations is shown here:

Jaemark Tordecilla
Independent Media Advisor, Philippines

Akintunde Babatunde
Executive Director, Centre for Journalism Innovation and Development

Claudia Báez 
Associate Consultant, Fathm

Jay Barchas-Lichtenstein
Senior Research Manager, Center for News, Technology & Innovation

Madhav Chinnappa
Independent Media Consultant

Utsav Gandhi
PhD Student, University of Illinois Chicago

Samuel Jens
Former Associate Researcher, Center for News, Technology & Innovation

Amy Mitchell
Executive Director, Center for News, Technology & Innovation

Chris Moran 
Head of Editorial Innovation, Guardian News & Media

Sophie Morosoli
Postdoctoral Researcher at the AI, Media & Democracy Lab, University of Amsterdam

Gary Mundy
Director Research, Policy and Impact, Thomson Foundation

Oluwapelumi Oginni
Project Manager, AI Initiatives, Centre for Journalism Innovation and Development

Joshua Olufemi
Executive Director, Dataphyte Foundation

Oluseyi Olufemi
Nigeria Country Director, Dataphyte

Esteban Ponce de León
Resident Fellow, Digital Forensic Research Lab (DFRLab) at the Atlantic Council

Amy Ross Arguedas
Research Fellow at the Reuters Institute for the Study of Journalism

Zara Schroeder
Researcher, Research ICT Africa

Felix M. Simon
Research Fellow in AI and News, Reuters Institute for the Study of Journalism & Research Associate, Oxford Internet Institute, University of Oxford

Scott Timcke
Senior Research Associate, Research ICT Africa

References

Alonso Jiménez, E., & Rosado, J. A. (2024). Un análisis del framing de noticias electorales generadas y traducidas mediante inteligencia artificial generativa (ChatGPT-3). Revista Científica de Información y Comunicación, 21, 303–333. https://doi.org/10.12795/IC.2024.I21.14

Ananny, M., & Pearce, M. (2025, May 12). How We’re Using AI. Columbia Journalism Review. https://www.cjr.org/feature/how-were-using-ai-tech-gina-chua-nicholas-thompson-emilia-david-zach-seward-millie-tran.php

Asi, N., Fauzan, A., Nugraha, R. F., Binti, J. A. Y. P., & Vanesa, N. (2024). Culturally distinctive features in journalistic text: A case study on students’ vs. AI-generated translations. Yavana Bhāshā: Journal of English Language Education, 7(1), 54–67.

Beckett, C., & Yaseen, M. (2023). Generating Change: A global survey of what news organisations are doing with AI. The London School of Economics and Political Science. https://www.journalismai.info/research/2023-generating-change

Brandom, R. (2023, June 7). What languages dominate the internet? Rest of World. https://restofworld.org/2023/internet-most-used-languages/

Breaking Language Barriers with AI: Maximizing Accuracy and Efficiency with Machine Translation Technology. (2023). Computer Graphics World, 46(3), 24–27.

Canavilhas, J. (2022). Artificial intelligence applied to journalism: A case study of the “A European Perspective” (UER). Revista Latina de Comunicación Social, 80, 1–16. https://doi.org/10.4185/RLCS-2022-1534

Caswell, I., & Liang, B. (2020, June 8). Recent Advances in Google Translate. Google Research. https://research.google/blog/recent-advances-in-google-translate/

Chan, M. P. Y., Choe, J., Li, A., Chen, Y., Gao, X., & Holliday, N. (2022). Training and typological bias in ASR performance for world Englishes. Interspeech 2022, 1273–1277. https://doi.org/10.21437/Interspeech.2022-10869

Chang, K., Chou, Y.-H., Shi, J., Chen, H.-M., Holliday, N., Scharenborg, O., & Mortensen, D. R. (2024). Self-supervised Speech Representations Still Struggle with African American Vernacular English (No. arXiv:2408.14262). arXiv. https://doi.org/10.48550/arXiv.2408.14262

Court, S., & Elsner, M. (2024). Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem. arXiv. https://doi.org/10.48550/ARXIV.2406.15625

De Coster, M., Shterionov, D., Van Herreweghe, M., & Dambre, J. (2024). Machine translation from signed to spoken languages: State of the art and challenges. Universal Access in the Information Society, 23(3), 1305–1331. https://doi.org/10.1007/s10209-023-00992-1

Dubois, D. J., Holliday, N., Waddell, K., & Choffnes, D. (2024). Fair or Fare? Understanding Automated Transcription Error Bias in Social Media and Videoconferencing Platforms. Proceedings of the International AAAI Conference on Web and Social Media, 18(1), 367–380. https://doi.org/10.1609/icwsm.v18i1.31320

Fredrikzon, J. (2025). Rethinking Error: “Hallucinations” and Epistemological Indifference. Critical AI, 3(1). https://doi.org/10.1215/2834703X-11700255

Frey, C. B., & Llanos-Paredes, P. (2025, March 22). Lost in translation: AI’s impact on translators and foreign language skills. CEPR. https://cepr.org/voxeu/columns/lost-translation-ais-impact-translators-and-foreign-language-skills

Ghosh, S., & Caliskan, A. (2023). ChatGPT Perpetuates Gender Bias in Machine Translation and Ignores Non-Gendered Pronouns: Findings across Bengali and Five other Low-Resource Languages. https://doi.org/10.48550/ARXIV.2305.10510

Gondwe, G. (2025). AI in African Newsrooms: Evaluating Translation Accuracy, Reliability, and Cultural Sensitivity in Tanzanian Media. Journalism Practice, 0(0), 1–20. https://doi.org/10.1080/17512786.2025.2507091

Guo, Y., Conia, S., Zhou, Z., Li, M., Potdar, S., & Xiao, H. (2025). Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs (No. arXiv:2410.15956). arXiv. https://doi.org/10.48550/arXiv.2410.15956

Hagar, N., Cai, M., & Gilbert, J. (2025, September 24). Tiny Tools: A Framework for Human-Centered Technology in Journalism. Generative AI in the Newsroom. https://generative-ai-newsroom.com/tiny-tools-a-framework-for-human-centered-technology-in-journalism-e2176dd66cbc

Howcroft, D. M., & Gkatzia, D. (2022). Most NLG is Low-Resource: Here’s what we can do about it. Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), 336–350. https://doi.org/10.18653/v1/2022.gem-1.29

Jarnow, J. (2017, April 26). Transcribing Audio Sucks—So Make the Machines Do It. Wired. https://www.wired.com/2017/04/trint-multi-voice-transcription/

Judah, J. (2025, September 25). How AI and Wikipedia have sent vulnerable languages into a doom spiral. MIT Technology Review. https://www.technologyreview.com/2025/09/25/1124005/ai-wikipedia-vulnerable-languages-doom-spiral/

Kahn, G. (2025, May 27). These pioneers are working to keep their countries’ languages alive in the age of AI news. Reuters Institute for the Study of Journalism. https://reutersinstitute.politics.ox.ac.uk/news/these-pioneers-are-working-keep-their-countries-languages-alive-age-ai-news

Kocmi, T., Avramidis, E., Bawden, R., Bojar, O., Dranch, K., Dvorkovich, A., Dukanov, S., Fedorova, N., Fishel, M., Freitag, M., Gowda, T., Grundkiewicz, R., Haddow, B., Karpinska, M., Koehn, P., Lakougna, H., Lundin, J., Murray, K., Nagata, M., … Zouhar, V. (2025). Preliminary Ranking of WMT25 General Machine Translation Systems. arXiv. https://doi.org/10.48550/ARXIV.2508.14909

Kocmi, T., Avramidis, E., Bawden, R., Bojar, O., Dvorkovich, A., Federmann, C., Fishel, M., Freitag, M., Gowda, T., Grundkiewicz, R., Haddow, B., Karpinska, M., Koehn, P., Marie, B., Monz, C., Murray, K., Nagata, M., Popel, M., Popović, M., … Zouhar, V. (2024). Findings of the WMT24 General Machine Translation Shared Task: The LLM Era Is Here but MT Is Not Solved Yet. Proceedings of the Ninth Conference on Machine Translation, 1–46. https://doi.org/10.18653/v1/2024.wmt-1.1

Koenecke, A., Choi, A. S. G., Mei, K. X., Schellmann, H., & Sloane, M. (2024). Careless Whisper: Speech-to-Text Hallucination Harms. The 2024 ACM Conference on Fairness, Accountability, and Transparency, 1672–1681. https://doi.org/10.1145/3630106.3658996

Langer, U. (2025, August 6). How Hearst’s DevHub is Building AI Tools That Work for Local News [Substack newsletter]. News Machines. https://newsmachines.substack.com/p/hearst-dev-hub-ai-tools-that-work

Lee, T. K. (2024). Artificial intelligence and posthumanist translation: ChatGPT versus the translator. Applied Linguistics Review, 15(6), 2351–2372. https://doi.org/10.1515/applirev-2023-0122

Leiter, C., Lertvittayakumjorn, P., Fomicheva, M., Zhao, W., Gao, Y., & Eger, S. (2024). Towards Explainable Evaluation Metrics for Machine Translation. Journal of Machine Learning Research, 25.

Levit, M., Huang, Y., Chang, S., & Gong, Y. (2017). Don’t Count on ASR to Transcribe for You: Breaking Bias with Two Crowds. Interspeech 2017, 3941–3945. https://doi.org/10.21437/Interspeech.2017-164

Mari, W. (2024). The Pre-History of News-Industry Discourse Around Artificial Intelligence. Emerging Media, 2(3), 499–522. https://doi.org/10.1177/27523543241279577

McKean, E., & Fitzgerald, W. (2024). The ROI of AI in lexicography. Lexicography, 11(1), 7–27. https://doi.org/10.1558/lexi.27569

Moghe, N., Fazla, A., Amrhein, C., Kocmi, T., Steedman, M., Birch, A., Sennrich, R., & Guillou, L. (2025). Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets. Computational Linguistics, 51(1), 73–137. https://doi.org/10.1162/coli_a_00537

Moneus, A. M., & Sahari, Y. (2024). Artificial intelligence and human translation: A contrastive study based on legal texts. Heliyon, 10(6), e28106. https://doi.org/10.1016/j.heliyon.2024.e28106

Munoriyarwa, A., Chiumbu, S., & Motsaathebe, G. (2023). Artificial Intelligence Practices in Everyday News Production: The Case of South Africa’s Mainstream Newsrooms. Journalism Practice, 17(7), 1374–1392. https://doi.org/10.1080/17512786.2021.1984976

Newman, N., & Cherubini, F. (2025). Journalism, media, and technology trends and predictions 2025. Reuters Institute for the Study of Journalism. https://reutersinstitute.politics.ox.ac.uk/journalism-media-and-technology-trends-and-predictions-2025

Noll, R., Berger, A., Kieu, D., Mueller, T., O. Bohmann, F., Müller, A., Holtz, S., Stoffers, P., Hoehl, S., Guengoeze, O., Eckardt, J.-N., Storf, H., & Schaaf, J. (2025). Assessing GPT and DeepL for terminology translation in the medical domain: A comparative study on the human phenotype ontology. BMC Medical Informatics and Decision Making, 25(1), 237. https://doi.org/10.1186/s12911-025-03075-8

Novytska, O., Romanchuk, H., Vorobets, O., Zhornokui, U., Slyvka, L., & Bohdan, V. (2025). Translation of Subtitles: Neurolinguistic and Cognitive Aspects. BRAIN. Broad Research in Artificial Intelligence and Neuroscience, 16(1), 229–242. https://doi.org/10.70594/brain/16.1/17

Ohumu, S. (2025, May 6). Case Study: Dubawa Audio. You and AI. https://www.youandai.org/p/case-study-dubawa-audio

Ojewale, V., Raji, I. D., & Venkatasubramanian, S. (2025). Multi-lingual Functional Evaluation for Large Language Models (No. arXiv:2506.20793). arXiv. https://doi.org/10.48550/arXiv.2506.20793

Ojo, J., Ogundepo, O., Oladipo, A., Ogueji, K., Lin, J., Stenetorp, P., & Adelani, D. I. (2025). AfroBench: How Good are Large Language Models on African Languages? Findings of the Association for Computational Linguistics: ACL 2025, 19048–19095. https://doi.org/10.18653/v1/2025.findings-acl.976

Okolo, C. T., & Tano, M. (2024, December 12). Closing the gap: A call for more inclusive language technologies. Brookings. https://www.brookings.edu/articles/closing-the-gap-a-call-for-more-inclusive-language-technologies/

Orimemi, V. (2025, April 23). Goloka Analytics partners Meta to include Nigerian languages in AI. TheCable. https://www.thecable.ng/goloka-analytics-partners-meta-to-include-nigerian-languages-in-ai/

Pava, J. N., Meinhardt, C., Zaman, H. B. U., Friedman, T., Truong, S. T., Zhang, D., Cryst, E., Marivate, V., & Koyejo, S. (2025). Mind the (Language) Gap: Mapping the Challenges of LLM Development in Low-Resource Language Contexts. Stanford University Human-Centered Artificial Intelligence. https://hai.stanford.edu/policy/mind-the-language-gap-mapping-the-challenges-of-llm-development-in-low-resource-language-contexts

Prates, M. O. R., Avelar, P. H., & Lamb, L. C. (2020). Assessing gender bias in machine translation: A case study with Google Translate. Neural Computing and Applications, 32(10), 6363–6381. https://doi.org/10.1007/s00521-019-04144-6

Qingliang, R. (2024). Development of Translator Studies in the Era of Artificial Intelligence: Opportunities, Challenges and the Road Ahead. Pakistan Journal of Life and Social Sciences (PJLSS), 22(2). https://doi.org/10.57239/PJLSS-2024-22.2.001720

Ross Arguedas, A. (2025). How audiences think about news personalisation in the AI era. Reuters Institute for the Study of Journalism. https://reutersinstitute.politics.ox.ac.uk/digital-news-report/2025/how-audiences-think-about-news-personalisation-ai-era

Savoldi, B., Bastings, J., Bentivogli, L., & Vanmassenhove, E. (2025). A decade of gender bias in machine translation. Patterns, 6(6), 101257. https://doi.org/10.1016/j.patter.2025.101257

Schellmann, H. (2025, August 19). I Tested How Well AI Tools Work for Journalism. Columbia Journalism Review. https://www.cjr.org/analysis/i-tested-how-well-ai-tools-work-for-journalism.php

Shahmerdanova, R. (2025). Artificial Intelligence in Translation: Challenges and Opportunities. Acta Globalis Humanitatis et Linguarum, 2(1), 62–70. https://doi.org/10.69760/aghel.02500108

Simon, F. M. (2025). Rationalisation of the news: How AI reshapes and retools the gatekeeping processes of news organisations in the United Kingdom, United States and Germany. New Media & Society. https://doi.org/10.1177/14614448251336423

Simon, F. M., & Isaza-Ibarra, L. F. (2023). AI in the news: Reshaping the information ecosystem? https://ora.ox.ac.uk/objects/uuid:9947240c-06d3-42c2-9a23-57ff4559b63c

Simon, F. M., Nielsen, R. K., & Fletcher, R. (2025). Generative AI and news report 2025: How people think about AI’s role in journalism and society. Reuters Institute for the Study of Journalism. https://doi.org/10.60625/RISJ-5BJV-YT69

Song, Y. (2020). Ethics of journalistic translation and its implications for machine translation: A case study in the South Korean context. Babel. Revue Internationale de La Traduction / International Journal of Translation, 66(4–5), 829–846. https://doi.org/10.1075/babel.00188.son

Spencer, C. (2025, November 4). Inside the New Multilingual Newsrooms using GenAI for Translation. Generative AI in the Newsroom. https://generative-ai-newsroom.com/inside-the-new-multilingual-newsrooms-using-genai-for-translation-4c3b17269811 

Tokalac, S. S. (2023, November 28). A translation quality assessment by journalists for journalists. BBC News Labs. https://www.bbc.co.uk/rdnewslabs/news/multilingual-assessment

Ullmann, S. (2022). Gender Bias in Machine Translation Systems. In A. Hanemaayer (Ed.), Artificial Intelligence and Its Discontents (pp. 123–144). Springer International Publishing. https://doi.org/10.1007/978-3-030-88615-8_7

Valdez Sanabria, A., & Auyanet, S. (2025, July 17). Guarani AI: When building language tech means building community. JournalismAI. https://www.journalismai.info/blog/5fcm6ayykhqq7564kbvt9nw92wwmy9

Vo, L. T. (2025, January 10). Misinformation on TikTok: How Documented Examined Hundreds of Videos in Different Languages. Global Investigative Journalism Network. https://gijn.org/stories/tiktok-misinformation-how-documented-translated-hundreds-videos/

W3Techs. (n.d.). Usage Statistics of Content Languages for Websites, October 2025. Retrieved October 22, 2025, from https://w3techs.com/technologies/overview/content_language

Wang, H. (2022). Short Sequence Chinese-English Machine Translation Based on Generative Adversarial Networks of Emotion. Computational Intelligence and Neuroscience, 2022, 1–10. https://doi.org/10.1155/2022/3385477

Wolfe, R., Braffort, A., Efthimiou, E., Fotinea, E., Hanke, T., & Shterionov, D. (2025). Special issue on sign language translation and avatar technology. Universal Access in the Information Society, 24(1), 1–3. https://doi.org/10.1007/s10209-023-01014-w

Yan, J., Yan, P., Chen, Y., Li, J., Zhu, X., & Zhang, Y. (2024). Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels. arXiv. https://doi.org/10.48550/ARXIV.2411.13775

Appendix

Works referenced for AI transcription and translation

Paper | Focus | Scope
Alonso Jiménez & Rosado, 2024 | Translation | Creation of 68 Spanish-language political articles using ChatGPT-3, translated into English.
Asi et al., 2024 | Translation | Examination of how ChatGPT 3.5 and DeepL compare to student translators across six texts.
Beckett & Yaseen, 2023 | Both | Survey of 105 news organizations from 46 countries, as well as interviews with journalists and newsroom staff.
Canavilhas, 2022 | Translation | Case study of the “A European Perspective” project with further analysis of 54 news items from the website of RTP (Rádio Televisão Portuguesa).
Chan et al., 2022 | Transcription | Examination of the accuracy of Otter.ai across 24 World Englishes (n = 1,227 recordings).
Chang et al., 2024 | Transcription | Comparison of the accuracy of semi-supervised learning models on Mainstream American English (3.33 hours of recordings) and African American Vernacular English (19.39 hours of recordings).
Court & Elsner, 2024 | Translation | Translation experiments with 50 Spanish–Quechua pairs (Quechua is an Indigenous Peruvian language) using GPT-3.5 Turbo, GPT-4o, Gemini 1.5 Pro and Llama 3.
De Coster et al., 2024 | Translation | Review of machine translation from signed to spoken languages.
Dubois et al., 2024 | Transcription | Examination of the accuracy of seven transcription tools (YouTube, Facebook Video, Microsoft Stream, Zoom, BlueJeans, Webex and Google Meet) using 846 TED talk speakers (194 hours of content).
Fredrikzon, 2025 | Other | Thought article about AI “hallucinations” and mistakes.
Frey & Llanos-Paredes, 2025 | Translation | Examination of U.S. Google Translate search data, translator job postings, and local wage and employment statistics from 2010 to 2023.
Ghosh & Caliskan, 2023 | Translation | Assessment of GPT performance on 50 occupations translated from English to Bengali, Farsi, Malay, Tagalog, Thai and Turkish.
Gondwe, 2025 | Translation | Case study in Tanzania of 19 news organizations and interviews with 38 news editors.
Guo et al., 2025 | Translation | Comparison of AI models (Llama, Qwen and Mistral) for English, Chinese and French translations using 3,722 Wikipedia entries.
Hagar et al., 2025 | Other | Framework for using small language models in the newsroom.
Howcroft & Gkatzia, 2022 | Other | Overview of natural language generation approaches for low-resource languages.
Kocmi et al., 2024 | Translation | Results of 11 language-pair translations from 28 participants’ models, in addition to 8 LLMs and 4 online translation providers.
Kocmi et al., 2025 | Translation | Preliminary results of 32 language-pair translations from 36 participants’ models.
Koenecke et al., 2024 | Transcription | Analysis of Whisper transcription “hallucinations” in English (n = 187 audio segments).
Lee, 2024 | Translation | Overview of machine translation technologies and how they compare to human translators.
Leiter et al., 2024 | Translation | Concept paper identifying key properties of machine translation metrics.
Levit et al., 2017 | Transcription | Development of a crowdsourcing approach that combines automatic speech recognition with human graders to build transcription data.
Moghe et al., 2025 | Translation | Presentation of a new accuracy metric for AI translation using 36,000+ examples across 146 language pairs.
Moneus & Sahari, 2024 | Translation | Comparison of 10 professional translators with three AI tools (ChatSonic, Bing Chat and ChatGPT-4) on six legal texts in Arabic and English.
Munoriyarwa et al., 2023 | Translation | Semi-structured interviews with South African journalists from six news organizations.
Noll et al., 2025 | Translation | Results of medical experts grading ChatGPT and DeepL translations of 120 medical terms and 180 synonyms from English to German.
Novytska et al., 2025 | Translation | Synthesis of research on audiovisual translation with a specific focus on subtitles.
Ojewale et al., 2025 | Translation | Examination of functional multilingual model performance in translating two English datasets into French, Spanish, Hindi, Arabic and Yoruba.
Ojo et al., 2025 | Translation | Development of AfroBench, a large-scale LLM evaluation benchmark that includes 15 tasks, 22 datasets and 64 Indigenous African languages.
Pava et al., 2025 | Other | White paper on approaches to building data resources for “low-resource” languages.
Prates et al., 2020 | Translation | Assessment of gender bias using a list of occupations (n = 1,019) translated from 14 languages into English.
Qingliang, 2024 | Translation | Summary of translator and AI research and potential future developments.
Ross Arguedas, 2025 | Both | Broad study of public attitudes toward AI uses in journalism, including transcription and translation.
Savoldi et al., 2025 | Translation | Examination of 133 papers published between 2016 and December 2024 on gender bias in automatic (machine) translation.
Schellmann, 2025 | Transcription | Study employing four chatbots (ChatGPT-4o, Opus 4, Perplexity Pro, Gemini 2.5 Pro) to test transcription of local government meetings in Clayton County, GA; Cleveland, OH; and Long Beach, NY.
Shahmerdanova, 2025 | Translation | Review article of AI and translation research.
Simon, 2025 | Both | Broad study of AI use in the journalism industry and how it affects industry gatekeeping.
Simon & Isaza-Ibarra, 2023 | Both | Summary of how AI is being used and integrated in the journalism industry.
Simon et al., 2025 | Both | Broad study of the public’s attitudes toward AI in journalism.
Song, 2020 | Translation | Examination of 188 news stories from March 2001 to March 2019, comparing official newspaper translations with three machine translation tools (Google Translate, Papago and Kakao).
Tokalac, 2023 | Both | Study in which journalist-evaluators perform three tasks to test model performance: (1) transcription in their own language, (2) translation from English into their language and (3) translation from their language into English.
Ullmann, 2022 | Translation | Summary of existing literature on gender bias in AI translation.
Wang, 2022 | Translation | Development of a novel neural machine translation model using a generative adversarial network (GAN), tested on 1M English-to-Chinese sentences.
Wolfe et al., 2025 | Translation | Introduction to a special issue on machine translation for signed languages.
Yan et al., 2024 | Translation | Analysis comparing ChatGPT-4 to three levels of human translator expertise across three language pairs: Chinese–English, Russian–English and Chinese–Hindi.

Not included: 15 resources providing background information about AI transcription and translation, most of which were news articles (Ananny & Pearce, 2025; Brandom, 2023; Caswell & Liang, 2020; Jarnow, 2017; Judah, 2025; Kahn, 2025; Langer, 2025; Newman & Cherubini, 2025; Ohumu, 2025; Okolo & Tano, 2024; Spencer, 2025; Valdez Sanabria & Auyanet, 2025; Vo, 2025; W3Techs, n.d.; Breaking Language Barriers with AI, 2023).


Footnotes

  1. Mari, 2024 ↩︎
  2. Ananny & Pearce, 2025; Beckett & Yaseen, 2023; Gondwe, 2025; Kahn, 2025; Munoriyarwa et al., 2023; Simon & Isaza-Ibarra, 2023 ↩︎
  3. Qingliang, 2024; Shahmerdanova, 2025; Tokalac, 2023 ↩︎
  4. Spencer, 2025 ↩︎
  5. Court & Elsner, 2024; Kocmi et al., 2024; Kocmi et al., 2025; Pava et al., 2025; Moghe et al., 2025 ↩︎
  6. Chan et al., 2022; Chang et al., 2024 ↩︎
  7. Ghosh & Caliskan, 2023; Prates et al., 2020; Savoldi et al., 2025; Ullmann, 2022 ↩︎
  8. Fredrikzon, 2025 ↩︎
  9. Dubois et al., 2024 ↩︎
  10. Jarnow, 2017 ↩︎
  11. Frey & Llanos-Paredes, 2025 ↩︎
  12. Caswell & Liang, 2020 ↩︎
  13. Chan et al., 2022; Chang et al., 2024 ↩︎
  14. Court & Elsner, 2024; Kocmi et al., 2024; Kocmi et al., 2025 ↩︎
  15. W3Techs, n.d. ↩︎
  16. Brandom, 2023 ↩︎
  17. Orimemi, 2025 ↩︎
  18. Judah, 2025 ↩︎
  19. Ananny & Pearce, 2025; Beckett & Yaseen, 2023; Simon & Isaza-Ibarra, 2023; Canavilhas, 2022; Ohumu, 2025; Simon, 2025 ↩︎
  20. Langer, 2025 ↩︎
  21. Canavilhas, 2022 ↩︎
  22. Ohumu, 2025. See Vo, 2025 for another example. ↩︎
  23. Valdez Sanabria & Auyanet, 2025 ↩︎
  24. Munoriyarwa et al., 2023 ↩︎
  25. Beckett & Yaseen, 2023; Munoriyarwa et al., 2023; Kahn, 2025 ↩︎
  26. Gondwe, 2025 ↩︎
  27. Tokalac, 2023 ↩︎
  28. Shahmerdanova, 2025 ↩︎
  29. Newman & Cherubini, 2025; Ross Arguedas, 2025 ↩︎
  30. Ross Arguedas, 2025 ↩︎
  31. Ross Arguedas, 2025 ↩︎
  32. Simon et al., 2025 ↩︎
  33. Breaking Language Barriers with AI: Maximizing Accuracy and Efficiency with Machine Translation Technology, 2023 ↩︎
  34. Alonso Jiménez & Rosado, 2024 ↩︎
  35. Noll et al., 2025 ↩︎
  36. Moneus & Sahari, 2024 ↩︎
  37. Asi et al., 2024 ↩︎
  38. Langer, 2025; Schellmann, 2025 ↩︎
  39. Guo et al., 2025; Kocmi et al., 2024; Kocmi et al., 2025; Levit et al., 2017; Moghe et al., 2025; Wang, 2022 ↩︎
  40. Lee, 2024; Novytska et al., 2025; Yan et al., 2024 ↩︎
  41. Schellmann, 2025; Song, 2020 ↩︎
  42. Leiter et al., 2024 ↩︎
  43. Song, 2020 ↩︎
  44. Song, 2020 ↩︎
  45. Lee, 2024 ↩︎
  46. Lee, 2024 ↩︎
  47. Savoldi et al., 2025; Ullmann, 2022 ↩︎
  48. Ghosh & Caliskan, 2023; Prates et al., 2020 ↩︎
  49. Savoldi et al., 2025 ↩︎
  50. Koenecke et al., 2024; Schellmann, 2025 ↩︎
  51. Chan et al., 2022 ↩︎
  52. Chang et al., 2024 ↩︎
  53. Ojewale et al., 2025; Ojo et al., 2025; Pava et al., 2025 ↩︎
  54. Guo et al., 2025 ↩︎
  55. Beckett & Yaseen, 2023 ↩︎
  56. Wolfe et al., 2025 ↩︎
  57. De Coster et al., 2024 ↩︎
  58. De Coster et al., 2024 ↩︎
  59. How best to evaluate translation is largely out of scope, but see Leiter et al., 2024. ↩︎
  60. Savoldi et al., 2025 ↩︎
  61. Howcroft & Gkatzia, 2022 ↩︎
  62. Qingliang, 2024 ↩︎
  63. Hagar et al., 2025 ↩︎
  64. Simon, 2025 ↩︎