REVIEW

Eur. J. Cult. Manag. Policy, 09 September 2025

Volume 15 - 2025 | https://doi.org/10.3389/ejcmp.2025.14009

Artificial intelligence for cultural heritage research: the challenges in UK copyright law and policy

  • 1. Brunel Law School, Brunel University of London, Uxbridge, United Kingdom

  • 2. City Law School, City St George’s, University of London, London, United Kingdom

Abstract

Artificial intelligence (AI) is revolutionising our relationship with cultural heritage, enhancing access to, engagement with and preservation of collections and heritage sites. AI is also being used as a valuable research tool in the context of heritage collections. However, as materials protected by copyright may be used in AI development, training and use, copyright law can become an obstacle to important AI deployments in the heritage sector, an area which is currently understudied from the United Kingdom (UK) perspective. This article explores the intricate interplay between cultural heritage, AI and copyright law, demonstrating the main copyright law and policy challenges facing cultural heritage professionals and researchers in using AI in the UK for heritage research. It highlights the complexity and uncertainties as regards the current Text and Data Mining exception in the UK Copyright, Designs and Patents Act 1988 (UK CDPA), emphasising the need for an improved legal framework that balances copyright protection with the benefits of AI for cultural heritage research and management. It also reveals the underrepresentation of the heritage sector in AI regulation and copyright policy discussions in the UK. This exploration underscores the imperative for an inclusive policy dialogue that considers the perspectives and evidence of the cultural heritage sector in its full breadth and diversity (including related researchers) in shaping copyright law reform and AI regulation, and for further research to be carried out in this field.

Introduction

The adoption of new technologies in the cultural heritage sector often raises issues around copyright law, particularly whether current legislation is fit for purpose. Historically, copyright law has constantly adapted to address technological advances (Gervais et al., 2024, p. 28). With the increasing use of Artificial Intelligence (AI) across sectors, copyright laws are once again under scrutiny. However, there is a specific need to address the unique copyright challenges AI presents to the UK heritage sector, which are currently underexplored.1

Cultural heritage institutions, entrusted with the preservation of the legacies of humankind, are navigating complex legal terrain in their pursuit of modernisation. AI is revolutionising the ways in which cultural heritage materials are used, including as a tool for preservation, research, and dissemination. The volumes of data now available give traction to digital humanities approaches, facilitating the processing of data at scale “to yield new analytical insights that were not possible at the level of individual documents and sources” (Ahnert et al., 2023). While the importance of digital heritage collections as AI data2 is being discussed in current academic and sector-specific debates, there is scope for investigation of related copyright challenges in the UK.3 Since copyright materials may be used in AI development, training and use, the current issues and uncertainties in copyright law may pose obstacles and ultimately hinder important AI applications in the traditionally risk-averse heritage sector. Although these issues are being explored in Europe,4 the UK lags behind, making this a crucial and novel area of investigation in a jurisdiction that can influence legislation worldwide. This paper therefore aims to contribute a heritage perspective to the currently underdeveloped UK policy discussions on AI and copyright, by analysing the particular shortcomings of, and proposing solutions for, the UK Text and Data Mining (TDM) exception for non-commercial research, which we argue is not currently fit for purpose when applied specifically to heritage research.

Section Artificial intelligence and cultural heritage highlights the importance of AI in the heritage sector with UK examples. Section Copyright challenges arising from AI uses in the cultural heritage sector in the UK examines key issues in UK copyright law, particularly the TDM exception, crucial for AI development and knowledge discovery in heritage research. Section Current copyright and AI policy and regulation efforts in the UK: scope for further heritage sector participation addresses the state of current policy discussions on copyright and AI regulation, emphasising the need for greater involvement and evidence from the cultural heritage sector.

Artificial intelligence and cultural heritage

Although there is no generally accepted definition of AI (Sheikh et al., 2023; Guadamuz, 2024), for the purpose of this paper it refers to computer programmes that perform tasks associated with human intelligence, such as language comprehension, image recognition, and learning from experience (Pavis, 2023). AI is an umbrella term that includes machine learning, whereby a computer programme is taught to identify patterns in data and apply this knowledge to new data (Drexl et al., 2019; Iglesias Portela et al., 2019).

With libraries, archives and museum collections increasingly digitised, AI is revolutionising heritage practice and research. Pavis (2023, p. 7) identifies three overlapping heritage areas of AI application when used legally and ethically: “heritage and collections management, use and research; visitor experience; and general business operations and management”. AI enhances engagement with cultural heritage collections by producing innovative documenting, managing, and visiting tools (Bordoni et al., 2016). Tools and techniques of AI also “make it possible to build the fine instruments that augment the day-to-day work of librarians and the researchers they serve” (Coleman, 2019, p. iii). Digital humanities scholar Professor Jane Winters explains that “AI is essential for cleaning, exploring, and visualizing archival and special collections, especially with born-digital archives” (Patton, 2024). According to the European Regions Research and Innovation Network (ERRIN), “AI is set to revolutionise cultural heritage by enhancing the preservation, restoration, and accessibility of artefacts and historical sites”, aiding conservation efforts by detecting early signs of deterioration (ERRIN, 2024). Recent advances in AI have led to innovative heritage applications, e.g., in archaeology (Aslan et al., 2020; Chetouani et al., 2020; Ostertag and Beurton-Aimar, 2020); document digitalization and character recognition (Nguyen et al., 2020); discovery, description, classification, and preservation (Girbacia, 2024); and reconstruction of heritage buildings and sites (Arzomand et al., 2024), including in the context of the Notre Dame cathedral fire (Pasikowska-Schnass and Lim, 2023).

In the UK, there is an increasing interest in understanding the role of AI for the heritage sector. In a 2023 survey involving 154 members of the UK Heritage Pulse, respondents recognised the transformative potential of AI but raised concerns about challenges, including skills and funding shortages (Cantrill-Fenwick, 2023). Of the respondents, 24% were aware of their organisation using AI, with “the highest proportion (41%) saying it was used to help with planning for the future (such as generating ideas), closely followed by marketing including generating content, editing, programming adverts and analysing or interpreting data” (Cantrill-Fenwick, 2023). Half of respondents had considered how AI may change how people interact with their organisation, including to research their organisation or area of heritage (30%) and to reproduce copyrighted materials (28%) (Cantrill-Fenwick, 2023).

It is unclear how the UK Heritage Pulse survey defined AI, and respondents may have focused more on commercial generative AI tools. The term “AI” is often perceived as a buzzword synonymous with generative AI tools such as OpenAI’s ChatGPT and Midjourney, which dominate current policy discussions (as we will see in Section Current copyright and AI policy and regulation efforts in the UK: scope for further heritage sector participation), when in fact there is much more to AI beyond such commercial generative AI tools. Kretschmer et al. (2024, pp. 119–121) distinguish between pre-trained machine learning models (such as OpenAI’s) and more bespoke researcher-trained models (which researchers may prefer for more accuracy and reduced bias), as well as between Natural Language Understanding algorithms and Natural Language Generation algorithms (also known as Generative AI). Understanding how bespoke and non-commercial AI models are used as tools for research or collections management in heritage contexts is crucial, as they present different copyright law implications - particularly regarding “non-commercial research” - compared to commercial generative AI tools, as we will see in Section Copyright challenges arising from AI uses in the cultural heritage sector in the UK.

In the UK, examples of bespoke uses of AI for heritage and collections management, use and research include: “making content, information and collections easier to find; generating new insights or knowledge from existing content, information and collections; and, supporting data collection, restoration and conservation work” (Pavis, 2023, p. 9). The FloraGuard project, involving the Royal Botanic Gardens Kew, aimed to tackle illegal trade in endangered plants, and developed AI algorithms so that “the researchers could more efficiently search for and extract information relating to the illegal harvest and sale of endangered plants, from a range of cyber hotspots.”5 The Living With Machines project, involving the British Library, investigated the impact of technology on the lives of ordinary people during the Industrial Revolution, using machine learning to analyse data at scale.6 The Transforming Collections project, involving the Tate, “combines critical art historical and museological research with participatory interactive machine learning design to surface suppressed histories, amplify marginalised voices and re-evaluate artists and artworks ignored or sidelined by dominant narratives.”7

While AI presents valuable opportunities for heritage research, it also poses challenges, particularly in ensuring responsible use. Pavis (2023) identifies the following risks: bias, discrimination and misinformation; lack of transparency and traceability; undervalued contribution; human labour replacement; privacy, copyright and other rights infringement. This paper focuses on the copyright risks (and related issues such as transparency and bias) applicable to AI heritage research in the UK, a discussion which, while emerging,8 remains underexplored.

Testing AI in heritage settings offers an important opportunity for assessing the benefits and risks of the technologies at stake. Museums, in particular, play a key role in critically engaging with AI and its impact, “by being open and accountable about what technologies they are using, and through public programs and contemporary collecting to develop visitor literacy around AI” (Murphy and Villaespesa, 2020, p. 3). Heritage stakeholders (including researchers in such contexts) are thus ideally positioned to promote meaningful discussions on AI and the law, and to provide insights that can shape better policies and regulation, balancing the sector’s AI use for cultural and societal benefits with respecting the core rights of creators. Therefore, it is important to understand what specific copyright risks and issues face heritage stakeholders (including researchers) when using AI.

Copyright challenges arising from AI uses in the cultural heritage sector in the UK

The basics of UK copyright law, and how it applies to AI and heritage collections: copyright works, owners and rights

Training AI systems, including constructing corpora for machine learning, often involves using data protected by copyright such as texts, images, and videos (Iglesias Portela et al., 2019; Kretschmer et al., 2024, p. 110).9

Ahnert et al. (2023) highlight copyright and contractual challenges in using digitised historical collections in the context of the Living With Machines project. They refer specifically to copyright legislation and digitisation funding in the UK (and resulting contractual agreements), creating a “mixed-rights” landscape resulting in access issues. They emphasise a “patchwork approach to digitisation”, which can result in unrepresentative collections, potentially biasing AI research. Resolving these challenges requires systemic changes in funding priorities and national copyright policy, project-level solutions being insufficient (Ahnert et al., 2023, p. 23). Given that the copyright status of materials constitutes one of the selection criteria for decisions on digitisation (Tolfo et al., 2023, p. 31; Beelen et al., 2023, p. 4), we believe copyright impacts on issues of bias, potentially affecting the quality of AI research. To explain this connection, we need to analyse basic copyright concepts.

Copyright law protects authors’ creations, categorised as “works”, which in the UK are: “(a) original literary, dramatic, musical or artistic works, (b) sound recordings, films or broadcasts, and (c) the typographical arrangement of published editions.”10 Beyond the concept of “works”, UK copyright law also protects performances11 and sui generis database rights12 (different from the copyright granted to authors of original databases). Sui generis database rights protect against unauthorised extraction and reutilisation of substantial amounts of a database that required substantial investment in obtaining, verifying or presenting the data (Kretschmer et al., 2024, p. 124). As such, raw data or metadata (for example, dates, editions and ISBN) that may fall short of copyright protection as works could still be eligible for some degree of protection under the sui generis database right. Metadata such as reviews and summaries may also be eligible for copyright protection as works.

Copyright works have specific legal definitions, which do not necessarily resonate with non-specialised audiences. A “literary work” is not only a work of literature; it means any work “which is written, spoken or sung”,13 including books, journal articles, and other writings such as pamphlets, lectures,14 and even computer programs and databases.15 “Artistic works”, some protected irrespective of their artistic quality,16 cover not only traditional visual artworks such as paintings, but also graphic works such as maps.17 Translations, adaptations and musical arrangements are protected as works, and so are collections such as encyclopaedias and anthologies which, by reason of the selection and arrangement of their contents, constitute intellectual creations.18 This shows the wide range of materials embraced by copyright that heritage organisations may have in their collections or engage with in their activities.

To qualify for copyright protection, literary, dramatic, musical and artistic works must meet the criterion of originality, i.e., reflecting the “author’s own intellectual creation.”19 Copyright20 protection applies only to expressions, and not to ideas, procedures, operational methods or mathematical concepts as such, nor to news and facts.21 But even though such data or information are not protected (Kretschmer et al., 2024; Guadamuz, 2024), the form in which they have been expressed (such as a news article and the typographical arrangement of the newspaper) will likely be.22 Therefore, heritage collection items will very likely be considered “original” and expressive works protected by copyright.

The question of whether AI-generated works qualify for copyright protection or a similar right, and who should own it, remains debated (Hugenholtz and Quintais, 2021; Ramalho, 2017; Guadamuz, 2017). But this article has a different focus, i.e., whether copyright is infringed by the use of protected materials for heritage research utilising AI.23

The first owner of copyright is usually the author, i.e., the person who creates the work.24 Objects such as newspapers, books or CDs may thus embed multiple works with different copyright owners (Bently et al., 2022, p. 62, 136–140). Ownership can also rest with employers,25 or be assigned to third parties.26 In many cases, information about the owner becomes lost, leading to the orphan works issue, which is significant for cultural heritage institutions (Korn, 2009). In a rights clearance simulation study, the British Library has estimated that over 40% of the potentially in-copyright works were orphan works (Stratton, 2011; Rosati, 2019).

Copyright protection generally lasts for the author’s life plus 70 years,27 after which the work enters the public domain. Until then, permissions are needed if the intended use falls under the exclusive rights of copyright owners.28 Particularly relevant are the rights to copy the work29 (reproduction right), to communicate a work to the public30 and to make an adaptation.31 Infringement occurs when someone, without permission, engages in such restricted acts in relation to a substantial part (qualitatively, rather than quantitatively: Rosati, 2019, p. 206) of the work.32 Reproducing or communicating to the public even a small part of a work could infringe copyright, if that part represents the originality of the work, i.e., the “author’s own intellectual creation” for literary, dramatic, musical or artistic works.33 For entrepreneurial works (i.e., sound recordings, film recordings, broadcasts and typographical arrangements of published editions), copying any recognisable part may infringe copyright.34

These exclusive rights cover actions which are commonly part of AI development, training and usage, such as creating digital copies of works, making further copies of already digitised works, sharing copies for verification, or showing parts of works in AI outputs. Text and data mining (necessary for AI development and training) can implicate copyright if it involves making copies of works or breaching publishers’ licences (Bently et al., 2022, p. 260).35 In some cases, therefore, the creation of datasets and model training may infringe authors’ exclusive rights (Guadamuz, 2024). As such, these activities may require permission, unless an exception applies (which we analyse in Subsection Copyright exceptions: text and data mining and beyond).

Copyright can thus hinder AI use in heritage research and management. Cultural heritage institutions house various types of copyright-protected materials, which when digitised (an act which in principle requires the copyright owner’s permission) become important data for AI. Holding physical works does not grant heritage institutions copyright ownership (Torremans, 2022), unless it was explicitly assigned to them (Heritage Digital, 2021). Furthermore, digitised collections (even of works in the public domain) may be subject to contractual restrictions, if digitisation was performed by a third party (Ahnert et al., 2023, p. 27). Other restrictions include agreements with donors, heirs, loaning institutions, researcher agreements and website policies (Wallace, 2023).

The difficulties of clearing copyright for heritage collection digitisation are a topic of extensive academic and sector discussion. Copyright clearance can be costly and time-consuming, to the point that heritage organisations may not be able to carry it out. Stratton (2011) explains that “rights clearance of works on an individual, item by item basis is unworkable in the context of mass digitisation”. The lack of public funding results in the digitisation of national assets being undertaken by private companies that place limits on access (Ahnert et al., 2023, p. 5). Those unable to afford a licence to access materials for AI use may only be able to access cheaper and less reliable data, possibly resulting in biased AI results.36 Margoni and Kretschmer (2022, p. 687) similarly highlight the risks involved in purchasing cheaper data or pre-trained models, which can result in biases and inaccuracies. Where there is uncertainty on the legality of scraping data, there is a risk of copyright infringement, a situation which favours the development of “foundation” AI models “developed by the few large tech corporations which have access to the necessary data and can afford the uncertainties and costs of potential copyright litigation” (Kretschmer et al., 2024, p. 125; Margoni et al., 2022). This situation invites the consolidation of a “techno-economic oligopoly” and unsustainable (in legal, economic, social, cultural and environmental terms) practices of “data extractivism” or “data colonialism” (Kretschmer et al., 2024, p. 125; Couldry and Mejias, 2019).

Wallace (2022a, 2022b) highlights the risk-averse attitudes adopted in heritage management, digitisation, and online dissemination due to copyright complexities, and the immense resources required for copyright clearance, including staff and funding, which are usually limited in the heritage sector. Additionally, in light of the heritage sector’s role as custodian of culture and knowledge, the ethical, accuracy and reliability issues in AI-mediated research will discourage the use of lower-quality data by heritage professionals and researchers. In cases of legal uncertainty, projects will likely either be regulated under licensing terms for institutions that can afford them, or abandoned if licensing is unaffordable or impractical.

The biases, omissions and inaccuracies that may be generated by datasets produced or models trained on the basis of copyright permissions, particularly in heritage contexts, require further investigation. As highlighted in a recent government consultation, “works being mined can be restricted by curatorial bias, only mining what is available under licence, rather than what would be most useful for the purposes of the AI.”37 Choosing the most affordable or easily accessible dataset may also not be in the best interest of the research question. The Living With Machines team addressed the challenges in obtaining digital data in a timely manner, noting that these could have pushed them to work with more permissively available datasets. Pursuing their preferred dataset, as determined by their research agenda (which they decided to do), required complex negotiations and legal expertise to navigate copyright and licensing issues. The team noted that the “current time-frames of this process are not compatible with publicly funded projects, which are by necessity time-limited in nature and assume a quick start from day one” (Ahnert et al., 2023, pp. 30–31). These kinds of resources, including time and expertise, may not be available to other projects or institutions.

We believe that this encapsulates a key issue, in that the copyright status of the dataset will dictate what research can be carried out, limiting researchers’ ability to select the most appropriate datasets to answer their questions. This imposes a bar on research engaging with more contemporary themes (such as digital humanities research on the late nineteenth/early twentieth century onwards), as materials are more likely to be in copyright and thus require clearance (Ahnert et al., 2023, p. 31). The quality of the input data is crucial to the machine learning process; researchers need to identify the necessary data aligning with the research purpose and might prefer to train their own models, as pre-trained embeddings rely on easily found text material, leading to bias problems (Kretschmer et al., 2024, pp. 111–120). Furthermore, as Levendowski explains, relying only on public domain works can also be problematic, as most such works were published when the “literary canon” was “wealthier, whiter, and more Western”, excluding marginalised voices such as those of black, women, and LGBTQ authors; any AI system trained using such datasets would thus reflect the biases of that time (Levendowski, 2018; see also Guadamuz, 2024).

It is not in the public interest that copyright law should affect the quality of research by imposing excessive barriers to data access. Copyright exceptions exist precisely to support activities in the public interest, such as access to culture, education, research, and freedom of expression (Rendas, 2018; Geiger and Izyumenko, 2020; Jacques, 2021; Vuckovic et al., 2021; Bently et al., 2022). Our analysis will focus on examining these copyright exceptions to determine whether they are suitable for AI research in heritage contexts.

Copyright exceptions: text and data mining and beyond

Copyright exceptions often offer a clearer route for users requiring legal certainty, by allowing certain uses without the need for permissions.38 Thus, exceptions assume a particularly relevant role in the risk-averse heritage sector (Hudson, 2020). Many copyright exceptions apply to activities of heritage professionals and researchers in those contexts.39 Particularly relevant to AI research in the heritage sector is the UK exception for text and data analysis for non-commercial research40 (hereafter “text and data mining” or “TDM”).41 This exception applies to copyright works, though an equivalent exception is applicable to performances.42 TDM is arguably also possible under the non-commercial research exception to the sui generis database right,43 but further clarity on this point is needed.

The UK TDM exception, introduced in 2014, aimed to modernise UK copyright law for the digital age.44 While it predates current AI discussions, admittedly the exception can encompass AI uses.45 It was intended to support diverse research by being technologically neutral and not limited to academic papers or STEM fields.46 Even though we believe this exception in principle supports AI heritage research, its application remains unclear and may be unsuitable in practice.47 As noted in the Living With Machines project, the exception has proved difficult to use in innovative research involving diverse datasets, limiting its effectiveness in supporting national priority research at the intersection of technology and culture (Ahnert et al., 2023, p. 28). The British Library response to the 2022 UK Intellectual Property Office (UK IPO) Consultation noted that potential TDM projects have been abandoned on multiple occasions due to the inadequacy of the current s.29A exception, requiring researchers to seek permissions from rights holders, an expensive and resource-intensive process, especially given the large volumes of content required for TDM.48

We advocate for addressing these issues through copyright exceptions, rather than licensing, as a more effective approach to promote public interest activities.49 This approach would better address the resource constraints faced by the heritage sector and help mitigate the “curatorial bias” mentioned in Section The basics of UK copyright law, and how it applies to AI and heritage collections: copyright works, owners and rights.

Copies for computational text and data analysis: does it cover AI?

The UK government defines TDM as the use of automated techniques to analyse text and data for patterns, trends, and insights, which typically involves the copying of works.50 The introduction of a TDM exception for non-commercial research allows this copying without infringing copyright,51 aiming to be technologically neutral and broadly applicable.52

Recent government consultations explored the applicability of the TDM exception to AI. While copyright owners argued the exception does not cover AI, others defended TDM as integral to AI development.37 The government recognised TDM’s role in AI systems used in research and by cultural heritage organisations.53 TDM techniques are important in AI development, using the same algorithms to discover patterns in data (Rosati, 2019; Strowel and Ducato, 2021). Given the exception’s technological neutrality, we believe that the concept of computational text and data analysis in UK law54 comfortably encompasses AI development, training and use in heritage research contexts.

The types of copies allowed by this exception include those of a non-temporary nature. The “making of temporary copies” is specifically allowed in another exception,55 and the TDM exception makes no reference to allowing only temporary copies. Permanent copies of training data can be “fundamental to the replicability of machine learning results” (Kretschmer et al., 2024, p. 126). Having access to training data enables the detection of mistakes, omissions, or biases, ensuring greater transparency and accountability in decision-making (Margoni and Kretschmer, 2022, p. 688; Levendowski, 2018; Bonadio et al., 2022).

While UK law is silent on how long copies can be retained, the TDM exception for scientific research in article 3 of the EU Digital Single Market (DSM) Directive56 states that copies should be securely stored and may be retained, including for the verification of research results,57 and that rightsholders, research organisations and cultural heritage institutions should be encouraged by member states to define commonly agreed best practices on this point.58 Though the UK is not bound by the DSM Directive,59 these requirements should be observed in the UK as best practice.

In the absence of statutory clarity, evidence is starting to emerge of arrangements imposing time limits for data to be kept in storage (of 2 years, for example, see Ahnert et al., 2023, p. 28). We suggest that UK IPO guidance should clarify reasonable time periods for copy retention, which should be determined in consultation with heritage and research stakeholders, on the basis of specific verification needs and storage feasibility. Engagement with such stakeholders could also help establish appropriate standards for copy retention. Ultimately, it may be necessary to update the text of the UK TDM exception to provide clarity on this issue.

Furthermore, while the text of article 3, along with recitals 15 and 38 of the Directive 2019/790 (DSM Directive) (2019), represents progress in promoting transparency and accountability in algorithmic decision-making tools, there remains uncertainty regarding researchers’ ability to grant access to stored copies for verification, as this may constitute an act of communication to the public, not exempted in the EU TDM exceptions (Margoni and Kretschmer, 2022, p. 697). We believe this issue should also be clarified in UK law, which similarly limits the TDM exception to the right to copy (or reproduction right). If we understand that copies can be retained for research verification, the exception should be interpreted to permit this. One way this could be achieved is by interpreting the concept of “public” (in “communication to the public”) as not including individuals seeking access for research verification under specific circumstances. Further research and policy work are required on this, involving extensive engagement with heritage and research stakeholders in the UK.

The UK TDM exception only allows the making of copies and not the dissemination (i.e., “communication to the public”) of copyright works. This limitation, as noted in the Technical Review of Draft Legislation on Copyright Exceptions (UK Intellectual Property Office, 2014a), means that TDM cannot result in making full copyright works publicly available. Margoni and Kretschmer (2022) argue that confining the exception to reproduction is too restrictive. That said, other exceptions, such as the quotation exception,60 may permit the communication to the public of excerpts of a work. Government guidance confirms that if parts of a work need to be quoted in TDM research outputs, the quotation exception can apply, so long as copyright laws are followed.61

A last point concerns the types of “copies” the legislation refers to, specifically whether it covers the digitisation of physical materials, or only applies to already digitised works. This is relevant to heritage collections, which contain vast amounts of undigitised materials of value to TDM research. The UK exception does not specify the format of the work to be copied, arguably allowing for the possibility of making digital copies of physical works as long as there is “lawful access” to the work. We therefore find that, ultimately, the question of whether the law allows digitisation of physical works for TDM turns on the concept of lawful access, which we analyse below.

“Lawful access” and the effectiveness of the exception

The concept of lawful access within the TDM exception requires clarification. Key issues include whether physical access to works, rather than digital access, qualifies for TDM use, and how contractual terms may override the exception. We will also look at how Technological Protection Measures (TPMs) may render the exception ineffective. Lastly, we will analyse the prohibition of transferring copies as a defining aspect of lawful access, and how unsuitable this is to contemporary forms of collaborative digital heritage research.

  • (a) a lawful “digital” access?

The meaning of lawful access in the TDM exception is unclear, particularly regarding whether it applies solely to digital access or includes access to physical works, thereby allowing digitisation. We believe this point is especially relevant in heritage research, where collections often include physical works, in addition to the digital repositories typically used in academic research. While current scholarship on TDM and copyright focuses on electronic data, it rarely addresses the digitisation of physical works or the specific needs of heritage research.

The UK exception does not define lawful access. As such, it does not qualify access as only digital access. This suggests that the creation of copies through the digitisation of a physical object, to which the copier has lawful access through having lawfully acquired it, should be allowed by the exception. In the case of a heritage organisation, we believe lawful access would include having legal ownership of objects through lawful acquisition (including bequests, field collection, gifts, purchases, exchanges and treasure).62 This interpretation aligns with the outcome of the Google Books case in the US, where digitisation of physical books owned by libraries was considered fair use for TDM purposes.63

US fair use is a far more flexible and adaptable exception to copyright than those in the UK and EU, where the tradition of a strict interpretation of exceptions is followed,64 as long as the effectiveness of the exception is not compromised and its purpose is achieved.65 We therefore once again delve into the UK law makers’ intentions and justifications for the introduction of the TDM exception, to assess whether creating digital copies of physical works for TDM purposes would fall under the exception as “lawful access.”

The UK Government’s 2012 response to the public consultation on the introduction of new exceptions, including TDM, left open the meaning of a prior “right to access”, wording it as “under a licence or otherwise” and providing as examples “subscription to a scientific journal or having copies of papers published under a Creative Commons licence”.66 In our view, this allows any kind of prior right to access, digital and physical alike. What appears to have mattered is that the exception should not undermine publishers’ “control over IT systems or commercial exploitation”, and that TDM copying was considered unlikely to substitute for the works.67

The UK IPO Impact Assessment stated that: “data analytics methods extract data from existing electronic information.”68 It further adds that “Copyright is not intended to prevent use of facts for research, and this exception is intended to remove the block on reuse of materials for research using these tools.”69 The IPO also noted the exception would apply “in cases where access to articles and/or data has already been gained (e.g., by subscription).”70 It appears that the IPO’s position focused on the public benefit in “more and higher quality research”, and a lowering of costs and simplification of procedures for researchers, while also offering “incentives” and “security” for publishers, and a protection against undermining their primary market for access to works.71

UK IPO Guidance explains that the new TDM exception allows “researchers to make copies of any copyright material for the purpose of computational analysis if they already have the right to read the work (that is, work that they have ‘lawful access’ to).”72 The Guidance highlights that researchers would “still have to buy subscriptions to access material; this could be from many sources including academic publishers.”73 The Guidance FAQ finally defines lawful access as covering cases: “where researchers have the legal right to access a copyright work to read it; examples could include paying for a subscription to a journal or database or material published under open licences including Creative Commons and Open Government Licences.”74

While equating the new TDM exception concept of lawful access with cases where researchers would have already had the right to read the work including through purchasing subscriptions, the guidance does not specify that this only applies to the right to read the work digitally or to digital subscriptions, thus possibly allowing physical access as lawful access. We note that journal subscriptions can be print or electronic.75

Whether lawful access should mean access to digital platforms or repositories only, or should also include physical access to works, is neither sufficiently discussed in academic literature nor expressly addressed in legislation, government or sector guidance. We believe that the wording used in policy papers and UK IPO guidance is open enough to allow an interpretation that lawful access can include physical access. We should also note that Recitals 10 and 14 of the Directive 2019/790 (DSM Directive) (2019) define “lawful access” openly, adding “other lawful means” of access.

If the legislator had intended to limit the mode of access to digital access, it would have done so expressly in the statutory text. We believe that the lawful acquisition of the material (be it physical or digital) by the heritage organisation, combined with the specific focus of the exception on text and data analysis for non-commercial research, offers sufficient safeguards to the commercial interests of copyright holders. This is in line with the “three-step test” in international copyright law, which dictates that exceptions should be confined to “certain special cases which do not conflict with a normal exploitation of the work and do not unreasonably prejudice the legitimate interests of the right holder.”76

We therefore argue that “lawful access” should include either physical or digital access, thus allowing for the digitisation of physical materials for non-commercial TDM research. This solution meets the exception’s objective, while allowing space for TDM to adapt in time and embrace new uses. Allowing the digitisation of physical works may help remedy issues of bias and gaps in digital collections,77 thereby improving the quality of AI research and systems development.

  • (b) Lawful access, no contractual override of the exception, no transfer of copies and TPMs

The TDM exception in UK law contains an important caveat, denying any contractual override of the exception.78 However, in practice, the uncertainty about what “lawful access” means and the scope to regulate it through licences can undermine the exception’s effectiveness. Lawful access is a paradoxical requirement, which can subvert the innovative aims of the TDM exception (Kretschmer et al., 2024, p. 131) and represents a restriction on the enjoyment of the exception if interpreted to always depend on the terms of a contract or licence (Synodinou, 2019, p. 27). This makes the exception subject to private ordering (Geiger et al., 2019; European Copyright Society, 2017; Margoni and Kretschmer, 2022, p. 697), as “the exception can effectively be denied to certain users by a right holder who refuses to grant ‘lawful access’ to works or who grants such access on a conditional basis only” (European Copyright Society, 2017, p. 4). It can also affect the pricing of access licences, by allowing publishers to price TDM into subscription fees, which many organisations will not be able to afford (European Copyright Society, 2017, p. 4). Geiger et al. (2019) and Bottis et al. (2019) argue that lawful access for TDM increases the cost of research, potentially pricing out underfunded institutions and exacerbating existing inequities in scientific and technological development. Also, “lawful access” can “severely impair other fundamental rights such as the freedom of information and to inform the public about specific undisclosed but publicly relevant issues” (Margoni and Kretschmer, 2022, p. 697. See also Dusollier, 2020, p. 987).

We believe uncertainties around the concept of “lawful access” are problematic within risk-averse heritage research contexts. The European Copyright Society (2017, p. 4) explains that legal uncertainty prevents risk-averse beneficiaries from relying on exceptions, noting also that the benefit of exceptions, particularly those based on fundamental rights or public interests such as research exceptions, should not be dependent on market decisions of copyright owners.

Furthermore, the TDM exception prohibits transferring copies of works to any other person unless authorised by the copyright owner.79 This restriction is unclear80 and misaligned with the collaborative nature of modern research in heritage contexts, where organisations holding relevant data often partner with those possessing the computational resources and expertise needed for AI projects. Research practices have evolved significantly since the exception’s introduction in 2014, and this requirement has not kept pace, potentially forcing research collaborations to rely on licensing agreements for data transfer or storage with repositories. The limitations of “lawful access” became evident in the Living With Machines project, where a key challenge “was the transfer of data between spaces, from the data owner (BL) to the owner of the infrastructure (The Alan Turing Institute)”, and researchers had to negotiate a bespoke agreement with a commercial partner (FindMyPast) to access digitised data from the British Library holdings, even though the British Library were project partners (Ahnert et al., 2023, p. 28). Even materials that are out of copyright may be subject to contractual restrictions if they have been digitised by a third party, as seen in the case of the British Newspaper Archive, which was digitised by FindMyPast (Ahnert et al., 2023, p. 27).

Kretschmer et al. (2024) p. 131–132 argue that the lack of clarity around the conditions for copying in machine learning contexts is likely to affect scientific research, highlighting that “lawful access” terms will dictate what research is possible and at what cost, meaning that research which should benefit from the copyright exception would instead be governed by licensing agreements. In such cases, rightsholders could threaten to withdraw access from institutions. To illustrate the above, Kretschmer et al. provide an example involving research and heritage organisations with lawful access to broadcasting or newspaper digital archives: rights holders could choose to license these materials to AI companies for machine learning, potentially threatening to revoke access to archives in settings where they are utilised for research serving public interests. Kretschmer et al. (2024) p. 127, also argue that the power asymmetry of certain markets, combined with the techno-legal uncertainty that they discuss, may operate a de facto circumvention of the mandatory nature of the TDM exception in art 3 (Directive 2019/790 (DSM Directive), 2019) through access condition practices using Application Programming Interfaces (APIs).

Even though the exception cannot be contracted out, “licences may still impose conditions of access to the licensor’s computer system, for example, to maintain security or stability” (Bently et al., 2022). These access conditions may also be achieved by the implementation of TPMs.

Lawful access to a database does not permit the circumvention of TPMs put in place by the rightsholder to safeguard it. However, rightsholders’ use of TPMs must still adhere to the principle of proportionality (Recital 16 of the Directive 2019/790 (DSM Directive), 2019). Measures applied by rightsholders cannot go beyond what is necessary for the security and integrity of the networks and databases (art 3(3) Directive 2019/790 (DSM Directive), 2019). Despite this, “security controls, if too invasive, might deter or encumber legitimate TDM activities by researchers” (Dusollier, 2020, p. 296). In the UK, a remedy is available in case of TPMs “abuse”, where technological measures prevent acts that are permitted (such as copyright exceptions).81 The solution in the UK, however, is that a notice of complaint should be issued to the Secretary of State, and we have no knowledge of this remedy having been used.

Ultimately, these lawful access conditions (technological or contractual) should not undermine the effectiveness of the exception by creating excessive barriers or legal uncertainty. For this reason, the UK IPO Guidance on the TDM exception states that publishers and content providers can implement reasonable measures to maintain network security or stability, but these should not prevent or unduly restrict researchers from making necessary copies for text and data mining. Any contract terms that prevent researchers from making copies of works to which they have lawful access for TDM purposes will be unenforceable.82

We believe that “lawful access” conditions, including API and TPM practices, need further investigation to assess their impact on heritage sector research under the UK TDM exception. Additionally, we suggest revising the prohibition on transferring copies to expressly allow research collaborations and practices (such as funder requirements) that require such transfers, which could also help mitigate barriers for smaller institutions and researchers with limited resources.

  • (c) Does “lawful access” allow web scraping?

A key point of uncertainty in UK law is whether web scraping is permitted. While Recital 14 of the Directive 2019/790 (DSM Directive) (2019) clearly states that lawful access should cover freely available online content, UK law lacks clarity. Rosati (2019) argues that activities like web mining, which use data mining techniques to extract knowledge from web data, may not require authorisation from copyright holders. UK IPO Guidance states that examples of lawful access “could include” materials accessed via subscriptions or open licenses,83 which could suggest (particularly for the more risk averse) that web scraping may not be allowed unless the content is licensed as open access or depending on the terms of use of the website. UK IPO Guidance should be clearer on this point. We believe consultation with heritage sector stakeholders and researchers is needed to understand web mining practices and challenges, and how copyright law can better support heritage-related web research.

“Non-commercial research”: what of public-private collaborations and heritage management?

The TDM exception’s restriction to non-commercial research raises two key issues for AI usage in the heritage sector. First, it reflects the uncertainty on what “non-commercial” means,84 particularly given current policy calls for heritage organisations to commercialise their digital assets to generate income (as per the Mendoza Review), and also in the context of public-private collaborations, for example, in cases of heritage digitisation undertaken by commercial partners. We will consider whether “non-commercial” should be replaced with “non-profit”. Second, we will explore the meaning of “research” to assess whether certain heritage management practices fall within this concept, and whether a new exception is needed.

  • (a) “Non-commercial”: a bar to public-private partnerships and heritage organisations’ income streams?

Heritage institutions are increasingly expected to explore new income streams through digital opportunities and may also collaborate with other partners, including commercial ones, in TDM/AI projects. It is not clear if the UK TDM exception for “non-commercial” research would apply to these projects.

The Directive 2019/790 (DSM Directive) (2019) explains in Recital 28 in relation to the preservation exception that heritage institutions do not necessarily have the technical means or expertise to undertake digital preservation, and might need the assistance of other cultural institutions or other third parties for that purpose, and as such cultural heritage institutions should be allowed to rely on such third parties acting on their behalf and under their responsibility. We believe that a similar situation can be seen in relation to AI research.

It has been argued that the EU exception in art 3 of the Directive 2019/790 (DSM Directive) (2019) includes both commercial and non-commercial research (Rosati, 2019, p. 212). Provided that the research is done by the parties allowed in the exception, i.e., “research organisations and cultural heritage institutions” (and the commercial partner has no decisive influence and control over the research organisation)85, and it is for the purposes of “scientific research”, there is no additional requirement in the EU exception that this research should be non-commercial.

UK IPO Guidance explains that the non-commercial nature of the research does not prohibit the publication of research outputs in commercial publications, but cautions researchers on the need to carefully assess the original purpose of the research.86 This opens the door to the idea that “non-commercial” research may include certain commercial aspects, although what these aspects might be remains unclear. This poses challenges for researchers in public-private partnerships (PPPs), where the boundary between commercial and non-commercial becomes blurred.87

The academic literature also acknowledges this uncertainty. Aplin and Davis (2021) address the complexities surrounding the definition of “non-commercial” research, questioning whether research must be entirely free from commercial intent to qualify. Brown et al. (2023) discuss the evolution of the non-commercial research exception under UK law, noting that the restriction to non-commercial purposes has been a point of contention. This ambiguity has affected both academic and professional research environments. Brown et al. suggest, and we agree, that the purpose of the research is to be assessed at the time it is conducted, and that it should be sufficient if there is ‘a’ non-commercial purpose, while noting the extensive ambiguity in the distinction between commercial and non-commercial research.

One case that precedes the UK TDM exception addresses the issue under the fair dealing exception for non-commercial research (s 29(1) UK CDPA, 1988). In Controller HMSO and Ordnance Survey v Green Amps (2007) EWHC 2755 (Ch) the court determined that research by a commercial entity could not be considered non-commercial, even if it lacked an initial commercial purpose. If courts today had to interpret the UK TDM exception for non-commercial research, they would likely apply this precedent to delineate that TDM research conducted by commercial entities should not benefit from the exception. However, it remains unclear whether non-commercial entities, such as universities and heritage organisations, can collaborate with commercial entities on specific aspects of the research.

We believe UK law should embrace the reality of public-private collaborations in TDM/AI research, ensuring that important collaborative efforts are not excluded from the exception. The impact on heritage research and potential solutions for legal interpretation—and, if necessary, changes to the law—warrant further empirical research and focused policy consultation, which we will explore in more detail in Section Current copyright and AI policy and regulation efforts in the UK: scope for further heritage sector participation.

The difficulties in defining “non-commercial” research are longstanding, as seen in earlier contexts, such as s 29(1) (UK CDPA, 1988), and the orphan works policies as per government response in 2012.88 In light of these difficulties, we believe that further research and policy work should consider whether, instead of “non-commercial research”, the exception should be framed to apply to research led by non-profit organisations, in the same manner that other heritage sector exceptions are framed in UK law, such as s. 40A(2) (UK CDPA, 1988), which allows the “lending of copies” by “public libraries” and by those libraries and archives “not conducted for profit”.89 This would arguably constitute a more objective standard than having to determine whether the research is “non-commercial.”

Brexit provides flexibility to pursue a new exception for TDM that would not need to conform with the limits of EU law. The UK would, however, need to adhere to international conventions that set limits and minimum standards for national copyright legislation. Important in this case is the already mentioned three-step test, and our preferred more balanced interpretation of the test.76 It is imperative to understand, in a heritage context, which special cases would require an exception, and what constitutes a normal exploitation of works in those contexts. Heritage sector stakeholders (including researchers in those contexts) can also shed light on the reasonableness of any prejudice to rightsholders’ legitimate interests. We believe reasonableness is a relative concept, and in the context of an exception should involve an assessment of harms (for example, limitation on income streams for rightsholders) versus benefits (for example, better research, preservation of heritage and access to culture).

The UK has consulted on the need for a broader exception,90 with policy work reflecting this. A key issue has been the non-commercial aspect of the current exception. Among the responses received, AIPPI UK highlighted that s. 29A presents difficulties for non-commercial organisations collaborating with commercial entities on AI development.91 Similarly, the BBC raised concerns about the exception’s limitations when non-commercial entities outsource TDM to commercial third parties, particularly around lawful access and the lack of clarity on whose “purpose” should be relevant to define whether the research is commercial or not.92 The European Alliance for Research Excellence also noted that the UK exception is too narrow for today’s research landscape, especially in public-private partnerships, which discourages TDM activities due to copyright concerns.93

Gathering further evidence from heritage stakeholders is crucial for policymakers to create accurate guidance or legislative solutions for the TDM exception. It is important to address one final issue: the concept of “research” and whether it applies to certain heritage management practices.

  • (b) “Research”: what of heritage management?

A last point should be made as regards the copyright exceptions landscape supporting AI uses in heritage contexts. Because the exception is framed for research only, it could exclude other types of AI use in heritage management, such as cataloguing, preservation and reconstruction efforts.

It is important to remember that the exception was not intended to apply only to “scientific” research. The inclusion of the qualifier “scientific” was discussed, and it was noted that it could lead to the incorrect conclusion that the exception would only apply to academic papers or STEM research.94 We believe that a broader interpretation of “research” could encompass certain heritage management activities aimed at internal organisational and collections management improvements.

Given the importance of heritage cataloguing data for research quality and accuracy, it is crucial that the TDM exception supports heritage AI projects aimed at improving this data. For instance, the Transforming Collections project, part of the UKRI/AHRC Towards a National Collection programme, is a clear research initiative. However, it also has the potential to provide AI tools useful to the heritage sector beyond the project’s life. We believe that future uses of such tools in heritage organisations should be covered by the TDM exception, enabling internal research and enhancing collections data for future AI-driven research.

The future of AI research based on collections data looks promising, especially with national initiatives to unify collections in the UK. In addition to the Towards a National Collection programme, a key development is the launch of the Museum Data Service (MDS). This collaboration between Art UK, Collections Trust, and the University of Leicester will unify over 100 million museum records, offering the most comprehensive dataset of the nation’s museum holdings. The MDS will be a vital resource for researchers, educators, curators, and content creators. As noted by the Minister of State for Science, Innovation, and Culture, this initiative enhances museums’ digital capabilities, creating new opportunities for research, collaboration, and preservation (Knowledge Integration, 2024).

Recent case law in Germany, i.e., Robert Kneschke v LAION gemeinnütziger e.V (case No. 310 O 227/23), 2024,95 supports the idea that creating datasets for AI training through web scraping publicly available images can be considered research, as it contributes to future knowledge generation (Hembt et al., 2025). How similar reasoning would be considered in UK courts is uncertain. But we believe this case supports the understanding that certain heritage management TDM practices that contribute to future AI research (e.g., by improving cataloguing data) may also fall under the research exception. The question of whether the heritage sector needs a new exception to cover further AI-based heritage management activities requires further research and policy development.

While academic perspectives are valuable, they often remain untested in court and thus may not provide the legal certainty that stakeholders need to confidently rely on an exception. As of the writing of this paper, no case law has considered the (in our view unclear) s. 29A TDM exception in the UK. It is thus important to delve into current law reform debates, including on this provision.

Current copyright and AI policy and regulation efforts in the UK: scope for further heritage sector participation

The UK government’s ambition to lead in AI innovation and research96 has prompted significant policy work and stakeholder engagement, including on how copyright law can support this goal. Given that many AI uses in heritage contexts are research-driven (Section Artificial intelligence and cultural heritage), the views of stakeholders in the heritage sector could offer important insights into how copyright law supports or hinders the sector’s use of AI, and what regulatory improvements are needed. This section will analyse the policy work undertaken in the UK, and assess the extent to which the heritage sector has been included in discussions based on publicly available data (including scholarly papers, policy documents, public consultation responses and official reports).

The UK IPO led a significant policy effort in 2020 by publishing a call for views on artificial intelligence and intellectual property.97 The 2021 outcome report noted that 92 responses were received,98 coming from various stakeholders, including copyright owners, creative and technology industries, licensing bodies, legal representatives, and academics.99 Notably absent from this list were cultural heritage sector stakeholders, perhaps due to the small number of heritage respondents100 or the conflation of the heritage sector with creative industries (which we do not believe is appropriate). The task of classifying respondents into categories is complex, as there may be overlaps: for example, although heritage organisations are often classed as “users”, in some cases they may also fall under a rightsholders category.101 It would have been important to see a deliberate mention of the heritage sector as a category both in the call for views and in the government response.

The government response did not mention heritage stakeholders, or heritage uses of AI, except the more general remarks on benefits for researchers and creators, the risks for human creators, and the need to ensure that measures implemented will encourage AI for the public good while protecting intellectual property rights; and that in doing so the UK IPO will collaborate with “experts from business, technology and research” and “developers and users of AI and owners and users of intellectual property”.102

One of the next steps outlined was to review how copyright owners license their works for AI use and explore ways to improve licensing or copyright exceptions to support innovation and research.103 Indeed, conflicting responses to the call for views were given by copyright owners and users on the matter of use of copyright material for training AI. The preferred approach for most copyright owners was a voluntary licensing model, arguing that it would better balance remuneration with AI access. Many felt that current copyright exceptions do not apply to machine learning processes and that a licensing model would offer greater certainty.104 Many copyright owners expressed concerns about moving towards an exception that would allow commercial TDM.105

On the other hand, users of copyright materials, including “technology firms, entrepreneurs and researchers”, noted the disadvantages of relying on licences, including the high costs, which may only be affordable for “established or large businesses”, and the curatorial bias that may be generated by only mining content “available under licence, rather than what would be most useful for the purposes of the AI”.102 The focus on “businesses” in the discussions fails to consider public heritage organisations, such as museums and libraries, which face similar financial challenges and would struggle to afford licensing fees for AI-related projects.

Following the initial call for views, the UK IPO issued a public consultation on how AI should be addressed in the patent and copyright systems, receiving 88 written submissions from sectors such as the creative industries, technology, pharmaceuticals, the third sector, academia, and legal and IP professions,106 but few from heritage stakeholders.107 Despite this, the Government’s response acknowledged that TDM is used for training AI systems and has applications in research, journalism, business analytics, and by cultural heritage organisations, and proposed expanding the scope of the TDM exception to permit TDM for any purpose (including commercial), while safeguarding rightsholders to protect their content, including a requirement for lawful access.108 As seen in Section Copyright challenges arising from AI uses in the cultural heritage sector in the UK, the requirement of “lawful access” is in itself extremely unclear, and it was not clear how the government would address such ambiguities.

Concerns raised by the creative industries and parliamentarians led to the abandonment of this reform. In 2023, the UK Minister for Science, Research, and Innovation confirmed that plans to broaden TDM exceptions had been shelved. The House of Lords Communications and Digital Committee (2023) report on the creative industries recommended pausing the proposed changes and conducting an impact assessment on the potential effects on the creative sector, with industry groups arguing that weakening copyright protection could harm creators by reducing incentives for future investment in their work. The Committee noted that while AI development is important, it should not be pursued “at all costs,”109 and any changes to the TDM regime must be balanced against the interests of the creative industries. Though the report briefly mentioned museums and galleries digitising collections and referenced research council programmes supporting heritage digitisation, heritage sector concerns were largely absent from the decision to halt the new TDM exception.110

The UK Government’s AI White Paper outlined its plan to balance IP protection with AI development, by following Sir Patrick Vallance’s recommendations.111 Vallance recommended that “Government should announce a clear policy position on the relationship between intellectual property law and generative AI to provide confidence to innovators and investors” stating “an urgent need to prioritise practical solutions to the barriers faced by AI firms in accessing copyright and database materials.”112 Vallance added that “government should work with the AI and creative industries to develop ways to enable TDM for any purpose, and to include the use of publicly available content including that covered by intellectual property as an input to TDM (including databases).”113 We find that the “AI and creative industries” framing does not sufficiently contemplate the heritage sector, and policy language should be revisited to engage heritage stakeholders more expressly. Doing so would clarify which AI technologies are at stake (rather than focussing only on generative AI), which barriers this specific sector faces (rather than only “AI firms”), and how a new TDM exception could accommodate the specific uses made by such stakeholders.

In response to the Vallance recommendation, the UK IPO instructed the drafting of a code of practice to help AI firms access copyrighted materials while protecting creators’ rights.114 However, the working group formed to create this code lacked diverse participation, and the heritage sector was underrepresented based on the publicly available members list (the British Library was the only heritage organisation listed, against a high number of rightsholders and technology stakeholders).115 This limited representation of the heritage sector neglects critical perspectives on ethics, bias, and cultural impact, which are essential for shaping balanced AI policy. Government should be more inclusive in forming such groups instead of limiting the discussion on such an important code of practice to “AI firms and rights holders”.116

The working group did not reach consensus on an effective voluntary code.117 The House of Commons Culture, Media and Sport Committee (2024) expressed concern over the lack of agreement between the creative industries and AI developers regarding creators’ consent and compensation for the use of their works in AI training. The Committee urged the Government to establish mechanisms that enable creators to enforce their consent and receive equitable compensation when their works are employed by AI systems. Government had previously stated that if the code of practice was not adopted, new legislation could be considered.118

More recently, the UK government issued a new consultation which continues to focus primarily on the creative industries and AI companies while largely omitting considerations related to cultural heritage.119 Notably, the consultation proposes a new exception for commercial TDM including an opt-out provision for rights holders, allowing them to exclude their works from AI training datasets.120 While this aims to protect copyright interests, the possibility of opt-outs raises concerns about potential biases, omissions, and incomplete datasets that could skew and compromise AI research. This would be particularly problematic if this provision were to apply in research and heritage contexts, which could become the case in light of the uncertainties regarding the current non-commercial research TDM exception and risk-averse attitudes of heritage stakeholders as discussed in Section Copyright challenges arising from AI uses in the cultural heritage sector in the UK. It is unclear how the proposed new exception will impact the existing TDM exception for non-commercial research, which we believe is not fit for purpose for heritage research, and should be clarified and expanded (as proposed above) before the introduction of any new exception. The boundaries between any new commercial TDM exception and the non-commercial research exception must be carefully and clearly delineated, to resolve existing issues and protect and promote AI research in heritage contexts.

Additionally, Recommendation 13 of the AI Opportunities Action Plan121 proposes establishing a copyright-cleared British media asset training dataset through partnerships with heritage institutions, which - while framed as a means to advance AI development - effectively commercialises heritage data, reinforcing an industry-driven focus that sidelines broader cultural heritage considerations. The above appear counter to the mission of publicly funded cultural institutions, and as such must be carefully considered in close consultation with heritage stakeholders.122 Whatever the next policy step is, we advocate that meaningful engagement with the heritage sector in its full breadth and diversity is required.

The reluctance to implement a broader TDM exception, despite early considerations, reflects the dominance of creative industry concerns. Kretschmer et al. (2024) p. 112 state that “policy making may be anecdotally driven by examples that surface through lobbying processes or the latest technological applications”, the current policy context being dominated by discussions on user-facing generative AI applications such as ChatGPT. But the “real world” (as Kretschmer et al. put it) of machine learning is not limited to these more dominant scenarios that attract much attention. We believe this point is crucial to the case for further policy work that engages less represented stakeholders, such as those in the wide and diverse heritage sector.

AI and copyright policy discussions tend to focus more on creative industries and AI businesses. This can result in the exclusion of the heritage sector from policy discourses and debates. It would be important to understand why such public organisations are not more robustly involved, and how the language of consultations, calls for views and policy reports can be improved to invite evidence from this sector more explicitly. It is remarkable that the European Parliament has specifically addressed the intersection of AI and cultural heritage through a dedicated briefing ‘Artificial intelligence in the context of cultural heritage and museums’, exploring the legal challenges faced by the sector. In contrast, the focus of current efforts in the UK Parliament has been more concentrated on the impact of AI on the creative industries, with less attention given to the cultural heritage sector. Events such as ‘Changes and Challenges in Heritage and Open Knowledge’ supported by the National Lottery Heritage Fund (Naomi Korn Associates, 2025) and ‘Roundtable on ICH inventorying, Intellectual Property and Artificial Intelligence’ (Deacon, 2024) promote important discussions and we believe government should proactively seek such forms of engagement to inform its policy.

Conclusion

This paper outlined the current important applications of AI in the UK heritage sector and highlighted the legal challenges, particularly in copyright law, for heritage stakeholders using AI in research and heritage management. High clearance costs and risk-averse attitudes in digitisation projects, alongside the limitations of the TDM exception for non-commercial research, were discussed.

We argued that AI research (and related heritage management) in cultural heritage contexts should benefit from the TDM exception. Licensing may not be feasible or affordable, and can introduce bias and other issues related to dataset completeness and appropriateness. Practical and legal issues, such as ambiguities in the “lawful access” requirement, need to be clarified. Addressing these challenges would support the public interest, alleviate resource constraints, and improve the quality of heritage research and management.

The current UK TDM exception should be interpreted to cover digitisation of physical materials when needed for AI research. The quotation exception can supplement the TDM exception, particularly for sharing excerpts publicly. In view of the problems with the “non-commercial” terminology, expanding the exception to “non-profit” research, or to research led by non-profit organisations, and clearly contemplating public-private partnerships should be considered. We also argued that the TDM exception should be amended to clearly allow the transfer of copies between research project partners. These issues should be resolved and clarified to support AI heritage research and management, particularly in light of current discussions on including a commercial TDM exception with opt-out.

We highlighted the underrepresentation of cultural heritage stakeholders in UK copyright and AI debates, and the missed opportunity to enrich current policy discussions with important heritage perspectives. These perspectives will include crucial considerations on ethics, bias and cultural impact of AI, and allow policy work to focus on other types of heritage and research-relevant AI models beyond the commercial generative AI tools that current policy and public debate tend to focus on. This gap underscores the need for more inclusive policy language that invites broader participation beyond AI and creative industries. More empirical research is necessary to identify the copyright issues experienced and the barriers preventing heritage sector involvement in these discussions. We recommend increased proactive government engagement with the heritage sector (including researchers) in AI and copyright policy, building on efforts by bodies such as the National Lottery Heritage Fund. Projects such as Living With Machines demonstrate the need to address copyright challenges in heritage research, providing a foundation for future legal and policy reforms.

Statements

Author contributions

Research designed and led by PW. Both authors participated in the research, interpretation and analysis of sources and in drafting the manuscript, under PW guidance. PW revised, finalized and submitted the manuscript. Both authors contributed to the article and approved the submitted version.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by the Brunel Interdisciplinary Research Labs (BRIL); the Athena Swan Research Grant; and Bridging Responsible AI Divides (BRAID) with funds from the Arts and Humanities Research Council [grant number AH/X007146/1]. The BRIL funding supported the initial development of this research (including covering Research Assistantship costs). The research was further developed during Paula Westenberger’s Athena Swan Research leave, allowing further research and initial drafting to be conducted (including covering Research Assistantship costs). The final stage of research and writing up occurred thanks to Paula Westenberger’s BRAID Research Fellowship, when the paper was completed. The APCs for this publication were funded by the UKRI open access block grant.

Acknowledgments

The authors are grateful for the excellent research assistance by Imani Wilson (LLB, Brunel Law School). The authors also wish to thank the following people for their valuable feedback on an earlier draft of this paper (any errors, omissions and opinions remain our own): Prof. Ruth Ahnert, Prof. Tanya Aplin, Dr. Shane Burke, Josie Fraser, Prof. Mick Grierson, Prof. Jonathan Griffiths, Dr. Luke McDonagh, Bartolomeo Meletti, Dr. Alan Paton, Dr. Anna-Maria Sichani, Dr. Andrea Wallace, and Dr. Ermioni Xanthopoulou. A preprint version of this paper was made available at SSRN: Westenberger, Paula and Farmaki, Despoina, Artificial Intelligence for Cultural Heritage Research: the Challenges in UK Copyright Law and Policy (February 23, 2025). Available at SSRN: https://ssrn.com/abstract=5153757 or http://dx.doi.org/10.2139/ssrn.5153757.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Footnotes

1.^A preprint version of this paper was made available: Westenberger and Farmaki (2025).

2.^On the concept of “collections as data”: Lampert and Lapworth (2020), Ridge (2024).

3.^Noting that relatively few books and articles address the practical steps in getting hold of data, and the restrictions that come with it, discussing specifically copyright and contractual challenges Ahnert et al. (2023) p. 23.

4.^ Europeana (2024); Pasikowska-Schnass and Lim (2023); Network of European Museum Organisations (2024); Knowledge Rights 21 (2024).

5.^Led by the University of Southampton and partnered with UK Border Force and Royal Botanic Gardens, Kew: Avis-Riordan (2020).

6.^Partnership between The Alan Turing Institute, the British Library, and the Universities of Cambridge, East Anglia, Exeter, Queen Mary University of London and King’s College, London. Living With Machines (2025).

7.^Led by University of the Arts London and Tate: UAL (2025).

8.^E.g., Hawkins and Sichani (2024), Sichani (2024), Bailey et al. (2024), Wallace (2022b).

9.^See also: UK Intellectual Property Office (2021b), para. 18.

10.^UK Copyright, Designs and Patents Act 1988 (UK CDPA, 1988), s. 1.

11.^ UK CDPA (1988), s 180.

12.^ UK. The copyright and rights in databases regulations, s. 13.

13.^ UK CDPA (1988), s. 3.

14.^ Berne Convention (1886), art. 2(1).

15.^ UK CDPA (1988), s. 3; WIPO Copyright Treaty (WCT) (WCT, 1996), art. 4 and 5.

16.^ UK CDPA (1988), s. 4(1)(a).

17.^ Berne Convention (1886), art 2(1); UK CDPA (1988), s. 4(2)(a).

18.^ Berne Convention (1886), art 2(5).

19.^ Infopaq International A/S v Danske Dagblades Forening (2012), (Case C-5/08). In the UK, the originality test required that works reflected the creator’s “skill, labour and judgement,” but this has been surpassed by the EU Infopaq “authors’ own intellectual creation” test: THJ Systems v Sheridan (2023) EWCA Civ 1354.

20.^WIPO Copyright Treaty (WCT) (WCT, 1996), art. 2; Agreement on Trade-Related Aspects of Intellectual Property Rights (1994), art 9(2).

21.^ Berne Convention (1886), art. 2(8); Walter v Steinkopff (1892), 3 Ch 489.

22.^See also UK CDPA (1988), s 3(2): literary, dramatic or musical works must be recorded in writing or otherwise.

23.^The analysis of the copyright status of artistic works utilising AI tools (or even of the tools themselves) in heritage contexts may be the scope of future work, as these works are starting to integrate UK museum collections, such as the V&A recent acquisition of MEMORY (Drawing Operations Unit Generation 2) by Sougwen Chung: Mitchell (2022). UK CDPA (1988), s 9(3) on computer-generated works: “the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.”

24.^ UK CDPA (1988), ss 9(1) and 11. As per s 9(2): that person is taken to be the producer (for sound recordings); the producer and the principal director (for films); the person making the broadcast (for broadcasts); and the publisher (for typographical arrangement of a published edition).

25.^ UK CDPA (1988), s. 11(2).

26.^ UK CDPA (1988), s. 90(1).

27.^ UK CDPA (1988), s. 12. Duration varies depending on the work.

28.^ UK CDPA (1988), s. 16.

29.^ UK CDPA (1988), s. 17.

30.^ UK CDPA (1988), s. 20.

31.^ UK CDPA (1988), s. 21.

32.^ UK CDPA (1988), s. 16(2) and (3).

33.^ Infopaq International A/S v Danske Dagblades Forening (2012), Case C-5/08: taking even 11 consecutive words could constitute infringement.

34.^ Pelham GmbH (2019), Case no. C-476/17. Contrast with earlier UK approach in England and Wales Cricket Board Ltd & Anor v Tixdaq Ltd & Anor, 2016, which focused on whether the part reproduced reflected “investment”.

35.^Some scholars suggest a purposive interpretation of the reproduction right to exclude “text and data mining” from the scope of copyright (Bently et al., 2022, p. 261). Margoni and Kretschmer (2022) argue that there should not be a need for a TDM exception for extracting informational value of protected works. Similarly, see Murray-Rust (2012). Concluding that the fact that TDM was regulated as an exception means that legislature sees these activities as falling under the remit of copyright, see Rosati (2019). A formalistic interpretation of the reproduction right, which currently prevails in landmark judicial interpretations such as Infopaq International A/S v Danske Dagblades Forening (2012), would allow copyright owners to inhibit technical copies made through TDM: European Copyright Society (2017), p. 5.

36.^See Coalition for a Digital Economy (COADEC) response to UK IPO public consultation, noting prohibitive costs (time and money) for startups and scale-ups regarding difficulties in identifying ownership and multitude of rightsholders. See also CREATE response to public consultation. UK Intellectual Property Office (2022b). The costly and lengthy process of acquiring licenses is not a new issue, and was at the core of discussions on the implementation of the UK TDM exception introduced in 2014, see UK Intellectual Property Office (2012).

37.^ UK Intellectual Property Office (2021b); See also: UK Intellectual Property Office (2021a).

38.^On the legal certainty of the EU TDM exception, see Rosati (2019) p. 214.

39.^ UK CDPA (1988), ss.40A-43A. The UK revoked the EU orphan works exception following Brexit. The use of orphan works in the UK is regulated through a government licensing scheme: UK Intellectual Property Office (2024).

40.^ UK CDPA (1988), s. 29A.

41.^Using TDM interchangeably with text and data analysis when referring to the UK exception, see: Rosati (2019) p. 198; Bently, Sherman et al. (2022) p. 260; UK Intellectual Property Office (2014b); UK Intellectual Property Office (2014a).

42.^ UK CDPA (1988), s. 1D in Schedule 2.

43.^The Copyright and Rights in Databases Regulations 1997, reg. 20. “The Government’s view is that this existing exception [reg. 20] will permit the extraction of whole works if required for text and data mining through the provision for “fair dealing with a substantial part.” (page 13): UK Intellectual Property Office (2014a). The most recent Government consultation (December 2024) however flagged that the current UK TDM exception does not extend to databases: UK Intellectual Property Office (2014a) [at para. 123].

44.^ Hargreaves (2011) p. 9 and UK Intellectual Property Office (2012).

45.^Acknowledging text and data mining is necessary to develop and train ‘artificial intelligence’ algorithms: Bently et al., (2022) p. 260. See also: Rosati (2019) p. 198; Strowel A. and Ducato R. (2021).

46.^ UK Intellectual Property Office (2014a).

47.^British Library response to the UK Intellectual Property Office (2022b). Response to the IPO Call for Views (2021), Prof Ruth Ahnert, PI of Living With Machines, as regards the TDM exception: “I would say it is not fit for purpose. … for cautious institutions with a high profile that ambiguity can be very limiting”: UK Intellectual Property Office (2021c).

48.^ UK Intellectual Property Office (2022b).

49.^The JISC report cited in the UK IPO Impact Assessment document stated that “the broader interests of equity may support the case for an exception to enable text mining so that society can maximise the potential returns from an asset in which society has made the lion’s share of investment and taken the vast majority of the risk”: UK Intellectual Property Office (2012); McDonald and Kelly (2012).

50.^ UK Intellectual Property Office (2014b).

51.^ UK Intellectual Property Office (2014a) p. 11.

52.^Ibid, p. 12.

53.^ UK Intellectual Property Office (2022a).

54.^ UK CDPA (1988) s. 29A(1)(a).

55.^ UK CDPA (1988), s. 28A. Directive 2001/29/EC (Infosoc Directive) (2001). Discussing the possible challenges in applying this exception for AI training, see Guadamuz (2024).

56.^ Directive 2019/790 (DSM Directive) (2019). Note also the TDM exception with rights reservation in article 4 DSM Directive, stating copies can be retained for as long as necessary for the purposes of text and data mining.

57.^ Directive 2019/790 (DSM Directive), 2019, art 3(2).

58.^ Directive 2019/790 (DSM Directive), 2019, art 3(4).

59.^ UK Parliament (2020).

60.^ UK CDPA (1988), s. 30(1ZA).

61.^ UK Intellectual Property Office (2014b), p. 9.

62.^The Collections Trust Spectrum standard of museum collections management in the UK (also used worldwide) explain the procedures for acquisition and accession: Collections Trust (2025a) and Collections Trust (2025b).

63.^ Authors Guild v Google, Inc., (2015) No.13–4829 (2d Cir. 2015); Authors Guild, Inc. v. Google Inc., (2013) 954 F. Supp. 2d 282 (S.D.N.Y. 2013). See also Rosati (2019).

64.^ Infopaq International A/S v Danske Dagblades Forening (2012) (Case C-5/08).

65.^ Football Association Premier League Ltd and Others v QC Leisure and Others and Karen Murphy v Media Protection Services Ltd (2011), Joined Cases C-403/08 and C-429/08.

66.^ HM Government (2012b).

67.^Ibid.

68.^ UK Intellectual Property Office (2012).

69.^Ibid.

70.^Ibid.

71.^Ibid.

72.^ UK Intellectual Property Office (2014b), p. 6.

73.^Ibid.

74.^Ibid, p. 7.

75.^See, for example, the Elsevier print and electronic subscription modes: Elsevier (2025).

76.^TRIPS Agreement, art. 13; Berne Convention, art. 9(2). The WTO interpreted the test as requiring exceptions to be clearly defined and narrow in scope and reach (WTO, 2000, p. 34). Geiger et al. (2010) promoted a declaration, with which we agree, on the need for a broader, more balanced application of the test, requiring a comprehensive overall assessment, balancing interests of rights holders and general public. See also Westenberger (2017), p. 296, 301-302, 305-306 and Hudson (2020), p. 14–19.

77.^E.g., the need to digitise Mitchell’s NPD (Ahnert et al., 2023, p. 33–34).

78.^ UK CDPA (1988), s. 29A(5).

79.^ UK CDPA (1988), s. 29A(2)(a).

80.^See BBC’s response asking for “clarification as to the application of s29A(2)(a) … where the entity commissioning the research (or indeed a third party) collects the data”: UK Intellectual Property Office (2022b).

81.^ UK CDPA (1988), s. 296ZE.

82.^ UK Intellectual Property Office (2014b), p. 7.

83.^Ibid.

84.^Boundaries between commercial and research are increasingly vague: Matas (2025).

85.^Recital 12 and art. 2(1) (Directive 2019/790 (DSM Directive), 2019).

86.^ UK Intellectual Property Office (2014b), p. 10.

87.^See ABPI response to IPO public consultation on this point UK Intellectual Property Office (2022b).

88.^ HM Government (2012a) p. 8.

89.^See also UK CDPA (1988), ss. 42(4), 42A(1), and 43A(4).

90.^ UK Intellectual Property Office (2022b).

91.^Ibid.

92.^Ibid.

93.^Ibid.

94.^ UK Intellectual Property Office (2014a) p.12.

95.^For a summary of the case, see: EUIPO (2024).

96.^ UK Intellectual Property Office (2021b) Para 23 and 25.

97.^Ibid.

98.^Ibid. Para 6. We have found 88 files, however 3 respondents submitted 2 responses each, totalling 85 individual respondents: UK Intellectual Property Office (2021c).

99.^ UK Intellectual Property Office (2021b) Para 8.

100.^We have identified as heritage respondents for the purposes of this paper: Archives and Records Association, European Alliance for Research Excellence, LACA–Libraries and Archives Copyright Alliance, and Ruth Ahnert (PI of Living with Machines). We should also note related organisations such as BAPLA and Creative Commons, noting however these do not represent only heritage stakeholders, and we thus did not consider these as heritage respondents. This is our own classification based on our analysis of responses. See all responses in the call for views webpage: UK Intellectual Property Office (2021c).

101.^See the BAPLA response, indicating a “broad and diverse membership of image rights holders and purveyors”, including cultural heritage institutions: UK Intellectual Property Office (2021c).

102.^ UK Intellectual Property Office (2021b).

103.^Ibid. (Next steps - action: copyright. Para 5).

104.^Ibid.

105.^Ibid.

106.^ UK Intellectual Property Office (2022b).

107.^We have identified as heritage respondents for the purposes of this paper: Archives and Records Association, British Library, European Alliance for Research Excellence, LACA, National Library of Scotland and Wellcome Trust. We should also note related organisations such as BAPLA, noting however it does not only represent heritage stakeholders, and we thus did not consider these as heritage respondents. This is our own classification based on our analysis of responses. See all responses in the consultation webpage: UK Intellectual Property Office (2022b).

108.^ UK Intellectual Property Office (2022a).

109.^ House of Lords Communications and Digital Committee (2023).

110.^Ibid.

111.^ Department for Science, Innovation and Technology (2023).

112.^ HM Government (2023a).

113.^Ibid.

114.^ HM Government (2023b).

115.^The selection of group members lacked transparency (Trapova, 2024). The lack of public documentation of the process is also an issue, with the only available documentation to include the Terms of Reference and the member list: UK Intellectual Property Office (2023).

116.^ HM Government (2023b).

117.^ Department for Science, Innovation and Technology (2024), [para. 29].

118.^ UK Intellectual Property Office (2023).

119.^ UK Intellectual Property Office, Department for Science, Innovation and Technology and Department for Culture, Media and Sport (2024).

120.^At EU level, see TDM exception with rights reservation in Article 4(3) (Directive 2019/790 (DSM Directive), 2019), reinforced in Article 53 (Regulation (EU) 2024/1689 (Artificial Intelligence Act), 2024). On the impact of the EU Artificial Intelligence Act for the cultural sector, see: Culture Action Europe (2024).

121.^ Department for Science, Innovation and Technology (2025a).

122.^Government partly agreed with Recommendation 13, stating it “will engage with partner organisations and industry to consider the potential role of government in taking forward this recommendation.” Department for Science, Innovation and Technology (2025b).

Keywords

cultural heritage, research, artificial intelligence, copyright, text and data mining

Citation

Westenberger P and Farmaki D (2025) Artificial intelligence for cultural heritage research: the challenges in UK copyright law and policy. Eur. J. Cult. Manag. Policy 15:14009. doi: 10.3389/ejcmp.2025.14009

Received

30 October 2024

Accepted

24 February 2025

Published

09 September 2025

Volume

15 - 2025

*Correspondence: Paula Westenberger,

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
