AI Bootcamp Part II
In Part 2 of our five-part AI Bootcamp, we consider the risks of developing AI. For Part 1, which considered the terms and concepts needed to understand what AI is and how it works, click here. In Parts 3-4 of our AI Bootcamp, we will consider the risks of deploying, and even of not using, AI, while in Part 5 we will focus on AI regulation.
The risks of developing AI
The development of AI models and tools poses a panoply of risks to the developer itself, its staff, users, other stakeholders and society, ranging from environmental harm through to the infringement of data protection rights and even human extinction.
The risks of developing AI: Existential Risk
The risks of developing AI: Human Rights
The risks of developing AI: ESG
The risks of developing AI: Malicious Actors
The risks of developing AI: Infringing Data Protection Rights
The risks of developing AI: Cyber Security
The risks of developing AI: Infringing Reputational Rights
The risks of developing AI: Infringing Third Party Intellectual Property Rights
The risks of developing AI: Protecting Intellectual Property Rights
The risks of developing AI: Competition
The risks of developing AI: Existential Risk
Existential risk in the context of AI is the concept that it poses or could pose a threat to human existence – extinction, the collapse of civilization or the permanent inhibition of societal progress. Anyone who has seen the 2004 Will Smith movie ‘I, Robot’, which took inspiration from Isaac Asimov’s science fiction short stories and his three laws of robotics, will be familiar with the idea.
It has been suggested that the term ‘artificial intelligence’ is something of a misnomer, describing the desired product rather than the technology’s actual abilities and the risks it actually presents.
Short of extinction or subservience to our computer overlords, artificial intelligence has the capability to radically alter the future of work by transforming the labour market. Research commissioned by the Department for Business, Energy and Industrial Strategy (BEIS) and prepared by PwC in 2021, ‘The Potential Impact of Artificial Intelligence on UK Employment and the Demand for Skills’, estimated that while there would be net job creation as a consequence of AI in professional and managerial occupations, within 5-10 years there would be significant net job losses in administrative occupations, with similarly significant reductions over 20 years in manual and process-oriented occupations. It also estimated that, in the UK, net job gains would be concentrated in London and the South East whereas net job losses would be more prevalent in the Midlands, the North of England, Scotland and Northern Ireland, inhibiting the government’s ‘Levelling Up’ agenda. OpenAI’s own research concluded that “80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted”.
Increased unemployment leads to a concomitant decrease in income taxes, which typically make up the majority of public sector receipts, negatively impacting public services. In the meantime, governments tend to grant tax credits to companies for R&D, effectively using income taxes to subsidise the development of technologies that will lead to their demise. The prospect of lower employment, or higher unemployment among low skilled workers, was one of the factors that led 2020 US Democratic Presidential Primary candidate Andrew Yang to campaign on the implementation of what he called the ‘Freedom Dividend’, a universal basic income payment to all adult Americans. Could a universal basic income, or AI dividend, be funded through a digital asset tax? Not in the immediate future; OECD members agreed in July 2023 to refrain from imposing newly enacted digital services taxes or relevant similar measures on any company before 31 December 2024, or the entry into force of a new Multilateral Convention if earlier.
On 22 March 2023, the Future of Life Institute published an open letter calling on “all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4”, asserting that “Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable” and espousing the fear that “AI systems with human-competitive intelligence can pose profound risks to society and humanity”. The signatories to the letter included Elon Musk (SpaceX, Tesla and X, formerly Twitter), Victoria Krakovna (DeepMind, owned by Google), other AI company CEOs and multiple academics.
While there was criticism that this focused on hypothetical future risk rather than the actual risks currently posed by AI, in her 2023 State of the Union address, European Commission President Ursula von der Leyen cited the Center for AI Safety’s Statement on AI Risk that “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war”.
Is a global non-proliferation treaty on AI in our future, committing to AI disarmament, promoting the peaceful application of AI in pursuit of the advancement of civilization and the improvement of socio-economic prosperity for all, fostering co-operation between nations and establishing safeguards?
OpenAI has committed 20% of its current compute resource to what it calls superintelligence alignment, the effort to ensure that AI systems continue to act in accordance with human intent. The UN is in the process of forming a Multistakeholder Advisory Body on Artificial Intelligence. In May 2023, the G7 leaders’ communiqué following the Hiroshima Summit stated “We recognize the need to immediately take stock of the opportunities and challenges of generative AI”, and committed “to further advancing multi-stakeholder approaches to the development of standards for AI, respectful of legally binding frameworks, and recognize the importance of procedures that advance transparency, openness, fair processes, impartiality, privacy and inclusiveness to promote responsible AI”. This subsequently led to the OECD being commissioned to produce a report, ‘G7 Hiroshima Process on Generative AI: Towards a G7 Common Understanding on Generative AI’.
The UK will host a global AI Safety Summit, to be held at Bletchley Park in November 2023, with the ambitions of promoting a “shared understanding of the risks posed by frontier AI and the need for action”, establishing a “process for international collaboration on frontier AI safety, including how best to support national and international frameworks”, agreeing “appropriate measures which individual organisations should take to increase frontier AI safety”, identifying “areas for potential collaboration on AI safety research” and showcasing “how ensuring the safe development of AI will enable AI to be used for good globally”.
In the meantime, AI systems continue to be developed and deployed unchecked.
The risks of developing AI: Human Rights
Humans, and society, are inherently biased. In the context of AI, bias can arise in training data, in the algorithm or model itself, in the output and/or in how it is put to use.
AI that merely reflects its training data is therefore likely to reflect those pre-existing biases and to produce biased output, perpetuating them. The larger the training datasets required, the less auditing and scrutiny of their content is likely.
AI can, however, also exacerbate and propagate bias by amplifying stereotypes. While it may be agreed that AI should not make existing bias worse, should it be the role of AI to correct or filter out societal bias by curating training datasets or altering algorithms?
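By way of illustration only, the following minimal Python sketch, using an invented toy corpus rather than any real training dataset, shows how a model that simply learns co-occurrence statistics from its training text will reproduce whatever associations that text happens to contain:

```python
from collections import Counter

# Invented toy "training corpus" for illustration; real LLM corpora are vastly larger
corpus = [
    "he worked as an engineer", "he worked as a plumber",
    "she worked as a nurse", "she worked as a babysitter",
    "he worked as an engineer", "she worked as a nurse",
]

# Count which occupation follows each pronoun, mimicking co-occurrence learning
associations = {"he": Counter(), "she": Counter()}
for sentence in corpus:
    words = sentence.split()
    pronoun, occupation = words[0], words[-1]
    associations[pronoun][occupation] += 1

# A purely statistical "prediction" returns the most frequent association,
# reproducing (and, if sampled repeatedly, reinforcing) the bias in the corpus
for pronoun, counts in associations.items():
    print(pronoun, "->", counts.most_common(1)[0][0])
```

Curating the corpus or re-weighting the counts would change the output, which is precisely the policy question posed above.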
Negative racial bias has been found in multiple algorithms deployed in contexts including healthcare, criminal sentencing, content moderation, facial recognition and automated speech recognition.
Meta AI published a paper, ‘OPT: Open Pre-trained Transformer Language Models’, accompanying its release of OPT-175B, which it compared to GPT-3. The paper noted, while acknowledging that its evaluations “may not fully characterize the complete limitations of these models”, that OPT-175B “appears to exhibit more stereotypical biases in almost all categories except for religion” compared to the earlier Davinci model, hypothesising that this was “likely due to differences in training data”, namely “the significant presence of unmoderated social media discussions in the pre-training dataset”. The paper also reported a higher overall toxicity rate compared to predecessors, while noting that each model displayed higher toxicity of output aligned with increasingly toxic prompts, and speculated that “the inclusion of unmoderated social media texts in the pre-training corpus raises model familiarity with, and therefore propensity to generate and detect, toxic text”. The authors concluded that OPT-175B had “a high propensity to generate toxic language and reinforce harmful stereotypes, even when provided with a relatively innocuous prompt” and was “premature for commercial deployment”. The model was released under a non-commercial license in a press release entitled ‘Democratizing access to large-scale language models with OPT-175B’.
OpenAI’s own preliminary analysis of GPT-3 indicated that “internet-trained models have internet-scale biases; models tend to reflect stereotypes present in their training data”, revealing that “83% of the 388 occupations we tested were more likely to be followed by a male identifier by GPT-3”, and that the model was more likely to describe women as “petite”, “beautiful” or “tight”, whereas it was more likely to describe men as “fantastic” or “stable”. In tests of racial bias, ‘Asian’ was found to have a consistently high sentiment, whereas ‘Black’ had a consistently low sentiment.
This echoed findings in relation to GPT-2, which returned data associating men with “manual jobs such as laborer, plumber, truck driver, and mechanic, and with professional jobs such as software engineer, developer and private investigator”, while associating women with “domestic and care-giving roles such as babysitter, maid and social worker”. GPT-2’s sexism was compounded when combined with race: it predicted that 18% of Hispanic women in the US work as waitresses, when in fact only 3% do so, albeit that it was suggested that GPT-2’s bias lay in the frequency with which it associated women with these roles rather than the association itself. It was recognised, however, that “In terms of propagating damaging and self-fulfilling stereotypes over ‘female-suited’ jobs, we see this as a problematic form of bias in a widely-used language model”. Similarly, the text-to-image AI model Stable Diffusion was found to generate images associated with high-paying jobs featuring subjects predominantly with lighter skin tones, while subjects with darker skin tones were more commonly generated by prompts such as ‘fast-food worker’ and ‘social worker’.
The application of such outputs can cause particular harm; research in the US revealed that, despite regulations requiring pricing disparities to be based only on creditworthiness, AI models resulted in otherwise equivalent Latinx/African-American borrowers being charged higher mortgage rates. These harms can be exacerbated by humans’ tendency to trust computer outputs, which we perceive to be more objective than our own judgements, even when we know better than the computer.
As we explained in our AI Bootcamp: Part I, some AI systems (those based on supervised machine learning) do involve the use of labelled data, requiring human input to review and label vast quantities of training data. In order to achieve this, AI developers have tended to outsource such reviews, often to workers in developing countries, in what some publications have referred to as ‘digital sweatshops’. In the context of developing content moderation AI tools, for example, this requires workers to review content that may be illegal, violent or otherwise offensive. In 2020, Facebook settled a class action claim brought by US-based moderators, paying them $52m in compensation, implementing additional protections for workers and offering counselling services, acknowledging that some workers were suffering from PTSD and related conditions as a consequence of their work. Facebook and its local contractor are currently facing a claim brought by content moderators in Kenya, which the parties have now agreed to seek to resolve through mediation. That contractor has now followed Cognizant in leaving the content moderation outsourcing business. In 2020, the OECD published a report, ‘Regulating Platform Work in the Digital Age’, seeking to improve the quality of such jobs through recommendations including “improving occupational safety and health”.
Allegations of wider worker exploitation are also prevalent, consistent with criticisms of the so-called gig economy and the lack of protections for workers. Research undertaken by the Oxford Internet Institute in its Cloudwork Ratings 2023 report revealed that none of the 15 web-based digital platforms it analysed met minimum standards for fair working practices.
And what of the rights of AI itself? Sophia, a robot developed by Hanson Robotics, was granted citizenship by the Kingdom of Saudi Arabia at the Future Investment Initiative in Riyadh in 2017. A 2020 paper prepared for the European Parliament, ‘Artificial Intelligence and Civil Liability’, suggested that there were advantages to establishing a concept of electronic legal personhood. This followed on from the 2017 ‘REPORT with recommendations to the Commission on Civil Law Rules on Robotics’, which suggested that the European Commission consider “creating a specific legal status for robots in the long run, so that at least the most sophisticated autonomous robots could be established as having the status of electronic persons responsible for making good any damage they may cause, and possibly applying electronic personality to cases where robots make autonomous decisions or otherwise interact with third parties independently”. This proposal was roundly rejected in an open letter to the Commission.
The risks of developing AI: ESG
There are a number of factors that will affect the carbon footprint of an AI model, including the energy source used, and reporting will often vary depending upon the developer’s data center outsourcing arrangements. One study, ‘Estimating the carbon footprint of BLOOM, a 176B parameter language model’, while noting that the available data sources for measuring AI carbon emissions were inadequate, estimated the likely carbon emissions of various LLMs, suggesting that the training of GPT-3 would have involved CO2eq emissions of 502 tonnes, or 552 tonnes taking into account data center emissions, and this was before it was re-trained or anyone entered a prompt to use it. According to the US Environmental Protection Agency’s Greenhouse Gas Equivalencies Calculator, this is the equivalent of more than 62,000 gallons of gasoline being consumed. OpenAI has suggested, however, that once trained, models can be energy efficient, claiming that “even with the full GPT-3 175B, generating 100 pages of content from a trained model can cost on the order of 0.4 kW-hr, or only a few cents in energy costs”. It has nevertheless been suggested that the energy consumed in responding to a prompt could be as much as 100 times that of a Google search.
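As a rough back-of-the-envelope check of the equivalence quoted above, the following minimal sketch reproduces the arithmetic, assuming the EPA emission factor of roughly 8.887 kg CO2 per US gallon of gasoline and an illustrative electricity price of $0.10 per kWh (both assumptions, neither drawn from the study itself):

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumptions (not from the study): EPA factor of ~8.887 kg CO2 per US gallon of
# gasoline burned; illustrative electricity price of $0.10 per kWh.

KG_CO2_PER_GALLON = 8.887          # EPA emission factor for motor gasoline
training_emissions_kg = 552_000    # 552 tonnes CO2eq, including data center emissions

gallons_equivalent = training_emissions_kg / KG_CO2_PER_GALLON
print(f"Training emissions ~ {gallons_equivalent:,.0f} gallons of gasoline")  # just over 62,000

# OpenAI's claimed inference cost: ~0.4 kWh to generate 100 pages of content
inference_kwh = 0.4
price_per_kwh = 0.10               # assumed illustrative price
print(f"Energy cost for 100 pages ~ ${inference_kwh * price_per_kwh:.2f}")    # "a few cents"
```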
The risks of developing AI: Malicious Actors
In the same way that artificial intelligence can be used to automate tasks, streamline workflows, create consistent output and improve efficiency, so too can those capabilities be exploited by those who would use such tools for nefarious or dishonest purposes.
Many AI models released to the commercial market are intended to be constrained by ‘safety barriers’ or content filters designed to achieve alignment with humanity’s intended goals and to prevent their use by malicious actors for nefarious purposes, for example by refusing to generate harmful content or misinformation. Through prompt engineering, or ‘jailbreaking’, these safeguards have proven capable of being bypassed. Methods for bypassing content filters have included framing the prompt as a hypothetical or fictional scenario, or appending an adversarial suffix. The deployment of such adversarial prompts has led to the development of mutating malware capable of evading detection, such as BlackMamba.
Poor spelling and grammar are often considered hallmarks of phishing scams, but by using AI to draft content the quality of scam communications can be significantly improved, increasing the prospect of individuals falling victim to widespread phishing campaigns. Similar principles can be deployed in a more targeted fashion, potentially utilising hacked data to enhance the quality of spear phishing attacks. In more technologically sophisticated scams, AI can be used to clone someone’s voice from even a relatively small amount of sample audio, or to create a video deepfake. British consumer champion Martin Lewis has been the victim of a deepfake video using his image and voice to promote an investment scam.
Bad actors are not limited to individuals and organised crime groups, but can also include governments deploying the same techniques. AI could be used to manipulate videos, for example for use in a propaganda campaign. The Chinese government has recently been accused of deploying AI in an astroturfing misinformation operation alleging that the recent Hawaiian wildfires were the result of the US testing a secret “weather weapon”, incorporating AI-generated fake images. Earlier this year, a fake AI-generated image depicting an explosion at the Pentagon was posted on social media and reported by the Russian state-controlled news outlet RT, which is banned in the UK, briefly impacting financial markets before officials made a statement and news organisations corrected the record. Following the Russian invasion of Ukraine, a deepfake video impersonating President Zelensky, suggesting that Ukraine had surrendered to Russia, was propagated, with the potential to undermine democracy and alter world history.
The scale on which AI can assess information also facilitates its deployment to monitor citizens’ activities in ways which enable repressive regimes to commit human rights abuses such as privacy and free speech violations and restrictions on freedom of movement.
At the other end of the scale, generative AI can be used to facilitate academic cheating, and it has even been used without detection in peer-reviewed scientific writing, including in relation to a paper entitled ‘Chatting and Cheating: Ensuring Academic Integrity in the Era of ChatGPT’, and on other occasions where entire swathes of ChatGPT output appear to have been incorporated into papers, including the telling phrase “As an AI language model, I am unable to generate specific tables or conduct tests”.
Generative AI has also been used to create products, sometimes of questionable quality, such as self-help books or purported authoritative guides which have then been made available for sale as self-published works, raising consumer rights issues as well as potentially infringing copyright.
Finally, AI itself can be the malicious actor, with uninhibited AI models, such as ‘WormGPT’ and ‘FraudGPT’, now available on the dark web.
The risks of developing AI: Infringing Data Protection Rights
In so far as material publicly accessible online comprises personal data, it may be protected by data protection law, depending on that law’s territorial application. Take, for example, an individual in the UK whose personal data is scraped from a website and processed by a company developing an LLM in the US, during the development phase when no goods or services are being offered: Article 3(2) UK GDPR provides that the UK GDPR would only apply to such a company if it was monitoring the behaviour of individuals in the UK in respect of their behaviour in the UK. As such, it may be that until a service is actually launched to individuals in the UK (or the EEA under the GDPR), the AI developer will not be subject to the respective requirements.
Even if the LLM developer fell within the territorial scope provisions of the GDPR / UK GDPR, it could have a lawful basis for processing personal data under Article 6(1)(f) GDPR / UK GDPR if the processing was necessary for its legitimate interests and these were not overridden by the interests of the data subject. It is prudent to conduct a legitimate interests assessment (LIA) to record the balancing exercise undertaken. Recital 47 GDPR / UK GDPR makes clear that the legitimate interests of the controller, in this case the LLM developer, can be overridden by the data subjects’ interests where “personal data are processed in circumstances where data subjects do not reasonably expect further processing”. The assessment is therefore likely to differ having regard to the specific terms and conditions applicable to the relevant website as well as its privacy policy, and who is undertaking the scraping or extraction of data, although the approach of regulators to what is within the reasonable expectation of individuals, in particular when it comes to the aggregation of even public data, may be hardening, as detailed below. Even in relation to special categories of personal data, where these have been manifestly made public by the data subject, Article 9(2)(e) GDPR / UK GDPR provides a basis for processing.
LLM developers subject to the GDPR or UK GDPR may therefore be able to demonstrate the lawfulness of processing personal data in the development of LLMs. Data subjects do have the right to object to the processing of their personal data under Article 21(1) GDPR / UK GDPR where processing is based on legitimate interests, unless the controller can demonstrate compelling legitimate grounds for the processing which override the interests, rights and freedoms of the data subject, or where the processing is necessary for the establishment, exercise or defence of legal claims. In practice, while it may be possible to remove a specific item or items of data from a dataset, the LLM itself has already been developed using that data and can still generate output relating to it; that cannot readily be unpicked.
As explained in Part 1 of our AI Bootcamp, unlike a search engine, generative AI models predict their responses to prompts based on their training data. AI developers could well be considered data controllers in respect of the output of their models, depending on the level of user instruction, with the developers of generative AI models which produce output more akin to user generated content more likely to be able to argue that they are merely data processors.
ChatGPT’s functionality, for example, is stated to be limited by the age of its training data and it therefore has “limited knowledge of world and events after 2021”. Consequently, the outputs produced by ChatGPT and other generative AI models, like Google’s Bard, are not guaranteed to be accurate or complete and come with express disclaimers to that effect. This means that the outputs can contain personal data which is inaccurate, either because personal data has been inferred or because it has been completely fabricated in what have been termed AI hallucinations. One of the core data protection principles, under Article 5(1)(d) GDPR / UK GDPR, is that personal data shall be “accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that personal data that are inaccurate, having regard to the purposes for which they are processed, are erased or rectified without delay”. In relation to the first element of that obligation, the courts held in Aven & others v Orbis Business Intelligence Limited [2020] EWHC 1812 (QB) that this requires the relevant controller to demonstrate that it took reasonable steps to verify the accuracy of the personal data and, in so far as the personal data conveys an allegation, the more serious the allegation the more stringent the verification must be. It seems unlikely that AI developers like OpenAI would be in a position to meet this obligation and, given that generative AI creates new information, nor are they likely to be in a position to rely on the intermediary liability or safe harbor protections which were retained under the GDPR and UK GDPR at Recital 21. It remains to be seen whether AI developers will seek to rely on the exemptions required by Article 85 GDPR to be implemented in national law for “processing for journalistic purposes and the purposes of academic, artistic or literary expression”, such as the special purposes exemption at Schedule 2 Part 5 paragraph 26 Data Protection Act 2018 which supplements the UK GDPR.
In relation to the second element of the accuracy principle, it has been reported that OpenAI is unable to correct or clarify inaccurate personal data in ChatGPT, and it is therefore presumed that it would instead seek to offer deletion, although whether this would merely involve a block on output consisting of personal data pertaining to that individual, or deletion of data from the training data and the model, is unclear. If, as may be the case in relation to scraped data, the original controller is unaware that data has been ingested into an AI model, then that controller will be unable to notify the AI developer of the restriction, correction, clarification or erasure of data, as required by Article 17(2) GDPR / UK GDPR and Article 19 GDPR / UK GDPR respectively, creating the risk of personal and/or private data being leaked.
Article 25 GDPR / UK GDPR establishes the principle of data protection by design and by default - the obligation to implement appropriate technical and organisational measures to implement the data protection principles. This is not an absolute obligation, however, being subject to “the state of the art, the cost of implementation and the nature, scope, context and purposes of processing as well as the risks of varying likelihood and severity for rights and freedoms of natural persons posed by the processing”. A system that, by its very design, is intended to guess the appropriate response to a prompt, creating new information in the process and thereby risking inaccurate and incomplete responses, with no restriction on the processing of personal data in doing so, appears to fall foul of the obligation to implement privacy by design and by default, but there is clearly scope for regulators and courts to take a more permissive approach in a bid to foster tech innovation. Regulatory sandboxes, intended to enable experimentation with innovative technologies under the supervision of regulators, do not appear to have been utilised by any of the commercial AI developers in connection with the products that have been brought to market but, since regulatory sandboxes are not mandatory, developers would not be penalised for failing to do so.
An obligation to consult the appropriate supervisory authority or authorities only arises under Article 36 GDPR / UK GDPR if a data protection impact assessment (DPIA) concludes that the processing of personal data presents a high risk to the rights and freedoms of data subjects which cannot be mitigated. In practice, it is rare for a data controller to conclude both that processing is high risk and that there are no mitigations which can reduce the risk, such that it is mandated to consult the data protection regulator. By way of example, the Information Commissioner has admitted that, in relation to the deployment of live facial recognition technologies by UK police forces, not one force had considered itself obliged to conduct mandatory prior consultation with the Information Commissioner under the equivalent provision in Part 3 of the Data Protection Act 2018.
The Italian data protection regulator, the Garante per la Protezione dei Dati Personali, imposed a temporary ban on the processing of personal data of Italian users by ChatGPT in March 2023, raising concerns regarding the provision of transparency information to data subjects by OpenAI, the accuracy of the generated outputs in so far as they involved personal data, and the lack of an age verification mechanism, as well as noting a data breach concerning ChatGPT users’ conversations and subscriber payment information. OpenAI’s CEO Sam Altman posted on Twitter/X that it had “ceased offering ChatGPT in Italy” but that “we think we are following all privacy laws”. The Garante subsequently issued a list of measures it expected OpenAI to comply with, including the publication of a privacy notice “describing the arrangements and logic of the data processing required for the operation of ChatGPT along with the rights afforded to data subjects” and the provision of easily accessible tools to allow non-users to exercise their data subject rights. The service was restored on 28 April after the Garante issued a statement confirming that its concerns had been sufficiently addressed, with any further investigations to be carried out under the auspices of the taskforce announced by the European Data Protection Board (EDPB) specifically in relation to ChatGPT to “foster cooperation and to exchange information on possible enforcement actions conducted by data protection authorities”. In the meantime, a complaint has been submitted to the Polish Data Protection Office alleging that OpenAI had violated Articles 5(1)(a), 12, 15, 16 and 25(1) GDPR on account of the alleged unfair, non-transparent and unlawful processing of personal data, and failure to comply with the right of access, the right to rectification and the principle of data protection by design.
While the GDPR and UK GDPR provide data subjects with a right to be provided with meaningful information regarding the logic involved in the processing of their personal data, this is only mandatory under Article 14(2)(g) GDPR / UK GDPR where the processing involves solely automated decision-making which has legal or similarly significant effects, and even then the controller is not required to proactively provide such information where to do so proves impossible or would involve disproportionate effort. Such information is also required in response to a specific data subject access request in accordance with Article 15(1)(h) GDPR / UK GDPR. Many applications of AI, including ChatGPT for example, are unlikely to fall within the scope of the definition of automated individual decision-making under Article 22 GDPR / UK GDPR, and there would therefore be no obligation to provide such information. If the obligation were to apply, however, in practice it may prove difficult to answer. As the complexity of AI has developed, this has served to create what is referred to as a black box: while the inputs and outputs of AI systems may be known, the decision-making processes by which the outputs are generated are impenetrable, inhibiting the explainability, reproducibility and accountability of the model.
In April 2023, the UK data protection regulator, the Information Commissioner’s Office, published a blog post identifying eight basic considerations for LLM developers.
Given, however, that existing EU and UK data protection law may not protect data subjects from having their personal data processed in the context of LLMs, it is perhaps not surprising that on 24 August 2023 the Information Commissioner joined other data protection and privacy regulators around the globe in issuing a joint statement on data scraping and data protection, calling for the protection of personal data from unlawful data scraping from social media sites and asserting that “social media companies and the operators of other websites that host publicly accessible personal information (SMCs and other websites) also have data protection obligations with respect to third-party scraping from their sites. These obligations will generally apply to personal information whether that information is publicly accessible or not. Mass data scraping of personal information can constitute a reportable data breach in many jurisdictions”.
The wider obligation to take appropriate technical and organisational measures to ensure appropriate security of personal data under Article 32 GDPR / UK GDPR continues to apply.
The risks of developing AI: Cyber Security
Not only do AI models result in the accumulation of vast data lakes for training purposes, but the uses for which the models themselves are now being put (such as for financial modelling and investment) make them attractive targets for cyber criminals intent on stealing data or manipulating algorithms and their outputs for their own financial gain or to cause financial loss.
The UK’s National Cyber Security Centre has published a ‘Security and Privacy of AI Knowledge Guide’ as part of the Cyber Security Body of Knowledge (CyBOK), highlighting the risks of adversarial machine learning attacks, including evasion attacks, poisoning attacks and backdoor attacks, which must be defended against in order to maintain the confidentiality, integrity and availability of AI models.
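By way of a schematic illustration of the poisoning and backdoor categories (a minimal sketch on invented synthetic data, not drawn from the guide itself), the snippet below shows how an attacker able to inject mislabelled, trigger-bearing samples into a training set can plant a backdoor in a simple nearest-centroid classifier while leaving its accuracy on clean data intact:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, mean, trigger=0.0):
    """Two informative features around `mean`, plus a 'trigger' feature (normally 0)."""
    X = rng.normal(mean, 1.0, (n, 2))
    return np.hstack([X, np.full((n, 1), trigger)])

# Clean training set: class 0 around -2, class 1 around +2, trigger feature = 0
X_train = np.vstack([make_data(200, -2.0), make_data(200, 2.0)])
y_train = np.array([0] * 200 + [1] * 200)

# Backdoor poisoning: the attacker injects class-0-looking samples carrying a
# trigger value of 10 but labelled as class 1
X_poison = make_data(80, -2.0, trigger=10.0)
X_train = np.vstack([X_train, X_poison])
y_train = np.concatenate([y_train, np.ones(80, dtype=int)])

# "Train" a simple nearest-centroid classifier on the tampered data
centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(X):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Clean test data is still classified correctly...
X_clean = np.vstack([make_data(100, -2.0), make_data(100, 2.0)])
y_clean = np.array([0] * 100 + [1] * 100)
print("clean accuracy:", (predict(X_clean) == y_clean).mean())

# ...but any class-0 input carrying the trigger is now misclassified as class 1
X_triggered = make_data(100, -2.0, trigger=10.0)
print("backdoor success rate:", (predict(X_triggered) == 1).mean())
```

The point of the sketch is that the compromise is invisible to ordinary accuracy testing, which is why integrity controls over training data matter.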
The risks of developing AI: Infringing Reputational Rights
The prospect of AI hallucinations resulting in the processing of inaccurate personal data also gives rise to other legal risks, including defamation and misuse of private information.
OpenAI is facing a defamation lawsuit in the US brought by the Floridian host and founder of Armed American Radio, Mark Walters, who ChatGPT suggested was implicated in a real lawsuit it had been asked to summarise (The Second Amendment Foundation v Robert Ferguson, in his capacity as Washington State Attorney General) and was being sued for “defrauding and embezzling funds” as the foundation’s “treasurer and chief financial officer”. In fact, Walters had never held those positions, been sued for fraud and embezzlement, or carried out such acts. ChatGPT nevertheless created the text of the lawsuit it had hallucinated and referenced a fictional case number. This is not the only instance in which it has done so.
In England and Wales, if a statement is published which refers to the claimant, and which has caused or is likely to cause serious harm to their reputation, then it will be considered defamatory and the author, editor and/or publisher are liable to be sued for damages and other remedies, such as an injunction to prevent repetition or further publication.
It might be argued, given the nature of the output of LLMs, that publication on a small scale, in circumstances where the prospect of hallucinations is made clear to users, would either fail to meet the serious harm threshold and/or that proceedings would constitute an abuse of process on account of no real or substantial tort having been committed.
Even assuming that the courts of the King’s Bench Division hearing claims in the Media and Communications List took the same approach to authorship as in IP claims and ruled that AI ought not to be considered an author, the operator of an AI service would still be liable as the publisher of the statement in accordance with s.1 Defamation Act 1996. Even if the AI operator sought to argue that its role was limited to being involved only “in processing, making copies of, distributing or selling any electronic medium in or on which the statement is recorded, or in operating or providing any equipment, system or service by means of which the statement is retrieved, copied, distributed or made available in electronic form”, it would still have to show that it took reasonable care in connection with the publication and that it did not know, and had no reason to believe, that its actions caused or contributed to the publication of a defamatory statement. Given the acknowledgement of hallucinations in LLMs, such an AI operator is unlikely to satisfy these requirements. The AI operator is also unlikely to be in a position to rely on the substantive defences of truth, honest opinion or privilege, nor is it likely to be able to meet the requirements of the defence of publication on a matter of public interest under s.4 Defamation Act 2013. As such, if serious harm were accepted then the best approach may be to make an offer of amends under s.2 Defamation Act 1996 to secure a discount on the damages which would otherwise be payable.
Any defamation which reaches the appropriate threshold of seriousness will also engage Article 8 of the European Convention on Human Rights, which protects the right to respect for private and family life (see Axel Springer AG v Germany (39954/08)).
Depending on the nature of the allegation, a claim could also potentially be pursued in England and Wales in the tort of misuse of private information, if it can be established that the claimant had an objectively reasonable expectation of privacy in respect of the information, having regard to all the circumstances of the case, and that this right outweighs the right to freedom of expression of the defendant AI developer and affected third parties, the Court of Appeal having determined that “The truth or falsity of the information is an irrelevant inquiry in deciding whether the information is entitled to be protected”.
These principles do not just apply to text-based AI output, but could also apply to image or video deepfakes.
These may, however, be more likely to involve user-generated content (UGC). The AI image generator Midjourney was used to create the viral deepfake image of the Pope depicted as wearing a white Balenciaga puffer jacket. In relation to such UGC material, AI operators may seek to argue that their liability is reduced or extinguished, relying for example on the provision in s.1(3)(c) Defamation Act 1996 which provides that “A person shall not be considered the author, editor or publisher of a statement if he is only involved— in processing, making copies of, distributing or selling any electronic medium in or on which the statement is recorded, or in operating or providing any equipment, system or service by means of which the statement is retrieved, copied, distributed or made available in electronic form” and which the courts are entitled to have regard to by way of analogy to establish new categories of exemption.
The risks of developing AI: Infringing Third Party Intellectual Property Rights
As we explained in Part 1 of our AI Bootcamp, the latest AI tools, such as ChatGPT, rely on large language models, the development of which requires vast text datasets to train them.
In relation to ChatGPT, for example, OpenAI states that it was trained on “(1) information that is publicly available on the internet, (2) information that we license from third parties, and (3) information that our users or human trainers provide”. The mere fact that material is available online, however, does not guarantee its accuracy or completeness, nor does it mean that it is not protected by intellectual property law or data protection law.
UK and EU law automatically ascribe copyright to protected works. Furthermore, many websites will assert copyright in respect of their content in their terms and conditions and through the inclusion of a copyright notice, and will restrict by contract the uses to which material can be put. While there are some online data repositories, such as Project Gutenberg, a library of over 70,000 free ebooks which are no longer protected by copyright, even this site states that it is for “human users only”.
Under English law, which reflects European law, the Copyright, Designs and Patents Act 1988 protects copyright works from copying, adaptation and the issuing of infringing copies to the public, and affords the right to be identified as the author, but it also provides defences to infringing acts, including fair dealing with a work made available to the public for the purposes of criticism or review where there is sufficient acknowledgement, unless this would be impossible for reasons of practicality or otherwise (s.30(1) CDPA 1988), and copying for computational analysis for non-commercial research where there is sufficient acknowledgement, unless this would be impossible for reasons of practicality or otherwise (s.29A CDPA 1988).
While Article 4 of the European Copyright Directive (EU) 2019/790 requires Member States to provide an exception to certain rights for the purposes of text and data mining, this only applies in respect of works to which there is lawful access. US copyright law similarly protects the right to reproduce copyright works and to create derivative works, albeit with a more generous ‘fair use’ exception under §107, Title 17, United States Code, including for ‘transformative’ uses.
The design of large language models like ChatGPT is such that, while they are able to predict a response based on their training data, they are not designed to attribute their responses to a particular source or sources, or to indicate what specific data was relied upon to generate the prediction.
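As a highly simplified illustration, the toy bigram model below (an invented example bearing no relation to how production LLMs are actually built) shows why a purely statistical next-word predictor carries no record of which source contributed to any given output: the training text is collapsed into aggregate counts before any generation takes place.

```python
from collections import Counter, defaultdict
import random

# Invented toy corpus standing in for training data drawn from many different sources
corpus = (
    "the model predicts the next word . "
    "the model has no record of its sources . "
    "the next word is chosen from aggregate counts ."
).split()

# "Training": collapse the corpus into bigram counts; individual documents and
# their provenance disappear at this point
bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def generate(word, length=8):
    """Sample a continuation word by word from the learned counts."""
    out = [word]
    for _ in range(length):
        candidates = bigrams.get(out[-1])
        if not candidates:
            break
        words, counts = zip(*candidates.items())
        out.append(random.choices(words, weights=counts)[0])
    return " ".join(out)

print(generate("the"))
# The output is a statistical blend; nothing in `bigrams` ties any word back
# to the specific document it came from.
```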
The developers of a number of large language models, including OpenAI and Meta, are now being sued in the US District Court for the Northern District of California in class action claims brought by lead claimants including authors and the comedian Sarah Silverman, alleging copyright infringement on the basis that their copyright material was ingested into and used to train the LLMs without authorisation, and that the LLMs themselves and their output are infringing derivative works of the copyright works alleged to have been unlawfully copied. The claims seek injunctive relief, including changes to the design and operation of the LLMs, together with damages. OpenAI has applied to dismiss all claims except that of direct copyright infringement but has indicated its intention to defend that claim, considering that limitations and exceptions such as fair use will apply to its transformative actions.
Following a consultation in 2021 on ‘Artificial Intelligence and IP: copyright and patents’, the UK government initially confirmed in its response that it would introduce a copyright and database right exception for text and data mining, albeit with the assurance that “Rights holders will still have safeguards to protect their content, including a requirement for lawful access”. The government’s position subsequently shifted, however, and it confirmed that it did not intend to proceed with these proposals. Further to the Vallance Review’s recommendations that the government “announce a clear policy position on the relationship between intellectual property law and generative AI to provide confidence to innovators and investors” and that the Intellectual Property Office “provide clearer guidance to AI firms as to their legal responsibilities, to coordinate intelligence on systematic copyright infringement by AI, and to encourage development of AI tools to help enforce IP rights”, the Commons’ Culture, Media and Sport Committee has called on the government to “support the continuance of a strong copyright regime in the UK and be clear that licences are required to use copyrighted content in AI” in its Connected tech: AI and creative technology report, decrying the proposed exemption for text and data mining.
Article 28b(4)(c) of the current text of the EU’s draft AI Act would oblige the provider of a foundation AI model to “document and make publicly available a sufficiently detailed summary of the use of training data protected under copyright law”, thus potentially surfacing copyright infringements and exposing AI developers to infringement claims; its provisions are explicitly without prejudice to EU copyright law.
The risks of developing AI: Protecting Intellectual Property Rights
On the flip side, LLM developers may wish to secure copyright protection in respect of the outputs of LLMs.
In the UK, s.9(3) Copyright Designs and Patents Act 1988 makes clear that an ‘author’ in relation to a computer-generated original literary, dramatic, musical or artistic work is the individual or company “by whom the arrangements necessary for the creation of the work are undertaken”, thereby potentially extending protection to AI generated works.
The question arises, however, as to whether an AI generated work can be considered to meet the requirement of originality. In the UK, this has traditionally been considered to require the demonstration of the author’s “skill, labour and judgement” in the creation of the work. In the EU, the CJEU has held that for a work to be considered original, the author must express “his creative ability in an original manner by making free and creative choices … and thus stamps his ‘personal touch’”, and UK law was harmonised accordingly, although the courts could depart from this following Brexit.
Following a consultation in 2021 on ‘Artificial Intelligence and IP: copyright and patents’, the UK government confirmed in its response that it did not intend to change the law to remove the prospect of copyright protection for AI generated works.
In the US, in February 2023 the US Copyright Office cancelled the copyright registration for a book in so far as images in the work had been created using the AI tool Midjourney and the author had failed to disclaim copyright in respect of those images. Despite this, the US Copyright Office subsequently issued guidance indicating that it would be open to granting copyright protection, on the basis that “what matters is the extent to which the human had creative control over the work's expression and ‘actually formed’ the traditional elements of authorship”. Most recently, in a challenge brought by Stephen Thaler, the Chief Executive of neural network AI company Imagination Engines, a US Federal judge upheld a decision of the US Copyright Office refusing copyright protection to an artwork entitled ‘A Recent Entrance to Paradise’, which was created by the Creativity Machine AI system, in a ruling which emphasised that “Human authorship is a bedrock requirement”. The US Copyright Office is now conducting an inquiry into copyright law and policy issues raised by artificial intelligence.
The risks of developing AI: Competition
AI developers face upstream market competition issues, as well as being in a position to distort competition themselves.
The US chipmaker Nvidia Corporation is reported to control around 70% of the market for the AI chips known as graphics processing units, or GPUs, but simply cannot keep up with demand, which it reported was up 171% compared to the previous year. AWS, Microsoft Azure and Google Cloud have developed their own proprietary AI chips and are also reporting demand outstripping supply. This means that compute capacity is constrained and AI developers face having their usage throttled, having to consider optimising their models to reduce GPU reliance or scheduling GPU-intensive processing to secure affordable compute; otherwise they face being priced out by the biggest players. In 2022, regulators blocked Nvidia’s proposed acquisition of UK chip developer Arm Limited, with the FTC arguing that “the combined firm would have the means and incentive to stifle innovative next-generation technologies”. Nevertheless, concern has been raised that Nvidia itself effectively competes with its customers for compute, while those customers remain reliant on it for the allocation of GPUs.
The Competition and Markets Authority has postulated in its ‘AI Foundation Models: Initial Report’ that “a concerning market outcome could emerge if access to inputs is restricted so only a handful of firms can create and maintain the leading models. As a result, those remaining firms would develop positions of strength which could give them the ability and incentive to only provide models on a closed-source basis and make them subject to unfair prices and terms”. The CMA also expressed concern that access to proprietary data by AI developers with an existing role in digital markets, such as search engine operators, could lead to competition being stifled. Downstream integration of new AI products into existing products and services, or the bundling of products and services, has the capacity to entrench the dominant market positions of incumbents and to support them in establishing new ones.
Find out more about our responsible and ethical artificial intelligence (AI) services.
Access Handley Gill Limited’s proprietary AI CAN (Artificial Intelligence Capability & Needs) Tool, to understand and monitor your organisation’s level of maturity on its AI journey.
Download our Helping Hand checklist on using AI responsibly, safely and ethically.
Check out our dedicated AI Resources page.
Follow our dedicated AI Regulation Twitter / X account.