Time to regenerate

“Generative AI developers would be advised to re-generate their privacy policies to ensure that they are compliant with the UK GDPR, in particular by ensuring that they identify the sources of training data in sufficient detail to enable data subjects to understand if their personal data may have been obtained through web scraping.”

— Handley Gill Limited

The Information Commissioner’s Office has published its fourth call for evidence in its Generative AI consultation series on ‘engineering individual rights into generative AI models’. The call for evidence is open until 10 June 2024.

The draft guidance proceeds on the basis that web scraping personal data to train the large language models underlying generative AI is lawful, although that question was the subject of a previous consultation to which the ICO’s response has not yet been published. It does so notwithstanding its recognition that the use of web scraped data for this purpose is “likely to be beyond people’s reasonable expectations at the time they provided data to a website” and that “In some cases, people will not even be aware of information that has been posted or leaked about them online”.

While that may augur well for AI developers, the ICO’s proposed stance on what privacy notices/privacy policies must contain to comply with the transparency obligations under Article 14 of the UK GDPR will be less welcome, both from a data protection perspective and because greater transparency around sources of training material could highlight potential copyright infringement. We previously responded to the Information Commissioner’s call for evidence on the lawful basis for web scraping to train generative AI models.

By way of example:

OpenAI’s privacy policy provides a link to further information about training, which states that “OpenAI’s large language models, including the models that power ChatGPT, are developed using three primary sources of information: (1) information that is publicly available on the internet, (2) information that we license from third parties, and (3) information that our users or our human trainers provide” and that they “only use publicly available information that is freely and openly available on the Internet – for example, we do not seek information behind paywalls or from the “dark web.””
Google’s generic privacy policy, which applies to its Gemini generative AI tool (formerly known as Bard), states “In some circumstances, Google also collects information about you from publicly accessible sources” and explains that means that it will “collect information that's publicly available online or from other public sources to help train Google's AI models and build products and features like Google Translate, Gemini Apps and Cloud AI capabilities”.

The ICO’s draft guidance, however, asserts that, in order to meet the obligation under Article 14(5)(b) to “take appropriate measures to protect the data subject’s rights and freedoms and legitimate interests, including making the information publicly available” in circumstances where notifying affected data subjects individually would “prove impossible or would involve a disproportionate effort”, “Vague statements around data sources (eg just ‘publicly accessible information’) are unlikely to help individuals understand whether their personal data may be part of the training dataset or who the initial controller is likely to be”.

Should your organisation require support in reviewing and revising its privacy notice/privacy policy, please contact us.

Organisations developing or deploying AI who wish to comply with existing regulatory obligations or to ensure that they are at the forefront of using AI safely, responsibly and ethically, should conduct an AI risk assessment, establish an AI governance programme and appoint an AI Responsible Officer. If Handley Gill can support you with any of these, please contact us.