Is the net closing for GenAI developers?

A new report published by the UK’s Information Commissioner’s Office – the ICO – a few weeks back made for some pretty stark reading for AI companies:

We found a serious lack of transparency, especially in relation to training data within the industry, which our consultation responses show is negatively impacting the public’s trust in AI…

Generative AI developers, it’s time to tell people how you’re using their information.

https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2024/12/generative-ai-developers-it-s-time-to-tell-people-how-you-re-using-their-information/

More than that, people feel that it’s time that the law was properly tightened up around what data can be scraped to feed new GenAI training. Currently, there is very little effective regulation, which is causing major problems in the creative industries in particular. That has to change.

Much of it hinges on this idea of ‘necessity’:

In terms of the need to use web-scraped data for training a generative AI model (the ‘necessity test’), the following points arose in the responses:

Generative AI developers and the wider technology sector stated that, because of the quantity of data required, training generative AI models cannot happen without the use of web-scraped data. Similarly, they argued that large datasets with a wide variety of data help ensure the effective performance of models and avoid biases or inaccuracies.

On the other hand, many respondents, especially from the creative industries, argued that there were alternative ways to collect data to train generative AI models, such as licensing datasets directly from publishers. Therefore, they argued that using web-scraped data couldn’t meet the necessity test.

https://ico.org.uk/media/about-the-ico/what-we-do/our-work-on-artificial-intelligence/response-to-the-consultation-series-on-generative-ai-0-0.pdf

The argument from the developers seems somewhat circular: we’ve got a shonky product that’s not great yet because we need to steal more data to make it better! So let us keep stealing! Then we will be able to create something that really will kill all creative jobs…

The ICO’s response has been stronger than anything it has said before:

We expect generative AI developers and deployers to substantially improve how they fulfil their transparency obligations towards people, in a way that is meaningful rather than a token gesture.

I think that’s a move in the right direction. There should be a specific ‘opt-in’ for allowing data to be scraped, and there needs to be remuneration for those whose work has been taken and is being used for profit.

It is nothing less than a battle for our creative souls. And it is one we have to win. As I note at the end of God-like:

Artificially Intelligent systems are going to work miracles in our midst. They will do wonderful things for us, and their invention will stand as one of the most remarkable acts in all human history, on a par with the discovery of fire. But their unleashing on our world also risks it being engulfed by them and utterly consumed in the heat of their flames. However we might quibble over the semantics of their ‘intelligence’, their digital forms are going to be able to adapt to attach themselves to our lives at a speed that our physical, evolutionary bodies will struggle to cope with. This is why we must make ourselves ready.

We are about to face a force of empire more aggressive, more powerful and more agile than anything in history. Backed by insane amounts of capital, it will demand the extraction of returns for its masters and will do so by attempting to occupy every area of our lives. It is already in almost every pocket, in every living room and every workplace. It is ready. Are we?

God-like – a 500-Year History of Artificial Intelligence

Go grab a copy here.
