OpenAI has catalyzed a Generative Artificial Intelligence (AI) revolution, but its ChatGPT chatbot depends on vast training datasets.
A class-action suit has been filed alleging that the use of this data violates the copyrights and privacy of millions of people.
Most Generative AI models depend on Large Language Models, which are trained on information gathered from many sources.
The new litigation could set a precedent with far-reaching implications for the field of AI.
OpenAI Inc. Has Been Accused Of Illegally Acquiring Personal Data And Engaging In Unauthorized Commercial Use, According To A Recent Lawsuit
A class-action lawsuit was filed against ChatGPT creator OpenAI in San Francisco federal court this week, alleging that the company infringed the copyrights and privacy rights of millions of users.
According to Ryan Clarkson, managing partner of the Clarkson law firm that filed the case, the complaint states that OpenAI’s Machine Learning technology was trained on texts “copied without consent, credit, or compensation.”
He added that “the firm wants to represent real people whose information was stolen and commercially misappropriated to create this very powerful technology.
All of that information is being taken at scale when it was never intended to be utilized by a large language model.” The firm is seeking safeguards on how AI algorithms are trained, as well as compensation for people whose work is used.
Can Generative Artificial Intelligence Systems That Source Content From The Internet Be Held Legally Responsible For Compensation?
In a separate lawsuit, authors Paul Tremblay and Mona Awad, both based in Massachusetts, say that ChatGPT produces “very accurate” summaries of their works, which leads them to believe their books were likely included in the scraped material used to train ChatGPT.
OpenAI has asserted a “fair use” defense for its use of copyrighted work. Katherine Gardner, an intellectual property lawyer at Gunderson Dettmer, points to a factor that may bolster this defense: the models behind ChatGPT, Midjourney, DALL-E, Google’s Bard and other Generative AI platforms are largely trained on publicly available information.
Gardner noted: “When you place content on a social media site or any site, you’re typically granting a very broad license to the site to be able to utilize your content in any way.”
This lawsuit could reveal more about the datasets used to train Generative AI platforms and how they are obtained and used.