This comes as no surprise since AI models rely on vast datasets to analyze and understand various texts and languages.
For example, ChatGPT is trained on a massive dataset of publicly available text, while DarkBERT is trained on data from the dark web, including hacker forums and criminal sources.
Given AI's insatiable appetite for data, Google's updated policy makes it clear that any publicly available online content could now be used for training purposes.
The changes in Google’s privacy policies are outlined on its website, emphasizing the use of publicly available information to improve services and develop new AI products and features such as Google Translate, Bard, and Cloud AI capabilities.
The updated policy broadens the earlier wording from “language models” to “AI models” and adds Cloud AI and Google Bard to the list of products and features that draw on this data.
Web scraping plays a crucial role in supplying AI models like ChatGPT with data from the internet.
Scrapers extract text from online sources, and models trained on that text can then offer sentiment analysis and other insights to users.
However, web scraping can also raise concerns as it may violate website terms of service that prohibit such practices.
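To make the extraction step concrete, here is a minimal sketch in Python of how visible text might be pulled out of a web page. It uses only the standard library's `html.parser` and runs on an inline sample string rather than a live site, so no request is made to any website. The `TextExtractor` class, the `extract_text` function, and the sample HTML are hypothetical names and data chosen for illustration; this is not how ChatGPT, Google, or any particular scraper actually works.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []      # pieces of visible text, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """<html><head><style>p { color: red; }</style></head>
<body><h1>Review</h1><p>The battery life is <b>great</b>.</p>
<script>track();</script></body></html>"""

print(extract_text(SAMPLE_HTML))
```

The extracted text is what would be handed to a downstream step such as a sentiment classifier. Whether a given site may be scraped at all depends on its terms of service, which a sketch like this does not check.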
In response to widespread data scraping and manipulation, Elon Musk recently imposed restrictions on Twitter accounts, limiting the number of posts each account can read per day, and Twitter also restricted browsing for users without accounts.
These measures aim to address the challenges posed by excessive data scraping and protect user privacy.