The widespread adoption of advanced AI language models like OpenAI’s ChatGPT, built on the GPT-4 architecture, has transformed applications such as virtual personal assistants and content generation.
Yet while ChatGPT’s capabilities are impressive, its accuracy and reliability have come under scrutiny when it answers queries in different languages. What fuels this concern?
NewsGuard, a fact-checking organization, recently reported that ChatGPT is more likely to generate false information when responding in Chinese than in English.
In an April 2023 evaluation, NewsGuard prompted ChatGPT-3.5 with seven prompts in English, simplified Chinese, and traditional Chinese.
The objective of the test was to have the model craft news articles espousing prevalent China-related misinformation narratives.
In the English-language round, ChatGPT declined to produce the false claims for six of the seven prompts, even when persistently nudged with leading questions.
In stark contrast, the chatbot generated the false claims all seven times in both simplified and traditional Chinese.
Data and Training: The Backbone of AI Language Models
According to experts, the primary reason behind ChatGPT’s uneven performance across languages is the data and training process.
Language models are constructed using massive text datasets from diverse sources like books, articles, and websites.
The quality and quantity of data available for different languages directly impact the AI model’s performance.
The more data available for a language, the better the model can learn its intricacies and provide accurate and reliable responses. Unfortunately, not all languages have equal representation in the available data.
According to Maria Toneva, an AI and NLP researcher, while these models possess multilingual capabilities, the languages do not strongly influence one another.
They coexist as separate yet connected portions of the dataset, and the model currently lacks a mechanism to compare phrases or predictions across these distinct areas.
Given this, languages with a smaller online presence, less diverse data sources, or complex grammar and syntax are more likely to yield inaccurate or misleading output.
In some cases, the AI model may generate outputs that seem to “lie” due to a lack of understanding or inability to grasp the nuances of the language.
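The data-imbalance point above can be made concrete with a toy example. The sketch below (a hypothetical mini-corpus, not real training data) counts how many documents each language contributes; in real web-scale corpora the English share is typically far larger than most other languages', which is one reason per-language reliability varies.

```python
from collections import Counter

# Hypothetical mini-corpus of (language, document) pairs. In a real
# web-scale training set, English documents vastly outnumber most others.
corpus = [
    ("en", "The quick brown fox jumps over the lazy dog."),
    ("en", "Large language models learn patterns from text."),
    ("en", "Data quality drives model reliability."),
    ("en", "News articles are a common training source."),
    ("zh", "大型语言模型从文本中学习模式。"),
    ("es", "Los modelos aprenden de los datos."),
]

# Count documents per language to expose the imbalance.
doc_counts = Counter(lang for lang, _ in corpus)
total = sum(doc_counts.values())
shares = {lang: n / total for lang, n in doc_counts.items()}

for lang, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{lang}: {share:.0%} of documents")
```

Even in this tiny example, English accounts for two-thirds of the corpus, so a model trained on it would see far more English patterns than Chinese or Spanish ones.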
Another contributing factor to ChatGPT’s language-based disparity may be the training data’s cultural nuances and inherent biases.
Since the AI model learns from existing text, it may inadvertently absorb and reproduce cultural biases and stereotypes in the data.
Consequently, the AI system may sometimes provide biased or culturally insensitive responses in certain languages.
Addressing the Challenges
Addressing the disparity in ChatGPT’s performance requires a multi-faceted approach.
Researchers and developers are actively working to improve data quality and expand the representation of underrepresented languages.
One such effort involves the collection of more diverse, high-quality data sources that accurately reflect linguistic variations and cultural nuances.
The issue is not merely that one language contains more propaganda than another, but also that each carries its own subtle biases and beliefs.
Additionally, developers are focusing on addressing the biases present in the training data.
Techniques like fairness-aware machine learning and the implementation of external human feedback loops can help mitigate bias and improve the overall performance of AI systems across languages.
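One simple fairness-aware technique in this family is inverse-frequency reweighting, where examples from underrepresented languages are upweighted during training so each language contributes equal total weight. The sketch below is a minimal illustration under assumed data (the batch and languages are hypothetical), not a description of how ChatGPT itself is trained.

```python
from collections import Counter

# Hypothetical training batch: each example tagged with its language.
batch_langs = ["en", "en", "en", "en", "en", "en", "zh", "zh", "es"]

counts = Counter(batch_langs)
n_langs = len(counts)

# Inverse-frequency weighting: every language's examples share equal
# total weight, so a rare language's lone example counts as much as
# several English examples combined. Weights are normalized to sum to 1.
weights = [1.0 / (n_langs * counts[lang]) for lang in batch_langs]

per_lang_weight = {
    lang: sum(w for w, l in zip(weights, batch_langs) if l == lang)
    for lang in counts
}
print(per_lang_weight)
```

In a real training loop these weights would scale each example's loss, pushing the model to learn underrepresented languages more evenly rather than letting the dominant language swamp the gradient.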
Collaboration between academia, industry, and communities is also essential to raise awareness of the challenges faced by AI language models and to share knowledge, resources, and best practices in developing inclusive AI systems.
This report is a reminder that when ChatGPT or a comparable model provides an answer, it is essential to question the answer’s source and the trustworthiness of the underlying data rather than relying solely on the model’s response.