AI companies in China are undergoing a government review of their large language models, aimed at ensuring they “embody core socialist values,” according to a report by the Financial Times.
The review is being carried out by the Cyberspace Administration of China (CAC), the government’s chief internet regulator, and will cover players across the spectrum, from tech giants like ByteDance and Alibaba to small startups.
AI models will be tested by local CAC officials for their responses to a variety of questions, many related to politically sensitive topics and Chinese President Xi Jinping, the FT said. The models’ training data and safety processes will also be reviewed.
An anonymous source from a Hangzhou-based AI company told the FT that their model failed the first round of testing for unclear reasons, and passed only on the second attempt, after months of “guessing and adjusting.”
The CAC’s latest efforts illustrate how Beijing has walked a tightrope between catching up with the U.S. on GenAI while also keeping a close eye on the technology’s development, ensuring that AI-generated content adheres to its strict internet censorship policies.
The country was among the first to finalize rules governing generative artificial intelligence last year, including the requirement that AI services adhere to “core values of socialism” and not generate “illegal” content.
Meeting the censorship policies requires “security filtering,” a process made more complicated by the fact that Chinese LLMs are still trained on a significant amount of English-language content, multiple engineers and industry insiders told the FT.
According to the report, filtering is done by removing “problematic information” from AI model training data and then creating a database of words and phrases that are sensitive.
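The two-step process the report describes — stripping flagged material from training data using a database of sensitive words and phrases — can be sketched roughly as follows. This is an illustrative toy example, not any company’s actual pipeline; the term list and corpus are invented placeholders.

```python
# Hypothetical sketch of training-data filtering against a database of
# sensitive words and phrases, per the FT's description. All terms and
# samples below are invented placeholders.

SENSITIVE_TERMS = {"example banned phrase", "another sensitive term"}

def is_clean(sample: str) -> bool:
    """Return True if the sample contains none of the sensitive terms."""
    text = sample.lower()
    return not any(term in text for term in SENSITIVE_TERMS)

def filter_training_data(samples: list[str]) -> list[str]:
    """Drop any training sample that matches the sensitive-term database."""
    return [s for s in samples if is_clean(s)]

corpus = [
    "A neutral sentence about the weather.",
    "This sample mentions an example banned phrase explicitly.",
]
cleaned = filter_training_data(corpus)
print(cleaned)  # only the first, neutral sample survives
```

Real systems would use far larger term databases plus machine-learning classifiers rather than simple substring matching, but the basic shape — match against a curated list, drop on hit — is what the report outlines.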
The regulations have reportedly led the country’s most popular chatbots to often decline to answer questions on sensitive topics such as the 1989 Tiananmen Square protests.
However, during the CAC testing, there are limits on the number of questions LLMs can decline outright, so models need to be able to generate “politically correct answers” to sensitive questions.
An AI expert working on a chatbot in China told the FT that it is difficult to prevent LLMs from generating all potentially harmful content, so developers instead build an additional layer on top of the system that replaces problematic answers in real time.
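That kind of wrapper layer, which checks each generated answer and swaps in a canned response when the output trips a sensitivity check, might look something like the sketch below. Every name here is hypothetical; actual systems would likely use trained classifiers rather than keyword matching.

```python
# Illustrative sketch of a real-time answer-replacement layer sitting on
# top of an LLM, as the FT's source describes. Terms, fallback text, and
# the mock model are all invented for illustration.

SENSITIVE_TERMS = {"example sensitive topic"}
FALLBACK = "Sorry, I can't help with that. Let's talk about something else."

def is_problematic(answer: str) -> bool:
    """Naive sensitivity check: flag answers containing listed terms."""
    text = answer.lower()
    return any(term in text for term in SENSITIVE_TERMS)

def guarded_reply(generate, prompt: str) -> str:
    """Call the underlying model, then replace flagged answers on the fly."""
    answer = generate(prompt)
    return FALLBACK if is_problematic(answer) else answer

# Stand-in for a real model call:
def mock_model(prompt: str) -> str:
    return "Here are some thoughts on example sensitive topic..."

print(guarded_reply(mock_model, "Tell me about it."))  # prints the fallback
```

The appeal of this design, per the article, is that the base model never has to be perfectly filtered: the wrapper intercepts whatever slips through at serving time.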
Regulations, as well as U.S. sanctions that have restricted access to chips used to train LLMs, have made it hard for Chinese firms to launch their own ChatGPT-like services. China, however, dominates the global race in generative AI patents.
Read the full report from the FT