A PYMNTS Company

AI Models Fall Short of Key European Standards in New Compliance Test

 |  October 16, 2024

Several leading artificial intelligence models are struggling to meet stringent European Union regulations in areas such as cybersecurity resilience and the prevention of discriminatory outputs, according to data reviewed by Reuters. The results come from a newly developed tool designed to test compliance with the EU’s upcoming Artificial Intelligence Act.

The European Union has long debated regulations for AI systems, but the public release of OpenAI’s ChatGPT in 2022 accelerated these discussions. The chatbot’s rapid popularity and the surrounding concerns over potential existential risks prompted lawmakers to draw up specific rules aimed at “general-purpose” AI (GPAI) systems. In response, a new AI evaluation framework has been developed, offering insights into the performance of top-tier models against the incoming legal standards.

New AI Compliance Tool Highlights Concerns

A tool designed by Swiss startup LatticeFlow AI, in collaboration with research institutes ETH Zurich and Bulgaria’s INSAIT, tested AI models from companies like OpenAI, Meta, Alibaba and others across numerous categories aligned with the EU’s AI Act. This tool has been praised by European officials as a valuable resource for measuring AI models’ readiness for compliance.

According to Reuters, the AI models were assessed in areas like technical robustness, safety and other critical factors. The models received scores ranging from 0 to 1, with a higher score indicating greater compliance. Most models tested, including those from OpenAI, Meta and Alibaba, scored an average of 0.75 or above. However, the “Large Language Model (LLM) Checker” also revealed significant shortcomings in areas that will need improvement if these companies hope to avoid regulatory penalties.

Companies that fail to meet the requirements of the AI Act could face fines of up to 35 million euros ($38 million), or 7% of their global annual turnover. Although the EU is still defining how rules around generative AI, such as ChatGPT, will be enforced, this tool provides early indicators of areas where compliance may be lacking.

Key Shortcomings: Bias and Cybersecurity

One of the most critical areas highlighted by the LLM Checker is the issue of discriminatory output. Many generative AI models have been found to reflect human biases related to gender, race and other factors. In this category, OpenAI’s GPT-3.5 Turbo model received a score of 0.46, while Alibaba’s Qwen1.5 72B Chat model fared even worse, scoring just 0.37.

Cybersecurity vulnerabilities were also spotlighted. LatticeFlow tested for “prompt hijacking,” a form of attack in which hackers use deceptive prompts to extract sensitive information. Meta’s Llama 2 13B Chat model scored 0.42 in this category, while French startup Mistral’s 8x7B Instruct model scored 0.38, according to Reuters.

Anthropic’s Claude 3 Opus, backed by Google, performed the best overall, receiving an average score of 0.89, making it the top performer across most categories.

A Step Toward Regulatory Compliance

The LLM Checker was developed to align with the AI Act’s evolving requirements and is expected to play a larger role as enforcement measures are introduced over the next two years. LatticeFlow has made the tool freely available, allowing developers to test their models’ compliance online.

Petar Tsankov, CEO and co-founder of LatticeFlow, told Reuters that while the results were generally positive, they also serve as a roadmap for companies to make necessary improvements. “The EU is still working out all the compliance benchmarks, but we can already see some gaps in the models,” he said. Tsankov emphasized that with more focus on optimizing for compliance, AI developers can better prepare their models to meet the stringent standards of the AI Act.

Although some companies declined to comment, including Meta and Mistral, and others like OpenAI, Anthropic and Alibaba did not respond to requests for comment, the European Commission has been following the tool’s development closely. A spokesperson for the Commission stated that the platform represents “a first step” in translating the EU AI Act into technical compliance requirements, signaling that more detailed enforcement measures are on the way.

This new test provides tech companies with valuable insights into the challenges ahead as they work to meet the EU’s AI regulations, which are expected to be fully implemented by 2025.

Source: Reuters