Several leading artificial intelligence models are struggling to meet stringent European Union regulations in areas such as cybersecurity resilience and the prevention of discriminatory outputs, according to data reviewed by Reuters. The results come from a newly developed tool designed to test compliance with the EU’s upcoming Artificial Intelligence Act.
The European Union has long debated how to regulate AI systems, but the public release of OpenAI's ChatGPT in 2022 accelerated those discussions. The chatbot's surging popularity, and the concerns it raised about potential existential risks, prompted lawmakers to draw up specific rules for "general-purpose" AI (GPAI) systems. In response, a new AI evaluation framework has been developed, offering insights into how top-tier models perform against the incoming legal standards.
New AI Compliance Tool Highlights Concerns
A tool designed by Swiss startup LatticeFlow AI, in collaboration with research institutes ETH Zurich and Bulgaria's INSAIT, tested AI models from OpenAI, Meta, Alibaba and other companies across numerous categories aligned with the EU's AI Act. European officials have praised the tool as a valuable resource for measuring AI models' readiness for compliance.
According to Reuters, the AI models were assessed in areas like technical robustness, safety and other critical factors. The models received scores ranging from 0 to 1, with a higher score indicating greater compliance. Most models tested, including those from OpenAI, Meta and Alibaba, scored an average of 0.75 or above. However, the “Large Language Model (LLM) Checker” also revealed significant shortcomings in areas that will need improvement if these companies hope to avoid regulatory penalties.
Companies that fail to meet the requirements of the AI Act could face fines of up to 35 million euros ($38 million), or 7% of their global annual turnover. Although the EU is still defining how rules around generative AI, such as ChatGPT, will be enforced, this tool provides early indicators of areas where compliance may be lacking.
Key Shortcomings: Bias and Cybersecurity
One of the most critical areas highlighted by the LLM Checker is the issue of discriminatory output. Many generative AI models have been found to reflect human biases related to gender, race and other factors. In this category, OpenAI’s GPT-3.5 Turbo model received a score of 0.46, while Alibaba’s Qwen1.5 72B Chat model fared even worse, scoring just 0.37.
Cybersecurity vulnerabilities were also spotlighted. LatticeFlow tested for “prompt hijacking,” a form of attack in which hackers use deceptive prompts to extract sensitive information. Meta’s Llama 2 13B Chat model scored 0.42 in this category, while French startup Mistral’s 8x7B Instruct model scored 0.38, according to Reuters.
Claude 3 Opus, developed by Google-backed Anthropic, performed best overall, receiving an average score of 0.89, the highest of any model tested.
A Step Toward Regulatory Compliance
The LLM Checker was developed to align with the AI Act’s evolving requirements and is expected to play a larger role as enforcement measures are introduced over the next two years. LatticeFlow has made the tool freely available, allowing developers to test their models’ compliance online.
Petar Tsankov, CEO and co-founder of LatticeFlow, told Reuters that while the results were generally positive, they also serve as a roadmap for companies to make necessary improvements. “The EU is still working out all the compliance benchmarks, but we can already see some gaps in the models,” he said. Tsankov emphasized that with more focus on optimizing for compliance, AI developers can better prepare their models to meet the stringent standards of the AI Act.
Meta and Mistral declined to comment, while OpenAI, Anthropic and Alibaba did not respond to requests for comment. The European Commission, however, has been following the tool's development closely. A spokesperson for the Commission said the platform represents "a first step" in translating the EU AI Act into technical compliance requirements, signaling that more detailed enforcement measures are on the way.
The new test gives tech companies valuable insight into the challenges ahead as they work to meet the EU's AI regulations, which will take effect in stages over the next two years.
Source: Reuters