
Microsoft is set to introduce a safety ranking for artificial intelligence models sold to its cloud customers, the Financial Times reported.
The move is aimed at building trust as the company sells AI models from providers such as OpenAI and Elon Musk’s xAI.
Sarah Bird, Microsoft’s head of Responsible AI, announced the addition of a “safety” category to its “model leaderboard”, a feature rolled out for developers to rank AI models from various providers, including DeepSeek and Mistral.
The leaderboard, accessible to thousands of clients on the Azure Foundry developer platform, currently ranks AI models based on quality, cost, and throughput.
The new safety ranking is designed to help customers understand AI models’ capabilities and make informed purchasing decisions.
Bird emphasised that the safety metric will ensure customers can “just directly shop and understand” the models.

Microsoft’s new safety metric will be based on its ToxiGen benchmark, which measures implicit hate speech, and the Center for AI Safety’s Weapons of Mass Destruction Proxy benchmark, which assesses potential misuse of AI models for harmful purposes.
This move will address customer concerns about data and privacy risks posed by AI models, especially when used autonomously.
Positioning itself as an “agnostic platform” for generative AI, Microsoft is partnering with xAI and Anthropic to sell their models, and has invested around $14bn in OpenAI, the publication reported.
Recently, Microsoft began offering xAI’s Grok models under the same terms as OpenAI’s models.
xAI has since implemented a new monitoring policy. Bird highlighted the importance of internal review and customer use of benchmarks in assessing AI models.
While there is no global standard for AI safety testing, the EU’s AI Act, set to take effect later this year, will require companies to conduct safety tests. Some companies are reportedly spending less on risk mitigation, though they claim to maintain safety standards.
Bird stressed the need for significant investment in evaluation to ensure high-quality models.
In April 2025, Microsoft launched an “AI red teaming agent” to automate stress testing of computer programmes by simulating attacks to identify vulnerabilities.
Bird explained that users can specify risks and attack difficulty, and the system will conduct tests.
Separately, in June 2025, Microsoft announced additional job cuts, impacting over 300 employees, as part of its ongoing efforts to manage costs.