
Google launches Gemini 3.1 Flash Lite as fastest and cheapest Gemini 3 model

By Estefano Gomez · Published March 3, 2026 · 2 min read · Source: Crypto Briefing

Gemini 3.1 Flash Lite delivers faster output speeds, improved reasoning benchmarks and low-cost API pricing for large-scale applications.


Google today introduced Gemini 3.1 Flash Lite, a new artificial intelligence model designed to deliver faster responses and lower operating costs within the company’s Gemini 3 model family.

The model is rolling out in preview to developers through the Gemini API in Google AI Studio and to enterprise customers through Vertex AI.

Google described Gemini 3.1 Flash Lite as the fastest and most cost-efficient model in the Gemini 3 series, built specifically for high-volume workloads where latency and cost are critical.

Pricing for the model starts at $0.25 per million input tokens and $1.50 per million output tokens, positioning it as one of the lowest-cost options in Google’s current AI model lineup.
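To put those rates in concrete terms, here is a minimal cost estimator built only on the per-token prices quoted above. The function name and the sample workload are illustrative, not part of any Google SDK:

```python
# Estimate API cost from the per-token rates quoted above:
# $0.25 per million input tokens, $1.50 per million output tokens.
# Helper and workload figures are illustrative, not an official API.

INPUT_RATE_PER_M = 0.25   # USD per million input tokens
OUTPUT_RATE_PER_M = 1.50  # USD per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for a given token volume."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# Hypothetical workload: 10,000 requests averaging 2,000 input
# and 500 output tokens each (20M input, 5M output in total).
cost = estimate_cost(10_000 * 2_000, 10_000 * 500)
print(f"${cost:.2f}")  # $12.50
```

At these rates, even a workload of tens of millions of tokens stays in the low double digits of dollars, which is the "high-volume" positioning Google describes.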

According to benchmarks cited by Google, Gemini 3.1 Flash Lite delivers a time to first answer token 2.5 times faster than Gemini 2.5 Flash and produces output 45 percent faster, while maintaining similar or better quality.

Performance benchmarks also place the model competitively against other lightweight AI models. Gemini 3.1 Flash Lite achieved an Elo score of 1432 on the Arena AI leaderboard and recorded 86.9 percent on the GPQA Diamond reasoning benchmark and 76.8 percent on the MMMU Pro multimodal benchmark.

Google said the model is designed to handle high-frequency developer tasks such as translation, content moderation and large-scale instruction following, while still supporting more complex workloads like interface generation, simulation creation and structured data tasks.

The release also introduces adjustable thinking levels within AI Studio and Vertex AI, allowing developers to control how much reasoning the model performs depending on the complexity of a task. This flexibility is intended to help teams balance cost, speed and accuracy when deploying AI applications at scale.
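Google has not published the exact parameter surface for the preview in this article, so the sketch below only illustrates the idea: an application routes each task category to a reasoning level before calling the API. The level names and the routing heuristic are assumptions for illustration; the actual parameter names belong to the Gemini API docs in AI Studio and Vertex AI:

```python
# Illustrative sketch of choosing a "thinking level" per task before
# an API call. Level names ("low"/"medium"/"high") and the routing
# rule are assumptions, not documented Gemini API values.

def pick_thinking_level(task: str) -> str:
    """Map a task category to a hypothetical reasoning level."""
    # High-frequency tasks the article lists: favor speed and cost.
    simple = {"translation", "content_moderation", "instruction_following"}
    # Complex workloads the article lists: allow more reasoning.
    complex_tasks = {"interface_generation", "simulation", "structured_data"}
    if task in simple:
        return "low"
    if task in complex_tasks:
        return "high"
    return "medium"  # default middle ground for everything else

print(pick_thinking_level("translation"))          # low
print(pick_thinking_level("interface_generation")) # high
```

The point of this kind of dispatcher is exactly the trade-off the release describes: spending reasoning budget only where task complexity warrants it, so cost and latency stay low for the bulk of requests.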

Disclosure: This article was edited by Estefano Gomez. For more information on how we create and review content, see our Editorial Policy.
This article was originally published on Crypto Briefing and is republished here under RSS syndication for informational purposes. All rights and intellectual property remain with the original author. If you are the author and wish to have this article removed, please contact us at [email protected].
