Google has introduced Gemini 3.1 Flash-Lite, which it says is its “fastest and most cost-efficient Gemini 3 series model.”
“Starting today, 3.1 Flash-Lite is rolling out in a preview to developers via the Gemini API in Google AI Studio and for enterprises via Vertex AI,” the company said in a blog post.
Priced at $0.25 per million input tokens and $1.50 per million output tokens, Flash-Lite is significantly cheaper on input than flagship models such as Gemini 3.1 Pro, which charges $2.00 per million input tokens (output pricing is the same, at $1.50 per million).
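At those rates, per-request cost is simple arithmetic: (input tokens ÷ 1M) × $0.25 plus (output tokens ÷ 1M) × $1.50. A minimal sketch using the preview prices quoted above:

```python
# Estimate per-request cost from the quoted preview pricing:
# $0.25 per 1M input tokens, $1.50 per 1M output tokens.
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# A 10,000-token prompt with a 2,000-token reply:
print(estimate_cost(10_000, 2_000))  # 0.0025 + 0.003 = 0.0055
```

At that rate, roughly 180,000 such requests fit in a $1,000 monthly budget, which is the kind of margin that matters for the high-volume workloads Google is targeting.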
Google claims it “outperforms 2.5 Flash with a 2.5X faster Time to First Answer Token and 45% increase in output speed, according to the Artificial Analysis benchmark, while maintaining similar or better quality.”
What Gemini 3.1 Flash-Lite can do
The model comes with ‘thinking levels’ in AI Studio and Vertex AI, giving developers the ability to control how much the model “thinks” for each task — important for managing high-frequency workloads.
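In practice, this control surfaces as a thinking setting on each request. The sketch below builds a `generateContent`-style JSON body with the Gemini API's `generationConfig.thinkingConfig.thinkingBudget` field; whether 3.1 Flash-Lite's "thinking levels" map onto this exact knob is an assumption.

```python
import json

def build_request(prompt: str, thinking_budget: int) -> dict:
    # Hedged sketch of a Gemini API generateContent request body.
    # thinkingBudget caps how many tokens the model may spend "thinking";
    # the precise control exposed for 3.1 Flash-Lite may differ.
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }

# A bulk moderation call might disable extended thinking entirely:
body = build_request("Flag this comment if it violates policy.", 0)
print(json.dumps(body, indent=2))
```

Keeping the budget low for routine calls is what keeps latency and cost down at scale; raising it buys deeper reasoning on harder requests.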
“3.1 Flash-Lite can tackle tasks at scale, like high-volume translation and content moderation, where cost is a priority. And it can also handle more complex workloads where more in-depth reasoning is needed, like generating user interfaces and dashboards, creating simulations or following instructions,” the blog post said.
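That split — cheap bulk work versus reasoning-heavy work — suggests routing each task class to a different thinking budget. A hypothetical router (task names and budget values are illustrative, not official tiers):

```python
# Illustrative per-task routing: bulk jobs get minimal thinking for
# throughput and cost; complex jobs get a larger reasoning budget.
LOW_COST_TASKS = {"translation", "moderation"}

def thinking_budget_for(task: str) -> int:
    # 0 = minimal thinking for high-volume work; a larger budget lets the
    # model reason longer on complex work like UI or dashboard generation.
    return 0 if task in LOW_COST_TASKS else 2048

print(thinking_budget_for("moderation"))     # 0
print(thinking_budget_for("ui_generation"))  # 2048
```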
Early-access developers and companies, including Latitude, Cartwheel, and Whering, are already testing Flash-Lite for large-scale problem solving. According to the blog post, early testers highlighted 3.1 Flash-Lite’s efficiency and reasoning capabilities, saying it can “handle complex inputs with the precision of a larger-tier model, plus follow instructions and maintain adherence.”
Benchmarks and performance
Gemini 3.1 Flash-Lite earned an Elo score of 1432 on the Arena.ai Leaderboard, outperforming other models in its tier for reasoning and multimodal understanding. It scored 86.9% on GPQA Diamond and 76.8% on MMMU Pro, surpassing even larger Gemini models from previous generations, such as 2.5 Flash.
The model combines speed, cost efficiency, and flexible reasoning, making it suitable for both high-volume routine tasks and more complex AI workloads.