SPEED AND SCALE, FROM PROTOTYPE TO PRODUCTION

GroqCloud

The AI inference platform built for developers. Fast responses, scalable performance, and costs you can plan for. Available in public, private, or co-cloud instances.

Start for Free

Built for speed and precision

Groq runs the models you care about.

Take advantage of fast AI inference performance, powered by our purpose-built LPU, for leading GenAI models across text, audio, and vision modalities.

- Support for LLMs, STT, TTS, and image-to-text models
- Optimized for popular models
- Industry-standard frameworks and integrations

Start Building

Build now and scale as your needs grow

GroqCloud Plans

Free
Great for anyone to get started with our APIs.
- Build and Test on Groq
- Community Support
- Zero-data Retention Available
Price: $0
Start for Free

Developer
Great for developers and startups to scale up and pay as you go.
Everything on the Free Plan, plus:
- Higher Token Limits
- Chat Support
- Flex Service Tier
- Batch Processing
- Spend Limits
- Prompt Caching
Price: Pay Per Token
Get Started

Enterprise
Great for businesses that require custom solutions for large-scale needs.
Everything on the Developer Plan, plus:
- Custom Models
- Regional Endpoint Selection
- Performance Tier
- Scalable Capacity
- Dedicated Support
- LoRA Fine-Tunes
Price: Contact Us
Get Started

Consistent Performance, Predictable Spend

Lower latency means less compute time, with no batching required. Record-setting performance, usage-based pricing.

Try GroqCloud Now

[Chart: "What inference provider are you using or considering using to access models?" Source: Artificial Analysis AI Adoption Survey 2025]

Designed for inference. Not adapted for it.

Founded in 2016 to focus on inference, Groq is literally built different. Its LPU is the only custom-built inference chip that fuels developers with the performance they need at a cost that doesn’t hold them back.

Learn More About the LPU

On-Prem Optionality

GroqRack

Available by request, the LPU powering GroqCloud can be deployed on-prem with GroqRack. Ideal for regulated industries or air-gapped environments. Seamless transition between cloud and local deployment.

Inquire Now