Provider profile: Groq
Groq
US · 4 models
LPU-powered inference: the fastest AI inference in the world
Groq builds custom LPU (Language Processing Unit) silicon for AI inference. Its cloud API is built for speed, with sub-100 ms time-to-first-token and throughput above 300 tokens/second, and it serves popular open-source models such as Llama 4, Llama 3.3, and Qwen3.
- ✓ Ultra-fast inference
- ✓ Open-source models
- ✓ Tool use
- ✓ Low latency
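
The two speed figures quoted in this profile, time-to-first-token and tokens/second, can be checked against any streamed response. The sketch below is a minimal illustration and not part of Groq's API: `measure_stream` is a hypothetical helper, it assumes each streamed chunk counts as one token (real streaming APIs may pack several tokens into a chunk), and it defines throughput as tokens divided by total time since the request started.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Return (ttft_seconds, tokens_per_second, token_count) for a stream.

    Hypothetical helper: treats every chunk as exactly one token.
    """
    start = clock()                      # request start
    first_at: Optional[float] = None     # arrival time of the first chunk
    last_at = start                      # arrival time of the latest chunk
    count = 0

    for _ in chunks:
        last_at = clock()
        if first_at is None:
            first_at = last_at
        count += 1

    if first_at is None:
        # Empty stream: no first token, so TTFT is undefined.
        return None, 0.0, 0

    ttft = first_at - start
    elapsed = last_at - start
    rate = count / elapsed if elapsed > 0 else 0.0
    return ttft, rate, count


def simulated_stream(n: int, delay: float) -> Iterable[str]:
    """Stand-in for a real streaming response (assumption for the demo)."""
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, rate, count = measure_stream(simulated_stream(20, 0.005))
    print(f"TTFT: {ttft * 1000:.1f} ms, {rate:.0f} tokens/s over {count} tokens")
```

To measure a real provider, pass the iterator returned by a streaming client in place of `simulated_stream`. Start the clock before the request is sent so that network and queueing time are included in the time-to-first-token.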