Provider profile: Groq


Groq

US · 4 models

LPU-powered inference — the fastest AI inference in the world

Groq builds custom LPU (Language Processing Unit) silicon for AI inference. Its cloud API is built for speed, with a quoted time-to-first-token under 100 ms and throughput above 300 tokens/second, and it serves popular open-source models such as Llama 4, Llama 3.3, and Qwen3.

  • Ultra-fast inference
  • Open source models
  • Tool use
  • Low latency
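
The two speed figures quoted above, time-to-first-token and tokens per second, can be checked against any streamed response. The sketch below is only an illustration of how those metrics might be computed: the function name `stream_metrics` and the `(arrival_time, token_count)` chunk format are assumptions made for this example and are not part of Groq's API. Throughput is taken here as total tokens divided by total elapsed time since the request was sent; other definitions, such as excluding the time before the first token, would give different numbers.

```python
from typing import Iterable, Tuple


def stream_metrics(
    request_start: float,
    chunks: Iterable[Tuple[float, int]],
) -> Tuple[float, float]:
    """Return (ttft_seconds, tokens_per_second) for one streamed response.

    request_start: time the request was sent, in seconds.
    chunks: hypothetical (arrival_time_seconds, token_count) pairs,
            in arrival order, as recorded by the caller.
    """
    received = list(chunks)
    if not received:
        raise ValueError("no chunks received")

    first_arrival = received[0][0]
    last_arrival = received[-1][0]
    total_tokens = sum(count for _, count in received)

    # Time-to-first-token: delay between sending the request and the first chunk.
    ttft = first_arrival - request_start

    # Throughput (assumed definition): all tokens over the full elapsed time.
    elapsed = last_arrival - request_start
    throughput = total_tokens / elapsed if elapsed > 0 else float("inf")
    return ttft, throughput


# Synthetic timings for illustration only, not measured results.
ttft, tps = stream_metrics(0.0, [(0.08, 1), (0.5, 150), (1.0, 149)])
print(f"TTFT: {ttft * 1000:.0f} ms, throughput: {tps:.0f} tokens/s")
```

In practice the timestamps would come from a monotonic clock such as `time.perf_counter()`, recorded as each streamed chunk arrives.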
Customer Support