A company called Taalas is making custom silicon for running LLMs, and reports token generation that is 10X faster and 10X cheaper than running on standard Nvidia chips: https://taalas.com/the-path-to-ubiquitous-ai/. At the moment it only runs Llama 3.1 8B, so you don't get the raw power of the latest Claudes or Codices, but consider: even if no smarter models ever come out, there's a 10X speedup on current models available for the taking. 10X lower power consumption is also (finally) good news for the environmental impact of LLMs.
You can play with their model at https://chatjimmy.ai/.