Pyramid Flow arrived in October 2024 with an unusually honest pitch: not "the best video model," but the most ownable one. The weights are MIT-licensed and sitting on Hugging Face, the training method is published, and — the part that actually matters for most people — it runs on a single consumer GPU instead of a data-center rack.
That last point is the whole story. Plenty of "open" video models are open the way a supercar is purchasable: technically yes, practically no. Pyramid Flow's team trained it in roughly 20,700 A100-hours, and the inference code supports CPU offloading, which brings it under 12 GB of VRAM (under 8 GB with sequential offloading, if you're patient). If you have a recent gaming card, you can generate 768p, 24fps clips of up to 10 seconds on your own machine, with no API key and no per-second meter running.
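To make those numbers concrete, here's a back-of-envelope sketch in Python. The `offload_tier` and `frame_budget` helpers and their tier names are hypothetical. They're mine, not anything from the Pyramid Flow repo. The thresholds are simply the rough figures quoted in the paragraph, not measurements, so treat the output as a starting guess for your card.

```python
# Back-of-envelope planner for a local Pyramid Flow run.
# Hypothetical helper: the thresholds are the rough figures quoted in this
# article, not values read from the Pyramid Flow code or measured locally.

CPU_OFFLOAD_GB = 12.0        # "under 12 GB of VRAM" with CPU offloading
SEQUENTIAL_OFFLOAD_GB = 8.0  # "under 8 GB" with slower sequential offloading
FPS = 24                     # frame rate of the 768p output
MAX_SECONDS = 10             # longest clip length quoted for the model


def offload_tier(vram_gb: float) -> str:
    """Pick an offloading tier for a card with `vram_gb` of VRAM.

    Conservative: any card at or above 12 GB maps to CPU offloading,
    because the article gives no figure for running without offloading.
    """
    if vram_gb >= CPU_OFFLOAD_GB:
        return "cpu_offload"
    if vram_gb >= SEQUENTIAL_OFFLOAD_GB:
        return "sequential_offload"  # fits, but generation is slower
    return "too_small"               # below every figure quoted here


def frame_budget(seconds: float, fps: int = FPS) -> int:
    """Nominal frame count for a clip of `seconds` length at `fps`."""
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"clip length must be in (0, {MAX_SECONDS}] seconds")
    return round(seconds * fps)


if __name__ == "__main__":
    for vram in (24, 12, 8, 6):
        print(f"{vram} GB -> {offload_tier(vram)}")
    print(f"10 s at {FPS} fps -> {frame_budget(10)} frames")
```

A 12 GB card lands in the CPU-offload tier, an 8 GB card in the slower sequential tier, and a full ten-second clip at 24fps is a nominal 240 frames. Cards well above 12 GB may not need offloading at all; the sketch deliberately doesn't guess where that line is.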

So how good is it? This is a first look: I haven't run my own batch on local hardware yet, so I'm not putting a score on it. But the official samples tell an honest story. Wide, slow, atmospheric shots (the snowy Tokyo street above, fireworks, churning water) look genuinely good. Ask for people moving, fast action, or fine detail, and the 2024-ness shows: limbs warp, faces smear, and ten seconds is about as far as coherence stretches.
Who it's for
If you want to learn how video diffusion actually works, build a pipeline you control, or generate atmospheric B-roll for free, Pyramid Flow is one of the best on-ramps that exists. The license is clean, the hardware bar is genuinely low, and the community has already wired it into ComfyUI.
Who should skip it
If you need broadcast-ready, character-consistent clips today, you'll be happier renting a hosted model like Kling — you give up ownership, but you get a real jump in polish. And if you won't touch Python or ComfyUI, start with the Hugging Face Space demo before committing to a local install.
Eighteen months on, Pyramid Flow is no longer the frontier — newer open models like HunyuanVideo and Mochi push quality higher. But for "a real video model I can keep, on hardware I already own," it remains a remarkably sensible answer.
