Reference · stack overview
The six layers of an inference deployment
The inference stack has six layers. Each one has component choices, and every choice cascades. Swap an accelerator and the power envelope changes. Swap the serving engine and the throughput coefficient changes. Swap the fabric and the maximum useful replica count changes. Tap any tile for the full explanation and the catalog options.
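The cascade described above can be sketched as a toy model. This is a minimal illustration, not the sizing method behind the catalog: the `Stack` fields, the `summarize` helper, the one-accelerator-per-replica simplification, and every number are hypothetical placeholders chosen only to show how swapping one component changes the derived figures.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stack:
    """Toy model of three of the layer choices. All fields are hypothetical."""
    accel_watts: float        # power draw per accelerator
    accel_base_tps: float     # baseline tokens/s per accelerator
    engine_coeff: float       # serving-engine throughput coefficient
    fabric_max_replicas: int  # maximum useful replica count the fabric allows
    requested_replicas: int   # replicas the operator would like to run


def summarize(s: Stack) -> dict:
    """Derive deployment-level figures from the component choices.

    Assumes one accelerator per replica, purely to keep the sketch short.
    """
    # The fabric caps how many replicas are actually useful.
    replicas = min(s.requested_replicas, s.fabric_max_replicas)
    return {
        "useful_replicas": replicas,
        "power_watts": replicas * s.accel_watts,
        "throughput_tps": replicas * s.accel_base_tps * s.engine_coeff,
    }


# Made-up baseline values, for illustration only.
baseline = Stack(
    accel_watts=500.0,
    accel_base_tps=1000.0,
    engine_coeff=1.0,
    fabric_max_replicas=8,
    requested_replicas=16,
)

# Each swap changes exactly one component; the derived figures follow.
hotter_accel = replace(baseline, accel_watts=700.0)      # power envelope moves
faster_engine = replace(baseline, engine_coeff=1.5)      # throughput moves
wider_fabric = replace(baseline, fabric_max_replicas=16)  # replica cap moves

for name, stack in [
    ("baseline", baseline),
    ("hotter_accel", hotter_accel),
    ("faster_engine", faster_engine),
    ("wider_fabric", wider_fabric),
]:
    print(name, summarize(stack))
```

In this sketch the accelerator swap changes only the power figure, the engine swap changes only throughput, and the fabric swap raises the replica cap, which in turn raises both power and throughput. A real deployment has more couplings than this model captures.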
After this overview, two worked examples (enterprise and departmental) show how these layers compose into real deployments.