Stress-testing AI inference profitability
The author built a simulator to stress-test AI inference economics. It suggests that inference profitability depends on several assumptions holding at once before it can justify current capex.
- AI inference can become highly profitable if paid adoption scales and model architectures move toward fewer active parameters.
- The current capex cycle is hard to justify unless utilization, GPU amortization, and token pricing all line up, and token revenue could still slide toward commodity pricing.
I built a small simulator to stress-test the unit economics of AI inference.
The question I wanted to isolate is simple: under what assumptions does frontier AI inference become profitable enough to justify the current capex cycle?
My current read is that AI inference can become very profitable, but not just because inference gets cheaper. The profitable case needs several assumptions to line up at the same time:
- paid adoption scales quickly
- GPU capacity does not outrun demand by too much
- deployed models keep moving toward lower active-parameter serving architectures
- throughput/batching improves materially
- the GPU amortization period is long enough and the cost of capital is not punishing
- realized token revenue does not collapse toward commodity pricing
The biggest swing factors in the model are not electricity. They are utilization, active model size, GPU/data center amortization, and blended revenue per token.
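To make those swing factors concrete, here is a minimal sketch of the per-GPU arithmetic. It is not the simulator's code: every field name and every number below is an illustrative assumption, and `peak_tokens_per_sec` is a stand-in for the combined effect of active model size and batching.

```python
from dataclasses import dataclass, replace

HOURS_PER_YEAR = 8760


@dataclass(frozen=True)
class Scenario:
    """Hypothetical per-GPU inputs; all values are placeholders, not data."""
    gpu_capex_usd: float            # all-in capex per GPU, incl. data center share
    amortization_years: float       # useful life assumed for the hardware
    cost_of_capital: float          # annual rate, e.g. 0.10 for 10%
    power_kw: float                 # average draw per GPU incl. cooling overhead
    electricity_usd_per_kwh: float
    utilization: float              # fraction of peak throughput sold to paying users
    peak_tokens_per_sec: float      # proxy for active params + batching efficiency
    revenue_usd_per_mtok: float     # blended realized revenue per million tokens


def annual_capital_cost(s: Scenario) -> float:
    """Level annual payment that recovers capex over the amortization period."""
    r, n = s.cost_of_capital, s.amortization_years
    if r == 0:
        return s.gpu_capex_usd / n  # straight-line when capital is free
    return s.gpu_capex_usd * r / (1 - (1 + r) ** -n)


def annual_power_cost(s: Scenario) -> float:
    # Simplification: the GPU draws power all year regardless of utilization.
    return s.power_kw * HOURS_PER_YEAR * s.electricity_usd_per_kwh


def cost_per_mtok(s: Scenario) -> float:
    """Capital plus power cost per million tokens actually sold."""
    tokens_per_year = s.peak_tokens_per_sec * s.utilization * 3600 * HOURS_PER_YEAR
    return (annual_capital_cost(s) + annual_power_cost(s)) / tokens_per_year * 1e6


def gross_margin(s: Scenario) -> float:
    return 1 - cost_per_mtok(s) / s.revenue_usd_per_mtok


if __name__ == "__main__":
    # Placeholder baseline, chosen only to exercise the functions.
    base = Scenario(40_000, 5, 0.10, 1.0, 0.08, 0.5, 2_000, 1.00)
    variants = {
        "baseline": base,
        "utilization halved": replace(base, utilization=0.25),
        "3-year amortization": replace(base, amortization_years=3),
        "revenue per token halved": replace(base, revenue_usd_per_mtok=0.50),
        "electricity doubled": replace(base, electricity_usd_per_kwh=0.16),
    }
    for name, s in variants.items():
        print(f"{name:26s} cost/Mtok=${cost_per_mtok(s):.3f} margin={gross_margin(s):.1%}")
```

Changing one input at a time shows which assumption moves the margin most under whatever baseline you choose. The sketch leaves out staffing, networking, training cost, and free-tier usage, so treat it as a way to reason about sensitivities rather than as a forecast.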
That makes the investment question less “will AI be useful?” and more “who can monetize inference at a margin high enough to support the capex?”
App:
https://msg32jebwg56opz2avykhcai-profitability-simulator.streamlit.app/
Would be interested in pushback from an investing perspective, especially if the model misses a major cost/revenue category or overstates how hard it is to get to profitable inference.
Token economics ≠ AI inference ROI. ROI will be measured by who can achieve end-user business objectives at the lowest possible cost. That means if a token can be avoided, or generated more cheaply on device, that is where ROI will be created. The winners will be orchestrators that help enterprises avoid expensive token generation or data center inference.

r/investing