Tensor Language Model Generates Optimized Tensor Schedules Without Search
A new generative compiler framework called Tensor Language Model uses a GPT-2 architecture to produce optimized tensor program schedules directly from learned patterns. The approach eliminates exhaustive search or reinforcement learning steps during compilation while maintaining competitive runtime performance on models such as ResNet-50, BERT, GPT-2, and LLAMA-7B.
A paper published on nature.com describes a Tensor Language Model (TLM) that frames tensor program scheduling as a language modeling task. The model is built on a GPT-2 architecture and pre-trained on millions of tensor programs represented as compact sequences that include operator graphs, hardware metadata, and reconfiguration choices.
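To make the idea of a "compact sequence" concrete, here is a minimal Python sketch of how one operator, its hardware target, and a set of schedule decisions could be flattened into tokens. The token format, the `MatmulTask` type, and the knob names (`tile_m`, `tile_n`, `unroll`) are hypothetical illustrations, not the encoding used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatmulTask:
    """Hypothetical description of one operator plus its hardware target."""
    m: int
    n: int
    k: int
    target: str


def encode_task(task):
    # Operator shape and hardware metadata become the conditioning prefix.
    return ["<task>", "op=matmul", f"m={task.m}", f"n={task.n}",
            f"k={task.k}", f"hw={task.target}", "</task>"]


def encode_schedule(decisions):
    # Each scheduling decision becomes one token, in a fixed order.
    tokens = ["<sched>"]
    for name, value in decisions:
        tokens.append(f"{name}={value}")
    tokens.append("</sched>")
    return tokens


def build_vocab(sequences):
    # Assign integer ids in order of first appearance.
    vocab = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


task = MatmulTask(m=512, n=512, k=256, target="gpu")
sequence = encode_task(task) + encode_schedule(
    [("tile_m", 64), ("tile_n", 32), ("unroll", 4)]
)
vocab = build_vocab([sequence])
ids = [vocab[tok] for tok in sequence]
```

A language model trained on many such sequences would see the task tokens as context and the schedule tokens as the text to predict.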
TLM generates optimized schedules through direct inference rather than runtime search or reinforcement learning. According to the report, schedule generation is 25 times faster than heuristic approaches such as Roller, while delivering similar execution performance.
Deep learning workloads have grown in complexity, requiring tensor programs that run efficiently across CPUs, GPUs, and specialized accelerators. Traditional vendor libraries provide hand-optimized kernels for common operations but are limited in generality and costly to maintain for new hardware or non-standard operators.
Automatic tensor compilers such as TVM, Halide, Ansor, and MetaSchedule address these limits by searching large spaces of possible code rewrites. Search-based systems can reach high performance but incur long compilation times, sometimes hours or days for large models, while heuristic systems reduce compilation time at the risk of missing optimal schedules.
TLM converts the scheduling problem into autoregressive sequence generation, allowing the model to produce context-sensitive decisions conditioned on operator structure and hardware properties. The paper reports that the resulting schedules achieve a balance between compilation speed and runtime efficiency across tested workloads.
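The autoregressive framing can be sketched in a few lines of Python. The function `toy_next_token_scores` below is a hypothetical hand-written stand-in for a trained model, and the token names are invented for illustration. Only the decoding loop reflects the general technique (greedy next-token generation conditioned on the prefix), and the paper's actual decoding strategy may differ.

```python
def toy_next_token_scores(prefix):
    """Hypothetical stand-in for a trained model: scores for the next token."""
    hw = next(tok for tok in prefix if tok.startswith("hw="))
    last = prefix[-1]
    if last == "</task>":
        return {"<sched>": 1.0}
    if last == "<sched>":
        # The preferred tile size depends on the hardware token in the prefix.
        if hw == "hw=gpu":
            return {"tile=64": 0.7, "tile=16": 0.3}
        return {"tile=64": 0.2, "tile=16": 0.8}
    if last.startswith("tile="):
        return {"unroll=4": 0.6, "unroll=1": 0.4}
    return {"</sched>": 1.0}


def generate_schedule(task_tokens, max_new_tokens=8):
    """Greedy autoregressive decoding: one model call per emitted token."""
    seq = list(task_tokens)
    for _ in range(max_new_tokens):
        scores = toy_next_token_scores(seq)
        best = max(scores, key=scores.get)
        seq.append(best)
        if best == "</sched>":
            break
    # Return only the newly generated schedule tokens.
    return seq[len(task_tokens):]


gpu_schedule = generate_schedule(["<task>", "op=matmul", "hw=gpu", "</task>"])
cpu_schedule = generate_schedule(["<task>", "op=matmul", "hw=cpu", "</task>"])
```

The same operator yields different schedule tokens for the two hardware tokens, which is the sense in which decisions are context-sensitive. No candidate schedule is compiled or measured during generation, which is where the compile-time saving over search would come from.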
Experiments covered ResNet-50, BERT, GPT-2, and LLAMA-7B. Results showed that TLM maintains execution performance comparable to existing methods while substantially reducing the time required to produce a working schedule. The framework is described as hardware-agnostic and reproducible, offering a generative alternative to current search or heuristic paradigms in deep learning compilation.
Potential Impact
- Developers may reduce time spent waiting for tensor program compilation in deep learning pipelines.
- Hardware-agnostic compilation could lower engineering effort required when deploying models on new accelerators.
- Research groups may explore additional generative approaches for other compiler optimization tasks.