
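The ACM SIGPLAN 2023 International Conference on Compiler Construction (CC 2023) will be held in Montreal, Canada, on February 25-26, 2023. CC is an important conference in the field of compiler construction, focusing on compiler work that analyzes, transforms, or executes programs and systems.
This year CC is co-located with three other conferences (CGO, HPCA, and PPoPP), and it is the first in-person edition since the pandemic. As a sponsor of CC 2023, Huawei's Toronto Heterogeneous Compiler Lab will send a team of experts to the conference to exchange ideas on frontier topics in compilers and programming tools, including vectorization and parallelism, scheduling and tuning, code generation and synthesis, and optimization.
This article previews the following highlighted sessions: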
CC Session: Scheduling & Tuning
Paper:
Efficiently Learning Locality Optimizations by Decomposing Transformation Domains
Authors:
Tharindu Patabandi (University of Utah), Mary Hall (University of Utah)
Abstract:
Optimizing compilers for efficient machine learning are more important than ever due to the rising ubiquity of the application domain in numerous facets of life. Predictive model-guided compiler optimization is sometimes used to derive sequences of loop transformations that optimize the performance of the machine learning computations. However, training-data generation for these models often requires the traversal of prohibitively expensive schedule spaces and executing code variants corresponding to different schedule options. The size of these search spaces can quickly explode when predicting the combined effects of multiple loop transformations. This paper characterizes a learning strategy for deriving transformation sequences called Composed Singular Prediction (CSP).
Instead of a monolithic cost model that predicts the profitability of a given transformation sequence, CSP exploits a collection of cost models, each trained on a particular loop transformation domain. In a case study, a domain-specific compiler deploys the learned models to predict loop tiling and loop permutation schedules to perform data locality optimization of Conv2d kernels. The system achieves performance improvements up to 4.0x against Intel oneDNN while saving ~105.3x in training data collection time compared to exhaustive exploration of the design space.
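To make the idea of decomposing the transformation domain concrete, here is a toy Python sketch. It assumes, purely for illustration, that the effects of tiling and permutation can be scored independently and added; the tile sizes, permutation names, and both stand-in models (`tiling_model`, `permutation_model`) are hypothetical and not taken from the paper. It only shows why querying one model per domain touches far fewer candidates than enumerating the joint space.

```python
from itertools import product

# Hypothetical candidate choices for a Conv2d loop nest (illustrative only).
TILE_SIZES = [8, 16, 32, 64]
PERMUTATIONS = ["nkhwc", "nhwkc", "nkchw"]

def tiling_model(tile):
    """Stand-in for a cost model trained only on the tiling domain (lower is better)."""
    return abs(tile - 32)  # toy score: pretend a tile size of 32 fits the cache best

def permutation_model(perm):
    """Stand-in for a cost model trained only on the permutation domain."""
    return {"nkhwc": 2.0, "nhwkc": 1.0, "nkchw": 3.0}[perm]

def monolithic_search(joint_model):
    """Baseline: score every (tile, permutation) combination with one joint model."""
    candidates = list(product(TILE_SIZES, PERMUTATIONS))
    best = min(candidates, key=lambda c: joint_model(*c))
    return best, len(candidates)

def composed_prediction():
    """Decomposed idea: choose each transformation with its own model, then compose."""
    best_tile = min(TILE_SIZES, key=tiling_model)
    best_perm = min(PERMUTATIONS, key=permutation_model)
    return (best_tile, best_perm), len(TILE_SIZES) + len(PERMUTATIONS)
```

With these toy models, `monolithic_search(lambda t, p: tiling_model(t) + permutation_model(p))` scores all 12 combinations, while `composed_prediction()` scores 4 + 3 = 7 candidates and reaches the same choice. When transformations interact strongly, the separability assumed here breaks down, so this illustrates the intuition rather than CSP itself.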
Lead professor:

Mary Hall, University of Utah
Mary Hall is a professor in the Computer Science department at the University of Utah. She directs the Compiler Technology to Optimize Performance (CTOP) research group. Her research interests cover automatic performance tuning, model-guided empirical optimization, interprocedural analysis and optimization, parallelizing compilers, programming support for optimization and parallelization, PIM-based architectures, and compiling to FPGA-based systems.
Paper:
A Deep Learning Model for Loop Interchange
Authors:
Lina Mezdour (NYU Abu Dhabi, ESI), Khadidja Kadem (NYU Abu Dhabi, ESI), Massinissa Merouani (NYU Abu Dhabi), Amina Selma Haichour (ESI), Saman Amarasinghe (Massachusetts Institute of Technology), Riyadh Baghdadi (NYU Abu Dhabi)
Abstract:
Loop interchange is an important code optimization that improves data locality and extracts parallelism. While previous research in compilers has tried to automate the selection of which loops to interchange, existing methods have two limitations. First, they use less precise machine models. This is mainly because developing a model to predict whether to interchange two loops is challenging since such a prediction depends on many factors. Second, existing methods scale quadratically with the number of loop levels of a given loop nest. This is mainly because they use a model to evaluate all the possible loop interchanges and pick the best one, which is time- and resource-consuming. In this paper, we propose a novel deep learning model for loop interchange that addresses the previous two limitations. It takes a code representation as input and predicts the best pair of loops to interchange. Compared to the state of the art, it has two main differences: first, it is data-driven, and therefore it is more precise. Second, it requires constant time to predict the best loop interchange. This is in contrast to state-of-the-art deep learning models that are used to evaluate all the loop pairs and then pick the best one. The proposed model is the first deep learning model that requires constant time to predict the best loops to interchange. The model is implemented and evaluated in the Tiramisu compiler, a state-of-the-art polyhedral compiler. We evaluated the proposed model on a benchmark of Tiramisu programs and showed an accuracy of 76.66% for 1-shot and 94% for 2-shot. Experiments show that our model outperforms the cost model currently used by the Tiramisu compiler by 6.66% in terms of 1-shot accuracy and 14% in terms of 2-shot accuracy, while at the same time reducing the total execution time needed for predicting the best pair of loops to interchange.
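The quadratic scaling mentioned above comes from the number of loop pairs in a nest of depth n, which is n(n-1)/2. The Python sketch below shows the evaluate-every-pair strategy that the abstract contrasts with, plus the interchange itself applied to a loop order. `toy_cost` is a hypothetical scoring function for illustration, not Tiramisu's cost model.

```python
from itertools import combinations

def interchange_candidates(depth):
    """All pairs of loop levels (i, j), i < j, that could be swapped in a nest of `depth` loops."""
    return list(combinations(range(depth), 2))

def pick_by_evaluating_all(depth, cost_model):
    """Evaluate-every-pair strategy: one cost-model query per candidate pair.

    The number of queries is depth * (depth - 1) / 2, i.e. quadratic in the nest depth.
    """
    pairs = interchange_candidates(depth)
    best = min(pairs, key=cost_model)
    return best, len(pairs)

def interchange(loop_order, pair):
    """Return the loop order with the two levels in `pair` swapped."""
    i, j = pair
    order = list(loop_order)
    order[i], order[j] = order[j], order[i]
    return tuple(order)

def toy_cost(pair):
    """Hypothetical cost model: pretends swapping more distant levels is cheaper."""
    return -(pair[1] - pair[0])
```

For a depth-4 nest, `pick_by_evaluating_all(4, toy_cost)` issues 6 model queries (a depth-5 nest would need 10), and `interchange(("i", "j", "k"), (1, 2))` turns an i-j-k nest into i-k-j. The model proposed in the paper instead predicts the pair directly from a code representation, so its prediction cost does not grow with the number of candidate pairs.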
Lead professor:

Saman Amarasinghe, MIT
Saman P. Amarasinghe is a Professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology and a member of its Computer Science and Artificial Intelligence Laboratory (CSAIL), where he leads the Commit compiler group. He is a world leader in the field of high-performance domain-specific languages. Prof. Amarasinghe's group developed the Halide, TACO, Simit, StreamIt, StreamJIT, PetaBricks, MILK, Cimple, and GraphIt domain-specific languages and compilers, all of which combine language design and sophisticated compilation techniques to deliver unprecedented performance for targeted application domains such as image processing, stream computations, and graph analytics. Dr. Amarasinghe also pioneered the application of machine learning to compiler optimization, from Meta Optimization in 2003 to the OpenTuner extensible autotuner today.
CC Session: Code Generation & Synthesis
Paper:
Matching linear algebra and tensor code to specialized hardware accelerators
Authors:
Pablo Antonio Martínez (University of Murcia), Jackson Woodruff (University of Edinburgh), Jordi Armengol-Estapé (University of Edinburgh), Gregorio Bernabé (University of Murcia), José Manuel García (University of Murcia), Michael F. P. O'Boyle (University of Edinburgh)
Abstract:
Dedicated tensor accelerators demonstrate the importance of linear algebra in modern applications. Such accelerators have the potential for impressive performance gains, but require programmers to rewrite code using vendor APIs - a barrier to wider scale adoption. Recent work overcomes this by matching and replacing patterns within code, but such approaches are fragile and fail to cope with the diversity of real-world codes.
We develop ATC, a compiler that uses program synthesis to map regions of code to specific APIs. The mapping space that ATC explores is combinatorially large, requiring the development of program classification, dynamic analysis, variable constraint generation and lexical distance matching techniques to make it tractable.
We apply ATC to real-world tensor and linear algebra codes and evaluate them against four state-of-the-art approaches. We accelerate between 2.6x and 7x more programs, leading to over an order of magnitude performance improvement.
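One simple way to check whether a candidate API call can replace a code region is to run both on random inputs and compare the outputs. The sketch below illustrates only that matching step, under stated assumptions: `api_gemm` is a hypothetical stand-in for a vendor GEMM routine, the candidate argument bindings are hand-written, and agreement on random tests is evidence rather than a proof of equivalence. It is not ATC's implementation, which the abstract says additionally relies on program classification, variable constraint generation, and lexical distance matching to keep the search tractable.

```python
import random

def user_matmul(a, b, n):
    """A hand-written triple loop: the kind of code region one would like to offload."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c

def api_gemm(x, y, n):
    """Hypothetical stand-in for a vendor GEMM API: returns the matrix product x @ y."""
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transpose(m):
    return [list(row) for row in zip(*m)]

# Candidate ways to bind the user's arrays to the API's parameters.
BINDINGS = {
    "gemm(a, b)": lambda a, b, n: api_gemm(a, b, n),
    "gemm(b, a)": lambda a, b, n: api_gemm(b, a, n),
    "gemm(a^T, b)": lambda a, b, n: api_gemm(transpose(a), b, n),
}

def io_equivalent(candidate, reference, n=3, trials=20, seed=0):
    """Compare candidate and reference on random integer inputs (testing, not proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        b = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        if candidate(a, b, n) != reference(a, b, n):
            return False
    return True

def matching_bindings():
    """Names of the candidate bindings that behave like the user's code."""
    return [name for name, f in BINDINGS.items() if io_equivalent(f, user_matmul)]
```

`matching_bindings()` keeps only `gemm(a, b)`, since swapping the operands or transposing `a` changes the product for almost all random matrices.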
Lead professor:

Michael F. P. O'Boyle, University of Edinburgh
Michael F. P. O'Boyle is a professor at the University of Edinburgh, holding a Personal Chair in Computer Science, and a member of the Institute for Computing Systems Architecture. His research interests include heterogeneous code discovery and optimization, neural machine translation and neural synthesis, the deep neural network system stack, software-defined hardware, auto-parallelizing compilers, GPGPU multi-core platforms, machine-learning-based optimization, compiler/architecture co-design space exploration, and very high-level programming languages.
CC Session: Optimizations
Paper:
A Hotspot-Driven Semi-Automated Competitive Analysis Framework for Identifying Compiler Key Optimizations
Authors:
Wenlong Mu (East China Normal University), Yilei Zhang (East China Normal University), Bo Huang (East China Normal University), Jianmei Guo (East China Normal University), Shiqiang Cui (Hangzhou Hongjun Microelectronics Technology Co., Ltd.)
Abstract:
High-performance compilers play an important role in improving the run-time performance of a program, and it is hard and time-consuming to identify the key optimizations implemented in a high-performance compiler with traditional program analysis. In this paper, we propose a hotspot-driven semi-automated competitive analysis framework for identifying key optimizations by comparing the hotspot code generated by any two different compilers. Our framework is platform-agnostic and works well on both AArch64 and X64 platforms; it automates hotspot detection and applies dynamic binary instrumentation only to the selected hotspots. With the instrumented instruction characterization information, framework users can analyze the binary code within a much smaller scope to explore practical optimizations implemented in any of the compared compilers. To demonstrate its effectiveness and practicality, we conduct experiments on the SPECspeed 2017 Integer benchmarks (CINT2017) and their binaries generated by the open-source GCC compiler versus the proprietary Huawei BiSheng and Intel ICC compilers on AArch64 and X64 platforms, respectively. Empirical studies show that our method can identify several significant optimizations that have been implemented by proprietary compilers and can also be implemented in open-source compilers. For our industry partner, the identified key optimizations provided valuable guidance for optimizing their GCC-based product compiler, which delivers a 20.83% improvement on SPECrate 2017 Integer on the AArch64 platform.
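The core filtering idea, narrowing the comparison to a few hot functions before looking at instruction-level differences, can be sketched in Python as follows. The profile format (function name mapped to sample count), the 80% coverage threshold, and the instruction categories are assumptions for illustration; the real framework obtains this data through hotspot detection and dynamic binary instrumentation, which are not shown.

```python
def select_hotspots(samples, coverage=0.8):
    """Return the hottest functions, in order, until they cover `coverage` of all samples."""
    total = sum(samples.values())
    hot, acc = [], 0
    for name, count in sorted(samples.items(), key=lambda kv: kv[1], reverse=True):
        if acc >= coverage * total:
            break  # the functions selected so far already cover enough of the profile
        hot.append(name)
        acc += count
    return hot

def compare_instruction_mix(mix_a, mix_b):
    """Per-category difference in dynamic instruction counts (compiler B minus compiler A)."""
    categories = set(mix_a) | set(mix_b)
    return {c: mix_b.get(c, 0) - mix_a.get(c, 0) for c in sorted(categories)}
```

For a toy profile `{"f": 55, "g": 30, "h": 10, "k": 5}`, `select_hotspots` returns `["f", "g"]`, so only those two functions would be instrumented and compared across the two compilers' binaries; large per-category differences then point the analyst at candidate optimizations.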
Paper:
LAGrad: Statically Optimized Differentiable Programming in MLIR
Authors:
Mai Jacob Peng (McGill University), Christophe Dubach (McGill University)
Abstract:
Automatic differentiation (AD) is a central algorithm in deep learning and the emerging field of differentiable programming. However, the performance of AD remains a significant bottleneck in these fields. Training large models requires repeatedly evaluating gradients via AD potentially millions of times. Additionally, the most common form of AD incurs an asymptotically large memory cost relative to the original function being differentiated.
This paper introduces LAGrad, a reverse-mode, source-to-source AD system that leverages high-level information in MLIR to produce efficient differentiated code. LAGrad employs a collection of novel static optimizations that benefit from the semantics of high-level MLIR dialects to exploit the sparsity and structured control flow of generated code.
Using these, LAGrad achieves speedups of up to 2.8x and uses 35x less memory relative to state-of-the-art AD systems on real-world machine learning and computer vision benchmarks.
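As background for the memory-cost point above, here is a minimal tape-style reverse-mode AD in Python. It is a generic dynamic sketch, not LAGrad's approach (LAGrad generates differentiated code statically, source to source, from MLIR); `Var`, `sin`, and `backward` are names chosen for this example. Every intermediate node and its local partial derivatives stay in memory until `backward` runs, which is the kind of overhead the paper's static optimizations target.

```python
import math

class Var:
    """A scalar node in a computation graph for reverse-mode AD."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # sequence of (parent Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value, [(self, other.value), (other, self.value)])

def sin(x):
    return Var(math.sin(x.value), [(x, math.cos(x.value))])

def backward(output):
    """Propagate adjoints from `output` back to every input (reverse topological order)."""
    order, seen = [], set()

    def visit(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        for parent, _ in v.parents:
            visit(parent)
        order.append(v)  # appended only after all of its inputs

    visit(output)
    output.grad = 1.0
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += local * v.grad  # chain rule, accumulated over all uses
```

For example, with `x = Var(2.0)`, `y = Var(3.0)`, and `f = x * y + sin(x)`, calling `backward(f)` sets `x.grad` to 3 + cos(2) and `y.grad` to 2.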
Lead professor:

Christophe Dubach, McGill University
Christophe Dubach is an Associate Professor jointly appointed (as of January 1, 2020) in the Department of Electrical and Computer Engineering (ECE) and the School of Computer Science (CS) at McGill University. Prior to that, he was a Reader (Associate Professor) at the University of Edinburgh in the Institute for Computing Systems Architecture. He received a PhD in Informatics from the University of Edinburgh in 2009 and holds an MSc degree in Computer Science from EPFL. In 2010, he spent one year as a visiting researcher at the IBM Watson Research Center (USA) working on the LiquidMetal project. His current interests include data-parallel language design and implementation, high-level code generation and optimization for parallel hardware (e.g., GPUs and FPGAs), architecture design space exploration, and the use of machine-learning techniques applied to all these topics.
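Source:
https://conf.researchr.org/home/CC-2023
The BiSheng Compiler WeChat official account will continue to follow technical developments at CC 2023 and share more technical highlights with you!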
