Commit 940b668

nikic authored and cuviper committed
[NewPM] Fix MergeFunctions scheduling
MergeFunctions (as well as HotColdSplitting and IROutliner) are incorrectly scheduled under the new pass manager. The code makes it look like they run towards the end of the module optimization pipeline (as they should), while in reality they run at the start. This is because the OptimizePM populated around them is only scheduled later.

I'm fixing this by moving these three passes to after OptimizePM to avoid splitting the function pass pipeline. It doesn't seem important to me that some of the function passes run after these late module passes.

Differential Revision: https://reviews.llvm.org/D115098

(cherry picked from commit ae7f468)
1 parent 4fb9523 commit 940b668

2 files changed: +19 -26 lines changed

llvm/lib/Passes/PassBuilder.cpp

Lines changed: 17 additions & 17 deletions
@@ -1420,23 +1420,6 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
 
   addVectorPasses(Level, OptimizePM, /* IsFullLTO */ false);
 
-  // Split out cold code. Splitting is done late to avoid hiding context from
-  // other optimizations and inadvertently regressing performance. The tradeoff
-  // is that this has a higher code size cost than splitting early.
-  if (EnableHotColdSplit && !LTOPreLink)
-    MPM.addPass(HotColdSplittingPass());
-
-  // Search the code for similar regions of code. If enough similar regions can
-  // be found where extracting the regions into their own function will decrease
-  // the size of the program, we extract the regions, a deduplicate the
-  // structurally similar regions.
-  if (EnableIROutliner)
-    MPM.addPass(IROutlinerPass());
-
-  // Merge functions if requested.
-  if (PTO.MergeFunctions)
-    MPM.addPass(MergeFunctionsPass());
-
   // LoopSink pass sinks instructions hoisted by LICM, which serves as a
   // canonicalization pass that enables other optimizations. As a result,
   // LoopSink pass needs to be a very late IR pass to avoid undoing LICM
@@ -1463,6 +1446,23 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
   for (auto &C : OptimizerLastEPCallbacks)
     C(MPM, Level);
 
+  // Split out cold code. Splitting is done late to avoid hiding context from
+  // other optimizations and inadvertently regressing performance. The tradeoff
+  // is that this has a higher code size cost than splitting early.
+  if (EnableHotColdSplit && !LTOPreLink)
+    MPM.addPass(HotColdSplittingPass());
+
+  // Search the code for similar regions of code. If enough similar regions can
+  // be found where extracting the regions into their own function will decrease
+  // the size of the program, we extract the regions, a deduplicate the
+  // structurally similar regions.
+  if (EnableIROutliner)
+    MPM.addPass(IROutlinerPass());
+
+  // Merge functions if requested.
+  if (PTO.MergeFunctions)
+    MPM.addPass(MergeFunctionsPass());
+
   if (PTO.CallGraphProfile)
     MPM.addPass(CGProfilePass());
 

llvm/test/Transforms/PhaseOrdering/X86/merge-functions.ll

Lines changed: 2 additions & 9 deletions
@@ -90,15 +90,8 @@ bb3: ; preds = %bb1, %bb2
 
 define i1 @test2(i32 %c) {
 ; CHECK-LABEL: @test2(
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[SWITCH_TABLEIDX:%.*]] = add i32 [[C:%.*]], -100
-; CHECK-NEXT:    [[TMP0:%.*]] = icmp ult i32 [[SWITCH_TABLEIDX]], 20
-; CHECK-NEXT:    [[SWITCH_CAST:%.*]] = trunc i32 [[SWITCH_TABLEIDX]] to i20
-; CHECK-NEXT:    [[SWITCH_DOWNSHIFT:%.*]] = lshr i20 -490991, [[SWITCH_CAST]]
-; CHECK-NEXT:    [[TMP1:%.*]] = and i20 [[SWITCH_DOWNSHIFT]], 1
-; CHECK-NEXT:    [[SWITCH_MASKED:%.*]] = icmp ne i20 [[TMP1]], 0
-; CHECK-NEXT:    [[I_0:%.*]] = select i1 [[TMP0]], i1 [[SWITCH_MASKED]], i1 false
-; CHECK-NEXT:    ret i1 [[I_0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = tail call i1 @test1(i32 [[TMP0:%.*]]) #[[ATTR0:[0-9]+]]
+; CHECK-NEXT:    ret i1 [[TMP2]]
 ;
 entry:
   %i = alloca i8, align 1
