Date: Fri, 20 Apr 2018 22:46:22 +0000 (UTC)
From: Brooks Davis <brooks@FreeBSD.org>
To: ports-committers@freebsd.org, svn-ports-all@freebsd.org, svn-ports-head@freebsd.org
Subject: svn commit: r467849 - in head/devel/llvm60: . files files/clang
Message-ID: <201804202246.w3KMkM6J006704@repo.freebsd.org>
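The log below describes the problem this commit works around. The following toy C++ model illustrates it. It is not LLVM code and is not part of this commit. EFLAGS is treated as a plain integer. Only the ZF (bit 6) and IF (bit 9) positions are architectural. All function names are hypothetical, and the comments only indicate which instruction each statement stands in for. The model contrasts two strategies: spilling the whole register with pushf/popf, versus capturing just the needed condition with setCC and re-testing it later. The second is the approach the new X86FlagsCopyLowering pass takes.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of two architectural EFLAGS bits; everything else about the
// CPU is deliberately ignored.
constexpr std::uint32_t kZF = 1u << 6;  // zero flag
constexpr std::uint32_t kIF = 1u << 9;  // interrupt-enable flag

struct Outcome {
  bool zf;                 // the condition a later jCC/cmovCC wants
  bool interruptsEnabled;  // state of IF once the sequence is done
};

// Hypothetical stand-in for code that runs between the save and the
// restore, e.g. a callee that executes `sti` and then does arithmetic
// that clears ZF.
void enableInterruptsAndClobber(std::uint32_t &eflags) {
  eflags |= kIF;   // sti
  eflags &= ~kZF;  // unrelated arithmetic overwrites ZF
}

// Old lowering: spill the whole register (pushf) and reload it (popf).
Outcome withPushfPopf() {
  std::uint32_t eflags = kZF;     // cmp set ZF; IF clear (interrupts off)
  std::uint32_t saved = eflags;   // pushf
  enableInterruptsAndClobber(eflags);
  eflags = saved;                 // popf: restores ZF *and* the stale IF
  return {(eflags & kZF) != 0, (eflags & kIF) != 0};
}

// setCC: materialize a single status bit into a byte-sized GPR.
std::uint8_t setcc(std::uint32_t eflags, std::uint32_t flag) {
  assert(flag != 0 && (flag & (flag - 1)) == 0 && "expects one flag bit");
  return (eflags & flag) ? 1 : 0;
}

// New lowering: keep only the needed condition in a GPR, then test it.
Outcome withSetccTest() {
  std::uint32_t eflags = kZF;               // cmp set ZF; IF clear
  std::uint8_t zfReg = setcc(eflags, kZF);  // sete where the flags are live
  enableInterruptsAndClobber(eflags);
  bool zf = zfReg != 0;                     // test the GPR, feed jne/cmovne
  return {zf, (eflags & kIF) != 0};
}
```

Under these assumptions, `withPushfPopf()` yields `zf == true` but `interruptsEnabled == false`. The condition survives, but the stale IF bit silently turns interrupts back off. `withSetccTest()` yields `true` for both fields, because only ZF was ever saved and IF is left alone.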
Author: brooks
Date: Fri Apr 20 22:46:22 2018
New Revision: 467849
URL: https://svnweb.freebsd.org/changeset/ports/467849

Log:
  Merge r332833 from FreeBSD HEAD.

  This should ensure clang does not use pushf/popf sequences to save and
  restore flags, avoiding problems with unrelated flags (such as the
  interrupt flag) being restored unexpectedly.

  PR: 225330

Added:
  head/devel/llvm60/files/clang/patch-fsvn-r332833-clang (contents, props changed)
  head/devel/llvm60/files/patch-fsvn-r332833 (contents, props changed)
Modified:
  head/devel/llvm60/Makefile

Modified: head/devel/llvm60/Makefile
==============================================================================
--- head/devel/llvm60/Makefile Fri Apr 20 21:50:41 2018 (r467848)
+++ head/devel/llvm60/Makefile Fri Apr 20 22:46:22 2018 (r467849)
@@ -2,7 +2,7 @@
 
 PORTNAME= llvm
 DISTVERSION= 6.0.0
-PORTREVISION= 1
+PORTREVISION= 2
 CATEGORIES= devel lang
 MASTER_SITES= http://${PRE_}releases.llvm.org/${LLVM_RELEASE}/${RCDIR}
 PKGNAMESUFFIX= ${LLVM_SUFFIX}

Added: head/devel/llvm60/files/clang/patch-fsvn-r332833-clang
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/devel/llvm60/files/clang/patch-fsvn-r332833-clang Fri Apr 20 22:46:22 2018 (r467849)
@@ -0,0 +1,258 @@
+commit f13397cb22ae77e9b18e29273e2920bd63c17ef1
+Author: dim <dim@FreeBSD.org>
+Date: Fri Apr 20 18:20:55 2018 +0000
+ + Recommit r332501, with an additional upstream fix for "Cannot lower + EFLAGS copy that lives out of a basic block!" errors on i386. + + Pull in r325446 from upstream clang trunk (by me): + + [X86] Add 'sahf' CPU feature to frontend + + Summary: + Make clang accept `-msahf` (and `-mno-sahf`) flags to activate the + `+sahf` feature for the backend, for bug 36028 (Incorrect use of + pushf/popf enables/disables interrupts on amd64 kernels). This was + originally submitted in bug 36037 by Jonathan Looney + <jonlooney@gmail.com>.
+ + As described there, GCC also uses `-msahf` for this feature, and the + backend already recognizes the `+sahf` feature. All that is needed is + to teach clang to pass this on to the backend. + + The mapping of feature support onto CPUs may not be complete; rather, + it was chosen to match LLVM's idea of which CPUs support this feature + (see lib/Target/X86/X86.td). + + I also updated the affected test case (CodeGen/attr-target-x86.c) to + match the emitted output. + + Reviewers: craig.topper, coby, efriedma, rsmith + + Reviewed By: craig.topper + + Subscribers: emaste, cfe-commits + + Differential Revision: https://reviews.llvm.org/D43394 + + Pull in r328944 from upstream llvm trunk (by Chandler Carruth): + + [x86] Expose more of the condition conversion routines in the public + API for X86's instruction information. I've now got a second patch + under review that needs these same APIs. This bit is nicely + orthogonal and obvious, so landing it. NFC. + + Pull in r329414 from upstream llvm trunk (by Craig Topper): + + [X86] Merge itineraries for CLC, CMC, and STC. + + These are very simple flag setting instructions that appear to only + be a single uop. They're unlikely to need this separation. + + Pull in r329657 from upstream llvm trunk (by Chandler Carruth): + + [x86] Introduce a pass to begin more systematically fixing PR36028 + and similar issues. + + The key idea is to lower COPY nodes populating EFLAGS by scanning the + uses of EFLAGS and introducing dedicated code to preserve the + necessary state in a GPR. In the vast majority of cases, these uses + are cmovCC and jCC instructions. For such cases, we can very easily + save and restore the necessary information by simply inserting a + setCC into a GPR where the original flags are live, and then testing + that GPR directly to feed the cmov or conditional branch. + + However, things are a bit more tricky if arithmetic is using the + flags. 
+ This patch handles the vast majority of cases that seem to + come up in practice: adc, adcx, adox, rcl, and rcr; all without + taking advantage of partially preserved EFLAGS as LLVM doesn't + currently model that at all. + + There are a large number of operations that technically observe + EFLAGS currently but shouldn't in this case -- they typically are + using DF. Currently, they will not be handled by this approach. + However, I have never seen this issue come up in practice. It is + already pretty rare to have these patterns come up in practical code + with LLVM. I had to resort to writing MIR tests to cover most of the + logic in this pass already. I suspect even with its current amount + of coverage of arithmetic users of EFLAGS it will be a significant + improvement over the current use of pushf/popf. It will also produce + substantially faster code in most of the common patterns. + + This patch also removes all of the old lowering for EFLAGS copies, + and the hack that forced us to use a frame pointer when EFLAGS copies + were found anywhere in a function so that the dynamic stack + adjustment wasn't a problem. None of this is needed as we now lower + all of these copies directly in MI and without requiring stack + adjustments. + + Lots of thanks to Reid who came up with several aspects of this + approach, and Craig who helped me work out a couple of things + tripping me up while working on this. + + Differential Revision: https://reviews.llvm.org/D45146 + + Pull in r329673 from upstream llvm trunk (by Chandler Carruth): + + [x86] Model the direction flag (DF) separately from the rest of + EFLAGS. + + This cleans up a number of operations that only claimed to use EFLAGS + due to using DF. But no instructions which we think of as setting + EFLAGS actually modify DF (other than things like popf) and so this + needlessly creates uses of EFLAGS that aren't really there. + + In fact, DF is so restrictive it is pretty easy to model.
+ Only STD, + CLD, and the whole-flags writes (WRFLAGS and POPF) need to model + this. + + I've also somewhat cleaned up some of the flag management instruction + definitions to be in the correct .td file. + + Adding this extra register also uncovered a failure to use the + correct datatype to hold X86 registers, and I've corrected that as + necessary here. + + Differential Revision: https://reviews.llvm.org/D45154 + + Pull in r330264 from upstream llvm trunk (by Chandler Carruth): + + [x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite + uses across basic blocks in the limited cases where it is very + straightforward to do so. + + This will also be useful for other places where we do some limited + EFLAGS propagation across CFG edges and need to handle copy rewrites + afterward. I think this is rapidly approaching the maximum we can and + should be doing here. Everything else begins to require either heroic + analysis to prove how to do PHI insertion manually, or somehow + managing arbitrary PHI-ing of EFLAGS with general PHI insertion. + Neither of these seems at all promising so if those cases come up, + we'll almost certainly need to rewrite the parts of LLVM that produce + those patterns. + + We do now require dominator trees in order to reliably diagnose + patterns that would require PHI nodes. This is a bit unfortunate but + it seems better than the completely mysterious crash we would get + otherwise. + + Differential Revision: https://reviews.llvm.org/D45673 + + Together, these should ensure clang does not use pushf/popf sequences to + save and restore flags, avoiding problems with unrelated flags (such as + the interrupt flag) being restored unexpectedly.
+ + Requested by: jtl + PR: 225330 + MFC after: 1 week + +diff --git llvm/tools/clang/include/clang/Driver/Options.td llvm/tools/clang/include/clang/Driver/Options.td +index ad72aef3fc9..cab450042e6 100644 +--- tools/clang/include/clang/Driver/Options.td ++++ tools/clang/include/clang/Driver/Options.td +@@ -2559,6 +2559,8 @@ def mrtm : Flag<["-"], "mrtm">, Group<m_x86_Features_Group>; + def mno_rtm : Flag<["-"], "mno-rtm">, Group<m_x86_Features_Group>; + def mrdseed : Flag<["-"], "mrdseed">, Group<m_x86_Features_Group>; + def mno_rdseed : Flag<["-"], "mno-rdseed">, Group<m_x86_Features_Group>; ++def msahf : Flag<["-"], "msahf">, Group<m_x86_Features_Group>; ++def mno_sahf : Flag<["-"], "mno-sahf">, Group<m_x86_Features_Group>; + def msgx : Flag<["-"], "msgx">, Group<m_x86_Features_Group>; + def mno_sgx : Flag<["-"], "mno-sgx">, Group<m_x86_Features_Group>; + def msha : Flag<["-"], "msha">, Group<m_x86_Features_Group>; +diff --git llvm/tools/clang/lib/Basic/Targets/X86.cpp llvm/tools/clang/lib/Basic/Targets/X86.cpp +index cfa6c571d6e..8251e6abd64 100644 +--- tools/clang/lib/Basic/Targets/X86.cpp ++++ tools/clang/lib/Basic/Targets/X86.cpp +@@ -198,6 +198,7 @@ bool X86TargetInfo::initFeatureMap( + LLVM_FALLTHROUGH; + case CK_Core2: + setFeatureEnabledImpl(Features, "ssse3", true); ++ setFeatureEnabledImpl(Features, "sahf", true); + LLVM_FALLTHROUGH; + case CK_Yonah: + case CK_Prescott: +@@ -239,6 +240,7 @@ bool X86TargetInfo::initFeatureMap( + setFeatureEnabledImpl(Features, "ssse3", true); + setFeatureEnabledImpl(Features, "fxsr", true); + setFeatureEnabledImpl(Features, "cx16", true); ++ setFeatureEnabledImpl(Features, "sahf", true); + break; + + case CK_KNM: +@@ -269,6 +271,7 @@ bool X86TargetInfo::initFeatureMap( + setFeatureEnabledImpl(Features, "xsaveopt", true); + setFeatureEnabledImpl(Features, "xsave", true); + setFeatureEnabledImpl(Features, "movbe", true); ++ setFeatureEnabledImpl(Features, "sahf", true); + break; + + case CK_K6_2: +@@ -282,6 +285,7 @@ bool 
X86TargetInfo::initFeatureMap( + setFeatureEnabledImpl(Features, "sse4a", true); + setFeatureEnabledImpl(Features, "lzcnt", true); + setFeatureEnabledImpl(Features, "popcnt", true); ++ setFeatureEnabledImpl(Features, "sahf", true); + LLVM_FALLTHROUGH; + case CK_K8SSE3: + setFeatureEnabledImpl(Features, "sse3", true); +@@ -315,6 +319,7 @@ bool X86TargetInfo::initFeatureMap( + setFeatureEnabledImpl(Features, "prfchw", true); + setFeatureEnabledImpl(Features, "cx16", true); + setFeatureEnabledImpl(Features, "fxsr", true); ++ setFeatureEnabledImpl(Features, "sahf", true); + break; + + case CK_ZNVER1: +@@ -338,6 +343,7 @@ bool X86TargetInfo::initFeatureMap( + setFeatureEnabledImpl(Features, "prfchw", true); + setFeatureEnabledImpl(Features, "rdrnd", true); + setFeatureEnabledImpl(Features, "rdseed", true); ++ setFeatureEnabledImpl(Features, "sahf", true); + setFeatureEnabledImpl(Features, "sha", true); + setFeatureEnabledImpl(Features, "sse4a", true); + setFeatureEnabledImpl(Features, "xsave", true); +@@ -372,6 +378,7 @@ bool X86TargetInfo::initFeatureMap( + setFeatureEnabledImpl(Features, "cx16", true); + setFeatureEnabledImpl(Features, "fxsr", true); + setFeatureEnabledImpl(Features, "xsave", true); ++ setFeatureEnabledImpl(Features, "sahf", true); + break; + } + if (!TargetInfo::initFeatureMap(Features, Diags, CPU, FeaturesVec)) +@@ -768,6 +775,8 @@ bool X86TargetInfo::handleTargetFeatures(std::vector<std::string> &Features, + HasRetpoline = true; + } else if (Feature == "+retpoline-external-thunk") { + HasRetpolineExternalThunk = true; ++ } else if (Feature == "+sahf") { ++ HasLAHFSAHF = true; + } + + X86SSEEnum Level = llvm::StringSwitch<X86SSEEnum>(Feature) +@@ -1240,6 +1249,7 @@ bool X86TargetInfo::isValidFeatureName(StringRef Name) const { + .Case("rdrnd", true) + .Case("rdseed", true) + .Case("rtm", true) ++ .Case("sahf", true) + .Case("sgx", true) + .Case("sha", true) + .Case("shstk", true) +@@ -1313,6 +1323,7 @@ bool X86TargetInfo::hasFeature(StringRef 
Feature) const { + .Case("retpoline", HasRetpoline) + .Case("retpoline-external-thunk", HasRetpolineExternalThunk) + .Case("rtm", HasRTM) ++ .Case("sahf", HasLAHFSAHF) + .Case("sgx", HasSGX) + .Case("sha", HasSHA) + .Case("shstk", HasSHSTK) +diff --git llvm/tools/clang/lib/Basic/Targets/X86.h llvm/tools/clang/lib/Basic/Targets/X86.h +index 590531c1785..fa2fbee387b 100644 +--- tools/clang/lib/Basic/Targets/X86.h ++++ tools/clang/lib/Basic/Targets/X86.h +@@ -98,6 +98,7 @@ class LLVM_LIBRARY_VISIBILITY X86TargetInfo : public TargetInfo { + bool HasPREFETCHWT1 = false; + bool HasRetpoline = false; + bool HasRetpolineExternalThunk = false; ++ bool HasLAHFSAHF = false; + + /// \brief Enumeration of all of the X86 CPUs supported by Clang. + /// Added: head/devel/llvm60/files/patch-fsvn-r332833 ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/devel/llvm60/files/patch-fsvn-r332833 Fri Apr 20 22:46:22 2018 (r467849) @@ -0,0 +1,1623 @@ +commit f13397cb22ae77e9b18e29273e2920bd63c17ef1 +Author: dim <dim@FreeBSD.org> +Date: Fri Apr 20 18:20:55 2018 +0000 + + Recommit r332501, with an additional upstream fix for "Cannot lower + EFLAGS copy that lives out of a basic block!" errors on i386. + + Pull in r325446 from upstream clang trunk (by me): + + [X86] Add 'sahf' CPU feature to frontend + + Summary: + Make clang accept `-msahf` (and `-mno-sahf`) flags to activate the + `+sahf` feature for the backend, for bug 36028 (Incorrect use of + pushf/popf enables/disables interrupts on amd64 kernels). This was + originally submitted in bug 36037 by Jonathan Looney + <jonlooney@gmail.com>. + + As described there, GCC also uses `-msahf` for this feature, and the + backend already recognizes the `+sahf` feature. All that is needed is + to teach clang to pass this on to the backend. 
+ + The mapping of feature support onto CPUs may not be complete; rather, + it was chosen to match LLVM's idea of which CPUs support this feature + (see lib/Target/X86/X86.td). + + I also updated the affected test case (CodeGen/attr-target-x86.c) to + match the emitted output. + + Reviewers: craig.topper, coby, efriedma, rsmith + + Reviewed By: craig.topper + + Subscribers: emaste, cfe-commits + + Differential Revision: https://reviews.llvm.org/D43394 + + Pull in r328944 from upstream llvm trunk (by Chandler Carruth): + + [x86] Expose more of the condition conversion routines in the public + API for X86's instruction information. I've now got a second patch + under review that needs these same APIs. This bit is nicely + orthogonal and obvious, so landing it. NFC. + + Pull in r329414 from upstream llvm trunk (by Craig Topper): + + [X86] Merge itineraries for CLC, CMC, and STC. + + These are very simple flag setting instructions that appear to only + be a single uop. They're unlikely to need this separation. + + Pull in r329657 from upstream llvm trunk (by Chandler Carruth): + + [x86] Introduce a pass to begin more systematically fixing PR36028 + and similar issues. + + The key idea is to lower COPY nodes populating EFLAGS by scanning the + uses of EFLAGS and introducing dedicated code to preserve the + necessary state in a GPR. In the vast majority of cases, these uses + are cmovCC and jCC instructions. For such cases, we can very easily + save and restore the necessary information by simply inserting a + setCC into a GPR where the original flags are live, and then testing + that GPR directly to feed the cmov or conditional branch. + + However, things are a bit more tricky if arithmetic is using the + flags. This patch handles the vast majority of cases that seem to + come up in practice: adc, adcx, adox, rcl, and rcr; all without + taking advantage of partially preserved EFLAGS as LLVM doesn't + currently model that at all. 
+ + There are a large number of operations that technically observe + EFLAGS currently but shouldn't in this case -- they typically are + using DF. Currently, they will not be handled by this approach. + However, I have never seen this issue come up in practice. It is + already pretty rare to have these patterns come up in practical code + with LLVM. I had to resort to writing MIR tests to cover most of the + logic in this pass already. I suspect even with its current amount + of coverage of arithmetic users of EFLAGS it will be a significant + improvement over the current use of pushf/popf. It will also produce + substantially faster code in most of the common patterns. + + This patch also removes all of the old lowering for EFLAGS copies, + and the hack that forced us to use a frame pointer when EFLAGS copies + were found anywhere in a function so that the dynamic stack + adjustment wasn't a problem. None of this is needed as we now lower + all of these copies directly in MI and without requiring stack + adjustments. + + Lots of thanks to Reid who came up with several aspects of this + approach, and Craig who helped me work out a couple of things + tripping me up while working on this. + + Differential Revision: https://reviews.llvm.org/D45146 + + Pull in r329673 from upstream llvm trunk (by Chandler Carruth): + + [x86] Model the direction flag (DF) separately from the rest of + EFLAGS. + + This cleans up a number of operations that only claimed to use EFLAGS + due to using DF. But no instructions which we think of as setting + EFLAGS actually modify DF (other than things like popf) and so this + needlessly creates uses of EFLAGS that aren't really there. + + In fact, DF is so restrictive it is pretty easy to model. Only STD, + CLD, and the whole-flags writes (WRFLAGS and POPF) need to model + this. + + I've also somewhat cleaned up some of the flag management instruction + definitions to be in the correct .td file.
+ + Adding this extra register also uncovered a failure to use the + correct datatype to hold X86 registers, and I've corrected that as + necessary here. + + Differential Revision: https://reviews.llvm.org/D45154 + + Pull in r330264 from upstream llvm trunk (by Chandler Carruth): + + [x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite + uses across basic blocks in the limited cases where it is very + straightforward to do so. + + This will also be useful for other places where we do some limited + EFLAGS propagation across CFG edges and need to handle copy rewrites + afterward. I think this is rapidly approaching the maximum we can and + should be doing here. Everything else begins to require either heroic + analysis to prove how to do PHI insertion manually, or somehow + managing arbitrary PHI-ing of EFLAGS with general PHI insertion. + Neither of these seems at all promising so if those cases come up, + we'll almost certainly need to rewrite the parts of LLVM that produce + those patterns. + + We do now require dominator trees in order to reliably diagnose + patterns that would require PHI nodes. This is a bit unfortunate but + it seems better than the completely mysterious crash we would get + otherwise. + + Differential Revision: https://reviews.llvm.org/D45673 + + Together, these should ensure clang does not use pushf/popf sequences to + save and restore flags, avoiding problems with unrelated flags (such as + the interrupt flag) being restored unexpectedly. + + Requested by: jtl + PR: 225330 + MFC after: 1 week + +diff --git llvm/include/llvm/CodeGen/MachineBasicBlock.h b/contrib/llvm/include/llvm/CodeGen/MachineBasicBlock.h +index 0c9110cbaa8..89210e16629 100644 +--- include/llvm/CodeGen/MachineBasicBlock.h ++++ include/llvm/CodeGen/MachineBasicBlock.h +@@ -449,6 +449,13 @@ class MachineBasicBlock + /// Replace successor OLD with NEW and update probability info.
+ void replaceSuccessor(MachineBasicBlock *Old, MachineBasicBlock *New); + ++ /// Copy a successor (and any probability info) from original block to this ++ /// block's. Uses an iterator into the original blocks successors. ++ /// ++ /// This is useful when doing a partial clone of successors. Afterward, the ++ /// probabilities may need to be normalized. ++ void copySuccessor(MachineBasicBlock *Orig, succ_iterator I); ++ + /// Transfers all the successors from MBB to this machine basic block (i.e., + /// copies all the successors FromMBB and remove all the successors from + /// FromMBB). +diff --git llvm/lib/CodeGen/MachineBasicBlock.cpp b/contrib/llvm/lib/CodeGen/MachineBasicBlock.cpp +index 209abf34d88..cd67449e3ac 100644 +--- lib/CodeGen/MachineBasicBlock.cpp ++++ lib/CodeGen/MachineBasicBlock.cpp +@@ -646,6 +646,14 @@ void MachineBasicBlock::replaceSuccessor(MachineBasicBlock *Old, + removeSuccessor(OldI); + } + ++void MachineBasicBlock::copySuccessor(MachineBasicBlock *Orig, ++ succ_iterator I) { ++ if (Orig->Probs.empty()) ++ addSuccessor(*I, Orig->getSuccProbability(I)); ++ else ++ addSuccessorWithoutProb(*I); ++} ++ + void MachineBasicBlock::addPredecessor(MachineBasicBlock *Pred) { + Predecessors.push_back(Pred); + } +diff --git llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp b/contrib/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp +index c58254ae38c..b3c491b3de5 100644 +--- lib/Target/X86/Disassembler/X86Disassembler.cpp ++++ lib/Target/X86/Disassembler/X86Disassembler.cpp +@@ -265,13 +265,10 @@ MCDisassembler::DecodeStatus X86GenericDisassembler::getInstruction( + /// @param reg - The Reg to append. 
+ static void translateRegister(MCInst &mcInst, Reg reg) { + #define ENTRY(x) X86::x, +- uint8_t llvmRegnums[] = { +- ALL_REGS +- 0 +- }; ++ static constexpr MCPhysReg llvmRegnums[] = {ALL_REGS}; + #undef ENTRY + +- uint8_t llvmRegnum = llvmRegnums[reg]; ++ MCPhysReg llvmRegnum = llvmRegnums[reg]; + mcInst.addOperand(MCOperand::createReg(llvmRegnum)); + } + +diff --git llvm/lib/Target/X86/X86.h b/contrib/llvm/lib/Target/X86/X86.h +index 36132682429..642dda8f422 100644 +--- lib/Target/X86/X86.h ++++ lib/Target/X86/X86.h +@@ -66,6 +66,9 @@ FunctionPass *createX86OptimizeLEAs(); + /// Return a pass that transforms setcc + movzx pairs into xor + setcc. + FunctionPass *createX86FixupSetCC(); + ++/// Return a pass that lowers EFLAGS copy pseudo instructions. ++FunctionPass *createX86FlagsCopyLoweringPass(); ++ + /// Return a pass that expands WinAlloca pseudo-instructions. + FunctionPass *createX86WinAllocaExpander(); + +diff --git llvm/lib/Target/X86/X86FlagsCopyLowering.cpp b/contrib/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp +new file mode 100644 +index 00000000000..1b6369b7bfd +--- /dev/null ++++ lib/Target/X86/X86FlagsCopyLowering.cpp +@@ -0,0 +1,777 @@ ++//====- X86FlagsCopyLowering.cpp - Lowers COPY nodes of EFLAGS ------------===// ++// ++// The LLVM Compiler Infrastructure ++// ++// This file is distributed under the University of Illinois Open Source ++// License. See LICENSE.TXT for details. ++// ++//===----------------------------------------------------------------------===// ++/// \file ++/// ++/// Lowers COPY nodes of EFLAGS by directly extracting and preserving individual ++/// flag bits. ++/// ++/// We have to do this by carefully analyzing and rewriting the usage of the ++/// copied EFLAGS register because there is no general way to rematerialize the ++/// entire EFLAGS register safely and efficiently. 
Using `popf` both forces ++/// dynamic stack adjustment and can create correctness issues due to IF, TF, ++/// and other non-status flags being overwritten. Using sequences involving ++/// SAHF don't work on all x86 processors and are often quite slow compared to ++/// directly testing a single status preserved in its own GPR. ++/// ++//===----------------------------------------------------------------------===// ++ ++#include "X86.h" ++#include "X86InstrBuilder.h" ++#include "X86InstrInfo.h" ++#include "X86Subtarget.h" ++#include "llvm/ADT/ArrayRef.h" ++#include "llvm/ADT/DenseMap.h" ++#include "llvm/ADT/STLExtras.h" ++#include "llvm/ADT/ScopeExit.h" ++#include "llvm/ADT/SmallPtrSet.h" ++#include "llvm/ADT/SmallSet.h" ++#include "llvm/ADT/SmallVector.h" ++#include "llvm/ADT/SparseBitVector.h" ++#include "llvm/ADT/Statistic.h" ++#include "llvm/CodeGen/MachineBasicBlock.h" ++#include "llvm/CodeGen/MachineConstantPool.h" ++#include "llvm/CodeGen/MachineDominators.h" ++#include "llvm/CodeGen/MachineFunction.h" ++#include "llvm/CodeGen/MachineFunctionPass.h" ++#include "llvm/CodeGen/MachineInstr.h" ++#include "llvm/CodeGen/MachineInstrBuilder.h" ++#include "llvm/CodeGen/MachineModuleInfo.h" ++#include "llvm/CodeGen/MachineOperand.h" ++#include "llvm/CodeGen/MachineRegisterInfo.h" ++#include "llvm/CodeGen/MachineSSAUpdater.h" ++#include "llvm/CodeGen/TargetInstrInfo.h" ++#include "llvm/CodeGen/TargetRegisterInfo.h" ++#include "llvm/CodeGen/TargetSchedule.h" ++#include "llvm/CodeGen/TargetSubtargetInfo.h" ++#include "llvm/IR/DebugLoc.h" ++#include "llvm/MC/MCSchedule.h" ++#include "llvm/Pass.h" ++#include "llvm/Support/CommandLine.h" ++#include "llvm/Support/Debug.h" ++#include "llvm/Support/raw_ostream.h" ++#include <algorithm> ++#include <cassert> ++#include <iterator> ++#include <utility> ++ ++using namespace llvm; ++ ++#define PASS_KEY "x86-flags-copy-lowering" ++#define DEBUG_TYPE PASS_KEY ++ ++STATISTIC(NumCopiesEliminated, "Number of copies of EFLAGS 
eliminated"); ++STATISTIC(NumSetCCsInserted, "Number of setCC instructions inserted"); ++STATISTIC(NumTestsInserted, "Number of test instructions inserted"); ++STATISTIC(NumAddsInserted, "Number of adds instructions inserted"); ++ ++namespace llvm { ++ ++void initializeX86FlagsCopyLoweringPassPass(PassRegistry &); ++ ++} // end namespace llvm ++ ++namespace { ++ ++// Convenient array type for storing registers associated with each condition. ++using CondRegArray = std::array<unsigned, X86::LAST_VALID_COND + 1>; ++ ++class X86FlagsCopyLoweringPass : public MachineFunctionPass { ++public: ++ X86FlagsCopyLoweringPass() : MachineFunctionPass(ID) { ++ initializeX86FlagsCopyLoweringPassPass(*PassRegistry::getPassRegistry()); ++ } ++ ++ StringRef getPassName() const override { return "X86 EFLAGS copy lowering"; } ++ bool runOnMachineFunction(MachineFunction &MF) override; ++ void getAnalysisUsage(AnalysisUsage &AU) const override; ++ ++ /// Pass identification, replacement for typeid. ++ static char ID; ++ ++private: ++ MachineRegisterInfo *MRI; ++ const X86InstrInfo *TII; ++ const TargetRegisterInfo *TRI; ++ const TargetRegisterClass *PromoteRC; ++ MachineDominatorTree *MDT; ++ ++ CondRegArray collectCondsInRegs(MachineBasicBlock &MBB, ++ MachineInstr &CopyDefI); ++ ++ unsigned promoteCondToReg(MachineBasicBlock &MBB, ++ MachineBasicBlock::iterator TestPos, ++ DebugLoc TestLoc, X86::CondCode Cond); ++ std::pair<unsigned, bool> ++ getCondOrInverseInReg(MachineBasicBlock &TestMBB, ++ MachineBasicBlock::iterator TestPos, DebugLoc TestLoc, ++ X86::CondCode Cond, CondRegArray &CondRegs); ++ void insertTest(MachineBasicBlock &MBB, MachineBasicBlock::iterator Pos, ++ DebugLoc Loc, unsigned Reg); ++ ++ void rewriteArithmetic(MachineBasicBlock &TestMBB, ++ MachineBasicBlock::iterator TestPos, DebugLoc TestLoc, ++ MachineInstr &MI, MachineOperand &FlagUse, ++ CondRegArray &CondRegs); ++ void rewriteCMov(MachineBasicBlock &TestMBB, ++ MachineBasicBlock::iterator TestPos, DebugLoc 
TestLoc, ++ MachineInstr &CMovI, MachineOperand &FlagUse, ++ CondRegArray &CondRegs); ++ void rewriteCondJmp(MachineBasicBlock &TestMBB, ++ MachineBasicBlock::iterator TestPos, DebugLoc TestLoc, ++ MachineInstr &JmpI, CondRegArray &CondRegs); ++ void rewriteCopy(MachineInstr &MI, MachineOperand &FlagUse, ++ MachineInstr &CopyDefI); ++ void rewriteSetCC(MachineBasicBlock &TestMBB, ++ MachineBasicBlock::iterator TestPos, DebugLoc TestLoc, ++ MachineInstr &SetCCI, MachineOperand &FlagUse, ++ CondRegArray &CondRegs); ++}; ++ ++} // end anonymous namespace ++ ++INITIALIZE_PASS_BEGIN(X86FlagsCopyLoweringPass, DEBUG_TYPE, ++ "X86 EFLAGS copy lowering", false, false) ++INITIALIZE_PASS_END(X86FlagsCopyLoweringPass, DEBUG_TYPE, ++ "X86 EFLAGS copy lowering", false, false) ++ ++FunctionPass *llvm::createX86FlagsCopyLoweringPass() { ++ return new X86FlagsCopyLoweringPass(); ++} ++ ++char X86FlagsCopyLoweringPass::ID = 0; ++ ++void X86FlagsCopyLoweringPass::getAnalysisUsage(AnalysisUsage &AU) const { ++ AU.addRequired<MachineDominatorTree>(); ++ MachineFunctionPass::getAnalysisUsage(AU); ++} ++ ++namespace { ++/// An enumeration of the arithmetic instruction mnemonics which have ++/// interesting flag semantics. ++/// ++/// We can map instruction opcodes into these mnemonics to make it easy to ++/// dispatch with specific functionality. 
++enum class FlagArithMnemonic { ++ ADC, ++ ADCX, ++ ADOX, ++ RCL, ++ RCR, ++ SBB, ++}; ++} // namespace ++ ++static FlagArithMnemonic getMnemonicFromOpcode(unsigned Opcode) { ++ switch (Opcode) { ++ default: ++ report_fatal_error("No support for lowering a copy into EFLAGS when used " ++ "by this instruction!"); ++ ++#define LLVM_EXPAND_INSTR_SIZES(MNEMONIC, SUFFIX) \ ++ case X86::MNEMONIC##8##SUFFIX: \ ++ case X86::MNEMONIC##16##SUFFIX: \ ++ case X86::MNEMONIC##32##SUFFIX: \ ++ case X86::MNEMONIC##64##SUFFIX: ++ ++#define LLVM_EXPAND_ADC_SBB_INSTR(MNEMONIC) \ ++ LLVM_EXPAND_INSTR_SIZES(MNEMONIC, rr) \ ++ LLVM_EXPAND_INSTR_SIZES(MNEMONIC, rr_REV) \ ++ LLVM_EXPAND_INSTR_SIZES(MNEMONIC, rm) \ ++ LLVM_EXPAND_INSTR_SIZES(MNEMONIC, mr) \ ++ case X86::MNEMONIC##8ri: \ ++ case X86::MNEMONIC##16ri8: \ ++ case X86::MNEMONIC##32ri8: \ ++ case X86::MNEMONIC##64ri8: \ ++ case X86::MNEMONIC##16ri: \ ++ case X86::MNEMONIC##32ri: \ ++ case X86::MNEMONIC##64ri32: \ ++ case X86::MNEMONIC##8mi: \ ++ case X86::MNEMONIC##16mi8: \ ++ case X86::MNEMONIC##32mi8: \ ++ case X86::MNEMONIC##64mi8: \ ++ case X86::MNEMONIC##16mi: \ ++ case X86::MNEMONIC##32mi: \ ++ case X86::MNEMONIC##64mi32: \ ++ case X86::MNEMONIC##8i8: \ ++ case X86::MNEMONIC##16i16: \ ++ case X86::MNEMONIC##32i32: \ ++ case X86::MNEMONIC##64i32: ++ ++ LLVM_EXPAND_ADC_SBB_INSTR(ADC) ++ return FlagArithMnemonic::ADC; ++ ++ LLVM_EXPAND_ADC_SBB_INSTR(SBB) ++ return FlagArithMnemonic::SBB; ++ ++#undef LLVM_EXPAND_ADC_SBB_INSTR ++ ++ LLVM_EXPAND_INSTR_SIZES(RCL, rCL) ++ LLVM_EXPAND_INSTR_SIZES(RCL, r1) ++ LLVM_EXPAND_INSTR_SIZES(RCL, ri) ++ return FlagArithMnemonic::RCL; ++ ++ LLVM_EXPAND_INSTR_SIZES(RCR, rCL) ++ LLVM_EXPAND_INSTR_SIZES(RCR, r1) ++ LLVM_EXPAND_INSTR_SIZES(RCR, ri) ++ return FlagArithMnemonic::RCR; ++ ++#undef LLVM_EXPAND_INSTR_SIZES ++ ++ case X86::ADCX32rr: ++ case X86::ADCX64rr: ++ case X86::ADCX32rm: ++ case X86::ADCX64rm: ++ return FlagArithMnemonic::ADCX; ++ ++ case X86::ADOX32rr: ++ case X86::ADOX64rr: ++ 
case X86::ADOX32rm: ++ case X86::ADOX64rm: ++ return FlagArithMnemonic::ADOX; ++ } ++} ++ ++static MachineBasicBlock &splitBlock(MachineBasicBlock &MBB, ++ MachineInstr &SplitI, ++ const X86InstrInfo &TII) { ++ MachineFunction &MF = *MBB.getParent(); ++ ++ assert(SplitI.getParent() == &MBB && ++ "Split instruction must be in the split block!"); ++ assert(SplitI.isBranch() && ++ "Only designed to split a tail of branch instructions!"); ++ assert(X86::getCondFromBranchOpc(SplitI.getOpcode()) != X86::COND_INVALID && ++ "Must split on an actual jCC instruction!"); ++ ++ // Dig out the previous instruction to the split point. ++ MachineInstr &PrevI = *std::prev(SplitI.getIterator()); ++ assert(PrevI.isBranch() && "Must split after a branch!"); ++ assert(X86::getCondFromBranchOpc(PrevI.getOpcode()) != X86::COND_INVALID && ++ "Must split after an actual jCC instruction!"); ++ assert(!std::prev(PrevI.getIterator())->isTerminator() && ++ "Must only have this one terminator prior to the split!"); ++ ++ // Grab the one successor edge that will stay in `MBB`. ++ MachineBasicBlock &UnsplitSucc = *PrevI.getOperand(0).getMBB(); ++ ++ // Analyze the original block to see if we are actually splitting an edge ++ // into two edges. This can happen when we have multiple conditional jumps to ++ // the same successor. ++ bool IsEdgeSplit = ++ std::any_of(SplitI.getIterator(), MBB.instr_end(), ++ [&](MachineInstr &MI) { ++ assert(MI.isTerminator() && ++ "Should only have spliced terminators!"); ++ return llvm::any_of( ++ MI.operands(), [&](MachineOperand &MOp) { ++ return MOp.isMBB() && MOp.getMBB() == &UnsplitSucc; ++ }); ++ }) || ++ MBB.getFallThrough() == &UnsplitSucc; ++ ++ MachineBasicBlock &NewMBB = *MF.CreateMachineBasicBlock(); ++ ++ // Insert the new block immediately after the current one. Any existing ++ // fallthrough will be sunk into this new block anyways. 
++  MF.insert(std::next(MachineFunction::iterator(&MBB)), &NewMBB);
++
++  // Splice the tail of instructions into the new block.
++  NewMBB.splice(NewMBB.end(), &MBB, SplitI.getIterator(), MBB.end());
++
++  // Copy the necessary succesors (and their probability info) into the new
++  // block.
++  for (auto SI = MBB.succ_begin(), SE = MBB.succ_end(); SI != SE; ++SI)
++    if (IsEdgeSplit || *SI != &UnsplitSucc)
++      NewMBB.copySuccessor(&MBB, SI);
++  // Normalize the probabilities if we didn't end up splitting the edge.
++  if (!IsEdgeSplit)
++    NewMBB.normalizeSuccProbs();
++
++  // Now replace all of the moved successors in the original block with the new
++  // block. This will merge their probabilities.
++  for (MachineBasicBlock *Succ : NewMBB.successors())
++    if (Succ != &UnsplitSucc)
++      MBB.replaceSuccessor(Succ, &NewMBB);
++
++  // We should always end up replacing at least one successor.
++  assert(MBB.isSuccessor(&NewMBB) &&
++         "Failed to make the new block a successor!");
++
++  // Now update all the PHIs.
++  for (MachineBasicBlock *Succ : NewMBB.successors()) {
++    for (MachineInstr &MI : *Succ) {
++      if (!MI.isPHI())
++        break;
++
++      for (int OpIdx = 1, NumOps = MI.getNumOperands(); OpIdx < NumOps;
++           OpIdx += 2) {
++        MachineOperand &OpV = MI.getOperand(OpIdx);
++        MachineOperand &OpMBB = MI.getOperand(OpIdx + 1);
++        assert(OpMBB.isMBB() && "Block operand to a PHI is not a block!");
++        if (OpMBB.getMBB() != &MBB)
++          continue;
++
++        // Replace the operand for unsplit successors
++        if (!IsEdgeSplit || Succ != &UnsplitSucc) {
++          OpMBB.setMBB(&NewMBB);
++
++          // We have to continue scanning as there may be multiple entries in
++          // the PHI.
++          continue;
++        }
++
++        // When we have split the edge append a new successor.
++        MI.addOperand(MF, OpV);
++        MI.addOperand(MF, MachineOperand::CreateMBB(&NewMBB));
++        break;
++      }
++    }
++  }
++
++  return NewMBB;
++}
++
++bool X86FlagsCopyLoweringPass::runOnMachineFunction(MachineFunction &MF) {
++  DEBUG(dbgs() << "********** " << getPassName() << " : " << MF.getName()
++               << " **********\n");
++
++  auto &Subtarget = MF.getSubtarget<X86Subtarget>();
++  MRI = &MF.getRegInfo();
++  TII = Subtarget.getInstrInfo();
++  TRI = Subtarget.getRegisterInfo();
++  MDT = &getAnalysis<MachineDominatorTree>();
++  PromoteRC = &X86::GR8RegClass;
++
++  if (MF.begin() == MF.end())
++    // Nothing to do for a degenerate empty function...
++    return false;
++
++  SmallVector<MachineInstr *, 4> Copies;
++  for (MachineBasicBlock &MBB : MF)
++    for (MachineInstr &MI : MBB)
++      if (MI.getOpcode() == TargetOpcode::COPY &&
++          MI.getOperand(0).getReg() == X86::EFLAGS)
++        Copies.push_back(&MI);
++
++  for (MachineInstr *CopyI : Copies) {
++    MachineBasicBlock &MBB = *CopyI->getParent();
++
++    MachineOperand &VOp = CopyI->getOperand(1);
++    assert(VOp.isReg() &&
++           "The input to the copy for EFLAGS should always be a register!");
++    MachineInstr &CopyDefI = *MRI->getVRegDef(VOp.getReg());
++    if (CopyDefI.getOpcode() != TargetOpcode::COPY) {
++      // FIXME: The big likely candidate here are PHI nodes. We could in theory
++      // handle PHI nodes, but it gets really, really hard. Insanely hard. Hard
++      // enough that it is probably better to change every other part of LLVM
++      // to avoid creating them. The issue is that once we have PHIs we won't
++      // know which original EFLAGS value we need to capture with our setCCs
++      // below. The end result will be computing a complete set of setCCs that
++      // we *might* want, computing them in every place where we copy *out* of
++      // EFLAGS and then doing SSA formation on all of them to insert necessary
++      // PHI nodes and consume those here. Then hoping that somehow we DCE the
++      // unnecessary ones. This DCE seems very unlikely to be successful and so
++      // we will almost certainly end up with a glut of dead setCC
++      // instructions. Until we have a motivating test case and fail to avoid
++      // it by changing other parts of LLVM's lowering, we refuse to handle
++      // this complex case here.
++      DEBUG(dbgs() << "ERROR: Encountered unexpected def of an eflags copy: ";
++            CopyDefI.dump());
++      report_fatal_error(
++          "Cannot lower EFLAGS copy unless it is defined in turn by a copy!");
++    }
++
++    auto Cleanup = make_scope_exit([&] {
++      // All uses of the EFLAGS copy are now rewritten, kill the copy into
++      // eflags and if dead the copy from.
++      CopyI->eraseFromParent();
++      if (MRI->use_empty(CopyDefI.getOperand(0).getReg()))
++        CopyDefI.eraseFromParent();
++      ++NumCopiesEliminated;
++    });
++
++    MachineOperand &DOp = CopyI->getOperand(0);
++    assert(DOp.isDef() && "Expected register def!");
++    assert(DOp.getReg() == X86::EFLAGS && "Unexpected copy def register!");
++    if (DOp.isDead())
++      continue;
++
++    MachineBasicBlock &TestMBB = *CopyDefI.getParent();
++    auto TestPos = CopyDefI.getIterator();
++    DebugLoc TestLoc = CopyDefI.getDebugLoc();
++
++    DEBUG(dbgs() << "Rewriting copy: "; CopyI->dump());
++
++    // Scan for usage of newly set EFLAGS so we can rewrite them. We just buffer
++    // jumps because their usage is very constrained.
++    bool FlagsKilled = false;
++    SmallVector<MachineInstr *, 4> JmpIs;
++
++    // Gather the condition flags that have already been preserved in
++    // registers. We do this from scratch each time as we expect there to be
++    // very few of them and we expect to not revisit the same copy definition
++    // many times. If either of those change sufficiently we could build a map
++    // of these up front instead.
++    CondRegArray CondRegs = collectCondsInRegs(TestMBB, CopyDefI);
++
++    // Collect the basic blocks we need to scan. Typically this will just be
++    // a single basic block but we may have to scan multiple blocks if the
++    // EFLAGS copy lives into successors.
++    SmallVector<MachineBasicBlock *, 2> Blocks;
++    SmallPtrSet<MachineBasicBlock *, 2> VisitedBlocks;
++    Blocks.push_back(&MBB);
++    VisitedBlocks.insert(&MBB);
++
++    do {
++      MachineBasicBlock &UseMBB = *Blocks.pop_back_val();
++
++      // We currently don't do any PHI insertion and so we require that the
++      // test basic block dominates all of the use basic blocks.
++      //
++      // We could in theory do PHI insertion here if it becomes useful by just
++      // taking undef values in along every edge that we don't trace this
++      // EFLAGS copy along. This isn't as bad as fully general PHI insertion,
++      // but still seems like a great deal of complexity.
++      //
++      // Because it is theoretically possible that some earlier MI pass or
++      // other lowering transformation could induce this to happen, we do
++      // a hard check even in non-debug builds here.
++      if (&TestMBB != &UseMBB && !MDT->dominates(&TestMBB, &UseMBB)) {
++        DEBUG({
++          dbgs() << "ERROR: Encountered use that is not dominated by our test "
++                    "basic block! Rewriting this would require inserting PHI "
++                    "nodes to track the flag state across the CFG.\n\nTest "
++                    "block:\n";
++          TestMBB.dump();
++          dbgs() << "Use block:\n";
++          UseMBB.dump();
++        });
++        report_fatal_error("Cannot lower EFLAGS copy when original copy def "
++                           "does not dominate all uses.");
++      }
++
++      for (auto MII = &UseMBB == &MBB ? std::next(CopyI->getIterator())
++                                      : UseMBB.instr_begin(),
++                MIE = UseMBB.instr_end();
++           MII != MIE;) {
++        MachineInstr &MI = *MII++;
++        MachineOperand *FlagUse = MI.findRegisterUseOperand(X86::EFLAGS);
++        if (!FlagUse) {
++          if (MI.findRegisterDefOperand(X86::EFLAGS)) {
++            // If EFLAGS are defined, it's as-if they were killed. We can stop
++            // scanning here.
++            //
++            // NB!!! Many instructions only modify some flags. LLVM currently
++            // models this as clobbering all flags, but if that ever changes
++            // this will need to be carefully updated to handle that more
++            // complex logic.
++            FlagsKilled = true;
++            break;
++          }
++          continue;
++        }
++
++        DEBUG(dbgs() << " Rewriting use: "; MI.dump());
++
++        // Check the kill flag before we rewrite as that may change it.
++        if (FlagUse->isKill())
++          FlagsKilled = true;
++
++        // Once we encounter a branch, the rest of the instructions must also be
++        // branches. We can't rewrite in place here, so we handle them below.
++        //
++        // Note that we don't have to handle tail calls here, even conditional
++        // tail calls, as those are not introduced into the X86 MI until post-RA
++        // branch folding or black placement. As a consequence, we get to deal
++        // with the simpler formulation of conditional branches followed by tail
++        // calls.
++        if (X86::getCondFromBranchOpc(MI.getOpcode()) != X86::COND_INVALID) {
++          auto JmpIt = MI.getIterator();
++          do {
++            JmpIs.push_back(&*JmpIt);
++            ++JmpIt;
++          } while (JmpIt != UseMBB.instr_end() &&
++                   X86::getCondFromBranchOpc(JmpIt->getOpcode()) !=
*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
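In getMnemonicFromOpcode, the LLVM_EXPAND_INSTR_SIZES and LLVM_EXPAND_ADC_SBB_INSTR macros use preprocessor token pasting (##) to emit one `case` label per operand width, so a single macro line stands for four opcodes. The stand-alone C++ sketch below reproduces that pattern on a small, hypothetical `Opcode` enum. It is not the generated X86:: opcode table, and it throws `std::invalid_argument` where the patch calls report_fatal_error.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical opcode enum standing in for the generated X86:: opcode list;
// only two operand forms (rr, rm) and one unrelated opcode are modelled.
enum Opcode {
  ADC8rr, ADC16rr, ADC32rr, ADC64rr,
  ADC8rm, ADC16rm, ADC32rm, ADC64rm,
  SBB8rr, SBB16rr, SBB32rr, SBB64rr,
  SBB8rm, SBB16rm, SBB32rm, SBB64rm,
  ADD32rr, // does not read CF, so it hits the default case
};

enum class Mnemonic { ADC, SBB };

Mnemonic mnemonicFor(Opcode Op) {
  switch (Op) {
  default:
    throw std::invalid_argument("opcode does not consume a copied flag");

// Paste MNEMONIC, a width and SUFFIX into one case label per width,
// e.g. EXPAND_SIZES(ADC, rr) -> case ADC8rr: case ADC16rr: ...
#define EXPAND_SIZES(MNEMONIC, SUFFIX) \
  case MNEMONIC##8##SUFFIX: \
  case MNEMONIC##16##SUFFIX: \
  case MNEMONIC##32##SUFFIX: \
  case MNEMONIC##64##SUFFIX:

    EXPAND_SIZES(ADC, rr)
    EXPAND_SIZES(ADC, rm)
    return Mnemonic::ADC;

    EXPAND_SIZES(SBB, rr)
    EXPAND_SIZES(SBB, rm)
    return Mnemonic::SBB;

#undef EXPAND_SIZES
  }
}
```

Calling `mnemonicFor(ADC32rm)` yields `Mnemonic::ADC`, and `mnemonicFor(ADD32rr)` throws. As in the patch, the macro is #undef'd right after use so the short helper name does not leak into the rest of the file.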
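The problem described in the log message (unrelated flags such as the interrupt flag being restored unexpectedly) can be modelled in a few lines. The sketch below is a simplified, hypothetical model, not the pass itself. It tracks only CF and IF, at their architectural bit positions 0 and 9, and compares two strategies. The first saves and restores the whole flags word, as a pushf/popf pair does. The second captures only the needed condition in a byte, in the spirit of the setCC instructions that the pass's comments refer to (PromoteRC is set to GR8). The helper names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical EFLAGS model: only two architectural bits.
constexpr std::uint32_t kCF = 1u << 0;  // carry flag
constexpr std::uint32_t kIF = 1u << 9;  // interrupt-enable flag

// Stand-in for the code between save and restore: it executes `cli`
// (clearing IF) and an arithmetic instruction that clobbers CF.
std::uint32_t interveningCode(std::uint32_t flags) {
  return flags & ~(kIF | kCF);
}

// pushf/popf style: every bit is saved and every bit is written back.
std::uint32_t saveRestoreWholeWord(std::uint32_t flags) {
  const std::uint32_t saved = flags;  // pushf
  flags = interveningCode(flags);     // IF cleared, CF clobbered
  flags = saved;                      // popf: IF comes back too
  return flags;
}

// setCC style: only CF is captured (as `setb` would) into an 8-bit value,
// and afterwards only CF is recreated; IF keeps its current value.
std::uint32_t saveRestoreCarryOnly(std::uint32_t flags) {
  const std::uint8_t savedCF = (flags & kCF) ? 1 : 0;
  flags = interveningCode(flags);
  return savedCF ? (flags | kCF) : (flags & ~kCF);
}
```

Starting from `kCF | kIF`, `saveRestoreWholeWord` returns a value with IF set again even though the intervening code disabled interrupts. `saveRestoreCarryOnly` returns CF set and IF clear, so the `cli` is respected.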