Date: Tue, 12 Jun 2012 15:04:19 +0000 (UTC) From: "Pedro F. Giffuni" <pfg@FreeBSD.org> To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r236962 - in head/contrib/gcc: . config/i386 doc Message-ID: <201206121504.q5CF4JYo060137@svn.freebsd.org>
next in thread | raw e-mail | index | archive | help
Author: pfg Date: Tue Jun 12 15:04:18 2012 New Revision: 236962 URL: http://svn.freebsd.org/changeset/base/236962 Log: Add experimental support for amdfam10/barcelona from the GCC 4.3 branch. Initial support for the AMD barcelona chipsets has been available in the gcc43 branch under GPLv2 but was not included when the Core 2 support was brought to the system gcc. AMD and some linux distributions (OpenSUSE) did a backport of the amdfam10 support and made them available. Unfortunately this is still experimental and while it can improve performance, enabling the CPUTYPE may break some C++ ports (like clang). Special care was taken to make sure that the patches predate the GPLv3 switch upstream. Tested by: Vladimir Kushnir Reviewed by: mm Approved by: jhb (mentor) MFC after: 2 weeks Added: head/contrib/gcc/config/i386/ammintrin.h (contents, props changed) Modified: head/contrib/gcc/ChangeLog.gcc43 head/contrib/gcc/builtins.c head/contrib/gcc/config.gcc head/contrib/gcc/config/i386/athlon.md head/contrib/gcc/config/i386/driver-i386.c head/contrib/gcc/config/i386/i386.c head/contrib/gcc/config/i386/i386.h head/contrib/gcc/config/i386/i386.md head/contrib/gcc/config/i386/i386.opt head/contrib/gcc/config/i386/mmx.md head/contrib/gcc/config/i386/pmmintrin.h head/contrib/gcc/config/i386/sse.md head/contrib/gcc/config/i386/tmmintrin.h head/contrib/gcc/doc/extend.texi head/contrib/gcc/doc/invoke.texi head/contrib/gcc/fold-const.c head/contrib/gcc/gimplify.c head/contrib/gcc/tree-ssa-ccp.c head/contrib/gcc/tree-ssa-pre.c Modified: head/contrib/gcc/ChangeLog.gcc43 ============================================================================== --- head/contrib/gcc/ChangeLog.gcc43 Tue Jun 12 14:56:08 2012 (r236961) +++ head/contrib/gcc/ChangeLog.gcc43 Tue Jun 12 15:04:18 2012 (r236962) @@ -1,3 +1,8 @@ +2007-05-01 Dwarakanath Rajagopal <dwarak.rajagopal@amd.com> + + * doc/invoke.texi: Fix typo, 'AMD Family 10h core' instead of + 'AMD Family 10 core'. + 2007-05-01 Dwarakanath Rajagopal <dwarak.rajagopal@amd.com> (r124339) * config/i386/i386.c (override_options): Accept k8-sse3, opteron-sse3 @@ -5,10 +10,39 @@ with SSE3 instruction set support. * doc/invoke.texi: Likewise. +2007-05-01 Dwarakanath Rajagopal <dwarak.rajagopal@amd.com> + + * config/i386/i386.c (override_options): Tuning 32-byte loop + alignment for amdfam10 architecture. Increasing the max loop + alignment to 24 bytes. + +2007-04-12 Richard Guenther <rguenther@suse.de> + + PR tree-optimization/24689 + PR tree-optimization/31307 + * fold-const.c (operand_equal_p): Compare INTEGER_CST array + indices by value. + * gimplify.c (canonicalize_addr_expr): To be consistent with + gimplify_compound_lval only set operands two and three of + ARRAY_REFs if they are not gimple_min_invariant. This makes + it never at this place. + * tree-ssa-ccp.c (maybe_fold_offset_to_array_ref): Likewise. + 2007-04-07 H.J. Lu <hongjiu.lu@intel.com> (r123639) * config/i386/i386.c (ix86_handle_option): Handle SSSE3. +2007-03-28 Dwarakanath Rajagopal <dwarak.rajagopal@amd.com> + + * config.gcc: Accept barcelona as a variant of amdfam10. + * config/i386/i386.c (override_options): Likewise. + * doc/invoke.texi: Likewise. + +2007-02-09 Dwarakanath Rajagopal <dwarak.rajagopal@amd.com> + + * config/i386/driver-i386.c: Turn on -mtune=native for AMDFAM10. + (bit_SSE4a): New. + 2007-02-08 Harsha Jagasia <harsha.jagasia@amd.com> (r121726) * config/i386/xmmintrin.h: Make inclusion of emmintrin.h @@ -26,6 +60,173 @@ * config/i386/i386.c (override_options): Set PTA_SSSE3 for core2. +2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com> + + * config/i386/athlon.md (athlon_fldxf_k8, athlon_fld_k8, + athlon_fstxf_k8, athlon_fst_k8, athlon_fist, athlon_fmov, + athlon_fadd_load, athlon_fadd_load_k8, athlon_fadd, athlon_fmul, + athlon_fmul_load, athlon_fmul_load_k8, athlon_fsgn, + athlon_fdiv_load, athlon_fdiv_load_k8, athlon_fdiv_k8, + athlon_fpspc_load, athlon_fpspc, athlon_fcmov_load, + athlon_fcmov_load_k8, athlon_fcmov_k8, athlon_fcomi_load_k8, + athlon_fcomi, athlon_fcom_load_k8, athlon_fcom): Added amdfam10. + +2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com> + + * config/i386/i386.md (x86_sahf_1, cmpfp_i_mixed, cmpfp_i_sse, + cmpfp_i_i387, cmpfp_iu_mixed, cmpfp_iu_sse, cmpfp_iu_387, + swapsi, swaphi_1, swapqi_1, swapdi_rex64, fix_truncsfdi_sse, + fix_truncdfdi_sse, fix_truncsfsi_sse, fix_truncdfsi_sse, + x86_fldcw_1, floatsisf2_mixed, floatsisf2_sse, floatdisf2_mixed, + floatdisf2_sse, floatsidf2_mixed, floatsidf2_sse, + floatdidf2_mixed, floatdidf2_sse, muldi3_1_rex64, mulsi3_1, + mulsi3_1_zext, mulhi3_1, mulqi3_1, umulqihi3_1, mulqihi3_insn, + umulditi3_insn, umulsidi3_insn, mulditi3_insn, mulsidi3_insn, + umuldi3_highpart_rex64, umulsi3_highpart_insn, + umulsi3_highpart_zext, smuldi3_highpart_rex64, + smulsi3_highpart_insn, smulsi3_highpart_zext, x86_64_shld, + x86_shld_1, x86_64_shrd, sqrtsf2_mixed, sqrtsf2_sse, + sqrtsf2_i387, sqrtdf2_mixed, sqrtdf2_sse, sqrtdf2_i387, + sqrtextendsfdf2_i387, sqrtxf2, sqrtextendsfxf2_i387, + sqrtextenddfxf2_i387): Added amdfam10_decode. + + * config/i386/athlon.md (athlon_idirect_amdfam10, + athlon_ivector_amdfam10, athlon_idirect_load_amdfam10, + athlon_ivector_load_amdfam10, athlon_idirect_both_amdfam10, + athlon_ivector_both_amdfam10, athlon_idirect_store_amdfam10, + athlon_ivector_store_amdfam10): New define_insn_reservation. + (athlon_idirect_loadmov, athlon_idirect_movstore): Added + amdfam10. + +2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com> + + * config/i386/athlon.md (athlon_call_amdfam10, + athlon_pop_amdfam10, athlon_lea_amdfam10): New + define_insn_reservation. + (athlon_branch, athlon_push, athlon_leave_k8, athlon_imul_k8, + athlon_imul_k8_DI, athlon_imul_mem_k8, athlon_imul_mem_k8_DI, + athlon_idiv, athlon_idiv_mem, athlon_str): Added amdfam10. + +2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com> + + * config/i386/athlon.md (athlon_sseld_amdfam10, + athlon_mmxld_amdfam10, athlon_ssest_amdfam10, + athlon_mmxssest_short_amdfam10): New define_insn_reservation. + +2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com> + + * config/i386/athlon.md (athlon_sseins_amdfam10): New + define_insn_reservation. + * config/i386/i386.md (sseins): Added sseins to define_attr type + and define_attr unit. + * config/i386/sse.md: Set type attribute to sseins for insertq + and insertqi. + +2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com> + + * config/i386/athlon.md (sselog_load_amdfam10, sselog_amdfam10, + ssecmpvector_load_amdfam10, ssecmpvector_amdfam10, + ssecomi_load_amdfam10, ssecomi_amdfam10, + sseaddvector_load_amdfam10, sseaddvector_amdfam10): New + define_insn_reservation. + (ssecmp_load_k8, ssecmp, sseadd_load_k8, seadd): Added amdfam10. + +2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com> + + * config/i386/athlon.md (cvtss2sd_load_amdfam10, + cvtss2sd_amdfam10, cvtps2pd_load_amdfam10, cvtps2pd_amdfam10, + cvtsi2sd_load_amdfam10, cvtsi2ss_load_amdfam10, + cvtsi2sd_amdfam10, cvtsi2ss_amdfam10, cvtsd2ss_load_amdfam10, + cvtsd2ss_amdfam10, cvtpd2ps_load_amdfam10, cvtpd2ps_amdfam10, + cvtsX2si_load_amdfam10, cvtsX2si_amdfam10): New + define_insn_reservation. + + * config/i386/sse.md (cvtsi2ss, cvtsi2ssq, cvtss2si, + cvtss2siq, cvttss2si, cvttss2siq, cvtsi2sd, cvtsi2sdq, + cvtsd2si, cvtsd2siq, cvttsd2si, cvttsd2siq, + cvtpd2dq, cvttpd2dq, cvtsd2ss, cvtss2sd, + cvtpd2ps, cvtps2pd): Added amdfam10_decode attribute. + +2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com> + + * config/i386/athlon.md (athlon_ssedivvector_amdfam10, + athlon_ssedivvector_load_amdfam10, athlon_ssemulvector_amdfam10, + athlon_ssemulvector_load_amdfam10): New define_insn_reservation. + (athlon_ssediv, athlon_ssediv_load_k8, athlon_ssemul, + athlon_ssemul_load_k8): Added amdfam10. + +2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com> + + * config/i386/i386.h (TARGET_SSE_UNALIGNED_MOVE_OPTIMAL): New macro. + (x86_sse_unaligned_move_optimal): New variable. + + * config/i386/i386.c (x86_sse_unaligned_move_optimal): Enable for + m_AMDFAM10. + (ix86_expand_vector_move_misalign): Add code to generate movupd/movups + for unaligned vector SSE double/single precision loads for AMDFAM10. + +2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com> + + * config/i386/i386.h (TARGET_AMDFAM10): New macro. + (TARGET_CPU_CPP_BUILTINS): Add code for amdfam10. + Define TARGET_CPU_DEFAULT_amdfam10. + (TARGET_CPU_DEFAULT_NAMES): Add amdfam10. + (processor_type): Add PROCESSOR_AMDFAM10. + + * config/i386/i386.md: Add amdfam10 as a new cpu attribute to match + processor_type in config/i386/i386.h. + Enable imul peepholes for TARGET_AMDFAM10. + + * config.gcc: Add support for --with-cpu option for amdfam10. + + * config/i386/i386.c (amdfam10_cost): New variable. + (m_AMDFAM10): New macro. + (m_ATHLON_K8_AMDFAM10): New macro. + (x86_use_leave, x86_push_memory, x86_movx, x86_unroll_strlen, + x86_cmove, x86_3dnow_a, x86_deep_branch, x86_use_simode_fiop, + x86_promote_QImode, x86_integer_DFmode_moves, + x86_partial_reg_dependency, x86_memory_mismatch_stall, + x86_accumulate_outgoing_args, x86_arch_always_fancy_math_387, + x86_sse_partial_reg_dependency, x86_sse_typeless_stores, + x86_use_ffreep, x86_use_incdec, x86_four_jump_limit, + x86_schedule, x86_use_bt, x86_cmpxchg16b, x86_pad_returns): + Enable/disable for amdfam10. + (override_options): Add amdfam10_cost to processor_target_table. + Set up PROCESSOR_AMDFAM10 for amdfam10 entry in + processor_alias_table. + (ix86_issue_rate): Add PROCESSOR_AMDFAM10. + (ix86_adjust_cost): Add code for amdfam10. + +2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com> + + * config/i386/i386.opt: Add new Advanced Bit Manipulation (-mabm) + instruction set feature flag. Add new (-mpopcnt) flag for popcnt + instruction. Add new SSE4A (-msse4a) instruction set feature flag. + * config/i386/i386.h: Add builtin definition for SSE4A. + * config/i386/i386.md: Add support for ABM instructions + (popcnt and lzcnt). + * config/i386/sse.md: Add support for SSE4A instructions + (movntss, movntsd, extrq, insertq). + * config/i386/i386.c: Add support for ABM and SSE4A builtins. + Add -march=amdfam10 flag. + * config/i386/ammintrin.h: Add support for SSE4A intrinsics. + * doc/invoke.texi: Add documentation on flags for sse4a, abm, popcnt + and amdfam10. + * doc/extend.texi: Add documentation for SSE4A builtins. + +2007-01-24 Jakub Jelinek <jakub@redhat.com> + + * config/i386/i386.h (x86_cmpxchg16b): Remove const. + (TARGET_CMPXCHG16B): Define to x86_cmpxchg16b. + * config/i386/i386.c (x86_cmpxchg16b): Remove const. + (override_options): Add PTA_CX16 flag. Set x86_cmpxchg16b + for CPUs that have PTA_CX16 set. + +2007-01-18 Michael Meissner <michael.meissner@amd.com> + + * config/i386/i386.c (ix86_compute_frame_layout): Make fprintf's + in #if 0 code type correct. + 2007-01-17 Eric Christopher <echristo@apple.com> (r120846) * config.gcc: Support core2 processor. @@ -62,7 +263,30 @@ x86_pad_returns): Add m_CORE2. (override_options): Add entries for Core2. (ix86_issue_rate): Add case for Core2. + +2006-10-28 Uros Bizjak <uros@kss-loka.si> + + * config/i386/i386.h (GENERAL_REGNO_P): Use STACK_POINTER_REGNUM. + (NON_QI_REG_P): Use IN_RANGE. + (REX_INT_REGNO_P): Use IN_RANGE. + (FP_REGNO_P): Use IN_RANGE. + (SSE_REGNO_P): Use IN_RANGE. + (REX_SSE_REGNO_P): Use IN_RANGE. + (MMX_REGNO_P): Use IN_RANGE. + (STACK_REGNO_P): New macro. + (STACK_REG_P): Use STACK_REGNO_P. + (NON_STACK_REG_P): Use STACK_REGNO_P. + (REGNO_OK_FOR_INDEX_P): Use REX_INT_REGNO_P. + (REGNO_OK_FOR_BASE_P): Use GENERAL_REGNO_P. + (REG_OK_FOR_INDEX_NONSTRICT_P): Use REX_INT_REGNO_P. + (REG_OK_FOR_BASE_NONSTRICT_P): Use GENERAL_REGNO_P. + (HARD_REGNO_RENAME_OK): Use !IN_RANGE. +2006-10-28 Uros Bizjak <uros@kss-loka.si> + + * config/i386/i386.c (output_387_ffreep): Create output from a + template string for !HAVE_AS_IX86_FFREEP. + 2006-10-27 Vladimir Makarov <vmakarov@redhat.com> (r118090) * config/i386/i386.h (TARGET_GEODE): @@ -95,7 +319,31 @@ * config/i386/geode.md: New file. * doc/invoke.texi: Add entry about geode processor. - + +2006-10-24 Uros Bizjak <uros@kss-loka.si> + + * config/i386/i386.h (FIRST_PSEUDO_REGISTER): Define to 54. + (FIXED_REGISTERS, CALL_USED_REGISTERS): Add fpcr register. + (REG_ALLOC_ORDER): Add one element to allocate fpcr register. + (FRAME_POINTER_REGNUM): Update register number to 21. + (REG_CLASS_CONTENTS): Update contents for added fpcr register. + (HI_REGISTER_NAMES): Add "fpcr" for fpcr register. + + * config/i386/i386.c (regclass_map): Add fpcr entry. + (dbx_register_map, dbx64_register_map, svr4_dbx_register_map): + Add fpcr entry. + (print_reg): Assert REGNO (x) != FPCR_REG. + + * config/i386/i386.md (FPCR_REG, R11_REG): New constants. + (DIRFLAG_REG): Renumber. + (x86_fnstcw_1, x86_fldcw_1): Use FPCR_REG instead of FPSR_REG. + (*sibcall_1_rex64_v, *sibcall_value_1_rex64_v): Use R11_REG. + (sse_prologue_save, *sse_prologue_save_insn): Renumber + hardcoded SSE register numbers. + + * config/i386/mmx.md (mmx_emms, mmx_femms): Renumber + hardcoded MMX register numbers. + 2006-10-24 Richard Guenther <rguenther@suse.de> PR middle-end/28796 @@ -104,6 +352,17 @@ for deciding optimizations in consistency with fold-const.c (fold_builtin_unordered_cmp): Likewise. +2006-10-24 Richard Guenther <rguenther@suse.de> + + * builtins.c (fold_builtin_floor): Fold floor (x) where + x is nonnegative to trunc (x). + (fold_builtin_int_roundingfn): Fold lfloor (x) where x is + nonnegative to FIX_TRUNC_EXPR. + +2006-10-22 H.J. Lu <hongjiu.lu@intel.com> + + * config/i386/tmmintrin.h: Remove the duplicated content. + 2006-10-22 H.J. Lu <hongjiu.lu@intel.com> (r117958) * config.gcc (i[34567]86-*-*): Add tmmintrin.h to extra_headers. @@ -170,6 +429,18 @@ * doc/invoke.texi: Document -mssse3/-mno-ssse3 switches. +2006-10-21 H.J. Lu <hongjiu.lu@intel.com> + + * config/i386/i386.md (UNSPEC_LDQQU): Renamed to ... + (UNSPEC_LDDQU): This. + * config/i386/sse.md (sse3_lddqu): Updated. + +2006-10-21 Richard Guenther <rguenther@suse.de> + + PR tree-optimization/3511 + * tree-ssa-pre.c (phi_translate): Fold CALL_EXPRs that + got new invariant arguments during PHI translation. + 2006-10-21 Richard Guenther <rguenther@suse.de> * builtins.c (fold_builtin_classify): Fix typo. Modified: head/contrib/gcc/builtins.c ============================================================================== --- head/contrib/gcc/builtins.c Tue Jun 12 14:56:08 2012 (r236961) +++ head/contrib/gcc/builtins.c Tue Jun 12 15:04:18 2012 (r236962) @@ -7355,6 +7355,12 @@ fold_builtin_ceil (tree fndecl, tree arg } } + /* Fold floor (x) where x is nonnegative to trunc (x). */ + if (tree_expr_nonnegative_p (arg)) + return build_function_call_expr (mathfn_built_in (TREE_TYPE (arg), + BUILT_IN_TRUNC), + arglist); + return fold_trunc_transparent_mathfn (fndecl, arglist); } @@ -7442,6 +7448,18 @@ fold_builtin_int_roundingfn (tree fndecl } } + switch (DECL_FUNCTION_CODE (fndecl)) + { + CASE_FLT_FN (BUILT_IN_LFLOOR): + CASE_FLT_FN (BUILT_IN_LLFLOOR): + /* Fold lfloor (x) where x is nonnegative to FIX_TRUNC (x). */ + if (tree_expr_nonnegative_p (arg)) + return fold_build1 (FIX_TRUNC_EXPR, TREE_TYPE (TREE_TYPE (fndecl)), + arg); + break; + default:; + } + return fold_fixed_mathfn (fndecl, arglist); } Modified: head/contrib/gcc/config.gcc ============================================================================== --- head/contrib/gcc/config.gcc Tue Jun 12 14:56:08 2012 (r236961) +++ head/contrib/gcc/config.gcc Tue Jun 12 15:04:18 2012 (r236962) @@ -269,12 +269,12 @@ xscale-*-*) i[34567]86-*-*) cpu_type=i386 extra_headers="mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h - pmmintrin.h tmmintrin.h" + pmmintrin.h tmmintrin.h ammintrin.h" ;; x86_64-*-*) cpu_type=i386 extra_headers="mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h - pmmintrin.h tmmintrin.h" + pmmintrin.h tmmintrin.h ammintrin.h" need_64bit_hwint=yes ;; ia64-*-*) @@ -1209,14 +1209,14 @@ i[34567]86-*-solaris2*) # FIXME: -m64 for i[34567]86-*-* should be allowed just # like -m32 for x86_64-*-*. case X"${with_cpu}" in - Xgeneric|Xcore2|Xnocona|Xx86-64|Xk8|Xopteron|Xathlon64|Xathlon-fx) + Xgeneric|Xcore2|Xnocona|Xx86-64|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx) ;; X) with_cpu=generic ;; *) echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2 - echo "generic core2 nocona x86-64 k8 opteron athlon64 athlon-fx" 1>&2 + echo "generic core2 nocona x86-64amd fam10 barcelona k8 opteron athlon64 athlon-fx" 1>&2 exit 1 ;; esac @@ -2515,6 +2515,9 @@ if test x$with_cpu = x ; then ;; i686-*-* | i786-*-*) case ${target_noncanonical} in + amdfam10-*|barcelona-*) + with_cpu=amdfam10 + ;; k8-*|opteron-*|athlon_64-*) with_cpu=k8 ;; @@ -2555,6 +2558,9 @@ if test x$with_cpu = x ; then ;; x86_64-*-*) case ${target_noncanonical} in + amdfam10-*|barcelona-*) + with_cpu=amdfam10 + ;; k8-*|opteron-*|athlon_64-*) with_cpu=k8 ;; @@ -2795,7 +2801,7 @@ case "${target}" in esac # OK ;; - "" | k8 | opteron | athlon64 | athlon-fx | nocona | core2 | generic) + "" | amdfam10 | barcelona | k8 | opteron | athlon64 | athlon-fx | nocona | core2 | generic) # OK ;; *) Added: head/contrib/gcc/config/i386/ammintrin.h ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/contrib/gcc/config/i386/ammintrin.h Tue Jun 12 15:04:18 2012 (r236962) @@ -0,0 +1,73 @@ +/* Copyright (C) 2007 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING. If not, write to + the Free Software Foundation, 51 Franklin Street, Fifth Floor, + Boston, MA 02110-1301, USA. */ + +/* As a special exception, if you include this header file into source + files compiled by GCC, this header file does not by itself cause + the resulting executable to be covered by the GNU General Public + License. This exception does not however invalidate any other + reasons why the executable file might be covered by the GNU General + Public License. */ + +/* Implemented from the specification included in the AMD Programmers + Manual Update, version 2.x */ + +#ifndef _AMMINTRIN_H_INCLUDED +#define _AMMINTRIN_H_INCLUDED + +#ifndef __SSE4A__ +# error "SSE4A instruction set not enabled" +#else + +/* We need definitions from the SSE3, SSE2 and SSE header files*/ +#include <pmmintrin.h> + +static __inline void __attribute__((__always_inline__)) +_mm_stream_sd (double * __P, __m128d __Y) +{ + __builtin_ia32_movntsd (__P, (__v2df) __Y); +} + +static __inline void __attribute__((__always_inline__)) +_mm_stream_ss (float * __P, __m128 __Y) +{ + __builtin_ia32_movntss (__P, (__v4sf) __Y); +} + +static __inline __m128i __attribute__((__always_inline__)) +_mm_extract_si64 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_extrq ((__v2di) __X, (__v16qi) __Y); +} + +#define _mm_extracti_si64(X, I, L) \ +((__m128i) __builtin_ia32_extrqi ((__v2di)(X), I, L)) + +static __inline __m128i __attribute__((__always_inline__)) +_mm_insert_si64 (__m128i __X,__m128i __Y) +{ + return (__m128i) __builtin_ia32_insertq ((__v2di)__X, (__v2di)__Y); +} + +#define _mm_inserti_si64(X, Y, I, L) \ +((__m128i) __builtin_ia32_insertqi ((__v2di)(X), (__v2di)(Y), I, L)) + + +#endif /* __SSE4A__ */ + +#endif /* _AMMINTRIN_H_INCLUDED */ Modified: head/contrib/gcc/config/i386/athlon.md ============================================================================== --- head/contrib/gcc/config/i386/athlon.md Tue Jun 12 14:56:08 2012 (r236961) +++ head/contrib/gcc/config/i386/athlon.md Tue Jun 12 15:04:18 2012 (r236962) @@ -29,6 +29,8 @@ (const_string "vector")] (const_string "direct"))) +(define_attr "amdfam10_decode" "direct,vector,double" + (const_string "direct")) ;; ;; decode0 decode1 decode2 ;; \ | / @@ -131,18 +133,22 @@ ;; Jump instructions are executed in the branch unit completely transparent to us (define_insn_reservation "athlon_branch" 0 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (eq_attr "type" "ibr")) "athlon-direct,athlon-ieu") (define_insn_reservation "athlon_call" 0 (and (eq_attr "cpu" "athlon,k8,generic64") (eq_attr "type" "call,callv")) "athlon-vector,athlon-ieu") +(define_insn_reservation "athlon_call_amdfam10" 0 + (and (eq_attr "cpu" "amdfam10") + (eq_attr "type" "call,callv")) + "athlon-double,athlon-ieu") ;; Latency of push operation is 3 cycles, but ESP value is available ;; earlier (define_insn_reservation "athlon_push" 2 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (eq_attr "type" "push")) "athlon-direct,athlon-agu,athlon-store") (define_insn_reservation "athlon_pop" 4 @@ -153,12 +159,16 @@ (and (eq_attr "cpu" "k8,generic64") (eq_attr "type" "pop")) "athlon-double,(athlon-ieu+athlon-load)") +(define_insn_reservation "athlon_pop_amdfam10" 3 + (and (eq_attr "cpu" "amdfam10") + (eq_attr "type" "pop")) + "athlon-direct,(athlon-ieu+athlon-load)") (define_insn_reservation "athlon_leave" 3 (and (eq_attr "cpu" "athlon") (eq_attr "type" "leave")) "athlon-vector,(athlon-ieu+athlon-load)") (define_insn_reservation "athlon_leave_k8" 3 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (eq_attr "type" "leave")) "athlon-double,(athlon-ieu+athlon-load)") @@ -167,6 +177,11 @@ (and (eq_attr "cpu" "athlon,k8,generic64") (eq_attr "type" "lea")) "athlon-direct,athlon-agu,nothing") +;; Lea executes in AGU unit with 1 cycle latency on AMDFAM10 +(define_insn_reservation "athlon_lea_amdfam10" 1 + (and (eq_attr "cpu" "amdfam10") + (eq_attr "type" "lea")) + "athlon-direct,athlon-agu,nothing") ;; Mul executes in special multiplier unit attached to IEU0 (define_insn_reservation "athlon_imul" 5 @@ -176,29 +191,35 @@ "athlon-vector,athlon-ieu0,athlon-mult,nothing,nothing,athlon-ieu0") ;; ??? Widening multiply is vector or double. (define_insn_reservation "athlon_imul_k8_DI" 4 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (and (eq_attr "type" "imul") (and (eq_attr "mode" "DI") (eq_attr "memory" "none,unknown")))) "athlon-direct0,athlon-ieu0,athlon-mult,nothing,athlon-ieu0") (define_insn_reservation "athlon_imul_k8" 3 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (and (eq_attr "type" "imul") (eq_attr "memory" "none,unknown"))) "athlon-direct0,athlon-ieu0,athlon-mult,athlon-ieu0") +(define_insn_reservation "athlon_imul_amdfam10_HI" 4 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "type" "imul") + (and (eq_attr "mode" "HI") + (eq_attr "memory" "none,unknown")))) + "athlon-vector,athlon-ieu0,athlon-mult,nothing,athlon-ieu0") (define_insn_reservation "athlon_imul_mem" 8 (and (eq_attr "cpu" "athlon") (and (eq_attr "type" "imul") (eq_attr "memory" "load,both"))) "athlon-vector,athlon-load,athlon-ieu,athlon-mult,nothing,nothing,athlon-ieu") (define_insn_reservation "athlon_imul_mem_k8_DI" 7 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (and (eq_attr "type" "imul") (and (eq_attr "mode" "DI") (eq_attr "memory" "load,both")))) "athlon-vector,athlon-load,athlon-ieu,athlon-mult,nothing,athlon-ieu") (define_insn_reservation "athlon_imul_mem_k8" 6 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (and (eq_attr "type" "imul") (eq_attr "memory" "load,both"))) "athlon-vector,athlon-load,athlon-ieu,athlon-mult,athlon-ieu") @@ -209,21 +230,23 @@ ;; other instructions. ;; ??? Experiments show that the idiv can overlap with roughly 6 cycles ;; of the other code +;; Using the same heuristics for amdfam10 as K8 with idiv (define_insn_reservation "athlon_idiv" 6 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (and (eq_attr "type" "idiv") (eq_attr "memory" "none,unknown"))) "athlon-vector,(athlon-ieu0*6+(athlon-fpsched,athlon-fvector))") (define_insn_reservation "athlon_idiv_mem" 9 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (and (eq_attr "type" "idiv") (eq_attr "memory" "load,both"))) "athlon-vector,((athlon-load,athlon-ieu0*6)+(athlon-fpsched,athlon-fvector))") ;; The parallelism of string instructions is not documented. Model it same way ;; as idiv to create smaller automata. This probably does not matter much. +;; Using the same heuristics for amdfam10 as K8 with idiv (define_insn_reservation "athlon_str" 6 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (and (eq_attr "type" "str") (eq_attr "memory" "load,both,store"))) "athlon-vector,athlon-load,athlon-ieu0*6") @@ -234,34 +257,62 @@ (and (eq_attr "unit" "integer,unknown") (eq_attr "memory" "none,unknown")))) "athlon-direct,athlon-ieu") +(define_insn_reservation "athlon_idirect_amdfam10" 1 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "amdfam10_decode" "direct") + (and (eq_attr "unit" "integer,unknown") + (eq_attr "memory" "none,unknown")))) + "athlon-direct,athlon-ieu") (define_insn_reservation "athlon_ivector" 2 (and (eq_attr "cpu" "athlon,k8,generic64") (and (eq_attr "athlon_decode" "vector") (and (eq_attr "unit" "integer,unknown") (eq_attr "memory" "none,unknown")))) "athlon-vector,athlon-ieu,athlon-ieu") +(define_insn_reservation "athlon_ivector_amdfam10" 2 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "amdfam10_decode" "vector") + (and (eq_attr "unit" "integer,unknown") + (eq_attr "memory" "none,unknown")))) + "athlon-vector,athlon-ieu,athlon-ieu") + (define_insn_reservation "athlon_idirect_loadmov" 3 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (and (eq_attr "type" "imov") (eq_attr "memory" "load"))) "athlon-direct,athlon-load") + (define_insn_reservation "athlon_idirect_load" 4 (and (eq_attr "cpu" "athlon,k8,generic64") (and (eq_attr "athlon_decode" "direct") (and (eq_attr "unit" "integer,unknown") (eq_attr "memory" "load")))) "athlon-direct,athlon-load,athlon-ieu") +(define_insn_reservation "athlon_idirect_load_amdfam10" 4 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "amdfam10_decode" "direct") + (and (eq_attr "unit" "integer,unknown") + (eq_attr "memory" "load")))) + "athlon-direct,athlon-load,athlon-ieu") (define_insn_reservation "athlon_ivector_load" 6 (and (eq_attr "cpu" "athlon,k8,generic64") (and (eq_attr "athlon_decode" "vector") (and (eq_attr "unit" "integer,unknown") (eq_attr "memory" "load")))) "athlon-vector,athlon-load,athlon-ieu,athlon-ieu") +(define_insn_reservation "athlon_ivector_load_amdfam10" 6 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "amdfam10_decode" "vector") + (and (eq_attr "unit" "integer,unknown") + (eq_attr "memory" "load")))) + "athlon-vector,athlon-load,athlon-ieu,athlon-ieu") + (define_insn_reservation "athlon_idirect_movstore" 1 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (and (eq_attr "type" "imov") (eq_attr "memory" "store"))) "athlon-direct,athlon-agu,athlon-store") + (define_insn_reservation "athlon_idirect_both" 4 (and (eq_attr "cpu" "athlon,k8,generic64") (and (eq_attr "athlon_decode" "direct") @@ -270,6 +321,15 @@ "athlon-direct,athlon-load, athlon-ieu,athlon-store, athlon-store") +(define_insn_reservation "athlon_idirect_both_amdfam10" 4 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "amdfam10_decode" "direct") + (and (eq_attr "unit" "integer,unknown") + (eq_attr "memory" "both")))) + "athlon-direct,athlon-load, + athlon-ieu,athlon-store, + athlon-store") + (define_insn_reservation "athlon_ivector_both" 6 (and (eq_attr "cpu" "athlon,k8,generic64") (and (eq_attr "athlon_decode" "vector") @@ -279,6 +339,16 @@ athlon-ieu, athlon-ieu, athlon-store") +(define_insn_reservation "athlon_ivector_both_amdfam10" 6 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "amdfam10_decode" "vector") + (and (eq_attr "unit" "integer,unknown") + (eq_attr "memory" "both")))) + "athlon-vector,athlon-load, + athlon-ieu, + athlon-ieu, + athlon-store") + (define_insn_reservation "athlon_idirect_store" 1 (and (eq_attr "cpu" "athlon,k8,generic64") (and (eq_attr "athlon_decode" "direct") @@ -286,6 +356,14 @@ (eq_attr "memory" "store")))) "athlon-direct,(athlon-ieu+athlon-agu), athlon-store") +(define_insn_reservation "athlon_idirect_store_amdfam10" 1 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "amdfam10_decode" "direct") + (and (eq_attr "unit" "integer,unknown") + (eq_attr "memory" "store")))) + "athlon-direct,(athlon-ieu+athlon-agu), + athlon-store") + (define_insn_reservation "athlon_ivector_store" 2 (and (eq_attr "cpu" "athlon,k8,generic64") (and (eq_attr "athlon_decode" "vector") @@ -293,6 +371,13 @@ (eq_attr "memory" "store")))) "athlon-vector,(athlon-ieu+athlon-agu),athlon-ieu, athlon-store") +(define_insn_reservation "athlon_ivector_store_amdfam10" 2 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "amdfam10_decode" "vector") + (and (eq_attr "unit" "integer,unknown") + (eq_attr "memory" "store")))) + "athlon-vector,(athlon-ieu+athlon-agu),athlon-ieu, + athlon-store") ;; Athlon floatin point unit (define_insn_reservation "athlon_fldxf" 12 @@ -302,7 +387,7 @@ (eq_attr "mode" "XF")))) "athlon-vector,athlon-fpload2,athlon-fvector*9") (define_insn_reservation "athlon_fldxf_k8" 13 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (and (eq_attr "type" "fmov") (and (eq_attr "memory" "load") (eq_attr "mode" "XF")))) @@ -314,7 +399,7 @@ (eq_attr "memory" "load"))) "athlon-direct,athlon-fpload,athlon-fany") (define_insn_reservation "athlon_fld_k8" 2 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (and (eq_attr "type" "fmov") (eq_attr "memory" "load"))) "athlon-direct,athlon-fploadk8,athlon-fstore") @@ -326,7 +411,7 @@ (eq_attr "mode" "XF")))) "athlon-vector,(athlon-fpsched+athlon-agu),(athlon-store2+(athlon-fvector*7))") (define_insn_reservation "athlon_fstxf_k8" 8 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (and (eq_attr "type" "fmov") (and (eq_attr "memory" "store,both") (eq_attr "mode" "XF")))) @@ -337,16 +422,16 @@ (eq_attr "memory" "store,both"))) "athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)") (define_insn_reservation "athlon_fst_k8" 2 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (and (eq_attr "type" "fmov") (eq_attr "memory" "store,both"))) "athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)") (define_insn_reservation "athlon_fist" 4 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (eq_attr "type" "fistp,fisttp")) "athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)") (define_insn_reservation "athlon_fmov" 2 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (eq_attr "type" "fmov")) "athlon-direct,athlon-fpsched,athlon-faddmul") (define_insn_reservation "athlon_fadd_load" 4 @@ -355,12 +440,12 @@ (eq_attr "memory" "load"))) "athlon-direct,athlon-fpload,athlon-fadd") (define_insn_reservation "athlon_fadd_load_k8" 6 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (and (eq_attr "type" "fop") (eq_attr "memory" "load"))) "athlon-direct,athlon-fploadk8,athlon-fadd") (define_insn_reservation "athlon_fadd" 4 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (eq_attr "type" "fop")) "athlon-direct,athlon-fpsched,athlon-fadd") (define_insn_reservation "athlon_fmul_load" 4 @@ -369,16 +454,16 @@ (eq_attr "memory" "load"))) "athlon-direct,athlon-fpload,athlon-fmul") (define_insn_reservation "athlon_fmul_load_k8" 6 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (and (eq_attr "type" "fmul") (eq_attr "memory" "load"))) "athlon-direct,athlon-fploadk8,athlon-fmul") (define_insn_reservation "athlon_fmul" 4 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (eq_attr "type" "fmul")) "athlon-direct,athlon-fpsched,athlon-fmul") (define_insn_reservation "athlon_fsgn" 2 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (eq_attr "type" "fsgn")) "athlon-direct,athlon-fpsched,athlon-fmul") (define_insn_reservation "athlon_fdiv_load" 24 @@ -387,7 +472,7 @@ (eq_attr "memory" "load"))) "athlon-direct,athlon-fpload,athlon-fmul") (define_insn_reservation "athlon_fdiv_load_k8" 13 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (and (eq_attr "type" "fdiv") (eq_attr "memory" "load"))) "athlon-direct,athlon-fploadk8,athlon-fmul") @@ -396,16 +481,16 @@ (eq_attr "type" "fdiv")) "athlon-direct,athlon-fpsched,athlon-fmul") (define_insn_reservation "athlon_fdiv_k8" 11 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (eq_attr "type" "fdiv")) "athlon-direct,athlon-fpsched,athlon-fmul") (define_insn_reservation "athlon_fpspc_load" 103 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (and (eq_attr "type" "fpspc") (eq_attr "memory" "load"))) "athlon-vector,athlon-fpload,athlon-fvector") (define_insn_reservation "athlon_fpspc" 100 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (eq_attr "type" "fpspc")) "athlon-vector,athlon-fpsched,athlon-fvector") (define_insn_reservation "athlon_fcmov_load" 7 @@ -418,12 +503,12 @@ (eq_attr "type" "fcmov")) "athlon-vector,athlon-fpsched,athlon-fvector") (define_insn_reservation "athlon_fcmov_load_k8" 17 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (and (eq_attr "type" "fcmov") (eq_attr "memory" "load"))) "athlon-vector,athlon-fploadk8,athlon-fvector") (define_insn_reservation "athlon_fcmov_k8" 15 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (eq_attr "type" "fcmov")) "athlon-vector,athlon-fpsched,athlon-fvector") ;; fcomi is vector decoded by uses only one pipe. @@ -434,13 +519,13 @@ (eq_attr "memory" "load")))) "athlon-vector,athlon-fpload,athlon-fadd") (define_insn_reservation "athlon_fcomi_load_k8" 5 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (and (eq_attr "type" "fcmp") (and (eq_attr "athlon_decode" "vector") (eq_attr "memory" "load")))) "athlon-vector,athlon-fploadk8,athlon-fadd") (define_insn_reservation "athlon_fcomi" 3 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (and (eq_attr "athlon_decode" "vector") (eq_attr "type" "fcmp"))) "athlon-vector,athlon-fpsched,athlon-fadd") @@ -450,18 +535,18 @@ (eq_attr "memory" "load"))) "athlon-direct,athlon-fpload,athlon-fadd") (define_insn_reservation "athlon_fcom_load_k8" 4 - (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "cpu" "k8,generic64,amdfam10") (and (eq_attr "type" "fcmp") (eq_attr "memory" "load"))) "athlon-direct,athlon-fploadk8,athlon-fadd") (define_insn_reservation "athlon_fcom" 2 - (and (eq_attr "cpu" "athlon,k8,generic64") + (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") (eq_attr "type" "fcmp")) "athlon-direct,athlon-fpsched,athlon-fadd") ;; Never seen by the scheduler because we still don't do post reg-stack ;; scheduling. ;(define_insn_reservation "athlon_fxch" 2 -; (and (eq_attr "cpu" "athlon,k8,generic64") +; (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10") ; (eq_attr "type" "fxch")) ; "athlon-direct,athlon-fpsched,athlon-fany") @@ -516,6 +601,23 @@ (and (eq_attr "type" "mmxmov,ssemov") (eq_attr "memory" "load"))) "athlon-direct,athlon-fploadk8,athlon-fstore") +;; On AMDFAM10 all double, single and integer packed and scalar SSEx data +;; loads generated are direct path, latency of 2 and do not use any FP +;; executions units. No seperate entries for movlpx/movhpx loads, which +;; are direct path, latency of 4 and use the FADD/FMUL FP execution units, +;; as they will not be generated. +(define_insn_reservation "athlon_sseld_amdfam10" 2 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "type" "ssemov") + (eq_attr "memory" "load"))) + "athlon-direct,athlon-fploadk8") +;; On AMDFAM10 MMX data loads generated are direct path, latency of 4 +;; and can use any FP executions units +(define_insn_reservation "athlon_mmxld_amdfam10" 4 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "type" "mmxmov") + (eq_attr "memory" "load"))) + "athlon-direct,athlon-fploadk8, athlon-fany") (define_insn_reservation "athlon_mmxssest" 3 (and (eq_attr "cpu" "k8,generic64") (and (eq_attr "type" "mmxmov,ssemov") @@ -533,6 +635,25 @@ (and (eq_attr "type" "mmxmov,ssemov") (eq_attr "memory" "store,both"))) "athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)") +;; On AMDFAM10 all double, single and integer packed SSEx data stores +;; generated are all double path, latency of 2 and use the FSTORE FP +;; execution unit. No entries seperate for movupx/movdqu, which are +;; vector path, latency of 3 and use the FSTORE*2 FP execution unit, +;; as they will not be generated. +(define_insn_reservation "athlon_ssest_amdfam10" 2 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "type" "ssemov") + (and (eq_attr "mode" "V4SF,V2DF,TI") + (eq_attr "memory" "store,both")))) + "athlon-double,(athlon-fpsched+athlon-agu),((athlon-fstore+athlon-store)*2)") +;; On AMDFAM10 all double, single and integer scalar SSEx and MMX +;; data stores generated are all direct path, latency of 2 and use +;; the FSTORE FP execution unit +(define_insn_reservation "athlon_mmxssest_short_amdfam10" 2 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "type" "mmxmov,ssemov") + (eq_attr "memory" "store,both"))) + "athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)") (define_insn_reservation "athlon_movaps_k8" 2 (and (eq_attr "cpu" "k8,generic64") (and (eq_attr "type" "ssemov") @@ -578,6 +699,11 @@ (and (eq_attr "type" "sselog,sselog1") (eq_attr "memory" "load"))) "athlon-double,athlon-fpload2k8,(athlon-fmul*2)") +(define_insn_reservation "athlon_sselog_load_amdfam10" 4 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "type" "sselog,sselog1") + (eq_attr "memory" "load"))) + "athlon-direct,athlon-fploadk8,(athlon-fadd|athlon-fmul)") (define_insn_reservation "athlon_sselog" 3 (and (eq_attr "cpu" "athlon") (eq_attr "type" "sselog,sselog1")) @@ -586,6 +712,11 @@ (and (eq_attr "cpu" "k8,generic64") (eq_attr "type" "sselog,sselog1")) "athlon-double,athlon-fpsched,athlon-fmul") +(define_insn_reservation "athlon_sselog_amdfam10" 2 + (and (eq_attr "cpu" "amdfam10") + (eq_attr "type" "sselog,sselog1")) + "athlon-direct,athlon-fpsched,(athlon-fadd|athlon-fmul)") + ;; ??? pcmp executes in addmul, probably not worthwhile to bother about that. (define_insn_reservation "athlon_ssecmp_load" 2 (and (eq_attr "cpu" "athlon") @@ -594,13 +725,13 @@ (eq_attr "memory" "load")))) "athlon-direct,athlon-fpload,athlon-fadd") *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201206121504.q5CF4JYo060137>