Tuesday, October 30, 2012

x86-to-ARM LnQ Status

Block mode, test inputs

Benchmark          Ref. Run Time      Ratio Status
400.perlbench                               NR
401.bzip2          --   180              -- S
403.gcc            --   466              -- S
429.mcf            --    53.7            -- S
445.gobmk          --   874              -- S
456.hmmer                                   NR
458.sjeng          --   129              -- S
462.libquantum     --    15.7            -- S
464.h264ref        --     0.0248            RE
471.omnetpp        --    62.1               RE
473.astar          --   126              -- S
483.xalancbmk      --   218              -- S
------------------------------------------------------------------------------------
Summary:
  • 400.perlbench and 456.hmmer cannot run due to floating-point precision errors
  • 464.h264ref: runtime error (RE)
  • 471.omnetpp: runtime error (RE)
================================================================
Trace Mode, test inputs
Benchmark          Ref. Run Time      Ratio Status
400.perlbench                               NR
401.bzip2          --   140              -- S
403.gcc            --   335                 RE
429.mcf            --    53.5            -- S
445.gobmk          --   533                 RE
456.hmmer                                   NR
458.sjeng          --   106              -- S
462.libquantum     --    14.3            -- S
464.h264ref        --     0.0257            RE
471.omnetpp        --    50.3               RE
473.astar          --   104              -- S
483.xalancbmk      --   169              -- S
------------------------------------------------------------------------------------
Summary:
  • Errors inherited from block mode
  • 403.gcc: runtime error (RE), new in trace mode
  • 445.gobmk: runtime error (RE), new in trace mode
===============================================================
Guest applications were recompiled with gcc 4.7 to make sure they really use SSE instructions
  • Perlbench is still stuck in arith.t due to the precision problem
  • bzip2: OK
  • gcc: killed by SIGILL (illegal instruction) due to incorrect encoding of VST1LNd32
    • Fix the alignment encoding for these instructions in getAddrMode6AddressOpValue() in ARMCodeEmitter.cpp
  • Re-test all but perlbench and hmmer
    • Hangs on leslie3d; try to find out why...
      • Trapped in an infinite loop; maybe due to precision error?
      • Skip it.
    • Omnetpp: fails to run; try to find out why...
      • QEMU cannot run it either.
    • Bwaves: mis-compare; does QEMU get the same result?
    • Gamess: mis-compare; does QEMU get the same result?
      • NOTE: unlike bwaves, minus signs appear where they should not
    • Milc: mis-compare
    • Zeus: segfault.
      • Reason: On ARM Linux, shared libraries such as ld-2.13.so and libpthread-2.13.so are loaded starting at 0x40000000, and the image of the x86 guest starts at 0x08048000. Zeus asks for 0x45efa000 bytes of memory for its image, which cannot fit in the ``hole'' between 0x08048000 and 0x40000000.
      • So I moved the qemu image to 0x90000000,
      • and the guest image is finally placed between 0x40000000 and 0x90000000.
      • However, during execution the guest asks for more memory, and finally shit happens...
    • Gromacs: mis-compare
    • Leslie3d: mis-compare
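
The Zeus segfault above comes down to address arithmetic. A minimal sketch in Python, using only the addresses quoted in the notes (the constant and function names are illustrative and do not come from QEMU or LnQ):

```python
# Addresses quoted in the notes above; the names are illustrative only.
GUEST_IMAGE_START = 0x08048000   # x86 guest image base
HOST_LIBS_START = 0x40000000     # where ARM Linux starts mapping shared libraries
QEMU_IMAGE_MOVED = 0x90000000    # new base chosen for the qemu image
ZEUS_REQUEST = 0x45EFA000        # bytes Zeus asks for


def fits(start: int, end: int, size: int) -> bool:
    """Return True if `size` bytes fit in the half-open range [start, end)."""
    return end - start >= size


hole = HOST_LIBS_START - GUEST_IMAGE_START
upper = QEMU_IMAGE_MOVED - HOST_LIBS_START

print(f"hole below the shared libraries: {hole:#x} bytes")
print("Zeus fits in the hole:", fits(GUEST_IMAGE_START, HOST_LIBS_START, ZEUS_REQUEST))
print("Zeus fits in 0x40000000..0x90000000:", fits(HOST_LIBS_START, QEMU_IMAGE_MOVED, ZEUS_REQUEST))
print(f"headroom left there, at most: {upper - ZEUS_REQUEST:#x} bytes")
```

The last figure is an upper bound, since the shared libraries occupy part of the same range; that small margin is consistent with later guest allocations failing.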
--------------------------------------------------------------------------------------
Summary:
  • QEMU FP precision problem
    • Perlbench and hmmer are trapped in infinite loops due to the FP precision problem
    • Omnetpp fails to run
    • Only cactusADM and namd succeed among the floating-point benchmarks...
  • 9 CINT2006 benchmarks run successfully.
  • Measure timing...
Benchmark          Ref. Run Time      Ratio Status
400.perlbench                               NR
401.bzip2          --      163           -- S
403.gcc            --      507           -- S
429.mcf            --       56.1         -- S
445.gobmk          --      955           -- S
456.hmmer                                   NR
458.sjeng          --      126           -- S
462.libquantum     --       16.3         -- S
464.h264ref        --      387              VE
471.omnetpp                                 NR
473.astar          --      114           -- S
483.xalancbmk      --      230           -- S
======================================================================
Trace:

401.bzip2: NOT OK
  • Infinite loop
403.gcc: mis-compare
429.mcf: OK
445.gobmk: OK
458.sjeng: OK
462.libquantum: OK
473.astar: OK
464.h264ref: NOT OK
*** longjmp causes uninitialized stack frame ***: /home/tk/lnq/install/bin/qemu-i386 terminated
Aborted (core dumped)
483.xalancbmk: NOT OK
  • Terminates without printing anything
Benchmark          Ref. Run Time      Ratio Status
400.perlbench                               NR
401.bzip2                                   NR
403.gcc            --      460              VE
429.mcf            --       56.5         -- S
445.gobmk          --      874           -- S
456.hmmer                                   NR
458.sjeng          --      112           -- S
462.libquantum     --       14.7         -- S
464.h264ref                                 NR
471.omnetpp                                 NR
473.astar          --       98.4         -- S
483.xalancbmk                               NR

===============================================================
Debug Trace Mode:
===============================================================
First, find out which component is at fault, and always fight the easiest bug first.
===============================================================
GCC: Mis-compare

  • In trace mode, the blocks are compiled with the IFastEnable and Opt::None options. So, check whether this error comes from fast instruction selection mode.
    • Set optimization options to ``Default'' and disable IFastEnable.
      • The error is gone.
      • Confirmed!
  • Now debugging becomes simple: run $l gcc twice, with FastISel enabled and disabled, and compare the logs to locate where things went wrong.
    • FAIL!
  • Another approach:
    • EnableFastISel does not affect correctness.
    • The LLVM optimization level does!
      • When opt is set to None, gcc segfaults!
      • When opt is set to Less, gcc runs successfully.
  • Run blocks with None, and compare the MIs used.
    • Run experiment!
      • Nothing found!
  • Still don't know why llvm::CodeGenOpt::None causes the fault; try to find a reduced example.
    • Run CINT2006 with llvm::CodeGenOpt::None
      • Wait for result...
      • Error: 1x401.bzip2 1x403.gcc 1x445.gobmk 1x464.h264ref 1x483.xalancbmk
      • Success: 1x429.mcf 1x458.sjeng 1x462.libquantum 1x473.astar 1x999.specrand
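
The log-comparison step tried above (run twice with one option toggled, then look for the first point where the logs differ) can be sketched in Python as follows. The notes record that this approach did not pinpoint the gcc bug, so this only illustrates the comparison itself. The function name and the sample log lines are hypothetical; real logs would be read from files.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple


def first_divergence(
    a: Iterable[str], b: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (1-based line number, line from a, line from b) for the first
    mismatch, or None if both logs are identical. When one log is shorter,
    its missing line is reported as None."""
    for lineno, (line_a, line_b) in enumerate(zip_longest(a, b), start=1):
        if line_a != line_b:
            return lineno, line_a, line_b
    return None


# Hypothetical log excerpts, one per run.
log_fastisel_on = ["block 0x8048000", "block 0x8048010", "block 0x8048faa"]
log_fastisel_off = ["block 0x8048000", "block 0x8048010", "block 0x8048f00", "block 0x8048f20"]

print(first_divergence(log_fastisel_on, log_fastisel_off))
```

Comparing line by line only works when both runs are deterministic up to the faulty point; any timing-dependent output would need to be filtered out first.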

Monday, October 29, 2012

Work log

In target-i386/cpu.h

  • CPU86_LDouble ::= struct { uint64_t low; uint16_t high; }
Note:
  • The x86 versions of perlbench (arith.t) and omnetpp fail to run on ARM with the official QEMU-0.13 but run successfully with QEMU-1.1.1
    • I tried two compilers: gcc-4.7 and gcc-4.4
    • The reason is that QEMU does not use CONFIG_SOFTFLOAT for the i386 target, even when the host is ARM. As a result, QEMU uses only a 64-bit double for the x86 80-bit floating-point registers.
    • This is strange! I never heard DY report that he failed to run omnetpp or perlbench.
    • The others can now run test inputs successfully
  • Plan to port LnQ to QEMU 1.2 + LLVM 3.2.
  • Correctness is far more important than performance!
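
To see why holding an 80-bit x87 value in a 64-bit double loses precision, here is a minimal Python sketch. It decodes the {low, high} layout noted above using the standard x87 extended format (1 sign bit, 15 exponent bits with bias 16383, 64-bit significand with an explicit integer bit). The function name is hypothetical, and only normal numbers are handled.

```python
from fractions import Fraction


def ldouble_to_fraction(low: int, high: int) -> Fraction:
    """Exact value of a normal x87 80-bit number stored as {low, high}.

    high: bit 15 is the sign, bits 0-14 the biased exponent (bias 16383).
    low:  64-bit significand with an explicit integer bit (bit 63).
    Denormals, infinities and NaNs are not handled in this sketch.
    """
    sign = -1 if (high >> 15) & 1 else 1
    exponent = (high & 0x7FFF) - 16383
    return sign * Fraction(low) * Fraction(2) ** (exponent - 63)


# 1 + 2**-63: the smallest 80-bit value above 1.0.
low, high = 0x8000000000000001, 0x3FFF
exact = ldouble_to_fraction(low, high)
as_double = float(exact)  # rounds to a double's 53-bit significand

print(exact == 1 + Fraction(1, 2**63))  # the 80-bit value is represented exactly
print(as_double == 1.0)                 # the lowest significand bit is lost
```

A double keeps 53 significand bits against 64 in the extended format, so any computation that depends on the low 11 bits can diverge, which fits the precision failures described in these notes.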

Work log - ARM NEON Guest Support


  • [TODO] ARM NEON Guest Support
  • [TODO] Refine system mode experiment slide
  • [TODO] Re-implement LnQ
  • [TODO] Journal paper start
  • [Q] Blog ?
  • [Q] Plot tutorial
  • [Q]  x86_64 on ARM64, run user-mode in system mode, ARM64 emulation?
ARM NEON Guest Support:
  • recompile SPEC with flags: -static -O3 -marm -march=armv7-a -mtune=cortex-a9 -mcpu=cortex-a9 -mfloat-abi=softfp -mfpu=neon -ffast-math -ftree-vectorize -funroll-all-loops -Wl,--whole-archive -lpthread -Wl,--no-whole-archive
  • GCC configure : 
    • ../gcc-linaro-4.7-2012.05/configure --build=i686-build_pc-linux-gnu --host=i686-build_pc-linux-gnu --target=arm-linux-gnueabi --prefix=/home/tk/local/arm-toolchain --enable-languages=c,c++,fortran --with-arch=armv7-a --with-tune=cortex-a8 --with-fpu=neon --with-float=softfp --with-sysroot=/home/tk/local/arm-toolchain/.build/arm-linux-gnueabi/libc --with-pkgversion='crosstool-NG linaro-1.13.1-2012.05-20120523 - Linaro GCC 2012.05' --with-bugurl=https://bugs.launchpad.net/gcc-linaro --enable-__cxa_atexit --enable-libmudflap --enable-libgomp --enable-libssp --with-gmp=/home/tk/local/arm-toolchain/.build --with-mpfr=/home/tk/local/arm-toolchain/.build --with-mpc=/home/tk/local/arm-toolchain/.build --with-ppl=/home/tk/local/arm-toolchain/.build --with-cloog=/home/tk/local/arm-toolchain/.build --enable-cloog-backend=isl --with-libelf=/home/tk/local/arm-toolchain/.build --enable-threads=posix --disable-libstdcxx-pch --enable-linker-build-id --enable-gold --with-local-prefix=/home/tk/local/arm-toolchain/arm-linux-gnueabihf/libc --enable-c99 --enable-long-long --with-mode=arm
  • 'ptrdiff_t' does not name a type: since GCC 4.6, <cstddef> must be included explicitly in source files.
    • I don't want to modify the source files, so I added the ``#include'' to c++config.h instead
    • boring...
  • When compiling 464.h264ref, the assembler reports the error ']' expect: vld4.i32 {d16, d18, d20, d22}, [sp:64].
    • The as from binutils 2.19 and 2.21 does not understand this, because the instruction uses double-spaced registers whose type is double-precision floating point
    • Solution: use 2.23. However, now the linker ld hits an assertion error (WTF!).
    • I don't know how to get rid of this assertion error, so I commented out the assertion at bfd/elf32-arm.c:11757
    • The compiled executable runs successfully on the pandaboard
  • Run native with test inputs
    • CINT: h264ref verification error
    • 434.zeusmp, 447.dealII, 450.soplex, 459.GemsFDTD, 482.sphinx3: runtime error.
      • 434.zeusmp: over 1G of bss (0x45df2c7c bytes); fixed after adding 4G of swap
      • 447.dealII: segfault; fixed after changing binutils from 2.19 to 2.23
      • 482.sphinx3: program error, program misbehaved.
    • Summary: all good except 482.sphinx3

Thursday, October 25, 2012

Multi-byte NOPs in x86

http://www.asmpedia.org/index.php?title=NOP


90              nop
6690            xchg    ax,ax                ; 66: switch to 16-bit operand, 90: opcode
0f1f00          nop     dword ptr [eax]      ; 0f1f: 2-byte opcode, 00: mod=00 reg=000 rm=000 [EAX]
0f1f4000        nop     dword ptr [eax]      ; 0f1f: 2-byte opcode, 40: mod=01 reg=000 rm=000 [EAX+0x00]
0f1f440000      nop     dword ptr [eax+eax]  ; 0f1f: 2-byte opcode, 44: mod=01 reg=000 rm=100 SIB + 0x00
660f1f440000    nop     word ptr [eax+eax]   ; 66: switch to 16-bit operand, 0f1f: 2-byte opcode, 44: mod=01 reg=000 rm=100 SIB + 0x00
0f1f8000000000  nop     dword ptr [eax]      ; 0f1f: 2-byte opcode, 80: mod=10 reg=000 rm=000 [EAX+0x00000000]
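
The mod/reg/rm annotations above follow from the bit layout of the ModRM byte (mod in bits 7-6, reg in bits 5-3, rm in bits 2-0). A small Python sketch that reproduces them for the four ModRM bytes used in the table; the function names are illustrative, and the special case mod=00 rm=101 (disp32 only) is deliberately left out:

```python
from typing import Tuple


def decode_modrm(byte: int) -> Tuple[int, int, int]:
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def describe(byte: int) -> str:
    """Describe a ModRM byte under 32-bit addressing.

    rm=100 with mod != 11 means a SIB byte follows; mod=01 adds an 8-bit
    displacement and mod=10 a 32-bit displacement. The mod=00 rm=101 case
    is not handled in this sketch.
    """
    mod, reg, rm = decode_modrm(byte)
    extra = []
    if mod != 0b11 and rm == 0b100:
        extra.append("SIB")
    if mod == 0b01:
        extra.append("disp8")
    elif mod == 0b10:
        extra.append("disp32")
    fields = f"mod={mod:02b} reg={reg:03b} rm={rm:03b}"
    return f"{fields} {'+'.join(extra)}" if extra else fields


for modrm in (0x00, 0x40, 0x44, 0x80):
    print(f"{modrm:02x}: {describe(modrm)}")
```

The extra SIB and displacement bytes are what pad each NOP form to its length: 3 bytes with no displacement, 4 with disp8, 5 with SIB plus disp8, and 7 with disp32.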

Wednesday, October 10, 2012

Cross compile LnQ: Building LnQ as an ARM executable on an x86 platform

Prerequisites:

  • llvm-gcc-arm:
    • ../configure --prefix=/home/tk/research/llvm-qemu/tool/llvm-gcc-4.2-2.9-arm --program-prefix=llvm- --enable-llvm=/home/tk/research/llvm-qemu/llvm/install/llvm-2.9-official --target=arm-none-linux-gnueabi --with-sysroot=/home/tk/research/llvm-qemu/tool/arm-2012.03/arm-none-linux-gnueabi/libc --enable-languages=c,c++
    • Make sure that /home/tk/research/llvm-qemu/tool/arm-2012.03/bin is in PATH
    • DO USE official LLVM 2.9, NOT LnQ's LLVM 2.9.
  • llvm-2.9 x86 version, LnQ's version
    • host and target set to i686-pc-linux-gnu
  • llvm-2.9 ARM version, LnQ's version
    • host and target set to arm-none-linux-gnueabi
  • ARM toolchain
    • download from 
Environment setting:
  • LLVM_ARM=llvm-2.9-arm/bin
  • LLVM=llvm-2.9/bin
  • LLVM_GCC=llvm-gcc-4.2-2.9-arm/bin
  • Set PATH with LLVM_ARM first, then LLVM, then LLVM_GCC.
    • The llvm-link and opt in LLVM_ARM must first be made non-executable.
    • Reason: we need $LLVM_ARM/llvm-config to set LD_FLAGS, but we also need $LLVM/llvm-link and $LLVM/opt, so both directories must be on PATH.
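
The PATH trick above relies on command lookup skipping files that are not executable and moving on to the next directory. A minimal Python sketch of that behaviour on a POSIX system, using `shutil.which`, which applies the same first-executable-match rule. The temporary directories and empty stub scripts are hypothetical stand-ins for $LLVM_ARM and $LLVM.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-ins for $LLVM_ARM and $LLVM.
    llvm_arm = os.path.join(root, "llvm-2.9-arm", "bin")
    llvm_x86 = os.path.join(root, "llvm-2.9", "bin")
    for directory in (llvm_arm, llvm_x86):
        os.makedirs(directory)
        for tool in ("llvm-config", "opt"):
            path = os.path.join(directory, tool)
            with open(path, "w") as stub:
                stub.write("#!/bin/sh\n")
            os.chmod(path, 0o755)

    # Make the ARM build of opt non-executable, as the notes describe.
    os.chmod(os.path.join(llvm_arm, "opt"), 0o644)

    # LLVM_ARM comes first on the search path, then LLVM.
    search_path = os.pathsep.join([llvm_arm, llvm_x86])
    found_config = shutil.which("llvm-config", path=search_path)
    found_opt = shutil.which("opt", path=search_path)

    print(os.path.relpath(found_config, root))  # resolved from the ARM tree
    print(os.path.relpath(found_opt, root))     # falls through to the x86 tree
```

With this ordering, llvm-config resolves to the ARM build while opt resolves to the x86 build, which is the split the build needs.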
LnQ configure:
  • Add --cross-prefix='arm-linux-gnueabi-' --cpu=armv7l
  • configure --target-list=i386-linux-user --prefix=$INSTALL --enable-lnq --disable-strip --cross-prefix='arm-linux-gnueabi-' --cpu=armv7l


  • Wasted time chasing a ghost...
    • Question: can the number of workers slow down Xalancbmk?
      • train input, merge used
        • #1: 334.756721
        • #2: 342.477593
        • #3: 376.607939
        • #1: 348.298319
        • #1: 333.039942
        • #3: 357.604264
        • #1: 331.753310
        • #3: 359.777587
    • Code generation duration? NO!
  • performance difference, trace, i386
    • -3.37, perlbench
    • -1.25, bzip2
    • 4.82, gcc
    • -1.64, mcf
    • 0.00, gobmk
    • -2.72, 
    • -2.54
    • -6.59
    • -6.90
    • -4.65
    • 2.50
    • 3.63
  • performance difference, region, i386
    • -6.61, sjeng
    • -6.40, h264ref
    • -5.31, omnetpp
    • -2.83, libquantum
    • -2.30, bzip2
    • -2.11, mcf
    • -1.65, perlbench
    • -1.49, gobmk
    • -1.11, hmmer
    • 7.11, gcc
    • 9.26, astar
    • 2.00, xalancbmk
  • performance difference, trace, ARM
    • 483.xalancbmk   -12.53
    • 471.omnetpp     -9.11
    • 456.hmmer       -3.41
    • 429.mcf         -2.43
    • 473.astar       -2.02
    • 403.gcc         -1.33
    • 445.gobmk       -1.21
    • 400.perlbench   -1.04
    • 464.h264ref     0.00
    • 401.bzip2       2.22
    • 458.sjeng       2.84
    • 462.libquantum  7.46
  • performance difference, region, ARM
    • 471.omnetpp     -16.53
    • 445.gobmk       -15.13
    • 483.xalancbmk   -12.62
    • 458.sjeng       -10.03
    • 462.libquantum  -2.49
    • 473.astar       -1.83
    • 456.hmmer       -1.70
    • 401.bzip2       -0.47
    • 403.gcc         -0.16
    • 429.mcf         0.00
    • 400.perlbench   0.58
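
For the Xalancbmk worker-count question earlier in this list, grouping the recorded timings makes the comparison easier to read. A small Python sketch, assuming the "#N" label is the number of workers (the notes do not say so explicitly, nor do they state the unit of the timings):

```python
from collections import defaultdict
from statistics import mean

# (label, time) pairs copied from the Xalancbmk runs listed above.
# Assumption: the "#N" label is the number of workers.
runs = [
    (1, 334.756721), (2, 342.477593), (3, 376.607939), (1, 348.298319),
    (1, 333.039942), (3, 357.604264), (1, 331.753310), (3, 359.777587),
]

by_workers = defaultdict(list)
for workers, duration in runs:
    by_workers[workers].append(duration)

for workers in sorted(by_workers):
    samples = by_workers[workers]
    print(f"#{workers}: n={len(samples)} mean={mean(samples):.2f} "
          f"min={min(samples):.2f} max={max(samples):.2f}")
```

With one to four samples per group, the spread within a group is comparable to the gap between groups, so these numbers alone do not settle the question.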

Friday, October 5, 2012

LaTeX Symbols

Relation Symbols
\leq        \geq        \equiv       \models      \prec
\succ       \sim        \perp        \preceq      \succeq
\simeq      \mid        \ll          \gg          \asymp
\parallel   \subset     \supset      \approx      \bowtie
\subseteq   \supseteq   \cong        \sqsubset    \sqsupset
\neq        \smile      \sqsubseteq  \sqsupseteq  \doteq
\frown      \in         \ni          \notin       \propto
\vdash      \dashv      <            >            =
Binary Operations
\pm        \cap     \diamond          \oplus
\mp        \cup     \bigtriangleup    \ominus
\times     \uplus   \bigtriangledown  \otimes
\div       \sqcap   \triangleleft     \oslash
\ast       \sqcup   \triangleright    \odot
\star      \vee     \bigcirc          \circ
\dagger    \wedge   \bullet           \setminus
\ddagger   \cdot    \wr               \amalg
Set and/or Logic Notation
\exists    \rightarrow or \to
\nexists   \leftarrow or \gets
\forall    \mapsto
\neg       \implies
\subset    \Rightarrow (preferred for implication)
\supset    \leftrightarrow
\in        \iff
\notin     \Leftrightarrow (preferred for equivalence (iff))
\ni        \top
\land      \bot
\lor       \emptyset and \varnothing
Delimiters
|            \|            /         \backslash
\{           \}            \langle   \rangle
\uparrow     \Uparrow      \lceil    \rceil
\downarrow   \Downarrow    \lfloor   \rfloor
Greek Letters
\Alpha and \alpha                    \Nu and \nu
\Beta and \beta                      \Xi and \xi
\Gamma and \gamma                    \Omicron and \omicron
\Delta and \delta                    \Pi, \pi and \varpi
\Epsilon, \epsilon and \varepsilon   \Rho, \rho and \varrho
\Zeta and \zeta                      \Sigma, \sigma and \varsigma
\Eta and \eta                        \Tau and \tau
\Theta, \theta and \vartheta         \Upsilon and \upsilon
\Iota and \iota                      \Phi, \phi and \varphi
\Kappa, \kappa and \varkappa         \Chi and \chi
\Lambda and \lambda                  \Psi and \psi
\Mu and \mu                          \Omega and \omega
Other symbols
\partial   \imath   \Re   \nabla   \aleph
\eth       \jmath   \Im   \Box     \beth
\hbar      \ell     \wp   \infty   \gimel
Trigonometric Functions
\sin   \arcsin   \sinh   \sec
\cos   \arccos   \cosh   \csc
\tan   \arctan   \tanh
\cot   \arccot   \coth