Work Log: November 2012

Friday, November 30, 2012

Worklog

Find the equivalent NEON instructions for the following SSE instructions:
PCMPEQBrr
PCMPEQDrr
PSLLDri
PSUBUSBrr
PUNPCKHBWrr
PUNPCKHWDrr
PUNPCKLBWrr
PUNPCKLDQrr
PUNPCKLQDQrr
PUNPCKLWDrr

Trace Mode, CINT2006 reference input:
Benchmark        Ref time (s)  Run time (s)  Ratio  Note
400.perlbench            9770          6657   1.47  *
401.bzip2                9650          2623         VE
403.gcc                  8050          6133         RE
429.mcf                  9120          9.01         RE
445.gobmk               10490         10356   1.01  *
456.hmmer                9330          6044   1.54  *
458.sjeng               12100         10927   1.11  *
462.libquantum          20720         22428  0.924  *
464.h264ref             22130         11599   1.91  *
471.omnetpp              6250           204         RE
473.astar                7020          4235   1.66  *
483.xalancbmk            6900          6767   1.02  *
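
The last numeric column in the results above (the ratio) is consistent with the reference time divided by the measured run time, kept to three significant digits. A small sketch of that arithmetic, with a function name made up for illustration:

```python
def spec_ratio(ref_time, run_time):
    """Reference time divided by run time, rounded to three significant digits."""
    return float(f"{ref_time / run_time:.3g}")


print(spec_ratio(9770, 6657))    # 400.perlbench  -> 1.47
print(spec_ratio(20720, 22428))  # 462.libquantum -> 0.924
```

A ratio below 1 (libquantum here) means the run took longer than the reference time.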

Failed to run 401, 403, 429, and 471.
429.mcf needs 839 MB, but the ARM board has only 913 MB, which is not enough memory.
After creating a 1 GB swap, mcf runs successfully.
In top, only 17 MB of the swap area is in use.

Thursday, November 29, 2012

IBTC + Victim Performance Evaluation

Size   IBTC      IBTC+Victim
2^11   0.997770  0.999741
2^12   0.999387  0.999932
2^13   0.999725  0.999981
2^14   0.999857  0.999992
2^15   0.999887
2^16   0.999949
2^17   0.999955
2^18   0.999995
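
The log does not show how the IBTC or the victim buffer is implemented, so the following is only an illustrative model of the general idea: a direct-mapped table from guest PC to translated-code address, backed by a small buffer that keeps recently evicted entries. The class name, the indexing by low PC bits, and the FIFO victim replacement are all assumptions made for this sketch.

```python
from collections import OrderedDict


class IBTC:
    """Direct-mapped guest-PC -> host-address table with an optional victim buffer."""

    def __init__(self, bits, victim_entries=0):
        self.mask = (1 << bits) - 1
        self.table = [None] * (1 << bits)   # each slot holds (guest_pc, host_addr)
        self.victim = OrderedDict()         # evicted entries, oldest first (assumed FIFO)
        self.victim_entries = victim_entries
        self.lookups = 0
        self.hits = 0

    def _install(self, guest_pc, host_addr):
        idx = guest_pc & self.mask          # assumed index: low bits of the guest PC
        old = self.table[idx]
        if old is not None and old[0] != guest_pc and self.victim_entries:
            self.victim[old[0]] = old[1]    # keep the evicted entry around
            while len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)
        self.table[idx] = (guest_pc, host_addr)

    def insert(self, guest_pc, host_addr):
        self._install(guest_pc, host_addr)

    def lookup(self, guest_pc):
        """Return the host address, or None on a miss (fall back to the slow path)."""
        self.lookups += 1
        entry = self.table[guest_pc & self.mask]
        if entry is not None and entry[0] == guest_pc:
            self.hits += 1
            return entry[1]
        host_addr = self.victim.pop(guest_pc, None)
        if host_addr is not None:
            self.hits += 1
            self._install(guest_pc, host_addr)  # swap it back into the main table
        return host_addr
```

In this model, two hot branch targets that map to the same slot keep evicting each other in a plain table, while a victim buffer lets both keep hitting. That is one plausible reason a small table plus a victim buffer can approach the hit rate of a much larger table.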

Block Mode
Performance (No inline)
IBTC(2^11)+Victim
T: 369.863281   0.545624        138.762726      0.000000        230.276672      0.278259
IBTC(2^18)
T: 375.984528   0.372223        138.313660      0.000000        237.024841      0.273804

Performance - Inline
IBTC(2^11)+Victim
T: 367.764587   0.378113        141.984161      0.000000        225.127594      0.274719
IBTC(2^18)
T: 373.440033   0.549896        145.442383      0.000000        227.171478      0.276276
IBTC(2^12)+Victim (make ibtc table 2^16 KB)
T: 4m20s

Hanging at SSH2_MSG_SERVICE_ACCEPT received

Question: The connection to the remote server is slow, and with -v the client hangs at "SSH2_MSG_SERVICE_ACCEPT received".
Solution:
Edit /etc/ssh/sshd_config and add one line:
UseDNS no

Restart the ssh service and it is done.

Wednesday, November 28, 2012

work log

*** longjmp causes uninitialized stack frame ***

longjmp reports a corrupted stack and
aborts execution.
Maybe the CPU state content was wrongly overwritten.
The R7 base register is overwritten!
Mark R7 as ReservedReg in ARMBaseRegisterInfo.cpp

mmap memory manager is broken; stop using it until we fix it.

Constant Pool related information and bugs

http://weblogs.java.net/blog/mlam/archive/2008/03/cvm_jit_constan.html

constant pool bug:
Trace fragments are usually much bigger than block fragments. The LLVM ARM JIT failed to place the constant pool within range of the load instructions that reference it through an immediate offset (roughly -4096 to +4096 bytes); I refer to this as the out-of-range bug.
My first thought was that the ARM JIT forgot to account for out-of-range loads, but ARMConstantIslandPass does handle this. I soon found that the real source of the bug was a wrongly calculated offset.
Why does this happen in trace mode? Because I added an intrinsic, BLOCKLINK, which takes 24 bytes, but did not update GetInstSizeInBytes() in ARMBaseInstrInfo. After I added this size to GetInstSizeInBytes(), the offset is calculated correctly.
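
To illustrate why a stale instruction size breaks the range check, here is a small model (not LLVM code). It lays out a trace, places the constant pool after the last instruction, and measures the distance from a literal load to the pool. The instruction names other than BLOCKLINK, the trace shape, and the 4-byte size reported before the fix are assumptions; the log only says the size was missing.

```python
LDR_LITERAL_RANGE = 4095  # reach of an ARM-mode PC-relative literal load, in bytes


def pool_distance(instrs, size_of, load_index):
    """Distance from the load's PC (its address + 8 in ARM mode) to a
    constant pool placed right after the last instruction."""
    offsets = []
    cursor = 0
    for name in instrs:
        offsets.append(cursor)
        cursor += size_of(name)
    return cursor - (offsets[load_index] + 8)


def stale_size(name):
    # Assumption: before the fix, BLOCKLINK was counted like an ordinary instruction.
    return 4


def fixed_size(name):
    # After the fix, BLOCKLINK is counted with its real 24-byte size.
    return 24 if name == "BLOCKLINK" else 4


# Hypothetical trace: one literal load, 900 ordinary instructions, 30 block links.
trace = ["LDR_LITERAL"] + ["ADD"] * 900 + ["BLOCKLINK"] * 30

estimated = pool_distance(trace, stale_size, 0)
actual = pool_distance(trace, fixed_size, 0)
print(estimated, estimated <= LDR_LITERAL_RANGE)  # 3716 True
print(actual, actual <= LDR_LITERAL_RANGE)        # 4316 False
```

With the stale size, the pass believes the pool is reachable and leaves it where it is, while the emitted code actually puts it out of range. Block fragments are small enough that the undercount rarely matters, which fits the bug showing up only in trace mode.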

Tuesday, November 27, 2012

Work Log

Initialize() -> StartAll() -> create queues and start each thread, QCond.Initialize() ->
Loop() -> TryGenerateTrace() -> QCond.Wait() - until start_=false
^|----------------------------------------------------|

Before Fork
StopAll() -> set start_ to false for all threads -> QCond.Destroy()

After Fork
StartAll()

After inserting tasks into queue, call QCond.Wake()
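
A minimal sketch of the start / wait / wake / stop flow described above, using Python's threading.Condition in place of QCond. The class and method names are hypothetical, and the real code is not shown in this log; the sketch only mirrors the notes: workers sleep on the condition while the queue is empty, a producer wakes one after inserting a task, and StopAll clears the flag and wakes everyone so the threads can exit before a fork.

```python
import threading
from collections import deque


class TraceWorkers:
    def __init__(self):
        self._cond = threading.Condition()   # plays the role of QCond
        self._tasks = deque()
        self._started = False                # plays the role of start_
        self._threads = []
        self.done = []

    def start_all(self, n_threads=2):
        with self._cond:
            self._started = True
        self._threads = [threading.Thread(target=self._loop, daemon=True)
                         for _ in range(n_threads)]
        for t in self._threads:
            t.start()

    def stop_all(self):
        with self._cond:
            self._started = False
            self._cond.notify_all()          # wake sleepers so they see the flag
        for t in self._threads:
            t.join()
        self._threads = []

    def submit(self, task):
        with self._cond:
            self._tasks.append(task)
            self._cond.notify()              # the "QCond.Wake()" after inserting

    def _loop(self):
        while True:
            with self._cond:
                # Sleep instead of polling: no CPU is used while the queue is empty.
                while self._started and not self._tasks:
                    self._cond.wait()
                if not self._started:
                    return
                task = self._tasks.popleft()
            result = task()                  # run the task outside the lock
            with self._cond:
                self.done.append(result)
```

Around a fork, the caller would call stop_all() first and start_all() again afterwards. Tasks still queued when stop_all() runs stay in the queue and are picked up after the restart.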

http://weblogs.java.net/blog/mlam/archive/2008/03/cvm_jit_constan.html

Friday, November 16, 2012

Related Works

[TODO: Add paper links to all related papers and top 10 to read ]

IBM PowerVM Lx86 http://www.ibm.com/developerworks/linux/lx86/index.html

PDF file: http://www.redbooks.ibm.com/redpapers/pdfs/redp4298.pdf

FX!32
Dynamo
Advances and Future Challenges in Binary Translation and Optimization, Proceedings of the IEEE, vol. 89, no. 11, November 2001
Design and Engineering of a Dynamic Binary Optimizer, Proceedings of the IEEE, vol. 93, no. 2, February 2005
Precise Exception Semantics in Dynamic Compilation, CC '02: Proceedings of the 11th International Conference on Compiler Construction
Transmeta
UQBT, Walkabout
DynamoRIO

Persistent code cache

PIN
Valgrind
StarDBT, HDTrans
Dr. Memory
QEMU

Friday, November 9, 2012

work log

@20121109 - 10:50 AM, Run CINT2006, train input, ARM host, block mode.

Expected results: perlbench and omnetpp fail; the others should be OK.
waiting...
400.perlbench -- 1091 -- S
401.bzip2 -- 730 -- S
403.gcc -- 427 -- S
429.mcf -- 314 -- S
445.gobmk -- 3792 -- S
456.hmmer -- 786 -- S
458.sjeng -- 3158 -- S
462.libquantum -- 71.3 -- S
464.h264ref -- 1647 -- S
471.omnetpp NR
473.astar -- 1075 -- S
483.xalancbmk -- 2544 -- S

Same as above except in trace mode.

expected result: hmmer hang in
462.libquantum hang!
=========================================
400.perlbench -- 892 -- S
401.bzip2 -- 656 -- S
403.gcc -- 363 -- S
429.mcf -- 303 -- S
445.gobmk -- 3022 VE
456.hmmer NR
458.sjeng -- 2378 -- S
462.libquantum -- 13802 RE
464.h264ref -- 250 RE
471.omnetpp -- 96.7 RE
473.astar -- 984 -- S
483.xalancbmk -- 1688 -- S
=========================================

@20121110 Run ref input

Block mode.
Perlbench hang! input splitmail; in I == E error.
bzip2 miscompared
gcc mis-compared
mcf hang!
-----------------------------------------------------------------------------------
Error: 1x400.perlbench 1x401.bzip2 1x403.gcc 1x429.mcf 1x464.h264ref 1x471.omnetpp
Success: 1x445.gobmk 1x456.hmmer 1x458.sjeng 1x462.libquantum 1x473.astar 1x483.xalancbmk
-----------------------------------------------------------------------------------
Benchmark        Ref time (s)  Run time (s)  Ratio  Note
400.perlbench            9770         42463         RE
401.bzip2                9650          3163         VE
403.gcc                  8050          8501         RE
429.mcf                  9120         26049         RE
445.gobmk               10490         19462  0.539  *
456.hmmer                9330          7932   1.18  *
458.sjeng               12100         24805  0.488  *
462.libquantum          20720         22558  0.918  *
464.h264ref             22130          1937         RE
471.omnetpp              6250           238         RE
473.astar                7020          5281   1.33  *
483.xalancbmk            6900          7353  0.938  *

Thursday, November 8, 2012

work log

Optimization threads use polling to probe for tasks in the task queue.

Continuously polling an empty task queue uses 17% CPU.
Should change to a condition-wait approach!

456.hmmer is trapped in an infinite loop when running in trace mode.

Is it because of the traces? Or is it due to the ``O0'' compiled code?
Running the ``O0'' code in block mode, hmmer completes successfully.
So it is the traces' fault!!! NOT GOOD!

Status of trace mode:

401.bzip2: OK, 142s
403.gcc: OK, 416s
429.mcf: OK, 56s
445.gobmk: OK, 804s
456.hmmer, NOT OK, infinite loop

Due to generated traces.

458.sjeng -- 116 -- S
462.libquantum -- 14.0 -- S
464.h264ref -- 256 RE (SegFault)
473.astar -- 100 -- S
483.xalancbmk -- 171 -- S

Debug 456.hmmer

Check the MIs used by traces and compare them with those used in blocks.

Fail! They are the same.

Will debugging h264ref be slightly easier?

Wednesday, November 7, 2012

weblog

The Problem:

When the optimization level is set to llvm::CodeGenOpt::None, some executions segfault on the ARM host.

Reduced Test Case:

Found in the 483.xalancbmk benchmark

movd %edi,%xmm1
pshufd $0x0,%xmm1,%xmm0
mov 0x24(%esp),%ebx
lea (%ebx,%ecx,4),%ecx
mov %ecx,0x14(%esp)
xor %ecx,%ecx
mov 0x14(%esp),%ebx
movdqa %xmm0,(%ebx)
add $0x1,%ecx
add $0x10,%ebx
cmp %ebp,%ecx
jb _end

Reason:

The generated ARM code contains the instruction:

vld1.64 {d0-d1}, [sp, :128]

which requires $sp to be 16-byte (128-bit) aligned.

But $sp is not 16-byte aligned!

The interesting thing is that executing this instruction did not throw any exception. Instead, the value of $sp changed! Therefore, any later instruction that accesses the stack causes a segfault.

Solution:

Make sure $sp is at least 32-byte aligned in the prologue.

Code in prologue generated by TCG ARM:

Before:

---------------------------------------------------------
push {r4, r5, r6, r8, r9, r10, r11, lr}

sub sp, sp, #128 ; 0x80 # reserve some space
bx r0 # go to code cache

pop {r4, r5, r6, r8, r9, r10, r11, pc}

----------------------------------------------------------
After
----------------------------------------------------------

push {r4, r5, r6, r8, r9, r10, r11, lr}

str sp, [r7, #xxx] # store stack pointer

sub sp, sp, #65536 ; 0x10000 # reserve some space
bic sp, sp, #0x1f # align down to a 32-byte boundary

bx r0 # go to code cache

ldr sp, [r7, #xxx] # restore stack pointer

pop {r4, r5, r6, r8, r9, r10, r11, pc}
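
The bic in the new prologue clears the low five bits of $sp, which rounds it down to a 32-byte boundary (and therefore also satisfies the 16-byte requirement of the vld1.64 above). A short sketch of the same arithmetic; the starting $sp value is made up for illustration:

```python
def align_down(addr, alignment):
    """Round addr down to a multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return addr & ~(alignment - 1)


sp = 0xBEFFF4C4                 # hypothetical misaligned stack pointer
sp = sp - 0x10000               # sub sp, sp, #65536
sp = align_down(sp, 32)         # bic sp, sp, #0x1f
print(hex(sp))                  # 0xbefef4c0
print(sp % 16 == 0)             # True
```

Because the alignment step discards the low bits, the original $sp cannot be recovered by adding the reserved size back. That is why the new prologue saves $sp before adjusting it and reloads it on return.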
