Work Log: June 2012

Thursday, June 28, 2012

Work Log

About zero-length arrays: http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
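A minimal sketch of the zero-length array idiom described on that GCC page: the trailing member takes no space in the struct, so the header and the payload come from a single malloc. struct line mirrors the example in the GCC manual; the line_new() helper is hypothetical. Portable C99 code would use a flexible array member (char contents[];) instead of the GCC extension.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* GCC zero-length array: contents[] occupies no space in the struct itself,
 * the bytes are allocated right behind the header. */
struct line {
    int length;
    char contents[0];   /* GCC extension; C99 spells this "char contents[];" */
};

/* Hypothetical helper: copy len bytes of src into a freshly allocated line. */
static struct line *line_new(const char *src, int len)
{
    struct line *l = malloc(sizeof(struct line) + (size_t)len);
    if (l == NULL)
        return NULL;
    l->length = len;
    memcpy(l->contents, src, (size_t)len);
    return l;
}
```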
Linaro QEMU Versatile Express support: https://wiki.linaro.org/PeterMaydell/QemuVersatileExpress

Loop-mount partition 2 of the image (the mount offset is the partition's start sector times 512):
(IMG=vexpress.img ; if [ -e "$IMG" ] ; then sudo mount -o loop,offset="$(file "$IMG" | awk 'BEGIN { RS=";"; } /partition 2/ { print $7*512; }')" -t auto "$IMG" /mnt/mnt; else echo "$IMG not found"; fi )
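The awk part extracts the start sector of partition 2 from the output of file and multiplies it by 512. As a sketch of the same computation done directly on the partition table, assuming a classic MBR (not GPT) with 512-byte sectors: each of the four 16-byte entries starts at offset 446, and the starting LBA is a little-endian 32-bit value at bytes 8..11 of the entry. mbr_partition_offset() is a hypothetical helper; a caller would fread() the first 512 bytes of vexpress.img into the buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of primary partition n (1..4) described by a 512-byte MBR.
 * Assumes 512-byte sectors, as the "*512" in the awk script does. */
static uint64_t mbr_partition_offset(const unsigned char mbr[512], int n)
{
    const unsigned char *e = mbr + 446 + 16 * (n - 1);   /* partition entry */
    uint32_t lba = (uint32_t)e[8] | ((uint32_t)e[9] << 8) |
                   ((uint32_t)e[10] << 16) | ((uint32_t)e[11] << 24);
    return (uint64_t)lba * 512;
}
```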

Linaro Android QEMU V.Express: https://wiki.linaro.org/KenWerner/Sandbox/AndroidQEMU

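vmlinuz and initrd.gz are wrapped in uImage and uInitrd behind a 64-byte U-Boot header; strip the header to extract them: dd if=uImage of=vmlinuz bs=1 skip=64 (likewise dd if=uInitrd of=initrd.gz bs=1 skip=64)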
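A sketch of recognizing that 64-byte header before stripping it, assuming the legacy U-Boot image format: all header fields are big-endian, the magic 0x27051956 is at offset 0, and the payload size (ih_size) is at offset 12. uimage_payload_size() is a hypothetical helper, not a U-Boot API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UIMAGE_HEADER_SIZE 64          /* legacy U-Boot image header size */
#define UIMAGE_MAGIC       0x27051956u /* ih_magic */

/* Read a big-endian 32-bit header field. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns ih_size when buf starts with a legacy uImage header, else -1.
 * The payload starts right after the header, at UIMAGE_HEADER_SIZE. */
static int64_t uimage_payload_size(const unsigned char *buf, size_t len)
{
    if (len < UIMAGE_HEADER_SIZE || be32(buf) != UIMAGE_MAGIC)
        return -1;
    return (int64_t)be32(buf + 12);
}
```

If a file carries trailing padding, ih_size tells how many bytes after the header are actual payload.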

Use reboot to shut down Android.
ARM-VExpress image: http://releases.linaro.org/12.05/ubuntu/vexpress/

Wednesday, June 27, 2012

Think Flow

tcg_liveness_analysis: if the opcode is qemu_ld or qemu_st, mark all globals live.
qemu_ld is OK now.
Restored to this morning's state.
QEMU 1.01 can exit when the guest powers off.
Chaos.
qemu_st: fails with no output; stuck in some kind of loop?
Just forgot what I was going to do after viewing some web pages...

Friday, June 22, 2012

Change the location of SAVE_DIRTY_STATES
Change the conditional branch of tcg_out_tlb_load from JNE to JE
~~It seems the PAGE FAULT is just a consequence of taking the wrong PATH~~

WRONG! That was a mistake.

STRANGE THINGS HAPPEN. BE CAREFUL, FOCUS.

Think flow

I changed the conditional branch in tcg_out_tlb_load from JNE to JE, and adjusted the related code.

With this change, the original QEMU still boots ARM Linux. So far so good.

I then moved SAVE_DIRTY_STATES; call this build QEMU_TK.
QEMU_TK dies after the first page fault happens.

Question: what's the difference between QEMU and QEMU_TK?
difference means the state of

OK, we need to restore the states.
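To keep the JNE/JE question straight, here is a simplified C model of the decision the generated TLB-load code makes: on a tag match it takes the fast path (host address = guest address + addend), otherwise it calls a slow-path helper, which is where a guest page fault can be raised. All names, sizes, and the 32-bit guest address are assumptions for illustration, not QEMU's definitions. If the branch condition is flipped from JNE to JE, the code on the fall-through and on the branch target has to be swapped as well; otherwise hits go down the slow path and misses use a stale addend.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified softmmu TLB model; sizes and names are illustrative only. */
#define PAGE_BITS 12
#define PAGE_MASK (~(uint32_t)((1u << PAGE_BITS) - 1))
#define TLB_SIZE  256

struct tlb_entry {
    uint32_t addr_read;   /* guest page tag; (uint32_t)-1 marks invalid */
    uintptr_t addend;     /* host address = guest address + addend */
};

static struct tlb_entry tlb[TLB_SIZE];
static int slow_path_calls;

/* Stand-in for the slow-path helper that walks the guest page table and
 * may raise a guest page fault; here it only counts calls. */
static uintptr_t slow_path_load(uint32_t addr)
{
    (void)addr;
    slow_path_calls++;
    return 0;
}

static uintptr_t tlb_load(uint32_t addr)
{
    struct tlb_entry *e = &tlb[(addr >> PAGE_BITS) & (TLB_SIZE - 1)];

    /* Generated code: compare the tag, branch to the slow path on mismatch. */
    if ((addr & PAGE_MASK) != e->addr_read)
        return slow_path_load(addr);        /* TLB miss */
    return (uintptr_t)addr + e->addend;     /* TLB hit: fast path */
}
```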

Thursday, June 21, 2012

Thinking Flow

Studying QEMU code: tcg_reg_alloc_op(), tcg/tcg.c line 1708

What is fixed_reg in TCGTemp?

In tcg_global_reg_new_internal, fixed_reg is set to 1.
In tcg_global_mem_new_internal, fixed_reg is set to 0.
So it seems to indicate whether the temp lives permanently in a host register or not.

In TCGContext, what is reg_to_temp?

In tcg_reg_alloc(), s->reg_to_temp[reg] tells whether the HOST register is currently mapped to any TCGTemp.
So I think reg_to_temp[reg] records which temp the HOST register reg currently holds (-1 if it is free).

What is val_type in TCGTemp?

NOT CLEAR
It seems to indicate where the temp's value currently lives: TEMP_VAL_DEAD, TEMP_VAL_REG, TEMP_VAL_MEM or TEMP_VAL_CONST.
It seems possible that ts->fixed_reg && ts->val_type == TEMP_VAL_MEM, or !ts->fixed_reg && ts->val_type == TEMP_VAL_REG.

When is TCGArgDef's args_ct set?

TCG: tcg_gen_code_common

In TCGContext:
/* liveness analysis */
uint16_t *op_dead_iargs;
/* for each operation, each bit tells if the corresponding input argument is dead */

What is tcg_op_defs?

In tcg_liveness_analysis (tcg/tcg.c, line 1187):
backward scan over the ops

NOTE: tcg-opc.h defines the TCG opcodes (a.k.a. the TCG IR),
so remove the TCG_OPF_CALL_CLOBBER flag of qemu_ld/st there.

In tcg_liveness_analysis:
} else if (def->flags & TCG_OPF_CALL_CLOBBER) {
    /* globals are live */
    memset(dead_temps, 0, s->nb_globals);
}
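A toy model of this backward scan, to reason about the question that follows: ops are visited last to first, an op's output is dead above it, and an op flagged CALL_CLOBBER makes every global live again. The op layout, OPF_CALL_CLOBBER, and the index conventions are hypothetical simplifications, not QEMU's structures.

```c
#include <assert.h>
#include <string.h>

#define NB_GLOBALS       2   /* temps 0..1 are globals */
#define NB_TEMPS         4   /* temps 2..3 are ordinary temps */
#define OPF_CALL_CLOBBER 1   /* op may call a helper that reads any global */

struct op {
    int flags;
    int out;     /* output temp index, or -1 */
    int in[2];   /* input temp indices, -1 if unused */
};

/* Bit k of dead_in[i] is set when op i is the last use of its input k,
 * i.e. the register holding it may be reused without a write-back. */
static void liveness(const struct op *ops, int n, unsigned *dead_in)
{
    char dead[NB_TEMPS];

    memset(dead, 1, sizeof dead);
    memset(dead, 0, NB_GLOBALS);            /* globals are live at block end */

    for (int i = n - 1; i >= 0; i--) {
        const struct op *o = &ops[i];

        if (o->out >= 0)
            dead[o->out] = 1;               /* overwritten here: dead above */
        if (o->flags & OPF_CALL_CLOBBER)
            memset(dead, 0, NB_GLOBALS);    /* globals are live */

        dead_in[i] = 0;
        for (int k = 0; k < 2; k++) {
            int t = o->in[k];
            if (t < 0)
                continue;
            if (dead[t])
                dead_in[i] |= 1u << k;      /* last use of t */
            dead[t] = 0;                    /* t is live above this op */
        }
    }
}
```

For the three ops t2 = g0 + g1; t3 = load t2; g0 = t3, clearing the flag on the load makes g0 dead at the first op, so its register may be dropped there without a write-back. That is only safe if the load's slow path never needs the current value of g0.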

Question: if we remove TCG_OPF_CALL_CLOBBER from qemu_ld/st, will that cause problems?

In tcg_reg_alloc_op:

1708 if (def->flags & TCG_OPF_CALL_CLOBBER) {
1709 /* XXX: permit generic clobber register list ? */
1710 for(reg = 0; reg < TCG_TARGET_NB_REGS; reg++) {
1711 if (tcg_regset_test_reg(tcg_target_call_clobber_regs, reg)) {
1712 tcg_reg_free(s, reg);
1713 }
1714 }
1715 /* XXX: for load/store we could do that only for the slow path
1716 (i.e. when a memory callback is called) */
1717
1718 /* store globals and free associated registers (we assume the insn
1719 can modify any global. */
1720 save_globals(s, allocated_regs);
1721 }

Question: What does Marsellus Wallace look like? Or rather:
What does tcg_reg_free do?

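The caller loops over tcg_target_call_clobber_regs; for each such register, tcg_reg_free() looks up the temp it holds (s->reg_to_temp[reg]) and, if that temp's mem_coherent flag is not set, stores the register back to the temp's memory slot (env->temp_buf for ordinary temps). The temp is then marked TEMP_VAL_MEM and the register is freed.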

Question: what does save_globals do?

What does "globals" mean?
In tcg/README, A TCG "global" is a variable which is live in all the functions (equivalent of a C global variable). They are defined before the functions defined. A TCG global can be a memory location (e.g. a QEMU CPU register), a fixed host register (e.g. the QEMU CPU state pointer) or a memory location which is stored in a register outside QEMU TBs (not implemented yet).
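save_globals() calls temp_save() on every global.
temp_save() writes the temp back to its memory slot (ts->mem_reg + ts->mem_offset); for a global, that slot is its field in env, i.e. its canonical location.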
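A toy model of the allocator state from these notes (reg_to_temp, val_type, mem_coherent) and of what tcg_reg_free(), temp_save() and save_globals() do to it. The struct layout, the mem_value field standing in for a temp's memory slot, and the reduced function signatures are simplifying assumptions; fixed_reg temps and TEMP_VAL_CONST handling are left out.

```c
#include <assert.h>

enum val_type { TEMP_VAL_DEAD, TEMP_VAL_REG, TEMP_VAL_MEM, TEMP_VAL_CONST };

#define NB_REGS    4
#define NB_TEMPS   4
#define NB_GLOBALS 2   /* temps[0..NB_GLOBALS-1] are globals */

struct temp {
    enum val_type val_type;
    int reg;            /* host register when val_type == TEMP_VAL_REG */
    int mem_coherent;   /* memory copy already matches the register */
    long mem_value;     /* stands in for the temp's memory slot */
};

static struct temp temps[NB_TEMPS];
static int reg_to_temp[NB_REGS];   /* temp held by each host reg, -1 = free */
static long host_regs[NB_REGS];
static int stores_emitted;

/* Like tcg_reg_free(): spill the held temp if needed, then unbind the reg. */
static void reg_free(int reg)
{
    int t = reg_to_temp[reg];
    if (t < 0)
        return;                                /* already free */
    if (!temps[t].mem_coherent) {
        temps[t].mem_value = host_regs[reg];   /* "emit" a store */
        stores_emitted++;
    }
    temps[t].val_type = TEMP_VAL_MEM;
    reg_to_temp[reg] = -1;
}

/* Like temp_save(): make sure the temp's value ends up in memory. */
static void temp_save(int t)
{
    if (temps[t].val_type == TEMP_VAL_REG)
        reg_free(temps[t].reg);
    else if (temps[t].val_type == TEMP_VAL_DEAD)
        temps[t].val_type = TEMP_VAL_MEM;
}

/* Like save_globals(): only globals are forced back to memory. */
static void save_globals(void)
{
    for (int t = 0; t < NB_GLOBALS; t++)
        temp_save(t);
}
```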

==================================================================

tcg_out_op() is called to generate code for the TCG opcode.

We are interested in tcg_out_qemu_ld/st

QUESTION:

Strangely enough, I cannot find where guest register states are saved back to their canonical locations.

I only saw the store back to memory (temp_buf) in the block at line 1708.

That is exactly the place.

==================================================================

Removed TCG_OPF_CALL_CLOBBER from qemu_ld.

Moved save_dirty_state to the TLB-miss path.

The program fails when the first PAGE FAULT occurs.

Next: compare REG contents between my version and the original version.

==================================================================

Wednesday, June 20, 2012

ARM MMU introduction

ARM MMU introduction (from: http://www.liacs.nl/~krietvel/courses/aca2011/arm-mmu.pdf)

Tuesday, June 19, 2012

QEMU ARM USE

When using -nographic, you cannot press Ctrl-Alt-2 to switch to the monitor screen, since there is no graphical window anymore.

Instead, use -curses; then the monitor will show.

Alternatively, use -monitor stdio.

How to change the runlevel through the kernel parameters (-append)

JUST APPEND THE RUNLEVEL NUMBER.
EXAMPLE:
"root=/dev/sdb1 console=/dev/ttyAMA0 2 "

HOW to mount qcow image used by QEMU

http://blog.loftninjas.org/2008/10/27/mounting-kvm-qcow2-qemu-disk-images/
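A qcow file is not a raw disk layout, so it cannot be loop-mounted with an offset the way vexpress.img is. As a quick sanity check before reaching for a tool such as qemu-nbd, the header can be inspected: qcow and qcow2 files start with the magic bytes 'Q' 'F' 'I' 0xfb followed by a big-endian 32-bit version (1 for the original qcow, 2 for qcow2; later qcow2 revisions use 3). qcow_version() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the qcow format version from the first 8 header bytes,
 * or -1 if the buffer does not start with the qcow magic. */
static int qcow_version(const unsigned char *hdr, size_t len)
{
    if (len < 8 || hdr[0] != 'Q' || hdr[1] != 'F' || hdr[2] != 'I' ||
        hdr[3] != 0xfb)
        return -1;
    return (int)(((uint32_t)hdr[4] << 24) | ((uint32_t)hdr[5] << 16) |
                 ((uint32_t)hdr[6] << 8) | (uint32_t)hdr[7]);
}
```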

Friday, June 8, 2012

ARM v7 Instruction Manual

Wednesday, June 6, 2012

ARM online reference site

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cjagdjbf.html

i7 currently RUNNING experiments

i7 currently RUNNING experiments:
TRACE_MERGE
TRACE
TRACE_NET_ORIG

Each configuration runs 4 benchmark sets: CINT-ARM, CINT-IA32, CFP-IA32, CFP_VECTOR-IA32
Each benchmark set runs 5 times.
In total, 3 * 4 * 5 = 60 runs.
Estimated time: 60 * 15000 sec = 900,000 sec, about 10.4 days.
All runs will finish by 6/26!
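The run count and wall-clock estimate as a small calculation, assuming every run takes about 15,000 s and all runs execute one after another on the single i7 machine (both are assumptions, not measurements).

```c
#include <assert.h>

/* configurations x benchmark sets x repetitions */
static int total_runs(int configs, int sets, int repeats)
{
    return configs * sets * repeats;
}

/* Sequential wall-clock time in days for `runs` runs of secs_per_run each. */
static double estimate_days(int runs, double secs_per_run)
{
    return runs * secs_per_run / 86400.0;
}
```

Usage: estimate_days(total_runs(3, 4, 5), 15000.0) gives the expected number of days for the setup above.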

Producing Wrong Data Without Doing Anything Obviously Wrong

LnQ Region Performance

r531 vs. r521

OMNETPP and XALANCBMK lose 15% and 8% performance, respectively.

SPECvirt_sc2010

SPECvirt_sc2010: SPEC's first benchmark addressing performance evaluation of datacenter servers used in virtualized server consolidation.

Tuesday, June 5, 2012

statically build OpenMP program

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39176#c7
We have to link pthread ourselves.
Add:
-Wl,--whole-archive -lpthread -Wl,--no-whole-archive
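A minimal OpenMP function for trying the static link. My understanding (an assumption, not spelled out in this log) is that --whole-archive forces every object of libpthread.a into the binary, so pthread symbols that libgomp needs are not left out of a -static link. The file name omp_sum.c is hypothetical, and a real test program would also need a main() that calls parallel_sum().

```c
#include <assert.h>

/* Build, e.g.:
 *   gcc -fopenmp -static omp_sum.c \
 *       -Wl,--whole-archive -lpthread -Wl,--no-whole-archive
 * Without -fopenmp the pragma is ignored and the loop simply runs serially. */
static long parallel_sum(const int *a, int n)
{
    long sum = 0;
#pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```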

gcc sse builtin functions

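http://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/X86-Built_002din-Functions.html

The following built-in functions are made available by -mmmx. All of them generate the machine instruction that is part of the name.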

     v8qi __builtin_ia32_paddb (v8qi, v8qi)
     v4hi __builtin_ia32_paddw (v4hi, v4hi)
     v2si __builtin_ia32_paddd (v2si, v2si)
     v8qi __builtin_ia32_psubb (v8qi, v8qi)
     v4hi __builtin_ia32_psubw (v4hi, v4hi)
     v2si __builtin_ia32_psubd (v2si, v2si)
     v8qi __builtin_ia32_paddsb (v8qi, v8qi)
     v4hi __builtin_ia32_paddsw (v4hi, v4hi)
     v8qi __builtin_ia32_psubsb (v8qi, v8qi)
     v4hi __builtin_ia32_psubsw (v4hi, v4hi)
     v8qi __builtin_ia32_paddusb (v8qi, v8qi)
     v4hi __builtin_ia32_paddusw (v4hi, v4hi)
     v8qi __builtin_ia32_psubusb (v8qi, v8qi)
     v4hi __builtin_ia32_psubusw (v4hi, v4hi)
     v4hi __builtin_ia32_pmullw (v4hi, v4hi)
     v4hi __builtin_ia32_pmulhw (v4hi, v4hi)
     di __builtin_ia32_pand (di, di)
     di __builtin_ia32_pandn (di,di)
     di __builtin_ia32_por (di, di)
     di __builtin_ia32_pxor (di, di)
     v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi)
     v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi)
     v2si __builtin_ia32_pcmpeqd (v2si, v2si)
     v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi)
     v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi)
     v2si __builtin_ia32_pcmpgtd (v2si, v2si)
     v8qi __builtin_ia32_punpckhbw (v8qi, v8qi)
     v4hi __builtin_ia32_punpckhwd (v4hi, v4hi)
     v2si __builtin_ia32_punpckhdq (v2si, v2si)
     v8qi __builtin_ia32_punpcklbw (v8qi, v8qi)
     v4hi __builtin_ia32_punpcklwd (v4hi, v4hi)
     v2si __builtin_ia32_punpckldq (v2si, v2si)
     v8qi __builtin_ia32_packsswb (v4hi, v4hi)
     v4hi __builtin_ia32_packssdw (v2si, v2si)
     v8qi __builtin_ia32_packuswb (v4hi, v4hi)

The following built-in functions are made available either with -msse, or with a combination of -m3dnow and -march=athlon. All of them generate the machine instruction that is part of the name.

     v4hi __builtin_ia32_pmulhuw (v4hi, v4hi)
     v8qi __builtin_ia32_pavgb (v8qi, v8qi)
     v4hi __builtin_ia32_pavgw (v4hi, v4hi)
     v4hi __builtin_ia32_psadbw (v8qi, v8qi)
     v8qi __builtin_ia32_pmaxub (v8qi, v8qi)
     v4hi __builtin_ia32_pmaxsw (v4hi, v4hi)
     v8qi __builtin_ia32_pminub (v8qi, v8qi)
     v4hi __builtin_ia32_pminsw (v4hi, v4hi)
     int __builtin_ia32_pextrw (v4hi, int)
     v4hi __builtin_ia32_pinsrw (v4hi, int, int)
     int __builtin_ia32_pmovmskb (v8qi)
     void __builtin_ia32_maskmovq (v8qi, v8qi, char *)
     void __builtin_ia32_movntq (di *, di)
     void __builtin_ia32_sfence (void)

The following built-in functions are available when -msse is used. All of them generate the machine instruction that is part of the name.

     int __builtin_ia32_comieq (v4sf, v4sf)
     int __builtin_ia32_comineq (v4sf, v4sf)
     int __builtin_ia32_comilt (v4sf, v4sf)
     int __builtin_ia32_comile (v4sf, v4sf)
     int __builtin_ia32_comigt (v4sf, v4sf)
     int __builtin_ia32_comige (v4sf, v4sf)
     int __builtin_ia32_ucomieq (v4sf, v4sf)
     int __builtin_ia32_ucomineq (v4sf, v4sf)
     int __builtin_ia32_ucomilt (v4sf, v4sf)
     int __builtin_ia32_ucomile (v4sf, v4sf)
     int __builtin_ia32_ucomigt (v4sf, v4sf)
     int __builtin_ia32_ucomige (v4sf, v4sf)
     v4sf __builtin_ia32_addps (v4sf, v4sf)
     v4sf __builtin_ia32_subps (v4sf, v4sf)
     v4sf __builtin_ia32_mulps (v4sf, v4sf)
     v4sf __builtin_ia32_divps (v4sf, v4sf)
     v4sf __builtin_ia32_addss (v4sf, v4sf)
     v4sf __builtin_ia32_subss (v4sf, v4sf)
     v4sf __builtin_ia32_mulss (v4sf, v4sf)
     v4sf __builtin_ia32_divss (v4sf, v4sf)
     v4si __builtin_ia32_cmpeqps (v4sf, v4sf)
     v4si __builtin_ia32_cmpltps (v4sf, v4sf)
     v4si __builtin_ia32_cmpleps (v4sf, v4sf)
     v4si __builtin_ia32_cmpgtps (v4sf, v4sf)
     v4si __builtin_ia32_cmpgeps (v4sf, v4sf)
     v4si __builtin_ia32_cmpunordps (v4sf, v4sf)
     v4si __builtin_ia32_cmpneqps (v4sf, v4sf)
     v4si __builtin_ia32_cmpnltps (v4sf, v4sf)
     v4si __builtin_ia32_cmpnleps (v4sf, v4sf)
     v4si __builtin_ia32_cmpngtps (v4sf, v4sf)
     v4si __builtin_ia32_cmpngeps (v4sf, v4sf)
     v4si __builtin_ia32_cmpordps (v4sf, v4sf)
     v4si __builtin_ia32_cmpeqss (v4sf, v4sf)
     v4si __builtin_ia32_cmpltss (v4sf, v4sf)
     v4si __builtin_ia32_cmpless (v4sf, v4sf)
     v4si __builtin_ia32_cmpunordss (v4sf, v4sf)
     v4si __builtin_ia32_cmpneqss (v4sf, v4sf)
     v4si __builtin_ia32_cmpnltss (v4sf, v4sf)
     v4si __builtin_ia32_cmpnless (v4sf, v4sf)
     v4si __builtin_ia32_cmpordss (v4sf, v4sf)
     v4sf __builtin_ia32_maxps (v4sf, v4sf)
     v4sf __builtin_ia32_maxss (v4sf, v4sf)
     v4sf __builtin_ia32_minps (v4sf, v4sf)
     v4sf __builtin_ia32_minss (v4sf, v4sf)
     v4sf __builtin_ia32_andps (v4sf, v4sf)
     v4sf __builtin_ia32_andnps (v4sf, v4sf)
     v4sf __builtin_ia32_orps (v4sf, v4sf)
     v4sf __builtin_ia32_xorps (v4sf, v4sf)
     v4sf __builtin_ia32_movss (v4sf, v4sf)
     v4sf __builtin_ia32_movhlps (v4sf, v4sf)
     v4sf __builtin_ia32_movlhps (v4sf, v4sf)
     v4sf __builtin_ia32_unpckhps (v4sf, v4sf)
     v4sf __builtin_ia32_unpcklps (v4sf, v4sf)
     v4sf __builtin_ia32_cvtpi2ps (v4sf, v2si)
     v4sf __builtin_ia32_cvtsi2ss (v4sf, int)
     v2si __builtin_ia32_cvtps2pi (v4sf)
     int __builtin_ia32_cvtss2si (v4sf)
     v2si __builtin_ia32_cvttps2pi (v4sf)
     int __builtin_ia32_cvttss2si (v4sf)
     v4sf __builtin_ia32_rcpps (v4sf)
     v4sf __builtin_ia32_rsqrtps (v4sf)
     v4sf __builtin_ia32_sqrtps (v4sf)
     v4sf __builtin_ia32_rcpss (v4sf)
     v4sf __builtin_ia32_rsqrtss (v4sf)
     v4sf __builtin_ia32_sqrtss (v4sf)
     v4sf __builtin_ia32_shufps (v4sf, v4sf, int)
     void __builtin_ia32_movntps (float *, v4sf)
     int __builtin_ia32_movmskps (v4sf)
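A small sketch of calling one of these built-ins directly, using __builtin_ia32_maxps from the -msse list above. It assumes an x86 target with SSE enabled (the default on x86-64; use -msse on 32-bit x86) and the GCC/Clang vector extension for the v4sf type; max4() is a hypothetical wrapper.

```c
#include <assert.h>
#include <string.h>

/* Same vector type name as in the list above. */
typedef float v4sf __attribute__((vector_size(16)));

/* Element-wise maximum of two 4-float arrays via the MAXPS instruction. */
static void max4(const float *a, const float *b, float *out)
{
    v4sf va = {a[0], a[1], a[2], a[3]};
    v4sf vb = {b[0], b[1], b[2], b[3]};
    v4sf r = __builtin_ia32_maxps(va, vb);
    memcpy(out, &r, sizeof r);   /* copy the 4 lanes back out */
}
```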

The following built-in functions are available when -msse is used.

v4sf __builtin_ia32_loadaps (float *): Generates the movaps machine instruction as a load from memory.
void __builtin_ia32_storeaps (float *, v4sf): Generates the movaps machine instruction as a store to memory.
v4sf __builtin_ia32_loadups (float *): Generates the movups machine instruction as a load from memory.
void __builtin_ia32_storeups (float *, v4sf): Generates the movups machine instruction as a store to memory.
v4sf __builtin_ia32_loadss (float *): Generates the movss machine instruction as a load from memory.
void __builtin_ia32_storess (float *, v4sf): Generates the movss machine instruction as a store to memory.
v4sf __builtin_ia32_loadhps (v4sf, v2si *): Generates the movhps machine instruction as a load from memory.
v4sf __builtin_ia32_loadlps (v4sf, v2si *): Generates the movlps machine instruction as a load from memory.
void __builtin_ia32_storehps (v4sf, v2si *): Generates the movhps machine instruction as a store to memory.
void __builtin_ia32_storelps (v4sf, v2si *): Generates the movlps machine instruction as a store to memory.

The following built-in functions are available when -msse2 is used. All of them generate the machine instruction that is part of the name.

     int __builtin_ia32_comisdeq (v2df, v2df)
     int __builtin_ia32_comisdlt (v2df, v2df)
     int __builtin_ia32_comisdle (v2df, v2df)
     int __builtin_ia32_comisdgt (v2df, v2df)
     int __builtin_ia32_comisdge (v2df, v2df)
     int __builtin_ia32_comisdneq (v2df, v2df)
     int __builtin_ia32_ucomisdeq (v2df, v2df)
     int __builtin_ia32_ucomisdlt (v2df, v2df)
     int __builtin_ia32_ucomisdle (v2df, v2df)
     int __builtin_ia32_ucomisdgt (v2df, v2df)
     int __builtin_ia32_ucomisdge (v2df, v2df)
     int __builtin_ia32_ucomisdneq (v2df, v2df)
     v2df __builtin_ia32_cmpeqpd (v2df, v2df)
     v2df __builtin_ia32_cmpltpd (v2df, v2df)
     v2df __builtin_ia32_cmplepd (v2df, v2df)
     v2df __builtin_ia32_cmpgtpd (v2df, v2df)
     v2df __builtin_ia32_cmpgepd (v2df, v2df)
     v2df __builtin_ia32_cmpunordpd (v2df, v2df)
     v2df __builtin_ia32_cmpneqpd (v2df, v2df)
     v2df __builtin_ia32_cmpnltpd (v2df, v2df)
     v2df __builtin_ia32_cmpnlepd (v2df, v2df)
     v2df __builtin_ia32_cmpngtpd (v2df, v2df)
     v2df __builtin_ia32_cmpngepd (v2df, v2df)
     v2df __builtin_ia32_cmpordpd (v2df, v2df)
     v2df __builtin_ia32_cmpeqsd (v2df, v2df)
     v2df __builtin_ia32_cmpltsd (v2df, v2df)
     v2df __builtin_ia32_cmplesd (v2df, v2df)
     v2df __builtin_ia32_cmpunordsd (v2df, v2df)
     v2df __builtin_ia32_cmpneqsd (v2df, v2df)
     v2df __builtin_ia32_cmpnltsd (v2df, v2df)
     v2df __builtin_ia32_cmpnlesd (v2df, v2df)
     v2df __builtin_ia32_cmpordsd (v2df, v2df)
     v2di __builtin_ia32_paddq (v2di, v2di)
     v2di __builtin_ia32_psubq (v2di, v2di)
     v2df __builtin_ia32_addpd (v2df, v2df)
     v2df __builtin_ia32_subpd (v2df, v2df)
     v2df __builtin_ia32_mulpd (v2df, v2df)
     v2df __builtin_ia32_divpd (v2df, v2df)
     v2df __builtin_ia32_addsd (v2df, v2df)
     v2df __builtin_ia32_subsd (v2df, v2df)
     v2df __builtin_ia32_mulsd (v2df, v2df)
     v2df __builtin_ia32_divsd (v2df, v2df)
     v2df __builtin_ia32_minpd (v2df, v2df)
     v2df __builtin_ia32_maxpd (v2df, v2df)
     v2df __builtin_ia32_minsd (v2df, v2df)
     v2df __builtin_ia32_maxsd (v2df, v2df)
     v2df __builtin_ia32_andpd (v2df, v2df)
     v2df __builtin_ia32_andnpd (v2df, v2df)
     v2df __builtin_ia32_orpd (v2df, v2df)
     v2df __builtin_ia32_xorpd (v2df, v2df)
     v2df __builtin_ia32_movsd (v2df, v2df)
     v2df __builtin_ia32_unpckhpd (v2df, v2df)
     v2df __builtin_ia32_unpcklpd (v2df, v2df)
     v16qi __builtin_ia32_paddb128 (v16qi, v16qi)
     v8hi __builtin_ia32_paddw128 (v8hi, v8hi)
     v4si __builtin_ia32_paddd128 (v4si, v4si)
     v2di __builtin_ia32_paddq128 (v2di, v2di)
     v16qi __builtin_ia32_psubb128 (v16qi, v16qi)
     v8hi __builtin_ia32_psubw128 (v8hi, v8hi)
     v4si __builtin_ia32_psubd128 (v4si, v4si)
     v2di __builtin_ia32_psubq128 (v2di, v2di)
     v8hi __builtin_ia32_pmullw128 (v8hi, v8hi)
     v8hi __builtin_ia32_pmulhw128 (v8hi, v8hi)
     v2di __builtin_ia32_pand128 (v2di, v2di)
     v2di __builtin_ia32_pandn128 (v2di, v2di)
     v2di __builtin_ia32_por128 (v2di, v2di)
     v2di __builtin_ia32_pxor128 (v2di, v2di)
     v16qi __builtin_ia32_pavgb128 (v16qi, v16qi)
     v8hi __builtin_ia32_pavgw128 (v8hi, v8hi)
     v16qi __builtin_ia32_pcmpeqb128 (v16qi, v16qi)
     v8hi __builtin_ia32_pcmpeqw128 (v8hi, v8hi)
     v4si __builtin_ia32_pcmpeqd128 (v4si, v4si)
     v16qi __builtin_ia32_pcmpgtb128 (v16qi, v16qi)
     v8hi __builtin_ia32_pcmpgtw128 (v8hi, v8hi)
     v4si __builtin_ia32_pcmpgtd128 (v4si, v4si)
     v16qi __builtin_ia32_pmaxub128 (v16qi, v16qi)
     v8hi __builtin_ia32_pmaxsw128 (v8hi, v8hi)
     v16qi __builtin_ia32_pminub128 (v16qi, v16qi)
     v8hi __builtin_ia32_pminsw128 (v8hi, v8hi)
     v16qi __builtin_ia32_punpckhbw128 (v16qi, v16qi)
     v8hi __builtin_ia32_punpckhwd128 (v8hi, v8hi)
     v4si __builtin_ia32_punpckhdq128 (v4si, v4si)
     v2di __builtin_ia32_punpckhqdq128 (v2di, v2di)
     v16qi __builtin_ia32_punpcklbw128 (v16qi, v16qi)
     v8hi __builtin_ia32_punpcklwd128 (v8hi, v8hi)
     v4si __builtin_ia32_punpckldq128 (v4si, v4si)
     v2di __builtin_ia32_punpcklqdq128 (v2di, v2di)
     v16qi __builtin_ia32_packsswb128 (v16qi, v16qi)
     v8hi __builtin_ia32_packssdw128 (v8hi, v8hi)
     v16qi __builtin_ia32_packuswb128 (v16qi, v16qi)
     v8hi __builtin_ia32_pmulhuw128 (v8hi, v8hi)
     void __builtin_ia32_maskmovdqu (v16qi, v16qi)
     v2df __builtin_ia32_loadupd (double *)
     void __builtin_ia32_storeupd (double *, v2df)
     v2df __builtin_ia32_loadhpd (v2df, double *)
     v2df __builtin_ia32_loadlpd (v2df, double *)
     int __builtin_ia32_movmskpd (v2df)
     int __builtin_ia32_pmovmskb128 (v16qi)
     void __builtin_ia32_movnti (int *, int)
     void __builtin_ia32_movntpd (double *, v2df)
     void __builtin_ia32_movntdq (v2df *, v2df)
     v4si __builtin_ia32_pshufd (v4si, int)
     v8hi __builtin_ia32_pshuflw (v8hi, int)
     v8hi __builtin_ia32_pshufhw (v8hi, int)
     v2di __builtin_ia32_psadbw128 (v16qi, v16qi)
     v2df __builtin_ia32_sqrtpd (v2df)
     v2df __builtin_ia32_sqrtsd (v2df)
     v2df __builtin_ia32_shufpd (v2df, v2df, int)
     v2df __builtin_ia32_cvtdq2pd (v4si)
     v4sf __builtin_ia32_cvtdq2ps (v4si)
     v4si __builtin_ia32_cvtpd2dq (v2df)
     v2si __builtin_ia32_cvtpd2pi (v2df)
     v4sf __builtin_ia32_cvtpd2ps (v2df)
     v4si __builtin_ia32_cvttpd2dq (v2df)
     v2si __builtin_ia32_cvttpd2pi (v2df)
     v2df __builtin_ia32_cvtpi2pd (v2si)
     int __builtin_ia32_cvtsd2si (v2df)
     int __builtin_ia32_cvttsd2si (v2df)
     long long __builtin_ia32_cvtsd2si64 (v2df)
     long long __builtin_ia32_cvttsd2si64 (v2df)
     v4si __builtin_ia32_cvtps2dq (v4sf)
     v2df __builtin_ia32_cvtps2pd (v4sf)
     v4si __builtin_ia32_cvttps2dq (v4sf)
     v2df __builtin_ia32_cvtsi2sd (v2df, int)
     v2df __builtin_ia32_cvtsi642sd (v2df, long long)
     v4sf __builtin_ia32_cvtsd2ss (v4sf, v2df)
     v2df __builtin_ia32_cvtss2sd (v2df, v4sf)
     void __builtin_ia32_clflush (const void *)
     void __builtin_ia32_lfence (void)
     void __builtin_ia32_mfence (void)
     v16qi __builtin_ia32_loaddqu (const char *)
     void __builtin_ia32_storedqu (char *, v16qi)
     unsigned long long __builtin_ia32_pmuludq (v2si, v2si)
     v2di __builtin_ia32_pmuludq128 (v4si, v4si)
     v8hi __builtin_ia32_psllw128 (v8hi, v2di)
     v4si __builtin_ia32_pslld128 (v4si, v2di)
     v2di __builtin_ia32_psllq128 (v4si, v2di)
     v8hi __builtin_ia32_psrlw128 (v8hi, v2di)
     v4si __builtin_ia32_psrld128 (v4si, v2di)
     v2di __builtin_ia32_psrlq128 (v2di, v2di)
     v8hi __builtin_ia32_psraw128 (v8hi, v2di)
     v4si __builtin_ia32_psrad128 (v4si, v2di)
     v2di __builtin_ia32_pslldqi128 (v2di, int)
     v8hi __builtin_ia32_psllwi128 (v8hi, int)
     v4si __builtin_ia32_pslldi128 (v4si, int)
     v2di __builtin_ia32_psllqi128 (v2di, int)
     v2di __builtin_ia32_psrldqi128 (v2di, int)
     v8hi __builtin_ia32_psrlwi128 (v8hi, int)
     v4si __builtin_ia32_psrldi128 (v4si, int)
     v2di __builtin_ia32_psrlqi128 (v2di, int)
     v8hi __builtin_ia32_psrawi128 (v8hi, int)
     v4si __builtin_ia32_psradi128 (v4si, int)
     v4si __builtin_ia32_pmaddwd128 (v8hi, v8hi)

The following built-in functions are available when -msse3 is used. All of them generate the machine instruction that is part of the name.

     v2df __builtin_ia32_addsubpd (v2df, v2df)
     v4sf __builtin_ia32_addsubps (v4sf, v4sf)
     v2df __builtin_ia32_haddpd (v2df, v2df)
     v4sf __builtin_ia32_haddps (v4sf, v4sf)
     v2df __builtin_ia32_hsubpd (v2df, v2df)
     v4sf __builtin_ia32_hsubps (v4sf, v4sf)
     v16qi __builtin_ia32_lddqu (char const *)
     void __builtin_ia32_monitor (void *, unsigned int, unsigned int)
     v2df __builtin_ia32_movddup (v2df)
     v4sf __builtin_ia32_movshdup (v4sf)
     v4sf __builtin_ia32_movsldup (v4sf)
     void __builtin_ia32_mwait (unsigned int, unsigned int)

The following built-in functions are available when -msse3 is used.

v2df __builtin_ia32_loadddup (double const *): Generates the movddup machine instruction as a load from memory.

The following built-in functions are available when -m3dnow is used. All of them generate the machine instruction that is part of the name.

     void __builtin_ia32_femms (void)
     v8qi __builtin_ia32_pavgusb (v8qi, v8qi)
     v2si __builtin_ia32_pf2id (v2sf)
     v2sf __builtin_ia32_pfacc (v2sf, v2sf)
     v2sf __builtin_ia32_pfadd (v2sf, v2sf)
     v2si __builtin_ia32_pfcmpeq (v2sf, v2sf)
     v2si __builtin_ia32_pfcmpge (v2sf, v2sf)
     v2si __builtin_ia32_pfcmpgt (v2sf, v2sf)
     v2sf __builtin_ia32_pfmax (v2sf, v2sf)
     v2sf __builtin_ia32_pfmin (v2sf, v2sf)
     v2sf __builtin_ia32_pfmul (v2sf, v2sf)
     v2sf __builtin_ia32_pfrcp (v2sf)
     v2sf __builtin_ia32_pfrcpit1 (v2sf, v2sf)
     v2sf __builtin_ia32_pfrcpit2 (v2sf, v2sf)
     v2sf __builtin_ia32_pfrsqrt (v2sf)
     v2sf __builtin_ia32_pfrsqrtit1 (v2sf, v2sf)
     v2sf __builtin_ia32_pfsub (v2sf, v2sf)
     v2sf __builtin_ia32_pfsubr (v2sf, v2sf)
     v2sf __builtin_ia32_pi2fd (v2si)
     v4hi __builtin_ia32_pmulhrw (v4hi, v4hi)

The following built-in functions are available when both -m3dnow and -march=athlon are used. All of them generate the machine instruction that is part of the name.

     v2si __builtin_ia32_pf2iw (v2sf)
     v2sf __builtin_ia32_pfnacc (v2sf, v2sf)
     v2sf __builtin_ia32_pfpnacc (v2sf, v2sf)
     v2sf __builtin_ia32_pi2fw (v2si)
     v2sf __builtin_ia32_pswapdsf (v2sf)
     v2si __builtin_ia32_pswapdsi (v2si)

Sunday, June 3, 2012

build parsec for ARM

reference document: http://www.cs.utexas.edu/~parsec_m5/TR-09-32.pdf
cross-compilation environment:
1. HOSTTYPE=arm
2. PATH=/path/to/fake/uname/bin:$PATH
content of /path/to/fake/uname/bin/uname:
===============================
$ cat ~/research/benchmarks/parsec-2.1-arm/fake-uname/uname
#!/bin/sh

/bin/uname "$@" | sed 's/i686/armv7l/g'

===============================

3. cross compilation tools: arm-linux-gnueabi-*

4. host machine is i686

Steps:

1. compile tools natively

$ parsecmgmt -a build -p tools

Note: for now, use native i686 compilation flags in gcc.bldconf

2. compile apps to ARM binary:

2.0 Set the BINARY_PREFIX option in gcc.bldconf.

2.1 blackscholes : OK

2.2 bodytrack:

2.2.1 In pkgs/apps/bodytrack/src/config.h.in, comment out #undef malloc

before change:

/* Define to rpl_malloc if the replacement function should be used. */

#undef malloc

after change:

/* Define to rpl_malloc if the replacement function should be used. */

//#undef malloc
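Why this helps: autoconf's AC_FUNC_MALLOC check cannot run a test program when cross-compiling, so configure assumes a non-GNU malloc and turns that #undef into #define malloc rpl_malloc, which the package must then provide. Commenting out the line avoids the rename. Alternatives (not what this log did) are passing ac_cv_func_malloc_0_nonnull=yes to configure, or supplying rpl_malloc along the lines of the Autoconf manual:

```c
#include <assert.h>
#include <stdlib.h>

/* Replacement used when configure has done "#define malloc rpl_malloc".
 * It guarantees a valid pointer for zero-size requests, which is the
 * behaviour AC_FUNC_MALLOC checks for. */
void *rpl_malloc(size_t n)
{
    if (n == 0)
        n = 1;
    return malloc(n);
}
```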

2.2.2 In pkgs/apps/bodytrack/parsec/gcc-pthread.bldconf, add --host and --build.

before:

# Arguments to pass to the configure script, if it exists

build_conf="--enable-threads --disable-openmp --disable-tbb"

after:

# Arguments to pass to the configure script, if it exists

build_conf="--enable-threads --disable-openmp --disable-tbb --build=i686-linux-gnu --host=arm-linux-gnueabi"

2.3: facesim: OK

2.4: ferret: depends on gsl and imagick, so build those first (see 2.5 and 2.6). OK

2.5: gsl:

2.5.1 In pkgs/libs/gsl/parsec/gcc.bldconf, add --host and --build.

before:

# Arguments to pass to the configure script, if it exists

build_conf="--disable-shared"

after:

# Arguments to pass to the configure script, if it exists

build_conf="--disable-shared --build=i686-linux-gnu --host=arm-linux-gnueabi"

2.6: imagick: In pkgs/libs/imagick/parsec/gcc.bldconf, add --host and --build.

before:

# Arguments to pass to the configure script, if it exists

build_conf="--disable-shared --without-perl --without-magick-plus-plus --without-bzlib --without-dps --without-djvu --without-fpx --without-gslib --without-jbig --with-jpeg --without-jp2 --without-tiff --without-wmf --without-zlib --without-x --without-fontconfig --without-freetype --without-lcms --without-png --without-gvc --without-openexr --without-rsvg --without-xml"

after:

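# Arguments to pass to the configure script, if it exists

build_conf="--disable-shared --without-perl --without-magick-plus-plus --without-bzlib --without-dps --without-djvu --without-fpx --without-gslib --without-jbig --with-jpeg --without-jp2 --without-tiff --without-wmf --without-zlib --without-x --without-fontconfig --without-freetype --without-lcms --without-png --without-gvc --without-openexr --without-rsvg --without-xml --build=i686-linux-gnu --host=arm-linux-gnueabi"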

2.7: freqmine: OK

2.8: raytrace: SKIP. Compiling raytrace requires a cross-compiled libX11, which in turn requires cross-compiling the following libraries:

libX11
libXmu
libXext
libxcb
xproto
xextproto
xtrans
libpthread_stubs
libXau
kbproto
inputproto
jpeg

2.9: swaptions: OK

2.10: fluidanimate: OK

2.11: vips: depends on glib and libxml2. For libxml2 and vips, only --build and --host need to be added.
2.11.1: Remove -L${CC_HOME}/lib in config/gcc.bldconf.
before:

export LDFLAGS="$STATIC -pthread -L${CC_HOME}/lib"

after:

export LDFLAGS="$STATIC -pthread"

2.12: glib: add --host and --build in pkgs/libs/glib/parsec/gcc.bldconf.

2.12.1
before:

# Arguments to pass to the configure script, if it exists

build_conf="--disable-shared --enable-threads --with-threads=posix"

after:

# Arguments to pass to the configure script, if it exists

build_conf="--disable-shared --enable-threads --with-threads=posix --build=i686-linux-gnu --host=arm-linux-gnueabi"
2.12.2 In pkgs/libs/glib/src/configure, add the following lines at line 43:

ac_cv_func_posix_getpwuid_r=no
glib_cv_stack_grows=no
glib_cv_uscore=no
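These cache variables are needed because configure cannot run its probes on the build machine when cross-compiling. For glib_cv_stack_grows, the probe is roughly the following: compare the address of a local in a callee with one in its caller. This is a sketch modeled on that idea, not glib's exact test; comparing addresses of unrelated objects is not strictly portable C, which is acceptable for a configure-style probe. On x86 and ARM Linux the stack grows downward, which matches the preset value no.

```c
#include <assert.h>
#include <stdint.h>

static volatile uintptr_t caller_addr, callee_addr;

/* noinline keeps the callee in its own, deeper stack frame. */
static __attribute__((noinline)) void callee(void)
{
    volatile int x = 5;
    callee_addr = (uintptr_t)&x;
}

/* Returns 1 if the stack grows toward higher addresses, 0 otherwise
 * (0 corresponds to glib_cv_stack_grows=no). */
static __attribute__((noinline)) int stack_grows_up(void)
{
    volatile int y = 7;
    caller_addr = (uintptr_t)&y;
    callee();
    return callee_addr > caller_addr;
}
```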

2.13: dedup: OK. Depends on ssl, see 2.14.
2.14: ssl: OK.
2.14.1 Change gcc to arm-linux-gnueabi-gcc in pkgs/libs/ssl/src/Configure.pl, line 323.
before:

"linux-generic32","gcc-:-DTERMIO -O3 -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",

after:

"linux-generic32","arm-linux-gnueabi-gcc-:-DTERMIO -O3 -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",

2.14.2 Comment out line 975:

before:

$cflags .= " -m32 ";

after:

#$cflags .= " -m32 ";

2.15: streamcluster: OK.
2.16: canneal: OK, but it needs a new pkgs/kernels/canneal/src/atomic/arm/atomic.h.
2.16.1: In pkgs/kernels/canneal/src/atomic/atomic.h, add the following lines:
before:

#elif defined(__alpha__) || defined(__alpha) || defined(alpha) || defined(__ALPHA__)
# include "alpha/atomic.h"
#else
# error Architecture not supported by atomic.h
#endif

after:

#elif defined(__alpha__) || defined(__alpha) || defined(alpha) || defined(__ALPHA__)
# include "alpha/atomic.h"
#elif defined(__arm__) || defined(__arm) || defined(arm) || defined(__ARM__)
# include "arm/atomic.h"
#else
# error Architecture not supported by atomic.h
#endif

2.16.2: Download ftp://ftp.tw.freebsd.org/pub/FreeBSD-current/src/sys/arm/include/atomic.h and save it as pkgs/kernels/canneal/src/atomic/arm/atomic.h.

2.16.3: Add the following lines at line 49:

before:

#ifndef _KERNEL
#include
#endif

#ifndef I32_bit

after:

#ifndef _KERNEL
#include
#endif

#define ARM_VECTORS_HIGH 0xffff0000U
#define ARM_TP_ADDRESS (ARM_VECTORS_HIGH + 0x1000)
#define ARM_RAS_START (ARM_TP_ADDRESS + 4)
#define ARM_RAS_END (ARM_TP_ADDRESS + 8)

#ifndef I32_bit

2.16.4: Add the following line at line 353:

before:

#define atomic_store_rel_ptr atomic_store_ptr

after:

#define atomic_store_rel_ptr atomic_store_ptr

#define atomic_load_acq_ptr atomic_load_acq_long
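An alternative to importing the FreeBSD header (not what this log did) would be to implement the handful of FreeBSD-style atomics canneal uses on top of GCC's __sync builtins. This is a hypothetical sketch: the function names follow the FreeBSD atomic(9) naming seen above, but the exact argument types canneal passes may differ, and __sync_synchronize() is a full barrier, stronger than strictly needed for acquire/release.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if *p was equal to cmp and has been replaced by val. */
static inline int atomic_cmpset_ptr(volatile uintptr_t *p, uintptr_t cmp,
                                    uintptr_t val)
{
    return __sync_bool_compare_and_swap(p, cmp, val);
}

/* Load, then barrier: later accesses cannot move before the load. */
static inline uintptr_t atomic_load_acq_ptr(volatile uintptr_t *p)
{
    uintptr_t v = *p;
    __sync_synchronize();
    return v;
}

/* Barrier, then store: earlier accesses cannot move after the store. */
static inline void atomic_store_rel_ptr(volatile uintptr_t *p, uintptr_t v)
{
    __sync_synchronize();
    *p = v;
}
```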

Conclusion:
12 of 13 benchmarks built successfully.
Failed applications:
1. raytrace, which depends on several X libraries that would need to be cross-compiled for ARM.
Native run: ferret failed
canneal: segfault
dedup: malloc failure
