- Implement Opt2 for qemu_st
- go home!
Thursday, June 28, 2012
Work Log
- About zero-length arrays: http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
- https://wiki.linaro.org/PeterMaydell/QemuVersatileExpress Linaro QEMU V.Express support
- (IMG=vexpress.img ; if [ -e "$IMG" ] ; then sudo mount -o loop,offset="$(file "$IMG" | awk 'BEGIN { RS=";"; } /partition 2/ { print $7*512; }')" -t auto "$IMG" /mnt/mnt; else echo "$IMG not found"; fi )
- Linaro Android QEMU V.Express: https://wiki.linaro.org/KenWerner/Sandbox/AndroidQEMU
- vmlinuz and initrd.gz are wrapped inside uImage and uInitrd: use dd if=uImage skip=64 bs=1 to strip the 64-byte U-Boot header and extract them
- Use reboot to shut down Android
- ARM-VExpress image: http://releases.linaro.org/12.05/ubuntu/vexpress/
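For the zero-length array note above, a minimal sketch of the idiom in C. The names packet and packet_new are made up for illustration and do not come from QEMU. data[0] is the GNU extension described on the linked GCC page; C99 spells the same thing data[].

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length record; not a QEMU type. */
struct packet {
    size_t len;
    unsigned char data[0];  /* GNU zero-length array: adds no size to the struct */
};

/* Allocate the header and the payload in one block and copy the payload in. */
struct packet *packet_new(const void *src, size_t len)
{
    assert(src != NULL);
    struct packet *p = malloc(sizeof(*p) + len);
    if (p == NULL)
        return NULL;
    p->len = len;
    memcpy(p->data, src, len);
    return p;
}
```

A caller would do something like packet_new("abc", 3) and release the result with a single free().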
Wednesday, June 27, 2012
Thinking Flow
- In tcg_liveness_analysis, if the opcode is qemu_ld or qemu_st, set all globals alive.
- qemu_ld is OK now.
- restore to morning status
- qemu 1.01 exits when the guest powers off
- chaos
- qemu_st: fails with no output; trapped in some sort of loop?
- just forgot what I was going to do after viewing some web pages....
Friday, June 22, 2012
Thinking Flow
- I modified the conditional branch of load_tlb from JNE to JE, along with the related code.
- Running this in the original QEMU still boots ARM-Linux, so far so good.
- I changed the location of SAVE_DIRTY_STATES; call this version QEMU_TK.
- QEMU_TK dies after the first page fault happens.
- Question: what's the difference between QEMU and QEMU_TK?
- difference means the state of
- OK, we need to restore states back
Thursday, June 21, 2012
Thinking Flow
study qemu code: tcg_reg_alloc_op(): 1708
- What is fixed_reg in TCGTemp?
- in tcg_global_reg_new_internal, fixed_reg is set 1
- in tcg_global_mem_new_internal, fixed_reg is set 0
- so it seems to indicate whether this temp is bound to a fixed host register or not
- In TCGContext, what is reg_to_temp?
- in tcg_reg_alloc(), s->reg_to_temp[reg] decides whether the HOST register is mapped to any TCGTemp.
- So I think reg_to_temp[reg] names the TCGTemp that the HOST register reg currently holds.
- What is val_type in TCGTemp?
- NOT CLEAR
- It seems it indicates the current type of this temp.
- It is possible that ts->fixed_reg && ts->val_type == TEMP_VAL_MEM; or NOT ts->fixed_reg and ts->val_type == TEMP_VAL_REG.
- When is args_ct (TCGArgDef) set?
TCG: tcg_gen_code_common
In TCGContext:
/* liveness analysis */
uint16_t *op_dead_iargs;
/* for each operation, each bit tells if the corresponding input argument is dead */
What is tcg_op_defs?
In: tcg_liveness_analysis, tcg/tcg.c: 1187
backward scan
NOTE: tcg_opc.h: definition of TCG opcodes (a.k.a TCG IR)
So, remove qemu_ld/st TCG_OPF_CALL_CLOBBER here
In tcg_liveness_analysis:
1292 } else if (def->flags & TCG_OPF_CALL_CLOBBER) {
1293 /* globals are live */
1294 memset(dead_temps, 0, s->nb_globals);
1295 }
Question: if we remove TCG_OPF_CALL_CLOBBER of qemu_ld/st, will this be a problem?
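To see what the memset in the excerpt buys, here is a toy backward liveness scan. It is a sketch only: toy_op, OPF_CALL_CLOBBER, NB_GLOBALS and toy_liveness are invented names, each op has at most one input and one output global, and nothing here is QEMU's actual data layout. It only mirrors the order of steps in the excerpt: mark the output dead, revive all globals for a clobbering op, then record whether the input dies.

```c
#include <assert.h>
#include <string.h>

#define NB_GLOBALS 4        /* hypothetical number of globals */
#define OPF_CALL_CLOBBER 1  /* stands in for TCG_OPF_CALL_CLOBBER */

struct toy_op {
    int flags;  /* OPF_CALL_CLOBBER or 0 */
    int out;    /* global written by the op, or -1 */
    int in;     /* global read by the op, or -1 */
};

/* Scan the ops backward; in_dies[i] becomes 1 when the input of op i
 * is not needed by any later op. */
void toy_liveness(const struct toy_op *ops, int n, unsigned char *in_dies)
{
    unsigned char dead[NB_GLOBALS];

    assert(n >= 0);
    memset(dead, 0, sizeof(dead));      /* globals are live at block end */
    for (int i = n - 1; i >= 0; i--) {
        const struct toy_op *op = &ops[i];

        if (op->out >= 0)
            dead[op->out] = 1;          /* old value is overwritten here */
        if (op->flags & OPF_CALL_CLOBBER)
            memset(dead, 0, sizeof(dead));  /* globals are live */
        in_dies[i] = 0;
        if (op->in >= 0) {
            in_dies[i] = dead[op->in];
            dead[op->in] = 0;           /* input is needed at this op */
        }
    }
}
```

With two ops, "read g0, write g1" followed by "write g0", the input g0 of the first op is dead when the second op is plain, and stays live when the second op carries the clobber flag. In this model, dropping the flag from qemu_ld/st means earlier values of globals are no longer forced to stay live across the memory access.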
In: tcg_reg_alloc_op:
1708 if (def->flags & TCG_OPF_CALL_CLOBBER) {
1709 /* XXX: permit generic clobber register list ? */
1710 for(reg = 0; reg < TCG_TARGET_NB_REGS; reg++) {
1711 if (tcg_regset_test_reg(tcg_target_call_clobber_regs, reg)) {
1712 tcg_reg_free(s, reg);
1713 }
1714 }
1715 /* XXX: for load/store we could do that only for the slow path
1716 (i.e. when a memory callback is called) */
1717
1718 /* store globals and free associated registers (we assume the insn
1719 can modify any global. */
1720 save_globals(s, allocated_regs);
1721 }
Question: what does Marsalis Wallace look like ? or
What does tcg_reg_free do?
It is called for each register in tcg_target_call_clobber_regs;
if the temp held in reg (s->reg_to_temp[reg]) is not mem_coherent, it stores reg back to memory (env->temp_buf)
Question: what does save_globals do?
- What does ``globals'' mean?
- In tcg/README, A TCG "global" is a variable which is live in all the functions (equivalent of a C global variable). They are defined before the functions defined. A TCG global can be a memory location (e.g. a QEMU CPU register), a fixed host register (e.g. the QEMU CPU state pointer) or a memory location which is stored in a register outside QEMU TBs (not implemented yet).
- call temp_save to save temp
- In temp_save(), save temp to env->temp_buf
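The write-back behaviour described above can be modelled in a few lines. This is a toy sketch, not QEMU code: toy_ctx, toy_temp, toy_reg_free and toy_free_clobbered are invented names, the mem field stands in for the temp's slot in env, and the stores counter exists only so the effect can be observed.

```c
#include <assert.h>

#define NB_REGS  4  /* hypothetical sizes */
#define NB_TEMPS 4

struct toy_temp {
    int mem_coherent;  /* 1 when the memory copy matches the register */
    long mem;          /* stands in for the temp's memory slot */
};

struct toy_ctx {
    long regs[NB_REGS];
    int reg_to_temp[NB_REGS];  /* temp held by each host reg, -1 if free */
    struct toy_temp temps[NB_TEMPS];
    int stores;                /* number of write-backs performed */
};

/* Release one host register, writing its temp back only if memory is stale. */
void toy_reg_free(struct toy_ctx *s, int reg)
{
    assert(reg >= 0 && reg < NB_REGS);
    int t = s->reg_to_temp[reg];
    if (t < 0)
        return;                     /* register holds nothing */
    struct toy_temp *ts = &s->temps[t];
    if (!ts->mem_coherent) {
        ts->mem = s->regs[reg];     /* store the register back to memory */
        ts->mem_coherent = 1;
        s->stores++;
    }
    s->reg_to_temp[reg] = -1;
}

/* Release every register named in a clobber mask, like the loop at line 1710. */
void toy_free_clobbered(struct toy_ctx *s, unsigned clobber_mask)
{
    for (int reg = 0; reg < NB_REGS; reg++)
        if (clobber_mask & (1u << reg))
            toy_reg_free(s, reg);
}
```

Only temps whose memory copy is stale cost a store, which is why moving this work to the slow path (TLB miss) looks attractive.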
==================================================================
tcg_out_op() is called to generate code for the TCG opcode.
We are interested in tcg_out_qemu_ld/st
QUESTION:
Strangely enough, I cannot find the lines that save the guest register state back to its canonical location.
I only saw the save back to temp_buf at line 1708.
That is exactly the place.
==================================================================
Remove TCG_OPF_CALL_CLOBBER from qemu_ld
Move save_dirty_state so that it runs on a TLB miss
The program fails when the first PAGE FAULT occurs.
Should compare REG contents between my version and the original version
==================================================================
Wednesday, June 20, 2012
Tuesday, June 19, 2012
How to change the runlevel through the kernel append parameters
JUST ADD THE RUNLEVEL NUMBER
EXAMPLE:
"root=/dev/sdb1 console=/dev/ttyAMA0 2 "
HOW to mount qcow image used by QEMU
http://blog.loftninjas.org/2008/10/27/mounting-kvm-qcow2-qemu-disk-images/
Friday, June 8, 2012
Wednesday, June 6, 2012
i7 currently RUNNING experiments:
TRACE_MERGE
TRACE
TRACE_NET_ORIG
Each configuration runs 4 benchmark sets: CINT-ARM, CINT-IA32, CFP-IA32, CFP_VECTOR-IA32
Each benchmark runs 5 times.
By this count there are 3 * 4 * 5 = 60 runs; the estimate below uses 120 runs.
Estimated time: 120 * 15000 sec = 1,800,000 sec, about 20.8 days
6/26 will finish all runs!
Producing Wrong Data Without Doing Anything Obviously Wrong
SPECvirt_sc2010
SPECvirt_sc2010: SPEC's first benchmark addressing performance evaluation of datacenter servers used in virtualized server consolidation.
Tuesday, June 5, 2012
statically build OpenMP program
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39176#c7
we have to link pthread ourselves.
Add
-Wl,--whole-archive -lpthread -Wl,--no-whole-archive
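A small OpenMP function that can serve as a smoke test for the static link. This is only a sketch: parallel_sum is a made-up name, and the file name in the command is hypothetical. Called from a main(), it would be built with something like gcc -fopenmp -static test.c followed by the three linker flags above.

```c
#include <assert.h>

/* Sum 0..n-1 with an OpenMP reduction. Without -fopenmp the pragma is
 * ignored and the loop simply runs serially, giving the same result. */
long parallel_sum(int n)
{
    assert(n >= 0);
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;
}
```

If the statically linked binary crashes at startup or inside the parallel region, the pthread symbols were probably not pulled in, which is the problem the bug report describes.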
gcc sse builtin functions
http://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/X86-Built_002din-Functions.html
The following built-in functions are made available by -mmmx. All of them generate the machine instruction that is part of the name.
v8qi __builtin_ia32_paddb (v8qi, v8qi) v4hi __builtin_ia32_paddw (v4hi, v4hi) v2si __builtin_ia32_paddd (v2si, v2si) v8qi __builtin_ia32_psubb (v8qi, v8qi) v4hi __builtin_ia32_psubw (v4hi, v4hi) v2si __builtin_ia32_psubd (v2si, v2si) v8qi __builtin_ia32_paddsb (v8qi, v8qi) v4hi __builtin_ia32_paddsw (v4hi, v4hi) v8qi __builtin_ia32_psubsb (v8qi, v8qi) v4hi __builtin_ia32_psubsw (v4hi, v4hi) v8qi __builtin_ia32_paddusb (v8qi, v8qi) v4hi __builtin_ia32_paddusw (v4hi, v4hi) v8qi __builtin_ia32_psubusb (v8qi, v8qi) v4hi __builtin_ia32_psubusw (v4hi, v4hi) v4hi __builtin_ia32_pmullw (v4hi, v4hi) v4hi __builtin_ia32_pmulhw (v4hi, v4hi) di __builtin_ia32_pand (di, di) di __builtin_ia32_pandn (di,di) di __builtin_ia32_por (di, di) di __builtin_ia32_pxor (di, di) v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi) v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi) v2si __builtin_ia32_pcmpeqd (v2si, v2si) v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi) v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi) v2si __builtin_ia32_pcmpgtd (v2si, v2si) v8qi __builtin_ia32_punpckhbw (v8qi, v8qi) v4hi __builtin_ia32_punpckhwd (v4hi, v4hi) v2si __builtin_ia32_punpckhdq (v2si, v2si) v8qi __builtin_ia32_punpcklbw (v8qi, v8qi) v4hi __builtin_ia32_punpcklwd (v4hi, v4hi) v2si __builtin_ia32_punpckldq (v2si, v2si) v8qi __builtin_ia32_packsswb (v4hi, v4hi) v4hi __builtin_ia32_packssdw (v2si, v2si) v8qi __builtin_ia32_packuswb (v4hi, v4hi)
The following built-in functions are made available either with -msse, or with a combination of -m3dnow and -march=athlon. All of them generate the machine instruction that is part of the name.
v4hi __builtin_ia32_pmulhuw (v4hi, v4hi) v8qi __builtin_ia32_pavgb (v8qi, v8qi) v4hi __builtin_ia32_pavgw (v4hi, v4hi) v4hi __builtin_ia32_psadbw (v8qi, v8qi) v8qi __builtin_ia32_pmaxub (v8qi, v8qi) v4hi __builtin_ia32_pmaxsw (v4hi, v4hi) v8qi __builtin_ia32_pminub (v8qi, v8qi) v4hi __builtin_ia32_pminsw (v4hi, v4hi) int __builtin_ia32_pextrw (v4hi, int) v4hi __builtin_ia32_pinsrw (v4hi, int, int) int __builtin_ia32_pmovmskb (v8qi) void __builtin_ia32_maskmovq (v8qi, v8qi, char *) void __builtin_ia32_movntq (di *, di) void __builtin_ia32_sfence (void)
The following built-in functions are available when -msse is used. All of them generate the machine instruction that is part of the name.
int __builtin_ia32_comieq (v4sf, v4sf) int __builtin_ia32_comineq (v4sf, v4sf) int __builtin_ia32_comilt (v4sf, v4sf) int __builtin_ia32_comile (v4sf, v4sf) int __builtin_ia32_comigt (v4sf, v4sf) int __builtin_ia32_comige (v4sf, v4sf) int __builtin_ia32_ucomieq (v4sf, v4sf) int __builtin_ia32_ucomineq (v4sf, v4sf) int __builtin_ia32_ucomilt (v4sf, v4sf) int __builtin_ia32_ucomile (v4sf, v4sf) int __builtin_ia32_ucomigt (v4sf, v4sf) int __builtin_ia32_ucomige (v4sf, v4sf) v4sf __builtin_ia32_addps (v4sf, v4sf) v4sf __builtin_ia32_subps (v4sf, v4sf) v4sf __builtin_ia32_mulps (v4sf, v4sf) v4sf __builtin_ia32_divps (v4sf, v4sf) v4sf __builtin_ia32_addss (v4sf, v4sf) v4sf __builtin_ia32_subss (v4sf, v4sf) v4sf __builtin_ia32_mulss (v4sf, v4sf) v4sf __builtin_ia32_divss (v4sf, v4sf) v4si __builtin_ia32_cmpeqps (v4sf, v4sf) v4si __builtin_ia32_cmpltps (v4sf, v4sf) v4si __builtin_ia32_cmpleps (v4sf, v4sf) v4si __builtin_ia32_cmpgtps (v4sf, v4sf) v4si __builtin_ia32_cmpgeps (v4sf, v4sf) v4si __builtin_ia32_cmpunordps (v4sf, v4sf) v4si __builtin_ia32_cmpneqps (v4sf, v4sf) v4si __builtin_ia32_cmpnltps (v4sf, v4sf) v4si __builtin_ia32_cmpnleps (v4sf, v4sf) v4si __builtin_ia32_cmpngtps (v4sf, v4sf) v4si __builtin_ia32_cmpngeps (v4sf, v4sf) v4si __builtin_ia32_cmpordps (v4sf, v4sf) v4si __builtin_ia32_cmpeqss (v4sf, v4sf) v4si __builtin_ia32_cmpltss (v4sf, v4sf) v4si __builtin_ia32_cmpless (v4sf, v4sf) v4si __builtin_ia32_cmpunordss (v4sf, v4sf) v4si __builtin_ia32_cmpneqss (v4sf, v4sf) v4si __builtin_ia32_cmpnlts (v4sf, v4sf) v4si __builtin_ia32_cmpnless (v4sf, v4sf) v4si __builtin_ia32_cmpordss (v4sf, v4sf) v4sf __builtin_ia32_maxps (v4sf, v4sf) v4sf __builtin_ia32_maxss (v4sf, v4sf) v4sf __builtin_ia32_minps (v4sf, v4sf) v4sf __builtin_ia32_minss (v4sf, v4sf) v4sf __builtin_ia32_andps (v4sf, v4sf) v4sf __builtin_ia32_andnps (v4sf, v4sf) v4sf __builtin_ia32_orps (v4sf, v4sf) v4sf __builtin_ia32_xorps (v4sf, v4sf) v4sf __builtin_ia32_movss (v4sf, v4sf) v4sf 
__builtin_ia32_movhlps (v4sf, v4sf) v4sf __builtin_ia32_movlhps (v4sf, v4sf) v4sf __builtin_ia32_unpckhps (v4sf, v4sf) v4sf __builtin_ia32_unpcklps (v4sf, v4sf) v4sf __builtin_ia32_cvtpi2ps (v4sf, v2si) v4sf __builtin_ia32_cvtsi2ss (v4sf, int) v2si __builtin_ia32_cvtps2pi (v4sf) int __builtin_ia32_cvtss2si (v4sf) v2si __builtin_ia32_cvttps2pi (v4sf) int __builtin_ia32_cvttss2si (v4sf) v4sf __builtin_ia32_rcpps (v4sf) v4sf __builtin_ia32_rsqrtps (v4sf) v4sf __builtin_ia32_sqrtps (v4sf) v4sf __builtin_ia32_rcpss (v4sf) v4sf __builtin_ia32_rsqrtss (v4sf) v4sf __builtin_ia32_sqrtss (v4sf) v4sf __builtin_ia32_shufps (v4sf, v4sf, int) void __builtin_ia32_movntps (float *, v4sf) int __builtin_ia32_movmskps (v4sf)
The following built-in functions are available when -msse is used.
v4sf __builtin_ia32_loadaps (float *)
- Generates the movaps machine instruction as a load from memory.
void __builtin_ia32_storeaps (float *, v4sf)
- Generates the movaps machine instruction as a store to memory.
v4sf __builtin_ia32_loadups (float *)
- Generates the movups machine instruction as a load from memory.
void __builtin_ia32_storeups (float *, v4sf)
- Generates the movups machine instruction as a store to memory.
v4sf __builtin_ia32_loadsss (float *)
- Generates the movss machine instruction as a load from memory.
void __builtin_ia32_storess (float *, v4sf)
- Generates the movss machine instruction as a store to memory.
v4sf __builtin_ia32_loadhps (v4sf, v2si *)
- Generates the movhps machine instruction as a load from memory.
v4sf __builtin_ia32_loadlps (v4sf, v2si *)
- Generates the movlps machine instruction as a load from memory.
void __builtin_ia32_storehps (v4sf, v2si *)
- Generates the movhps machine instruction as a store to memory.
void __builtin_ia32_storelps (v4sf, v2si *)
- Generates the movlps machine instruction as a store to memory.
The following built-in functions are available when -msse2 is used. All of them generate the machine instruction that is part of the name.
int __builtin_ia32_comisdeq (v2df, v2df) int __builtin_ia32_comisdlt (v2df, v2df) int __builtin_ia32_comisdle (v2df, v2df) int __builtin_ia32_comisdgt (v2df, v2df) int __builtin_ia32_comisdge (v2df, v2df) int __builtin_ia32_comisdneq (v2df, v2df) int __builtin_ia32_ucomisdeq (v2df, v2df) int __builtin_ia32_ucomisdlt (v2df, v2df) int __builtin_ia32_ucomisdle (v2df, v2df) int __builtin_ia32_ucomisdgt (v2df, v2df) int __builtin_ia32_ucomisdge (v2df, v2df) int __builtin_ia32_ucomisdneq (v2df, v2df) v2df __builtin_ia32_cmpeqpd (v2df, v2df) v2df __builtin_ia32_cmpltpd (v2df, v2df) v2df __builtin_ia32_cmplepd (v2df, v2df) v2df __builtin_ia32_cmpgtpd (v2df, v2df) v2df __builtin_ia32_cmpgepd (v2df, v2df) v2df __builtin_ia32_cmpunordpd (v2df, v2df) v2df __builtin_ia32_cmpneqpd (v2df, v2df) v2df __builtin_ia32_cmpnltpd (v2df, v2df) v2df __builtin_ia32_cmpnlepd (v2df, v2df) v2df __builtin_ia32_cmpngtpd (v2df, v2df) v2df __builtin_ia32_cmpngepd (v2df, v2df) v2df __builtin_ia32_cmpordpd (v2df, v2df) v2df __builtin_ia32_cmpeqsd (v2df, v2df) v2df __builtin_ia32_cmpltsd (v2df, v2df) v2df __builtin_ia32_cmplesd (v2df, v2df) v2df __builtin_ia32_cmpunordsd (v2df, v2df) v2df __builtin_ia32_cmpneqsd (v2df, v2df) v2df __builtin_ia32_cmpnltsd (v2df, v2df) v2df __builtin_ia32_cmpnlesd (v2df, v2df) v2df __builtin_ia32_cmpordsd (v2df, v2df) v2di __builtin_ia32_paddq (v2di, v2di) v2di __builtin_ia32_psubq (v2di, v2di) v2df __builtin_ia32_addpd (v2df, v2df) v2df __builtin_ia32_subpd (v2df, v2df) v2df __builtin_ia32_mulpd (v2df, v2df) v2df __builtin_ia32_divpd (v2df, v2df) v2df __builtin_ia32_addsd (v2df, v2df) v2df __builtin_ia32_subsd (v2df, v2df) v2df __builtin_ia32_mulsd (v2df, v2df) v2df __builtin_ia32_divsd (v2df, v2df) v2df __builtin_ia32_minpd (v2df, v2df) v2df __builtin_ia32_maxpd (v2df, v2df) v2df __builtin_ia32_minsd (v2df, v2df) v2df __builtin_ia32_maxsd (v2df, v2df) v2df __builtin_ia32_andpd (v2df, v2df) v2df __builtin_ia32_andnpd (v2df, v2df) v2df __builtin_ia32_orpd (v2df, v2df) 
v2df __builtin_ia32_xorpd (v2df, v2df) v2df __builtin_ia32_movsd (v2df, v2df) v2df __builtin_ia32_unpckhpd (v2df, v2df) v2df __builtin_ia32_unpcklpd (v2df, v2df) v16qi __builtin_ia32_paddb128 (v16qi, v16qi) v8hi __builtin_ia32_paddw128 (v8hi, v8hi) v4si __builtin_ia32_paddd128 (v4si, v4si) v2di __builtin_ia32_paddq128 (v2di, v2di) v16qi __builtin_ia32_psubb128 (v16qi, v16qi) v8hi __builtin_ia32_psubw128 (v8hi, v8hi) v4si __builtin_ia32_psubd128 (v4si, v4si) v2di __builtin_ia32_psubq128 (v2di, v2di) v8hi __builtin_ia32_pmullw128 (v8hi, v8hi) v8hi __builtin_ia32_pmulhw128 (v8hi, v8hi) v2di __builtin_ia32_pand128 (v2di, v2di) v2di __builtin_ia32_pandn128 (v2di, v2di) v2di __builtin_ia32_por128 (v2di, v2di) v2di __builtin_ia32_pxor128 (v2di, v2di) v16qi __builtin_ia32_pavgb128 (v16qi, v16qi) v8hi __builtin_ia32_pavgw128 (v8hi, v8hi) v16qi __builtin_ia32_pcmpeqb128 (v16qi, v16qi) v8hi __builtin_ia32_pcmpeqw128 (v8hi, v8hi) v4si __builtin_ia32_pcmpeqd128 (v4si, v4si) v16qi __builtin_ia32_pcmpgtb128 (v16qi, v16qi) v8hi __builtin_ia32_pcmpgtw128 (v8hi, v8hi) v4si __builtin_ia32_pcmpgtd128 (v4si, v4si) v16qi __builtin_ia32_pmaxub128 (v16qi, v16qi) v8hi __builtin_ia32_pmaxsw128 (v8hi, v8hi) v16qi __builtin_ia32_pminub128 (v16qi, v16qi) v8hi __builtin_ia32_pminsw128 (v8hi, v8hi) v16qi __builtin_ia32_punpckhbw128 (v16qi, v16qi) v8hi __builtin_ia32_punpckhwd128 (v8hi, v8hi) v4si __builtin_ia32_punpckhdq128 (v4si, v4si) v2di __builtin_ia32_punpckhqdq128 (v2di, v2di) v16qi __builtin_ia32_punpcklbw128 (v16qi, v16qi) v8hi __builtin_ia32_punpcklwd128 (v8hi, v8hi) v4si __builtin_ia32_punpckldq128 (v4si, v4si) v2di __builtin_ia32_punpcklqdq128 (v2di, v2di) v16qi __builtin_ia32_packsswb128 (v16qi, v16qi) v8hi __builtin_ia32_packssdw128 (v8hi, v8hi) v16qi __builtin_ia32_packuswb128 (v16qi, v16qi) v8hi __builtin_ia32_pmulhuw128 (v8hi, v8hi) void __builtin_ia32_maskmovdqu (v16qi, v16qi) v2df __builtin_ia32_loadupd (double *) void __builtin_ia32_storeupd (double *, v2df) v2df 
__builtin_ia32_loadhpd (v2df, double *) v2df __builtin_ia32_loadlpd (v2df, double *) int __builtin_ia32_movmskpd (v2df) int __builtin_ia32_pmovmskb128 (v16qi) void __builtin_ia32_movnti (int *, int) void __builtin_ia32_movntpd (double *, v2df) void __builtin_ia32_movntdq (v2df *, v2df) v4si __builtin_ia32_pshufd (v4si, int) v8hi __builtin_ia32_pshuflw (v8hi, int) v8hi __builtin_ia32_pshufhw (v8hi, int) v2di __builtin_ia32_psadbw128 (v16qi, v16qi) v2df __builtin_ia32_sqrtpd (v2df) v2df __builtin_ia32_sqrtsd (v2df) v2df __builtin_ia32_shufpd (v2df, v2df, int) v2df __builtin_ia32_cvtdq2pd (v4si) v4sf __builtin_ia32_cvtdq2ps (v4si) v4si __builtin_ia32_cvtpd2dq (v2df) v2si __builtin_ia32_cvtpd2pi (v2df) v4sf __builtin_ia32_cvtpd2ps (v2df) v4si __builtin_ia32_cvttpd2dq (v2df) v2si __builtin_ia32_cvttpd2pi (v2df) v2df __builtin_ia32_cvtpi2pd (v2si) int __builtin_ia32_cvtsd2si (v2df) int __builtin_ia32_cvttsd2si (v2df) long long __builtin_ia32_cvtsd2si64 (v2df) long long __builtin_ia32_cvttsd2si64 (v2df) v4si __builtin_ia32_cvtps2dq (v4sf) v2df __builtin_ia32_cvtps2pd (v4sf) v4si __builtin_ia32_cvttps2dq (v4sf) v2df __builtin_ia32_cvtsi2sd (v2df, int) v2df __builtin_ia32_cvtsi642sd (v2df, long long) v4sf __builtin_ia32_cvtsd2ss (v4sf, v2df) v2df __builtin_ia32_cvtss2sd (v2df, v4sf) void __builtin_ia32_clflush (const void *) void __builtin_ia32_lfence (void) void __builtin_ia32_mfence (void) v16qi __builtin_ia32_loaddqu (const char *) void __builtin_ia32_storedqu (char *, v16qi) unsigned long long __builtin_ia32_pmuludq (v2si, v2si) v2di __builtin_ia32_pmuludq128 (v4si, v4si) v8hi __builtin_ia32_psllw128 (v8hi, v2di) v4si __builtin_ia32_pslld128 (v4si, v2di) v2di __builtin_ia32_psllq128 (v4si, v2di) v8hi __builtin_ia32_psrlw128 (v8hi, v2di) v4si __builtin_ia32_psrld128 (v4si, v2di) v2di __builtin_ia32_psrlq128 (v2di, v2di) v8hi __builtin_ia32_psraw128 (v8hi, v2di) v4si __builtin_ia32_psrad128 (v4si, v2di) v2di __builtin_ia32_pslldqi128 (v2di, int) v8hi 
__builtin_ia32_psllwi128 (v8hi, int) v4si __builtin_ia32_pslldi128 (v4si, int) v2di __builtin_ia32_psllqi128 (v2di, int) v2di __builtin_ia32_psrldqi128 (v2di, int) v8hi __builtin_ia32_psrlwi128 (v8hi, int) v4si __builtin_ia32_psrldi128 (v4si, int) v2di __builtin_ia32_psrlqi128 (v2di, int) v8hi __builtin_ia32_psrawi128 (v8hi, int) v4si __builtin_ia32_psradi128 (v4si, int) v4si __builtin_ia32_pmaddwd128 (v8hi, v8hi)
The following built-in functions are available when -msse3 is used. All of them generate the machine instruction that is part of the name.
v2df __builtin_ia32_addsubpd (v2df, v2df) v4sf __builtin_ia32_addsubps (v4sf, v4sf) v2df __builtin_ia32_haddpd (v2df, v2df) v4sf __builtin_ia32_haddps (v4sf, v4sf) v2df __builtin_ia32_hsubpd (v2df, v2df) v4sf __builtin_ia32_hsubps (v4sf, v4sf) v16qi __builtin_ia32_lddqu (char const *) void __builtin_ia32_monitor (void *, unsigned int, unsigned int) v2df __builtin_ia32_movddup (v2df) v4sf __builtin_ia32_movshdup (v4sf) v4sf __builtin_ia32_movsldup (v4sf) void __builtin_ia32_mwait (unsigned int, unsigned int)
The following built-in functions are available when -msse3 is used.
v2df __builtin_ia32_loadddup (double const *)
- Generates the movddup machine instruction as a load from memory.
The following built-in functions are available when -m3dnow is used. All of them generate the machine instruction that is part of the name.
void __builtin_ia32_femms (void) v8qi __builtin_ia32_pavgusb (v8qi, v8qi) v2si __builtin_ia32_pf2id (v2sf) v2sf __builtin_ia32_pfacc (v2sf, v2sf) v2sf __builtin_ia32_pfadd (v2sf, v2sf) v2si __builtin_ia32_pfcmpeq (v2sf, v2sf) v2si __builtin_ia32_pfcmpge (v2sf, v2sf) v2si __builtin_ia32_pfcmpgt (v2sf, v2sf) v2sf __builtin_ia32_pfmax (v2sf, v2sf) v2sf __builtin_ia32_pfmin (v2sf, v2sf) v2sf __builtin_ia32_pfmul (v2sf, v2sf) v2sf __builtin_ia32_pfrcp (v2sf) v2sf __builtin_ia32_pfrcpit1 (v2sf, v2sf) v2sf __builtin_ia32_pfrcpit2 (v2sf, v2sf) v2sf __builtin_ia32_pfrsqrt (v2sf) v2sf __builtin_ia32_pfrsqrtit1 (v2sf, v2sf) v2sf __builtin_ia32_pfsub (v2sf, v2sf) v2sf __builtin_ia32_pfsubr (v2sf, v2sf) v2sf __builtin_ia32_pi2fd (v2si) v4hi __builtin_ia32_pmulhrw (v4hi, v4hi)
The following built-in functions are available when both -m3dnow and -march=athlon are used. All of them generate the machine instruction that is part of the name.
v2si __builtin_ia32_pf2iw (v2sf) v2sf __builtin_ia32_pfnacc (v2sf, v2sf) v2sf __builtin_ia32_pfpnacc (v2sf, v2sf) v2sf __builtin_ia32_pi2fw (v2si) v2sf __builtin_ia32_pswapdsf (v2sf) v2si __builtin_ia32_pswapdsi (v2si)
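As a usage sketch for one entry from the list, the function below wraps __builtin_ia32_addps. The names add4 and lane are made up for illustration, and the v4sf typedef uses GCC's vector_size attribute. The preprocessor guard is an assumption on my part: the builtin path is taken only with GCC on an SSE-enabled x86 target, and anywhere else the code falls back to the generic vector + operator so it still compiles.

```c
#include <assert.h>

/* Four packed floats, declared with the GCC vector extension. */
typedef float v4sf __attribute__((vector_size(16)));

/* Add two vectors lane by lane. */
v4sf add4(v4sf a, v4sf b)
{
#if defined(__SSE__) && defined(__GNUC__) && !defined(__clang__)
    return __builtin_ia32_addps(a, b);  /* maps to the addps instruction */
#else
    return a + b;                       /* fallback: generic vector addition */
#endif
}

/* Read one lane of a vector. */
float lane(v4sf v, int i)
{
    assert(i >= 0 && i < 4);
    return v[i];
}
```

For example, adding {1, 2, 3, 4} and {10, 20, 30, 40} with add4 and reading the lanes back with lane gives the element-wise sums.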
Sunday, June 3, 2012
build parsec for ARM
reference document: http://www.cs.utexas.edu/~parsec_m5/TR-09-32.pdf
cross-compilation environment:
1. HOSTTYPE=arm
2. PATH=/path/to/fake/uname/bin:$PATH
content of /path/to/fake/uname/bin/uname:
===============================
$ cat ~/research/benchmarks/parsec-2.1-arm/fake-uname/uname
#!/bin/sh
/bin/uname $* | sed 's/i686/armv7l/g'
===============================
3. cross compilation tools: arm-linux-gnueabi-*
4. host machine is i686
Steps:
1. compile tools natively
$ parsecmgmt -a build -p tools
Note: for now, use native i686 compilation flags in gcc.bldconf
2. compile apps to ARM binary:
1. set BINARY_PREFIX options in gcc.bldconf
2.1 blackscholes : OK
2.2 bodytrack:
2.2.1 In pkgs/apps/bodytrack/src/config.h.in, comment out #undef malloc
before change:
/* Define to rpl_malloc if the replacement function should be used. */
#undef malloc
after change:
/* Define to rpl_malloc if the replacement function should be used. */
//#undef malloc
2.2.2 In pkgs/apps/bodytrack/parsec/gcc-pthread.bldconf, add --host and --build.
before:
# Arguments to pass to the configure script, if it exists
build_conf="--enable-threads --disable-openmp --disable-tbb"
after:
# Arguments to pass to the configure script, if it exists
build_conf="--enable-threads --disable-openmp --disable-tbb --build=i686-linux-gnu --host=arm-linux-gnueabi"
2.3: facesim: OK
2.4: ferret: depends on gsl and imagick, so build them first, see 2.5, and 2.6. OK
2.5: gsl:
2.5.1 In pkgs/libs/gsl/parsec/gcc.bldconf, add --host and --build.
before:
# Arguments to pass to the configure script, if it exists
build_conf="--disable-shared"
after:
# Arguments to pass to the configure script, if it exists
build_conf="--disable-shared --build=i686-linux-gnu --host=arm-linux-gnueabi"
2.6: imagick: In pkgs/libs/imagick/parsec/gcc.bldconf, add --host and --build.
before:
# Arguments to pass to the configure script, if it exists
build_conf="--disable-shared --without-perl --without-magick-plus-plus --without-bzlib --without-dps --without-djvu --without-fpx --without-gslib --without-jbig --with-jpeg --without-jp2 --without-tiff --without-wmf --without-zlib --without-x --without-fontconfig --without-freetype --without-lcms --without-png --without-gvc --without-openexr --without-rsvg --without-xml"
after:
# Arguments to pass to the configure script, if it exists
build_conf="--disable-shared --without-perl --without-magick-plus-plus --without-bzlib --without-dps --without-djvu --without-fpx --without-gslib --without-jbig --with-jpeg --without-jp2 --without-tiff --without-wmf --without-zlib --without-x --without-fontconfig --without-freetype --without-lcms --without-png --without-gvc --without-openexr --without-rsvg --without-xml --build=i686-linux-gnu --host=arm-linux-gnueabi"
2.7: freqmine: OK
2.8: raytrace: SKIP. In order to compile raytrace, libX11 must be cross-compiled which requires cross-compiling the following libraries:
libX11
libXmu
libXext
libxcb
xproto
xextproto
xtrans
libpthread_stubs
libXau
kbproto
inputproto
jpeg
2.9: swaptions: OK
2.10: fluidanimate: OK
2.11: vips: depends on glib and libxml2. libxml2 and vips only need to add --build and --host.
2.11.1: remove -L${CC_HOME}/lib in config/gcc.bldconf
before:
export LDFLAGS="$STATIC -pthread -L${CC_HOME}/lib"
after:
export LDFLAGS="$STATIC -pthread"
2.12: glib: add --host and --build in pkgs/libs/glib/parsec/gcc.bldconf.
2.12.1
before:
# Arguments to pass to the configure script, if it exists
build_conf="--disable-shared --enable-threads --with-threads=posix"
after:
# Arguments to pass to the configure script, if it exists
build_conf="--disable-shared --enable-threads --with-threads=posix --build=i686-linux-gnu --host=arm-linux-gnueabi"
2.12.2 In pkgs/libs/glib/src/configure, add the following lines at line 43:
ac_cv_func_posix_getpwuid_r=no
glib_cv_stack_grows=no
glib_cv_uscore=no
2.13: dedup: OK. depends on ssl, see 2.14
2.14: ssl: OK.
2.14.1 change gcc to arm-linux-gnueabi-gcc in pkgs/libs/ssl/src/Configure.pl line 323
before:
"linux-generic32","gcc-:-DTERMIO -O3 -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
after:
"linux-generic32","arm-linux-gnueabi-gcc-:-DTERMIO -O3 -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
2.14.2 comment out line 975
before:
$cflags .= " -m32 ";
after:
#$cflags .= " -m32 ";
2.15: streamcluster: OK.
2.16: canneal: OK. need pkgs/kernels/canneal/src/atomic/arm/atomic.h.
2.16.1: pkgs/kernels/canneal/src/atomic/atomic.h, add following lines:
before:
#elif defined(__alpha__) || defined(__alpha) || defined(alpha) || defined(__ALPHA__)
# include "alpha/atomic.h"
#else
# error Architecture not supported by atomic.h
#endif
after:
#elif defined(__alpha__) || defined(__alpha) || defined(alpha) || defined(__ALPHA__)
# include "alpha/atomic.h"
#elif defined(__arm__) || defined(__arm) || defined(arm) || defined(__ARM__)
# include "arm/atomic.h"
#else
# error Architecture not supported by atomic.h
#endif
2.16.2: download ftp://ftp.tw.freebsd.org/pub/FreeBSD-current/src/sys/arm/include/atomic.h
and put it at pkgs/kernels/canneal/src/atomic/arm/atomic.h
2.16.3: add following lines at line 49
before:
#ifndef _KERNEL
#include
#endif
#ifndef I32_bit
after:
#ifndef _KERNEL
#include
#endif
#define ARM_VECTORS_HIGH 0xffff0000U
#define ARM_TP_ADDRESS (ARM_VECTORS_HIGH + 0x1000)
#define ARM_RAS_START (ARM_TP_ADDRESS + 4)
#define ARM_RAS_END (ARM_TP_ADDRESS + 8)
#ifndef I32_bit
2.16.4: add following lines at line 353
before:
#define atomic_store_rel_ptr atomic_store_ptr
after:
#define atomic_store_rel_ptr atomic_store_ptr
#define atomic_load_acq_ptr atomic_load_acq_long
conclusion:
12 of 13 benchmarks built successfully.
failed applications:
1. raytrace, depends on several libX libraries, which need to be cross-compiled for ARM.
native run: ferret failed
canneal: segfault
dedup: malloc fail