- bwaves: performance drops 10%~15% after adding the volatile modifier to loads/stores.
- possible cause: likely related to guest CPU FP register RLSO.
- both trace and procedure show the same effect.
- so far, the only difference seen between the two versions is due to code motion; perhaps we should look at the generated code.
- we have observed over 10% more memory loads for the volatile version.
- it is difficult to pinpoint the exact structural difference.
- so, observe floating-point operations: use FP_COMP_OPS_EXE:X87
- no difference in the number of floating point operations
- the increased memory operations should be the cause of the performance degradation of the volatile version.
- 20.1% performance degradation with volatile; 1390 -> 1661
- 11.86%, i.e. 305,180,492,006, extra memory loads
- 16.35%, i.e. 216,346,459,794, extra memory stores
- RUN both x86 CINT CFP, ARM CINT benchmarks again before
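
As a rough cross-check of the load/store figures above, the extra counts and their percentages can be turned back into implied baseline counts. This is only a sketch: it assumes each percentage is measured relative to the non-volatile run, which the notes do not state, and the names `implied_baseline`, `EXTRA_LOADS` and so on are made up for illustration.

```python
# Figures copied from the notes above.
EXTRA_LOADS = 305_180_492_006    # extra memory loads with volatile
EXTRA_STORES = 216_346_459_794   # extra memory stores with volatile
LOAD_INCREASE = 0.1186           # 11.86%
STORE_INCREASE = 0.1635          # 16.35%


def implied_baseline(extra: int, fraction: float) -> float:
    """Return the baseline count implied by `extra` being `fraction` of it.

    Assumption (not stated in the notes): the percentage is relative
    to the non-volatile run.
    """
    return extra / fraction


baseline_loads = implied_baseline(EXTRA_LOADS, LOAD_INCREASE)
baseline_stores = implied_baseline(EXTRA_STORES, STORE_INCREASE)

print(f"implied baseline loads:  {baseline_loads:.3e}")
print(f"implied baseline stores: {baseline_stores:.3e}")
print(f"extra memory operations: {EXTRA_LOADS + EXTRA_STORES:,}")
```

Under that assumption, both baselines land in the trillions of operations, so the extra traffic is large in absolute terms even though the floating-point operation count is unchanged.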
Friday, January 6, 2012