Friday, January 6, 2012

2012 January 6

  1. bwaves: performance drops 10% to 15% after adding the volatile modifier to loads/stores.
    1. The likely cause is related to guest CPU FP register RLSO.
    2. Both trace and procedure show the same effect.
    3. So far, the only difference seen between the two versions is due to code motion; perhaps we should inspect the generated code.
    4. We have observed over 10% more memory loads in the volatile version.
    5. It is difficult to pin down the exact structural difference.
    6. So, observe floating-point operations: use FP_COMP_OPS_EXE:X87.
    7. No difference in the number of floating-point operations.
    8. The increased memory operations should be the cause of the performance degradation in the volatile version.
      1. 20.1% performance degradation with volatile; 1390 -> 1661
      2. 11.86% extra memory loads, i.e., 305,180,492,006
      3. 16.35% extra memory stores, i.e., 216,346,459,794

  2. Run both the x86 CINT and CFP benchmarks and the ARM CINT benchmarks again before
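
The bwaves figures recorded under item 1 can be cross-checked with a short calculation. This is only a sketch: the notes do not state the units of the two timings, and the code assumes that the load/store percentages are relative to the non-volatile baseline, which the notes do not say explicitly. The helper names are made up for illustration.

```python
# Figures copied from the notes; the units of the two timings are not stated.
BASE_TIME = 1390
VOLATILE_TIME = 1661
EXTRA_LOADS = 305_180_492_006    # noted as 11.86% extra memory loads
EXTRA_STORES = 216_346_459_794   # noted as 16.35% extra memory stores


def relative_increase(before, after):
    """Fractional increase going from `before` to `after`."""
    return (after - before) / before


def implied_baseline(extra, percent):
    """Baseline count, ASSUMING `extra` is `percent` percent of that baseline."""
    return extra / (percent / 100.0)


slowdown = relative_increase(BASE_TIME, VOLATILE_TIME)
base_loads = implied_baseline(EXTRA_LOADS, 11.86)
base_stores = implied_baseline(EXTRA_STORES, 16.35)

print(f"slowdown from the two timings: {slowdown:.1%}")
print(f"implied baseline loads:  {base_loads:.3e}")
print(f"implied baseline stores: {base_stores:.3e}")
```

The two rounded timings give roughly 19.5% rather than the noted 20.1%, so the noted percentage was presumably computed from unrounded measurements or from a different run. The implied baselines give a rough sense of how many loads and stores the non-volatile version executes, which is useful when comparing against counter readings from later runs.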