Thursday, July 26, 2012

Work Flow


  • Measure the cycle count of the TMC fast path
  • Measure TLB misses by type
    • conflict miss
    • compulsory miss
    • capacity miss
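
One way the three miss types above could be counted is the usual "three Cs" trick: replay the same page trace through the real table and through a fully associative LRU model of the same capacity. This is only a sketch. It assumes the TLB is direct-mapped and indexed by the low bits of the virtual page number, and every name in it (MissStats, tlb_access, TLB_BITS, MAX_PAGES) is made up for illustration, not taken from QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TLB_BITS  2                  /* tiny table so a trace is easy to follow by hand */
#define TLB_SIZE  (1u << TLB_BITS)
#define MAX_PAGES 1024               /* hypothetical bound on distinct page numbers */

typedef struct {
    uint64_t hits, compulsory, capacity, conflict;
    uint32_t dm_tag[TLB_SIZE];       /* direct-mapped table under test */
    bool     dm_valid[TLB_SIZE];
    uint32_t fa_tag[TLB_SIZE];       /* fully associative LRU model, slot 0 = most recent */
    unsigned fa_count;
    bool     seen[MAX_PAGES];        /* pages referenced at least once */
} MissStats;

static void miss_stats_init(MissStats *s)
{
    memset(s, 0, sizeof(*s));
}

/* Touch vpn in the fully associative LRU model; return true on a hit. */
static bool fa_access(MissStats *s, uint32_t vpn)
{
    unsigned i, pos = 0;
    bool hit = false;

    for (i = 0; i < s->fa_count; i++) {
        if (s->fa_tag[i] == vpn) {
            pos = i;
            hit = true;
            break;
        }
    }
    if (!hit) {
        /* Use a free slot if there is one, otherwise overwrite the LRU slot. */
        pos = (s->fa_count < TLB_SIZE) ? s->fa_count++ : TLB_SIZE - 1;
    }
    /* Shift slots [0, pos) down by one and put vpn at the front. */
    memmove(&s->fa_tag[1], &s->fa_tag[0], pos * sizeof(s->fa_tag[0]));
    s->fa_tag[0] = vpn;
    return hit;
}

/* Record one TLB lookup for virtual page number vpn. */
static void tlb_access(MissStats *s, uint32_t vpn)
{
    unsigned idx;
    bool dm_hit, fa_hit;

    assert(vpn < MAX_PAGES);
    idx = vpn & (TLB_SIZE - 1);
    dm_hit = s->dm_valid[idx] && s->dm_tag[idx] == vpn;
    fa_hit = fa_access(s, vpn);

    if (dm_hit) {
        s->hits++;
    } else if (!s->seen[vpn]) {
        s->compulsory++;             /* first reference to this page */
    } else if (!fa_hit) {
        s->capacity++;               /* even full associativity would have missed */
    } else {
        s->conflict++;               /* only the direct-mapped indexing missed */
    }

    s->seen[vpn] = true;
    s->dm_valid[idx] = true;         /* refill the direct-mapped slot */
    s->dm_tag[idx] = vpn;
}
```

With TLB_BITS = 2, pages 0 and 4 share slot 0, so a trace like 0, 4, 0 produces two compulsory misses followed by one conflict miss.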
=======================================================================
  • Optimization 2 seems useless
    • no performance gain in my implementation.
  • Larger TLB table: 2^8 -> 2^16 entries
    • 2^11 entries (about 64K bytes) gives the best performance gain.
    • Performance drops beyond 2^12; at 2^15 and 2^16 it is about 40% worse on GCC.
    • Why?
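
A quick sanity check on the sizes above. The 32-byte entry size is an assumption, inferred only from "2^11 is about 64K bytes"; ENTRY_BYTES and tlb_table_bytes are names made up for this sketch. The figure is for a single table, so if there is one table per MMU mode the total is a multiple of it.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRY_BYTES 32u   /* assumption: inferred from "2^11 entries is about 64K bytes" */

/* Bytes occupied by one TLB table with 2^size_bits entries. */
static size_t tlb_table_bytes(unsigned size_bits)
{
    assert(size_bits < 32);
    return ((size_t)1 << size_bits) * ENTRY_BYTES;
}
```

Under that assumption, 2^8 entries take 8 KiB, 2^11 take 64 KiB, and 2^16 take 2 MiB, so any flush that clears the whole table with memset touches that many bytes each time.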
========================================================================
Enable Run-time Optimizations on Cross-ISA System

  • Cross-Page Block Linking:
    • Other approaches?
      • reverse physical page mapping
      • Check all conditions
  • Large TLB Size
    • System time increases with table size
    • Probably related to TLB flush
    • Use a structure-of-arrays layout for the TLB: TBD
  • Combine Big Lv2 Cache + Victim Cache
    • Hash Table of Linked List
    • LRU replacement
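
A minimal sketch of the "hash table of linked lists with LRU replacement" idea, to make the structure concrete. It assumes LRU is kept per bucket (hits move to the front, the tail is recycled when a chain is full) and that an entry is just a virtual-to-physical page pair. All names and sizes (L2Tlb, l2_lookup, l2_insert, L2_CHAIN_MAX) are hypothetical and are not taken from QEMU or from my implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define L2_BUCKET_BITS 4
#define L2_BUCKETS     (1u << L2_BUCKET_BITS)
#define L2_CHAIN_MAX   4             /* max entries per bucket (assumed policy) */

typedef struct L2Node {
    uint32_t vpn;                    /* virtual page number (key) */
    uint32_t ppn;                    /* physical page number (value) */
    struct L2Node *next;
} L2Node;

typedef struct {
    L2Node *bucket[L2_BUCKETS];      /* each chain is ordered most recent first */
    L2Node  pool[L2_BUCKETS * L2_CHAIN_MAX];
    size_t  pool_used;
} L2Tlb;

static void l2_init(L2Tlb *t)
{
    unsigned i;
    for (i = 0; i < L2_BUCKETS; i++) {
        t->bucket[i] = NULL;
    }
    t->pool_used = 0;
}

/* Look up vpn; on a hit, move the node to the front of its chain. */
static bool l2_lookup(L2Tlb *t, uint32_t vpn, uint32_t *ppn)
{
    L2Node **head = &t->bucket[vpn & (L2_BUCKETS - 1)];
    L2Node **link;

    for (link = head; *link; link = &(*link)->next) {
        L2Node *n = *link;
        if (n->vpn == vpn) {
            *link = n->next;         /* unlink */
            n->next = *head;         /* relink at the front */
            *head = n;
            *ppn = n->ppn;
            return true;
        }
    }
    return false;
}

/* Insert or update vpn; if the chain is full, recycle its LRU (tail) node. */
static void l2_insert(L2Tlb *t, uint32_t vpn, uint32_t ppn)
{
    L2Node **head = &t->bucket[vpn & (L2_BUCKETS - 1)];
    L2Node **link;
    L2Node *n = NULL;
    unsigned len = 0;

    for (link = head; *link; link = &(*link)->next) {
        bool is_full_tail = (*link)->next == NULL && ++len == L2_CHAIN_MAX;
        if ((*link)->vpn == vpn || is_full_tail) {
            n = *link;
            *link = n->next;         /* unlink the node we are about to reuse */
            break;
        }
        if ((*link)->next != NULL) {
            len++;
        }
    }
    if (n == NULL) {
        /* Chain is not full, so the pool cannot be exhausted. */
        assert(t->pool_used < L2_BUCKETS * L2_CHAIN_MAX);
        n = &t->pool[t->pool_used++];
    }
    n->vpn = vpn;
    n->ppn = ppn;
    n->next = *head;
    *head = n;
}
```

In a victim-cache arrangement, l2_insert would be called with the entry evicted from the first-level TLB, and l2_lookup would be tried on a first-level miss before falling back to the full page-table walk.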
========================================================================
  • question: is opt5 broken?
    • ???
  • Run experiments quickly
  • Experimental data must be turned into charts and saved "right then". Otherwise it is just garbage.
========================================================================
tlb_table initialization problem:
  • First initialized by memset in main->vexpress_a9_init->vexpress_common_init->cpu_init->cpu_arm_init->arm_cpu_reset->tlb_flush
  • Then initialized by memset again in main->qemu_system_reset->do_cpu_reset->arm_cpu_reset
  • Then initialized by memset AGAIN in main->qemu_system_reset->do_cpu_reset->arm_cpu_reset->tlb_flush
