- measure cycles spent in the fast path of TMC
- measure TLB misses, broken down by type:
- conflict miss
- compulsory miss
- capacity miss
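One possible way to split misses into the three types above: replay the page reference stream against the direct-mapped TLB and, in parallel, against a fully associative LRU table of the same size. This is only a sketch. The struct and function names, the 4-entry size, and the MAX_PAGES bound are all made up for illustration and are not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TLB_BITS  2                  /* hypothetical: 4 entries, demo only */
#define TLB_SIZE  (1u << TLB_BITS)
#define MAX_PAGES 1024               /* hypothetical bound on page numbers */

enum miss_kind { HIT, COMPULSORY, CAPACITY, CONFLICT };

struct tlb_model {
    uint32_t dm_tag[TLB_SIZE];       /* direct-mapped table under test */
    bool     dm_valid[TLB_SIZE];
    uint32_t fa_tag[TLB_SIZE];       /* fully associative LRU reference */
    bool     fa_valid[TLB_SIZE];
    uint64_t fa_stamp[TLB_SIZE];
    bool     seen[MAX_PAGES];        /* page referenced at least once */
    uint64_t clock;
    unsigned long count[4];          /* indexed by enum miss_kind */
};

static void model_init(struct tlb_model *m) { memset(m, 0, sizeof(*m)); }

/* Access the fully associative table; returns true on hit. */
static bool fa_access(struct tlb_model *m, uint32_t page)
{
    unsigned victim = 0;
    m->clock++;
    for (unsigned i = 0; i < TLB_SIZE; i++) {
        if (m->fa_valid[i] && m->fa_tag[i] == page) {
            m->fa_stamp[i] = m->clock;
            return true;
        }
    }
    /* Miss: take the first invalid slot, else the least recently used. */
    for (unsigned i = 0; i < TLB_SIZE; i++) {
        if (!m->fa_valid[i]) { victim = i; break; }
        if (m->fa_stamp[i] < m->fa_stamp[victim]) victim = i;
    }
    m->fa_tag[victim] = page;
    m->fa_valid[victim] = true;
    m->fa_stamp[victim] = m->clock;
    return false;
}

/* Classify one page reference and update both tables. */
static enum miss_kind model_access(struct tlb_model *m, uint32_t page)
{
    assert(page < MAX_PAGES);
    unsigned idx = page & (TLB_SIZE - 1);
    bool dm_hit = m->dm_valid[idx] && m->dm_tag[idx] == page;
    bool fa_hit = fa_access(m, page);
    bool first  = !m->seen[page];
    enum miss_kind k;

    m->seen[page] = true;
    m->dm_tag[idx] = page;
    m->dm_valid[idx] = true;

    if (dm_hit)       k = HIT;
    else if (first)   k = COMPULSORY;  /* never referenced before */
    else if (!fa_hit) k = CAPACITY;    /* even full associativity misses */
    else              k = CONFLICT;    /* only the index mapping is at fault */
    m->count[k]++;
    return k;
}
```

Feeding the guest page number of every softmmu lookup into model_access() would give per-type totals in count[]. The linear scan is far too slow for large tables, so this fits offline trace analysis better than the fast path itself.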
=======================================================================
- Optimization 2 seems useless
- no performance gain in my implementation.
- Large TLB table: tried sizes from 2^8 to 2^16 entries
- 2^11 entries gives the best performance gain (table is about 64K bytes).
- performance drops beyond 2^12; 2^15 and 2^16 are about 40% slower in GCC
- why?
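Quick footprint arithmetic for the sizes above. The 32 bytes per entry is an assumption, inferred only from "2^11 is about 64K bytes"; the real entry size and the number of MMU modes are not recorded in these notes. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: 64 KiB / 2^11 entries = 32 bytes per entry. */
#define ENTRY_BYTES 32u

/* Bytes occupied by one table of 2^bits entries. */
static size_t tlb_table_bytes(unsigned bits)
{
    assert(bits < 32);
    return ((size_t)1 << bits) * ENTRY_BYTES;
}

/* Print the footprint for every size tried in the experiment. */
static void print_footprints(void)
{
    for (unsigned bits = 8; bits <= 16; bits++) {
        printf("2^%u entries: %zu KiB\n", bits, tlb_table_bytes(bits) / 1024);
    }
}
```

Under this assumption 2^12 entries is already 128 KiB and 2^16 is 2 MiB per table, and all of it is rewritten on every full flush. Whether that explains the drop is still an open question and needs measurement.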
Enable Run-time Optimizations on Cross-ISA System
- Cross-Page Block Linking:
- Other approaches?
- reverse physical page mapping
- Check All conditions
- Large TLB Size
- system time increases along with size
- probably related to tlb flush
- use a structure-of-arrays layout for the TLB: TBD
- Combine Big Lv2 Cache + Victim Cache
- Hash Table of Linked List
- LRU replacement
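Rough sketch of the victim cache idea with LRU replacement: a direct-mapped main table backed by a tiny fully associative victim table. All names and sizes here are hypothetical. It is not the QEMU data structure and does not cover the hash-table-of-linked-lists variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAIN_BITS   2                /* hypothetical sizes, demo only */
#define MAIN_SIZE   (1u << MAIN_BITS)
#define VICTIM_SIZE 2

struct entry { uint32_t vpage; uint32_t ppage; bool valid; };

struct vtlb {
    struct entry main_tlb[MAIN_SIZE];     /* direct-mapped */
    struct entry victim[VICTIM_SIZE];     /* fully associative */
    uint64_t victim_stamp[VICTIM_SIZE];   /* last-use time for LRU */
    uint64_t clock;
};

static void vtlb_init(struct vtlb *t) { memset(t, 0, sizeof(*t)); }

/* Store an evicted entry, replacing an invalid slot or the LRU one. */
static void victim_insert(struct vtlb *t, struct entry e)
{
    unsigned slot = 0;
    for (unsigned i = 0; i < VICTIM_SIZE; i++) {
        if (!t->victim[i].valid) { slot = i; break; }
        if (t->victim_stamp[i] < t->victim_stamp[slot]) slot = i;
    }
    t->victim[slot] = e;
    t->victim_stamp[slot] = ++t->clock;
}

/* Install a translation; a displaced main entry moves to the victim table. */
static void vtlb_fill(struct vtlb *t, uint32_t vpage, uint32_t ppage)
{
    struct entry *m = &t->main_tlb[vpage & (MAIN_SIZE - 1)];
    if (m->valid && m->vpage != vpage) {
        victim_insert(t, *m);
    }
    m->vpage = vpage;
    m->ppage = ppage;
    m->valid = true;
}

/* Look up vpage; on a victim hit, swap the entry back into the main table. */
static bool vtlb_lookup(struct vtlb *t, uint32_t vpage, uint32_t *ppage)
{
    assert(t && ppage);
    struct entry *m = &t->main_tlb[vpage & (MAIN_SIZE - 1)];
    if (m->valid && m->vpage == vpage) {
        *ppage = m->ppage;
        return true;
    }
    for (unsigned i = 0; i < VICTIM_SIZE; i++) {
        struct entry *v = &t->victim[i];
        if (v->valid && v->vpage == vpage) {
            struct entry hit = *v;
            *v = *m;                          /* old main entry becomes a victim */
            t->victim_stamp[i] = ++t->clock;
            *m = hit;
            *ppage = hit.ppage;
            return true;
        }
    }
    return false;                             /* real miss: go to the slow path */
}
```

The swap on a victim hit keeps the hot page in the main table, so the extra scan is only paid after a conflict eviction.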
========================================================================
- question: is opt5 broken?
- ???
- Run experiments quickly
- Experiment data must be turned into charts "right then" and saved. Otherwise it is just garbage.
========================================================================
tlb_table initialization problem:
- First initialized by memset in main->vexpress_a9_init->vexpress_common_init->cpu_init->cpu_arm_init->arm_cpu_reset->tlb_flush
- Then initialized by memset again in main->qemu_system_reset->do_cpu_reset->arm_cpu_reset
- Then initialized by memset AGAIN in main->qemu_system_reset->do_cpu_reset->arm_cpu_reset->tlb_flush