- opt0: rearrange assembly instructions
- opt1: sink all miss blocks
- opt2: move redundant dirty stores to slow path
- opt3: victim tlb cache
- could have variations
- opt4: cross-page block linking
- opt5: indirect branch target caching
- opt6: enlarge TLB table size
- opt7: TLB mini buffer to reduce fast path cycles; probably failed
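To make opt3 concrete, here is a minimal, self-contained sketch of a victim TLB. It is a toy model, not QEMU code: the names `ToyTLB`, `tlb_lookup`, `toy_walk`, the table sizes, and the round-robin replacement policy are all assumptions made for illustration. The idea it shows is that an entry evicted from the direct-mapped main table is parked in a small fully associative buffer, so a conflict miss can be served by a swap instead of a full refill.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BITS   12
#define TLB_SIZE    256         /* direct-mapped main TLB (assumed size) */
#define VTLB_SIZE   8           /* small fully associative victim TLB (assumed size) */
#define INVALID_TAG UINT64_MAX  /* never page-aligned, so it never matches a page */

typedef struct {
    uint64_t tag;     /* page-aligned virtual address, or INVALID_TAG */
    uint64_t addend;  /* hypothetical translation payload */
} TLBEntry;

typedef struct {
    TLBEntry main_tlb[TLB_SIZE];
    TLBEntry victim[VTLB_SIZE];
    unsigned victim_next;               /* round-robin replacement cursor */
    unsigned hits, victim_hits, misses; /* statistics for the toy model */
} ToyTLB;

static void tlb_reset(ToyTLB *t)
{
    memset(t, 0, sizeof(*t));
    for (int i = 0; i < TLB_SIZE; i++)  t->main_tlb[i].tag = INVALID_TAG;
    for (int i = 0; i < VTLB_SIZE; i++) t->victim[i].tag = INVALID_TAG;
}

static uint64_t page_of(uint64_t vaddr)
{
    return vaddr & ~((UINT64_C(1) << PAGE_BITS) - 1);
}

static unsigned index_of(uint64_t vaddr)
{
    return (unsigned)((vaddr >> PAGE_BITS) & (TLB_SIZE - 1));
}

/* Stand-in for the expensive page table walk (purely hypothetical). */
static uint64_t toy_walk(uint64_t page)
{
    return page + 0x40000000u;
}

static uint64_t tlb_lookup(ToyTLB *t, uint64_t vaddr)
{
    uint64_t page = page_of(vaddr);
    TLBEntry *e = &t->main_tlb[index_of(vaddr)];

    /* Fast path: hit in the direct-mapped table. */
    if (e->tag == page) {
        t->hits++;
        return e->addend;
    }

    /* Slow path, step 1: search the victim TLB and swap on a hit. */
    for (int i = 0; i < VTLB_SIZE; i++) {
        if (t->victim[i].tag == page) {
            TLBEntry tmp = *e;
            *e = t->victim[i];
            t->victim[i] = tmp;
            t->victim_hits++;
            return e->addend;
        }
    }

    /* Slow path, step 2: full miss. Park the old entry, then refill. */
    if (e->tag != INVALID_TAG) {
        t->victim[t->victim_next] = *e;
        t->victim_next = (t->victim_next + 1) % VTLB_SIZE;
    }
    e->tag = page;
    e->addend = toy_walk(page);
    t->misses++;
    return e->addend;
}
```

In this sketch, two pages that map to the same main-table slot keep evicting each other, but after the first two full misses every further conflict is resolved by a swap with the victim buffer rather than another walk. The victim search sits entirely on the slow path, so the fast path is unchanged.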
=============================================================
- performance reduction?
- baseline performance?
- code_read TLB access defined in exec-all.h
- quick thought as follows
- split code access and data access, similar to i-Cache and d-Cache
- Not worth it, too much work, too little gain.
- move redundant stores to miss block:
- restore clobber flag for qemu_ld and qemu_st
- so before qemu_ld/st, dirty states will now be stored back to memory.
- and we only need to store what is NOT redundant.
- This is true for globals, what about temporaries?
- About temporaries: they only need to be stored in the miss block.
- So temporaries are stored in the miss block, and we only need to consider global variables.
- DO IT!
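The effect of moving dirty stores into the miss block can be illustrated with a small runtime model. This is only one reading of the notes above, and it is not TCG code: the real decision is made at code generation time, whereas this sketch simulates it at run time. `ToyState`, `set_global`, `sync_dirty_globals`, and `run` are hypothetical names. The baseline writes dirty globals back before every memory op; the sunk variant writes them back only when the op takes the slow (miss) path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NB_GLOBALS 4

typedef struct {
    uint32_t reg[NB_GLOBALS];  /* value cached in a host register */
    uint32_t mem[NB_GLOBALS];  /* backing slot in the CPU state */
    bool dirty[NB_GLOBALS];    /* reg differs from mem */
    unsigned stores;           /* number of write-backs performed */
} ToyState;

/* Update a global in its register and mark it dirty. */
static void set_global(ToyState *s, int i, uint32_t v)
{
    s->reg[i] = v;
    s->dirty[i] = true;
}

/* Write every dirty global back to memory. */
static void sync_dirty_globals(ToyState *s)
{
    for (int i = 0; i < NB_GLOBALS; i++) {
        if (s->dirty[i]) {
            s->mem[i] = s->reg[i];
            s->dirty[i] = false;
            s->stores++;
        }
    }
}

/* Baseline: sync before every memory op, hit or miss. */
static void mem_op_baseline(ToyState *s, bool tlb_hit)
{
    (void)tlb_hit;
    sync_dirty_globals(s);
}

/* Sunk: sync only in the miss block, where a helper may inspect CPU state. */
static void mem_op_sunk(ToyState *s, bool tlb_hit)
{
    if (!tlb_hit) {
        sync_dirty_globals(s);
    }
}

/* Run n iterations of "modify global 0, then do a memory op".
 * Only the last op misses the TLB. Returns the number of stores. */
static unsigned run(ToyState *s, int n, bool sunk)
{
    memset(s, 0, sizeof(*s));
    for (int i = 0; i < n; i++) {
        bool hit = (i != n - 1);
        set_global(s, 0, (uint32_t)i);
        if (sunk) {
            mem_op_sunk(s, hit);
        } else {
            mem_op_baseline(s, hit);
        }
    }
    return s->stores;
}
```

With ten iterations, the baseline performs ten write-backs while the sunk variant performs one, and both leave the same value in memory once the miss has been handled. The model deliberately ends on a miss; a real translator would also have to write dirty globals back at the end of the block.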
- Optimization 2: Boot Test:
- opt2: OK
- opt2+opt4: OK
- opt1+opt2+opt4: OK
- opt1+opt2+opt3+opt4: OK, NOT SO SURE....
- opt1+opt2+opt3+opt4+opt5: NOT OK
- opt3: OK