Tuesday, January 29, 2013

IBM Developer Works

http://www.ibm.com/developerworks/

Sunday, January 27, 2013

Indirect branch profiling:

1. What is the frequency of 1-target indirect branches in benchmarks?

1-target indirect branch:
An indirect branch instruction that has only one target during execution. More generally, an n-target indirect branch is one that has n distinct targets during execution.

Frequency:
Assume 100 guest indirect branches are executed (dynamic count) during a run, and 8 of them are 1-target indirect branches. The frequency is then 8/100 = 8%.
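As a rough sketch of how these numbers can be computed (the trace format here is hypothetical: a list of (branch_pc, target) pairs, one per executed indirect branch), the n-target distribution over dynamic counts looks like this:

```python
from collections import defaultdict

def n_target_stats(trace):
    """trace: list of (branch_pc, target) pairs, one per executed indirect branch.
    Returns {n: fraction of dynamic branches whose site has n targets overall}."""
    targets = defaultdict(set)   # branch_pc -> set of observed targets
    for pc, tgt in trace:
        targets[pc].add(tgt)
    counts = defaultdict(int)    # n -> dynamic count of branches at n-target sites
    for pc, _ in trace:
        counts[len(targets[pc])] += 1
    total = len(trace)
    return {n: c / total for n, c in counts.items()}

# Example: 10 executed branches; the site at 0x10 always jumps to one target.
trace = [(0x10, 0xA)] * 8 + [(0x20, 0xB), (0x20, 0xC)]
dist = n_target_stats(trace)
print(dist[1])  # fraction of dynamic 1-target branches: 0.8
```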

N-target indirect branch frequency distribution of 400.perlbench with the diffmail.pl test input:

This does not seem to be very useful information.

2. What is the hit ratio of last-target prediction for indirect branches?

last-target prediction:
Predict the next target of an indirect branch to be its last observed target.
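A minimal simulation of last-target prediction, using the same hypothetical (branch_pc, target) trace format as above:

```python
def last_target_hit_ratio(trace):
    """Predict each indirect branch's next target as its last observed target."""
    last = {}   # branch_pc -> last observed target
    hits = 0
    for pc, tgt in trace:
        if last.get(pc) == tgt:   # prediction hit
            hits += 1
        last[pc] = tgt            # update predictor state
    return hits / len(trace)

# A branch alternating between two targets never hits with last-target prediction.
print(last_target_hit_ratio([(0x10, 0xA), (0x10, 0xB)] * 4))  # 0.0
# A monomorphic branch hits on every execution after the first.
print(last_target_hit_ratio([(0x10, 0xA)] * 4))  # 0.75
```

This is why RET and INDIRECT_CALL behave so differently in the numbers below: a call site that always dispatches to the same callee is effectively monomorphic, while a return that interleaves callers alternates targets.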

400.perlbench with ref. input diffmail.pl:

Overall 73.63%
RET 93.99%
INDIRECT_CALL 20.45%
UNCOND_INDIRECT_JMP 48.98%

gcc with ref. input 166.i:
Overall 64.77%
RET 61.08%
INDIRECT_CALL 97.93%
UNCOND_INDIRECT_JMP 66.35%

gcc with ref. input scilab.i:
Overall 58.93%
RET 61.54%
INDIRECT_CALL 92.98%
UNCOND_INDIRECT_JMP 44.43%

445.gobmk with ref. input nngs.tst:
Overall 56.17%
RET 56.01%
INDIRECT_CALL 78.30%
UNCOND_INDIRECT_JMP 67.25%

Mini-cache prediction is not helpful; it degrades performance by about 4%.




Friday, January 25, 2013

Some interesting slides/papers about trace optimization in JVMs

http://researcher.watson.ibm.com/researcher/files/us-pengwu/challeng-potential-trace-compilation.pdf

http://researcher.watson.ibm.com/researcher/files/us-pengwu/oopsla111-wu.pdf

http://researcher.watson.ibm.com/researcher/files/us-pengwu/UIUC-Seminar-Scripting-Languages-05-03.pdf

SINOF: A dynamic-static combined framework for dynamic binary translation
http://dl.acm.org/citation.cfm?id=2350593&CFID=174278273&CFTOKEN=47796794

Similar to a persistent code cache: previously compiled blocks are saved and reloaded by future runs.
Saved blocks are analyzed and optimized using runtime profiling information.
1. What kinds of analyses and optimizations do they use?
2. What kind of information do they collect at runtime?
3. What is the benefit?
First, they use their own IR, and explain why they use neither LLVM IR nor UQBT IR.
Second, in their evaluation, both the guest ISA and the host ISA are IA32! Yet they still report, on average, 1.38X normalized to native execution time.


A low-overhead dynamic optimization framework for multicores
http://dl.acm.org/citation.cfm?id=2370899&CFID=174278273&CFTOKEN=47796794
I can't tell what they do from the abstract.
A very short paper (2 pages), and I still have no idea what they did.

Adaptive multi-level compilation in a trace-based Java JIT compiler



http://dl.acm.org/citation.cfm?id=2384630&CFID=173817186&CFTOKEN=20831145
An extension of IBM's trace-based JVM work published in CGO 2011.

The "meat" of this paper is trace recompilation via a trace-transition graph.

That is, they select traces to recompile using the trace-transition graph.

First, the recompiled code fragments are still TRACEs, not regions.
Their scenario: the initial trace-building phase may produce short, fragmented traces because of the maximum-trace-length limit (they set the maximum trace length to two).
They would like to merge those fragmented traces into a single longer trace.

The other interesting part is the construction of the trace-transition graph.
The information needed is: 1. the transitions between traces, and 2. the frequency of each transition.
They did not use hardware performance-monitoring information.
Instead, they periodically sample which transition between traces is occurring and record its frequency.
They use a branch-and-link instruction for transitions between traces, rather than a plain jump.
With this approach, the link register records which trace the transition came from.
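A toy sketch of that sampling scheme (trace names, address ranges, and the sample format are all hypothetical): because each trace exits via a branch-and-link, at a sample point the link register points back into the source trace and the current PC points into the destination trace; counting those pairs yields the transition graph.

```python
from collections import Counter

# Hypothetical trace layout: trace id -> (start address, end address) in the code cache.
TRACES = {"T1": (0x1000, 0x10FF), "T2": (0x2000, 0x20FF)}

def trace_of(addr):
    """Map a code-cache address back to the trace containing it."""
    for tid, (lo, hi) in TRACES.items():
        if lo <= addr <= hi:
            return tid
    return None

def build_transition_graph(samples):
    """samples: (link_register, pc) pairs taken at periodic sample points.
    The branch-and-link at a trace exit leaves an address inside the source
    trace in the link register; the pc identifies the destination trace."""
    graph = Counter()
    for lr, pc in samples:
        graph[(trace_of(lr), trace_of(pc))] += 1
    return graph

# Three samples catch T1 -> T2 twice and T2 -> T1 once.
samples = [(0x1040, 0x2010), (0x1040, 0x2010), (0x2080, 0x1008)]
g = build_transition_graph(samples)
print(g[("T1", "T2")], g[("T2", "T1")])  # 2 1
```

Hot edges in this graph then indicate which fragmented traces are worth merging and recompiling as one.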

Sunday, January 13, 2013

work log

TRACE, 16370, 85460d0