Work Log: 2013

Wednesday, February 27, 2013

http://www.natural-science.or.jp/laboratory/virtual_laboratory001_introduction.html#1

Sunday, February 24, 2013

gnuplot color names

http://www.uni-hamburg.de/Wiss/FB/15/Sustainability/schneider/gnuplot/colors.htm

Wednesday, February 6, 2013

On Stack Replacement

According to this post:

It’s used to convert a running function’s interpreter frame into a JIT’d frame – in the middle of that method.

[On-Stack-Replacement (OSR) compilation was first introduced in the famous hotspot server paper, to the best of my knowledge.]

A simple example copied from the same post:

public static void main(String args[]) {
    S1; i=0;
    loop:
    if(P) goto done
      S3; A[i++];
    goto loop; // <<--- OSR here
    done:
    S2;
}

OSR-compiled function:

void OSR_main() {
    A=value on entry from interpreter;
    i=value on entry from interpreter;
    goto loop;
    loop:
    if(P) goto done
      S3; A[i++];
    goto loop;
    done:
    ...never reached...
}

After OSR_main is compiled, execution transfers from the ``goto loop'' in the interpreter to OSR_main.

But I am not quite clear why the ``done'' part of OSR_main is never reached.
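To make the hand-over concrete, here is a minimal Python sketch of the idea. Everything in it is an assumption for illustration: the names `interpret` and `compiled_loop`, the back-edge counter, and the threshold of 3 are hypothetical and not taken from the post or from HotSpot. The compiled stand-in simply runs the remaining iterations and returns; how a real OSR_main leaves the loop is not modeled.

```python
# Hypothetical OSR sketch: names and threshold are illustrative assumptions.
OSR_THRESHOLD = 3  # back-edges taken before switching to "compiled" code


def compiled_loop(a, i, n):
    """Stand-in for OSR_main: enters mid-loop with the interpreter's live values."""
    while i < n:
        a[i] += 1  # S3; A[i++]
        i += 1
    return i


def interpret(a, n):
    """Stand-in for the interpreter: runs the loop and counts back-edges."""
    i = 0
    back_edges = 0
    while i < n:
        a[i] += 1  # S3; A[i++]
        i += 1
        back_edges += 1  # corresponds to the "goto loop" back-edge
        if back_edges >= OSR_THRESHOLD:
            # OSR point: pass the live state (a, i) to the compiled code
            # instead of continuing in the interpreter frame.
            i = compiled_loop(a, i, n)
            return "osr", i
    return "interpreted", i
```

A short loop finishes in the interpreter, while a long one is handed over at the third back-edge with `a` and `i` carried across, which mirrors the "value on entry from interpreter" lines in OSR_main.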

Sunday, February 3, 2013

Interrupt handling

In emulation, some asynchronous events may arrive in the middle of execution.

Two kinds of asynchronous events are interrupts and exceptions.

The semantics of interrupts are explained in this paper:

First, it is a hardware-supported asynchronous transfer of control to an interrupt vector based on the signaling of some condition external to the processor core. An interrupt vector is a dedicated or configurable location in memory that specifies the address to which execution should jump when an interrupt occurs. Second, an interrupt is the execution of an interrupt handler: code that is reachable from an interrupt vector.

It is irrelevant whether the interrupting condition originates on-chip (e.g., timer expiration) or off-chip (e.g., closure of a mechanical switch). Interrupts usually, but not always, return to the flow of control that was interrupted. Typically an interrupt changes the state of main memory and of device registers, but leaves the main processor context (registers, page tables, etc.) of the interrupted computation undisturbed.
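A minimal Python sketch of how an emulator might model this follows. It is an assumption-laden toy, not the paper's model: the instruction set (`add`, `store`, `iret`, `halt`), the `run` function, and the `irq_at` schedule are all hypothetical. Pending interrupts are checked only at instruction boundaries, the vector table maps an interrupt number to a handler address, and the handler changes memory while leaving the accumulator of the interrupted code untouched. The toy does not save or restore registers; the handler simply never writes them.

```python
# Hypothetical toy machine: addresses map to (opcode, argument) pairs.
PROGRAM = {
    0: ("add", 1),
    1: ("add", 2),
    2: ("halt", None),
    100: ("store", "ticks"),  # interrupt handler body
    101: ("iret", None),      # return to the interrupted flow
}
VECTOR_TABLE = {0: 100}  # interrupt number -> handler address


def run(program, vector_table, irq_at):
    """Run the toy machine; irq_at maps a step number to the IRQ raised before it."""
    pc, acc, memory = 0, 0, {}
    saved_pc = None  # set while a handler is running
    pending = []
    trace = []
    step = 0
    while True:
        if step in irq_at:
            pending.append(irq_at[step])
        # Deliver an interrupt only at an instruction boundary, and only
        # when no other handler is active.
        if pending and saved_pc is None:
            saved_pc = pc
            pc = vector_table[pending.pop(0)]
        op, arg = program[pc]
        trace.append(pc)
        if op == "add":
            acc += arg
            pc += 1
        elif op == "store":
            memory[arg] = memory.get(arg, 0) + 1
            pc += 1
        elif op == "iret":
            pc, saved_pc = saved_pc, None
        elif op == "halt":
            return acc, memory, trace
        step += 1
```

Calling `run(PROGRAM, VECTOR_TABLE, {1: 0})` raises interrupt 0 just before the second instruction, so the handler at address 100 runs between the two `add` instructions and then control returns to address 1.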

Tuesday, January 29, 2013

IBM Developer Works

http://www.ibm.com/developerworks/

Monday, January 28, 2013

Precise Exception Semantics in Dynamic Compilation

Published at CC 2002.

http://dl.acm.org/citation.cfm?id=647478.727930

Slides:

http://nick-black.com/tabpower/cc02_slides.pdf

Paper:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.2274

Sunday, January 27, 2013

Indirect branch profiling:

1. What is the frequency of 1-target indirect branches in the benchmarks?

1-target indirect branch
An indirect branch instruction that has only one target during execution.
More generally, an n-target indirect branch is an indirect branch that has n distinct targets during execution.

Frequency
Assume 100 guest indirect branches are executed (dynamic count), and 8 of those executions come from 1-target indirect branches. The frequency is then 8/100 = 8%.
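The definition above can be sketched in Python as follows. The input format is an assumption: a hypothetical trace of `(branch_pc, target_pc)` pairs, one per executed indirect branch. The function name `n_target_frequency` is also illustrative. Each dynamic execution is attributed to its branch's final number of distinct targets, which is one reading of the definition.

```python
from collections import Counter, defaultdict


def n_target_frequency(trace):
    """Return {n: fraction of dynamic executions from n-target indirect branches}.

    trace is an iterable of (branch_pc, target_pc) pairs (assumed format).
    """
    targets = defaultdict(set)  # branch_pc -> distinct targets seen
    executions = Counter()      # branch_pc -> dynamic execution count
    for branch_pc, target_pc in trace:
        targets[branch_pc].add(target_pc)
        executions[branch_pc] += 1

    total = sum(executions.values())
    per_n = Counter()
    for branch_pc, count in executions.items():
        per_n[len(targets[branch_pc])] += count
    return {n: count / total for n, count in per_n.items()}
```

For example, a trace with 8 executions of a branch that always goes to one target and 92 executions of a branch that alternates between two targets gives a 1-target frequency of 8/100, matching the arithmetic above.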

[Figure: N-target indirect branch frequency distribution of 400.perlbench with the diffmail.pl test input]

This does not seem to be very useful information.

2. What is the hit ratio of last-target prediction for indirect branches?

Last-target prediction:
Predict the next target of an indirect branch to be its most recent target.

Hit ratio by benchmark (ref. input) and branch type:

Benchmark (ref. input)         Overall   RET      INDIRECT_CALL   UNCOND_INDIRECT_JMP
400.perlbench (diffmail.pl)    73.63%    93.99%   20.45%          48.98%
gcc (166.i)                    64.77%    61.08%   97.93%          66.35%
gcc (scilab.i)                 58.93%    61.54%   92.98%          44.43%
445.gobmk (nngs.tst)           56.17%    56.01%   78.30%          67.25%

Mini-cache prediction is not helpful; it degrades performance by about 4%.
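For reference, the hit ratio of last-target prediction as defined above can be measured with a sketch like the following. The trace format, a list of `(branch_pc, target_pc)` pairs, and the function name are assumptions. The first execution of each branch is counted as a miss, which is one possible convention and not necessarily the one used for the numbers above.

```python
def last_target_hit_ratio(trace):
    """Fraction of indirect branch executions whose target equals the
    previous target of the same branch.

    trace is an iterable of (branch_pc, target_pc) pairs (assumed format).
    """
    last_target = {}  # branch_pc -> most recent target
    hits = 0
    total = 0
    for branch_pc, target_pc in trace:
        total += 1
        if branch_pc in last_target and last_target[branch_pc] == target_pc:
            hits += 1
        last_target[branch_pc] = target_pc
    return hits / total if total else 0.0
```

Running it separately on the RET, INDIRECT_CALL, and UNCOND_INDIRECT_JMP subsets of a trace would give a per-type breakdown like the one listed above.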

Friday, January 25, 2013

Some interesting slides/papers about trace optimization in JVM

http://researcher.watson.ibm.com/researcher/files/us-pengwu/challeng-potential-trace-compilation.pdf

http://researcher.watson.ibm.com/researcher/files/us-pengwu/oopsla111-wu.pdf

http://researcher.watson.ibm.com/researcher/files/us-pengwu/UIUC-Seminar-Scripting-Languages-05-03.pdf

SINOF: A dynamic-static combined framework for dynamic binary translation
http://dl.acm.org/citation.cfm?id=2350593&CFID=174278273&CFTOKEN=47796794

Similar to a permanent code cache, previously compiled blocks are saved and then loaded by future runs.
The saved blocks are analyzed and optimized using runtime profiling information.
1. What kinds of analysis and optimization do they use?
2. What kind of information do they collect at runtime?
3. What is the benefit?
First, they use their own IR and explain why they do not use the LLVM or UQBT IR.
Second, in their evaluation both the guest ISA and the host ISA are IA32! Even so, they report an average of 1.38x when normalized to native execution time.

A low-overhead dynamic optimization framework for multicores
http://dl.acm.org/citation.cfm?id=2370899&CFID=174278273&CFTOKEN=47796794
I cannot tell what they do from the abstract.
It is a very short paper (2 pages), and after reading it I still have no idea what they did.

Adaptive multi-level compilation in a trace-based Java JIT compiler

http://dl.acm.org/citation.cfm?id=2384630&CFID=173817186&CFTOKEN=20831145
An extension of IBM's trace-based JVM work published at CGO 2011.

The ``meat'' of this paper is trace recompilation via the Trace-Transition Graph.

That is, they select the traces to be recompiled using the Trace-Transition Graph.

First, the recompiled code fragments are still TRACES, not regions.
Their scenario is that the initial trace-building phase may produce short, fragmented traces because of the limit on maximum trace length; they set max-trace-length to two.
They would like to merge those fragmented traces into one trace.

The other interesting part is the construction of the Trace-Transition Graph.
The information needed is: 1. the transitions between traces, and 2. the frequency of each transition.
They did not use hardware performance monitoring information.
Instead, they periodically check which transition between traces is being taken and record its frequency.
They use a branch-and-link instruction for transitions between traces, rather than a plain jump.
With this approach, the link register records which trace is the source.
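A rough Python sketch of such a graph is below. It only illustrates the data structure described in these notes: sampled (source, destination) transitions with counts. The class and method names, the `min_share` threshold, and the chain-following heuristic for picking traces to merge are my own assumptions, not the paper's algorithm.

```python
from collections import Counter


class TraceTransitionGraph:
    """Sampled transitions between traces, with counts (illustrative only)."""

    def __init__(self):
        self.edges = Counter()  # (source_trace, dest_trace) -> sample count

    def record(self, source, dest):
        """Record one sampled transition; the source would come from the link register."""
        self.edges[(source, dest)] += 1

    def hot_successor(self, source, min_share=0.8):
        """Return the dominant successor of source, or None if none dominates."""
        out = {d: c for (s, d), c in self.edges.items() if s == source}
        total = sum(out.values())
        if total == 0:
            return None
        dest, count = max(out.items(), key=lambda kv: kv[1])
        return dest if count / total >= min_share else None

    def merge_candidate(self, start, max_len=8):
        """Follow dominant transitions from start to build a chain of traces
        that might be merged into one (hypothetical heuristic)."""
        chain = [start]
        while len(chain) < max_len:
            nxt = self.hot_successor(chain[-1])
            if nxt is None or nxt in chain:
                break
            chain.append(nxt)
        return chain
```

With samples where T1 almost always goes to T2 and T2 always goes to T4, `merge_candidate("T1")` returns the chain T1, T2, T4, which is the kind of fragmented sequence the notes describe merging into a single trace.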

Sunday, January 13, 2013

work log

TRACE, 16370, 85460d0
