我覺得我一定做過一個我不記得的夢,
夢中神過來跟我說:
她要結婚了,她想告訴你,這是人的個人意志,我無法插手,
但我可以決定她何時來告訴你比較好,所以來問你說要何時,
以下是我的(神的)分析:
一、9/18那個周末前告訴你,
雖然這樣你的 conference paper 就毀了,
但真慧去新加坡所以不會知道這件事。
二、9/25那個周末告訴你,
雖然你的 conference paper 可能可以投,
但真慧在你旁邊可能會察覺你怪怪的。
你要那個?
顯然我挑了第一個選擇
哈
2010年7月31日 星期六
2010年7月29日 星期四
TODO
1' Debug cc lazy; run perl fail; chop.t
2' Do block linking; let jitter can recompile code;
3' profile time; use previous profile time framework; use tick as time info
4' add entry code to profile block execution; can we have a generic framework for add and retrieve
statical info. how about use MACRO, and use tag to indicate type of statical, such as invoke times,
execution time
2' do block linking; let jitter can recompile code at the same address.
2.1 Locate files where we should modify.
ExecutionEngine/JIT/JITEmitter.cpp
two files
add recompileFunction in JIT.cpp
should
================================
No use; In JITEmitter::startFunction(), CurBufferPtr will be reset;
Need to implement our MemoryManagement
我可以想成這個 jitter 只有我在用,所以我設計一個我自己用的 memory manager
回家!
2' Do block linking; let jitter can recompile code;
3' profile time; use previous profile time framework; use tick as time info
4' add entry code to profile block execution; can we have a generic framework for add and retrieve
statical info. how about use MACRO, and use tag to indicate type of statical, such as invoke times,
execution time
2' do block linking; let jitter can recompile code at the same address.
2.1 Locate files where we should modify.
ExecutionEngine/JIT/JITEmitter.cpp
two files
add recompileFunction in JIT.cpp
should
================================
No use; In JITEmitter::startFunction(), CurBufferPtr will be reset;
Need to implement our MemoryManagement
我可以想成這個 jitter 只有我在用,所以我設計一個我自己用的 memory manager
回家!
2010年7月26日 星期一
Fuck!
TODO:
1. cc-lazy still has trouble running perl, need debug!
2. do block linking
block linking:
1' add an exit block;
2' modify that exit block
change to 20100726
addPointerToBasicBlock doesn't work
it die at DominatorPass, DFSPass,
I don't know why, but since that function canno use, I don't have any reason use llvm-2.8svn
change back to llvm-2.7svn
use svn command:
svn update
svn merge -r HEAD:7381
svn commit -m "Roll back to llvm-2.7svn"
TODO:
1. cc-lazy still has trouble running perl, need debug!
2. do block linking
block linking:
1' add an exit block;
2' modify that exit block
change to 20100726
addPointerToBasicBlock doesn't work
it die at DominatorPass, DFSPass,
I don't know why, but since that function canno use, I don't have any reason use llvm-2.8svn
change back to llvm-2.7svn
use svn command:
svn update
svn merge -r HEAD:7381
svn commit -m "Roll back to llvm-2.7svn"
2010年7月21日 星期三
Hot trace model
today, you need to figure out the model of building a hot trace
and you will have a meeting with Prof. Liu to discuss this.
First, collect the posts of Prof. Liu for hot trace, qutoe as follows.
maybe we can add possibility into this problem.
and you will have a meeting with Prof. Liu to discuss this.
First, collect the posts of Prof. Liu for hot trace, qutoe as follows.
We encountered a hot-trace-finding problem last Thursday. BasicallyThe above discussion did not include possibility of conditional branches.
this problem asks for a good hot trace so that a large number of
execution sequences will remain in the hot trace. Formally we are
given a directed graph G (call graph?) and a set of simple paths P in
G. Each simple path is a sequence of at least two different nodes in
G so that consecutive nodes in the path has a directed edge in G. Note
that we assume that the nodes in a simple path must be different. I am
not sure if this is the case in our system but let us assume it for
the moment. Now we want to find another simple path h (this is the hot
trace we are looking for) so that at least k of the paths in P are
contained in H. A path p is contained in another path q if and only if
q is a substring of p. Now we can define the problem -- given G, P,
and k, is there a hot trace h that cotains at least k paths in P? We
will refere to this problem as k-HOT-TRACE.
It seems that k-HOT-TRACE is NP-complete by reducing from Hamitonian
path. Given a graph H we would like to ask whethere there is a
Hamitonian path in H, we simple transform the priblem instance into a
k-HOT-TRACE problem instance. We simply use H as G, and let P be the
set of path of two nodes, i.e., all edges in G, and set k to be the
number of nodes in G minus 1. As a result if there is a solution for
the (n-1)-HOT-TRACE problem for G, there is a Hamitonian path in H,
and vice versa.
Two followups may be possible. Anyone interested in these please come
talk to me.
There could be an efficient dynamic programming solution when G is a
tree, even when we further restrict the length of the hot trace.
This seems easy. Now the problem is that we limit the length of the
hot trace, and try to find the one that contains the maximum number of
paths. We define a function P(v, l) to be the number of paths that are
contained by the hot trace that ends at v and has length l. It is easy
to write down the recursive formula that P(v, l) = P(w, l-1) + the
number of path that ends at v nad has length no more than l, where w
is the parent of v. We have N times L cells to fill, where N is the
number of tree nodes and L is the maximum hot trace length allowed.
Each cell needs no more than \log n(v) operations where n(v) is the
number of paths end at v. Roughly the total cost is no more than O(N L
\log(N)). Of course this is very rough and I need to work harder to
get a closer bound.
There could be a dynamic programming solution when G is a serial-
parallel graph.
FROM TK:
I think the function P(v, l) should be defined as the MAXIMUM number of paths that are
contained by a hot trace that ends at v and has length l since there are more
than one traces that are end at v and have length l.
Also, about the recursive function, it seems the formula only contains "one half" cases.
It only considers hot traces that pass through w and end at v.
However, there could be other ended-at-v-with-length-l hot traces that are contained only within the subtree rooted at v,
rather passing through v's parent.
However, I still need some time to figure the correct recursive function. come back later.
I write down the problem and recursive function for tree in http://www.iis.sinica.edu.tw/~tk/k-hot-path.pdf
Chun-Chen
NOTE: FROM ICE
In our system based on Qemu with LLVM, we generate the traces(i.e.
dynamic basic blocks)
which have only one entrance on the head of these blocks, so far.
Moreover, especially we want to translate a hot trace with the
optimization options of LLVM such that
the translated instructions may be reordered.
Consequently, the host code of the hot traces cannot have multiple
entrances, either.
In other word, different jump target addresses imply different traces.
Therefore, the same (static) basic blocks may be included in different
hot traces.
Furthermore, the assumption, "the nodes in a simple path must be
different", may not be suitable for our system.
Our approach prefers to construct the hot traces whose lengths are as
long as possible
since we believe that longer traces have more opportunities to be
optimized.
However, the longer traces will incur the more overhead to generate
optimized code.
We should limit the length of a hot trace.
Considering definition of a hot trace, we should consider the
frequency of the execution.
We should not only find the longer hot traces on the CFG but also make
the hot code segments included in the hot traces.
Due to these reasons,
I think that the hot trace problem should be modified as follows to
match the practice of our system.
We still have a directed graph G.
A node in G represents a (static) basic block.
An edge connecting two node in G represents that one of these two node
may be executed after the other one is executed.
An edge in G has a weight to represent the frequency of the execution
between the two nodes connected by this edge.
We limit the lengths of hot traces under a given k to consider the
overhead of generate optimized code.
We want to find the minimum number of the hot traces with length
constraint to include the hot edges
where hot edges are the edges which have the weight larger than a
given threshold.
(Note that the same nodes can be included in different hot traces.)
If we remove the weights on the edges but give each node a weight to
represent the execution frequency of this node, instead,
we may have another similar version of this problem as follows.
We want to find the minimum number of the hot traces with length
constraint to include the hot nodes
where hot nodes are the nodes which have the weight larger than a
given threshold.
ice
Sorry that I do not understand what you are saying here.
Do you mean you want to find a minimum set of length-limited paths
that cover a set of specified (hot) edges?
I will refine the model according to the description of ice. Correct
me if I am wrong.
We will have weight information on edges and nodes, instead of a set
of paths as I described earlier. So we will be given a directed graph
G, a constant l, and a constant k. The edges of G are partitioned into
two subsets -- important edges and unimportant edges. We want to find
a set S of at most k directed paths where each of them is at most of
length l, and the paths in S cover all important edges. We will call
this problem edge-covering-path-set problem. Similarly we can define
node-covering-path-set problem, where we partition nodes in G into
important and unimportant ones, and define the problem similarly.
It is easy to see that node-covering-path-set is NPC, since we simply
make every node important, set k to 1 and l to n - 1, where n is the
number of nodes in G, then we have a Hamiltonian path problem. I are
wondering if edge-covering-path-set is also NPC.
These two problems on tree should be easy as long as the all edges are
directed away from the root. However, this is only theoretically
interesting since in practice the graph will not be a tree.
The edge-covering-path-set problem seems to be NPC as well. Here is
the proof. We will reduce from Hamiltonian path again. Given a
directed graph H as an instance of Hamiltonian path problem, we
construct G as follows. Every node in H node splits into two nodes
called head and tail. We then add an directed edge (called internal
edge) from head to tail. Now if there is an edge from v to w in H, we
put a directed edge from the tail of v to the head of w. Now we make
all internal edges "important", l to be 2n - 1, where n is the number
of nodes in H, and k = 1. It should be easy to see that a hot path in
G will be a Hamiltonian path in H, and vice versa.
We have three problems now -- information given as paths, on the node,
or on the edges. All of them are NPC, but all of them are sovlable on
trees, with edgings going away from the root. The path version can be
solved by a dynamic programming, and the node and edge version can be
solved by greedy methods.
maybe we can add possibility into this problem.
2010年6月25日 星期五
2010年6月18日 星期五
2010年6月3日 星期四
build hadoop
2010/06/03 build hadoop, problem:
ivy-download:
/home/tk/Software/hadoop/hadoop-svn-0.20.2/build.xml:1630: java.net.SocketException: Network is unreachable
mentioned in this post that :
Setting bindv6only to 0 in /etc/sysctl.d/bindv6only.conf on my debian squeeze installation seems to have fixed the problem. Sorry for littering the list.
This actually is a special issue in Debina squeeze/sid version as explained in another post.
So what to do is:
1. Setting bindv6only to 0 in /etc/sysctl.d/bindv6only.conf
2. restart procps: sudo invoke-rc.d procps restart
now we have problem for java version
ivy-download:
/home/tk/Software/hadoop/hadoop-svn-0.20.2/build.xml:1630: java.net.SocketException: Network is unreachable
mentioned in this post that :
Setting bindv6only to 0 in /etc/sysctl.d/bindv6only.conf on my debian squeeze installation seems to have fixed the problem. Sorry for littering the list.
This actually is a special issue in Debina squeeze/sid version as explained in another post.
So what to do is:
1. Setting bindv6only to 0 in /etc/sysctl.d/bindv6only.conf
2. restart procps: sudo invoke-rc.d procps restart
now we have problem for java version
2010年6月1日 星期二
2010年5月25日 星期二
what's the major challenge of retargetable binary translator
I think the binary translator currently does not has enough tool to run optimization
what's the benefit to convert binary code to high level intermediate code?
now we have LLVM tool to manipulate binary at runtime.
to map guest register to host register at runtime.
gogogogogogogogogogogogogogogogogo
I think the binary translator currently does not has enough tool to run optimization
what's the benefit to convert binary code to high level intermediate code?
now we have LLVM tool to manipulate binary at runtime.
to map guest register to host register at runtime.
gogogogogogogogogogogogogogogogogo
2010年5月3日 星期一
2010年4月24日 星期六
2010年4月12日 星期一
qemu 在 user mode 下是用 static code gen buffer,size為32mb
in ./exec.c line 408 it mentioned:
Currently it is not recommended to allocate big chunks of data in
user mode. It will change when a dedicated libc will be used
but don't know why. Why not use mmap as system mode use?
先弄一個 exit basic block,然後再將 pass 給去掉。
現在要focus 在 indirect branch 上,要知道的是第一個參數的 address 是什麼?
FP->getPassName()
lib/CodeGen/StackProtector.cpp
DefaultJITMemoryManager
lib/CodeGen/LLVMTargetMachine.cpp:
LLVMTargetMachine::addPassesToEmitMachineCode
LLVMTargetMachine::addCommonCodeGenPasses
=============================
Prevent spill register code gen.
enable to emit epilog for branch and indirect brance
=============================
錯在這:0x40095872: mov (%edx),%eax
%edx 為 init_stack+c 的位置,其中的值
在 loader_exec 中就放了!
在 create_elf_tables() 之後!
在 loader_build_argptr() 之後!
在 put_user_ual(stringp, envp) 之後!
in ./exec.c line 408 it mentioned:
Currently it is not recommended to allocate big chunks of data in
user mode. It will change when a dedicated libc will be used
but don't know why. Why not use mmap as system mode use?
Segment selector Format:
15 ...... 3 | 2 1 | 0 |
---|---|---|
index | TI | RPL |
- TI Table indicator:
- 0 means selector indexes into GDT
1 means selector indexes into LDT - RPL Privelege level. Linux uses only two privelege levels.
- 0 means kernel
3 means user
Examples:
- Kernel code segment
- TI=0, index=1, RPL=0, therefore selector = 0x08 (GDT[1])
- User data segment
- TI=1, index=2, RPL=3, therefore selector = 0x17 (LDT[2])
先弄一個 exit basic block,然後再將 pass 給去掉。
現在要focus 在 indirect branch 上,要知道的是第一個參數的 address 是什麼?
FP->getPassName()
lib/CodeGen/StackProtector.cpp
DefaultJITMemoryManager
lib/CodeGen/LLVMTargetMachine.cpp:
LLVMTargetMachine::addPassesToEmitMachineCode
LLVMTargetMachine::addCommonCodeGenPasses
=============================
Prevent spill register code gen.
enable to emit epilog for branch and indirect brance
=============================
錯在這:0x40095872: mov (%edx),%eax
%edx 為 init_stack+c 的位置,其中的值
在 loader_exec 中就放了!
在 create_elf_tables() 之後!
在 loader_build_argptr() 之後!
在 put_user_ual(stringp, envp) 之後!
2010年4月7日 星期三
2010年4月6日 星期二
what are the meanings of cf,of and af flags in eflag?
CF: This flag indicates an overflow condition for unsigned-integer arithmetic.
Set if an arithmetic operation generates a carry or a borrow out of the most-significant bit of the result; cleared otherwise.
OF: This flag indicates an overflow condition for signed-integer (two’s complement) arithmetic.
Set if the integer result is too large a positive number or too small a negative number (excludingthe sign-bit) to fit in the destination operand; cleared otherwise.
AF: This flag is used in binary-coded decimal (BCD) arithmetic.
Set if an arithmetic operation generates a carry
or a borrow out of bit 3 of the result; cleared otherwise.
CF: This flag indicates an overflow condition for unsigned-integer arithmetic.
Set if an arithmetic operation generates a carry or a borrow out of the most-significant bit of the result; cleared otherwise.
OF: This flag indicates an overflow condition for signed-integer (two’s complement) arithmetic.
Set if the integer result is too large a positive number or too small a negative number (excludingthe sign-bit) to fit in the destination operand; cleared otherwise.
AF: This flag is used in binary-coded decimal (BCD) arithmetic.
Set if an arithmetic operation generates a carry
or a borrow out of bit 3 of the result; cleared otherwise.
2010年4月1日 星期四
2010年3月26日 星期五
I summarize our discussion as follows.
As stated in the CACM article, the single master may pose some problems when tremendous amount of data are considered.
For example, the master is unable to store all the metadata in the memory.
In addition, the master would become the bandwidth bottleneck because it has to a large amount of connections.
Google now also wants to deal with this issue, and they propose to use multiple masters.
In their proposal, a file will be related to one of the masters in an static/unchangeable way.
Besides, it seems that they do not consider the load balance problem of masters in their design.
So, the following is our design, in which the load balance and bandwidth problems are simultaneously solved.
Assume that we have k masters, each of which takes care of certain chunkservers.
Each master stores k counting bloom filters, among which i-th bloom filter represents the association between the i-th master and the file-chunk mapping it is in charge of.
Note that each master broadcasts its association to the other masters so that each master can renew the counting bloom filters it stores.
When a client wants to access a file, what it should do is to randomly pick a master and then to query the corresponding k counting bloom filters.
If not hit happens in all the filters, it means that the file does not exist.
Otherwise, the client turns to ask the master that is in charge of the file the client would like to access.
Of course, there could be cases that more than one masters are responsible for the file the client would like to access according to the response of the filters.
This is due to the nature of bloom filter, and can be easily mitigated and handled.
Increasing the filter size can mitigate such rate of false hit.
Simply asking those two masters to make sure again can be helpful in handling this ambiguity.
As a whole, this design can solve the memory problem because k masters share the workload.
This design can solve the bandwidth problem because of the randomness in the choice made by the client.
For example, the master is unable to store all the metadata in the memory.
In addition, the master would become the bandwidth bottleneck because it has to a large amount of connections.
Google now also wants to deal with this issue, and they propose to use multiple masters.
In their proposal, a file will be related to one of the masters in an static/unchangeable way.
Besides, it seems that they do not consider the load balance problem of masters in their design.
So, the following is our design, in which the load balance and bandwidth problems are simultaneously solved.
Assume that we have k masters, each of which takes care of certain chunkservers.
Each master stores k counting bloom filters, among which i-th bloom filter represents the association between the i-th master and the file-chunk mapping it is in charge of.
Note that each master broadcasts its association to the other masters so that each master can renew the counting bloom filters it stores.
When a client wants to access a file, what it should do is to randomly pick a master and then to query the corresponding k counting bloom filters.
If not hit happens in all the filters, it means that the file does not exist.
Otherwise, the client turns to ask the master that is in charge of the file the client would like to access.
Of course, there could be cases that more than one masters are responsible for the file the client would like to access according to the response of the filters.
This is due to the nature of bloom filter, and can be easily mitigated and handled.
Increasing the filter size can mitigate such rate of false hit.
Simply asking those two masters to make sure again can be helpful in handling this ambiguity.
As a whole, this design can solve the memory problem because k masters share the workload.
This design can solve the bandwidth problem because of the randomness in the choice made by the client.
I think we can raise the following questions:
1. Why the engineers in Google insist that, in the single master case, all the metadata should be stored in the memory in the master?
A: First, storing some of metadata in the disk may slow down the overall access speed. Second, doing so still cannot solve the bandwidth problem.
2. The CACM ariticle briefs Google's plan on the multiple master setting in GFS. Do you have any idea about their design? Besides, Do you have any idea how to coordinate multiple masters?
A: (We can decribe our above idea. I draw a conceptional picture. Perhaps we can attach this picture to the question-raising mail if you would like to.)
The following are some questions I personally want to ask:
1. Memory and bandwidth problems can be simultaneously solved in the multiple master setting. However, the multiple master setting has been described in the CACM article, though, very briefly. So, I think what we do is to detail how masters work by using our idea?
2. Although namespace file partition the metadata in a static way, we may still directly modify the namespace file so that after a specific time, the metadata that belongs to master 2 before the specific time belongs master 3. So, maybe it is also dynamic?
Q1:
In the CACM article, they mention a "static namespace partition file" approach to enable multiple masters over a pool of chunkservers.
The description, however, is very rough and we are very interested in it.
So, we would like to describe that idea more clearly and identify its pro and con in the class.
Answer to Q1:
First, a static namespace partition file is just a mapping table of directory <-> master ID.
Clients will read this table to find the "right" master.
Master will write to this table for a directory creation request.
The corresponding master adds an entry to this table when a directory creation request arrives.
And we think the client chooses a master randomly to server this request.
The content of this table must be consistent to all clients and masters.
Therefore, we think the reasonable approach is only masters have this table.
And the scenario for a read request should be as follows.
1. Whenever a client send a read request, it chooses randomly a master and sends the request.
2. The master looks up the table for the corresponding master and redirect that request to it.
3. The corresponding master serves the read request and send the result back to client.
Therefore, each master has a "DISPATCH" functionality, i.e., to dispatch request to the right master according to the mapping table.
Things needed to be pointed out:
1. Although they call it namespace partition "file", we don't think this information is stored in file(in the disk).
It should resides in the memories of masters in order to keep faster response. After all, all requests must consult this table.
2. The granularity of partition is "directory". There are pro and con here.
pro 1: Large granularity results in less synchronization. Synchronization overhead can be small.
pro 2: This granularity simplifies the modification of the design of master.
con 1: Large granularity may result in load imbalance both in terms of memory usage and served requests.
Once a directory is created, all metadata of files and chunks in that directory will store in one master.
And all requests for all metadata of files and chunks in that directory will direct to one master.
3. Since they allow multiple masters over a pool of chunkservers. Then each chunkserver must maintain another table to map chunk ID to master ID.
So that the chunkserver knows which chunk should report to which master according to that table.
Concluding remarks:
The "static namespace partition file" approach for multiple masters is just a workaround approach for GFS.
The two pros makes multiple masters design to be done quickly and release the burden of single master design.
However, the con described above may be a problem.
So, what about more fine granularity: "FILE".
We will address this problem in our project proposal.
Q2:
Why the namespace partition file is "STATIC" rather than "DYNAMIC"?
Again, we would like to describe what we have discussed in the class.
Answer to Q2:
First, we give our definition about STATIC and DYNAMIC:
"STATIC" means once the entry (directory to master ID) is added into the file, this mapping won't change in the future.
On the other hand, "DYNAMIC" means a directory, say "/A", can be mapped to master 1 at a time, and then mapped to master 2.
The reason why not allow dynamic:
If we allowed the mapping to be changed, we must support metadata migration, which means to move metadata from one master to another master.
Since the granularity of partition is directory, one can imagine that there will be LOTS of metadata need to migrate when the mapping is changed.
So, the overhead of migration should be large since all requests to that directory must be suspend until the migration is done, and the mapping information of chunkservers should be change accordingly.
we believe the reason why they don't allow dynamic is that they want to simplify design since this is just a workaround approach.
2010年3月25日 星期四
Master Problem in GFS:
1' memory usage problem
2' bandwidth bottleneck
3' single failure problem
suppose we use two Metadata Server(MS)
we would like to have:
1' load balance: the metadata is evenly distributed among MSes
approach:
use dispatcher: each query request is sent to dispatcher who will decide which MS
is going to serve this request.
Then we have single dispatcher problems.
So our approach is to embed dispatch function into MS.
A user wants to create a file, then it sends creation request to arbitrary MS.
The MS first decide which MS is going to serve this request
with one mysterious algorithm. It decides MS1 should serve this request.
So it redirect the creation request to MS1.
As long as we can distribute creation request evenly, we can end balance
1' memory usage problem
2' bandwidth bottleneck
3' single failure problem
suppose we use two Metadata Server(MS)
we would like to have:
1' load balance: the metadata is evenly distributed among MSes
approach:
use dispatcher: each query request is sent to dispatcher who will decide which MS
is going to serve this request.
Then we have single dispatcher problems.
So our approach is to embed dispatch function into MS.
A user wants to create a file, then it sends creation request to arbitrary MS.
The MS first decide which MS is going to serve this request
with one mysterious algorithm. It decides MS1 should serve this request.
So it redirect the creation request to MS1.
As long as we can distribute creation request evenly, we can end balance
2010年3月24日 星期三
2010年3月22日 星期一
我還需要麼,
我需要的是:
1、一個 test program。
2、列出它用到的 opcode 及 operands 的值
3、qemu 停止translate 的 opcode 以及 它的處理方式。
4、目前code gen的 overhead是否高?
5、如何確定translation function寫對?
exec.c:910
tb_link_phy_page??????
今天的進度:
1、要完成 test case 的 qemu result
2、要檢驗目前完成的 qemu
call 的
正確性!
要如何確定正確性?
我本來是想要用 qemu 的 output 來當做正確性,
就是將 translate 過後的 inst 來比較
不過這方法只要會碰到 memory 就一定會 segmentation fault,
應該沒關係吧?
另外是有關 log file 的問題,exec.c: logfilename, 及 main.c 中有 DEFAULT_LOG_FILE
需要可以讓我指定檔名。
log file 何時開啟的?
加了 logfile 的 option,可是一定要在 -d 的後面,一定!
qemu stop conditions 有 58 個 instructions:
0059 addl $1000, %eax
0065 addl $1, %eax
0066 addl %gs:0, %edx
0272 call *%gs:16
0273 call *%eax
0277 call -100149
0467 cmpsb
0768 imull -4740(%ebp)
0809 int $128
0912 jae 1
0913 jae -135
0914 ja 1
0915 ja -1163
0916 jbe 100
0917 jbe -1281
0918 jb 10
0919 jb -156
0920 jcxz 44
0921 je 1
0922 je -10119
0923 jge 107
0924 jge -1081
0925 jg 100
0926 jg -1047
0927 jle 10
0928 jle -1177
0929 jl 10
0930 jl -1285
0931 jmpl *100(%ebx)
0932 jmpl *%eax
0936 jmp 10
0937 jmp -1007
0938 jne 10
0939 jne -1012
0943 jnp -213
0944 jns 10
0945 jns -1546
0948 jp 13
0949 jp -196
0950 js 10
0951 js -1081
0987 leal (%eax,%eax), %ecx
1274 movl %eax, %gs:(%edx)
1280 movl %gs:(%eax), %eax
1281 movl %eax, %ebx
1371 movsb
1372 movsl
1424 movzwl (%eax), %eax
1469 fmuls (%eax,%ebx)
1982 ret
1983 ret $12
2059 sarl %eax
2158 shll $4, -116(%ebp)
2197 shrl $11, %edx
2264 stosb
2265 stosl
2302 subl (%eax), %ecx
2303 subl %eax, %ebx
condition code is a challenge
我需要的是:
1、一個 test program。
2、列出它用到的 opcode 及 operands 的值
3、qemu 停止translate 的 opcode 以及 它的處理方式。
4、目前code gen的 overhead是否高?
5、如何確定translation function寫對?
exec.c:910
tb_link_phy_page??????
今天的進度:
1、要完成 test case 的 qemu result
2、要檢驗目前完成的 qemu
call 的
正確性!
要如何確定正確性?
我本來是想要用 qemu 的 output 來當做正確性,
就是將 translate 過後的 inst 來比較
不過這方法只要會碰到 memory 就一定會 segmentation fault,
應該沒關係吧?
另外是有關 log file 的問題,exec.c: logfilename, 及 main.c 中有 DEFAULT_LOG_FILE
需要可以讓我指定檔名。
log file 何時開啟的?
加了 logfile 的 option,可是一定要在 -d 的後面,一定!
qemu stop conditions 有 58 個 instructions:
0059 addl $1000, %eax
0065 addl $1, %eax
0066 addl %gs:0, %edx
0272 call *%gs:16
0273 call *%eax
0277 call -100149
0467 cmpsb
0768 imull -4740(%ebp)
0809 int $128
0912 jae 1
0913 jae -135
0914 ja 1
0915 ja -1163
0916 jbe 100
0917 jbe -1281
0918 jb 10
0919 jb -156
0920 jcxz 44
0921 je 1
0922 je -10119
0923 jge 107
0924 jge -1081
0925 jg 100
0926 jg -1047
0927 jle 10
0928 jle -1177
0929 jl 10
0930 jl -1285
0931 jmpl *100(%ebx)
0932 jmpl *%eax
0936 jmp 10
0937 jmp -1007
0938 jne 10
0939 jne -1012
0943 jnp -213
0944 jns 10
0945 jns -1546
0948 jp 13
0949 jp -196
0950 js 10
0951 js -1081
0987 leal (%eax,%eax), %ecx
1274 movl %eax, %gs:(%edx)
1280 movl %gs:(%eax), %eax
1281 movl %eax, %ebx
1371 movsb
1372 movsl
1424 movzwl (%eax), %eax
1469 fmuls (%eax,%ebx)
1982 ret
1983 ret $12
2059 sarl %eax
2158 shll $4, -116(%ebp)
2197 shrl $11, %edx
2264 stosb
2265 stosl
2302 subl (%eax), %ecx
2303 subl %eax, %ebx
condition code is a challenge
2010年3月15日 星期一
我現在遇到的問題在於 debug,
我需要知道我是那一步錯了。
將 Makefile.target中的
#LDFLAGS+=-Wl,-shared
改成
LDFLAGS+=-Wl,-T,$(SRC_PATH)/$(ARCH).ld
就可以避免產生 PIE 的檔案。這樣就可以用 gdb 了。
現在遇到的另一個問題是:
要跳回qemu時的位置錯了!
就是我從qemu拿到的是 0x62933f88
可是經由 llvm 產生 call instruction 時的位置卻是0xdb24137。
而且這個值是會變的!
something wrong with "inttoptr" instruction
這是怎麼回事?
0x764, 1
28(Reg)
0x501, 2
28(Reg) 45(Reg)
0x4ff, 2
38(Reg) 5(Imm)
0x4ff, 2
27(Reg) 1(Imm)
0x501, 2
29(Reg) 38(Reg)
0x329, 1
128(Imm)
====================
現在好玩了,我發現 int 的處理是去 call helper_raise_interrupt
而非跳回去。
所以,只要 call 它就好了嗎?
call 了,可是跳不過去…
Question : how to link
可以了,哦耶!
要記得的教訓是:
1、要注意qemu對結束條件的處理,
像 int 0x80,qemu的處理法是呼叫 helper_raise_interrupt( 80, 02 )。
這要用 gdb 看才知道。
2、generated code的處理。LLVM的 CodeEmitter 將code產生在它自己管理的地方,
這部分要再處理一下。如,能否指定 memory 的位置,
目前已經可以產生 switch statement 以及 llvm translation functions header
我還需要麼,
我需要的是:
1、一個 test program。
2、列出它用到的 opcode 及 operands 的值
3、qemu 停止translate 的 opcode 以及 它的處理方式。
4、目前code gen的 overhead是否高?
5、如何確定translation function寫對?
exec.c:910
tb_link_phy_page??????
我需要知道我是那一步錯了。
將 Makefile.target中的
#LDFLAGS+=-Wl,-shared
改成
LDFLAGS+=-Wl,-T,$(SRC_PATH)/$(ARCH).ld
就可以避免產生 PIE 的檔案。這樣就可以用 gdb 了。
現在遇到的另一個問題是:
要跳回qemu時的位置錯了!
就是我從qemu拿到的是 0x62933f88
可是經由 llvm 產生 call instruction 時的位置卻是0xdb24137。
而且這個值是會變的!
something wrong with "inttoptr" instruction
這是怎麼回事?
0x764, 1
28(Reg)
0x501, 2
28(Reg) 45(Reg)
0x4ff, 2
38(Reg) 5(Imm)
0x4ff, 2
27(Reg) 1(Imm)
0x501, 2
29(Reg) 38(Reg)
0x329, 1
128(Imm)
====================
現在好玩了,我發現 int 的處理是去 call helper_raise_interrupt
而非跳回去。
所以,只要 call 它就好了嗎?
call 了,可是跳不過去…
Question : how to link
可以了,哦耶!
要記得的教訓是:
1、要注意qemu對結束條件的處理,
像 int 0x80,qemu的處理法是呼叫 helper_raise_interrupt( 80, 02 )。
這要用 gdb 看才知道。
2、generated code的處理。LLVM的 CodeEmitter 將code產生在它自己管理的地方,
這部分要再處理一下。如,能否指定 memory 的位置,
目前已經可以產生 switch statement 以及 llvm translation functions header
我還需要麼,
我需要的是:
1、一個 test program。
2、列出它用到的 opcode 及 operands 的值
3、qemu 停止translate 的 opcode 以及 它的處理方式。
4、目前code gen的 overhead是否高?
5、如何確定translation function寫對?
exec.c:910
tb_link_phy_page??????
2010年3月8日 星期一
3/8-3/12
=======3/8========
push %ebp
0x764, 1
28(Reg)
mov %esp,%ebp
0x501, 2
28(Reg) 45(Reg)
sub $0x8,%esp
0x8fd, 3
45(Reg) 45(Reg) 8(Imm)
movl $0x5,-0x4(%ebp)
0x4f9, 6
28(Reg) 1(Imm) 0(Reg) -4(Imm) 0(Reg) 5(Imm)
mov -0x4(%ebp),%eax
0x500, 6
27(Reg) 28(Reg) 1(Imm) 0(Reg) -4(Imm) 0(Reg)
mov %eax,(%esp)
0x4fa, 6
45(Reg) 1(Imm) 0(Reg) 0(Imm) 0(Reg) 27(Reg)
call 8048080
0x115, 1
-56(Imm)
===========================notes==========
1' the generated function contains a push in the first place, remove it!!!
2'
=======================================
use size_t to represent memory address size type
32bit VS 64bit, this is really a problem:w
:wG
現在的問題是如何解讀它 opcode?
第二個問題是為什麼compile出來的 qemu-i386是個 PIE?為什麼 gdb 會覺得它是?
我現在想幹嘛?
????????
instruction ID 到底是什麼?
=================
push %ebp
0x764, 1
28(Reg)
mov %esp,%ebp
0x501, 2
28(Reg) 45(Reg)
sub $0x8,%esp
0x8fd, 3
45(Reg) 45(Reg) 8(Imm)
movl $0x5,-0x4(%ebp)
0x4f9, 6
28(Reg) 1(Imm) 0(Reg) -4(Imm) 0(Reg) 5(Imm)
mov -0x4(%ebp),%eax
0x500, 6
27(Reg) 28(Reg) 1(Imm) 0(Reg) -4(Imm) 0(Reg)
mov %eax,(%esp)
0x4fa, 6
45(Reg) 1(Imm) 0(Reg) 0(Imm) 0(Reg) 27(Reg)
call 8048080
0x115, 1
-56(Imm)
===========================notes==========
1' the generated function contains a push in the first place, remove it!!!
2'
=======================================
use size_t to represent memory address size type
32bit VS 64bit, this is really a problem:w
:wG
2010年3月5日 星期五
2010年3月3日 星期三
Mongrel Cluster Setting Steps
http://azureusonrails.rubyforge.org/wiki/wiki.pl?Install/Mongrel_Cluster_With_Apache_2.2
2010年3月1日 星期一
3/1 - 3/5
過後的第二個星期,加油!喝!
translator 寫在一起算了!
加油,你可以的
另外,qemu 在 interrupt 之後就沒有再 disas code,why?
因為 syscall 的為 TARGET_NR_exit,所以就直接 _exit 了,
後面的 code 就不用 return 了。
首先:需要找出到底有那些 op 是我的 target?
先決定那些頁面要印,
我需要 instruction format 的 頁面,還有
REX prefixes are instruction-prefix bytes used in 64-bit mode. They do the following:
• Specify GPRs and SSE registers.
• Specify 64-bit operand size.
• Specify extended control registers.
要處理的 instructions 有:
83 c4 08 add $0x8,%esp
83 is op code, for ADD
c4 is ModR/M, 11 000 100 (2+3+3)
MOD: 11
Reg: 000
R/M:100
0000 1000
83 ec 08 sub $0x8,%esp
e8 c8 ff ff ff call 8048080
cd 80 int $0x80
5d pop %ebp
55 push %ebp
c3 ret
89 04 24 mov %eax,(%esp)
89 45 f8 mov %eax,-0x8(%ebp)
89 45 fc mov %eax,-0x4(%ebp)
89 cb mov %ecx,%ebx
89 e5 mov %esp,%ebp
8b 45 fc mov -0x4(%ebp),%eax
8b 4d fc mov -0x4(%ebp),%ecx
b8 01 00 00 00 mov $0x1,%eax
c7 45 fc 05 00 00 00 movl $0x5,-0x4(%ebp)
============3/2===============
每一個都要寫一個 disassembler
為什麼會這麼焦慮呢?
冷靜下來,好好的想,念「南無阿彌佗佛」,南無阿彌佗佛 南無阿彌佗佛 南無阿彌佗佛,肚子好餓,translator 寫在一起算了!
重寫 cpu_gen_code!!
tb->tb_jmp_offset : 用來跳到 code cache 的其他地方,
./tcg/i386/tcg-target.c:873
tb->tb_next_offset : 用來跳到 code cache 的其他地方,
./tcg/i386/tcg-target.c:881:
s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
這兩個跟我們應該沒關係,我們不會用到。
幹幹幹幹幹幹幹幹幹幹幹幹幹幹幹幹幹幹幹幹幹幹幹幹幹幹幹
剛剛看了 tb_next_offset 的用途,並非沒用,
uint16_t tb_next_offset[2]; /* offset of original jump target */
uint16_t tb_jmp_offset[4]; /* offset of jump instruction */
unsigned long tb_next[2]; /* address of jump generated code */
tb_jmp_offset 在 ./exec-all.h:233 會用到,
offset = tb->tb_jmp_offset[n],
這是在 tb_set_jmp_target 中,
tb_set_jmp_target(TranslationBlock *tb, int n, unsigned long addr)
先取出 tb 中 第 n 個 tb_jmp_offset,在該處放置於 addr 相對的位置:
jmp_addr = tb->tc_ptr + offset;
*(uint32_t*)jmp_addr = add - ( T + 4);
而 tb_set_jmp_target是在 tb_add_jmp中呼叫的的,
tb_add_jump(TranslationBlock *tb, int n, TranslationBlock *tb_next)
它先 check tb->jmp_next[n]不為0,若是,則呼叫 tb_set_jmp_target
然後將 tb->jmp_next[n] = tb_next->jmp_first;
然後將 tb_next->jmp_first = (TranslationBlock *)((long)(tb) | (n));
上面有個我不知道的假設,它似乎拿最後面兩個 bit 來用,
可是它怎麼知道 TranslationBlock *tb 指到的位置都沒有用到最後面兩個 bit,
那你要去看那個 pointer 是怎麼來的,tb_alloc,
沒什麼特別,是不是所有的 pointer都是最後兩個bit沒用?
是的,是的,pointer 的最後兩個 bit 都是沒用的,因為
一個 pointer 是4個byte(在 32 位元的機器上)。
在 cpu_exec 中會呼叫 tb_add_jump,
tb_add_jump((TranslationBlock *)(next_tb & ~3), next_tb & 3, tb);
上面function的意思是指:
若 next_tb->jmp_next[n]不為0,n is "next_tb & 3"
則將 tb->tc_ptr 跟 (next_tb->tc_ptr + next_tb->tb_jmp_offset[n])的距離,
放在 next_tb->tc_ptr + next_tb->tb_jmp_offset[n] 所指的位置上。
真的會用到嗎?
會!做實驗了。
從source code document中提到關於 jmp_next:
/* list of TBs jumping to this one. This is a circular list using
the two least significant bits of the pointers to tell what is
the next pointer: 0 = jmp_next[0], 1 = jmp_next[1], 2 =
jmp_first */
what does epilogue do?
restore stack and pop out saved register
回傳值是藏那去?
算了,先不要回傳值好了,
from wikipedia:
http://en.wikipedia.org/wiki/X86_calling_conventions
the returned value is stored in EAX register
來研究一下等會要幹嘛好了,
一、綀
=============3/3=================
一、prologue 跟 epilogue 都可用,
但要將 tb_ret_addr 要能 pass 過 translator。
另一方面,要想,有沒可能自己寫 prologue 跟 epilogue?
是要寫在 translation function 中嗎?不行吧,這樣你怎麼做 linking?
還要跳過 prologue 跟 epilogue。
test program 中有兩個 basic blocks:
55 push %ebp
89 e5 mov %esp,%ebp
83 ec 08 sub $0x8,%esp
c7 45 fc 05 00 00 00 movl $0x5,-0x4(%ebp)
8b 45 fc mov -0x4(%ebp),%eax
89 04 24 mov %eax,(%esp)
e8 c8 ff ff ff call 8048080
55 push %ebp
89 e5 mov %esp,%ebp
83 ec 08 sub $0x8,%esp
8b 45 08 mov 0x8(%ebp),%eax
89 45 fc mov %eax,-0x4(%ebp)
8b 4d fc mov -0x4(%ebp),%ecx
b8 01 00 00 00 mov $0x1,%eax
89 cb mov %ecx,%ebx
cd 80 int $0x80
要處理的 instructions 有
83 c4 08 add $0x8,%esp
83 is op code
08 is immediate value 0x8
c4 is ModR/M, 11+000+100 (2+3+3)
MOD: 11
Reg: 000
R/M:100
83 ec 08 sub $0x8,%esp
e8 c8 ff ff ff call 8048080
cd 80 int $0x80
5d pop %ebp
55 push %ebp
c3 ret
89 04 24 mov %eax,(%esp)
89 45 f8 mov %eax,-0x8(%ebp)
89 45 fc mov %eax,-0x4(%ebp)
89 cb mov %ecx,%ebx
89 e5 mov %esp,%ebp
8b 45 fc mov -0x4(%ebp),%eax
8b 4d fc mov -0x4(%ebp),%ecx
b8 01 00 00 00 mov $0x1,%eax
c7 45 fc 05 00 00 00 movl $0x5,-0x4(%ebp)
in translate-all.c, line 88
CPUX86State 很大,有 28536 byte,一個 page 才 4k=4096 byte,它就佔了7個 page!
==================================
先將 disassembler 弄好。
2010年2月22日 星期一
2/22~2/26
吃飯吧
接下來就要開始寫程式了
要如何才能專心在寫程式上呢?
你都要31歲了,難道不能自制一點嗎?
你不要用責備的方式,這樣是沒有用的,
如果罵一罵有話我現在搞不好已經畢業了呢!
不罵行嗎?講十天也沒用,就你不要看書了,還是硬要看完,
是怎樣?!那幹嘛專程來中研院看書,在家裡看不就好了,
what does "extern" mean in C++?
the following URL provides a good explanation:
and, do I need to declare a extern "C" header file?
in llvm_translation.h, we should have a declaration as follows:
extern "C"{
typedef unsigned long (*PFunc_t)();
PFunc_t llvm_gen_code(CPUX86State *CPU1, const uint16_t *opc_buf, const uint32_t *opparam_buf);
}
now, goto qemu makefile
幹!專心啦
現在 make 是一個大問題
其實也還好,只要將 header include 進去,再 compile,再 link 就可以了,
llvm_gen_code要在那裡?
應該要放在 gen_code_slow 那邊吧:
========2/23============
工作工作,人生的意義就在工作裡
加一個target,llvm_translator 的 target,真想吃點東西,
Makefile.target,加在裡面:
$@: target name,
$<: dependent name
@CMD: execute CMD
寫個信給老師,說我禮拜三不會去,
幹!怎麼不吃大便,整天一直講話,
在 configure 中加入 llvm translator 的 config
可以在 configure 的時候決定是否要用 llvm_translator,
--enable-llvm-trans
要不要加在 config-host.mak。不用。
config-host.mak, config-host.h
只要加在 config.mak及config.h中即可。
config.mak, config.h
焦慮中
去洗臉吧
在LLVM中,JIT所產生的 code 是放在那?heap中嗎?還是mmap中呢?
有可能是在 heap 中,那會有什麼問題嗎?執行上的問題吧,
可是它可以執行啊;嗯,對,有可能是需要在 mmap 中,
ExecutionEngine::create 有另一個
create (
ModuleProvider *MP,
std::string *Err,
JITMemoryManager *JMM,
CodeGenOpt::Level OptLevel=CodeGenOpt::Default,
bool GVsWithCode=true)
其中 JITMemoryManager 就是控制 Memory 的 class,
但問題是,如何將把它跟 QEMU 的memory control弄在一起,
所以現在的問題是 qemu 是如何來控制 memory 的?
我猜是…,還真不知道,
我只知道滿了就清空這個,
乾脆將tb->tc_ptr = llvm_gen_code 算了,
這樣可能會死!死了再說!!!!!!!!
先能動!
走走走走走
gen_opc_buf: uint16_t buf[OPC_BUF_SIZE]
gen_opc_ptr: uint16_t *, move inside the buffer
又來了,幹,幹嘛一直看圖片啦,是在幹什麼!
下班下班,工作的意義在於下班!
===========2/24======================
今天要報告什麼呢?
上上禮拜五請假,禮拜四meeting,
報告我做的事好了:
1、將 整合進 qemu 的 Makefile 中,
2、指出 LLVM 是用 JITMemoryManager 來管理 translated code cache
但 qemu 是自己管理 translated code cache,在
qemu 的 code cache 管理很簡單,只是一直放 code,如果 buffer 滿了,就清空。
cpu_gen_code 需要return gen_code_size,
因為它需要 advance code_gen_ptr;code_gen_ptr指到目前可以放code的位置。
regs_to_env is empty in i386
translated code alignment problem in QEMU,
QEMU forces all generated code to be 16-bit aligned!!!!
IR Cache - 這會存在 Module 中,
Basic Block Cache,這會存在 JITMemoryManager 中,
所以如果以後要做 Code Cache Memory 的研究,應該 extend 這
=========2/25=============
今天要做什麼呢?
可以做 translate.c 裡的 disas_insn_llvm 這個 fuction
也可以寫一個 translation function,
可是你還沒訴 Makefile 可以找到 llvm-translator.cpp 這個檔,do it!
結果只是 rules 寫錯,幹!
加入 llvm 的檔案,compile 需要,
cscope -R -b
整理一下,
今天已成功將 llvm translator integrate 到 qemu 的 make system
遇到的困難有,c 與 c++ 檔案的 link 的問題,
主要是我想在 llvm-translation.h 中將 llvm_gen_code_wrap 這個程式用 static 的
方式用在 translate-all.c 中,
不過好像不應該寫在 header file 中,直接寫在 translate-all.c 中。
but,anyway,想講的是,如果在 extern "C" 中要用到 static,就直接寫在裡面就好了,
如:
extern "C" {
static void* f (void){ /* write your code here */}
}
不要這樣:
extern "C" {
static void* f (void);
}
extern "C" static inline void* f(void)
{
/*write yout code here*/
}
上面的方式會有 linkage error 的錯誤訊息。
另外,今天也將 llvm 加入 svn 系統中,這樣就將 llvm 的版本固定下來了。
在 xeon 上跟 haru 上都可以打 make 編譯了。
做別的事轉換一下心情吧。
看paper。
2010年2月10日 星期三
2010年2月8日 星期一
2010 2.8 note
一、我是否要用 llvm project 來建我的東西?
嗯,我覺得不一定要用它的東西。可以考慮,不過不是現在要,先知道有這個東西。
LLVM實在是一個很好的project,很難相信他們可以寫這麼多的document。
docs/ProgrammersManual.html
包含很多使用 llvm 寫程式的實用知識,
今天要將 llvm-translation.h 編好,能不能用倒是其次,要能 compile 倒是真的。
去大便吧!
compile llvm ... done, installed in /home/tk/local/llvm-2.6
然後compile llvm-gcc ... done, installed in /home/tk/local/llvm-gcc-2.6
去洗個臉,然後找啟彰去走走。
去洗個臉。
照理說,dyngen 應該是要用程式寫出來的,
合理的想法是,先寫 llvm-database,然後再parse database裡的東西寫出。
不過這我目前沒法做到,
老師也沒預期我會這樣做,所以還是先用手寫吧。
2010年1月26日 星期二
廢
即然不想做事,就來寫點東西好了,主要是今天看到的:
1、Big-Endian V.S. Little-Endian
ON HOLY WARS AND A PLEA FOR PEACE by Danny Cohen
Danny Cohen 是 ADOBE 的兩位創始者之一,PostScript是他們發明的。
這篇是寫電腦科學中有名的問題,此君以幽默的語氣完整的介紹了此問題,
十分有趣,值得一讀。
2、
2010年1月20日 星期三
TODO:
1' find out <- this is llvm translation database
2' find out the translator engine <- llvm-op.h is the engine
3' add translator engine into llvm_gen_code
4' in the slide, you should mention the architecture of the framework, and
indicate which is done.
5' show the target instructions and say that you are going to build the translation function in the next week
1' find out <- this is llvm translation database
2' find out the translator engine <- llvm-op.h is the engine
3' add translator engine into llvm_gen_code
4' in the slide, you should mention the architecture of the framework, and
indicate which is done.
5' show the target instructions and say that you are going to build the translation function in the next week
2010年1月18日 星期一
QEMU programming notes
profiler => can we ignore it?
inhibit irq => true if hardware interrupts must be disabled for the next instruction
TF cpu flag => 8th bit of flags in x86, trap flag, system flag
jmp_opt in DisasContext => indicate whether to do block linking
what is GRP1? in
what is modrm?
Q: what are my target instructions?
test program is in ~/research/qemu/src/branches/llvm-qemu/tests/
add-i386.c
tmp.c
stay in disas_insn() in target-i386/translate.c
line 4122
I need a C programming language book
1' create a new fucking useless function!
mov
0x89 source is register, destination could be register
or memory location related to a register
訂閱:
文章 (Atom)