I summarize our discussion as follows.
As stated in the CACM article, the single master may pose some problems when tremendous amount of data are considered.
For example, the master is unable to store all the metadata in the memory.
In addition, the master would become the bandwidth bottleneck because it has to a large amount of connections.
Google now also wants to deal with this issue, and they propose to use multiple masters.
In their proposal, a file will be related to one of the masters in an static/unchangeable way.
Besides, it seems that they do not consider the load balance problem of masters in their design.
So, the following is our design, in which the load balance and bandwidth problems are simultaneously solved.
Assume that we have k masters, each of which takes care of certain chunkservers.
Each master stores k counting bloom filters, among which i-th bloom filter represents the association between the i-th master and the file-chunk mapping it is in charge of.
Note that each master broadcasts its association to the other masters so that each master can renew the counting bloom filters it stores.
When a client wants to access a file, what it should do is to randomly pick a master and then to query the corresponding k counting bloom filters.
If not hit happens in all the filters, it means that the file does not exist.
Otherwise, the client turns to ask the master that is in charge of the file the client would like to access.
Of course, there could be cases that more than one masters are responsible for the file the client would like to access according to the response of the filters.
This is due to the nature of bloom filter, and can be easily mitigated and handled.
Increasing the filter size can mitigate such rate of false hit.
Simply asking those two masters to make sure again can be helpful in handling this ambiguity.
As a whole, this design can solve the memory problem because k masters share the workload.
This design can solve the bandwidth problem because of the randomness in the choice made by the client.
For example, the master is unable to store all the metadata in the memory.
In addition, the master would become the bandwidth bottleneck because it has to a large amount of connections.
Google now also wants to deal with this issue, and they propose to use multiple masters.
In their proposal, a file will be related to one of the masters in an static/unchangeable way.
Besides, it seems that they do not consider the load balance problem of masters in their design.
So, the following is our design, in which the load balance and bandwidth problems are simultaneously solved.
Assume that we have k masters, each of which takes care of certain chunkservers.
Each master stores k counting bloom filters, among which i-th bloom filter represents the association between the i-th master and the file-chunk mapping it is in charge of.
Note that each master broadcasts its association to the other masters so that each master can renew the counting bloom filters it stores.
When a client wants to access a file, what it should do is to randomly pick a master and then to query the corresponding k counting bloom filters.
If not hit happens in all the filters, it means that the file does not exist.
Otherwise, the client turns to ask the master that is in charge of the file the client would like to access.
Of course, there could be cases that more than one masters are responsible for the file the client would like to access according to the response of the filters.
This is due to the nature of bloom filter, and can be easily mitigated and handled.
Increasing the filter size can mitigate such rate of false hit.
Simply asking those two masters to make sure again can be helpful in handling this ambiguity.
As a whole, this design can solve the memory problem because k masters share the workload.
This design can solve the bandwidth problem because of the randomness in the choice made by the client.
I think we can raise the following questions:
1. Why the engineers in Google insist that, in the single master case, all the metadata should be stored in the memory in the master?
A: First, storing some of metadata in the disk may slow down the overall access speed. Second, doing so still cannot solve the bandwidth problem.
2. The CACM ariticle briefs Google's plan on the multiple master setting in GFS. Do you have any idea about their design? Besides, Do you have any idea how to coordinate multiple masters?
A: (We can decribe our above idea. I draw a conceptional picture. Perhaps we can attach this picture to the question-raising mail if you would like to.)
The following are some questions I personally want to ask:
1. Memory and bandwidth problems can be simultaneously solved in the multiple master setting. However, the multiple master setting has been described in the CACM article, though, very briefly. So, I think what we do is to detail how masters work by using our idea?
2. Although namespace file partition the metadata in a static way, we may still directly modify the namespace file so that after a specific time, the metadata that belongs to master 2 before the specific time belongs master 3. So, maybe it is also dynamic?
Q1:
In the CACM article, they mention a "static namespace partition file" approach to enable multiple masters over a pool of chunkservers.
The description, however, is very rough and we are very interested in it.
So, we would like to describe that idea more clearly and identify its pro and con in the class.
Answer to Q1:
First, a static namespace partition file is just a mapping table of directory <-> master ID.
Clients will read this table to find the "right" master.
Master will write to this table for a directory creation request.
The corresponding master adds an entry to this table when a directory creation request arrives.
And we think the client chooses a master randomly to server this request.
The content of this table must be consistent to all clients and masters.
Therefore, we think the reasonable approach is only masters have this table.
And the scenario for a read request should be as follows.
1. Whenever a client send a read request, it chooses randomly a master and sends the request.
2. The master looks up the table for the corresponding master and redirect that request to it.
3. The corresponding master serves the read request and send the result back to client.
Therefore, each master has a "DISPATCH" functionality, i.e., to dispatch request to the right master according to the mapping table.
Things needed to be pointed out:
1. Although they call it namespace partition "file", we don't think this information is stored in file(in the disk).
It should resides in the memories of masters in order to keep faster response. After all, all requests must consult this table.
2. The granularity of partition is "directory". There are pro and con here.
pro 1: Large granularity results in less synchronization. Synchronization overhead can be small.
pro 2: This granularity simplifies the modification of the design of master.
con 1: Large granularity may result in load imbalance both in terms of memory usage and served requests.
Once a directory is created, all metadata of files and chunks in that directory will store in one master.
And all requests for all metadata of files and chunks in that directory will direct to one master.
3. Since they allow multiple masters over a pool of chunkservers. Then each chunkserver must maintain another table to map chunk ID to master ID.
So that the chunkserver knows which chunk should report to which master according to that table.
Concluding remarks:
The "static namespace partition file" approach for multiple masters is just a workaround approach for GFS.
The two pros makes multiple masters design to be done quickly and release the burden of single master design.
However, the con described above may be a problem.
So, what about more fine granularity: "FILE".
We will address this problem in our project proposal.
Q2:
Why the namespace partition file is "STATIC" rather than "DYNAMIC"?
Again, we would like to describe what we have discussed in the class.
Answer to Q2:
First, we give our definition about STATIC and DYNAMIC:
"STATIC" means once the entry (directory to master ID) is added into the file, this mapping won't change in the future.
On the other hand, "DYNAMIC" means a directory, say "/A", can be mapped to master 1 at a time, and then mapped to master 2.
The reason why not allow dynamic:
If we allowed the mapping to be changed, we must support metadata migration, which means to move metadata from one master to another master.
Since the granularity of partition is directory, one can imagine that there will be LOTS of metadata need to migrate when the mapping is changed.
So, the overhead of migration should be large since all requests to that directory must be suspend until the migration is done, and the mapping information of chunkservers should be change accordingly.
we believe the reason why they don't allow dynamic is that they want to simplify design since this is just a workaround approach.