Wednesday 8 March 2017

Nebula: Distributed Edge Cloud for Data Intensive Computing

Objective:
Nebula is a dispersed cloud infrastructure that uses voluntary edge resources for both computation and storage. It is designed with the following goals:
1) Support for distributed data-intensive computing.
2) Location-aware resource management.
3) Sandboxed execution environment.
4) Fault tolerance.

Framework Architecture:
The main components of Nebula are shown in the figure above.
Data Nodes: they are volunteer servers that offer storage to the system.
DataStore Master: it maintains the storage system metadata and makes data placement decisions.
Nebula Monitor: it monitors the performance of volunteer nodes and the network characteristics.
Nebula Central: it is the front end of the Nebula ecosystem. It lets volunteers join the system and lets application writers inject applications into it.
ComputePool Master: it coordinates task execution. That is, it chooses the best compute nodes to run a specific task.
Compute Nodes: they provide computation resources. A compute node first downloads the required data from data nodes and then runs tasks on that data.
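
To make the interaction between the ComputePool Master and the compute nodes more concrete, here is a minimal sketch in Python. It is not Nebula's actual scheduler: I am assuming that the "best" node is the one with the lowest estimated finish time, where finish time is the download time from the data node plus the processing time. The names ComputeNode, estimated_finish_s and pick_compute_node, and all the numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    name: str
    speed_mb_per_s: float  # hypothetical processing rate of this volunteer


def estimated_finish_s(node, data_mb, bandwidth_mb_per_s):
    """Download time from the data node plus processing time on the compute node."""
    download = data_mb / bandwidth_mb_per_s[node.name]
    compute = data_mb / node.speed_mb_per_s
    return download + compute


def pick_compute_node(nodes, data_mb, bandwidth_mb_per_s):
    """Assumed policy: choose the node with the lowest estimated finish time."""
    return min(nodes, key=lambda n: estimated_finish_s(n, data_mb, bandwidth_mb_per_s))


nodes = [ComputeNode("fast-but-far", 50.0), ComputeNode("slow-but-near", 20.0)]
# Bandwidth from the data node holding the input to each compute node (made-up values).
bandwidth = {"fast-but-far": 2.0, "slow-but-near": 10.0}

best = pick_compute_node(nodes, data_mb=100.0, bandwidth_mb_per_s=bandwidth)
print(best.name)
```

With these made-up numbers the slower but closer node wins, because the download dominates the total time. That is the intuition behind scheduling data-intensive tasks near their data.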

The Most Important Contribution of Nebula:
At the beginning of the paper, the authors list four goals, but I think the most interesting one is location-aware resource management. With the help of the monitor service, Nebula can both store and process data in a geo-distributed manner.
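
As an illustration of what location-aware storage could look like, the sketch below ranks data nodes by the bandwidth the monitor has measured between them and the place the data comes from, and keeps the top few as replica holders. This is only my assumption about one possible policy, not the placement algorithm from the paper. The function place_replicas, the node names and the bandwidth values are all hypothetical.

```python
def place_replicas(origin, data_nodes, bandwidth_mb_per_s, replicas=2):
    """Return the `replicas` data nodes with the highest measured bandwidth to `origin`."""
    ranked = sorted(
        data_nodes,
        key=lambda node: bandwidth_mb_per_s[(origin, node)],
        reverse=True,
    )
    return ranked[:replicas]


# Hypothetical measurements reported by the monitor: (origin, data node) -> MB/s.
monitor = {
    ("eu-client", "eu-1"): 12.0,
    ("eu-client", "eu-2"): 9.0,
    ("eu-client", "us-1"): 1.5,
}

chosen = place_replicas("eu-client", ["us-1", "eu-2", "eu-1"], monitor)
print(chosen)
```

Here the two European data nodes are chosen for a European client, so later reads and task downloads stay on the faster links.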

Strengths:
1) Location-aware resource management helps Nebula achieve efficiency as a geo-distributed computing framework.
2) The DataStore master and ComputePool master help Nebula achieve fault tolerance. The DataStore master maintains enough replicas of each file in the system in case a file is lost. The ComputePool master restarts a task once it has confirmed that the task has failed.
3) Compared to the previous volunteer computing models (central source with central intermediate data, and central source with distributed intermediate data), Nebula is a huge improvement.
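
The two fault-tolerance mechanisms in point 2 can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not Nebula's implementation: I assume a target of two replicas per file, and I assume the master decides a task has failed when it has not heard from it within a timeout. The names repair_replicas and tasks_to_restart, the target and the timeout are hypothetical.

```python
def repair_replicas(replica_map, live_nodes, target=2):
    """Drop replicas on dead nodes, then add live nodes until `target` copies exist."""
    repaired = {}
    for file_name, holders in replica_map.items():
        alive = set(holders) & live_nodes
        spare = sorted(live_nodes - alive)  # sorted only to keep the example deterministic
        while len(alive) < target and spare:
            alive.add(spare.pop(0))
        repaired[file_name] = alive
    return repaired


def tasks_to_restart(last_heartbeat_s, now_s, timeout_s=30.0):
    """Assumed failure test: a task is considered failed after `timeout_s` of silence."""
    return sorted(t for t, seen in last_heartbeat_s.items() if now_s - seen > timeout_s)


replica_map = {"a.dat": {"n1", "n2"}, "b.dat": {"n2", "n3"}}
live = {"n1", "n3", "n4"}  # n2 has left the system

repaired = repair_replicas(replica_map, live)
restart = tasks_to_restart({"t1": 100.0, "t2": 60.0}, now_s=105.0)
print(repaired, restart)
```

In a volunteer setting nodes can leave at any time, so both checks would need to run periodically rather than once.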

Weaknesses:
1) Are there many volunteers in the real world willing to offer their own resources?
2) The most important part of Nebula is how it manages geo-distributed resources, but the paper doesn't give quantified metrics or a concrete method for how to place the data and distribute the computing tasks.
3) Nowadays, compute-intensive frameworks are much more common, and these kinds of frameworks do not need expensive hardware. Besides, they can also do data-intensive work. Do we really need to design a new architecture?

Discussion & possible future direction:
1) Nebula is not very suitable as a computing framework, but it could be a good geo-distributed storage framework that also offers fair computing ability by itself.
2) If you were a researcher, how would you develop Nebula in the future?

1 comment:

  1. nice job. Point #1 is worth discussion -- I suppose it depends upon what type of computation an edge node could or should perform?
