This paper frames the task of computation offloading as an optimization problem which can be solved using parametric analysis.
Computation offloading is useful only when it leads to an overall improvement in performance and energy savings. Therefore, when deciding whether or not a particular computation should be offloaded, there is a tradeoff between the communication cost and the computation cost. The premise of this paper is that, for most applications, these costs depend on the input parameters of the specific application instance, and hence an optimal program partitioning cannot be completely determined a priori. The paper associates with the program a cost formula that uses the runtime values of the parameters to efficiently determine an optimal partitioning.
The solution presented by this paper is as follows -
- Program Partitioning -
- The program is partitioned into schedulable tasks.
- A Task Control Flow Graph (TCFG) is used to represent the program. Each node is a task and the directed edges denote the associated data transfer between these tasks.
- ‘Data validity States’ are maintained on each host to avoid redundant data transfers.
- ‘Memory Abstraction’ is used to resolve the issue of unknown data dependencies.
- Cost Analysis -
- Computation, Communication, Scheduling and Data Registration (Validity) costs are expressed as functions of program input parameter values.
- These are then used to derive a cost formula which must be optimised.
- This optimisation problem is modelled as a ‘min-cut network flow’ problem.
- A parametric partitioning algorithm is used to solve this problem and the result is an optimal program partitioning, one for each possible range of input parameters.
- At runtime, the program’s tasks schedule themselves on either host based on the current parameter values and the corresponding partitioning solution produced by the cost analysis above.
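To make the min-cut formulation concrete, here is a simplified sketch (not the paper's actual algorithm; all task names and costs are invented): a flow network is built with a source standing for the local host and a sink for the remote host; cutting the source-to-task edge (capacity = remote execution cost) places a task remotely, cutting the task-to-sink edge (capacity = local cost) keeps it local, and inter-task edges (capacity = transfer cost) are cut when two communicating tasks land on different hosts.

```python
from collections import defaultdict, deque

def min_cut_partition(local_cost, remote_cost, comm_cost):
    """Assign each task to 'local' or 'remote' via a minimum s-t cut.

    The min cut's capacity equals the total execution-plus-communication
    cost of the best partitioning, so the cut itself is the answer.
    """
    S, T = "_S", "_T"
    cap = defaultdict(lambda: defaultdict(int))
    for t in local_cost:
        cap[S][t] += remote_cost[t]   # cut => task runs remotely
        cap[t][T] += local_cost[t]    # cut => task runs locally
    for (u, v), c in comm_cost.items():
        cap[u][v] += c                # transfer cost is paid in
        cap[v][u] += c                # either direction
    for u in list(cap):               # ensure every edge has a reverse
        for v in list(cap[u]):        # residual edge (capacity 0)
            _ = cap[v][u]

    flow = defaultdict(lambda: defaultdict(int))

    def bfs():                        # shortest augmenting path (Edmonds-Karp)
        parent, q = {S: None}, deque([S])
        while q:
            u = q.popleft()
            for v in cap[u]:
                if v not in parent and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    if v == T:
                        return parent
                    q.append(v)
        return None

    while (parent := bfs()) is not None:
        path, v = [], T
        while v != S:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] - flow[u][v] for u, v in path)
        for u, v in path:
            flow[u][v] += push
            flow[v][u] -= push

    # Tasks still reachable from S in the residual graph stay local.
    seen, q = {S}, deque([S])
    while q:
        u = q.popleft()
        for v in cap[u]:
            if v not in seen and cap[u][v] - flow[u][v] > 0:
                seen.add(v)
                q.append(v)
    return {t: "local" if t in seen else "remote" for t in local_cost}

# Example: t1 is cheap locally, t2 is cheap remotely, moderate transfer cost.
assignment = min_cut_partition(
    local_cost={"t1": 1, "t2": 10},
    remote_cost={"t1": 10, "t2": 1},
    comm_cost={("t1", "t2"): 2},
)
# assignment: {"t1": "local", "t2": "remote"} — only the transfer edge and the
# two cheap execution edges are cut (total cut capacity 1 + 1 + 2 = 4)
```

In the parametric setting, the edge capacities would be functions of the input parameters rather than constants, which is what makes the paper's partitioning algorithm more involved than this plain min-cut.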
STRENGTHS OF THE PAPER
- The Program Partitioning Model could be an important takeaway from this paper. Most computation offloading schemes require restructuring of the program, and the details discussed in section 2 may easily be applied to other offloading techniques. It clearly outlines the task granularity and breakpoints, and the TCFG provides a good visualisation.
- Most of the program partitioning and cost analysis work is done ahead of time, so, unlike frameworks that make offloading decisions entirely at runtime, this approach adds little latency. At the same time, these decisions are not agnostic to the program's execution state: at runtime the program refers to a predetermined chart to find its optimal partitioning, which means better decisions are made than in a purely static system.
- The authors provide plenty of detail about the experimental setup, which makes their results easier to understand and accept. The experiments are run with 4 different applications, which differ in complexity, in the number of input parameters, and in the results of the parametric analysis.
- The results of this approach, as discussed in the paper, are impressive: a 37% performance improvement with offloading compared to running the whole application locally.
- The writing style of the paper makes the very involved math and the many details relatively easy to follow, thanks to the abundance of examples, edge cases and explanations. A few instances of this are -
- The authors state that different input parameters can require different optimal partitionings, and they put this point across well with a concrete example of an audio file encoder in section 1.1.
- The authors provide reasoning for their decisions where possible; for example, they explain in section 2 why the granularity of a task must be smaller than a function.
- Edge cases are handled where possible; for example, section 2.2 provides the ‘Data Validity States’ to keep the estimated communication costs as close to the real costs as possible.
WEAKNESSES OF THE PAPER
- Program Partitioning -
- The program partitioning discussed in this paper analyses the code line by line to find task headers and branches, as discussed in section 2.1. This means that considerable programmer effort is required to restructure the program and break it into tasks. An even deeper understanding may be required to construct the correct TCFG.
- A point that is discussed in [2] states that such methods can only be applied to specific kinds of applications (multimedia). I believe this is true because it may be impossible to model very complex applications as a simple graph.
- Cost Analysis -
- Extensive experiments are needed to determine the values of the constants in the cost formula. These experiments must be re-run each time the application logic changes, and for every new application.
- For the more complex applications, user annotations (section 3.4) may be required to express the costs as functions of the input parameters.
- The results of the parametric analysis are different partitionings for different ranges of input parameters. The range of an input parameter can be very large, and more detail is necessary to understand how these results are stored, evaluated and searched at runtime.
- Experiments -
- Experiments are only conducted with multimedia applications. As discussed in [1], experiments with simpler applications such as text editors could also have been included.
- The experiments do not provide comparisons with other similar frameworks, nor any exact numbers on energy consumption.
- Scheduling -
- The paper talks about scheduling only very briefly in section 2.
- Only one host is active at any point, which unnecessarily adds delay to the application's execution. While the overall runtime is reduced by offloading, such scheduling does not fully exploit the benefits of a distributed architecture.
- A lot of additional bookkeeping is required in terms of the Data Validity States, the Abstract Memory Locations, the Mapping Tables, etc., which need to be maintained on both the client and the server.
- This paper does not address the question of how to offload, and hence has no discussion of fault tolerance, security, or the details of the message-passing architecture.
- Weaknesses discussed in the paper itself (sections 5.2 and 5.3) -
- The degeneracy problem, which exists due to the nature of the linear systems themselves.
- The path sensitivity problem: the order in which tasks are executed is not considered while making offloading decisions.
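On the storage and lookup concern raised above: one plausible scheme (purely illustrative; the paper does not specify this, and all names, tasks and thresholds below are invented) is to keep the breakpoints between parameter ranges in sorted order and binary-search them each time a task schedules itself.

```python
import bisect

# Hypothetical lookup table produced by the parametric analysis: one
# optimal partitioning per range of an input parameter (e.g. file size).
# breakpoints[i] is the upper bound (exclusive) of range i.
breakpoints = [1_000, 100_000]
partitionings = [
    {"decode": "local",  "encode": "local"},   # small inputs: stay local
    {"decode": "local",  "encode": "remote"},  # medium: offload the encoder
    {"decode": "remote", "encode": "remote"},  # large: offload everything
]

def host_for(task, param_value):
    """Pick the precomputed partitioning for the current parameter value
    with a binary search over the sorted range breakpoints."""
    i = bisect.bisect_right(breakpoints, param_value)
    return partitionings[i][task]
```

With k ranges this makes each scheduling decision an O(log k) lookup, though with several interacting parameters the table would grow multi-dimensional, which is precisely why more detail from the authors would have helped.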
DISCUSSION POINTS
- For the more mathematically minded -
The paper frames the task of finding an optimal partitioning as an optimization problem and then models it as a ‘min-cut network flow’ problem. The paper does not explain why this approach was selected. So why this approach? Could another technique be used just as well?
- Is the 37% improvement in performance really worth all this extra effort? Do other techniques provide comparable improvements with less effort?
- How does this approach compare to MAUI, CloneCloud and COMET in terms of -
- The complexity of applications targeted
- Being fully automatic
- The offloading decision (dynamic or static)
More Related to this Paper and Computation Offloading -