Wednesday 25 January 2017

COMET: Code Offload by Migrating Execution Transparently

Overview

COMET is a runtime system that allows unmodified multi-threaded applications to utilize multiple machines. It is publicly available here.


The design goals are: correctness for multi-threaded programs, speed-up of computation, no manual effort, fault tolerance, and generality across existing applications.

COMET uses Distributed Shared Memory (DSM), rather than Remote Procedure Calls, to access and change memory across systems, which allows for multi-threading support and thread migration. It's the first to apply DSM to offloading.

The system operates on a VM-synchronization primitive using a push-pull protocol. Deltas between object updates are tracked through a "tracked set table" that denotes dirty fields. A synchronization proceeds as follows:
1. The pusher and puller enter an executable exchange protocol to exchange binaries.
2. The pusher sends over information about each thread. All local threads are temporarily suspended.
3. The pusher sends over an update of the shared heap.
4. The puller buffers the rest of the synchronization operation, then temporarily suspends its local threads (just like step 2 on the pusher's end). It merges in the update to the heap first. Then it pulls in updates to the stack.
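
The dirty-field tracking and the heap part of the push-pull exchange can be illustrated with a small Python sketch. This is a toy model, not COMET's implementation: the class `TrackedHeap` and its methods are hypothetical names, objects are plain dictionaries, and thread suspension, stack transfer, and the executable exchange are left out.

```python
class TrackedHeap:
    """Toy shared heap that records which (object id, field) pairs are dirty.

    Hypothetical illustration only; not COMET's actual data structures.
    """

    def __init__(self):
        self.objects = {}   # object id -> {field name: value}
        self.dirty = set()  # (object id, field name) pairs written since the last push

    def write(self, obj_id, field, value):
        # A local write updates the field and marks it dirty.
        self.objects.setdefault(obj_id, {})[field] = value
        self.dirty.add((obj_id, field))

    def read(self, obj_id, field):
        return self.objects[obj_id][field]

    def push(self):
        """Build a delta holding only the dirty fields, then mark them clean."""
        delta = [(o, f, self.objects[o][f]) for (o, f) in sorted(self.dirty)]
        self.dirty.clear()
        return delta

    def pull(self, delta):
        """Buffer the whole delta first, then merge it into the local heap."""
        buffered = list(delta)  # nothing is applied until the update is complete
        for obj_id, field, value in buffered:
            # Merged fields are not marked dirty: both sides now agree on them.
            self.objects.setdefault(obj_id, {})[field] = value


client, server = TrackedHeap(), TrackedHeap()
client.write(1, "x", 10)
client.write(1, "y", 20)
server.pull(client.push())   # client pushes its heap update to the server

server.write(1, "x", 11)     # the server changes a single field
delta = server.push()        # so only that field travels back
client.pull(delta)
```

Because only dirty fields are included in a delta, the second synchronization carries one field rather than the whole object.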


Advantages

  • Significant speed-up in computation while still correctly managing multi-threaded programs.
  • Even with some amount of synchronization between threads, COMET still operates well. 
    • The paper demonstrates this through an application that factors a queue of integers. It achieved an impressive speedup of 202x over Wi-Fi (44 minutes when run locally compared to 13 seconds when offloaded over Wi-Fi).
  • As a side-effect of improving the speed of computation, COMET also improves energy-efficiency by offloading computationally intensive portions of code.
  • Failure recovery is almost cost-free. Clients can just resume computations on server failure.
    • However, the client should never enter a non-recoverable state upon server failure. It therefore waits for all changes to be pulled/buffered before committing any of them.
  • Smart adaptability to network conditions in terms of latency and bandwidth of the connection.

Shortcomings

  • The authors concede that there are no built-in security mechanisms that protect data privacy or mitigate tampering with the computation results.
    • Thus, the client must trust the server. However, the converse does not apply because the server has no private data or dependency on the accuracy of results.
    • This may limit usage for enterprises -- ECOS (Gember et al.) attempts to address this problem of ensuring data privacy.
  • COMET may decide to send over data that is not needed for the computation, thus wasting bandwidth.
  • The scheduling algorithm tasked with moving threads between endpoints (in order to maximize throughput) is somewhat naive. Essentially, a thread is migrated when its running time has exceeded some (configurable) parameter t.
    • However, it should be noted that scheduling isn't a main component of the paper.
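
The threshold rule described above can be sketched in a few lines of Python. This only models the rule as summarized in this post (migrate once a thread has run longer than t); the class `MigrationScheduler`, its method names, and the explicit `now` argument are hypothetical and not taken from COMET.

```python
import time


class MigrationScheduler:
    """Toy threshold scheduler: migrate a thread once it has run longer than t."""

    def __init__(self, threshold_s):
        self.threshold_s = threshold_s  # the configurable parameter t, in seconds
        self.started = {}               # thread id -> time it began running locally

    def on_start(self, thread_id, now=None):
        # Record when the thread started (or resumed) running locally.
        self.started[thread_id] = time.monotonic() if now is None else now

    def should_migrate(self, thread_id, now=None):
        # Migrate only when the elapsed running time strictly exceeds t.
        now = time.monotonic() if now is None else now
        return now - self.started[thread_id] > self.threshold_s


sched = MigrationScheduler(threshold_s=0.5)
sched.on_start("worker", now=0.0)
early = sched.should_migrate("worker", now=0.4)  # still under the threshold
late = sched.should_migrate("worker", now=0.6)   # past the threshold
```

A rule like this ignores how much state a migration would have to transfer, which is one reason it reads as naive.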

Discussion Points

  • The authors state that at the time of writing (2012) they had not found many mobile applications that rely on heavy computation. As a result, the practicality of COMET can be called into question -- however, the authors quickly follow up by arguing that COMET makes these types of applications feasible, and that it also generalizes to existing applications, which need no offloading logic.
    • In the last 4 years, have mobile applications become any more computationally intensive?
  • COMET is limited to Android systems because of its reliance on the Java Memory Model and how it handles synchronization. An interesting experiment would be researching an implementation for iOS devices.
    • The authors do state that the design of COMET is general enough that it can be applied to other environments such as Microsoft's Common Language Runtime.
  • How can the scheduler be improved? In other words, how can we increase throughput via scheduling thread migrations?

2 comments:

  1. Excellent job. You raised a number of really good discussion points.

  2. Additional comments: porting to iOS may not be an interesting research topic (though a good engineering task). I would also emphasize the value of building this on a DSM system. In terms of improving scheduling -- what results or assumptions in the paper motivate you to ask this, as one could always ask of any paper 'can you do better'?

