Tuesday 31 January 2017

ThinkAir: Dynamic resource allocation and parallel execution in cloud for mobile code offloading

Summary:
    ThinkAir is a framework that lets complex mobile applications run partially in the cloud, easing the handheld's limits on processing power, storage, and battery life. It does so through remote execution of code (method-level computation offloading) on multiple virtual machines (VMs) in the cloud. The paper's main focus is to address scalability and to demonstrate parallel execution of offloaded tasks across those VMs. ThinkAir claims superiority over related solutions such as MAUI and CloneCloud on the strength of its on-demand, scalable infrastructure.

  Toolchain: (i) ThinkAir library (API), (ii) compiler, (iii) VM manager, and (iv) parallel processing module in the cloud.

  Flow:
    Offline :-
      The programmer annotates with @Remote the methods that the system may consider as candidates for remote execution.
      The ThinkAir compiler translates the annotated code and generates remotable method wrappers and utility functions.
    Online :-
      Mobile:
        The Execution Controller on the mobile device decides at run time
        whether the executing method should be transferred to the cloud or
        run locally.
          - it uses several profilers (software, hardware and network)
            to determine whether the switch is needed.
          - the method is transferred if the input exceeds a Boundary Input Value (BIV).
      Cloud:
        A lightweight application on a surrogate server, called the Client Handler,
        receives and executes the offloaded code and data.
          - it instantiates one or more VMs and delegates the task to them.
          - secondary VMs are kept in a 'paused' state and can be resumed for task processing.
          - parallelization is by direct splitting of sub-tasks across different VMs.
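
To make the online decision concrete, here is a minimal sketch in Python (ThinkAir itself is a Java/Android framework, so this is an analogy, not its API). The `remote` decorator stands in for the @Remote annotation, and `ExecutionController`, `fake_cloud` and the single size-based BIV check are hypothetical names and simplifications; the real controller also consults its profilers.

```python
def remote(func):
    """Mark a function as an offloading candidate (stand-in for @Remote)."""
    func.remotable = True
    return func


class ExecutionController:
    """Hypothetical controller: offload only when the input exceeds the BIV."""

    def __init__(self, biv, offload, network_up=lambda: True):
        self.biv = biv                # Boundary Input Value (assumed: an input size)
        self.offload = offload        # callable that runs func(data) "in the cloud"
        self.network_up = network_up  # assumed connectivity probe

    def should_offload(self, func, input_size):
        return (getattr(func, "remotable", False)
                and self.network_up()
                and input_size > self.biv)

    def run(self, func, data):
        if self.should_offload(func, len(data)):
            try:
                return self.offload(func, data), "remote"
            except ConnectionError:
                pass  # connectivity lost: fall through to local execution
        return func(data), "local"


@remote
def total(values):
    return sum(values)


def fake_cloud(func, data):
    """Stand-in for the Client Handler: simply runs the function."""
    return func(data)
```

With `ExecutionController(biv=3, offload=fake_cloud)`, a two-element input stays local, while a four-element input is handed to `fake_cloud`.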

Objectives and solutions:
1. Dynamic adaptation:
            what?
                adapt quickly as conditions change to achieve high performance and ensure correctness.
            how?
                - if connectivity to the server is lost, the framework falls back to local execution.
                - exceptions thrown at the application server are caught and re-thrown on the client side.
                - OutOfMemory situations at the application server are handled by cloning a more powerful VM to take over the task.
2. Ease of use:
            what?
                a gentle learning curve, thanks to a simple interface for developers.
            how?
                - no code changes are required other than annotating the candidate methods.
                - semi-automatic offloading of code, based on real-time environmental factors assessed by the profilers.
 3. Performance improvement:
            what?
                improve computational performance and power efficiency of mobile devices by leveraging mobile clouds.
            how?
                - cloud-augmented execution of offloaded tasks, exploiting smartphone virtualization techniques (Android x86 + VirtualBox).
                - recursive and data-intensive algorithms are split into multiple tasks and distributed over multiple VMs for efficient parallel execution.
 4. Dynamic scaling:
            what?
                dynamically scale the computational power at the application server to optimize the performance.
            how?
                - on-demand resource allocation: the client can request extra computational power.
                - exploits parallelism by dynamically creating, resuming and destroying VMs.
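
The parallel execution and exception re-throwing described in the objectives above can be sketched as follows. This is an illustration only: a thread pool stands in for the pool of VMs, and `split` and `run_on_workers` are hypothetical names, not part of ThinkAir.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Cut items into at most `parts` contiguous chunks of near-equal size."""
    parts = max(1, min(parts, len(items)))
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_on_workers(task, items, workers):
    """Run `task` on each chunk in parallel; each thread plays the role of a VM."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task, chunk) for chunk in split(items, workers)]
        # result() re-raises in the caller any exception raised in a worker,
        # similar in spirit to re-throwing server-side exceptions at the client.
        return [future.result() for future in futures]
```

For example, `run_on_workers(sum, list(range(10)), 4)` returns four partial sums that the caller then combines; a task that raises in a worker surfaces as the same exception in the caller.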

Limitations:

        1. High VM instantiation time (32s is a lot!). The paper addresses this by keeping VMs in a 'paused' state, which reduces how often a VM must be created, but it notes that even for seven simultaneous requests the resume time exceeds 7 seconds. This adds considerable overhead to task processing.
        2. ThinkAir assumes a trustworthy cloud execution environment: it simply trusts that offloaded code and data state are not maliciously modified or stolen. No authentication mechanism is involved, so the application server will process submissions from any client (the authors propose to address this security concern in future work).
        3. It is assumed that the smartphone and its clone in the cloud are pre-synchronized, but this synchronization is not discussed anywhere in the paper.
 

Scope for discussions:

        1. Did they say multi-user? As far as the paper shows, ThinkAir does not distinguish tasks by user but by process (application). As more users connect to the server, scalability is limited, because each VM reserves a considerable amount of memory (paused VMs still hold their memory; only CPU cycles are not consumed). This poses a resource constraint and casts doubt on commercial deployment of the system.
        2. The paper does not discuss in detail how VMs are profiled on the server side of the framework. If the server processes many applications at the same time, some will have to wait for a VM, and the profiler data will include that wait time, which will negatively influence future decisions by the Execution Controller.
        3. More discussion of network constraints is needed. It is assumed that the round-trip time between clients and server is negligible and that bandwidth is high.
        4. ThinkAir uses a sub-optimal data transfer approach. Not all of the object's instance data is required in the cloud for processing, and between successive execution calls from the same client and application the state changes may be limited. The server could therefore cache the data and work with incremental updates from the client on successive calls.
        5. Will the users/clients be remembered between tasks? If there is a client queue overflow at the app server, how will it be addressed? Will there be any priority-based scheduling?
        6. BIV is estimated on the client side, so each instance of the process (application) learns its own BIV. Each client must therefore start BIV learning from scratch, initially submitting every batch annotated as 'remote'.
        7. Possible use-cases for deploying ThinkAir:
                i.   High-end processing: image processing applications like Face recognition systems
                ii.  Games: n-queens, sudoku solver etc.
                iii. Social media processing: video decoding, image file conversions etc.
                iv.  other data intensive applications?
        8. How does ThinkAir compare with related work such as MAUI, CloneCloud and COMET?
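
The incremental-update idea from point 4 could look roughly like the sketch below. It is a suggestion from this post, not something ThinkAir implements; `diff` and `ServerCache` are hypothetical names, and object state is assumed to be a flat dictionary of fields.

```python
def diff(old, new):
    """Return (changed, removed) needed to turn dict `old` into dict `new`."""
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = [k for k in old if k not in new]
    return changed, removed


class ServerCache:
    """Keeps the last known state per (client, application) key."""

    def __init__(self):
        self.states = {}

    def apply(self, key, changed, removed):
        """Patch the cached state with a delta and return the updated state."""
        state = self.states.setdefault(key, {})
        state.update(changed)
        for name in removed:
            state.pop(name, None)
        return state
```

On the first call the client sends `diff({}, state)`, which is the full state; on later calls it sends only the fields that changed since the previous call, so the payload shrinks whenever the state is mostly stable.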
 

Some other interesting observations from the paper:
 
        1. do cascading profilers have issues working with a distributed kernel?
        2. energy usage pattern for 3G vs Wi-Fi
        3. networking cannot be neglected in a distributed solution (from the performance discussion comparing 3G, Wi-Fi and Wi-Fi-local)
        4. communication may have a dramatic impact on performance
        5. select the proper technologies carefully

1 comment:

  1. Ajay, this is an excellent blog post. Most of the points you made were relevant and worth discussing!
