SUMMARY
The paper discusses the issue of partitioning virtual machines in localized cloudlets to be used for offloading computations. Mobile devices must offload complex computations to more powerful remote resources, which may be very heterogeneous in resources and architecture. Virtual machines (VMs) can abstract these complexities away, making remote application code more modular. One drawback is that these VMs must be generated from images, which can take a considerable amount of time to bootstrap dynamically. Additionally, the remoteness of the cloud can reduce the advantage of offloading computation, which motivates the use of cloudlets. The authors implement a methodology for allocating VM partitions more quickly that can be used across both clouds and cloudlets.
METHODOLOGY
The basic methodology is that a cloudlet stores a set of common base-VMs which can then be overlaid with application code held by a mobile device that needs to offload computation.
The steps for the cloudlet’s execution are as follows:
- Cloudlet downloads base VM images from a cloud node.
- Mobile device contacts the cloudlet with the required ‘VM overlay’.
- The cloudlet ‘overlays’ the required changes on the base VM.
- Cloudlet activates the VM and notifies the mobile device.
- Mobile device performs its computations.
- Mobile device signals a disconnection from the cloudlet.
- Cloudlet sends back a 'residual' which contains any changes to the overlay.
- Cloudlet deletes the VM.
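The overlay, synthesis, and residual steps above can be sketched as a chunk-level delta between images. This is only an illustrative model, not the authors' implementation: the names `split`, `make_delta`, and `apply_delta`, the 4 KiB chunk size, and the restriction to equally sized images are all assumptions made for the example.

```python
CHUNK_SIZE = 4096  # assumed chunk granularity, not taken from the paper


def split(image: bytes) -> list[bytes]:
    """Cut an image into fixed-size chunks."""
    return [image[i:i + CHUNK_SIZE] for i in range(0, len(image), CHUNK_SIZE)]


def make_delta(old: bytes, new: bytes) -> dict[int, bytes]:
    """Return the chunks of `new` that differ from `old`, keyed by chunk index."""
    if len(old) != len(new):
        raise ValueError("this sketch assumes equally sized images")
    return {
        index: new_chunk
        for index, (old_chunk, new_chunk) in enumerate(zip(split(old), split(new)))
        if old_chunk != new_chunk
    }


def apply_delta(image: bytes, delta: dict[int, bytes]) -> bytes:
    """Overwrite the chunks named in `delta` and return the resulting image."""
    chunks = split(image)
    for index, chunk in delta.items():
        chunks[index] = chunk
    return b"".join(chunks)
```

In this model the mobile device would hold `make_delta(base, launch)` as its overlay, the cloudlet would rebuild the launch image with `apply_delta(base, overlay)`, and the residual returned at disconnect would correspond to `make_delta(launch, final)`, where `final` is the image state when the session ends.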
The authors make many optimizations to the basic VM system.
- One such optimization is 'deduplication', which eliminates redundant data between the memory and disk copies in the overlay.
- Another optimization alleviates some of the loss of granularity introduced by VMs, which makes minimizing the disk image size difficult. The host OS is unable to ‘see’ inside the guest OS to determine which files have been deleted in the guest’s filesystem. Two approaches are proposed: in one, the guest OS issues directives to the underlying HDD/SSD to trim sectors, marking them as unused; in the other, the host OS ‘crawls’ the guest file system to determine which blocks inside the virtual disk are in use. The authors mention that both are used in their implementation for validation purposes.
- A third optimization trims unneeded main memory by exposing the page table to the VMM, allowing unused pages to be purged from the overlay. This requires that a specific module be added to the guest OS at runtime.
- They also parallelize or ‘pipeline’ various steps in the VM synthesis process by segmenting the overlay transfer and decompressing and applying each segment of the overlay independently.
- They also experiment with starting the overlaid VM early, assuming that the entire overlay is unnecessary in most instances.
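Two of the optimizations above, deduplication and pipelining, can be illustrated with a small sketch. It is a toy model under stated assumptions rather than the paper's code: the function names are hypothetical, SHA-256 is an assumed choice of chunk fingerprint, `zlib` stands in for whatever compression the authors use, and a producer thread stands in for the network transfer.

```python
import hashlib
import queue
import threading
import zlib


def deduplicate(chunks):
    """Store each distinct chunk once; return (unique store, per-chunk references)."""
    store = {}
    refs = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()  # assumed fingerprint function
        store.setdefault(digest, chunk)
        refs.append(digest)
    return store, refs


def rebuild(store, refs):
    """Recover the original chunk sequence from the store and references."""
    return [store[digest] for digest in refs]


def pipelined_decompress(segments):
    """Decompress independently compressed segments while later ones still arrive."""
    inbox = queue.Queue(maxsize=2)

    def transfer():
        for segment in segments:
            inbox.put(segment)  # stand-in for receiving a segment over the network
        inbox.put(None)  # sentinel: transfer finished

    sender = threading.Thread(target=transfer)
    sender.start()
    parts = []
    while (segment := inbox.get()) is not None:
        parts.append(zlib.decompress(segment))
    sender.join()
    return b"".join(parts)
```

Passing the memory and disk chunks of an overlay to `deduplicate` together means a chunk present in both is stored only once. Because each segment is compressed on its own, `pipelined_decompress` can start work on the first segment before the last one has arrived, which is the property the pipelining step relies on.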
GOOD ASPECTS
- The paper addresses a relevant and interesting problem, that of improving the performance of an architecture that is likely of great importance to future mobile and real-time applications.
- The paper shows the effects of many optimizations on the performance of an offloading system. Many of these are generalizable to other such systems, making them especially of interest.
- The authors evaluate each optimization for a diverse set of programs, comparing each to the baseline implementation and showing a clear improvement.
- The evaluation shows an order of magnitude difference between the baseline synthesis and the final, more optimized version. Each individual optimization shows a clear difference as well.
POOR ASPECTS
- A mobile device, which has memory constraints, is expected to store the overlays for at least one VM in order to facilitate the offloading.
- The energy required to send potentially hundreds of megabytes for the overlay for each application could also be prohibitive on a mobile device.
- The main memory optimization performed requires a Linux-specific module.
- The performance improvement, while significant, still leaves the total time to start the remote process at almost 10 seconds for some applications. Given that network uncertainties may degrade this further, some real-time applications may not be well suited to this system.
- The paper does not address the issue of a mobile device leaving before a computation completes. Should the computation continue in case the device re-enters, or should it terminate?
- It also does not address the issue of what happens if the required VM image is not present.
- The paper is somewhat confusing at times. For example, the use of ‘base’ and ‘launch’ VM leads to some confusing sentences. More distinct terms may have been better.
DISCUSSION QUESTIONS
- Do you think energy- and memory-constrained mobile devices are ideal for storing potentially hundreds of MBs of VM images, as well as paying the cost of transferring these images and receiving residuals?
- Do you think the VM should be deleted immediately upon disconnect or stored for other devices (or the same one) in the area to potentially reuse?
- What do you think should happen if the mobile device disconnects due to range or other network issues? Should the computation finish in case of a reconnect?
- Some of the optimizations listed in this paper are not new to the field of CS (for example, the early start, pipelining, and some of the deduplication efforts). How valuable do you feel it is to the quality of the work to see these ideas used in this setting?
Zach: these are very insightful and deep observations!
More detailed:
Poor 3, 5, 6 are true but probably 'nits' in that they are not unique to cloudlets (one can't solve all problems).
Not clear what you are getting at with discussion Q #4. Applying known techniques and showing their effectiveness in a new setting is valid research (in my opinion, at least).
It may have helped if I mentioned that the module mentioned in Poor 3 is not only Linux-specific but is developed by them and is not part of the standard Linux kernel.
Poor 5, when combined with discussion question 2 deals with the issue that the framework seems to aggressively discard data that may be useful later on.
My primary concern with Poor 6 is that the authors make the assumption that the number of images needed will be small. They don't mention what happens if an image changes (due to updates in the OS of that image). Are they assuming the applications are stored centrally and could be updated to work on the new OS images like the App and Play store for mobile? Also, how would images on the local nodes be updated to avoid compatibility issues due to such updates?
In DQ #4 I was not questioning the value of reusing old techniques. I was commenting on the fact that the advantages of using these optimizations are already well known. For example, the idea that pre-starting an application will reduce the execution time is obvious. Focusing on already explored aspects of these optimizations makes it more of an engineering exercise rather than a scientific one. It was not an attack on reusing traditional approaches but rather a qualitative critique concerning how much new knowledge was really being contributed.