Monday 10 April 2017

DeepCham: Collaborative Edge-Mediated Adaptive Deep Learning for Mobile Object Recognition


Background:

Deep learning algorithms are widely used in perception tasks such as speech recognition and object recognition. Because these algorithms are compute intensive, they are typically executed in the cloud, where sufficient resources are available. With recent improvements in mobile compute power, however, they can run inside mobile applications, offering richer functionality while reducing bandwidth usage and latency. These models are trained on huge datasets, and no single model works well in every setting.
The challenge is that recognition is domain specific: we cannot pre-train one model that covers every visual domain. DeepCham is a framework that tackles the mobile object recognition problem using edge computing. It combines a generic deep model with a domain-specific shallow model built from training instances of the target domain. DeepCham generates these domain-specific adaptation training instances from in-situ photos on participating mobile devices, enabling deep model adaptation. The challenges in building such a framework are that mobile photos are unlabelled, each image may contain multiple objects, the photographs raise privacy concerns, etc.

Design:

The DeepCham architecture consists of three components:
1. Domain-aware selection pipeline: finds images on the users' mobile phones that match the target domain.
2. Training instance generation pipeline: since every image may contain multiple objects, this component extracts the relevant objects using the generic deep model.
3. Adaptation training pipeline: trains a shallow model from the training instances generated and labelled by the workers.

The stepwise execution is as follows:
1. The initiator (anyone who starts an adaptation task) selects the domain to train for and sends a request to the master specifying the domain's characteristics, i.e. location, date, time, etc. (a rough request sketch follows this list).
2. If the master already has a model for the specified characteristics, the initiator can choose between reusing the existing model and creating a new one; otherwise, a new model is created.
3. The master then recruits workers and executes the task as a collaborative group.
4. Each worker first selects photos that match the requirements. On a single device this can be done through system APIs that expose image metadata. The user sets an upper limit on how many images to process; a clustering algorithm then groups the candidate images into that many clusters, and one image is selected per cluster to maximize the information obtained.
The challenge when multiple workers are involved is avoiding duplicate objects. This is solved by one of two synchronization methods, one-after-another and all-together (see the selection sketch after this list).
5. If an image contains multiple objects, the first task is to crop them into training images so that each image contains a single object to be labelled. This is done by the Enhanced Bounding Boxes Strategy (EBBS), which operates on the output of the Edge Boxes algorithm (roughly approximated in the sketch after this list).
6. For every selected image, each user labels the object, aided by auto-complete suggestions (e.g. powered by Google).
7. Once labelling is done, the labelled images are sent to the master, where a shallow model is trained and returned to the initiator.
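As a rough illustration of step 1, the initiator's request might carry domain characteristics like the following. This is only a sketch; the field names and types are hypothetical, not the paper's actual message format.

```python
from dataclasses import dataclass

@dataclass
class AdaptationRequest:
    """Hypothetical request from the initiator to the master; the paper
    identifies a domain by contextual characteristics such as these."""
    location: str     # e.g. "office-building-3"
    date: str         # e.g. "2017-04-10"
    time_of_day: str  # e.g. "morning"
```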
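For step 4, here is a minimal sketch of the per-worker selection and the one-after-another deduplication, assuming each photo is represented by a feature vector and using k-means as the (unspecified) clustering algorithm; the distance threshold is a hypothetical value, not one from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_representatives(features, k):
    """Cluster a worker's candidate photos (rows of `features`) into at
    most k clusters and keep the photo nearest each centroid, so no more
    than k images (the user's upper limit) are processed."""
    features = np.asarray(features, dtype=float)
    k = min(k, len(features))
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    picks = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
        picks.append(int(members[np.argmin(dists)]))
    return picks

def drop_duplicates(picks, features, seen_features, threshold=0.5):
    """One-after-another synchronization, sketched: discard any selected
    image whose feature vector lies within `threshold` of a feature
    already reported by earlier workers (threshold is hypothetical)."""
    features = np.asarray(features, dtype=float)
    kept = []
    for i in picks:
        if all(np.linalg.norm(features[i] - s) > threshold for s in seen_features):
            kept.append(i)
    return kept
```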
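For step 5, the EBBS heuristics are not spelled out in this review, so the sketch below only approximates the idea: greedily keep high-scoring Edge Boxes proposals that do not overlap already-kept boxes too much, then crop each kept box into a single-object training image. All thresholds here are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def pick_object_boxes(proposals, max_boxes=5, max_overlap=0.3):
    """Greedily keep the highest-scoring Edge Boxes proposals
    [(box, score), ...] that overlap already-kept boxes by at most
    `max_overlap`; each kept box becomes one single-object crop."""
    kept = []
    for box, score in sorted(proposals, key=lambda p: -p[1]):
        if all(iou(box, k) <= max_overlap for k in kept):
            kept.append(box)
        if len(kept) == max_boxes:
            break
    return kept
```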

Object recognition:
From the generic deep model, a domain-constrained deep model is derived that contains only the object classes present in the target visual domain. At recognition time, both the domain-constrained deep model and the shallow model produce class probabilities; a fused probability is computed by linear interpolation, a technique called late fusion, and the final class is the one with the highest fused probability.
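A minimal sketch of this late-fusion step: linearly interpolate the two models' class probabilities and take the argmax. The mixing weight `alpha` is a hypothetical placeholder for whatever weight the paper tunes.

```python
import numpy as np

def late_fusion(p_deep, p_shallow, alpha=0.5):
    """Fuse the class probabilities of the domain-constrained deep model
    and the shallow model by linear interpolation (`alpha` hypothetical)."""
    fused = alpha * np.asarray(p_deep) + (1 - alpha) * np.asarray(p_shallow)
    return int(np.argmax(fused)), fused

# Example over three domain classes
label, fused = late_fusion([0.2, 0.5, 0.3], [0.6, 0.3, 0.1], alpha=0.4)
print(label, fused)  # -> 0 [0.44 0.38 0.18]
```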
  
Strengths:
1. Every stage of the pipeline is explained in detail, and the design choices are supported by evidence from the experiments.
2. The late fusion method avoids building a separate deep model for every target domain, which would be impractical.
3. Initial image processing happens on the mobile devices, which saves communication bandwidth.
4. The paper also accounts for the energy constraints of mobile devices when deciding how many images to process.
5. Using an edge server located one hop away keeps communication overhead and network latency low.

Weaknesses:
1. The reliance on one-hop edge servers limits the scalability of the solution.
2. The paper does not account for the dynamic nature of mobile devices: what happens when a device disconnects from the master?
3. In the one-after-another synchronization method, how is privacy protected when image features are transferred between mobile phones?
4. The paper doesn't mention any incentives for users to join the network and participate in the process.
5. The pipeline requires manual work from the user to label the photos, yet no experiment on the user experience is reported.
6. Little information is given on how the master recruits users. Is a broadcast message sent to all mobile devices within range?
7. The privacy aspects of the system are not discussed in much depth.
8. Human involvement in selecting the model parameters requires the user to have sufficient knowledge of the framework.
9. The paper assumes that users provide correct labels for all the training instances.

Discussion:
1. What applications could be developed with this kind of architecture as a base framework?
2. Could social networking sites where photos are tagged be used to obtain training instances from remote servers, based on location and time?
3. How scalable is the solution in terms of user participation and server deployment?
4. Could this framework be extended to other kinds of domains, such as speech recognition?
 

1 comment:

  1. Excellent evaluation of the paper. I agree that privacy is a key issue that is not well explored. Discussion point #1 is interesting: do mobiles have other unique data collection characteristics?
