Tuesday 11 April 2017

DeepCham

The authors propose DeepCham, a new deep-learning framework for efficiently recognizing objects in a user's environment via mobile devices. The goal is to improve over previous work, particularly for non-stationary objects.

The primary goals for their system are:

  1. Require minimal human effort
  2. Respect user privacy
  3. Use resources efficiently
  4. Operate in real time

They state that their contributions include the development of a mobile-based framework, a series of artifacts for creating fast yet flexible image-recognition services using this framework, and an evaluation of said framework.

The authors argue that using the edge is necessary due to the constrained nature of mobile devices (limited bandwidth and energy) and to security/privacy concerns. These constraints, together with the minimal annotation typical of mobile photos and the redundancy that arises when users in the same area photograph the same object from different angles, are the challenges they face.

Strengths
  1. The authors evaluate each step of their process very effectively, showing that each individual technique, as well as the system as a whole, improves over previous work.
  2. The authors clearly take the difficulties of the mobile environment into account, taking the necessary steps to avoid costly communication (their one-hop policy; a rough sketch follows this list).
  3. Some of their algorithms are well-described, both in words and graphically. This makes understanding the work easier for someone not overly familiar with typical image-recognition methods.
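
To make the one-hop policy in point 2 concrete, here is a minimal sketch of my own (the names and structure are mine, not the authors'), in which a worker attaches only to a master it can reach in a single network hop, so traffic is never relayed through other mobile devices:

```python
# Hypothetical sketch of a one-hop attachment policy: a worker only talks to
# masters reachable in a single network hop (e.g., on the same WLAN), avoiding
# costly multi-hop relaying through other mobile devices. All names here
# (Master, hop_count, pick_master) are illustrative, not from the paper.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Master:
    address: str
    hop_count: int  # network distance reported by whatever discovery is used
    load: int       # current number of attached workers

def pick_master(candidates: list[Master]) -> Optional[Master]:
    """Return the least-loaded master exactly one hop away, or None if no
    such master exists (in which case the worker proceeds alone)."""
    one_hop = [m for m in candidates if m.hop_count == 1]
    return min(one_hop, key=lambda m: m.load) if one_hop else None

# Example: two one-hop masters and one two-hop master.
masters = [Master("10.0.0.2", 1, 5), Master("10.0.0.3", 1, 2), Master("10.0.1.7", 2, 0)]
print(pick_master(masters))  # -> Master(address='10.0.0.3', hop_count=1, load=2)
```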

Weaknesses
  1. The master-worker model they utilize seems quite brittle: no behavior is specified in the event of a master failure, and they don't say whether the master is static or dynamically chosen.
  2. They don't address the issue of discovery. How do masters and workers find each other?
  3. The authors acknowledge that multiple workers may assign different labels to the same object because they photographed it from different angles, yet they don't seem to indicate how this conflict is resolved (one possible resolution is sketched after this list).
  4. They mention that the master node receives a request from an initial worker and then decides whether to use an existing training method or a new one, but they give no formal description of how this decision is made.
  5. The authors mention the possibility that users may be asked to start taking photos of an object to help identify it. Not only is this somewhat intrusive and offers no incentive, but one could argue that simply asking the user 'hey, what is this thing?' might be more efficient, since the goal is object recognition. The authors do discuss this in a later section, but their argument that direct labeling is somehow more intrusive than asking the user to take photos is weak.
  6. The authors don't seem to take into account the possibility of remote regions where no server is available to act as the master. The required characteristics of a master server are also not stated.
  7. The authors' goal of being minimally intrusive seems to be undermined by their insistence on asking users for photos and labels.
  8. Privacy is still an issue, as the master server does not seem to be authenticated.
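
To make weakness 3 concrete, here is one way such label conflicts could be resolved. This is purely my own sketch of a majority-vote scheme, not anything the paper specifies:

```python
# Hypothetical sketch of conflict resolution for weakness 3: the master merges
# labels reported by different workers for the same object by majority vote,
# breaking ties in favor of the highest-confidence report. This is my own
# illustration, not the authors' algorithm.
from collections import Counter

def resolve_label(reports: list[tuple[str, float]]) -> str:
    """reports: (label, confidence) pairs from different workers for one object."""
    counts = Counter(label for label, _ in reports)
    top = max(counts.values())
    tied = {label for label, n in counts.items() if n == top}
    if len(tied) == 1:
        return tied.pop()
    # Tie: fall back to the highest-confidence report among the tied labels.
    return max((r for r in reports if r[0] in tied), key=lambda r: r[1])[0]

reports = [("mug", 0.9), ("cup", 0.7), ("mug", 0.6), ("bowl", 0.8)]
print(resolve_label(reports))  # -> "mug" (two votes beat one each for cup/bowl)
```
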
Discussion Questions
  1. Do you agree with the authors that asking a user for photos of an object is better than asking them to simply label a photo?
  2. What mechanisms would you suggest for increasing privacy? Should a user be able to specify whether a photo can be used by this framework when they take it?
  3. The authors note the issue of malicious users providing bad labels. In what other ways could such users interfere with this system? How would you avoid or compensate for these attacks?
  4. What applications can you think of that specifically require photos from local users, gathered in real time, as opposed to simply mining the cloud for previous photos?
