Thursday 23 March 2017

QuiltView: a Crowd-Sourced Video Response System


Gist of the paper:

QuiltView leverages Google Glass's support for micro-interactions to give users a first-person viewpoint: a user can send out a real-time query to other subscribed users, who reply with short video snippets. Building on this, the paper goes on to envision a real-time social network that capitalizes on the richness of video compared to text. A few more use cases are mentioned in the paper, such as finding a missing child, responding to a traffic emergency, and locating free food.
They also cache results and use geolocation and query-similarity detection to limit the number of queries each user receives.

Architecture: 
A global catalog of users and their preferences, queries, and responses, with YouTube URLs of the videos uploaded by responding users, is maintained (both the uploading and viewing of videos are done using standard YouTube mechanisms that are wrapped inside QuiltView query and response software).
The QuiltView catalog is used as a result cache that short-circuits query processing.
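As a rough illustration, the catalog-as-cache short-circuit could look like the following sketch (class names, field names, and the cache key are hypothetical, not from the paper):

```python
# Rough sketch of the QuiltView catalog used as a result cache.
# Class and field names are hypothetical, not from the paper.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Query:
    text: str
    location: str  # coarse geolocation key, e.g. a map-tile id

@dataclass
class Catalog:
    # (query text, location) -> YouTube URLs of previously uploaded responses
    cache: dict = field(default_factory=dict)

    def lookup(self, q: Query):
        """Short-circuit: return cached YouTube URLs if a matching query exists."""
        return self.cache.get((q.text, q.location))

    def store(self, q: Query, urls: list):
        self.cache[(q.text, q.location)] = urls
```

A hit in `lookup` lets the service return existing videos without disturbing any responders, which is exactly the distraction-saving role the paper assigns to the catalog.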
Queries are posed using a web interface to Google Maps (the paper shows a screenshot of this map-based query interface).

Implementation Details:
  1. Glass Client: implemented with the Glass Development Kit (GDK), using Google Cloud Messaging as the service through which the central cloud talks to the Glass devices.
  2. QuiltView Service: QuiltView is implemented as a web-based service in a single virtual machine at Amazon EC2 East. Standard load balancing and scaling mechanisms for web-based services could be used in the future to cope with increased load. They assume one device belongs to only one user.
  3. Query Workflow:
    1) The user zooms into the area he wants responses from; a location-share link is created on this map.
    2) He then types the text of his query, uploads any associated image (such as a picture of a missing child or pet), and adds details such as the reward offered and the desired timeliness of responses.
    3) The query optimizer then checks whether there are already cached results pertaining to any similar query.
  4. Query Similarity Detection: They use Gensim, an open-source framework for unsupervised semantic modeling, with a Latent Dirichlet Allocation (LDA) model for topical inference. They describe three scenarios: when similarity detection is correct, when it gives a false positive, and when it gives a false negative. A false negative is, as they put it, just an ‘opportunity lost’, but a false positive could be costly.
  5. Synthetic Users: Due to the shortage of real Glass devices, most of the Glass devices are simulated. These synthetic users do not send out queries; they act only as responders.
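The query-similarity step can be illustrated with a toy sketch. The paper uses Gensim with an LDA topic model; the bag-of-words cosine similarity below is a simplified stand-in for that pipeline, and the 0.6 threshold and function names are illustrative only:

```python
# Toy stand-in for QuiltView's query-similarity detection. The paper uses
# Gensim with an LDA topic model; this simple term-frequency cosine
# similarity only illustrates the idea of matching a new query against
# cached ones to decide whether cached results can be reused.
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity between term-frequency vectors of two queries."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def find_similar(query: str, cached: list, threshold: float = 0.6):
    """Return cached queries similar enough to short-circuit processing."""
    return [c for c in cached if cosine_sim(query, c) >= threshold]
```

With `cached = ["where is free food on campus", "traffic on highway 94"]`, the query "free food on campus today" matches only the first entry; setting the threshold too low is exactly the false-positive case the paper warns about.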

Key Assumptions:
  1. User distraction is the major cost (higher than network bandwidth, storage, CPU, and power utilization). [Thus the cost of uploading videos to YouTube is not discussed further.]
  2. Raw Data vs. Refined Data: they note that transmitting raw data such as a video conveys more than delivering refined data.
  3. Requests for videos are relatively rare (a few an hour, perhaps, for a typical user).

From a Distributed Systems point of view:


Security aspects
  1. Glass clients use SSL to communicate with this service. 
  2. They assume one device belongs to only one user, and establish decentralized authentication based on the BrowserID protocol, which allows a user to verify his/her identity via a participating email provider's OpenID or OAuth gateway. No new password creation is involved. While that may lessen the user interaction involved, it does not amount to strong privacy protection.
Fault Tolerance: Not much is said about this in the paper, which may well be a negative. The reliance on Django could also be a point of contention: a minor update to a dependency, or a backwards-incompatible Django release, could break the service's code.

Power Utilization: The paper does not address this directly, but does rank it below user distraction. The devices currently poll the central cloud, which is a drawback for power consumption.

Load Balancing: Since user distraction is their costliest metric, load balancing is framed around it. A user can specify preferences: times and geographic locations in which not to receive queries, and a cap on the number of queries received in a given period. Moreover, result caching with query-similarity detection means multiple near-duplicate videos are not uploaded, and some cached results can be returned to the querying user directly.
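The preference-based responder selection described above might look like the following sketch (field names and the exact policy are illustrative assumptions, not from the paper):

```python
# Sketch of preference-based responder selection: filter users by their
# stated preferences (quiet hours, excluded areas, per-period query cap),
# then randomly pick the desired number. All names are illustrative.
import random
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    quiet_hours: set = field(default_factory=set)    # hours (0-23) with no queries
    blocked_areas: set = field(default_factory=set)  # geographic keys to avoid
    max_per_hour: int = 3
    received_this_hour: int = 0

def eligible(u: User, hour: int, area: str) -> bool:
    """A user is eligible only if the query violates none of their preferences."""
    return (hour not in u.quiet_hours
            and area not in u.blocked_areas
            and u.received_this_hour < u.max_per_hour)

def select_responders(users, hour, area, k, rng=random):
    """Filter by preferences, then randomly sample up to k responders."""
    pool = [u for u in users if eligible(u, hour, area)]
    return rng.sample(pool, min(k, len(pool)))
```

Random sampling from the eligible pool matches the paper's statement that "QuiltView randomly chooses the desired number of users and delivers the query to them."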

CPU Utilization: The paper, again, does not address this directly.
Network Latency: The paper does not discuss this either.
Processing Latency of ML: The paper does not discuss this either.


Disadvantages:
Absence of concrete metrics to judge the working system. Also, many issues have not been addressed; some of these are discussed below.
1) “Glass users may be spread over a large area such as a city or a county” - This means they are only considering geographical spans up to a city or county, which, given how few Glass devices are in users' hands anyway, may be too small a pool to serve any actual use cases.

2) For the Missing Child use case: “Many video responses are immediately received by the police, and they are soon able to apprehend the suspect and rescue the child”
In many use cases, one query calls for multiple responses, and the user has no way of regulating the number of responses they get. Many of these may not be relevant to the user at all.

3) QuiltView is implemented as a web-based service in a single virtual machine at Amazon EC2 East, so scaling has not been tested.
If it were to scale, note that devices poll the central cloud and are also mobile; this raises the additional question of which server a device should poll, piling more compute-related exhaustion onto the device.

4) The user needs to have an idea of the geographic location from which he wants responses.

5) No mention is made of network latency, CPU utilization, or power concerns (apart from moving away from polling in the future).

6) The use of Django (which, as noted above, makes the service easy to break on upgrades).

7) False Positive Alarms: If the user is served cached results, he may have to sit through and reject some of them, which adds user distraction and makes the operation costly; no estimate is given of how much more costly it becomes.

8) The speed of ML processing is not mentioned; there may be added processing latency here (their text corpus is the 9 GB English Wikipedia).

9) The paper does not fully explain how the subset of potential responders is chosen. In the Load Balancing paragraph, they mention that after a small subset of users has been obtained based on their preferences, “QuiltView randomly chooses the desired number of users and delivers the query to them.” However, at the beginning of the paper it is mentioned that …

10) The synthetic users do not pose any queries and do not move.


Advantages :
1) Leverages the richness of video over text.
2) A video-led social networking concept seems to hold potential.

Some of the future work that the paper acknowledges :
1) Dynamic adaptation of estimates based on actual experience, and more sophisticated user-selection mechanisms (such as those based on user reputation), can be envisioned for the future.

2) Scaling the number of central servers.

3) GCM-based push notifications instead of polling the central server.

4) Introducing mobility in the synthetic users.


Suggestion Points from Reviewer :
  1. Can we have a tag-based approach to finding similarities?
  2. Should the user be able to choose how many responses they want? Otherwise this could lead to network congestion.
  3. If videos can be uploaded as part of queries, some constraints must apply there as well; but even then, not many responders might follow through, given the excessive user distraction and upload bandwidth involved. Others' thoughts on this are welcome.
------------  Edit 1  ---------------
Suggestion/Discussion Points contd.

1. Privacy: Privacy is a concern because videos are taken and transmitted publicly. This could facilitate criminal activities against people appearing in the videos, whose day-to-day activities may be tracked. This breaches not just individual privacy but also the privacy laws of a region or state.

2. Local/Edge Processing:
1) One way to leverage this is peer-to-peer cache-metadata checking, so that communication with the central cloud is not always needed.
2) Furthermore, the ML processing might also be computed at the edge by the Glass devices, or by a set of volunteer nodes ready to do the ML processing. However, since the model is fed a 9 GB text corpus as its base, this does not seem to be an edge-friendly task.

3. Reliability: The reliability of videos received by the user is questionable due to a few issues:
1) They might be getting stale cached data, and would not know that they should reject the video in this case.
2) There is still no way of ascertaining the reliability of the videos given.
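The peer-to-peer cache-metadata check suggested in point 2 above could be sketched as follows (the function and the peer model are hypothetical, not from the paper):

```python
# Hypothetical sketch of peer-to-peer cache-metadata checking: before
# contacting the central cloud, a device asks nearby peers whether any of
# them already holds a cached response URL for the query key.
def resolve(query_key, peers, cloud_lookup):
    """Try nearby peers' cache metadata first; fall back to the cloud."""
    for peer in peers:
        url = peer.get(query_key)  # each peer modeled as a dict of cached keys
        if url is not None:
            return url, "peer"
    return cloud_lookup(query_key), "cloud"
```

A peer hit avoids a cloud round-trip entirely, though (per point 3 above) it also makes the staleness problem worse, since the device now trusts a neighbor's cache rather than the central catalog.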

Timestamp: Thu, 1:19 pm

3 comments:

  1. Seems like the discussion points center on minor configuration issues. What about more significant issues -- local processing using the edge, privacy/security, reliability (are results real), etc.

  2. Hi Professor Weissman,
    I have provided an edit to the blog post in the last part with more thoughts upon those issues.


  3. Also, taking a leaf from the previous papers, there could be two-way privacy:
    1) Anonymity of the users.
    2) Third-party control over the video contents, so that whatever breaches public or individual privacy can be omitted.

