Wednesday, 12 April 2017

Auditeur: A Mobile-Cloud Service Platform for Acoustic Event Detection on Smartphones




Many apps are being built around acoustic event detection, i.e. detecting sounds such as speech, music, or a heartbeat. It is inconvenient for each app developer to write an efficient acoustic processing algorithm separately. Auditeur is a platform where developers register for acoustic events, and the app is notified whenever a registered event occurs.
Auditeur provides an API through which applications register for events such as man-made sounds, music, or vehicle sounds. It generates a context-aware and energy-aware classifier for event detection and notifies the application when the event occurs.

Collection of training data:
The training data for the classification problem are collected as soundlets, which are 3-30 s audio clips. Each clip carries contextual information that combines phone-generated context about the clip with user tags. There are two types of tags: content tags, which describe what the sound is, and container tags, which describe the background, e.g. an office. Examples of phone-generated context include the location of the phone and the position of the phone relative to the user's body.
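As a rough illustration of the structure described above, a soundlet could be modelled as follows. This is only a sketch: the class name `Soundlet`, its field names, and the example context keys are assumptions made here, not Auditeur's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Clip length limits taken from the description above (3-30 s).
MIN_SECONDS = 3.0
MAX_SECONDS = 30.0


@dataclass
class Soundlet:
    """Hypothetical container for one audio clip and its annotations."""
    duration_s: float
    content_tags: List[str]     # what the sound is, e.g. "speech"
    container_tags: List[str]   # where it was recorded, e.g. "office"
    # Phone-generated context; the keys used below are illustrative only.
    phone_context: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject clips outside the 3-30 s range.
        if not MIN_SECONDS <= self.duration_s <= MAX_SECONDS:
            raise ValueError("a soundlet must be between 3 and 30 seconds long")


clip = Soundlet(
    duration_s=12.0,
    content_tags=["speech"],
    container_tags=["office"],
    phone_context={"location": "indoor", "body_position": "pocket"},
)
```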

Storage of the training data:
The collected data is logically divided into public and private spaces. Soundlets in the public space can be shared between developers, while each developer has a separate private space. To prevent malicious tags in the public space, sanity checks are performed and tags are restricted to a fixed, predefined set.
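The predefined-tag restriction can be pictured as a simple vocabulary check before a soundlet is admitted to the public space. The sketch below is an assumption about how such a check might look; the tag vocabularies and function names are hypothetical, and the paper's own sanity checks are not reproduced here.

```python
# Hypothetical predefined vocabularies for the public space.
ALLOWED_CONTENT_TAGS = frozenset({"speech", "music", "vehicle"})
ALLOWED_CONTAINER_TAGS = frozenset({"office", "home", "street"})


def unknown_tags(tags, allowed):
    """Return the tags that are not in the predefined vocabulary, sorted."""
    return sorted(set(tags) - set(allowed))


def can_publish(content_tags, container_tags):
    """A soundlet may enter the public space only if every tag is predefined."""
    return (not unknown_tags(content_tags, ALLOWED_CONTENT_TAGS)
            and not unknown_tags(container_tags, ALLOWED_CONTAINER_TAGS))
```

A clip tagged with anything outside the vocabulary would stay out of the public space under this scheme, although it could still be kept in the developer's private space.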

Training model:
In the cloud, a 121-dimensional feature vector is generated from the audio clip. The content tags are used in two different ways: "look for" tags describe the class of soundlets to detect, and "within" tags describe the universe of sounds in which that class occurs. A request to train a model contains the sound the app is looking for, other unwanted sounds that can occur in the environment, contextual information, and energy constraints. These parameters are controlled by the developer of the app.
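One way to read the "look for" / "within" distinction is as a rule for choosing positive and negative training examples. The sketch below assumes that reading: soundlets carrying a "look for" tag become positives, and the remaining soundlets from the "within" universe become negatives. The function name and the sample tags are hypothetical.

```python
def split_training_set(soundlets, look_for, within):
    """Split soundlets into (positives, negatives) by their content tags.

    soundlets: dict mapping a soundlet id to its list of content tags.
    look_for:  tags naming the sound the app wants to detect.
    within:    tags naming the other sounds expected in the environment.
    """
    look_for, within = set(look_for), set(within)
    positives, negatives = [], []
    for soundlet_id, tags in soundlets.items():
        tags = set(tags)
        if tags & look_for:
            positives.append(soundlet_id)
        elif tags & within:
            negatives.append(soundlet_id)
        # Soundlets outside the universe are ignored for this model.
    return positives, negatives


library = {
    "clip1": ["speech"],
    "clip2": ["music"],
    "clip3": ["vehicle"],
    "clip4": ["heartbeat"],
}
positives, negatives = split_training_set(
    library, look_for=["speech"], within=["music", "vehicle"]
)
```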

The process of Auditeur is as follows:
1. Auditeur provides an API that can be used to record, tag, and upload soundlets to the cloud.
2. After a sound is captured, it is tagged by the user and with phone-generated context.
3. The audio clip and tags are uploaded to the cloud.
4. Auditeur generates a model that takes into account the energy constraints of the mobile device and transfers the plan to the phone in XML format, expressed as the components to attach for that particular model.
5. Periodically, if the phone's resources (such as battery level) change, the model is regenerated with a reduced number of features.
6. The sound engine service on the phone detects events by running the model and notifies the application.
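The energy adaptation in step 5 can be illustrated with a small sketch. It assumes the features are already ranked by usefulness and that the feature budget shrinks linearly with battery level; both the ranking and the linear rule are assumptions for illustration, not the policy Auditeur actually uses.

```python
import math


def features_for_battery(ranked_features, battery_pct, min_features=1):
    """Keep the top-ranked features, scaled by the remaining battery.

    ranked_features: feature names ordered from most to least useful.
    battery_pct:     remaining battery as a percentage (0-100).
    min_features:    never drop below this many features.
    """
    if not 0 <= battery_pct <= 100:
        raise ValueError("battery_pct must be between 0 and 100")
    # Hypothetical rule: the feature budget is proportional to battery level.
    budget = math.ceil(len(ranked_features) * battery_pct / 100)
    keep = max(min_features, budget)
    return ranked_features[:keep]


# Illustrative feature names only.
ranked = ["mfcc", "zero_crossing_rate", "spectral_centroid", "rms_energy"]
full_model = features_for_battery(ranked, battery_pct=100)
low_power_model = features_for_battery(ranked, battery_pct=50)
```

Using fewer features lowers the per-frame computation on the phone, at the likely cost of some accuracy.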




Strengths
      1. The paper is clearly written and thoroughly evaluates all the design choices made.
      2. The framework is built to make things easy for application developers.
      3. The framework combines in-phone and cloud processing, but once a model is obtained, detection runs on the phone with no latency from communicating with the cloud.
      4. It is a single framework to which all sound recognition applications can subscribe for event detection. If every application ran its own event detection, total energy consumption would be higher.
      5. Context information is also taken into account while training a model, which improves the accuracy of the model.
      6. The pipeline is adaptable, i.e. it can be changed dynamically based on the energy constraints.
      7. A user-experience study is conducted with both developers and users, and the feedback is incorporated into the system.
      8. The classification pipeline includes both frame-level and window-level classification.
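To make the frame-level versus window-level distinction in point 8 concrete, the sketch below groups per-frame labels into fixed-size windows and takes a majority vote in each window. Majority voting is an assumption chosen for illustration; this summary does not state which window-level classifier Auditeur uses, and the function names are hypothetical.

```python
from collections import Counter


def classify_window(frame_labels):
    """Return the most common frame-level label in one window."""
    if not frame_labels:
        raise ValueError("a window needs at least one frame label")
    return Counter(frame_labels).most_common(1)[0][0]


def classify_stream(frame_labels, window_size):
    """Split frame-level labels into windows and label each window."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        classify_window(frame_labels[start:start + window_size])
        for start in range(0, len(frame_labels), window_size)
    ]


frames = ["speech", "speech", "music", "music", "music", "speech"]
windows = classify_stream(frames, window_size=3)
```

Window-level decisions smooth out isolated frame-level mistakes, which is one reason to run both stages.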

Weaknesses
      1. Users must label the content and container properly, especially for soundlets in the private space.
      2. The paper mentions that the typical users of Auditeur are developers. Private-space soundlets are stored separately for every user. What if a particular sound is required by multiple developers? Will Auditeur provide an API to share private-space data between developers?
      3. When uploading to the cloud, the phone sends the whole audio clip instead of just the feature vector. The feature vector is much smaller than the whole clip, so uploading only the features would save energy and bandwidth on the phone.
      4. The sanity check for public-space sound clips uses an outlier detection method, but this method does not perform well when the amount of training data grows very large.

Discussion
      1. Can edge devices such as Wi-Fi routers be used to process sounds in the public domain? What would be the advantages and challenges of such an approach?
      2. Can this framework be used for smart home applications?
      3. Since the processing is divided into stages, can we improve performance by offloading some of the computation, such as window-level classification, which takes up almost 90% of the time, to a nearby edge device such as a desktop when at home?
      4. How feasible is it to extend the framework to adapt to different languages?



1 comment:

  1. D1 and D3 make sense. Any thoughts on how it could be applied to smart homes?

    Good job overall.
