CSCi 8980 Edge Cloud Research: February 2017

Thursday 23 February 2017

Toward Global Data Infrastructure

Summary:

The paper details a new IoT edge infrastructure prototype built around a data-centric design that matches IoT requirements. The authors discuss in detail how the IoT space differs fundamentally from web services, and why conventional cloud-centric approaches are not enough for IoT applications.

Their system, the GDP (Global Data Plane), handles transport, replication, preservation, and integrity of the data streams from IoT clients. It gives IoT applications a new layer of data-centric abstraction. The core concept of the design is a single-writer, append-only, log-based infrastructure. The paper also describes a Common Access API and a flat address namespace for accessing the logs in the GDP.

System Design:

The GDP operates above the network level and offers Common Access APIs (CAAPIs) to applications rather than raw packet routing. The key mechanism for data storage and communication in the GDP is the secure single-writer log, which gives the infrastructure a narrow-waist model.

The log, an authenticated data structure, is the time-series, append-only data unit associated with each sensor. Logs can be migrated to any locality and support simultaneous readers and replication. Clients (sensors that generate data, actuators that act on the data, and gateway devices ranging from smartphones to Raspberry Pis) connect to the GDP and have read/write access to these logs. A flat 256-bit identifier called a GDP-name is used to address both the logs and the clients.
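To make the single-writer, append-only log idea more concrete, here is a minimal sketch. It is not the GDP implementation: the class name `SingleWriterLog`, the hash chaining scheme, and deriving the 256-bit name by hashing metadata are all assumptions made for illustration, and an HMAC stands in for the public-key signatures a real authenticated log would use.

```python
import hashlib
import hmac


class SingleWriterLog:
    """Toy append-only log; each record is chained to the previous one by hash."""

    def __init__(self, writer_key: bytes, metadata: bytes):
        self._writer_key = writer_key
        # Assumption: the flat 256-bit name is a SHA-256 hash of log metadata.
        self.name = hashlib.sha256(metadata).hexdigest()
        self.records = []  # list of (payload, prev_hash, tag)

    def _head(self) -> bytes:
        """Hash of the latest record, or 32 zero bytes for an empty log."""
        if not self.records:
            return bytes(32)
        payload, prev_hash, tag = self.records[-1]
        return hashlib.sha256(prev_hash + payload + tag).digest()

    def append(self, payload: bytes, writer_key: bytes) -> int:
        """Append a record; only the holder of the writer key may do so."""
        if not hmac.compare_digest(writer_key, self._writer_key):
            raise PermissionError("only the single writer may append")
        prev_hash = self._head()
        # HMAC is a stand-in for a signature over (prev_hash, payload).
        tag = hmac.new(writer_key, prev_hash + payload, hashlib.sha256).digest()
        self.records.append((payload, prev_hash, tag))
        return len(self.records) - 1

    def read(self, index: int) -> bytes:
        """Any number of readers can read existing records."""
        return self.records[index][0]

    def verify(self, writer_key: bytes) -> bool:
        """Recompute the chain and tags to detect tampering."""
        prev = bytes(32)
        for payload, prev_hash, tag in self.records:
            expected = hmac.new(writer_key, prev_hash + payload, hashlib.sha256).digest()
            if prev_hash != prev or not hmac.compare_digest(tag, expected):
                return False
            prev = hashlib.sha256(prev_hash + payload + tag).digest()
        return True
```

Because records are never modified in place and there is only one writer, a replica only has to check the chain to know it holds an honest prefix of the log, which is why the concurrency story stays simple.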

GDP-routers (SDN-based implementations built on the Click modular router) provide location-independent routing over an overlay network that combines a DHT with selective routing. Control plane services enforce the policies required by the GDP; e.g., a control plane replication service could ensure durability of the logs.

Advantages:
1. The GDP design gives more functions to the sensors and actuators; the log-based data structure can support queries over historical data.
2. Subscription-based logging allows real-time data updates from the sensors.
3. Access control is easily implemented at the log level, which avoids dependence on vendor-specific authentication mechanisms.
4. The paper also claims that the GDP follows security best practices and reduces the attack surface.
5. Cleaner design, with policy control (locality and replication decisions, etc.) separated from the application.
6. The solution supports heterogeneous hardware infrastructure.
7. The single-writer, append-only design gives a fault-tolerance model with simple concurrency issues.

Disadvantages:
1. The GDP is still a PoC, not a bulletproof design. The paper acknowledges that the GDP has not been through wide-scale deployment testing.
2. The burden of encryption is left to the applications and clients. Not all clients are cryptography-friendly; it may only be possible on smartphones and Raspberry Pis with reasonable computational power.
3. Simple key-management procedures. The keys are also backed by logs, and the transfer of keys to the remote entity is assumed to be secure over a pre-secured 'tamper-proof' channel.
4. The metadata adds networking load to the packets, making data transfer heavier.
5. The suggested overlay networks can severely affect round-trip latencies and cause a serious performance penalty. (Locality-aware distribution is suggested as an option, but latency will still be an issue.)
6. There is no clear separation of services between the GDP and the control plane.

Room for Discussion:
1. The GDP is still at the idea stage and needs more implementation specifics.
2. Policy-driven storage is a possibility with the GDP. What do you think the options are? How could such a system be implemented for IoT with the GDP?
3. How sound do you find the security aspects of the GDP? Is an ACL at the log level enough? For multiple applications working on the same GDP, could we use a more secure infrastructure such as SGX or container isolation?
4. What is your opinion of a 256-bit address space: is it too much or too little for an IoT environment? (Consider that the namespace also covers the historical log data.)
5. Few applications process the sensor data from each device individually. How does the GDP support aggregation and analytics?

IoT Privacy

The paper discusses the inherent privacy issues involved in the IoT. Specifically, they discuss the issue that arises when a plethora of sensors meant to create the ability to optimize and automate our lives also grants the opportunity for nefarious actions such as unwarranted surveillance.

The authors argue for the existence of a new system, the privacy aspects of which are each briefly described in the second section, as a solution to this issue. Chiefly, they argue that users should be given absolute control over their personal data.

The architecture relies on the existence of cloudlets, which sit near the sensors and act as 'privacy mediators' that anonymize, denature, and otherwise obscure sensitive data so as to adhere to the privacy policies specified by users and applications. These cloudlets will be dynamically deployed in the immediate area of a user, altering data collection methods to ensure that the user's privacy is protected according to their customized specifications.
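As a rough illustration of what a privacy mediator does on a cloudlet, here is a minimal sketch of policy-driven denaturing. The names `PrivacyPolicy` and `mediate`, and the two policy actions (drop a field, coarsen a numeric value), are hypothetical and not taken from the paper; a real mediator would handle richer data such as video frames.

```python
from dataclasses import dataclass, field


@dataclass
class PrivacyPolicy:
    """Hypothetical user policy: which fields to drop and which to coarsen."""
    blocked_fields: set = field(default_factory=set)  # removed entirely
    coarsen: dict = field(default_factory=dict)       # field name -> rounding step


def mediate(reading: dict, policy: PrivacyPolicy) -> dict:
    """Return a denatured copy of a sensor reading before it leaves the cloudlet."""
    out = {}
    for key, value in reading.items():
        if key in policy.blocked_fields:
            continue  # never forward this field upstream
        if key in policy.coarsen:
            step = policy.coarsen[key]
            value = round(value / step) * step  # reduce precision to the given step
        out[key] = value
    return out
```

For example, a policy with `blocked_fields={"user_id"}` and `coarsen={"temp_c": 5}` would strip the identity from a thermostat reading and report the temperature only to the nearest 5 degrees.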

This is more of a 'vision' paper and is therefore sparse on evaluation.

A link to the original presentation is given here: https://www.youtube.com/watch?v=v2Oobl01XsQ

Pros

The authors give many real-world motivating examples that demonstrate the clear advantages of such a privacy-based system.
The authors demonstrate the viability of even a laptop as a local cloudlet, increasing the viability of their design.
They detail very well the privacy solutions proposed by others in their related works section.

Cons

It's a vision paper, so it's somewhat sparse on implementation specifics and almost completely devoid of evaluation, although the authors do mention that such an implementation exists.
One could argue that this design is perhaps overly optimistic in its user-centered design. Private industry would certainly favor something less controllable by the user. Some sort of negotiation between user expectations and private industry needs would likely create some new form of End User License Agreement (EULA), which users would likely have to opt into.
The authors describe the need for the system to be simple and user friendly. They then go on to list a plethora of scenarios in which users are expected to think in terms of things like video frames from cameras and other granularities the likes of which a typical user has likely never encountered.
They don't mention how a user 'instantiates' a new mediator when they enter a new environment, or where the code for these new instantiations comes from.

Discussion Questions

Imagine a company wishes to outfit all of its office spaces with sensory devices. What rights would users be expected to give up simply upon entering such a building? Would one need to sign a EULA to enter the building? How would this change if the building was a public/government building?
How would you propose to solve the issue of dynamic deployment that is left open by the authors? This is mentioned above in the con section.
What mechanisms or abstractions can you think of that would make it easy for a user to specify a level of privacy without getting obsessed with the technical minutiae of data collection?

Tuesday 21 February 2017

Smart LaBLEs: Proximity, Autoconfiguration, and a Constant Supply of Gatorade

The paper aims to improve both the user experience and business practices by using IoT-enabled devices in retail spaces, where the movement and reshelving of products can require major reconfiguration of product signage and labels. The paper presents a detailed study of BLE channel characteristics, followed by the design and implementation of Smart LaBLEs, which act as decentralized IoT hubs.

Key Points

- The paper points out that it is not necessary to determine the physical coordinates of each object in the environment; it is only necessary to order the objects by nearness to the scanning device.

- The paper's analysis of the BLE channel defines the design space of proximity-based systems:

1. The distance between the scanning device and the nearest BLE tag must be less than twice the distance between that tag and its adjacent tags.

2. Long-term averages of RSSI values do not produce accurate estimates of the nearest tag due to channel fluctuations.

3. Instantaneous RSSI values collected from tags within a small window (< 1 s) are sufficient to pick the nearest tag, assuming the antennas of all tags are oriented correctly. Averaging over short windows can smooth out fluctuations due to tag orientation.

4. Same type of tags with new batteries can produce different signal strengths. Tag orientation also affects the nearness ordering.

5. Change in transmit power does not change the results.

- Use of IoT hubs to adopt the decentralized architecture and potentially allowing bandwidth and energy conservation in IoT systems.

- For the Smart LaBLE system, it is assumed that the relevant product information could be transmitted in the initial advertising message sent in passive scanning mode.

- In passive scanning mode, the smallest and largest packets take 80 µs and 328 µs to transmit, respectively.

- Each Smart LaBLE is attached to a central computer for data collection to generate the results presented in this section/shelf. The MAC address of each tag helps identify the particular item within a product type.
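The channel findings above (long-term RSSI averages mislead, short windows under 1 s work) suggest a simple nearest-tag rule. The following is a minimal sketch under that reading, not the paper's algorithm; the function name `nearest_tag`, the sample tuple layout, and the choice of a per-tag mean within the window are assumptions.

```python
from collections import defaultdict


def nearest_tag(samples, now, window_s=1.0):
    """Pick the tag with the strongest mean RSSI within a short recent window.

    samples: iterable of (timestamp_s, tag_id, rssi_dbm) advertising receptions.
    Returns the tag id, or None if nothing was heard in the window.
    """
    recent = defaultdict(list)
    for ts, tag, rssi in samples:
        if now - window_s <= ts <= now:  # ignore anything older than the window
            recent[tag].append(rssi)
    if not recent:
        return None
    # A less negative dBm value means a stronger signal, hence max().
    return max(recent, key=lambda tag: sum(recent[tag]) / len(recent[tag]))
```

Note that this only orders tags by nearness; it never tries to convert RSSI into a distance, which matches the observation that physical coordinates are not needed.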

Strengths

- Verification with the two most commonly used topologies, circular and linear. Other parameters varied: the distance between the tags and the scanning device, the advertising period, and the transmit power of each tag.

- Evaluation of the Smart LaBLEs shows a false detection rate of approximately 1%

- Conservation of energy and bandwidth due to reduction in the frequency of advertising messages sent by tags.

- Automatic configuration of associated displays by Smart LaBLEs thereby removing the hassle of manual updates to the signage.

- The system can keep track of the number of tagged products left on the shelf.

Weakness and Discussions

- The time to hear advertising messages from all products is over 3 s with 12 products. Will it work in a real-life retail environment, where the number of products will be considerably larger?

- Dynamic transmit power control can be leveraged to improve the efficiency of the system.

- In case of advertising message loss, dynamically changing the advertising time period might improve the system more.

- There is no mention of tracking of products in case of failure of BLE tags.

Monday 20 February 2017

FocusStack: Orchestrating Edge Clouds Using Location-Based Focus of Attention

Previously, we have seen several edge-cloud or edge computing architectures. Some (outsourcing) architectures focus on offloading computation-intensive tasks to edge devices; MAUI and CloneCloud are two examples. Other (IoT) architectures focus on offering a platform that interconnects different devices. Although these designs and implementations solve many problems, they do not pay attention to two properties: the physical position of edge devices and their mobility. They all assume that edge devices are fixed at one known position.

The biggest contribution of this paper is that it defines a scope for edge devices. This gives us the flexibility to design and implement many new applications based on the physical position of those devices. The biggest distinction between FocusStack and other edge-cloud architectures is that FocusStack includes a component that provides the physical position of each edge device, and the paper spends a great deal of space on how to find that position. In the following, I will explain several important components of FocusStack.

FocusStack Architecture:

In the above figure, the green rectangles represent the location-based situational awareness (LSA) subsystem. This subsystem maintains a mapping table of devices and their physical positions (we can think of it as a routing table in the TCP/IP layer). The edge devices (cars and drones) use GPS signals to find their own positions and update the mapping table in Geocast Georoute through GCLib. SAMonitor is a proxy through which applications talk to the LSA subsystem.

The yellow rectangles represent the OpenStack extension for this architecture. This subsystem interconnects users and the chosen edge devices. For example, we can call the FocusStack API to launch an application in a container on an edge device. Conductor is a component that implements many protocols to help a user choose one or several edge devices within a scope. OpenStack Nova is used to manage containers.

AT&T Labs Geocast System – share device’s own physical position

Figure 2 AT&T Labs Geocast System

This figure shows the details of the LSA subsystem. There are two protocols we can use to define the scope for edge devices (in this paper, a scope is a circle on the Earth's surface, described by a center and a radius). First, edge devices and the VM can form an ad hoc network, within which the physical positions of edge devices are shared. Second, we can use a UDP/IP network to send messages to devices far away from us. Because the positions of devices are maintained in GRDB, we can fetch the position of any device from GRDB. It is also easy to check whether an edge device is in the desired scope.
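Since a scope is just a center and a radius on the Earth's surface, the membership check can be sketched with the haversine great-circle distance. This is only an illustration: the function names and the `positions` dictionary (standing in for a position lookup such as GRDB) are assumptions, not the system's API.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def devices_in_scope(positions, center, radius_m):
    """Return the sorted ids of devices whose last known position lies in the circle.

    positions: {device_id: (lat, lon)}, a hypothetical stand-in for a position table.
    center: (lat, lon) of the scope; radius_m: scope radius in meters.
    """
    return sorted(
        dev
        for dev, (lat, lon) in positions.items()
        if haversine_m(lat, lon, center[0], center[1]) <= radius_m
    )
```

A device that moves simply gets a new entry in the position table, so the same query picks up or drops it without any change to the scope definition.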

GCLib Framework

Figure 3. GCLib framework software architecture

The GCHub component sends and receives geographic addressing messages to and from the ALGS GA (AT&T Labs Geocast System Geographic Addressing) network. The Pub component implements a publish/subscribe system for data blocks. “It provides the plumbing for data to flow among components. Each component registers interest in data block tags and receive a copy when a component publishes a block update with a matching tag.” The Responder answers queries from users; of course, it only answers what it can answer. The components are deployed as multiple Docker containers.
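The tag-based plumbing described for the Pub component can be sketched in a few lines. This is a minimal in-process illustration, not GCLib code; the class name `Pub` is borrowed from the text, but the `register` and `publish` method names and the deep-copy delivery are assumptions.

```python
import copy
from collections import defaultdict


class Pub:
    """Toy publish/subscribe hub keyed by data block tag."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # tag -> list of callbacks

    def register(self, tag, callback):
        """A component registers interest in blocks carrying this tag."""
        self._subscribers[tag].append(callback)

    def publish(self, tag, block):
        """Deliver a copy of the block to every component registered for the tag."""
        callbacks = self._subscribers.get(tag, [])
        for callback in callbacks:
            callback(copy.deepcopy(block))  # each subscriber gets its own copy
        return len(callbacks)
```

A component interested in position updates would call `register("position", handler)` and would never see blocks published under other tags.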

Strengths:

1. FocusStack architecture gives us a way to implement location-based application.

2. Because we only manage a subset of all edge devices, the workload on the control plane is minimized.

3. The system is modularized and each subsystem can be replaced by alternatives easily.

4. It also maintains features which are implemented in traditional edge-cloud system, such as workload offloading.

5. The response to a query is sent to the nodes around the node that sent the query. This may lower the traffic load because many nearby nodes may be interested in the same response.

Weakness & discussion:

1. In this system, there are three parties: the edge device provider, the application developer, and the edge device owner. Deploying an application on other people's devices may cause security problems. Is there a feasible way to fix this?

2. Because application users and device owners can both control devices, we need a method to define who controls a device at any given time.

3. This paper only focuses on one area. If we want to monitor 10 areas and there are 100 edge devices, how should we allocate these devices optimally?

4. This system seems to need a lot of energy because edge devices use GPS to locate themselves. But do you think an ad hoc network is a good alternative in all situations?

FocusStack: Orchestrating Edge Clouds Using Location-Based Focus of Attention

Summary

This paper asks the question - can existing cloud orchestration tools be used to manage a distributed collection of edge devices? The paper begins by discussing why the existing Cloud Management tools are not sufficient for IoT → the presence of a very large number of edge devices, the involvement of the owners of these devices in their management and the fact that these devices may be moving (cars, drones etc).

FocusStack

Goal → Add a layer on top of OpenStack to dynamically select a suitable subset of the available edge devices for a given application based on location such that these fewer devices can be better managed.
The paper introduces the concept of ‘Location Based Situational Awareness’ - a particular edge device is in the ‘focus of attention’ only if it satisfies certain requirements such as physical location, capabilities (example the sensors it has), its health and user authorization.

It is useful for a class of applications that need data from a known geographic area.
If a device does not satisfy these requirements, it is not managed. This reduces the load on the controller (here OpenStack).
This also saves edge resources and makes the system more scalable.
Most importantly - It helps when devices could be mobile since the tracking is based on location (and not IP addresses etc)
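The 'focus of attention' test described above (location, capabilities, health, and authorization) can be sketched as a simple filter. This is only an illustration of the idea, not FocusStack code: the `Device` fields, the flat x/y coordinates in meters, and the function names are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical view of an edge device as seen by the awareness layer."""
    name: str
    x_m: float              # position on a local flat grid, in meters
    y_m: float
    sensors: frozenset      # capabilities, e.g. {"camera", "gps"}
    healthy: bool
    owner_authorized: bool  # the owner has opted in to this application


def in_focus(device, center, radius_m, required_sensors):
    """True only if the device meets every focus-of-attention requirement."""
    distance = math.hypot(device.x_m - center[0], device.y_m - center[1])
    return (
        distance <= radius_m
        and set(required_sensors) <= device.sensors
        and device.healthy
        and device.owner_authorized
    )


def select_focus(devices, center, radius_m, required_sensors):
    """Names of the devices the controller actually has to manage."""
    return [d.name for d in devices if in_focus(d, center, radius_m, required_sensors)]
```

Everything that fails the filter is simply never handed to the controller, which is where the reduction in management load comes from.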

Design and Architecture

FocusStack contains two main components LSA and OSE which work as follows -

An Application server uses the FocusStack API to initiate a location-based application. It specifies the geographic coordinates of the area of interest.
Location-Based Situational Awareness (LSA - implemented using ALGS) takes as input these coordinates and locates edge devices within this area. It coordinates all further communication to and from these edge devices using the ‘FCOP’, a scalable distributed algorithm. The results of LSA are passed on to OSE.
OpenStack Extension (OSE) is the chosen Management framework and is responsible for deploying and managing applications onto the edge devices.
The Edge devices themselves need several components in this architecture which are described in a lot of detail in the paper. The applications are deployed as docker containers on an edge device and each of these has access to all OpenStack Services.

Strengths

The paper gives us abundant use-cases, making it easy to follow and interesting.
Using OpenStack, a popular cloud management tool, makes widespread use more likely because its security, deployment, and scheduling techniques are already accepted and established. Additionally, the whole suite of OpenStack services is available to each Docker application, improving its compute capabilities and creating a hybrid cloud.
OpenStack cannot directly manage edge devices and LSA elegantly solves this problem by reducing the number of compute nodes dynamically.
LSA is a layer on top of unmodified OpenStack, which should mean that LSA can be integrated with other management tools (maybe even those that are not open source?).
FocusStack looks at a new type of edge devices - cars, drones etc utilizing the data they collect while addressing the fact that these edge devices move and hence require different tracking and communication -

As the paper mentions, this problem is solved by the use of Geo-Addressing (GA) which is effective even in the absence of IP addressing.
The design of FocusStack is also modified to address this issue in other ways - for example, responses are sent back to a circular area around the querier and not a single point (or IP) in case the querier moved.

A developer creating a location-based application can focus on the application logic and does not need to worry about locating the target edge devices.
A good design decision is to separate the response UI from the actual edge device. For example, a tablet can display DashCam results, which allows using the app even when you are not near your car.

Weaknesses

App Type - FocusStack targets a small class of location-based applications and cannot be applied to the whole range of edge devices and applications.
Network and Communication-

Doesn’t communication between multiple devices across whole areas lead to network congestion despite the optimisations discussed? FocusStack relies heavily on devices communicating with the central controller.
Moreover, moving edge devices would depend on cellular LTE connections, which are unreliable and expensive, especially for apps like live video streaming.
The DashCam architecture design makes a point of first uploading the video feed to the app server and from there to the requesting device. This seems redundant in the case of a single requesting device. Maybe this decision should also be dynamic, based on the available network, the number of requests for the same data, etc. (provided security can be maintained).

Security and Privacy -

The paper talks about security twice - it requires authenticated devices (no details) and secondly if a device is detected to be unsafe or rooted, the connection is killed. In my opinion security needs to be discussed in more detail because a single malicious edge device may be able to hack its way into the system easily.
This leads to the privacy issue which is already a concern today. I believe the applications discussed in this paper pose some threat here if not monitored. Example - live video feeds being shared with unknown users could be misused.

These are related to the concerns mentioned in section 7 of the paper.

Latency - Latency could be a concern for some apps like the DashCam being used to monitor queues and no numbers are provided to demonstrate how much delay exists in real time streaming. These delays become important because edge devices are being detected, selected and application dockers are being deployed over Cellular LTE.

Discussion Points

The paper addresses a very specific set of applications: those targeted at a particular geographical area. These apps would definitely be useful to businesses for studying customer and other patterns, but do you think there is a valid use-case for you as an individual (like the DashCam app)?
I believe that since LSA is a layer on top of OpenStack, it can be used with some other cloud management frameworks too. Any ideas?
LSA builds a layer to cut down the number of compute nodes (edge devices) based on the target location and device capabilities. What other criteria can be used to select a suitable subset of devices for other apps? Is filtering devices only based on capability and/or randomness enough for data collection?
The authors claim that FocusStack is closest to ParaDrop. How do they compare in terms of goals, target edge devices, security, etc.?

Thursday 16 February 2017

Optimizing Elastic IoT Application Deployments

This paper is quite perfect.
It takes a layered, modular approach to building its framework for the dynamic generation of optimized deployment topologies for IoT cloud applications, based on a declarative, constraint-based model.
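To give a feel for what generating an optimized deployment topology from constraints means, here is a minimal brute-force sketch. It is not the paper's optimizer: the function name `cheapest_deployment`, the single CPU capacity constraint, and the linear cost model are assumptions chosen to keep the example small.

```python
from itertools import product


def cheapest_deployment(units, nodes):
    """Assign each deployment unit to a node at minimum cost within capacity.

    units: {unit_name: cpu_demand}
    nodes: {node_name: (cpu_capacity, cost_per_cpu)}
    Returns (assignment dict, total cost), or (None, inf) if nothing is feasible.
    """
    unit_names = list(units)
    best, best_cost = None, float("inf")
    # Enumerate every possible placement; fine for a handful of units only.
    for choice in product(nodes, repeat=len(unit_names)):
        load = {name: 0 for name in nodes}
        for unit, node in zip(unit_names, choice):
            load[node] += units[unit]
        # Constraint: no node may be loaded beyond its capacity.
        if any(load[name] > nodes[name][0] for name in nodes):
            continue
        cost = sum(load[name] * nodes[name][1] for name in nodes)
        if cost < best_cost:
            best, best_cost = dict(zip(unit_names, choice)), cost
    return best, best_cost
```

With a cheap but small edge node and a larger, pricier cloud node, the search places whatever fits on the edge and pushes the rest to the cloud, which is the cost tradeoff the strengths below refer to.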

Motivations & Strengths:
1. It suits the pay-as-you-go pricing model perfectly.
2. The cost-benefit tradeoff of using edge infrastructure adds up to huge savings for organizations.
3. Modular design choices. The whole design is multi-layered, so much so that pieces of it, such as Diane, the TU and DU, and Leonore, can be used by organizations separately.
4. The paper is extremely detailed about implementation specifics, which makes it easy to modify and customize to our requirements.
5. Flexible approach. The paper makes way for a dynamic edge environment, resource pooling, pool requests, etc.
6. Optimizations on top: like the icing on the cake, there are run-time adaptive optimizations [Elastic Application Deployment] and TU/DU optimization units that use run-time information to make decisions based on multiple metrics.
7. Other strong optimizations to the user API, and server extensions with white-box and black-box modes.
8. It handles fault tolerance well, since it follows an "N+1" approach for assigning Leonores.
9. I could not identify any security concerns in this paper either. All information must be passed over secure channels, which the organization is left to ensure.
10. They even have an optimization registry.
11. Keeping Options in optimizations.

Discussion:
1) How can we make anything better than this? Is there an improvement possible for this paper?
2) Is it worth the effort from an organization's point of view? How much extra will this infrastructure cost an organization?

Weaknesses:
1. Despite the modular approach, the modules seem to be tightly coupled. They give the impression of modularity, but separating them would be complex in practice.
2. IoT Application Execution - The evaluation only covers execution time and bandwidth consumption. A cost function would have been in order, since the motivation is cost.
3. Edge devices often have power constraints, and this paper does not take them into consideration, or at least does not discuss them.