HIP 35: Safe RF Meta-Data Side Channel

Author(s): @wavesoft, @dmamalis
Start Date: 2021-07-28
Category: Technical
Original HIP PR: helium#249
Tracking Issue: helium#250
Status: In Discussion

Safe extraction of RF Meta-Data

Summary

Large-scale IoT network operators usually collect analytics from their gateways in order to monitor the status of their network and diagnose faults as soon as they happen.

This is achieved by collecting RF meta-data from every packet received or transmitted and feeding it into an analysis stack for further processing.

Since this task typically involves tapping into the packet stream, a Helium hotspot owner might be reluctant, for security reasons, to allow a third-party tool to tamper with the stream.

Therefore, we are proposing a mechanism that would enable the extraction of packet meta-data without disclosing critical information that could be used for malicious purposes.

Motivation

Helium wants to deliver a fair, robust and secure solution to all of its stakeholders. To achieve this, it requires full control over the components involved, in order to minimize the chances of someone abusing the network.

This means that on an official Helium gateway it is normally impossible to collect RF meta-data, since doing so requires tampering with the most critical component: the packet forwarder. We are therefore proposing this HIP as a means of enabling RF meta-data collection without tampering with the internal components.

Stakeholders

Professional network operators with big Helium network deployments
Hotspot owners that want to diagnose reception issues
Local Helium communities that want to improve their coverage and overall network quality
Hotspot manufacturers that want to provide diagnostic tools to their customers

Detailed Explanation

What are RF Meta-Data

Every time a packet is received, the packet forwarder includes useful information regarding its quality. The complete list of fields returned for every packet sent and received is available on Semtech's packet_forwarder website.

In short, every time a packet is received (or scheduled for transmission) the following information is transferred:

RF Quality (RSSI/SNR)
RF Channel and Frequency
RF Modulation and Encoding information
GPS Timestamp (When the gateway is equipped with a GPS receiver)
The packet data payload

In addition, the gateway periodically pushes summary statistics that include:

Number of packets sent/received
Number of packets rejected for transmission or received with a bad CRC
Percentage of upstream datagrams that were acknowledged

All of this information is needed by the LoRaWAN core to function correctly, but it can also prove helpful when diagnosing an issue in your network. For example, if you track the average RSSI of received packets over time and see it degrade, you might have an issue with your antenna.
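The RSSI example above can be sketched as a simple trend check. This is a minimal illustration and not part of the proposal. It assumes the consumer feeds in RSSI values (in dBm) from one gateway's uplink meta-data, ideally from a stable set of devices. The class name, the window sizes and the 6 dB threshold are hypothetical choices. The check compares the mean of the most recent packets against the mean of the packets received just before them.

```python
from collections import deque
from statistics import mean


class RssiTrend:
    """Hypothetical helper: flag a sustained drop in received signal strength.

    The last `recent_len` RSSI samples are compared with the `baseline_len`
    samples received before them; a drop of at least `drop_db` is reported.
    """

    def __init__(self, recent_len=50, baseline_len=500, drop_db=6.0):
        self.recent = deque(maxlen=recent_len)
        self.baseline = deque(maxlen=baseline_len)
        self.drop_db = drop_db

    def add(self, rssi_dbm: float) -> bool:
        """Record one packet's RSSI (dBm); return True if degradation is suspected."""
        if len(self.recent) == self.recent.maxlen:
            # The oldest "recent" sample is about to be evicted: age it into the baseline.
            self.baseline.append(self.recent[0])
        self.recent.append(rssi_dbm)
        return self.degraded()

    def degraded(self) -> bool:
        # Only judge once both windows are full, to avoid noisy early alarms.
        if len(self.recent) < self.recent.maxlen or len(self.baseline) < self.baseline.maxlen:
            return False
        return mean(self.baseline) - mean(self.recent) >= self.drop_db
```

As an example, consider `RssiTrend(recent_len=3, baseline_len=3)` fed six packets at -80 dBm and then packets at -90 dBm. It raises the flag on the second -90 dBm packet, once the recent mean has fallen more than 6 dB below the baseline.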

Payload Considerations

As you can see, most of the information above is just meta-data, and exposing it to a third-party component has negligible security side-effects. However, the data payload might contain sensitive information that must not be shared.

For example, consider a case where the payload holds a PoC message: if this message is shared with a third party, it could be maliciously used to simulate more witnesses than actually exist.

Therefore, we should consider the data payload unsafe and replace it with a safe representation that still carries the valuable meta-data. For example:

Payload length in bytes
Payload checksum (e.g. ADLER32)
The LoRaWAN MAC header bytes

The justification for each is as follows:

The payload length is enough to identify cases where the wrong spreading factor is used.
The payload checksum is used to de-duplicate the same packet when it is received by multiple gateways within a short period of time. It does not need to be cryptographically secure (e.g. SHA sums), so simpler checksums with a smaller impact on processing time can be used. ADLER32 is suggested as a good trade-off between speed, memory usage and randomness of the result.
The LoRaWAN MAC header holds useful information for diagnosing LoRaWAN issues and must be included intact in the meta-data. This header is found in the first 8 bytes of the payload (for data frames: the 1-byte MHDR followed by the DevAddr, FCtrl and FCnt fields) and does not hold any application-level information (e.g. the contents of the PoC message).
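The safe representation, and the de-duplication it enables, could look roughly like the Python sketch below. Only the three fields (length, ADLER32 checksum, first 8 bytes) come from this proposal. The names `SafePayload`, `sanitize` and `Deduplicator`, and the time-window approach to de-duplication, are illustrative assumptions. `zlib.adler32` is Python's standard ADLER32 implementation.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SafePayload:
    size: int        # payload length in bytes
    checksum: int    # ADLER32 over the whole payload
    header: bytes    # first 8 bytes (LoRaWAN MAC header region)


def sanitize(payload: bytes) -> SafePayload:
    """Replace an unsafe payload with its safe representation (hypothetical helper)."""
    return SafePayload(len(payload), zlib.adler32(payload), payload[:8])


class Deduplicator:
    """Hypothetical consumer-side helper: flag packets already seen within window_ms."""

    def __init__(self, window_ms: int):
        self.window_ms = window_ms
        self._seen = {}  # (size, checksum) -> time first seen, in ms

    def is_duplicate(self, p: SafePayload, now_ms: int) -> bool:
        # Forget keys older than the window so the table stays small.
        self._seen = {k: t for k, t in self._seen.items() if now_ms - t <= self.window_ms}
        key = (p.size, p.checksum)
        if key in self._seen:
            return True
        self._seen[key] = now_ms
        return False
```

ADLER32 is not collision-resistant. Matching on both size and checksum within a short window, rather than on the checksum alone, keeps accidental matches rare.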

Analytics Side-Channel Proposal

We are proposing the introduction of an Analytics Side-Channel that can be used by the stakeholders to consume analytics of the incoming messages in a secure and reliable manner.

 +------------------+  Semtech UDP  +----------------+
 | Packet Forwarder | ------------> | Hotspot Client | ----> Helium Network
 +------------------+               +----------------+
                                            | Analytics Side-Channel
                                            v
                                   . . . . . . . . . . .
                                   . Analytics Client  .
                                   . . . . . . . . . . .

The proposed solution should:

Add minimal overhead to the client
Be easy to integrate into the client codebase
Allow existing solutions to be easily adapted to the new interface

Considering that:

The Analytics Side-Channel should be implemented using a connection-less protocol such as UDP, since reliable delivery is not required and back-pressure from the consumer must not affect the producer.
Processing overhead must be kept to a minimum; therefore an ADLER32 checksum is recommended for the payload instead of computationally intensive cryptographic hashes.
The serialization overhead of each message should also be kept to a minimum, in which case Google Protobuf can be used. However, since interoperability with existing solutions might be a concern, JSON is a valid trade-off.
Since this is an opt-in feature, it should be enabled via an external flag (e.g. an environment variable).
To further reduce the processing demand, the analytics data are NOT processed inside the hotspot client. Instead, they are relayed to a third-party Analytics Client. This could be a simple proxy or a more elaborate statistics aggregator; its implementation details are outside the scope of this HIP.
Since the analytics meta-data are stripped of any risky information, an analytics datagram could be sent outside the gateway even without encryption. This allows the Analytics Client to be either a local OR a remote process.
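Taken together, these constraints suggest an emitter along the lines of the Python sketch below. It assumes the following:

- The opt-in flag is the `HELIUM_ANALYTICS_CLIENT` environment variable, the name used in Example 1 of this HIP.
- The flag's value has the form `host:port`.
- Messages are JSON.
- Only IPv4 is supported.

The class name is hypothetical. The socket is non-blocking, and send errors are swallowed so that a slow or missing consumer can never back-pressure the hotspot client.

```python
import json
import os
import socket


class AnalyticsEmitter:
    """Opt-in, fire-and-forget UDP emitter for analytics messages (sketch)."""

    def __init__(self, env=None):
        env = os.environ if env is None else env
        target = env.get("HELIUM_ANALYTICS_CLIENT")  # e.g. "my.cloud.service:12345"
        self.sock = None
        self.addr = None
        if not target:
            return  # feature disabled: no socket, no per-packet work
        host, _, port = target.rpartition(":")
        # Resolve once at start-up so sending never waits on DNS (IPv4 only here).
        self.addr = socket.getaddrinfo(host, int(port), socket.AF_INET, socket.SOCK_DGRAM)[0][4]
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.setblocking(False)  # a slow or absent consumer must not stall us

    @property
    def enabled(self) -> bool:
        return self.sock is not None

    def emit(self, message: dict) -> None:
        if not self.enabled:
            return
        datagram = json.dumps(message, separators=(",", ":")).encode("utf-8")
        try:
            self.sock.sendto(datagram, self.addr)
        except OSError:
            pass  # drop silently: analytics delivery is best-effort by design
```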

Side-Channel Protocol Description

The analytics side-channel emits 3 different kinds of messages. Each message is sent as a single UDP datagram, encoded with the agreed serialization format (JSON or Protobuf), and always has exactly one recipient.

The fields present in these messages are very similar to the fields defined in the Semtech UDP Forwarder PROTOCOL.TXT, but they are adapted for faster consumption and for the security concerns explained above.

Note that the compact field names can be used when encoding the analytics data as JSON, in order to keep the overall message size to a minimum and therefore fit in a single datagram.
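The Python sketch below shows the effect of compact naming. It renames verbose keys using a subset of the uplink table's mapping, serializes without whitespace, and rejects anything too large for one datagram. The 1472-byte budget is an assumption, not part of this HIP: a 1500-byte Ethernet MTU minus the 20-byte IPv4 and 8-byte UDP headers, chosen to avoid IP fragmentation. The function name is hypothetical.

```python
import json

# Verbose -> compact names, taken from the uplink table (subset).
COMPACT_NAMES = {
    "timeGps": "tmms",
    "timeGpsUs": "gpsu",
    "timeFinished": "tmst",
    "frequency": "freq",
    "ifChannel": "chan",
    "rfChain": "rfch",
    "crcStatus": "stat",
    "modulation": "modu",
    "fskDataRate": "datr",
    "loraSf": "drls",
    "loraBandwidth": "drlb",
    "loraCodingRate": "codr",
    "dataChecksum": "csum",
}

# Assumed budget: 1500-byte MTU minus 20-byte IPv4 and 8-byte UDP headers.
MAX_DATAGRAM = 1472


def encode_compact(verbose: dict) -> bytes:
    """Rename known verbose keys to compact ones and serialize as tight JSON."""
    compact = {COMPACT_NAMES.get(key, key): value for key, value in verbose.items()}
    datagram = json.dumps(compact, separators=(",", ":")).encode("utf-8")
    if len(datagram) > MAX_DATAGRAM:
        raise ValueError(f"analytics datagram too large: {len(datagram)} bytes")
    return datagram
```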

1. Uplink Messages

An uplink message is sent every time a packet is received from the packet forwarder. It contains the following fields:

#	Compact Name	Verbose Name	Type	Description
1	`tmms`	`timeGps`	`int64`	The UNIX timestamp (in milliseconds) when the message arrived in the concentrator.
2	`gpsu`	`timeGpsUs`	`uint16`	The microseconds fraction of the UNIX timestamp above, as a number between 0 and 999.
3	`tmst`	`timeFinished`	`int64`	The UNIX timestamp (in milliseconds) of the local system when the message was received.
4	`freq`	`frequency`	`float`	RX central frequency in MHz (Hz precision).
5	`chan`	`ifChannel`	`uint8`	Concentrator "IF" channel used for RX.
6	`rfch`	`rfChain`	`uint8`	Concentrator "RF chain" used for RX.
7	`stat`	`crcStatus`	`enum`	CRC status: "OK", "Fail" or "NoCRC".
8	`modu`	`modulation`	`enum`	Modulation identifier ("LORA" or "FSK").
9	`datr`	`fskDataRate`	`uint32`	FSK datarate in bits per second. Used only when modulation is "FSK".
10	`drls`	`loraSf`	`enum`	Spreading factor component of LoRa DataRate (e.g. "SF12"). Used only when modulation is "LORA".
11	`drlb`	`loraBandwidth`	`enum`	Bandwidth component of LoRa DataRate (e.g. "BW500"). Used only when modulation is "LORA".
12	`codr`	`loraCodingRate`	`enum`	LoRa ECC coding rate identifier.
13	`rssi`	`rssi`	`float`	RSSI in dBm.
14	`lsnr`	`snr`	`float`	LoRa SNR ratio in dB (signed float, 0.1 dB precision).
15	`size`	`size`	`uint8`	RF packet payload size in bytes.
16	`data`	`data`	`byte[]`	The first 8 bytes of the data payload (holding the LoRaWAN MAC header).
17	`csum`	`dataChecksum`	`uint32`	The ADLER32 checksum of the entire RF packet payload.
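A hotspot client that already receives Semtech `rxpk` objects might build an uplink message as in the Python sketch below. The following points are assumptions, not specified by this HIP:

- The SNR is carried under Semtech's own name `lsnr`.
- The 8 header bytes are Base64-encoded in JSON, as Semtech does for `data`.
- `tmms`/`gpsu` are derived from Semtech's ISO 8601 `time` field when it is present.
- The caller supplies the local receive time in milliseconds.

Semtech's numeric `stat` (1, -1, 0) maps to "OK", "Fail" and "NoCRC". The function name is hypothetical.

```python
import base64
import re
import zlib
from datetime import datetime, timezone

_LORA_DATR = re.compile(r"^(SF\d+)(BW\d+)$")          # e.g. "SF7BW125"
_CRC_STATUS = {1: "OK", -1: "Fail", 0: "NoCRC"}        # Semtech numeric stat
_UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uplink_from_rxpk(rxpk: dict, received_ms: int) -> dict:
    """Build a compact-named uplink analytics message from one Semtech rxpk object."""
    payload = base64.b64decode(rxpk["data"])
    message = {
        "tmst": received_ms,               # local wall-clock time, supplied by caller
        "freq": rxpk["freq"],
        "chan": rxpk["chan"],
        "rfch": rxpk["rfch"],
        "stat": _CRC_STATUS[rxpk["stat"]],
        "modu": rxpk["modu"],
        "rssi": rxpk["rssi"],
        "size": len(payload),
        # Only the first 8 bytes leave the gateway; the rest is reduced to a checksum.
        "data": base64.b64encode(payload[:8]).decode("ascii"),
        "csum": zlib.adler32(payload),
    }
    if "time" in rxpk:
        # Semtech "time" is ISO 8601 UTC with microsecond precision.
        rx_time = datetime.strptime(rxpk["time"], "%Y-%m-%dT%H:%M:%S.%fZ")
        elapsed = rx_time.replace(tzinfo=timezone.utc) - _UNIX_EPOCH
        total_us = (elapsed.days * 86_400 + elapsed.seconds) * 1_000_000 + elapsed.microseconds
        message["tmms"] = total_us // 1000       # UNIX milliseconds
        message["gpsu"] = total_us % 1000        # remaining microseconds, 0-999
    if rxpk["modu"] == "LORA":
        match = _LORA_DATR.match(rxpk["datr"])
        if match is None:
            raise ValueError(f"unexpected LoRa datarate: {rxpk['datr']!r}")
        message["drls"], message["drlb"] = match.groups()
        message["codr"] = rxpk["codr"]
        message["lsnr"] = rxpk["lsnr"]           # assumed compact name for the SNR
    else:
        message["datr"] = rxpk["datr"]           # FSK: bits per second
    return message
```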

2. Downlink Messages

A downlink message is sent every time the system pushes a downlink message to the packet forwarder. It contains the following fields:

#	Compact Name	Verbose Name	Type	Description
1	`tmms`	`timeGps`	`int64`	The UNIX timestamp (in milliseconds) when the message should be sent (when set to '0' means "immediately").
2	`tmst`	`timeWall`	`int64`	The UNIX timestamp (in milliseconds) of the local system when the message should be sent (when set to '0' means "immediately").
3	`freq`	`frequency`	`float`	TX central frequency in MHz (Hz precision).
4	`rfch`	`rfChain`	`uint8`	Concentrator "RF chain" used for TX.
5	`powe`	`txPower`	`float`	TX output power in dBm.
6	`ncrc`	`noCRC`	`bool`	If true, disable the CRC of the physical layer.
7	`modu`	`modulation`	`enum`	Modulation identifier ("LORA" or "FSK").
8	`datr`	`fskDataRate`	`uint32`	FSK datarate in bits per second. Used only when modulation is "FSK".
9	`fdev`	`fskFreqDev`	`uint16`	FSK frequency deviation in Hz.
10	`drls`	`loraSf`	`enum`	Spreading factor component of LoRa DataRate (e.g. "SF12"). Used only when modulation is "LORA".
11	`drlb`	`loraBandwidth`	`enum`	Bandwidth component of LoRa DataRate (e.g. "BW500"). Used only when modulation is "LORA".
12	`codr`	`loraCodingRate`	`enum`	LoRa ECC coding rate identifier.
13	`ipol`	`inversePolarity`	`bool`	LoRa modulation polarization inversion.
14	`prea`	`preamble`	`uint16`	RF preamble size.
15	`size`	`size`	`uint8`	RF packet payload size in bytes.
16	`data`	`data`	`byte[]`	The first 8 bytes of the data payload (holding the LoRaWAN MAC header).
17	`csum`	`dataChecksum`	`uint32`	The ADLER32 checksum of the entire RF packet payload.
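Both the uplink and downlink tables mix fields that apply only to LoRa with fields that apply only to FSK. A producer could keep datagrams small by emitting only the fields that match the packet's modulation, as in the Python sketch below. Treating `codr` and `ipol` as LoRa-only and `fdev` as FSK-only is an assumption drawn from their descriptions. The function name is hypothetical.

```python
# Field groups by modulation, using the compact names from the tables above.
LORA_ONLY = frozenset({"drls", "drlb", "codr", "ipol"})
FSK_ONLY = frozenset({"datr", "fdev"})


def prune_for_modulation(message: dict) -> dict:
    """Return a copy of `message` without fields that do not apply to its modulation."""
    modulation = message.get("modu")
    if modulation == "LORA":
        irrelevant = FSK_ONLY
    elif modulation == "FSK":
        irrelevant = LORA_ONLY
    else:
        raise ValueError(f"unknown modulation: {modulation!r}")
    # Also drop unset (None) values so they are not serialized as JSON null.
    return {k: v for k, v in message.items() if k not in irrelevant and v is not None}
```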

3. Statistics Messages

A statistics message is sent every time the corresponding stat message is received from the packet forwarder. This message is blindly forwarded without further processing and has the following fields:

(Note that the statistics counters reset to zero every time a stat message is sent.)

#	Compact Name	Verbose Name	Type	Description
1	`addr`	`hotspotAddress`	`byte[]`	The address of the hotspot.
2	`time`	`timeWall`	`int64`	The UNIX timestamp (in milliseconds) of the local system.
3	`lati`	`gpsLatitude`	`float`	GPS latitude of the gateway in degrees (float, N is +).
4	`long`	`gpsLongitude`	`float`	GPS longitude of the gateway in degrees (float, E is +).
5	`alti`	`gpsAltitude`	`float`	GPS altitude of the gateway in meters.
6	`rxnb`	`packetsRx`	`uint32`	Number of radio packets received.
7	`rxok`	`packetsRxOk`	`uint32`	Number of radio packets received with a valid PHY CRC.
8	`rxfw`	`packetsRxFw`	`uint32`	Number of radio packets forwarded.
9	`ackr`	`ackRatio`	`float`	Percentage of upstream datagrams that were acknowledged.
10	`dwnb`	`packetsTxReq`	`uint32`	Number of downlink datagrams received.
11	`txnb`	`packetsTx`	`uint32`	Number of packets emitted.
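Because the counters reset with every stat message, a consumer that wants long-term figures has to sum them itself. The Python sketch below does this. It assumes compact names, and it assumes `addr` arrives as a hashable value (for example a Base64 string in JSON). It leaves `ackr` out because a per-interval percentage cannot be summed. The class and method names are hypothetical.

```python
from collections import defaultdict

# Counter fields that restart from zero with every statistics message.
COUNTERS = ("rxnb", "rxok", "rxfw", "dwnb", "txnb")


class StatsAggregator:
    """Accumulate per-hotspot totals across statistics messages (sketch)."""

    def __init__(self):
        self.totals = defaultdict(lambda: dict.fromkeys(COUNTERS, 0))

    def add(self, stat: dict) -> None:
        totals = self.totals[stat["addr"]]
        for name in COUNTERS:
            totals[name] += stat.get(name, 0)
        # "ackr" is a per-interval percentage, so it is not summed here.

    def crc_ok_ratio(self, addr) -> float:
        """Share of received packets with a valid PHY CRC since aggregation began."""
        totals = self.totals.get(addr)
        if not totals or totals["rxnb"] == 0:
            return 0.0
        return totals["rxok"] / totals["rxnb"]
```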

Use Case Examples

1. Example 1

Alice has a DIY hotspot built using a packet forwarder and a light client. She is having reception issues with her devices and wants to debug them.

She has built her own log-processing stack that runs in the cloud, and she wants to feed the packet analytics into it.

She then adjusts the environment variables for gateway-rs and sets HELIUM_ANALYTICS_CLIENT="my.cloud.service:12345".
Once the client restarts, she starts seeing data.
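On Alice's side, the receiving end at `my.cloud.service:12345` could start as a few lines of Python. This HIP does not define a field that tells the three message kinds apart. The sketch below therefore guesses the kind from fields that appear in only one table: `rxnb` for statistics, `rssi` for uplinks and `powe` for downlinks, or their verbose equivalents. That heuristic, the JSON encoding and the function names are assumptions.

```python
import json
import socket


def classify(message: dict) -> str:
    """Guess the message kind from fields that appear in only one table."""
    if "rxnb" in message or "packetsRx" in message:
        return "stats"
    if "rssi" in message:
        return "uplink"
    if "powe" in message or "txPower" in message:
        return "downlink"
    return "unknown"


def serve(port: int = 12345, handle=print) -> None:
    """Receive analytics datagrams forever and pass (kind, sender, message) to `handle`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        datagram, sender = sock.recvfrom(65535)
        try:
            message = json.loads(datagram)
        except ValueError:
            continue  # not JSON (e.g. Protobuf or garbage): skip it
        handle(classify(message), sender, message)
```

Calling `serve()` blocks and, by default, prints each message together with its guessed kind. A real stack would pass its own `handle` function to store or aggregate the data instead.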

2. Example 2

Bob, a professional LoRaWAN network operator, has deployed 1,000 Helium hotspots in an area and he wants to make sure his services are reliable. He is using production miners bought from one of the official suppliers.

He already uses a centralized analytics aggregation system on his deployment that consumes data in Semtech UDP format.

We assume that once this HIP has landed, the gateway manufacturer will add a new option to their UI called Helium Analytics Client.

Bob simply goes to the UI and configures the Helium Analytics Client to point to the cloud infrastructure he is already using.
He will receive 80% of the interesting data from day 0 and will only have to make minor adjustments to the protocol in order to accommodate the new fields.
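Bob's "minor adjustments" could take the form of a small adapter. It turns an uplink analytics message back into a Semtech-style `rxpk` object that his existing pipeline already understands. This Python sketch is lossy and rests on assumptions:

- Messages use compact names.
- The SNR is carried as `lsnr`.
- There is no payload `data`, since only its first 8 bytes are ever sent.

The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

_SEMTECH_STAT = {"OK": 1, "Fail": -1, "NoCRC": 0}
_UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def rxpk_from_uplink(message: dict) -> dict:
    """Rebuild a (lossy) Semtech-style rxpk object from an uplink analytics message."""
    rxpk = {k: message[k] for k in ("freq", "chan", "rfch", "modu", "rssi", "size") if k in message}
    rxpk["stat"] = _SEMTECH_STAT[message["stat"]]
    if message["modu"] == "LORA":
        rxpk["datr"] = message["drls"] + message["drlb"]  # "SF7" + "BW125" -> "SF7BW125"
        for key in ("codr", "lsnr"):
            if key in message:
                rxpk[key] = message[key]
    else:
        rxpk["datr"] = message["datr"]
    if "tmms" in message:
        rx_time = _UNIX_EPOCH + timedelta(milliseconds=message["tmms"], microseconds=message.get("gpsu", 0))
        rxpk["time"] = rx_time.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    # No "data": the payload never leaves the gateway. Semtech's "tmst" (a
    # concentrator counter) has no equivalent in the analytics message either.
    return rxpk
```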

Drawbacks

We are effectively duplicating the stream of incoming data. However, we cannot simply forward it unprocessed, because that would risk exposing critical information.

Rationale and Alternatives

The most obvious and straightforward way of solving this issue is by introducing a man-in-the-middle UDP forwarder between the packet forwarder and the Hotspot Client (Light or Full) as seen below:

 +------------------+  Semtech UDP   +-----------------+  Semtech UDP   +----------------+
 | Packet Forwarder | -------------> | Analytics Proxy | -------------> | Hotspot Client | --> Helium Network
 +------------------+                +-----------------+                +----------------+
                                              |
                                              v
                                     RF Meta-Data Stream

This solution is trivial to integrate and requires no further modification to the Helium core components. However, it requires the Analytics Proxy to be a trusted component that does not disclose sensitive information to third parties.
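The trust problem becomes concrete once you consider what such a proxy sees: every upstream PUSH_DATA datagram, in full. Per Semtech's PROTOCOL.TXT, a PUSH_DATA datagram has the following layout:

- Byte 0: protocol version.
- Bytes 1-2: a random token.
- Byte 3: the PUSH_DATA identifier `0x00`.
- Bytes 4-11: the gateway's 8-byte identifier.
- From byte 12: a JSON object with `rxpk` and/or `stat`.

The Python sketch below shows only the tapping step. Relaying datagrams in both directions and the PUSH_ACK handshake are omitted, and the function name is hypothetical.

```python
import json

PUSH_DATA = 0x00  # Semtech identifier for upstream PUSH_DATA datagrams


def tap_push_data(datagram: bytes):
    """Return (gateway_id_hex, rxpk_list, stat_or_None) for a PUSH_DATA datagram, else None."""
    if len(datagram) < 12 or datagram[3] != PUSH_DATA:
        return None  # PULL_DATA, TX_ACK, truncated data, ...: not tapped here
    gateway_id = datagram[4:12].hex()
    body = json.loads(datagram[12:])
    # Each rxpk still carries the full Base64 payload in "data": this is the
    # sensitive part that a trusted proxy would have to withhold.
    return gateway_id, body.get("rxpk", []), body.get("stat")
```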

Unresolved Questions

We need to decide whether to go with JSON (creating a backwards-compatible interface, similar to the Semtech UDP packet itself) or with Protobuf (breaking any existing solution).
Some discussion might be needed to further clean up the fields in the analytics protocol; in particular, whether there is a smarter way to encode the fields that differ between LORA and FSK modulation.

Deployment Impact

We do not expect any considerable impact on deployment once this solution is applied. Both the processing power and the overall size footprint should remain relatively unchanged.

Moreover, this is an opt-in feature, so it won't affect the user experience by default.

Success Metrics

Any stakeholder reporting a successful usage of this system to diagnose a problem they are having.
