The vision of the Internet of Things typically involves billions of digital devices, from smartphones to sensors in homes, cars and machines of all kinds, communicating with each other to automate tasks and make life better. 

However, automation requires decision-making, and decision-making requires accurate data. “Big Data” may help make more informed decisions, but the “garbage-in-garbage-out” axiom still applies. If you don’t have reliable data (or worse, if the data is being deliberately manipulated, either directly by human operators or indirectly by malware), then decisions may be not only wrong but harmful to human life.

In a previous blog post (It’s the liability, stupid) we highlighted why having an independent audit trail for machines is a good thing. In this post we will explain the how.

Machine Logging 

All machines keep logs of their activity. The background process that manages logging on a machine is called a syslog daemon. The de facto standard for machines today is rsyslog (“reliable syslog” or, more entertainingly, “Rainer’s syslog”).

When Logging Meets Correlation
Rainer Gerhards (author of rsyslog) and Risto Vaarandi (author of Simple Event Correlator)

A syslog file can be thought of as an append-only database, typically with one line per event.

Events include access events (user logged on), transaction events (party A sent X USD to party B), sensor events (air pollution is X at time T) and actuation events (send signal to open front door). All of this data is subject to potential manipulation.

KSI (Keyless Signature Infrastructure) is simple to understand, as it uses only hash functions, hash trees and hash chains.

A hash tree is a binary tree of hash values. Two input hash values are concatenated and a hash function is applied to the result, generating a third hash value. This process is repeated until there is a single ‘root’ hash value. Each input (or leaf) to the tree has an associated hash chain consisting of the hash values and the concatenation order (left or right) needed to regenerate the root. For example, in a tree with eight leaves x1 to x8, if the owner of x3 has {x4, x12, x58}, then the root hash value can be re-created, which proves that x3 participated in the original computation that led to it.
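The hash chain idea can be sketched in a few lines of Python. This is only an illustration, not KSI's actual implementation: SHA-256, the function names and the eight placeholder leaves are all assumptions made for the example.

```python
import hashlib

def h(left: bytes, right: bytes) -> bytes:
    """Hash the concatenation of two child values (SHA-256 is an assumption)."""
    return hashlib.sha256(left + right).digest()

def build_tree(leaves):
    """Return every level of the tree, leaves first and the root level last.
    Assumes the number of leaves is a power of two."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def hash_chain(levels, index):
    """Collect (sibling hash, sibling side) pairs from the leaf up to the root."""
    chain = []
    for level in levels[:-1]:
        sibling = index ^ 1  # the neighbour that shares the same parent
        side = "left" if sibling < index else "right"
        chain.append((level[sibling], side))
        index //= 2
    return chain

def recompute_root(leaf, chain):
    """Re-create the root from one leaf and its hash chain."""
    value = leaf
    for sibling, side in chain:
        value = h(sibling, value) if side == "left" else h(value, sibling)
    return value

# Eight placeholder leaves x1..x8 (hypothetical data).
leaves = [hashlib.sha256(f"x{i}".encode()).digest() for i in range(1, 9)]
levels = build_tree(leaves)
root = levels[-1][0]

# The chain for x3 (index 2) is x4 on the right, x12 on the left, x58 on the right.
chain_x3 = hash_chain(levels, 2)
print(recompute_root(leaves[2], chain_x3) == root)  # True
```

The owner of x3 needs only three sibling hashes to re-create the root, and learns nothing about the other leaves beyond those hash values.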

In KSI the hash tree is built on a global scale, with parts of the tree operating in different countries. A new tree (with new leaves) is built every second, and each leaf receives the hash chain that allows it to recreate the public root hash value. If a leaf node can recreate the root, then the time, integrity and authenticity of the original data can be proven.

KSI was originally designed to sign files, not syslog entries (and signing each entry individually would be prohibitively expensive in both time and storage), so over the last six months we have been working with Rainer Gerhards on integrating KSI into the rsyslog daemon.

One challenge is that syslog entries may not contain much information, so an attacker who sees only the hash of an entry could guess its contents by brute force. In the above diagram we show how each syslog entry (rec) is hashed and a random blinding mask (IV) is used to increase entropy. With appropriate data structures, one random value can be used to produce blinding masks that are provably as secure as independently generated masks. We can then generate a hash tree within the syslog daemon and only need to send a single hash value from the daemon to the next layer of the infrastructure per unit time interval.
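As a rough sketch of how one random value might yield a distinct mask for every entry, the Python below chains each mask to the previous leaf. This is one plausible construction chosen for illustration, not necessarily the one used in rsyslog; SHA-256, the helper names, the all-zero starting value and the sample log lines are all assumptions.

```python
import hashlib
import os

def H(*parts: bytes) -> bytes:
    """SHA-256 over the concatenation of all parts (hash choice is an assumption)."""
    return hashlib.sha256(b"".join(parts)).digest()

def blinded_leaves(records, iv):
    """Derive one blinded leaf per record from a single random value `iv`.
    Assumed scheme: mask_i = H(previous_leaf || iv), leaf_i = H(mask_i || H(rec_i))."""
    prev_leaf = b"\x00" * 32  # hypothetical stand-in for "no previous leaf"
    leaves = []
    for rec in records:
        mask = H(prev_leaf, iv)
        leaf = H(mask, H(rec))
        leaves.append(leaf)
        prev_leaf = leaf
    return leaves

def merkle_root(leaves):
    """Fold a non-empty list of leaves into one root hash.
    An unpaired node at the end of a level is carried up unchanged."""
    level = list(leaves)
    while len(level) > 1:
        nxt = [H(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

# Hypothetical log lines collected during one time interval.
records = [
    b"Oct 11 22:14:15 host1 sshd: user alice logged on",
    b"Oct 11 22:14:16 host1 door: open front door",
    b"Oct 11 22:14:16 host1 door: open front door",
]
iv = os.urandom(32)  # the single random value for this block of records
leaves = blinded_leaves(records, iv)

# Only this one value would leave the daemon for the interval.
print(merkle_root(leaves).hex())
```

Because each mask depends on both the secret random value and the previous leaf, the two identical "open front door" lines produce different leaves, and someone who sees a leaf cannot confirm a guessed log line without knowing the mask.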

With this implementation the following can be achieved:

  • The integrity of the syslog file as a whole can be proven (i.e. that no entries have been deleted and that the entries are in the correct order)
  • The time, integrity and authenticity of each individual entry can be proven. It is possible to generate a compact proof per event while keeping all other events confidential.
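The first property can be illustrated with a small Python sketch: if the root over a block of log lines has been signed, then deleting or reordering lines changes the recomputed root. The function names and sample lines are hypothetical, SHA-256 is an assumption, and blinding masks are omitted to keep the example short.

```python
import hashlib

def leaf_hash(line: str) -> bytes:
    """Hash one log line (SHA-256 is an assumption)."""
    return hashlib.sha256(line.encode("utf-8")).digest()

def log_root(lines):
    """Root hash over a non-empty list of log lines, in file order.
    An unpaired node at the end of a level is carried up unchanged."""
    level = [leaf_hash(line) for line in lines]
    while len(level) > 1:
        nxt = [hashlib.sha256(level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

# Hypothetical log block and the root that was signed for it.
original = [
    "22:14:15 user alice logged on",
    "22:14:16 party A sent 100 USD to party B",
    "22:14:17 air pollution is 42 at sensor 7",
    "22:14:18 send signal to open front door",
]
signed_root = log_root(original)

deleted = original[:1] + original[2:]                   # one entry removed
reordered = [original[1], original[0]] + original[2:]   # two entries swapped

print(log_root(original) == signed_root)   # True
print(log_root(deleted) == signed_root)    # False
print(log_root(reordered) == signed_root)  # False
```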

As each machine sends only one hash value per unit time interval to the next layer of the infrastructure, it becomes easy to scale to billions of machines, providing independent mathematical proof for every event on every machine. The key message: independent means that the proof does not rely on the operators of the machines or on any human being. All you need is the data, the hash chain and the root hash value, which is available in a public ledger known as the "blockchain".

Unfortunately KSI can’t quite meet Oscar Wilde’s request that machines be kept in slavery. It can however ensure that they (and their human operators) don’t lie about their data.