Before we get further into details, some context

Today, it is hard to know with any certainty the true state of mission- or business-critical infrastructure - we simply lack any kind of reliable situational awareness. If you need a proof point, think about zero-day malware, and consider that the ‘dwell time’ for breaches (the median time between infiltration and detection) currently stands at over 200 days (see the M-Trends 2015 report [8]). In some cases breaches go undetected for years. Given that the real damage is often done very shortly after a breach, malicious actors seemingly have plenty of time to steal secrets and change anything they like; traditional security controls are simply not working.

Risk is very largely predicated on our knowledge of system state. During a risk assessment we often make assumptions about system state that subsequently prove to be ungrounded, or at best optimistic. Part of this dilemma is that the practice of assessing state (in terms of network assets, for example) is itself flawed; there is certainly no accepted way to unambiguously prove asset state (to an insurer, for example), and risk scores are ultimately based on best practices and the quality of the data put into the risk model. And this is key: we are talking about a model. Whilst the risk scores may appear very precise, they are not a reflection of the certainty of asset state.

One of the challenges in networking is that it is complex, and (aside from the luxury of a greenfield site) administrators and security personnel often have to manage a large array of legacy equipment, APIs, log formats, policy deployments and monitoring tools. Given the scale of modern networks and the heterogeneous nature of infrastructure, it’s not surprising that a holistic security policy and visibility of its efficacy are hard to achieve - at least with the current state of the art. And whilst emerging technologies, such as SDN, offer the promise of unified security and management policies, today we must manage infrastructure appliances using the various interfaces and APIs provided, and that typically means protocols and tools such as SSH, SSL, HTTP/S, CLI, XML Web Services, SNMP, Netconf, ‘expect’ scripts, and secure REST APIs - if we are lucky.

In cybersecurity, simplicity is generally a good thing. Simplicity leads to transparency and consistency. By contrast, complexity often leads to opaque configuration state, inconsistency, hidden security vulnerabilities, and hidden risks. Mission- or business-critical infrastructure is often inherently complex, and increasingly dynamic, given the trends in virtualization. With that said, once an appliance is configured (either virtual or in hardware) much of that state is immutable, or at least should remain so for long periods once in production. We don’t want core routers, load balancers, and switches reconfigured without change control. We don’t want software installed, firmware changed, interfaces reconfigured, or peer routes changed - unless we specifically authorize it. And if and when this does happen, we want to know straight away, not in a year’s time.

And why do we want to know as soon as the breach happens? Simply because infrastructure devices are often deployed in-line, with complete visibility of critical message flows and crypto material, so the consequences of that data being intercepted, redirected, changed or blocked can be very significant. Even a careless upgrade or configuration change can result in performance variation, which could be hugely damaging to a business. Imagine a FIX application based on low latency market trading data, sensitive medical records, confidential intellectual property, or even core Internet peering protocols. “By ‘owning’ an infrastructure device, the attacker gains a privileged position, and may be able to access sensitive data flows or crypto materials, or even perform additional attacks against the rest of the infrastructure” [1]. It’s clear there is strong motivation here for an organization or nation state to ‘own’ core infrastructure devices, although in practice it may not be that easy.

There are several consequences of critical assets being compromised, with the potential to have a serious long-term effect on organizations sitting behind the device - ranging from loss of reputation through customer data leakage and loss of competitive advantage through intellectual property leakage, through to the loss of state secrets. Compromise may also be used to eavesdrop, steal secrets, cripple systems and services, or even mount distributed denial of service attacks on other systems.

Now clearly there are ways to monitor systems for state changes; after all, we have seen decades of progress in network management tools and logging. The problem is there is no consistent method today, no one set of interfaces, no standard set of protocols, no single logging system, and no agreed way to scale such services. Also consider that hackers like to cover their tracks, and in the SYNful Knock case described below it appears there were several steps used to evade detection. Given the significant dwell time between breach and detection [8], it is highly likely that this type of compromise will be well covered up before the incident team even gets to look at the problem. This lack of situational awareness means that we have no real hope at present of assessing risk over time with any degree of confidence.

Network administration is a messy, repetitive business. We could and should be using HTTPS, SSL and SSH (underpinned by PKI, for example) to ensure access is secured. But we hate changing passwords, certificates get revoked, operators get fatigued or complacent, and we forget to refresh certificates. Even if we are diligent, there is no guarantee that the PKI certificate we are pinning all our trust on is legitimate - and whilst that risk is low, do we really have the appetite to accept that uncertainty with critical infrastructure?

So it’s not surprising to see recent press reports of Cisco routing appliances being compromised by unauthorized firmware changes (see the Reuters report on the ‘SYNful Knock’ compromise [1] [4]). Alarmingly, these compromises took well over a year to discover. According to the reports this compromise affected multiple organizations and government agencies: by September 15th Mandiant [2] had found fourteen instances of Cisco router breaches across four countries (India, Mexico, Philippines and Ukraine). By September 21st Network World reported that over 200 routers had been affected in 31 countries [5], with models 1841, 2811 and 3825 affected.

It is important to note that this attack vector is atypical in that it is not a specific product vulnerability [9], and is probably best described as a systemic process failure. As such, any network appliance from any network manufacturer is potentially at risk. The attack started with valid credentials being used to implant a modified IOS image, with some of the changes then masked; how the hackers gained access to these credentials remains unclear. Cisco recently published a Snort rule to detect the threat [10], together with best practice on hardening the appliance and additional instrumentation [9]. With that said - there is no single remedy; these ‘mitigating’ actions require human intervention, diligently mapping changes across some extremely large networks and potentially many appliance types.

Until recently routers were generally deemed resistant to takeover (although vulnerable to DDoS attacks via incoming traffic). The vulnerability reported appears not to be related to any specific security issue in the appliances themselves; here it seems security credentials were stolen, and then used by malicious actors to access the devices and change firmware images. In practice, credentials may be acquired through a number of methods: social engineering, bribery, or simply carelessness and sloppy procedures. It’s not uncommon, for example, for SSH credentials to remain static for very long periods (for administration convenience); it’s quite conceivable therefore for a contract employee to reuse credentials long after leaving a company, and those same credentials might be valid for a number of critical systems.

Clearly this is very bad news, and many network managers out there will no doubt be reviewing their processes. But the key point here is that detection came long after the breach event, giving hackers free rein for months without any monitoring, and with ample time to cover their tracks. Here we have a relatively new attack vector, one where detection has so far eluded monitoring systems, where the consequences are potentially alarming, and where detection still lacks a coherent solution.

So how can Guardtime help?

Guardtime’s KSI Blockchain operates as a complement to existing orchestration tools. With KSI you can instrument the network consistently, at scale, to sign any digital asset, and store the ‘memory’ of the asset state in an immutable, industrial-strength blockchain, together with the time of the event and the signing entity. KSI is specifically designed to scale for large infrastructure, capable of ingesting millions of signatures concurrently, moving requests up through a hierarchy of aggregation nodes, resulting in very little traffic or latency.
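To give a feel for how a hierarchy of aggregation nodes can keep traffic low, the following is a minimal sketch of generic hash-tree (Merkle tree) aggregation in Python. It is an illustration only, not Guardtime's implementation or API: the function names, the asset snapshot strings, and the rule for handling an odd node are all assumptions made for this example. The point it shows is that many signing requests can be folded into a single fixed-size root value, so each level only passes one hash upwards.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def aggregate(leaves: List[bytes]) -> bytes:
    """Fold leaf hashes pairwise, level by level, into one root hash.

    Assumption for this sketch: when a level has an odd number of nodes,
    the last node is promoted unchanged to the next level.
    """
    if not leaves:
        raise ValueError("need at least one leaf hash")
    level = list(leaves)
    while len(level) > 1:
        # Hash each adjacent pair into a parent node.
        parents = [
            sha256(level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            parents.append(level[-1])  # promote the unpaired node
        level = parents
    return level[0]


# Hypothetical signing requests: one hash per asset snapshot.
requests = [sha256(f"router-{n}:config-v1".encode()) for n in range(1000)]
root = aggregate(requests)
print(root.hex())
```

However many requests arrive in a round, the value passed up the hierarchy stays 32 bytes, and changing any single asset snapshot changes the root.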

KSI enables us to sign infrastructure assets that are ‘less ephemeral’, such as firmware revisions, OS revisions, configuration file states, interface states, static peer routes and so on. We can use the local site policy to decide when and what we want to sign, and how often we want to verify those signatures - all using quantum-immune digital signatures. This means that:

  1. KSI can be used to make assets immutable. By signing and then frequently verifying the state of an asset, at scale, we can establish whether the network is in an expected state, or potentially compromised. 
  2. KSI can establish a complete software delivery chain, from component source (e.g. an App Store) through to final deployment, with verification possible after every stage as different trust domains are crossed. 
  3. KSI can be used in concert with orchestration tools to automate recovery mechanisms, so for example if a configuration file has been changed (either maliciously or accidentally), a KSI ‘verification failure’ event could be used to trigger orchestration tooling to reinstate the production image. 
  4. KSI can establish a complete chain of custody. By signing logs, and log entries, we can establish if something changed and when. By referring back to the blockchain, post-incident forensic teams have irrefutable, time-stamped evidence of attempted compromise. Through a number of anti-tamper techniques, as well as the publication of widely witnessed evidence, the KSI blockchain cannot be altered after the event.
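The sign, verify and recover pattern in points 1 and 3 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the KSI SDK: a plain in-memory dictionary of SHA-256 fingerprints stands in for signatures anchored in the blockchain, and the `AssetRegistry` class, the `on_failure` callback and the asset name are all hypothetical names invented for this example.

```python
import hashlib
from typing import Callable, Dict


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest used here as the asset's recorded state."""
    return hashlib.sha256(content).hexdigest()


class AssetRegistry:
    """Records a baseline fingerprint per asset and checks later state against it.

    Assumption: the dictionary is a stand-in for an external, tamper-evident
    signature store; in this sketch it offers no protection by itself.
    """

    def __init__(self, on_failure: Callable[[str], None]) -> None:
        self._baseline: Dict[str, str] = {}
        self._on_failure = on_failure  # e.g. a hook into orchestration tooling

    def sign(self, asset_id: str, content: bytes) -> None:
        """Record the approved state of an asset."""
        self._baseline[asset_id] = fingerprint(content)

    def verify(self, asset_id: str, content: bytes) -> bool:
        """Compare current content with the baseline; call the hook on mismatch.

        Raises KeyError if the asset was never signed.
        """
        ok = fingerprint(content) == self._baseline[asset_id]
        if not ok:
            self._on_failure(asset_id)
        return ok


# Usage: the callback only records which assets need reinstating.
to_restore = []
registry = AssetRegistry(on_failure=to_restore.append)
registry.sign("core-router-1/running-config", b"hostname core-router-1\n")

unchanged = registry.verify("core-router-1/running-config", b"hostname core-router-1\n")
changed = registry.verify("core-router-1/running-config", b"hostname rogue\n")
print(unchanged, changed, to_restore)
```

In a real deployment the callback would hand the asset identifier to whatever orchestration tooling is responsible for reinstating the production image, and the verification would run on a schedule set by local site policy.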

In summary, by instrumenting your network with Guardtime KSI you can ensure that any change in asset state is alerted quickly back to operations staff. Even with large and complex heterogeneous systems such as network infrastructure, and even with difficult attack vectors involving systemic process failures, you can have situational awareness today, at scale, with consistency across all network assets. 

References

[1] “Cisco router break-ins bypass cyber defenses”. Reuters report. Sep 16th 2015. See: http://mobile.reuters.com/article/idUSKCN0RF0N420150916
[2] ‘The New Route to Persistence: Compromised Routers In The Wild’. FireEye blog. Sep 15, 2015. See: https://www.fireeye.com/blog/executive-perspective/2015/09/the_new_route_toper.html
[3] http://www.cisco.com/web/about/security/intelligence/integrity-assurance.html
[4] “SYNful Knock - A Cisco router implant - Part I”. FireEye blog. Sep 2015. See: https://www.fireeye.com/blog/threat-research/2015/09/synful_knock_-_acis.html
[5] “Malware implants on Cisco routers revealed to be more widespread”. Network World. Sep 2015. See: http://www.networkworld.com/article/2985036/network-security/malware-implants-on-cisco-routers-revealed-to-be-more-widespread.html
[6] “Attackers can take over Cisco routers; other routers at risk, too”. Network World. Sep 2015. See: http://www.networkworld.com/article/2984124/security/attackers-can-take-over-cisco-routers-other-routers-at-risk-too.html
[7] “SYNful Knock router exploit isn’t going away soon”. Network World. Sep 2015. See: http://www.networkworld.com/article/2984327/security/synful-knock-router-exploit-isn-t-going-away-soon.html
[8] ‘M-Trends 2015: A View from the Front Lines’. Mandiant. See: https://www2.fireeye.com/rs/fireye/images/rpt-m-trends-2015.pdf
[9] ‘SYNful Knock: Detecting and Mitigating Cisco IOS Software Attacks’. Cisco blog. Sep 2015. See: https://blogs.cisco.com/security/synful-knock
[10] Cisco Talos Rules 2015-09-15. Snort rule GID 1, SID 36054 to mitigate SYNful Knock. See: https://snort.org/advisories/talos-rules-2015-09-15