Blockchain Security for Big Data

Big Data can mean different things to different organizations, but a few things are certain: the volume, velocity, and variety of data are increasing at an exponential rate, and organizations must have a Big Data strategy to make better business decisions.

With the myriad of data sources, including structured, semi-structured and unstructured, and with rapidly evolving ETL pipelines, it's now critical that organizations understand where data is originating, how it's moving through their systems, who has access, and how it's being modified.

For distributed organizations, data often moves between on-premise solutions and hosted cloud infrastructure, usually with complicated steps to strip out confidential or sensitive information such as PII (Personally Identifiable Information). Given the lack of veracity and provenance in Big Data and the vulnerabilities that have been proven to exist in large-scale networks, including the cloud, being able to prove the critical aspects of your most important data in a defensible way has become paramount.

Big Data Concerns

  1. Lack of Inherent Asset-level Data Integrity
  2. Data Tampering via Trusted Insiders
  3. Data Storage Fails or Corruption
  4. Malicious Outsiders
  5. Misuse and Data Loss of PII
  6. Lawsuits and Lawful Access
  7. Regulatory Compliance (SOX, HIPAA, FISMA, NERC, and others)

Raw data is being used to meet Regulatory Compliance guidelines, and distilled insights are being used to make critical business decisions, yet most Big Data platforms are at an early stage of maturity and offer inadequate security measures for the data they operate on. Countless laws, regulations, policies, technology failures, threats, and vulnerabilities impact the information security strategy for Big Data environments, and the lack of a Defensible Data capability means that organizations have limited ability to trust that the right data is going in and the right insights are coming out.

While organizations cannot afford to ignore Big Data capabilities, appropriate security measures must be put in place. As part of an overarching Information Security program, Industrial-scale Blockchain plays a significant part in providing assurance of the integrity and authenticity of data, while guaranteeing non-repudiation.

Introducing Industrial Scale Blockchain

Blockchain technology can dramatically improve Big Data security. Never before has there been an immutable ledger that can provide 100% accountability for digital assets, proving critical aspects such as time, integrity, creation entity, and even location in a widely witnessed and confidential manner.

Guardtime’s Industrial-scale Blockchain is built on seven key principles:
  1. Transparency & Visibility - No one should be able to cover their tracks
  2. Accountability - Every action should be attributable to its owner
  3. Privacy - Security should be afforded without giving up confidential information
  4. Scalability - Must be able to scale to trillions of digital assets
  5. Portability - Security must move with the data, wherever the data goes
  6. Permanence - Security must not be ephemeral – it must exist as long as the data exists, and ideally longer
  7. Open - It must not rely on traditional closed trust anchors

It is upon these seven key principles that Guardtime created KSI, an Industrial-scale Blockchain that forms the technology base for our Defensible Data solution for Big Data.

For more information on KSI see our KSI Technology page.

Defensible Data via Blockchain

Guardtime takes the seven key principles of Industrial-scale Blockchain and applies them to Big Data ecosystems in a scalable and continuous fashion to enable the following critical capabilities:

Data Governance and Regulatory Compliance

Regulations regarding archiving and the creation of data bunkers typically inhibit the adoption of Big Data Lakes for all types of data, because Data Lakes do not meet industry regulations and corporate requirements for retention. Typically, in order to be compliant, an enterprise will need to extract the data and move it to long-term archiving storage hardware. This is both grossly inefficient and expensive. Guardtime offers a way to ensure that regulatory compliance can be met and that data is universally governable within Data Lakes.

Legal Hold and Long Term Archive

Big Data systems present a new mechanism for the storage of data for legal hold or lawful access. Due to the regulatory requirements in place, organizations are hesitant to leverage Big Data platforms, but they need not be. By leveraging Guardtime, organizations can create portable, compact proofs that use flexible metadata to ensure that the relationship between data and proofs is traceable, meets regulations, and is tamper-free. This allows expert witnesses to stand on a stronger body of proof, with greater authenticity, reliability, and credibility of evidence.
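One common way to make a proof both compact and portable is a Merkle hash tree inclusion proof: a record can be checked against a single published root hash using only a handful of sibling hashes. The sketch below is illustrative only. It assumes SHA-256, pairs an odd node with itself, and uses hypothetical function names (`merkle_root_and_proof`, `verify_proof`); it is not the KSI API, and the text above does not specify how Guardtime's proofs are constructed.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves, index):
    """Return (root, proof) for leaves[index].

    proof is a list of (sibling_hash, sibling_is_left) pairs, one per tree level.
    Simplification: real designs usually add leaf/node domain separation.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node on odd levels
        sibling = index ^ 1  # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_proof(leaf, proof, root):
    """Recompute the path from the leaf to the root and compare."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

The proof grows with the logarithm of the number of records, so it can travel with the data it describes while only the root needs to be widely published.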


With the different structures of input data, the various transformations it undergoes, and the numerous analytic systems in which it is stored, it is critical that a defensible Chain-of-Custody is created so that questions about origin, order of events, users, or transformations can be answered immediately from an immutable record. Guardtime’s KSI can be applied to digital assets in such a way that the assembled set of signatures can form a chain of custody for use in disputes or litigation.
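The idea of assembling signatures into a chain of custody can be illustrated with a simple hash chain, in which every custody event commits to the one before it. This is a minimal sketch under stated assumptions: the field names, the `GENESIS` constant, and the functions `append_event` and `verify_chain` are hypothetical and are not part of KSI. A local chain like this is only tamper-evident if its latest hash is also anchored somewhere widely witnessed.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumption: placeholder "previous hash" for the first event


def entry_hash(entry: dict) -> str:
    """Hash a custody entry using a canonical (sorted-key) JSON encoding."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_event(chain: list, actor: str, action: str, data: bytes, timestamp: str) -> dict:
    """Append an event that commits to the data and to the previous event."""
    entry = {
        "actor": actor,
        "action": action,
        "timestamp": timestamp,
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "prev_hash": chain[-1]["hash"] if chain else GENESIS,
    }
    entry["hash"] = entry_hash(entry)  # computed before the "hash" key exists
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Check every link and every entry hash, in order."""
    prev = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Because each entry includes the hash of its predecessor, changing the actor, order, or data fingerprint of any earlier event invalidates every later link.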

Forensic Readiness and eDiscovery

Traditional forensic analysis and eDiscovery become increasingly complicated with Big Data. The simple fact is that you can’t image an entire Data Lake. By leveraging Guardtime, the data that resides in the Data Lake can be tagged with immutable signatures that provide cryptographic data fingerprints and can subsequently be assembled for forensic purposes with ease.
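As a rough illustration of fingerprinting data in place rather than imaging it, the sketch below hashes each file under a directory and later reports which objects changed. It assumes a Data Lake that is reachable as a local file tree and uses plain SHA-256 digests from the Python standard library; the function names are hypothetical, and a real deployment would replace the bare digests with signatures from the signing service.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large objects never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its fingerprint."""
    return {
        str(p.relative_to(root)): fingerprint(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(baseline: dict, current: dict) -> dict:
    """Report which objects were modified, removed, or added since the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "modified": sorted(k for k in shared if baseline[k] != current[k]),
        "missing": sorted(baseline.keys() - current.keys()),
        "added": sorted(current.keys() - baseline.keys()),
    }
```

An investigator can then narrow attention to the objects listed in the diff instead of copying the whole store.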

Big Data with Blockchain

Integrating Industrial-scale Blockchain into the design of Big Data platforms enhances continuity of operations, enables chain of custody for critical digital assets, and begins to introduce a forensic capability into Big Data that can enable eDiscovery. With embedded forensic capabilities, the time-to-recovery in the event of a failure or breach can be dramatically reduced, ensuring operations can resume in a timely fashion.

The correctness of the data and the integrity of the process, from collection through ingest, fusion, correlation/analysis, and delivery, are critical to making correct decisions and preventing breaches of data integrity or authenticity. In today’s hostile environments, many threats, as well as accidental or malicious incidents, can manipulate data or processes in ways that lead to costly incorrect conclusions.

As the complex systems that constitute Big Data execute hand-offs from system to system, the data may move from a sensor to structured data in a traditional data warehouse, be bulk imported and processed in a Hadoop cluster, and then be converted back to structured data for a BI analyst to interpret and share the results. All these hand-offs, or moves across junctions, are prime points at which to ensure that integrity and authenticity are maintained, allowing a defensible lineage to be assembled.

Guardtime’s Defensible Data platform allows for integration into Big Data flows and pipelines for signing and verification of data across digital assets. Resultant signatures and verifications can be integrated into Security Operations Centers for appropriate incident response, and subsequently audited and investigated for potentially malicious actions that impact the business.
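The pattern of verifying at each hand-off and signing again after each transformation can be sketched as follows. This is not Guardtime's API: HMAC-SHA256 from the Python standard library is swapped in as a stand-in for a real signing service, and `DEMO_KEY`, `sign`, `verify`, and `hand_off` are hypothetical names chosen for illustration. The returned event dictionary stands in for whatever record a Security Operations Center would consume.

```python
import hashlib
import hmac

DEMO_KEY = b"demo-key-not-for-production"  # assumption: stand-in for a signing service


def sign(payload: bytes, stage: str) -> dict:
    """Bind the payload to the stage that produced it."""
    message = stage.encode("utf-8") + b"\x00" + payload
    tag = hmac.new(DEMO_KEY, message, hashlib.sha256).hexdigest()
    return {"stage": stage, "signature": tag}


def verify(payload: bytes, envelope: dict) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = sign(payload, envelope["stage"])["signature"]
    return hmac.compare_digest(expected, envelope["signature"])


def hand_off(payload: bytes, envelope: dict, next_stage: str, transform):
    """Verify the input, transform it, and re-sign the output.

    Returns (output, new_envelope, event); output and new_envelope are None
    when verification fails, so a tampered input is never processed.
    """
    ok = verify(payload, envelope)
    event = {"from": envelope["stage"], "to": next_stage, "verified": ok}
    if not ok:
        return None, None, event
    output = transform(payload)
    return output, sign(output, next_stage), event
```

Refusing to process input that fails verification keeps a tampered record from propagating downstream, and the sequence of events gives auditors a lineage to follow.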

By leveraging Blockchain technology, a new era of data provenance and veracity can be introduced, helping to fulfill critical aspects of an information security strategy including integrity of data, authenticity of data, and non-repudiation of data, in a lightweight and scalable fashion that can be integrated in nearly any Big Data solution or customized workflow.

How to Get KSI for Hadoop Big Data Lakes

Guardtime's products and solutions can be purchased for your environment following our Design, Build, Operate, and Transfer (DBOT) model.

We're always happy to discuss your specific requirements. Please register your interest.