The birth of real-time data analysis through time-series databases

The world has more data than it knows what to do with. The statistics are incredible: IBM says the world creates 2.5 quintillion bytes (2,500,000,000,000,000,000 bytes) of data every day.

Plus, “about 90 per cent of all the data in the world has been generated in the past two years”.

For manufacturing, it’s a similar story. IIoT sensor technologies are collecting and sending unparalleled amounts of data, but the challenge is turning that data into action. To address it, we should look at what the ‘Googles’ of the internet have done with their data.

In particular, how have they revolutionised databases to better manage and analyse data? 

The answer: real-time data analysis

While process historians were doing their thing in industrial plants and shop floors, time-series databases were being used by the financial sector to track stock volatility or monitor a security’s price over time.

Then Amazon and Google created NoSQL databases to address the sheer scale of internet users and pieces of data that required processing. These databases were designed to cope with millions upon millions of unstructured data points and connect with other modern web-based applications.

Then, instead of closing the technology off, they made their IP publicly available – allowing the open-source community to develop many of the NoSQL products around today.

About 10 years ago, Yahoo implemented the open-source Apache Hadoop framework to improve its search indexing. Other internet companies followed suit – Facebook, Twitter and eBay rolled it out in 2009.

The advancements made by tech giants over the past 10 years have revolutionised database computing, and it hasn't stopped moving. NoSQL databases have kept improving to meet demands for scaled computing. For example, Amazon added DynamoDB to its database service portfolio, while Microsoft added DocumentDB to its Azure suite. These have bolstered database performance, allowing customers instant scalability with minimal hassle. Another industry player is Google’s Cloud Bigtable, targeted at IoT vendors as a time-series database that performs data analysis and anomaly detection among other functions.

In short, these databases scale easily, integrate with multiple data sources and software, and process more data faster than ever. And, thanks to many being open source, they’re also cost-effective and easy to use. A few popular open-source options include InfluxDB, Grafana, and Elasticsearch.
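To make the idea concrete, here is a minimal sketch (plain Python, no particular database assumed) of the kind of workload a time-series database handles: an append-only stream of timestamped sensor readings, aggregated into fixed time windows. The sample data and the `windowed_average` helper are illustrative inventions, not part of any product mentioned above.

```python
from datetime import datetime, timedelta

# Hypothetical IIoT sensor readings: append-only (timestamp, value) pairs,
# one per second, the way a plant-floor sensor might emit them.
start = datetime(2024, 1, 1)
readings = [(start + timedelta(seconds=i), 20.0 + (i % 10)) for i in range(60)]

def windowed_average(points, window):
    """Average the values that fall inside each fixed-size time window."""
    buckets = {}
    for ts, value in points:
        # Truncate each timestamp down to its window boundary.
        bucket = ts - timedelta(seconds=ts.second % window.seconds,
                                microseconds=ts.microsecond)
        buckets.setdefault(bucket, []).append(value)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

# One minute of readings collapsed into four 15-second averages.
averages = windowed_average(readings, timedelta(seconds=15))
for bucket, avg in averages.items():
    print(bucket.isoformat(), round(avg, 2))
```

A dedicated time-series database performs this same bucketing and aggregation at scale, over billions of points, which is exactly what makes it attractive for IIoT sensor data.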

We'll explore the capabilities of these databases in a future blog.

“Data is the new oil”– Clive Humby

For Google, Amazon and major social networking sites, the ability to store, process and control their data analysis is central to business success – or failure. The same can be said of heavy industries and manufacturing.

By harnessing the technologies that the leading data companies of the world use, manufacturers and industrial operators will ensure they manage their data competitively well into the future. Time-series is one place to start. In case you missed it, we explored why process historians and time-series are essential for data analysis in a previous blog.

Clive Humby is the data scientist who, along with his wife Edwina Dunn, changed the way retailers gather and use consumer data. They created the world's first supermarket loyalty card for Tesco. Though Humby said it back in 2006, his observation that "data is the new oil" is more potent than ever.

Michael Palmer, of the Association of National Advertisers, expands on Humby's quote:

"Data is just like crude. It's valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so must data be broken down, analysed for it to have value."

Catch up on database technologies for industry in our other blog posts and case studies:

1. Data analysis: why process historians and time-series are essential

2. Data historians vs time-series: which is better for data analysis?

3. How TasWater reduced data costs with Nukon's time-series configuration sync tool

A consolidated view of operations is key to harnessing and 'refining' data. Our free downloadable cheat sheet on the opportunities and risks of integrating IT and OT manufacturing systems explores how to go about integrating disparate data systems.

Download the cheat sheet today!

Headline image by Edho Pratama on Unsplash.
