“Drinking from the firehose” usually describes a hopeless situation in which you’re inundated with more of something than you can handle. And in the world of big data, IoT, and real-time analytics, it is often literally true – processing and streaming big data is a LOT like drinking from a firehose.

Many technologies endeavor to build bigger and faster databases, enhance network throughput, and optimize input buffers – all with the goal of getting more out of the “data” firehose. At the same time, our increasing dependence on data to make business decisions (which is a good thing) means that we need that firehose to get bigger too.

So, ingesting ridiculous amounts of data at cloud scale is inevitable – but how can you manage it?

Here are three ways to “drink” from the data “firehose.”

1. Be choosy about what goes into the firehose

Design your data ingestion by limiting what you “put in the pipe.” Use filtering and data sharding, or segment your source data into meaningful incremental data sets, as the sketch below illustrates. This requires a “mindful” data engineering approach to make your data streams work. You will find that later on – even when you have the resources, time, and energy to turn on the full blast – this is still good advice.
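To make that concrete, here is a minimal Python sketch of filtering and sharding at the source. The field names (`priority`, `id`) and the shard count are hypothetical – substitute whatever keys and partitioning your data actually uses:

```python
import hashlib
from typing import Iterable, Iterator, Tuple

NUM_SHARDS = 8  # hypothetical: size shards to your downstream capacity

def include(record: dict) -> bool:
    """Filter: only pass records the warehouse actually needs."""
    return record.get("priority", "low") != "low"  # illustrative field name

def shard_of(record: dict) -> int:
    """Shard: route each record to a stable partition by a stable key."""
    key = str(record.get("id", "")).encode()
    return int(hashlib.md5(key).hexdigest(), 16) % NUM_SHARDS

def ingest(source: Iterable[dict]) -> Iterator[Tuple[int, dict]]:
    """Yield (shard, record) pairs for only the records worth streaming."""
    for record in source:
        if include(record):
            yield shard_of(record), record
```

Each shard becomes a smaller, independently manageable stream – one sip at a time instead of the full blast.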

2. Put a spigot on it to manage what comes out

This is also called the “ramp meter” approach. Ramp meters are those traffic stop lights you see when you get on a busy freeway, limiting entry to one car at a time. In a nutshell, the idea is that if you slow down the incoming traffic, the outgoing traffic might catch up, mitigating a jam. A “jam” in point-to-point integrations really translates to HTTP 5xx response codes, which mean lost transactions. This is really bad, especially when it happens without you knowing about it. So, put your data spigots or ramp meters on and slow down the traffic – because your data highway can only take so much.
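One common way to implement such a spigot is a token-bucket rate limiter. The sketch below (in Python, with made-up rate numbers) blocks the sender until the downstream endpoint has room, so it never sees more than the configured records per second:

```python
import time

class RampMeter:
    """Token-bucket rate limiter: admit at most `rate` records per second."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate               # sustained records per second
        self.capacity = float(burst)   # short bursts allowed above the rate
        self.tokens = float(burst)
        self.last = time.monotonic()

    def admit(self) -> None:
        """Block until one record may pass -- the green light on the ramp."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)

# Hypothetical usage: cap outbound posts at 100/sec so the target
# integration never starts returning 5xx errors under load.
# meter = RampMeter(rate=100, burst=20)
# for record in stream:
#     meter.admit()
#     post(record)  # post() and stream are placeholders
```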

3. Install a reservoir to buffer the ingress

This is an interesting idea. Given an infinite buffer between two points, you could send an infinite amount of data through it, and with an infinite amount of time to receive it, the data would all arrive eventually. Practically speaking, we can’t have anything at infinite scale today, but cloud elasticity can get us to ridiculous numbers. So, if you have an idea of how much data you need to stream in a certain amount of time, you can predetermine the size of the reservoir/buffer – and if the volume increases, grow the buffer in real time as well. You may be able to get your ridiculous amount of data through after all.
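In code, a reservoir is simply a buffer between a producer and a consumer. In production this role is usually played by an elastic message broker or cloud queue, but a minimal in-process sketch using Python’s standard library (with assumed sizing numbers) shows the shape of the idea:

```python
import queue
import threading

# Hypothetical sizing: absorb a 60-second burst at 10,000 records/sec.
reservoir = queue.Queue(maxsize=600_000)

def producer(source):
    """Firehose side: pour records in; put() blocks (backpressure)
    when the reservoir is full."""
    for record in source:
        reservoir.put(record)

def consumer(sink):
    """Drain side: pull records out at whatever pace the sink sustains."""
    while True:
        record = reservoir.get()
        sink(record)
        reservoir.task_done()

# threading.Thread(target=producer, args=(stream,), daemon=True).start()
# threading.Thread(target=consumer, args=(write_to_warehouse,), daemon=True).start()
# (stream and write_to_warehouse are placeholders for your source and sink)
```

With a cloud queue, `maxsize` stops being a constant: you monitor queue depth and scale the reservoir out as volume grows.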

Taking in massive volumes of data is essential – and this need is going to grow exponentially. Fortunately, a regulated method of data ingestion makes it manageable. With the Perspectium Integration Mesh, we let you drink from the data firehose by installing a “managed elastic data reservoir with a spigot on it.” That setup lets you focus on data engineering and architecture to fulfill your data warehousing needs. You can find out more about this solution here – and whether it might work for you. You can also download eBooks on this topic … and they’re all free!

Re-published from LinkedIn article Drinking from the Data Firehose, November 21, 2019
