Drinking from the Data Firehose
"Drinking from the firehose" usually describes a hopeless situation where you're inundated with more of something than you can handle. And in the world of big data, IoT, and real-time analytics, it is often true: processing and streaming big data is a LOT like drinking from a firehose.
Many technologies endeavor to build bigger and faster databases, boost network throughput, and optimize input buffers, all with the goal of getting more out of the "data" firehose. In addition, our increasing dependence on data to make business decisions (which is a good thing) means that we need that firehose to get bigger, too.
So, ingesting ridiculous amounts of data at cloud scale is inevitable. But how can you manage it?
Here are three ways to "drink" from the data "firehose."
1. Be choosy about what goes into the firehose
Design your data ingestion by limiting what you put in the pipe. Use filtering, data sharding, or segmentation of your source data into meaningful incremental data sets. This requires a "mindful" data engineering approach to make your data streams work. You will find that later on, even when you have the resources, time, and energy to turn on the full blast, this is still good advice.
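As a rough sketch of the idea (not any particular product's implementation), here is what filtering and sharding at the ingestion point might look like in Python. The event types, the customer_id key, and the shard count are all hypothetical placeholders:

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count; size this to your pipeline


def relevant(record: dict) -> bool:
    """Filter: keep only the records downstream consumers actually need.
    The 'event_type' field and its allowed values are illustrative."""
    return record.get("event_type") in {"order_created", "order_updated"}


def shard_for(record: dict) -> int:
    """Shard: route a record by hashing a stable key, so related records
    land in the same incremental data set."""
    key = str(record.get("customer_id", "")).encode()
    return int(hashlib.md5(key).hexdigest(), 16) % NUM_SHARDS


def ingest(source):
    """Yield (shard, record) pairs, dropping anything we chose not to put in the pipe."""
    for record in source:
        if relevant(record):
            yield shard_for(record), record
```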
2. Put a spigot on it to manage what comes out
This is also called the "ramp meters" approach. Ramp meters are the traffic stop lights you see when getting on a busy freeway, limiting entry to one car at a time. In a nutshell, the idea is that if you slow down the incoming traffic, the outgoing traffic can catch up, mitigating a jam. A "jam" in point-to-point integrations really translates to HTTP 5xx response codes, which means lost transactions. This is really bad, especially when it happens without you knowing about it. So, put your data spigots or ramp meters on, and slow down the traffic, because your data highway can only take so much.
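A common way to build such a spigot is a token bucket. Here is a minimal, illustrative sketch; the RampMeter name and the rate and capacity parameters are assumptions for the example, not part of any specific product:

```python
import time


class RampMeter:
    """A token-bucket spigot: admit at most `rate` records per second,
    allowing short bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens replenished per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def admit(self) -> None:
        """Block until one record may pass, like a ramp meter
        letting one car onto the freeway at a time."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)
```

A sender would call meter.admit() before each outbound request; by pacing itself, it keeps the receiver from returning 5xx errors and silently dropping transactions.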
3. Install a reservoir to buffer the ingress
This is an interesting idea. Given an infinite amount of buffer between two points, you could send an infinite amount of data through it, and with an infinite amount of time available to receive it, it would all arrive eventually. Practically speaking, though we can't have anything at infinite scale today, we can get to ridiculous numbers with cloud elasticity. So, if you have an idea of how much data you need to stream in a certain amount of time, you can predetermine the size of the reservoir/buffer. And if the volume increases, grow the buffer in real time as well. You may be able to get your ridiculous amount of data through after all.
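To make the reservoir idea concrete, here is a toy sketch of a buffer that grows in real time when the producer outpaces the consumer. The doubling policy and capacity figures are assumptions, and a production system would back this with an elastic cloud queue or object store rather than in-process memory:

```python
from collections import deque
import threading


class ElasticReservoir:
    """A bounded buffer between producer and consumer that expands
    when it fills, up to a ceiling."""

    def __init__(self, initial_capacity: int = 10_000,
                 max_capacity: int = 10_000_000):
        self.capacity = initial_capacity
        self.max_capacity = max_capacity
        self.buffer = deque()
        self.lock = threading.Lock()

    def put(self, record) -> bool:
        with self.lock:
            if len(self.buffer) >= self.capacity:
                if self.capacity < self.max_capacity:
                    # "Increase the buffer in real time": double the capacity,
                    # the way cloud elasticity adds storage behind the scenes.
                    self.capacity = min(self.capacity * 2, self.max_capacity)
                else:
                    return False  # truly full; the caller must back off
            self.buffer.append(record)
            return True

    def get(self):
        with self.lock:
            return self.buffer.popleft() if self.buffer else None
```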
Taking in massive volumes of data is essential, and this need is going to grow exponentially. Fortunately, a regulated method of data ingestion makes it possible. With the Perspectium Integration Mesh, we let you drink from the data firehose by installing a "managed elastic data reservoir with a spigot on it." That setup lets you focus on data engineering and architecture to fulfill your data warehousing needs. You can find out more here about this solution and whether it might work for you. You can also download eBooks on this topic… and they're all free!
Re-published from LinkedIn article Drinking from the Data Firehose, November 21, 2019
