Big Data as we know it has been around for a little over a decade, and its definition and applications have changed considerably in that time. As technology continues to evolve rapidly, it’s worth asking: is the era of Big Data coming to an end?
First, what is Big Data? In its simplest definition, Big Data refers to the massive amounts of information generated by sources such as social media and IoT devices, which can be analysed to gain insights and make better-informed decisions. The primary challenge of Big Data has always been how to process and analyse such a large volume and variety of information in a timely and cost-effective manner.
With that definition in place, the question becomes: could the end of the Big Data frenzy be in sight, at least in Data Science and Machine Learning (ML) business practice?
Many data scientists have come to the conclusion that what matters is data quality, not data quantity: a few tens of thousands of good-quality samples are more valuable to most (if not all) ML algorithms than millions or billions of records riddled with duplicate samples, incorrect information, imbalanced targets and missing values.
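To make these quality problems concrete, here is a minimal Python sketch (standard library only) that counts exact duplicates, records with missing values and class imbalance in a small dataset. The `audit_dataset` helper and the toy `churned` target are hypothetical illustrations, not part of any tool mentioned in this article; the sketch assumes records are dictionaries of hashable scalar values and that a missing value is represented as `None`.

```python
from collections import Counter


def audit_dataset(records, target_key):
    """Summarise common quality problems in a list of dict records.

    Hypothetical helper: a "missing value" is a field set to None, and a
    duplicate is an exact repeat of an earlier record. Assumes all field
    values are hashable scalars (str, int, float, None, ...).
    """
    seen = set()
    n_duplicates = 0
    n_with_missing = 0
    for record in records:
        # Sort items so identical records with a different key order still match.
        key = tuple(sorted(record.items()))
        if key in seen:
            n_duplicates += 1
        else:
            seen.add(key)
        if any(value is None for value in record.values()):
            n_with_missing += 1

    # Count target classes, ignoring unlabelled rows (duplicates are still counted).
    class_counts = Counter(
        r[target_key] for r in records if r.get(target_key) is not None
    )
    # Ratio of the largest to the smallest class; None if there are no labels.
    imbalance_ratio = (
        max(class_counts.values()) / min(class_counts.values())
        if class_counts
        else None
    )
    return {
        "n_records": len(records),
        "n_duplicates": n_duplicates,
        "n_with_missing": n_with_missing,
        "class_counts": dict(class_counts),
        "imbalance_ratio": imbalance_ratio,
    }


rows = [
    {"age": 34, "income": 52000, "churned": 0},
    {"age": 34, "income": 52000, "churned": 0},    # exact duplicate
    {"age": None, "income": 61000, "churned": 0},  # missing value
    {"age": 29, "income": 48000, "churned": 1},
]
print(audit_dataset(rows, "churned"))
```

Even on four rows, the report surfaces a duplicate, an incomplete record and a 3:1 class imbalance; at millions of rows such problems are far harder to spot by eye.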
Big Data concepts may still be valuable in business intelligence (BI), data analytics, insight generation and data quality assessment. For pure ML development, however, they can be a burden in today’s landscape: higher training costs, AutoML pipelines and in-memory processes that become unworkable at that scale, typically larger models to store, and bigger datasets to maintain and to perform exploratory data analysis (EDA) over.
In Data Science, ML and MLOps, investment should by default go into a solid data engineering process that produces a concise, sub-sampled dataset of high-quality examples representing the problem at hand, rather than working at scale with all of the “information” simply transformed or extracted from the raw data.
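The article does not prescribe a specific pipeline, but the kind of curation step described above can be sketched as follows. This is a minimal Python example, again assuming dict records with hashable scalar values and `None` for missing data, that drops incomplete rows, removes exact duplicates and then randomly down-samples each target class to at most a fixed size. The `curate_dataset` function and its `per_class` parameter are hypothetical; a real pipeline would likely also validate labels and use a smarter selection strategy than uniform random sampling.

```python
import random
from collections import defaultdict


def curate_dataset(records, target_key, per_class, seed=0):
    """Return a smaller, cleaner, class-balanced subset of dict records.

    Hypothetical pipeline: drop incomplete rows, drop exact duplicates,
    then randomly keep at most `per_class` examples of each target class.
    Assumes hashable scalar field values and sortable class labels.
    """
    seen = set()
    by_class = defaultdict(list)
    for record in records:
        # Step 1: skip unlabelled rows and rows with any missing (None) field.
        if record.get(target_key) is None:
            continue
        if any(value is None for value in record.values()):
            continue
        # Step 2: skip exact duplicates of a record already kept.
        key = tuple(sorted(record.items()))
        if key in seen:
            continue
        seen.add(key)
        by_class[record[target_key]].append(record)

    # Step 3: stratified down-sampling; a fixed seed keeps the subset reproducible.
    rng = random.Random(seed)
    curated = []
    for label in sorted(by_class):
        group = by_class[label]
        curated.extend(rng.sample(group, min(per_class, len(group))))
    return curated


rows = [{"x": i, "y": 0} for i in range(100)] + [{"x": i, "y": 1} for i in range(5)]
subset = curate_dataset(rows, "y", per_class=10, seed=42)
print(len(subset))  # 15
```

Capping each class rather than sampling uniformly from the whole dataset keeps rare targets from being swamped by the majority class, and fixing `seed` means the same curated subset can be regenerated and versioned alongside the model.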
While the end of Big Data as we know it is not certain, several factors could significantly change how we process and analyse data in the near future. For example, teams will need to make the right decisions when balancing data quality against quantity, and to identify the scenarios in which a larger data volume is genuinely valuable for a specific ML task. Regardless, it’s important that you leverage the power of data to drive your organisation forward.