The term “Big Data” has rapidly transitioned from buzzword to reality. Not only the giants like Facebook or Yahoo, but even small companies have started adopting the technology, using it to predict the future of their business, demands, and needs.
With Big Data Coming to Reality – Now What?
Decision making used to be a “rear view mirror” activity, namely Business Intelligence: looking at past events that had already occurred and responding accordingly. But with increasing demand and the ability to analyze vast amounts of Big Data in real time, decision making has become a forward-looking exercise in the hands of data scientists. Business executives can now see what is going on with inventory, sales orders, and sensor feeds in real time. Systems and operations personnel can use big data analytics to sift through terabytes of log files and other machine data looking for the root cause of a given problem.
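As a toy illustration of the kind of machine-data analysis described above, the sketch below counts error messages in a log stream to surface the most frequent failure. It is plain Python, and the log lines and their format are hypothetical; a real deployment would stream terabytes of such records through a distributed engine.

```python
from collections import Counter

# Hypothetical log lines; a real system would stream terabytes of these.
log_lines = [
    "2021-03-01 10:00:01 INFO request served in 12ms",
    "2021-03-01 10:00:02 ERROR db timeout on shard-3",
    "2021-03-01 10:00:03 ERROR db timeout on shard-3",
    "2021-03-01 10:00:04 WARN retrying connection",
    "2021-03-01 10:00:05 ERROR disk full on node-7",
]

def top_errors(lines, n=3):
    """Count ERROR messages (the text after the level) and return the n most common."""
    errors = Counter(
        line.split("ERROR", 1)[1].strip()
        for line in lines
        if " ERROR " in line
    )
    return errors.most_common(n)

print(top_errors(log_lines))
# The most frequent error message points at the likely root cause.
```

At scale, the same count-and-rank logic would run as a distributed aggregation rather than an in-memory loop, but the analytical idea is identical.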
How to Build a Big Data Environment?
An infrastructure that is linearly scalable and yet easy to administer is pivotal for a Big Data platform.
The primary question in building a big data environment is “Where?” Most organizations are weighing the pros and cons of two choices: On-Premise vs. Cloud Service. An understandable dilemma for many of them is the data leaving the premises if the choice were the cloud.
#1. On-Premise:
This is one of the most sought-after options for organizations, mainly because sensitive data never leaves the premises. Some of the challenges with this choice are:
- Initial capital investment to set up the infrastructure without fully knowing the scale
- Integrating the big data infrastructure with the existing backend infrastructure
- Finding skilled big data engineers to set up the infrastructure from scratch
- Cost incurred in administering the infrastructure and ensuring availability
#2. Cloud Service:
With the uncertainty around scale and value, a cloud service has been a wise choice for many organizations. Amazon’s Elastic MapReduce (EMR) and Microsoft’s Azure HDInsight pioneered hosting big data infrastructure in the cloud. However, a cloud service comes with the trade-off of the data leaving the premises, and many organizations are sensitive about customer data doing so, given repeated cyber-attacks and privacy-protection obligations. The journey towards big data, though, often starts with prototypes and proofs of concept, and the elasticity of a cloud solution comes in very handy at that stage.
Apart from the “where” of hosting big data, the “what” of the infrastructure is equally critical. Is it just storage? Organizations moving towards big data are confronted with high-velocity data, a variety of data (structured and unstructured), and massive volumes. Some of the infrastructure challenges include:
- Big data shifts the plateau, raising storage costs by 60 to 80% every year. Given this rapid growth, the choice of storage hardware becomes extremely important. For instance, solid-state disks (SSDs) are far superior to spinning disks for high-velocity data ingestion.
- Network isolation for big data traffic, with higher bandwidth. For instance, a MapReduce job involves large amounts of data being processed and transferred among nodes. Network bandwidth must not become a constraint in a Big Data environment if real-time processing is the goal.
- Response times, which can vary completely by use case, ranging from the blink of an eye to a few minutes. Apache Spark can perform up to 100 times faster than traditional MapReduce jobs because it processes data in memory; on the flip side, one must plan for sufficient RAM on the worker nodes to meet the quality of service.
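To make the MapReduce network cost mentioned above concrete, here is a single-process word-count sketch of the map/shuffle/reduce pattern. It is plain Python, not a distributed framework; on a real cluster, the grouping done in the shuffle step is exactly the data that crosses the network between nodes.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit (word, 1) pairs from each input record.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle_phase(pairs):
    # Shuffle: group values by key. On a cluster, this grouping moves
    # every pair across the network to the node that owns its key --
    # the bandwidth-hungry step described in the text.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big storage", "big network"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts)  # {'big': 3, 'data': 1, 'storage': 1, 'network': 1}
```

The map and reduce steps scale out cheaply; it is the shuffle in the middle that makes network bandwidth a first-class sizing concern.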
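Spark’s speed-up comes largely from keeping intermediate results in RAM instead of re-reading them from disk on every pass (the idea behind caching an RDD). The toy sketch below is plain Python, not Spark itself, and the `load_records` function and its record format are hypothetical; it shows the same principle of caching a parsed dataset in memory so repeated queries skip the expensive load step.

```python
# Toy illustration of in-memory caching, the idea behind Spark's
# in-memory processing; plain Python, not Spark itself.

LOAD_CALLS = 0  # track how often the "expensive" load runs

def load_records():
    """Hypothetical expensive step: parsing raw data from disk."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    raw = ["a,1", "b,2", "a,3"]
    return [tuple(line.split(",")) for line in raw]

_cache = None

def records():
    """Return the dataset, loading it at most once (kept in RAM after that)."""
    global _cache
    if _cache is None:
        _cache = load_records()
    return _cache

# Two passes over the data, e.g. two different aggregations.
total = sum(int(v) for _, v in records())
keys = sorted({k for k, _ in records()})

print(total, keys, LOAD_CALLS)  # 6 ['a', 'b'] 1 -- loaded only once
```

This is also why the RAM planning mentioned above matters: the cached working set must actually fit in memory across the worker nodes, or the speed advantage evaporates.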
Various infrastructure management tools are now in place to cleanse, integrate, and manage Big Data infrastructures effectively. With these innovations available, it is time for enterprises, large and small, to realize that embracing and adapting to Big Data is inevitable!