What is Big Data? Why do we need Hadoop or Spark?
Over the last few years we have seen a huge hype around Big Data technologies such as Hadoop and Spark. These distributed technologies are the children of the digitisation of everything. At the turn of the century only 25% of our data was in digital form. These days, 99% of our data is digital. New trends such as the Internet of Things are fuelling the flames of exponential data growth. Traditional technologies were not able to cope with these huge data volumes, as they generally did not scale beyond a single server. If they did scale, as is the case with data warehouse appliances and MPPs, they came with a hefty price tag.
"The digitisation of all business processes has resulted in a tsunami of data. We need affordable and scalable multi-server technologies such as Hadoop or Spark to tame it.
Another area that traditional data warehouses and ETL tools don't handle well is unstructured data (text, images, audio) and semi-structured data (XML, Excel). This is unfortunate, as about 90% of enterprise data comes in these formats.
"The traditional data warehouse is not a good fit for processing unstructured data in documents, video, or audio. "
At Sonra we have successfully implemented distributed Big Data technologies such as Hadoop and Spark in many enterprises. We are a highly dedicated team and extremely passionate about data. Flexter, our very own enterprise data liberation tool, is built on top of Apache Spark. Combined with our decades of experience building data warehouses, we have the solution to any data analytics question.
Our very own CEO, Uli Bethke, is the founder and chair of the Hadoop User Group Ireland. The HUG, as it is known by its members, is a forum for data analytics experts to network and keep up to date on important developments.
Need help? CONTACT US!
Important Big Data Considerations
Below are a few items you have to take into account when implementing a Big Data initiative:
- How and where can Big Data technologies save our organisation money?
- It takes forever to get data into the data warehouse. Surely, there must be a quicker way.
- Do I really need Hadoop/Spark? Why? Can my data warehouse not do the job?
- What is the business problem? Do we have a business case?
- What is the ROI of Big Data technologies?
- I heard that some of the Big Data tools are immature. Are Hadoop and the Big Data stack enterprise-ready?
- What about high availability and disaster recovery?
- What kind of skills do I need to run a Big Data project? Is this not just the same as data warehousing?
- What is the Total Cost of Ownership?
- What roles do I need to hire for?
- What are the license and support costs?
- What are those esoteric sounding technologies such as Pig, Flume, or Oozie?
- There are hundreds of Big Data tools, with new add-ons emerging on a daily basis. Which of these should I pick? Do they work well together?
- What about security? How can we implement authentication, authorisation, and encryption?
- How does Hadoop integrate with my data warehouse?
- How do I prevent my data lake from turning into a data swamp?
- How many nodes do I need to size my cluster for my expected workload?
- What about NoSQL? What is it and how is it related to Hadoop?
- How do I get started? What type of pilot or PoC should I run?
- What kind of data governance should I implement?