Last edited 31 Oct 2017
Top big data tools used to store and analyze data
Big data is a phrase used for a collection of data sets so big and complex that it is difficult to process using traditional applications/tools. Due to the variety of information that it encompasses, big data consistently brings several challenges relating to its volume and complexity.
A recent survey claims that 80% of the data generated in the world is unstructured. One challenge is how this unstructured information can be structured before we try to understand and capture the most important data. Another is how to store it. Listed below are the top tools used to store and analyse big data.
 Apache Hadoop
Apache Hadoop is a free, Java-based software framework that can store a great deal of information across a cluster. The framework runs in parallel on the cluster and lets us process data across all nodes. The Hadoop Distributed File System (HDFS) is Hadoop's storage system; it splits large data sets into blocks and distributes them across several nodes in the cluster. It also replicates data across the cluster, providing high availability.
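The splitting and replication described above can be illustrated with a toy model. This is only a sketch and not the HDFS API: the function names, node names and round-robin placement are assumptions made for illustration (real HDFS uses much larger blocks and a rack-aware placement policy). A replication factor of 3 is used because that is the usual HDFS default.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int = 3) -> dict:
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def all_blocks_available(placement: dict, failed_nodes: set) -> bool:
    """True if every block still has at least one replica on a healthy node."""
    return all(
        any(node not in failed_nodes for node in replicas)
        for replicas in placement.values()
    )


# Hypothetical four-node cluster and a tiny block size, for illustration only.
nodes = ["node1", "node2", "node3", "node4"]
blocks = split_into_blocks(b"x" * 1000, block_size=256)
placement = place_replicas(len(blocks), nodes)

print(len(blocks), "blocks")
print(placement)
print(all_blocks_available(placement, failed_nodes={"node1", "node2"}))
```

With three replicas per block, losing any two nodes in this toy cluster still leaves every block readable, which is the sense in which replication provides high availability.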
 Microsoft HDInsight
While traditional SQL can be used effectively to handle large quantities of structured data, we need NoSQL (Not Only SQL) to deal with unstructured data. NoSQL databases store unstructured information without a fixed schema.
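As a rough illustration of schema-free storage, the sketch below keeps records with different fields in one collection and queries them by whichever fields happen to be present. It uses plain Python dictionaries rather than any particular NoSQL product; the sample records and the `find` helper are hypothetical.

```python
# Each record carries its own set of fields; no table definition is required.
documents = [
    {"id": 1, "type": "email", "sender": "a@example.com",
     "body": "Site visit moved to Friday"},
    {"id": 2, "type": "sensor", "device": "hvac-07",
     "readings": [21.5, 21.7, 22.0]},
    {"id": 3, "type": "email", "sender": "b@example.com",
     "attachments": ["plan.pdf"]},
]


def find(docs, **criteria):
    """Return documents whose fields match all criteria.

    A document that lacks a requested field simply does not match.
    """
    return [
        doc for doc in docs
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


emails = find(documents, type="email")
print([doc["id"] for doc in emails])
```

In a relational table, every row would need the same columns; here the sensor record and the email records sit side by side even though they share almost no fields.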
PolyBase works on top of SQL Server 2012 Parallel Data Warehouse (PDW) and is used to access data stored in PDW. PDW is a data-warehousing appliance built for processing any volume of relational data, and it integrates with Hadoop so that non-relational data can be handled as well.
Many people are comfortable doing data analysis in Excel, so users can also connect to data stored in Hadoop using Excel 2013. The Power View feature of Excel 2013 can be used to summarise the information easily. Similarly, Microsoft's HDInsight lets us connect to big data stored in the Azure cloud using Power Query.
Facebook has developed and recently open-sourced its SQL-on-Hadoop query engine, Presto, which is built to handle petabytes of data. Unlike Hive, Presto does not depend on MapReduce and can retrieve data quickly.