Elena at 13:40, 23 October 2017

2017-10-23T13:40:58Z

← Older revision		Revision as of 13:40, 23 October 2017
Line 1:		Line 1:
−	~~Top big data tools used to store and analyze data~~

−	BIG DATA is a phrase used for a collection of data sets so big and complex that it is difficult to process using traditional applications/tools. Due to the variety of information that it encompasses, [https://en.wikipedia.org/wiki/Big_data big data] consistently brings several challenges relating to its volume and complexity. A recent survey claims that 80 percent of the data generated in the world are unstructured. One question is how these unstructured information can be structured, before we try to understand and capture the most important data. Another challenge is how we could store it. Listed below are the top tools utilized to store and analyse Big Data.
−
−	~~1. Apache Hadoop~~
−
−	Apache Hadoop is a java based free software framework that can effectively store great deal of information in a cluster. This frame runs in parallel on a cluster and has an ability to enable us to process data across all nodes. Hadoop Distributed File System (HDFS) is the storage system of Hadoop which splits big information and distribute across several nodes in a cluster. This also replicates data in a bunch thus providing high availability.
−
−	~~2. Microsoft HDInsight~~
−
−	~~HDInsight utilizes Windows Azure Blob storage as the default file system. This also provides high availability with reduced price.~~
−
−	~~3. NoSQL~~
−
−	While the traditional [https://en.wikipedia.org/wiki/SQL SQL] can be effectively utilised to handle large quantity of structured data, we want NoSQL (Not Just SQL) to deal with unstructured data. NoSQL databases store unstructured information with no particular schema. NoSQL gives better performance in storing massive number of data. There are lots of open-source NoSQL DBs available to analyse big Data.
−
−	~~4. Hive~~
−
−	~~This supports SQL-like query option HiveSQL (HSQL) to get big data. This may be primarily used for Data mining function.~~
−
−	~~5. Sqoop~~
−
−	~~This is a tool which connects Hadoop with various relational databases to transfer information. This can be effectively utilised to transport structured data to Hadoop or Hive.~~
−
−
−
−	~~6. PolyBase~~
−
−	This works on top of SQL Server 2012 Parallel Data Warehouse (PDW) and is used to get data stored in PDW. PDW is a datawarhousing appliance built for processing any quantity of relational data and provides an integration with Hadoop allowing us to get non-relational information also.
−
−	~~7. Big Data in Excel~~
−
−	As lots of men and women are comfortable in doing [http://stealthtechnovations.com/we-make-your-world-digital-all-you-need-is-a-click/ Data Analytics- best digital analysis,] Therefore, the users may even connect data stored in Hadoop using EXCEL 2013. You can use Power View feature of EXCEL 2013 to easily summarise the information. Similarly, Microsoft's HDInsight enables us to connect to Big data stored in Azure Cloud using a power query option.
−
−	~~8. Presto~~
−
−	Facebook has developed and recently open-sourced its Query engine (SQL-on-Hadoop) called Presto which is built to manage petabytes of information. Unlike Hive, Presto doesn't depend on MapReduce technique and can quickly retrieve information.

103.70.200.77 at 13:37, 23 October 2017

2017-10-23T13:37:46Z

New page

Top big data tools used to store and analyze data

Big data is a term for collections of data sets so large and complex that they are difficult to process using traditional applications and tools. Because of the variety of information it encompasses, [https://en.wikipedia.org/wiki/Big_data big data] brings challenges relating to its volume and complexity. A recent survey claims that 80 percent of the data generated in the world is unstructured. One question is how this unstructured information can be structured before we try to understand and capture the most important data. Another challenge is how to store it. Listed below are the top tools used to store and analyse big data.

1. Apache Hadoop

Apache Hadoop is a free, Java-based software framework that can store large amounts of data in a cluster. The framework runs in parallel on the cluster and lets us process data across all nodes. The Hadoop Distributed File System (HDFS) is Hadoop's storage system; it splits large files into blocks and distributes them across several nodes in the cluster. It also replicates the data within the cluster, thus providing high availability.
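The splitting and replication described above can be pictured with a toy sketch. This is not the HDFS API: the function names, the tiny block size, and the round-robin placement are assumptions made for illustration (real HDFS defaults to 128 MB blocks and a replication factor of 3, and uses rack-aware placement).

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int) -> dict:
    """Assign each block to `replication` distinct nodes.

    Round-robin placement is a simplification chosen for this sketch.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def surviving_blocks(placement: dict, failed_node: str) -> set:
    """Return the ids of blocks that still have a replica after one node fails."""
    return {
        block_id
        for block_id, holders in placement.items()
        if any(node != failed_node for node in holders)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"0123456789", block_size=4)
    placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=3)
    print(placement)
    print(surviving_blocks(placement, failed_node="n1"))
```

Because every block lives on more than one node, losing a single node leaves all blocks readable, which is the high-availability property mentioned above.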

2. Microsoft HDInsight

HDInsight uses Azure Blob storage as its default file system. It also provides high availability at low cost.

3. NoSQL

While traditional [https://en.wikipedia.org/wiki/SQL SQL] databases can effectively handle large quantities of structured data, we need NoSQL (Not Only SQL) to deal with unstructured data. NoSQL databases store unstructured information without a fixed schema. NoSQL can give better performance when storing massive amounts of data. There are many open-source NoSQL databases available for analysing big data.
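To illustrate what storing data without a fixed schema means, here is a minimal in-memory sketch. The `DocumentStore` class and its methods are hypothetical and are not taken from any particular NoSQL product; the point is only that records in the same store need not share the same fields.

```python
import json


class DocumentStore:
    """A toy key-to-document store; each document is an arbitrary JSON object."""

    def __init__(self):
        self._docs = {}

    def put(self, key: str, doc: dict) -> None:
        # Documents are serialised as JSON, so no column layout is enforced.
        self._docs[key] = json.dumps(doc)

    def get(self, key: str) -> dict:
        return json.loads(self._docs[key])

    def find(self, field: str, value) -> list:
        """Return keys of documents whose `field` equals `value`.

        Documents that lack the field are simply skipped.
        """
        return [
            key
            for key, raw in self._docs.items()
            if json.loads(raw).get(field) == value
        ]


if __name__ == "__main__":
    store = DocumentStore()
    store.put("u1", {"name": "Ana", "city": "Lima"})
    store.put("u2", {"name": "Bo", "tags": ["admin"], "city": "Lima"})
    store.put("log1", {"level": "error", "message": "disk full"})
    print(store.find("city", "Lima"))
```

A relational table would require all three records to fit one set of columns; here each document carries its own structure.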

4. Hive

Hive supports a SQL-like query language, HiveQL (HQL), for querying big data. It can be used primarily for data mining purposes.

5. Sqoop

Sqoop is a tool that connects Hadoop with various relational databases to transfer data. It can be used to move structured data into Hadoop or Hive.

6. PolyBase

PolyBase works on top of SQL Server 2012 Parallel Data Warehouse (PDW) and is used to access data stored in PDW. PDW is a data warehousing appliance built for processing any volume of relational data, and it integrates with Hadoop, allowing us to access non-relational data as well.

7. Big Data in Excel

Since many people are comfortable doing [http://stealthtechnovations.com/we-make-your-world-digital-all-you-need-is-a-click/ data analytics] in Excel, users can also connect to data stored in Hadoop using Excel 2013. You can use the Power View feature of Excel 2013 to easily summarise the information. Similarly, Microsoft's HDInsight lets us connect to big data stored in the Azure cloud using Power Query.

8. Presto

Facebook has developed and recently open-sourced its SQL-on-Hadoop query engine, Presto, which is built to handle petabytes of data. Unlike Hive, Presto does not depend on MapReduce and can retrieve data quickly.
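For context on the MapReduce technique mentioned above, the following is a single-process sketch of its three stages (map, shuffle, reduce) applied to a word count. It is an illustration only and is not how Hive or Hadoop are invoked; the function names are made up for this example. In a real cluster each stage runs as a distributed batch job with intermediate results written to disk, which is one commonly cited reason MapReduce-based queries have higher latency.

```python
from collections import defaultdict


def map_phase(line: str) -> list:
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list) -> dict:
    """Group the emitted values by key, as the framework does between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict) -> dict:
    """Sum the grouped counts for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: list) -> dict:
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "Big tools"]))
```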

User:Elena - Revision history
