How do you tune a Hive query?

Table of Contents

1 How do you tune a Hive query?
2 What are Hive queries?
3 How does Hive handle subquery?
4 What file formats can you use in Hadoop?
5 What is Hive query optimization and why is it important?
6 How to improve Hive query performance with skewed keys?

How do you tune a Hive query?

The following practices can help optimize Hive queries:

Enable Compression in Hive.
Optimize Joins.
Avoid Global Sorting in Hive.
Enable Tez Execution Engine.
Optimize LIMIT operator.
Enable Parallel Execution.
Enable MapReduce Strict Mode.
Use a Single Reducer for Multiple GROUP BY Operations.
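
As a rough sketch, most of the practices above map to session-level settings. The property names below are standard Hive configuration properties, but defaults and availability vary by Hive version and cluster setup, and the sales table in the last statement is hypothetical.

```sql
-- Run on Tez instead of classic MapReduce (Tez must be installed on the cluster).
SET hive.execution.engine=tez;

-- Compress intermediate data passed between stages, and the final output.
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;

-- Let Hive convert joins against small tables into map-side joins.
SET hive.auto.convert.join=true;

-- Allow independent stages of a query to run in parallel.
SET hive.exec.parallel=true;

-- Let simple LIMIT queries try a smaller subset of the data first.
SET hive.limit.optimize.enable=true;

-- Strict mode rejects risky queries such as ORDER BY without LIMIT.
SET hive.mapred.mode=strict;

-- Plan a multi-GROUP BY query with common keys as a single MapReduce job.
SET hive.multigroupby.singlereducer=true;

-- Avoid a global sort: SORT BY orders rows within each reducer only,
-- whereas ORDER BY sends all rows through a single reducer.
SELECT customer_id, amount
FROM sales                      -- hypothetical table
DISTRIBUTE BY customer_id
SORT BY customer_id, amount DESC;
```

It is usually worth changing one setting at a time and comparing run times, since the effect of each depends on the data and the cluster.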

What are Hive queries?

The Hive Query Language (HiveQL) is the SQL-like query language that Hive uses to process and analyze structured data. Table definitions are kept in the Metastore, while the data itself lives in Hadoop storage such as HDFS. The most basic query is a SELECT statement, which retrieves data from a table and can be filtered with a WHERE clause.
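
A minimal example of a SELECT statement with a WHERE clause might look like the following. The employee table and its columns are assumptions made up for illustration.

```sql
-- Hypothetical table: employee(id INT, name STRING, salary DOUBLE, dept STRING)
SELECT id, name, salary
FROM employee
WHERE salary > 30000
  AND dept = 'engineering';
```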

How does Hive process a query?

A Hive interface, such as the command line or the web UI, sends the query to the driver for execution. The interface calls the driver's execute interface, for example through ODBC or JDBC. The driver creates a session handle for the query and passes the query to the compiler, which generates the execution plan.
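
One way to look at the plan the compiler produces is to prefix a query with EXPLAIN. This is only a sketch: the employee table is hypothetical, and the exact shape of the output depends on the Hive version and execution engine.

```sql
-- Show the execution plan instead of running the query.
EXPLAIN
SELECT dept, COUNT(*) AS headcount
FROM employee                   -- hypothetical table
GROUP BY dept;
```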

How can you optimize a SQL query?

It’s vital to optimize your queries so that they have minimal impact on database performance.

Define business requirements first.
SELECT fields instead of using SELECT *
Avoid SELECT DISTINCT.
Create joins with INNER JOIN (not WHERE)
Use WHERE instead of HAVING to define filters.
Use wildcards at the end of a phrase only.
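
Several of the tips above can be illustrated with a short sketch. The customers and orders tables and their columns are assumptions for illustration only.

```sql
-- Hypothetical tables: customers(id, name, state), orders(id, customer_id, total)

-- Name the columns you need and join with an explicit INNER JOIN.
SELECT c.name, o.total
FROM customers c
INNER JOIN orders o ON o.customer_id = c.id
WHERE c.state = 'CA'            -- filter rows in WHERE, before any grouping
  AND c.name LIKE 'Sm%';        -- wildcard only at the end of the pattern

-- HAVING is still needed when the filter depends on an aggregate.
SELECT customer_id, SUM(total) AS spent
FROM orders
GROUP BY customer_id
HAVING SUM(total) > 1000;
```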

How does Hive handle subquery?

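Through Hive 0.12, Hive supports subqueries only in the FROM clause; starting with Hive 0.13, some subqueries (IN, NOT IN, EXISTS and NOT EXISTS) are also supported in the WHERE clause. A subquery in the FROM clause has to be given a name, because every table in a FROM clause must have a name. Columns in the subquery select list must have unique names.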
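
A FROM-clause subquery might look like the sketch below. The employee table is hypothetical; note the alias t after the closing parenthesis and the unique column names inside the subquery.

```sql
SELECT t.dept, t.total_salary
FROM (
  SELECT dept, SUM(salary) AS total_salary
  FROM employee                 -- hypothetical table
  GROUP BY dept
) t                             -- the subquery must be named
WHERE t.total_salary > 100000;
```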

What file formats can you use in Hadoop?

Below are some of the most common formats of the Hadoop ecosystem:

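Text/CSV. A plain text file or CSV is the most common format both outside and within the Hadoop ecosystem.
SequenceFile. The SequenceFile format stores the data in binary format.
Avro. A row-oriented binary format that stores its schema alongside the data.
Parquet. A columnar storage format.
RCFile (Record Columnar File). A format that stores rows in groups and lays out each group column by column.
ORC (Optimized Row Columnar). A columnar format developed for Hive as a more efficient successor to RCFile.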
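
In Hive, the file format is chosen per table with the STORED AS clause. The sketch below assumes two hypothetical tables and copies data from a text table into an ORC table; other keywords such as SEQUENCEFILE, RCFILE, PARQUET and AVRO are used the same way, depending on the Hive version.

```sql
-- Hypothetical table names and columns.
CREATE TABLE events_text (id BIGINT, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

CREATE TABLE events_orc (id BIGINT, payload STRING)
STORED AS ORC;

-- Copy the text data into the columnar table.
INSERT OVERWRITE TABLE events_orc
SELECT id, payload FROM events_text;
```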

What are the components of a Hive query processor?

The following are the components of the Hive query processor:

Parse and Semantic Analysis (ql/parse)
Metadata Layer (ql/metadata)
Type Interfaces (ql/typeinfo)
Sessions (ql/session)
Map/Reduce Execution Engine (ql/exec)
Plan Components (ql/plan)
Hive Function Framework (ql/udf)
Tools (ql/tools)

What is Hive query optimization and why is it important?

Optimizing a Hive query can cut its execution time considerably, by 50% or more in some cases. Without optimization, even a simple SELECT statement can take a very long time to run.

How to improve Hive query performance with skewed keys?

Hive can handle skewed join keys in two passes. At runtime it detects the keys with heavy skew and, instead of processing them in the main join, stores their rows temporarily in HDFS. A follow-up map-reduce job then processes those skewed keys. The same key need not be skewed in all the tables, so the follow-up job is much faster, since it can run as a map join. Separately, if the tables are bucketed by a particular column, you can use a bucketed map join to improve Hive query performance.
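
As a hedged sketch, the skew handling and bucketed map join described above are typically switched on with the settings below. The threshold shown is Hive's documented default and would normally be tuned to the data.

```sql
-- Handle heavily skewed join keys in a follow-up map-join job.
SET hive.optimize.skewjoin=true;

-- Number of rows per key above which the key is treated as skewed.
SET hive.skewjoin.key=100000;

-- Use a bucketed map join when the joined tables are bucketed on the join column.
SET hive.optimize.bucketmapjoin=true;
```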
