Introduction

Hadoop is an open-source framework that was designed to facilitate the storing and processing of large volumes of data and Hive is a data warehouse tool that works in the Hadoop ecosystem to process and summarise the data, making it easier to a and use. Now that you know what is Hive in the Hadoop ecosystem, read on to find out the most common Hive interview questions.

Hive Interview Questions

The following are some of the most commonly asked Hive interview questions. Interview questions on Hive may be direct or application-based.

  • What applications are supported by Hive?

Hive supports client applications based on Java, PHP, Python, C, and Ruby coding languages.

  • What are the different tables available in Hive?

There are two types of tables available; managed and external. 

  • What is the difference between external and managed tables?

While external tables give data control to Hive but not control of a schema, managed tables give both schema and data control.

  • Can Hive be used in OLTP systems?

Since Hive does not support row-level data insertion, it is not suitable for use in OLTP systems.

  • Can a table name be changed in Hive?

Rename a table name by using: Alter Table table_name RENAME TO new_name.

  • Where is Hive table data stored?

Hive table data is stored in an HDFS directory by default – user/hive/warehouse. This can be altered.

  • Can the default location of a managed table be changed?

Yes, the default managed table location can be changed by using the LOCATION ‘<hdfs_path>’ clause.

  • What is a Hive Metastore?

A metastore is a relational database that stores the metadata of Hive partitions, tables, databases, and so on.

  • What are the types of meta stores?

Local and Remote meta stores are the two types of Hive meta stores.

  • What is the difference between Local and Remote metastores?

Local meta stores run on the same Java Virtual Machine (JVM) as the Hive service whereas remote metastores run on a separate, distinct JVM. 

  • What is the default Apache Hive metastore database?

The default database for metastore is the embedded Derby database provided by Hive which is backed by the local disk. 

  • Can multiple users use one metastore?

No, metastore sharing is not supported by Hive.

  • What are the three different modes in which Hive can be operated?

The three modes are Local mode, distributed mode, and pseudo-distributed mode.

  • Is there a data type in Hive to store date information?

The TIMESTAMP data type stores all data information the java.sql.timestamp format.

  • Why is partitioning used in Hive?

Partitioning allows for the reduction of query latency as instead of scanning entire tables, only relevant partitions and corresponding datasets are scanned.

  • What are the Hive collection data types?

ARRAY, MAP, AND STRUCT are the three Hive collection data types.

  • Is it possible to run UNIX shell commands in Hive?

Yes, one can run shell commands in Hive by adding a ‘!’ before the command.

  • Is it possible to execute Hive queries from a script file?

Yes, one can do so with the help of a source command. For example – Hive> source /path/queryfile.hql  

  • What is a .hiverc file?

It is a file that consists of a list of commands that need to be run when the Command Line Input is initiated.

  • How can you check if a specific partition exists?

Use the following command: SHOW PARTITIONS table_name PARTITION (partitioned_column=’partition_value’)

  • If you had to list all databases that began with the letter ‘c’, how would you do it?

By using the following command: SHOW DATABASES LIKE ‘c.*’

  • Is it possible to delete DBPROPERTY in Hive?

No, there is no way to delete the DBPROPERTY.

  • Which Java class handles the input record encoding into files that store Hive tables?

The ‘org.apache.hadoop.mapred.TextInputFormat’ class.

  • Which Java class handles output record encoding into Hive query files?

The ‘org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat’ class.

  • When a Hive table partition is pointed to a new directory, what happens to the data?

The data remains in the old directory and needs to be transferred manually. 

  • Do you save space in the HDFS by archiving Hive tables?

No, archiving only helps reduce the number of files that makes for easier management of data.

  • How can you stop a partition from being accessed in a query?

Use the ENABLE OFFLINE clause along with the ALTER TABLE command.

  • What is a table generating function on Hive?

MapReduce is a programming framework that allows Hive to divide large datasets into smaller units and process them parallelly. 

  • Can you avoid MapReduce on Hive?

You can make Hive avoid MapReduce to return query results by setting the hive.exec.mode.local.auto property to ‘true’.

  • Can a Cartesian join be created between two Hive tables?

This is not possible as it cannot be implemented in MapReduce programming.

  • What is a view in Hive?

A view is a logical construct that allows search queries to be treated as tables.

  • Can the name of a view be the same as a Hive table name?

No, the name of the view must always be unique in the database.

  • Can we use the LOAD or INSERT command to view?

No, these commands cannot be used with respect to a view in Hive.

  • What is indexing in Hive?

Hive indexing is a query optimization technique to reduce the time needed to access a column or a set of columns within a Hive database.

  • Are multi-line comments supported by Hive?

No.

  • How can you view the indexes of a Hive table?

By using the following command: SHOW INDEX ON table_name

  • What is the Hive ObjectInspector function?

It helps to analyze the structure of individual columns and rows and provides access to the complex objects that are stored within the database.

  • What is bucketing?

Bucketing is the process of hashing the values in a column into several user-defined buckets which helps avoid over-partitioning.

  • How is bucketing helpful?

Bucketing helps optimize the sampling process and shortens the query response time.

  • Can you specify the name of the table creator in Hive?

Yes, by using the TBLPROPERTIES clause. For example – TBLPROPERTIES (‘creator’= ‘john’)

  • What is Hcatalog?

Hcatalog is a tool that helps to share data structures with other external systems on the Hadoop ecosystem.

  • What is UDF in Hive?

UDF is a user-designed function created with a Java program to address a specific function that is not part of the existing Hive functions.

  • What does /*streamtable(table_name)*/ do? 

It is a query hint that allows for a table to be streamed into memory before a query is executed.

  • What are the limitations of Hive?

Hive has several limitations: 

  • Real-time queries cannot be executed and it has no row-level support.
  • Hive cannot be used for online transaction processing.

Conclusion

The above Hive scenario-based questions and hive interview questions and answers cover most of the important topics under Hive but this is in no way an exhaustive list.

If you are interested in making it big in the world of data and evolve as a Future Leader, you may consider our Integrated Program in Business Analytics, a 10-month online program, in collaboration with IIM Indore!

Also, Read

Top 50 Excel Interview Questions

SHARE
share

Are you ready to build your own career?