org.apache.hadoop.hive.ql.io
Class HiveFileFormatUtils

java.lang.Object
  extended by org.apache.hadoop.hive.ql.io.HiveFileFormatUtils

public class HiveFileFormatUtils
extends Object

A utility class for various Hive file format tasks. registerOutputFormatSubstitute(Class, Class) and getOutputFormatSubstitute(Class) were added for backward compatibility: they map older OutputFormat classes to their newer HiveOutputFormat substitutes.


Field Summary
static String READ_COLUMN_IDS_CONF_STR
           
 
Constructor Summary
HiveFileFormatUtils()
           
 
Method Summary
static boolean checkInputFormat(org.apache.hadoop.fs.FileSystem fs, HiveConf conf, Class<? extends org.apache.hadoop.mapred.InputFormat> inputFormatCls, ArrayList<org.apache.hadoop.fs.FileStatus> files)
          Checks whether the files are in the same format as the given input format.
static Class<? extends InputFormatChecker> getInputFormatChecker(Class<?> inputFormat)
          Gets the InputFormatChecker registered for a file format.
static org.apache.hadoop.fs.Path getOutputFormatFinalPath(org.apache.hadoop.fs.Path parent, org.apache.hadoop.mapred.JobConf jc, HiveOutputFormat<?,?> hiveOutputFormat, boolean isCompressed, org.apache.hadoop.fs.Path defaultFinalPath)
          Gets the final output path of a given FileOutputFormat.
static Class<? extends HiveOutputFormat> getOutputFormatSubstitute(Class<?> origin)
          Gets an OutputFormat's substitute HiveOutputFormat.
static ArrayList<Integer> getReadColumnIDs(org.apache.hadoop.conf.Configuration conf)
          Returns the list of column IDs (starting from zero) that are set in the given conf.
static void registerInputFormatChecker(Class<? extends org.apache.hadoop.mapred.InputFormat> format, Class<? extends InputFormatChecker> checker)
          Registers an InputFormatChecker for a given InputFormat.
static void registerOutputFormatSubstitute(Class<? extends org.apache.hadoop.mapred.OutputFormat> origin, Class<? extends HiveOutputFormat> substitute)
          Registers a substitute HiveOutputFormat for an OutputFormat.
static void setFullyReadColumns(org.apache.hadoop.conf.Configuration conf)
          Clears the read column IDs set in the conf, so that all columns are read.
static void setReadColumnIDs(org.apache.hadoop.conf.Configuration conf, ArrayList<Integer> ids)
          Sets the IDs (starting from zero) of the columns to read for RCFile's reader.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

READ_COLUMN_IDS_CONF_STR

public static String READ_COLUMN_IDS_CONF_STR
Constructor Detail

HiveFileFormatUtils

public HiveFileFormatUtils()
Method Detail

registerOutputFormatSubstitute

public static void registerOutputFormatSubstitute(Class<? extends org.apache.hadoop.mapred.OutputFormat> origin,
                                                  Class<? extends HiveOutputFormat> substitute)
Registers a substitute HiveOutputFormat for an OutputFormat.

Parameters:
origin - the class that needs to be substituted
substitute - the HiveOutputFormat class to use in place of origin

getOutputFormatSubstitute

public static Class<? extends HiveOutputFormat> getOutputFormatSubstitute(Class<?> origin)
Gets an OutputFormat's substitute HiveOutputFormat.
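
The register/get pair above describes a class-to-class registry. The following is a minimal sketch of that idea, not Hive's actual implementation: LegacyOutputFormat, NewOutputFormat, and the other type names are hypothetical stand-ins so the example compiles without Hadoop or Hive on the classpath. Returning the class itself when it already is the newer type, and returning null for an unregistered class, are both assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for org.apache.hadoop.mapred.OutputFormat and
// HiveOutputFormat, declared only so this sketch needs no external jars.
interface LegacyOutputFormat {}
interface NewOutputFormat {}

final class LegacyTextOutput implements LegacyOutputFormat {}
final class NewTextOutput implements NewOutputFormat {}

final class SubstituteRegistrySketch {
    // Maps an older output format class to the newer class that replaces it.
    private static final Map<Class<? extends LegacyOutputFormat>, Class<? extends NewOutputFormat>>
            SUBSTITUTES = new ConcurrentHashMap<>();

    static void register(Class<? extends LegacyOutputFormat> origin,
                         Class<? extends NewOutputFormat> substitute) {
        SUBSTITUTES.put(origin, substitute);
    }

    // Assumption: a class that already is the newer type is its own substitute,
    // and an unregistered class yields null.
    static Class<? extends NewOutputFormat> lookup(Class<?> origin) {
        if (NewOutputFormat.class.isAssignableFrom(origin)) {
            return origin.asSubclass(NewOutputFormat.class);
        }
        return SUBSTITUTES.get(origin);
    }

    public static void main(String[] args) {
        register(LegacyTextOutput.class, NewTextOutput.class);
        System.out.println(lookup(LegacyTextOutput.class).getSimpleName());
    }
}
```

Keying the map on Class objects lets callers keep referring to the older format by name while the lookup hands back the replacement.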


getOutputFormatFinalPath

public static org.apache.hadoop.fs.Path getOutputFormatFinalPath(org.apache.hadoop.fs.Path parent,
                                                                 org.apache.hadoop.mapred.JobConf jc,
                                                                 HiveOutputFormat<?,?> hiveOutputFormat,
                                                                 boolean isCompressed,
                                                                 org.apache.hadoop.fs.Path defaultFinalPath)
                                                          throws IOException
Gets the final output path of a given FileOutputFormat.

Parameters:
parent - parent dir of the expected final output path
jc - job configuration
Throws:
IOException

registerInputFormatChecker

public static void registerInputFormatChecker(Class<? extends org.apache.hadoop.mapred.InputFormat> format,
                                              Class<? extends InputFormatChecker> checker)
Registers an InputFormatChecker for a given InputFormat.

Parameters:
format - the InputFormat class that the checker applies to
checker - the InputFormatChecker class to register for format

getInputFormatChecker

public static Class<? extends InputFormatChecker> getInputFormatChecker(Class<?> inputFormat)
Gets the InputFormatChecker registered for a file format.


checkInputFormat

public static boolean checkInputFormat(org.apache.hadoop.fs.FileSystem fs,
                                       HiveConf conf,
                                       Class<? extends org.apache.hadoop.mapred.InputFormat> inputFormatCls,
                                       ArrayList<org.apache.hadoop.fs.FileStatus> files)
                                throws HiveException
Checks whether the files are in the same format as the given input format.

Throws:
HiveException
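
The three checker methods above can be read as a registry plus a dispatch step: look up the checker class registered for a format, instantiate it, and let it inspect the files. The sketch below illustrates that flow under stated assumptions. FormatCheckerSketch and its validate method are hypothetical (the real InputFormatChecker interface is not shown on this page), files are modeled as in-memory byte arrays instead of FileStatus objects, the "MAG1" magic is made up, and treating an unregistered format as valid is a guess.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for InputFormatChecker.
interface FormatCheckerSketch {
    boolean validate(List<byte[]> fileHeaders);
}

// Hypothetical marker standing in for a mapred InputFormat implementation.
final class MagicInputFormatSketch {}

// Example checker: accepts only files that start with the made-up magic "MAG1".
final class MagicCheckerSketch implements FormatCheckerSketch {
    private static final byte[] MAGIC = "MAG1".getBytes(StandardCharsets.US_ASCII);

    @Override
    public boolean validate(List<byte[]> fileHeaders) {
        for (byte[] header : fileHeaders) {
            if (header.length < MAGIC.length) {
                return false;
            }
            for (int i = 0; i < MAGIC.length; i++) {
                if (header[i] != MAGIC[i]) {
                    return false;
                }
            }
        }
        return true;
    }
}

final class CheckerRegistrySketch {
    private static final Map<Class<?>, Class<? extends FormatCheckerSketch>> CHECKERS =
            new ConcurrentHashMap<>();

    static void register(Class<?> format, Class<? extends FormatCheckerSketch> checker) {
        CHECKERS.put(format, checker);
    }

    static boolean check(Class<?> format, List<byte[]> fileHeaders) {
        Class<? extends FormatCheckerSketch> checkerCls = CHECKERS.get(format);
        if (checkerCls == null) {
            return true; // assumption: with no checker registered there is nothing to verify
        }
        try {
            // Instantiate the registered checker reflectively, then delegate to it.
            return checkerCls.getDeclaredConstructor().newInstance().validate(fileHeaders);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + checkerCls.getName(), e);
        }
    }

    public static void main(String[] args) {
        register(MagicInputFormatSketch.class, MagicCheckerSketch.class);
        byte[] good = "MAG1payload".getBytes(StandardCharsets.US_ASCII);
        byte[] bad = "plain text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(check(MagicInputFormatSketch.class, Collections.singletonList(good)));
        System.out.println(check(MagicInputFormatSketch.class, Collections.singletonList(bad)));
    }
}
```

Registering checker classes instead of instances mirrors the Class-typed parameters in the signatures above; a real implementation might also cache the instantiated checkers.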

setReadColumnIDs

public static void setReadColumnIDs(org.apache.hadoop.conf.Configuration conf,
                                    ArrayList<Integer> ids)
Sets the IDs (starting from zero) of the columns to read for RCFile's reader. Once a column is included in the list, RCFile's reader will not skip its value.


getReadColumnIDs

public static ArrayList<Integer> getReadColumnIDs(org.apache.hadoop.conf.Configuration conf)
Returns the list of column IDs (starting from zero) that are set in the given conf.


setFullyReadColumns

public static void setFullyReadColumns(org.apache.hadoop.conf.Configuration conf)
Clears the read column IDs set in the conf, so that all columns are read.
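
The set, get, and clear methods above round-trip a list of column IDs through a configuration object. One plausible way to do that, shown here purely as a sketch, is to store the IDs as a comma-separated string under a single key. A plain Map stands in for org.apache.hadoop.conf.Configuration, the key name "sketch.read.column.ids" is hypothetical (the actual value of READ_COLUMN_IDS_CONF_STR is not given on this page), and representing "read all columns" as an empty value is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ReadColumnIdsSketch {
    // Hypothetical key; not the real value of READ_COLUMN_IDS_CONF_STR.
    static final String KEY = "sketch.read.column.ids";

    // Stores the zero-based column IDs as a comma-separated string, e.g. "0,2,5".
    static void setReadColumnIDs(Map<String, String> conf, List<Integer> ids) {
        conf.put(KEY, ids.stream().map(String::valueOf).collect(Collectors.joining(",")));
    }

    // Parses the stored string back into a list; an empty or missing value gives an empty list.
    static ArrayList<Integer> getReadColumnIDs(Map<String, String> conf) {
        ArrayList<Integer> result = new ArrayList<>();
        String raw = conf.getOrDefault(KEY, "");
        if (raw.isEmpty()) {
            return result;
        }
        for (String part : raw.split(",")) {
            result.add(Integer.parseInt(part.trim()));
        }
        return result;
    }

    // Assumption: an empty value means "no restriction", so a reader would read every column.
    static void setFullyReadColumns(Map<String, String> conf) {
        conf.put(KEY, "");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setReadColumnIDs(conf, Arrays.asList(0, 2, 5));
        System.out.println(getReadColumnIDs(conf));
        setFullyReadColumns(conf);
        System.out.println(getReadColumnIDs(conf));
    }
}
```

In this sketch a reader would treat an empty list from getReadColumnIDs as a request for all columns, which is how the clear operation and the getter stay consistent.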



Copyright © 2009 The Apache Software Foundation