A load function that parses a line of input into fields using a character delimiter.
The default delimiter is a tab. You can specify any character as a literal ("a"),
a known escape character ("\\t"), or a decimal or hex value ("\\u001", "\\x0A").
An optional second constructor argument allows you to customize advanced behaviors.
The available options are:
-schema      Reads/stores the schema of the relation using a hidden JSON file.
-noschema    Ignores a stored schema during loading.
-tagFile     Prepends the input source file name to each tuple.
-tagPath     Prepends the input source file path to each tuple.
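For example, the constructor forms described above might be used as follows (the file names here are illustrative):

```pig
-- Comma as a literal delimiter.
A = LOAD 'data.csv' USING PigStorage(',');

-- Tab via its escape form (this is also the default,
-- so PigStorage() with no arguments behaves the same way).
B = LOAD 'data.tsv' USING PigStorage('\t');
```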
Schemas
If -schema is specified, a hidden ".pig_schema" file is created in the output directory
when storing data. PigStorage (with or without -schema) uses this file during loading to
determine the field names and types of the data, so the user does not need to provide the
schema explicitly in an as clause, unless -noschema is specified. No attempt is made to
merge conflicting schemas during loading; the first schema encountered during a file
system scan is used. If the '-schema' option is used during loading but the schema file
is not present, loading fails with an error.
In addition, using -schema drops a ".pig_headers" file in the output directory.
This file simply lists the delimited aliases, which makes it easier to export to tools
that can read files with header lines (just cat the header onto your data).
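As a sketch, a store-then-load round trip with -schema might look like this (the paths and field names are hypothetical):

```pig
-- Store with a schema file; this creates .pig_schema (and .pig_headers)
-- in '/out' alongside the data.
A = LOAD 'input' AS (name:chararray, age:int);
STORE A INTO '/out' USING PigStorage('\t', '-schema');

-- Later, load without an as clause; the field names and types
-- come from the hidden .pig_schema file.
B = LOAD '/out' USING PigStorage('\t', '-schema');
C = FOREACH B GENERATE name, age;
```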
Source tagging
If -tagFile is specified, PigStorage will prepend the input split name to each tuple/row.
Usage: A = LOAD 'input' using PigStorage(',','-tagFile'); B = foreach A generate $0;
The first field (index 0) in each tuple will contain the input file name.
If -tagPath is specified, PigStorage will prepend the input split path to each tuple/row.
Usage: A = LOAD 'input' using PigStorage(',','-tagPath'); B = foreach A generate $0;
The first field (index 0) in each tuple will contain the input file path.
Note that regardless of whether you store the schema, you must always specify the correct
delimiter to read your data. If you store using the delimiter "#" and then load using
the default delimiter, your data will not be parsed correctly.
Compression
Storing to a directory whose name ends in ".bz2", ".gz", or ".lzo" (if you have installed
support for LZO compression in Hadoop) will automatically use the corresponding compression
codec. The output.compression.enabled and output.compression.codec job properties also work.
Loading from directories ending in .bz2 or .bz works automatically; other compression formats are not
auto-detected on loading.
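As a sketch, compressed output can be requested either through the directory suffix or through the job properties (the paths are illustrative, and the exact quoting accepted by SET may vary by Pig version):

```pig
A = LOAD 'input' USING PigStorage(',');

-- The directory suffix selects the codec automatically.
STORE A INTO 'output.bz2' USING PigStorage(',');

-- Alternatively, set the compression job properties explicitly.
SET output.compression.enabled true;
SET output.compression.codec 'org.apache.hadoop.io.compress.GzipCodec';
STORE A INTO 'output' USING PigStorage(',');
```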