How to use the setInputSequenceFile method in org.apache.avro.mapred.AvroJob

Best Java code snippets using org.apache.avro.mapred.AvroJob.setInputSequenceFile (Showing top 2 results out of 315)

@Test
public void testSequenceFileInputFormat() throws Exception {
 JobConf job = new JobConf();
 Path outputPath = new Path(OUTPUT_DIR.getRoot().getPath());
 // remove any previous output recursively (the one-argument delete is deprecated)
 outputPath.getFileSystem(job).delete(outputPath, true);
 // configure input for Avro from sequence file
 AvroJob.setInputSequenceFile(job);
 FileInputFormat.setInputPaths(job, file().toURI().toString());
 AvroJob.setInputSchema(job, SCHEMA);
 // mapper is default, identity
 // reducer is default, identity
 // configure output for avro
 AvroJob.setOutputSchema(job, SCHEMA);
 FileOutputFormat.setOutputPath(job, outputPath);
 JobClient.runJob(job);
 checkFile(new DataFileReader<>
      (new File(outputPath.toString() + "/part-00000.avro"),
       new SpecificDatumReader<>()));
}

@Test
public void testNonAvroReducer() throws Exception {
 JobConf job = new JobConf();
 Path outputPath = new Path(OUTPUT_DIR.getRoot().getPath());
 // remove any previous output recursively (the one-argument delete is deprecated)
 outputPath.getFileSystem(job).delete(outputPath, true);
 // configure input for Avro from sequence file
 AvroJob.setInputSequenceFile(job);
 AvroJob.setInputSchema(job, SCHEMA);
 FileInputFormat.setInputPaths(job, file().toURI().toString());
 // mapper is default, identity
 // use a plain Hadoop reducer that consumes Avro input
 AvroJob.setMapOutputSchema(job, SCHEMA);
 job.setReducerClass(NonAvroReducer.class);
 // configure output as a non-Avro SequenceFile
 job.setOutputFormat(SequenceFileOutputFormat.class);
 FileOutputFormat.setOutputPath(job, outputPath);
 // output key/value classes are default, LongWritable/Text
 JobClient.runJob(job);
 checkFile(new SequenceFileReader<>
      (new File(outputPath.toString() + "/part-00000")));
}

Javadoc

Indicate that a job's input files are in SequenceFile format.

Popular methods of AvroJob

getOutputSchema
Return a job's output key schema.
getMapOutputSchema
Return a job's map output key schema.
setInputSchema
Configure a job's map input schema.
setOutputSchema
Configure a job's output schema. Unless this is a map-only job, this must be a Pair schema.
setMapperClass
Configure a job's mapper implementation.
setReducerClass
Configure a job's reducer implementation.
getInputSchema
Return a job's map input schema.
setCombinerClass
Configure a job's combiner implementation.
setMapOutputSchema
Configure a job's map output schema. The map output schema defaults to the output schema and need on
setOutputCodec
Configure a job's output compression codec.
configureAvroInput
configureAvroJob
