How to use the eu.stratosphere.api.java.record.io.TextInputFormat constructor

Best Java code snippets using eu.stratosphere.api.java.record.io.TextInputFormat.<init> (Showing top 5 results out of 315)

@Override
public Plan getPlan(String... args) {
  // parse job parameters
  int numSubTasks   = (args.length > 0 ? Integer.parseInt(args[0]) : 1);
  String dataInput = (args.length > 1 ? args[1] : "");
  String output    = (args.length > 2 ? args[2] : "");
  FileDataSource source = new FileDataSource(new TextInputFormat(), dataInput, "Input Lines");
  MapOperator mapper = MapOperator.builder(new TokenizeLine())
    .input(source)
    .name("Tokenize Lines")
    .build();
  ReduceOperator reducer = ReduceOperator.builder(CountWords.class, StringValue.class, 0)
    .input(mapper)
    .name("Count Words")
    .build();
  
  @SuppressWarnings("unchecked")
  FileDataSink out = new FileDataSink(new CsvOutputFormat("\n", " ", StringValue.class, IntValue.class), output, reducer, "Word Counts");
  
  Plan plan = new Plan(out, "WordCount Example");
  plan.setDefaultParallelism(numSubTasks);
  return plan;
}
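The plan above only wires operators together; the snippet does not show `TokenizeLine` or `CountWords`. As a rough illustration of the dataflow it describes (read text lines, tokenize them, count per word, write `word count` records), here is a minimal plain-Java sketch that needs no Stratosphere runtime. The class `WordCountSketch` and its methods are hypothetical stand-ins, and the tokenization rule (lowercase, split on non-word characters) is an assumption rather than the library's documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Hypothetical stand-in for TokenizeLine (assumption):
    // lowercase the line and split it on runs of non-word characters.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) { // split() may yield a leading empty token
                words.add(token);
            }
        }
        return words;
    }

    // Hypothetical stand-in for CountWords: group by word and sum the occurrences.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted only to make output stable
        for (String line : lines) {
            for (String word : tokenize(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Imitates the sink settings used above: " " between fields, "\n" after each record.
    static String format(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            sb.append(entry.getKey()).append(' ').append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "To be, or not?");
        System.out.print(format(countWords(lines)));
    }
}
```

In the real plan, the key position `0` passed to `ReduceOperator.builder` plays the role that the map key plays in this sketch: records are grouped on the word field before counting.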

static Plan getTestPlanPlan(int numSubTasks, String input, String output) {
  
  FileDataSource source = new FileDataSource(new TextInputFormat(), input, "Input Lines");
  source.setParameter(TextInputFormat.CHARSET_NAME, "ASCII");
  MapOperator mapper = MapOperator.builder(new TokenizeLine())
    .input(source)
    .name("Tokenize Lines")
    .build();
  ReduceOperator reducer = ReduceOperator.builder(CountWords.class, StringValue.class, 0)
    .input(mapper)
    .name("Count Words")
    .build();
  @SuppressWarnings("unchecked")
  FileDataSink out = new FileDataSink(new CsvOutputFormat("\n"," ", StringValue.class, IntValue.class), output, reducer, "Word Counts");
  Plan plan = new Plan(out, "WordCount Example");
  plan.setDefaultParallelism(numSubTasks);
  
  return plan;
}

public Plan getPlan(int numSubTasks, String dataInput, String output) {
  // input is {word, count} pair
  FileDataSource source = new FileDataSource(new TextInputFormat(), dataInput, "Input Lines");
  // do a selection using cached file
  MapOperator mapper = MapOperator.builder(new TokenizeLine())
    .input(source)
    .name("Tokenize Lines")
    .build();
  FileDataSink out = new FileDataSink(new CsvOutputFormat(), output, mapper, "Selection");
  CsvOutputFormat.configureRecordFormat(out)
    .recordDelimiter('\n')
    .fieldDelimiter(' ')
    .field(StringValue.class, 0)
    .field(IntValue.class, 1);
  Plan plan = new Plan(out, "Distributed Cache");
  plan.setDefaultParallelism(numSubTasks);
  return plan;
}

@Override
public Plan getPlan(String... args) {
  int numSubTasks = (args.length > 0 ? Integer.parseInt(args[0]) : 1);
  String dataInput = (args.length > 1 ? args[1] : "");
  String output = (args.length > 2 ? args[2] : "");
  FileDataSource source = new FileDataSource(new TextInputFormat(), dataInput, "Input Lines");
  MapOperator mapper = MapOperator.builder(new TokenizeLine())
    .input(source)
    .name("Tokenize Lines")
    .build();
  ReduceOperator reducer = ReduceOperator.builder(CountWords.class, StringValue.class, 0)
    .input(mapper)
    .name("Count Words")
    .build();
  FileDataSink out = new FileDataSink(new CsvOutputFormat(), output, reducer, "Word Counts");
  CsvOutputFormat.configureRecordFormat(out)
    .recordDelimiter('\n')
    .fieldDelimiter(' ')
    .field(StringValue.class, 0)
    .field(IntValue.class, 1);
  Plan plan = new Plan(out, "WordCount Example");
  plan.setDefaultParallelism(numSubTasks);
  return plan;
}

private void checkWordCountWithSortedSink(boolean estimates) {
  try {
    FileDataSource sourceNode = new FileDataSource(new TextInputFormat(), IN_FILE, "Input Lines");
    MapOperator mapNode = MapOperator.builder(new TokenizeLine())
      .input(sourceNode)
