org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.readFile Java code examples

/**
 * Reads the contents of the user-specified {@code filePath} based on the given {@link FileInputFormat}.
 *
 * <p>Since all data streams need specific information about their types, this method needs to determine the
 * type of the data produced by the input format. It will attempt to determine the data type by reflection,
 * unless the input format implements the {@link org.apache.flink.api.java.typeutils.ResultTypeQueryable} interface.
 * In the latter case, this method will invoke the
 * {@link org.apache.flink.api.java.typeutils.ResultTypeQueryable#getProducedType()} method to determine data
 * type produced by the input format.
 *
 * <p><b>NOTES ON CHECKPOINTING: </b> The source monitors the path, creates the
 * {@link org.apache.flink.core.fs.FileInputSplit FileInputSplits} to be processed,
 * forwards them to the downstream {@link ContinuousFileReaderOperator readers} to read the actual data,
 * and exits, without waiting for the readers to finish reading. This implies that no more checkpoint
 * barriers are going to be forwarded after the source exits, thus having no checkpoints after that point.
 *
 * @param filePath
 *         The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path")
 * @param inputFormat
 *         The input format used to create the data stream
 * @param <OUT>
 *         The type of the returned data stream
 * @return The data stream that represents the data read from the given file
 */
public <OUT> DataStreamSource<OUT> readFile(FileInputFormat<OUT> inputFormat,
                      String filePath) {
  return readFile(inputFormat, filePath, FileProcessingMode.PROCESS_ONCE, -1);
}

      "explicitly by using the 'createInput(InputFormat, TypeInformation)' method instead.");
return readFile(inputFormat, filePath, watchType, interval, typeInformation);

/**
 * Reads the given file line-by-line and creates a data stream that contains a string with the
 * contents of each such line. The {@link java.nio.charset.Charset} with the given name will be
 * used to read the files.
 *
 * <p><b>NOTES ON CHECKPOINTING: </b> The source monitors the path, creates the
 * {@link org.apache.flink.core.fs.FileInputSplit FileInputSplits} to be processed,
 * forwards them to the downstream {@link ContinuousFileReaderOperator readers} to read the actual data,
 * and exits, without waiting for the readers to finish reading. This implies that no more checkpoint
 * barriers are going to be forwarded after the source exits, thus having no checkpoints after that point.
 *
 * @param filePath
 *         The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path")
 * @param charsetName
 *         The name of the character set used to read the file
 * @return The data stream that represents the data read from the given file as text lines
 */
public DataStreamSource<String> readTextFile(String filePath, String charsetName) {
  Preconditions.checkNotNull(filePath, "The file path must not be null.");
  // checkNotNull on a boolean never fails; checkArgument actually rejects an empty path.
  Preconditions.checkArgument(!filePath.isEmpty(), "The file path must not be empty.");
  TextInputFormat format = new TextInputFormat(new Path(filePath));
  format.setFilesFilter(FilePathFilter.createDefaultFilter());
  TypeInformation<String> typeInfo = BasicTypeInfo.STRING_TYPE_INFO;
  format.setCharsetName(charsetName);
  return readFile(format, filePath, FileProcessingMode.PROCESS_ONCE, -1, typeInfo);
}
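The effect of the charsetName parameter can be illustrated without Flink. The following is a minimal sketch in plain Java, not Flink's implementation: the class CharsetLinesSketch and its decodeLines helper are hypothetical names introduced here, and the sketch works on an in-memory byte array rather than a file. It shows that the same bytes yield different lines depending on the charset name used to decode them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Flink. */
final class CharsetLinesSketch {

    /** Decodes the bytes with the named charset and splits the text into lines. */
    static List<String> decodeLines(byte[] data, String charsetName) {
        // Charset.forName throws an unchecked exception for unknown or illegal names.
        Charset charset = Charset.forName(charsetName);
        String text = new String(data, charset);
        if (text.isEmpty()) {
            return new ArrayList<>();
        }
        // \R matches any line break; trailing empty strings are dropped by split.
        return new ArrayList<>(Arrays.asList(text.split("\\R")));
    }

    public static void main(String[] args) {
        // "café" and "naïve" encoded as ISO-8859-1 (one byte per character).
        byte[] latin1 = "caf\u00e9\nna\u00efve\n".getBytes(StandardCharsets.ISO_8859_1);
        // Decoding with the matching charset recovers the original lines.
        System.out.println(decodeLines(latin1, "ISO-8859-1"));
        // Decoding with a different charset does not.
        System.out.println(decodeLines(latin1, "UTF-8"));
    }
}
```

With readTextFile, the same consideration applies to the charsetName argument: it should name the encoding the files were actually written in.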

@Override
public void testProgram(StreamExecutionEnvironment env) {
  // set the restart strategy.
  env.getConfig().setRestartStrategy(RestartStrategies.fixedDelayRestart(NO_OF_RETRIES, 0));
  env.enableCheckpointing(10);
  // create and start the file creating thread.
  fc = new FileCreator();
  fc.start();
  // create the monitoring source along with the necessary readers.
  TextInputFormat format = new TextInputFormat(new org.apache.flink.core.fs.Path(localFsURI));
  format.setFilesFilter(FilePathFilter.createDefaultFilter());
  DataStream<String> inputStream = env.readFile(format, localFsURI,
    FileProcessingMode.PROCESS_CONTINUOUSLY, INTERVAL);
  TestingSinkFunction sink = new TestingSinkFunction();
  inputStream.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public void flatMap(String value, Collector<String> out) throws Exception {
      out.collect(value);
    }
  }).addSink(sink).setParallelism(1);
}

Javadoc

Reads the contents of the user-specified filePath based on the given FileInputFormat.

Since all data streams need specific information about their types, this method needs to determine the type of the data produced by the input format. It will attempt to determine the data type by reflection, unless the input format implements the org.apache.flink.api.java.typeutils.ResultTypeQueryable interface. In the latter case, this method will invoke the org.apache.flink.api.java.typeutils.ResultTypeQueryable#getProducedType() method to determine the data type produced by the input format.

NOTES ON CHECKPOINTING: The source monitors the path, creates the FileInputSplits (org.apache.flink.core.fs.FileInputSplit) to be processed, forwards them to the downstream readers (ContinuousFileReaderOperator) to read the actual data, and exits without waiting for the readers to finish reading. This implies that no more checkpoint barriers are forwarded after the source exits, so no checkpoints are taken after that point.
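The type-determination rule described in the Javadoc above (ask the format if it can report its type, otherwise fall back to reflection) can be sketched in plain Java. This is only an illustration under stated assumptions: ProducesType, SketchFormat, LineFormat, ExplicitFormat and TypeResolutionSketch are hypothetical stand-ins invented for this sketch, not Flink classes, and Flink's real type extraction handles many more cases.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

/** Hypothetical stand-in for the role of ResultTypeQueryable; not the real interface. */
interface ProducesType {
    Class<?> getProducedType();
}

/** Hypothetical stand-in for an input format with one type parameter. */
abstract class SketchFormat<T> {
}

/** Declares its element type only through its generic superclass. */
class LineFormat extends SketchFormat<String> {
}

/** Reports its element type explicitly, so no reflection is needed. */
class ExplicitFormat<T> extends SketchFormat<T> implements ProducesType {
    private final Class<T> type;

    ExplicitFormat(Class<T> type) {
        this.type = type;
    }

    @Override
    public Class<?> getProducedType() {
        return type;
    }
}

final class TypeResolutionSketch {

    /** Asks the format first; otherwise inspects the generic superclass by reflection. */
    static Class<?> resolve(SketchFormat<?> format) {
        if (format instanceof ProducesType) {
            return ((ProducesType) format).getProducedType();
        }
        Type superType = format.getClass().getGenericSuperclass();
        if (superType instanceof ParameterizedType) {
            Type arg = ((ParameterizedType) superType).getActualTypeArguments()[0];
            if (arg instanceof Class) {
                return (Class<?>) arg;
            }
        }
        // Reflection could not recover a concrete type (for example, it was erased).
        throw new IllegalArgumentException(
            "Cannot infer the produced type; supply it explicitly.");
    }

    public static void main(String[] args) {
        System.out.println(resolve(new LineFormat()));
        System.out.println(resolve(new ExplicitFormat<>(Integer.class)));
    }
}
```

When reflection cannot recover a concrete type, the sketch throws, which mirrors the situation in which the caller has to pass the type information explicitly.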

Popular methods of StreamExecutionEnvironment

execute
getExecutionEnvironment
Creates an execution environment that represents the context in which the program is currently execu
addSource
Adds a data source with a custom type information thus opening a DataStream. Only in very special cas
getConfig
Gets the config object.
enableCheckpointing
Enables checkpointing for the streaming job. The distributed state of the streaming dataflow will be
setStreamTimeCharacteristic
Sets the time characteristic for all streams created from this environment, e.g., processing time, ev
setParallelism
Sets the parallelism for operations executed through this environment. Setting a parallelism of x he
fromElements
Creates a new data stream that contains the given elements. The elements must all be of the same typ
setStateBackend
Sets the state backend that describes how to store and checkpoint operator state. It defines both wh
createLocalEnvironment
Creates a LocalStreamEnvironment. The local execution environment will run the program in a multi-th
fromCollection
Creates a data stream from the given iterator. Because the iterator will remain unmodified until the
getCheckpointConfig
Gets the checkpoint config, which defines values like checkpoint interval, delay between checkpoints

How to use the readFile method in org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

Best Java code snippets using org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.readFile (Showing top 20 results out of 315)