org.apache.mahout.clustering.topdown.postprocessor.ClusterOutputPostProcessorDriver Java code examples

/**
 * Post processes the output of a clustering algorithm and groups the clustered vectors by cluster. Each
 * cluster's vectors are written into a directory named after its clusterId.
 *
 * @param input         The output path that was given to the clustering algorithm and whose contents will be
 *                      post processed. Hint: the path of the directory containing clusters-*-final and
 *                      clusteredPoints.
 * @param output        The path where the post processed data will be stored.
 * @param runSequential If true, post processes sequentially; otherwise uses MapReduce. Hint: if the clustering
 *                      was run sequentially, run the post processor sequentially too, and vice versa.
 */
public static void run(Path input, Path output, boolean runSequential) throws IOException,
    InterruptedException,
    ClassNotFoundException {
 if (runSequential) {
  postProcessSeq(input, output);
 } else {
  Configuration conf = new Configuration();
  postProcessMR(conf, input, output);
  movePartFilesToRespectiveDirectories(conf, output);
 }
}
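
The grouping that the Javadoc above describes can be sketched in plain Java. This is only an illustration under stated assumptions, not Mahout's implementation: `ClusterGroupingSketch`, `ClusteredPoint`, `groupAndWrite`, and the `part-0` file name are hypothetical, and vectors are stood in for by strings on the local file system rather than Hadoop sequence files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; this is not Mahout's implementation.
class ClusterGroupingSketch {

  /** Hypothetical stand-in for one clustered point: its cluster id and a text form of its vector. */
  static final class ClusteredPoint {
    final int clusterId;
    final String vector;

    ClusteredPoint(int clusterId, String vector) {
      this.clusterId = clusterId;
      this.vector = vector;
    }
  }

  /** Groups points by cluster id, then writes each group to outputDir/<clusterId>/part-0. */
  static Map<Integer, List<String>> groupAndWrite(List<ClusteredPoint> points, Path outputDir)
      throws IOException {
    Map<Integer, List<String>> byCluster = new TreeMap<>();
    for (ClusteredPoint point : points) {
      byCluster.computeIfAbsent(point.clusterId, id -> new ArrayList<>()).add(point.vector);
    }
    for (Map.Entry<Integer, List<String>> entry : byCluster.entrySet()) {
      // One directory per cluster, named after the cluster id.
      Path clusterDir = outputDir.resolve(String.valueOf(entry.getKey()));
      Files.createDirectories(clusterDir);
      Files.write(clusterDir.resolve("part-0"), entry.getValue(), StandardCharsets.UTF_8);
    }
    return byCluster;
  }

  public static void main(String[] args) throws IOException {
    List<ClusteredPoint> points = Arrays.asList(
        new ClusteredPoint(7, "[1.0, 1.0]"),
        new ClusteredPoint(3, "[9.0, 8.0]"),
        new ClusteredPoint(7, "[1.5, 0.5]"));
    Path outputDir = Files.createTempDirectory("postprocess-sketch");
    Map<Integer, List<String>> grouped = groupAndWrite(points, outputDir);
    System.out.println(grouped.keySet() + " written under " + outputDir);
  }
}
```

The real driver reads the clustering output from the `input` path and works through Hadoop's file system API; the sketch only shows the resulting directory-per-cluster layout.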

/**
 * CLI to run the clustering post processor. The input to the post processor is the output path that was
 * specified for the clustering run.
 */
@Override
public int run(String[] args) throws Exception {
 addInputOption();
 addOutputOption();
 addOption(DefaultOptionCreator.methodOption().create());
 addOption(DefaultOptionCreator.overwriteOption().create());
 if (parseArguments(args) == null) {
  return -1;
 }
 Path input = getInputPath();
 Path output = getOutputPath();
 if (hasOption(DefaultOptionCreator.OVERWRITE_OPTION)) {
  HadoopUtil.delete(getConf(), output);
 }
 boolean runSequential = getOption(DefaultOptionCreator.METHOD_OPTION).equalsIgnoreCase(
     DefaultOptionCreator.SEQUENTIAL_METHOD);
 run(input, output, runSequential);
 return 0;
}
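
The control flow of the CLI method above (parse the options, fail with -1 on bad input, honor the overwrite flag, pick sequential or MapReduce from the method option) can be sketched without Hadoop or Mahout on the classpath. Everything in this sketch is an assumption made for illustration: the class `PostProcessorCliSketch`, the flag spellings `--input`, `--output`, `--method`, and `--overwrite`, and the choice of MapReduce as the default when no method is given.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of the option handling; flag names and defaults are assumptions.
class PostProcessorCliSketch {

  /** Parses "--name value" pairs and the bare "--overwrite" flag; returns null on bad input. */
  static Map<String, String> parse(String[] args) {
    Map<String, String> options = new HashMap<>();
    for (int i = 0; i < args.length; i++) {
      if ("--overwrite".equals(args[i])) {
        options.put("overwrite", "true");
      } else if (args[i].startsWith("--") && i + 1 < args.length) {
        options.put(args[i].substring(2), args[++i]);
      } else {
        return null;
      }
    }
    // Both paths are required, as with addInputOption() and addOutputOption().
    boolean hasPaths = options.containsKey("input") && options.containsKey("output");
    return hasPaths ? options : null;
  }

  /** Mirrors the driver's check: only the value "sequential" (in any case) selects sequential mode. */
  static boolean isSequential(Map<String, String> options) {
    // Assumption: MapReduce is used when --method is omitted.
    return options.getOrDefault("method", "mapreduce").equalsIgnoreCase("sequential");
  }

  public static void main(String[] args) {
    Map<String, String> options = parse(args);
    if (options == null) {
      System.err.println(
          "usage: --input <path> --output <path> [--method sequential|mapreduce] [--overwrite]");
      System.exit(-1);
    }
    System.out.println("sequential=" + isSequential(options)
        + " overwrite=" + options.containsKey("overwrite"));
  }
}
```

In the driver itself this work is delegated to `parseArguments`, `hasOption`, and `getOption`; the sketch only makes the decision points explicit.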

public static void main(String[] args) throws Exception {
 ToolRunner.run(new Configuration(), new ClusterOutputPostProcessorDriver(), args);
}

Javadoc

Post processes the output of clustering algorithms and groups the clustered vectors into their respective clusters. It is ideal for top-down clustering, and it can also be used whenever clustering output needs to be grouped by cluster.

Most used methods

<init>
Constructor to be used by the ToolRunner.
addInputOption
addOption
addOutputOption
getConf
getInputPath
getOption
getOutputPath
hasOption
movePartFilesToRespectiveDirectories
The mapreduce version of the post processor writes different clusters into different part files. Thi
parseArguments
postProcessMR
Process as a map reduce job. The numberOfReduceTasks is set to the number of clusters present in the
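
The idea behind `movePartFilesToRespectiveDirectories`, relocating each part file into a directory for its cluster, can be sketched on the local file system. This is a hedged illustration and not the Mahout code: `MovePartFilesSketch` and `movePartFiles` are hypothetical names, and the sketch assumes plain-text part files in which every line is `<clusterId>` followed by a tab and the vector, with one cluster per file. The real method operates on Hadoop sequence files through a `Configuration`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; file format and names are assumptions.
class MovePartFilesSketch {

  /**
   * Moves every non-empty outputDir/part-* file to outputDir/<clusterId>/<fileName>.
   * Assumption: each line is "<clusterId>\t<vector>" and a part file holds a single cluster.
   */
  static void movePartFiles(Path outputDir) throws IOException {
    // Collect first, so the directory is not modified while it is being listed.
    List<Path> partFiles = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
      for (Path candidate : stream) {
        if (Files.isRegularFile(candidate)) {
          partFiles.add(candidate);
        }
      }
    }
    for (Path partFile : partFiles) {
      List<String> lines = Files.readAllLines(partFile);
      if (lines.isEmpty()) {
        continue; // an empty part file carries no cluster id, so it stays where it is
      }
      String clusterId = lines.get(0).split("\t", 2)[0];
      Path clusterDir = outputDir.resolve(clusterId);
      Files.createDirectories(clusterDir);
      Files.move(partFile, clusterDir.resolve(partFile.getFileName()));
    }
  }

  public static void main(String[] args) throws IOException {
    Path outputDir = Files.createTempDirectory("move-sketch");
    Files.write(outputDir.resolve("part-r-00000"), Arrays.asList("3\t[9.0, 8.0]"));
    Files.write(outputDir.resolve("part-r-00001"), Arrays.asList("7\t[1.0, 1.0]", "7\t[1.5, 0.5]"));
    movePartFiles(outputDir);
    System.out.println("part files regrouped under " + outputDir);
  }
}
```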
