/**
 * Splits this dataset into train and test sets after a random shuffle.
 * The shuffle happens in place, so this dataset is modified before the split.
 *
 * @param numHoldout the number of examples to hold out for the training set
 * @param rng        random number generator used to seed the in-place shuffle
 * @return the train/test split of this dataset
 */
@Override
public SplitTestAndTrain splitTestAndTrain(int numHoldout, Random rng) {
    // Draw a seed from the caller's RNG so the shuffle is reproducible
    // when the caller seeds rng, then delegate to the count-based split.
    this.shuffle(rng.nextLong());
    return splitTestAndTrain(numHoldout);
}
/**
 * Splits this dataset into train and test sets by fraction.
 *
 * @param fractionTrain fraction of examples to use for training; must be
 *                      strictly between 0.0 and 1.0
 * @return the train/test split of this dataset
 * @throws IllegalArgumentException if {@code fractionTrain} is out of range
 */
@Override
public SplitTestAndTrain splitTestAndTrain(double fractionTrain) {
    Preconditions.checkArgument(fractionTrain > 0.0 && fractionTrain < 1.0,
            "Train fraction must be > 0.0 and < 1.0 - got %s", fractionTrain);
    // Truncate toward zero, but never let the training set be empty.
    int trainCount = Math.max(1, (int) (fractionTrain * numExamples()));
    return splitTestAndTrain(trainCount);
}
/**
 * Splits this dataset into train and test sets by fraction.
 *
 * @param percentTrain fraction of examples to use for training; must be
 *                     strictly between 0.0 and 1.0
 * @return the train/test split of this dataset
 * @throws IllegalArgumentException if {@code percentTrain} is out of range
 */
@Override
public SplitTestAndTrain splitTestAndTrain(double percentTrain) {
    // Validate the fraction explicitly: the previous code silently accepted
    // values >= 1.0, which produced a degenerate split with an empty test set.
    if (percentTrain <= 0.0 || percentTrain >= 1.0) {
        throw new IllegalArgumentException(
                "Train fraction must be > 0.0 and < 1.0 - got " + percentTrain);
    }
    int numPercent = (int) (percentTrain * numExamples());
    // Truncation can yield 0 for small datasets; guarantee at least one
    // training example.
    if (numPercent <= 0)
        numPercent = 1;
    return splitTestAndTrain(numPercent);
}
/**
 * Splits a dataset into test and train randomly.
 * This will modify the dataset in place to shuffle it before splitting into test/train!
 *
 * @param numHoldout the number of examples to hold out for the training set
 * @param rng Random Number Generator used to seed the in-place shuffle
 * @return the pair of datasets for the train test split
 */
@Override
public SplitTestAndTrain splitTestAndTrain(int numHoldout, Random rng) {
    // Draw a seed from the caller's RNG: seeding the caller's rng makes the
    // shuffle (and therefore the split) reproducible.
    long seed = rng.nextLong();
    this.shuffle(seed);
    return splitTestAndTrain(numHoldout);
}
// Pull one batch from the iterator — assumes the batch size covers the whole
// dataset so allData really is all the data; TODO confirm batchSize at the caller.
DataSet allData = iterator.next();
// Shuffle in place with a fixed seed for a reproducible split.
allData.shuffle(seed);
// NOTE(review): previous comment said "65%" but the actual fraction is whatever
// trainPercent holds — use trainPercent of the data for training.
SplitTestAndTrain testAndTrain = allData.splitTestAndTrain(trainPercent);
private void createDataSource() throws IOException, InterruptedException { //First: get the dataset using the record reader. CSVRecordReader handles loading/parsing int numLinesToSkip = 0; String delimiter = ","; RecordReader recordReader = new CSVRecordReader(numLinesToSkip, delimiter); recordReader.initialize(new InputStreamInputSplit(dataFile)); //Second: the RecordReaderDataSetIterator handles conversion to DataSet objects, ready for use in neural network int labelIndex = 11; DataSetIterator iterator = new RecordReaderDataSetIterator(recordReader, batchSize, labelIndex, labelIndex, true); DataSet allData = iterator.next(); SplitTestAndTrain testAndTrain = allData.splitTestAndTrain(0.80); //Use 80% of data for training trainingData = testAndTrain.getTrain(); testData = testAndTrain.getTest(); //We need to normalize our data. We'll use NormalizeStandardize (which gives us mean 0, unit variance): DataNormalization normalizer = new NormalizerStandardize(); normalizer.fit(trainingData); //Collect the statistics (mean/stdev) from the training data. This does not modify the input data normalizer.transform(trainingData); //Apply normalization to the training data normalizer.transform(testData); //Apply normalization to the test data. This is using statistics calculated from the *training* set }
private void createDataSource() throws IOException, InterruptedException { //First: get the dataset using the record reader. CSVRecordReader handles loading/parsing int numLinesToSkip = 0; String delimiter = ","; RecordReader recordReader = new CSVRecordReader(numLinesToSkip, delimiter); recordReader.initialize(new InputStreamInputSplit(dataFile)); //Second: the RecordReaderDataSetIterator handles conversion to DataSet objects, ready for use in neural network int labelIndex = 4; //5 values in each row of the iris.txt CSV: 4 input features followed by an integer label (class) index. Labels are the 5th value (index 4) in each row int numClasses = 3; //3 classes (types of iris flowers) in the iris data set. Classes have integer values 0, 1 or 2 DataSetIterator iterator = new RecordReaderDataSetIterator(recordReader, batchSize, labelIndex, numClasses); DataSet allData = iterator.next(); allData.shuffle(); SplitTestAndTrain testAndTrain = allData.splitTestAndTrain(0.80); //Use 80% of data for training trainingData = testAndTrain.getTrain(); testData = testAndTrain.getTest(); //We need to normalize our data. We'll use NormalizeStandardize (which gives us mean 0, unit variance): DataNormalization normalizer = new NormalizerStandardize(); normalizer.fit(trainingData); //Collect the statistics (mean/stdev) from the training data. This does not modify the input data normalizer.transform(trainingData); //Apply normalization to the training data normalizer.transform(testData); //Apply normalization to the test data. This is using statistics calculated from the *training* set }
// Split 80/20: 80% of the examples go to the training set, 20% to the test set.
SplitTestAndTrain testAndTrain = allData.splitTestAndTrain(0.80);