org.apache.hadoop.util.bloom.DynamicBloomFilter Java code examples

private synchronized void initBloomFilter(Configuration conf) {
 numKeys = conf.getInt(
   IO_MAPFILE_BLOOM_SIZE_KEY, IO_MAPFILE_BLOOM_SIZE_DEFAULT);
 // vector size should be <code>-kn / (ln(1 - c^(1/k)))</code> bits for
 // single key, where <code>k</code> is the number of hash functions,
 // <code>n</code> is the number of keys and <code>c</code> is the desired
 // max. error rate.
 // Our desired error rate is by default 0.005, i.e. 0.5%
 float errorRate = conf.getFloat(
   IO_MAPFILE_BLOOM_ERROR_RATE_KEY, IO_MAPFILE_BLOOM_ERROR_RATE_DEFAULT);
 vectorSize = (int)Math.ceil((double)(-HASH_COUNT * numKeys) /
   Math.log(1.0 - Math.pow(errorRate, 1.0/HASH_COUNT)));
 bloomFilter = new DynamicBloomFilter(vectorSize, HASH_COUNT,
   Hash.getHashType(conf), numKeys);
}
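
The sizing formula in the comment above, -kn / ln(1 - c^(1/k)), can be tried in isolation. The following is a minimal sketch using only the JDK; the class and method names are hypothetical, and k = 5 is an assumed value for HASH_COUNT, which the snippet does not show. The key count and error rate are the defaults that appear in the snippets on this page (1024 * 1024 keys, 0.005).

```java
public final class BloomSizing {

    /**
     * Bits needed for n keys with k hash functions and a target
     * false-positive rate c, using -k*n / ln(1 - c^(1/k)).
     */
    public static int vectorSize(int k, int n, double c) {
        // Multiply in double so that k * n cannot overflow int.
        double bits = (-(double) k * n) / Math.log(1.0 - Math.pow(c, 1.0 / k));
        return (int) Math.ceil(bits);
    }

    public static void main(String[] args) {
        int k = 5;              // assumed value of HASH_COUNT
        int n = 1024 * 1024;    // default key count used in the snippets
        double c = 0.005;       // default error rate used in the snippets
        int m = vectorSize(k, n, c);
        System.out.printf("bits=%d, bits per key=%.2f%n", m, (double) m / n);
    }
}
```

Lowering the error rate or raising the key count increases the vector size, which is why both values are read from the configuration rather than hard-coded.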

@Override
public synchronized void append(WritableComparable key, Writable val)
  throws IOException {
 super.append(key, val);
 buf.reset();
 key.write(buf);
 bloomKey.set(byteArrayForBloomKey(buf), 1.0);
 bloomFilter.add(bloomKey);
}

@Override
public void add(Key key) {
 if (key == null) {
  throw new NullPointerException("Key can not be null");
 }
 BloomFilter bf = getActiveStandardBF();
 if (bf == null) {
  addRow();
  bf = matrix[matrix.length - 1];
  currentNbRecord = 0;
 }
 bf.add(key);
 currentNbRecord++;
}

private void initBloomFilter(Path dirName, 
               Configuration conf) {
 
 DataInputStream in = null;
 try {
  FileSystem fs = dirName.getFileSystem(conf);
  in = fs.open(new Path(dirName, BLOOM_FILE_NAME));
  bloomFilter = new DynamicBloomFilter();
  bloomFilter.readFields(in);
  in.close();
  in = null;
 } catch (IOException ioe) {
  LOG.warn("Can't open BloomFilter: " + ioe + " - fallback to MapFile.");
  bloomFilter = null;
 } finally {
  IOUtils.closeStream(in);
 }
}

@Override
public synchronized void close() throws IOException {
 super.close();
 DataOutputStream out = fs.create(new Path(dir, BLOOM_FILE_NAME), true);
 try {
  bloomFilter.write(out);
  out.flush();
  out.close();
  out = null;
 } finally {
  IOUtils.closeStream(out);
 }
}

/**
 * Checks if this MapFile has the indicated key. The membership test is
 * performed using a Bloom filter, so the result has always non-zero
 * probability of false positives.
 * @param key key to check
 * @return  false iff key doesn't exist, true if key probably exists.
 * @throws IOException
 */
public boolean probablyHasKey(WritableComparable key) throws IOException {
 if (bloomFilter == null) {
  return true;
 }
 buf.reset();
 key.write(buf);
 bloomKey.set(byteArrayForBloomKey(buf), 1.0);
 return bloomFilter.membershipTest(bloomKey);
}
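
The contract of probablyHasKey (false means the key is definitely absent, true means it may be present, and a missing filter always answers true) is what lets a caller skip an expensive lookup. The sketch below illustrates that gating pattern with JDK types only; FilterGatedLookup, its Map-backed store, and the Predicate standing in for the Bloom filter are all hypothetical and are not part of Hadoop.

```java
import java.util.Map;
import java.util.function.Predicate;

public final class FilterGatedLookup {
    private final Map<String, String> store;   // stand-in for the expensive lookup
    private final Predicate<String> filter;    // stand-in for the Bloom filter; may be null
    private int storeReads;                    // counts lookups that reached the store

    public FilterGatedLookup(Map<String, String> store, Predicate<String> filter) {
        this.store = store;
        this.filter = filter;
    }

    /** Mirrors the snippet above: with no filter loaded, every key "probably" exists. */
    public boolean probablyHasKey(String key) {
        return filter == null || filter.test(key);
    }

    /** Consults the store only when the filter cannot rule the key out. */
    public String get(String key) {
        if (!probablyHasKey(key)) {
            return null;                       // definite miss, store is not touched
        }
        storeReads++;
        return store.get(key);                 // may still be null on a false positive
    }

    public int storeReads() {
        return storeReads;
    }
}
```

Answering true when the filter is missing keeps lookups correct at the cost of speed: every key simply falls through to the underlying store.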

private void initBloomFilter(FileSystem fs, String dirName,
  Configuration conf) {
 // try-with-resources closes the stream even if readFields() throws
 try (DataInputStream in = fs.open(new Path(dirName, BLOOM_FILE_NAME))) {
  bloomFilter = new DynamicBloomFilter();
  bloomFilter.readFields(in);
 } catch (IOException ioe) {
  LOG.warn("Can't open BloomFilter: " + ioe + " - fallback to MapFile.");
  bloomFilter = null;
 }
}

@Override
public synchronized void close() throws IOException {
 super.close();
 // try-with-resources closes the stream even if write() throws
 try (DataOutputStream out = fs.create(new Path(dir, BLOOM_FILE_NAME), true)) {
  bloomFilter.write(out);
  out.flush();
 }
}

@Nonnull
public static DynamicBloomFilter newDynamicBloomFilter(
    @Nonnegative final int expectedNumberOfElements, @Nonnegative final float errorRate,
    @Nonnegative final int nbHash) {
  // multiply in double so a large element count cannot overflow int
  int vectorSize = (int) Math.ceil((-nbHash * (double) expectedNumberOfElements)
      / Math.log(1.d - Math.pow(errorRate, 1.d / nbHash)));
  return new DynamicBloomFilter(vectorSize, nbHash, Hash.MURMUR_HASH,
    expectedNumberOfElements);
}

private synchronized void initBloomFilter(Configuration conf) {
 numKeys = conf.getInt("io.mapfile.bloom.size", 1024 * 1024);
 // vector size should be <code>-kn / (ln(1 - c^(1/k)))</code> bits for
 // single key, where <code>k</code> is the number of hash functions,
 // <code>n</code> is the number of keys and <code>c</code> is the desired
 // max. error rate.
 // Our desired error rate is by default 0.005, i.e. 0.5%
 float errorRate = conf.getFloat("io.mapfile.bloom.error.rate", 0.005f);
 vectorSize = (int)Math.ceil((double)(-HASH_COUNT * numKeys) /
   Math.log(1.0 - Math.pow(errorRate, 1.0/HASH_COUNT)));
 bloomFilter = new DynamicBloomFilter(vectorSize, HASH_COUNT,
   Hash.getHashType(conf), numKeys);
}

Javadoc

Implements a dynamic Bloom filter, as defined in the INFOCOM 2006 paper.

A dynamic Bloom filter (DBF) makes use of an s * m bit matrix, but each of the s rows is a standard Bloom filter. The creation process of a DBF is iterative. At the start, the DBF is a 1 * m bit matrix, i.e., it is composed of a single standard Bloom filter. It assumes that nr elements are recorded in the initial bit vector, where nr <= n (n is the cardinality of the set A to record in the filter).

As the size of A grows during the execution of the application, several keys must be inserted in the DBF. When inserting a key into the DBF, one must first get an active Bloom filter in the matrix. A Bloom filter is active when the number of recorded keys, nr, is strictly less than the current cardinality of A, n. If an active Bloom filter is found, the key is inserted and nr is incremented by one. If there is no active Bloom filter, a new one is created (i.e., a new row is added to the matrix) according to the current size of A, the element is added to this new Bloom filter, and its nr value is set to one. A given key is said to belong to the DBF if the k positions are set to one in one of the matrix rows.
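
The row-growth behaviour described above can be sketched without Hadoop. The class below is a toy illustration, not the Hadoop implementation: ToyDynamicBloomFilter, its String keys, and its double-hashing scheme (Arrays.hashCode combined with an FNV-1a hash) are all assumptions made for the example. Only the field names vectorSize, nbHash, nr and currentNbRecord follow the snippets on this page.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public final class ToyDynamicBloomFilter {
    private final int vectorSize;                        // m: bits per row
    private final int nbHash;                            // k: positions set per key
    private final int nr;                                // max keys recorded per row
    private final List<BitSet> rows = new ArrayList<>(); // the s rows of the matrix
    private int currentNbRecord;                         // keys recorded in the last row

    public ToyDynamicBloomFilter(int vectorSize, int nbHash, int nr) {
        if (vectorSize <= 0 || nbHash <= 0 || nr <= 0) {
            throw new IllegalArgumentException("all parameters must be positive");
        }
        this.vectorSize = vectorSize;
        this.nbHash = nbHash;
        this.nr = nr;
        rows.add(new BitSet(vectorSize));                // start as a 1 * m matrix
    }

    public void add(String key) {
        if (key == null) {
            throw new NullPointerException("key must not be null");
        }
        if (currentNbRecord >= nr) {                     // last row is full: no active row
            rows.add(new BitSet(vectorSize));            // add a new row and reset the count
            currentNbRecord = 0;
        }
        BitSet active = rows.get(rows.size() - 1);
        for (int pos : positions(key)) {
            active.set(pos);
        }
        currentNbRecord++;
    }

    /** True if all k positions are set in at least one row. */
    public boolean membershipTest(String key) {
        int[] pos = positions(key);
        for (BitSet row : rows) {
            boolean allSet = true;
            for (int p : pos) {
                if (!row.get(p)) {
                    allSet = false;
                    break;
                }
            }
            if (allSet) {
                return true;
            }
        }
        return false;
    }

    public int rowCount() {
        return rows.size();
    }

    // Double hashing: position i = (h1 + i * h2) mod m. Illustrative only.
    private int[] positions(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(bytes);
        int h2 = fnv1a(bytes) | 1;                       // force odd so the step is never zero
        int[] out = new int[nbHash];
        for (int i = 0; i < nbHash; i++) {
            out[i] = Math.floorMod(h1 + i * h2, vectorSize);
        }
        return out;
    }

    private static int fnv1a(byte[] data) {
        int h = 0x811c9dc5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}
```

With nr = 4, inserting ten keys fills two rows and starts a third. A membership test has to scan every row, so lookups slow down linearly as rows are added; that is the trade-off for not having to know the final set size up front.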

Originally created by European Commission One-Lab Project 034819.

Most used methods

<init>
Constructor. Builds an empty Dynamic Bloom filter.
add
addRow
Adds a new row to this dynamic Bloom filter.
getActiveStandardBF
Returns the active standard Bloom filter in this dynamic Bloom filter.
membershipTest
readFields
write

How to use DynamicBloomFilter in org.apache.hadoop.util.bloom

Best Java code snippets using org.apache.hadoop.util.bloom.DynamicBloomFilter (Showing top 20 results out of 315)
