How to use MemoryPostings in org.terrier.structures.indexing.singlepass

Best Java code snippets using org.terrier.structures.indexing.singlepass.MemoryPostings (Showing top 8 results out of 315)

/**
 * Hook method that creates the right type of MemoryPostings class.
 */
protected void createMemoryPostings(){
  if (useFieldInformation)
    mp = new FieldsMemoryPostings();
  else
    mp = new MemoryPostings();
}
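The hook above lets a subclass decide which postings class the indexer uses. The following minimal sketch illustrates the same hook-method pattern in plain Java. `PostingsStore`, `FieldsPostingsStore`, `SketchIndexer` and `AlwaysSimpleIndexer` are hypothetical names, not Terrier classes.

```java
// Hypothetical stand-ins for MemoryPostings and FieldsMemoryPostings.
class PostingsStore {
    String kind() { return "simple"; }
}

class FieldsPostingsStore extends PostingsStore {
    @Override
    String kind() { return "fields"; }
}

// Hypothetical indexer. It calls the hook whenever it needs fresh postings,
// for example after a flush.
class SketchIndexer {
    protected final boolean useFieldInformation;
    protected PostingsStore mp;

    SketchIndexer(boolean useFieldInformation) {
        this.useFieldInformation = useFieldInformation;
    }

    /** Hook method: picks the postings implementation from a configuration flag. */
    protected void createMemoryPostings() {
        mp = useFieldInformation ? new FieldsPostingsStore() : new PostingsStore();
    }
}

// A subclass can override the hook to change which postings class is used.
class AlwaysSimpleIndexer extends SketchIndexer {
    AlwaysSimpleIndexer() { super(true); }

    @Override
    protected void createMemoryPostings() {
        mp = new PostingsStore(); // ignores the flag entirely
    }
}
```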

/**
 * Add the terms in a DocumentPostingList to the postings in memory.
 * @param docPostings DocumentPostingList containing the term information for the denoted document.
 * @param docid Current document identifier.
 * @throws IOException if an I/O error occurs.
 */
public void addTerms(DocumentPostingList docPostings, int docid) throws IOException {
  for (String term : docPostings.termSet())
    add(term, docid, docPostings.getFrequency(term));
}
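`addTerms` makes one `add(term, docid, frequency)` call per distinct term of the document. The following minimal sketch shows that idea. It assumes a plain `Map<String, Integer>` of term frequencies in place of Terrier's `DocumentPostingList`, and a hash map of posting lists in place of the real in-memory structure. `InMemoryPostings` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory postings: term -> list of {docid, frequency} pairs.
class InMemoryPostings {
    private final Map<String, List<int[]>> postings = new HashMap<>();

    /** Appends one (docid, frequency) record to the posting list of a term. */
    void add(String term, int docid, int frequency) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(new int[] {docid, frequency});
    }

    /** Mirrors addTerms: one add() call per distinct term in the document. */
    void addTerms(Map<String, Integer> termFrequencies, int docid) {
        for (Map.Entry<String, Integer> e : termFrequencies.entrySet()) {
            add(e.getKey(), docid, e.getValue());
        }
    }

    /** Number of documents in which the term has been added so far. */
    int documentFrequency(String term) {
        List<int[]> list = postings.get(term);
        return list == null ? 0 : list.size();
    }

    /** Total occurrences of the term across all added documents. */
    long termFrequency(String term) {
        long total = 0;
        for (int[] p : postings.getOrDefault(term, Collections.emptyList())) {
            total += p[1];
        }
        return total;
    }
}
```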

/**
 * Triggers the writing of the postings in memory to disk. 
 * Uses the default RunWriter, writing to the specified files.
 * @param file names of the two files that the default RunWriter writes the postings to (file[0] and file[1]).
 * @throws IOException if an I/O error occurs.
 */
public void finish(String[] file) throws IOException {
  finish(new RunWriter(file[0], file[1]));
}

/**
 * {@inheritDoc}.
 * This implementation only places content in the runs in memory, which will eventually be flushed to disk.
 */
@Override
protected void indexDocument(Map<String,String> docProperties, DocumentPostingList termsInDocument) throws Exception
{
  if (seenDocnos.contains(docProperties.get("docno"))) return;
  else seenDocnos.add(docProperties.get("docno"));
  
  if (termsInDocument.getDocumentLength() > 0) {
    numberOfDocsSinceCheck++;
    numberOfDocsSinceFlush++;
    
    checkFlush();
    mp.addTerms(termsInDocument, currentId);
    DocumentIndexEntry die = termsInDocument.getDocumentStatistics();
    docIndexBuilder.addEntryToBuffer((FieldScore.FIELDS_COUNT > 0) ? die : new SimpleDocumentIndexEntry(die));
    metaBuilder.writeDocumentEntry(docProperties);
    currentId++;
    numberOfDocuments++;
  }
}
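This variant skips any document whose `docno` has already been seen. The sketch below shows the same guard in one hypothetical class, `DedupingIndexer`. It relies on `Set.add` returning `false` for an element that is already present, which folds the `contains`/`add` pair into a single lookup. The document length is passed in as an `int`, an assumption standing in for `getDocumentLength()`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the duplicate-docno guard in indexDocument().
class DedupingIndexer {
    private final Set<String> seenDocnos = new HashSet<>();
    int currentId = 0;          // next internal document id to assign
    int numberOfDocuments = 0;  // documents actually indexed

    /** Returns true if the document was indexed, false if it was skipped. */
    boolean indexDocument(Map<String, String> docProperties, int documentLength) {
        // Set.add returns false when the docno is already present, so a
        // duplicate is rejected with one hash lookup instead of two.
        if (!seenDocnos.add(docProperties.get("docno"))) {
            return false;
        }
        // As in the original, empty documents are remembered but not indexed.
        if (documentLength > 0) {
            currentId++;
            numberOfDocuments++;
            return true;
        }
        return false;
    }

    /** Convenience helper that builds a properties map holding only a docno. */
    static Map<String, String> props(String docno) {
        Map<String, String> m = new HashMap<>();
        m.put("docno", docno);
        return m;
    }
}
```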

/** Triggers the writing of the postings in memory to the specified
 * RunWriter. If the RunWriter requires terms to be written in sorted
 * order, the postings are sorted before they are written.
 * @param runWriter the RunWriter that receives the postings.
 * @throws IOException if an I/O error occurs.
 */
public void finish(RunWriter runWriter) throws IOException {
  logger.debug("Writing run "+runWriter.toString());
  //only sort the postings if required by the RunWriter
  writeToWriter(runWriter, runWriter.writeSorted() 
      ? new TreeMap<String, Posting>(postings)
      : postings);
  logger.debug(" done");
}
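`finish(RunWriter)` only pays for sorting when the writer asks for it. Copying the hash map into a `TreeMap` yields the terms in natural `String` order. Otherwise, the hash map is iterated as-is. The following minimal sketch shows that choice. `RecordingRunWriter` and `SortOnDemandPostings` are hypothetical stand-ins that record terms instead of writing files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a RunWriter: it records terms in the order received.
class RecordingRunWriter {
    private final boolean sorted;
    final List<String> writtenTerms = new ArrayList<>();

    RecordingRunWriter(boolean sorted) { this.sorted = sorted; }

    boolean writeSorted() { return sorted; }

    void writeTerm(String term, int documentFrequency) { writtenTerms.add(term); }
}

class SortOnDemandPostings {
    final Map<String, Integer> postings = new HashMap<>(); // term -> document frequency

    /** Copies into a TreeMap (sorted by key) only when the writer needs sorted terms. */
    void finish(RecordingRunWriter runWriter) {
        Map<String, Integer> source = runWriter.writeSorted() ? new TreeMap<>(postings) : postings;
        for (Map.Entry<String, Integer> e : source.entrySet()) {
            runWriter.writeTerm(e.getKey(), e.getValue());
        }
    }
}
```

The copy costs O(n log n) time and a second map, so writers that accept arbitrary term order avoid both.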

  // Excerpt begins inside a method; the condition guarding this return is not shown.
  return;
numberOfDocsSinceCheck = 0;
final long consumed = mp.getMemoryConsumption();
boolean doFlush = false;
final boolean memCheck = memoryCheck.checkMemory();
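The excerpt above resets the per-check document counter. It then reads the memory consumed by the postings and the result of a memory check before deciding (through `doFlush`) whether to flush. The sketch below shows one possible policy of that shape. It is an assumption, not Terrier's logic. `FlushPolicy`, `checkInterval`, `maxDocsPerFlush` and the fixed `memoryBudgetBytes` are hypothetical, and the real code consults a `memoryCheck` object rather than a fixed budget.

```java
// Hypothetical flush policy: every checkInterval documents, compare the
// estimated memory consumption of the postings against a fixed budget.
// It also flushes unconditionally after maxDocsPerFlush documents.
class FlushPolicy {
    private final int checkInterval;
    private final int maxDocsPerFlush;
    private final long memoryBudgetBytes;
    private int numberOfDocsSinceCheck = 0;
    private int numberOfDocsSinceFlush = 0;

    FlushPolicy(int checkInterval, int maxDocsPerFlush, long memoryBudgetBytes) {
        this.checkInterval = checkInterval;
        this.maxDocsPerFlush = maxDocsPerFlush;
        this.memoryBudgetBytes = memoryBudgetBytes;
    }

    /** Called once per indexed document; returns true when the postings should be flushed. */
    boolean documentAdded(long consumedBytes) {
        numberOfDocsSinceCheck++;
        numberOfDocsSinceFlush++;
        if (numberOfDocsSinceCheck < checkInterval && numberOfDocsSinceFlush < maxDocsPerFlush) {
            return false; // too early to check again
        }
        numberOfDocsSinceCheck = 0;
        boolean doFlush = consumedBytes > memoryBudgetBytes
                || numberOfDocsSinceFlush >= maxDocsPerFlush;
        if (doFlush) {
            numberOfDocsSinceFlush = 0;
        }
        return doFlush;
    }
}
```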

/**
 * {@inheritDoc}.
 * This implementation only places content in the runs in memory, which will eventually be flushed to disk.
 */
@Override
protected void indexDocument(Map<String,String> docProperties, DocumentPostingList termsInDocument) throws Exception
{
  if (termsInDocument.getDocumentLength() > 0) {
    numberOfDocsSinceCheck++;
    numberOfDocsSinceFlush++;
    
    checkFlush();
    mp.addTerms(termsInDocument, currentId);
    DocumentIndexEntry die = termsInDocument.getDocumentStatistics();
    docIndexBuilder.addEntryToBuffer((FieldScore.FIELDS_COUNT > 0) ? die : new SimpleDocumentIndexEntry(die));
    metaBuilder.writeDocumentEntry(docProperties);
    currentId++;
    numberOfDocuments++;
  }
}

/** Causes the posting lists built up in memory to be flushed out. */
@edu.umd.cs.findbugs.annotations.SuppressWarnings(
    value="DM_GC",
    justification="Forcing GC is an essential part of releasing " +
        "memory for further indexing")
protected void forceFlush() throws IOException
{
  mp.finish(finishMemoryPosting());
  System.gc();
  createMemoryPostings();
  memoryCheck.reset();
  numberOfDocsSinceFlush = 0;
}
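`forceFlush()` writes the current postings out as a run and hints the JVM to collect garbage. It then starts over with empty postings and reset counters. The following minimal sketch shows that cycle. It assumes a list of sorted maps as a stand-in for run files on disk. `RunFlushingIndexer` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-pass indexer that flushes its in-memory postings as "runs".
class RunFlushingIndexer {
    final List<Map<String, Integer>> runs = new ArrayList<>(); // stand-in for run files on disk
    private Map<String, Integer> mp;                            // current postings: term -> doc frequency
    int numberOfDocsSinceFlush = 0;

    RunFlushingIndexer() { createMemoryPostings(); }

    private void createMemoryPostings() { mp = new HashMap<>(); }

    void addDocument(Iterable<String> distinctTerms) {
        for (String t : distinctTerms) {
            mp.merge(t, 1, Integer::sum);
        }
        numberOfDocsSinceFlush++;
    }

    /** Writes out the current run, then starts a fresh, empty one. */
    void forceFlush() {
        runs.add(new TreeMap<>(mp)); // "write" a sorted copy of the run
        mp = null;                   // the old in-memory map is now unreachable
        System.gc();                 // only a hint to the JVM; collection is not guaranteed
        createMemoryPostings();
        numberOfDocsSinceFlush = 0;
    }

    int termsInMemory() { return mp.size(); }
}
```

`System.gc()` is only a request to the JVM. FindBugs flags explicit calls to it as `DM_GC`, which is the warning the original suppresses.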

Javadoc

Class for handling simple posting lists in memory while indexing.

Most used methods

<init>
add
Adds an occurrence of a term in a document to the posting in memory.
addTerms
Add the terms in a DocumentPostingList to the postings in memory.
finish
Triggers the writing of the postings in memory to disk. Uses the default RunWriter, writing to the specified files.
getMemoryConsumption
Returns the number of bytes consumed by this set of postings.
writeToWriter
Writes the contents of the postings in memory to disk.
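`getMemoryConsumption` reports how many bytes the postings use. One cheap way to provide such a figure is to update a running estimate on every `add`. The sketch below does that with per-entry byte costs that are assumptions, not measured JVM object sizes. `MemoryAccountingPostings` is a hypothetical class that keeps only document frequencies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical postings store that keeps a running estimate of its memory footprint.
// The byte costs below are rough assumptions, not measured JVM object sizes.
class MemoryAccountingPostings {
    private static final long BYTES_PER_NEW_TERM_OVERHEAD = 64; // assumed map entry + list header
    private static final long BYTES_PER_CHAR = 2;                // UTF-16 chars of the term string
    private static final long BYTES_PER_POSTING = 8;             // assumed: two ints (docid, frequency)

    private final Map<String, Integer> documentFrequencies = new HashMap<>();
    private long estimatedBytes = 0;

    /** Same parameters as add(term, docid, frequency); this sketch stores only counts. */
    void add(String term, int docid, int frequency) {
        if (!documentFrequencies.containsKey(term)) {
            estimatedBytes += BYTES_PER_NEW_TERM_OVERHEAD + BYTES_PER_CHAR * term.length();
        }
        documentFrequencies.merge(term, 1, Integer::sum);
        estimatedBytes += BYTES_PER_POSTING;
    }

    /** Approximate bytes used; O(1) because the estimate is updated on every add(). */
    long getMemoryConsumption() { return estimatedBytes; }
}
```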
