Transforms for creating PCollectionViews from PCollections (to read them as side inputs).
While a PCollection has many values of type ElemT per window, a PCollectionView has a single value
of type ViewT for each window. It can be thought of as a mapping from windows to values of type
ViewT. The transforms here represent ways of converting the ElemT values in a window into a ViewT
for that window.
When a ParDo transform is processing a main input element in a window w and a PCollectionView is
read via DoFn.ProcessContext#sideInput, the value of the view for w is returned.
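The "mapping from windows to values" idea can be illustrated without any Beam classes. The following is only a conceptual sketch, not how the SDK implements views: windows are modeled as plain strings, and the names `WindowedViewSketch`, `Windowed`, `buildListView`, and `sideInput` are hypothetical helpers invented for this illustration. Returning an empty list for a window with no elements is a choice made by the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Conceptual model only; this is not how Beam implements PCollectionView. */
class WindowedViewSketch {

    /** A (window, element) pair; windows are modeled as plain strings. */
    static final class Windowed {
        final String window;
        final int element;

        Windowed(String window, int element) {
            this.window = window;
            this.element = element;
        }
    }

    /** Collapses the many elements of each window into one list-valued view per window. */
    static Map<String, List<Integer>> buildListView(List<Windowed> elements) {
        Map<String, List<Integer>> view = new HashMap<>();
        for (Windowed e : elements) {
            view.computeIfAbsent(e.window, w -> new ArrayList<>()).add(e.element);
        }
        return view;
    }

    /** Models a side-input read: returns the view value for the main element's window. */
    static List<Integer> sideInput(Map<String, List<Integer>> view, String mainInputWindow) {
        return view.getOrDefault(mainInputWindow, new ArrayList<>());
    }

    public static void main(String[] args) {
        List<Windowed> side = new ArrayList<>();
        side.add(new Windowed("w1", 1));
        side.add(new Windowed("w1", 2));
        side.add(new Windowed("w2", 3));
        Map<String, List<Integer>> view = buildListView(side);
        System.out.println(sideInput(view, "w1")); // prints [1, 2]
        System.out.println(sideInput(view, "w2")); // prints [3]
    }
}
```

A main input element in window w1 sees only the values collected for w1, which mirrors the per-window lookup described above.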
The SDK supports viewing a PCollection, per window, as a single value, a List, an Iterable, a Map,
or a multimap (iterable-valued Map).
For a PCollection that contains a single value of type T per window, such as the output of
Combine#globally, use View#asSingleton() to prepare it for use as a side input:
PCollectionView<T> output = someOtherPCollection
    .apply(Combine.globally(...)) // combine function elided
    .apply(View.asSingleton());
For a small PCollection with windows that can fit entirely in memory, use View#asList() to prepare
it for use as a List. When read as a side input, the entire list for a window will be cached in
memory.
PCollectionView<List<T>> output =
    smallPCollection.apply(View.asList()); // smallPCollection: placeholder PCollection<T>
If a PCollection of KV<K, V> is known to have a single value per window for each key, then use
View#asMap() to view it as a Map:
PCollectionView<Map<K, V>> output =
    kvCollection.apply(View.asMap()); // kvCollection: placeholder PCollection<KV<K, V>>
Otherwise, to access a PCollection of KV<K, V> as a Map from each key to an Iterable of its values,
use View#asMultimap():
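PCollectionView<Map<K, Iterable<V>>> output =
    kvCollection.apply(View.asMultimap()); // kvCollection: placeholder PCollection<KV<K, V>>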
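The difference between the map and multimap views can be sketched for a single window in plain Java. This is a conceptual model rather than the SDK's implementation, and the class `MapViewSketch` with its methods `asMap` and `asMultimap` is a hypothetical stand-in for this illustration. In particular, throwing on a duplicate key is a choice made by the sketch; the text above only says that the map view assumes a single value per key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Conceptual model of the two map-shaped views for one window; not Beam's implementation. */
class MapViewSketch {

    /** Map-style view: assumes one value per key. Rejecting duplicates is this sketch's choice. */
    static Map<String, Integer> asMap(List<Map.Entry<String, Integer>> kvs) {
        Map<String, Integer> view = new HashMap<>();
        for (Map.Entry<String, Integer> kv : kvs) {
            if (view.containsKey(kv.getKey())) {
                throw new IllegalArgumentException("duplicate key: " + kv.getKey());
            }
            view.put(kv.getKey(), kv.getValue());
        }
        return view;
    }

    /** Multimap-style view: every key maps to all of its values. */
    static Map<String, List<Integer>> asMultimap(List<Map.Entry<String, Integer>> kvs) {
        Map<String, List<Integer>> view = new HashMap<>();
        for (Map.Entry<String, Integer> kv : kvs) {
            view.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return view;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> kvs = new ArrayList<>();
        kvs.add(new AbstractMap.SimpleEntry<>("a", 1));
        kvs.add(new AbstractMap.SimpleEntry<>("a", 2));
        kvs.add(new AbstractMap.SimpleEntry<>("b", 3));

        System.out.println(asMultimap(kvs).get("a")); // prints [1, 2]
        try {
            asMap(kvs);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints duplicate key: a
        }
    }
}
```

When keys may repeat within a window, the multimap shape is the one that keeps every value.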
To iterate over an entire window of a PCollection via side input, use View#asIterable():
PCollectionView<Iterable<T>> output =
    collection.apply(View.asIterable()); // collection: placeholder PCollection<T>
Both View#asMultimap() and View#asMap() are useful for implementing lookup-based "joins" with the
main input when the side input is small enough to fit into memory.
For example, if you represent a page on a website via some Page object and have some type UrlVisit
logging that a URL was visited, you could convert these to more fully structured PageVisit objects
using a side input, something like the following:
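PCollection<Page> pages = ... // pages fit into memory
PCollection<UrlVisit> urlVisits = ... // the main input
final PCollectionView<Map<URL, Page>> urlToPageView = pages
    .apply(WithKeys.of( ... )) // key each page by its URL (key function elided)
    .apply(View.asMap());
PCollection<PageVisit> pageVisits = urlVisits
    .apply(ParDo.of(new DoFn<UrlVisit, PageVisit>() {
        @ProcessElement
        public void processElement(ProcessContext context) {
          UrlVisit urlVisit = context.element();
          Map<URL, Page> urlToPage = context.sideInput(urlToPageView);
          Page page = urlToPage.get(urlVisit.getUrl());
          context.output(new PageVisit(page, urlVisit.getVisitData()));
        }
    }).withSideInputs(urlToPageView));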
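The lookup at the heart of this join can also be tried in isolation. The sketch below is plain Java with no Beam classes, and it simplifies the example's types to strings: `LookupJoinSketch` and `join` are hypothetical names, URLs and page titles are plain strings, and the `<unknown page>` marker is an assumption of the sketch. It mainly highlights that a map lookup returns null when the side input has no entry for a key, a case that join code may need to handle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of a lookup-based join; hypothetical types, no Beam involved. */
class LookupJoinSketch {

    /** Joins each visited URL (main input) against an in-memory URL-to-title map (side input). */
    static List<String> join(List<String> visitedUrls, Map<String, String> urlToTitle) {
        List<String> pageVisits = new ArrayList<>();
        for (String url : visitedUrls) {
            String title = urlToTitle.get(url); // null when the side input has no entry
            pageVisits.add(url + " -> " + (title == null ? "<unknown page>" : title));
        }
        return pageVisits;
    }

    public static void main(String[] args) {
        Map<String, String> urlToTitle = new HashMap<>();
        urlToTitle.put("/home", "Home");
        urlToTitle.put("/docs", "Documentation");

        List<String> visits = new ArrayList<>();
        visits.add("/docs");
        visits.add("/missing");

        for (String line : join(visits, urlToTitle)) {
            System.out.println(line);
        }
        // prints:
        // /docs -> Documentation
        // /missing -> <unknown page>
    }
}
```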
See ParDo.SingleOutput#withSideInputs for details on how to access this variable inside a ParDo
over another PCollection.