RuleBasedCollator is a concrete subclass of Collator. It allows customization of the Collator via user-specified rule
sets. RuleBasedCollator is designed to be fully compliant with the Unicode Collation Algorithm (UCA) and conforms to ISO 14651.
A Collator is thread-safe only when frozen. See isFrozen() and com.ibm.icu.util.Freezable.
Users are strongly encouraged to read the User
Guide for more information about the collation service before using this class.
Create a RuleBasedCollator from a locale by calling the getInstance(Locale) factory method in the base class
Collator. Collator.getInstance(Locale) creates a RuleBasedCollator object based on the collation rules defined by the
argument locale. If a customized collation ordering or customized attributes are required, use the RuleBasedCollator(String)
constructor with the appropriate rules. The customized RuleBasedCollator bases its ordering on the CLDR root collation, while
adjusting the attributes and order of the characters named in the specified rules accordingly.
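
As a rough illustration of the factory-method path, the sketch below obtains a locale-based collator and uses it as a comparator when sorting. To stay runnable without ICU4J on the classpath, it is written against the JDK's java.text.Collator, whose getInstance(Locale) factory has the same shape. It is assumed, not verified here, that swapping the import to com.ibm.icu.text.Collator is the only change needed for ICU4J. LocaleCollatorSketch and sortWith are hypothetical names chosen for the example.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class LocaleCollatorSketch {
    // Returns a sorted copy of the words, ordered by the collator for the given locale.
    // NOTE: uses the JDK collator as a stand-in for the ICU4J class of the same name.
    static List<String> sortWith(Locale locale, List<String> words) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // a Collator is usable as a Comparator
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortWith(Locale.US, Arrays.asList("pear", "apple", "fig")));
    }
}
```

Copying the input before sorting keeps the caller's list untouched, which is convenient when the same words are sorted under several locales.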
RuleBasedCollator provides correct collation orders for most locales supported in ICU. If specific data for a locale
is not available, the order eventually falls back to the CLDR root sort order.
For information about the collation rule syntax and details about customization, please refer to the Collation customization section of the
User Guide.
Note that there are some differences between the collation rule syntax used in Java and the syntax used in ICU4J:
- According to the JDK documentation:
Modifier '!' : Turns on Thai/Lao vowel-consonant swapping. If this rule is in force when a Thai vowel of the range
\U0E40-\U0E44 precedes a Thai consonant of the range \U0E01-\U0E2E OR a Lao vowel of the range
\U0EC0-\U0EC4 precedes a Lao consonant of the range \U0E81-\U0EAE then the vowel is placed after the
consonant for collation purposes.
If a rule is without the modifier '!', the Thai/Lao vowel-consonant swapping is not turned on.
ICU4J's RuleBasedCollator does not support turning off the Thai/Lao vowel-consonant swapping, since the UCA clearly
states that it has to be supported to ensure a correct sorting order. If a '!' is encountered, it is ignored.
- As mentioned in the documentation of the base class Collator, compatibility decomposition mode is not supported.
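
For contrast, the JDK side of the decomposition difference can be sketched as follows. This is a minimal illustration that uses only java.text.Collator from the JDK, not ICU4J. The class name JdkDecompositionSketch and the helper fullDecompositionCollator are hypothetical names chosen for the example. It shows the FULL_DECOMPOSITION setting (the JDK's compatibility decomposition mode), which the JDK collator accepts and which, per the note above, ICU4J does not support.

```java
import java.text.Collator;
import java.util.Locale;

class JdkDecompositionSketch {
    // Returns a JDK collator with full (compatibility) decomposition enabled.
    // This mode exists on java.text.Collator only; it is shown here to make
    // the difference from ICU4J concrete.
    static Collator fullDecompositionCollator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setDecomposition(Collator.FULL_DECOMPOSITION);
        return collator;
    }

    public static void main(String[] args) {
        Collator collator = fullDecompositionCollator(Locale.US);
        // Reports whether the mode was applied to the JDK collator.
        System.out.println(collator.getDecomposition() == Collator.FULL_DECOMPOSITION);
    }
}
```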
Examples
Creating Customized RuleBasedCollators:
String simple = "& a < b < c < d";
RuleBasedCollator simpleCollator = new RuleBasedCollator(simple);
String norwegian = "& a , A < b , B < c , C < d , D < e , E "
    + "< f , F < g , G < h , H < i , I < j , "
    + "J < k , K < l , L < m , M < n , N < "
    + "o , O < p , P < q , Q <r , R <s , S < "
    + "t , T < u , U < v , V < w , W < x , X "
    + "< y , Y < z , Z < \u00E5 = a\u030A "
    + ", \u00C5 = A\u030A ; aa , AA < \u00E6 "
    + ", \u00C6 < \u00F8 , \u00D8";
RuleBasedCollator norwegianCollator = new RuleBasedCollator(norwegian);
Concatenating rules to combine Collators:
// Create an en_US Collator object
RuleBasedCollator en_USCollator = (RuleBasedCollator)
    Collator.getInstance(new Locale("en", "US", ""));
// Create a da_DK Collator object
RuleBasedCollator da_DKCollator = (RuleBasedCollator)
    Collator.getInstance(new Locale("da", "DK", ""));
// Combine the two
// First, get the collation rules from en_USCollator
String en_USRules = en_USCollator.getRules();
// Second, get the collation rules from da_DKCollator
String da_DKRules = da_DKCollator.getRules();
RuleBasedCollator newCollator =
    new RuleBasedCollator(en_USRules + da_DKRules);
// newCollator has the combined rules
Making changes to an existing RuleBasedCollator to create a new Collator object, by appending changes to
the existing rule:
// Create a new Collator object with additional rules
String addRules = "& C < ch, cH, Ch, CH";
RuleBasedCollator myCollator =
    new RuleBasedCollator(en_USCollator.getRules() + addRules);
// myCollator contains the new rules
How to change the order of non-spacing accents:
// old rule with main accents
String oldRules = "= \u0301 ; \u0300 ; \u0302 ; \u0308 "
    + "; \u0327 ; \u0303 ; \u0304 ; \u0305 "
    + "; \u0306 ; \u0307 ; \u0309 ; \u030A "
    + "; \u030B ; \u030C ; \u030D ; \u030E "
    + "; \u030F ; \u0310 ; \u0311 ; \u0312 "
    + "< a , A ; ae, AE ; \u00e6 , \u00c6 "
    + "< b , B < c, C < e, E & C < d , D";
// change the order of accent characters
String addOn = "& \u0300 ; \u0308 ; \u0302";
RuleBasedCollator myCollator = new RuleBasedCollator(oldRules + addOn);
Putting in a new primary ordering before the default setting, e.g. sort English characters before or after Japanese
characters in the Japanese Collator:
// get en_US Collator rules
RuleBasedCollator en_USCollator
    = (RuleBasedCollator) Collator.getInstance(Locale.US);
// add a few Japanese characters to sort before English characters
// suppose the last character before the first base letter 'a' in
// the English collation rule is \u2212
String jaString = "& \u2212 <\u3041, \u3042 <\u3043, "
    + "\u3044";
RuleBasedCollator myJapaneseCollator
    = new RuleBasedCollator(en_USCollator.getRules() + jaString);
This class is not subclassable.