Unicode Collation Test Page

This page converts a string of Unicode characters into a binary collation key using the Java language version ("icu4j") of the IBM International Components for Unicode (ICU) library. A collation key is the basis for sorting and comparing strings in a language-sensitive Unicode environment. It is built using a "locale" (a designation for a particular language or language variant) and a comparison level. The levels supported here (Primary, Secondary, Tertiary, Quaternary and Identical) correspond to levels "L1" through "Ln" as described in Unicode Technical Standard #10, Unicode Collation Algorithm.

A comparison of the collation keys for two different strings is meaningful only if both keys were created using the same locale and comparison level. The two keys are compared from left to right, byte for byte, until a pair of bytes differs. The key with the numerically smaller byte (treating each byte as an unsigned value) belongs to the string that sorts first.
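The byte-for-byte comparison described above can be sketched in Java. This is only an illustration under stated assumptions: it uses the JDK's built-in `java.text.Collator` as a stand-in for icu4j, so the key bytes will not match what this page displays, and the class and method names (`CollationKeyDemo`, `sortKey`, `compareKeys`, `toHex`) are hypothetical. Treating a key that is a prefix of a longer key as sorting first is also an assumption of the sketch.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class; uses the JDK collator, not icu4j.
class CollationKeyDemo {

    // Build a binary collation key for one string, locale, and strength
    // (for example Collator.PRIMARY or Collator.TERTIARY).
    static byte[] sortKey(String source, Locale locale, int strength) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        return collator.getCollationKey(source).toByteArray();
    }

    // Compare two keys left to right, byte by byte, as unsigned values.
    // Returns a negative number, zero, or a positive number.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF; // mask so 0x80..0xFF compare as 128..255
            int y = b[i] & 0xFF;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        // Assumption: if one key is a prefix of the other, the shorter sorts first.
        return Integer.compare(a.length, b.length);
    }

    // Render a key as space-separated hex bytes.
    static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both keys use the same locale and strength, as the comparison requires.
        byte[] k1 = sortKey("apple", Locale.ENGLISH, Collator.TERTIARY);
        byte[] k2 = sortKey("banana", Locale.ENGLISH, Collator.TERTIARY);
        System.out.println(toHex(k1));
        System.out.println(toHex(k2));
        System.out.println(compareKeys(k1, k2) < 0 ? "apple sorts first" : "banana sorts first");
    }
}
```

Masking each byte with `0xFF` matters because Java's `byte` type is signed; without it, a byte such as 0x80 would compare as less than 0x7F.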

Enter a string in the "Source" field, optionally select a locale, and click the button corresponding to the comparison level you want. The resulting collation key is displayed in hexadecimal. The source string may contain numeric character entities of the form &#DECIMAL; or &#xHEX;, where DECIMAL and HEX are the character's code point written as a decimal or hexadecimal number, respectively.
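To illustrate the two entity forms, here is a minimal Java sketch of how a source string containing &#DECIMAL; or &#xHEX; could be expanded into the characters it stands for. It is not the code behind this page; the class and method names (`EntityDecoder`, `decode`) are hypothetical, and leaving malformed or out-of-range entities unchanged is an assumption of the sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not the page's actual implementation.
class EntityDecoder {

    // Group 1 captures the hex digits of &#xHEX;, group 2 the digits of &#DECIMAL;.
    private static final Pattern ENTITY =
            Pattern.compile("&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    // Replace each numeric character entity with the code point it names.
    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the literal text between the previous match and this one.
            out.append(input, last, m.start());
            try {
                int codePoint = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(codePoint)) {
                    out.appendCodePoint(codePoint);
                } else {
                    out.append(m.group()); // assumption: keep out-of-range entities as-is
                }
            } catch (NumberFormatException e) {
                out.append(m.group()); // assumption: keep overflowing numbers as-is
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Both forms name U+00E9 (e with acute accent).
        System.out.println(decode("caf&#233;").equals(decode("caf&#xE9;")));
    }
}
```

Using `appendCodePoint` rather than a cast to `char` lets the sketch handle code points above U+FFFF, which Java stores as a surrogate pair.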

Collation

Locale: 
Source string: 
Collation key: