Unicode Sorting Test Page

This page sorts a list of Unicode strings into language-sensitive order using the Java version ("icu4j") of the IBM International Components for Unicode (ICU) library. The strings are sorted according to a "locale" (a designation for a particular language or a variant of one) and a comparison level. The levels supported here (Primary, Secondary, Tertiary, Quaternary and Identical) correspond to levels "L1" through "Ln" as described in Unicode Technical Standard #10 - Unicode Collation Algorithm.
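
As a rough illustration of how a locale and a comparison level interact, the sketch below sorts a few strings with the JDK's built-in java.text.Collator. This is an assumption made to keep the example self-contained: the page itself uses icu4j, whose collator also offers a Quaternary level that the JDK class lacks, and the two libraries can order some strings differently. The class and method names (CollationSketch, sortLines) are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CollationSketch {
    /**
     * Returns a sorted copy of the lines, compared with the JDK collator
     * for the given locale at the given strength (Collator.PRIMARY,
     * SECONDARY, TERTIARY or IDENTICAL). Stand-in for the icu4j collator.
     */
    static List<String> sortLines(List<String> lines, Locale locale, int strength) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("b", "B", "a", "A");

        // At tertiary strength, case differences take part in the ordering.
        System.out.println(sortLines(lines, Locale.ENGLISH, Collator.TERTIARY));

        // At primary strength, only base letters matter, so "a" and "A" tie.
        Collator primary = Collator.getInstance(Locale.ENGLISH);
        primary.setStrength(Collator.PRIMARY);
        System.out.println(primary.compare("a", "A") == 0);
    }
}
```

A lower level ignores more distinctions: strings that differ only in case compare as equal at the Primary level but not at the Tertiary level.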

Enter a series of lines in the "Source" field, optionally select a locale, and click the button for the comparison level you want. The lines are returned in sorted order, with the number in the first column giving the sequence number of each line. If two or more lines have the same sequence number, those lines compared as equal at the chosen level. The source strings may contain numeric character entities of the form &#DECIMAL; or &#xHEX; where DECIMAL or HEX is a decimal or hexadecimal number, respectively. Please note: blank input lines are ignored, but any spaces on non-blank lines will affect the sorting results.
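
The processing described above can be sketched roughly as follows: decode numeric character entities, drop blank lines, sort, and give lines that compare as equal the same sequence number. This is not the page's actual code. It assumes the JDK's java.text.Collator in place of icu4j, treats whitespace-only lines as blank, and numbers ties densely (1, 1, 2); the page's own numbering scheme is not stated. The names SortPageSketch, decodeEntities and sortWithSequenceNumbers are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SortPageSketch {
    // Matches &#DECIMAL; or &#xHEX; (group 1 = hex digits, group 2 = decimal digits).
    private static final Pattern ENTITY =
            Pattern.compile("&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));");

    /** Replaces numeric character entities with the characters they name. */
    static String decodeEntities(String line) {
        Matcher m = ENTITY.matcher(line);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(line, last, m.start());
            try {
                int cp = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(cp)) {
                    out.appendCodePoint(cp);
                } else {
                    out.append(m.group()); // out of range: keep the text as typed
                }
            } catch (NumberFormatException e) {
                out.append(m.group());     // too many digits: keep the text as typed
            }
            last = m.end();
        }
        out.append(line, last, line.length());
        return out.toString();
    }

    /** Returns "sequence<TAB>line" rows; equal lines share a sequence number. */
    static List<String> sortWithSequenceNumbers(List<String> rawLines, Locale locale, int strength) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);

        List<String> lines = new ArrayList<>();
        for (String raw : rawLines) {
            if (!raw.trim().isEmpty()) {   // assumption: whitespace-only counts as blank
                lines.add(decodeEntities(raw));
            }
        }
        lines.sort(collator);

        List<String> rows = new ArrayList<>();
        int seq = 0;
        for (int i = 0; i < lines.size(); i++) {
            // Advance the number only when this line differs from the previous one.
            if (i == 0 || collator.compare(lines.get(i - 1), lines.get(i)) != 0) {
                seq++;
            }
            rows.add(seq + "\t" + lines.get(i));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> input = List.of("b", "", "&#65;", "&#x61;");
        for (String row : sortWithSequenceNumbers(input, Locale.ENGLISH, Collator.PRIMARY)) {
            System.out.println(row);
        }
    }
}
```

In this sketch, "&#65;" and "&#x61;" decode to "A" and "a", which compare as equal at the Primary level and so receive the same sequence number.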

Sorting

Locale: 
Source lines: 
Sorted results: