Unicode Normalization Test Page

This page normalizes a string of Unicode characters using ICU4J, the Java version of IBM's International Components for Unicode (ICU) library. The library supports the four standard normalization forms described in Unicode Standard Annex #15, "Unicode Normalization Forms".

Enter a string in the "Source" field and click the button for the normalization form you want. The result is displayed both as a string and in hex. The source string may contain numeric character entities of the form &#DECIMAL; or &#xHEX;, where DECIMAL and HEX are a decimal and a hexadecimal number, respectively. For example, try &#xC1; (or &#193;), which is capital A with an acute accent.
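The entity handling and hex display described above can be sketched roughly as follows. This is an illustration only, not the page's actual code: the class name `EntityDecoder`, the methods `decode` and `toHex`, and the hex layout (space-separated code points, at least four digits each) are all assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of ICU4J or of this page's implementation.
class EntityDecoder {
    // Matches &#xHEX; (group 1) or &#DECIMAL; (group 2).
    private static final Pattern NCR =
        Pattern.compile("&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));");

    // Replaces each numeric character entity with the character it names.
    // Entities that are out of range are left in the output unchanged.
    static String decode(String input) {
        Matcher m = NCR.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            try {
                int cp = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(cp)) {
                    out.appendCodePoint(cp);
                } else {
                    out.append(m.group());
                }
            } catch (NumberFormatException e) {
                // Number too large to parse: keep the entity text as typed.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    // Formats a string as space-separated hexadecimal code points
    // (assumed layout; the page's real hex format is not documented here).
    static String toHex(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(toHex(decode("&#xC1;")));  // prints 00C1
        System.out.println(toHex(decode("&#193;")));  // prints 00C1
    }
}
```

Iterating over code points rather than `char` values keeps characters outside the Basic Multilingual Plane as one hex value instead of a surrogate pair.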

Normalizer

Source string:

Result string:

Result in hex:

NFC - Canonical decomposition followed by canonical composition.
NFD - Canonical decomposition.
NFKC - Compatibility decomposition followed by canonical composition.
NFKD - Compatibility decomposition.
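
The difference between the four forms is easiest to see on a small input. The sketch below uses the JDK's built-in `java.text.Normalizer` rather than ICU4J, so that it runs without extra dependencies; this is a substitution made for illustration, and the class name `NormalizationForms` and the `hex` helper are hypothetical. Both libraries implement the forms defined in UAX #15, but their bundled Unicode versions may differ.

```java
import java.text.Normalizer;
import java.util.stream.Collectors;

// Hypothetical demo class; uses the JDK normalizer, not ICU4J.
class NormalizationForms {
    // Space-separated hexadecimal code points, at least four digits each.
    static String hex(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // Normalizes s to the form named by "NFC", "NFD", "NFKC" or "NFKD".
    static String normalize(String s, String form) {
        return Normalizer.normalize(s, Normalizer.Form.valueOf(form));
    }

    public static void main(String[] args) {
        // U+00C1 (capital A with acute) followed by U+FB01 (the "fi" ligature).
        String source = "\u00C1\uFB01";
        for (String form : new String[] {"NFC", "NFD", "NFKC", "NFKD"}) {
            System.out.println(form + ": " + hex(normalize(source, form)));
        }
    }
}
```

For this input, NFC leaves the string unchanged (00C1 FB01), NFD splits the accented letter into a base letter and a combining accent (0041 0301 FB01), and the two compatibility forms additionally replace the ligature with the letters f and i (NFKC: 00C1 0066 0069; NFKD: 0041 0301 0066 0069).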