Handling Multibyte and Varying-Width Characters
diff, diff3 and sdiff treat each line of input as a string of unibyte
characters. This can mishandle multibyte characters in some cases. For
example, when asked to ignore spaces, diff does not properly ignore a
multibyte space character.
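
The following minimal Python sketch makes the space problem concrete. It is not diff's actual code. The helper names strip_spaces_bytewise and strip_spaces_multibyte are hypothetical, and UTF-8 input is assumed. Removing whitespace one byte at a time leaves U+3000 IDEOGRAPHIC SPACE (three bytes in UTF-8) in place. Decoding the line first lets Unicode's whitespace classification remove it.

```python
# Hypothetical helpers illustrating the problem; not diff's implementation.
ASCII_WHITESPACE = b" \t\n\v\f\r"

def strip_spaces_bytewise(line: bytes) -> bytes:
    """Drop whitespace by examining one byte at a time (the unibyte view)."""
    return bytes(byte for byte in line if byte not in ASCII_WHITESPACE)

def strip_spaces_multibyte(line: bytes, encoding: str = "utf-8") -> str:
    """Decode first (UTF-8 assumed), then drop every Unicode whitespace character."""
    return "".join(ch for ch in line.decode(encoding) if not ch.isspace())

line_a = "foo bar".encode("utf-8")       # ordinary ASCII space
line_b = "foo\u3000bar".encode("utf-8")  # U+3000 IDEOGRAPHIC SPACE, bytes E3 80 80

# Byte view: the three bytes of U+3000 survive, so the lines still differ.
print(strip_spaces_bytewise(line_a) == strip_spaces_bytewise(line_b))
# Character view: both lines reduce to "foobar".
print(strip_spaces_multibyte(line_a) == strip_spaces_multibyte(line_b))
```

The first comparison prints False and the second prints True. Decoding every line is extra work that a purely unibyte comparison avoids.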

Also, diff currently assumes that each byte is one column wide, and this
assumption is incorrect in some locales, e.g., locales that use UTF-8
encoding. This causes problems with the -y or --side-by-side option of
diff.
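
A similar sketch shows why the one-byte-per-column assumption breaks side-by-side output. The function display_width is a hypothetical, simplified stand-in for the C library's wcwidth. It counts wide and fullwidth East Asian characters as two columns, combining marks as zero, and everything else as one. It ignores control characters and other cases that a real implementation must handle. The padding helpers are also hypothetical.

```python
import unicodedata

def display_width(text: str) -> int:
    """Rough column count (a simplification of wcwidth, not a full one)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no column of their own
        # Wide (W) and fullwidth (F) characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

def pad_bytewise(line: str, columns: int) -> str:
    """Pad assuming one UTF-8 byte == one column (the faulty assumption)."""
    return line + " " * max(0, columns - len(line.encode("utf-8")))

def pad_by_width(line: str, columns: int) -> str:
    """Pad using the estimated display width instead."""
    return line + " " * max(0, columns - display_width(line))

for text in ["hello", "h\u00e9llo", "\u65e5\u672c\u8a9e"]:
    # Left column: byte-based padding; right column: width-based padding.
    print(pad_bytewise(text, 10) + "|   " + pad_by_width(text, 10) + "|")
```

With byte-based padding, "héllo" (6 bytes, 5 columns) and "日本語" (9 bytes, 6 columns) fill only 9 and 7 of the 10 columns, so their separators drift left. Width-based padding keeps all three separators aligned.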

These problems need to be fixed without unduly affecting the
performance of the utilities in unibyte environments.

The IBM GNU/Linux Technology Center Internationalization Team has
proposed some patches to support internationalized diff. Unfortunately,
these patches are incomplete and apply to an older version of diff, so
more work needs to be done in this area.