Follow Techotopia on Twitter

On-line Guides
All Guides
eBook Store
iOS / Android
Linux for Beginners
Office Productivity
Linux Installation
Linux Security
Linux Utilities
Linux Virtualization
Linux Kernel
System/Network Admin
Programming
Scripting Languages
Development Tools
Web Development
GUI Toolkits/Desktop
Databases
Mail Systems
openSolaris
Eclipse Documentation
Techotopia.com
Virtuatopia.com
Answertopia.com

How To Guides
Virtualization
General System Admin
Linux Security
Linux Filesystems
Web Servers
Graphics & Desktop
PC Hardware
Windows
Problem Solutions
Privacy Policy

  




 

 

6.3.1 Selecting the conversion and its properties

We already said above that the currently selected locale for the LC_CTYPE category decides about the conversion that is performed by the functions we are about to describe. Each locale uses its own character set (given as an argument to localedef) and this is the one assumed as the external multibyte encoding. The wide character character set always is UCS-4, at least on GNU systems.

A characteristic of each multibyte character set is the maximum number of bytes that can be necessary to represent one character. This information is quite important when writing code that uses the conversion functions (as shown in the examples below). The ISO C standard defines two macros that provide this information.

— Macro: int MB_LEN_MAX

MB_LEN_MAX specifies the maximum number of bytes in the multibyte sequence for a single character in any of the supported locales. It is a compile-time constant and is defined in limits.h.

— Macro: int MB_CUR_MAX

MB_CUR_MAX expands into a positive integer expression that is the maximum number of bytes in a multibyte character in the current locale. The value is never greater than MB_LEN_MAX. Unlike MB_LEN_MAX this macro need not be a compile-time constant, and in the GNU C library it is not.

MB_CUR_MAX is defined in stdlib.h.

Two different macros are necessary since strictly ISO C90 compilers do not allow variable length array definitions, but still it is desirable to avoid dynamic allocation. This incomplete piece of code shows the problem:

     {
       char buf[MB_LEN_MAX];
       ssize_t len = 0;
     
       while (! feof (fp))
         {
           fread (&buf[len], 1, MB_CUR_MAX - len, fp);
           /* ... process buf */
           len -= used;
         }
     }

The code in the inner loop is expected to have always enough bytes in the array buf to convert one multibyte character. The array buf has to be sized statically since many compilers do not allow a variable size. The fread call makes sure that MB_CUR_MAX bytes are always available in buf. Note that it isn't a problem if MB_CUR_MAX is not a compile-time constant.


 
 
  Published under the terms of the GNU General Public License Design by Interspire