author     Ulrich Drepper <drepper@redhat.com>  1999-01-11 20:13:43 +0000
committer  Ulrich Drepper <drepper@redhat.com>  1999-01-11 20:13:43 +0000
commit     390955cbdeb674bead490fc3f74a8a0893ea83cf
tree       2900fdc697f52133f633c09edbbe712882736bf0 /manual/ctype.texi
parent     68ef28edc2f1bafa417da1ac8d35a3bf2a1b565b
Update.
1999-01-11  Ulrich Drepper  <drepper@cygnus.com>

	* ctype/Versions [GLIBC_2.0]: Export __ctype32_b.
	* include/wctype.h: Declare __iswctype.
	* stdio-common/vfscanf.c (__vfscanf): Use __iswspace instead of iswspace.
	* wctype/Makefile (routines): Add wcextra_l.
	* wctype/wcextra.c (iswblank): Implement function here and don't use __iswctype.
	(__iswblank_l): Move definition to...
	* wctype/wcextra_l.c: ...here. New file.
	* wctype/wcfuncs.c: Really implement functions and don't call __iswctype or __towctrans.
	* wctype/wctype.h: Change isw* and tow* macros. Don't call __iswctype or __towctrans. Instead optimize constant argument case.
	* iconv/gconv.h: Fix typos.
	* iconv/skeleton.c: Fix typos. Optimize init function a bit. Correctly emit escape sequence to return to initial state in conversion function.
	* iconvdata/iso-2022-jp.c (gconv_init): Correctly initialize max_needed_to element.
	* manual/mbyte.texi: Removed. This is now described in charset.texi.
	* manual/charset.texi: New file.
	* manual/Makefile (chapters): Replace mbyte by charset.
	* manual/ctype.texi: Document wide character functions.
	* manual/intro.texi: Fix reference to mbyte chapter.
	* manual/lang.texi: Likewise.
	* manual/locale.texi: Likewise.
	* manual/stdio.texi: Likewise.
	* manual/string.texi: Fix @node line for new charset chapter.
	* manual/libc.texinfo (UPDATED): Updated. Also update copyright years.
	* manual/memory.texi (savestring): Optimize code to give a good example.
	* manual/filesys.texi: Fix wording.
	Patches by Jim Meyering.
	* nscd/nscd_getgr_r.c: Include stdint.h to get uintptr_t definition.
	* nscd/nscd_getpw_r.c: Likewise.
	* nscd/nscd_gethst_r.c: Likewise.
	* stdlib/stdtold_l.c: Always include xlocale.h.

1999-01-11  Geoffrey Keating  <geoffk@ozemail.com.au>

	* stdlib/fpioconst.h (LDBL_MAX_10_EXP_LOG): Define to be same as DBL_MAX_10_EXP_LOG if there is no long double.
	(_fpioconst_pow10): Always use size as LDBL_MAX_10_EXP_LOG to match printf_fp.c.

1999-01-10  Andreas Jaeger  <aj@arthur.rhein-neckar.de>

	* timezone/Makefile ($(testdata)/GB): Changed to ...
	($(testdata)/Europe/London): ... for tst-timezone test.
	($(objpfx)tst-timezone.out): Change GB to Europe/London.
	* timezone/tst-timezone.c (main): Enable DST switching test, change GB to Europe/London.

1999-01-10  Philip Blundell  <philb@gnu.org>

	* socket/Makefile (headers): Remove bits/sockunion.h.

1999-01-09  Philip Blundell  <philb@gnu.org>

	* socket/sys/socket.h: Don't include <bits/sockunion.h>.
	* sysdeps/generic/bits/sockunion.h: Deleted.
	* sysdeps/unix/sysv/linux/bits/sockunion.h: Likewise.

1999-01-08  H.J. Lu  <hjl@gnu.org>

	* io/fts.c (fts_close): Don't access memory after having it freed.
Diffstat (limited to 'manual/ctype.texi')
-rw-r--r-- manual/ctype.texi 521
1 files changed, 513 insertions, 8 deletions
diff --git a/manual/ctype.texi b/manual/ctype.texi
index 26e40a1c53..de90acbe67 100644
--- a/manual/ctype.texi
+++ b/manual/ctype.texi
@@ -15,11 +15,20 @@ are affected by the current locale. (More precisely, they are affected
by the locale currently selected for character classification---the
@code{LC_CTYPE} category; see @ref{Locale Categories}.)
-@menu
-* Classification of Characters:: Testing whether characters are
- letters, digits, punctuation, etc.
+The @w{ISO C} standard specifies two different sets of functions. One
+set works on @code{char} type characters, the other on
+@code{wchar_t} wide characters (@pxref{Extended Char Intro}).
-* Case Conversion:: Case mapping, and the like.
+@menu
+* Classification of Characters:: Testing whether characters are
+ letters, digits, punctuation, etc.
+
+* Case Conversion:: Case mapping, and the like.
+* Classification of Wide Characters:: Character class determination for
+ wide characters.
+* Using Wide Char Classes:: Notes on using the wide character
+ classes.
+* Wide Character Case Conversion:: Mapping of wide characters.
@end menu
@node Classification of Characters, Case Conversion, , Character Handling
@@ -57,14 +66,16 @@ These functions are declared in the header file @file{ctype.h}.
@comment ctype.h
@comment ISO
@deftypefun int islower (int @var{c})
-Returns true if @var{c} is a lower-case letter.
+Returns true if @var{c} is a lower-case letter. The letter need not be
+from the Latin alphabet; any alphabet that can be represented is valid.
@end deftypefun
@cindex upper-case character
@comment ctype.h
@comment ISO
@deftypefun int isupper (int @var{c})
-Returns true if @var{c} is an upper-case letter.
+Returns true if @var{c} is an upper-case letter. The letter need not be
+from the Latin alphabet; any alphabet that can be represented is valid.
@end deftypefun
@cindex alphabetic character
@@ -188,7 +199,7 @@ into the US/UK ASCII character set. This function is a BSD extension
and is also an SVID extension.
@end deftypefun
-@node Case Conversion, , Classification of Characters, Character Handling
+@node Case Conversion, Classification of Wide Characters, Classification of Characters, Character Handling
@section Case Conversion
@cindex character case conversion
@cindex case conversion of characters
@@ -224,7 +235,7 @@ lower-case letter. If @var{c} is not an upper-case letter,
@comment ctype.h
@comment ISO
@deftypefun int toupper (int @var{c})
-If @var{c} is a lower-case letter, @code{tolower} returns the corresponding
+If @var{c} is a lower-case letter, @code{toupper} returns the corresponding
upper-case letter. Otherwise @var{c} is returned unchanged.
@end deftypefun
@@ -249,3 +260,497 @@ with the SVID. @xref{SVID}.@refill
This is identical to @code{toupper}, and is provided for compatibility
with the SVID.
@end deftypefun
+
+
+@node Classification of Wide Characters, Using Wide Char Classes, Case Conversion, Character Handling
+@section Character class determination for wide characters
+
+@w{Amendment 1} to @w{ISO C89} defines functions to classify wide
+characters. Although the original @w{ISO C89} standard already defined
+the type @code{wchar_t}, no functions operating on it were defined.
+
+The design of the classification functions for wide characters is
+more general. It allows the set of available classifications to be
+extended beyond the set which is always available. The POSIX
+standard specifies how such extensions can be made, and this is
+already implemented in the GNU C library implementation of the
+@code{localedef} program.
+
+The character class functions are normally implemented using bitsets.
+I.e., for the character in question the appropriate bitset is read from
+a table and a test is performed whether a certain bit is set in this
+bitset. Which bit is tested for is determined by the class.
+
+For the wide character classification functions this is made visible.
+There is a type representing the classification, a function to retrieve
+this value for a specific class, and a function to test, using the
+classification value, whether a given character is in this class. On
+top of this, the normal character classification functions as used for
+@code{char} objects can be defined.
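The table-plus-bit-test scheme described above can be sketched in C as follows. This is a minimal illustration under stated assumptions: the table covers only the 128 ASCII code points, and the names @code{MY_CLASS_UPPER}, @code{my_class_table}, @code{my_init_table} and @code{my_is_class} are hypothetical. They do not reflect the GNU C library's internal data layout.

```c
#include <stdint.h>

/* Hypothetical class bits; the real library uses its own internal
   encoding, which is not part of any public interface.  */
enum
{
  MY_CLASS_UPPER = 1u << 0,
  MY_CLASS_LOWER = 1u << 1,
  MY_CLASS_DIGIT = 1u << 2,
  MY_CLASS_SPACE = 1u << 3
};

/* Hypothetical per-character bitsets, covering ASCII only.  */
static uint32_t my_class_table[128];

/* Fill the table.  In a real locale this data would come from the
   files generated by localedef; here it is hard-coded.  */
static void
my_init_table (void)
{
  for (int c = 'A'; c <= 'Z'; ++c)
    my_class_table[c] |= MY_CLASS_UPPER;
  for (int c = 'a'; c <= 'z'; ++c)
    my_class_table[c] |= MY_CLASS_LOWER;
  for (int c = '0'; c <= '9'; ++c)
    my_class_table[c] |= MY_CLASS_DIGIT;
  const char *spaces = " \f\n\r\t\v";
  for (const char *p = spaces; *p != '\0'; ++p)
    my_class_table[(unsigned char) *p] |= MY_CLASS_SPACE;
}

/* Read the bitset for C from the table and test one bit.  Characters
   outside the table belong to no class in this sketch.  */
static int
my_is_class (unsigned int c, uint32_t class_bit)
{
  if (c >= 128)
    return 0;
  return (my_class_table[c] & class_bit) != 0;
}
```

In this sketch a class is simply a bit mask, which plays roughly the role that a @code{wctype_t} value plays for @code{iswctype}.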
+
+@comment wctype.h
+@comment ISO
+@deftp {Data type} wctype_t
+The @code{wctype_t} type can hold a value which represents a character
+class. The only defined way to generate such a value is by using the
+@code{wctype} function.
+
+@pindex wctype.h
+This type is defined in @file{wctype.h}.
+@end deftp
+
+@comment wctype.h
+@comment ISO
+@deftypefun wctype_t wctype (const char *@var{property})
+The @code{wctype} function returns a value representing a class of wide
+characters which is identified by the string @var{property}. Besides
+some standard properties, each locale can define its own. If no
+property with the given name is known for the locale currently selected
+for the @code{LC_CTYPE} category, the function returns zero.
+
+@noindent
+The properties known in every locale are:
+
+@multitable @columnfractions .25 .25 .25 .25
+@item
+@code{"alnum"} @tab @code{"alpha"} @tab @code{"cntrl"} @tab @code{"digit"}
+@item
+@code{"graph"} @tab @code{"lower"} @tab @code{"print"} @tab @code{"punct"}
+@item
+@code{"space"} @tab @code{"upper"} @tab @code{"xdigit"}
+@end multitable
+
+@pindex wctype.h
+This function is declared in @file{wctype.h}.
+@end deftypefun
+
+To test the membership of a character in one of the non-standard classes,
+the @w{ISO C} standard defines a completely new function.
+
+@comment wctype.h
+@comment ISO
+@deftypefun int iswctype (wint_t @var{wc}, wctype_t @var{desc})
+This function returns a nonzero value if @var{wc} is in the character
+class specified by @var{desc}. @var{desc} must have been returned by a
+previous successful call to @code{wctype}.
+
+@pindex wctype.h
+This function is declared in @file{wctype.h}.
+@end deftypefun
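As an illustration of how @code{wctype} and @code{iswctype} combine, here is a small sketch that looks up a class by name at run time and counts the matching characters of a wide string. The function name @code{count_in_class} is hypothetical; only @code{wctype} and @code{iswctype} come from @file{wctype.h}.

```c
#include <wchar.h>
#include <wctype.h>

/* Count the wide characters in STR that belong to the class named
   PROPERTY in the current LC_CTYPE locale.  Return -1 if the locale
   does not know a class with that name.  */
static long
count_in_class (const wchar_t *str, const char *property)
{
  wctype_t desc = wctype (property);
  if (desc == 0)
    return -1;                  /* wctype reports an unknown name.  */

  long n = 0;
  for (; *str != L'\0'; ++str)
    /* A wchar_t value converts directly to wint_t; the btowc caveat
       applies to char values, not to wide characters.  */
    if (iswctype ((wint_t) *str, desc))
      ++n;
  return n;
}
```

In the default @code{"C"} locale, @code{count_in_class (L"abc123", "digit")} should return 3, while an unknown class name yields -1 because @code{wctype} returns zero.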
+
+To make it easier to use the commonly used classification functions,
+they are defined in the C library. There is no need to use
+@code{wctype} if the property string is one of the known character
+classes. In some situations it is desirable to construct the property
+string, and then it is important that @code{wctype} can also handle the
+standard classes.
+
+@cindex alphanumeric character
+@comment wctype.h
+@comment ISO
+@deftypefun int iswalnum (wint_t @var{wc})
+This function returns a nonzero value if @var{wc} is an alphanumeric
+character (a letter or number); in other words, if either @code{iswalpha}
+or @code{iswdigit} is true of a character, then @code{iswalnum} is also
+true.
+
+@noindent
+This function can be implemented using
+
+@smallexample
+iswctype (wc, wctype ("alnum"))
+@end smallexample
+
+@pindex wctype.h
+This function is declared in @file{wctype.h}.
+@end deftypefun
+
+@cindex alphabetic character
+@comment wctype.h
+@comment ISO
+@deftypefun int iswalpha (wint_t @var{wc})
+Returns true if @var{wc} is an alphabetic character (a letter). If
+@code{iswlower} or @code{iswupper} is true of a character, then
+@code{iswalpha} is also true.
+
+In some locales, there may be additional characters for which
+@code{iswalpha} is true---letters which are neither upper case nor lower
+case. But in the standard @code{"C"} locale, there are no such
+additional characters.
+
+@noindent
+This function can be implemented using
+
+@smallexample
+iswctype (wc, wctype ("alpha"))
+@end smallexample
+
+@pindex wctype.h
+This function is declared in @file{wctype.h}.
+@end deftypefun
+
+@cindex control character
+@comment wctype.h
+@comment ISO
+@deftypefun int iswcntrl (wint_t @var{wc})
+Returns true if @var{wc} is a control character (that is, a character that
+is not a printing character).
+
+@noindent
+This function can be implemented using
+
+@smallexample
+iswctype (wc, wctype ("cntrl"))
+@end smallexample
+
+@pindex wctype.h
+This function is declared in @file{wctype.h}.
+@end deftypefun
+
+@cindex digit character
+@comment wctype.h
+@comment ISO
+@deftypefun int iswdigit (wint_t @var{wc})
+Returns true if @var{wc} is a digit (e.g., @samp{0} through @samp{9}).
+Please note that this function returns a nonzero value not only for
+@emph{decimal} digits, but for all kinds of digits. A consequence is
+that code like the following will @strong{not} work unconditionally for
+wide characters:
+
+@smallexample
+n = 0;
+while (iswdigit (*wc))
+  @{
+    n *= 10;
+    n += *wc++ - L'0';
+  @}
+@end smallexample
+
+@noindent
+This function can be implemented using
+
+@smallexample
+iswctype (wc, wctype ("digit"))
+@end smallexample
+
+@pindex wctype.h
+This function is declared in @file{wctype.h}.
+@end deftypefun
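One way to sidestep the pitfall described for @code{iswdigit} is to test explicitly for the range @code{L'0'} through @code{L'9'}. The following sketch assumes, as holds in the GNU C library, that these wide characters have consecutive values. @code{parse_decimal} is a hypothetical helper and performs no overflow checking.

```c
#include <wchar.h>

/* Parse a run of decimal digits starting at *PTR and advance *PTR past
   them.  Only L'0' through L'9' are accepted, so the subtraction below
   always yields a value from 0 to 9 (assuming consecutive digit codes).
   No overflow checking is done.  */
static unsigned long
parse_decimal (const wchar_t **ptr)
{
  const wchar_t *wc = *ptr;
  unsigned long n = 0;
  while (*wc >= L'0' && *wc <= L'9')
    {
      n *= 10;
      n += *wc++ - L'0';
    }
  *ptr = wc;
  return n;
}
```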
+
+@cindex graphic character
+@comment wctype.h
+@comment ISO
+@deftypefun int iswgraph (wint_t @var{wc})
+Returns true if @var{wc} is a graphic character; that is, a character
+that has a glyph associated with it. The whitespace characters are not
+considered graphic.
+
+@noindent
+This function can be implemented using
+
+@smallexample
+iswctype (wc, wctype ("graph"))
+@end smallexample
+
+@pindex wctype.h
+This function is declared in @file{wctype.h}.
+@end deftypefun
+
+@cindex lower-case character
+@comment wctype.h
+@comment ISO
+@deftypefun int iswlower (wint_t @var{wc})
+Returns true if @var{wc} is a lower-case letter. The letter need not be
+from the Latin alphabet; any alphabet that can be represented is valid.
+
+@noindent
+This function can be implemented using
+
+@smallexample
+iswctype (wc, wctype ("lower"))
+@end smallexample
+
+@pindex wctype.h
+This function is declared in @file{wctype.h}.
+@end deftypefun
+
+@cindex printing character
+@comment wctype.h
+@comment ISO
+@deftypefun int iswprint (wint_t @var{wc})
+Returns true if @var{wc} is a printing character. Printing characters
+include all the graphic characters, plus the space (@samp{ }) character.
+
+@noindent
+This function can be implemented using
+
+@smallexample
+iswctype (wc, wctype ("print"))
+@end smallexample
+
+@pindex wctype.h
+This function is declared in @file{wctype.h}.
+@end deftypefun
+
+@cindex punctuation character
+@comment wctype.h
+@comment ISO
+@deftypefun int iswpunct (wint_t @var{wc})
+Returns true if @var{wc} is a punctuation character.
+This means any printing character that is not alphanumeric or a space
+character.
+
+@noindent
+This function can be implemented using
+
+@smallexample
+iswctype (wc, wctype ("punct"))
+@end smallexample
+
+@pindex wctype.h
+This function is declared in @file{wctype.h}.
+@end deftypefun
+
+@cindex whitespace character
+@comment wctype.h
+@comment ISO
+@deftypefun int iswspace (wint_t @var{wc})
+Returns true if @var{wc} is a @dfn{whitespace} character. In the standard
+@code{"C"} locale, @code{iswspace} returns true for only the standard
+whitespace characters:
+
+@table @code
+@item L' '
+space
+
+@item L'\f'
+formfeed
+
+@item L'\n'
+newline
+
+@item L'\r'
+carriage return
+
+@item L'\t'
+horizontal tab
+
+@item L'\v'
+vertical tab
+@end table
+
+@noindent
+This function can be implemented using
+
+@smallexample
+iswctype (wc, wctype ("space"))
+@end smallexample
+
+@pindex wctype.h
+This function is declared in @file{wctype.h}.
+@end deftypefun
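A typical use of @code{iswspace} is skipping leading whitespace in a wide string. This is a minimal sketch; the helper name @code{skip_space} is hypothetical, and which characters count as whitespace outside the @code{"C"} locale depends on the locale.

```c
#include <wchar.h>
#include <wctype.h>

/* Return a pointer to the first character of STR that iswspace does
   not classify as whitespace in the current locale.  Stops at the
   terminating null wide character.  */
static const wchar_t *
skip_space (const wchar_t *str)
{
  while (*str != L'\0' && iswspace ((wint_t) *str))
    ++str;
  return str;
}
```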
+
+@cindex upper-case character
+@comment wctype.h
+@comment ISO
+@deftypefun int iswupper (wint_t @var{wc})
+Returns true if @var{wc} is an upper-case letter. The letter need not be
+from the Latin alphabet; any alphabet that can be represented is valid.
+
+@noindent
+This function can be implemented using
+
+@smallexample
+iswctype (wc, wctype ("upper"))
+@end smallexample
+
+@pindex wctype.h
+This function is declared in @file{wctype.h}.
+@end deftypefun
+
+@cindex hexadecimal digit character
+@comment wctype.h
+@comment ISO
+@deftypefun int iswxdigit (wint_t @var{wc})
+Returns true if @var{wc} is a hexadecimal digit.
+Hexadecimal digits include the normal decimal digits @samp{0} through
+@samp{9} and the letters @samp{A} through @samp{F} and
+@samp{a} through @samp{f}.
+
+@noindent
+This function can be implemented using
+
+@smallexample
+iswctype (wc, wctype ("xdigit"))
+@end smallexample
+
+@pindex wctype.h
+This function is declared in @file{wctype.h}.
+@end deftypefun
+
+The GNU C library also provides a function which is not defined in the
+@w{ISO C} standard but which is also available as a version for
+single-byte characters.
+
+@cindex blank character
+@comment wctype.h
+@comment GNU
+@deftypefun int iswblank (wint_t @var{wc})
+Returns true if @var{wc} is a blank character; that is, a space or a tab.
+This function is a GNU extension. It is declared in @file{wchar.h}.
+@end deftypefun
+
+@node Using Wide Char Classes, Wide Character Case Conversion, Classification of Wide Characters, Character Handling
+@section Notes on using the wide character classes
+
+The first note is probably not astonishing, but it is still occasionally
+a cause of problems. The @code{isw@var{XXX}} functions can be implemented
+using macros, and in fact the GNU C library does this. They are still
+available as real functions, but when the @file{wctype.h} header is
+included the macros will be used. This is nothing new compared to the
+@code{char} type versions of these functions.
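Whether or not @file{wctype.h} defines these names as macros, standard C offers two ways to reach the real functions, sketched below: use the name without a following parenthesis (for example to store a function pointer), or wrap the name in parentheses when calling it. The table @code{classifiers} and the helpers @code{classify} and @code{is_alpha_function} are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Signature shared by the isw... classification functions.  */
typedef int (*wclass_fn) (wint_t);

/* A function-like macro is only expanded when its name is followed by
   an opening parenthesis, so these initializers always take the
   addresses of the real functions.  */
static const wclass_fn classifiers[] = { iswalpha, iswdigit, iswspace };

/* Apply the classifier with index WHICH (0, 1 or 2) to WC.  */
static int
classify (size_t which, wint_t wc)
{
  return classifiers[which] (wc);
}

/* Parenthesizing the name also suppresses macro expansion and forces
   a call to the real iswalpha function.  */
static int
is_alpha_function (wint_t wc)
{
  return (iswalpha) (wc);
}
```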
+
+The second note covers something which is new. It can best be
+illustrated by a (real-world) example. The first piece of code is an
+excerpt from the original code. It is truncated a bit, but the intention
+should be clear.
+
+@smallexample
+int
+is_in_class (int c, const char *class)
+@{
+ if (strcmp (class, "alnum") == 0)
+ return isalnum (c);
+ if (strcmp (class, "alpha") == 0)
+ return isalpha (c);
+ if (strcmp (class, "cntrl") == 0)
+ return iscntrl (c);
+ ...
+ return 0;
+@}
+@end smallexample
+
+Now, with the @code{wctype} and @code{iswctype} functions, one could avoid
+the @code{if} cascades. But rewriting the code as follows is wrong:
+
+@smallexample
+int
+is_in_class (int c, const char *class)
+@{
+ wctype_t desc = wctype (class);
+ return desc ? iswctype ((wint_t) c, desc) : 0;
+@}
+@end smallexample
+
+The problem is that it is not guaranteed that the wide character
+representation of a single-byte character can be found by casting.
+In fact, this usually fails miserably. The correct solution for this
+problem is to write the code as follows:
+
+@smallexample
+int
+is_in_class (int c, const char *class)
+@{
+ wctype_t desc = wctype (class);
+ return desc ? iswctype (btowc (c), desc) : 0;
+@}
+@end smallexample
+
+@xref{Converting a Character}, for more information on @code{btowc}.
+Please note that this change probably does not improve the performance
+of the program much, since the @code{wctype} function still has to make
+the string comparisons. It only becomes really interesting if the
+@code{is_in_class} function is called more than once with the
+same class name. In that case the variable @var{desc} could be computed
+once and reused for all the calls. Therefore the above form of the
+function is probably not the final one.
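A possible next step along the lines suggested above is to look up the class once and reuse the descriptor for a whole string. The sketch below is one hedged variant, not a prescribed form: @code{count_bytes_in_class} is a hypothetical name, and it converts each byte with @code{btowc} rather than a cast, treating @code{WEOF} as a non-member.

```c
#include <wchar.h>
#include <wctype.h>

/* Count the bytes of the single-byte string STR that belong to the
   character class named CLASS.  The class name is looked up only once
   and the descriptor is reused for every byte.  Return -1 if CLASS is
   unknown in the current locale.  */
static long
count_bytes_in_class (const char *str, const char *class)
{
  wctype_t desc = wctype (class);       /* Computed once.  */
  if (desc == 0)
    return -1;

  long n = 0;
  for (; *str != '\0'; ++str)
    {
      /* btowc expects the byte as an unsigned char value.  */
      wint_t wc = btowc ((unsigned char) *str);
      /* WEOF means the byte is not a valid single-byte character.  */
      if (wc != WEOF && iswctype (wc, desc))
        ++n;
    }
  return n;
}
```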
+
+
+@node Wide Character Case Conversion, , Using Wide Char Classes, Character Handling
+@section Mapping of wide characters
+
+As for the classification functions, the @w{ISO C} standard also
+generalizes the mapping functions. Instead of allowing only the two
+standard mappings, the locale can contain others. Again, the
+@code{localedef} program already supports generating such locale data
+files.
+
+@comment wctype.h
+@comment ISO
+@deftp {Data Type} wctrans_t
+This data type is defined as a scalar type which can hold a value
+representing the locale-dependent character mapping. There is no way to
+construct such a value besides using the return value of the
+@code{wctrans} function.
+
+@pindex wctype.h
+@noindent
+This type is defined in @file{wctype.h}.
+@end deftp
+
+@comment wctype.h
+@comment ISO
+@deftypefun wctrans_t wctrans (const char *@var{property})
+The @code{wctrans} function has to be used to find out whether a named
+mapping is defined in the locale currently selected for the
+@code{LC_CTYPE} category. If the returned value is nonzero, it can
+afterwards be used in calls to @code{towctrans}. If the return value is
+zero, no such mapping is known in the current locale.
+
+Besides locale-specific mappings, there are two mappings which are
+guaranteed to be available in every locale:
+
+@multitable @columnfractions .5 .5
+@item
+@code{"tolower"} @tab @code{"toupper"}
+@end multitable
+
+@pindex wctype.h
+@noindent
+This function is declared in @file{wctype.h}.
+@end deftypefun
+
+@comment wctype.h
+@comment ISO
+@deftypefun wint_t towctrans (wint_t @var{wc}, wctrans_t @var{desc})
+The @code{towctrans} function maps the input character @var{wc}
+according to the rules of the mapping for which @var{desc} is a
+descriptor, and returns the resulting value. The @var{desc} value must be
+obtained by a successful call to @code{wctrans}.
+
+@pindex wctype.h
+@noindent
+This function is declared in @file{wctype.h}.
+@end deftypefun
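The following sketch shows @code{wctrans} and @code{towctrans} used together to apply a mapping, named at run time, to a whole wide string. The helper @code{map_wide_string} is hypothetical; the descriptor is computed once per call and reused for every character.

```c
#include <wchar.h>
#include <wctype.h>

/* Apply the mapping named MAPPING (for example "toupper") in place to
   the wide string STR.  Return 0 on success, or -1 if the current
   LC_CTYPE locale does not define a mapping with that name.  */
static int
map_wide_string (wchar_t *str, const char *mapping)
{
  wctrans_t desc = wctrans (mapping);
  if (desc == 0)
    return -1;

  for (; *str != L'\0'; ++str)
    *str = (wchar_t) towctrans ((wint_t) *str, desc);
  return 0;
}
```

Because @code{"toupper"} and @code{"tolower"} exist in every locale, only a locale-specific mapping name can make this helper fail.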
+
+The @w{ISO C} standard also defines convenient shortcuts for the
+generally available mappings, so that it is not necessary to call
+@code{wctrans} for them.
+
+@comment wctype.h
+@comment ISO
+@deftypefun wint_t towlower (wint_t @var{wc})
+If @var{wc} is an upper-case letter, @code{towlower} returns the corresponding
+lower-case letter. If @var{wc} is not an upper-case letter,
+@var{wc} is returned unchanged.
+
+@pindex wctype.h
+@noindent
+This function is declared in @file{wctype.h}.
+@end deftypefun
+
+@comment wctype.h
+@comment ISO
+@deftypefun wint_t towupper (wint_t @var{wc})
+If @var{wc} is a lower-case letter, @code{towupper} returns the corresponding
+upper-case letter. Otherwise @var{wc} is returned unchanged.
+
+@pindex wctype.h
+@noindent
+This function is declared in @file{wctype.h}.
+@end deftypefun
+
+The same warnings given in the last section for the use of the wide
+character classification functions apply here. It is not possible to
+simply cast a @code{char} type value to a @code{wint_t} and use it as an
+argument for @code{towctrans} calls.
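Following that warning, a @code{char} value should first be converted with @code{btowc}, as in this minimal sketch. The helper name @code{byte_to_upper_wide} is hypothetical, and for brevity it looks up the @code{"toupper"} mapping on every call; code that converts many characters would compute the descriptor once.

```c
#include <stdio.h>   /* For EOF.  */
#include <wchar.h>
#include <wctype.h>

/* Map the single-byte character C (an unsigned char value or EOF) to
   its upper-case wide character.  btowc performs the conversion that a
   plain cast cannot be relied upon to do.  Return WEOF if C is EOF or
   not a valid single-byte character.  */
static wint_t
byte_to_upper_wide (int c)
{
  wint_t wc = btowc (c);
  if (wc == WEOF)
    return WEOF;
  /* "toupper" is one of the two mappings available in every locale.  */
  return towctrans (wc, wctrans ("toupper"));
}
```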