initial import

author: Roland McGrath <roland@gnu.org> 1995-02-18 01:27:10 +0000
committer: Roland McGrath <roland@gnu.org> 1995-02-18 01:27:10 +0000
commit: 28f540f45bbacd939bfd07f213bcad2bf730b1bf (patch)
tree: 15f07c4c43d635959c6afee96bde71fb1b3614ee /manual/=limits.texinfo
1 files changed, 593 insertions, 0 deletions
diff --git a/manual/=limits.texinfo b/manual/=limits.texinfo
new file mode 100644
index 0000000000..3e384dd6b6
--- /dev/null
+++ b/manual/=limits.texinfo
@@ -0,0 +1,593 @@
+@node Representation Limits, System Configuration Limits, System Information, Top
+@chapter Representation Limits
+
+This chapter contains information about constants and parameters that
+characterize the representation of the various integer and
+floating-point types supported by the GNU C library.
+
+@menu
+* Integer Representation Limits::       Determining maximum and minimum
+                                         representation values of
+                                         various integer subtypes.
+* Floating-Point Limits ::              Parameters which characterize
+                                         supported floating-point
+                                         representations on a particular
+                                         system. 
+@end menu
+
+@node Integer Representation Limits, Floating-Point Limits ,  , Representation Limits
+@section Integer Representation Limits
+@cindex integer representation limits
+@cindex representation limits, integer
+@cindex limits, integer representation
+
+Sometimes it is necessary for programs to know about the internal
+representation of various integer subtypes.  For example, if you want
+your program to be careful not to overflow an @code{int} counter
+variable, you need to know the largest value that an @code{int} can
+represent.  These kinds of parameters can vary from compiler to
+compiler and machine to machine.  Another typical use of
+this kind of parameter is in conditionalizing data structure definitions
+with @samp{#ifdef} to select the most appropriate integer subtype that
+can represent the required range of values.
+
+Macros representing the minimum and maximum limits of the integer types
+are defined in the header file @file{limits.h}.  The values of these
+macros are all integer constant expressions.
+@pindex limits.h
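+
+As a minimal sketch of both uses, the following fragment picks a counter
+type with @samp{#if} and guards an increment against overflow.  The
+names @code{COUNTER_LIMIT}, @code{COUNTER_MAX}, @code{counter_t} and
+@code{counter_increment} are invented for this illustration and are not
+part of the library.
+

```c
#include <limits.h>

/* Hypothetical requirement for this sketch: the counter must be able
   to hold values up to one million.  */
#define COUNTER_LIMIT 1000000L

/* Select the narrowest standard type that covers COUNTER_LIMIT.
   INT_MAX is an integer constant expression, so it can be tested in
   an #if directive.  */
#if INT_MAX >= COUNTER_LIMIT
typedef int counter_t;
#define COUNTER_MAX INT_MAX
#else
typedef long int counter_t;
#define COUNTER_MAX LONG_MAX
#endif

/* Add one to COUNT, saturating at COUNTER_MAX instead of overflowing.  */
counter_t
counter_increment (counter_t count)
{
  return count < COUNTER_MAX ? count + 1 : count;
}
```

+
+A caller would write @code{count = counter_increment (count);} and the
+counter then stops at @code{COUNTER_MAX} rather than overflowing.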
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro int CHAR_BIT
+This is the number of bits in a @code{char}, usually eight.
+@end deftypevr
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro int SCHAR_MIN
+This is the minimum value that can be represented by a @code{signed char}.
+@end deftypevr
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro int SCHAR_MAX
+This is the maximum value that can be represented by a @code{signed char}.
+@end deftypevr
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro int UCHAR_MAX
+This is the maximum value that can be represented by an @code{unsigned char}.
+(The minimum value of an @code{unsigned char} is zero.)
+@end deftypevr
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro int CHAR_MIN
+This is the minimum value that can be represented by a @code{char}.
+It's equal to @code{SCHAR_MIN} if @code{char} is signed, or zero
+otherwise.
+@end deftypevr
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro int CHAR_MAX
+This is the maximum value that can be represented by a @code{char}.
+It's equal to @code{SCHAR_MAX} if @code{char} is signed, or
+@code{UCHAR_MAX} otherwise.
+@end deftypevr
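+
+Because @code{CHAR_MIN} is zero exactly when @code{char} is unsigned, a
+program can test the signedness of plain @code{char} directly.  The
+function name @code{char_is_signed} below is an assumption made for
+this sketch, not a library facility.
+

```c
#include <limits.h>

/* Return nonzero if plain char behaves as a signed type on this
   system.  CHAR_MIN equals SCHAR_MIN (a negative value) when char is
   signed and zero otherwise.  */
int
char_is_signed (void)
{
  return CHAR_MIN < 0;
}
```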
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro int SHRT_MIN
+This is the minimum value that can be represented by a @code{signed
+short int}.  On most machines that the GNU C library runs on,
+@code{short} integers are 16-bit quantities.
+@end deftypevr
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro int SHRT_MAX
+This is the maximum value that can be represented by a @code{signed
+short int}.
+@end deftypevr
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro int USHRT_MAX
+This is the maximum value that can be represented by an @code{unsigned
+short int}.  (The minimum value of an @code{unsigned short int} is zero.)
+@end deftypevr
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro int INT_MIN
+This is the minimum value that can be represented by a @code{signed
+int}.  On most machines that the GNU C system runs on, an @code{int} is
+a 32-bit quantity.
+@end deftypevr
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro int INT_MAX
+This is the maximum value that can be represented by a @code{signed
+int}.
+@end deftypevr
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro {unsigned int} UINT_MAX
+This is the maximum value that can be represented by an @code{unsigned
+int}.  (The minimum value of an @code{unsigned int} is zero.)
+@end deftypevr
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro {long int} LONG_MIN
+This is the minimum value that can be represented by a @code{signed long
+int}.  On most machines that the GNU C system runs on, @code{long}
+integers are 32-bit quantities, the same size as @code{int}.
+@end deftypevr
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro {long int} LONG_MAX
+This is the maximum value that can be represented by a @code{signed long
+int}.
+@end deftypevr
+
+@comment limits.h
+@comment ANSI
+@deftypevr Macro {unsigned long int} ULONG_MAX
+This is the maximum value that can be represented by an @code{unsigned
+long int}.  (The minimum value of an @code{unsigned long int} is zero.)
+@end deftypevr
+
+@strong{Incomplete:}  There should be corresponding limits for the GNU
+C Compiler's @code{long long} type, too.  (But they are not now present
+in the header file.)
+
+The header file @file{limits.h} also defines some additional constants
+that parameterize various operating system and file system limits.  These
+constants are described in @ref{System Parameters} and @ref{File System
+Parameters}.
+@pindex limits.h
+
+
+@node Floating-Point Limits ,  , Integer Representation Limits, Representation Limits
+@section Floating-Point Limits
+@cindex floating-point number representation
+@cindex representation, floating-point number
+@cindex limits, floating-point representation
+
+Because floating-point numbers are represented internally as approximate
+quantities, algorithms for manipulating floating-point data often need
+to be parameterized in terms of the accuracy of the representation.
+Some of the functions in the C library itself need this information; for
+example, the algorithms for printing and reading floating-point numbers
+(@pxref{I/O on Streams}) and for calculating trigonometric and
+irrational functions (@pxref{Mathematics}) use information about the
+underlying floating-point representation to avoid round-off error and
+loss of accuracy.  User programs that implement numerical analysis
+techniques also often need to be parameterized in this way in order to
+minimize or compute error bounds.
+
+The specific representation of floating-point numbers varies from
+machine to machine.  The GNU C library defines a set of parameters which
+characterize each of the supported floating-point representations on a
+particular system.
+
+@menu
+* Floating-Point Representation::       Definitions of terminology.
+* Floating-Point Parameters::           Descriptions of the library
+                                         facilities. 
+* IEEE Floating Point::                 An example of a common
+                                         representation. 
+@end menu
+
+@node Floating-Point Representation, Floating-Point Parameters,  , Floating-Point Limits
+@subsection Floating-Point Representation
+
+This section introduces the terminology used to characterize the
+representation of floating-point numbers.
+
+You are probably already familiar with most of these concepts in terms
+of scientific or exponential notation for floating-point numbers.  For
+example, the number @code{123456.0} could be expressed in exponential
+notation as @code{1.23456e+05}, a shorthand notation indicating that the
+mantissa @code{1.23456} is multiplied by the base @code{10} raised to
+the power @code{5}.
+
+More formally, the internal representation of a floating-point number
+can be characterized in terms of the following parameters:
+
+@itemize @bullet
+@item
+The @dfn{sign} is either @code{-1} or @code{1}.
+@cindex sign (of floating-point number)
+
+@item
+The @dfn{base} or @dfn{radix} for exponentiation; an integer greater
+than @code{1}.  This is a constant for the particular representation.
+@cindex base (of floating-point number)
+@cindex radix (of floating-point number)
+
+@item
+The @dfn{exponent} to which the base is raised.  The upper and lower
+bounds of the exponent value are constants for the particular
+representation.
+@cindex exponent (of floating-point number)
+
+Sometimes, in the actual bits representing the floating-point number,
+the exponent is @dfn{biased} by adding a constant to it, to make it
+always be represented as an unsigned quantity.  This is only important
+if you have some reason to pick apart the bit fields making up the
+floating-point number by hand, which is something for which the GNU
+library provides no support.  So this is ignored in the discussion that
+follows.
+@cindex bias (of floating-point number exponent)
+
+@item
+The value of the @dfn{mantissa} or @dfn{significand}, which is an
+unsigned integer.
+@cindex mantissa (of floating-point number)
+@cindex significand (of floating-point number)
+
+@item 
+The @dfn{precision} of the mantissa.  If the base of the representation
+is @var{b}, then the precision is the number of base-@var{b} digits in
+the mantissa.  This is a constant for the particular representation.
+
+Many floating-point representations have an implicit @dfn{hidden bit} in
+the mantissa.  Any such hidden bits are counted in the precision.
+Again, the GNU library provides no facilities for dealing with such low-level
+aspects of the representation.
+@cindex precision (of floating-point number)
+@cindex hidden bit (of floating-point number mantissa)
+@end itemize
+
+The mantissa of a floating-point number actually represents an implicit
+fraction whose denominator is the base raised to the power of the
+precision.  Since the largest representable mantissa is one less than
+this denominator, the value of the fraction is always strictly less than
+@code{1}.  The mathematical value of a floating-point number is then the
+product of this fraction, the sign, and the base raised to the exponent.
+
+If the floating-point number is @dfn{normalized}, the mantissa is also
+greater than or equal to the base raised to the power of one less
+than the precision (unless the number represents a floating-point zero,
+in which case the mantissa is zero).  The fractional quantity is
+therefore greater than or equal to @code{1/@var{b}}, where @var{b} is
+the base.
+@cindex normalized floating-point number
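+
+The relationship between the fraction, the exponent and the value can
+be illustrated with a short function.  This is only a sketch:
+@code{split_value} is a hypothetical helper rather than a library
+facility, it works on @code{double} values, it assumes a finite
+argument, and its arithmetic is exact only when the base is 2.  The
+standard @code{frexp} function declared in @file{math.h} performs a
+comparable split for base 2.
+

```c
#include <float.h>

/* Split X into a fraction and an exponent such that
     x == fraction * FLT_RADIX ^ exponent
   with 1/FLT_RADIX <= |fraction| < 1 for nonzero X, which is the
   normalized form described above.  Zero yields fraction 0 and
   exponent 0.  The fraction is returned and the exponent is stored
   in *EXPONENT.  */
double
split_value (double x, int *exponent)
{
  int e = 0;
  double magnitude = x < 0 ? -x : x;

  if (magnitude != 0.0)
    {
      /* Scale down until the fraction is below 1.  */
      while (magnitude >= 1.0)
        {
          magnitude /= FLT_RADIX;
          e++;
        }
      /* Scale up until the fraction is at least 1/FLT_RADIX.  */
      while (magnitude < 1.0 / FLT_RADIX)
        {
          magnitude *= FLT_RADIX;
          e--;
        }
    }

  *exponent = e;
  return x < 0 ? -magnitude : magnitude;
}
```

+
+On a base-2 machine, splitting @code{123456.0} yields the fraction
+@code{0.94189453125} and the exponent @code{17}, since 0.94189453125
+times 2 raised to the power 17 is 123456.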
+
+@node Floating-Point Parameters, IEEE Floating Point, Floating-Point Representation, Floating-Point Limits
+@subsection Floating-Point Parameters
+
+@strong{Incomplete:}  This section needs some more concrete examples
+of what these parameters mean and how to use them in a program.
+
+These macro definitions can be accessed by including the header file
+@file{float.h} in your program.
+@pindex float.h
+
+Macro names starting with @samp{FLT_} refer to the @code{float} type,
+while names beginning with @samp{DBL_} refer to the @code{double} type
+and names beginning with @samp{LDBL_} refer to the @code{long double}
+type.  (In implementations that do not support @code{long double} as
+a distinct data type, the values for those constants are the same
+as the corresponding constants for the @code{double} type.)@refill
+@cindex @code{float} representation limits
+@cindex @code{double} representation limits
+@cindex @code{long double} representation limits
+
+Of these macros, only @code{FLT_RADIX} is guaranteed to be a constant
+expression.  The other macros listed here cannot be reliably used in
+places that require constant expressions, such as @samp{#if}
+preprocessing directives or array size specifications.
+
+Although the ANSI C standard specifies minimum and maximum values for
+most of these parameters, the GNU C implementation uses whatever
+floating-point representations are supported by the underlying hardware.
+So whether GNU C actually satisfies the ANSI C requirements depends on
+what machine it is running on.
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int FLT_ROUNDS
+This value characterizes the rounding mode for floating-point addition.
+The following values indicate standard rounding modes:
+
+@table @code
+@item -1
+The mode is indeterminable.
+@item 0
+Rounding is towards zero.
+@item 1
+Rounding is to the nearest number.
+@item 2
+Rounding is towards positive infinity.
+@item 3
+Rounding is towards negative infinity.
+@end table
+
+@noindent
+Any other value represents a machine-dependent nonstandard rounding
+mode.
+@end deftypevr
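+
+A program that reports its floating-point environment might translate
+the value of @code{FLT_ROUNDS} into text along the following lines.
+The function @code{rounding_mode_name} is a name chosen for this
+sketch; the strings simply restate the table above.
+

```c
#include <float.h>

/* Return a description of the rounding mode MODE, where MODE is a
   value such as FLT_ROUNDS.  */
const char *
rounding_mode_name (int mode)
{
  switch (mode)
    {
    case -1:
      return "indeterminable";
    case 0:
      return "toward zero";
    case 1:
      return "to nearest";
    case 2:
      return "toward positive infinity";
    case 3:
      return "toward negative infinity";
    default:
      /* Any other value is a machine-dependent nonstandard mode.  */
      return "machine-dependent";
    }
}
```

+
+A typical call is @code{puts (rounding_mode_name (FLT_ROUNDS));}.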
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int FLT_RADIX
+This is the value of the base, or radix, of exponent representation.
+This is guaranteed to be a constant expression, unlike the other macros
+described in this section.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int FLT_MANT_DIG
+This is the number of base-@code{FLT_RADIX} digits in the floating-point
+mantissa for the @code{float} data type.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int DBL_MANT_DIG
+This is the number of base-@code{FLT_RADIX} digits in the floating-point
+mantissa for the @code{double} data type.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int LDBL_MANT_DIG
+This is the number of base-@code{FLT_RADIX} digits in the floating-point
+mantissa for the @code{long double} data type.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int FLT_DIG
+This is the number of decimal digits of precision for the @code{float}
+data type.  Technically, if @var{p} and @var{b} are the precision and
+base (respectively) for the representation, then the decimal precision
+@var{q} is the maximum number of decimal digits such that any floating
+point number with @var{q} base 10 digits can be rounded to a floating
+point number with @var{p} base @var{b} digits and back again, without
+change to the @var{q} decimal digits.
+
+The value of this macro is guaranteed to be at least @code{6}.
+@end deftypevr
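+
+The round-trip property can be checked with a small test function.
+This is a sketch under stated assumptions: the name
+@code{survives_float_round_trip} is hypothetical, and the input string
+must already be in the form that @samp{%g} produces (no trailing zeros,
+no exponent for moderate magnitudes) so that a plain string comparison
+is meaningful.
+

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return nonzero if the decimal string DECIMAL is unchanged after
   being converted to float and printed again with FLT_DIG significant
   digits.  */
int
survives_float_round_trip (const char *decimal)
{
  char buffer[64];
  /* Round the decimal text to the float type.  */
  float value = (float) strtod (decimal, NULL);

  /* Print it back with FLT_DIG significant decimal digits.  */
  sprintf (buffer, "%.*g", FLT_DIG, value);
  return strcmp (buffer, decimal) == 0;
}
```

+
+With the guaranteed minimum of six digits, a string such as
+@code{"0.123456"} survives the round trip, while a string carrying more
+digits than @code{FLT_DIG} generally does not.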
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int DBL_DIG
+This is similar to @code{FLT_DIG}, but is for the @code{double} data
+type.  The value of this macro is guaranteed to be at least @code{10}.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int LDBL_DIG
+This is similar to @code{FLT_DIG}, but is for the @code{long double}
+data type.  The value of this macro is guaranteed to be at least
+@code{10}.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int FLT_MIN_EXP
+This is the minimum negative integer such that @code{FLT_RADIX} raised
+to one less than this power can be represented as a normalized
+floating-point number of type @code{float}.  In terms of the actual
+implementation, this is just the smallest value that can be
+represented in the exponent field of the number.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int DBL_MIN_EXP
+This is similar to @code{FLT_MIN_EXP}, but is for the @code{double} data
+type.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int LDBL_MIN_EXP
+This is similar to @code{FLT_MIN_EXP}, but is for the @code{long double}
+data type.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int FLT_MIN_10_EXP
+This is the minimum negative integer such that the mathematical value
+@code{10} raised to this power lies within the range of normalized
+floating-point numbers of type @code{float}.  This is
+guaranteed to be no greater than @code{-37}.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int DBL_MIN_10_EXP
+This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{double}
+data type.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int LDBL_MIN_10_EXP
+This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{long
+double} data type.
+@end deftypevr
+
+
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int FLT_MAX_EXP
+This is the maximum integer such that the mathematical value
+@code{FLT_RADIX} raised to one less than this power can be represented
+as a finite floating-point number of type @code{float}.  In terms of the
+actual implementation, this is just the largest value that can be
+represented in the exponent field of the number.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int DBL_MAX_EXP
+This is similar to @code{FLT_MAX_EXP}, but is for the @code{double} data
+type.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int LDBL_MAX_EXP
+This is similar to @code{FLT_MAX_EXP}, but is for the @code{long double}
+data type.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int FLT_MAX_10_EXP
+This is the maximum integer such that the mathematical value
+@code{10} raised to this power lies within the range of finite
+floating-point numbers of type @code{float}.  This is
+guaranteed to be at least @code{37}.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int DBL_MAX_10_EXP
+This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{double}
+data type.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro int LDBL_MAX_10_EXP
+This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{long
+double} data type.
+@end deftypevr
+
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro float FLT_MAX
+The value of this macro is the maximum representable floating-point
+number of type @code{float}, and is guaranteed to be at least
+@code{1E+37}.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro double DBL_MAX
+The value of this macro is the maximum representable floating-point
+number of type @code{double}, and is guaranteed to be at least
+@code{1E+37}.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro {long double} LDBL_MAX
+The value of this macro is the maximum representable floating-point
+number of type @code{long double}, and is guaranteed to be at least
+@code{1E+37}.
+@end deftypevr
+
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro float FLT_MIN
+The value of this macro is the minimum normalized positive
+floating-point number that is representable by type @code{float}, and is
+guaranteed to be no more than @code{1E-37}.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro double DBL_MIN
+The value of this macro is the minimum normalized positive
+floating-point number that is representable by type @code{double}, and
+is guaranteed to be no more than @code{1E-37}.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro {long double} LDBL_MIN
+The value of this macro is the minimum normalized positive
+floating-point number that is representable by type @code{long double},
+and is guaranteed to be no more than @code{1E-37}.
+@end deftypevr
+
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro float FLT_EPSILON
+This is the difference between @code{1.0} and the smallest number of
+type @code{float} that is greater than @code{1.0}, so that
+@code{1.0 + FLT_EPSILON != 1.0} is true.  It's guaranteed to
+be no greater than @code{1E-5}.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro double DBL_EPSILON
+This is similar to @code{FLT_EPSILON}, but is for the @code{double}
+type.  It's guaranteed to be no greater than @code{1E-9}.
+@end deftypevr
+
+@comment float.h
+@comment ANSI
+@deftypevr Macro {long double} LDBL_EPSILON
+This is similar to @code{FLT_EPSILON}, but is for the @code{long double}
+type.  It's guaranteed to be no greater than @code{1E-9}.
+@end deftypevr
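+
+The epsilon value can also be found by experiment, which may help in
+understanding what it measures.  The following is a sketch only:
+@code{measured_float_epsilon} is a hypothetical name, and the loop
+assumes the default round-to-nearest mode (it need not terminate under
+other rounding modes).
+

```c
#include <float.h>

/* Search for the gap between 1.0f and the next larger float by
   repeatedly dividing a candidate by the radix.  */
float
measured_float_epsilon (void)
{
  float epsilon = 1.0f;

  for (;;)
    {
      /* The volatile store forces the sum to be rounded to float even
         on machines that compute in a wider format.  */
      volatile float next = 1.0f + epsilon / FLT_RADIX;

      /* Stop once a smaller candidate no longer changes 1.0f.  */
      if (next == 1.0f)
        break;
      epsilon /= FLT_RADIX;
    }
  return epsilon;
}
```

+
+On a system that follows these assumptions, the returned value compares
+equal to @code{FLT_EPSILON}.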
+
+
+@node IEEE Floating Point,  , Floating-Point Parameters, Floating-Point Limits
+@subsection IEEE Floating Point
+@cindex IEEE floating-point representation 
+@cindex floating-point, IEEE
+@cindex IEEE Std 754
+
+
+Here is an example showing how these parameters work for a common
+floating point representation, specified by the @cite{IEEE Standard for
+Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985)}.  Nearly
+all computers today use this format.
+
+The IEEE single-precision float representation uses a base of 2.  There
+is a sign bit, a mantissa with 23 bits plus one hidden bit (so the total
+precision is 24 base-2 digits), and an 8-bit exponent that can represent
+values in the range -125 to 128, inclusive.
+
+So, for an implementation that uses this representation for the
+@code{float} data type, appropriate values for the corresponding
+parameters are:
+
+@example
+FLT_RADIX                             2
+FLT_MANT_DIG                         24
+FLT_DIG                               6
+FLT_MIN_EXP                        -125
+FLT_MIN_10_EXP                      -37
+FLT_MAX_EXP                         128
+FLT_MAX_10_EXP                       38
+FLT_MIN                 1.17549435E-38F
+FLT_MAX                 3.40282347E+38F
+FLT_EPSILON             1.19209290E-07F
+@end example
+
+Here are the values for the @code{double} data type:
+
+@example
+DBL_MANT_DIG                         53
+DBL_DIG                              15
+DBL_MIN_EXP                       -1021
+DBL_MIN_10_EXP                     -307
+DBL_MAX_EXP                        1024
+DBL_MAX_10_EXP                      308
+DBL_MAX         1.7976931348623157E+308
+DBL_MIN         2.2250738585072014E-308
+DBL_EPSILON      2.2204460492503131E-16
+@end example
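+
+The values in these tables follow from the radix, the precision and the
+exponent range.  The sketch below recomputes three of the @code{float}
+values from the other parameters.  The function names are invented for
+this illustration, and the code assumes that @code{double} has a wider
+range and precision than @code{float}, so that the intermediate results
+are exact.
+

```c
#include <float.h>

/* Return BASE raised to the integer power N; N may be negative.  */
static double
int_power (double base, int n)
{
  double result = 1.0;
  int i;

  for (i = 0; i < n; i++)
    result *= base;
  for (i = 0; i > n; i--)
    result /= base;
  return result;
}

/* Epsilon is the radix raised to the power (1 - precision).  */
double
derived_flt_epsilon (void)
{
  return int_power (FLT_RADIX, 1 - FLT_MANT_DIG);
}

/* The smallest normalized value is the radix raised to the power
   (FLT_MIN_EXP - 1).  */
double
derived_flt_min (void)
{
  return int_power (FLT_RADIX, FLT_MIN_EXP - 1);
}

/* The largest value is the largest fraction, 1 - radix^(-precision),
   times the radix raised to the power FLT_MAX_EXP.  */
double
derived_flt_max (void)
{
  return (1.0 - int_power (FLT_RADIX, -FLT_MANT_DIG))
         * int_power (FLT_RADIX, FLT_MAX_EXP);
}
```

+
+For the IEEE single-precision format, these functions return values
+equal to @code{FLT_EPSILON}, @code{FLT_MIN} and @code{FLT_MAX}
+respectively.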