Diffstat (limited to 'manual/=float.texinfo')
-rw-r--r-- manual/=float.texinfo 72
1 files changed, 35 insertions, 37 deletions
diff --git a/manual/=float.texinfo b/manual/=float.texinfo
index a8c901542e..d4e3920f8c 100644
--- a/manual/=float.texinfo
+++ b/manual/=float.texinfo
@@ -1,4 +1,4 @@
-@node Floating-Point Limits
+@node Floating-Point Limits
@chapter Floating-Point Limits
@pindex <float.h>
@cindex floating-point number representation
@@ -75,7 +75,7 @@ unsigned quantity.
@cindex mantissa (of floating-point number)
@cindex significand (of floating-point number)
-@item
+@item
The @dfn{precision} of the mantissa. If the base of the representation
is @var{b}, then the precision is the number of base-@var{b} digits in
the mantissa. This is a constant for the particular representation.
@@ -124,14 +124,14 @@ expression, so the other macros listed here cannot be reliably used in
places that require constant expressions, such as @samp{#if}
preprocessing directives and array size specifications.
-Although the ANSI C standard specifies minimum and maximum values for
+Although the @w{ISO C} standard specifies minimum and maximum values for
most of these parameters, the GNU C implementation uses whatever
floating-point representations are supported by the underlying hardware.
-So whether GNU C actually satisfies the ANSI C requirements depends on
+So whether GNU C actually satisfies the @w{ISO C} requirements depends on
what machine it is running on.
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro FLT_ROUNDS
This value characterizes the rounding mode for floating-point addition.
The following values indicate standard rounding modes:
@@ -155,7 +155,7 @@ mode.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro FLT_RADIX
This is the value of the base, or radix, of exponent representation.
This is guaranteed to be a constant expression, unlike the other macros
@@ -163,28 +163,28 @@ described in this section.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro FLT_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{float} data type.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro DBL_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{double} data type.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro LDBL_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{long double} data type.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro FLT_DIG
This is the number of decimal digits of precision for the @code{float}
data type. Technically, if @var{p} and @var{b} are the precision and
@@ -198,14 +198,14 @@ The value of this macro is guaranteed to be at least @code{6}.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro DBL_DIG
This is similar to @code{FLT_DIG}, but is for the @code{double} data
type. The value of this macro is guaranteed to be at least @code{10}.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro LDBL_DIG
This is similar to @code{FLT_DIG}, but is for the @code{long double}
data type. The value of this macro is guaranteed to be at least
@@ -213,7 +213,7 @@ data type. The value of this macro is guaranteed to be at least
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro FLT_MIN_EXP
This is the minimum negative integer such that the mathematical value
@code{FLT_RADIX} raised to this power minus 1 can be represented as a
@@ -223,21 +223,21 @@ represented in the exponent field of the number.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro DBL_MIN_EXP
This is similar to @code{FLT_MIN_EXP}, but is for the @code{double} data
type.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro LDBL_MIN_EXP
This is similar to @code{FLT_MIN_EXP}, but is for the @code{long double}
data type.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro FLT_MIN_10_EXP
This is the minimum negative integer such that the mathematical value
@code{10} raised to this power minus 1 can be represented as a
@@ -246,14 +246,14 @@ guaranteed to be no greater than @code{-37}.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro DBL_MIN_10_EXP
This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{double}
data type.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro LDBL_MIN_10_EXP
This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{long
double} data type.
@@ -262,7 +262,7 @@ double} data type.
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro FLT_MAX_EXP
This is the maximum negative integer such that the mathematical value
@code{FLT_RADIX} raised to this power minus 1 can be represented as a
@@ -272,21 +272,21 @@ in the exponent field of the number.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro DBL_MAX_EXP
This is similar to @code{FLT_MAX_EXP}, but is for the @code{double} data
type.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro LDBL_MAX_EXP
This is similar to @code{FLT_MAX_EXP}, but is for the @code{long double}
data type.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro FLT_MAX_10_EXP
This is the maximum negative integer such that the mathematical value
@code{10} raised to this power minus 1 can be represented as a
@@ -295,14 +295,14 @@ guaranteed to be at least @code{37}.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro DBL_MAX_10_EXP
This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{double}
data type.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro LDBL_MAX_10_EXP
This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{long
double} data type.
@@ -310,7 +310,7 @@ double} data type.
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro FLT_MAX
The value of this macro is the maximum representable floating-point
number of type @code{float}, and is guaranteed to be at least
@@ -318,7 +318,7 @@ number of type @code{float}, and is guaranteed to be at least
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro DBL_MAX
The value of this macro is the maximum representable floating-point
number of type @code{double}, and is guaranteed to be at least
@@ -326,7 +326,7 @@ number of type @code{double}, and is guaranteed to be at least
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro LDBL_MAX
The value of this macro is the maximum representable floating-point
number of type @code{long double}, and is guaranteed to be at least
@@ -335,7 +335,7 @@ number of type @code{long double}, and is guaranteed to be at least
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro FLT_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{float}, and is
@@ -343,7 +343,7 @@ guaranteed to be no more than @code{1E-37}.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro DBL_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{double}, and
@@ -351,7 +351,7 @@ is guaranteed to be no more than @code{1E-37}.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro LDBL_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{long double},
@@ -360,7 +360,7 @@ and is guaranteed to be no more than @code{1E-37}.
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro FLT_EPSILON
This is the minimum positive floating-point number of type @code{float}
such that @code{1.0 + FLT_EPSILON != 1.0} is true. It's guaranteed to
@@ -368,14 +368,14 @@ be no greater than @code{1E-5}.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro DBL_EPSILON
This is similar to @code{FLT_EPSILON}, but is for the @code{double}
type. The maximum value is @code{1E-9}.
@end defvr
@comment float.h
-@comment ANSI
+@comment ISO
@defvr Macro LDBL_EPSILON
This is similar to @code{FLT_EPSILON}, but is for the @code{long double}
type. The maximum value is @code{1E-9}.
@@ -388,7 +388,8 @@ type. The maximum value is @code{1E-9}.
Here is an example showing how these parameters work for a common
floating point representation, specified by the @cite{IEEE Standard for
-Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985)}.
+Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985 or ANSI/IEEE
+Std 854-1987)}.
The IEEE single-precision float representation uses a base of 2. There
is a sign bit, a mantissa with 23 bits plus one hidden bit (so the total
@@ -411,6 +412,3 @@ FLT_MIN 1.17549435E-38F
FLT_MAX 3.40282347E+38F
FLT_EPSILON 1.19209290E-07F
@end example
-
-
-