@node Representation Limits, System Configuration Limits, System Information, Top
@chapter Representation Limits
This chapter contains information about constants and parameters that
characterize the representation of the various integer and
floating-point types supported by the GNU C library.
@menu
* Integer Representation Limits::  Determining maximum and minimum
                                     representation values of
                                     various integer subtypes.
* Floating-Point Limits::          Parameters which characterize
                                     supported floating-point
                                     representations on a particular
                                     system.
@end menu
@node Integer Representation Limits, Floating-Point Limits , , Representation Limits
@section Integer Representation Limits
@cindex integer representation limits
@cindex representation limits, integer
@cindex limits, integer representation
Sometimes it is necessary for programs to know about the internal
representation of various integer subtypes. For example, if you want
your program to be careful not to overflow an @code{int} counter
variable, you need to know the largest value that an @code{int} can
represent. These parameters can vary from compiler to compiler and
from machine to machine. Another typical use of this kind of parameter
is in conditionalizing data structure definitions with @samp{#if} to
select the most appropriate integer subtype that can represent the
required range of values.

Macros representing the minimum and maximum limits of the integer types
are defined in the header file @file{limits.h}. The values of these
macros are all integer constant expressions.
@pindex limits.h
@comment limits.h
@comment ANSI
@deftypevr Macro int CHAR_BIT
This is the number of bits in a @code{char}, usually eight.
@end deftypevr
@comment limits.h
@comment ANSI
@deftypevr Macro int SCHAR_MIN
This is the minimum value that can be represented by a @code{signed char}.
@end deftypevr
@comment limits.h
@comment ANSI
@deftypevr Macro int SCHAR_MAX
This is the maximum value that can be represented by a @code{signed char}.
@end deftypevr
@comment limits.h
@comment ANSI
@deftypevr Macro int UCHAR_MAX
This is the maximum value that can be represented by an @code{unsigned char}.
(The minimum value of an @code{unsigned char} is zero.)
@end deftypevr
@comment limits.h
@comment ANSI
@deftypevr Macro int CHAR_MIN
This is the minimum value that can be represented by a @code{char}.
It's equal to @code{SCHAR_MIN} if @code{char} is signed, or zero
otherwise.
@end deftypevr
@comment limits.h
@comment ANSI
@deftypevr Macro int CHAR_MAX
This is the maximum value that can be represented by a @code{char}.
It's equal to @code{SCHAR_MAX} if @code{char} is signed, or
@code{UCHAR_MAX} otherwise.
@end deftypevr
@comment limits.h
@comment ANSI
@deftypevr Macro int SHRT_MIN
This is the minimum value that can be represented by a @code{signed
short int}. On most machines that the GNU C library runs on,
@code{short} integers are 16-bit quantities.
@end deftypevr
@comment limits.h
@comment ANSI
@deftypevr Macro int SHRT_MAX
This is the maximum value that can be represented by a @code{signed
short int}.
@end deftypevr
@comment limits.h
@comment ANSI
@deftypevr Macro int USHRT_MAX
This is the maximum value that can be represented by an @code{unsigned
short int}. (The minimum value of an @code{unsigned short int} is zero.)
@end deftypevr
@comment limits.h
@comment ANSI
@deftypevr Macro int INT_MIN
This is the minimum value that can be represented by a @code{signed
int}. On most machines that the GNU C system runs on, an @code{int} is
a 32-bit quantity.
@end deftypevr
@comment limits.h
@comment ANSI
@deftypevr Macro int INT_MAX
This is the maximum value that can be represented by a @code{signed
int}.
@end deftypevr
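
As a sketch of the overflow check mentioned at the start of this
section, the following function tests an addition against
@code{INT_MAX} and @code{INT_MIN} before performing it. The name
@code{checked_add} and its calling convention are assumptions made for
this illustration; they are not part of the library.

@example
#include <limits.h>

/* Store a + b in *sum and return 1 if the result fits in an int.
   Return 0 and leave *sum untouched if the addition would overflow.  */
int
checked_add (int a, int b, int *sum)
@{
  if (b > 0 && a > INT_MAX - b)
    return 0;                   /* would exceed INT_MAX */
  if (b < 0 && a < INT_MIN - b)
    return 0;                   /* would fall below INT_MIN */
  *sum = a + b;
  return 1;
@}
@end example

The comparisons are arranged so that neither @code{INT_MAX - b} nor
@code{INT_MIN - b} can itself overflow.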
@comment limits.h
@comment ANSI
@deftypevr Macro {unsigned int} UINT_MAX
This is the maximum value that can be represented by an @code{unsigned
int}. (The minimum value of an @code{unsigned int} is zero.)
@end deftypevr
@comment limits.h
@comment ANSI
@deftypevr Macro {long int} LONG_MIN
This is the minimum value that can be represented by a @code{signed long
int}. On 32-bit machines that the GNU C system runs on, @code{long}
integers are typically 32-bit quantities, the same size as @code{int};
on most 64-bit machines they are 64-bit quantities.
@end deftypevr
@comment limits.h
@comment ANSI
@deftypevr Macro {long int} LONG_MAX
This is the maximum value that can be represented by a @code{signed long
int}.
@end deftypevr
@comment limits.h
@comment ANSI
@deftypevr Macro {unsigned long int} ULONG_MAX
This is the maximum value that can be represented by an @code{unsigned
long int}. (The minimum value of an @code{unsigned long int} is zero.)
@end deftypevr
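
Because these macros are integer constant expressions, they can be
tested in preprocessing directives. Here is a minimal sketch that
selects the narrowest unsigned type able to hold a required range. The
names @code{SAMPLE_MAX} and @code{sample_t} are hypothetical and exist
only for this example.

@example
#include <limits.h>

/* Hypothetical requirement: sample_t must hold 0 through 100000.  */
#define SAMPLE_MAX 100000

#if USHRT_MAX >= SAMPLE_MAX
typedef unsigned short int sample_t;
#elif UINT_MAX >= SAMPLE_MAX
typedef unsigned int sample_t;
#else
typedef unsigned long int sample_t;
#endif
@end example

On a machine where @code{short} integers are 16-bit quantities, the
first test fails and one of the wider types is chosen.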
@strong{Incomplete:} There should be corresponding limits for the GNU
C Compiler's @code{long long} type, too. (But they are not now present
in the header file.)

The header file @file{limits.h} also defines some additional constants
that parameterize various operating system and file system limits. These
constants are described in @ref{System Parameters} and @ref{File System
Parameters}.
@pindex limits.h
@node Floating-Point Limits , , Integer Representation Limits, Representation Limits
@section Floating-Point Limits
@cindex floating-point number representation
@cindex representation, floating-point number
@cindex limits, floating-point representation
Because floating-point numbers are represented internally as approximate
quantities, algorithms for manipulating floating-point data often need
to be parameterized in terms of the accuracy of the representation.
Some of the functions in the C library itself need this information; for
example, the algorithms for printing and reading floating-point numbers
(@pxref{I/O on Streams}) and for calculating trigonometric and
irrational functions (@pxref{Mathematics}) use information about the
underlying floating-point representation to avoid round-off error and
loss of accuracy. User programs that implement numerical analysis
techniques also often need to be parameterized in this way in order to
minimize or compute error bounds.

The specific representation of floating-point numbers varies from
machine to machine. The GNU C library defines a set of parameters which
characterize each of the supported floating-point representations on a
particular system.
@menu
* Floating-Point Representation::  Definitions of terminology.
* Floating-Point Parameters::      Descriptions of the library
                                     facilities.
* IEEE Floating Point::            An example of a common
                                     representation.
@end menu
@node Floating-Point Representation, Floating-Point Parameters, , Floating-Point Limits
@subsection Floating-Point Representation
This section introduces the terminology used to characterize the
representation of floating-point numbers.

You are probably already familiar with most of these concepts in terms
of scientific or exponential notation for floating-point numbers. For
example, the number @code{123456.0} could be expressed in exponential
notation as @code{1.23456e+05}, a shorthand notation indicating that the
mantissa @code{1.23456} is multiplied by the base @code{10} raised to
the power @code{5}.

More formally, the internal representation of a floating-point number
can be characterized in terms of the following parameters:
@itemize @bullet
@item
The @dfn{sign} is either @code{-1} or @code{1}.
@cindex sign (of floating-point number)
@item
The @dfn{base} or @dfn{radix} for exponentiation; an integer greater
than @code{1}. This is a constant for the particular representation.
@cindex base (of floating-point number)
@cindex radix (of floating-point number)
@item
The @dfn{exponent} to which the base is raised. The upper and lower
bounds of the exponent value are constants for the particular
representation.
@cindex exponent (of floating-point number)
Sometimes, in the actual bits representing the floating-point number,
the exponent is @dfn{biased} by adding a constant to it, to make it
always be represented as an unsigned quantity. This is only important
if you have some reason to pick apart the bit fields making up the
floating-point number by hand, which is something for which the GNU
library provides no support. So this is ignored in the discussion that
follows.
@cindex bias (of floating-point number exponent)
@item
The value of the @dfn{mantissa} or @dfn{significand}, which is an
unsigned integer.
@cindex mantissa (of floating-point number)
@cindex significand (of floating-point number)
@item
The @dfn{precision} of the mantissa. If the base of the representation
is @var{b}, then the precision is the number of base-@var{b} digits in
the mantissa. This is a constant for the particular representation.
Many floating-point representations have an implicit @dfn{hidden bit} in
the mantissa. Any such hidden bits are counted in the precision.
Again, the GNU library provides no facilities for dealing with such low-level
aspects of the representation.
@cindex precision (of floating-point number)
@cindex hidden bit (of floating-point number mantissa)
@end itemize
The mantissa of a floating-point number actually represents an implicit
fraction whose denominator is the base raised to the power of the
precision. Since the largest representable mantissa is one less than
this denominator, the value of the fraction is always strictly less than
@code{1}. The mathematical value of a floating-point number is then the
product of this fraction, the sign, and the base raised to the exponent.

If the floating-point number is @dfn{normalized}, the mantissa is also
greater than or equal to the base raised to the power of one less
than the precision (unless the number represents a floating-point zero,
in which case the mantissa is zero). The fractional quantity is
therefore greater than or equal to @code{1/@var{b}}, where @var{b} is
the base.
@cindex normalized floating-point number
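
The sign, fraction and exponent described above can be observed without
examining any bit fields by using the standard function @code{frexp}
from @file{math.h}. The following is only a sketch: the name
@code{decompose} is an assumption made for this example, and because
@code{frexp} always works in base 2, the result matches the model above
only when the base of the representation is 2.

@example
#include <math.h>

/* Split x so that x == sign * fraction * 2^exponent.  For any finite
   nonzero x, the fraction satisfies 1/2 <= fraction < 1.  */
void
decompose (double x, int *sign, double *fraction, int *exponent)
@{
  *sign = (x < 0.0) ? -1 : 1;
  *fraction = frexp (*sign * x, exponent);
@}
@end example

For example, decomposing @code{-12.0} gives a sign of @code{-1}, a
fraction of @code{0.75} and an exponent of @code{4}, since 0.75 times 2
raised to the power 4 is 12.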
@node Floating-Point Parameters, IEEE Floating Point, Floating-Point Representation, Floating-Point Limits
@subsection Floating-Point Parameters
@strong{Incomplete:} This section needs some more concrete examples
of what these parameters mean and how to use them in a program.
These macro definitions can be accessed by including the header file
@file{float.h} in your program.
@pindex float.h
Macro names starting with @samp{FLT_} refer to the @code{float} type,
while names beginning with @samp{DBL_} refer to the @code{double} type
and names beginning with @samp{LDBL_} refer to the @code{long double}
type. (In implementations that do not support @code{long double} as
a distinct data type, the values for those constants are the same
as the corresponding constants for the @code{double} type.)@refill
@cindex @code{float} representation limits
@cindex @code{double} representation limits
@cindex @code{long double} representation limits
Of these macros, only @code{FLT_RADIX} is guaranteed to be a constant
expression. The other macros listed here cannot be reliably used in
places that require constant expressions, such as @samp{#if}
preprocessing directives or array size specifications.

Although the ANSI C standard specifies minimum and maximum values for
most of these parameters, the GNU C implementation uses whatever
floating-point representations are supported by the underlying hardware.
So whether GNU C actually satisfies the ANSI C requirements depends on
what machine it is running on.
@comment float.h
@comment ANSI
@deftypevr Macro int FLT_ROUNDS
This value characterizes the rounding mode for floating-point addition.
The following values indicate standard rounding modes:
@table @code
@item -1
The mode is indeterminable.
@item 0
Rounding is towards zero.
@item 1
Rounding is to the nearest number.
@item 2
Rounding is towards positive infinity.
@item 3
Rounding is towards negative infinity.
@end table
@noindent
Any other value represents a machine-dependent nonstandard rounding
mode.
@end deftypevr
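
The table above can be turned into a small lookup function, sketched
here. The name @code{rounding_mode_name} and the wording of the strings
are assumptions made for this example.

@example
#include <float.h>

/* Return a description of a rounding mode value such as FLT_ROUNDS.  */
const char *
rounding_mode_name (int mode)
@{
  switch (mode)
    @{
    case -1: return "indeterminable";
    case 0:  return "toward zero";
    case 1:  return "to nearest";
    case 2:  return "toward positive infinity";
    case 3:  return "toward negative infinity";
    default: return "machine-dependent";
    @}
@}
@end example

A program would typically call it as
@code{rounding_mode_name (FLT_ROUNDS)}.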
@comment float.h
@comment ANSI
@deftypevr Macro int FLT_RADIX
This is the value of the base, or radix, of exponent representation.
This is guaranteed to be a constant expression, unlike the other macros
described in this section.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{float} data type.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{double} data type.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{long double} data type.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int FLT_DIG
This is the number of decimal digits of precision for the @code{float}
data type. Technically, if @var{p} and @var{b} are the precision and
base (respectively) for the representation, then the decimal precision
@var{q} is the maximum number of decimal digits such that any floating
point number with @var{q} base 10 digits can be rounded to a floating
point number with @var{p} base @var{b} digits and back again, without
change to the @var{q} decimal digits.
The value of this macro is guaranteed to be at least @code{6}.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int DBL_DIG
This is similar to @code{FLT_DIG}, but is for the @code{double} data
type. The value of this macro is guaranteed to be at least @code{10}.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_DIG
This is similar to @code{FLT_DIG}, but is for the @code{long double}
data type. The value of this macro is guaranteed to be at least
@code{10}.
@end deftypevr
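
The round-trip property behind @code{FLT_DIG} can be checked with a
sketch like the following. The function name @code{survives_float} is
hypothetical, and the code assumes that the library provides
@code{strtof} and @code{snprintf}, which are later additions to the C
standard rather than ANSI C facilities. The input is expected to be
written the way @samp{%g} would print it, with no trailing zeros.

@example
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if the decimal string TEXT reads back unchanged after being
   converted to float and printed with FLT_DIG significant digits.  */
int
survives_float (const char *text)
@{
  char buf[64];
  float f = strtof (text, NULL);
  snprintf (buf, sizeof buf, "%.*g", FLT_DIG, f);
  return strcmp (buf, text) == 0;
@}
@end example

A string with no more than @code{FLT_DIG} significant digits, such as
@code{"3.14159"} when @code{FLT_DIG} is 6, survives the trip; a string
with more digits than that generally does not.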
@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MIN_EXP
This is the minimum negative integer such that @code{FLT_RADIX} raised
to the power one less than this integer can be represented as a
normalized floating-point number of type @code{float}. In terms of the
actual implementation, this is just the smallest value that can be
represented in the exponent field of the number.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MIN_EXP
This is similar to @code{FLT_MIN_EXP}, but is for the @code{double} data
type.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MIN_EXP
This is similar to @code{FLT_MIN_EXP}, but is for the @code{long double}
data type.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MIN_10_EXP
This is the minimum negative integer such that the mathematical value
@code{10} raised to this power can be represented as a
normalized floating-point number of type @code{float}. This is
guaranteed to be no greater than @code{-37}.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MIN_10_EXP
This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{double}
data type.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MIN_10_EXP
This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{long
double} data type.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MAX_EXP
This is the maximum integer such that @code{FLT_RADIX} raised to the
power one less than this integer can be represented as a
floating-point number of type @code{float}. In terms of the actual
implementation, this is just the largest value that can be represented
in the exponent field of the number.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MAX_EXP
This is similar to @code{FLT_MAX_EXP}, but is for the @code{double} data
type.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MAX_EXP
This is similar to @code{FLT_MAX_EXP}, but is for the @code{long double}
data type.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MAX_10_EXP
This is the maximum integer such that the mathematical value
@code{10} raised to this power can be represented as a
floating-point number of type @code{float}. This is
guaranteed to be at least @code{37}.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MAX_10_EXP
This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{double}
data type.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MAX_10_EXP
This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{long
double} data type.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro float FLT_MAX
The value of this macro is the maximum representable floating-point
number of type @code{float}, and is guaranteed to be at least
@code{1E+37}.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro double DBL_MAX
The value of this macro is the maximum representable floating-point
number of type @code{double}, and is guaranteed to be at least
@code{1E+37}.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro {long double} LDBL_MAX
The value of this macro is the maximum representable floating-point
number of type @code{long double}, and is guaranteed to be at least
@code{1E+37}.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro float FLT_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{float}, and is
guaranteed to be no more than @code{1E-37}.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro double DBL_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{double}, and
is guaranteed to be no more than @code{1E-37}.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro {long double} LDBL_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{long double},
and is guaranteed to be no more than @code{1E-37}.
@end deftypevr
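
One use of the range macros is to check a value before converting it to
a narrower type. The following sketch tests whether a @code{double}
lies within the range of @code{float}; the name @code{fits_in_float} is
an assumption made for this example.

@example
#include <float.h>

/* Return 1 if x lies between -FLT_MAX and FLT_MAX inclusive, so that
   converting it to float cannot overflow.  Return 0 otherwise,
   including when x is not a number.  */
int
fits_in_float (double x)
@{
  return x >= -FLT_MAX && x <= FLT_MAX;
@}
@end example

This test says nothing about precision: a value that fits may still be
rounded when it is converted, and a nonzero value smaller in magnitude
than @code{FLT_MIN} cannot be represented as a normalized @code{float}.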
@comment float.h
@comment ANSI
@deftypevr Macro float FLT_EPSILON
This is the difference between 1 and the smallest value of type
@code{float} that is greater than 1. It's guaranteed to
be no greater than @code{1E-5}.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro double DBL_EPSILON
This is similar to @code{FLT_EPSILON}, but is for the @code{double}
type. It's guaranteed to be no greater than @code{1E-9}.
@end deftypevr
@comment float.h
@comment ANSI
@deftypevr Macro {long double} LDBL_EPSILON
This is similar to @code{FLT_EPSILON}, but is for the @code{long double}
type. It's guaranteed to be no greater than @code{1E-9}.
@end deftypevr
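
A common use of the epsilon macros is to compare two computed values
with a tolerance scaled to their magnitude. The following is a sketch
of one such test, not a recommendation for every program; the name
@code{nearly_equal} and the @var{factor} argument are assumptions made
for this example.

@example
#include <float.h>

/* Return 1 if a and b differ by no more than FACTOR times DBL_EPSILON
   times the larger of their magnitudes.  */
int
nearly_equal (double a, double b, double factor)
@{
  double diff = a > b ? a - b : b - a;
  double mag_a = a < 0.0 ? -a : a;
  double mag_b = b < 0.0 ? -b : b;
  double scale = mag_a > mag_b ? mag_a : mag_b;
  return diff <= factor * DBL_EPSILON * scale;
@}
@end example

Because the tolerance is relative, this test is strict near zero: a
very small nonzero value is never considered nearly equal to zero
unless @var{factor} is large.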
@node IEEE Floating Point, , Floating-Point Parameters, Floating-Point Limits
@subsection IEEE Floating Point
@cindex IEEE floating-point representation
@cindex floating-point, IEEE
@cindex IEEE Std 754
Here is an example showing how these parameters work for a common
floating-point representation, specified by the @cite{IEEE Standard for
Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985)}. Nearly
all computers today use this format.

The IEEE single-precision float representation uses a base of 2. There
is a sign bit, a mantissa with 23 bits plus one hidden bit (so the total
precision is 24 base-2 digits), and an 8-bit exponent that can represent
values in the range -125 to 128, inclusive.

So, for an implementation that uses this representation for the
@code{float} data type, appropriate values for the corresponding
parameters are:
@example
FLT_RADIX        2
FLT_MANT_DIG     24
FLT_DIG          6
FLT_MIN_EXP      -125
FLT_MIN_10_EXP   -37
FLT_MAX_EXP      128
FLT_MAX_10_EXP   +38
FLT_MIN          1.17549435E-38F
FLT_MAX          3.40282347E+38F
FLT_EPSILON      1.19209290E-07F
@end example
Here are the values for the @code{double} data type:
@example
DBL_MANT_DIG     53
DBL_DIG          15
DBL_MIN_EXP      -1021
DBL_MIN_10_EXP   -307
DBL_MAX_EXP      1024
DBL_MAX_10_EXP   308
DBL_MAX          1.7976931348623157E+308
DBL_MIN          2.2250738585072014E-308
DBL_EPSILON      2.2204460492503131E-016
@end example
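
A program can compare the parameters in @file{float.h} against the
values listed above to guess whether the implementation uses these
formats. This is only a sketch: the function names are assumptions made
for this example, and matching parameters do not prove that the
arithmetic conforms to the IEEE standard in every respect.

@example
#include <float.h>

/* Return 1 if the float parameters match IEEE single precision.  */
int
float_is_ieee_single (void)
@{
  return FLT_RADIX == 2 && FLT_MANT_DIG == 24
         && FLT_MIN_EXP == -125 && FLT_MAX_EXP == 128;
@}

/* Return 1 if the double parameters match IEEE double precision.  */
int
double_is_ieee_double (void)
@{
  return FLT_RADIX == 2 && DBL_MANT_DIG == 53
         && DBL_MIN_EXP == -1021 && DBL_MAX_EXP == 1024;
@}
@end example

The comparisons are made at run time rather than in @samp{#if}
directives because only @code{FLT_RADIX} is guaranteed to be usable in
preprocessing directives.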