
Monday 15 October 2018

Why do we use double instead of int in C++ for the calculation of large numbers?


An ‘int’ is (typically) a 32-bit number - if you use it as an “unsigned” number - then the largest value you can store is about 4 billion (4,294,967,295, to be exact). A signed ‘int’ tops out at about 2 billion.
A ‘float’ is also a 32-bit number, but it uses an approximation to represent numbers. It’s basically a fraction in a small fixed range (called the ‘mantissa’) multiplied by 2 to the power of some other number (called the ‘exponent’). So packed into those 32 bits are TWO numbers rather than one: in the usual IEEE 754 layout, 23 bits of mantissa and 8 bits of exponent, plus a sign bit. Hence the accuracy (which is determined by the mantissa) isn’t as good as a 32-bit integer’s - but the SIZE of the number (which is determined by the exponent) can be much MUCH larger.
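The largest number you can store in a ‘float’ is roughly 3.4 × 10^38 - which is more than the number of atoms in every human being on Earth put together…a pretty decent size for most purposes. But the bigger the number gets - the less accurately it’s stored: a ‘float’ carries only about 7 significant decimal digits, so above 2^24 (16,777,216) it can no longer represent every whole number. So for precision work - we tend to use integers.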
A ‘double’ is a 64-bit number - and it works the same way as a ‘float’ - but has more space for the mantissa and exponent: 52 bits of mantissa, 11 bits of exponent and a sign bit (plus/minus). The largest ‘double’ is around 1.8 × 10^308 - which is an ungodly large number - VASTLY bigger than the number of atoms in the visible universe (about 10^80 to 10^82 by common estimates).
But to be VERY clear - floats and doubles are APPROXIMATE representations. Integers are EXACT (as long as the value fits in their range).

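Of course, if you’re willing to consume 64 bits to store a double - you could have used a ‘long int’ - which is (probably) 64 bits, though on some platforms, such as 64-bit Windows, you need ‘long long’ to get that - and can store whole numbers with perfect precision up to about 9.2 × 10^18 signed, or about 1.8 × 10^19 unsigned.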

Source: Why do we use double instead of int in C++ for the calculation of large numbers?
