When you’re working with Python, numpy, and any sort of normalized data, it’s very common to get an array like this:
# Normalized data [1.02425171e-01, 1.46210406e-01, 0.00000000e+00, 3.90839180e-04, 5.85902134e-01]
Even though I work with numbers literally every single day (yes, even on vacation and weekends) I never got used to this, and probably never will.
Here’s a table of advantages for scientific notation:
Advantages | Argument against |
---|---|
Good for massive numbers | Most times you won’t be working with massive numbers - there are different units for that |
Good for tiny numbers | Most times you won’t be working with tiny numbers - there are different units for that |
“But I can’t say the moon is 384,400,000 meters away from the Earth!”
First of all, yes, yes you can. Second of all, you can say 384.400 kilometers.
But Km’s doesn’t work with larger numbers!
Use light years, fool.
My point is, if you’re working with very large number of very small numbers, you can likely change the units to something a bit more sensible.
A Zero Fetish
Just look at that fugly “0.00000000e+00”. Yes, that is the same as “0.0”, or better yet “0”.
0 reads “zero”, not “zero point zero zero zero zero zero zero zero zero times ten to the power of zero”.
Why the hell, would someone write “0.00000000e+00” instead of plain and simple “0”? Beats me.
Who does that?! Why would you do that? Why do you hate the world so much? Why do you want to see it burn?
Terrible comparisons
Now, take a look at those two numbers: 3.90839180e-04 and 5.85902134e-01. Without reading them again, tell me: Which one is greater, the first or the second one?
Did you answer correctly? You had to look back, didn’t you? And perhaps, if you don’t work with scientific notation daily, it took you a second or two to realize there was a -
hidden in there, right between the e
and the 0
. And yes, if you’re wondering, that zero is completely unnecessary too.
The truth is, those numbers hurt our eyes so much, our brain just wants to skim over them asap, wishing it will all be over soon.
Now look at these two numbers: 0.00039084, 0.58590213. Which one is the larger the number? Or better yet: 0.00039 and 0.58. Now that was easy, wasn’t it?
(Not) Efficient
Have a look at this number:0.00007041000000000473
(22 characters)
For numpy this is how it should be printed7.041000000000473e-05
(21 characters)
Great, right? We just saved one character.
Could we save the same number of characters if we removed the leading 0?.00007041000000000473
(21 characters)
Yes. Would that be easier to read and compare? Also yes.
Difficult to compress
A lot of us use CSV’s to transfer data from one place to the other. It’s definitely not the most efficient thing to do, but the reality is, it’s one of, if not the most, common.
CSV is easily understood and can be opened by multiple software, besides not having any kind of proprietary restrictive license.
Because CSV files are text based, most people would understand why compression is so important and typically compression algorithms work pretty well with this kind of file.
However, because of the way most compression algorithms work, having a e
and a -
mixed in there with the numbers will increase the size of your compressed file, considerably.
Suggestion #1 : Make it worse
I have a great suggestion for the person who thought this was a good idea.
Since you hate so many people, why not make it worse? Expand the numbers even further, this way we can convert something like 96,734,147 into:
9(107) + 6(106) + 7(105) + 3(104) + 4(103) + 1(102) + 4(101) + 7(100) = 96,734,147
What do you think? Great uh?
The magic solution
Memorize this or write it somewhere, it contains the code for both pandas and numpy.
# Magic to get rid of bad notation import numpy as np import pandas np.set_printoptions(suppress=True) pd.set_option('display.float_format', lambda x: '%.10f' % x)
To enable again:
# You got the cure, why do you want to get sick again?... np.set_printoptions(suppress=False) pandas.reset_option('display.float_format')
According to the documentation, this is what suppress
does:
If True, always print floating point numbers using fixed point notation, in which case numbers equal to zero in the current precision will print as zero. If False, then scientific notation is used when absolute value of the smallest number is < 1e-4 or the ratio of the maximum absolute value to the minimum is > 1e3. The default is False.
I’ve also added this to my Python checklist:
Python Checklist
Use this checklist to make your Python developments easier and to solve most common problems.
Author: Manuel Levi | Publisher: ManuelLevi.com