Why i hate numpy's scientific notation and how to fix it

This is a RANT. If you're a math nazi, you'll be offended by this. Scientific notation is **rarely** useful. If you understand anything about statistics and the english language, you understand the difference between me saying **rarely useful**, and **never useful**. Are there tons of uses for scientific notation? Sure. Do they represent the majority? Not even close. Is scientific notation overused? Definitely.

When you’re working with Python, numpy, and any sort of normalized data, it’s very common to get an array like this:

# Normalized data
[1.02425171e-01, 1.46210406e-01, 0.00000000e+00, 3.90839180e-04, 5.85902134e-01]

Even though I work with numbers literally every single day (yes, even on vacation and weekends) I never got used to this, and probably never will.

Here’s a table of advantages for scientific notation:

Advantages	Argument against
Good for massive numbers	Most times you won’t be working with massive numbers - there are different units for that
Good for tiny numbers	Most times you won’t be working with tiny numbers - there are different units for that

“But I can’t say the moon is 384,400,000 meters away from the Earth!”

First of all, yes, yes you can. Second of all, you can say 384.400 kilometers.

But Km’s doesn’t work with larger numbers!

Use light years, fool.

My point is, if you’re working with very large number of very small numbers, you can likely change the units to something a bit more sensible.

A Zero Fetish

Just look at that fugly “0.00000000e+00”. Yes, that is the same as “0.0”, or better yet “0”.

0 reads “zero”, not “zero point zero zero zero zero zero zero zero zero times ten to the power of zero”.

Why the hell, would someone write “0.00000000e+00” instead of plain and simple “0”? Beats me.

Who does that?! Why would you do that? Why do you hate the world so much? Why do you want to see it burn?

Terrible comparisons

Now, take a look at those two numbers: 3.90839180e-04 and 5.85902134e-01. Without reading them again, tell me: Which one is greater, the first or the second one?

Did you answer correctly? You had to look back, didn’t you? And perhaps, if you don’t work with scientific notation daily, it took you a second or two to realize there was a - hidden in there, right between the e and the 0. And yes, if you’re wondering, that zero is completely unnecessary too.

The truth is, those numbers hurt our eyes so much, our brain just wants to skim over them asap, wishing it will all be over soon.

Now look at these two numbers: 0.00039084, 0.58590213. Which one is the larger the number? Or better yet: 0.00039 and 0.58. Now that was easy, wasn’t it?

(Not) Efficient

Have a look at this number:
0.00007041000000000473 (22 characters)
For numpy this is how it should be printed
7.041000000000473e-05 (21 characters)
Great, right? We just saved one character.

Could we save the same number of characters if we removed the leading 0?
.00007041000000000473 (21 characters)
Yes. Would that be easier to read and compare? Also yes.

Difficult to compress

A lot of us use CSV’s to transfer data from one place to the other. It’s definitely not the most efficient thing to do, but the reality is, it’s one of, if not the most, common.

CSV is easily understood and can be opened by multiple software, besides not having any kind of proprietary restrictive license.

Because CSV files are text based, most people would understand why compression is so important and typically compression algorithms work pretty well with this kind of file.
However, because of the way most compression algorithms work, having a e and a - mixed in there with the numbers will increase the size of your compressed file, considerably.

Suggestion #1 : Make it worse

I have a great suggestion for the person who thought this was a good idea.

Since you hate so many people, why not make it worse? Expand the numbers even further, this way we can convert something like 96,734,147 into:

9(107) + 6(106) + 7(105) + 3(104) + 4(103) + 1(102) + 4(101) + 7(100) = 96,734,147

What do you think? Great uh?

The magic solution

Memorize this or write it somewhere, it contains the code for both pandas and numpy.

# Magic to get rid of bad notation
import numpy as np
import pandas
    
np.set_printoptions(suppress=True)
pd.set_option('display.float_format', lambda x: '%.10f' % x)

To enable again:

# You got the cure, why do you want to get sick again?...
np.set_printoptions(suppress=False)
pandas.reset_option('display.float_format')

According to the documentation, this is what suppress does:

If True, always print floating point numbers using fixed point notation, in which case numbers equal to zero in the current precision will print as zero. If False, then scientific notation is used when absolute value of the smallest number is < 1e-4 or the ratio of the maximum absolute value to the minimum is > 1e3. The default is False.

I’ve also added this to my Python checklist:

Python Checklist

Use this checklist to make your Python developments easier and to solve most common problems.

Author: Manuel Levi | Publisher: ManuelLevi.com

A Zero Fetish

Terrible comparisons

(Not) Efficient

Difficult to compress

Suggestion #1 : Make it worse

The magic solution

Python Checklist

Manuel Levi

Common errors with Pandas, Sklearn and Tensorflow/Keras

Flask Login WITHOUT a library - Minimal tutorial

Pyautogui: Common problems and fixes (Python RPA)

Manuel Levi

Recent posts

Recommended Tools for Data Scientists and Digital Professionals

Paying for Your Blog Is Costing You a Fortune

Menu

Why I HATE numpy's scientific notation and how to fix it

A Zero Fetish

Terrible comparisons

(Not) Efficient

Difficult to compress

Suggestion #1 : Make it worse

The magic solution

Python Checklist

Manuel Levi

You may also like...

Common errors with Pandas, Sklearn and Tensorflow/Keras

Flask Login WITHOUT a library - Minimal tutorial

Pyautogui: Common problems and fixes (Python RPA)

Recommended Tools for Data Scientists and Digital Professionals

Paying for Your Blog Is Costing You a Fortune