June 21, 2018

Numpy array append and performance issues

Issue

Suppose you need to add 100 elements to a numpy array. The intuitive approach is to loop over the elements and append each one to the array.

However, this is a serious mistake in terms of scalability. What happens when the number of elements grows from 100 to 50,000? Let's see with the example below.


Comparing numpy append vs list append 

Check out this example (link), which adds elements in two ways:
  1. directly to a numpy array, using the append API provided by numpy (np.append);
  2. indirectly, by first collecting the values in a Python list and then converting the list to an array with numpy's array API (np.array).
We see that Option (2) is much more efficient than Option (1), and the more elements you add, the larger Option (2)'s advantage becomes.
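The linked benchmark script is not reproduced in this post. Below is a minimal sketch of a comparable benchmark. It assumes integer elements and uses time.perf_counter for timing. The function names append_with_numpy, append_with_list and compare are hypothetical and are not taken from the original script.

```python
import time

import numpy as np


def append_with_numpy(n):
    # Option (1): grow a numpy array one element at a time with np.append.
    # Every call allocates a new array and copies all existing elements into it.
    arr = np.array([], dtype=np.int64)
    for i in range(n):
        arr = np.append(arr, i)
    return arr


def append_with_list(n):
    # Option (2): collect the values in a Python list, then convert once.
    values = []
    for i in range(n):
        values.append(i)
    return np.array(values, dtype=np.int64)


def compare(n):
    """Return (seconds for option 1, seconds for option 2) for n elements."""
    start = time.perf_counter()
    a = append_with_numpy(n)
    t_numpy = time.perf_counter() - start

    start = time.perf_counter()
    b = append_with_list(n)
    t_list = time.perf_counter() - start

    # Both approaches must produce the same array.
    assert np.array_equal(a, b)
    return t_numpy, t_list


if __name__ == "__main__":
    # Small sizes keep the run short; the post's results used 10000..50000,
    # so raise these values to reproduce that range (option 1 gets slow).
    for n in (1000, 2000, 5000):
        t_a, t_b = compare(n)
        print(f"{n:>6}  A={t_a:.4f}s  B={t_b:.6f}s  A/B={t_a / t_b:.0f}x")
```

Absolute timings depend on the machine and the numpy version, so expect different numbers from the ones below. The gap between the two options should still widen as n grows.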




Result Analysis

Let's look at the sample results obtained in the run


A = time to append directly to the numpy array; B = time to add to a list and then convert it to numpy.

Num of elements   A (sec)           B (sec)             A slower than B by
10000             0.112673044205    0.00116991996765    96 times
20000             0.218866825104    0.00390100479126    56 times
30000             0.721004962921    0.00337600708008    213 times
40000             1.38639616966     0.00489687919617    283 times
50000             2.46702504158     0.00590395927429    417 times

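So we see that as the number of elements to insert increases, the factor by which approach A is slower than approach B increases drastically. It rises from 96 times at 10,000 elements to 417 times at 50,000 elements, with the 20,000-element run as the only exception (as depicted by the graph below).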
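One way to read the table is to compare how each approach scales. The short script below uses only the timings from the results table. Going from 10,000 to 50,000 elements is a 5x increase. Approach A's time grows by roughly 22x, which is close to the 25x a quadratic cost would predict. Approach B's time grows by roughly 5x, which matches a linear cost. The quadratic/linear interpretation is an assumption, and these are single-run numbers.

```python
# Timings copied from the results table (seconds).
sizes = [10000, 20000, 30000, 40000, 50000]
numpy_append = [0.112673044205, 0.218866825104, 0.721004962921,
                1.38639616966, 2.46702504158]
list_then_convert = [0.00116991996765, 0.00390100479126, 0.00337600708008,
                     0.00489687919617, 0.00590395927429]


def growth(times):
    """Ratio of the timing at the largest size to the timing at the smallest size."""
    return times[-1] / times[0]


size_factor = sizes[-1] / sizes[0]  # 50000 / 10000 = 5.0

print(f"elements grew {size_factor:.0f}x")
print(f"approach A grew {growth(numpy_append):.1f}x "
      f"(a quadratic cost would predict {size_factor ** 2:.0f}x)")
print(f"approach B grew {growth(list_then_convert):.1f}x "
      f"(a linear cost would predict {size_factor:.0f}x)")
```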



Reason

Now, let's look at the API documentation for numpy.append. It contains a small note:

Note that append does not occur in-place: a new array is allocated and filled. If axis is None, out is a flattened array.
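So, unlike a Python list, a numpy array cannot grow in place. A Python list is a dynamic array (not a linked list) that over-allocates spare capacity, so appending to it is cheap on average. A numpy array has a fixed size, so every np.append call allocates a new array with space for the additional element and copies all the existing contents into it before inserting. Repeating this inside a loop makes the total amount of copying grow quadratically with the number of elements, which is a very costly operation.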
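The short example below shows both points from the note. np.append leaves the original array untouched and returns a separately allocated one. The hypothetical helper elements_copied counts how many element copies result from growing an empty array to n elements one np.append at a time. It assumes each call writes every old element plus the new one.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)

# np.append does not modify `a`; it returns a brand-new array.
print(a)                       # [1 2 3]
print(b)                       # [1 2 3 4]
print(np.shares_memory(a, b))  # False: b lives in freshly allocated memory


def elements_copied(n):
    """Hypothetical cost model: growing from size k to k + 1 writes k + 1 elements."""
    return sum(k + 1 for k in range(n))  # equals n * (n + 1) / 2


print(elements_copied(50000))  # 1250025000
```

Under this model, 50,000 appends write about 1.25 billion elements. Building a list and converting it once copies each of the 50,000 values into the array only one time.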

So when you need to build a numpy array element by element, always build it first as a Python list and then convert it to a numpy array.
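A minimal sketch of the recommended pattern follows. build_array is a hypothetical helper, and i * i stands in for whatever per-element computation you actually do. The second function swaps in numpy.fromiter, a real numpy API. It builds the array from an iterable in a single pass and is shown only as an alternative for values that come from a generator.

```python
import numpy as np


def build_array(n):
    # Hypothetical helper: accumulate values in a Python list (cheap appends)...
    values = []
    for i in range(n):
        values.append(i * i)  # stand-in for your real per-element computation
    # ...then convert once, paying for a single allocation and copy.
    return np.array(values, dtype=np.float64)


def build_array_fromiter(n):
    # Alternative: np.fromiter consumes an iterable directly; passing `count`
    # lets numpy allocate the output array once up front.
    return np.fromiter((i * i for i in range(n)), dtype=np.float64, count=n)


print(build_array(5))  # [ 0.  1.  4.  9. 16.]
```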

