Issue
Assume you need to add 100 elements to a numpy array. The initial intuition is to run a loop and append each element to the array.
However, this would be a grave mistake in terms of the scalability of the code. Assume the number of elements changes from 100 to 50000 - what would you expect? Let's see with the example below.
Comparing numpy append vs list append
Check out this example (link), which runs through adding elements in two ways:
- Option (1): directly to the numpy array, using the append API provided by numpy
- Option (2): indirectly, using a list to collect the elements and then converting the list to an array using the numpy array API
We see that Option (2) is much more efficient than Option (1), and the more elements there are to add to the numpy array, the larger the advantage of Option (2).
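The linked example is not reproduced here, so the following is only a minimal sketch of what such a comparison could look like. The function names (`build_with_np_append`, `build_with_list`, `time_call`) are made up for illustration, and the sizes are kept small so the script finishes quickly. Actual timings depend on the machine and the numpy version.

```python
import time

import numpy as np


def build_with_np_append(n):
    """Option (1): grow the array one element at a time with np.append."""
    arr = np.array([], dtype=np.int64)
    for i in range(n):
        # np.append returns a new array, so the result must be reassigned.
        arr = np.append(arr, i)
    return arr


def build_with_list(n):
    """Option (2): collect in a Python list, convert to numpy once at the end."""
    items = []
    for i in range(n):
        items.append(i)
    return np.array(items, dtype=np.int64)


def time_call(func, n):
    """Return the wall-clock seconds taken by func(n)."""
    start = time.perf_counter()
    func(n)
    return time.perf_counter() - start


# Illustrative sizes only; increase them to see the gap widen.
for n in (1000, 5000):
    t_a = time_call(build_with_np_append, n)
    t_b = time_call(build_with_list, n)
    print(f"n={n}: np.append {t_a:.5f}s, list then convert {t_b:.5f}s")
```

Both functions produce the same array; only the way the array is built differs.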
Result Analysis
Let's look at the sample results obtained in the run:

| Num of elements | Time taken (in sec) to add directly in numpy (A) | Time taken (in sec) to add to list and then convert to numpy (B) | Approach A is slower compared to B by |
|---|---|---|---|
| 10000 | 0.112673044205 | 0.00116991996765 | 96 times |
| 20000 | 0.218866825104 | 0.00390100479126 | 56 times |
| 30000 | 0.721004962921 | 0.00337600708008 | 213 times |
| 40000 | 1.38639616966 | 0.00489687919617 | 283 times |
| 50000 | 2.46702504158 | 0.00590395927429 | 417 times |
So we see that as the number of elements to insert increases, the factor by which direct insertion into numpy is slower grows drastically (as depicted in the graph below).
Reason
Now, let's look at the API details of numpy append. When we look into the details, we find a small note:
Note that append does not occur in-place: a new array is allocated and filled. If axis is None, out is a flattened array.
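So, unlike the Python list, which is a dynamic array that reserves spare capacity so that most appends need no copying, the numpy array has a fixed size: every time we add an element with numpy append, it allocates a new array with one additional slot and copies all the existing contents into it before inserting. This is a very costly operation, and it gets costlier as the array grows.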
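The note about numpy append can be checked directly. The snippet below is a small sketch, not taken from the original example: it shows that the input array is left untouched and that the result lives in a separate buffer. The helper `elements_copied` is a hypothetical name; it just counts how many existing elements get copied in total when n items are appended one at a time to an initially empty array (0 + 1 + ... + (n - 1)).

```python
import numpy as np

arr = np.array([1, 2, 3])
grown = np.append(arr, 4)

print(arr)                           # [1 2 3]  (the original is untouched)
print(grown)                         # [1 2 3 4]
print(np.shares_memory(arr, grown))  # False: grown lives in a new buffer


def elements_copied(n):
    """Total existing elements copied over n one-at-a-time appends."""
    return n * (n - 1) // 2


print(elements_copied(100))    # 4950
print(elements_copied(50000))  # 1249975000
```

The copy count grows roughly with the square of the number of elements, which is consistent with the slowdown factor rising as the element count increases.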
So when you need to build a numpy array element by element, build it first as a Python list and then convert it to a numpy array.
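The same pattern carries over to multi-dimensional data. As a rough sketch (the function `collect_rows` and the per-row values are invented for illustration), rows can be gathered in a list of lists and converted in a single call:

```python
import numpy as np


def collect_rows(n_rows):
    """Gather rows in a Python list, then convert once to a 2-D numpy array."""
    rows = []
    for i in range(n_rows):
        # Hypothetical per-row values; replace with the real computation.
        rows.append([i, i * i])
    return np.array(rows)


table = collect_rows(4)
print(table.shape)  # (4, 2)
```

The conversion happens once, after the loop, so the cost of allocating the numpy array is paid a single time.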