May 7, 2018

Use of __repr__ in python


Problem Statement


How many times have you been debugging, printing, or logging your Python objects during a critical issue analysis, only to find output that is very obscure? Quite often?


<__main__.Point instance at 0x7f23e62995f0>

Actual Reason

Consider a 'Point' class which takes two coordinates. Now, when you print the object, the Python interpreter simply reports the default instance details of the object along with its associated class.


class Point:
    def __init__(self, x_cord, y_cord):
        self.x = x_cord
        self.y = y_cord



if __name__ == "__main__":
    d = Point(3,4)
    print d

So, when you run the simple code provided above, you will see that the output is an obscure description of the class instance, which provides very little information.


<__main__.Point instance at 0x7f23e62995f0>

For some, this might be sufficient. However, many of us need to understand more about the object than just the plain instance details associated with the class.

Workaround

Python provides a beautiful mechanism to work around this: the special method __repr__.

Now, consider the below modified version of the same class definition.


class Point:
    def __init__(self, x_cord, y_cord):
        self.x = x_cord
        self.y = y_cord

    def __repr__(self):
        return 'Point in coordinate form where x = %s, y = %s' % (self.x, self.y)



if __name__ == "__main__":
    d = Point(3,4)
    print d

When you run the above program, the output changes to the text description shown below.



lvtb9:~/workspace/blog_examples/repr_usage$ python class_repr_usage.py
Point in coordinate form where x = 3, y = 4

Wonderful!!!

Conclusion

We might think that these minor tweaks carry little value in day-to-day product development and maintenance.

However, think again: let's say you need to log object details when an exception is thrown in legacy code that is difficult to change. Just by enriching '__repr__' in that legacy code, you can easily extend the logging details and equip yourself for effective debugging.
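As a rough sketch of that idea (the translate() helper, the logger name, and the failure scenario below are my own illustrative assumptions, not part of the original example), the %r conversion in a log message calls repr(), so it picks up whatever __repr__ returns:


import logging

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger("point_demo")   # hypothetical logger name


class Point:
    def __init__(self, x_cord, y_cord):
        self.x = x_cord
        self.y = y_cord

    def __repr__(self):
        return 'Point in coordinate form where x = %s, y = %s' % (self.x, self.y)


def translate(point, dx, dy):
    # hypothetical helper that fails when given non-numeric offsets
    return Point(point.x + dx, point.y + dy)


if __name__ == "__main__":
    p = Point(3, 4)
    try:
        translate(p, "two", "three")    # deliberately wrong argument types
    except TypeError:
        # %r calls repr(), so the log line carries the full object details
        log.exception("translate failed for %r", p)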



April 27, 2018

Python name guard and its importance

Why do we need "__main__" guard in python code?

We have all seen Python code protected with the "__main__" guard. Why do we need it? Take a look at the explanation below.

File - grepinfiles.py

import sys

def findpattern(ptrn,txtfl):
    with open(txtfl) as f:
        for line in f:
            if ptrn in line:
                yield line.rstrip('\n')

ptrn,txtfl = sys.argv[1],sys.argv[2]
for matchline in findpattern(ptrn,txtfl):
    print(matchline)


For a sample input file

>>cat /tmp/1.txt
This is a sample code for grep
we do not have any example for egrep


We get the below output

python grepinfiles.py egrep /tmp/1.txt
we do not have any example for egrep


Now, let's use this module as a package in another module.

File: finderror.py


import sys
from grepinfiles import findpattern

txtfl = sys.argv[1]
for line in findpattern('ERROR',txtfl):
    print(line)

When you run this script, we get the output below.


>> python finderror.py /tmp/1.txt
Traceback (most recent call last):
  File "finderror.py", line 2, in <module>
    from grepinfiles import findpattern
  File "/home/user/workspace/blog_examples/python_name_gaurd/grepinfiles.py", line 9, in <module>
    ptrn,txtfl = sys.argv[1],sys.argv[2]
IndexError: list index out of range

Why do we get this error???

Magic variable called "__name__"

Now, let's modify the code slightly and run the same commands.

File: grepinfiles.py


import sys

def findpattern(ptrn,txtfl):
    print("Inside the module",__name__)
    with open(txtfl) as f:
        for line in f:
            if ptrn in line:
                yield line.rstrip('\n')

if __name__ == "__main__":
    ptrn,txtfl = sys.argv[1],sys.argv[2]
    for matchline in findpattern(ptrn,txtfl):
        print(matchline)

and

File: finderror.py


import sys
from grepinfiles import findpattern


if __name__ == "__main__":
    txtfl = sys.argv[1]
    for line in findpattern('ERROR',txtfl):
        print(line)


Now, check the output for the modified source code


>> python grepinfiles.py egrep /tmp/1.txt
('Inside the module', '__main__')
we do not have any example for egrep
>> python finderror.py /tmp/1.txt
('Inside the module', 'grepinfiles')

Explanation

The module invoked directly by the Python interpreter has its __name__ variable set to '__main__'.
Any other module imported by the main module has __name__ set to the module's own name.

So, when finderror.py was invoked,
  • finderror.py module will have __name__ set to __main__
  • grepinfiles.py module will have __name__ set to 'grepinfiles'
However, when only grepinfiles.py was invoked,
  • grepinfiles.py module will have __name__ set to '__main__'

Conclusion

The name guard is a mechanism to customize your Python module/package so that specific code runs only when the module is invoked directly. It is also a mechanism to safeguard code that should not be executed when the module is imported by other, upstream modules.
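A minimal sketch of the shape this usually takes in practice (the module name and the main() wrapper below are my own illustration, not taken from the examples above):


# mytool.py -- hypothetical module name
import sys

def run(args):
    # the real, importable work of the module lives in functions like this
    for arg in args:
        print(arg)
    return 0

def main():
    # reached only when the file is executed directly
    return run(sys.argv[1:])

if __name__ == "__main__":
    sys.exit(main())


Importing mytool from another module now has no side effects, while running it directly still behaves like a small command-line tool.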


March 31, 2018

How to use linux perf tools and save dollars - Part 2

Flame Graph - Introduction

In our previous discussion (I strongly recommend going through the linked blog), we saw how the Linux perf tool helps in writing well-optimized code in the real world, and how we could save loads of CPU cycles in the linked example.

Now we will see how to visualize the output, which helps us find the bottleneck super fast!!!

What is flame graph?

Flame graph is a tool which converts the binary data file produced by perf into an "*.svg" file, a much more comfortable visual format. In fact, to put it more generically, flame graph is a tool which can represent data from different profilers, such as perf and DTrace, in a visual format.


Ok... Tell me how to use it

Step - 1

Install FlameGraph and generate the perf data dump. For this, we use the example from the previous discussion to generate the dump. Check the GIF below, which shows the steps executed:

Install FlameGraph and use perf to generate the perf dump

Step - 2

Follow the steps shown in the below GIF to convert the dump into a "*.svg" file which can be opened in any browser and analyzed.

Convert perf data to flame graph
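If the clips do not render for you, the same steps look roughly like this on the command line (a sketch, assuming the FlameGraph scripts are cloned into ./FlameGraph and the binary from the previous post is ./a.out):


git clone https://github.com/brendangregg/FlameGraph
perf record --call-graph dwarf -F 99 ./a.out
perf script > out.perf
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > out.svg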

Step - 3

Analysis of the graph

  • In the GIF, you can see that most of the stack in the generated graph (stack width represents CPU cycles) is related to the usleep call.
  • The usleep is triggered by the callback function of the consumer thread.
  • Comparing the total area of the consumer and the producer, we can clearly see that the CPU consumed by the consumer is far more than that of the producer.
This immediately lets us conclude that the consumer is the hotspot, and that the problem is the usleep call in the consumer callback function.

There you go... we get all the required answers!!!!

Conclusion

  • Use the flame graph tool (it works even for multi-threaded applications) to visualize your application's bottlenecks and fix them.
  • Both the long-term cost and the time spent debugging will be significantly reduced with this great tool.
Keep watching this space; you will get much more information on this tool.
I would be happy to hear from you and help you out... Please leave your opinion in the comment section.

And a closing thought...



How to find linux kernel version in your distros

How do you find the linux kernel version you are running?

Run the below command to find out the Linux kernel version.

uname -r

On my distro, I get the response below.

~/ $ uname -r
3.13.0-142-generic
~/ $



Here is a short clip on how to run the command and get the details.
As an added bonus, I have included more details on other options for the same command.

How to find linux kernel version
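If the clip does not render for you, one of those extra options is the -a flag, which prints the full system identification (kernel name, release, version, machine architecture and more) in one line:


uname -a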


March 30, 2018

SBT run/test for every edit automatically

What is it about?

Imagine you have to make a huge number of small edits to your Scala source code and verify the output each time. In most languages you end up doing a lot of repetitive manual work that consumes boring, repeated time.

So we end up spending time automating the cycle: change -> build -> compile -> run. This is handled smartly in the Scala/SBT world.

Ok, tell me about the savior

~ (Tilde) is our savior. 

Run the command with a ~ prefix so that SBT continuously polls for changes and re-executes the command when it detects one.

Like :


sbt> ~run


Ok, show me a small clip with example

Here you go

GIF on how to use tilde and run on-the-go


Similarly, you can do the same with the testOnly command to run your test cases.

Here you go for the test suite example

GIF on how to use tilde and run test cases on the go
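If the clip does not render, the prompt entry is just the watched variant of testOnly (here *MySpec is a placeholder for your own suite name, not a real class from this post):


sbt> ~testOnly *MySpec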


Wonderful, ain't it?
Let me know your thoughts in the comment section.

March 29, 2018

How to find which flavor and which version of JDK do I have on my Ubuntu

Command to find the file containing the jdk information

Command:

file `which java javac`

Output:


~/ $ file `which java javac`
/usr/bin/java:  symbolic link to `/etc/alternatives/java'
/usr/bin/javac: symbolic link to `/etc/alternatives/javac'


Command to find the jdk version information

Command and Output


~/ $ file /etc/alternatives/java /etc/alternatives/javac
/etc/alternatives/java:  symbolic link to `/usr/lib/jvm/java-8-oracle/jre/bin/java'
/etc/alternatives/javac: symbolic link to `/usr/lib/jvm/java-8-oracle/bin/javac'
~/ $

So, this machine has Oracle JDK version 8.

How to use linux perf tool and save dollars

When does this tool help me?

Ask yourself if you have any of the following problems:
  1. Does your application built on Linux run slow, and do you want to find out why?
  2. Do you want to be cost-effective and avoid buying an expensive licensed profiling product?
  3. Do you want a reliable and accurate profiling tool right in your Linux distro?
If you answered "yes" to most of these questions, then this post should be your gateway into the "easy world of profilers".

What does this tool do?

  • It is a tool available in Linux distros (Linux 2.6+) (How to find linux kernel version?) which abstracts CPU hardware differences and provides a simple way to instrument/profile your code.
          In plain terms, it is a tool which helps you find possible bottlenecks in your applications.
  • Unlike tools such as the PureCoverage/Purify/Quantify series, perf does not need your application to be recompiled.
          In plain terms, this tool does not add any extra instrumentation code to your binary and can run on a precompiled executable. It derives all required metrics from low-level Linux system parameters, thus saving your day.

         Another major worry with any instrumented code is that the instrumentation slows down your application, which sometimes defeats the purpose of profiling in the first place. perf stands out as one of the big players that does not add much overhead to your application during profiling.

OK, show me some examples (I mean a real example)

Consider the consumer-producer problem below, where
  • The producer puts data into a mutex-protected buffer every 1 second.
  • The consumer consumes the data only when the buffer reaches 10000 messages (it does not want to process every message; it takes the data in batches).
  • And we have 100 consumer-producer pairs running in our application.


// for sleep
#include <unistd.h>

// streams
#include <iostream>
#include <sstream>

// pthread
#include <pthread.h>

// memcpy
#include <string.h>

// strings
#include <string>
#include <vector>

using std::cout;
using std::endl;
using std::string;
using std::vector;
using std::stringstream;

#define INIT_COUNT_BUF_SIZE 4096
#define ITER 10

class Mutex {
public:
    Mutex() {
        pthread_mutex_init(&m, NULL);
    }
    ~Mutex() {
        pthread_mutex_destroy(&m);
    }

    void Lock() {
        pthread_mutex_lock(&m);
    }

    void Unlock() {
        pthread_mutex_unlock(&m);
    }

    bool Trylock() {
        return (pthread_mutex_trylock(&m) == 0);
    }
private:
    pthread_mutex_t m;
};

class SyncedBuffer {
public:
    SyncedBuffer()  {
        mBuffer.reserve(INIT_COUNT_BUF_SIZE);
    }

    virtual ~SyncedBuffer() {
    }

    size_t Size() {
        Lock();
        size_t ret = mBuffer.size();
        Unlock();
        return ret;
    }

    void Write(const char* src, size_t n) {
        Lock();

        mBuffer.push_back(string((char*)src, n));

        Unlock();
    }

    size_t Read(char* dst, size_t n) {
        Lock();

        memcpy((void*)dst, mBuffer.back().c_str(), mBuffer.back().size());
        size_t copy_size = mBuffer.back().size();
        mBuffer.pop_back();

        Unlock();

        return copy_size;
    }

    void Reset() {
        Lock();

        mBuffer.clear();

        Unlock();
    }

protected:
    void Lock() {
        m.Lock();
    }

    void Unlock() {
        m.Unlock();
    }


    vector<string> mBuffer;
    Mutex m;

};

void* producer(void *ptr) {
    string prefix("");

    SyncedBuffer *b = (SyncedBuffer*)ptr;

    cout << prefix << "producer started" << endl;

    string data = "simple data string #";

    do{
        static int i = 1;
        stringstream ss;
        ss << data << i++;
        string s = ss.str();
        b->Write(s.c_str(), s.size());
        cout<<" P "<<b->Size()<<endl;
        usleep(1000000);                              // Producer pumps data every 1 second.
        //if(i % 100 == 0) cout<<"Producer count"<<i<<endl;
    } while(true);

    cout << prefix << "producer finished" << endl;

    return (void*)0;
}

void* consumer(void *ptr) {
    string prefix("\t\t\t\t\t\t");

    SyncedBuffer *b = (SyncedBuffer*)ptr;

    char data[INIT_COUNT_BUF_SIZE];

    cout << prefix << "consumer started" << endl;

    size_t read = 0;

    do{
        static int i = 1;
        if(b->Size() <= 10000 )
        {
            usleep(1);                              // Bad code: the consumer sleeps for a very short time (1 microsecond)
        }
        else
        {
            read = b->Read(data, INIT_COUNT_BUF_SIZE);
            string s;
            s.append(data, read);
            i++;
        }
    } while(true);

    cout << prefix << "consumer finished" << endl;

    return (void*)0;
}

int main() {
    const int COUNT = 100;
    SyncedBuffer b[COUNT];

    pthread_t prod_thread[COUNT];
    pthread_t cons_thread[COUNT];

    for (int i = 0; i < COUNT; i++)
        pthread_create(&prod_thread[i], NULL, producer, (void*)&(b[i]));

    sleep(2); // seconds

    for (int i = 0; i < COUNT; i++)
        pthread_create(&cons_thread[i], NULL, consumer, (void*)&(b[i]));

    for (int i = 0; i < COUNT; i++)
    {
        pthread_join(prod_thread[i], NULL);
        pthread_join(cons_thread[i], NULL);
    }

}


Now you can compile the above snippet with the command line below.


g++ -o a.out buffer.cpp -Wall -pthread

Strangely, when you look at the top report for the above program, it looks like the one below.


top - 13:30:31 up 164 days,  3:59,  2 users,  load average: 0.83, 0.75, 0.39
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.9 us,  5.7 sy,  0.0 ni, 93.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  32767840 total, 28525000 used,  4242840 free,   519348 buffers
KiB Swap: 32772092 total,   187724 used, 32584368 free 25386684 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23453 root      20   0 5786032   4420   2236 S 237.8  0.0   0:44.46 a.out

A strange thing you will notice is that the CPU taken by the process is 237.8%.... Howdy!!!
Now let's see if our perf tool can find the culprit....

Running the tool

Command to record the profiling information (I will explain more about the parameters in a later blog):


perf record --call-graph dwarf -F 99 ./a.out

Command to peek into what was recorded:


perf report -n -i perf.data

The output produced gives us the first clue!!!


                                                 ┌─Warning:───────────────────────────────┐
                                                 │Processed 7149 events and lost 6 chunks!│
                                                 │                                        │
                                                 │Check IO/CPU overload!                  │
                                                 │                                        │
                                                 │                                        │
                                                 │                                        │
                                                 │Press any key...                        │
                                                 └────────────────────────────────────────┘

It clearly tells us that something is causing a huge CPU overload (bingo, it matches our top report).

Now, if you expand the call chains in the report, you will see the details below.


Samples: 6K of event 'cycles:pp', Event count (approx.): 76746912353054071
  Children      Self       Samples  Command  Shared Object        Symbol                                                                    
+   96.09%     0.09%             6  a.out    a.out                [.] consumer                                                              
+   92.53%     3.55%            30  a.out    libc-2.19.so         [.] __GI___libc_nanosleep                                                 
-   92.37%     0.00%             1  a.out    libc-2.19.so         [.] usleep                                                                
   - usleep                                                                                                                                 
      - 92.37% consumer                                                                                                                     
         - start_thread                                                                                                                     
           __clone                                                                                                                          
        0.00% producer

Hmmm... so the consumer has a usleep which consumes almost all of the CPU.

Conclusion

  • The microsecond sleep in the consumer (callback function) is very ineffective.
  • Most of the CPU utilization goes into sleeping, being woken up by the kernel, and sleeping again. This unnecessary, ineffective code wastes precious CPU cycles that could otherwise have been used for real work.
Extend this line of thinking a bit further: assume such an ineffective application is running on AWS and you are paying precious $$$ for the bad code. That is a HUGE loss of revenue which could be caught just by running the same application under this tool.

Home-work: Run the same application after changing the usleep(1) in the consumer to usleep(1000000). You will see the change.
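One way to quantify that home-work change (a sketch; the 30-second timeout is only there to stop the otherwise never-ending program, and the chosen events are just a suggestion) is to compare counters for the two builds:


perf stat -e task-clock,context-switches timeout 30 ./a.out


You should see the context-switch count collapse once the consumer stops busy-polling.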

Hmmm, it looks interesting. What more can I do with it?

A lot more... wait for my next blog.
In the meanwhile, you can comment on any similar free tools which can potentially catch such problems. :)

Further Reading...

How to use flame-graph?

And a closing thought....