April 27, 2018

Python name guard and its importance

Why do we need the "__main__" guard in Python code?

We have all seen Python code protected by a "__main__" guard. Why is it needed? The example below explains.

File: grepinfiles.py

import sys

def findpattern(ptrn,txtfl):
    with open(txtfl) as f:
        for line in f:
            if ptrn in line:
                yield line.rstrip('\n')

ptrn,txtfl = sys.argv[1],sys.argv[2]
for matchline in findpattern(ptrn,txtfl):
    print(matchline)


For a sample input file:

>> cat /tmp/1.txt
This is a sample code for grep
we do no have any example for egrep


we get the output below:

>> python grepinfiles.py egrep /tmp/1.txt
we do no have any example for egrep


Now, let's import this module from another script.

File: finderror.py


import sys
from grepinfiles import findpattern

txtfl = sys.argv[1]
for line in findpattern('ERROR',txtfl):
    print(line)

When we run this script, we get the error below.


>> python finderror.py /tmp/1.txt
Traceback (most recent call last):
  File "finderror.py", line 2, in <module>
    from grepinfiles import findpattern
  File "/home/user/workspace/blog_examples/python_name_gaurd/grepinfiles.py", line 9, in <module>
    ptrn,txtfl = sys.argv[1],sys.argv[2]
IndexError: list index out of range

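Why does this error occur? When Python imports a module, it executes every top-level statement in that module. Importing grepinfiles therefore runs the line that reads sys.argv[1] and sys.argv[2], but finderror.py was started with only one argument, so sys.argv[2] does not exist.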
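The failure can be reproduced in isolation. The sketch below is only an illustration: it writes a hypothetical module named unguarded_demo (not part of the original example) to a temporary directory, then imports it with two different values of sys.argv. The helper name import_with_argv is also an assumption made for this demo.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module with an unguarded top-level read of sys.argv[2],
# mirroring the first version of grepinfiles.py.
SOURCE = "import sys\nsecond_arg = sys.argv[2]\n"


def import_with_argv(argv):
    """Import the demo module while sys.argv is temporarily set to argv."""
    tmpdir = tempfile.mkdtemp()
    with open(os.path.join(tmpdir, "unguarded_demo.py"), "w") as f:
        f.write(SOURCE)
    saved_argv = sys.argv
    sys.argv = argv
    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()
    try:
        # The import itself executes the module's top-level statements.
        importlib.import_module("unguarded_demo")
        return "import succeeded"
    except IndexError as exc:
        return "import failed: " + str(exc)
    finally:
        # Restore global state so the demo can be repeated.
        sys.argv = saved_argv
        sys.path.remove(tmpdir)
        sys.modules.pop("unguarded_demo", None)


# One argument after the script name, as in the finderror.py run.
print(import_with_argv(["finderror.py", "/tmp/1.txt"]))
# Two arguments after the script name, as in the grepinfiles.py run.
print(import_with_argv(["grepinfiles.py", "egrep", "/tmp/1.txt"]))
```

The first call fails during the import, before any function from the module is used, which is the same situation as in the traceback above.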

The magic variable "__name__"

Now, let's modify the code slightly and run the same commands.

File: grepinfiles.py


import sys

def findpattern(ptrn,txtfl):
    print("Inside the module",__name__)
    with open(txtfl) as f:
        for line in f:
            if ptrn in line:
                yield line.rstrip('\n')

if __name__ == "__main__":
    ptrn,txtfl = sys.argv[1],sys.argv[2]
    for matchline in findpattern(ptrn,txtfl):
        print(matchline)

and

File: finderror.py


import sys
from grepinfiles import findpattern


if __name__ == "__main__":
    txtfl = sys.argv[1]
    for line in findpattern('ERROR',txtfl):
        print(line)


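Now, check the output of the modified source code. The sample file has no line containing ERROR, so the second command prints only the module name. The tuple-style lines come from Python 2, where print with two values in parentheses prints a tuple.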


>> python grepinfiles.py egrep /tmp/1.txt
('Inside the module', '__main__')
we do no have any example for egrep
>> python finderror.py /tmp/1.txt
('Inside the module', 'grepinfiles')

Explanation

The module that the Python interpreter runs directly has its __name__ variable set to '__main__'.
Any module that is imported by another module has __name__ set to its own module name.

So, when finderror.py was invoked,
  • the finderror.py module had __name__ set to '__main__'
  • the grepinfiles.py module had __name__ set to 'grepinfiles'
However, when grepinfiles.py was invoked on its own,
  • the grepinfiles.py module had __name__ set to '__main__'
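The two cases can be checked with a small self-contained sketch. It assumes a hypothetical module called name_demo that records its own __name__. Direct execution is simulated with runpy.run_path and run_name="__main__", which stands in for typing python name_demo.py at the shell.

```python
import importlib
import os
import runpy
import sys
import tempfile

# Hypothetical one-line module that stores the name it was given.
SOURCE = "observed_name = __name__\n"

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "name_demo.py")
with open(path, "w") as f:
    f.write(SOURCE)

# Case 1: run the file as the main program (simulated in-process).
as_script = runpy.run_path(path, run_name="__main__")["observed_name"]

# Case 2: import the same file as an ordinary module.
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
try:
    as_import = importlib.import_module("name_demo").observed_name
finally:
    sys.path.remove(tmpdir)

print(as_script)   # __main__
print(as_import)   # name_demo
```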

Conclusion

The name guard lets a Python module run specific code only when the module is invoked directly. It also protects code that should not execute when the module is imported by other modules.
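One common way to apply the guard, shown here only as a sketch, is to move the command-line handling into a function and call it from the guarded block. The main function, its argument check, and its return codes are assumptions added for illustration and are not part of the original grepinfiles.py.

```python
import sys


def findpattern(ptrn, txtfl):
    """Yield each line of txtfl that contains ptrn, without its newline."""
    with open(txtfl) as f:
        for line in f:
            if ptrn in line:
                yield line.rstrip('\n')


def main(argv):
    """Hypothetical entry point; argv is expected to be [pattern, file]."""
    if len(argv) != 2:
        print("usage: grepinfiles.py PATTERN FILE")
        return 2
    for matchline in findpattern(argv[0], argv[1]):
        print(matchline)
    return 0


# Runs only when the file is executed directly, never on import.
if __name__ == "__main__":
    main(sys.argv[1:])
```

With this layout, another script can import findpattern without triggering any argument parsing, and main can still be called explicitly with a list of arguments.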

