Chapter 1 - Essential Scripts

Section 1.1 - Argparse Example

Example for setting up arguments for your command line utility.

Example Usage:

$ python argparse_example.py

(The script is named argparse_example.py rather than argparse.py, since a module named argparse.py would shadow the standard library's argparse when imported.)

References:

Argparse configuration

This function shows an example of creating an argparse.ArgumentParser instance with required and optional parameters. It also demonstrates how to set default values and boolean flags. The argparse module has many more features, documented at https://docs.python.org/3/library/argparse.html

import argparse
import os
from pathlib import PurePath

# `__author__` and `__date__` are module-level metadata, assumed to be
# defined near the top of the script.

def setup_argparse():
    # Setup a parser instance with common fields including a
    # description and epilog. The `formatter_class` instructs
    # argparse to show default values set for parameters.
    parser = argparse.ArgumentParser(
        description="Sample Argparse",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        epilog=f"Built by {__author__}, v.{__date__}",
    )

    # The simplest form of adding an argument, the name of the
    # parameter and a description of its form.
    parser.add_argument("INPUT_FILE", help="Input file to parse")
    parser.add_argument("OUTPUT_FOLDER", help="Folder to store output")

    # An optional argument with multiple ways of specifying the
    # parameter (short and long form), including a default value.
    parser.add_argument(
        "-l",
        "--log",
        help="Path to log file",
        default=os.path.abspath(
            os.path.join(
                PurePath(__file__).parent,
                PurePath(__file__).name.rsplit(".", 1)[0] + ".log",
            )
        ),
    )

    # An optional argument which does not accept a value, instead
    # just modifies functionality.
    parser.add_argument(
        "-v", "--verbose", action="store_true", help="Include debug log messages"
    )

    # Once we've specified our arguments we can parse them for
    # reference
    args = parser.parse_args()

    # Returning our parsed arguments for further use.
    return args
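
Since `parse_args()` also accepts an explicit argument list, the parser above can be exercised without a real command line. A minimal sketch (the file and folder names are invented for the demo):

```python
import argparse

# A minimal parser mirroring the fields above, parsed against a
# hand-built argv list so the sketch runs without a real command line.
parser = argparse.ArgumentParser(description="Sample Argparse")
parser.add_argument("INPUT_FILE", help="Input file to parse")
parser.add_argument("OUTPUT_FOLDER", help="Folder to store output")
parser.add_argument("-v", "--verbose", action="store_true",
                    help="Include debug log messages")

# parse_args() accepts an explicit list; without one it reads sys.argv.
args = parser.parse_args(["evidence.txt", "reports", "--verbose"])
print(args.INPUT_FILE)     # evidence.txt
print(args.OUTPUT_FOLDER)  # reports
print(args.verbose)        # True
```

Each argument becomes an attribute on the returned namespace, named after the positional argument or the long option.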

Section 1.2 - Logging Example

Example for writing logging information to the console and a log file.

Example Usage:

$ python logging_example.py

References:

Logging configuration

This function shows an example of creating a logging instance that writes messages to both STDERR and a file, allowing your script to write content to STDOUT uninterrupted. Additionally, you can set different logging levels for the two handlers; generally you keep debugging information in the log file while writing more critical messages to the console via STDERR.

import logging
import sys

def setup_logging(logging_obj, log_file, verbose=False):
    """Function to set up logging configuration and test it.

    Args:
        logging_obj: A logging instance, returned from logging.getLogger().
        log_file: File path to write log messages to.
        verbose: Whether or not to enable the debug level in STDERR output.

    Examples:
        >>> sample_logger = logging.getLogger(name=__name__)
        >>> log_path = "sample.log"
        >>> setup_logging(sample_logger, log_path, verbose=True)
        >>> sample_logger.debug("This is a debug message")
        >>> sample_logger.info("This is an info message")
        >>> sample_logger.warning("This is a warning message")
        >>> sample_logger.error("This is an error message")
        >>> sample_logger.critical("This is a critical message")
    """
    logging_obj.setLevel(logging.DEBUG)

    # Logging formatter. Best to keep consistent for most use cases
    log_format = logging.Formatter(
        "%(asctime)s %(filename)s %(levelname)s %(module)s "
        "%(funcName)s %(lineno)d %(message)s"
    )

    # Set up STDERR logging, allowing uninterrupted
    # STDOUT redirection
    stderr_handle = logging.StreamHandler(stream=sys.stderr)
    if verbose:
        stderr_handle.setLevel(logging.DEBUG)
    else:
        stderr_handle.setLevel(logging.INFO)
    stderr_handle.setFormatter(log_format)

    # Setup file logging
    file_handle = logging.FileHandler(log_file, "a")
    file_handle.setLevel(logging.DEBUG)
    file_handle.setFormatter(log_format)

    # Add handles
    logging_obj.addHandler(stderr_handle)
    logging_obj.addHandler(file_handle)
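
To see the split between the two handlers, you can point the console handler at an in-memory stream and the file handler at a temporary file. A minimal sketch following the same pattern (the logger name and paths are arbitrary):

```python
import io
import logging
import os
import tempfile

# Same handler pattern as setup_logging(), with an in-memory stream
# standing in for STDERR so the result can be inspected afterwards.
demo_logger = logging.getLogger("logging_demo")
demo_logger.setLevel(logging.DEBUG)

console = io.StringIO()
stderr_handle = logging.StreamHandler(stream=console)
stderr_handle.setLevel(logging.INFO)  # the verbose=False behavior
demo_logger.addHandler(stderr_handle)

log_path = os.path.join(tempfile.mkdtemp(), "demo.log")
file_handle = logging.FileHandler(log_path, "a")
file_handle.setLevel(logging.DEBUG)
demo_logger.addHandler(file_handle)

demo_logger.debug("debug detail")  # reaches the file only
demo_logger.info("info summary")   # reaches both handlers
file_handle.close()

with open(log_path) as fh:
    log_contents = fh.read()
print("debug detail" in console.getvalue())  # False
print("debug detail" in log_contents)        # True
```

The logger level acts as a first gate; each handler then applies its own level, which is why the DEBUG record reaches the file but not the console stream.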

Docstring References

setup_logging(logging_obj, log_file, verbose=False)

Function to set up logging configuration and test it.

Parameters
  • logging_obj – A logging instance, returned from logging.getLogger().

  • log_file – File path to write log messages to.

  • verbose – Whether or not to enable the debug level in STDERR output.

Examples

>>> sample_logger = logging.getLogger(name=__name__)
>>> log_path = "sample.log"
>>> setup_logging(sample_logger, log_path, verbose=True)
>>> sample_logger.debug("This is a debug message")
>>> sample_logger.info("This is an info message")
>>> sample_logger.warning("This is a warning message")
>>> sample_logger.error("This is an error message")
>>> sample_logger.critical("This is a critical message")

Section 1.3 - Open Files

Example for reading data from encoded text files.

Demonstrates how to set the proper encoding for UTF-8, UTF-16-LE, and UTF-16-BE files, with the ability to easily expand to check other file magic values/signatures.

Example Usage:

$ python open_files.py

References:

Open files with proper encoding

This first function shows an example of opening a file after checking for a byte-order mark (BOM). While this method could be expanded to check for a file's magic value/file signature, this low-tech method will help with parsing a collection of files that may be UTF-8, UTF-16-LE, or UTF-16-BE, three very common text file encodings. Feel free to build on and share this.

def open_file(input_file):
    """Opens an encoded text file and prints the contents

    Arguments:
        input_file (str): Path to file to open
    """

    # Read the first two bytes to check for a byte-order mark (BOM)
    with open(input_file, "rb") as test_encoding:
        bom = test_encoding.read(2)

    # FF FE marks UTF-16-LE and FE FF marks UTF-16-BE; with no BOM,
    # assume UTF-8.
    file_encoding = "utf-8"
    if bom == b"\xff\xfe":
        file_encoding = "utf-16-le"
    elif bom == b"\xfe\xff":
        file_encoding = "utf-16-be"

    # Note: decoding this way leaves the BOM character in the text;
    # the "utf-16" codec strips it automatically.

    with open(input_file, "r", encoding=file_encoding) as open_input_file:
        for raw_line in open_input_file:
            line = raw_line.strip()
            print(line)
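
The BOM-based detection can be exercised end to end by generating a small UTF-16-LE file and checking its first two bytes. A minimal sketch using a temporary directory (file names are invented for the demo):

```python
import os
import tempfile

# Write a UTF-16-LE file with an explicit BOM; "\ufeff" encodes to
# the two bytes FF FE in little-endian order.
work_dir = tempfile.mkdtemp()
sample = os.path.join(work_dir, "sample.txt")
with open(sample, "w", encoding="utf-16-le") as fh:
    fh.write("\ufeffhello bom")

# Check the first two bytes for a BOM: FF FE means UTF-16-LE,
# FE FF means UTF-16-BE, anything else falls back to UTF-8.
with open(sample, "rb") as fh:
    bom = fh.read(2)

file_encoding = "utf-8"
if bom == b"\xff\xfe":
    file_encoding = "utf-16-le"
elif bom == b"\xfe\xff":
    file_encoding = "utf-16-be"
print(file_encoding)  # utf-16-le
```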

Docstring References

open_file(input_file)

Opens an encoded text file and prints the contents

Parameters

input_file (str) – Path to file to open

Section 1.4 - CSV Example

Example for writing datasets into CSV files.

Demonstrates source datasets composed of lists of dictionaries and lists of lists, handled by separate functions. Example data is provided inline and will generate two identical CSVs as output.

Example Usage:

$ python csv_example.py

References:

List of dictionaries to CSV

Example data variable:

[
    {'name': 'apple', 'quantity': 10, 'location': 'VT'},
    {'name': 'orange', 'quantity': 5, 'location': 'FL'}
]

This first function shows an example of writing a list containing multiple dictionaries to a CSV file. You can optionally provide an ordered list of headers to select which columns to write, or let the function use the keys of the first dictionary in the list to generate the header row. The latter option depends on the key order of that first dictionary and is not preferred if you can determine the headers in advance.

import csv

def write_csv_dicts(outfile, data, headers=None):
    """Writes a list of dictionaries to a CSV file.

    Arguments:
        outfile (str): Path to output file
        data (list): List of dictionaries to write to file
        headers (list): Header row to use. If empty, will use the
            first dictionary in the `data` list.

    Example:
        >>> list_of_dicts = [
        ...     {'name': 'apple', 'quantity': 10, 'location': 'VT'},
        ...     {'name': 'orange', 'quantity': 5, 'location': 'FL'}
        ... ]
        >>> write_csv_dicts('dict_test.csv', list_of_dicts)
    """

    if not headers:
        # Fall back to the keys of the first dictionary
        headers = [str(x) for x in data[0].keys()]

    with open(outfile, "w", newline="") as open_file:
        # Write only provided headers, ignore others
        csv_file = csv.DictWriter(open_file, headers, extrasaction="ignore")
        csv_file.writeheader()
        csv_file.writerows(data)
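
Reading the file back with `csv.DictReader` confirms both the header handling and the `extrasaction="ignore"` filtering. A minimal sketch reusing the example data above (the output path is arbitrary):

```python
import csv
import os
import tempfile

list_of_dicts = [
    {"name": "apple", "quantity": 10, "location": "VT"},
    {"name": "orange", "quantity": 5, "location": "FL"},
]

# Restrict the output to two columns; extrasaction="ignore" silently
# drops the unlisted "location" key instead of raising ValueError.
out_path = os.path.join(tempfile.mkdtemp(), "dict_test.csv")
with open(out_path, "w", newline="") as open_file:
    writer = csv.DictWriter(open_file, ["name", "quantity"],
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(list_of_dicts)

with open(out_path, newline="") as open_file:
    rows = list(csv.DictReader(open_file))
print(rows)  # note: csv reads every value back as a string
```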

List of ordered lists to CSV

Example data variable:

[
    ['name', 'quantity', 'location'],
    ['apple', 10, 'VT'],
    ['orange', 5, 'FL']
]

This function shows an example of writing a list containing multiple lists to a CSV file. You can optionally provide an ordered list of headers, or let the function use the values of the first element in the list as the header information. Unlike the dictionary option, you cannot filter column data by adjusting the provided headers; you must write all columns to the CSV.

def write_csv_lists(outfile, data, headers=None):
    """Writes a list of lists to a CSV file.

    Arguments:
        outfile (str): Path to output file
        data (list): List of lists to write to file
        headers (list): Header row to use. If empty, will use the
            first list in the `data` list.

    Examples:
        >>> fields = ['name', 'quantity', 'location']
        >>> list_of_lists = [
        >>>     ['apple', 10, 'VT'],
        >>>     ['orange', 5, 'FL']
        >>> ]
        >>> write_csv_lists('list_test.csv', list_of_lists, headers=fields)
    """

    with open(outfile, "w", newline="") as open_file:
        csv_file = csv.writer(open_file)
        for count, entry in enumerate(data):
            if count == 0 and headers:
                # If separate headers were provided, write them first;
                # otherwise the first data row doubles as the header.
                csv_file.writerow(headers)
            csv_file.writerow(entry)
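
A quick round trip with `csv.reader` shows the header landing in row zero; note that `writer.writerows()` can also replace the manual loop when the headers are written separately. A minimal sketch reusing the example data (the output path is arbitrary):

```python
import csv
import os
import tempfile

fields = ["name", "quantity", "location"]
list_of_lists = [
    ["apple", 10, "VT"],
    ["orange", 5, "FL"],
]

out_path = os.path.join(tempfile.mkdtemp(), "list_test.csv")
with open(out_path, "w", newline="") as open_file:
    writer = csv.writer(open_file)
    writer.writerow(fields)          # header row
    writer.writerows(list_of_lists)  # all data rows at once

with open(out_path, newline="") as open_file:
    rows = list(csv.reader(open_file))
print(rows[0])  # ['name', 'quantity', 'location']
print(rows[1])  # ['apple', '10', 'VT'] -- values come back as strings
```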

Docstring References

write_csv_dicts(outfile, data, headers=None)

Writes a list of dictionaries to a CSV file.

Parameters
  • outfile (str) – Path to output file

  • data (list) – List of dictionaries to write to file

  • headers (list) – Header row to use. If empty, will use the first dictionary in the data list.

Example

>>> list_of_dicts = [
...     {'name': 'apple', 'quantity': 10, 'location': 'VT'},
...     {'name': 'orange', 'quantity': 5, 'location': 'FL'}
... ]
>>> write_csv_dicts('dict_test.csv', list_of_dicts)

write_csv_lists(outfile, data, headers=None)

Writes a list of lists to a CSV file.

Parameters
  • outfile (str) – Path to output file

  • data (list) – List of lists to write to file

  • headers (list) – Header row to use. If empty, will use the first list in the data list.

Examples

>>> fields = ['name', 'quantity', 'location']
>>> list_of_lists = [
...     ['apple', 10, 'VT'],
...     ['orange', 5, 'FL']
... ]
>>> write_csv_lists('list_test.csv', list_of_lists, headers=fields)

Section 1.5 - Directory Recursion

File recursion example.

Demonstration of iterating through a directory to interact with files.

Example Usage:

$ python recursion_example.py

References:

List a directory

This function shows an example of displaying all files and folders within a single directory. From here you can further interact with individual files and folders or iterate recursively by calling the function on identified subdirectories.

import os

def list_directory(path):
    """List all file and folder entries in `path`.

    Args:
        path (str): A directory within a mounted file system. May be relative or
            absolute.

    Examples:
        >>> list_directory('.')

    """
    print(f"Files and folders in '{os.path.abspath(path)}':")
    # Quick and easy method for listing items within a single
    # folder.
    for entry in os.listdir(path):
        # Print all entry names
        print(f"\t{entry}")

List a directory recursively

This function shows an example of iterating over all files within all directories beneath a starting path. You don't need to worry about additional function calls, as the os.walk() method handles the recursion into subdirectories, so your logic can focus on processing the files. The docstring example counts the files ending in ".py" found anywhere under the path.

def iterate_files(path):
    """Recursively iterate over a path, finding all files within the folder
    and its subdirectories.

    Args:
        path (str): A directory within a mounted file system. May be relative or
            absolute.

    Examples:
        >>> number_of_py_files = 0
        >>> for f in iterate_files('../'):
        ...     if f.endswith('.py'):
        ...         number_of_py_files += 1
        >>> print(f"\t{number_of_py_files} python files found "
        ...      f"in {os.path.abspath('../')}")
    """
    # Though `os.walk()` exposes a list of directories in the
    # current `root`, it is rarely used since we are generally
    # interested in the files found within the subdirectories.
    # For this reason, it is common to see `dirs` named `_`.
    # DO NOT NAME `dirs` as `dir` since `dir` is a reserved word!
    for root, dirs, files in os.walk(os.path.abspath(path)):
        # Both `dirs` and `files` are lists containing all entries
        # at the current `root`.
        for file_name in files:
            # To effectively reference a file, you should include
            # the below line which creates a full path reference
            # to the specific file, regardless of how nested it is
            # We can then hand `file_entry` off to other functions.
            yield os.path.join(root, file_name)

Docstring References

iterate_files(path)

Recursively iterate over a path, finding all files within the folder and its subdirectories.

Parameters

path (str) – A directory within a mounted file system. May be relative or absolute.

Examples

>>> number_of_py_files = 0
>>> for f in iterate_files('../'):
...     if f.endswith('.py'):
...         number_of_py_files += 1
>>> print(f"    {number_of_py_files} python files found "
...      f"in {os.path.abspath('../')}")

list_directory(path)

List all file and folder entries in path.

Parameters

path (str) – A directory within a mounted file system. May be relative or absolute.

Examples

>>> list_directory('.')
