
Python

NOT FULLY PORTED YET.

Python is a popular, easy-to-use, general-purpose programming language that is heavily used in Data Analytics and Data Science as well as systems administration.

It's not as good as Perl for one-liners though, and Perl one-liners can boost shell scripts more easily.

Core Reading

Learning Python

DevOps Python tools

HariSekhon/DevOps-Python-tools


Shell scripts with Python

Shell scripts using Python and making it easier to install Python pip libraries from PyPI.

HariSekhon/DevOps-Bash-tools


Nagios Plugins in Python

HariSekhon/Nagios-Plugins


Python Library with Unit Tests

HariSekhon/pylib

VirtualEnv

Creates a virtual environment in the given local sub-directory, in which to install PyPI modules to avoid clashes with the system Python libraries.

virtualenv "$directory_name_to_create"

I always like to use the directory name venv for the virtualenv:

virtualenv venv
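If the third-party virtualenv tool isn't installed, Python 3's built-in venv module does the same basic job (python3 -m venv venv from the shell). The sketch below swaps in that stdlib module rather than virtualenv itself, and shows it from Python. The helper names create_venv and venv_python are hypothetical, and the demo builds a throwaway environment in a temp directory:

```python
from __future__ import annotations

import sys
import tempfile
import venv
from pathlib import Path


def create_venv(directory: str | Path, with_pip: bool = True) -> Path:
    """Create a virtual environment with the stdlib venv module.

    Roughly what `python3 -m venv <directory>` does from the shell.
    """
    env_dir = Path(directory)
    builder = venv.EnvBuilder(with_pip=with_pip)  # with_pip=True bootstraps pip via ensurepip
    builder.create(env_dir)
    return env_dir


def venv_python(env_dir: Path) -> Path:
    """Interpreter inside the venv: bin/python on POSIX, Scripts\\python.exe on Windows."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    # Throwaway demo in a temp dir; with_pip=False skips pip bootstrapping to keep it fast
    with tempfile.TemporaryDirectory() as tmp:
        env = create_venv(Path(tmp) / "venv", with_pip=False)
        print(venv_python(env))
        print((env / "pyvenv.cfg").read_text())
```

Either way you end up with the same layout of bin/, lib/ and a pyvenv.cfg file.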

Then activate it before you start pip installing:

source venv/bin/activate

This prepends venv/bin to $PATH so that python and pip resolve to the copies under the local venv directory, which install to and import from venv/lib/python3.12/site-packages.

Now install PyPI modules as usual.
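To double check from inside Python that activation worked before installing anything, here is a small sketch. It relies on the documented venv behaviour where sys.prefix points at the environment while sys.base_prefix still points at the base installation, which should also hold for virtualenv 20+ since it writes the same pyvenv.cfg. The function names are hypothetical:

```python
from __future__ import annotations

import os
import sys


def in_virtualenv() -> bool:
    """True if the running interpreter belongs to a venv / virtualenv.

    In a venv (and virtualenv 20+), sys.prefix is the environment directory while
    sys.base_prefix is the base Python installation. Older virtualenv releases
    set sys.real_prefix instead, so check for that too.
    """
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def describe_environment() -> dict[str, object]:
    """Collect the details showing which interpreter and prefix are in use."""
    return {
        "in_virtualenv": in_virtualenv(),
        "executable": sys.executable,  # .../venv/bin/python once activated
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # Exported by the activate script; unset if venv/bin/python is run directly
        "VIRTUAL_ENV": os.environ.get("VIRTUAL_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:15} {value}")
```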

The venv/pyvenv.cfg file will contain some metadata like this:

home = /opt/homebrew/Cellar/python@3.12/3.12.3/bin
implementation = CPython
version_info = 3.12.3.final.0
virtualenv = 20.25.3
include-system-site-packages = false
base-prefix = /opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12
base-exec-prefix = /opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12
base-executable = /opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/bin/python3.12
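The pyvenv.cfg format is just one "key = value" pair per line, so a few lines of Python are enough to read it. This is a minimal sketch assuming that simple format; parse_pyvenv_cfg and read_pyvenv_cfg are hypothetical helper names:

```python
from __future__ import annotations

from pathlib import Path


def parse_pyvenv_cfg(text: str) -> dict[str, str]:
    """Parse pyvenv.cfg content ('key = value' per line) into a dict of strings."""
    config: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first '=' only, in case a value ever contains one
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank lines or lines without '='
        config[key.strip()] = value.strip()
    return config


def read_pyvenv_cfg(env_dir: str | Path = "venv") -> dict[str, str]:
    """Read <env_dir>/pyvenv.cfg, e.g. venv/pyvenv.cfg."""
    return parse_pyvenv_cfg((Path(env_dir) / "pyvenv.cfg").read_text(encoding="utf-8"))
```

For the file shown above, read_pyvenv_cfg("venv")["version_info"] would be "3.12.3.final.0". All values stay strings, so compare include-system-site-packages against "true" / "false" rather than a bool.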

Pipenv

https://pipenv.pypa.io/en/latest/

https://github.com/pypa/pipenv

Combines Pip and VirtualEnv into one command.

brew install pipenv

Creates a Pipfile and Pipfile.lock, plus a virtualenv in a standard location $HOME/.local/share/virtualenvs/ if not already inside one.

pipenv install

Activates the virtualenv by spawning a subshell inside it:

pipenv shell

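Imports an existing requirements.txt file into the Pipfile (pipenv install also does this automatically if it finds a requirements.txt and no Pipfile):

pipenv install -r requirements.txt

Checks the dependencies for known security vulnerabilities:

pipenv check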

Dependency graph:

pipenv graph

Jupyter Notebook

(formerly called IPython Notebook)

https://ipython.org/notebook.html

An interactive web page where you can mix code blocks, rich notes and graphs on the same page, click to execute code blocks, and build a page-oriented workflow of results and analysis for sharing and demonstrating.

Libraries

You can see these used throughout the GitHub repos above:

General

  • GitPython - Git
  • sh - execute shell commands more easily
  • jinja2 - Jinja2 templating
  • humanize - converts units to human readable
  • pyobjc-framework-Quartz - control Mac UI
  • psutil
  • PyInstaller - bundle Python code into standalone executables (doesn't work for advanced code)
  • sasl

Web

  • requests - easy HTTP request library
  • beautifulsoup4 - HTML parsing library
  • Scrapy - web scraping
  • selenium - Selenium web testing framework

Databases

Cloud

  • boto3 - AWS
  • aws-consoler

CI/CD & Linting

  • python-jenkins - Jenkins
  • TravisPy - for Travis CI
  • pylint - Python linting CLI tool
  • grip - Grip renders local markdown using a local webserver
  • Markdown
  • MarkupSafe
  • checkov
  • semgrep - security / misconfiguration scanning
  • jsonlint
  • yamllint - CLI YAML linting tool

Unit Testing

  • unittest2
  • nose
  • Faker - generate fake but realistic data for unit testing, Python version of the original Perl library, comes with a faker command convenient for shell scripts:

Generate 10 fake addresses:

faker -r 10 address

Virtualization & Containerization

Pub/Sub

Big Data & NoSQL

Data Formats & Analysis

  • avro - Avro
  • ldif3 - LDAP LDIF format
  • jsonlint
  • Markdown
  • MarkupSafe
  • numpy - NumPy for scientific numeric processing
  • pandas - Pandas for data analysis
  • python-cson
  • pyarrow - Apache Arrow and Parquet support, though its Parquet support is weak, so prefer Parquet Tools
  • python-ldap
  • python-snappy - work with Snappy compression format, often pulled as a dependency
  • PyYAML - work with YAML files in Python
  • scikit-learn - scikit-learn machine learning
  • toml
  • xmltodict
  • yamllint - CLI YAML linting tool
  • Faker - generate fake but realistic data for unit testing, Python version of the original Perl library, comes with a faker command convenient for shell scripts:

Generate 10 fake addresses:

faker -r 10 address

Data Visualization

  • matplotlib - General-purpose plotting, highly customizable
  • seaborn - built on matplotlib, higher level to make it easier to create aesthetically pleasing visualizations
  • plotly - Interactive graphs, dashboards, 3D plots
  • bokeh - Interactive, web-ready visualizations
  • pandas - Quick and easy plots directly from dataframes
  • networkx - Graph theory, network analysis
  • altair - Declarative statistical visualizations
  • pygal - Vector (SVG) visualizations, interactive
  • graph-tool - Scalable and efficient for large graph analysis

Jython

https://www.jython.org/

Python on the Java JVM.

The ease of Python coding with full access to Java APIs and libraries.

Useful when there aren't Python libraries available, or they aren't as fully featured as the Java versions (e.g. for Hadoop).

Today, I'd prefer to write in the native JVM language Groovy.

Install

From DevOps-Python-tools:

jython_install.sh

Run

Interactive REPL:

$ jython
Jython 2.7.3 (tags/v2.7.3:5f29801fe, Sep 10 2022, 18:52:49)
[OpenJDK 64-Bit Server VM (Eclipse Adoptium)] on java17.0.1
Type "help", "copyright", "credits" or "license" for more information.
>>>

Run a Jython script and add Java classpath to find any jar dependencies that the script uses:

jython -J-cp "$CLASSPATH" "file.py"

Code

Some Jython programs, such as those using Hadoop HDFS Java API can be found in the DevOps-Python-tools repo.

Troubleshooting

Python Fault Handler

Prints stack trace on crash.

Useful for debugging native-level crashes, C extensions, system calls, OS signals.

Activates handling of signals like:

Signal    Description
SIGSEGV   Segmentation fault
SIGFPE    Floating-point exception
SIGBUS    Bus error
SIGABRT   Abort signal
SIGILL    Illegal instruction

Normally, these signals cause Python to crash without much useful information, but with the fault handler enabled, it'll output a traceback before the crash to help debug.

The performance overhead is minimal, but logs get bigger and the dumps may expose sensitive info.

Enable Python Fault Handler

export PYTHONFAULTHANDLER=1

or

python -X faulthandler "file.py"

or

import faulthandler
faulthandler.enable()
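To see the difference the fault handler makes, this sketch runs a child interpreter that deliberately segfaults, once without and once with -X faulthandler, and captures its stderr. Its assumptions: a POSIX system such as Linux, where reading address 0 via ctypes.string_at(0) raises a real SIGSEGV (on Windows ctypes may turn that into an OSError instead), and a hypothetical helper name, run_crashing_child:

```python
import os
import subprocess
import sys

# Deliberately read from a NULL pointer via ctypes to trigger SIGSEGV in the child
CRASH_CODE = "import ctypes; ctypes.string_at(0)"


def run_crashing_child(faulthandler_enabled: bool) -> subprocess.CompletedProcess:
    """Run a child interpreter that segfaults, with or without -X faulthandler."""
    env = os.environ.copy()
    # Both of these also enable faulthandler, so drop them for a fair comparison
    env.pop("PYTHONFAULTHANDLER", None)
    env.pop("PYTHONDEVMODE", None)
    cmd = [sys.executable]
    if faulthandler_enabled:
        cmd += ["-X", "faulthandler"]
    cmd += ["-c", CRASH_CODE]
    return subprocess.run(cmd, capture_output=True, text=True, env=env, timeout=60)


if __name__ == "__main__":
    for enabled in (False, True):
        result = run_crashing_child(enabled)
        print(f"faulthandler={enabled} exit={result.returncode}")
        print(result.stderr or "(no stderr output)")
```

Without the fault handler the child just dies from the signal with nothing on stderr. With it enabled, stderr gets a "Fatal Python error: Segmentation fault" line followed by the Python traceback of where it crashed.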

Alpine ModuleNotFoundError: No module named 'pip._vendor.six.moves'

ModuleNotFoundError: No module named 'pip._vendor.six.moves'

Fix:

apk del py3-pip py-pip
apk add py3-pip

Partial port from private Knowledge Base page 2008+