Version 1.20 #47

Merged
merged 5 commits into from
Feb 7, 2024
4 changes: 2 additions & 2 deletions .github/workflows/pythonpublish.yml
@@ -13,9 +13,9 @@ jobs:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v1
uses: actions/setup-python@v5
with:
python-version: '3.8'
- name: Install dependencies
10 changes: 5 additions & 5 deletions .github/workflows/tests.yml
@@ -15,21 +15,21 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.7", "3.8", "3.9", "3.10"]
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v1
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements-test.txt
pip install coveralls black==22.3.0 flake8 setuptools wheel twine
pip install coveralls black==24.1.1 flake8 setuptools wheel twine

- name: Verify Code with Black
run: |
@@ -47,7 +47,7 @@ jobs:
python -m pytest --cov=puremagic test/
coveralls || true

- name: Check distrubiton log description
- name: Check distribution log description
run: |
python setup.py sdist bdist_wheel
twine check dist/*
10 changes: 3 additions & 7 deletions .pre-commit-config.yaml
@@ -1,6 +1,6 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.1.0
rev: v4.5.0
hooks:
# Identify invalid files
- id: check-ast
@@ -30,15 +30,11 @@ repos:
- id: end-of-file-fixer
exclude: ^test/data/.+
- repo: https://github.com/ambv/black
rev: 22.3.0
rev: 24.1.1
hooks:
- id: black
args: [--config=.black.toml]
- repo: https://gitlab.com/pycqa/flake8
rev: 3.9.2
hooks:
- id: flake8
- repo: https://github.com/pre-commit/mirrors-mypy
rev: 'v0.931'
rev: 'v1.8.0'
hooks:
- id: mypy
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,13 @@
Changelog
=========

Version 1.20
------------

- Adding support for multi-part header checks (thanks to Andy)
- Fixing matches for webp (thanks to Nicolas Wicht)
- Fixing matches for epub (thanks to Alexander Walters)

Version 1.15
------------

2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
The MIT License (MIT)

Copyright (c) 2013-2023 Chris Griffith
Copyright (c) 2013-2024 Chris Griffith

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
7 changes: 3 additions & 4 deletions README.rst
@@ -35,10 +35,9 @@ Disadvantages:
Compatibility
~~~~~~~~~~~~~

- Python 3.7+
- Pypy
- Python 3.8+

Using travis-ci to run continuous integration tests on listed platforms.
Using github ci to run continuous integration tests on listed platforms.

Install from pypy
-----------------
@@ -167,7 +166,7 @@ https://cgit.freedesktop.org/xdg/shared-mime-info/
License
-------

MIT Licenced, see LICENSE, Copyright (c) 2013-2023 Chris Griffith
MIT Licenced, see LICENSE, Copyright (c) 2013-2024 Chris Griffith

.. |CoverageStatus| image:: https://coveralls.io/repos/github/cdgriffith/puremagic/badge.svg?branch=develop
:target: https://coveralls.io/github/cdgriffith/puremagic?branch=develop
54 changes: 48 additions & 6 deletions puremagic/magic_data.json
@@ -13,7 +13,6 @@
["", 0, ".gitconfig", "text/plain", "Git Ignore File"],
["", 0, ".rdp", "", "Windows Remote Desktop File"],
["", 0, ".ini", "text/plain", "INI Config file"],
["", 0, ".epub", "", "INI Config file"],
["", 0, ".key", "", "Encryption Key"],
["", 0, ".pem", "application/x-pem-file", "X.509 Certificate"],
["", 0, ".ps1", "text/plain", "Powershell Script"],
@@ -53,12 +52,54 @@
["", 0, ".pickle", "", "Python Pickle File"],
["", 0, ".conf", "text/plain", "Configuration File"]
],
"multi-part-headers": {
"464f524d": [
["494c424d", 8, ".iff", "image/x-ilbm", "IFF Interleaved Bitmap Image"],
["38535658", 8, ".iff", "audio/x-8svx", "IFF 8-Bit Sampled Voice"],
["4143424d", 8, ".iff", "application/x-iff", "Amiga Contiguous Bitmap"],
["414e424d", 8, ".iff", "application/x-iff", "IFF Animated Bitmap"],
["414e494d", 8, ".iff", "application/x-iff", " IFF CEL Animation"],
["46415858", 8, ".iff", "application/x-iff", "IFF Facsimile Image"],
["46545854", 8, ".iff", "application/x-iff", "IFF Formatted Text"],
["534d5553", 8, ".iff", "application/x-iff", "IFF Facsimile Image"],
["434d5553", 8, ".iff", "application/x-iff", "IFF Formatted Text"],
["5955564e", 8, ".iff", "application/x-iff", "IFF YUV Image"],
["46414e54", 8, ".iff", "application/x-iff", "Amiga Fantavision Movie"],
["41494646", 8, ".iff", "application/x-iff", "Audio Interchange File Format"]
],
"52494646": [
["57415645", 8, ".wav", "audio/wave", "Waveform Audio File Format"],
["41564920", 8, ".avi", "video/avi", "Audio Video Interleave"],
["57454250", 8, ".webp", "image/webp", "WebP graphics file format"],
["41434f4e", 8, ".ani", "", "Animated cursor"],
["43444441", 8, ".cda", "", "CD-DA stub file"],
["514c434d", 8, ".qcp", "audio/qcelp", "Qualcomm PureVoice"],
["5644524d", 8, ".vdr", "", "VirtualDub"],
["54524944", 8, ".trd", "", "TrID"],
["73687734", 8, ".shw", "", "Corel SHOW! 4.0"],
["73687735", 8, ".shw", "", "Corel SHOW! 5.0"],
["73687235", 8, ".shr", "", "Corel SHOW! 5.0 player"],
["73686235", 8, ".shb", "", " Corel SHOW! 5.0 background"],
["524d4d50", 8, ".mmm", "", "MacroMind Multimedia Movie or Microsoft Multimedia Movie"]
],
"41542654464f524d": [
["444a5655", 12, ".djvu", "image/vnd.djvu", "DjVu single page document or image"],
["444a564d", 12, ".djvu", "image/vnd.djvu+multipage", "DjVu document multi-page document"]
],
"52494658": [
["4647444d", 8, ".dcr", "", "Adobe Shockwave"],
["4d563933", 8, ".dir", "", "Macromedia Director file format"]
]
},
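Review note: each key in `multi-part-headers` is a hex-encoded primary signature, and each row under it is a secondary signature plus the offset at which it must appear. The sketch below decodes a hand-copied subset of the rows above into readable ASCII. `SAMPLE` and `describe` are illustrative names and not part of puremagic.

```python
import binascii

# Hand-copied subset of the "multi-part-headers" rows from this diff.
SAMPLE = {
    "464f524d": [["494c424d", 8, ".iff", "image/x-ilbm", "IFF Interleaved Bitmap Image"]],
    "52494646": [["57415645", 8, ".wav", "audio/wave", "Waveform Audio File Format"]],
    "41542654464f524d": [
        ["444a5655", 12, ".djvu", "image/vnd.djvu", "DjVu single page document or image"]
    ],
    "52494658": [["4d563933", 8, ".dir", "", "Macromedia Director file format"]],
}


def describe(table):
    """Return (primary, secondary, offset, extension) tuples with hex decoded to ASCII."""
    rows = []
    for primary_hex, options in table.items():
        primary = binascii.unhexlify(primary_hex).decode("ascii")
        for secondary_hex, offset, extension, _mime, _name in options:
            secondary = binascii.unhexlify(secondary_hex).decode("ascii")
            rows.append((primary, secondary, offset, extension))
    return rows


for row in describe(SAMPLE):
    print(row)
```

The offsets follow from the container layout: a 4-byte tag such as `FORM` or `RIFF` is followed by a 4-byte size field, so the sub-type sits at offset 8, and the 8-byte `AT&TFORM` prefix pushes it to offset 12.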
"footers": [
["54525545564953494f4e2d5846494c452e00", -18, ".tga", "image/tga", "Truevision Targa Graphic file"],
["000001b7", -4, ".mpeg", "video/mpeg", "MPEG video file"]
["000001b7", -4, ".mpeg", "video/mpeg", "MPEG video file"],
["3c2f7376673e", -8, ".svg", "image/svg+xml", "Scalable Vector Graphics Image"],
["3c2f7376673e", -7, ".svg", "image/svg+xml", "Scalable Vector Graphics Image"],
["3c2f7376673e", -6, ".svg", "image/svg+xml", "Scalable Vector Graphics Image"]
],
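Review note: the three `.svg` rows share the same six bytes (`</svg>`) and differ only in offset. My reading, which the diff itself does not state, is that they cover a file ending with no newline, with `\n`, or with `\r\n`. The sketch below reproduces the bounded footer slice that this PR adds to `_identify_all`; `footer_matches` and `svg_offsets` are hypothetical helper names.

```python
SVG_CLOSE = bytes.fromhex("3c2f7376673e")  # b"</svg>"


def footer_matches(footer: bytes, pattern: bytes, offset: int) -> bool:
    """Compare pattern with the slice of footer that starts at a negative offset."""
    end = offset + len(pattern)
    # When the pattern reaches the very end of the data, end is 0 and
    # footer[offset:0] would be empty, so slice to the end instead.
    area = footer[offset:end] if end != 0 else footer[offset:]
    return area == pattern


def svg_offsets(data: bytes):
    """Return which of the three offsets from the footers table match."""
    return [off for off in (-8, -7, -6) if footer_matches(data, SVG_CLOSE, off)]


print(svg_offsets(b"<svg></svg>"))      # no trailing newline
print(svg_offsets(b"<svg></svg>\n"))    # Unix newline
print(svg_offsets(b"<svg></svg>\r\n"))  # Windows newline
```

Exactly one offset matches in each case, which is why the bounded slice matters: the old `footer[start:]` comparison could only match a pattern that ran to the last byte.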
"headers": [
["3c3f786d6c2076657273696f6e3d", 0, ".xml", "application/xml", "XML Document"],
["3c3f786d6c", 0, ".xml", "application/xml", "XML Document"],
["454c46", 1, ".AppImage", "application/x-iso9660-appimage", "AppImage application bundle"],
["4341434845204d414e4946455354", 0, ".manifest", "text/cache-manifest", "Web application cache manifest"],
["425a68", 0, ".tar.bz2", "application/x-bzip2", "bzip2 compressed archive"],
@@ -110,7 +151,7 @@
["456c6646696c6500", 0, ".evtx", "", "Windows Vista event log"],
["23204469736b2044", 0, ".vmdk", "application/octet-stream", "VMware 4 Virtual Disk description"],
["4d444d5093a7", 0, ".hdmp", "", "Windows dump file"],
["464f524d00", 0, ".aiff", "audio/aiff", "Audio Interchange File"],
["464f524d", 0, ".aiff", "audio/aiff", "Audio Interchange File"],
["4d546864", 0, ".midi", "audio/midi", "MIDI sound file"],
["2e524d46", 0, ".rmvb", "", "RealMedia streaming media"],
["504b0304", 0, ".docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document", "MS Office Open XML Format Document"],
@@ -132,6 +173,7 @@
["ffd8ff", 0, ".jfif", "image/jpeg", "JPEG|EXIF|SPIFF images"],
["514649", 0, ".qemu", "", "Qcow Disk Image"],
["504b5c3030335c303034", 0, ".epub", "application/epub+zip", "electronic book document"],
["6d696d65747970656170706c69636174696f6e2f657075622b7a6970", 30, ".epub", "application/epub+zip", "electronic book document"],
["46726f6d20", 0, ".mbox", "application/mbox", "mailbox file"],
["232552414d4c20", 0, ".raml", "application/raml+yaml", "RAML document"],
["7a1a2010", 0, ".sisx", "x-epoc/x-sisx-app", "SISX package"],
@@ -252,7 +294,6 @@
["ffffffff", 0, ".sys", "text/plain", "DOS system driver"],
["3c3f786d6c2076657273696f6e3d22312e30223f3e0d0a3c4d4d435f436f6e736f6c6546696c6520436f6e736f6c6556657273696f6e3d22", 0, ".msc", "", "MMC Snap-in Control file"],
["4d6963726f736f66742057696e646f7773204d6564696120506c61796572202d2d20", 84, ".wpl", "", "Windows Media Player playlist"],
["3c3f786d6c2076657273696f6e3d22312e30223f3e", 0, ".xml", "application/xml", "XML Document"],
["4d6963726f736f66742056697375616c", 0, ".sln", "", "Visual Studio .NET file"],
["4d6963726f736f667420432f432b2b20", 0, ".pdb", "", "MS C++ debugging symbols file"],
["4d5a90000300000004000000ffff", 0, ".zap", "", "ZoneAlam data file"],
@@ -1169,6 +1210,8 @@
["43444441666d7420", 8, ".cda", "", "RIFF CD audio"],
["514c434d666d7420", 8, ".qcp", "audio/vnd.qcelp", "RIFF Qualcomm PureVoice"],
["57454250", 8, ".webp", "image/webp", "RIFF WebP"],
["524946462400000057454250", 0, ".webp", "image/webp", "RIFF WebP"],
["524946462400000057454250565038", 0, ".webp", "image/webp", "RIFF WebP VP8"],
["524d494464617461", 8, ".rmi", "", "RIFF Windows MIDI"],
["484541444552205245434f52442a2a2a", 0, ".xpt", "", "SAS Transport dataset"],
["232153494c4b0a", 0, ".sil", "", "Skype audio compression"],
@@ -1230,7 +1273,6 @@
["74657874", 73, ".odt", "application/vnd.oasis.opendocument.text", "OpenDocument Text File"],
["70726573656e746174696f6e", 73, ".odp", "application/vnd.oasis.opendocument.presentation", "OpenDocument Presentation"],
["7370726561647368656574", 73, ".ods", "application/vnd.oasis.opendocument.spreadsheet", "OpenDocument Spreadsheet"],
["786d6c", 2, ".xml", "application/xml", "XML File"],
["526172211a070100", 0, ".rar", "application/vnd.rar", "RAR Archive"],
["4d5a9000", 0, ".exe", "application/vnd.microsoft.portable-executable", "Windows Executable"],
["00010000", 0, ".ttf", "font/ttf", "TTF Font"],
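Review note on the new `.epub` row at offset 30: a ZIP local file header has 30 fixed bytes before the entry name, so an archive whose first entry is an uncompressed `mimetype` file containing `application/epub+zip` carries those two strings back to back from byte 30. The existing offset-0 row, by contrast, decodes to the literal text `PK\003\004` rather than the four signature bytes, which appears to be why it did not match. A minimal check using only the standard library follows; `minimal_epub_like_zip` is a hypothetical helper and the archive it builds is not a complete EPUB.

```python
import io
import zipfile

# The new header bytes from this diff: b"mimetype" + b"application/epub+zip".
EPUB_MARK = bytes.fromhex("6d696d65747970656170706c69636174696f6e2f657075622b7a6970")


def minimal_epub_like_zip() -> bytes:
    """Build an in-memory ZIP whose first entry is a stored 'mimetype' file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
    return buf.getvalue()


data = minimal_epub_like_zip()
print(EPUB_MARK)
print(data[:4] == b"PK\x03\x04")
print(data[30 : 30 + len(EPUB_MARK)] == EPUB_MARK)
```

The check only holds when the `mimetype` entry comes first, is stored without compression, and has no extra field, so archives packed differently may still need the extension fallback.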
69 changes: 56 additions & 13 deletions puremagic/main.py
@@ -5,7 +5,7 @@
magic numbers. It is designed to be minimalistic and inherently cross platform
compatible, with no imports when used as a module.

© 2013-2023 Chris Griffith - License: MIT (see LICENSE)
© 2013-2024 Chris Griffith - License: MIT (see LICENSE)

Acknowledgements
Gary C. Kessler
@@ -17,10 +17,10 @@
import binascii
from itertools import chain
from collections import namedtuple
from typing import Union, Tuple, List
from typing import Union, Tuple, List, Dict, Optional

__author__ = "Chris Griffith"
__version__ = "1.15"
__version__ = "1.20"
__all__ = [
"magic_file",
"magic_string",
@@ -32,6 +32,7 @@
"PureError",
"magic_footer_array",
"magic_header_array",
"multi_part_header_dict",
]

here = os.path.abspath(os.path.dirname(__file__))
@@ -65,23 +66,37 @@ class PureError(LookupError):

def _magic_data(
filename: Union[os.PathLike, str] = os.path.join(here, "magic_data.json"),
) -> Tuple[List[PureMagic], List[PureMagic], List[PureMagic]]:
) -> Tuple[
List[PureMagic],
List[PureMagic],
List[PureMagic],
Dict[bytes, List[PureMagic]],
]:
"""Read the magic file"""
with open(filename) as f:
data = json.load(f)
headers = sorted((_create_puremagic(x) for x in data["headers"]), key=lambda x: x.byte_match)
footers = sorted((_create_puremagic(x) for x in data["footers"]), key=lambda x: x.byte_match)
extensions = [_create_puremagic(x) for x in data["extension_only"]]
return headers, footers, extensions
multi_part_extensions = {}
for header_match, option_list in data["multi-part-headers"].items():
multi_part_extensions[binascii.unhexlify(header_match.encode("ascii"))] = [
_create_puremagic(x) for x in option_list
]
return headers, footers, extensions, multi_part_extensions


def _create_puremagic(x: List) -> PureMagic:
return PureMagic(
byte_match=binascii.unhexlify(x[0].encode("ascii")), offset=x[1], extension=x[2], mime_type=x[3], name=x[4]
byte_match=binascii.unhexlify(x[0].encode("ascii")),
offset=x[1],
extension=x[2],
mime_type=x[3],
name=x[4],
)


magic_header_array, magic_footer_array, extension_only_array = _magic_data()
magic_header_array, magic_footer_array, extension_only_array, multi_part_header_dict = _magic_data()


def _max_lengths() -> Tuple[int, int]:
@@ -127,9 +142,31 @@ def _identify_all(header: bytes, footer: bytes, ext=None) -> List[PureMagicWithC

for magic_row in magic_footer_array:
start = magic_row.offset
if footer[start:] == magic_row.byte_match:
end = magic_row.offset + len(magic_row.byte_match)
match_area = footer[start:end] if end != 0 else footer[start:]
if match_area == magic_row.byte_match:
matches.append(magic_row)

new_matches = set()
for matched in matches:
if matched.byte_match in multi_part_header_dict:
for magic_row in multi_part_header_dict[matched.byte_match]:
start = magic_row.offset
end = magic_row.offset + len(magic_row.byte_match)
if end > len(header):
continue
if header[start:end] == magic_row.byte_match:
new_matches.add(
PureMagic(
byte_match=header[matched.offset : end],
offset=magic_row.offset,
extension=magic_row.extension,
mime_type=magic_row.mime_type,
name=magic_row.name,
)
)

matches.extend(list(new_matches))
return _confidence(matches, ext)
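
Review note: the new block is a two-stage check. A primary header match (for example `RIFF`) is looked up in `multi_part_header_dict`, and each secondary row is then tested at its own offset. The standalone sketch below mirrors that logic under simplifying assumptions: `Row`, `PRIMARY`, `SECONDARY`, and `identify` are hypothetical names, the generic `.riff` row is invented for illustration, and the confidence ranking done by `_confidence` is left out because it is not shown in this diff.

```python
from collections import namedtuple

Row = namedtuple("Row", "byte_match offset extension mime_type name")

# Hypothetical generic row standing in for a primary header match.
PRIMARY = [Row(b"RIFF", 0, ".riff", "", "Generic RIFF container (illustrative)")]

# Secondary rows keyed by the primary signature, as in multi_part_header_dict.
SECONDARY = {
    b"RIFF": [
        Row(b"WAVE", 8, ".wav", "audio/wave", "Waveform Audio File Format"),
        Row(b"WEBP", 8, ".webp", "image/webp", "WebP graphics file format"),
    ]
}


def identify(header: bytes):
    """Return primary matches followed by any refined multi-part matches."""
    matches = [
        row
        for row in PRIMARY
        if header[row.offset : row.offset + len(row.byte_match)] == row.byte_match
    ]
    refined = []
    for matched in matches:
        for row in SECONDARY.get(matched.byte_match, []):
            end = row.offset + len(row.byte_match)
            if end > len(header):
                continue  # header too short to contain the secondary signature
            if header[row.offset : end] == row.byte_match:
                # Record the whole span from the primary start to the secondary end.
                refined.append(row._replace(byte_match=header[matched.offset : end]))
    return matches + refined


wav_header = b"RIFF" + (36).to_bytes(4, "little") + b"WAVEfmt "
for match in identify(wav_header):
    print(match.extension, match.byte_match)
```

The refined entry stores the full span from the primary signature through the secondary one, so its `byte_match` is longer than the generic row's. If the scoring favours longer matches, as I assume, the specific type should outrank the generic container.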


@@ -207,7 +244,9 @@ def from_file(filename: Union[os.PathLike, str], mime: bool = False) -> str:
return _magic(head, foot, mime, ext_from_filename(filename))


def from_string(string: Union[str, bytes], mime: bool = False, filename: Union[os.PathLike, str] = None) -> str:
def from_string(
string: Union[str, bytes], mime: bool = False, filename: Optional[Union[os.PathLike, str]] = None
) -> str:
"""Reads in string, attempts to identify content based
off magic number and will return the file extension.
If mime is True it will return the mime type instead.
@@ -225,7 +264,7 @@ def from_string(string: Union[str, bytes], mime: bool = False, filename: Union[o
return _magic(head, foot, mime, ext)


def from_stream(stream, mime: bool = False, filename: Union[os.PathLike, str] = None) -> str:
def from_stream(stream, mime: bool = False, filename: Optional[Union[os.PathLike, str]] = None) -> str:
"""Reads in stream, attempts to identify content based
off magic number and will return the file extension.
If mime is True it will return the mime type instead.
@@ -260,7 +299,7 @@ def magic_file(filename: Union[os.PathLike, str]) -> List[PureMagicWithConfidenc
return info


def magic_string(string, filename: Union[os.PathLike, str] = None) -> List[PureMagicWithConfidence]:
def magic_string(string, filename: Optional[Union[os.PathLike, str]] = None) -> List[PureMagicWithConfidence]:
"""
Returns tuple of (num_of_matches, array_of_matches)
arranged highest confidence match first
@@ -279,7 +318,7 @@ def magic_string(string, filename: Union[os.PathLike, str] = None) -> List[PureM
return info


def magic_stream(stream, filename: Union[os.PathLike, str] = None) -> List[PureMagicWithConfidence]:
def magic_stream(stream, filename: Optional[Union[os.PathLike, str]] = None) -> List[PureMagicWithConfidence]:
"""Returns tuple of (num_of_matches, array_of_matches)
arranged highest confidence match first
If filename is provided it will be used in the computation.
@@ -308,7 +347,11 @@ def command_line_entry(*args):
)
)
parser.add_argument(
"-m", "--mime", action="store_true", dest="mime", help="Return the mime type instead of file type"
"-m",
"--mime",
action="store_true",
dest="mime",
help="Return the mime type instead of file type",
)
parser.add_argument("files", nargs="+")
args = parser.parse_args(args if args else sys.argv[1:])
1 change: 1 addition & 0 deletions test/resources/fake_file
@@ -0,0 +1 @@
(binary content; the bytes are not representable as text)