binaryornot

Binary file detection that actually works. 131 extensions, 55 magic-byte signatures, a trained decision tree, and zero dependencies.

Latest version: 0.6.0

Licensing

MIT License (OSI compliant, not proprietary).


BinaryOrNot

Python library and CLI tool to check if a file is binary or text. Zero dependencies.

from binaryornot.check import is_binary

is_binary("image.png")    # True
is_binary("README.md")    # False
is_binary("data.sqlite")  # True
is_binary("report.csv")   # False

$ binaryornot image.png
True

Install

pip install binaryornot

Why not just check for null bytes?

That's the first thing everyone tries. It works until it doesn't:

  • A UTF-16 text file is full of null bytes. Your tool thinks it's binary and corrupts it.
  • A Big5 or GB2312 text file is full of bytes above 0x7F. By byte ratios alone, it looks binary.
  • A font file (.woff, .eot) is clearly binary but might not have null bytes in the first chunk.
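
The first two failure modes are easy to reproduce with the standard library alone. This is a minimal sketch, not BinaryOrNot code: naive_is_binary and high_byte_ratio are hypothetical helpers written only to show where the simple heuristics go wrong.

```python
def naive_is_binary(chunk: bytes) -> bool:
    """Null-byte heuristic: any 0x00 in the chunk means 'binary'."""
    return b"\x00" in chunk


def high_byte_ratio(chunk: bytes) -> float:
    """Fraction of bytes with the high bit set (>= 0x80)."""
    if not chunk:
        return 0.0
    return sum(b >= 0x80 for b in chunk) / len(chunk)


# Plain English text, encoded as UTF-16: every ASCII character
# becomes two bytes, one of which is 0x00.
utf16_text = "Hello, world\n".encode("utf-16")

# Chinese text encoded as GB2312: no null bytes, but both bytes
# of every character have the high bit set.
gb2312_text = "你好，世界".encode("gb2312")

print(naive_is_binary(utf16_text))    # True: a false positive, this is text
print(naive_is_binary(gb2312_text))   # False
print(high_byte_ratio(gb2312_text))   # very high, so a ratio rule alone would say "binary"
```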

BinaryOrNot reads the first 512 bytes and runs them through a trained decision tree that considers byte ratios, Shannon entropy, encoding validity, BOM detection, and more. It handles all the edge cases above correctly, with zero dependencies.
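
To make the feature names above concrete, here is a rough sketch of how such signals could be computed from a chunk using only the standard library. It is an illustration under assumptions, not the library's implementation: the function names, the feature set, and the choice of UTF-8 as the only validity check are all made up for this example, and no decision tree is included.

```python
import codecs
import math
from collections import Counter

# Byte-order marks that commonly open Unicode text files.
BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_valid_utf8_prefix(chunk: bytes) -> bool:
    """True if the chunk decodes as UTF-8, tolerating a character cut off at the end."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return False
    return True


def chunk_features(chunk: bytes) -> dict:
    """Hypothetical feature vector for a chunk (illustrative names only)."""
    total = len(chunk) or 1  # avoid dividing by zero on empty input
    return {
        "null_ratio": chunk.count(0) / total,
        "high_ratio": sum(b >= 0x80 for b in chunk) / total,
        "entropy": shannon_entropy(chunk),
        "has_bom": chunk.startswith(BOMS),
        "valid_utf8": is_valid_utf8_prefix(chunk),
    }


with open(__file__, "rb") as f:
    print(chunk_features(f.read(512)))
```

A classifier would then combine these values rather than trust any single one: a UTF-16 file has a high null ratio but also a BOM, which is exactly the kind of case a single-threshold rule gets wrong.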

Tested against 37 text encodings and 49 binary formats, verified by parametrized tests driven from coverage CSVs.

API

One main function:

from binaryornot.check import is_binary

is_binary(filename)  # returns True or False

There's also is_binary_string() if you already have bytes:

from binaryornot.helpers import is_binary_string

# Read a chunk from a file and classify it
with open("mystery_file", "rb") as f:
    chunk = f.read(512)
is_binary_string(chunk)
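
A common use is skipping binary files while walking a directory tree. The sketch below is one possible way to do that; iter_text_files is a hypothetical helper, not part of BinaryOrNot. It takes the classifier as an argument, so with the library installed you would pass in is_binary from binaryornot.check.

```python
from pathlib import Path
from typing import Callable, Iterator


def iter_text_files(root: str, classify: Callable[[str], bool]) -> Iterator[Path]:
    """Yield every file under root that classify() does not report as binary.

    classify is any callable that takes a file path and returns True for
    binary files, for example binaryornot.check.is_binary.
    """
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and not classify(str(path)):
            yield path


# Usage, assuming binaryornot is installed:
#
#     from binaryornot.check import is_binary
#     for path in iter_text_files("src", is_binary):
#         print(path)
```

Passing the classifier in keeps the helper easy to test and lets you swap in a stricter or cheaper check without touching the traversal code.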

Full documentation covers the detection algorithm in detail.

Credits

Created by Audrey Roy Greenfeld.