From b804443cb3df2ca4b71070739aa1c8f3c86464fc Mon Sep 17 00:00:00 2001 From: Jim Mussared Date: Tue, 27 Jun 2023 02:17:41 +1000 Subject: [PATCH] docs/library/deflate: Add docs for deflate.DeflateIO. Also update zlib & gzip docs to describe the micropython-lib modules. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared --- docs/library/deflate.rst | 177 +++++++++++++++++++++++++++++++++++++++ docs/library/gzip.rst | 106 +++++++++++++++++++++++ docs/library/index.rst | 12 +-- docs/library/zlib.rst | 90 +++++++++++++++----- 4 files changed, 357 insertions(+), 28 deletions(-) create mode 100644 docs/library/deflate.rst create mode 100644 docs/library/gzip.rst diff --git a/docs/library/deflate.rst b/docs/library/deflate.rst new file mode 100644 index 000000000..9752af592 --- /dev/null +++ b/docs/library/deflate.rst @@ -0,0 +1,177 @@ +:mod:`deflate` -- deflate compression & decompression +===================================================== + +.. module:: deflate + :synopsis: deflate compression & decompression + +This module allows compression and decompression of binary data with the +`DEFLATE algorithm `_ +(commonly used in the zlib library and gzip archiver). + +**Availability:** + +* Added in MicroPython v1.21. + +* Decompression: Enabled via the ``MICROPY_PY_DEFLATE`` build option, on by default + on ports with the "extra features" level or higher (which is most boards). + +* Compression: Enabled via the ``MICROPY_PY_DEFLATE_COMPRESS`` build option, on + by default on ports with the "full features" level or higher (generally this means + you need to build your own firmware to enable this). + +Classes +------- + +.. class:: DeflateIO(stream, format=AUTO, wbits=0, close=False, /) + + This class can be used to wrap a *stream* which is any + :term:`stream-like ` object such as a file, socket, or stream + (including :class:`io.BytesIO`). It is itself a stream and implements the + standard read/readinto/write/close methods. + + The *stream* must be a blocking stream. Non-blocking streams are currently + not supported. + + The *format* can be set to any of the constants defined below, and defaults + to ``AUTO`` which for decompressing will auto-detect gzip or zlib streams, + and for compressing it will generate a raw stream. + + The *wbits* parameter sets the base-2 logarithm of the DEFLATE dictionary + window size. So for example, setting *wbits* to ``10`` sets the window size + to 1024 bytes. Valid values are ``5`` to ``15`` inclusive (corresponding to + window sizes of 32 to 32k bytes). + + If *wbits* is set to ``0`` (the default), then a window size of 256 bytes + will be used (corresponding to *wbits* set to ``8``), except when + :ref:`decompressing a zlib stream `. + + See the :ref:`window size ` notes below for more information + about the window size, zlib, and gzip streams. + + If *close* is set to ``True`` then the underlying stream will be closed + automatically when the :class:`deflate.DeflateIO` stream is closed. This is + useful if you want to return a :class:`deflate.DeflateIO` stream that wraps + another stream and not have the caller need to know about managing the + underlying stream. + + If compression is enabled, a given :class:`deflate.DeflateIO` instance + supports both reading and writing. For example, a bidirectional stream like + a socket can be wrapped, which allows for compression/decompression in both + directions. + +Constants +--------- + +.. data:: deflate.AUTO + deflate.RAW + deflate.ZLIB + deflate.GZIP + + Supported values for the *format* parameter. + +Examples +-------- + +A typical use case for :class:`deflate.DeflateIO` is to read or write a compressed +file from storage: + +.. code:: python + + import deflate + + # Writing a zlib-compressed stream (uses the default window size of 256 bytes). + with open("data.gz", "wb") as f: + with deflate.DeflateIO(f, deflate.ZLIB) as d: + # Use d.write(...) etc + + # Reading a zlib-compressed stream (auto-detect window size). + with open("data.z", "rb") as f: + with deflate.DeflateIO(f, deflate.ZLIB) as d: + # Use d.read(), d.readinto(), etc. + +Because :class:`deflate.DeflateIO` is a stream, it can be used for example +with :meth:`json.dump` and :meth:`json.load` (and any other places streams can +be used): + +.. code:: python + + import deflate, json + + # Write a dictionary as JSON in gzip format, with a + # small (64 byte) window size. + config = { ... } + with open("config.gz", "wb") as f: + with deflate.DeflateIO(f, deflate.GZIP, 6) as f: + json.dump(config, f) + + # Read back that dictionary. + with open("config.gz", "rb") as f: + with deflate.DeflateIO(f, deflate.GZIP, 6) as f: + config = json.load(f) + +If your source data is not in a stream format, you can use :class:`io.BytesIO` +to turn it into a stream suitable for use with :class:`deflate.DeflateIO`: + +.. code:: python + + import deflate, io + + # Decompress a bytes/bytearray value. + compressed_data = get_data_z() + with deflate.DeflateIO(io.BytesIO(compressed_data), deflate.ZLIB) as d: + decompressed_data = d.read() + + # Compress a bytes/bytearray value. + uncompressed_data = get_data() + stream = io.BytesIO() + with deflate.DeflateIO(stream, deflate.ZLIB) as d: + d.write(uncompressed_data) + compressed_data = stream.getvalue() + +.. _deflate_wbits: + +Deflate window size +------------------- + +The window size limits how far back in the stream the (de)compressor can +reference. Increasing the window size will improve compression, but will +require more memory. + +However, just because a given window size is used for compression, this does not +mean that the stream will require the same size window for decompression, as +the stream may not reference data as far back as the window allows (for example, +if the length of the input is smaller than the window size). + +If the decompressor uses a smaller window size than necessary for the input data +stream, it will fail mid-way through decompression with :exc:`OSError`. + +.. _deflate_wbits_zlib: + +The zlib format includes a header which specifies the window size used to +compress the data (which due to the above, may be larger than the size required +for the decompressor). + +If this header value is lower than the specified *wbits* value, then the header +value will be used instead in order to reduce the memory allocation size. If +the *wbits* parameter is zero (the default), then the header value will only be +used if it is less than the maximum value of ``15`` (which is default value +used by most compressors [#f1]_). + +In other words, if the source zlib stream has been compressed with a custom window +size (i.e. less than ``15``), then using the default *wbits* parameter of zero +will decompress any such stream. + +The gzip file format does not include the window size in the header. +Additionally, most compressor libraries (including CPython's implementation +of :class:`gzip.GzipFile`) will default to the maximum possible window size. +This makes it difficult to decompress most gzip streams on MicroPython unless +your board has a lot of free RAM. + +If you control the source of the compressed data, then prefer to use the zlib +format, with a window size that is suitable for your target device. + +.. rubric:: Footnotes + +.. [#f1] The assumption here is that if the header value is the default used by + most compressors, then nothing is known about the likely required window + size and we should ignore it. diff --git a/docs/library/gzip.rst b/docs/library/gzip.rst new file mode 100644 index 000000000..f36f896db --- /dev/null +++ b/docs/library/gzip.rst @@ -0,0 +1,106 @@ +:mod:`gzip` -- gzip compression & decompression +=============================================== + +.. module:: gzip + :synopsis: gzip compression & decompression + +|see_cpython_module| :mod:`python:gzip`. + +This module allows compression and decompression of binary data with the +`DEFLATE algorithm `_ used by the gzip +file format. + +.. note:: Prefer to use :class:`deflate.DeflateIO` instead of the functions in this + module as it provides a streaming interface to compression and decompression + which is convenient and more memory efficient when working with reading or + writing compressed data to a file, socket, or stream. + +**Availability:** + +* This module is **not present by default** in official MicroPython firmware + releases as it duplicates functionality available in the :mod:`deflate + ` module. + +* A copy of this module can be installed (or frozen) + from :term:`micropython-lib` (`source `_). + See :ref:`packages` for more information. This documentation describes that module. + +* Compression support will only be available if compression support is enabled + in the built-in :mod:`deflate ` module. + +Functions +--------- + +.. function:: open(filename, mode, /) + + Wrapper around built-in :func:`open` returning a GzipFile instance. + +.. function:: decompress(data, /) + + Decompresses *data* into a bytes object. + +.. function:: compress(data, /) + + Compresses *data* into a bytes object. + +Classes +------- + +.. class:: GzipFile(*, fileobj, mode) + + This class can be used to wrap a *fileobj* which is any + :term:`stream-like ` object such as a file, socket, or stream + (including :class:`io.BytesIO`). It is itself a stream and implements the + standard read/readinto/write/close methods. + + When the *mode* argument is ``"rb"``, reads from the GzipFile instance will + decompress the data in the underlying stream and return decompressed data. + + If compression support is enabled then the *mode* argument can be set to + ``"wb"``, and writes to the GzipFile instance will be compressed and written + to the underlying stream. + + By default the GzipFile class will read and write data using the gzip file + format, including a header and footer with checksum and a window size of 512 + bytes. + + The **file**, **compresslevel**, and **mtime** arguments are not + supported. **fileobj** and **mode** must always be specified as keyword + arguments. + +Examples +-------- + +A typical use case for :class:`gzip.GzipFile` is to read or write a compressed +file from storage: + +.. code:: python + + import gzip + + # Reading: + with open("data.gz", "rb") as f: + with gzip.GzipFile(fileobj=f, mode="rb") as g: + # Use g.read(), g.readinto(), etc. + + # Same, but using gzip.open: + with gzip.open("data.gz", "rb") as f: + # Use f.read(), f.readinto(), etc. + + # Writing: + with open("data.gz", "wb") as f: + with gzip.GzipFile(fileobj=f, mode="wb") as g: + # Use g.write(...) etc + + # Same, but using gzip.open: + with gzip.open("data.gz", "wb") as f: + # Use f.write(...) etc + + # Write a dictionary as JSON in gzip format, with a + # small (64 byte) window size. + config = { ... } + with gzip.open("config.gz", "wb") as f: + json.dump(config, f) + +For guidance on working with gzip sources and choosing the window size see the +note at the :ref:`end of the deflate documentation `. diff --git a/docs/library/index.rst b/docs/library/index.rst index 69bc81ade..ae5d3e7d7 100644 --- a/docs/library/index.rst +++ b/docs/library/index.rst @@ -64,6 +64,7 @@ library. collections.rst errno.rst gc.rst + gzip.rst hashlib.rst heapq.rst io.rst @@ -95,6 +96,7 @@ the following libraries. bluetooth.rst btree.rst cryptolib.rst + deflate.rst framebuf.rst machine.rst micropython.rst @@ -194,11 +196,11 @@ Extending built-in libraries from Python A subset of the built-in modules are able to be extended by Python code by providing a module of the same name in the filesystem. This extensibility applies to the following Python standard library modules which are built-in to -the firmware: ``array``, ``binascii``, ``collections``, ``errno``, ``hashlib``, -``heapq``, ``io``, ``json``, ``os``, ``platform``, ``random``, ``re``, -``select``, ``socket``, ``ssl``, ``struct``, ``time`` ``zlib``, as well as the -MicroPython-specific ``machine`` module. All other built-in modules cannot be -extended from the filesystem. +the firmware: ``array``, ``binascii``, ``collections``, ``errno``, ``gzip``, +``hashlib``, ``heapq``, ``io``, ``json``, ``os``, ``platform``, ``random``, +``re``, ``select``, ``socket``, ``ssl``, ``struct``, ``time`` ``zlib``, as well +as the MicroPython-specific ``machine`` module. All other built-in modules +cannot be extended from the filesystem. This allows the user to provide an extended implementation of a built-in library (perhaps to provide additional CPython compatibility or missing functionality). diff --git a/docs/library/zlib.rst b/docs/library/zlib.rst index 96d6c2452..54310b72f 100644 --- a/docs/library/zlib.rst +++ b/docs/library/zlib.rst @@ -1,38 +1,82 @@ -:mod:`zlib` -- zlib decompression -================================= +:mod:`zlib` -- zlib compression & decompression +=============================================== .. module:: zlib - :synopsis: zlib decompression + :synopsis: zlib compression & decompression |see_cpython_module| :mod:`python:zlib`. -This module allows to decompress binary data compressed with +This module allows compression and decompression of binary data with the `DEFLATE algorithm `_ -(commonly used in zlib library and gzip archiver). Compression -is not yet implemented. +(commonly used in the zlib library and gzip archiver). + +.. note:: Prefer to use :class:`deflate.DeflateIO` instead of the functions in this + module as it provides a streaming interface to compression and decompression + which is convenient and more memory efficient when working with reading or + writing compressed data to a file, socket, or stream. + +**Availability:** + +* From MicroPython v1.21 onwards, this module may not be present by default on + all MicroPython firmware as it duplicates functionality available in + the :mod:`deflate ` module. + +* A copy of this module can be installed (or frozen) + from :term:`micropython-lib` (`source `_). + See :ref:`packages` for more information. This documentation describes that module. + +* Requires the built-in :mod:`deflate ` module (available since MicroPython v1.21) + +* Compression support will only be available if compression support is enabled + in the built-in :mod:`deflate ` module. Functions --------- -.. function:: decompress(data, wbits=0, bufsize=0, /) +.. function:: decompress(data, wbits=15, /) - Return decompressed *data* as bytes. *wbits* is DEFLATE dictionary window - size used during compression (8-15, the dictionary size is power of 2 of - that value). Additionally, if value is positive, *data* is assumed to be - zlib stream (with zlib header). Otherwise, if it's negative, it's assumed - to be raw DEFLATE stream. *bufsize* parameter is for compatibility with - CPython and is ignored. + Decompresses *data* into a bytes object. -.. class:: DecompIO(stream, wbits=0, /) + The *wbits* parameter works the same way as for :meth:`zlib.compress` + with the following additional valid values: - Create a `stream` wrapper which allows transparent decompression of - compressed data in another *stream*. This allows to process compressed - streams with data larger than available heap size. In addition to - values described in :func:`decompress`, *wbits* may take values - 24..31 (16 + 8..15), meaning that input stream has gzip header. + * ``0``: Automatically determine the window size from the zlib header + (*data* must be in zlib format). + * ``35`` to ``47``: Auto-detect either the zlib or gzip format. - .. admonition:: Difference to CPython - :class: attention + As for :meth:`zlib.compress`, see the :mod:`CPython documentation for zlib ` + for more information about the *wbits* parameter. As for :meth:`zlib.compress`, + MicroPython also supports smaller window sizes than CPython. See more + :ref:`MicroPython-specific details ` in the + :mod:`deflate ` module documentation. - This class is MicroPython extension. It's included on provisional - basis and may be changed considerably or removed in later versions. + If the data to be decompressed requires a larger window size, it will + fail during decompression. + +.. function:: compress(data, wbits=15, /) + + Compresses *data* into a bytes object. + + *wbits* allows you to configure the DEFLATE dictionary window size and the + output format. The window size allows you to trade-off memory usage for + compression level. A larger window size will allow the compressor to + reference fragments further back in the input. The output formats are "raw" + DEFLATE (no header/footer), zlib, and gzip, where the latter two + include a header and checksum. + + The low four bits of the absolute value of *wbits* set the base-2 logarithm of + the DEFLATE dictionary window size. So for example, ``wbits=10``, + ``wbits=-10``, and ``wbits=26`` all set the window size to 1024 bytes. Valid + window sizes are ``5`` to ``15`` inclusive (corresponding to 32 to 32k bytes). + + Negative values of *wbits* between ``-5`` and ``-15`` correspond to "raw" + output mode, positive values between ``5`` and ``15`` correspond to zlib + output mode, and positive values between ``21`` and ``31`` correspond to + gzip output mode. + + See the :mod:`CPython documentation for zlib ` for more + information about the *wbits* parameter. Note that MicroPython allows + for smaller window sizes, which is useful when memory is constrained while + still achieving a reasonable level of compression. It also speeds up + the compressor. See more :ref:`MicroPython-specific details ` + in the :mod:`deflate ` module documentation.