Is there a way to tell whether a chunk is a directory? #90
Hello 👋 I adjusted the example code from the docs to simply write an archive's contents to disk:

```python
from stream_unzip import async_stream_unzip
import httpx
import asyncio
from pathlib import Path

async def zipped_chunks(client):
    async with client.stream('GET', 'http://127.0.0.1:8000/example.zip') as r:
        async for chunk in r.aiter_bytes(chunk_size=65536):
            yield chunk

async def main():
    async with httpx.AsyncClient() as client:
        async for file_name, file_size, unzipped_chunks in async_stream_unzip(
            zipped_chunks(client),
        ):
            file_path = Path(file_name.decode())
            file_path.parent.mkdir(parents=True, exist_ok=True)
            with open(file_path, "wb") as fd:
                async for chunk in unzipped_chunks:
                    fd.write(chunk)

asyncio.run(main())
```

This works quite well with most archives I had to work with, but then I encountered one that contained empty directories (and empty files). Such an archive can be created using the following Python code:

```python
import zipfile

with zipfile.ZipFile("example.zip", "w") as archive:
    archive.mkdir("directory")
    archive.writestr("emptyfile", "")
```

I'm now trying to adjust my code to handle the creation of empty directories. However, I haven't yet managed to reliably detect whether an archive member is a directory. My attempt so far treats any member that yields no chunks as a directory:

```python
from stream_unzip import async_stream_unzip
import httpx
import asyncio
from pathlib import Path

async def zipped_chunks(client):
    async with client.stream('GET', 'http://127.0.0.1:8000/example.zip') as r:
        async for chunk in r.aiter_bytes(chunk_size=65536):
            yield chunk

async def main():
    async with httpx.AsyncClient() as client:
        async for file_name, file_size, unzipped_chunks in async_stream_unzip(
            zipped_chunks(client),
        ):
            file_path = Path(file_name.decode())
            file_path.parent.mkdir(parents=True, exist_ok=True)
            try:
                first_chunk = await unzipped_chunks.__anext__()
            except StopAsyncIteration:
                # No chunks at all: assume the member is a directory
                file_path.mkdir(exist_ok=True)
                continue
            with open(file_path, "wb") as fd:
                fd.write(first_chunk)
                async for chunk in unzipped_chunks:
                    fd.write(chunk)

asyncio.run(main())
```

However, it turned out empty files also don't yield any chunks. Now I'm wondering whether (1) this scenario is unsupported, (2) there is a way to detect it which I don't know yet, or (3) the approach with chunks was the way to go but it doesn't work properly with the async API?

Thanks in advance for any hints :)

P.S.: All the code snippets above are runnable, so feel free to verify my claims. The example.zip can easily be served under the URL used in the code snippets by using Python's built-in http server (
Replies: 1 comment 3 replies
So it's historically this... stream-unzip wasn't really made with this in mind: it was very much about getting at the data in files and not really caring about directories (or even any other metadata that comes in ZIP files, like mode/permissions or modification times of the member files).
But there might be a way. When adding support to stream-zip for making empty directories, it did seem like empty directories are just empty members with a name that ends in a forward slash. You could investigate that? (And also check what happens with ZIP files made from both Windows and *nix-y things, in case there are forward/back slash differences.)
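A minimal sketch of that idea, not a confirmed stream-unzip feature: it assumes a directory entry is a member whose name ends in a slash, and it also accepts a trailing backslash in case a Windows tool wrote one. `is_directory_member` and `extract_member` are hypothetical helper names, and converting backslashes to forward slashes is an assumption that would mangle *nix file names that legitimately contain a backslash.

```python
import asyncio
from pathlib import Path


def is_directory_member(file_name: bytes) -> bool:
    # Assumption: directory entries are members whose name ends in a slash.
    # A trailing backslash is accepted too, in case a Windows tool wrote it.
    return file_name.endswith((b"/", b"\\"))


async def extract_member(root: Path, file_name: bytes, unzipped_chunks) -> Path:
    # Assumption: backslashes in member names are path separators.
    name = file_name.decode().replace("\\", "/")
    target = root / name

    if is_directory_member(file_name):
        target.mkdir(parents=True, exist_ok=True)
        # Drain the (expected to be empty) iterator before moving on.
        async for _ in unzipped_chunks:
            pass
        return target

    # Anything else is written as a file, including empty files,
    # which simply yield no chunks.
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as fd:
        async for chunk in unzipped_chunks:
            fd.write(chunk)
    return target
```

In the loop from the question, the body would then reduce to `await extract_member(Path("."), file_name, unzipped_chunks)`. The decision is made from the raw `file_name` bytes because `Path` drops the trailing slash.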