Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPFSTool.download #501

Open
Karrenbelt opened this issue Dec 23, 2022 · 0 comments
Open

IPFSTool.download #501

Karrenbelt opened this issue Dec 23, 2022 · 0 comments
Labels
enhancement New feature or request

Comments

@Karrenbelt
Copy link

Karrenbelt commented Dec 23, 2022

I'm opening an issue to clarify the issues with IPFSTool.download. The method has too many responsibilities.

def download(
self,
hash_id: str,
target_dir: str,
fix_path: bool = True,
attempts: int = 5,
) -> str:
"""
Download dir by its hash.
:param hash_id: str. hash of file to download
:param target_dir: str. directory to place downloaded
:param fix_path: bool. default True. on download don't wrap result in to hash_id directory.
:param attempts: int. default 5. How often to attempt the download.
:return: downloaded path
"""
if not os.path.exists(target_dir): # pragma: nocover
os.makedirs(target_dir, exist_ok=True)
if os.path.exists(os.path.join(target_dir, hash_id)): # pragma: nocover
raise DownloadError(f"{hash_id} was already downloaded to {target_dir}")
downloaded_path = str(Path(target_dir) / hash_id)
while attempts:
attempts -= 1
try: # download to tmp_dir in case of midway download failure
with tempfile.TemporaryDirectory() as tmp_dir:
self.client.get(hash_id, tmp_dir)
download_path = Path(tmp_dir) / hash_id
shutil.copytree(download_path, downloaded_path)
break
except ipfshttpclient.exceptions.StatusError as e:
logging.error(f"error on download of {hash_id}: {e}")
time.sleep(1)
else:
raise DownloadError(f"Failed to download: {hash_id}")
package_path = None
if os.path.isdir(downloaded_path):
downloaded_files = os.listdir(downloaded_path)
if len(downloaded_files) > 0:
package_name, *_ = os.listdir(downloaded_path)
package_path = str(Path(target_dir) / package_name)
if package_path is None:
package_path = target_dir
if fix_path:
# self.client.get creates result with hash name
# and content, but we want content in the target dir
try:
for each_file in Path(downloaded_path).iterdir(): # grabs all files
shutil.move(str(each_file), target_dir)
except shutil.Error as e: # pragma: nocover
if os.path.isdir(downloaded_path):
shutil.rmtree(downloaded_path)
raise DownloadError(f"error on move files {str(e)}") from e
if os.path.isdir(downloaded_path):
shutil.rmtree(downloaded_path)
return package_path

Effectively it just wraps ipfshttpclient.Client(...).get(...).
self.client.get(hash_id, tmp_dir)

What it actually does is:

  1. make dir
  2. check downloaded path not exists else raise.
  3. download
  4. define package_path, which will be returned.
  5. optionally fix the path by moving files up one directory.
  6. remove downloaded path.

Yes, the complexity of it all is perplexing. At least for my little chimp brain 🐒

The following issues currently exist:

  1. If the IPFS hash references a file, the current implementation will fail.
  2. Otherwise, the directory is <ipfs_hash>/<dir_name>/.... That is, content of the IPFS hash is a directory, which contains a directory itself that bares a name. It is currently not clear whether or not a directory can be empty.
    1. This edge case is not covered.
      Let us assume it is not allowed to be. Thus, some number of files exists, which also bare a name.
    2. if we do fix_path, we move the files up one directory, that is <ipfs_hash>/<dir_name>/... -> /<dir_name>/.... However, this abrogates the logic above where we check whether <target_dir>/<ipfs_hash> exists to assert whether or not the downloaded files already exist.
    3. if we do not fix_path (which we presumably always do), then we delete the download, as we never moved the downloaded_path to target_dir, nullifying our efforts.
      if os.path.isdir(downloaded_path):
      shutil.rmtree(downloaded_path)
    4. if we do fix_path, and if the downloaded directory contains nested directories, these are not copied or moved along with the file content in the directory.
      for each_file in Path(downloaded_path).iterdir(): # grabs all files
      shutil.move(str(each_file), target_dir)

Proposal:

  1. In case of a single file, the filename is considered metadata and not associated with the IPFS hash or file content, in case a file the IPFS download retrieves a file we may return immediately. To my knowledge there exists no way to retrieve the original file name, though this would probably be useful. Please inform me if you know of a way to associate the original filename to the file.
    1. If the downloaded directory is empty, let us raise a violent error.
    2. The check to see if the downloaded file already exists was explained as "common practice". Let us annihilate it 🙏. The ipfshttpclient.Client(...).get(...) method has no issue overwriting exiting content either. If your local is different, your local is wrong. If we move files, we can deal with files already existing in a given location then.
    3. Download to a temporary directory and copy the (arbitrarily) nested content to the target directory.
    4. Let us remove the package_path defining logic entirely. It is used in the following location, where additional logic resides that deals with path modification 🙈. Other places where it is invoked do not use the return value

      open-aea/aea/cli/fetch.py

      Lines 147 to 155 in beadc9f

      agent_path = ipfs_tool.download(public_id.hash, str(target_dir))
      ctx.clean_paths.append(agent_path)
      if alias is not None:
      target_dir = cast(Path, target_dir) / alias
      if target_dir.exists():
      raise click.ClickException(f"{alias} already exists at {target_dir}")
      shutil.move(agent_path, target_dir)
      else:
      target_dir = Path(agent_path)
@Karrenbelt Karrenbelt added the bug Something isn't working label Dec 23, 2022
@DavidMinarsch DavidMinarsch added enhancement New feature or request and removed bug Something isn't working labels Dec 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants