`IPFSTool.download` #501

Karrenbelt · 2022-12-23T23:44:22Z

I'm opening an issue to clarify the issues with IPFSTool.download. The method has too many responsibilities.

open-aea/plugins/aea-cli-ipfs/aea_cli_ipfs/ipfs_utils.py

Lines 309 to 370 in c607317

    
               def download( 
        
                   self, 
        
                   hash_id: str, 
        
                   target_dir: str, 
        
                   fix_path: bool = True, 
        
                   attempts: int = 5, 
        
               ) -> str: 
        
                   """ 
        
                   Download dir by its hash. 
        
                   :param hash_id: str. hash of file to download 
        
                   :param target_dir: str. directory to place downloaded 
        
                   :param fix_path: bool. default True. on download don't wrap result in to hash_id directory. 
        
                   :param attempts: int. default 5. How often to attempt the download. 
        
                   :return: downloaded path 
        
                   """ 
        
                   if not os.path.exists(target_dir):  # pragma: nocover 
        
                       os.makedirs(target_dir, exist_ok=True) 
        
                   if os.path.exists(os.path.join(target_dir, hash_id)):  # pragma: nocover 
        
                       raise DownloadError(f"{hash_id} was already downloaded to {target_dir}") 
        
                   downloaded_path = str(Path(target_dir) / hash_id) 
        
                   while attempts: 
        
                       attempts -= 1 
        
                       try:  # download to tmp_dir in case of midway download failure 
        
                           with tempfile.TemporaryDirectory() as tmp_dir: 
        
                               self.client.get(hash_id, tmp_dir) 
        
                               download_path = Path(tmp_dir) / hash_id 
        
                               shutil.copytree(download_path, downloaded_path) 
        
                               break 
        
                       except ipfshttpclient.exceptions.StatusError as e: 
        
                           logging.error(f"error on download of {hash_id}: {e}") 
        
                           time.sleep(1) 
        
                   else: 
        
                       raise DownloadError(f"Failed to download: {hash_id}") 
        
                   package_path = None 
        
                   if os.path.isdir(downloaded_path): 
        
                       downloaded_files = os.listdir(downloaded_path) 
        
                       if len(downloaded_files) > 0: 
        
                           package_name, *_ = os.listdir(downloaded_path) 
        
                           package_path = str(Path(target_dir) / package_name) 
        
                   if package_path is None: 
        
                       package_path = target_dir 
        
                   if fix_path: 
        
                       # self.client.get creates result with hash name 
        
                       # and content, but we want content in the target dir 
        
                       try: 
        
                           for each_file in Path(downloaded_path).iterdir():  # grabs all files 
        
                               shutil.move(str(each_file), target_dir) 
        
                       except shutil.Error as e:  # pragma: nocover 
        
                           if os.path.isdir(downloaded_path): 
        
                               shutil.rmtree(downloaded_path) 
        
                           raise DownloadError(f"error on move files {str(e)}") from e 
        
                   if os.path.isdir(downloaded_path): 
        
                       shutil.rmtree(downloaded_path) 
        
                   return package_path

Effectively it just wraps ipfshttpclient.Client(...).get(...).

open-aea/plugins/aea-cli-ipfs/aea_cli_ipfs/ipfs_utils.py

Line 337 in c607317

self.client.get(hash_id, tmp_dir)

What it actually does is:

make dir
check downloaded path not exists else raise.
download
define package_path, which will be returned.
optionally fix the path by moving files up one directory.
remove downloaded path.

Yes, the complexity of it all is perplexing. At least for my little chimp brain 🐒

The following issues currently exist:

If the IPFS hash references a file, the current implementation will fail.

Otherwise, the directory is <ipfs_hash>/<dir_name>/.... That is, content of the IPFS hash is a directory, which contains a directory itself that bares a name. It is currently not clear whether or not a directory can be empty.

This edge case is not covered.
Let us assume it is not allowed to be. Thus, some number of files exists, which also bare a name.
if we do fix_path, we move the files up one directory, that is <ipfs_hash>/<dir_name>/... -> /<dir_name>/.... However, this abrogates the logic above where we check whether <target_dir>/<ipfs_hash> exists to assert whether or not the downloaded files already exist.
if we do not fix_path (which we presumably always do), then we delete the download, as we never moved the downloaded_path to target_dir, nullifying our efforts.

open-aea/plugins/aea-cli-ipfs/aea_cli_ipfs/ipfs_utils.py

Lines 368 to 369 in c607317

if os.path.isdir(downloaded_path):

shutil.rmtree(downloaded_path)

if we do fix_path, and if the downloaded directory contains nested directories, these are not copied or moved along with the file content in the directory.

open-aea/plugins/aea-cli-ipfs/aea_cli_ipfs/ipfs_utils.py

Lines 361 to 362 in c607317

    
           for each_file in Path(downloaded_path).iterdir():  # grabs all files 
        
               shutil.move(str(each_file), target_dir)

Proposal:

In case of a single file, the filename is considered metadata and not associated with the IPFS hash or file content, in case a file the IPFS download retrieves a file we may return immediately. To my knowledge there exists no way to retrieve the original file name, though this would probably be useful. Please inform me if you know of a way to associate the original filename to the file.

If the downloaded directory is empty, let us raise a violent error.
The check to see if the downloaded file already exists was explained as "common practice". Let us annihilate it 🙏. The ipfshttpclient.Client(...).get(...) method has no issue overwriting exiting content either. If your local is different, your local is wrong. If we move files, we can deal with files already existing in a given location then.
Download to a temporary directory and copy the (arbitrarily) nested content to the target directory.

Let us remove the package_path defining logic entirely. It is used in the following location, where additional logic resides that deals with path modification 🙈. Other places where it is invoked do not use the return value

open-aea/aea/cli/fetch.py

Lines 147 to 155 in beadc9f

    
           agent_path = ipfs_tool.download(public_id.hash, str(target_dir)) 
        
           ctx.clean_paths.append(agent_path) 
        
           if alias is not None: 
        
               target_dir = cast(Path, target_dir) / alias 
        
               if target_dir.exists(): 
        
                   raise click.ClickException(f"{alias} already exists at {target_dir}") 
        
               shutil.move(agent_path, target_dir) 
        
           else: 
        
               target_dir = Path(agent_path)

The text was updated successfully, but these errors were encountered:

Karrenbelt added the bug Something isn't working label Dec 23, 2022

This was referenced Dec 24, 2022

[v1.27.0] Tests: IPFS download #502

Merged

[v1.27.0] Refactor: IPFS download #480

Merged

DavidMinarsch added enhancement New feature or request and removed bug Something isn't working labels Dec 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`IPFSTool.download` #501

`IPFSTool.download` #501

Karrenbelt commented Dec 23, 2022 •

edited

Loading

IPFSTool.download #501

IPFSTool.download #501

Comments

Karrenbelt commented Dec 23, 2022 • edited Loading

`IPFSTool.download` #501

`IPFSTool.download` #501

Karrenbelt commented Dec 23, 2022 •

edited

Loading