Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KeyError exception in multiprocessing module #413

Closed
sschuberth opened this issue Dec 21, 2016 · 9 comments · Fixed by #414
Closed

KeyError exception in multiprocessing module #413

sschuberth opened this issue Dec 21, 2016 · 9 comments · Fixed by #414

Comments

@sschuberth
Copy link
Collaborator

sschuberth commented Dec 21, 2016

Using ScanCode 1ec43b7 on Linux with Python 2.7 I get:

Traceback (most recent call last):
  File "/var/lib/jenkins/jobs/scancode/workspace/tools/scancode/bin/scancode", line 9, in <module>
    load_entry_point('scancode-toolkit', 'console_scripts', 'scancode')()
  File "/var/lib/jenkins/jobs/scancode/workspace/tools/scancode/local/lib/python2.7/site-packages/click/core.py", line 664, in __call__
    return self.main(*args, **kwargs)
  File "/var/lib/jenkins/jobs/scancode/workspace/tools/scancode/src/scancode/utils.py", line 64, in main
    standalone_mode=standalone_mode, **extra)
  File "/var/lib/jenkins/jobs/scancode/workspace/tools/scancode/local/lib/python2.7/site-packages/click/core.py", line 644, in main
    rv = self.invoke(ctx)
  File "/var/lib/jenkins/jobs/scancode/workspace/tools/scancode/local/lib/python2.7/site-packages/click/core.py", line 837, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/var/lib/jenkins/jobs/scancode/workspace/tools/scancode/local/lib/python2.7/site-packages/click/core.py", line 464, in invoke
    return callback(*args, **kwargs)
  File "/var/lib/jenkins/jobs/scancode/workspace/tools/scancode/src/scancode/cli.py", line 312, in scancode
    diag, timeout, max_memory)
  File "/var/lib/jenkins/jobs/scancode/workspace/tools/scancode/src/scancode/cli.py", line 409, in scan
    result = scanned.next()
  File "/var/lib/jenkins/jobs/scancode/workspace/tools/scancode/local/lib/python2.7/site-packages/click/_termui_impl.py", line 238, in     
    return next(self.iter)
  File "/var/lib/jenkins/jobs/scancode/workspace/tools/scancode/src/scancode/cli.py", line 41, in wrap
    return func(self, timeout=timeout or 1e10)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 668, in next
    raise value
KeyError: 'sha1'

The command line used was

scancode --timeout 180 -n 2 -f html . /var/lib/jenkins/jobs/scancode/workspace/results/github.com/remy/nodemon/2016-10-06_2cd85b1/nodemon-scancode-1ec43b7.html

and the project scanned was remy/nodemon@2cd85b1.

@pombredanne
Copy link
Contributor

Let me look into this.

@pombredanne
Copy link
Contributor

I can reproduce this with:
$ scancode ../nodemon/test/fixtures/some\ \\file

@pombredanne
Copy link
Contributor

The error comes from here: https://github.com/nexB/scancode-toolkit/blob/645902627cd82a9e90f7f3a82038c5b5c6f8130d/src/scancode/cache.py#L119

and this is because there is a backslash in the file name.

@pombredanne
Copy link
Contributor

FWIW, these are a bunch of test files for nodemon. Fix coming in soon.

@sschuberth
Copy link
Collaborator Author

and this is because there is a backslash in the file name.

Eek. Which means it's not even a valid file name on Windows as also reported here.

@pombredanne
Copy link
Contributor

pombredanne commented Dec 21, 2016

So the fix is going to be in multiple steps:

  1. ensure that exceptions are properly caught in threads and processes and reported as scan errors, always
  2. ensure that on POSIX, files that are valid files are scanned alright (including having tests if we are on POSIX or not and handling paths splits accordingly
  3. in extractcode and may be elsewhere either handle correctly or report as warnings paths that are eventually non-portable and overall weird (e.g with / or \ or : or other prohibited characters)

pombredanne added a commit that referenced this issue Dec 21, 2016
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Dec 21, 2016
 * this ensure that exceptions that happening while multiprocessing and
   multithreading are not truncated and reported as errors.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Dec 21, 2016
 * this ensure that valid paths can still be processed including a path
   with backslah on Linux/POSIX.
 * also make sure that file info are always returned (eventually empty
   if there was an error to fetch them)

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Dec 21, 2016
 * this ensure that a file name can still be extracted from a valid
   path even a path with backslah on Linux/POSIX.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Dec 21, 2016
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne
Copy link
Contributor

The PR #414 should fix things... but It need to review the tests on Windows and Mac first too.

@pombredanne
Copy link
Contributor

So almost there with the Mac and Windows tests. I am using archives as test files such that the repo can still be cloned everywhere ... and updating extractcode to properly handle these cases and rename files with illegal file names on any given OS.

pombredanne added a commit that referenced this issue Jan 6, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 6, 2017
 * this ensure that exceptions that happening while multiprocessing and
   multithreading are not truncated and reported as errors.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 6, 2017
 * this ensure that valid paths can still be processed including a path
   with backslah on Linux/POSIX.
 * also make sure that file info are always returned (eventually empty
   if there was an error to fetch them)

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 6, 2017
 * this ensure that a file name can still be extracted from a valid
   path even a path with backslah on Linux/POSIX.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 6, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 6, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 6, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 6, 2017
 * file and directory names are now transformed to POSIX portable names
 * COM, PRN and other illegal Windows names are updated to be legal
   names.
 * when scanning and not extracting, file_name is properly extracted
   by detecting possible backslash in a file name
 * as a result it is possible to process on Windows archives that
   contain illegal names. Ror instance the repo at
   https://github.com/remy/nodemon contains such files and cannot be
   cloned on Windows. Yet a tarball of this repo is extracted properly
   by extractcode and can then be scanned alright.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 6, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 6, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 6, 2017
 * separate code from "new_name" for portable file name transforms
 * update safe_path accordingly
 * improve relative paths resolution for corner cases

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 7, 2017
 * unfortunately the behavior is not consistent on all OSes

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 7, 2017
 * ensure that scancode and extractcode tests are running verbose
   and are running first

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 7, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 7, 2017
 * Mac and Linux behavior are now the same thanks to the new path
   unicode transliteration

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 7, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 7, 2017
 * this should help review the CI failures more easily and update the
   (different) expectations for each OS.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 9, 2017
 * this is still an issue to solve with #16 though


Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 9, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 9, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 9, 2017
 * this test cannot be made on Windows as these illegal filenames
   cannot be created there. Instead other tests of extractcode are
   testing a proper handling at extraction time of such names
   transforming illegal names in legal names. 

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 9, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 10, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 10, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 10, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 10, 2017
 * marked some tests as expected to fail for now. These are corner cases
   and seem to only pass correctly on Linux

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 10, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 10, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 10, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 10, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 10, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 10, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 11, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 11, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 11, 2017
 * the transliteration and file renaming is taking place correctly and
   no file is skipped.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 11, 2017
 * test a posix version of the paths for portable assertions

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Jan 11, 2017
#413 no crash on weird file names

 * This is a fairly significant change in particular on the extractcode side. 
 * We now can handle properly files that could otherwise not be processed on some OS
   such as windows because they have illegal names for that OS.
@pombredanne
Copy link
Contributor

All the planned fixes have been applied eg.:

  • ensure that exceptions are properly caught in threads and processes and reported as scan errors, always
  • ensure that on POSIX, files that are valid files are scanned alright (including having tests if we are on POSIX or not and handling paths splits accordingly
  • in extractcode and may be elsewhere either handle correctly or report as warnings paths that are eventually non-portable and overall weird (e.g with / or \ or : or other prohibited characters). This also uses a more advanced unicode to ascii transliteration for sanity.

pombredanne added a commit that referenced this issue Jan 11, 2017
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants