Generate streamlined analysis workfile (CSV) from SCIO-DB #46

Closed
mjherzog opened this issue Nov 12, 2020 · 4 comments

@mjherzog
Member

mjherzog commented Nov 12, 2020

A streamlined CSV workfile will be very useful for SCA planning. The columns we need are listed below, grouped by SCTK runtime option, using the current SCTK CSV output column names. If a JSON field has multiple values, we want all of the values in one cell ("flattened").

Info:
Resource
type
name
base_name
extension
size
sha1
mime_type
file_type
programming_language

Copyrights:
copyright
copyright_holder
author

Licenses:
license_expression
license__key
license__score
license__category
license__owner

email
url

Packages:
package__type
package__namespace
package__name
package__version
package__primary_language
package__description
package__release_date
package__homepage_url
package__download_url
package__size
package__sha1
package__vcs_url
package__copyright
package__license_expression
package__declared_license
package__notice_text
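
As an illustration of the "flattened" cell request above, here is a minimal sketch, not the project's implementation. The helper names `flatten_cell` and `rows_to_csv`, the sample record, and the choice of a newline as the separator are all assumptions made for the example.

```python
import csv
import io


def flatten_cell(value, sep="\n"):
    """Collapse a JSON value into one CSV cell (hypothetical helper)."""
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        # Join every value so that nothing is dropped from the cell.
        return sep.join(str(v) for v in value if v is not None)
    return str(value)


def rows_to_csv(rows, columns):
    """Write only the requested columns, flattening each value."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({col: flatten_cell(row.get(col)) for col in columns})
    return buffer.getvalue()


# Made-up sample record; the real SCIO-DB data may be shaped differently.
sample = [
    {
        "name": "setup.py",
        "size": 120,
        "copyright": ["Copyright (c) A", "Copyright (c) B"],
    }
]
csv_text = rows_to_csv(sample, ["name", "size", "copyright"])
```

The `csv` module quotes cells that contain a newline, so a multi-value cell survives a round trip through `csv.DictReader`.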

@steven-esser

@JonoYang has written this utility: https://github.com/nexB/spats/blob/develop/src/spats/scanpipe_results_to_xlsx.py. It "flattens" the data the way we normally expect (multiple values separated by \n characters) and supports package data.

@mjherzog I believe we want to support package data as well? It is in our normal workfile alongside the regular scancode detections.
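
For nested data such as license or package entries, the newline-separated flattening could look roughly like the sketch below. This is not the code from the linked utility: `flatten_nested`, the sample `resource` dict, and its `licenses` layout are hypothetical, and only the double-underscore column naming follows the field list in this issue.

```python
def flatten_nested(items, prefix, keys, sep="\n"):
    """Turn a list of dicts into {prefix__key: joined values} columns."""
    columns = {}
    for key in keys:
        # Collect this key from every entry, skipping missing values.
        values = [str(item[key]) for item in items if item.get(key) is not None]
        columns[f"{prefix}__{key}"] = sep.join(values)
    return columns


# Made-up resource record; the real scan data layout is an assumption.
resource = {
    "path": "src/app.py",
    "licenses": [
        {"key": "mit", "score": 100.0, "category": "Permissive"},
        {"key": "gpl-2.0", "score": 80.0, "category": "Copyleft"},
    ],
}

row = {"Resource": resource["path"]}
row.update(flatten_nested(resource["licenses"], "license", ["key", "score", "category"]))
```

The same helper could be called with a `package` prefix for package entries, so that detections and package data end up side by side in one row.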

@mjherzog
Member Author

@MaJuRG My mistake, I forgot about packages. I updated the main text above with selected package fields.

@mjherzog
Member Author

The list of fields requested is my take on what is most useful for planning and simpler analysis tasks. My working assumption is that there will be other CSV workfile output variations.

tdruez added a commit that referenced this issue Dec 11, 2020
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Dec 11, 2020
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Dec 11, 2020
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Dec 11, 2020
- Exclude the following "technical" fields: "licenses", "extra_data", "declared_license"

Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Dec 14, 2020
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Dec 22, 2020
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Dec 22, 2020
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Dec 22, 2020
Signed-off-by: Thomas Druez <tdruez@nexb.com>
@tdruez
Contributor

tdruez commented Jul 10, 2023

Closing as the existing XLSX output should be used instead.

@tdruez tdruez closed this as completed Jul 10, 2023