Add support for RPM-based distros for docker and rootfs images scanpipe #6
Comments
Signed-off-by: Thomas Druez <tdruez@nexb.com>
To parse simply the XML output of the rpm command
I have also attached that sample XML created with:
Note that for the odd cases where the rpmdb is in a format that the current rpm tool cannot analyze, the process could be to rebuild the database, though it is not entirely clear whether this could miss some RPMs (if so, we may need to either read the RPM db directly or use multiple older versions of the rpm executable).
@pombredanne I have written a simple parser that reads the above XML file and converts it to a JSON file.
#
# Copyright (c) nexB Inc. and others. All rights reserved.
#
import io
import json
import sys

import click
import xmltodict
"""
The current code parses everything, converts it, and saves it as JSON output.
Perhaps we only need to collect the fields of interest:
============
String value
============
Sha1header
Name
Version
Release
Summary
Description
Size
Distribution
Vendor
License
Os
Arch
Sourcerpm
Rpmversion
==========
List value
==========
Basenames
Filesizes
Filedigests <-- This is the MD5 value
Dirindexes
Fileclass
Dirnames
Classdict
Requireflags
Requirename
Requireversion
********
* Note *
********
The values of Dirindexes and Fileclass are indexes into Dirnames and Classdict.
For instance,
Dirindexes: [u'0', u'0', u'0', u'0', u'0', u'0', u'0', u'0', u'0', u'0', u'0', u'0', u'0', u'1', u'2', u'3']
Dirnames: [u'/etc/', u'/usr/share/doc/', u'/usr/share/doc/setup-2.5.58/', u'/var/log/']
Fileclass: [u'1', u'2', u'4', u'4', u'2', u'1', u'2', u'3', u'2', u'2', u'1', u'4', u'2', u'3', u'2', u'4']
Classdict: [None, u'ASCII English text', u'ASCII text', u'directory', u'empty']
"""
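def parse_rpm_db(location):
    """
    Parse the rpm DB XML dump at `location` with xmltodict and return a list
    with one parsed mapping per <rpmHeader> element.
    """
    parsed_result = []
    with io.open(location, encoding="utf8", errors='ignore') as loc:
        contents = loc.read()
    # The dump is a sequence of <rpmHeader> documents without a single root
    # element, so split it and parse each header on its own.
    sections = contents.split('<rpmHeader>')
    for section in sections:
        # Skip the empty or whitespace-only text before the first header.
        if section.strip():
            result = '<rpmHeader>' + section
            parsed_result.append(xmltodict.parse(result))
    return parsed_result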
def format_to_dict(parsed_result):
    """
    Convert the parsed information to a list of dictionaries
    """
    results = []
    for result in parsed_result:
        rpm_tags = result['rpmHeader']['rpmTag']
        new_dict = {}
        for rpm_tag in rpm_tags:
            # The keys should be '@name' and a type (such as string/integer etc)
            # This is the convention from xmltodict
            assert len(rpm_tag) == 2
            # Unpack the two keys: dict.keys() cannot be indexed on Python 3.
            name_key, type_key = list(rpm_tag)
            new_dict[rpm_tag[name_key]] = rpm_tag[type_key]
        results.append(new_dict)
    return results
def save_to_json(results, output):
    """
    Save the output to a JSON file
    """
    with open(output, 'w') as jsonfile:
        json.dump(results, jsonfile, indent=3)
@click.command()
@click.argument('input',
    required=True,
    metavar='INPUT',
    type=click.Path(
        exists=True, file_okay=True, dir_okay=False, readable=True, resolve_path=True))
@click.argument('output',
    required=True,
    metavar='OUTPUT',
    type=click.Path(exists=False, dir_okay=False, writable=True, resolve_path=True))
@click.help_option('-h', '--help')
def cli(input, output):
    if not output.endswith('.json'):
        print("The output has to be in JSON format.")
        sys.exit(1)
    parsed_result = parse_rpm_db(input)
    results = format_to_dict(parsed_result)
    save_to_json(results, output)


# Run the command when the file is executed as a script.
if __name__ == '__main__':
    cli()
Attached is the parsed result. Suggestions and feedback are welcome.
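The note in the parser's docstring about Dirindexes and Fileclass being indexes into Dirnames and Classdict can be illustrated with a short sketch. The function names `resolve_indexes` and `build_file_paths` are made up for this example and are not part of the parser above. The index lists are the ones quoted in the docstring; the basenames are hypothetical, since the sample's Basenames values are not shown here.

```python
def resolve_indexes(indexes, table):
    """Map each index (stored as a string in the parsed XML) to its table entry."""
    return [table[int(index)] for index in indexes]


def build_file_paths(basenames, dirindexes, dirnames):
    """Join each basename with the directory its Dirindexes entry points to."""
    directories = resolve_indexes(dirindexes, dirnames)
    return [directory + basename for directory, basename in zip(directories, basenames)]


# Values copied from the docstring example.
dirindexes = ['0'] * 13 + ['1', '2', '3']
dirnames = ['/etc/', '/usr/share/doc/', '/usr/share/doc/setup-2.5.58/', '/var/log/']
fileclass = ['1', '2', '4', '4', '2', '1', '2', '3', '2', '2', '1', '4', '2', '3', '2', '4']
classdict = [None, 'ASCII English text', 'ASCII text', 'directory', 'empty']

file_directories = resolve_indexes(dirindexes, dirnames)
file_classes = resolve_indexes(fileclass, classdict)

# Hypothetical basenames, for illustration only (not taken from the sample).
example_paths = build_file_paths(['passwd', 'lastlog'], ['0', '3'], dirnames)
```

Because the directory names already end with a slash, plain string concatenation is enough to rebuild a full path in this sketch.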
This is to support these tickets: aboutcode-org/scancode-toolkit#437 aboutcode-org/scancode.io#6 aboutcode-org/scancode-toolkit#2058 Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Thomas Druez <tdruez@nexb.com>
@pombredanne running a docker pipeline on a
This is a problem on Linux when using full RPM support otherwise. See: aboutcode-org/scancode.io#6 (comment) Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
I pushed a new RPM plugin for the toolkit https://pypi.org/project/rpm-inspector-rpm/4.16.1.3.210404/ and two commits:
I tested locally on a centos:latest Docker image with success.
Signed-off-by: Thomas Druez <tdruez@nexb.com>
* Add minimal support for RPM distros #6
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Relax scancode-toolkit version requirements
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Install scancode-toolkit[packages] for rpm support #6
  Signed-off-by: Thomas Druez <tdruez@nexb.com>
* Require newest RPM plugin and its deps
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Update documentation for all OSes
  open is a macOS'ism
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Require newest RPM plugin and its deps
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Update documentation for all OSes
  open is a macOS'ism
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Remove explicit dependency on rpm-inspector-rpm
  This is not needed as it comes with scancode-tk
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Add changelog entry for RPM support #6
  Signed-off-by: Thomas Druez <tdruez@nexb.com>
Co-authored-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Thomas Druez <tdruez@nexb.com>
There is no easy way to access the RPM database but through librpm and the rpm executable.
The installed RPMs database comes in three formats:
librpm provides support for each of these formats and also contains a built-in read-only handler for the first one, the bdb format, so that librpm can be built without Berkeley DB and can still read an older RPM db (for instance to convert it to a newer format). It needs to be built with specific flags to enable all these formats (typically a given build of a distro does not need to support all the formats).
The installed DBs locations are:
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/var/lib/rpm/rpmdb.sqlite
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/usr/lib/sysimage/rpm/Packages.db
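As a rough illustration of how a scan could use these locations, the sketch below checks which of the listed database files exist under an extracted image root. The function name `find_rpm_databases` and the temporary directory standing in for a rootfs are assumptions made for this example. It only reports which files are present and does not try to identify the database format.

```python
import tempfile
from pathlib import Path

# Candidate locations relative to the root of an extracted image,
# taken from the list of installed DB locations.
RPM_DB_CANDIDATES = (
    'var/lib/rpm/Packages',
    'var/lib/rpm/rpmdb.sqlite',
    'usr/lib/sysimage/rpm/Packages.db',
)


def find_rpm_databases(rootfs):
    """Return the candidate RPM database files that exist under rootfs."""
    root = Path(rootfs)
    return [root / candidate for candidate in RPM_DB_CANDIDATES
            if (root / candidate).is_file()]


# Demonstration on a throwaway directory standing in for a rootfs.
with tempfile.TemporaryDirectory() as demo_root:
    demo_db = Path(demo_root) / 'var/lib/rpm/rpmdb.sqlite'
    demo_db.parent.mkdir(parents=True)
    demo_db.touch()
    found = [path.relative_to(demo_root).as_posix()
             for path in find_rpm_databases(demo_root)]
```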
In addition, on Fedora distros there are files under /etc/yum.repos.d/* that contain the base and mirror URLs for the repos used to install RPMs. Each file is in .ini format. On openSUSE and SLES, these are under /etc/zypp/repos.d
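Since these repo files are in .ini format, they can be read with Python's standard `configparser`. The sketch below is only an illustration: the sample content is hypothetical, and it assumes the URLs are stored under the `baseurl` and `mirrorlist` keys, which the text above does not spell out.

```python
import configparser

# Hypothetical repo file content, for illustration only.
SAMPLE_REPO = """
[example-base]
name=Example Base
baseurl=http://mirror.example.com/$releasever/os/$basearch/
mirrorlist=http://mirrorlist.example.com/?release=$releasever&arch=$basearch
enabled=1
"""


def collect_repo_urls(repo_text):
    """Return {repo id: {key: url}} for the baseurl and mirrorlist keys."""
    # interpolation=None keeps '%' characters in URLs from being treated as
    # configparser interpolation markers.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    return {
        section: {
            key: parser.get(section, key)
            for key in ('baseurl', 'mirrorlist')
            if parser.has_option(section, key)
        }
        for section in parser.sections()
    }


repo_urls = collect_repo_urls(SAMPLE_REPO)
```

The `$releasever` and `$basearch` placeholders are left unexpanded here; resolving them would need information about the scanned distro.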
The licenses (when not deleted as in some CentOS Docker images) are found in /usr/share/licenses/<package name>/<license files> or /usr/share/doc/<package name>/<license files>
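A minimal sketch of looking up these directories for one package follows. The function name `find_license_candidates`, the package name `setup` and the file name `COPYING` are hypothetical and used only for the demonstration. The function returns every file in the two directories, so telling actual license texts apart from other documentation is left out.

```python
import tempfile
from pathlib import Path

# Directory templates relative to the image root, from the two locations above.
LICENSE_DIR_TEMPLATES = ('usr/share/licenses/{name}', 'usr/share/doc/{name}')


def find_license_candidates(rootfs, package_name):
    """Return the files found in the license and doc directories of a package."""
    root = Path(rootfs)
    found = []
    for template in LICENSE_DIR_TEMPLATES:
        directory = root / template.format(name=package_name)
        # The directory may be missing, for instance when licenses were deleted.
        if directory.is_dir():
            found.extend(sorted(p for p in directory.iterdir() if p.is_file()))
    return found


# Demonstration on a throwaway directory standing in for a rootfs.
with tempfile.TemporaryDirectory() as demo_root:
    license_dir = Path(demo_root) / 'usr/share/licenses/setup'
    license_dir.mkdir(parents=True)
    (license_dir / 'COPYING').write_text('hypothetical license text')
    candidates = [path.relative_to(demo_root).as_posix()
                  for path in find_license_candidates(demo_root, 'setup')]
    missing = find_license_candidates(demo_root, 'no-such-package')
```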
If using the rpm cli, this can create an XML-like output:
./rpm --query --all --qf '[%{*:xml}\n]' --rcfile=./rpmrc --dbpath=<path to>/var/lib/rpm > somefile.xml
The --rcfile option may not be needed in general, but it is needed when using a fresh RPM build.
The RPM db may need to be rebuilt first when it is in a bdb format from an older version than the bdb with which librpm was built.
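The rpm invocation above could be driven from Python roughly as follows. This is a sketch under assumptions: the function names are made up for this example, the `/tmp/rootfs` path is hypothetical, and the only rpm options used are the ones from the command line shown above. Building the argument list is kept separate from running it so that the command can be inspected without an rpm executable at hand.

```python
import subprocess


def build_rpm_xml_query(dbpath, rpm_exe='rpm', rcfile=None):
    """Build the argument list for dumping all installed RPM headers as XML."""
    # A raw string passes the backslash-n through to rpm unchanged, as the
    # single-quoted shell argument does.
    command = [rpm_exe, '--query', '--all', '--qf', r'[%{*:xml}\n]']
    if rcfile:
        command.append('--rcfile=' + rcfile)
    command.append('--dbpath=' + dbpath)
    return command


def dump_rpm_db_as_xml(dbpath, output_path, rpm_exe='rpm', rcfile=None):
    """Run rpm and write its XML-like output to output_path."""
    command = build_rpm_xml_query(dbpath, rpm_exe=rpm_exe, rcfile=rcfile)
    with open(output_path, 'wb') as output:
        # check=True raises CalledProcessError if rpm exits with an error.
        subprocess.run(command, stdout=output, check=True)


# Hypothetical paths, for illustration only.
example_command = build_rpm_xml_query(
    '/tmp/rootfs/var/lib/rpm', rpm_exe='./rpm', rcfile='./rpmrc')
```

Passing the arguments as a list avoids shell quoting issues with the query format string.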