Example: os_architecture.txt Normalization (#266)
* This adds the initial "known identifiers" lists.

The goal is to drive standardization of the hw, os, product, service,
and vendor fields across Recog.

* Sort the identifier lists for easier diffs

* A quick script to compare against known identifiers

* Standardize architecture values to known identifiers

* Add missing architectures to identifiers

* Introduce aarch64/ARM64 matches to architecture
hdm authored May 27, 2020
1 parent e9bf416 commit 22c1943
Showing 15 changed files with 1,914 additions and 19 deletions.
118 changes: 118 additions & 0 deletions bin/recog_standardize
@@ -0,0 +1,118 @@
#!/usr/bin/env ruby

$:.unshift(File.expand_path(File.join(File.dirname(__FILE__), "..", "lib")))
require 'optparse'
require 'ostruct'
require 'recog'

def load_identifiers(path)
  res = {}
  File.readlines(path).map{|line| line.strip}.each do |ident|
    res[ident] = true
  end
  return res
end

def write_identifiers(vals, path)
  res = []
  vals.each_pair do |k,v|
    res = res.push(k)
  end
  res = res.sort.uniq
  File.write(path, res.join("\n") + "\n")
end

bdir = File.expand_path(File.join(File.dirname(__FILE__), "..", "identifiers"))

options = OpenStruct.new(write: false)
option_parser = OptionParser.new do |opts|
  opts.banner = "Usage: #{$0} [options] XML_FINGERPRINT_FILE1 ..."
  opts.separator "Verifies that each fingerprint asserts known identifiers."
  opts.separator ""
  opts.separator "Options"

  opts.on("-w", "--write", "Write newly seen identifiers back to the identifier lists.") do
    options.write = true
  end

  opts.on("-h", "--help", "Show this message.") do
    puts opts
    exit
  end
end
option_parser.parse!(ARGV)

if ARGV.empty?
  $stderr.puts 'Missing XML fingerprint files'
  puts option_parser
  exit(1)
end

# Load the unique identifiers
vendors = load_identifiers(File.join(bdir, "vendor.txt"))
os_arch = load_identifiers(File.join(bdir, "os_architecture.txt"))
os_prod = load_identifiers(File.join(bdir, "os_product.txt"))
os_family = load_identifiers(File.join(bdir, "os_family.txt"))
os_device = load_identifiers(File.join(bdir, "os_device.txt"))
svc_prod = load_identifiers(File.join(bdir, "service_product.txt"))
svc_family = load_identifiers(File.join(bdir, "service_family.txt"))

ARGV.each do |arg|
  Dir.glob(arg).each do |file|
    ndb = Recog::DB.new(file)
    ndb.fingerprints.each do |f|
      f.params.each do |k,v|
        paramIndex, val = v
        next if paramIndex != 0
        case k
        when "os.vendor", "service.vendor", "service.component.vendor", "hw.vendor"
          if ! vendors[val]
            puts "VENDOR MISSING: #{val}"
            vendors[val] = true
          end
        when "os.product"
          if ! os_prod[val]
            puts "OS PRODUCT MISSING: #{val}"
            os_prod[val] = true
          end
        when "os.arch"
          if ! os_arch[val]
            puts "OS ARCH MISSING: #{val}"
            os_arch[val] = true
          end
        when "os.family"
          if ! os_family[val]
            puts "OS FAMILY MISSING: #{val}"
            os_family[val] = true
          end
        when "os.device"
          if ! os_device[val]
            puts "OS DEVICE MISSING: #{val}"
            os_device[val] = true
          end
        when "service.product"
          if ! svc_prod[val]
            puts "SERVICE PRODUCT MISSING: #{val}"
            svc_prod[val] = true
          end
        when "service.family"
          if ! svc_family[val]
            puts "SERVICE FAMILY MISSING: #{val}"
            svc_family[val] = true
          end
        end
      end
    end
  end
end

exit if ! options.write

# Write back the unique identifiers
write_identifiers(vendors, File.join(bdir, "vendor.txt"))
write_identifiers(os_arch, File.join(bdir, "os_architecture.txt"))
write_identifiers(os_prod, File.join(bdir, "os_product.txt"))
write_identifiers(os_family, File.join(bdir, "os_family.txt"))
write_identifiers(os_device, File.join(bdir, "os_device.txt"))
write_identifiers(svc_prod, File.join(bdir, "service_product.txt"))
write_identifiers(svc_family, File.join(bdir, "service_family.txt"))
47 changes: 47 additions & 0 deletions identifiers/README.md
@@ -0,0 +1,47 @@
# Recog: Identifiers

This directory contains lists of standard identifiers for mapping Recog matches. The goal is to define a standard set of constants to represent known software, hardware, vendors, and categories.

This is currently incomplete and will be updated as standardization work moves forward.

Fingerprints should use these identifiers whenever possible; if a different name or syntax for a given identifier is preferred, this should be implemented in the application through a mapping function.
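
As a rough illustration of such a mapping function, the sketch below translates standard architecture identifiers into names an application might prefer. The `ARCH_DISPLAY_NAMES` table and the `display_arch` helper are hypothetical and not part of Recog; the preferred names are assumptions chosen only for the example.

```ruby
# Hypothetical application-side overrides. Keys are identifiers from
# os_architecture.txt; values are names this example application prefers.
ARCH_DISPLAY_NAMES = {
  "x86_64" => "amd64",
  "ARM64"  => "aarch64",
}.freeze

# Return the application's preferred name for a standard identifier,
# falling back to the identifier itself when no override is defined.
def display_arch(identifier)
  ARCH_DISPLAY_NAMES.fetch(identifier, identifier)
end

puts display_arch("ARM64")  # has an override in the table
puts display_arch("Sparc")  # no override, returned unchanged
```

Keeping the translation in the application leaves the fingerprints on one shared vocabulary, so each consumer can rename identifiers without editing the fingerprint files.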

## Lists

### Vendors

`vendor.txt` defines known vendor names, covering services, operating systems, and hardware.

### Operating Systems

`os_architecture.txt` defines known CPU types.

`os_product.txt` defines known operating system names.

`os_family.txt` defines known operating system families.

`os_device.txt` defines known types of devices by function or purpose.

### Services

`service_product.txt` defines known service product names.

`service_family.txt` defines known service product families.

### Software

`software_product.txt` defines known software product names.

`software_family.txt` defines known software product families.

`software_class.txt` defines known types of software by function or purpose.

## Pending Work

* All existing fingerprints should be correlated against these lists to identify mismatches and updated accordingly.

* All net new identifiers from the existing fingerprints should be merged into these lists.

* All fingerprint assertions should be enumerated, documented, and standardized where possible (`host.mac`, etc).

* Hardware identifiers should be enumerated, consolidated, and standardized.
20 changes: 20 additions & 0 deletions identifiers/os_architecture.txt
@@ -0,0 +1,20 @@
680xx
880xx
Alpha
ARM
ARM64
ia64
iSeries
MIPS
MIPS64
MPC
PA
PowerPC
pSeries
Risc
s390
s390x
Sparc
System/6000
x86
x86_64
52 changes: 52 additions & 0 deletions identifiers/os_device.txt
@@ -0,0 +1,52 @@
BBS
Bridge
Broadband router
Console server
CSU/DSU
Domain controller
DSLAM
Encryption accelerator
Fax server
File server
Firewall
Game console
General
Hub
IPS
KVM
Lights Out Management
Load balancer
Mainframe
Management
Monitoring
Multifunction Device
Multiplexer
NAC
Network management device
PBX
PDA
Point of sale
Power device
Print server
Printer
Remote access server
Router
Scanner
Server
Specialized
Storage
Switch
Tablet
Tape library
Telecom
Terminal server
UPS
Virtualization host
VoIP
VPN
WAP
Web cam
Web proxy
Web server
Workstation
X terminal