grutz@jingojango.net edited this page Oct 10, 2013 · 1 revision

Occasionally an import misbehaves, usually because of a bug or a web2py scheduler problem, and leaves duplicate records in the database. Here's how to clean them out.

NOTE: This has only been validated on PostgreSQL databases. Others may or may not work.

Dupe detail

What counts as a duplicate record? Let's take the t_host_os_refs table as an example. This table includes the following fields:

id
f_certainty
f_class
f_family
f_hosts_id
f_os_id

In some rare cases multiple records may appear in the database where f_hosts_id, f_os_id and f_certainty are all the same. In these cases we keep only the record with the lowest id and purge the rest.
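The rule can be illustrated with a small standalone sketch. This is not the code in dedupe.py; the function name find_duplicate_ids and the sample rows are made up for illustration, and it works on plain Python dicts rather than the database.

```python
from collections import defaultdict


def find_duplicate_ids(rows, key_fields):
    """Return the sorted ids to purge: every id except the lowest one
    in each group of rows sharing the same values for key_fields.

    Hypothetical helper for illustration only, not part of dedupe.py.
    """
    groups = defaultdict(list)
    for row in rows:
        # Rows with identical values in all key fields land in one group.
        key = tuple(row[field] for field in key_fields)
        groups[key].append(row["id"])

    purge = []
    for ids in groups.values():
        keep = min(ids)  # the lowest id survives
        purge.extend(i for i in ids if i != keep)
    return sorted(purge)


# Made-up sample data shaped like t_host_os_refs records.
rows = [
    {"id": 1, "f_hosts_id": 10, "f_os_id": 5, "f_certainty": 0.9},
    {"id": 2, "f_hosts_id": 10, "f_os_id": 5, "f_certainty": 0.9},
    {"id": 3, "f_hosts_id": 10, "f_os_id": 5, "f_certainty": 0.5},
    {"id": 4, "f_hosts_id": 10, "f_os_id": 5, "f_certainty": 0.9},
    {"id": 5, "f_hosts_id": 11, "f_os_id": 5, "f_certainty": 0.9},
]

purge_ids = find_duplicate_ids(rows, ["f_hosts_id", "f_os_id", "f_certainty"])
print(purge_ids)  # [2, 4]
```

Records 1, 2 and 4 match on all three fields, so record 1 is kept and 2 and 4 are purged. Record 3 differs in f_certainty and record 5 in f_hosts_id, so neither is treated as a duplicate.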

De-dupe Host Operating System references

./web2py.py -S appname -M -R applications/appname/private/dedupe.py -A -f f_hosts_id -f f_os_id -f f_certainty -d t_host_os_refs

De-dupe Service vulnerability references

./web2py.py -S appname -M -R applications/appname/private/dedupe.py -A -f f_services_id -f f_vulndata_id -f f_status -f f_exploited -d t_service_vulns