Convert "dumb" WWW information into RDF (semantic) information via specific rules or intelligent heuristics. Bootstrap this content into the distributed web. Goal: no more websites.

spsu/web2rdf

Project Description

THIS CODE SUCKS. I'll work on it in the coming weeks.

Turn any generic web address into semantic RDF content, where possible, using:

  • site-specific rules (if they exist)
  • software-specific rules (if particular software, e.g. WordPress or phpBB, is detected)
  • heuristics otherwise
    • supply a basic learning algorithm to improve the quality of the heuristics
    • These can be pretty advanced...
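
The fallback order above (site-specific rules, then software-specific rules, then heuristics) could be sketched roughly as follows. This is only an illustration, not the project's actual code: `SITE_RULES`, `SOFTWARE_RULES`, `detect_software`, `heuristic_extract` and `convert` are hypothetical names, the software detection is deliberately crude, and triples are plain tuples rather than objects from an RDF library.

```python
from typing import Callable, Optional
from urllib.parse import urlparse

# A triple is (subject, predicate, object); plain tuples keep the sketch dependency-free.
Triple = tuple[str, str, str]
Extractor = Callable[[str, str], list[Triple]]  # (url, html) -> triples

# Hypothetical rule tables, keyed by hostname and by detected software name.
SITE_RULES: dict[str, Extractor] = {}
SOFTWARE_RULES: dict[str, Extractor] = {}


def detect_software(html: str) -> Optional[str]:
    """Very rough detection based on well-known markers in the page source."""
    lowered = html.lower()
    if "wp-content" in lowered:
        return "wordpress"
    if "phpbb" in lowered:
        return "phpbb"
    return None


def heuristic_extract(url: str, html: str) -> list[Triple]:
    """Fallback heuristic: emit only the page title, if one can be found."""
    lowered = html.lower()
    start = lowered.find("<title>")
    end = lowered.find("</title>")
    if start == -1 or end == -1 or end < start:
        return []
    title = html[start + len("<title>"):end].strip()
    return [(url, "http://purl.org/dc/terms/title", title)]


def convert(url: str, html: str) -> tuple[str, list[Triple]]:
    """Return the strategy that was used and the triples it produced."""
    host = urlparse(url).hostname or ""
    if host in SITE_RULES:
        return "site", SITE_RULES[host](url, html)
    software = detect_software(html)
    if software in SOFTWARE_RULES:
        return "software", SOFTWARE_RULES[software](url, html)
    return "heuristic", heuristic_extract(url, html)
```

Returning the strategy name alongside the triples makes it easy to report later which kind of rule produced the output.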

Also report the following:

  • Type of website
  • Type of software (e.g. WordPress, phpBB, etc.)
  • Whether the page appears to be static rather than dynamic
  • License of content (copyright, CC, public domain, etc.)
  • Whether the website is friendly to this method
    • Also, whether the website attempts to block this method via HTTP user agent headers, etc.
  • Whether the website uses ads or user tracking (and list them)
  • Whether the website supports direct semantic endpoints rather than scraping
  • An estimate of how accurate the parsing was, in the range [0.0, 1.0]
  • Basic sitemap
    • Location of likely "feed" URLs (that can be used like RSS/Atom)
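
As a rough illustration of how a few of the reported items might be collected, the sketch below scans a page for a `<meta name="generator">` tag (software), `rel="license"` links (license) and `rel="alternate"` links with an RSS/Atom type (feed URLs). It uses only the Python standard library. `PageReport`, `scan` and the field names are assumptions made for this example, and the accuracy field is left at a placeholder value because the README does not say how it is estimated.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


@dataclass
class PageReport:
    """Hypothetical container for a few of the reported properties."""
    software: Optional[str] = None
    license_url: Optional[str] = None
    feed_urls: list[str] = field(default_factory=list)
    accuracy: float = 0.0  # placeholder; meant to lie in [0.0, 1.0]


class _Scanner(HTMLParser):
    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.report = PageReport()

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. <a download>), so normalise to "".
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() == "generator":
            self.report.software = a.get("content") or None
        elif tag in ("link", "a"):
            href = a.get("href")
            if not href:
                return
            rels = a.get("rel", "").lower().split()
            if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES:
                self.report.feed_urls.append(urljoin(self.base_url, href))
            elif "license" in rels:
                self.report.license_url = urljoin(self.base_url, href)


def scan(base_url: str, html: str) -> PageReport:
    """Parse the page once and return whatever hints were found."""
    scanner = _Scanner(base_url)
    scanner.feed(html)
    scanner.close()
    return scanner.report
```

Pages that declare none of these hints simply yield an empty report, which is where the heuristics would have to take over.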

All ads will be filtered out of the returned RDF.
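
One simple way such filtering could work, assuming triples are plain `(subject, predicate, object)` tuples, is to drop every triple whose subject or object is a URL on a known ad host. The hosts in `AD_HOSTS` are made-up examples and `strip_ads` is a hypothetical helper; a real filter would more likely rely on a maintained blocklist.

```python
from urllib.parse import urlparse

Triple = tuple[str, str, str]

# Hypothetical blocklist; these hosts are placeholders, not real ad networks.
AD_HOSTS = {"ads.example.com", "tracker.example.net"}


def _is_ad(term: str) -> bool:
    """True if the term is a URL whose host is a blocked host or a subdomain of one."""
    try:
        host = urlparse(term).hostname
    except ValueError:
        # Malformed URLs (e.g. a broken IPv6 literal) are treated as non-ads.
        return False
    if host is None:
        return False
    return any(host == blocked or host.endswith("." + blocked) for blocked in AD_HOSTS)


def strip_ads(triples: list[Triple]) -> list[Triple]:
    """Keep only triples that do not mention an ad host as subject or object."""
    return [t for t in triples if not (_is_ad(t[0]) or _is_ad(t[2]))]
```

Literal objects such as titles have no hostname, so they pass through untouched.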

This project is a work in progress, and it is designed to support my other distributed web project: Sylph.

License

The project is licensed under the AGPLv3 (may relicense if requested). Site-specific rules and templates are licensed under BSD/MIT and GFDL/CC-BY-SA.
