An IPFS indexer / search engine built in Rust.
On Ubuntu you may need to `apt install libssl-dev`.
As of libp2p 0.44.0, it seems to require Rust nightly: https://stackoverflow.com/questions/69848319/unable-to-specify-edition2021-in-order-to-use-unstable-packages-in-rust
Initialized with `cargo init` as a binary - not sure if this makes sense, or whether we should split it into a library part and an application part. I suppose we can always change it later as it grows.
Following this guide for libp2p: https://github.com/libp2p/rust-libp2p/blob/master/src/tutorial.rs
Trying to follow best practices from here: https://doc.rust-lang.org/cargo/commands/cargo-init.html
Adding Cargo.lock to version control - it seems like it might be best practice for a binary (app): https://stackoverflow.com/questions/62861623/should-cargo-lock-be-committed-when-the-crate-is-both-a-rust-library-and-an-exec
On macOS: Xcode command line tools are required. Run `xcode-select --install` if you do not have Xcode installed, need to update it, or run into Xcode-related build errors.
- Run `cargo build` to build
- Run `RUST_LOG=info ./target/debug/ipfs_indexer` to see logging output (adjust the level accordingly)
- Run `RUST_LOG=info ./target/debug/ipfs_indexer 127.0.0.1:8080` to use your own IPFS gateway instead of ipfs.io
By default it runs an HTTP endpoint on `0.0.0.0:9090`, so you can go to:
- http://localhost:9090/status
- http://localhost:9090/enqueue/somecid
- http://localhost:9090/search/somequery
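
For programmatic access instead of the browser, here is a minimal client sketch - assuming the default `0.0.0.0:9090` bind and `reqwest` with the `blocking` feature added as a dependency; the CID and query values are placeholders:

```rust
// Hedged sketch: exercising the HTTP endpoints above with reqwest.
// Assumes `reqwest = { version = "0.11", features = ["blocking"] }` in Cargo.toml.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let base = "http://localhost:9090";

    // Check indexer status.
    let status = reqwest::blocking::get(format!("{base}/status"))?.text()?;
    println!("status: {status}");

    // Enqueue a CID for indexing ("somecid" is a placeholder).
    reqwest::blocking::get(format!("{base}/enqueue/somecid"))?;

    // Run a search ("somequery" is a placeholder).
    let results = reqwest::blocking::get(format!("{base}/search/somequery"))?.text()?;
    println!("results: {results}");

    Ok(())
}
```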
From the `docker` directory, run `docker-compose up`. Currently the image is only ~26MB.
I set up a workflow that should at least run a build on push, but it doesn't run any tests yet because I haven't figured out how Rust test suites work.
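
For reference, Rust unit tests normally live in a `#[cfg(test)]` module next to the code and are picked up by `cargo test`, so the workflow could just add a `cargo test` step. A minimal sketch (the `tokenize` helper is hypothetical, not something in this repo yet):

```rust
// Hypothetical helper plus a unit test; `cargo test` discovers and runs it.
pub fn tokenize(text: &str) -> Vec<String> {
    text.split_whitespace().map(|w| w.to_lowercase()).collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn tokenize_lowercases_and_splits() {
        assert_eq!(tokenize("Hello IPFS"), vec!["hello", "ipfs"]);
    }
}
```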
- Discover content to be indexed and add it to the index queue
- Listen in on the gossip protocol Jason is working on
- Start from some collection of pages on ipfs.io/ipfs
- Implement an index queue processor
- Fetch the ipfs content
- Process the page for more IPFS links, and add those links to the index queue
- Index the pages somehow
- Ranked keywords by frequency or something?
- Need to update to support more than just HTML content (look at the header and index files)
- Update the excerpt to be flexible - for images, it could be a small cropped render of the original image; for videos, it could be a GIF preview render
- Store the index somehow (start with in-memory, then figure out how to do storage later) - Conor working on
- A hashmap of map[keyword] -> sorted tree, where entries are sorted by keyword frequency and contain the IPFS hash? (see the sketch after this list) - Conor working on
- Will probably want to think about an ejection mechanism sooner rather than later so we can eject to storage (least recently used? oldest? who knows?)
- Farther out - need to think about how the store will be designed
- Probably also want to store an excerpt and page title for each page to present to the front-end
- Implement a backend API which a future front-end can use, and which we can use for debugging in the short term
- search -> search result
- ordered list of <title, link, excerpt>, possibly grouped by text, images, videos, other (see the struct sketch after this list)
- stats:
- indexed entries
- outstanding index queue
- memory used / free
- storage used / free
- Implement a front-end which queries the index storage and displays the page title, the ipfs.io/ipfs link to the page, and the excerpt in the browser
- Feedback loop from what people click on more often to rank those higher
- Might want a docker container:
- deploy will auto restart itself on crash (will also make it easy to see consumed memory with docker stats and other tools)
- will be able to deploy with a local ipfs instance all ready to go within the container
- can artificially restrict memory so we can test things like ejection mechanisms
- Farther out -> hook into papertrail or some logging service so we can see what's up if it dies
- Tests! - Conor working on
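
A minimal sketch of the in-memory index idea above (keyword -> entries sorted by term frequency), using only `std` collections and naive whitespace tokenization; the names and structure are assumptions for illustration, not the actual implementation:

```rust
use std::cmp::Reverse;
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory index: keyword -> set of (frequency, CID) entries.
/// `Reverse` makes iteration yield the highest-frequency entries first.
#[derive(Default)]
pub struct InMemoryIndex {
    postings: HashMap<String, BTreeSet<(Reverse<u64>, String)>>,
}

impl InMemoryIndex {
    /// Count keyword frequencies in a fetched page and add them to the index.
    pub fn index_page(&mut self, cid: &str, text: &str) {
        let mut counts: HashMap<String, u64> = HashMap::new();
        for word in text.split_whitespace() {
            *counts.entry(word.to_lowercase()).or_insert(0) += 1;
        }
        for (word, freq) in counts {
            self.postings
                .entry(word)
                .or_default()
                .insert((Reverse(freq), cid.to_string()));
        }
    }

    /// Return CIDs for a keyword, most frequent matches first.
    pub fn search(&self, keyword: &str) -> Vec<&str> {
        self.postings
            .get(&keyword.to_lowercase())
            .map(|entries| entries.iter().map(|(_, cid)| cid.as_str()).collect())
            .unwrap_or_default()
    }
}
```

Usage would look like `index.index_page(cid, &page_text)` during queue processing and `index.search("keyword")` from the search endpoint; an eviction policy could later drain the lowest-frequency or least recently used postings to storage.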
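
And a hedged sketch of what one backend search result entry could look like (field names are assumptions, not an agreed schema):

```rust
/// Hypothetical shape of a single search hit returned by the backend API;
/// `link` would typically be a gateway URL like https://ipfs.io/ipfs/<cid>.
#[derive(Debug)]
pub struct SearchHit {
    pub title: String,
    pub link: String,
    pub excerpt: String,
    pub kind: ContentKind,
}

/// Grouping suggested above: text, images, videos, other.
#[derive(Debug)]
pub enum ContentKind {
    Text,
    Image,
    Video,
    Other,
}
```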