# IPIP-0383: Compact Denylist Format #383

## Conversation
One thing I didn't like about the existing double hash is that the concatenation leads to more work (you have to hash each subpath). I think a tuple of hash(CID), [path length, hash(path)] would spare the work, so […] to see if […]. In the end, in the denylist automation, the path is always […]. I wrote something to reprocess historical ones if we need to.
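For illustration, a rough Go sketch of the two approaches (the helper names and exact preimage layout are made up for this example, not taken from the spec):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// legacyDoubleHash hashes the concatenation "<cid>/<path>", which forces
// a blocker to re-hash once per subpath of an incoming request.
func legacyDoubleHash(cid, path string) [32]byte {
	return sha256.Sum256([]byte(cid + "/" + path))
}

// tupleDoubleHash hashes the CID and the path separately; the path length
// disambiguates entries without re-hashing every subpath combination.
func tupleDoubleHash(cid, path string) (cidHash [32]byte, pathLen int, pathHash [32]byte) {
	return sha256.Sum256([]byte(cid)), len(path), sha256.Sum256([]byte(path))
}

func main() {
	fmt.Printf("legacy: %x\n", legacyDoubleHash("bafyexamplecid", "some/sub/path"))
	ch, n, ph := tupleDoubleHash("bafyexamplecid", "some/sub/path")
	fmt.Printf("tuple:  %x %d %x\n", ch, n, ph)
}
```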
I have done one more pass on the spec, explaining complex things like double-hashing a bit better, and making a distinction between NameSystem and PathResolver. I think I'm pretty happy with the current form.
Test fixtures are missing... A list of tests will come next. An implementation of a content blocker with support for this format exists now in https://github.com/hsanjuan/nopfs, and the Badbits list is also available in compact form: https://badbits.dwebops.pub/badbits.deny.
COMPACT_DENYLIST_FORMAT.md (outdated diff):

> The latter form allows blocking by double-hash using any hashing function of choice. Implementations will have to hash requests using all the hashing functions used in the denylist, so we recommend sticking to one.

---

> Implementations will have to hash requests using ALL the hashing functions used in the denylist

Do we need this yet? Can we specify use of a multihash for self-describing, future-proof goodness, but require that only sha256 is valid for now to simplify things? For context: I'm looking at how I would add support for this spec to e-ipfs and w3s.link.

I acknowledge that we already have to accept that providers can only do best-effort enforcement of these lists, given we already have to deal with path-based rules which can't be enforced at the BlockService level, e.g. in e-ipfs which is bitswap-only... so I could just ignore entries that are hashed with a fn I don't support yet, but I'd prefer to be able to say "we support all block items that can be applied by the blockservice".

---

If your denylist only includes one type of hashing function, then you only hash using that one to check the double-hashes. So it is up to whoever makes the denylist... I would totally discourage using multiple hashing functions for items in the same denylist.

But other than that, since they are multihashes, there's no reason not to support anything. In a way I'm deferring the discussion of "what is the best hashing function for this" until later, because sha256 is probably not it. Mostly you just want something very, very fast, and it doesn't necessarily need to be a cryptographic one, as it is just meant to obfuscate.
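A minimal Go sketch of that behavior, assuming the github.com/multiformats/go-multihash module (the entry below is the sha2-256 multihash of the empty string, used purely as an illustration):

```go
package main

import (
	"fmt"

	"github.com/multiformats/go-multihash"
)

func main() {
	// Hex-encoded double-hash entries as they might appear in a denylist.
	entries := []string{
		"1220e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
	}

	// The set of functions to double-hash requests with is simply the set
	// of multihash codes observed while parsing the list; no header needed.
	fns := map[uint64]bool{}
	for _, e := range entries {
		mh, err := multihash.FromHexString(e)
		if err != nil {
			continue // unparseable entry: skip it
		}
		dec, err := multihash.Decode(mh)
		if err != nil {
			continue
		}
		fns[dec.Code] = true
	}

	for code := range fns {
		fmt.Println("must double-hash requests with:", multihash.Codes[code])
	}
}
```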

---

As above, the tradeoffs are expressiveness of list-sharing versus compute-obligation for conforming processors of the lists. Here too, deciding layering before committing to a degree of expressiveness (or the core/extension, required/optional corollary) might be preferable to deciding this in advance of commitments to implement and conform at scale.

---

I am fine with prominently recommending a hashing function. Which one should it be, though?

What we need is:

- A cryptographic hash function (we need this in the end, as pre-image and collision protection are important).
- Well supported in all languages/platforms.
- Very fast.

Is it blake3? sha256? Are we OK with md5 (sure, there are collisions, but they need to be carefully crafted)? I'll ask around.

---

Would you consider defining it as only sha2-256 for the first version of the spec? That restriction could be loosened later if folks need it.

---

For pragmatic reasons, our options are probably limited to either sha2-256 or blake3 (ipfs impls. will have both).
Between sha2 and blake3, the latter should make a faster default (faster than md5 too). Maybe benchmark?
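If it helps with the "maybe benchmark" question, here is a hypothetical Go micro-benchmark sketch. It assumes the third-party lukechampine.com/blake3 module, which is my assumption and not something mentioned in this thread:

```go
// bench_test.go: hypothetical micro-benchmark comparing the candidates.
package denylist

import (
	"crypto/sha256"
	"testing"

	"lukechampine.com/blake3" // assumed third-party BLAKE3 implementation
)

var payload = make([]byte, 1024) // 1 KiB of zeroes as a stand-in input

func BenchmarkSHA256(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = sha256.Sum256(payload)
	}
}

func BenchmarkBLAKE3(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = blake3.Sum256(payload)
	}
}
```

Run it with `go test -bench=.` and compare ns/op on the target platforms.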
PS: The purpose behind the explicit list of hashes in the YAML header was to protect the node operator from silently exploding the list of hashes to check, when the intention is to only use one (or two) functions. But maybe it is not a real DoS risk; these lists will come from somewhat trusted sources.

If we decide that we only use a single hash function per list, it still feels like something that should be explicitly defined in the YAML header, and not hard-coded in the spec (future-proofing without having to change the specs).
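For instance, the header could carry something like this (a hypothetical sketch; the `double-hash-fn` field name is invented to illustrate the idea, not taken from the spec):

```yaml
version: 1
name: example-denylist
hints:
  # hypothetical hint declaring the single function used for the
  # double-hash entries that follow
  double-hash-fn: sha2-256
---
```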

---

Yes, I strongly agree that the header YAML should define the hash fn / use multihashes for future upgradeability, and I am asking that the spec, in its initial wording, require only sha-256 be used.

We can even add a note in the spec that folks should raise a PR here if that is causing them real perf problems.

Adding support for more hash functions would have to take into account that there is no baseline requirement of what hash functions an ipfs impl must support. web3-storage does not verify blake3 today. Surprising but true: https://github.com/web3-storage/car-block-validator/blob/cb121b780c86724c0266f797b5a4badffad77140/src/index.js#L1-L17

---

You don't need to list hash functions in the header. As you parse the list, you see the multihash function used in each multihash that you encounter, and that is the list of hashing functions you have to double-hash with. Putting the list at the top is redundant.

I think the issue here is a matter of compatibility, with sha256 being the thing that is nicely and universally supported everywhere. I'm OK with the spec requiring sha256 for double-hashing, and otherwise leaving things open as they are. If people use other hashes (which remains possible), they would be breaking the spec, but that's up to them.

---

I actually disagree that listing which hash functions are used in a list is always redundant, because I am focused on a use-case where more lists from more unfamiliar authorities get added and subtracted more often, making the list itself worth annotating. If managing and federating many denylists is being optimized for, all kinds of (optional!) frontmatter fields could be quite helpful in deciding which ones to federate/follow before parsing them, particularly for software that builds on CIDs selectively (which might need to know in advance if a denylist includes entries it cannot, or does not need to, parse).

In any case, I would argue that an optional frontmatter field for declaring which hash functions were/are used to populate a list is good, actually. I don't have strong opinions, but I have some basic questions, in case enough people agree that a section is worth adding to the IPIP itself:
- Should the values be `name` entries from the multicodec table, or the actual tags? Mild preference for name, but admittedly this adds fragility and another possible conflict/drift risk.
- What should a consumer do/think/feel if an entry uses a tag not announced up top? Perhaps the spec need not say anything here and leave that up to use-case-specific considerations, but perhaps not.
- Should list hosters ADD entries to the frontmatter declaration over time if they add functions over time, or create a new list at a new URL when these "env var"s change? Considering many subscribers have to unload and reload the list for any changes except EOF appends, this could complicate the spec a lot.

---

- name, for sanity.
- With or without hash-function annotations at the top, you're in the same situation if you find a hash that you don't support while parsing. Annotation at the top would only help to reject a list altogether before parsing it. But why would you do that instead of trying to work with the entries that you can support? Anyway, you can use any optional hints you want. We can standardize an optional hint name for this as long as it's optional.
- I think you're finding the subtle issues. "Whoops, now my list also includes blake3 double-hashes" is bad to communicate in any way. "List A is done and you can follow up again on list B" is also not considered. It would be easier if we specify what the recommended behavior is when finding an unknown hashing function.
Also, we are talking a lot about hashing functions pertaining to the double-hash entries that are encoded as multihashes. You could also have normal /ipfs/<cid> entries where the CID itself is not something that your underlying IPFS implementation understands: maybe you don't know the IPLD codec, maybe the hash function is unknown. In these cases, it would not be the "compact denylist parser/blocker" component that needs to be aware, but the underlying "ipfs" implementation, etc., though some implications might be similar.

So my practical approach is: 1) everything goes; 2) try to parse what you can; 3) strong recommendation or obligation to use only sha256 for double-hashing; everything else is non-spec-compliant (for now).
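A rough Go sketch of that "parse what you can" approach, using a simplified line grammar for illustration (the real grammar in the spec has more detail; the header and hints are ignored here):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("example.deny")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case line == "" || strings.HasPrefix(line, "#"):
			continue // blank line or comment
		case strings.HasPrefix(line, "//"):
			// Double-hash entry: use it only if we support its hash
			// function; otherwise skip it rather than rejecting the list.
			fmt.Println("double-hash entry:", line)
		case strings.HasPrefix(line, "/ipfs/") || strings.HasPrefix(line, "/ipns/"):
			fmt.Println("path entry:", line)
		default:
			continue // unknown rule type: skip it, don't abort the list
		}
	}
}
```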

---

Rebased and applied editorial fixes to publish both spec and IPIP on https://specs.ipfs.tech/ website.
I'll be revisiting this as we iron out remaining kinks in the reference implementation ipfs/kubo#10161 (+ related PRs), but the content is ready for final reviews from the wider audience of node operators and implementers.
I'll mention this during next IPFS Implementers Call.
> ## Detailed design
>
> See :cite[compact-denylist-format].

---

ℹ️ This reference will start working once we have the spec published at https://specs.ipfs.tech/compact-denylist-format
Improves the flow of the document: hints are described only once now (3x before), in a single place; the table of contents is easy to glance at and follows the flow of the file syntax.
Fixes #8492. This introduces "nopfs" as a preloaded plugin into Kubo, with support for denylists from ipfs/specs#383. It automatically makes Kubo watch *.deny files found in:

- /etc/ipfs/denylists
- $XDG_CONFIG_HOME/ipfs/denylists
- $IPFS_PATH/denylists

* test: Gateway.NoFetch and GatewayOverLibp2p. Adds missing tests for "no fetch" gateways one can expose; in both cases the offline mode is done by passing a custom blockservice/exchange into the path resolver, which means the global path resolver that has the nopfs intercept is not used, and content blocking does not happen on these gateways.
* fix: use offline path resolvers where appropriate. This fixes the problem described in #10161 (comment) by adding explicit offline path resolvers backed by an offline exchange, and using them in NoFetch gateways instead of the default online ones.

Co-authored-by: Henrique Dias <hacdias@gmail.com>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
Would it be possible to include metadata tags about the categories of why a line has been added? This would be useful for display in Brave, as an enhancement to the messaging that normally gets displayed with safe browsing, which seems like the best way to fit this in.
You can add whatever hints per line. It's just not specified whether some labels have special meanings that require certain actions. Implementations can give meaning to labels.
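For example, something like this might express that use case (the hint syntax is approximate and the label names are invented for illustration; the spec defines the generic per-line hint mechanism, not these specific labels):

```
# hypothetical entry with per-line hints (illustrative syntax and labels)
/ipfs/bafyexamplecidexamplecidexamplecid reason:phishing status:410
```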

---

This has been ready for ratification since December, but it took a bit longer due to holidays and other work. No concerns were raised; merging.
Thank you @hsanjuan for driving this, on both spec and implementation front, and for being patient ❤️
Thank you to everyone who provided feedback and tested the reference implementations.
FYSA we shipped the implementation in Kubo 0.24, and it is also included in Rainbow.
I've just posted a question on r/IPFS and I think it would be better suited here. I just added a file to IPFS that was a .txt file of the output of […]. The error message strikes me as strange, and I have some questions about this feature.

On the face of it, this feature seems like a hideous idea that undermines the whole point of IPFS. Sure, nobody wants to host hateful material, CSAM, [insert buzzword]-phobia, blah, blah, blah, but we all know that this feature will just get abused by governments and the big-tech companies that will end up being the backbone of IPFS. Like, consider a journalist who wants to publish WikiLeaks 2.0 on IPFS: how quickly will the admins cave to the FBI and block that content?
This is not really the best place. Ask on the discuss.ipfs.tech forums if you are still wondering about these questions.
This IPIP introduces a line-based denylist format for content blocking on IPFS
focused on simplicity, scalability and ease-of-use.
Related: