distinguish POSSE posts vs non-POSSE mentions and handle accordingly #51

snarfed · 2014-01-31T22:17:00Z

this would be nice for catching when other people post a link to your post in a silo.

i did this for a while in mid 2012, before bridgy's re-release with webmentions. i stopped because the POSSEd posts showed up as comments on the original posts, and i kept that decision in the re-release because i didn't see enough people using rel-syndication links, which meant i couldn't prevent the same thing happening to them.

on the other hand, we've been thinking more about de-duping and similar issues recently, and @tantek proposed that this kind of noise might help motivate people to make their mention handling smarter. worth a thought.

snarfed · 2014-01-31T23:52:20Z

concretely, these would only differ from current webmentions in that they wouldn't have an in-reply-to, since they truly are "mentions."

snarfed · 2014-04-14T19:21:54Z

two possible approaches for distinguishing the original author's POSSEd posts:

* don't bother. ideally, webmention handlers would detect them and filter them out, or whatever they want. (@tantek advocates this.)
* omit original silo posts from the author, but not from other people.

both are reasonable, and this would be a good feature. promoting to now.

snarfed · 2014-08-26T20:47:25Z

lots of discussion about this on IRC today.

summary: when a tweet links to a post, but isn't the official POSSE tweet of that post, responses are backfed and rendered as if they were responses to the original post. two examples. some people like this somewhat (e.g. @snarfed, @kevinmarks, maybe @kylewm); others don't (@aaronpk, @tantek).

it's hard to prevent this. @tantek correctly notes that we can use rel=me to identify the original author, and only treat their tweets as POSSE candidates. that's a good step.

however, the common case is that the original author later links to their post from a different (non-POSSE) tweet. we could use u-syndication and permashortcitations to distinguish that from the original POSSE tweet, but both of those have low adoption rates among bridgy users, so we'd end up muzzling the majority of responses, which i don't want to do.

@kevinmarks suggests that we use time as a heuristic. if the author links to their post over 24h after it's originally posted, don't consider that a POSSE. definitely a good idea!

(i'd re-emphasize that this is all tradeoffs. given real world usage, i don't see a single best answer so far, and leaving the current behavior is on the table. good to hash through options though!)

snarfed · 2015-01-08T21:33:09Z

current proposal from @tantek in IRC today: only consider a link to be the original copy if it's on a domain in the user's silo profile. sounds ok to me, we could consider implementing it.

tantek · 2015-01-08T23:53:46Z

As written up: http://indiewebcamp.com/original-post-discovery#POSSE_copy_domain_approximation

kylewm · 2015-04-13T22:34:45Z

✋ In case it's useful, here's an example where Bridgy is being overly aggressive in assuming a tweet is the POSSE copy of an original.

here's the original: https://adactio.com/journal/8710
here's a tweet from someone else (another bridgy user) linking to the original: https://twitter.com/jgarber/status/587245857034133504

and then a bunch of RT's of that tweet are backfed to the original as if they are RTs of the original. e.g., https://brid-gy.appspot.com/repost/twitter/jgarber/587245857034133504/587680705938907136

snarfed · 2015-04-13T22:39:17Z

thanks @kylewm!

one way to mitigate: when the post's domain isn't one of the tweet author's domains, demote to u-mention.
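a minimal sketch of that domain check, assuming the tweet author's domains are already known as a set. `classify_link` and `author_domains` are hypothetical names for illustration, not bridgy's actual code, and this only does exact hostname matches:

```python
from urllib.parse import urlparse


def link_domain(url):
    """Return the lowercased hostname of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()


def classify_link(link, author_domains):
    """Classify a link in a silo post by whose domain it points to.

    author_domains is a hypothetical set of domains taken from the silo
    post author's profile. Returns 'posse-candidate' if the link is on
    one of them, otherwise 'mention'.
    """
    domains = {d.lower() for d in author_domains}
    if link_domain(link) in domains:
        return "posse-candidate"
    return "mention"
```

with the example above, `classify_link("https://adactio.com/journal/8710", {"jgarber.example"})` comes out as a mention, since adactio.com isn't one of the tweet author's domains (`jgarber.example` is a placeholder).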

snarfed · 2015-08-28T16:01:12Z

some new thoughts from #452:

here's a concrete example. i recently tweeted this:

My silly privacy antics landed me in a @vice @Motherboard article on prepaid credit cards. Fun, mildly embarrassing. http://motherboard.vice.com/read/the-simple-trick-ashley-madisons-users-could-have-used-to-protect-themselves

with this new feature, we'd attempt to send a webmention with this tweet as the source and the motherboard.vice.com link as the target. of course, the source wouldn't actually be the twitter.com permalink, it'd be the bridgy proxy URL that renders the tweet as mf2.

one interesting question is whether to consider this part of "listen" or "publish." ie should we start doing this when you sign up for backfeed? or only when you enable publish? it's not clear to me which one it belongs to. i'm leaning toward listen (backfeed), but not sure.

also, a catch: POSSE/PESOSed silo posts would end up sending multiple wms, one from the original post and one from each silo post, so the target would end up showing duplicates. bridgy already causes this for POSSEd comments/likes/reposts, though, so it's not a new problem, and we've pretty much agreed that it's the recipient's job to use syndication links, etc to de-dupe.
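as a rough illustration of what recipient-side de-duping could look like: the sketch below assumes each received mention has already been parsed into a dict with a `url` and a list of `syndication` URLs. that shape and the `is_duplicate` name are assumptions for this example, and real mf2 parsing is left out.

```python
def mention_urls(mention):
    """Return every URL that identifies a mention: its own and its copies."""
    return {mention["url"], *mention.get("syndication", [])}


def is_duplicate(new, seen):
    """True if new shares a URL or syndication URL with any mention in seen."""
    new_urls = mention_urls(new)
    return any(new_urls & mention_urls(old) for old in seen)


# hypothetical example: an original reply and the silo copy of it
original = {
    "url": "https://commenter.example/reply",
    "syndication": ["https://twitter.com/commenter/status/1"],
}
silo_copy = {"url": "https://twitter.com/commenter/status/1", "syndication": []}
```

here `is_duplicate(silo_copy, [original])` is true because the silo copy's URL appears in the original's syndication links, so the recipient could show just one of them.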

snarfed · 2015-08-28T16:05:47Z

an idea for expanding this: search silos for any posts, from anyone, that link to the user's domain(s), and send wms for them too. these are effectively mentions.

silo support for this is mixed:

* twitter: /search/tweets.json?q=
* G+: /activities?query=
* FB: no. search was removed in API v2.0. you can search over other things, including events, but mentions in event descriptions are a small minority case.
* IG: no. can't find any full text search support in their API reference docs.

moved this to #456

snarfed · 2015-08-29T17:55:24Z

added the full set of OPD heuristics to the IWC wiki. the important part for implementing is:

When considering a backlink in a silo post, use most or all of these heuristics to determine whether it's a POSSE:

* The backlink must be at or near the end. (Allow e.g. a close paren after the link.)
* The backlink must point to one of the user's domains, as determined by rel-me and links in their silo profile.
* The silo post must be published within 24h of the original post.
* New: compare the silo post's text and the original post's name, summary, and/or content, taking prefixes if they're meaningfully longer. (If the silo post has an ellipsis at or near the end, that's a strong hint to use a prefix.) The edit distance should be below a certain threshold, disregarding common differences like @-usernames in silo posts vs human names in original posts (e.g. this OP vs this POSSE).

current plan is to skip the last one due to complexity. i think the first three get us 80-95% of the value.
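a sketch of the first three heuristics, under some simplifying assumptions: "at or near the end" is read strictly as "only whitespace or closing punctuation after the link", the user's domains are passed in as a set of hostnames, and all three checks must pass. every function name here is hypothetical, not bridgy's actual API.

```python
from datetime import datetime, timedelta
from urllib.parse import urlparse

# characters tolerated after the backlink, e.g. a close paren
TRAILING_CHARS = " \t\n)].'\""


def link_at_end(text, link):
    """True if nothing but trailing punctuation follows the last link."""
    idx = text.rfind(link)
    if idx == -1:
        return False
    rest = text[idx + len(link):]
    return rest.strip(TRAILING_CHARS) == ""


def on_user_domain(link, user_domains):
    """True if the link's hostname is one of the user's domains."""
    host = (urlparse(link).hostname or "").lower()
    return host in user_domains


def within_24h(silo_published, original_published):
    """True if the two publish times are at most 24 hours apart."""
    return abs(silo_published - original_published) <= timedelta(hours=24)


def looks_like_posse(text, link, user_domains, silo_published, original_published):
    """Combine the three checks; this sketch requires all of them."""
    return (link_at_end(text, link)
            and on_user_domain(link, user_domains)
            and within_24h(silo_published, original_published))
```

requiring all three is the strictest reading of "most or all"; a looser combination would accept more silo posts as POSSE copies.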

snarfed · 2015-09-01T16:53:44Z

reorganizing this slightly. this issue will cover implementing the algorithm above for determining whether a silo post is a POSSE. if it is, we won't send a wm from it to the original post, but we will send its responses. if it isn't a POSSE, we'll send wms to each link in its text (and attachments, etc), as mentions, but we won't send wms for its responses anywhere.
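to make that branching concrete, here's a small sketch that turns one silo post into a list of (source, target) webmention pairs. `plan_webmentions` and its parameters are made up for illustration, and in practice the sources would be bridgy proxy URLs rather than silo permalinks.

```python
def plan_webmentions(post_url, is_posse, original_url, links, response_urls):
    """Return (source, target) webmention pairs for one silo post.

    post_url: URL of the silo post
    is_posse: result of the POSSE check for this post
    original_url: the original post, used only when is_posse is true
    links: URLs found in the post's text, attachments, etc
    response_urls: URLs of replies, likes, and reposts of the silo post
    """
    if is_posse:
        # POSSE copy: skip the post itself, send its responses to the original.
        return [(resp, original_url) for resp in response_urls]
    # non-POSSE: send the post as a mention to each link, drop its responses.
    return [(post_url, link) for link in links]
```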

@kylewm @tantek @kevinmarks @aaronpk @kartikprabhu i know this has been controversial for a while now. does that sound like the ideal behavior?

i'm opening a new issue for the feature to search all silo posts for links to users' sites and send mentions for those: #456

kevinmarks · 2015-09-02T00:56:56Z

Not sure that is ideal - the pattern I get currently is that I quote an old post, my link to it is assumed to be POSSE, and so it isn't shown, but replies are. If it shows my non-POSSE link, the follow-ups are often interesting too, with that context.

snarfed · 2015-09-02T02:23:29Z

@kevinmarks thanks for reviewing, and good point! ok, so for non-POSSE mentions, we backfeed replies, but not likes or reposts. sound good?

snarfed · 2015-09-02T03:49:55Z

@kevinmarks on second thought, comparing to pure indieweb behavior...if i include a link in a post, I'd send a mention to it, but i wouldn't also send wms to it for each comment i get on my post, nor would i expect the commenters to send wms directly from their comment posts, since they're not replying to or mentioning that link. so... maybe we shouldn't backfeed replies to mentions after all?

kylewm · 2015-09-02T04:36:25Z

I agree with that last bit -- Instead of backfeeding only the responses to a mention, it should only backfeed the mention itself. Replies to a mention are not replies to the original.

Unfortunately that means it matters even more that Bridgy guess correctly that something is a mention rather than a syndication (or err on the side of assuming syndication unless proven otherwise)... @snarfed in particular often rewords the silo copy so that I don't think edit distance would find them very similar at all, even though all the same information is contained (e.g. https://snarfed.org/2015-08-26_15313).

armingrewe · 2015-09-14T12:13:55Z

Just to confirm, as far as I can tell the Twitter and G+ mentions are now flowing through again. On the blog with the most activity I usually post in my morning (UK, ~6:30 GMT/BST) and the majority of mentions come over the next few hours. All fine so far.

snarfed · 2015-09-14T15:07:54Z

thanks for the update @armingrewe! glad to hear it.

btw Facebook should work in general too, but I know you mentioned it hasn't for you. feel free to post details if you want!

armingrewe · 2015-09-14T15:13:55Z

Facebook was fine all the time ;-) There might be something where bridgy isn't picking up something when I post via WordPress, but I need to look at that before I can be sure if there's an issue.

snarfed · 2015-09-15T00:23:42Z

i've updated the discussion of these OPD heuristics in https://indiewebcamp.com/original-post-discovery#Brainstorming . tldr: there are four, and we've hit real world counterexamples for all of them in bridgy, so none are ideal.

* user's domain
* within 24h
* near the end of the silo post
* nearly the same text as the silo post, ie edit distance is below a given threshold

kylewm · 2015-09-15T01:20:27Z

few random thoughts...

Another possible heuristic: have we already seen a POSSE for this post on this service? if so, it's more likely that subsequent links are mentions. It's not that strong of a criterion because many people will tweet links to the same piece throughout the day (e.g. Dave Winer), and of course tweets are deleted and reposted as edits.

It's much more costly to incorrectly identify a POSSE copy as a mention, i.e. no backfeed for that post. So the threshold for qualifying as a POSSE copy should probably be way lower, maybe matching some subset of the criteria, like off the top of my head:

* any two of the first three
* any one of the first three + lower than 50% edit distance
* lower than 30% edit distance
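A sketch of how those three rules could be combined, assuming "N% edit distance" means Levenshtein distance divided by the length of the longer string. That normalization and the function names are assumptions for illustration only.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def distance_ratio(a, b):
    """Edit distance as a fraction of the longer string's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def qualifies_as_posse(at_end, on_domain, within_24h, ratio):
    """Apply the three rules above to the heuristic results.

    at_end, on_domain, within_24h: booleans for the first three heuristics
    ratio: distance_ratio() of the silo text vs the original text
    """
    hits = sum([at_end, on_domain, within_24h])
    return (hits >= 2
            or (hits >= 1 and ratio < 0.5)
            or ratio < 0.3)
```

Because any one rule is enough, this leans toward calling something a POSSE copy, which matches the point that wrongly labeling a POSSE copy as a mention is the costlier mistake.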

It's very difficult to correctly categorize the "Kevin tweets a link to his post within 24h" case without throwing out a lot of legitimate POSSEs. In the specific case on the wiki, we could say it looks like he is tweeting at someone but the original isn't in-reply-to anything...wonder if that applies more generally to self-mentions.

snarfed · 2015-09-15T05:32:35Z

thanks @kylewm! interesting idea to record inferred POSSE links and check them later. kind of an extension of the way we already store syndication links. and you're right, the standard way to handle a complicated inference like this based on heuristics is to combine them with weights into a score... and that in this case, false negatives hurt much more than false positives. (I've always described bridgy as deliberately "promiscuous." :P)

I'm already second guessing all this added complexity, though, and it looks like the domain check is comfortably the strongest so far, so I'm kind of leaning toward just that. meh.

kylewm · 2015-09-15T16:57:40Z

> I'm already second guessing all this added complexity, though, and it looks like the domain check is comfortably the strongest so far, so I'm kind of leaning toward just that. meh.

I would support that too. Fight that sunk cost fallacy!

ghost · 2015-09-22T16:43:49Z

I'm not sure if it's this issue. I came here when searching for the "No post links found" message in this repository. For me Bridgy behaves a bit oddly. I have posted my links as usual to Google+ (manually from my Known instance) and the favorites are fed back to my site as normal, but the replies are not; instead I get the message "No post links found". I checked my Google+ profile and https://stream.tinokremer.nl is mentioned. On my own Known instance, my Google+ profile is mentioned too and IndieAuth sees it as normal.

I'm puzzled why Bridgy cannot see post links, can you shed light on that @snarfed ?

snarfed · 2015-09-23T04:07:07Z

@tinokremer sorry for the trouble! you're right, it probably is due to this. current status: trying to track down the memory leak in #456 (comment), which is blocking further fixes here. wish me luck!

ghost · 2015-09-23T05:37:59Z

Memory leaks are the hardest issues to solve and I'm a C# .Net developer. The reference system and garbage collector clean up most of my mess. Good luck indeed!

snarfed · 2015-09-26T01:46:32Z

tentatively closing. this has been running in prod and stable for a few days. I'm sure there are more bugs left to fix, but we can open new issues for them.


snarfed changed the title ~~send webmentions for original POSSE silo posts~~ send webmentions for posts as well as responses Apr 14, 2014

snarfed added now and removed maybe labels Apr 14, 2014

snarfed mentioned this issue Apr 22, 2014

POSSE post discovery to support link-less syndicated content #130

Merged

snarfed changed the title ~~send webmentions for posts as well as responses~~ send webmentions for (non-POSSE) posts as well as responses Aug 26, 2014

snarfed added now and removed now labels Aug 26, 2014

snarfed removed the later label Sep 4, 2014

snarfed mentioned this issue Dec 13, 2014

"favorited a tweet linking to"... aaronpk/webmention.io#34

Closed

snarfed mentioned this issue Mar 14, 2015

Misdirected webmentions from twitter #376

Closed

snarfed mentioned this issue Aug 27, 2015

feature: send outbound webmentions for silo posts #452

Closed

snarfed mentioned this issue Sep 1, 2015

search all silo posts for links to users' sites and send mentions #456

Closed

snarfed changed the title ~~send webmentions for (non-POSSE) posts as well as responses~~ distinguish POSSE posts vs non-POSSE mentions and handle accordingly Sep 1, 2015

snarfed added the now label Sep 1, 2015

kylewm mentioned this issue Sep 15, 2015

Facebook: publish should set "link" field when post ends in a link #474

Closed

snarfed mentioned this issue Sep 17, 2015

Option for forcing strict original post discovery #241

Open

snarfed mentioned this issue Sep 23, 2015

Got mentions for my POSSE'd copies #485

Closed

snarfed added a commit to snarfed/granary that referenced this issue Sep 23, 2015

PPD: for redirects, use final URL for domain check, and attach pre-re…

2186a97

…direct URL for snarfed/bridgy#51, snarfed/bridgy#485. thanks to @kylewm for help debugging!

snarfed added a commit to snarfed/granary that referenced this issue Sep 24, 2015

add include_redirect_sources kwarg to Source.original_post_discovery()

8fcd45f

matches same kwarg in bridgy's original_post_discovery.discover(). for snarfed/bridgy#51, snarfed/bridgy#485

snarfed added a commit that referenced this issue Sep 24, 2015

on redirects, only include final URLs in webmention targets, not init…

9b47032

…ial ones uses new include_redirect_sources kwarg in Source.original_post_discovery(). for #51, #485

snarfed added a commit that referenced this issue Sep 25, 2015

document new domain check for POSSE vs mention logic

70cd8cb

for #51

snarfed mentioned this issue Sep 25, 2015

PPD: rel-feed discovery should update Source.domains #491

Closed

snarfed closed this as completed Sep 26, 2015

This was referenced Sep 26, 2015

not backfeeding comments #492

Closed

Twitter - "no webmention targets"? #494

Closed

This was referenced Oct 19, 2015

don't send blog webmentions for syndication links #297

Closed

Same bridgy link sent to upstream replies #455

Closed

snarfed mentioned this issue Dec 23, 2015

handle quote tweet-style mentions #582

Merged

snarfed added a commit to snarfed/granary that referenced this issue Dec 28, 2015

relax OPD domain req't to allow subdomain match. for snarfed/bridgy#51

1fe3ba2

snarfed mentioned this issue Jun 13, 2019

No webmention targets for targets already found #875

Closed

snarfed mentioned this issue Nov 11, 2019

POSSE links vs discussion links? #897

Closed

stedn mentioned this issue Apr 4, 2020

[twitter] adding an optional flag to send "non-owner" tweets as mentions in addition to current POSSE replies #930

Closed

snarfed added a commit that referenced this issue Apr 4, 2021

don't backfeed responses to link posts unless they're by the user

ca712c4

re #51, #456, #1021
