distinguish POSSE posts vs non-POSSE mentions and handle accordingly #51
concretely, these would only differ from current webmentions in that they wouldn't have an in-reply-to, since they truly are "mentions." |
two possible approaches for distinguishing the original author's POSSEd posts:
both are reasonable, and this would be a good feature. promoting to now. |
lots of discussion about this on IRC today. summary: when tweet links to a post, but isn't the official POSSE tweet of that post, responses are backfed and rendered as if they were responses to the original post. two examples. some people like this somewhat (e.g. @snarfed, @kevinmarks, maybe @kylewm); others don't (@aaronpk, @tantek). it's hard to prevent this. @tantek correctly notes that we can use however, the common case is that the original author later links to their post from a different (non-POSSE) tweet. we could use @kevinmarks suggests that we use time as a heuristic. if the author links to their post over 24h after it's originally posted, don't consider that a POSSE. definitely a good idea! (i'd re-emphasize that this is all tradeoffs. given real world usage, i don't see a single best answer so far, and leaving the current behavior is on the table. good to hash through options though!) |
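A minimal sketch of the time heuristic @kevinmarks suggests above, assuming we know when the original post was published and when the silo post linking to it was created. The function name, parameters, and constant are hypothetical, not Bridgy's actual code.

```python
from datetime import datetime, timedelta

# Hypothetical threshold, taken from the 24h suggestion above.
POSSE_WINDOW = timedelta(hours=24)


def within_posse_window(original_published, silo_published, window=POSSE_WINDOW):
    """Return True unless the silo post came more than `window` after the original.

    Both arguments are datetimes in the same timezone (an assumption of this sketch).
    """
    return silo_published - original_published <= window


if __name__ == "__main__":
    original = datetime(2015, 4, 12, 10, 0)
    print(within_posse_window(original, original + timedelta(hours=1)))
    print(within_posse_window(original, original + timedelta(hours=25)))
```

On its own this only rules links out; a link inside the window could still be a plain mention, so it would probably be combined with other checks.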
current proposal from @tantek in IRC today: only consider a link to be the original copy if it's on a domain in the user's silo profile. sounds ok to me, we could consider implementing it. |
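A rough sketch of that domain check, assuming we already have the list of URLs from the user's silo profile. The helper names are made up for illustration, and the exact-host match (ignoring only a leading `www.`) is an assumption; the proposal doesn't say how subdomains should be handled.

```python
from urllib.parse import urlparse


def normalize_domain(url_or_domain):
    """Lowercase the host and drop a leading 'www.' (an assumption of this sketch)."""
    # Accept bare domains like "example.com" as well as full URLs.
    if "://" not in url_or_domain:
        url_or_domain = "//" + url_or_domain
    host = (urlparse(url_or_domain).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def link_is_on_profile_domain(link, profile_urls):
    """Return True if `link` is on one of the domains listed in the silo profile."""
    domains = {normalize_domain(u) for u in profile_urls}
    domains.discard("")  # ignore profile entries we couldn't parse
    return normalize_domain(link) in domains


if __name__ == "__main__":
    print(link_is_on_profile_domain(
        "https://adactio.com/journal/8710", ["https://adactio.com/"]))
```

A link that fails this check would be treated as a mention rather than as the original of a POSSE copy.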
✋ In case it's useful, here's an example where Bridgy is being overly aggressive in assuming a tweet is the POSSE copy of an original. here's the original: https://adactio.com/journal/8710 and then a bunch of RT's of that tweet are backfed to the original as if they are RTs of the original. e.g., https://brid-gy.appspot.com/repost/twitter/jgarber/587245857034133504/587680705938907136 |
thanks @kylewm! one way to mitigate: when the post's domain isn't one of the tweet author's domains, demote to u-mention. |
some new thoughts from #452:
|
an idea for expanding this: search silos for any posts, from anyone, that link to the user's domain(s), and send wms for them too. these are effectively mentions. silo support for this is mixed:
moved this to #456 |
added the full set of OPD heuristics to the IWC wiki. the important part for implementing is: When considering a backlink in a silo post, use most or all of these heuristics to determine whether it's a POSSE:
current plan is to skip the last one due to complexity. i think the first three get us 80-95% of the value. |
reorganizing this slightly. this issue will cover implementing the algorithm above for determining whether a silo post is a POSSE. if it is, we won't send a wm from it to the original post, but we will send its responses. if it isn't a POSSE, we'll send wms to each link in its text (and attachments, etc), as mentions, but we won't send wms for its responses anywhere. @kylewm @tantek @kevinmarks @aaronpk @kartikprabhu i know this has been controversial for a while now. does that sound like the ideal behavior? i'm opening a new issue for the feature to search all silo posts for links to users' sites and send mentions for those: #456 |
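To make the proposed behavior concrete, here's a small sketch of the routing only, assuming the POSSE decision has already been made elsewhere. `SiloPost`, `plan_webmentions`, and the tuple layout are hypothetical names for illustration, not Bridgy's data model.

```python
from dataclasses import dataclass, field


@dataclass
class SiloPost:
    """Hypothetical stand-in for a silo post and what we know about it."""
    url: str
    links: list = field(default_factory=list)      # links in text, attachments, etc.
    responses: list = field(default_factory=list)  # URLs of replies, likes, reposts


def plan_webmentions(post, original_url, is_posse):
    """Return (source, target, kind) tuples for the webmentions to send.

    POSSE copy: nothing from the post itself, but its responses go to the original.
    Non-POSSE: the post is sent as a mention to each link; its responses go nowhere.
    """
    if is_posse:
        return [(resp, original_url, "response") for resp in post.responses]
    return [(post.url, link, "mention") for link in post.links]


if __name__ == "__main__":
    post = SiloPost(
        url="https://silo.example/status/1",
        links=["https://site.example/post"],
        responses=["https://silo.example/status/1/reply/2"],
    )
    for wm in plan_webmentions(post, "https://site.example/post", is_posse=False):
        print(wm)
```

The two branches are deliberately exclusive: a given silo post either stands in for the original or is itself the mention, never both.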
Not sure that is ideal - the pattern I get currently is that I quote an old post, my link to it is assumed to be POSSE, and so it isn't shown, but replies are. If it shows my non-POSSE link, the follow-ups are often interesting too, with that context. |
@kevinmarks thanks for reviewing, and good point! ok, so for non-POSSE mentions, we backfeed replies, but not likes or reposts. sound good? |
@kevinmarks on second thought, comparing to pure indieweb behavior...if i include a link in a post, I'd send a mention to it, but i wouldn't also send wms to it for each comment i get on my post, nor would i expect the commenters to send wms directly from their comment posts, since they're not replying to or mentioning that link. so... maybe we shouldn't backfeed replies to mentions after all? |
I agree with that last bit -- Instead of backfeeding only the responses to a mention, it should only backfeed the mention itself. Replies to a mention are not replies to the original. Unfortunately that means it matters even more that Bridgy guess correctly that something is a mention rather than a syndication (or err on the side of assuming syndication unless proven otherwise)... @snarfed in particular often rewords the silo copy so that I don't think edit distance would find them very similar at all, even though all the same information is contained (e.g. https://snarfed.org/2015-08-26_15313). |
Just to confirm, as far as I can tell the Twitter and G+ mentions are now flowing through again. On the blog with the most activity I usually post in my morning (UK, ~6:30 GMT/BST) and the majority of mentions come over the next few hours. All fine so far. |
thanks for the update @armingrewe! glad to hear it. btw Facebook should work in general too, but I know you mentioned it hasn't for you. feel free to post details if you want! |
Facebook was fine all the time ;-) There might be something where bridgy isn't picking up something when I post via WordPress, but I need to look at that before I can be sure if there's an issue. |
i've updated the discussion of these OPD heuristics in https://indiewebcamp.com/original-post-discovery#Brainstorming . tldr: there are four, and we've hit real world counterexamples for all of them in bridgy, so none are ideal.
|
few random thoughts... Another possible heuristic: have we already seen a POSSE for this post on this service? if so, it's more likely that subsequent links are mentions. It's not that strong of a criterion because many people will tweet links to the same piece throughout the day (e.g. Dave Winer), and of course tweets are deleted and reposted as edits. It's much more costly to incorrectly identify a POSSE copy as a mention, i.e. no backfeed for that post. So the threshold for qualifying as a POSSE copy should probably be way lower, maybe matching some subset of the criteria, like off the top of my head:
It's very difficult to correctly categorize the "Kevin tweets a link to his post within 24h" case without throwing out a lot of legitimate POSSEs. In the specific case on the wiki, we could say it looks like he is tweeting at someone but the original isn't in-reply-to anything...wonder if that applies more generally to self-mentions. |
thanks @kylewm! interesting idea to record inferred POSSE links and check them later. kind of an extension of the way we already store syndication links. and you're right, the standard way to handle a complicated inference like this based on heuristics is to combine them with weights into a score... and that in this case, false negatives hurt much more than false positives. (I've always described bridgy as deliberately "promiscuous." :P) I'm already second guessing all this added complexity, though, and it looks like the domain check is comfortably the strongest so far, so I'm kind of leaning toward just that. meh. |
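For reference, the "combine them with weights into a score" idea could look roughly like this. The signal names, weights, and threshold are all invented for illustration; nothing here was agreed on in this thread. The threshold is kept low on purpose, since false negatives (a POSSE copy treated as a mention, so no backfeed) hurt more than false positives.

```python
# Hypothetical weights: the domain check counts most, per the discussion above.
WEIGHTS = {
    "domain_in_profile": 60,  # link is on a domain in the user's silo profile
    "within_24h": 25,         # silo post created within 24h of the original
    "no_earlier_posse": 15,   # no POSSE already seen for this post on this silo
}

# Hypothetical, deliberately low threshold.
THRESHOLD = 25


def posse_score(signals, weights=WEIGHTS):
    """Sum the weights of every signal that is present and truthy."""
    return sum(weight for name, weight in weights.items() if signals.get(name))


def looks_like_posse(signals, threshold=THRESHOLD):
    """Return True if the combined score reaches the threshold."""
    return posse_score(signals) >= threshold


if __name__ == "__main__":
    print(looks_like_posse({"domain_in_profile": True}))
    print(looks_like_posse({"no_earlier_posse": True}))
```

Dropping every signal except `domain_in_profile` reduces this to the plain domain check, which is the simpler option being leaned toward above.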
I would support that too. Fight that sunk cost fallacy! |
I'm not sure if it's this issue. I came here when searching for the "No post links found" message in this repository. For me Bridgy behaves a bit oddly. I have posted my links as usual to Google+ (manually from my Known instance) and the favorites are fed back to my site as normal, but the replies are not, with the message "No post links found". I checked my Google+ profile and https://stream.tinokremer.nl is mentioned. On my own Known instance, my Google+ profile is mentioned too and IndieAuth sees it as normal. I'm puzzled why Bridgy cannot see post links, can you shed light on that @snarfed ? |
@tinokremer sorry for the trouble! you're right, it probably is due to this. current status: trying to track down the memory leak in #456 (comment), which is blocking further fixes here. wish me luck! |
Memory leaks are the hardest issues to solve and I'm a C# .Net developer. The reference system and garbage collector clean up most of my mess. Good luck indeed! |
…direct URL for snarfed/bridgy#51, snarfed/bridgy#485. thanks to @kylewm for help debugging!
matches same kwarg in bridgy's original_post_discovery.discover(). for snarfed/bridgy#51, snarfed/bridgy#485
tentatively closing. this has been running in prod and stable for a few days. I'm sure there are more bugs left to fix, but we can open new issues for them. |
this would be nice for catching when other people post a link to your post in a silo.
i did this for a while in mid 2012, before bridgy's re-release with webmentions. i stopped because the POSSEd posts showed up as comments on the original posts, and i kept that decision in the re-release because i didn't see enough people using rel-syndication links, which meant i couldn't prevent the same thing happening to them.
on the other hand, we've been thinking more about de-duping and similar issues recently, and @tantek proposed that this kind of noise might help motivate people to make their mention handling smarter. worth a thought.