Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't allow previewing shared history rooms #239

Merged
merged 3 commits into from
Jun 27, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 0 additions & 12 deletions server/lib/matrix-utils/fetch-room-data.js
Original file line number Diff line number Diff line change
Expand Up @@ -155,7 +155,6 @@ const fetchRoomData = traceFunction(async function (
stateCanonicalAliasResDataOutcome,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution and patience @tulir 🙇 🐦

stateAvatarResDataOutcome,
stateHistoryVisibilityResDataOutcome,
stateJoinRulesResDataOutcome,
predecessorInfoOutcome,
successorInfoOutcome,
] = await Promise.allSettled([
Expand All @@ -182,10 +181,6 @@ const fetchRoomData = traceFunction(async function (
abortSignal,
}
),
fetchEndpointAsJson(getStateEndpointForRoomIdAndEventType(roomId, 'm.room.join_rules'), {
accessToken: matrixAccessToken,
abortSignal,
}),
fetchPredecessorInfo(matrixAccessToken, roomId, { abortSignal }),
fetchSuccessorInfo(matrixAccessToken, roomId, { abortSignal }),
]);
Expand Down Expand Up @@ -220,12 +215,6 @@ const fetchRoomData = traceFunction(async function (
historyVisibility = data?.content?.history_visibility;
}

let joinRule;
if (stateJoinRulesResDataOutcome.reason === undefined) {
const { data } = stateJoinRulesResDataOutcome.value;
joinRule = data?.content?.join_rule;
}

let roomCreationTs;
let predecessorRoomId;
let predecessorLastKnownEventId;
Expand All @@ -251,7 +240,6 @@ const fetchRoomData = traceFunction(async function (
canonicalAlias,
avatarUrl,
historyVisibility,
joinRule,
roomCreationTs,
predecessorRoomId,
predecessorLastKnownEventId,
Expand Down
4 changes: 2 additions & 2 deletions server/routes/room-directory-routes.js
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,6 @@ const matrixServerName = config.get('matrixServerName');
assert(matrixServerName);
const matrixAccessToken = config.get('matrixAccessToken');
assert(matrixAccessToken);
const stopSearchEngineIndexing = config.get('stopSearchEngineIndexing');

const router = express.Router({
caseSensitive: true,
Expand Down Expand Up @@ -71,7 +70,8 @@ router.get(
}

// We index the room directory unless the config says we shouldn't index anything
const shouldIndex = !stopSearchEngineIndexing;
const stopSearchEngineIndexingFromConfig = config.get('stopSearchEngineIndexing');
const shouldIndex = !stopSearchEngineIndexingFromConfig;

const pageOptions = {
title: `Matrix Public Archive`,
Expand Down
12 changes: 5 additions & 7 deletions server/routes/room-routes.js
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,6 @@ const matrixServerUrl = config.get('matrixServerUrl');
assert(matrixServerUrl);
const matrixAccessToken = config.get('matrixAccessToken');
assert(matrixAccessToken);
const stopSearchEngineIndexing = config.get('stopSearchEngineIndexing');

const matrixPublicArchiveURLCreator = new MatrixPublicArchiveURLCreator(basePath);

Expand Down Expand Up @@ -828,15 +827,13 @@ router.get(
}),
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
]);

// Only `world_readable` or `shared` rooms that are `public` are viewable in the archive
const allowedToViewRoom =
roomData.historyVisibility === 'world_readable' ||
(roomData.historyVisibility === 'shared' && roomData.joinRule === 'public');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having public shared rooms viewable but not indexed by search engines is by design.

My reply in the opt-out issue probably explains this the best so far:

"archived" is a bit of a overloaded term here but given that this project is called "Matrix Public Archive" I can see where the confusion may be be coming from. Any public room should be viewable in Matrix Public Archive. The idea is if a random Matrix user can view the room, then it should be viewable in the archive. But only history_visibility: "world_readable" rooms are indexable by search engines.

The Matrix Public Archive doesn't hold onto any data (it's stateless) and requests the messages from the homeserver every time (it archives nothing). The archive.matrix.org instance has some caching in place, 5
minutes for the current day, and 2 days for past content.

I've tried to clarify more of this in the FAQ document and added more details on why not guest access/peeking.

Banning @archive:matrix.org will prevent the room from showing up on archive.matrix.org and the cache will expire after 5-minutes/2-days for any content that is showing there now. Adding better opt-out controls like this issue is discussing is on the list 👍. I've updated the description with the current MSC proposals out there.

-- #47 (comment)

Copy link
Contributor

@MadLittleMods MadLittleMods Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We plan to move ahead with this PR to remove shared public rooms from the archive ⏩

The points in favor of keeping shared accessible can be summarized by the following but in the end, while the archive bot was respectful of the technical implications and doesn't expose messages to any further audience (random people), there is a social obligation to consider. This option has been represented as "members only" in many clients which doesn't leave any nuance.

Otherwise, the main idea was if I can view the messages from a Matrix client as a random user, I should also be able to see the messages in the archive. In both the native Matrix client and archive cases, it’s the same result when a random user wants to view a shared room:

  • A random Matrix user accesses the room, they see the history
  • A random user accesses the room in the archive, they see the history
  • Search engines are not allowed in either case (that only applies to world_readable rooms)

The join is mostly a technical detail to anyone trying to view the room. While I don't think the join event provides much value to the room in the normal cases, it could have benefit in tracing bad actors for moderation.

From the spec:

  • world_readable - All events while this is the m.room.history_visibility value may be shared by any participating homeserver with anyone, regardless of whether they have ever joined the room.
  • shared - Previous events are always accessible to newly joined members. All events in the room are accessible, even those sent when the member was not a part of the room.

Removing shared rooms, does mean we’re re-introducing friction for a portion people which the archive eliminates (which homeserver do I choose, which client, why do I even need an account, how do I view this on mobile, how do I reference and share this message to someone not already in the Matrix ecosystem, etc). But people can update their room to be world_readable as they see fit now to regain these benefits for their community.

// Only `world_readable` rooms are viewable in the archive
const allowedToViewRoom = roomData.historyVisibility === 'world_readable';

if (!allowedToViewRoom) {
throw new StatusError(
403,
`Only \`world_readable\` or \`shared\` rooms that are \`public\` can be viewed in the archive. ${roomData.id} has m.room.history_visiblity=${roomData.historyVisibility} m.room.join_rules=${roomData.joinRule}`
`Only \`world_readable\` rooms can be viewed in the archive. ${roomData.id} has m.room.history_visiblity=${roomData.historyVisibility}`
);
}

MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
Expand Down Expand Up @@ -891,7 +888,8 @@ router.get(

// Default to no indexing (safe default)
let shouldIndex = false;
if (stopSearchEngineIndexing) {
const stopSearchEngineIndexingFromConfig = config.get('stopSearchEngineIndexing');
if (stopSearchEngineIndexingFromConfig) {
shouldIndex = false;
} else {
// Otherwise we only allow search engines to index `world_readable` rooms
Expand Down
45 changes: 40 additions & 5 deletions test/e2e-tests.js
Original file line number Diff line number Diff line change
Expand Up @@ -2686,15 +2686,50 @@ describe('matrix-public-archive', () => {
assert.strictEqual(dom.document.querySelector(`meta[name="robots"]`), null);
});

it('search engines not allowed to index `public` room', async () => {
it('search engines not allowed to access public room with non-`world_readable` history visibility', async () => {
const client = await getTestClientForHs(testMatrixServerUrl1);
const roomId = await createTestRoom(client, {
// The default options for the test rooms adds a
// `m.room.history_visiblity` state event so we override that here so
// it's only a public room.
initial_state: [],
// Set as `shared` since it's the next most permissive history visibility
// after `world_readable` but still not allowed to be accesible in the
// archive.
initial_state: [
{
type: 'm.room.history_visibility',
state_key: '',
content: {
history_visibility: 'shared',
},
},
],
});

try {
archiveUrl = matrixPublicArchiveURLCreator.archiveUrlForRoom(roomId);
await fetchEndpointAsText(archiveUrl);
assert.fail(
new TestError(
'We expect the request to fail with a 403 since the archive should not be able to view a non-world_readable room but it succeeded'
)
);
} catch (err) {
if (err instanceof TestError) {
throw err;
}
assert.strictEqual(
err.response.status,
403,
`Expected err.response.status=${err?.response?.status} to be 403 but error was: ${err.stack}`
);
}
});

it('Configuring `stopSearchEngineIndexing` will stop search engine indexing', async () => {
// Disable search engine indexing across the entire instance
config.set('stopSearchEngineIndexing', true);

const client = await getTestClientForHs(testMatrixServerUrl1);
const roomId = await createTestRoom(client);

archiveUrl = matrixPublicArchiveURLCreator.archiveUrlForRoom(roomId);
const { data: archivePageHtml } = await fetchEndpointAsText(archiveUrl);

Expand Down