Architecture: graphql

sm-graphql

This project is organized into vertical slices of functionality, grouped by Business Domain. This is an intentional design choice to minimize the amount of layering and boilerplate that is common in web server projects. Most simple CRUD logic should be implemented directly inside GraphQL resolver functions.

sm-graphql starts several servers/processes, all managed in the server.js file:

- Apollo GraphQL API server
- Apollo GraphQL Subscription API server (WebSockets)
- HTTP registration/login/etc REST API
- HTTP "Storage Server" for raw optical image upload
- (in the Storage Server) Uppy Companion server for signing upload URLs for direct dataset/molDB upload to S3
- A scheduled "cron" job for sending reminder emails to users to publish their data if they have old private projects

Additionally, TypeORM runs any new database migrations on startup.

The GraphQL API can be explored interactively at https://metaspace2020.eu/graphql (or your local dev server equivalent). Set "request.credentials": "include" in the settings and it will use your login details from the browser cookies.
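The same cookie-based access can be sketched for a script running in a logged-in browser session. `buildGraphqlRequest` is a hypothetical helper, not part of sm-graphql; the only detail taken from the text above is `credentials: 'include'`. The `{ __typename }` query is used because it is valid against any GraphQL schema.

```typescript
// Sketch only: builds fetch options for a GraphQL POST that sends browser cookies.
// `buildGraphqlRequest` is a hypothetical helper name.
interface GraphqlRequestInit {
  method: 'POST';
  credentials: 'include';
  headers: Record<string, string>;
  body: string;
}

function buildGraphqlRequest(
  query: string,
  variables: Record<string, unknown> = {}
): GraphqlRequestInit {
  return {
    method: 'POST',
    // 'include' tells the browser to attach the session cookie to the request
    credentials: 'include',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, variables }),
  };
}

// Usage, in a browser where you are already logged in:
// const res = await fetch('https://metaspace2020.eu/graphql', buildGraphqlRequest('{ __typename }'));
// console.log(await res.json());
```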

Security

Almost all security-related logic happens in sm-graphql:

User creation/login is handled by the REST API in src/modules/auth/controller.ts
Authentication is handled by Passport middleware based on each request's cookies/JWTs/Api-Keys
Authorization needs to be handled explicitly in GraphQL resolver functions. This is usually done when retrieving the entity, e.g.:
- The ElasticSearch queries in esConnector.ts filter datasets/annotations to only include results that should be visible to the current user. Some controllers will even query and discard the result just to check that the dataset is allowed to be accessed.
- When there are multiple different levels of access privilege, it should be explicit in the function names, e.g. getDatasetForEditing which will raise an exception if the user isn't allowed to edit the dataset.
- Operations that call sm-api must still handle authorization! sm-api doesn't do any authorization itself.
As an optimization, some resolvers pass authorization information to their child resolvers through a scopeRole field.
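A minimal sketch of the two authorization patterns described in this section, assuming simplified types. Only the names `getDatasetForEditing` and `scopeRole` come from the text; the `User`/`Dataset` shapes, the role values, the `inputPath` field, and the access rules are assumptions made for illustration and will differ from the real code.

```typescript
// Hypothetical types and rules, for illustration only.
type ScopeRole = 'OWNER' | 'OTHER';
interface User { id: string; isAdmin: boolean; }
interface Dataset { id: string; ownerId: string; inputPath: string; }
interface DatasetSource extends Dataset { scopeRole: ScopeRole; }

// In-memory stand-in for the database
const datasets: Dataset[] = [
  { id: 'ds1', ownerId: 'alice', inputPath: 's3://example/ds1' },
];

function findDataset(id: string): Dataset {
  const ds = datasets.filter(d => d.id === id)[0];
  if (ds === undefined) throw new Error('Dataset not found');
  return ds;
}

// The access level is explicit in the name: this throws unless the user may edit.
function getDatasetForEditing(user: User | null, id: string): Dataset {
  const ds = findDataset(id);
  if (user === null || !(user.isAdmin || user.id === ds.ownerId)) {
    throw new Error('Not allowed to edit this dataset');
  }
  return ds;
}

// Parent resolver: works out the user's role once and attaches it to the result.
function resolveDataset(user: User | null, id: string): DatasetSource {
  const ds = findDataset(id);
  const scopeRole: ScopeRole =
    user !== null && user.id === ds.ownerId ? 'OWNER' : 'OTHER';
  return { ...ds, scopeRole };
}

// Child field resolver: reuses scopeRole instead of repeating the ownership lookup.
function resolveInputPath(source: DatasetSource): string | null {
  return source.scopeRole === 'OWNER' ? source.inputPath : null;
}
```

The point of the second pattern is that the child resolver needs no extra database or session lookup; it trusts the role its parent already computed.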

Authentication methods

Cookie

Managed by the Passport library, and works as on most websites: the cookie content includes a signed session ID, and the actual session data is stored in Redis. Cookies are the primary authentication mechanism - (non-anonymous) JWTs and Api-Keys can only be generated by a user authenticated with a cookie.
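To illustrate the general idea of a signed session ID, here is a conceptual sketch using Node's built-in `crypto` module. It is not the code or cookie format the session middleware actually uses; the function names, the `id.signature` layout, and the `Map` standing in for Redis are all assumptions.

```typescript
import { createHmac, timingSafeEqual } from 'crypto';

// Produces "<sessionId>.<hex HMAC>" (an assumed layout, for illustration).
function signSessionId(sessionId: string, secret: string): string {
  const mac = createHmac('sha256', secret).update(sessionId).digest('hex');
  return `${sessionId}.${mac}`;
}

// Returns the session ID if the signature matches, otherwise null.
function unsignSessionId(cookieValue: string, secret: string): string | null {
  const idx = cookieValue.lastIndexOf('.');
  if (idx < 0) return null;
  const sessionId = cookieValue.slice(0, idx);
  const expected = Buffer.from(signSessionId(sessionId, secret));
  const actual = Buffer.from(cookieValue);
  // timingSafeEqual requires equal lengths, so check that first
  if (expected.length !== actual.length) return null;
  return timingSafeEqual(expected, actual) ? sessionId : null;
}

// Stand-in for the Redis session store
const sessionStore = new Map<string, { userId: string }>();

// The cookie only carries the ID; the session data lives server-side.
function loadSession(cookieValue: string, secret: string): { userId: string } | null {
  const sessionId = unsignSessionId(cookieValue, secret);
  if (sessionId === null) return null;
  const session = sessionStore.get(sessionId);
  return session === undefined ? null : session;
}
```

Because the cookie holds only a signed ID, a tampered cookie fails verification before any session lookup happens.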

The cookie is the same whether a user logs in with Google or Email+Password.

JWT

GraphQL requests from webapp use a JWT for authentication. This isn't really needed anymore - previously webapp and graphql were separate and webapp handled authentication. It remains only because cleaning it up is extra work - getting access to the cookies in the GraphQL Subscription Server has been difficult or impossible in the past. The subscription server library has probably fixed that by now.

Python Client also uses JWTs if Email+Password authentication is used. For Api-Key authentication, the JWT isn't needed.

Api-Key

API Keys use a similar authentication code path to JWTs, but have significant restrictions (only specific mutations are allowed, some queries are blocked, all usages are logged) to limit the impact if they're leaked. They're intended for use with the Python Client.
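The three restrictions can be sketched as a single check applied to each operation. The list contents below are placeholders, not the real allow/block lists, and `isOperationAllowed` and `usageLog` are hypothetical names.

```typescript
type AuthMethod = 'COOKIE' | 'JWT' | 'API_KEY';
interface Operation { kind: 'query' | 'mutation'; fieldName: string; }

// Placeholder names only; the real lists live in the sm-graphql source.
const API_KEY_ALLOWED_MUTATIONS = new Set<string>(['exampleAllowedMutation']);
const API_KEY_BLOCKED_QUERIES = new Set<string>(['exampleBlockedQuery']);

// Stand-in for whatever logging the server performs
const usageLog: string[] = [];

function isOperationAllowed(authMethod: AuthMethod, op: Operation): boolean {
  // These restrictions apply only to API keys; normal authorization still
  // has to happen in the resolvers for every authentication method.
  if (authMethod !== 'API_KEY') return true;

  // Every API-key usage is logged, whether or not it is allowed
  usageLog.push(`${op.kind} ${op.fieldName}`);

  if (op.kind === 'mutation') {
    // Mutations are deny-by-default: only listed ones pass
    return API_KEY_ALLOWED_MUTATIONS.has(op.fieldName);
  }
  // Queries are allow-by-default: only listed ones are blocked
  return !API_KEY_BLOCKED_QUERIES.has(op.fieldName);
}
```

Using an allow-list for mutations and a block-list for queries mirrors the wording above: a leaked key can read most data but can change very little.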

Project review link

The project publication workflow allows a user to create a share link to that project. Anyone who accesses this link is allowed to see the datasets in the project - the authorization details are persisted in the user's session, even if they're not logged in.
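One way to picture this, as a sketch under assumed names: visiting the link stores the link's token in the session, and later dataset checks consult the session rather than the logged-in user. The `Session` and `Project` shapes, the `reviewTokens` field, and both function names are hypothetical.

```typescript
// Hypothetical shapes; the real session and project entities differ.
interface Session { userId?: string; reviewTokens: string[]; }
interface Project { id: string; reviewToken: string | null; datasetIds: string[]; }

// Called when someone opens a review link; works for anonymous sessions too.
function visitReviewLink(session: Session, token: string): void {
  if (session.reviewTokens.indexOf(token) === -1) {
    session.reviewTokens.push(token);
  }
}

// A dataset is visible through the link if the session holds the project's token.
function canViewDatasetViaReviewLink(
  session: Session,
  project: Project,
  datasetId: string
): boolean {
  return project.reviewToken !== null
    && session.reviewTokens.indexOf(project.reviewToken) !== -1
    && project.datasetIds.indexOf(datasetId) !== -1;
}
```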

Email validation link

Not intended to be used continually, but for new users' convenience, clicking the email validation link will give them a logged-in cookie up to 1 hour after account creation. This technically counts as an authentication method from a security perspective.
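The time limit amounts to a simple check like the following sketch. The constant and function names are hypothetical; only the 1 hour window comes from the text.

```typescript
// 1 hour, as described above
const VALIDATION_LOGIN_WINDOW_MS = 60 * 60 * 1000;

// True if the validation link may still log the user in.
function canLogInViaValidationLink(
  accountCreatedAt: Date,
  now: Date = new Date()
): boolean {
  const ageMs = now.getTime() - accountCreatedAt.getTime();
  return ageMs >= 0 && ageMs <= VALIDATION_LOGIN_WINDOW_MS;
}
```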

Points of interest:

`src/modules/auth`

Contains authentication middleware and a non-GraphQL REST API for registration, login, JWT issuing, etc.

`src/modules/webServer`

Contains the Storage Server and code to run Uppy Companion

`schemas`

Contains the GraphQL schema files. These are compiled by Apollo into a single schema at runtime.

Webapp's tests also use a compiled version of these schema files so that they can run a mock graphql server for the tests to call. The schema is kept in webapp/tests/utils/graphql-schema.json (not stored in Git) and is generated by running yarn run gen-graphql-schema in the graphql project. Webapp automatically calls this as part of yarn run test.
