Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change database tables schema for Clustering #3573

Closed
marciofoz opened this issue Apr 10, 2018 · 12 comments
Closed

Change database tables schema for Clustering #3573

marciofoz opened this issue Apr 10, 2018 · 12 comments
Assignees
Labels
Housekeeping:1:Todo Any dependency management or refactor that would be nice to have some day.

Comments

@marciofoz
Copy link

When running PKP/OJS in Percona XtraDB Cluster (MySql Cluster), all tables are required to have a primary key. Not all of the PKP/OJS tables contain a primary key.

Reference: https://dev.mysql.com/doc/refman/5.7/en/group-replication-requirements.html

"Primary Keys. Every table that is to be replicated by the group must have an explicit primary key defined. Primary keys play an extremely important role in determining which transactions conflict by identifying exactly which rows each transaction has modified."

We see this PKP/OJS tables that do not contain primary keys:

announcement_settings
announcement_type_settings
article_galley_settings
article_html_galley_images
article_search_object_keywords
article_settings
article_supp_file_settings
author_settings
books_for_review_settings
citation_settings
controlled_vocab_entry_settings
custom_issue_orders
custom_section_orders
data_object_tombstone_settings
email_log_users
email_templates_data
email_templates_default_data
event_log_settings
external_feed_settings
filter_settings
group_memberships
group_settings
issue_galley_settings
issue_settings
journal_settings
metadata_description_settings
metrics
notification_settings
oai_resumption_tokens
object_for_review_settings
plugin_settings
processes
referral_settings
review_form_element_settings
review_form_responses
review_form_settings
review_object_metadata_settings
review_object_type_settings
roles
scheduled_tasks
section_editors
section_settings
sessions
site
site_settings
static_page_settings
subscription_type_settings
usage_stats_temporary_records
user_interests
user_settings
versions

@asmecher
Copy link
Member

@marciofoz, this is a side-effect of the database toolset we use for schema management (ADODB), and our need to support both MySQL and PostgreSQL, which have their own peculiarities around schema management, indexing, etc. If you'll look at those tables you'll generally find a unique index with a name that has a _pkey suffix. This takes the place of explicitly defined primary keys in a way that ADODB and both databases can handle.

Our longer term plans are to replace ADODB with something else, but that's going to take a lot of work, and I'm not sure it'll permit us to explicitly define primary keys as you require.

Meanwhile, you can likely cook up some DDL SQL that'll add the explicit primary key definitions where they're needed by looking for the _pkey-suffixed indexes.

@marciofoz
Copy link
Author

@asmecher
Ok, I understand.
Let me know: changed the tables definitions, should the application address these new integrity constraints?
BR

@asmecher
Copy link
Member

@marciofoz, can you rephrase the question? I'm not sure I follow.

NateWr added a commit to NateWr/pkp-lib that referenced this issue Jan 29, 2019
bozana pushed a commit to bozana/pkp-lib that referenced this issue Jan 31, 2019
bozana pushed a commit to bozana/pkp-lib that referenced this issue Feb 7, 2019
bozana pushed a commit to bozana/pkp-lib that referenced this issue Feb 7, 2019
bozana pushed a commit to bozana/pkp-lib that referenced this issue Feb 13, 2019
bozana pushed a commit to bozana/pkp-lib that referenced this issue Feb 20, 2019
bozana pushed a commit to bozana/pkp-lib that referenced this issue Feb 20, 2019
@mpbraendle
Copy link
Contributor

mpbraendle commented Nov 25, 2020

I can second this requirement - MariaDB also offers a cluster version which requires that all tables have a primary key in the InnoDB scheme (that should be standard anyway).

Tables that don't have a primary key can be found with the following SQL statement:

SELECT t.TABLE_SCHEMA, t.TABLE_NAME
FROM information_schema.TABLES AS t
LEFT JOIN information_schema.KEY_COLUMN_USAGE AS c 
ON t.TABLE_SCHEMA = c.CONSTRAINT_SCHEMA
    AND t.TABLE_NAME = c.TABLE_NAME
    AND c.CONSTRAINT_NAME = 'PRIMARY'
WHERE t.TABLE_SCHEMA != 'information_schema'
    AND t.TABLE_SCHEMA != 'performance_schema'
    AND t.TABLE_SCHEMA != 'mysql'
    AND c.CONSTRAINT_NAME IS NULL;

yielding the following list in our OJS 3.2.1-2 test database:

+--------------+------------------------------------------+
| TABLE_SCHEMA | TABLE_NAME |
+--------------+------------------------------------------+
| ojs3test | filter_settings |
| ojs3test | author_settings |
| ojs3test | scheduled_tasks |
| ojs3test | user_interests |
| ojs3test | site |
| ojs3test | review_round_files |
| ojs3test | external_feed_settings |
| ojs3test | plugin_settings |
| ojs3test | static_page_settings |
| ojs3test | submission_settings |
| ojs3test | books_for_review_settings |
| ojs3test | item_views |
| ojs3test | group_settings |
| ojs3test | user_group_settings |
| ojs3test | email_log_users |
| ojs3test | submission_file_settings |
| ojs3test | query_participants |
| ojs3test | sessions |
| ojs3test | section_settings |
| ojs3test | review_files |
| ojs3test | user_user_groups |
| ojs3test | issue_settings |
| ojs3test | group_memberships |
| ojs3test | event_log_settings |
| ojs3test | genre_settings |
| ojs3test | journal_settings |
| ojs3test | site_settings |
| ojs3test | processes |
| ojs3test | custom_issue_orders |
| ojs3test | custom_section_orders |
| ojs3test | review_form_settings |
| ojs3test | user_group_stage |
| ojs3test | library_file_settings |
| ojs3test | metrics |
| ojs3test | object_for_review_settings |
| ojs3test | roles |
| ojs3test | navigation_menu_item_settings |
| ojs3test | issue_galley_settings |
| ojs3test | notification_settings |
| ojs3test | review_form_responses |
| ojs3test | referral_settings |
| ojs3test | citation_settings |
| ojs3test | category_settings |
| ojs3test | review_object_metadata_settings |
| ojs3test | review_object_type_settings |
| ojs3test | usage_stats_temporary_records |
| ojs3test | versions |
| ojs3test | metadata_description_settings |
| ojs3test | announcement_settings |
| ojs3test | review_form_element_settings |
| ojs3test | publication_galley_settings |
| ojs3test | subscription_type_settings |
| ojs3test | email_templates_settings |
| ojs3test | subeditor_submission_group |
| ojs3test | oai_resumption_tokens |
| ojs3test | announcement_type_settings |
| ojs3test | controlled_vocab_entry_settings |
| ojs3test | user_settings |
| ojs3test | publication_settings |
| ojs3test | publication_categories |
| ojs3test | email_templates_default_data |
| ojs3test | submission_search_object_keywords |
| ojs3test | data_object_tombstone_settings |
| ojs3test | navigation_menu_item_assignment_settings |
+--------------+------------------------------------------+

@asmecher
Copy link
Member

#2493 has been closed, and we're now using Laravel's migrations toolset for schema management and upgrades. One of the deeper-rooted compound primary key tables (submission_files) has been resolved to have a proper ID with #6057. It's probably a good time to dust this issue off again.

Adapting the above query (thanks, @mpbraendle) for a specific schema name (see the last line):

SELECT t.TABLE_NAME
FROM information_schema.TABLES AS t
LEFT JOIN information_schema.KEY_COLUMN_USAGE AS c
ON t.TABLE_SCHEMA = c.CONSTRAINT_SCHEMA
AND t.TABLE_NAME = c.TABLE_NAME
AND c.CONSTRAINT_NAME = 'PRIMARY'
WHERE c.CONSTRAINT_NAME IS NULL
AND t.table_schema='ojs-master';

This currently gives the following tables, sorted into categories:

  • Settings tables
    • issue_galley_settings
    • navigation_menu_item_settings
    • submission_file_settings
    • site_settings
    • publication_settings
    • metadata_description_settings
    • category_settings
    • review_form_settings
    • event_log_settings
    • author_settings
    • subscription_type_settings
    • announcement_settings
    • notification_settings
    • submission_settings
    • user_group_settings
    • journal_settings
    • publication_galley_settings
    • announcement_type_settings
    • controlled_vocab_entry_settings
    • library_file_settings
    • filter_settings
    • citation_settings
    • static_page_settings
    • user_settings
    • navigation_menu_item_assignment_settings
    • email_templates_settings
    • user_interests
    • data_object_tombstone_settings
    • issue_settings
    • plugin_settings
    • genre_settings
    • review_form_element_settings
    • section_settings
  • Relationship tables:
    • subeditor_submission_group
    • publication_categories
    • user_user_groups
    • query_participants
    • review_round_files
    • email_log_users
    • review_files
    • submission_search_object_keywords
    • user_group_stage
  • Sequencing/ordering:
    • custom_section_orders
    • custom_issue_orders
  • Other tables:
    • email_templates_default_data
    • item_views
    • scheduled_tasks
    • usage_stats_temporary_records
    • sessions
    • metrics
    • review_form_responses
    • site (this is a singleton and should be removed at some point)
    • versions
    • oai_resumption_tokens

Some of the tables already do have primary keys, just not autoincrements:

  • sessions
  • scheduled_tasks
  • oai_resumption_tokens

@mpbraendle, can you verify that these last three tables are indeed a problem?

@mpbraendle
Copy link
Contributor

These three still occur in my prod (3.1.2-4) and test (3.2.1-2) environments with the query above. We would not be able to put the database to a MariaDB cluster because a test script would fail. However, the database was originally migrated from a OJS 2.4.8 system.

@asmecher
Copy link
Member

asmecher commented Dec 1, 2020

Hmm, @mpbraendle, I'm trying to pin down MySQL's expectations. On my system, I get the following for DESCRIBE sessions:

+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| session_id | varchar(128) | NO   | PRI | NULL    |       |
| user_id    | bigint(20)   | YES  | MUL | NULL    |       |
| ip_address | varchar(39)  | NO   |     | NULL    |       |
| user_agent | varchar(255) | YES  |     | NULL    |       |
| created    | bigint(20)   | NO   |     | 0       |       |
| last_used  | bigint(20)   | NO   |     | 0       |       |
| remember   | tinyint(4)   | NO   |     | 0       |       |
| data       | text         | YES  |     | NULL    |       |
| domain     | varchar(255) | YES  |     | NULL    |       |
+------------+--------------+------+-----+---------+-------+

I would expect the presence of a PRI under the Key column to satisfy the clustering requirement for that table; does yours look the same?

@mpbraendle
Copy link
Contributor

Checked again with

show full columns from sessions;
+------------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| Field      | Type         | Collation       | Null | Key | Default | Extra | Privileges                      | Comment |
+------------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| session_id | varchar(128) | utf8_general_ci | NO   | PRI | NULL    |       | select,insert,update,references |         |
| user_id    | bigint(20)   | NULL            | YES  | MUL | NULL    |       | select,insert,update,references |         |
| ip_address | varchar(39)  | utf8_general_ci | NO   |     | NULL    |       | select,insert,update,references |         |
| user_agent | varchar(255) | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
| created    | bigint(20)   | NULL            | NO   |     | 0       |       | select,insert,update,references |         |
| last_used  | bigint(20)   | NULL            | NO   |     | 0       |       | select,insert,update,references |         |
| remember   | tinyint(4)   | NULL            | NO   |     | 0       |       | select,insert,update,references |         |
| data       | text         | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
| domain     | varchar(255) | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+------------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+

So the primary key is there, but no auto_increment (in column Extra). As I see from https://mariadb.com/kb/en/mariadb-galera-cluster-known-limitations/ , the latter is not a requirement.

@NateWr NateWr added the Housekeeping:1:Todo Any dependency management or refactor that would be nice to have some day. label Dec 16, 2021
@asmecher asmecher added this to the 3.4.0rc1 Internal Testing milestone Mar 21, 2023
@asmecher
Copy link
Member

I'm going to bump this into 3.4.0 in order to fix an OOM problem with the locale migration and large databases. We can't use chunkById on tables without primary keys, and this would help solve it.

@asmecher asmecher self-assigned this Mar 21, 2023
@asmecher
Copy link
Member

asmecher commented Mar 21, 2023

TODO:

  • Make sure ID columns get added to plugin schema creation migrations
  • Add IDs to table creation for installation

asmecher added a commit to asmecher/pkp-lib that referenced this issue Mar 21, 2023
asmecher added a commit to pkp/staticPages that referenced this issue Mar 21, 2023
asmecher added a commit to asmecher/funding that referenced this issue Mar 21, 2023
asmecher added a commit to asmecher/pkp-lib that referenced this issue Mar 21, 2023
asmecher added a commit to asmecher/ojs that referenced this issue Mar 21, 2023
asmecher added a commit to asmecher/ojs that referenced this issue Mar 21, 2023
asmecher added a commit to asmecher/pkp-lib that referenced this issue Mar 22, 2023
asmecher added a commit to asmecher/pkp-lib that referenced this issue Mar 22, 2023
asmecher added a commit to pkp/ops that referenced this issue Mar 25, 2023
asmecher added a commit to pkp/ops that referenced this issue Mar 25, 2023
asmecher added a commit to asmecher/omp that referenced this issue Mar 25, 2023
asmecher added a commit to asmecher/omp that referenced this issue Mar 25, 2023
asmecher added a commit to pkp/omp that referenced this issue Mar 25, 2023
asmecher added a commit to asmecher/ojs that referenced this issue Mar 25, 2023
asmecher added a commit to asmecher/ojs that referenced this issue Mar 25, 2023
asmecher added a commit to asmecher/ojs that referenced this issue Mar 25, 2023
asmecher added a commit to asmecher/ojs that referenced this issue Mar 25, 2023
asmecher added a commit to asmecher/ojs that referenced this issue Mar 25, 2023
asmecher added a commit to asmecher/ojs that referenced this issue Mar 25, 2023
asmecher added a commit to asmecher/ojs that referenced this issue Mar 25, 2023
asmecher added a commit to pkp/ojs that referenced this issue Mar 25, 2023
asmecher added a commit to pkp/ojs that referenced this issue Mar 25, 2023
asmecher added a commit to pkp/ojs that referenced this issue Mar 25, 2023
asmecher added a commit to pkp/ojs that referenced this issue Mar 25, 2023
asmecher added a commit to pkp/ojs that referenced this issue Mar 25, 2023
asmecher added a commit to pkp/ojs that referenced this issue Mar 25, 2023
asmecher added a commit to pkp/omp that referenced this issue Mar 25, 2023
asmecher added a commit to pkp/ops that referenced this issue Mar 25, 2023
ajnyga added a commit to ajnyga/funding that referenced this issue Mar 26, 2023
pkp/pkp-lib#3573 Add ID columns for clustering support
@asmecher
Copy link
Member

Merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Housekeeping:1:Todo Any dependency management or refactor that would be nice to have some day.
Projects
Status: Done
Development

No branches or pull requests

5 participants