Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Taxonomies populated during migration cannot contain term names longer than 255 characters #1017

Open
mjordan opened this issue Feb 4, 2019 · 13 comments
Labels
Subject: Migration Concerning migration from Islandora 7 to Islandora 2.x.x
Milestone

Comments

@mjordan
Copy link
Contributor

mjordan commented Feb 4, 2019

Taxonomy term names cannot exceed 255 characters (see name field below):

mysql> describe taxonomy_term_field_data;
+------------------------------+------------------+------+-----+---------+-------+
| Field                        | Type             | Null | Key | Default | Extra |
+------------------------------+------------------+------+-----+---------+-------+
| tid                          | int(10) unsigned | NO   | PRI | NULL    |       |
| vid                          | varchar(32)      | NO   | MUL | NULL    |       |
| langcode                     | varchar(12)      | NO   | PRI | NULL    |       |
| status                       | tinyint(4)       | NO   | MUL | NULL    |       |
| name                         | varchar(255)     | NO   | MUL | NULL    |       |
| description__value           | longtext         | YES  |     | NULL    |       |
| description__format          | varchar(255)     | YES  |     | NULL    |       |
| weight                       | int(11)          | NO   |     | NULL    |       |
| changed                      | int(11)          | YES  |     | NULL    |       |
| default_langcode             | tinyint(4)       | NO   |     | NULL    |       |
| content_translation_source   | varchar(12)      | YES  |     | NULL    |       |
| content_translation_outdated | tinyint(4)       | YES  |     | NULL    |       |
| content_translation_uid      | int(10) unsigned | YES  | MUL | NULL    |       |
| content_translation_created  | int(11)          | YES  |     | NULL    |       |
+------------------------------+------------------+------+-----+---------+-------+
14 rows in set (0.00 sec)

The GUI form that is used to create terms probably enforces this constraint, but dynamically populated terms like those produced by the corporate, subject, geographic, and person migrations that are part of https://github.com/Islandora-Devops/migrate_7x_claw risk failing if the term "name" is longer than 255 characters. Here's an example of what happens:

 [error]  Drupal\Core\Database\DatabaseExceptionWrapper: SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'name' at row 1: INSERT INTO {taxonomy_term_field_data} (tid, vid, langcode, status, name, description__value, description__format, weight, changed, default_langcode, content_translation_source, content_translation_outdated, content_translation_uid, content_translation_created) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8, :db_insert_placeholder_9, :db_insert_placeholder_10, :db_insert_placeholder_11, :db_insert_placeholder_12, :db_insert_placeholder_13); Array
(
    [:db_insert_placeholder_0] => 157
    [:db_insert_placeholder_1] => subject
    [:db_insert_placeholder_2] => en
    [:db_insert_placeholder_3] => 1
    [:db_insert_placeholder_4] => nd CCICED Policy Recommendations Impact  environmnetal, social and economic development. It also discusses the key policies that have had an impact in China's changes, especially with respect to recommendations such as environmental instruments and economic tools to control pollution, governance systems and legislation, including: ecological civilization system construction and Institutional innovation
    [:db_insert_placeholder_5] => 
    [:db_insert_placeholder_6] => 
    [:db_insert_placeholder_7] => 0
    [:db_insert_placeholder_8] => 1549293162
    [:db_insert_placeholder_9] => 1
    [:db_insert_placeholder_10] => und
    [:db_insert_placeholder_11] => 0
    [:db_insert_placeholder_12] => 0
    [:db_insert_placeholder_13] => 1549293162
)
 in Drupal\Core\Entity\Sql\SqlContentEntityStorage->saveToSharedTables() (line 930 of /var/www/html/drupal/web/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php). 

You would hope that names for people, things, and places as expressed in MODS documents out in the wild are not longer than 255 characters, but they probably exist! Since we're populating term names from uncontolled text values in XML, we need a strategy for dealing with long term names.

@seth-shaw-unlv
Copy link
Contributor

I've used the title_length module for nodes with long titles. The module is relatively simple and could probably be adapted to extend the length of name fields.

Of course, you could still get really long values. If you are comfortable trimming the value to the max length, use the substr process plugin:

process:
  name:
    plugin: substr
    source: title // Or whatever your source field is
    start: 0
    length: 254

You can even log all of your name values...

process:
  name:
    -
      plugin: log
      source: title // Or whatever your source field is
    -
      plugin: substr
      start: 0
      length: 254

and come back after the fact to find any that were longer than 255 (and thus trimmed).

@mjordan
Copy link
Contributor Author

mjordan commented Feb 4, 2019

@seth-shaw-unlv excellent, thanks for the suggestions. I assume this problem might not manifest itself for most 7.x instances, since having long values like those is probably an edge case. I know we've discussed some workflows that allow people working on migrations to do some sort of audit of their metadata prior to running the migration. Stuff like using the substr plugin are a good way to prevent errors from occurring but it would be good to document the problem and ways of preventing it from happening in the first place. @rosiel, @rtilla1 and other MIG groups, what do you think?

@dannylamb
Copy link
Contributor

@mjordan Worst case scenario is we have field to hold the long text and then trim the actual name to 255. Sort of like how a node's body gets trimmed for the summary blurb. I wouldn't do it unless it were a serious issue, but at least there's a way for folks to keep their long titles and not upset the drupal gods.

@mjordan
Copy link
Contributor Author

mjordan commented Feb 4, 2019

@dannylamb are you suggesting that we add that extra field to the node (multiple copies of it), or can we add it to the term itself (one copy of it)?

@dannylamb
Copy link
Contributor

I'm saying stash it in an extra field on the term itself. Like field_long_name, and then we substr it to set name. All this would be done in one shot during the migration. There'd be more work if you want to auto-update it every time you change field_long_name though... So it would be more work if people are expecting that as a permanent feature. But nonetheless, at least you wouldn't lose your full title during the migration.

@mjordan
Copy link
Contributor Author

mjordan commented Feb 4, 2019

Might be worth considering. Let's let the MIG folks weigh in to see what they think.

@bryjbrown
Copy link
Member

I'm definitely in favor of having an extra field with more space in it for storing longer titles, but instead of using the standard required "title" field to be a truncated version, I'd suggest using that in a way that is the title of the node as opposed to the work that represents it, perhaps in the spirit of an ID or something.

This is what we are doing with our current 7.x submission process at FSU, "titles" are automatically generated using a pattern of "type_timestamp_hash", for instance "honorsthesis_1549124815_ab134724" and this title ends up getting reused as a unique ID in the resulting MODS record.

Having two different fields for the same title, with one being the "long title" might be confusing and appear as if its an alternate or variant title instead of the true title, while making the titles represent different things makes the difference clear.

@rosiel
Copy link
Member

rosiel commented Feb 11, 2019

This was discussed at the Feb 6 CLAW call. @alxp proposed (forgive me if my paraphrasing is wrong) that a new field such as field_full_title be created to hold the actual titles/names, and the built-in fields be used rarely-if-at-all due to that restriction. I think the justification for not hacking the core Drupal tables was to make future upgrades safer.

@whikloj whikloj added this to the 1.x milestone Apr 11, 2019
@mjordan
Copy link
Contributor Author

mjordan commented Jun 6, 2019

Just practicing some migrations using Islandora 8 1.0.0 and hit this again. I'm not sure how common it is in the wild to have corporate names, for example, that are longer than 255 but when it occurs, migrations stop dead in their tracks.

@seth-shaw-unlv
Copy link
Contributor

@mjordan using the trim method I mention above will keep the migration moving, however, there is something to be said for letting the migration die to emphasize you have an issue to address.

@mjordan
Copy link
Contributor Author

mjordan commented Jun 12, 2019

@seth-shaw-unlv I have to respectfully disagree with this. I do not want my migration of 100,00 objects to fail at object 99,999 because that one has an errant source metadata value. YYMV but I'd rather have that target object created but the problem logged. We could have it both ways too - a strict mode and a permissive mode.

@seth-shaw-unlv
Copy link
Contributor

@mjordan I don't think we necessarily disagree; I use the trim method myself because I certainly don't want my migrations failing part way through. I just leave open the possibility that someone may prefer the hard fail. Documentation of options and implications is key, as usual.

@mjordan
Copy link
Contributor Author

mjordan commented Jun 13, 2019

Gotcha. Different people will want to take different approaches.

My personal preference would be to use a tool like that to find potential problems so they can be addressed prior to a migration. Over in #1021 I attempted to develop a tool to allow for practice migrations. This worked for some fields types, but not for taxonomies since they are referenced and therefore must exist.

@kstapelfeldt kstapelfeldt added Subject: Migration Concerning migration from Islandora 7 to Islandora 2.x.x and removed migration labels Sep 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Subject: Migration Concerning migration from Islandora 7 to Islandora 2.x.x
Projects
Development

No branches or pull requests

7 participants