Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trending tags showing duplicates: group only by name #4266

Merged
merged 3 commits into from
Dec 12, 2018

Conversation

milaaraujo
Copy link
Collaborator

Fixes #3188

@ghost ghost assigned milaaraujo Dec 11, 2018
@ghost ghost added the in progress label Dec 11, 2018
@milaaraujo
Copy link
Collaborator Author

Hey @jywarren,
I think the problem here is the group([:name, 'node.nid', 'term_data.tid', 'community_tags.nid', 'community_tags.uid', 'community_tags.date']). There is this note "ONLY_FULL_GROUP_BY, issue #3120", but I didn't understand why we need to group by all these columns if all we need is the name.

Locally, running the query with the group([:name, 'node.nid', 'term_data.tid', 'community_tags.nid', 'community_tags.uid', 'community_tags.date']) we have duplicates:

MariaDB [publiclab_dev]>   SELECT  node.nid, node.created, node.status, term_data.*, community_tags.* FROM `term_data` 
    ->   INNER JOIN `community_tags` ON `community_tags`.`tid` = `term_data`.`tid` 
    ->   INNER JOIN `community_tags` `node_tags_term_data_join` ON `node_tags_term_data_join`.`tid` = `term_data`.`tid` 
    ->   INNER JOIN `node` ON `node`.`nid` = `node_tags_term_data_join`.`nid` 
    ->   WHERE (node.status = 1) AND (node.created > 1541922310) AND (node.created <= 1544514310) 
    ->   GROUP BY `term_data`.`name`, node.nid, term_data.tid, community_tags.nid, community_tags.uid, community_tags.date 
    ->   ORDER BY count DESC LIMIT 5;
+-----+------------+--------+-----+-----+-------------------+-------------+--------+-------+--------+-----+-----+-----+------------+
| nid | created    | status | tid | vid | name              | description | weight | count | parent | tid | nid | uid | date       |
+-----+------------+--------+-----+-----+-------------------+-------------+--------+-------+--------+-----+-----+-----+------------+
|  44 | 1544511014 |      1 |  73 |   3 | first-time-poster |             |      0 |     2 | NULL   |  73 |  44 |   4 | 1544511017 |
|  44 | 1544511014 |      1 |  73 |   3 | first-time-poster |             |      0 |     2 | NULL   |  73 |  45 |   4 | 1544511072 |
|  45 | 1544511072 |      1 |  73 |   3 | first-time-poster |             |      0 |     2 | NULL   |  73 |  44 |   4 | 1544511017 |
|  45 | 1544511072 |      1 |  73 |   3 | first-time-poster |             |      0 |     2 | NULL   |  73 |  45 |   4 | 1544511072 |
|   8 | 1544077622 |      1 |   2 |   0 | blog              |             |      0 |     1 | NULL   |   2 |   8 |   1 |          0 |
+-----+------------+--------+-----+-----+-------------------+-------------+--------+-------+--------+-----+-----+-----+------------+

Running only with .group([:name]) we don't have duplicates:

MariaDB [publiclab_dev]>   SELECT  node.nid, node.created, node.status, term_data.*, community_tags.* FROM `term_data` 
    ->   INNER JOIN `community_tags` ON `community_tags`.`tid` = `term_data`.`tid` 
    ->   INNER JOIN `community_tags` `node_tags_term_data_join` ON `node_tags_term_data_join`.`tid` = `term_data`.`tid` 
    ->   INNER JOIN `node` ON `node`.`nid` = `node_tags_term_data_join`.`nid` 
    ->   WHERE (node.status = 1) AND (node.created > 1541922310) AND (node.created <= 1544514310) 
    ->   GROUP BY `term_data`.`name`
    ->   ORDER BY count DESC LIMIT 5;
+-----+------------+--------+-----+-----+------------------------+-------------+--------+-------+--------+-----+-----+-----+------------+
| nid | created    | status | tid | vid | name                   | description | weight | count | parent | tid | nid | uid | date       |
+-----+------------+--------+-----+-----+------------------------+-------------+--------+-------+--------+-----+-----+-----+------------+
|  44 | 1544511014 |      1 |  73 |   3 | first-time-poster      |             |      0 |     2 | NULL   |  73 |  44 |   4 | 1544511017 |
|   8 | 1544077622 |      1 |   2 |   0 | blog                   |             |      0 |     1 | NULL   |   2 |   8 |   1 |          0 |
|   9 | 1544077623 |      1 |   4 |   0 | lon:1.44819439410508   | Desc 0      |      5 |     1 | NULL   |   4 |   9 |   1 |          0 |
|  10 | 1544077623 |      1 |   5 |   0 | lat:8.814665776391895  | Desc 1      |      5 |     1 | NULL   |   5 |  10 |   1 |          0 |
|  10 | 1544077623 |      1 |   6 |   0 | lon:17.365520433555464 | Desc 1      |      5 |     1 | NULL   |   6 |  10 |   1 |          0 |
+-----+------------+--------+-----+-----+------------------------+-------------+--------+-------+--------+-----+-----+-----+------------+

@plotsbot
Copy link
Collaborator

plotsbot commented Dec 11, 2018

1 Message
📖 @milaaraujo Thank you for your pull request! I’m here to help with some tips and recommendations. Please take a look at the list provided and help us review and accept your contribution! And don’t be discouraged if you see errors – we’re here to help.

Generated by 🚫 Danger

@milaaraujo
Copy link
Collaborator Author

desenho sem titulo 1

@SidharthBansal
Copy link
Member

Hi @milaaraujo can you please see #4263. Thanks

@milaaraujo
Copy link
Collaborator Author

milaaraujo commented Dec 11, 2018

@SidharthBansal, I will review the other PR later, if it is working we can merge it and I close this one.

@milaaraujo
Copy link
Collaborator Author

But @jywarren, can you explain why we are using group([:name, 'node.nid', 'term_data.tid', 'community_tags.nid', 'community_tags.uid', 'community_tags.date'])? I still want to understand what is happening here.

@jywarren
Copy link
Member

Yes, and thanks for being very thorough. It looks like @ValentinaTironi has implemented something different here -- a .pluck(:name) addition to the end of the ActiveRecord chain. I'm not sure what pluck() does, actually?

What's so weird about the bug that was resolved by the addition of group([:name, 'node.nid', 'term_data.tid', 'community_tags.nid', 'community_tags.uid', 'community_tags.date']) is that it wasn't reproducible in Travis, and wasn't showing up in tests. So I fixed and tested it locally and left the note of #3120 for future documentation.

Unfortunately this was to fix a recurring error which people had been experiencing in local copies (although not in the production site or Travis tests) -- notes in #3120 and it looks like it was related to a change in default config for MySQL -- 5.7 in particular -- see my comment here about ONLY_FULL_GROUP_BY.

Maybe we don't have to worry about this anymore if we're now on mariadb 10.2 (see versions here what do you think? do enough people use MySQL still that we should remain backwards-compatible, and/or is there another way to fix this that wouldn't cause trouble on MySQL 5.7 and subsequent versions?

@milaaraujo
Copy link
Collaborator Author

I also don't know what pluck() does... I will review her PR later.

However this query doesn't look right to me, I think it is selecting and grouping more columns that are really necessary... I think we just need:

    Tag.joins(:node_tag, :node)
       .where('node.status = ?', 1)
       .where('node.created > ?', start_date.to_i)
       .where('node.created <= ?', end_date.to_i)
       .group([:name])
       .order('count DESC')
       .limit(limit)

or maybe:

    Tag.select([:name])
          .joins(:node_tag, :node)
          .where('node.status = ?', 1)
         .where('node.created > ?', start_date.to_i)
         .where('node.created <= ?', end_date.to_i)
         .distinct
         .order('count DESC')
         .limit(limit)

I'm using mariadb so I don't know if we will have - MySQL - problems with these modifications. Do you have mysql installed?

@jywarren
Copy link
Member

Hmm. So i totally agree that we are including too much, but had added that in to get mysql compatibility. How about this. Let's:

  1. make a change here -- going back to the more minimal version without the extra grouping, but leaving in a link back to this PR discussion in case it breaks again.
  2. Then let's merge this,
  3. then let's actually open a PR where we change the database back to mysql (but version 5.7) to see if the error crops up there again. That'll require:

image: mysql:5.7

to be added to this line:

Let's do that as soon as this is merged. So if you could add the comments, i'll merge, and we can do the next steps. Thanks!!!

@milaaraujo
Copy link
Collaborator Author

milaaraujo commented Dec 12, 2018

Both solutions I suggested before are working. Using this one I think we won't have problems with MySQL because I'm not using group by :

  def self.trending(limit = 5, start_date = DateTime.now - 1.month, end_date = DateTime.now)
    Tag.select([:name])
       .joins(:node_tag, :node)
       .where('node.status = ?', 1)
       .where('node.created > ?', start_date.to_i)
       .where('node.created <= ?', end_date.to_i)
       .distinct
       .order('count DESC')
       .limit(limit)
  end

What the query really looks like:

SELECT  DISTINCT `term_data`.`name` FROM `term_data` 
INNER JOIN `community_tags` ON `community_tags`.`tid` = `term_data`.`tid` 
INNER JOIN `community_tags` `node_tags_term_data_join` ON `node_tags_term_data_join`.`tid` = `term_data`.`tid` 
INNER JOIN `node` ON `node`.`nid` = `node_tags_term_data_join`.`nid` 
WHERE (node.status = 1) AND (node.created > 1541995609) AND (node.created <= 1544587609) 
ORDER BY count DESC LIMIT 5;

Response:

+------------------------+
| name                   |
+------------------------+
| first-time-poster      |
| lon:21.714137445587394 |
| lon:45.76772749324683  |
| lat:69.21861314079344  |
| lat:68.83460642210348  |
+------------------------+

@vatbq
Copy link
Contributor

vatbq commented Dec 12, 2018

pluck does the same of map(&:name). I just did a fast solution but you both are doing a deeper solution and that is really great!
I'm agree with @milaaraujo I also think that the query has a lot of unnecessary things. Maybe the solution is refactor it after check what of queries are really necessary
@milaaraujo @jywarren

@vatbq
Copy link
Contributor

vatbq commented Dec 12, 2018

So, we are going to change of mysql version?

@milaaraujo
Copy link
Collaborator Author

I think we just need to test if this solution works with MySQL, because some people still use MySQL for development... Right, @jywarren?

I'm using mariadb. Prodution and Travis also use mariadb

@vatbq
Copy link
Contributor

vatbq commented Dec 12, 2018

Oh, I see. Which do you use? I thought that all use mysql :/

@jywarren
Copy link
Member

jywarren commented Dec 12, 2018 via email

@vatbq
Copy link
Contributor

vatbq commented Dec 12, 2018

Great! Please, let me know everything

@milaaraujo
Copy link
Collaborator Author

Yeah, I think that's it: open a second PR with your latest code but changing the MySQL version to 5.7 and see if it passes tests!

@jywarren, this one is ready to merge.

@jywarren jywarren merged commit bb48d31 into publiclab:master Dec 12, 2018
@ghost ghost removed the in progress label Dec 12, 2018
@jywarren
Copy link
Member

Fantastic, thanks all!

@vatbq
Copy link
Contributor

vatbq commented Dec 12, 2018

Can I push a commit in my actual pr with new mysql version?
Another question: How I can change the version of database? I see a lot of database files and I'm confused now. Maybe this one docker-compose-testing.yml?

@jywarren
Copy link
Member

jywarren commented Dec 12, 2018 via email

@milaaraujo
Copy link
Collaborator Author

I think it's just doing what Jeff explained in one of his last comments (#4266 (comment)):

"...then let's actually open a PR where we change the database back to mysql (but version 5.7) to see if the error crops up there again. That'll require..."

@vatbq
Copy link
Contributor

vatbq commented Dec 12, 2018

OK, I'll do that 👍

@vatbq
Copy link
Contributor

vatbq commented Dec 12, 2018

I'm gonna close my PR #4263

SrinandanPai pushed a commit to SrinandanPai/plots2 that referenced this pull request May 5, 2019
* group by name only

* removing group by

* adding link
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants