
Unexpected number of sessions regarding nrcpt and avg values #604

Closed
FredNass opened this issue Apr 26, 2019 · 9 comments · Fixed by #779

@FredNass

FredNass commented Apr 26, 2019

Version

6.2.32

Installation method

From the sympa-ja.org repository

Expected behavior

We have a list of 6152 members spread across 44 domains: 72 members belong to 43 different domains, and the remaining 6080 belong to a single domain. We set nrcpt=1000 and avg=100 globally (not in nrcpt_by_domain.conf).

We would expect Sympa to create 50 sessions with recipients grouped by destination domain: 7 sessions carrying the 6080 recipients of the single large domain, and 43 sessions, one for each of the other domains.
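The expected count comes from grouping by domain and capping each packet at nrcpt: ceil(6080 / 1000) = 7 packets for the large domain, plus one packet per small domain. A small Python check of that arithmetic follows. The function name is made up for illustration, and because the issue only gives the 72-member total for the 43 small domains, the per-domain split in the example is an assumption (any split gives the same result, since no small domain can exceed 1000 members).

```python
import math
from typing import Dict


def expected_sessions(members_by_domain: Dict[str, int], nrcpt: int) -> int:
    """Sessions needed if every packet holds a single domain and at most nrcpt recipients."""
    return sum(math.ceil(n / nrcpt) for n in members_by_domain.values())


if __name__ == "__main__":
    # Hypothetical split: the issue only says 72 members over 43 small domains,
    # so assume 42 domains with 1 member each and 1 domain with 30 members.
    counts = {f"small{i}.example": 1 for i in range(42)}
    counts["small42.example"] = 30
    counts["big.example"] = 6080
    print(expected_sessions(counts, nrcpt=1000))  # 50 = 7 + 43
```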

Actual behavior

Sympa creates 44 sessions with between 100 and 280 recipients each, and does not respect the global nrcpt=1000.

Additional information

Setting nrcpt in nrcpt_by_domain.conf has no effect.

The major consequence of this issue is that a single message turns into many more messages than necessary, which reduces deduplication efficiency on the same mail store. Our mail stores receive and store 44 different messages, with only 20 to 30 mailboxes referencing each one, when they should have received and stored only 7 messages with 1000 mailboxes referencing each one.

Please tell us how we can help troubleshoot this issue.

Regards,
Frédéric.

ikedas added the question label May 7, 2019
@FredNass
Author

Hi ikedas. I'm not sure whether you need more information from us. If so, please tell me. Best, Frédéric.

@ikedas
Member

ikedas commented May 29, 2019

Sorry for the delayed response.

Reading the code, I found that recipients are sorted by their email addresses, not by their domains. Recipients are then added to a packet one at a time. When

  • the packet contains at least as many recipients as the nrcpt parameter specifies, or
  • the packet contains at least as many recipients of one domain as the nrcpt_by_domain.conf file specifies for that domain (if the file specifies a number), or
  • the packet contains more recipients than the avg parameter specifies and the recipient just added has a different domain from the previously added recipient,

the packet is saved in the spool and a new packet is started.

That's why packets do not always contain the number of recipients specified by nrcpt.
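The three rules can be sketched in Python as follows. This is one reading of the description above, not Sympa's actual Perl code: the function names are hypothetical, the rules are checked just before the next recipient is added, and the exact comparisons (>= for nrcpt and for the per-domain limit, > for avg) are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional


def domain_of(address: str) -> str:
    """Lower-cased domain part of an e-mail address."""
    return address.rsplit("@", 1)[1].lower()


def make_packets(recipients: List[str], nrcpt: int, avg: int,
                 nrcpt_by_domain: Optional[Dict[str, int]] = None) -> List[List[str]]:
    """Split recipients into packets using the three closing rules (hypothetical sketch).

    Recipients are sorted by full address, not by domain, as described.
    """
    limits = nrcpt_by_domain or {}
    packets: List[List[str]] = []
    packet: List[str] = []
    per_domain: Counter = Counter()

    for rcpt in sorted(recipients):
        dom = domain_of(rcpt)
        if packet:
            # Rule 1: the packet already holds nrcpt recipients.
            full = len(packet) >= nrcpt
            # Rule 2: some domain has reached its nrcpt_by_domain.conf limit.
            domain_full = any(per_domain[d] >= n for d, n in limits.items())
            # Rule 3: more than avg recipients and the domain changes here.
            domain_switch = len(packet) > avg and dom != domain_of(packet[-1])
            if full or domain_full or domain_switch:
                packets.append(packet)
                packet, per_domain = [], Counter()
        packet.append(rcpt)
        per_domain[dom] += 1

    if packet:
        packets.append(packet)
    return packets


if __name__ == "__main__":
    rcpts = ["a@b.example", "b@a.example", "c@b.example",
             "d@a.example", "e@b.example", "f@a.example"]
    # Sorted by address, the domains alternate, so with avg=2 the packets are
    # cut at 3 recipients even though nrcpt=10 would allow all 6 in one packet.
    print(make_packets(rcpts, nrcpt=10, avg=2))
```

Under this reading, when a few addresses from other domains are scattered among thousands of addresses of one domain, a packet is cut at the first domain change after avg recipients, long before nrcpt is reached. That would be consistent with the packets of 100 to 280 recipients reported above.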

@FredNass
Author

FredNass commented Jun 5, 2019

Hi Soji,

Thank you for looking at the code! It seems to me that sorting email addresses by domain before filling packets would reduce the number of packets and Postfix tasks, and thus improve storage efficiency. Do you see any issues with changing the sort order? Can you provide a quick hack, or point me to the line of code I should change to have email addresses sorted by domain?
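One way to get the grouping requested here is to sort with a (domain, address) key instead of the plain address. The Python sketch below is hypothetical and is not taken from Sympa's code or from the patch that was later merged; the helper names are made up. It only compares how many runs of consecutive same-domain recipients each ordering produces, which matters because the avg rule can only cut a packet where the domain changes.

```python
from itertools import groupby
from typing import List, Tuple


def domain_of(address: str) -> str:
    """Lower-cased domain part of an e-mail address."""
    return address.rsplit("@", 1)[1].lower()


def by_domain(address: str) -> Tuple[str, str]:
    """Hypothetical sort key: group by domain first, then order by address."""
    return (domain_of(address), address.lower())


def domain_runs(addresses: List[str]) -> int:
    """Count maximal runs of consecutive addresses that share a domain.

    Each boundary between runs is a place where the avg rule may close a packet.
    """
    return sum(1 for _ in groupby(addresses, key=domain_of))


if __name__ == "__main__":
    rcpts = ["a@b.example", "b@a.example", "c@b.example",
             "d@a.example", "e@b.example", "f@a.example"]
    print(domain_runs(sorted(rcpts)))                 # 6: b, a, b, a, b, a
    print(domain_runs(sorted(rcpts, key=by_domain)))  # 2: a.example, then b.example
```

With the domain-first key, each domain forms a single run, so packets would be closed by nrcpt (or nrcpt_by_domain.conf) within a domain and by avg only at the boundary between two domains.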

Best,
Frédéric.

@FredNass
Author

Hello,

I'm not sure what the above reference has to do with this issue. @ikedas, is there a chance we can make some progress on this? Please tell me what I should try. It seems wrong not to sort by domain first, as this leads to too many packets and too many emails for the same destination domain, reducing deduplication efficiency and increasing the disk space used by a factor of 20 to 30 for a few thousand recipients.

Regards,
Frédéric.

@racke
Contributor

racke commented Sep 10, 2019

I also think it would be more logical to group by domain, as this would allow more optimizations and special-case handling for specific domains.

@FredNass
Author

This morning, a staff member sent a 2.48 MB email with a picture to sell... a trash can! This resulted in 45 packets, and even with deduplication in place, this led to 45 packets x 2.48 MB x 8 stores = 892.8 MB of data written!
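The figure above is a straight product of packet count, message size and number of stores. A tiny Python version of that calculation (the function name is hypothetical) makes it easy to re-run for other messages; like the comment, it assumes every packet is written once to each store.

```python
def data_written_mb(packets: int, message_mb: float, stores: int) -> float:
    """Total megabytes written if each packet is stored once on every store."""
    return packets * message_mb * stores


if __name__ == "__main__":
    print(round(data_written_mb(45, 2.48, 8), 1))  # 892.8
```

Since the total is linear in the packet count, any reduction in the number of packets per domain reduces the data written by the same factor.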

Can someone please help with this issue? It is about having Sympa sort recipients by destination domain before creating packets.

ikedas added the enhancement and ready ("A PR is waiting to be merged. Close to be solved") labels Oct 25, 2019
@ikedas
Member

ikedas commented Oct 25, 2019

Hi @FredNass,

Could you please check this patch?

@FredNass
Author

Hi @ikedas,

Thanks a lot for providing this patch. We'll try it when my colleague is back on Monday and we'll get back to you.

@FredNass
Author

Hi @ikedas,
After a couple of weeks in production, we confirm that this patch restores the intended purpose of nrcpt. You may want to double-check that it does not break the purpose of avg, but as far as we can tell, everything is now fine.
Regards,
Frédéric.

ikedas added a commit that referenced this issue Nov 13, 2019
Make storage into outgoing spool more efficient (#604)
ikedas added this to the 6.2.50 milestone Nov 13, 2019
ikedas removed the ready and question labels Nov 13, 2019

3 participants