-
-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fqdn() to always include suffix if private suffix enabled and private suffix exists #300
fqdn() to always include suffix if private suffix enabled and private suffix exists #300
Conversation
Is the point of this PR mainly to fix the # entire URL is a public suffix
>>> tldextract.extract("gov.au", include_psl_private_domains=True)
ExtractResult(subdomain='', domain='', suffix='gov.au') # not domain='gov', suffix='au'?
# entire URL is a private suffix
>>> tldextract.extract("blogspot.com", include_psl_private_domains=True)
ExtractResult(subdomain='', domain='blogspot', suffix='com') Also, I don't know how useful it is to check if an entire URL was a suffix, but you could check that before this PR. After this PR, I don't think you can. >>> result = tldextract.extract("blogspot.com", include_psl_private_domains=True)
>>> is_suffix = result.suffix and not (result.subdomain or result.domain) |
This can be done with an opt-in flag (not yet implemented as of now) to enable the new behaviour.
If the caller doesn't set this flag to True, the old behaviour applies. If you find this approach feasible, what would be an appropriate name to replace |
What desirable new behavior would the flag let users opt into? Is the flag just a bugfix for #178? If so, breaking changes, new configuration options, and something the caller has to do all seem like a chainsaw, for what calls for a knife. I'm afraid to maintain 2 different string splitting experiences or more options to the already too many to construct a |
Yes, this flag is just a bugfix for #178. If an opt-in flag is used, none of the existing code out there should be affected, and hence it would not be a breaking change. However I do agree that this complicates the string splitting algorithm; having to handle 2 different cases. After re-reading the issue, I realised your solution of using the is_private property is simpler and there won't be a need ti opt-in, but it would change the output of FQDN for the private domains. |
You got it. That approach is what I'm interested in. #178 is a legit bug, so I'll keep it open for now. The FQDN of private domains was never intended to be blank, so changing that output is desirable. |
This PR changed a lot since my last comment! I think it's much closer to the solution I was seeking in my last comment. Nice! My only remaining concern is the semi-breaking change of adding another field to this project's I wonder if there's a better way? Do we add an attribute to the namedtuple, instead of extending its fields? Do we bump the breaking version number of the project? Do we move away from namedtuple? I don't want to make this PR's fix go on much longer. Just noodling what compatibility I'm comfortable with. 🤔 |
I've played around with this, and it doesn't feel viable.
I still haven't investigated what the stdlib is doing differently in e.g.
Doing this for every field added to the namedtuple seems like a lot. I'd just steer people away from iterating over all fields, unless they really know what they're doing. All this to say, I'm leaning toward merging this as is, releasing, and seeing how niche the compatibility issues are. |
I had some code using Glad this package is getting improved and am okay with the breaking change, I don't know if I would have caught it, but if this had kicked the project up to 4.0.0 (to help highlight the breaking change). I might have checked the release notes more carefully. I realize not everyone uses that convention, and that this is not such a deeply breaking change to possibly warrant it. But throwing it out there nonetheless. As it was, I just updated to Thanks for any and all efforts to maintain and further extend this great package. |
@banagale thank you for weighing in! Per #305, 3.6.0 is yanked and republished as 4.0.0. Your tuple slicing code will work in 3.x and 4.0.0. However, I've also published 5.0.0, which moves away from a tuple return type entirely. In 5.0.0, you will need to directly reference the fields you're interested in. ext = tldextract.extract(url)
'.'.join((ext.domain, ext.suffix)) |
Cool. Makes good sense. |
https://build.opensuse.org/request/show/1119465 by user mia + anag+factory - Update to 5.0.1: Bugfixes: * Indicate MD5 not used in a security context (FIPS compliance) #gh/john-kurkowski/tldextract#309 Misc.: * Increase typecheck aggression - Changes in 5.0.0: Breaking Changes: * Migrate `ExtractResult` from `namedtuple` to `dataclass` #gh/john-kurkowski/tldextract#306 Bugfixes: * Drop support for EOL Python 3.7 - Changes in 4.0.0: Breaking Bugfixes: * Always include suffix if private suffix enabled and private suffix exists #gh/john-kurkowski/tldextract#300 - Changes in 3.5.0: Features: * Support IPv6 addresses #gh/john-kurkowski/tldextract#298 Bugfixes: * Accept only 4 decimal octet IPv4 addresses #gh/john-kurkowski/tldextract#292 * Support IPv4 addresses with unicode dots * Reject IPv4 addresses with trailing whitespace
Closes #178
Changes
Misc
This can be a step towards fixing Semantics of
registered_domain
property for private domains #138. Would need a consensus on whether registered_domain should strictly refer to domain + public suffix regardless of whether private suffixes are enabled.Is there a better way to expose is_private than putting it in the existing NamedTuple?
The CLI parser behaviour remains unchanged; it only prints out the str attributes of the NamedTuple.