Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDF CaMeL cASe consistency issues #368

Closed
lintool opened this issue Oct 26, 2019 · 6 comments
Closed

UDF CaMeL cASe consistency issues #368

lintool opened this issue Oct 26, 2019 · 6 comments
Labels

Comments

@lintool
Copy link
Member

lintool commented Oct 26, 2019

In terms of Scala RDD UDFs, we have:

RemoveHTML(r.getContentString)

And:

RemoveHTML(RemoveHttpHeader(r.getContentString))

I can't think of a case when you'd want clean text but want to keep the HTTP headers... so RemoveHTML should just call RemoveHttpHeader.

Also, we're mixing camel cases, so it should either be:

  1. RemoveHtml and RemoveHttpHeader
  2. RemoveHTML and RemoveHTTPHeader

Note the MiXEd mess we have now.

Option (1) is more conforming to Java practices, but then we have removePrefixWww, which just looks odd. Maybe we can rename to RemoveW3Prefix?

We also have ComputeMD5 and ComputeSHA1, so perhaps option (2) is better?

Thoughts?

@ruebot
Copy link
Member

ruebot commented Oct 26, 2019

Oh, there's more (UDFs, class, and object names)!

Might as well make a list so we have more info on whatever decision we make:

I might have missed some.


I can't think of a case when you'd want clean text but want to keep the HTTP headers... so RemoveHTML should just call RemoveHttpHeader.

👍

@lintool
Copy link
Member Author

lintool commented Oct 26, 2019

I think my vote is for RemoveHTML, RemoveHTTPHeader, etc.
Also, fewer things to change.

@ruebot
Copy link
Member

ruebot commented Oct 26, 2019

I'm confused.

So, if I'm I understanding you correctly, you're suggesting that we don't worry about the list I dropped in, which highlight other examples of what you originally raised, and just roll with inconsistency, or resolve just the two you highlighted?

@lintool
Copy link
Member Author

lintool commented Oct 26, 2019

What I meant was that RemoveHttpHeader seems to be the only UDF that's inconsistently cased. I think if we renamed RemoveHttpHeader to RemoveHTTPHeader, this issue resolves.

@greebie
Copy link
Contributor

greebie commented Oct 28, 2019

FYI - Did some quick research and found Scalastyle does not seem to support custom rule configuration (just on or off), so it might be good to include a short list of style rules in the README (I doubt there are many).

Google style guide wants the former style due to the XMLHTTPRequest readability problem. I don't think we have this problem for aut, and it would be preferable as a rule to put action before format anyway.

@lintool
Copy link
Member Author

lintool commented Nov 5, 2019

Video chat with @ruebot and @ianmilligan1 - conclusion: rename RemoveHttpHeader to RemoveHTTPHeader and all the UDFs will have consistent casing - this will close the issue.

ruebot pushed a commit that referenced this issue Nov 5, 2019
- Replace ExtractBaseDomain with ExtractDomain
- Closes #367
- Address bug in ArcTest; RemoveHTML -> RemoveHttpHeader
- Closes #369
- Wraps RemoveHttpHeader and RemoveHTML for use in data frames.
- Partially addresses #238
- Updates tests where necessary
- Punts on #368 UDF CaMeL cASe consistency issues
@ruebot ruebot closed this as completed in 25ca5a9 Nov 7, 2019
ruebot pushed a commit to archivesunleashed/aut-docs that referenced this issue Nov 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants