Bump com.github.crawler-commons:crawler-commons from 1.3 to 1.4 #15

@dependabot dependabot bot commented on behalf of github Jul 26, 2024

Bumps com.github.crawler-commons:crawler-commons from 1.3 to 1.4.

Release notes

Sourced from com.github.crawler-commons:crawler-commons's releases.

crawler-commons-1.4

Important Changes

  • Java 11 is now required to run or build crawler-commons
  • the robots.txt parser (SimpleRobotRulesParser) is now compliant with RFC 9309
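
As a rough illustration of what RFC 9309 compliance means for rule matching, here is a minimal, self-contained sketch of the "most specific match wins" evaluation. It is not the crawler-commons implementation: the class `RobotsRuleSketch` and its `Rule` and `isAllowed` members are hypothetical names chosen for this example, and the sketch assumes plain path prefixes without the `*` and `$` wildcards.

```java
import java.util.List;

// Hypothetical sketch, not crawler-commons code: simplified RFC 9309 rule matching.
public class RobotsRuleSketch {

    /** One allow/disallow line; an empty path (e.g. "Disallow:") matches nothing. */
    static final class Rule {
        final boolean allow;
        final String path;

        Rule(boolean allow, String path) {
            this.allow = allow;
            this.path = path;
        }
    }

    /**
     * Returns true if urlPath may be fetched: the longest matching path prefix
     * wins, Allow wins ties, and a URL matched by no rule is allowed.
     */
    static boolean isAllowed(List<Rule> rules, String urlPath) {
        int bestLength = -1;
        boolean allowed = true;
        for (Rule rule : rules) {
            if (rule.path.isEmpty() || !urlPath.startsWith(rule.path)) {
                continue; // empty or non-matching rules have no effect
            }
            int length = rule.path.length();
            if (length > bestLength || (length == bestLength && rule.allow)) {
                bestLength = length;
                allowed = rule.allow;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule(false, "/private/"),
                new Rule(true, "/private/public/"),
                new Rule(false, "")); // empty Disallow: ignored
        System.out.println(isAllowed(rules, "/private/data.html"));         // false
        System.out.println(isAllowed(rules, "/private/public/index.html")); // true
        System.out.println(isAllowed(rules, "/index.html"));                // true
    }
}
```

The empty `Disallow` rule in the example is skipped rather than resetting the other rules, which mirrors the behaviour described for #422 and #424 in the list of changes.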

Full List of Changes

  • [Robots.txt] Implement Robots Exclusion Protocol (REP) IETF Draft: port unit tests (sebastian-nagel, Richard Zowalla) #245, #360
  • [Robots.txt] Close groups of rules as defined in RFC 9309 (kkrugler, garyillyes, jnioche, sebastian-nagel) #114, #390, #430
  • [Robots.txt] Empty disallow statement not to clear other rules (sebastian-nagel, jnioche) #422, #424
  • [Robots.txt] SimpleRobotRulesParser main() to follow five redirects (sebastian-nagel, jnioche) #428
  • [Robots.txt] Add more spelling variants and typos of robots.txt directives (sebastian-nagel, jnioche) #425
  • [Robots.txt] Document effect of rules merging in combination with multiple agent names (sebastian-nagel, Richard Zowalla) #423, #426
  • [Robots.txt] Pass empty collection of agent names to select rules for any robot (wildcard user-agent name) (sebastian-nagel, Richard Zowalla) #427
  • [Robots.txt] Rename default user-agent / robot name in unit tests (sebastian-nagel, Richard Zowalla) #429
  • [Robots.txt] Add unit tests based on examples in RFC 9309 (sebastian-nagel, Richard Zowalla) #420
  • [BasicNormalizer] Query parameters normalization in BasicURLNormalizer (aecio, sebastian-nagel, Richard Zowalla) #308, #421
  • [Robots.txt] Deduplicate robots rules before matching (sebastian-nagel, jnioche) #416
  • [Robots.txt] SimpleRobotRulesParser main to use the new API method (sebastian-nagel, jnioche) #413
  • Generate JaCoCo reports when testing (jnioche) #409, #412
  • Push Code Coverage to Coveralls (Richard Zowalla, jnioche) #414
  • [Robots.txt] Path analysis bug with URL decoding if allow/disallow path contains escaped wildcard characters (tkalistratov, sebastian-nagel, Richard Zowalla) #195, #408
  • [Robots.txt] Handle allow/disallow directives containing unescaped Unicode characters (sebastian-nagel, Richard Zowalla, aecio) #389, #401
  • [Robots.txt] Improve readability of robots.txt unit tests (sebastian-nagel, Richard Zowalla) #383
  • Upgrade project to use Java 11 (Avi Hayun, Richard Zowalla, aecio, sebastian-nagel) #320, #376
  • [Robots.txt] RFC compliance: matching user-agent names when selecting rule blocks (sebastian-nagel, Richard Zowalla) #362
  • [Robots.txt] Matching user-agent names does not conform to robots.txt RFC (YossiTamari, sebastian-nagel) #192
  • [Robots.txt] Improve robots check draft rfc compliance (Eduardo Jimenez) #351
  • Upgrade dependencies (dependabot) #379, #384, #394, #399, #404, #419
  • Upgrade Maven plugins (dependabot) #377, #381, #386, #396, #397, #398, #400, #402, #403, #405, #406, #407, #415, #418
  • Javadoc: ensure Javascript search is working (sebastian-nagel, Richard Zowalla, aecio) #378, #380
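
Several entries above concern how rule groups are selected by user-agent name (#192, #362, #423, #426, #427). The following is a minimal sketch of that selection under stated assumptions, not the library's code: `AgentGroupSketch` and `selectRules` are hypothetical names, groups are assumed to be keyed by lower-cased product token, and rule lines are kept as plain strings. Names are compared case-insensitively, rules of all matching groups are merged, and the `*` group is used when nothing matches or when the collection of agent names is empty.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not crawler-commons code: selecting rule groups by agent name.
public class AgentGroupSketch {

    /**
     * groups maps a lower-cased user-agent token (or "*") to its rule lines.
     * Rules of every group matching one of agentNames are merged; if no group
     * matches, or agentNames is empty, the wildcard group is returned.
     */
    static List<String> selectRules(Map<String, List<String>> groups,
                                    Collection<String> agentNames) {
        List<String> selected = new ArrayList<>();
        boolean matched = false;
        for (String name : agentNames) {
            List<String> rules = groups.get(name.toLowerCase(Locale.ROOT));
            if (rules != null) {
                matched = true;
                selected.addAll(rules); // merge rules of all matching groups
            }
        }
        if (!matched) {
            selected.addAll(groups.getOrDefault("*", List.of()));
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = Map.of(
                "*", List.of("Disallow: /"),
                "examplebot", List.of("Disallow: /tmp/"),
                "otherbot", List.of("Allow: /tmp/ok/"));
        System.out.println(selectRules(groups, List.of("ExampleBot")));
        // [Disallow: /tmp/]
        System.out.println(selectRules(groups, List.of()));
        // [Disallow: /]
        System.out.println(selectRules(groups, List.of("ExampleBot", "OtherBot")));
        // [Disallow: /tmp/, Allow: /tmp/ok/]
    }
}
```

Tracking a separate `matched` flag, rather than checking whether the merged list is empty, keeps a matching group with no rules from falling through to the wildcard group.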

Changelog

Sourced from com.github.crawler-commons:crawler-commons's changelog.

Crawler-Commons Change Log

Current Development 1.5-SNAPSHOT (yyyy-mm-dd)

  • [Sitemaps] Improve logging done by SiteMapParser (Valery Yatsynovich, sebastian-nagel) #457
  • [Sitemaps] Google Sitemap PageMap extensions (josepowera, sebastian-nagel, Richard Zowalla, jnioche) #388, #442
  • [Domains] Installation of a gzip-compressed public suffix list from Maven cache breaks EffectiveTldFinder (sebastian-nagel, Richard Zowalla) #441, #443
  • Upgrade dependencies (dependabot) #437, #444, #448, #451
  • Upgrade Maven plugins (dependabot) #434, #438, #439, #449, #445, #452, #455

Release 1.4 (2023-07-13)

  • Same entries as listed under "Full List of Changes" in the release notes for crawler-commons-1.4.

Release 1.3 (2022-07-19)

  • [Sitemaps] Disable support for DTDs in XML sitemaps and feeds by default (Kenneth Wong) #371
  • Migrate Continuous Integration from Travis to GitHub Actions (Valery Yatsynovich) #333
  • Upgrade dependencies (dependabot, Richard Zowalla) #334, #339, #345, #346, #347, #350, #354, #361, #369
  • Upgrade Maven plugins (dependabot, Richard Zowalla, sebastian-nagel) #328, #329, #330, #331, #335, #336, #337, #338, #340, #341, #343, #356, #363, #364, #366, #373, #374
  • Update pom.xml to address Maven warnings and deprecations (sebastian-nagel, Richard Zowalla, Avi Hayun) #342
  • Enable Dependabot (Valery Yatsynovich) #327
  • Remove test dependency on mockito-core (Richard Zowalla) #367
  • Drop provided dependency on servlet-api (Richard Zowalla) #368

Release 1.2 (2021-10-06)

  • [Sitemaps] Avoid calling java.net.URL::equals in equals method of sitemaps and extensions (sebastian-nagel) #322
  • [URLs] Provide a builder class to configure the URL normalizer (aecio) #321, #324
  • [URLs] Make normalization of IDNs configurable (to ASCII or Unicode) via builder (aecio, sebastian-nagel) #324
  • [Sitemaps] Fix XXE vulnerability in Sitemap parser (kovyrin) #323

... (truncated)

Commits
  • ce9cf46 Update CHANGES.txt for release of crawler-commons 1.4
  • 2b8717d [maven-release-plugin] prepare release crawler-commons-1.4
  • a62bd80 Updates changelog for #376, #380, #401, #414, #425, #428, #422/#424, #114/#39...
  • 6fb34cf Implement Robots Exclusion Protocol (REP) IETF Draft: port unit tests (#360)
  • 871e4e6 Merge pull request #430 from sebastian-nagel/cc-390-114-robots-closing-rule-g...
  • d685baf [Robots.txt] SimpleRobotRulesParser main() to follow five redirects (#428)
  • de7221d [Robots.txt] Empty disallow statement not to clear other rules, fixes #422 (#...
  • 7ae8617 [Robots.txt] Add more spelling variants and typos of robots.txt directives (#...
  • e672994 [Robots.txt] Clarify behavior when to close blocks of multiple user-agents
  • 17e8544 [Robots.txt] Clarify behavior when to close blocks of multiple user-agents
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot dependabot bot added the dependencies (Pull requests that update a dependency file) and java (Pull requests that update Java code) labels Jul 26, 2024
@dependabot dependabot bot force-pushed the dependabot/maven/com.github.crawler-commons-crawler-commons-1.4 branch from 8c11fbf to 67e14aa July 26, 2024 22:38
Bumps [com.github.crawler-commons:crawler-commons](https://github.com/crawler-commons/crawler-commons) from 1.3 to 1.4.
- [Release notes](https://github.com/crawler-commons/crawler-commons/releases)
- [Changelog](https://github.com/crawler-commons/crawler-commons/blob/master/CHANGES.txt)
- [Commits](crawler-commons/crawler-commons@crawler-commons-1.3...crawler-commons-1.4)

---
updated-dependencies:
- dependency-name: com.github.crawler-commons:crawler-commons
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot force-pushed the dependabot/maven/com.github.crawler-commons-crawler-commons-1.4 branch from 67e14aa to 5121874 July 26, 2024 22:38
@valfirst valfirst merged commit e574d4b into master Jul 26, 2024
1 check passed
@valfirst valfirst deleted the dependabot/maven/com.github.crawler-commons-crawler-commons-1.4 branch July 26, 2024 22:55