Commit

Merge branch 'master' into equals-hashcode-improve
Isira-Seneviratne authored Jul 23, 2024
2 parents bd812a9 + dcf190c commit 77932b5
Showing 50 changed files with 1,896 additions and 282 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/build.yml
@@ -20,10 +20,10 @@ jobs:
uses: actions/checkout@v4

- name: Set up JDK ${{ matrix.java }}
uses: actions/setup-java@v3
uses: actions/setup-java@v4
with:
java-version: ${{ matrix.java }}
distribution: 'temurin'
distribution: 'zulu'
cache: 'maven'

- name: Maven Compile
2 changes: 1 addition & 1 deletion .github/workflows/codeql.yml
@@ -16,7 +16,7 @@ jobs:
- name: Checkout
uses: actions/checkout@v4
- name: Set up JDK
uses: actions/setup-java@v3
uses: actions/setup-java@v4
with:
java-version: 17
distribution: 'temurin'
55 changes: 54 additions & 1 deletion CHANGES.md
@@ -1,15 +1,68 @@
# jsoup Changelog

## 1.18.1 (Pending)
## 1.18.2 (Pending)

### Improvements
* The form-associated elements returned by `FormElement.elements()` now reflect changes made to the DOM after the
original parse. [2140](https://github.com/jhy/jsoup/issues/2140)

### Bug Fixes

* `Element.cssSelector()` would fail if the element's class contained a `*`
character. [2169](https://github.com/jhy/jsoup/issues/2169)
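The `cssSelector()` fix above, and the related 2146 fix listed under 1.18.1, both come down to escaping characters that have meaning in CSS syntax (such as `*`, `(` and `[`) when a class name is written into a selector. The following is a minimal JDK-only sketch of that idea, not jsoup's implementation: the `CssEscape` helper is hypothetical, and its rule (backslash-escape anything outside `[A-Za-z0-9_-]`) is a simplification that ignores CSS details such as identifiers starting with a digit.

```java
// Hypothetical helper, not jsoup's API: builds a class selector fragment,
// backslash-escaping characters such as '*', '(' and '[' that would otherwise
// be parsed as CSS selector syntax.
final class CssEscape {
    private CssEscape() {}

    /** Escapes every character outside [A-Za-z0-9_-] with a leading backslash. */
    static String escapeIdent(String ident) {
        StringBuilder sb = new StringBuilder(ident.length() + 8);
        for (int i = 0; i < ident.length(); i++) {
            char c = ident.charAt(i);
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '_' || c == '-';
            if (!safe) sb.append('\\'); // escape the special character
            sb.append(c);
        }
        return sb.toString();
    }

    /** Joins class names into a compound class selector, e.g. ".a.b". */
    static String classSelector(String... classNames) {
        StringBuilder sb = new StringBuilder();
        for (String name : classNames) sb.append('.').append(escapeIdent(name));
        return sb.toString();
    }
}
```

For example, `CssEscape.classSelector("a*b")` yields `.a\*b`, which a selector parser reads as one class name rather than as a universal selector.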

## 1.18.1 (2024-Jul-10)

### Improvements

* **Stream Parser**: A `StreamParser` provides a progressive parse of its input. As each `Element` is completed, it is
emitted via a `Stream` or `Iterator` interface. Elements returned will be complete with all their children, and an
(empty) next sibling, if applicable. Elements (or their children) may be removed from the DOM during the parse,
e.g. to conserve memory. This makes it possible to parse an input document that would otherwise be too large to fit
into memory, while still offering a DOM interface to the document and its elements. Additionally, the parser provides
`selectFirst(String query)` / `selectNext(String query)` methods, which run the parser until a match is found, at which
point the parse is suspended. It can be resumed via another `select()` call, or via the `stream()` or `iterator()`
methods. [2096](https://github.com/jhy/jsoup/pull/2096)
* **Download Progress**: added a Response Progress event interface, which reports progress as URLs are downloaded (and
parsed). Supported at both the session and the single-connection
level. [2164](https://github.com/jhy/jsoup/pull/2164), [656](https://github.com/jhy/jsoup/issues/656)
* Added `Path` accepting parse methods: `Jsoup.parse(Path)`, `Jsoup.parse(path, charsetName, baseUri, parser)`,
etc. [2055](https://github.com/jhy/jsoup/pull/2055)
* Updated the `button` tag configuration to include a space between multiple button elements in the `Element.text()`
method. [2105](https://github.com/jhy/jsoup/issues/2105)
* Added support for the `ns|*` selector, which matches all elements in a namespace. [1811](https://github.com/jhy/jsoup/issues/1811)
* When normalising attribute names during serialization, invalid characters are now replaced with `_` instead of being
stripped. This should make the process clearer, and generally prevent an invalid attribute name from being coerced
unexpectedly. [2143](https://github.com/jhy/jsoup/issues/2143)
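To illustrate the replace-rather-than-strip change for attribute names, here is a small JDK-only sketch. The set of invalid characters follows the WHATWG rule for HTML attribute names (controls, space, `"`, `'`, `>`, `/` and `=`; noncharacters are skipped for brevity) and is an assumption: jsoup's own pattern may differ. `AttrNames` is a hypothetical helper.

```java
// Hypothetical sketch, not jsoup's implementation: replace (rather than strip)
// characters that are not allowed in an HTML attribute name.
final class AttrNames {
    private AttrNames() {}

    /** True for C0 controls, DEL, C1 controls, and the syntax characters listed above. */
    static boolean isInvalid(char c) {
        if (c <= 0x1F || (c >= 0x7F && c <= 0x9F)) return true;
        return c == ' ' || c == '"' || c == '\'' || c == '>' || c == '/' || c == '=';
    }

    /** Replaces each invalid character with '_', one for one, so the name keeps its length. */
    static String normalize(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            sb.append(isInvalid(c) ? '_' : c);
        }
        return sb.toString();
    }
}
```

Under stripping, `data=x` would have been serialized as `datax`; replacing yields `data_x`, which keeps the coercion visible.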

### Changes

* Removed previously deprecated internal classes and methods. [2094](https://github.com/jhy/jsoup/pull/2094)
* Build change: the built jar's OSGi manifest no longer imports itself. [2158](https://github.com/jhy/jsoup/issues/2158)

### Bug Fixes

* When tracking source positions, if the first node was a TextNode, its position was incorrectly set
to `-1`. [2106](https://github.com/jhy/jsoup/issues/2106)
* When connecting (or redirecting) to URLs with characters such as `{`, `}` in the path, a Malformed URL exception would
be thrown (if in development), or the URL might otherwise not be escaped correctly (if in
production). The URL encoding process has been improved to handle these characters
correctly. [2142](https://github.com/jhy/jsoup/issues/2142)
* When using `W3CDom` with a custom output Document, a Null Pointer Exception would be
thrown. [2114](https://github.com/jhy/jsoup/pull/2114)
* The `:has()` selector did not match correctly when using sibling combinators
(e.g. `h1:has(+h2)`). [2137](https://github.com/jhy/jsoup/issues/2137)
* The `:empty` selector incorrectly matched elements that started with a blank text node and were followed by
non-empty nodes, due to an incorrect short-circuit. [2130](https://github.com/jhy/jsoup/issues/2130)
* `Element.cssSelector()` would fail with "Did not find balanced marker" when building a selector for elements that had
a `(` or `[` in their class names, and selectors with those characters escaped would not match as
expected. [2146](https://github.com/jhy/jsoup/issues/2146)
* Updated `Entities.escape(string)` to make the escaped text suitable for both text nodes and attributes (previously it
was only suitable for text nodes). This does not impact the output of `Element.html()`, which correctly applies a
minimal escape depending on whether the output is text data or a quoted
attribute value. [1278](https://github.com/jhy/jsoup/issues/1278)
* Fuzz: a Stack Overflow exception could occur in the normalizing regex when resolving a crafted `<base href>` URL.
[2165](https://github.com/jhy/jsoup/issues/2165)
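The `{`/`}` URL fix (2142) is about percent-encoding characters that are illegal in a URL path. One JDK-only way to do that, shown here as a sketch rather than jsoup's exact code, is the multi-argument `java.net.URI` constructor, which quotes illegal characters. Note that these constructors always quote `%`, so an already-encoded path would need decoding first. `UrlEscape.encodePath` is a hypothetical helper.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: rebuilds a URL so that characters illegal in the path
// (such as '{', '}' and space) are percent-encoded.
final class UrlEscape {
    private UrlEscape() {}

    static String encodePath(String scheme, String host, String rawPath) {
        try {
            // The 7-argument constructor quotes illegal path characters; port -1 means "no port".
            URI uri = new URI(scheme, null, host, -1, rawPath, null, null);
            // toASCIIString() also encodes any non-ASCII characters as UTF-8 escapes.
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URL for path: " + rawPath, e);
        }
    }
}
```

For example, `UrlEscape.encodePath("https", "example.com", "/a/{b}")` returns `https://example.com/a/%7Bb%7D`.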

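The `:empty` fix (2130) describes an incorrect short-circuit: the check decided on a leading blank text node without looking at the nodes after it. The self-contained sketch below contrasts that shape with a loop that skips blank text and keeps going. It uses a hypothetical, simplified node model with only text and element children; it is not jsoup's `Node` hierarchy or its actual evaluator code.

```java
import java.util.List;

// Hypothetical, simplified node model for illustration; not jsoup's classes.
final class EmptyCheck {
    private EmptyCheck() {}

    static abstract class Child {}

    static final class Text extends Child {
        final String value;
        Text(String value) { this.value = value; }
        boolean isBlank() { return value.trim().isEmpty(); }
    }

    static final class Elem extends Child {
        final String tag;
        Elem(String tag) { this.tag = tag; }
    }

    /** Buggy shape: decides on the first child alone, so a leading blank text node wins. */
    static boolean isEmptyBuggy(List<Child> children) {
        for (Child c : children) {
            return c instanceof Text && ((Text) c).isBlank();
        }
        return true;
    }

    /** Fixed shape: blank text is skipped; any other child makes the element non-empty. */
    static boolean isEmpty(List<Child> children) {
        for (Child c : children) {
            if (c instanceof Text && ((Text) c).isBlank()) continue;
            return false; // non-blank text or an element child
        }
        return true;
    }
}
```

With children `[blank text, <b>]`, the buggy loop reports empty while the fixed loop does not.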
---

2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
The MIT License

Copyright (c) 2009-2023 Jonathan Hedley <https://jsoup.org/>
Copyright (c) 2009-2024 Jonathan Hedley <https://jsoup.org/>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
48 changes: 18 additions & 30 deletions pom.xml
@@ -5,7 +5,7 @@

<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.18.1-SNAPSHOT</version><!-- remember to update previous version below for japicmp -->
<version>1.18.2-SNAPSHOT</version><!-- remember to update previous version below for japicmp -->
<url>https://jsoup.org/</url>
<description>jsoup is a Java library that simplifies working with real-world HTML and XML. It offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM API methods, CSS, and xpath selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers.</description>
<inceptionYear>2009</inceptionYear>
@@ -33,7 +33,7 @@

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<jetty.version>9.4.53.v20231009</jetty.version>
<jetty.version>9.4.55.v20240627</jetty.version>
</properties>

<build>
@@ -42,7 +42,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.12.1</version>
<version>3.13.0</version>
<configuration>
<encoding>UTF-8</encoding>
<compilerArgs>
@@ -68,7 +68,7 @@
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>animal-sniffer-maven-plugin</artifactId>
<version>1.23</version>
<version>1.24</version>
<executions>
<execution>
<id>animal-sniffer</id>
@@ -117,7 +117,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>3.6.3</version>
<version>3.7.0</version>
<configuration>
<doclint>none</doclint>
<source>8</source>
@@ -135,7 +135,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>3.3.0</version>
<version>3.3.1</version>
<configuration>
<excludes>
<exclude>org/jsoup/examples/**</exclude>
@@ -153,7 +153,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>3.3.0</version>
<version>3.4.2</version>
<configuration>
<archive>
<manifest>
@@ -186,7 +186,7 @@
<instructions>
<Bundle-DocURL>https://jsoup.org/</Bundle-DocURL>
<Export-Package>org.jsoup.*</Export-Package>
<Import-Package>org.jspecify.annotations;version=!;resolution:=optional,*</Import-Package>
<Import-Package>!org.jsoup.*,org.jspecify.annotations;version=!;resolution:=optional,*</Import-Package>
</instructions>
</configuration>
</plugin>
@@ -197,20 +197,20 @@
</plugin>
<plugin>
<artifactId>maven-release-plugin</artifactId>
<version>3.0.1</version>
<version>3.1.1</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.2.3</version>
<version>3.3.1</version>
<configuration>
<!-- smaller stack to find stack overflows -->
<argLine>-Xss256k</argLine>
<!-- smaller stack to find stack overflows. Was 256, but Zulu on MacOS ARM needs >= 640 -->
<argLine>-Xss640k</argLine>
</configuration>
</plugin>
<plugin>
<artifactId>maven-failsafe-plugin</artifactId>
<version>3.2.3</version>
<version>3.3.1</version>
<executions>
<execution>
<goals>
@@ -228,14 +228,14 @@
<!-- API version compat check - https://siom79.github.io/japicmp/ -->
<groupId>com.github.siom79.japicmp</groupId>
<artifactId>japicmp-maven-plugin</artifactId>
<version>0.18.3</version>
<version>0.21.2</version>
<configuration>
<!-- hard code previous version; can't detect when running stateless on build server -->
<oldVersion>
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.17.1</version>
<version>1.17.2</version>
<type>jar</type>
</dependency>
</oldVersion>
@@ -260,18 +260,6 @@
<binaryCompatible>true</binaryCompatible>
<sourceCompatible>true</sourceCompatible>
</overrideCompatibilityChangeParameter>

<!--
One off, getting a spurious ping on adding [<T extends Node> Stream<T> nodeStream(Class<T> class)] to Node.
Manually verified binary & source compatibility
todo: remove after 1.17.1 release
-->
<overrideCompatibilityChangeParameter>
<compatibilityChange>CLASS_GENERIC_TEMPLATE_CHANGED</compatibilityChange>
<binaryCompatible>true</binaryCompatible>
<sourceCompatible>true</sourceCompatible>
</overrideCompatibilityChangeParameter>

</overrideCompatibilityChangeParameters>
</parameter>
</configuration>
@@ -383,7 +371,7 @@
<plugins>
<plugin>
<artifactId>maven-failsafe-plugin</artifactId>
<version>3.2.3</version>
<version>3.3.1</version>
<executions>
<execution>
<goals>
@@ -404,15 +392,15 @@
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<version>5.10.1</version>
<version>5.10.3</version>
<scope>test</scope>
</dependency>

<dependency>
<!-- gson, to fetch entities from w3.org -->
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.10.1</version>
<version>2.11.0</version>
<scope>test</scope>
</dependency>

34 changes: 33 additions & 1 deletion src/main/java/org/jsoup/Connection.java
@@ -3,6 +3,7 @@
import org.jsoup.helper.RequestAuthenticator;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;
import org.jspecify.annotations.Nullable;

import javax.net.ssl.SSLSocketFactory;
@@ -46,7 +47,17 @@ public interface Connection {
* GET and POST http methods.
*/
enum Method {
GET(false), POST(true), PUT(true), DELETE(true), PATCH(true), HEAD(false), OPTIONS(false), TRACE(false);
GET(false),
POST(true),
PUT(true),
DELETE(true),
/**
Note that unfortunately, PATCH is not supported in many JDKs.
*/
PATCH(true),
HEAD(false),
OPTIONS(false),
TRACE(false);

private final boolean hasBody;

@@ -465,6 +476,18 @@ default Connection auth(@Nullable RequestAuthenticator authenticator) {
*/
Connection response(Response response);

/**
Set the response progress handler, which will be called periodically as the response body is downloaded. Since
documents are parsed as they are downloaded, this is also a good proxy for the parse progress.
<p>The Response object is supplied as the progress context, and may be read from to obtain headers etc.</p>
@param handler the progress handler
@return this Connection, for chaining
@since 1.18.1
*/
default Connection onResponseProgress(Progress<Response> handler) {
throw new UnsupportedOperationException();
}

/**
* Common methods for Requests and Responses
* @param <T> Type of Base, either Request or Response
@@ -883,6 +906,15 @@ <p>Other body methods (like bufferUp, body, parse, etc) will generally not work
@return the response body input stream
*/
BufferedInputStream bodyStream();

/**
Returns a {@link StreamParser} that will parse the Response progressively.
* @return a StreamParser, prepared to parse this response.
* @throws IOException if an IO exception occurs preparing the parser.
*/
default StreamParser streamParser() throws IOException {
throw new UnsupportedOperationException();
}
}

/**
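The new `onResponseProgress(Progress<Response>)` hook takes a `Progress` callback, which (per `Progress.java` in this commit) has the single method `onProgress(int processed, int total, float percent, context)` and runs on the downloading thread. Because the handler is called periodically, it can be worth throttling it. The sketch below mirrors that interface locally so it compiles with the JDK alone; `PercentLogger` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Local mirror of the Progress interface added in this commit, so the sketch
// compiles without jsoup; with jsoup, implement org.jsoup.Progress instead.
@FunctionalInterface
interface Progress<C> {
    void onProgress(int processed, int total, float percent, C context);
}

// Hypothetical handler: records a line only when the whole-number percentage
// changes, keeping work on the downloading thread small. With an unknown total
// (percent stays 0 until done), only the first and final calls are recorded.
final class PercentLogger<C> implements Progress<C> {
    private int lastReported = -1;
    final List<String> lines = new ArrayList<>();

    @Override
    public void onProgress(int processed, int total, float percent, C context) {
        int whole = (int) percent;
        if (whole == lastReported) return; // nothing new worth reporting
        lastReported = whole;
        lines.add(total > 0
                ? whole + "% (" + processed + "/" + total + " bytes)"
                : processed + " bytes");
    }
}
```

With jsoup on the classpath, a handler implementing `org.jsoup.Progress<Connection.Response>` would be attached via `Jsoup.connect(url).onResponseProgress(handler)`.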
17 changes: 17 additions & 0 deletions src/main/java/org/jsoup/Progress.java
@@ -0,0 +1,17 @@
package org.jsoup;

@FunctionalInterface

public interface Progress<ProgressContext> {
/**
Called to report progress. Note that this will be executed by the same thread that is doing the work, so either
don't take too long, or hand it off to another thread.
@param processed the number of bytes processed so far.
@param total the total number of expected bytes, or -1 if unknown.
@param percent the percentage of completion, 0.0..100.0. If the expected total is unknown, the percentage will remain
at zero until complete.
@param context the object that progress was made on.
@since 1.18.1
*/
void onProgress(int processed, int total, float percent, ProgressContext context);
}
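To make the `Progress` contract concrete (bytes processed so far, the total or -1, and a percentage that stays at zero until completion when the total is unknown), here is a JDK-only sketch of an `InputStream` wrapper that reports through such a callback. The `ProgressInputStream` class and its reporting points are assumptions for illustration, not jsoup's internal implementation; the interface is mirrored locally so the block compiles without jsoup.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Local mirror of org.jsoup.Progress, so the sketch compiles without jsoup.
@FunctionalInterface
interface Progress<C> {
    void onProgress(int processed, int total, float percent, C context);
}

// Hypothetical wrapper: counts bytes as they are read and reports after each read.
// skip() is not counted in this sketch.
final class ProgressInputStream<C> extends FilterInputStream {
    private final int total;            // expected byte count, or -1 if unknown
    private final Progress<C> handler;
    private final C context;
    private int processed;

    ProgressInputStream(InputStream in, int total, Progress<C> handler, C context) {
        super(in);
        this.total = total;
        this.handler = handler;
        this.context = context;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        report(b == -1 ? 0 : 1, b == -1);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        report(Math.max(n, 0), n == -1);
        return n;
    }

    private void report(int delta, boolean eof) {
        processed += delta;
        float percent;
        if (eof) percent = 100f;                                 // complete
        else if (total > 0) percent = processed * 100f / total;  // known total
        else percent = 0f;                                       // unknown total: stay at zero
        handler.onProgress(processed, total, percent, context);
    }
}
```

Reading a 10-byte stream with an unknown total in 4-byte chunks reports 0% three times, then 100% at end of stream.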