Add selecting elements by namespace (#1811) #1847

Closed
wants to merge 3 commits into from
21 changes: 21 additions & 0 deletions src/main/java/org/jsoup/select/Evaluator.java
@@ -80,6 +80,27 @@ public String toString() {
        }
    }

    /**
     * Evaluator for element namespace
     */
    public static final class Namespace extends Evaluator {
        private final String nameSpace;

        public Namespace(String nameSpace) {
            this.nameSpace = nameSpace;
        }

        @Override
        public boolean matches(Element root, Element element) {
            return (element.normalName().startsWith(nameSpace));
        }

        @Override
        public String toString() {
            return String.format("%s", nameSpace);
        }
    }

    /**
     * Evaluator for element id
     */
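The `Namespace` evaluator above depends on `byNamespace` having already turned `ns|` into `ns:`. If the trailing colon were ever missing, `startsWith` would also accept elements from a different namespace that merely shares the prefix (for example `nsx:p` for namespace `ns`). The following is a minimal standalone sketch, assuming normalized element names take the form `prefix:local` (as in the PR's test, where `<ns:p>` is reported as `ns|p`), that anchors the check on the colon. `NamespaceMatch` and `inNamespace` are hypothetical names, not jsoup API.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical standalone helper; not part of jsoup. */
public class NamespaceMatch {

    /**
     * True if normalName (e.g. "ns:p") belongs to namespace ns (e.g. "ns").
     * Appending ':' keeps "nsx:p" from matching namespace "ns".
     */
    static boolean inNamespace(String normalName, String ns) {
        return normalName.startsWith(ns + ":");
    }

    public static void main(String[] args) {
        List<String> names = List.of("ns:p", "ns:img", "nsx:p", "p");
        List<String> matched = names.stream()
                .filter(n -> inNamespace(n, "ns"))
                .collect(Collectors.toList());
        System.out.println(matched); // [ns:p, ns:img]
    }
}
```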
10 changes: 9 additions & 1 deletion src/main/java/org/jsoup/select/QueryParser.java
@@ -162,6 +162,8 @@ private void findElements() {
            byId();
        else if (tq.matchChomp("."))
            byClass();
        else if (tq.toString().endsWith("|*"))

Owner review comment: This is not the right way to do this. The TokenQueue must be consumed token by token. With this implementation, queries like ns|* div (or anything after the namespace wildcard selector) will fail. You will need an appropriate matcher in TokenQueue. I think you could update the current tag matcher.

            byNamespace();
        else if (tq.matchesWord() || tq.matches("*|"))
            byTag();
        else if (tq.matches("["))
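The owner's point about consuming the TokenQueue token by token can be illustrated with a small standalone sketch. `MiniTokenQueue` is a hypothetical, simplified stand-in, not jsoup's `TokenQueue`, and the character set it accepts in a type selector (letters, digits, `|`, `*`, `-`, `_`) is an assumption chosen for illustration. It consumes `ns|*` as one token and leaves ` div` for the combinator handling. By contrast, calling `endsWith("|*")` on the whole query returns false for `ns|* div`.

```java
/** Hypothetical, simplified stand-in for a token queue; not jsoup's TokenQueue. */
public class MiniTokenQueue {
    private final String input;
    private int pos = 0;

    MiniTokenQueue(String input) {
        this.input = input;
    }

    boolean isEmpty() {
        return pos >= input.length();
    }

    /** Assumed set of characters allowed in a type selector, covering "ns|", "*|" and "|*". */
    private static boolean isSelectorChar(char c) {
        return Character.isLetterOrDigit(c) || c == '|' || c == '*' || c == '-' || c == '_';
    }

    /** Consumes one element selector token such as "ns|*", stopping at the first other char. */
    String consumeElementSelector() {
        int start = pos;
        while (!isEmpty() && isSelectorChar(input.charAt(pos))) {
            pos++;
        }
        return input.substring(start, pos);
    }

    /** Whatever has not been consumed yet, left for combinator parsing. */
    String remainder() {
        return input.substring(pos);
    }

    public static void main(String[] args) {
        MiniTokenQueue tq = new MiniTokenQueue("ns|* div");
        System.out.println(tq.consumeElementSelector());   // ns|*
        System.out.println("[" + tq.remainder() + "]");    // [ div]
        // Inspecting the whole query string misses the namespace wildcard here:
        System.out.println("ns|* div".endsWith("|*"));     // false
    }
}
```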
@@ -262,6 +264,12 @@ private void byTag() {
        }
    }

    private void byNamespace() {

Owner review comment: I think I would just extend the byTag method to support this.

        String nameSpace = normalize(tq.consumeElementSelector());
        Validate.notEmpty(nameSpace);
        evals.add(new Evaluator.Namespace(nameSpace.replace("|", ":")));
    }

    private void byAttribute() {
        TokenQueue cq = new TokenQueue(tq.chompBalanced('[', ']')); // content queue
        String key = cq.consumeToAny(AttributeEvals); // eq, not, start, end, contain, match, (no val)
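Following the suggestion to extend `byTag` instead of adding a separate `byNamespace`, one way to picture this is a single dispatch over the consumed type-selector token that covers `ns|*`, `*|E`, `ns|E` and plain `E`. This is a hypothetical sketch: `TagDispatch`, `forToken` and `select` are illustrative names, it returns plain `Predicate<String>` values over normalized names such as `fb:name` rather than jsoup `Evaluator` objects, and it assumes namespaced element names use `:` where the selector uses `|`.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch (not jsoup code): one byTag-style dispatch for the
 * namespace selector forms, working on normalized names such as "fb:name".
 */
public class TagDispatch {

    /** Turns one consumed type-selector token into a predicate over element names. */
    static Predicate<String> forToken(String token) {
        String t = token.trim().toLowerCase(Locale.ROOT);
        if (t.equals("*") || t.equals("*|*")) {
            // universal selector: every element
            return name -> true;
        } else if (t.endsWith("|*")) {
            // ns|* : any element whose name starts with "ns:"
            String prefix = t.substring(0, t.length() - 2) + ":";
            return name -> name.startsWith(prefix);
        } else if (t.startsWith("*|")) {
            // *|E : "E" with or without a namespace prefix
            String local = t.substring(2);
            return name -> name.equals(local) || name.endsWith(":" + local);
        } else {
            // E or ns|E : exact match, with '|' written as ':' in element names
            String exact = t.replace('|', ':');
            return name -> name.equals(exact);
        }
    }

    /** Keeps the names matched by the token, preserving input order. */
    static List<String> select(List<String> names, String token) {
        return names.stream().filter(forToken(token)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Names based on the Selector documentation examples, plus a non-matching div
        List<String> names = List.of("fb:name", "name", "fb:school", "div");
        System.out.println(select(names, "*|name"));  // [fb:name, name]
        System.out.println(select(names, "fb|name")); // [fb:name]
        System.out.println(select(names, "fb|*"));    // [fb:name, fb:school]
    }
}
```

Keeping all forms in one method means the token is consumed once and the remaining query is left intact for combinators, which is the concern raised in the first review comment.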
@@ -312,7 +320,7 @@ private void indexGreaterThan() {
    private void indexEquals() {
        evals.add(new Evaluator.IndexEquals(consumeIndex()));
    }

    //pseudo selectors :first-child, :last-child, :nth-child, ...
    private static final Pattern NTH_AB = Pattern.compile("(([+-])?(\\d+)?)n(\\s*([+-])?\\s*\\d+)?", Pattern.CASE_INSENSITIVE);
    private static final Pattern NTH_B = Pattern.compile("([+-])?(\\d+)");
1 change: 1 addition & 0 deletions src/main/java/org/jsoup/select/Selector.java
@@ -27,6 +27,7 @@
* <tr><td><code>tag</code></td><td>elements with the given tag name</td><td><code>div</code></td></tr>
* <tr><td><code>*|E</code></td><td>elements of type E in any namespace (including non-namespaced)</td><td><code>*|name</code> finds <code>&lt;fb:name&gt;</code> and <code>&lt;name&gt;</code> elements</td></tr>
* <tr><td><code>ns|E</code></td><td>elements of type E in the namespace <i>ns</i></td><td><code>fb|name</code> finds <code>&lt;fb:name&gt;</code> elements</td></tr>
* <tr><td><code>ns|*</code></td><td>all elements in the namespace <i>ns</i></td><td><code>fb|*</code> finds <code>&lt;fb:name&gt;</code> and <code>&lt;fb:school&gt;</code> elements</td></tr>
* <tr><td><code>#id</code></td><td>elements with attribute ID of "id"</td><td><code>div#wrap</code>, <code>#logo</code></td></tr>
* <tr><td><code>.class</code></td><td>elements with a class name of "class"</td><td><code>div.left</code>, <code>.result</code></td></tr>
* <tr><td><code>[attr]</code></td><td>elements with an attribute named "attr" (with any value)</td><td><code>a[href]</code>, <code>[title]</code></td></tr>
10 changes: 10 additions & 0 deletions src/test/java/org/jsoup/nodes/ElementTest.java
@@ -1399,6 +1399,16 @@ public void testNamespacedElements() {
        assertEquals("html > body > fb|comments", els.get(0).cssSelector());
    }

    @Test
    public void testSelectByNamespace() {

Owner review comment: This should be in SelectorTest, and needs to be more detailed -- e.g. to catch the example I pointed out, and validate other variations work.

        String html = "<html><body><ns:p>p in namespace ns</ns:p><ns:img>img in namespace ns</ns:img></body></html>";
        Document doc = Jsoup.parse(html);
        Elements els = doc.select("ns|*");
        assertEquals(2, els.size());
        assertEquals("html > body > ns|p", els.get(0).cssSelector());
        assertEquals("html > body > ns|img", els.get(1).cssSelector());
    }

    @Test
    public void testChainedRemoveAttributes() {
        String html = "<a one two three four>Text</a>";