Remove some strings #29

NBibikov · 2015-07-11T21:56:04Z

Hi! Please help me. I read the docs but don't understand how to remove some strings. I have some HTML strings in which one part varies (aHirg7S8Zu0):

<p><img src="//img.youtube.com/vi/aHirg7S8Zu0/0.jpg" height="505" width="640"></p>
<p>&nbsp;</p>
<h2 style="text-align: center;">Dear parents, I want say you...</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sit sane ista voluptas. Aliter autem vobis placet. Fortemne possumus dicere eundem illum Torquatum? Duo Reges: constructio interrete. Igitur neque stultorum quisquam beatus neque sapientium non beatus.<br>
<p>&nbsp;</p>

How can I delete the first line and every &nbsp; paragraph (such as the second line)?

1. <p><img src="//img.youtube.com/vi/aHirg7S8Zu0/0.jpg" height="505" width="640"></p> 
2. <p>&nbsp;</p>

Thank you very much

nolanw · 2015-07-13T12:30:57Z

There are fundamentally two ways to go about it: focus on the content to keep; or discard unwanted content. I'm not sure which one makes more sense in the context you've given, so I'll describe both.

If you choose to focus on the content to keep, it looks like you're interested in the header and the paragraph thereafter. So you could do something like:

HTMLDocument *document = /* load a document */;
HTMLElement *h2 = [document firstNodeMatchingSelector:@"h2"];
HTMLElement *relevantParagraph = [document firstNodeMatchingSelector:@"h2 + p"];
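
As a minimal sketch of where the "keep" approach could go next, the two nodes can be reduced to plain text. `KeptText` is a hypothetical helper name, not part of HTMLReader, and the import path assumes the library is linked as a framework (it may be `#import "HTMLReader.h"` in other setups).

```objc
#import <Foundation/Foundation.h>
#import <HTMLReader/HTMLReader.h>  // assumption: framework-style import

// Hypothetical helper: returns the heading text and the text of the paragraph
// right after it, or nil when either node is missing.
static NSString *KeptText(NSString *html)
{
    HTMLDocument *document = [HTMLDocument documentWithString:html];
    HTMLElement *h2 = [document firstNodeMatchingSelector:@"h2"];
    HTMLElement *paragraph = [document firstNodeMatchingSelector:@"h2 + p"];
    if (!h2 || !paragraph) {
        return nil;  // the markup did not have the expected shape
    }
    return [NSString stringWithFormat:@"%@\n\n%@", h2.textContent, paragraph.textContent];
}
```

The nil check matters because `h2 + p` only matches a paragraph that immediately follows the heading as a sibling.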

If you choose to discard unwanted content, you might do something like:

HTMLDocument *document = /* load a document */;
HTMLElement *img = [document firstNodeMatchingSelector:@"p > img"];
HTMLElement *imageParagraph = img.parentElement;
// Grab the parent of all these paragraphs for later.
HTMLElement *parent = imageParagraph.parentElement;
[imageParagraph removeFromParentNode];
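// Loop over element children only: parent.children also contains text nodes,
// which do not respond to tagName.
for (HTMLElement *child in parent.childElementNodes) {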
  // U+00A0 is non-breaking space, aka &nbsp;
  if ([child.tagName isEqualToString:@"p"] &&
      [child.textContent isEqualToString:@"\u00a0"])
  {
    [child removeFromParentNode];
  }
}

These examples lean pretty heavily on assuming your document looks exactly like the context you've provided here, so you might need to make it a bit more general.
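
One way to make the "discard" approach more general is sketched below. It assumes that a paragraph counts as filler when it has no visible text and contains either nothing or a single `img`. `RemoveFillerParagraphs` is a hypothetical helper, not part of HTMLReader, and the import path assumes a framework-style setup.

```objc
#import <Foundation/Foundation.h>
#import <HTMLReader/HTMLReader.h>  // assumption: framework-style import

// Hypothetical helper: removes every <p> that holds only whitespace/&nbsp;,
// or only a single <img>. Returns how many paragraphs were removed.
static NSUInteger RemoveFillerParagraphs(HTMLDocument *document)
{
    // Whitespace and newlines, with U+00A0 (&nbsp;) added explicitly to be safe.
    NSMutableCharacterSet *blank =
        [[NSCharacterSet whitespaceAndNewlineCharacterSet] mutableCopy];
    [blank addCharactersInString:@"\u00a0"];

    NSUInteger removed = 0;
    // The selector query returns its own array, so removing nodes inside
    // the loop does not disturb the enumeration.
    for (HTMLElement *p in [document nodesMatchingSelector:@"p"]) {
        NSString *text = [p.textContent stringByTrimmingCharactersInSet:blank];
        if (text.length > 0) {
            continue;  // the paragraph has real text, keep it
        }
        NSArray *elements = p.childElementNodes;
        BOOL isEmpty = (elements.count == 0);
        BOOL isOnlyImage = (elements.count == 1 &&
            [[elements.firstObject tagName] isEqualToString:@"img"]);
        if (isEmpty || isOnlyImage) {
            [p removeFromParentNode];
            removed++;
        }
    }
    return removed;
}
```

Unlike the snippet above, this does not depend on the image paragraph coming first or on every paragraph sharing one parent, but it would also drop any other image-only paragraph in the document, which may or may not be wanted.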

Does that make sense?

NBibikov · 2015-07-20T12:36:56Z

Thank you very much! I will experiment with both options.

nolanw · 2015-09-20T19:34:26Z

@NBibikov did you ever solve this?

sujeet14108 · 2016-02-01T19:26:33Z

Hi
Actually, there is no problem in your code.
The "&nbsp;" will not be visible on the output page, although you can simply delete it. Some editors insert it when the markup is written.

Also, the image source URL will not be printed as such.
