not detecting complete set of meta tags #35

Open
jameslin101 opened this issue Aug 17, 2015 · 7 comments

Comments

@jameslin101

Hi,

I am trying to parse meta tags using this code:

NSArray *metaNodes = [document nodesMatchingSelector:@"meta"];

I ran the code through this page:
http://www.nytimes.com/2015/08/16/technology/inside-amazon-wrestling-big-ideas-in-a-bruising-workplace.html

and it only picked up 31 meta tags when there are clearly 50+.

@nolanw
Owner

nolanw commented Aug 17, 2015

Hello! I'm not seeing the same results as you. I downloaded the HTML with the command

curl -OL "http://www.nytimes.com/2015/08/16/technology/inside-amazon-wrestling-big-ideas-in-a-bruising-workplace.html"

When I opened the downloaded file in a text editor, I counted ten instances of <meta. When I loaded it into HTMLReader, the array returned by [document nodesMatchingSelector:@"meta"] had a count of ten.

May I ask how you're counting 31 and 50+ meta tags? Are you loading the page in a browser?
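
For reference, the text-editor count described above can be reproduced with a rough sketch like the one below. It uses only Foundation, not HTMLReader, and `countMetaOpenTags` is a hypothetical helper name. It is a plain textual search, so it would also count matches inside comments or scripts, or a longer tag name that merely starts with "meta".

```swift
import Foundation

/// Counts case-insensitive occurrences of the literal string "<meta" in raw markup.
/// This is a textual count, not a parse of the document.
/// (Hypothetical helper, not part of HTMLReader.)
func countMetaOpenTags(in markup: String) -> Int {
    var count = 0
    var searchRange = markup.startIndex..<markup.endIndex
    // Keep searching from the end of the previous match until nothing is found.
    while let found = markup.range(of: "<meta", options: .caseInsensitive, range: searchRange) {
        count += 1
        searchRange = found.upperBound..<markup.endIndex
    }
    return count
}

let sample = "<head><meta charset=\"utf-8\"><META name=\"a\" content=\"b\"></head>"
print(countMetaOpenTags(in: sample)) // prints 2
```

Running this on the downloaded file's contents and comparing the result with the count from nodesMatchingSelector should show whether the parser or the download is the source of a mismatch.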

@jameslin101
Author

Hi Nolan

Thanks for getting back so quickly. Yup, when I look at the source code in
Chrome it's showing at least 50 tags.
I'm using the exact code from your github readme and added:

NSArray *metaNodes = [document nodesMatchingSelector:@"meta"];

NSLog(@"metaNodes %@", metaNodes);

The metaNodes array consistently comes back on my side to be 31 objects.

Wow, that's very strange that you are getting 10. Would the meta tag count be
different based on how you load it? Maybe some are generated by
JavaScript after the page is loaded?
That wouldn't make sense either, because I went line by line through what is
in metaNodes and it matches up correctly until it cuts off at:

twitter:app:url:googleplay

Thanks!


@nolanw
Owner

nolanw commented Aug 19, 2015

Oh, I was not very careful. Turns out that curl command gets redirected to the login page because it isn't accepting cookies. I changed it to:

curl -OL -c cookies.txt "http://www.nytimes.com/2015/08/16/technology/inside-amazon-wrestling-big-ideas-in-a-bruising-workplace.html"

That gave me a text file with 92 instances of the string <meta. I dragged the file into a playground with HTMLReader and ran

import HTMLReader

let path = NSBundle.mainBundle().pathForResource("updog.html", ofType: nil)!
let data = NSData(contentsOfFile: path)!
let home = HTMLDocument(data: data, contentTypeHeader: nil)
home.nodesMatchingSelector("meta").count

and got a count of 92 matching nodes.

Just for fun, I went to the page you linked in Safari, opened the Web Inspector, typed

document.querySelectorAll('meta').length

into the console, and got 93.

Is any of this helpful?
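
Since the first curl run was silently redirected to a login page, it may be worth checking for the same thing in app code. The sketch below is an assumption-laden illustration: `wasRedirected` is a hypothetical helper, and it relies on the URL reported by the response being the URL that finally served the content after any redirects were followed.

```swift
import Foundation

/// Returns true when the URL that served the response differs from the one requested.
/// (Hypothetical helper, not part of HTMLReader or Foundation.)
func wasRedirected(requested: URL, final: URL?) -> Bool {
    // With no response URL there is nothing to compare, so report no redirect.
    guard let final = final else { return false }
    return final.absoluteString != requested.absoluteString
}

// Example with made-up URLs standing in for a real request and response.
let requested = URL(string: "http://example.com/article.html")!
let served = URL(string: "http://example.com/login?next=article.html")!
print(wasRedirected(requested: requested, final: served))
```

Inside a data task's completion handler, the `final` argument would be the response's URL; a true result there would suggest the parsed markup is not the article page.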

@jameslin101
Author

Hi Nolan

Yup, I'm getting 93 as well when I run the JavaScript query.

I'm running this code in a brand new iOS Single View project after importing
HTMLReader.h via CocoaPods and I'm still getting 31. Very strange.

- (void)viewDidLoad {
    [super viewDidLoad];
    NSURL *url = [NSURL URLWithString:@"http://www.nytimes.com/2015/08/16/technology/inside-amazon-wrestling-big-ideas-in-a-bruising-workplace.html?_r=0"];
    NSURLSession *session = [NSURLSession sharedSession];
    [[session dataTaskWithURL:url completionHandler:
      ^(NSData *data, NSURLResponse *response, NSError *error) {
        NSString *contentType = nil;
        if ([response isKindOfClass:[NSHTTPURLResponse class]]) {
            NSDictionary *headers = [(NSHTTPURLResponse *)response allHeaderFields];
            contentType = headers[@"Content-Type"];
        }
        HTMLDocument *document = [HTMLDocument documentWithData:data
                                              contentTypeHeader:contentType];
        NSArray *metaNodes = [document nodesMatchingSelector:@"meta"];
        NSLog(@"metaNodes %@ count:%lu", metaNodes, (unsigned long)[metaNodes count]);
    }] resume];
}


@nolanw
Owner

nolanw commented Aug 20, 2015

I'm afraid I'm nearly out of ideas! If you take the data from your snippet and log it out as a string, do you see the markup you expect to see?
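
In case it helps, here is a minimal sketch of logging the data as a string. `debugString` is a hypothetical helper, and the UTF-8-then-Latin-1 fallback is only an assumption for debugging purposes, not how HTMLReader decides on an encoding.

```swift
import Foundation

/// Decodes response bytes for logging. Tries UTF-8 first, then falls back to
/// ISO Latin-1 so that non-UTF-8 bytes still produce something readable.
/// (Hypothetical helper for debugging only.)
func debugString(from data: Data) -> String {
    if let utf8 = String(data: data, encoding: .utf8) {
        return utf8
    }
    return String(data: data, encoding: .isoLatin1) ?? "<\(data.count) undecodable bytes>"
}

// Stand-in bytes; in the snippet above this would be the completion handler's data.
let data = Data("<meta name=\"twitter:app:url:googleplay\">".utf8)
print("bytes: \(data.count)")
print(debugString(from: data))
```

If the logged markup stops around twitter:app:url:googleplay, the server is likely sending different content to the app than to the browser; if it contains all the tags, the problem is more likely in parsing.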

@nolanw
Owner

nolanw commented Sep 20, 2015

@jameslin101 did you ever solve this?

@jameslin101
Author

No I was not able to solve it. Probably a one-off issue with that
particular article, but very strange.

