Id in INDEX data.frame and of html-files don't match #68

cspersonal · 2019-10-25T14:37:50Z

Hi,

first of all, thank you for developing and maintaining this great package!

I ran into a problem yesterday when crawling a large website. The site contains roughly 20k pages. Before analyzing the crawled data, I did a few manual crosschecks: I opened the URL with e.g. Id = 100 from the INDEX data.frame in the browser and compared it with the stored html-file named 100.html. Doing that, I realized that the content doesn't match. Somehow the Ids of the html-files are shuffled, and when I open a file I get the content of some other page of the site.
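
The manual crosscheck described above could be automated roughly as follows. This is only a sketch in Python, not part of the package and not the check that was actually run. It assumes that the Id and Url columns of the INDEX data.frame have been exported as (Id, Url) pairs and that the stored files are named `<Id>.html` in a single folder. The function names `page_title`, `fetch_live` and `find_mismatches` are hypothetical, and comparing page titles is just one possible heuristic.

```python
import re
import urllib.request
from pathlib import Path

# Matches the first <title> element; a simple heuristic, not a full HTML parser.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def page_title(html):
    """Return the whitespace-normalised <title> text, or None if there is none."""
    match = TITLE_RE.search(html)
    return " ".join(match.group(1).split()) if match else None


def fetch_live(url):
    """Download the live page; undecodable bytes are replaced, not raised."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def find_mismatches(index_rows, html_dir, fetch=fetch_live):
    """Compare stored and live titles for each (Id, Url) pair.

    index_rows: (Id, Url) pairs, assumed to come from the INDEX data.frame.
    html_dir:   folder holding the stored files, assumed to be named <Id>.html.
    Returns a list of (Id, Url, stored_title, live_title) for differing titles.
    """
    mismatches = []
    for page_id, url in index_rows:
        stored_file = Path(html_dir) / f"{page_id}.html"
        stored_html = stored_file.read_text(encoding="utf-8", errors="replace")
        stored = page_title(stored_html)
        live = page_title(fetch(url))
        if stored != live:
            mismatches.append((page_id, url, stored, live))
    return mismatches
```

Running it on a small sample of Ids should be enough to see whether the shift is systematic. Pages that share a title, or whose title changed since the crawl, will give misleading results, so any hit still needs a look by hand.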

I would really appreciate any help with this!

Best,
cspersonal
