Missing posts \ different meta information depends on json\text output #200
Comments
Please choose your examples more carefully: no nsfw blogs in posts. I'll look at the rest later.
Oh, sorry. I didn't know; I just posted the example where I first noticed it.
In short, it's not a bug. The text/json setting only affects the format of the output files, not how or which items are found and downloaded. In detail, I downloaded the blog several times, with gaps of between 15 minutes and several hours, and saw what I had already assumed.
At the moment the numbers no longer go down, but it's unclear whether that is the final state. If you run the same tests with blogs that only have posts from the last one or two years, you normally get the same result. Regarding the json output, this was already addressed in #82 and hasn't been changed yet, because the app just appends the new elements instead of reading the whole file to replace the ending bracket and add the new elements. For now, to use the file in another app, the trailing comma needs to be removed and the brackets added.
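The manual workaround mentioned above (remove the trailing comma, add the brackets) can be sketched in a few lines. This is only an illustration under assumptions: the helper name `repair_appended_json` and the sample post fields are hypothetical, and the real output files may contain different fields.

```python
import json

def repair_appended_json(raw: str) -> list:
    """Parse text of the form 'elem, elem, elem,' into a list.

    Assumes the file holds comma-separated JSON elements with a
    trailing comma and no surrounding array brackets.
    """
    body = raw.strip()
    if body.endswith(","):
        body = body[:-1]          # drop the comma after the last element
    return json.loads("[" + body + "]")   # wrap in array brackets

# Hypothetical sample content, not taken from a real output file.
sample = '{"id": 1, "type": "photo"},\n{"id": 2, "type": "text"},\n'
posts = repair_appended_json(sample)
```

The result is an ordinary list of dictionaries that another app can consume directly.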
Interesting, why is that so? In my metadata output I saw a post that doesn't exist in the new output, and the post itself is gone too, but the images from that post exist.
Well, the brackets shouldn't be too hard to add before and after the loop, or wherever the crawl happens. As for the comma, you could write it before each new element instead of after it, and for the first item just check whether it is the first and skip the comma if so. Or move the first element out of the loop, or something like that. Okay, thank you for the response. It's up to you to close the issue.
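The suggestion in this comment (brackets outside the loop, comma written before every element except the first) might look roughly like this. It is a generic sketch, not the app's actual code; `write_json_array` and the sample items are made up for illustration.

```python
import io
import json

def write_json_array(items, out):
    """Write items to `out` as one valid JSON array."""
    out.write("[")                 # opening bracket before the loop
    first = True
    for item in items:
        if not first:
            out.write(",\n")       # separator goes before, never after
        out.write(json.dumps(item))
        first = False
    out.write("]")                 # closing bracket after the loop

# Hypothetical items standing in for crawled posts.
buf = io.StringIO()
write_json_array([{"id": 1}, {"id": 2}], buf)
```

Because the separator is written before each element rather than after it, the output never ends with a dangling comma.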
- Until now, new JSON elements were just appended to the list/file, so the last element in the file ended with a comma and the array brackets were missing entirely.
- Now the files are written as, and changed to, a complete JSON structure the next time an element is written to them.
Brackets have been added and the issue has been closed. You can still comment. Feel free to ask for the issue to be reopened if needed.
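One way to keep a file a complete JSON array while still appending, without rereading the whole file, is to overwrite the closing bracket on each write. The sketch below assumes that approach; it is not taken from the app's source, and `append_element` and the file name are hypothetical.

```python
import json
import os
import tempfile

def append_element(path, element):
    """Append `element` to the JSON array stored at `path`."""
    encoded = json.dumps(element).encode("utf-8")
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        with open(path, "wb") as f:
            f.write(b"[" + encoded + b"]")   # first element: new array
        return
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        # Walk backwards over trailing whitespace to the closing bracket.
        while pos > 0:
            pos -= 1
            f.seek(pos)
            ch = f.read(1)
            if ch == b"]":
                break
            if not ch.isspace():
                raise ValueError("file does not end with a JSON array")
        else:
            raise ValueError("file does not end with a JSON array")
        end = pos
        # Look at the last non-whitespace byte before "]" to detect "[]".
        prev, p = b"", end
        while p > 0:
            p -= 1
            f.seek(p)
            prev = f.read(1)
            if not prev.isspace():
                break
        f.seek(end)
        f.truncate()                          # cut off the old "]"
        if prev != b"[":
            f.write(b",\n")                   # separator unless array was empty
        f.write(encoded + b"]")               # new element plus new "]"

# Hypothetical usage with a temporary file.
path = os.path.join(tempfile.mkdtemp(), "posts.json")
append_element(path, {"id": 1})
append_element(path, {"id": 2})
```

After every call the file on disk parses as valid JSON, so other tools can read it at any time.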
I just noticed this with this blog
[redacted]
(and I guess with some others too). I compared the outputs with each other and found that some posts are in the json output but not in the text output, and vice versa.
Here are the json and text output files, plus a third file listing which posts are in the json output but not the text output, and which are in the text output but not the json output.
[redacted]
[redacted]
Diffs.txt
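A comparison like the one in Diffs.txt can be reproduced with set differences once the post identifiers have been extracted from each output. How the identifiers are extracted depends on the file layout, which is not shown here, so the sketch starts from two plain lists; `compare_post_ids` and the IDs are hypothetical.

```python
def compare_post_ids(json_ids, text_ids):
    """Return the IDs that appear in only one of the two outputs."""
    json_set, text_set = set(json_ids), set(text_ids)
    return {
        "only_in_json": sorted(json_set - text_set),
        "only_in_text": sorted(text_set - json_set),
    }

# Made-up IDs standing in for posts found in each output.
diff = compare_post_ids(["101", "102", "103"], ["102", "103", "104"])
```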
Also, the json output needs to be wrapped in [ *output* ] to make the json file valid, and the comma after the last post, before the ], needs to be removed.