Missing posts \ different meta information depends on json\text output #200

dajotim937 · 2021-12-27T21:54:18Z

Just noticed this with the blog [redacted] (and I guess with some others too).
I compared the two outputs and found that some posts are in the json output but not in the text output, and vice versa.
Here are the json and text output files, plus a third file listing which posts are in json but not in text, and which are in text but not in json.
[redacted]
[redacted]
Diffs.txt

Also, to make the json file valid, the json output needs to be wrapped in [ *output* ] and the comma after the last post (before the ]) needs to be removed.

thomas694 · 2021-12-28T00:08:20Z

Please choose your examples more carefully, no nsfw blogs in posts.

The rest I'll look at later.

dajotim937 · 2021-12-28T00:20:45Z

Oh, sorry. I didn't know; I just posted the example where I first noticed it.

thomas694 · 2021-12-28T16:01:26Z

In short, it's not a bug. The text/json setting only affects the format of the output files, but not how or what items are found/downloaded.

In detail, I downloaded the blog several times at intervals ranging from 15 minutes to several hours and saw what I had already assumed.
The blog details always say it has 2418 posts, but the number of (media) items the api lets you download changed between runs (3140, 3149, 3004, 3033, 3015, 2982, 2978, 2957).
In the middle, one run "finds" 100 "new" items and can't find 120 that it had before.
Normally a blue default image ("this content has been removed...") should be shown instead of the original media.
But towards the end more and more of the posts themselves "vanished", e.g.:

ID: 159330284307
Name part: tumblr_ol79aw6jBc1vqhtixo1_540
Post date: "2017-04-08 07:46:56 GMT"

At the moment the numbers aren't going down any more, but it's unclear whether this is the final state.
You can see that behavior especially with blogs that have posts which are years old and flagged/rated nsfw and so on, and where all of a sudden the whole blog history is accessed again by one or more downloaders.

If you do the same tests with blogs that only have posts from the last one or two years, you normally get the same result on every run.

Regarding the json output, it was already addressed in #82 and hasn't been changed yet, because the app just appends the new elements instead of reading the whole file to replace the ending bracket and add the new elements. For now, to use the output in another app, the trailing comma needs to be removed and the brackets added.
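The manual workaround described above can be sketched in a few lines of Python. This is only an illustration, not part of the app: the function name `repair_appended_json` is hypothetical, and it assumes the file consists of JSON objects each followed by a comma, with no surrounding brackets.

```python
import json


def repair_appended_json(text: str) -> list:
    """Parse 'obj, obj, obj,' (no brackets, trailing comma) into a list."""
    body = text.strip()
    # A file that already is a complete array needs no repair.
    if body.startswith("["):
        return json.loads(body)
    # Drop the trailing comma left after the last element.
    body = body.rstrip(",").rstrip()
    # Wrap the elements in array brackets and parse.
    return json.loads("[" + body + "]")


if __name__ == "__main__":
    sample = '{"id": 1},\n{"id": 2},\n'
    print(repair_appended_json(sample))
```

The repaired list can then be written back with `json.dump` if another tool needs a valid file on disk.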

dajotim937 · 2021-12-28T19:30:39Z

> The blog details always say it has 2418 posts, but the number of (media) items the api lets you download changed between runs (3140, 3149, 3004, 3033, 3015, 2982, 2978, 2957).

Interesting, why is that so? I saw a post in my metadata output that doesn't exist in the new output, and the post itself is gone too, but the images from that post still exist.

> Regarding the json output, it was already addressed in #82 and hasn't been changed yet, because the app just appends the new elements instead of reading the whole file to replace the ending bracket and add the new elements. For now, to use the output in another app, the trailing comma needs to be removed and the brackets added.

Well, the brackets shouldn't be too hard to add: write them before and after the loop (or wherever the crawl happens). As for the comma, you could write it before each new element instead of after it, and for the first item just check whether it is the first and skip the comma if so. Or move the first element out of the loop, or something like that.
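The comma-before-element idea from the paragraph above could look roughly like this in Python. It is a sketch under assumptions, not the app's code: `write_json_array` is a hypothetical name, and it assumes all posts are available from one iterable within a single run.

```python
import io
import json
from typing import Iterable, TextIO


def write_json_array(posts: Iterable[dict], out: TextIO) -> None:
    """Stream posts as one valid JSON array without a trailing comma."""
    out.write("[")  # opening bracket before the loop
    first = True
    for post in posts:
        if not first:
            out.write(",")  # separator goes before every element but the first
        out.write("\n" + json.dumps(post))
        first = False
    out.write("\n]\n")  # closing bracket after the loop


if __name__ == "__main__":
    buf = io.StringIO()
    write_json_array([{"id": 1}, {"id": 2}], buf)
    print(buf.getvalue())
```

This only covers a single run. Appending to a file left by an earlier run still requires handling the closing bracket already on disk.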

Okay, thank you for the response. It's up to you whether to close the issue.

- Until now, new JSON elements were simply appended to the list/file, so the last element in the file ended with a comma and the array brackets were missing entirely.
- Now files are written as a complete JSON structure, and existing files are converted to one the next time an element is written to them.
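One way to get the behavior described in this change note, without rereading the whole file on every write, is to overwrite the closing bracket in place. The Python sketch below is an assumption about how that could be done, not the app's actual implementation: `append_element` is a hypothetical name, and it assumes complete files always end with a newline followed by `]`, exactly as this function writes them.

```python
import json
import os


def append_element(path: str, element: dict) -> None:
    """Append one element while keeping the file a valid JSON array."""
    data = json.dumps(element).encode("utf-8")
    # New or empty file: write a complete one-element array.
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        with open(path, "wb") as f:
            f.write(b"[\n" + data + b"\n]")
        return
    with open(path, "r+b") as f:
        if f.read(1) != b"[":
            # Legacy file (no brackets, trailing comma): convert it once.
            f.seek(0)
            body = f.read().strip().rstrip(b",").rstrip()
            sep = b",\n" if body else b""
            f.seek(0)
            f.truncate()
            f.write(b"[\n" + body + sep + data + b"\n]")
            return
        # Complete array: cut off the final "\n]" and write the new tail.
        size = f.seek(0, os.SEEK_END)
        f.seek(max(size - 2, 0))
        if f.read() != b"\n]":
            raise ValueError("file does not end with the expected closing bracket")
        f.seek(size - 2)
        f.truncate()
        f.write(b",\n" + data + b"\n]")
```

Only the legacy conversion reads the full file, and it happens once per file. Every later append touches just the last two bytes.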

thomas694 · 2022-01-14T20:36:19Z

Brackets have been added and the issue has been closed. You can still comment. Feel free to ask for reopening the issue if needed.

thomas694 closed this as completed Jan 14, 2022
