Cannot get the user object of post[n] #67

Closed
z6189949 opened this issue Jan 4, 2023 · 7 comments
Labels
bug Something isn't working

Comments


z6189949 commented Jan 4, 2023

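Brief description of the bug
Everything worked normally last year, but since the start of 2023 the profile information of each user in a post can no longer be retrieved.
...

How to reproduce
Call the client.get_posts method.
In what scenario and with what operations does it reproduce

Expected behavior
Collect the username, nickname, and IP location of every thread author and replier.
...

Screenshots (optional)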
image
image

...
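
A minimal sketch of the tally described under "Expected behavior", assuming the author data has already been fetched. It does not call aiotieba: the `Author` class and its field names (`user_name`, `nick_name`, `ip_location`) are hypothetical stand-ins for whatever user object the client returns.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record type, NOT an aiotieba class; the field names are assumptions.
@dataclass(frozen=True)
class Author:
    user_name: str
    nick_name: str
    ip_location: str


def tally_authors(authors):
    """Count how many posts each distinct author wrote."""
    # A frozen dataclass is hashable, so it can be used directly as a Counter key.
    return Counter(authors)
```

Passing one `Author` per post or reply gives a post count per distinct (username, nickname, IP location) combination.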


n0099 commented Jan 4, 2023

#64 (comment)

Author

z6189949 commented Jan 4, 2023

#64 (comment)

For now I can only fall back on the get_user_info interface as a temporary workaround. Is there a more convenient way?


n0099 commented Jan 4, 2023

cc @Starry-OvO

Owner

lumina37 commented Jan 4, 2023

Try upgrading to 2.10.1.
0bd449d should fix the problem.

@lumina37 lumina37 closed this as completed Jan 5, 2023

n0099 commented Jan 5, 2023

#64 (comment)
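Moreover, this was a gray (canary) release that randomly returned either the modified structure or the previous one, so I had to keep handling for both structures and fall back between them: n0099/open-tbm@0e7d15b

The Tieba backend now seems to have finished the gray release and rolled out the change in full, going back to the redundant structure used by the JSON API of client versions 7.x and earlier: the user metadata of each reply is placed under reply.author. The three main endpoints (threads, replies, and sub-replies) are now consistent; previously only the sub-reply endpoint still used the 7.x-and-earlier structure.

The two breakpoints I added at https://github.com/n0099/TiebaMonitor/blob/1ed0332563d4deb02201cb2a3b18ce8816a68d9d/crawler/src/Tieba/Crawl/Facade/ReplyCrawlFacade.cs#L23 (see the screenshot below) show that only the second breakpoint (reply.author) is hit, while the first (user_list) is not.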
image
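
Handling both response structures with a fallback, as described above, can be sketched roughly as follows. This is an illustration over plain dicts, not the code from the linked commit: `user_list`, `author`, and `author_id` come from this thread, while `post_list`, `id`, and `name` are assumed key names.

```python
from typing import Any, Optional


def resolve_author(
    post: dict[str, Any], users_by_id: dict[int, dict[str, Any]]
) -> Optional[dict[str, Any]]:
    """Return the author metadata for one post, whichever layout was sent."""
    embedded = post.get("author")
    # Layout A: the full user metadata is embedded in the post itself.
    if embedded and embedded.get("id"):
        return embedded
    # Layout B: the post carries only author_id; the metadata lives in user_list.
    return users_by_id.get(post.get("author_id"))


def attach_authors(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return copies of the posts with a resolved 'author' entry (None if unknown)."""
    # Index user_list once so each lookup is O(1); "id" is an assumed key name.
    users_by_id = {u["id"]: u for u in response.get("user_list", [])}
    return [
        {**post, "author": resolve_author(post, users_by_id)}
        for post in response.get("post_list", [])
    ]
```

Trying the embedded author first and the user_list lookup second means the same code path works no matter which structure a given response happens to use.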


n0099 commented Jan 9, 2023

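https://github.com/Starry-OvO/aiotieba/releases/tag/v2.10.2 further states:

The bugs described in #67 and #68 were caused by the /pb/page endpoint adding a dependency on the _client_type parameter. When the parameter is missing, the returned data corresponds to a default, very old client version. The specific symptoms include: user nicknames are returned in the old format, user_list contains no data, audio content cannot be recognized, and so on. Therefore, once the _client_type parameter is added, the following changes from 2.10.1 can be safely rolled back: FragImage.big_src is re-added, FragLink.is_external goes back to being determined by the /mo/q/checkurl prefix, and the parsing flow for Posts and Threads keeps only [using the user_list + author_id fields together] and drops [using the author field directly].

So sending _client_type=2 is enough to get the user_list structure back?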

@lumina37 lumina37 added the bug Something isn't working label Jan 10, 2023
n0099 added a commit to n0099/tbclient.protobuf that referenced this issue Jan 10, 2023
n0099 added a commit to n0099/open-tbm that referenced this issue Jan 10, 2023
…37/aiotieba#67 (comment) , this allow us revert two previous commits 32168f6 and 31cd3ad @ `ClientRequester.PostProtoBuf()`

+ move update the parent thread of reply with the new title extracted from the first-floor reply in the first page from `PostParseHook()` into a new method `SaveParentThreadTitle()`
- parent virtual method `BaseCrawlFacade.ParsePostsEmbeddedUsers()` and move its override to `FillAuthorInfoBackToReply()`
- overridden parent virtual method `ThrowIfEmptyUsersEmbedInPosts()`
@ ReplyCrawlFacade.cs

* parse users stored in `response.Data.UserList` @ `PostParseHook()`
- overridden parent virtual method `ThrowIfEmptyUsersEmbedInPosts()` @ (Thread|Reply)CrawlFacade.cs

* no longer adding embed users into param `outUsers` @ `ParsePostsInternal()`
* assign `outPost.AuthorUid` with `inPost.AuthorId` instead of `.Author.Uid`
@ (Thread|Reply)Parser.cs

* change required param `ThreadResponse.Types.Data data` to `IEnumerable<Thread> threads` @ `ThreadCrawlFacade.ParseLatestRepliers()`, also affects `ThreadArchiveCrawlFacade.PostParseHook()`

$ `git submodule update --remote`
@ crawler
@lumina37
Owner

Of course
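
As a rough illustration of the fix confirmed above, a request-building helper could make sure `_client_type` is always present. Only the parameter name `_client_type` and the value 2 come from this thread; the helper `with_client_type` and the `thread_id` key are hypothetical, and how the parameter is actually encoded (form field or protobuf field) depends on the client.

```python
def with_client_type(params: dict[str, str], client_type: int = 2) -> dict[str, str]:
    """Return a copy of params that is guaranteed to carry _client_type."""
    merged = dict(params)  # copy, so the caller's dict is left untouched
    # setdefault keeps an explicit value if the caller already supplied one.
    merged.setdefault("_client_type", str(client_type))
    return merged


# Hypothetical request parameters; "thread_id" is a placeholder name.
request_params = with_client_type({"thread_id": "123"})
```

Using `setdefault` rather than plain assignment is a design choice here: a caller that deliberately passes another client type is not silently overridden.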
