Downloaded files are all only 2 KB. How can this be fixed? Thanks! #7

Open
dockerwang opened this issue Dec 9, 2019 · 19 comments

Comments

@dockerwang

No description provided.

@gityfx2018

I only just noticed the same thing.

@czcxwe

czcxwe commented Mar 8, 2020

The URL has probably changed, so the code likely needs to be rewritten.

@akong0716

The files I download are only 2 KB. How did you end up solving this?

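@ljh2057

ljh2057 commented Dec 12, 2020

"The files I download are only 2 KB. How did you end up solving this?"

It is a CAPTCHA problem. Grab the cookie from the CAPTCHA page and send it along with the download request. You do not need to enter the CAPTCHA; just use that cookie.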

@songyadong106

@ljh2057 Could you be more specific? I ran into the same problem. Thanks!

@ljh2057

ljh2057 commented Dec 17, 2020 via email

@songyadong106

songyadong106 commented Dec 17, 2020 via email

@ljh2057

ljh2057 commented Dec 18, 2020

Quoted message: I am not a computer science major and do not know front-end development. Could you tell me where to add code, and what code to add? Thanks!

Quoted email from ljh2057, sent Thursday, Dec 17, 2020, 7:58 PM: Take any cookie from the page, then have webdriver carry that cookie when downloading the file.

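from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


def get_cookies():
    # Uses the Selenium 4 API. Adjust the chromedriver path for your machine.
    webdriver_path = r"D:\chromedriver.exe"
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    driver = webdriver.Chrome(service=Service(webdriver_path), options=chrome_options)
    try:
        driver.get("https://www.cnki.net/")
        # Run a search ("机器学习" means "machine learning") so the browser
        # session picks up the site's cookies.
        search_box = driver.find_element(By.ID, "txt_SearchText")
        search_box.click()
        search_box.send_keys("机器学习")
        sleep(1)
        element = driver.find_element(By.CLASS_NAME, "search-btn")
        webdriver.ActionChains(driver).move_to_element(element).click(element).perform()
        driver.find_element(By.CLASS_NAME, "search-btn").click()
        sleep(1)
        coo = driver.get_cookies()
    finally:
        # Always close the headless browser, even if a lookup fails.
        driver.quit()
    # Join the cookies into one "name=value;" string (Cookie header format).
    ck = ""
    for cookie in coo:
        ck += cookie['name'] + '=' + cookie['value'] + ';'
    return ck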

@songyadong106

songyadong106 commented Dec 18, 2020 via email

@Letualone

Same question.

@Letualone

Hello. After adding the code you provided, I used it in the following places:

self.session.get(
    'https://i.shufang.cnki.net/KRS/KRSWriteHandler.ashx',
    headers=HEADER,
    cookies=get_cookies(),
    params=params)

self.session.get(
    'https://kns.cnki.net/KRS/KRSWriteHandler.ashx',
    headers=HEADER,
    cookies=get_cookies(),
    params=params)

Running it produced the following output and error:

--------------------------
正在下载: 基于卷积的高效非对称S速度曲线规划算法.caj
DevTools listening on ws://127.0.0.1:56573/devtools/browser/634265e5-c4ce-4951-a205-a43b19e38b28
[1218/104004.534:INFO:CONSOLE(5)] "Synchronous XMLHttpRequest on the main thread is deprecated because of its detrimental effects to the end user's experience. For more help, check https://xhr.spec.whatwg.org/.", source: https://login.cnki.net/TopLogin/Scripts/jquery-1.11.3.min.js (5)
11111111111111111111111111111111111111111111111111
2222222222222222222222222222222222222222222222
Ecp_ClientIp=49.52.46.36;CurrSortFieldType=desc;CurrSortField=%e5%8f%91%e8%a1%a8%e6%97%b6%e9%97%b4%2f(%e5%8f%91%e8%a1%a8%e6%97%b6%e9%97%b4%2c%27TIME%27);c_m_LinID=LinID=WEEvREcwSlJHSldSdmVqMDh6aSs3b2tNOWp2ekpNWStFQ1RES05LeXIxTT0=$9A4hF_YAuvQ5obgVAqNKPCYcEjKensW4IQMovwHtwkF4VYPoHbKxJw!!&ot=12/18/2020 10:59:55;_pk_id=1ccdc941-e9e9-4f38-872c-376e9e9a91e6.1608259196.1.1608259202.1608259196.;cnkiUserKey=7962a104-411d-cab9-aa99-ddec2e518fa8;SID_kns8=123111;ASP.NET_SessionId=akdl1uoxger3jbsg3cbq1ns1;_pk_ses=;Ecp_ClientId=2201218103901202117;c_m_expire=2020-12-18 10:59:55;Ecp_LoginStuts={"IsAutoLogin":false,"UserName":"sh0322","ShowName":"%E5%8D%8E%E4%B8%9C%E7%90%86%E5%B7%A5%E5%A4%A7%E5%AD%A6","UserType":"bk","BUserName":"","BShowName":"","BUserType":"","r":"S24zR5"};Ecp_notFirstLogin=S24zR5;Ecp_session=1;LID=WEEvREcwSlJHSldSdmVqMDh6aSs3b2tNOWp2ekpNWStFQ1RES05LeXIxTT0=$9A4hF_YAuvQ5obgVAqNKPCYcEjKensW4IQMovwHtwkF4VYPoHbKxJw!!;
3333333333333333333333333333333333333333333333333333333
Ecp_ClientIp=49.52.46.36;CurrSortFieldType=desc;CurrSortField=%e5%8f%91%e8%a1%a8%e6%97%b6%e9%97%b4%2f(%e5%8f%91%e8%a1%a8%e6%97%b6%e9%97%b4%2c%27TIME%27);c_m_LinID=LinID=WEEvREcwSlJHSldSdmVqMDh6aSs3b2tNOWp2ekpNWStFQ1RES05LeXIxTT0=$9A4hF_YAuvQ5obgVAqNKPCYcEjKensW4IQMovwHtwkF4VYPoHbKxJw!!&ot=12/18/2020 10:59:55;_pk_id=1ccdc941-e9e9-4f38-872c-376e9e9a91e6.1608259196.1.1608259202.1608259196.;cnkiUserKey=7962a104-411d-cab9-aa99-ddec2e518fa8;SID_kns8=123111;ASP.NET_SessionId=akdl1uoxger3jbsg3cbq1ns1;_pk_ses=;Ecp_ClientId=2201218103901202117;c_m_expire=2020-12-18 10:59:55;Ecp_LoginStuts={"IsAutoLogin":false,"UserName":"sh0322","ShowName":"%E5%8D%8E%E4%B8%9C%E7%90%86%E5%B7%A5%E5%A4%A7%E5%AD%A6","UserType":"bk","BUserName":"","BShowName":"","BUserType":"","r":"S24zR5"};Ecp_notFirstLogin=S24zR5;Ecp_session=1;LID=WEEvREcwSlJHSldSdmVqMDh6aSs3b2tNOWp2ekpNWStFQ1RES05LeXIxTT0=$9A4hF_YAuvQ5obgVAqNKPCYcEjKensW4IQMovwHtwkF4VYPoHbKxJw!!;
DevTools listening on ws://127.0.0.1:56755/devtools/browser/63d1b781-908d-48ac-a3c8-84c1ac1f9fba
[1218/104020.738:INFO:CONSOLE(5)] "Synchronous XMLHttpRequest on the main thread is deprecated because of its detrimental effects to the end user's experience. For more help, check https://xhr.spec.whatwg.org/.", source: https://login.cnki.net/TopLogin/Scripts/jquery-1.11.3.min.js (5)
11111111111111111111111111111111111111111111111111
2222222222222222222222222222222222222222222222
c_m_LinID=LinID=WEEvREcwSlJHSldSdmVqM1BLUWh5QjhQSlhOakpMTEw4RWpmUnZTSXl1Zz0=$9A4hF_YAuvQ5obgVAqNKPCYcEjKensW4IQMovwHtwkF4VYPoHbKxJw!!&ot=12/18/2020 10:59:35;Ecp_ClientIp=49.52.46.36;CurrSortFieldType=desc;_pk_id=e2180398-31e4-461d-bd66-949360d8af96.1608259212.1.1608259220.1608259212.;cnkiUserKey=766fb650-4c29-d703-ab1b-303f4ce275c5;SID_kns8=123114;ASP.NET_SessionId=ychew2402n3zrjzn3ovmylpe;_pk_ses=*;Ecp_ClientId=1201218104001189727;c_m_expire=2020-12-18 10:59:35;Ecp_LoginStuts={"IsAutoLogin":false,"UserName":"sh0322","ShowName":"%E5%8D%8E%E4%B8%9C%E7%90%86%E5%B7%A5%E5%A4%A7%E5%AD%A6","UserType":"bk","BUserName":"","BShowName":"","BUserType":"","r":"oPDEnf"};Ecp_notFirstLogin=oPDEnf;Ecp_session=1;CurrSortField=%e5%8f%91%e8%a1%a8%e6%97%b6%e9%97%b4%2f(%e5%8f%91%e8%a1%a8%e6%97%b6%e9%97%b4%2c%27TIME%27);LID=WEEvREcwSlJHSldSdmVqM1BLUWh5QjhQSlhOakpMTEw4RWpmUnZTSXl1Zz0=$9A4hF_YAuvQ5obgVAqNKPCYcEjKensW4IQMovwHtwkF4VYPoHbKxJw!!;
3333333333333333333333333333333333333333333333333333333
Traceback (most recent call last):
  File "main.py", line 259, in <module>
    main()
  File "main.py", line 253, in main
    search.search_reference(get_uesr_inpt())
  File "main.py", line 99, in search_reference
    self.pre_parse_page(second_get_res.text), second_get_res.text)
  File "main.py", line 188, in parse_page
    self.download_url)
  File "D:\paper_search_program\CNKI-download-master\GetPageDetail.py", line 101, in get_detail_page
    params=params)
  File "D:\anaconda3\lib\site-packages\requests\sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "D:\anaconda3\lib\site-packages\requests\sessions.py", line 519, in request
    prep = self.prepare_request(req)
  File "D:\anaconda3\lib\site-packages\requests\sessions.py", line 440, in prepare_request
    cookies = cookiejar_from_dict(cookies)
  File "D:\anaconda3\lib\site-packages\requests\cookies.py", line 524, in cookiejar_from_dict
    cookiejar.set_cookie(create_cookie(name, cookie_dict[name]))
TypeError: string indices must be integers
D:\paper_search_program\CNKI-download-master>

I hope you can take a look and tell me what is causing this. So far I have only changed every http in the project to https and then added the cookie-fetching code you provided above.
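The TypeError at the end of this traceback is consistent with `get_cookies()` returning a single `name=value;` string while the `cookies=` argument of `requests` expects a dict-like mapping. A minimal sketch of one way to bridge the two, assuming the string format produced by the function posted earlier; the helper name `cookie_string_to_dict` is hypothetical and not part of the project:

```python
def cookie_string_to_dict(cookie_string):
    """Turn 'a=1;b=2;' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in cookie_string.split(';'):
        pair = pair.strip()
        if not pair:
            # Skip the empty piece left by the trailing ';'.
            continue
        # Split on the first '=' only, since values may contain '=' themselves.
        name, _, value = pair.partition('=')
        cookies[name] = value
    return cookies


if __name__ == "__main__":
    print(cookie_string_to_dict("SID_kns8=123111;Ecp_session=1;"))
```

The result could then be passed as `cookies=cookie_string_to_dict(get_cookies())`. Calling `get_cookies()` once and reusing the result would also avoid launching a new headless browser for every request.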


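You can take a look at my fork with my changes. (The downloads are only 2 KB because you are not logged in; once login via the campus-network IP is in place, downloads work normally.)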

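@Letualone

Same question.

Anyone having this problem can take a look at my fork with my changes. (The downloads are only 2 KB because you are not logged in; once login via the campus-network IP is in place, downloads work normally. Login from the public internet is not implemented yet.)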
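Since a failed download in this thread shows up as a file of about 2 KB, a script could check each response body before saving it. A rough sketch only: the 4 KB threshold, the guess that the small file is an HTML page, and the function name are all assumptions made for illustration, not documented CNKI behaviour.

```python
def looks_like_failed_download(content, min_bytes=4096):
    """Heuristic: tiny payloads or HTML pages are probably not real papers."""
    if len(content) < min_bytes:
        return True
    # A login or error page usually starts with an HTML tag, not binary data.
    head = content[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

A caller might log a warning and skip writing the file when this returns True, instead of silently saving a 2 KB placeholder.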

@Letualone

Letualone commented Oct 7, 2021 via email

@mckChloe

mckChloe commented Oct 7, 2021 via email

@Letualone

Letualone commented Oct 7, 2021 via email

@mckChloe

mckChloe commented Oct 7, 2021

Yes, this happens sometimes. Just run it again.

Quoted email from Qiong Zhong, sent Oct 7, 2021, 16:57: Oh, I just changed the IP and it works now. One question about the Excel sheet of detail-page data, such as the abstracts: when you run it, do the abstracts come out empty for you too? The download problem is solved, but now I cannot get the detail-page data in the Excel sheet to work.

Quoted email, sent Thursday, Oct 7, 2021, 4:54 PM: Did you change the IP-login URL in the code? If you did and it still fails, the CNKI site may have been updated. It used to run and download normally for me, but I do not have that environment available now, so I cannot test it for you. Sorry.

Quoted email, sent Thursday, Oct 7, 2021, 1:49 PM: I tried your forked and modified version and it does not work either. I am logging in through the campus-network IP.

I have rerun it many times and the detail-page information still does not show up. The Excel sheet has no abstracts and no keywords. Could you take a look when you have time?

@Letualone

Letualone commented Oct 7, 2021 via email

@mckChloe

mckChloe commented Oct 7, 2021

I am not at school and do not have the environment, so I cannot help. You will have to look into it yourself, sorry.

OK, thanks!

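@Maer321

Maer321 commented Nov 20, 2021

Hello, I have two questions.
1. pip install tesserocr
   The installation fails with the error below. Does anyone have a fix?
   tesserocr.cpp:24:10: fatal error: 'Python.h' file not found
   #include "Python.h"
   ^~~~~~~~~~
   1 error generated.
   error: command '/usr/bin/clang' failed with exit code 1

2. Running with the crackverifycode file commented out
   After first changing the URLs from http to https, I get:
   AttributeError: 'NoneType' object has no attribute 'find_all'
   After searching on Baidu, my guess is that this is an anti-scraping measure. The suggested fix is to add request headers that make the request look like it comes from a normal browser User-Agent.
   I am stuck at this point for now. Could you take a look and offer some suggestions, so I can work on it in a targeted way?
   P.S. Feel free to get in touch with me on QQ: 764537596
   Thank you very much for the code. It is really great.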
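The User-Agent suggestion from the second question can be sketched with the standard library alone. The User-Agent value and the helper names below are placeholders chosen for illustration, and whether this is enough to get past CNKI's checks is not something this thread confirms.

```python
import urllib.request

BROWSER_HEADERS = {
    # Example value only; copy the string your own browser sends.
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/96.0 Safari/537.36"
    ),
}


def build_request(url, headers=BROWSER_HEADERS):
    """Create a request object that carries browser-like headers."""
    return urllib.request.Request(url, headers=headers)


def fetch(url):
    """Download a page using the headers above (performs a network call)."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return resp.read()
```

With `requests`, the same dictionary would go in the `headers=` argument, which is where the snippets earlier in this thread already pass `HEADER`. Separately, checking whether a parsed element is `None` before calling `find_all` on it would turn the AttributeError into a clearer message about the page not having the expected structure.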

9 participants