Question: how to resolve download links of the form https://kns.cnki.net/kns/download.aspx?filename=WRGMhx2KSxkQxNUQD50cSZXZUlHTv8ma3I2RKlnbwpFMrJXcEpHc5dzUPF3Z1BneZFHNGhEdCdFUnJkRzh3ayU1dE9WSiZ2KQxUbGdETQl1KSp1dw40b1JWcpV3cxAzYqFGaydmNQlmSDlXNsRkcQZEZrZVTul2N&tablename=CJFDLAST2018 #16

Open
skygongque opened this issue Jun 19, 2020 · 0 comments


skygongque commented Jun 19, 2020

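Using my university's IP range I can log in to CNKI, but downloading an article asks for a verification code (captcha), for every single article. A real browser's request downloads the article directly (a Selenium-driven browser also gets a captcha for every article). Am I missing some parameter, or is it something else?
I looked at the article-download part of CNKI-download: it is just a plain GET request with headers added, and it returns a 404.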
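The link in the title is a plain `download.aspx` URL with two query parameters, `filename` and `tablename`. The sketch below splits the link into its parts using only the standard library. It treats `filename` as an opaque token issued by the server. That is an assumption, since nothing in this issue says how the value is encoded. The helper name `split_download_url` is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Download link copied verbatim from the issue title.
DOWNLOAD_URL = (
    "https://kns.cnki.net/kns/download.aspx?"
    "filename=WRGMhx2KSxkQxNUQD50cSZXZUlHTv8ma3I2RKlnbwpFMrJXcEpHc5dzUPF3Z1BneZFHNGhEdCdFUnJkRzh3ayU1dE9WSiZ2KQxUbGdETQl1KSp1dw40b1JWcpV3cxAzYqFGaydmNQlmSDlXNsRkcQZEZrZVTul2N"
    "&tablename=CJFDLAST2018"
)


def split_download_url(url: str) -> dict:
    """Hypothetical helper: split a download.aspx link into host, path and query parameters."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; each key appears once in these links,
    # so keep only the first value. `filename` is treated as an opaque token (assumption).
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"host": parts.netloc, "path": parts.path, **query}


if __name__ == "__main__":
    for key, value in split_download_url(DOWNLOAD_URL).items():
        print(f"{key}: {value}")
```

Running it prints the host, the path `/kns/download.aspx`, the long `filename` token and `tablename: CJFDLAST2018`. The script sends exactly this URL, so the link itself does not appear to be missing a parameter. The difference more likely lies in server-side state such as the session, the IP or the request rate.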

```python
import requests

headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Language': 'zh-CN,zh;q=0.9,en-GB;q=0.8,en;q=0.7',
    # 'Cookie': 'SID=020197; Ecp_LoginStuts={"IsAutoLogin":false,"UserName":"DX0434","ShowName":"%e6%b5%99%e6%b1%9f%e7%90%86%e5%b7%a5%e5%a4%a7%e5%ad%a6","UserType":"bk","BUserName":"","BShowName":"","BUserType":"","r":"0rHTHE"}; c_m_LinID=LinID=WEEvREcwSlJHSldRa1FhcEFLUmVicE1SUFRzQTZEZW5Va0VWYitsa2NPMD0=$9A4hF_YAuvQ5obgVAqNKPCYcEjKensW4IQMovwHtwkF4VYPoHbKxJw!!&ot=06/19/2020 13:54:08; LID=WEEvREcwSlJHSldRa1FhcEFLUmVicE1SUFRzQTZEZW5Va0VWYitsa2NPMD0=$9A4hF_YAuvQ5obgVAqNKPCYcEjKensW4IQMovwHtwkF4VYPoHbKxJw!!; c_m_expire=2020-06-19 13:54:08; Ecp_session=1; ASP.NET_SessionId=vughxubnlqvnxrf0vtd0brwz; Ecp_ClientId=5200619133401915832'
}

# One session, so cookies set by the IP login are sent with the download request
session = requests.Session()
session.headers.update(headers)

# IP-based login (works from the university's IP range)
r = session.get(
    'https://login.cnki.net/TopLogin/api/loginapi/IpLoginFlush')
r.encoding = r.apparent_encoding
# print(r.text)

# Request the article through the download.aspx link
res = session.get('https://kns.cnki.net/kns/download.aspx?filename=WRGMhx2KSxkQxNUQD50cSZXZUlHTv8ma3I2RKlnbwpFMrJXcEpHc5dzUPF3Z1BneZFHNGhEdCdFUnJkRzh3ayU1dE9WSiZ2KQxUbGdETQl1KSp1dw40b1JWcpV3cxAzYqFGaydmNQlmSDlXNsRkcQZEZrZVTul2N&tablename=CJFDLAST2018')
res.encoding = res.apparent_encoding
# print(res.headers)
print(res.text)
```

Output (pasted from `</head>` onward):

```html
</head>
<body>
    <div class="c_verify-box">
        <form method="post" onsubmit="return validate();">
            <h3 class="title">Security verification</h3>
            <p class="c_verify-desc">Your current IP is 183.134.192.27. You are making requests too frequently; to keep your account working normally, please enter the verification code:</p>
            <dl class="c_verify-code">
                <dt><img id="vImg" src="/kdoc/request/ValidateCode.ashx?t=1577242936454" alt="Verification code" title="Click to get a new code"></dt>
                <dd>
                    <p class="tips" id="tips"></p>
                    <input type="password" id="vcode" name="vcode" maxlength="4"><button class="c_btn" type="submit">Submit</button>
                </dd>
            </dl>
        </form>
    </div>

</body>
</html>
```
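The response above is an HTML security-check page, not the article. The page attributes the check to the IP sending requests too frequently. The sketch below lets a script recognise this page and stop, instead of saving the HTML as if it were the paper. It assumes the markers visible in the quoted HTML stay stable: the `c_verify-box` class, the `ValidateCode.ashx` image and the `vcode` input. The function name `is_verification_page` is hypothetical.

```python
# Markers copied from the verification page quoted in this issue
# (assumed stable; CNKI may change them at any time).
VERIFICATION_MARKERS = ("c_verify-box", "ValidateCode.ashx", 'name="vcode"')


def is_verification_page(content_type: str, body: bytes) -> bool:
    """Hypothetical helper: True if a download response looks like the HTML security-check page."""
    # A declared non-HTML type (e.g. application/pdf) means a real file came back.
    if content_type and "html" not in content_type.lower():
        return False
    # HTML or missing Content-Type: look for the check page's markers in the body.
    text = body.decode("utf-8", errors="replace")
    return any(marker in text for marker in VERIFICATION_MARKERS)
```

With the `requests` session from the script, call `is_verification_page(res.headers.get("Content-Type", ""), res.content)` right after `session.get(...)` to tell the two cases apart. When it returns `True`, stop and slow down rather than retry immediately, because the page itself says the check was triggered by too-frequent requests.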