OCR recognition problem #12

Open
czcxwe opened this issue Mar 8, 2020 · 6 comments
Comments

@czcxwe

czcxwe commented Mar 8, 2020

Problem description

The code as forked did not work out of the box,
so I modified it a bit:

    def depoint(self, img):
        """Denoise an image that has already been binarized."""
        pixdata = img.load()
        w, h = img.size
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                count = 0
                if pixdata[x, y - 1] > 245:  # above
                    count = count + 1
                if pixdata[x, y + 1] > 245:  # below
                    count = count + 1
                if pixdata[x - 1, y] > 245:  # left
                    count = count + 1
                if pixdata[x + 1, y] > 245:  # right
                    count = count + 1
                if pixdata[x - 1, y - 1] > 245:  # upper left
                    count = count + 1
                if pixdata[x - 1, y + 1] > 245:  # lower left
                    count = count + 1
                if pixdata[x + 1, y - 1] > 245:  # upper right
                    count = count + 1
                if pixdata[x + 1, y + 1] > 245:  # lower right
                    count = count + 1
                if count > 4:
                    pixdata[x, y] = 255
        return img

    def imge2string(self, image, threshold):
        """
        Convert an image to a string.
        Binarize at the given threshold, then denoise.
        """

        image = image.convert('L')
        # Binarize
        image = image.point(lambda x: 255 if x > threshold else 0)
        # Denoise further
        image = self.depoint(image)
        # Recognize. This step is still broken: tesserocr returns empty text
        result = tesserocr.image_to_text(image)
        print(str(threshold) + " recognized captcha: " + str(result))
        return result

    def crack_code(self):
        '''
        Recognize the captcha automatically.
        '''
        image = Image.open('./data/crack_code.jpeg')
        # Convert to a grayscale image

        # Set the binarization threshold
        threshold = 127
        s1 = self.imge2string(image, threshold)
        s2 = self.imge2string(image, threshold+20)
        s3 = self.imge2string(image, threshold-20)
        if s1 == s2 == s3 or s1 == s2 or s1 == s3:
            return self.send_code(str(s1))
        elif s2 == s3:
            return self.send_code(str(s2))
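
A note on the comparison in crack_code: if all three passes return an empty string they compare equal, so an empty code is submitted, and if no two passes agree the method returns None. Below is a minimal sketch of a vote that discards empty reads first. The helper name `pick_consensus` is hypothetical and not part of the project; stripping whitespace assumes that trailing newlines in OCR output should not count as a difference.

```python
from collections import Counter
from typing import Optional, Sequence


def pick_consensus(candidates: Sequence[str]) -> Optional[str]:
    """Return the text that at least two OCR passes agree on, or None.

    Empty or whitespace-only results are discarded first, so three
    empty reads never count as agreement.
    """
    # Normalize: drop surrounding whitespace and skip empty reads.
    cleaned = [c.strip() for c in candidates if c and c.strip()]
    if not cleaned:
        return None
    # The most common remaining text wins only with two or more votes.
    text, votes = Counter(cleaned).most_common(1)[0]
    return text if votes >= 2 else None
```

Under these assumptions, crack_code could call `pick_consensus([s1, s2, s3])` and only call send_code when the result is not None.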

The problem is at result = tesserocr.image_to_text(image).
No matter how I run the recognition or preprocess the image, tesserocr always returns an empty result.
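
One thing that may be worth ruling out is that the preprocessing itself leaves almost no dark pixels for the OCR engine to read. The sketch below reproduces the binarize and denoise steps on a plain list of grayscale rows, so it runs without Pillow or tesserocr, and reports the fraction of black pixels. All function names here are hypothetical. Unlike depoint, `denoise` reads from the input and writes to a copy, which is a deliberate change so that a pixel whitened earlier in the scan does not affect later counts.

```python
from typing import List

Grid = List[List[int]]  # rows of grayscale values, 0 (black) to 255 (white)


def binarize(gray: Grid, threshold: int) -> Grid:
    """Map each pixel to 255 if it is brighter than threshold, else 0."""
    return [[255 if v > threshold else 0 for v in row] for row in gray]


def denoise(binary: Grid, min_white_neighbours: int = 5) -> Grid:
    """Whiten interior pixels with at least min_white_neighbours white neighbours.

    Counts come from the input grid and results go to a copy, so the
    outcome does not depend on scan order (an assumption that differs
    from the in-place depoint in the post).
    """
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Count white pixels among the eight surrounding positions.
            white = sum(
                binary[y + dy][x + dx] > 245
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dx, dy) != (0, 0)
            )
            if white >= min_white_neighbours:
                out[y][x] = 255
    return out


def foreground_ratio(binary: Grid) -> float:
    """Fraction of pixels that are black, i.e. candidate character strokes."""
    total = sum(len(row) for row in binary)
    black = sum(v == 0 for row in binary for v in row)
    return black / total if total else 0.0
```

If the ratio is close to zero at every threshold, the image reaching tesserocr is effectively blank and the thresholds or the neighbour count would be the first things to revisit. If it is not, the cause is more likely elsewhere, for example in the tesserocr installation or its language data.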

@czcxwe
Author

czcxwe commented Mar 8, 2020

The code I modified is the member functions of the CrackCode class in CrackVerifyCode.py.

@dengwen168

Hi, I ran into the following problem while using this. How can I fix it?

    File "C:\Users\john1\Desktop\PI\cnki\CNKI-download-master\CNKI-download-master\CrackVerifyCode.py", line 34, in get_image
      self.current_url = re.search(r'(.*?)#', current_url).group(1)
    AttributeError: 'NoneType' object has no attribute 'group'
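
The AttributeError means re.search returned None, which happens when current_url contains no '#'. Below is a minimal sketch of a guard. The helper name `strip_fragment` is hypothetical, and falling back to the full URL is an assumption about what the caller wants, not something the project confirms.

```python
import re


def strip_fragment(current_url: str) -> str:
    """Return the part of current_url before the first '#'.

    Falls back to the whole URL when there is no '#', instead of
    calling .group() on None.
    """
    match = re.search(r'(.*?)#', current_url)
    return match.group(1) if match else current_url
```

Whether the rest of get_image works with a URL that has no '#' is a separate question; the guard only removes this particular crash.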

@czcxwe
Author

czcxwe commented Mar 31, 2020 via email

@dengwen168

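Oh, actually I don't need to download the papers. I only need to scrape the keywords and abstract information from the detail pages, so it should still be usable for that, right?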

@czcxwe
Author

czcxwe commented Mar 31, 2020 via email

@dengwen168

OK, thanks. Looks like I will have to study it carefully myself.
