Output is still very messy even after binarizing the input character images #69

Open
TNXG opened this issue Mar 3, 2024 · 7 comments

Comments

TNXG commented Mar 3, 2024

As the title says.

image
(white paper + scanner + QQ screenshot)

Here is the code I use to binarize the images:

import os
from PIL import Image
import cv2

source_dir = 'now_style_samples'
target_dir = 'style_samples'

# Create the output directory if it does not exist yet.
if not os.path.exists(target_dir):
    os.makedirs(target_dir)

files = os.listdir(source_dir)

# Pass 1 (Pillow): convert to 1-bit, double the size, save.
# Note: convert('1') applies dithering by default.
for file in files:
    image = Image.open(os.path.join(source_dir, file)).convert('1')

    aspect_ratio = image.width / image.height
    new_width = int(image.width * 2)
    new_height = int(new_width / aspect_ratio)

    new_size = (new_width, new_height)
    image = image.resize(new_size, Image.LANCZOS)
    image.save(os.path.join(target_dir, file))

# Pass 2 (OpenCV): threshold at 127, double the size, save.
# This writes to the same paths, so it overwrites the output of pass 1.
for file in files:
    image = cv2.imread(os.path.join(source_dir, file), cv2.IMREAD_GRAYSCALE)
    _, image = cv2.threshold(image, 127, 255, cv2.THRESH_BINARY)

    aspect_ratio = image.shape[1] / image.shape[0]
    new_width = int(image.shape[1] * 2)
    new_height = int(new_width / aspect_ratio)

    new_size = (new_width, new_height)
    # Interpolating after thresholding produces gray edge pixels again.
    image = cv2.resize(image, new_size, interpolation=cv2.INTER_LANCZOS4)

    cv2.imwrite(os.path.join(target_dir, file), image)

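Characters written in Microsoft Whiteboard and captured with Snipaste show the same problem.
image

Both kinds of samples were processed with the binarization code posted above.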

lvitol commented Mar 7, 2024

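The model was trained on the CASIA-HWDB dataset, and at least half of the samples in it are written in a wild cursive style, so a model trained on that data produces output like this. Cleaning the training data and retraining the model might improve the results.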

maygyd commented Mar 9, 2024

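wild cursive style, so a model trained on that data produces output like this. Cleaning the training data and retraining the model might improve the results.

How should the data be cleaned?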

@YZcat2023

I have found an approach that may work better; take a look at my answer, meow.

@dailenson
Owner

Take a look at this user's reproduction results: #75 (comment)

@YunDouYue

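Use a drawing tablet; the scrawl mainly comes from noise.
Here are the results generated from handwriting scanned from paper and from handwriting done on a drawing tablet:
21-15
21-15-fix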

maygyd commented Jul 17, 2024 via email

Author

TNXG commented Jul 17, 2024

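Use a drawing tablet; the scrawl mainly comes from noise. Here are the results generated from handwriting scanned from paper and from handwriting done on a drawing tablet: 21-15 21-15-fix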

QQ_1721214163572

Why not try #75 (comment) instead?
