Support to respect the layout of the original pdf file #68

Open
Nan-Do opened this issue Jul 26, 2023 · 2 comments

Nan-Do commented Jul 26, 2023

I'm trying to convert some PDF files that contain Python code, but the tool doesn't respect the original formatting and prints the code without any line breaks or indentation. For example, a PDF containing the following text:

# Time: O(n)
# Space: O(n)

# freq table

Next there is the solution to the proposed problem using Python2:

class Solution(object):
    def isGood(self, nums):
        """
        :type nums: List[int]
        :rtype: bool
        """
        cnt = [0]*len(nums)
        for x in nums:
             if x < len(cnt):
                 cnt[x] += 1
             else:
                 return False
        return all(cnt[x] == 1 for x in xrange(1, len(nums)-1))

Is translated into:

# Time: O(n) # Space: O(n)

# freq table

Next there is the solution to the proposed problem using Python2:

class Solution(object): def isGood(self, nums): """ :type nums: List[int] :rtype: bool """ cnt = [0]*len(nums) for x in nums: if x < len(cnt): cnt[x] += 1 else: return False return all(cnt[x] == 1 for x in xrange(1, len(nums)-1))

In this case, the tool doesn't detect the code block at all. In some other examples, it detects the code blocks correctly but still strips the leading indentation. One such example is this book

Is there a way to force the tool to respect the original formatting?
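
To illustrate what "respecting the layout" could mean here, below is a minimal sketch that rebuilds leading indentation from each line's horizontal position. It assumes the extractor can provide every line's left x-coordinate and the width of one monospace character; the function name `restore_indentation` and both inputs are hypothetical and are not part of this tool's actual API.

```python
from typing import List, Tuple


def restore_indentation(lines: List[Tuple[float, str]], char_width: float) -> str:
    """Rebuild leading whitespace from each line's left x-coordinate.

    `lines` holds (x_left, text) pairs in reading order and `char_width` is
    the width of one monospace character in the same units. Both are assumed
    inputs for illustration only.
    """
    if not lines:
        return ""
    # Treat the leftmost line as column 0.
    left_margin = min(x for x, _ in lines)
    rebuilt = []
    for x, text in lines:
        # Convert the horizontal offset into a number of space characters.
        indent = round((x - left_margin) / char_width)
        rebuilt.append(" " * indent + text)
    return "\n".join(rebuilt)


# Hypothetical positions for the first lines of the example above.
sample = [
    (72.0, "class Solution(object):"),
    (96.0, "def isGood(self, nums):"),
    (120.0, "cnt = [0]*len(nums)"),
]
print(restore_indentation(sample, char_width=6.0))
```

With these made-up coordinates, the second line is indented by four spaces and the third by eight, which matches the nesting in the original PDF.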

LoneRifle (Collaborator) commented

Could you try 0.1.25 and verify if the problem is present there too?

Nan-Do commented Jul 26, 2023

Sure!
I just tried version 0.1.25, and the output is exactly the same with regard to the Python formatting issue.
pdf-to-markdown also ignores the leading indentation.
