Python library for dealing with PageXML files (WIP)

fstrunz/page-py

page-py

Python library for dealing with PageXML files.

Example

from typing import List
from page.elements import PcGts, Page, Region, Text

# Currently, only pagecontent files are supported via PcGts.
pcgts = PcGts.from_file("example.gt.xml")
page: Page = pcgts.page
regions: List[Region] = page.regions

# Accumulate the TextEquiv tags of all TextLines in the document.
texts: List[Text] = [
    line.text for region in regions for line in region.lines
]

# Print all of their unicode representations.
for text in texts:
    print(text.unicode)
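
For readers unfamiliar with the PageXML layout, the sketch below walks the same TextRegion, TextLine, TextEquiv, Unicode nesting using only the standard library. It does not use page-py and says nothing about how page-py is implemented. The inlined sample document, the namespace version, and the helper name `line_texts` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal PAGE document, inlined for illustration only.
# The namespace version is an assumption; real files may use another one.
NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
SAMPLE = f"""
<PcGts xmlns="{NS}">
  <Page imageFilename="example.png" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="l1"><TextEquiv><Unicode>Hello</Unicode></TextEquiv></TextLine>
      <TextLine id="l2"><TextEquiv><Unicode>World</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def line_texts(xml_text):
    """Return the Unicode text of every TextLine inside a TextRegion."""
    root = ET.fromstring(xml_text.strip())

    # Read the namespace from the root tag ("{namespace}PcGts") so the
    # sketch is not tied to one schema version.
    if not root.tag.startswith("{"):
        raise ValueError("expected a namespaced PcGts root element")
    ns = {"pc": root.tag[1:root.tag.index("}")]}

    texts = []
    for line in root.iterfind(".//pc:TextRegion/pc:TextLine", ns):
        # Look only at the line's own TextEquiv, not those of nested words.
        unicode_el = line.find("pc:TextEquiv/pc:Unicode", ns)
        if unicode_el is not None:
            texts.append(unicode_el.text or "")
    return texts


for text in line_texts(SAMPLE):
    print(text)
```

Lines without a TextEquiv are skipped in this sketch; whether page-py does the same is not stated here.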
