UTF-8 support for XML output in pycsw? #276
Comments
@mattenp do you have a CSW URL or test case (configuration, test file[s]) you can send to demonstrate the issue?
Hi Tom,
I get the XML document from here:
http://www.ifremer.fr/geonetwork-sdn/srv/eng/csw-csr?SERVICE=CSW&VERSION=2.0.2&REQUEST=GetRecordById&id=urn:SDN:CSR:LOCAL:1000010&outputschema=csw:IsoRecord&ElementSetName=full
https://github.com/mattenp/testcase/blob/master/IFREMER-CSW_1000010_2009-02-24.xml
Then I load this record into the database:
pycsw-admin.py -c load_records -f default.cfg -p /path/to/records
default.cfg: https://github.com/mattenp/testcase/blob/master/default.cfg
If I use a PostgreSQL database, it is UTF-8; if I use an Oracle database, it is Latin-1. If I request the record, the browser shows a fine result in UTF-8, but if I look at the source code in Firefox or IE, or if I request the result with wget, the result is encoded in Latin-1.
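One way to check the bytes actually served, independent of how a browser renders them, is to test whether the raw payload decodes as UTF-8 and compare that with the encoding named in the XML declaration. A minimal sketch, assuming the response has already been saved as bytes (for example with wget); the helper names `declared_encoding` and `is_valid_utf8` are hypothetical and not part of pycsw:

```python
import re

# Hypothetical helpers for inspecting a saved CSW response; not pycsw API.
_XML_DECL = re.compile(rb'<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(payload):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = _XML_DECL.match(payload)
    return match.group(1).decode("ascii") if match else None


def is_valid_utf8(payload):
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        payload.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Illustrative data only: the same document serialized two ways.
text = '<?xml version="1.0" encoding="UTF-8"?><t>soci\u00e9t\u00e9</t>'
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

print(declared_encoding(utf8_bytes), is_valid_utf8(utf8_bytes))
print(declared_encoding(latin1_bytes), is_valid_utf8(latin1_bytes))
```

A payload that declares UTF-8 but fails the decode check is a sign that the declaration and the actual bytes disagree, which matches the symptom described here.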
@mattenp thanks. FYI I'm unable to load the data into pycsw:
$ pycsw-admin.py -c load_records -f default.cfg -p issue-276-testcase/
Initializing static context
creating new engine: sqlite:///tests/suites/cite/data/records.db
binding ORM to existing database
setting repository queryables
Processing file issue-276-testcase/IFREMER-CSW_1000010_2009-02-24.xml (1 of 1)
Serialized metadata, parsing content model
Traceback (most recent call last):
  File "/home/tkralidi/work/foss4g/pycsw/master/bin/pycsw-admin.py", line 7, in <module>
    execfile(__file__)
  File "/home/tkralidi/work/foss4g/pycsw/master/pycsw/bin/pycsw-admin.py", line 234, in <module>
    admin.load_records(CONTEXT, DATABASE, TABLE, XML_DIRPATH, RECURSIVE)
  File "/home/tkralidi/work/foss4g/pycsw/master/pycsw/pycsw/admin.py", line 337, in load_records
    record = metadata.parse_record(context, exml, repo)
  File "/home/tkralidi/work/foss4g/pycsw/master/pycsw/pycsw/metadata.py", line 102, in parse_record
    return _parse_metadata(context, repos, record)
  File "/home/tkralidi/work/foss4g/pycsw/master/pycsw/pycsw/metadata.py", line 127, in _parse_metadata
    return [_parse_iso(context, repos, exml)]
  File "/home/tkralidi/work/foss4g/pycsw/master/pycsw/pycsw/metadata.py", line 844, in _parse_iso
    _set(context, recobj, 'pycsw:Title', md.identification.title)
AttributeError: 'NoneType' object has no attribute 'title'
When I inspect https://github.com/mattenp/testcase/blob/master/IFREMER-CSW_1000010_2009-02-24.xml more closely, this appears to be an ISO 19115/SeaDataNet profile type record, which is not supported by OWSLib (which pycsw uses to parse metadata). Are you working off a fork/custom OWSLib build that supports the SeaDataNet profile? (Aside: this would be a valuable enhancement to OWSLib.)
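The traceback shows `md.identification` coming back as `None` for this record, so the attribute lookup on `.title` fails. A minimal sketch of a defensive check that would turn this into a clearer error, assuming a parsed-metadata object with an `identification` attribute as in the traceback; `SimpleNamespace` stands in for the real OWSLib object, and `safe_title` is a hypothetical helper, not pycsw or OWSLib code:

```python
from types import SimpleNamespace


def safe_title(md):
    """Return md.identification.title, or raise a descriptive error.

    Hypothetical helper: `md` is assumed to expose an `identification`
    attribute that may be None when the parser does not recognise the
    record's profile.
    """
    identification = getattr(md, "identification", None)
    if identification is None:
        raise ValueError(
            "no identification section parsed; "
            "the metadata profile may not be supported"
        )
    return identification.title


# Stand-in objects for illustration only.
supported = SimpleNamespace(identification=SimpleNamespace(title="Cruise report"))
unsupported = SimpleNamespace(identification=None)

print(safe_title(supported))
try:
    safe_title(unsupported)
except ValueError as err:
    print("skipped:", err)
```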
@mattenp FYI there would need to be some enhancement to pycsw to support some of the Oracle specifics like geometry and full text search (Oracle text?). I'm cc'ing @msmitherdc, who I believe/heard is interested in this as well (we can open another issue related to Oracle).
@tomkralidis That's right, I work for SeaDataNet and I will do this. But it's the ISO19139/SeaDataNet profile for CSR and CDI: http://www.seadatanet.org/Standards-Software/Metadata-formats/CSR
I'm on holiday for the next 2 weeks; after that I will report a solution for OWSLib concerning ISO19139/SDN.
Well, I've uploaded another test file: https://github.com/mattenp/testcase/blob/master/ISO19139-example.xml
<gmd:organisationName>
<gco:CharacterString>commerciale de la société -- Centre for Ecology & Hydrology</gco:CharacterString>
</gmd:organisationName>
@tomkralidis @msmitherdc Yes, I'm very interested in this.
@mattenp are you able to make a local change/test? This should fix things:
diff --git a/pycsw/server.py b/pycsw/server.py
index f078e5e..55f1c1b 100644
--- a/pycsw/server.py
+++ b/pycsw/server.py
@@ -2323,7 +2323,7 @@ class Csw(object):
else: # it's XML
self.contenttype = self.mimetype
response = etree.tostring(self.response,
- pretty_print=self.pretty_print)
+ pretty_print=self.pretty_print, encoding='unicode')
xmldecl = '<?xml version="1.0" encoding="%s" standalone="no"?>\n' \
% self.encoding
appinfo = '<!-- pycsw %s -->\n' % self.context.version
@@ -2331,7 +2331,7 @@ class Csw(object):
LOGGER.debug('Response:\n%s' % response)
s = '%s%s%s' % (xmldecl, appinfo, response)
- return s.encode()
+ return s.encode('utf8')
def _gen_soap_wrapper(self):
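The idea behind the patch can be sketched in isolation: serialize the tree to a Unicode string first, prepend the XML declaration, then encode the whole response explicitly as UTF-8 instead of relying on a default codec. This is only an illustration under stated assumptions: it uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml's `etree`, and `serialize_response` is a hypothetical function, not the pycsw method being patched.

```python
import xml.etree.ElementTree as ET


def serialize_response(root, encoding="UTF-8"):
    """Serialize an element to bytes with a matching XML declaration.

    Hypothetical stand-in for the patched code path: stdlib ElementTree
    replaces lxml here so the sketch has no third-party dependency.
    """
    # encoding="unicode" yields a str with no XML declaration of its own.
    body = ET.tostring(root, encoding="unicode")
    xmldecl = '<?xml version="1.0" encoding="%s" standalone="no"?>\n' % encoding
    # Encode explicitly so the bytes agree with the declared encoding.
    return (xmldecl + body).encode(encoding)


# Illustrative element using text from the test record in this thread.
root = ET.Element("organisationName")
root.text = "commerciale de la soci\u00e9t\u00e9"

payload = serialize_response(root)
print(payload)
```

Encoding explicitly keeps the declaration and the actual bytes in agreement, so a client such as wget sees the same encoding the declaration promises.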
fix response encoding support (#276)
Applied to master and 1.10 branch.
Hi,
I use pycsw 1.10.0 only for CSW publication, not for harvesting. I use a PostgreSQL database with UTF-8 encoding, and that works. But although the XML in the database is UTF-8, the XML output in the browser isn't UTF-8, it's Windows-1252. The browser converts the encoding to UTF-8, so if I were to harvest, I wouldn't get UTF-8; only the browser converts the XML to Unicode. Do you know a solution for this issue?
an example:
Best regards,
Matthias