Use of xsi:type in AES output #111

Open
carlwilson opened this issue Sep 7, 2016 · 7 comments
Labels: bug (A product defect that needs fixing), P2 (Medium priority issues to be scheduled in a future release)

Comments

@carlwilson
Member

carlwilson commented Sep 7, 2016

Dev Effort

1D

Description

Both the WAVE and AIFF modules embed audio metadata in AES format without providing a schema. One of the produced elements makes use of xsi:type: <tcf:filmFraming tcf:framing="NOT_APPLICABLE" xsi:type="tcf:ntscFilmFramingType"/>.

Because the JHOVE schema does not validate embedded XML (processContents="skip"), the use of xsi:type does not cause a problem there. However, the METS and PREMIS schemas will validate embedded XML if a sufficient definition is available (processContents="lax").
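
To make the processContents modes concrete, here is a minimal Python sketch that reports the mode of every xs:any wildcard in a schema. The miniature schema and the function name wildcard_modes are illustrative assumptions; this is not the real JHOVE, METS or PREMIS schema text.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# What each mode means for XML embedded under the wildcard.
BEHAVIOUR = {
    "skip": "embedded XML is only checked for well-formedness",
    "lax": "embedded XML is validated where a declaration can be found",
    "strict": "embedded XML must have a declaration and be valid",
}

# Hypothetical miniature schema with three wildcards, for illustration only.
MINI_SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="skipValue">
    <xs:sequence><xs:any processContents="skip"/></xs:sequence>
  </xs:complexType>
  <xs:complexType name="laxValue">
    <xs:sequence><xs:any processContents="lax"/></xs:sequence>
  </xs:complexType>
  <xs:complexType name="defaultValue">
    <xs:sequence><xs:any/></xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def wildcard_modes(xsd_text):
    """Return the processContents mode of each xs:any, in document order."""
    root = ET.fromstring(xsd_text)
    # XML Schema treats a missing processContents attribute as "strict".
    return [el.get("processContents", "strict") for el in root.iter(XS + "any")]


if __name__ == "__main__":
    for mode in wildcard_modes(MINI_SCHEMA):
        print(mode, "->", BEHAVIOUR[mode])
```

Running the same function over the actual schema files would show which of them let embedded AES metadata through unchecked.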

When we import this element into a PREMIS document, it is not valid: xsi:type references a Type Definition (http://www.w3.org/TR/xmlschema-1/#xsi_type), so the validator attempts to validate the element against that explicitly asserted type.

The type tcf:ntscFilmFramingType cannot be resolved and causes validation to fail.
Looking into aes.org, we cannot find a schema describing the element in the namespace: http://www.aes.org/tcf.
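
The resolution step that fails can be sketched in Python as follows. The sketch only expands the xsi:type QName to a namespace-qualified name using the in-scope prefix declarations. A real validator would then look that name up in a schema, which is the part that fails here because no schema for http://www.aes.org/tcf is available. The function name resolve_xsi_types and the wrapper element in SAMPLE are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Illustrative wrapper around the element quoted in the description.
SAMPLE = (
    '<tcf:startTime xmlns:tcf="http://www.aes.org/tcf" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<tcf:filmFraming tcf:framing="NOT_APPLICABLE" '
    'xsi:type="tcf:ntscFilmFramingType"/>'
    "</tcf:startTime>"
)


def resolve_xsi_types(xml_text):
    """Return (element tag, expanded type name) for every xsi:type attribute."""
    scopes = [{}]   # stack of prefix -> namespace URI mappings
    pending = {}    # declarations reported just before the next start event
    found = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = payload
            pending[prefix] = uri
        elif event == "start":
            scope = {**scopes[-1], **pending}
            pending = {}
            scopes.append(scope)
            qname = payload.get(XSI_TYPE)
            if qname is not None:
                prefix, _, local = qname.rpartition(":")
                uri = scope.get(prefix)
                expanded = "{%s}%s" % (uri, local) if uri else local
                found.append((payload.tag, expanded))
        else:  # "end": leave the element's namespace scope
            scopes.pop()
    return found


if __name__ == "__main__":
    for tag, type_name in resolve_xsi_types(SAMPLE):
        print(tag, "asserts type", type_name)
```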

It appears the AES X098B schema is not publicly available yet (according to Gary).

@carlwilson carlwilson added the bug and legacy labels Sep 7, 2016
@gmcgath
Contributor

gmcgath commented Sep 7, 2016

The last I checked on that was several years ago.

@carlwilson carlwilson added this to the Testing Backlog Cleared milestone Sep 7, 2016
@carlwilson carlwilson removed the legacy label Dec 11, 2018
@ghost ghost added the P3 Low priority bugs label Mar 7, 2019
@ghost ghost modified the milestones: Legacy testing backlog cleared, Dev hack week initiation Mar 7, 2019
@ross-spencer

SourceForge: https://sourceforge.net/p/jhove/bugs/5/

@ross-spencer

Related to daitss/core#714

@ross-spencer

ross-spencer commented May 5, 2020

Analysis

PREMIS and METS ask for embedded XML to be validated against external schemas. We can make the JHOVE schema do the same by setting processContents="strict", so I created a strict version. Attached.

jhove-strict.zip

We then have the problem of being able to locate the external schema to validate against. I noticed FCLA referenced two versions of these documents. I do not know if we can reference the schema locations inline, so I think we have to change them in the global header to:

<?xml version="1.0" encoding="utf-8"?>
<jhove
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:aes="http://www.aes.org/audioObject"
    xmlns="http://schema.openpreservation.org/ois/xml/ns/jhove"
    xsi:schemaLocation="http://schema.openpreservation.org/ois/xml/ns/jhove
                        file:///home/user/.../jhove-strict.xsd
                        http://www.aes.org/tcf http://schema.fcla.edu/tcf.xsd
                        http://www.aes.org/audioObject http://schema.fcla.edu/audioObject.xsd">
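
For reference, xsi:schemaLocation is a whitespace-separated list of namespace/location pairs. Here is a small Python sketch that splits the value into a mapping, using the header above (closed as an empty element so that it parses on its own). The function name schema_locations is an assumption, and the elided local path is kept exactly as written.

```python
import xml.etree.ElementTree as ET

XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"

# The header from this comment, self-closed for the purposes of the sketch.
HEADER = """<jhove
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://schema.openpreservation.org/ois/xml/ns/jhove"
    xsi:schemaLocation="http://schema.openpreservation.org/ois/xml/ns/jhove
                        file:///home/user/.../jhove-strict.xsd
                        http://www.aes.org/tcf http://schema.fcla.edu/tcf.xsd
                        http://www.aes.org/audioObject http://schema.fcla.edu/audioObject.xsd"/>"""


def schema_locations(xml_text):
    """Map each namespace URI in xsi:schemaLocation to its schema location."""
    root = ET.fromstring(xml_text)
    tokens = root.get(XSI_SCHEMA_LOCATION, "").split()
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation must hold namespace/location pairs")
    # Even positions are namespaces, odd positions are locations.
    return dict(zip(tokens[0::2], tokens[1::2]))


if __name__ == "__main__":
    for namespace, location in schema_locations(HEADER).items():
        print(namespace, "->", location)
```

An odd number of tokens is a common slip when adding a new namespace, which is why the sketch rejects it outright.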

I have attached two copies of the AES schemas to run locally.

schmas-aes.zip

NB. Can they be hosted on openpreservation.org so that they are collected together?

Then the fun starts! There is a raft of validation errors when trying to validate the AES-based segment of the XML.

I don't have a 1.02b version of the audioObject schema to validate against, so let's look at the changes we need for 1.03b:

Original from JHOVE:

  <property>
    <name>AESAudioMetadata</name>
    <values arity="Scalar" type="AESAudioMetadata">
      <value>
        <aes:audioObject xmlns:aes="http://www.aes.org/audioObject" 
                         xmlns:tcf="http://www.aes.org/tcf" 
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
                         ID="J4" 
                         analogDigitalFlag="FILE_DIGITAL" 
                         disposition="Validated by JHOVE" 
                         schemaVersion="1.02b">
          <aes:format specificationVersion="1.3 (1989-01-04)">AIFF</aes:format>
          <aes:audioDataEncoding>PCM</aes:audioDataEncoding>
          <aes:byteOrder>BIG_ENDIAN</aes:byteOrder>
          <aes:firstSampleOffset>98</aes:firstSampleOffset>
          <aes:use useType="OTHER" otherType="JHOVE_validation"/>
          <aes:primaryIdentifier identifierType="FILE_NAME">/home/user/.../aiff-untitled.aiff</aes:primaryIdentifier>
          <aes:face direction="NONE" ID="J3" audioObjectRef="J4" label="Face">
            <aes:timeline>
              <tcf:startTime tcf:frameCount="30" 
                             tcf:timeBase="1000" 
                             tcf:videoField="FIELD_1" 
                             tcf:countingMode="NTSC_NON_DROP_FRAME">
                <tcf:hours>0</tcf:hours>
                <tcf:minutes>0</tcf:minutes>
                <tcf:seconds>0</tcf:seconds>
                <tcf:frames>0</tcf:frames>
              </tcf:startTime>
            </aes:timeline>
            <aes:region ID="J2" formatRef="J1" faceRef="J3" label="BuiltByJHOVE">
              <aes:timeRange>
                <tcf:startTime tcf:frameCount="30" 
                               tcf:timeBase="1000" 
                               tcf:videoField="FIELD_1" 
                               tcf:countingMode="NTSC_NON_DROP_FRAME">
                  <tcf:hours>0</tcf:hours>
                  <tcf:minutes>0</tcf:minutes>
                  <tcf:seconds>0</tcf:seconds>
                  <tcf:frames>0</tcf:frames>
                </tcf:startTime>
              </aes:timeRange>
              <aes:numChannels>2</aes:numChannels>
              <aes:stream ID="J90" label="JHOVE" faceRegionRef="J2">
                <aes:channelAssignment channelNum="0" mapLocation="LEFT"/>
              </aes:stream>
              <aes:stream ID="J91" label="JHOVE" faceRegionRef="J2">
                <aes:channelAssignment channelNum="1" mapLocation="RIGHT"/>
              </aes:stream>
            </aes:region>
          </aes:face>
          <aes:formatList>
            <aes:formatRegion ID="J1">
              <aes:bitDepth>16</aes:bitDepth>
              <aes:sampleRate>44100</aes:sampleRate>
            </aes:formatRegion>
          </aes:formatList>
        </aes:audioObject>
      </value>
    </values>
  </property>

Fixed-up (with changes needed to validate correctly):

  <property>
    <name>AESAudioMetadata</name>
    <values arity="Scalar" type="AESAudioMetadata">
      <value>
        <aes:audioObject xmlns:aes="http://www.aes.org/audioObject" 
                         xmlns:tcf="http://www.aes.org/tcf" 
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
                         ID="J4" analogDigitalFlag="FILE_DIGITAL" 
                         disposition="Validated by JHOVE" 
                         schemaVersion="1.03b">
          <aes:format specificationVersion="1.3 (1989-01-04)">AIFF</aes:format>
          <aes:audioDataEncoding>PCM</aes:audioDataEncoding>
          <aes:byteOrder>BIG_ENDIAN</aes:byteOrder>
          <aes:firstSampleOffset>98</aes:firstSampleOffset>
          <aes:use useType="OTHER" otherType="JHOVE_validation"/>
          <aes:primaryIdentifier identifierType="FILE_NAME">/home/user/.../aiff-untitled.aiff</aes:primaryIdentifier>
          <aes:face direction="NONE" ID="J3" audioObjectRef="J4" label="Face">
            <aes:timeline>
              <tcf:startTime frameCount="30" 
                             timeBase="1000" 
                             videoField="FIELD_1" 
                             countingMode="NTSC_NON_DROP_FRAME">
                <tcf:hours>0</tcf:hours>
                <tcf:minutes>0</tcf:minutes>
                <tcf:seconds>0</tcf:seconds>
                <tcf:frames>0</tcf:frames>
                <tcf:samples sampleRate="48000">
                  <tcf:numberOfSamples>999999</tcf:numberOfSamples>
                </tcf:samples>
                <tcf:filmFraming xsi:type="tcf:palFilmFramingType" framing="NOT_APPLICABLE"/>
              </tcf:startTime>
            </aes:timeline>
            <aes:region ID="J2" formatRef="J1" faceRef="J3" label="BuiltByJHOVE">
              <aes:timeRange>
                <tcf:startTime frameCount="30" timeBase="1000" videoField="FIELD_1" countingMode="NTSC_NON_DROP_FRAME">
                  <tcf:hours>0</tcf:hours>
                  <tcf:minutes>0</tcf:minutes>
                  <tcf:seconds>0</tcf:seconds>
                  <tcf:frames>0</tcf:frames>
                  <tcf:samples sampleRate="S48000">
                    <tcf:numberOfSamples>999999</tcf:numberOfSamples>
                  </tcf:samples>
                  <tcf:filmFraming xsi:type="tcf:palFilmFramingType" framing="NOT_APPLICABLE"/>
                </tcf:startTime>
              </aes:timeRange>
              <aes:numChannels>2</aes:numChannels>
              <aes:stream ID="J90" label="JHOVE" faceRegionRef="J2">
                <aes:channelAssignment channelNum="0" mapLocation="LEFT"/>
              </aes:stream>
              <aes:stream ID="J91" label="JHOVE" faceRegionRef="J2">
                <aes:channelAssignment channelNum="1" mapLocation="RIGHT"/>
              </aes:stream>
            </aes:region>
          </aes:face>
          <aes:formatList>
            <aes:formatRegion label="LABEL" ownerRef="J1" ID="J1">
              <aes:bitDepth>16</aes:bitDepth>
              <aes:sampleRate>44100</aes:sampleRate>
            </aes:formatRegion>
          </aes:formatList>
        </aes:audioObject>
      </value>
    </values>
  </property>

The primary change is the way the namespaces are referenced on attributes: the tcf: prefix has to be dropped from the attribute names (I don't know if there is another way to keep them as in the original JHOVE output, but the validator complained). There are also additional sequence requirements in the 1.03b schema (filmFraming and samples are two such examples):

    <aes:timeline>
      <tcf:startTime frameCount="30" 
                     timeBase="1000" 
                     videoField="FIELD_1" 
                     countingMode="NTSC_NON_DROP_FRAME">
        <tcf:hours>0</tcf:hours>
        <tcf:minutes>0</tcf:minutes>
        <tcf:seconds>0</tcf:seconds>
        <tcf:frames>0</tcf:frames>
        <tcf:samples sampleRate="48000">
          <tcf:numberOfSamples>999999</tcf:numberOfSamples>
        </tcf:samples>
        <tcf:filmFraming xsi:type="tcf:palFilmFramingType" framing="NOT_APPLICABLE"/>
      </tcf:startTime>
    </aes:timeline>

NB. Some of these are just placeholder values. They're unlikely to be accurate.
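
The attribute change can be illustrated with a short Python sketch. An attribute written with the tcf: prefix and the same attribute written without it have different names as far as a namespace-aware parser (and therefore a validator) is concerned, because unprefixed attributes are in no namespace. Presumably the schema declares these attributes as unqualified, which would explain the validator's complaint, but that is an assumption and has not been checked against the schema here. The function name attribute_names is also illustrative.

```python
import xml.etree.ElementTree as ET

# Attribute style used in the original JHOVE output.
PREFIXED = (
    '<tcf:startTime xmlns:tcf="http://www.aes.org/tcf" '
    'tcf:frameCount="30" tcf:timeBase="1000"/>'
)

# Attribute style used in the fixed-up output.
UNPREFIXED = (
    '<tcf:startTime xmlns:tcf="http://www.aes.org/tcf" '
    'frameCount="30" timeBase="1000"/>'
)


def attribute_names(xml_text):
    """Return the namespace-expanded attribute names of the root element."""
    return sorted(ET.fromstring(xml_text).attrib)


if __name__ == "__main__":
    print(attribute_names(PREFIXED))
    print(attribute_names(UNPREFIXED))
```

The element name is the same in both cases; only the attribute names differ.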

Additional changes are needed to the audioObject schema as well, where it uses the xlink:simpleLink attribute group. The W3C XLink schema is not compatible with the earlier one, and the simpleLink attribute group is now called simpleAttrs.

Ref: https://www.spatineo.com/ogc-w3c-xlink-transition-a-potential-validity-breaker/

<xsd:complexType name="locStringType">
    <xsd:attributeGroup ref="xlink:simpleLink" />
</xsd:complexType>

Becomes:

<xsd:complexType name="locStringType">
    <xsd:attributeGroup ref="xlink:simpleAttrs" />
</xsd:complexType>

Once all of these changes are made, the JHOVE output validates, including against the external schemas.

I've attached an original and modified version of the XML below:

original-and-modified-jhovexml.zip

And I've two sample files to generate this output. I'm happy to add these to the OPF Format Corpus sometime in the next few weeks.

audio-samples.zip

Questions

  1. There's a bit of fixing-up to do here to make use of the inline AES XML format, including some validation issues with the AES schema itself and its general availability. Do we want to continue with this output in these modules (AIFF and WAV)?
  2. Is there a simpler rendering of audio output data that can be embedded in JHOVE output instead, e.g. LoC AudioMD (which looks to be solving some of these problems), or something else?
  3. The AES format still does not look to be publicly available. The FCLA versions (now also attached to this issue) seem to be the only versions that are easy to get hold of (and those took some tracking down!)
  4. Is it worth enabling some form of strict validation rule in the JHOVE schema so that there is this additional line of testing around its output?
  5. I think that's about it! Though, as it looks like we do need to fix up these two modules, I wonder if there is someone with good a/v skills who could audit their output to see what else can be added and what a correct output might look like? That might also answer 1. and the future of the AES schema too. (The use of NTSC_NON_DROP_FRAME seems like a bit of a smell here for audio only?)

@carlwilson
Member Author

Hi @ross-spencer, I now think that I've been here fairly recently from another direction, namely this PR: #357 which is open and I suspect it fixes some of this. I now remember I got pretty deep in the leadup to 1.22 and then bottled it. The unpublished schema rings a bell. Will add myself to assignation and link the PR.

@carlwilson carlwilson linked a pull request May 5, 2020 that will close this issue
@carlwilson carlwilson self-assigned this May 5, 2020
@carlwilson carlwilson added the P2 label and removed the P3 and legacy labels May 5, 2020
@ross-spencer

Ah!! Okay. At a brief glance, the PR definitely looks to clean the logic up a bit. It'll be interesting to compare the output.

@MartinSpeller

@carlwilson carlwilson modified the milestones: Hackathon tasks , OPF Hackathon 2023 Tasks Jun 21, 2023
@carlwilson carlwilson removed this from the OPF Hackathon 2023 Tasks milestone Mar 6, 2024