Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic Interfaces #3

Open
wsc1 opened this issue Aug 6, 2018 · 21 comments
Open

Generic Interfaces #3

wsc1 opened this issue Aug 6, 2018 · 21 comments
Labels
seeking opinions up for technical discussion

Comments

@wsc1
Copy link
Member

wsc1 commented Aug 6, 2018

We need to settle in on generic codec interfaces.

In particular, do we want to do golang/image style compile time plugins or something more dynamic? Both?

Also, we need to iron out random access.

@wsc1 wsc1 added the seeking opinions up for technical discussion label Aug 6, 2018
@wsc1
Copy link
Member Author

wsc1 commented Aug 7, 2018

For completeness, I am repeating part of the dialog with @mewmew from his pull request:
"""
The generic level of the codec package is just a sketch at this point, so I
have no authoritative answers and invite dialog at a higher level to start
off.

The main questions to resolve for it are:

  • compile time plugin, runtime plugin, or both?
  • do we provide wrappers to OS level support?

If we provide runtime or OS wrappers, then the list of codecs available
would be higher, but non uniform across users of the codec package.

The non-uniformity would in turn mean that we'd have to be able to identify
the codec in use to debug programs, and also need to have a way to decide
what codec to use when there is more than one option.

The idea I had in mind with Manufacturer was to help identify the codec, ie
who provides it.

The idea I had in mind with Extensions is filename extensions.

I see you are an author of a Go FLAC package, which is great! Do you have
any thoughts about this? How would you like to see github.com/mewkiz/flac
interfacing with a generic codec package?
"""

@mewmew
Copy link
Member

mewmew commented Aug 8, 2018

I see you are an author of a Go FLAC package, which is great! Do you have
any thoughts about this? How would you like to see github.com/mewkiz/flac
interfacing with a generic codec package?

I have a proof of concept decoder ready to be reviewed for FLAC :) I did not really make use of the generic codec interface though. What are the benefits of doing so? I.e. what are the intended consumers of this interface?

Note: The proof of concept flac zikichombo frontend contains some bug which makes the playback of music insert some incorrect samples once every second or something. This is probably due to an off-by-one error or an incorrect calculation of the stride between channels in the dst slide of Receive.

The idea I had in mind with Manufacturer was to help identify the codec, ie
who provides it.

To be honest, I dislike this idea with a passion. It feels like the way FLAC tried to do with vendor IDs, and its a mess to have a central register around it, both in maintenance and prevents open source repos to easily provide implementations as you have to register the ID to avoid collisions.

To communicate the feeling better, imagine trying to use a library in C, as compared to Go. In Go the package has a fully qualified import path with by convention contains the domain name for packages released to the public. In contrast, in C you have to make sure that library names don't collide, which is more or less impossible in the every dynamic nature of open source development. To handle collisions, Go relies on DNS which is already distributed in nature and handle collision by unique domain name registration.

That being said, I don't know exactly what the solution may look like. Only that I would veto having a Manufature/vendor ID that requires a global register. Haha, given that I have any veto power in this issue of course :p

mewmew added a commit to mewpull/codec that referenced this issue Aug 8, 2018
As noted in zikichombo#3 (comment)
there are some bugs to be ironed out.
@mewmew
Copy link
Member

mewmew commented Aug 8, 2018

My brother and I developed the flac library together. Inviting him to the discussion :) cc: @karlek

@wsc1
Copy link
Member Author

wsc1 commented Aug 8, 2018

Great! Thanks

First, I guess we can add int32 versions of ToFixed/etc in sound/sample.

What are the benefits of doing so? I.e. what are the intended consumers of this interface?

Second, for the purpose of the generic interface: The main consumer is a developer working on any aspect of sound not related to encodings: playback, music information retrieval, voip, etc. By proxy, their end users are also consumers. Many of these usages would like to treat the codecs as a black box, at least as much as possible. Developers would like to be able to respond to the question of what codecs they support. The generic codec package is intended to allow a developer to just say "we support [...]" with [...] being the list of codecs supported here.

As you have observed, the generic codec interface is already halfway done: just implement sound.{Source,Sink,RadomAccess} and it plugs in nicely, without even needing to reference zikichombo.org . thanks to Go's interfaces. So one option is to do just that.
[update/correction: there is zikichombo.org/sound/freq.T referenced in the interface, but that is the only thing to reference, so it is at least very light weight]

The umbrella-level part is about

  • mapping filename (and url) extensions and requests for read, write, ARA (=audio random access) to functioning code.
  • providing lists of codecs with capabilities so codecs and their implementations can be chosen when needed. Doing this at runtime would enable developers to provide this functionality to their users.
  • trying to make things consistent between the different use cases (source,sink,ara) for a given codec.
    Although in principle codecs are transparent to their implementations, in practice they simply are not.
    So there's always more risk of incompatability when using different implementations for say encoding vs decoding or when someone wants a different bitrate for example.
  • trying to maximise portability across platforms for the functionality/codecs that are out there.
  • The functionality may also grow to include meta-data.

In order to at once enable more use of existing Go codec implementations and guarantee IP and quality, zikichombo vets upstream suppliers of codecs for inclusion in the codebase. In order to to have import "xyz" with "xyz" not in the standard library or based at zikichombo.org. we need an agreement on release guidelines and guarantees of upstream suppliers, so that we know how to incorporate the code safely for our users, and so that users can know that all dependencies of zikichombo.org code are license compatible. This aspect is not so much related to "manufacturer" in the API, which is intended more for the runtime codec selection mentioned above.

We do not want to prevent or discourage distributed open source development, but as we have no choice for what happens when someone does "go get zikichombo.org", we do want to be able to control for quality of the result (as a matter of process, even though we are still in alpha), just as you want to control for what dependencies you use in your flac decoder. Just implementing the Source/Sink/RandomAccess interfaces is one option for distributed development. Maybe we can do something similar for filename extensions as well.

That said, I think it would be a great demonstration of cooperation and very useful if users could find a functional, reliable list of supported codecs for doing sound in Go in one place. Zikichombo can provide that, and it would be great to add FLAC to the list.

@mewmew
Copy link
Member

mewmew commented Aug 9, 2018

As you have observed, the generic codec interface is already halfway done: just implement sound.{Source,Sink,RadomAccess} and it plugs in nicely, without even needing to reference zikichombo.org. thanks to Go's interfaces. So one option is to do just that.

I like this design, a lot.

Maybe we can do something similar for filename extensions as well.

For file name extensions, we could follow the example set by the image library. Something along the lines of image.RegisterFormat.

Note, this is the way Azul3D incorporated audio decoding, and it seemed to work quite well. Have a look at audio.RegisterFormat if you haven't already for some inspiration.

That said, I think it would be a great demonstration of cooperation and very useful if users could find a functional, reliable list of supported codecs for doing sound in Go in one place. Zikichombo can provide that, and it would be great to add FLAC to the list.

Definitely!

@wsc1
Copy link
Member Author

wsc1 commented Aug 9, 2018

Thanks for the pointers.

I think we have more functionality to address than Azul3D's audio.RegisterFormat, most importantly, encoding and random access. In audio, my impression is that there is also more heavy usage and reliance on metadata than images, which can also be read/write.

However, I also think it would be best to address decoding+encoding+random access first.

It looks like Azul3D's "magic" let's codecs look at the first few bytes of data to say whether or not they volunteer to the task. It doesn't look documented exactly how the codec implementation is selected.

Also, I never understood why the Go standard library image package didn't allow to register encoders.

So my thoughts after reading your comments are to do something like RegisterFormat, but for encoding, decoding, and random access, and to document the selection mechanism so that somehow it can either be controlled directly, or predicted by developers in such a way that they can manipulate what codec actually gets used.

Thoughts?

@mewmew
Copy link
Member

mewmew commented Aug 10, 2018

random access

Just to clarify, when you talk about random access, do you simply mean seek support? Just checking so I interpret it correctly.

It looks like Azul3D's "magic" let's codecs look at the first few bytes of data to say whether or not they volunteer to the task. It doesn't look documented exactly how the codec implementation is selected.

I think this is mostly implemented in the same fashion as the Go image library. Peek on a few bytes to determine file format by magic values. If there are more than one codec for a specific file format, then the first file format to register the magic value will get precedence. At least this is how it works in the Go image library:

https://golang.org/src/image/format.go?s=1005:1129#L63

Also, I never understood why the Go standard library image package didn't allow to register encoders.

Good point, it only allows to register decoders. My guess is that decoders need less options specified, whereas encoders can be very specific and thus providing a general interface may be bloated and adding too many layers of abstraction. Just a guess though. For instance, if you want to encode an animation (e.g. GIF), you have to provide option to do so in the encoder; but many of the other formats don't have support for animations.

Similar comparisons can probably be made for sound codecs. That being said, it would perhaps make sense to provide a general encoding interface, which selects "good defaults" for each codec, and let users use the specific encoders directly if they require more control of the encoding options. I do not think it should be a goal to support everything in the general interface. This can also be a guiding principle when it comes to metadata support. Only let the general interface support the most basic metadata (e.g. key-value pairs), and use specific encoders/decoders for more exotic metadata.

Well, these are only preliminary thoughts and they may change as we explore the problem and solution space more.

mewmew added a commit to mewpull/codec that referenced this issue Aug 10, 2018
As noted in zikichombo#3 (comment)
there are some bugs to be ironed out.
@wsc1
Copy link
Member Author

wsc1 commented Aug 10, 2018

Thanks for your thoughts.

First, for random access, the sound.RandomAccess interface is what I had in mind; not
ARA, sorry for any confusion.

Second, after reading your thoughts, I agree that limiting the functionality to a bare minimum at the registration level is desirable, and let direct interaction with individual codecs deal with everything else.

For me, bare minimum would however include functionality for all of sound.{Source,Sink,RandomAccess}. Obviously RandomAccess would either be unavailable or of limited efficiency for most compressed formats. Unfortunately, there are more pure Go decoders than encoders.

I would go ahead and fork a bare bones proposal, but I'm still thinking about if/how easy control of selection can be done, hopefully obviating the need for the developer to deal with package initialisation order.

@wsc1
Copy link
Member Author

wsc1 commented Aug 14, 2018

For info, I have created a "bare bones" pull request at #11

Upon coding it, it occurred to me that the role of the sample.Codec in the interface is important. Some codecs implicitly require or rely on the notion of a sample.Codec, and others, mostly related to perceptual based compression do not really have a notion of sample.Codec as the bitDepth varies during the encoding/decoding process and according to frequency range.

This is handled in the proposed generic interface, and was not until now part of the discussion.

The import comment was removed just so I could compile in a wsc1 fork, it is not part of the proposal.

Thoughts appreciated.

@mewmew
Copy link
Member

mewmew commented Aug 14, 2018

To keep discussion mostly in one place, I'll add the comment from #11 (review)

@wsc1 Looks like a good start. I've added some comments for minor details.

Regarding the overall API, we can only determine if it is good once we have lets say two or three encoders and decoders for various formats (especially for formats which are not similar to each other, e.g. varying sample bit rate in the same sound file).

For this reason, I think that the API should be marked experimental, and only finalized once the above condition holds true. At which point we can evaluate if the API is a good match for the various different encoders/decoders.

Cheers,
/u

@wsc1
Copy link
Member Author

wsc1 commented Aug 14, 2018

Thanks. I'm glad you find it to be a good start.

To summarise your more detailed comments that I feel can be continued here, there are:

  1. Sniff() vs Magic
  • Sniff() would make the overall interface larger.
  1. sample.SInt8 vs sample.Byte
  • It is just signed/unsigned 2-s complement representation. In the unsigned case,
    0b10000000 is at what zero is in the signed case.
  1. sample.SCodec zero value as codec.AnySampleCodec.
  • Perhaps, as it would entail inter-module commits, I had decided not.
  1. Clarifying when AnySampleCodec should be used for a Decoder.
    I think you and I are trying to say the same thing, but your tentative interpretation
    was a bit more precise, thanks.

About marking experimental, there is project maintenance overhead associated with making this distinction explicit rather than implicit via the fact it is in alpha and semver. I have addressed
this question of which parts of the interface are most likely to change previously elsewhere on the project by using tags on issues "interface changing".

As this issue is referenced as a design discussion, I think anyone following would conclude it is experimental.

Given all this, could you make more precise what you would like to see in terms of marking it experimental?

@wsc1
Copy link
Member Author

wsc1 commented Aug 14, 2018

It also occurred to me that it may be worth adding a function member of Codec which
allows creating a sound.SourceSeeker, for players or player servers. But then again, maybe players would just try to type-assert the result.

I can't think of a corresponding use case for sound.SinkSeeker

Thoughts?

@wsc1
Copy link
Member Author

wsc1 commented Aug 15, 2018

As an update, I simplified the codec selection mechanism so there is no sense of codec-implementor priority, just package path. It now uses reflect and the package path of the encoding/decoding functions of a codec. It

  1. Enforces that all these functions be supplied in one package.
  2. Allows a consumer of the codec package to select codec implmentations by filtering on their package path if so desired.

Still working on integrating existing codecs.

@mewmew
Copy link
Member

mewmew commented Aug 15, 2018

About marking experimental, there is project maintenance overhead associated with making this distinction explicit rather than implicit via the fact it is in alpha and semver. I have addressed
this question of which parts of the interface are most likely to change previously elsewhere on the project by using tags on issues "interface changing".

As this issue is referenced as a design discussion, I think anyone following would conclude it is experimental.

Sounds good to me. Let sem ver cover this.

@wsc1
Copy link
Member Author

wsc1 commented Aug 18, 2018

New PR #12

Looking for community opinions.

@wsc1
Copy link
Member Author

wsc1 commented Aug 21, 2018

Proposal: rename Codec to Godec

just an idea.

@mewmew
Copy link
Member

mewmew commented Aug 21, 2018

Proposal: rename Codec to Godec

Why? Codec makes more sense to me at least.

@wsc1
Copy link
Member Author

wsc1 commented Aug 21, 2018

maybe it's too simple to merit being called a codec :)

@mewmew
Copy link
Member

mewmew commented Aug 22, 2018

maybe it's too simple to merit being called a codec :)

Ah. Well package encoding only provides interfaces for encoding implementations, but does not implement an encoding. In that sense, codec does the same.

@wsc1
Copy link
Member Author

wsc1 commented Aug 22, 2018

Yes, my feeling is that the main difference with sound is that sound, to me, is inherently flowing and not so easily capturable by an object/type per se. Not only in that it is stream-like but also in the physical/perceptual sense -- a block of samples is not really what a sound is in this sense because, for example, you will hear a click when it starts or stops unless it has fade in/out.

So encoding to blocks of bytes, as in the encoding package, feels awkward to me as a high level functionality to provide for sound.

mewmew added a commit to mewpull/ext that referenced this issue Sep 3, 2018
As noted in zikichombo/codec#3 (comment)
there are some bugs to be ironed out.
@wsc1
Copy link
Member Author

wsc1 commented Sep 7, 2018

I think the package selection mechanism would be better (and this one may be buggy because of pointers and types) if done like in sio, where host.Entry is an interface, and reflect.TypeOf gives the implementing type.

Should be fixed in this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
seeking opinions up for technical discussion
Projects
None yet
Development

No branches or pull requests

2 participants