This repository has been archived by the owner on Mar 26, 2021. It is now read-only.

Support loading data at runtime, or by configuring a different location for a preinstalled version #1

Closed
jellybob opened this issue Mar 24, 2021 · 84 comments

Comments

@jellybob
Owner

jellybob commented Mar 24, 2021

No license lawyering please - unless you're someone able to speak on behalf of freedesktop.org your interpretation of the GPL isn't going to add anything to this conversation

See rails/rails#41750 and mimemagicrb#97 for background.

In order to cause minimal impact on existing users of the mimemagic gem, particularly people using Rails, I'm going to have it load MIME types from a preinstalled version of the Freedesktop MIME types database, rather than bundling one with the gem. This will require having a copy of that either installed with your distribution, or obtained in some other way. The availability of that will be checked at build time.

@minad

minad commented Mar 24, 2021

Maybe you want to synchronize with @coding-bunny, who is attempting to replace mimemagic with another gem. That could also be a viable approach.

@jellybob
Owner Author

I honestly see the alternative of using a different gem as a non-starter, since that gem only supports matching on file extension.

@Scharrels

There seems to be a parallel effort over here: https://github.com/Deradon/mimemagic/tree/fetch-mine-data-dynamically

It might be useful to combine these efforts.

@stevenhaddox

stevenhaddox commented Mar 24, 2021

The license file here seems to be GPLv2 which is a large part of the problem many have with the newest release of the original gem from my understanding.

Is there any intent to find an older version to start this fork from that has a prior license option or will this fork require GPLv2?

EDIT: This comment was made before I realized the yank of the older versions occurred due to files that were being used in the older version in a way that likely violated the license those files were released under. Please disregard.

@minad

minad commented Mar 24, 2021

From my side it is okay to take the newest commit and revert back to MIT. But the tables.rb and freedesktop.org.xml must not be distributed as part of the gem.

@jellybob
Owner Author

Digging into this further it seems that the XML file isn't the source of truth for this gem, but instead a Ruby class generated from that file, so there are in fact a few steps:

  1. Locate an XML source file, either pulled remotely or via environment variable.
  2. Build lib/mimemagic/tables.rb somewhere. (Possibly it would work to pipe it all through eval, but that makes me uncomfortable)
  3. Load lib/mimemagic/tables.rb

The parallel effort going on appears to just pull down the XML file at build time, rather than having it in source control, but the final gem will still include that file and therefore remain GPL licensed.

@jellybob
Owner Author

@minad I'm seeing 4 test failures on master currently - are they expected?

$ rake test
Run options: --seed 26532

# Running:

.F...FF....F.

Finished in 0.173703s, 74.8404 runs/s, 299.3616 assertions/s.

  1) Failure:
TestMimeMagic#test_recognize_by_magic [/Users/jonwood/Code/mimemagic/test/mimemagic_test.rb:83]:
--- expected
+++ actual
@@ -1 +1,2 @@
-"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
+# encoding: ASCII-8BIT
+"application/zip"


  2) Failure:
TestMimeMagic#test_recognize_all_by_magic [/Users/jonwood/Code/mimemagic/test/mimemagic_test.rb:92]:
--- expected
+++ actual
@@ -1 +1 @@
-["application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", "application/zip"]
+["application/zip"]


  3) Failure:
TestMimeMagic#test_recognize_extensions [/Users/jonwood/Code/mimemagic/test/mimemagic_test.rb:55]:
--- expected
+++ actual
@@ -1 +1,2 @@
-"text/html"
+# encoding: ASCII-8BIT
+"application/xhtml+xml"


  4) Failure:
TestMimeMagic#test_recognize_by_a_path [/Users/jonwood/Code/mimemagic/test/mimemagic_test.rb:64]:
--- expected
+++ actual
@@ -1 +1,2 @@
-"text/html"
+# encoding: ASCII-8BIT
+"application/xhtml+xml"


13 runs, 52 assertions, 4 failures, 0 errors, 0 skips
rake aborted!
Command failed with status (1)
/Library/Ruby/Gems/2.6.0/gems/rake-12.3.2/exe/rake:27:in `<top (required)>'
Tasks: TOP => test
(See full trace by running task with --trace)

@khalilovcmd

khalilovcmd commented Mar 24, 2021

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

I can at least comment on this having a history of monkey patching and instability.

mimemagicrb#39
mimemagicrb#86

@minad

minad commented Mar 24, 2021

@jellybob You may also consider distributing a GPL-licensed package including freedesktop.xml+tables.rb+LICENSE. Then the gem could pull that at runtime, offloading the generation process somewhere else. These failures are probably due to using a newer database version and encoding changes in ruby.
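
An editor's sketch of one plausible contributor to the first two failures, offered as an illustration rather than a diagnosis: an .xlsx file is a ZIP container, so any check that only inspects the leading bytes sees a plain ZIP archive. The helper name `zip_container?` is hypothetical and is not part of mimemagic.

```ruby
# Local file header signature that every ZIP archive starts with.
# OOXML documents (.xlsx, .docx) are ZIP containers, so they share it.
ZIP_MAGIC = "PK\x03\x04".b.freeze

# Hypothetical helper (not a mimemagic API): true when the given bytes
# begin with the ZIP signature. Both sides are compared as binary strings.
def zip_container?(bytes)
  bytes.b.start_with?(ZIP_MAGIC)
end

puts zip_container?("PK\x03\x04[Content_Types].xml")
puts zip_container?("<html></html>")
```

Telling a spreadsheet apart from a generic archive therefore needs a rule that looks further into the container, which is the kind of rule that can differ between database versions.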

@tenderlove

I don't want to have too many cooks in the kitchen, but couldn't we add an extconf.rb to the gem? It could download the xml file and generate the rb file when the gem is installed on target systems. No GPL code or files would be distributed with the gem.

@jellybob
Owner Author

I don't think that really helps anything, as you're still pulling in a GPL dependency. I'm honestly a little bit dubious that pulling in the Freedesktop XML at runtime does much for strict license compliance as well - I'm attempting to get in touch with the maintainer of that file to ensure this approach does in fact result in a compliant gem.

@jellybob
Owner Author

Just to revise that statement, I'm 90% confident that using a pre-existing install of the file is going to be safe, as there's no distribution or attempts to do an end run round GPL licensing involved in that, so I'll push on with that path. I'm less confident that including code that goes and downloads wouldn't be considered as against the spirit of the license.

@Deradon

Deradon commented Mar 24, 2021

WDYT about this:

  • Build the tables.rb during runtime while installing the gem?

We could use pre_install_hooks when doing gem install.
Still, feels quite hacky tbh.

So the gem would not share any derived copies of GPL licenced work.

@coding-red-panda

How would that work on platforms using this gem that don't have the required XML files installed?
Let's say a Windows / macOS system?

@Deradon

Deradon commented Mar 24, 2021

I was just thinking about fetching the XML during gem install from https://gitlab.freedesktop.org/xdg/shared-mime-info/-/blob/2.1/data/freedesktop.org.xml.in.

UPDATE: More or less doing a rake tables during gem install, based on my quick'n dirty PoC. (ofc rake tables itself would not work, so this would have to be achieved some other way.)

@jellybob
Owner Author

Given the generation doesn't actually take very long I'm inclined to do this at runtime - the alternative requires a bunch of hackery in a pre install hook, and potentially confusing error messages during the install. People who are concerned about the amount of time it might take to download the source file at runtime can make sure the machine they're running on has the file available beforehand.

@Deradon

Deradon commented Mar 24, 2021

Doing it at runtime would be a showstopper for anyone whose rails application runs w/o access to the public internet.

(e.g. build docker image, which fetches data from public internet, then deploy it internally, where you don't have access)

@jellybob
Owner Author

That's why there'll be the option to load a pre-existing version of the file via an environment variable.
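
A minimal sketch of that override, assuming the environment variable is named `FREEDESKTOP_MIME_TYPES_PATH` as it is elsewhere in this thread. The method name `load_mime_xml` and the injectable `download` callable are hypothetical, and this is not the code that shipped in the gem.

```ruby
# Hypothetical helper: return the MIME database XML as a string.
# If FREEDESKTOP_MIME_TYPES_PATH is set, read that file and never touch
# the network; otherwise fall back to the supplied download callable.
def load_mime_xml(env: ENV, download: -> { raise "no downloader configured" })
  path = env["FREEDESKTOP_MIME_TYPES_PATH"]

  if path && !path.empty?
    # Fail loudly on a bad path rather than silently downloading instead.
    unless File.file?(path)
      raise ArgumentError, "FREEDESKTOP_MIME_TYPES_PATH points at a missing file: #{path}"
    end
    return File.read(path)
  end

  download.call
end
```

Passing the environment and the downloader in as arguments keeps the offline case (for example an internally deployed Docker image) testable without network access.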

@jesseclark

> WDYT about this:
>
> * Build the `tables.rb` during runtime while installing the gem?
>
> We could use pre_install_hooks when doing gem install.
> Still, feels quite hacky tbh.
>
> So the gem would not share any derived copies of GPL licenced work.

Seems like this approach could still trigger the GPL requirement. Here is a freedesktop contributor indicating that they believe the GPL applies to the db/xml itself.

@jellybob
Owner Author

Unless you're able to speak on behalf of freedesktop.org discussions of what the GPL does or does not require are just adding noise here. Please keep discussion to the actual implementation of the plan described.

@jesseclark

> Unless you're able to speak on behalf of freedesktop.org discussions of what the GPL does or does not require are just adding noise here. Please keep discussion to the actual implementation of the plan described.

I linked to a thread where contributors to freedesktop.org are discussing exactly the issues that are also being discussed in this thread. It seemed like relevant information to take into consideration for the "plan described". Just trying to be helpful but I'll not add any more "noise" 👍🏼 .

@jellybob
Owner Author

Apologies for the slightly rough tone there. Just to clarify, I'm talking to the maintainers at the moment about what would be needed (if it's at all possible) to be compliant with both the legal terms of the license, and more generally the spirit of that license.

@hadess

hadess commented Mar 24, 2021

> Unless you're able to speak on behalf of freedesktop.org discussions of what the GPL does or does not require are just adding noise here. Please keep discussion to the actual implementation of the plan described.

There isn't anyone that can do that. Individual contributors hold their own copyright. I'm but one of a number of contributors to the module where the freedesktop.org.xml originates from.

> 1. Support for pulling down the freedesktop.org definitions at application startup. This will be the default behaviour.
>
> 2. Support for setting the environment variable `FREEDESKTOP_MIME_TYPES_PATH`, which will disable downloading at runtime, and instead load the data from a different location.

Both look like good options to me, as long as the files are used as data files, and not used to create code from. Otherwise you should use the update-mime-database tool to process the XML and create cache files (those caches are part of the shared mime info specification).

@jellybob
Owner Author

@hadess thanks for the reply. I believe the proposed solution falls within the usage you've described, so in lieu of anyone being able to definitively speak on this I'm going to go ahead and finish implementing this.

@jellybob
Owner Author

For anyone following along at home, I think we can also skip over the whole thing of generating a Ruby class, and instead just parse the mime type database directly into constants. The generation of a source file is just an optimisation which doesn't really buy us anything if we're doing it all at runtime anyway.
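
As a rough illustration of parsing the database directly instead of generating a source file, here is a sketch using REXML, which ships with standard Ruby installs. The method name `parse_mime_database` and the returned hash layout are assumptions made for this example, not the gem's actual structure, and only simple `*.ext` glob patterns are handled.

```ruby
require "rexml/document"

# Hypothetical helper: turn shared-mime-info XML into lookup hashes at
# runtime, with no intermediate generated Ruby file.
def parse_mime_database(xml)
  extensions = {} # e.g. "html" => "text/html"
  parents = {}    # e.g. "application/xhtml+xml" => ["application/xml"]

  doc = REXML::Document.new(xml)
  doc.root.each_element do |mime|
    next unless mime.name == "mime-type"
    type = mime.attributes["type"]

    mime.each_element do |child|
      case child.name
      when "glob"
        pattern = child.attributes["pattern"]
        # Keep only plain "*.ext" globs; the first type seen wins.
        match = pattern && pattern.match(/\A\*\.([^*?\[\]]+)\z/)
        extensions[match[1].downcase] ||= type if match
      when "sub-class-of"
        (parents[type] ||= []) << child.attributes["type"]
      end
    end
  end

  { extensions: extensions, parents: parents }
end
```

Matching on element names rather than XPath strings sidesteps the default XML namespace that the real file declares on its root element.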

@fooishbar

Thanks for the pointer @jellybob. I'm one of the people who runs freedesktop.org, but @hadess is correct. fd.o doesn't hold copyright assignments - the copyright belongs to whoever authored the code - so any decisions on copyright and licensing, including enforcement, are at the total discretion of the authors. In this case, @hadess holds substantial copyright on shared-mime-info so was entitled to take the action, and even if I wanted to I can't tell him otherwise.

I agree with @hadess that your interpretation of the license is correct - with the caveat that I am not a lawyer. Transforming the GPLed XML definitions into Ruby certainly retains the GPL obligations and implications on that Ruby. Loading and parsing the data at runtime changes that situation in two ways:

  • you are using an abstract interface to the data files; you could just as easily pull a non-GPLed dataset and use that instead, so in this case the virality of the GPL is less applicable (this is heavily paraphrased and should not be taken as legal gospel), and
  • the obligations of the GPL are primarily invoked upon redistribution rather than use; executing code as you're proposing here is very different from distributing code as you were previously doing

Thanks so much for your co-operation and helpful attitude. We really appreciate it, especially given the additional burden it introduces on your users.

@wwahammy

I only have one request: please make sure this doesn't break things where it's possible for those of us who can use GPL2+ code to be able to include the data in the Gem. I'd like to still use the original mimemagic because it's simpler. :)

@jsteinberg

jsteinberg commented Mar 24, 2021

> I don't want to have too many cooks in the kitchen, but couldn't we add an extconf.rb to the gem? It could download the xml file and generate the rb file when the gem is installed on target systems. No GPL code or files would be distributed with the gem.

Could we still use extconf.rb to download the xml just not generate a rb file?

Downloading at runtime seems inefficient and introduces an external dependency to application boot.

@fooishbar

@jellybob I’m afraid we don’t have a canonical location for a post-processed file. To be honest I’m pretty glad we don’t, since we’re not behind a CDN or anything fancy ... there are a lot of Rails installs happening!

Perhaps you could serve the post-processed file from a GH repo alongside this one? AIUI the only difference is translations.

@ziggythehamster

Without having looked at that other PR (I'm trying to land one of my own for an unrelated and hopefully more fun thing), my suggestion would be to say something like:

Install your operating system's shared-mime-info package, or fetch the XML file from the Debian package:

  1. Visit https://packages.debian.org/sid/amd64/shared-mime-info/download and download the .deb file.
  2. Install the command-line version of 7-Zip for your platform (sometimes called p7zip)
  3. Run this command: 7z x -so shared-mime-info_2.0-1_amd64.deb data.tar | 7z e -sidata.tar './usr/share/mime/packages/freedesktop.org.xml'

I would suggest using ar + tar, but ar cannot output to stdout (what) and tar needs --strip-components=4 to avoid creating a hierarchy in the current directory.

@ziggythehamster

In fact, if you use double quotes for the path, it runs on Windows verbatim:

M:\>7z x -so shared-mime-info_2.0-1_amd64.deb data.tar | 7z e -sidata.tar "./usr/share/mime/packages/freedesktop.org.xml"

7-Zip 19.00 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2019-02-21


Extracting archive: data.tar
--
Path = data.tar
Type = tar
Code Page = UTF-8

Everything is Ok

Folders: 166
Files: 92
Size:       4773833
Compressed: 140800

M:\>dir *.xml
 Volume in drive M is rpool_HOME
 Volume Serial Number is BA16-F57C

 Directory of M:\

10/09/2020  10:26         2,341,534 freedesktop.org.xml
               1 File(s)      2,341,534 bytes
               0 Dir(s)   8,020,033,536 bytes free

So change the above to use double quotes and then the instructions work for every major OS.

@jellybob
Owner Author

@ziggythehamster this is great, thank you, I've added instructions based on your exploration to the readme.

@jellybob
Owner Author

#3 has now been merged into master here, I'm currently talking with @minad about getting it released.

@pezholio

Amazing work. Thanks for all your work on this @jellybob and others. I'm sure I speak for all of the community when I say I really appreciate it! 👍

@Deradon

Deradon commented Mar 25, 2021

As #3 has been merged already I'll copy my comment over here:

When releasing this as a 0.3.x version, will this possibly break on Windows machines if you're creating a new rails project?
Rails still has a soft-dependency on 0.3.x.
So I'd personally like to at least bump the minor version so the rails maintainers can explicitly decide to go w/ the approach here.
(I'd like to avoid another left-pad-situation where suddenly rails install fails for a lot of projects)

@jellybob
Owner Author

Also copying my reply from that thread :)

Yes, it will potentially break creating a new Rails project on Windows, but that feels like a better option than quietly imposing GPL 2 licensing on every new Rails project which is the current situation. We're also looking at yanking 0.3.6 because of those licensing implications, so in practice whatever happens here new Rails projects on Windows are going to be broken to some degree.

Ultimately I don't think there's any way to avoid some degree of pain for users while still complying with license terms, and I consider abiding by those terms to take precedence for both legal and moral reasons.

@jellybob
Owner Author

Mimemagic has been moved to its own org, so I'm also redirecting discussion of this to the repo over there, as shown above. Closing this one.

@Deradon

Deradon commented Mar 25, 2021

@jellybob Just to let you know, can't comment yet on issues there.

An owner of this repository has limited the ability to comment to users that have contributed to this repository in the past.

Not sure if intended or not.

@jellybob jellybob reopened this Mar 25, 2021
@jellybob
Owner Author

@Deradon opening this one back up for now - they are indeed locked down temporarily while we get everything moved over and a release out to try and keep the noise down.

@Deradon

Deradon commented Mar 25, 2021

I'd strongly suggest letting the rails maintainers know when you release this version and what implications this release has.
Would like to avoid, or at least mitigate, another left-pad-situation where suddenly bundle install does not work in a (new) rails project, for at least some users, and issues pile up in the rails/rails repo.

@jellybob
Owner Author

Yup, we will be doing that.

@jellybob
Owner Author

Closing this issue now as 0.3.7 has been released.

@ljharb

ljharb commented Mar 25, 2021

@jellybob just so it’s explicit, can you (and ideally, @hadess) confirm that to the best of your knowledge, v0.3.7 is properly MIT-licensed?

@jellybob
Owner Author

This has been discussed with both @hadess and @fooishbar, and yes, to the best of all of our knowledge 0.3.7 is legitimately licensed as MIT since we no longer distribute any GPL licensed data with this gem.

@hadess

hadess commented Mar 25, 2021

> @jellybob just so it’s explicit, can you (and ideally, @hadess) confirm that to the best of your knowledge, v0.3.7 is properly MIT-licensed?

Sorry, but I'm not going to do that. You should ask a lawyer.

I don't intend to file a DMCA takedown request against the repo at this point though ;)

@ljharb

ljharb commented Mar 25, 2021

I indeed have done so :-) but thanks, that’s sufficient for a public answer here.

@jellybob
Owner Author

@ljharb I'd be really curious to hear the outcome of that if you're happy to share it, either publicly or more privately.

@fooishbar

Same from the fd.o side; my email is daniel@fooishbar.org and Bastien's is pretty easily findable as well. It would be really good to understand what you've gleaned from this. We've worked with SFC before and they've always been extremely sensible.

@ipepe

ipepe commented Mar 25, 2021

@hadess I wanted to ask about installing dependencies: https://github.com/mimemagicrb/mimemagic#dependencies. As far as I understand, installing those on production environment to use in my project forces my project's source code to be licensed under GPL? Basically shifting the licensing issue from mimemagic gem to me as author of my project?

@rubyFeedback

> installing those on production environment to use in my project forces my project's source code to be licensed under GPL

I heavily doubt that. Look at the Linux kernel running via GPLv2 in proprietary environments. IMO there are comments here on the issue tracker that cannot possibly be correct, but it probably distracts too much from the main issue at hand to discuss that here. It would be nice to read a post-mortem analysis at a later point, though, simply because I am pretty certain that other projects may be in a somewhat similar situation licence-wise.

@pboling

pboling commented Mar 25, 2021

@ipepe
IANAL - A huge portion of the GNU stack is GPL licensed, and a large chunk of that is installed on millions of machines running corporate software.

The license cares about "distribution". You can install whatever GPL'd code you want on the machine and use the tools together. As long as this gem, and your project, are not "distributed" with GPL-licensed code inside them, then it is safe.

@base10

base10 commented Mar 25, 2021

@jellybob Thank you for taking point on this and working through an acceptable solution with @hadess.

@brettwgreen

Seems to me the previously bundled version of the Freedesktop database had some handling for docx files that is no longer present with the standalone install... I had to patch in some mimetypes in the init of my rails app to get the same behavior as before.

Used the init found here for what it's worth.

mimemagicrb#39 (comment)
