Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move Statistics stdlib module to external repository #33399

Merged
merged 5 commits into from
Nov 25, 2019
Merged

Conversation

nalimilan
Copy link
Member

The goal is to be able to import features from StatsBase (notably #31395) and keep history without messing too much with the Julia history. And of course, once stdlib modules can be updated, to allow using new features from old Julia versions.

I've tried to follow what is done for Pkg.jl, I hope I got everything right.

See JuliaStats/Statistics.jl#1 for how the files and history were extracted from the Julia repo.

@nalimilan nalimilan added stdlib Julia's standard library domain:statistics The Statistics stdlib module labels Sep 27, 2019
Copy link
Member

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to see that the hacks I made to generalize STDLIBS_EXT (quite some time ago now) actually work :-D The technical detail of this looks fine to me.

One query - is there consensus that this stdlib should live in JuliaStats? (Sounds reasonable to me; just wondering whether it's been bikeshedded etc.)

stdlib/Statistics.version Outdated Show resolved Hide resolved
@nalimilan
Copy link
Member Author

One query - is there consensus that this stdlib should live in JuliaStats? (Sounds reasonable to me; just wondering whether it's been bikeshedded etc.)

I asked on Slack and nobody commented except @ararslan, so this PR reflects the consensus of 2 people. :-)

@c42f
Copy link
Member

c42f commented Oct 5, 2019

so this PR reflects the consensus of 2 people

Heh :-)

FWIW I was expecting to throw Logging into JuliaLang/Logging.jl at some stage but it doesn't have an obvious alternative home.

What are the pros and cons of having stdlibs in JuliaLang vs elsewhere? It seems mainly about management of permissions and possibly access to infrastructure? For example, does it make sense for stdlibs to eventually have access to the core benchmarking infrastructure and CI runners? @staticfloat any thoughts?

@ararslan
Copy link
Member

ararslan commented Oct 5, 2019

Where the repository actually lives doesn't really matter; by virtue of it being a stdlib, it's pulled in and bundled with Julia when Julia is built.

For testing, that means that the tests for stdlibs are always run on the buildbot infrastructure, since a build of Julia contains all stdlibs, unless tests are explicitly disabled. (That was done recently for Pkg.)

As for benchmarking, Nanosoldier's code and infrastructure is set up only to run on this repo. It does a full build of Julia so it will end up pulling in stdlibs anyway, which can continue to be benchmarked via BaseBenchmarks, just like the ones that currently live in this repo are.

@c42f
Copy link
Member

c42f commented Oct 5, 2019

Where the repository actually lives doesn't really matter; by virtue of it being a stdlib, it's pulled in and bundled with Julia when Julia is built.

That's fine for releases, but not really any good for development in the longer term. The stdlibs need their own CI just like any other package.

@ararslan
Copy link
Member

ararslan commented Oct 5, 2019

Pkg seems to be getting by with Travis and AppVeyor.

@nalimilan
Copy link
Member Author

Yeah I don't think where the package lives matters a lot. We can even move it very easily without breaking anything. If Julia developers prefer all stdlib packages to live in JuliaLang, that's also fine with me.

@ViralBShah
Copy link
Member

I think having the stdlibs be in JuliaLang but in separate reps would be good. I would love to refactor out linalg and sparse similarly.

@c42f
Copy link
Member

c42f commented Oct 9, 2019

I'll admit I've also got a bias to have the stdlib repos in JuliaLang, but I'm not sure it's founded.

That's why I'm asking whether there's any substantive difference about where these live. I think it ultimately comes down to who has default commit access. There's two different needs here:

  • Timely and non-disruptive maintenance fixes, as related to releases of the standard Julia distribution (Base + stdlibs). The language itself will likely depend even more intimately on some stdlibs in the future. For example, CSTParser or whatever it evolves into may become the official parser. From this perspective it makes sense to put stdlibs in JuliaLang so that core contributors automatically have access. The more detached the stdlibs become, the more they end up like other external dependencies which we currently carry patches for.
  • Community-driven development: The community outside the core language contributors is super important in knowing what is needed, how to build it and in driving that development forward. The community needs to be empowered to build what is needed or the stdlibs will stagnate. From this perspective it makes sense to embed stdlibs in the part of the community which knows them best (Statistics going into JuliaStats being a good example).

With everything being open source and the ability to add external contributors to individual repos there may not be much practical difference here. There might be a symbolic difference though :-)

@ararslan
Copy link
Member

ararslan commented Oct 9, 2019

JuliaStats seems like a natural home for it based on the expected contributors and audience, but I also don't care that much, so I'm fine with whatever.

@staticfloat
Copy link
Sponsor Member

staticfloat commented Oct 9, 2019

I think the tension here is between domain experts (e.g. contributors to JuliaStatistics) and core language folks/release managers (e.g. Alex, myself, etc....) having access to "get things done". As long as both groups have the ability to do things like tag a release (this is not very important now, but may become more important once we further decouple stdlibs from the Julia build process) I don't think there are any problems with hosting this in a separate org like JuliaStatistics.

@ViralBShah
Copy link
Member

I myself have the same tension. I'd like a bunch of stuff in JuliaSparse to have that community maintain these packages. But so long as it is an stdlib, I like having it all in one org since there are other non-sparse matrix maintenance tasks that get carried out by others. Is security a concern? We have a managed group of contributors on JuliaLang, with 2FA enforced and so on - but other orgs might not do the same.

Copy link
Member

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. FWIW I restarted CI on windows though it would seem failures could hardly be related.

@nalimilan
Copy link
Member Author

Thanks. I've added checksums which are needed AFAICT.

@nalimilan
Copy link
Member Author

nalimilan commented Oct 22, 2019

Now that's funny. Before the last commit I got this error on 32-bit Linux and 64-bit MacOS:

/buildworker/worker/package_linux32/build/deps/tools/jlchecksum /tmp/srccache/Statistics-da6057baf849cbc803b952ef7adf979ae3a9f9d2.tar.gz
===============================================================================
  ERROR: sha512 checksum failure on Statistics-da6057baf849cbc803b952ef7adf979ae3a9f9d2.tar.gz, should be:
      982a1c3e0ee4d6dc5935c1db6d3e6dd67595cd95d867d6a0bf9bb88300b97a2f
      8aa7e91bf4b9461570a0b7a17f0f80006e8af4a1fd65c0529a93fab29a0760be
  But `sha512sum /tmp/srccache/Statistics-da6057baf849cbc803b952ef7adf979ae3a9f9d2.tar.gz | awk '{ print $1; }'` results in:
      e3c490ef7f49b6d01a26406cac128a35d51ef5c66b470a3cb3bf924662959525
      780cfe1cc463181c4ee4b698c40eaf13e79f62c6a2fe31acb1d30cb6c2dbcc2f
  This can happen due to bad downloads or network proxies, please check your
  network proxy/firewall settings and delete
  /tmp/srccache/Statistics-da6057baf849cbc803b952ef7adf979ae3a9f9d2.tar.gz
  to force a redownload when you are ready
===============================================================================
Makefile:27: recipe for target 'Statistics-da6057baf849cbc803b952ef7adf979ae3a9f9d2/source-extracted' failed

And now with the last commit I get this error on 64-bit Linux and 64-bit Windows:

/buildworker/worker/package_linux64/build/deps/tools/jlchecksum /tmp/srccache/Statistics-da6057baf849cbc803b952ef7adf979ae3a9f9d2.tar.gz
===============================================================================
  ERROR: sha512 checksum failure on Statistics-da6057baf849cbc803b952ef7adf979ae3a9f9d2.tar.gz, should be:
      e3c490ef7f49b6d01a26406cac128a35d51ef5c66b470a3cb3bf924662959525
      780cfe1cc463181c4ee4b698c40eaf13e79f62c6a2fe31acb1d30cb6c2dbcc2f
  But `sha512sum /tmp/srccache/Statistics-da6057baf849cbc803b952ef7adf979ae3a9f9d2.tar.gz | awk '{ print $1; }'` results in:
      982a1c3e0ee4d6dc5935c1db6d3e6dd67595cd95d867d6a0bf9bb88300b97a2f
      8aa7e91bf4b9461570a0b7a17f0f80006e8af4a1fd65c0529a93fab29a0760be
  This can happen due to bad downloads or network proxies, please check your
  network proxy/firewall settings and delete
  /tmp/srccache/Statistics-da6057baf849cbc803b952ef7adf979ae3a9f9d2.tar.gz 
  to force a redownload when you are ready
===============================================================================
make[1]: *** [Statistics-da6057baf849cbc803b952ef7adf979ae3a9f9d2/source-extracted] Error 2

FWIW, locally on 63-bit Linux I get e3c490ef....

@nalimilan
Copy link
Member Author

I've changed the PR to use another commit, and CI passed. I really think something weird happened due to moving the project from JuliaStats to JuliaLang, either on the GitHub side or on the cache side.

@nalimilan
Copy link
Member Author

I guess it looks good now? Not sure what's going on with the Windows tests (failing download).

@c42f
Copy link
Member

c42f commented Nov 5, 2019

Yeah those failures are pretty weird: basically a timeout when downloading julia-installer.exe. It seems to download the whole thing or almost the whole thing, then gets stuck. (The last log message from curl is at 57.2M and the whole thing is 58M; I can download the x64 version locally to a linux machine with the same curl command.)

Anyway, the windows builds passed which means the changes here should be fine given they apply to the build process. A failed download during testing on one platform doesn't seem worrying.

@c42f
Copy link
Member

c42f commented Nov 5, 2019

I restarted the win64 tester and it seems stuck again at the same download. @staticfloat any thoughts?

@staticfloat
Copy link
Sponsor Member

Okay I debugged this a bit today, and I think the issue here is that this PR just barely managed to dodge the commit on Sept. 28th that changed the installer to use Inno instead of NSIS. Can someone rebase this off of the latest master to see if that will play nicely with the new buildbots?

@nalimilan
Copy link
Member Author

That fixed the failures! I'll merge in a few days if nobody objects.

@nalimilan nalimilan merged commit 2e6b993 into master Nov 25, 2019
@nalimilan nalimilan deleted the nl/Statistics branch November 25, 2019 14:34
@ViralBShah
Copy link
Member

Should we transfer the statistics related issues to that repo?

@KristofferC
Copy link
Sponsor Member

Imo yes.

@ViralBShah
Copy link
Member

ViralBShah commented Nov 25, 2019

I'm looking forward to doing the same then with sparse matrix packages and eventually linear alebra. Having the issues in the new place will make it a lot easier.

@nalimilan
Copy link
Member Author

Yes, definitely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:statistics The Statistics stdlib module stdlib Julia's standard library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants