Scala IO fix-up/overhaul #19
Comments
@pathikrit has a NIO library that may be of interest: |
What's wrong with "deprecate and point people towards java.nio or third party libraries"? The former is built in and perfectly usable, even from Scala code (as compared to java.io). The latter would be able to evolve much more quickly than something living in the scala std lib, and end up much higher quality. "The standard library is where code goes to die" isn't it? Here's one possible alternative: we take some large-ish Scala projects (play? akka? sbt? scalac?) and extract out the common bits of their IO libraries (and they all have their own IO libraries!) into something used by all. We'd need buy in from all the different owners, but that would force us to actually make something of production-quality that is actually getting used. If we make something "cool and elegant" in the vacuum, my $ says it'll be just as useless as Here's another alternative workflow: we deprecate I don't think coming at it from a point of view of "let's make an awesome generic powerful IO library with a better Source and a Target and other abstractions..." will yield us any useful results. |
I think a better way would be to split it out as a separate scala.io project, where it could evolve faster than it would inside the standard library. |
The reason that in first place I proposed |
The hard part is deciding what should be in and out of the std lib. If we provide it via a better separate project eg, For file operations, one thing I am using is vert.x's https://github.com/eclipse/vert.x/blob/master/src/main/java/io/vertx/core/file/FileSystem.java. I also looked at https://github.com/google/jimfs. Look at the way Go, Clojure and Rust do it: keeping some of the modules out of the stdlib really helps. I think the core language should be small; Scala is a language, but it still lives on the JVM. I also looked at better-files and ammonite-ops. Both of them have a shell-like syntax, but I don't know how much they share on the IO side. I think we could improve the update: for the |
Both are basically thin wrappers around |
Agree with @lihaoyi . Every fairly large project has their own "IOUtils" or "FileUtils" somewhere internally. We can start with a goal of targeting the feature set covered by these 3 APIs:
But, even before we think about starting on idiomatic AND simple I/O in Scala, we need to answer these:
val file = File(....)
assert(file.exists)
file.delete()
assert(!file.exists)
This is surprising for people coming from a functional/immutable background. Do we go with a more correct immutable API centered around IO monads but increase the barrier to entry for non-fp folks?
I would recommend, "Why not both?" - let's have both blocking dumb APIs and asynchronous reactive APIs. |
Let's try to make a distinction between core functionality that almost everyone could use and advanced functionality that will support demanding users. That Scala doesn't have an easy way to slurp up a file is not to our credit. Nor is that we have to choose an external JSON library. These things are ubiquitous needs, and should just be there, and should just work. Easy stuff should be easy. So, going off of @pathikrit's list:
|
I can see 3 parts to this:
(root / "tmp" / "diary.txt")
.createIfNotExists()
.appendNewLine
.appendLines("My name is", "Inigo Montoya")
.moveTo(home / "Documents")
.renameTo("princess_diary.txt")
.changeExtensionTo(".md")
.lines
But, even javascript programmers have had them for many years now 🐼 |
@Ichoran The method names might be too terse, but this little library I wrote seems to hit the sweet spot for me in terms of simplicity/power for reading "regular" files: https://github.com/non/junkion#recipes. (The library's operating principle is "allow the user to read files without importing anything from |
Another one that might be of interest, particularly for the way it fixes On Thu, 17 Sep 2015 at 22:01 Erik Osheim notifications@github.com wrote:
|
I think the odds of getting this "right" in any satisfying sense are very close to zero. So I vote for removing Disclaimer: I want to get rid of almost everything in the Scala standard library. |
@tpolecat : But then what happens when I want to use lib1, which uses scalaz-stream and exposes some method which takes in a scalaz-file, and lib2, which uses rapture and exposes a method that uses rapture-file? I now need to convert between scalaz-file and rapture-file! Not sure why you are pessimistic about getting this "right". Many other languages (and libraries) have gotten this "right" enough to make it painless: https://nodejs.org/api/fs.html http://ruby-doc.org/stdlib/libdoc/fileutils/rdoc/FileUtils.html http://www.boost.org/doc/libs/1_59_0/libs/filesystem/doc/reference.html We already suffer from this fragmentation because of a lack of JSON library in the stdlib. |
You say fragmentation, I say marketplace of ideas. :-) |
The main reason I'm pessimistic is that we've gotten it wrong before. Many times! That resulted in pretty awkward, senseless code making it into the standard lib and being frozen there for eternity: scala.io, scala.xml, scala.parsers, scala.collections.views, scala.collections.parallel, ... If we encourage people to use third party libraries, we can then pick the winner to include with full confidence we're not leaving half-broken rubbish around for future generations. I mean, I'm super happy people are trying stuff like:
But I don't see why we should run experiments in the standard lib when previous such experiments (XML, parser-combinators, parallel collections, views, current scala.io, ...), run with the best of intentions, are in the process of being painfully excised from it. For example, Things like
Indicate we have no idea what we're doing as of this time. "Let's put it in the standard library!" is not the right response to this kind of situation =D We should be pretty damn sure what we want, and why we want it, before we saddle future generations with our bright ideas! We have a perfectly functional dependency resolution system, as well as a perfectly functional IO library in java.nio. Both are possible alternatives to bundling things in the standard library. If we can't get some large number of Scala users/projects using our third party library, who's to say our code is good enough to force it upon everybody? |
w.r.t. @Ichoran's description of "core" functionality, java.nio is perfectly usable to provide that. e.g. to write to a file in a single line: Files.write(Paths.get("file.txt"), "file contents".getBytes) To read from a file in a single line: new String(Files.readAllBytes(Paths.get("test.txt"))) This works great. In fact, it's barely any more verbose than using io.Source to read from a file! io.Source.fromFile("test.txt").mkString Anything we include in the standard library would need to be sufficiently better than java.nio to be worth its weight in the standard library. |
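For readers who want to try the java.nio one-liners above, here is a minimal sketch. The object name `NioOneLiners` and its two method names are made up for illustration. Unlike the one-liners in the comment, it passes an explicit UTF-8 charset, which is an assumption on my part to avoid depending on the platform default encoding.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical helper object; the names are illustrative only.
object NioOneLiners {
  // Writes `contents` as UTF-8, creating the file or truncating an existing one.
  def writeString(path: Path, contents: String): Path =
    Files.write(path, contents.getBytes(UTF_8))

  // Reads the whole file into memory and decodes it as UTF-8.
  def readString(path: Path): String =
    new String(Files.readAllBytes(path), UTF_8)
}
```

Both calls load or write the entire content at once, so this suits small files rather than streaming workloads.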
@tpolecat: I want Scala to be "batteries included". I don't want to spend my time evaluating which library to use (or copying code from StackOverflow) to do simple stuff like delete a directory on my filesystem or parse a json or download a webpage etc. I don't want to spend time making two different libraries I depend on talk to each other just because they use different JSON converters or File classes. But, as @lihaoyi mentioned, the std lib ends up being the code graveyard frozen in time. Can we have a compromise? Maybe make scala-io an incubator/experimental project that is decoupled from the regular Scala release schedule so it can evolve much faster? A canonical Scala I/O library on GitHub (that is officially blessed/promoted/recommended by typesafe/scala/@odersky) and manages to attract the best minds in Scala would be an excellent start! |
@pathikrit decoupled is exactly what @lihaoyi suggested first, and I vote for it too. @ktoso is going to add some file support for akka too, so what's your take on this, @ktoso? |
Adding my 2c: I'd be interested to see how far we could get with a |
@retronym: pretty far IMHO |
@pathikrit I'd argue then that you should rename |
@retronym: This may not be the right place to discuss it but I removed the implicit conversion, so you would have to explicitly do This started out as a personal project and for me |
use java.toScala and scala.toJava instead
Thanks @lihaoyi for writing the novel above. Agree 100%. @pathikrit it would great to assemble a team and start looking at writing an awesome IO library, but I don't see why this should be done under a SLIP. It's also important to recognize that there are now two largely disjoint Scala canons, and I think you will find substantial and likely intractable disagreement among the "great minds" on how such a library should work. |
@tpolecat : If it's not done under official blessing of the SLIPs (i.e. typesafe/scala/@odersky like entity), it may not necessarily get the attention/mindshare/buy-in it deserves (which is fine for most libraries but may not be for critical ones like an I/O library which every Scala library/company reinvents internally). I am not an expert in such community processes, I will let @dickwall chime in. Either way, would be happy to contribute once we get something going.
Haha, one, can code under |
In case others are interested, @pathikrit and I continued the discussion of the pros and cons of only using extension methods vs providing a parallel hierarchy of data types over here: pathikrit/better-files@346b982#commitcomment-13302644 |
Anyway, let me help out @dickwall a little here by repeating his gentle instructions, before we all get too deep into the nitty gritty of API design.
|
What about a I really like ammonite's distinction between relative vs absolute paths (makes certain operations safer) but it may violate "let's not introduce any type-hierarchy"? Similarly for files, I grappled with type-safety e.g. should you be able to call
File("/tmp/foo") match {
case d: Directory => d.list()
case f: RegularFile => f.readBytes
case SymbolicLink(d: Directory) => d.list()
case _ => // something else e.g. UNIX pipes/processes/devices etc
}
If our goal is to simply wrap NIO, I would say no and let that additional type-safety be provided by external libraries like ammonite (type-safe paths) and better-files (type-safe files). Also, if we go down the path of "let's wrap NIO", how exception-happy should we be? The Java NIO |
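To make the type-safety question concrete, here is a rough sketch of classifying a path into a small sealed hierarchy using only java.nio checks. The `FileKinds` object, the `Kind` trait and `classify` are hypothetical names for illustration and are not the API of better-files or ammonite.

```scala
import java.nio.file.{Files, LinkOption, Path}

// Hypothetical classification; names are illustrative only.
object FileKinds {
  sealed trait Kind
  case object Directory    extends Kind
  case object RegularFile  extends Kind
  case object SymbolicLink extends Kind
  case object Other        extends Kind // pipes, devices, missing paths, etc.

  // Checks the link case first and does not follow links for the other checks.
  def classify(path: Path): Kind =
    if (Files.isSymbolicLink(path)) SymbolicLink
    else if (Files.isDirectory(path, LinkOption.NOFOLLOW_LINKS)) Directory
    else if (Files.isRegularFile(path, LinkOption.NOFOLLOW_LINKS)) RegularFile
    else Other
}
```

Note that the answer is only a snapshot: the file system can change between the check and the next operation, which is part of why encoding the kind in a static type is debatable.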
I think we should throw exceptions willy-nilly. Exceptions are great, well understood, familiar, and can trivially be wrapped in more principled abstractions via try-catch. The Scala standard library only has Try, which I think isn't that appropriate, and further research into fancier-while-still-usable abstractions are still just abstractions. The problem with files is that they're halfway between statically-known and unknown. e.g. if I'm dealing with files I know on disk, and can see them in front of me and know what they are, having everything return |
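As an illustration of the point that exception-throwing calls can be wrapped after the fact, here is a small sketch that exposes the same NIO read both ways. `SafeIO`, `readBytes` and `tryReadBytes` are made-up names, and `Try` is used only because it ships with the standard library, not because the thread settled on it.

```scala
import java.nio.file.{Files, Path}
import scala.util.Try

// Hypothetical helper object; names are illustrative only.
object SafeIO {
  // Throws (e.g. NoSuchFileException) exactly like the underlying NIO call.
  def readBytes(path: Path): Array[Byte] = Files.readAllBytes(path)

  // Same operation, with any non-fatal failure captured as a value.
  def tryReadBytes(path: Path): Try[Array[Byte]] = Try(readBytes(path))
}
```

A caller who knows the file is there can use the throwing variant directly, while a caller handling untrusted paths can opt into the wrapped one.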
Here are my 2cents:
Here's what I would suggest:
|
At least we now know when a project has run out of steam: "Aligned Scala logo." https://github.com/scala-incubator/scala-io/commit/8b5467d66760536d34b6bcb36f69a1b7f67f68b5 By coincidence, I'm aligning the logos on my desk this very minute. |
There was a discussion (and earlier) and my suggestion that we could use this issue as a test-bed on separating the std-lib interface and implementation and see how naming conventions work etc. But this need not be part of the final solution. By naming, for example, should an implementation have the std namespace and/or its own: import scala.io // std lib import, as defined in a library dependency in SBT
import scala.std.io // the scalac implementation
import nodejs.std.io // My own node version |
Oops - didn't mean to edit Haoyi's post but reply to him - reply is below (with context) Just catching up with this very long thread now - I was on vacation so sue me :-)
Not a cunning trick, but it certainly has led to a healthy discussion. The original post offers some options but certainly makes no demands or assumptions on what the EG should decide. My only request is that such discussions are held in the open (which this one seems to be) Just for the record, I am hands off any decision making on the technical side because I don't believe I can attempt to get a working process bootstrapped and also influence the decisions made within that process without a huge conflict of interest. As @non points out, formation of an expert group says nothing about whether an IO library should be forthcoming, only that the discussion should occur. The original post closes with:
If the EG decides no action is the correct action, aside from being very Zen, then that is what the EG decides. The discussion here is obviously healthy, but the point I am trying to get across is that right now we are trying to get the process bootstrapped (certainly that's my aim) not to influence anything about the outcome. That said, I am looking forward to the time when the process is trusted better and I can actually get involved in working on the opinion side of things as well. For now I am being as hands off and objective as I know how. Getting the word out about EGs and the messaging around the process is still something that I am very much interested in. How can we improve the messaging so that people are less surprised when issues like this come up (I don't always have time to email everyone individually so we need to find a common place where the message gets out there without being too surprising to people). |
Tomorrow (Monday 12th) being the next SLIP committee meeting, any updates or summaries to add for this issue? Thanks |
Re-reading this thread prior to the meeting tomorrow, the most insightful posting is probably this one:
I agree with this set of questions/priorities 100%. My only question is: why can't the expert group itself answer these? They are, after all, going to be affected by the answers. I think there is some misunderstanding of what an expert group is (or can be). Answering these questions would appear to be an ideal starting point for the group, and that group has full power and responsibility to choose as they see fit. I certainly can't think of any better choice of people to ponder these than the people that have an interest in the IO library. Also please note that being suggested for involvement in an EG does not mean you have to volunteer, nor does it limit the potential membership. It is instead merely a way to notify potentially interested parties that such a thing is being considered. I will be writing up a blog post for the Scala blog about some of these concepts in the near future. |
Thanks for the summary @dickwall. Regarding this:
Can we choose based on "what is the least amount of work we can do for the maximum benefit to the programmer"? To maximize "bang for the buck", wrapping all the utils in java.nio.file.Files into a sensible Scala File class makes the most sense (proof of concept). There are also valid concerns about the standard lib being the "graveyard of code" - IMO, this mitigates some of those concerns. The less code we put in the std lib, the less we put in the graveyard :) |
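To give a feel for what "wrapping java.nio.file.Files into a Scala File class" could look like, here is a deliberately tiny sketch. `SimpleFile` and all of its methods are hypothetical; this is not the linked proof of concept and covers only a handful of operations.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path, Paths}

// Hypothetical thin wrapper over java.nio.file.Files; not a real library API.
final case class SimpleFile(path: Path) {
  def /(child: String): SimpleFile = SimpleFile(path.resolve(child))
  def exists: Boolean = Files.exists(path)
  def isDirectory: Boolean = Files.isDirectory(path)
  def contentAsString: String = new String(Files.readAllBytes(path), UTF_8)
  // Creates or truncates the file, then returns this wrapper for chaining.
  def write(text: String): SimpleFile = {
    Files.write(path, text.getBytes(UTF_8))
    this
  }
  def delete(): Unit = Files.delete(path)
}

object SimpleFile {
  // Convenience constructor from a string path.
  def apply(path: String): SimpleFile = SimpleFile(Paths.get(path))
}
```

Every method is a one-line delegation, which is both the appeal (little to maintain) and the objection raised later in the thread (little added over calling NIO directly).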
IMHO you can get very far with a lot less bucks:
implicit def stringPaths(p: String) = java.nio.file.Paths.get(p)
implicit def filePaths(p: java.io.File) = java.nio.file.Paths.get(p.toString)
Here we're paying two lines of code instead of 150 in your POC. I don't think we really get 75x more value out of wrapping things v.s. just using the methods directly. I mean, is it really worth spending 148 lines of code wrapping every single operation in our own definition, just so we can call |
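A self-contained sketch of the implicit-conversion approach follows. The object name `PathConversions` and the two method names are assumptions for illustration; the conversions get distinct names and explicit result types here, and `File.toPath` is used for the `java.io.File` case.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.language.implicitConversions

// Hypothetical conversions; import PathConversions._ to enable them.
object PathConversions {
  implicit def stringToPath(p: String): Path = Paths.get(p)
  implicit def fileToPath(f: java.io.File): Path = f.toPath
}
```

With `import PathConversions._` in scope, a call such as `Files.exists("build.sbt")` type-checks because the string is converted to a `Path` at the call site. The trade-off is the usual one for implicit conversions: less ceremony, but the conversion is invisible to the reader.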
@lihaoyi I disagree :) We absolutely need to wrap
Do you want to count lines in a file using Java NIO? Why would we burden Scala programmers with all these pitfalls or make them waste their time looking up on StackOverflow how to do trivial things like get an |
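As one concrete example of the line-counting case, here is a sketch using plain java.nio. The pitfall it illustrates is my own pick rather than one quoted from the comment above: `Files.lines` returns a lazy stream that keeps the file open until it is closed. `LineCount` and `countLines` are hypothetical names.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical helper; names are illustrative only.
object LineCount {
  def countLines(path: Path): Long = {
    // Lazy java.util.stream.Stream[String] backed by an open file handle.
    val lines = Files.lines(path, UTF_8)
    try lines.count()
    finally lines.close() // without this the handle leaks until GC
  }
}
```

A wrapper library can hide the try/finally, which is essentially the argument being made for wrapping NIO rather than exposing it raw.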
As a user of the better.files library I strongly support @pathikrit 's position. This library is a huge relief when having to do filesystem operation. I don't really care if it's included in the std lib or not but it is definitely much better than anything that's currently available in either java or scala standard libraries. |
@pathikrit I'm surprised you forgot to mention that on Windows sometimes you can't delete a file immediately (because something like a virus scanner holds it), so to be 'safe' you actually need to call delete multiple times with some kind of time-out/retry. We have most of this in the sbt.IO class as well, and I agree it's basically a necessity for those not writing really low-latency/low-level file code who just want it to "work". However, I'd argue that for a general-purpose standard-library file API, I'm not 100% certain all the "correctness vs. speed" tradeoffs should be made for me. I can totally see this from a utility library. |
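The retry idea described above might be sketched roughly like this. It is not the sbt.IO implementation; `RetryingDelete`, `deleteWithRetry`, and the default attempt count and pause are all assumptions chosen for illustration.

```scala
import java.io.IOException
import java.nio.file.{Files, Path}
import scala.annotation.tailrec

// Hypothetical helper; not how sbt.IO does it.
object RetryingDelete {
  // Returns true once the path is gone, false if every attempt failed.
  @tailrec
  def deleteWithRetry(path: Path, attempts: Int = 5, pauseMillis: Long = 100L): Boolean = {
    val deleted =
      try { Files.deleteIfExists(path); true } // an already-missing path counts as success
      catch { case _: IOException => false }   // e.g. file still held by another process
    if (deleted) true
    else if (attempts <= 1) false
    else {
      Thread.sleep(pauseMillis)
      deleteWithRetry(path, attempts - 1, pauseMillis)
    }
  }
}
```

The blocking sleep is exactly the kind of "correctness vs. speed" trade-off mentioned above, which is why this arguably belongs in a utility library rather than a low-level API.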
@jsuereth : Good point about drawing a line between a "util" library and a std library API. IMO, if you want low-level, run with scissors APIs, we already have the |
My standard take on this
I am also in favour of doing a proper, clean room implementation. The current state of file IO in Scala is a mess, and everyone is using a combination of In terms of design, I am happy with stuff like This puts us in a good position to create a new package (under a different name). I also completely agree with @pathikrit, we need to properly wrap all of the |
One week to the next SLIP meeting. Not that I want these things to just become SLIP meeting driven (in terms of dates/deadlines), but if there are any updates on this issue in the next week, we will pick them up in that meeting. |
+1 to everything that @mdedetrich said. A clean room implementation (more for clean, idiomatic Scala API perspective) would provide the greatest return in the long run, esp w.r.t. Scala.js etc. Plus that File I/O is something people expect in a standard library.... |
Looking back, there's a lot of interesting discussion in this thread, but the one thing that's clear to me is that the community failed to come to a consensus. People have differing use cases, requirements, styles, and problem scopes, and it seems doubtful we'll come to a consensus in the foreseeable future. If we accept that we have not converged on any technical solution, now is the time to start thinking about the meta-solution: given we can't agree or decide, how can we get to a place where we could agree or decide at some point in the future? Even if scala-team/EPFL/soon-to-not-be-called-Typesafe don't bless/pick/write any IO library right-here-right-now, there are things they can do that would speed up the process of coming to a decision. For example, if we decided that
They could add links to the docs/tutorials/main-website like
This would funnel new users towards the candidates, so the various libraries all get a steady stream of people vetting them and deciding they like them or not. If we decided the process was
Then there would be a different set of actions we could take to smoothen/speed-up that process. This is a reason why an explicit null-decision would be useful, v.s. just not deciding: deciding "we won't pick one now" would let us move on confidently to the next topic of discussion: how would we structure such a selective process and define the ending conditions? How would we make it fair, fast, and hopefully encourage the right kinds of behavior that optimizes for the things we want? This then becomes a very managerial question, and arguably throwing a bunch of "people who write libraries" together wouldn't be the most effective way to answer it =P |
I think the biggest thing to get out of an IO library is to end the confusion, for new users, about what IO to use. @lihaoyi , the talk you gave at Scala By The Bay perfectly demonstrates the problem, to do silly IO stuff, users end up having to search stack overflow. There are around 4-5 solutions, some coming from Java, some coming from stuff like Apache Commons, stuff coming from Scala Source (which some people now accept as not that good of a library), and all are fairly verbose.
The whole "wait for people to use a common IO library" doesn't really hold water; it hasn't happened in a long time. I am sure, for example, that Rapture IO may be a great IO library, but I only found out about it a few months ago. The other thing is that other frameworks/libraries do not use this library, so we risk ending up in the same perverse situation that we have with JSON. We should have an IO library where, as a new user, I can go to the Scala website, and the docs will go something like
And then a bunch of your expected operations. I don't think anyone here is asking for a hyper specialized high performant IO library to be used for load balancers or something along those lines, there will always be a case for community making their own IO libraries for specialized circumstances. I believe the idea is to create an idiomatic, non Java, Scala IO library that the majority of users are happy with |
I don't know why you quoted me because this has nothing to do with what I said =P I never proposed inaction. Just a step back from the blind, single minded "let's just do something, community!" strategy that clearly hasn't worked. I mean, it's great that you're so sure you know what to do to fix everything, but clearly lots of people disagree about things. What next? Arguing "This is what we should do, it's so obvious" just goes in circles. |
Sorry if I wasn't clear. I was just confirming your point that "letting the community do it" didn't really work |
The scala.io.Source code is somewhat deprecated, see scala/slip#19 As an additional bonus, better-files contains nice functions to write files, so you can now do the following on the CLI: file"buss3.p" < TPTPFOLExporter(BussTautology(3)).toString
This could be revived under the new Scala Platform Process (http://www.scala-lang.org/blog/2016/11/28/spp.html). |
scala.io.Source is small, useful, troubled and usually recommended against although still used by many.
A recent SLIP submission, #2, suggested a Target (cf. Source) providing similar functionality for output. The feeling in the SLIP committee is that a Target aiming to be for output what Source, as it stands now, is for input would not be accepted into the core libraries. However, everyone seemed in favor of an overhaul of the scala.io library.
Since this is likely to be a bigger task, we suggest an expert group form and meet to discuss and work on the problem. Interested parties identified in the meeting include Omid Bakhshandeh @omidb, Jon Pretty @propensive, Jesse Eichar @jesseeichar, Haoyi Li @lihaoyi and Pathikrit Bhowmick @pathikrit. The expert group will, of course, be open to volunteers willing to work on the implementation (if you are just interested in sharing your opinions, I suggest you attach comments to this thread rather than joining the EG).
In order to get things moving, and since the original PR came from @omidb, I suggest he take the lead in forming the group and setting up the first meeting. If at that point someone else wants to volunteer to take the organizational role for the group at that time, that would be the time to discuss it.
Please also note that any IO SLIP targeting Scala 2.12+ will have Java's NIO guaranteed to be available, making NIO an option for the basis of an implementation.
First steps:
Please organize the first expert group meeting and provide details of the decisions made and action items. Would suggest following the Either expert group's lead and holding the discussion in the open on Google hangouts-on-air or similar so that the recording is publicly available to all interested. If you are involved with the EG, please post any progress in comments on this issue.