This repository has been archived by the owner on Sep 5, 2018. It is now read-only.

Scala IO fix-up/overhaul #19

Closed
dickwall opened this issue Sep 17, 2015 · 88 comments

@dickwall
Contributor

scala.io.Source is small, useful, troubled and usually recommended against although still used by many.

A recent SLIP submission (#2) suggested a Target as a counterpart to Source, offering similar functionality on the output side. The feeling in the SLIP committee is that a Target that aimed to be for output what Source currently is for input would not be accepted into the core libraries; however, everyone seemed in favor of an overhaul of the scala.io library.

Since this is likely to be a bigger task, we suggest an expert group form and meet to discuss and work on the problem. Interested parties identified in the meeting include Omid Bakhshandeh @omidb, Jon Pretty @propensive, Jesse Eichar @jesseeichar, Haoyi Li @lihaoyi and Pathikrit Bhowmick @pathikrit. The expert group will, of course, be open to volunteers willing to work on the implementation (if you are just interested in sharing your opinions, I suggest you attach comments to this thread rather than joining the EG).

In order to get things moving, and since the original PR came from @omidb, I suggest he take the lead in forming the group and setting up the first meeting. If someone else wants to volunteer for the organizational role, that meeting would be the time to discuss it.

Please also note that any IO SLIP targeting Scala 2.12+ will have Java's NIO guaranteed to be available, making NIO an option for the basis of an implementation.

First steps:

Please organize the first expert group meeting and provide details of the decisions made and action items. Would suggest following the Either expert group's lead and holding the discussion in the open on Google hangouts-on-air or similar so that the recording is publicly available to all interested. If you are involved with the EG, please post any progress in comments on this issue.

@dickwall
Contributor Author

@pathikrit has a NIO library that may be of interest:

https://github.com/pathikrit/better-files

@lihaoyi

lihaoyi commented Sep 17, 2015

however, everyone seemed in favor of an overhaul of the scala.io library.

What's wrong with "deprecate and point people towards java.nio or third party libraries"? The former is built in and perfectly usable, even from Scala code (as compared to java.io). The latter would be able to evolve much more quickly than something living in the scala std lib, and end up much higher quality.

"The standard library is where code goes to die" isn't it?

Here's one possible alternative: we take some large-ish Scala projects (play? akka? sbt? scalac?) and extract out the common bits of their IO libraries (and they all have their own IO libraries!) into something used by all. We'd need buy-in from all the different owners, but that would force us to actually make something of production-quality that is actually getting used. If we make something "cool and elegant" in a vacuum, my $ says it'll be just as useless as scala.io is now.

Here's another alternative workflow: we deprecate scala.io in 2.12, point people towards java.nio or third party libs (better-files, ammonite-ops, etc.) and when one of them becomes popular we then talk about which parts of it are good and are worth including in the standard library. That way we'd know from the fact that it's popular and widely-used that whatever we're including is useful and usable.

I don't think coming at it from a point of view of "let's make an awesome generic powerful IO library with a better Source and a Target and other abstractions..." will yield us any useful results.

@He-Pin

He-Pin commented Sep 17, 2015

A better way, I think, would be to split it out as a separate scala.io project, where I think it could evolve faster than if it lives in the standard library.

@omidb

omidb commented Sep 17, 2015

The reason I proposed scala.io.target in the first place was that whenever I wanted to do IO, I was using java.io, and I thought that for a language like Scala, having no IO support is kind of not right. Whenever I want to convince people to use Scala (mostly people from Python) they ask me: is it easy to read a CSV file? How about pickling it? How about writing it to disk? ...
I think deprecating scala.io can be a good idea, but my alternative would be doing the same thing that the Scala people did for scala.xml. I don't know what they call it (Scala module? plugin?). I think having an IO lib under the scala domain would be great.

@He-Pin

He-Pin commented Sep 17, 2015

The hard part is deciding what should be in the std lib and what should be out. If we provide it via a separate project, e.g. scala.nio, then we could start with a minimal toolkit and then ship releases quickly in response to real use cases.

For file operations, one thing I am using is vert.x's https://github.com/eclipse/vert.x/blob/master/src/main/java/io/vertx/core/file/FileSystem.java. And I have also looked at https://github.com/google/jimfs.

Look at the way golang, clojure and rust do it: keeping some modules out of the stdlib really always helps. I think the core language should be core and small; Scala is a language, but it still lives on the JVM.

I have also looked at better-files and ammonite-ops. Both of them have a shell-like syntax, but I don't know how much they share on the io side.

I think we could improve scala.io, but if we want to introduce something big, or something more than an incremental improvement, I think that should happen in a separate project under scala.

update: as for scala.xml, it will be deprecated in the future, and I don't think io should follow it. Think about it: why doesn't clojure put org.clojure.async in the clojure project?

@lihaoyi

lihaoyi commented Sep 17, 2015

but I don't know how much they share on the io side.

Both are basically thin wrappers around java.nio. It really isn't bad and does everything you need...

@pathikrit

Agree with @lihaoyi . Every fairly large project has their own "IOUtils" or "FileUtils" somewhere internally.
That would be a good starting point to figure out the core "we-need this util" parts of the library. Or the I/O libraries of Python or Go or F# or node.js might be good to imitate to begin with too...

We can start with a goal of targeting the feature set covered by these 3 APIs:

But, even before we think about starting on idiomatic AND simple I/O in Scala, we need to answer these:

  1. API style: Do we go with a more "FileUtils" style approach? This is what was followed in java.nio, e.g. instead of doing file.isDirectory(), you do Files.isDirectory(file). This is needlessly verbose IMO but I don't have a strong opinion here. Or do we go with a more OO style (e.g. file1.moveTo(file2)), which is the style followed in better-files, or a DSL inspired by the command line like in ammonite-ops (e.g. mv(file1, file2)), or something else?

  2. Is the library centered around Files or Paths? IMO, Paths is the more correct abstraction but also an academic distinction. Most application programmers think about files and do operations on files and for them, files happen to have paths and not the other way (paths happen to have files). I would personally have an immutable set of APIs centered around immutable Paths and a callback-based API based around files (ala node.js).

  3. Referential transparency: I/O libraries have inherent side effects:

val file = File(....)
assert(file.exists)
file.delete()
assert(!file.exists)

This is surprising for people coming from a functional/immutable background. Do we go with a more correct immutable API centered around IO monads but increase the barrier to entry for non-fp folks?

  4. How do you deal with the myriad of InputStreams and BufferedReaders and FileChannels and OutputStreamWriters that populate the Java enterprise world? Are we going to build sane bridges from Scala to them or do away with all that and have complete Scala equivalents? Here is my attempt at a bridge: https://github.com/pathikrit/better-files#java-interoperability

  5. The Java APIs are riddled with things like NotADirectoryException, e.g. if you try to list a regular file or read bytes from a directory. This is something I wrestled with in better-files to make I/O operations more type-safe, e.g. you cannot call list() on something that is not a directory:

"src"/"test"/"foo" match {
  case SymbolicLink(to) =>          
  case Directory(children) =>       
  case RegularFile(source) =>       
  case other if other.exists() =>   // a file may not be one of the above e.g. UNIX pipes, sockets, devices etc
  case _ =>                         // a file that does not exist
}
// or as extractors on LHS:
val Directory(researchDocs) = home/"Downloads"/"research"

  6. Is this library solely intended for disk-based filesystems or can it be a pluggable interface for other filesystems (e.g. S3 or an in-memory one like Google's jimfs)?

  7. Are the APIs all going to be non-reactive blocking ones like we are used to? Can we add reactive APIs like node.js:

file.delete(callback(success, error))

I would recommend, "Why not both?" - let's have both blocking dumb APIs and asynchronous reactive APIs.
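
The "why not both" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `DeleteSketch`, `delete` and `deleteAsync` are hypothetical names, not part of better-files or any other library mentioned here, and the asynchronous variant simply runs the blocking java.nio call inside a `Future` rather than using a truly non-blocking file API.

```scala
import java.nio.file.{Files, Path}
import scala.concurrent.{ExecutionContext, Future, blocking}

// Hypothetical helper object for illustration; not an existing library API.
object DeleteSketch {
  // Blocking variant: returns true if the file existed and was deleted.
  def delete(path: Path): Boolean = Files.deleteIfExists(path)

  // Asynchronous variant: runs the same blocking call on the supplied ExecutionContext.
  def deleteAsync(path: Path)(implicit ec: ExecutionContext): Future[Boolean] =
    Future(blocking(delete(path)))
}
```

A caller who wants callbacks can attach them to the returned `Future` (for example with `onComplete`), while a caller who wants the simple blocking behaviour calls `delete` directly.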

@Ichoran

Ichoran commented Sep 17, 2015

Let's try to make a distinction between core functionality that almost everyone could use and advanced functionality that will support demanding users. That Scala doesn't have an easy way to slurp up a file is not to our credit. Nor is that we have to choose an external JSON library. These things are ubiquitous needs, and should just be there, and should just work. Easy stuff should be easy.

So, going off of @pathikrit's list:

  1. It's easier to have methods on files than to have to drag along a clunky I-can-do-stuff object. file.isDirectory FTW.

  2. Inasmuch as Scala favors correctness over other things, Path is going to have to play a major role.

  3. Monadic interaction with the file system is an advanced functionality. That belongs in other libraries.

  4. Slurping should work with whatever is slurpable. Otherwise, bridges are advanced functionality. That also belongs in other libraries.

  5. Type safety that doesn't get in the way and reliably catches all exceptions is a good thing. I don't know what you have in better-files, but if case d: Directory is different than case Directory(children), that's a good start (i.e. you don't throw an uncaught exception on an access error on the pattern matcher). That said, you don't normally want to be futzing with directories too much directly. You want some higher-level thing to happen and directory-walking or searching is a means to that end. We should provide an API that lets you specify your end, not the steps to get there (to the extent possible). File system walkers are a good example of this.

  6. Supporting all sorts of weird things that aren't actually mounted as a file system on the OS is beyond the scope of a simple solution. If they look that much like a filesystem, get the OS to mount them as such, and use the normal interface.

  7. Reactive APIs are advanced usage. You have to think way more carefully about marshalling resources if you do that. External library.

@pathikrit

@Ichoran:

I can see 3 parts to this:

  1. Core purely Scala OO style APIs centered around: scala.io.Path and scala.io.mutable.File classes.
    These are all blocking synchronous side-effecty APIs to do "core" things e.g.
 (root / "tmp" / "diary.txt")
  .createIfNotExists()  
  .appendNewLine
  .appendLines("My name is", "Inigo Montoya")
  .moveTo(home / "Documents")
  .renameTo("princess_diary.txt")
  .changeExtensionTo(".md")
  .lines

  2. Java converters brought in using scala.io.JavaConverters which can add conversions to/from Java (e.g. https://github.com/pathikrit/better-files#java-interoperability)

  3. scala.io.immutable.File - brings in immutable monadic reactive file library. Can be a placeholder for the future.

  7. Reactive APIs are advanced usage.

But even JavaScript programmers have had them for many years now 🐼

@non

non commented Sep 17, 2015

@Ichoran The method names might be too terse, but this little library I wrote seems to hit the sweet spot for me in terms of simplicity/power for reading "regular" files: https://github.com/non/junkion#recipes.

(The library's operating principle is "allow the user to read files without importing anything from java.io or java.nio" and I think it does a reasonable job.)

@dwijnand
Member

Another one that might be of interest, particularly for the way it fixes Java's API on Windows, is sbt's IO module: https://github.com/sbt/io

@tpolecat

I think the odds of getting this "right" in any satisfying sense are very close to zero. So I vote for removing scala.io and pointing users to better options like scalaz-stream, Rapture, Junkion, and so on.

Disclaimer: I want to get rid of almost everything in the Scala standard library.

@pathikrit

@tpolecat: But then what happens when I want to use lib1, which uses scalaz-stream and exposes some method that takes in a scalaz-file, and lib2, which uses rapture and exposes a method that uses rapture-file? I now need to convert between scalaz-file and rapture-file!

Not sure why you are pessimistic about getting this "right". Many other languages (and libraries) have gotten this "right" enough to make it painless:

https://nodejs.org/api/fs.html

http://ruby-doc.org/stdlib/libdoc/fileutils/rdoc/FileUtils.html

http://www.boost.org/doc/libs/1_59_0/libs/filesystem/doc/reference.html

We already suffer from this fragmentation because of a lack of JSON library in the stdlib.

@tpolecat

You say fragmentation, I say marketplace of ideas. :-)

@lihaoyi

lihaoyi commented Sep 17, 2015

Not sure why you are pessimistic about getting this "right"

The main reason I'm pessimistic is that we've gotten it wrong before. Many times! That resulted in pretty awkward, senseless code making it into the standard lib and being frozen there for eternity: scala.io, scala.xml, scala.parsers, scala.collections.views, scala.collections.parallel, ...

If we encourage people to use third party libraries, we can then pick the winner to include with full confidence we're not leaving half-broken rubbish around for future generations.

I mean, I'm super happy people are trying stuff like:

scala.io.immutable.File - brings in immutable monadic reactive file library. Can be a placeholder for the future.

But I don't see why we should run experiments in the standard lib when previous such experiments (XML, parser-combinators, parallel collections, views, current scala.io, ...), run with the best of intentions, are in the process of being painfully excised from it.

For example, things like

Is the library centered around Files or Paths? IMO, Paths is the more correct abstraction but also an academic distinction. Most application programmers think about files and do operations on files and for them, files happen to have paths and not the other way (paths happen to have files). I would personally have an immutable set of APIs centered around immutable Paths and a callback-based API based around files (ala node.js).

indicate we have no idea what we're doing as of this time. "Let's put it in the standard library!" is not the right response to this kind of situation =D

We should be pretty damn sure what we want, and why we want it, before we saddle future generations with our bright ideas! We have a perfectly functional dependency resolution system, as well as a perfectly functional IO library in java.nio. Both are possible alternatives to bundling things in the standard library.

If we can't get some large number of Scala users/projects using our third party library, who's to say our code is good enough to force it upon everybody?

@lihaoyi

lihaoyi commented Sep 17, 2015

w.r.t. @Ichoran's description of "core" functionality, java.nio is perfectly usable to provide that. e.g. to write to a file in a single line:

Files.write(Paths.get("file.txt"), "file contents".getBytes)

To read from a file in a single line

new String(Files.readAllBytes(Paths.get("test.txt")))

This works great. In fact, it's barely any more verbose than using io.Source to read from a file!

io.Source.fromFile("test.txt").mkString

Anything we include in the standard library would need to be sufficiently better than java.nio to be worth its weight in the standard library.
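
For reference, the java.nio one-liners above can be made self-contained roughly as follows. This is a sketch, not a proposal: `NioOneLiners`, `writeString` and `readString` are hypothetical names, and the explicit UTF-8 charset is an added assumption (the one-liners above use the platform default encoding).

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical wrapper object; the only real APIs used are java.nio.file.Files methods.
object NioOneLiners {
  // Writes the whole string to the file, creating or truncating it.
  def writeString(path: Path, contents: String): Path =
    Files.write(path, contents.getBytes(UTF_8))

  // Reads the whole file into memory and decodes it as UTF-8.
  def readString(path: Path): String =
    new String(Files.readAllBytes(path), UTF_8)
}
```

Both helpers load the full contents into memory, so they suit the "slurp a small file" case rather than streaming.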

@pathikrit

@tpolecat: I want Scala to be "batteries included". I don't want to spend my time evaluating which library to use (or copying code from StackOverflow) to do simple stuff like delete a directory on my filesystem or parse a json or download a webpage etc. I don't want to spend time making two different libraries I depend on talk to each other just because they use different JSON converters or File classes.

But, as @lihaoyi mentioned, the std lib ends up being the code graveyard frozen in time. Can we have a compromise? Maybe make scala-io an incubator/experimental project that is decoupled from the regular Scala release schedule so it can evolve much faster?

A canonical Scala I/O library on GitHub (that is officially blessed/promoted/recommended by typesafe/scala/@odersky) and manages to attract the best minds in Scala would be an excellent start!

@He-Pin

He-Pin commented Sep 18, 2015

@pathikrit decoupled is exactly what @lihaoyi suggested first, and I vote for it too. @ktoso is going to add some file support to akka too, so what's your idea about this, @ktoso?

@retronym
Member

Adding my 2c: I'd be interested to see how far we could get with a java.nio.file._ wrapper that only adds extension or static helper methods, and avoids the temptation to add a layer of data types.
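
A minimal sketch of what such an extension-methods-only wrapper might look like, assuming an implicit value class over java.nio.file.Path. The names `PathSyntax`, `PathOps`, `/`, `isDirectory` and `slurp` are hypothetical and chosen only for illustration; no new data type appears in user-facing signatures, since every method takes and returns plain java.nio or standard types.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical syntax object: import PathSyntax._ to enable the extension methods.
object PathSyntax {
  // Value class, so the wrapper is normally not allocated at runtime.
  implicit class PathOps(val self: Path) extends AnyVal {
    // Resolves a child path, e.g. dir / "notes.txt".
    def /(child: String): Path = self.resolve(child)

    // Delegates to the static helper in java.nio.file.Files.
    def isDirectory: Boolean = Files.isDirectory(self)

    // Reads the whole file as a UTF-8 string (charset is an assumption of this sketch).
    def slurp: String = new String(Files.readAllBytes(self), UTF_8)
  }
}
```

With `import PathSyntax._` in scope, `(dir / "notes.txt").slurp` works on any `java.nio.file.Path`, and APIs built on top still exchange plain `Path` values.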

@pathikrit

@retronym: pretty far IMHO

@retronym
Member

@pathikrit I'd argue then that you should rename better.files.File to FileOps and make it an implicit value class. Otherwise people will be tempted to use it in their APIs.

@pathikrit

@retronym: This may not be the right place to discuss it but I removed the implicit conversion, so you would have to explicitly do .toScala to access the Scala one.

This started out as a personal project and for me File is always better.files.File and whenever I import any Java crap, I do import java.io.{File => JFile} to warn the reader of the code. But, I guess, since I released it into the wild, I should give it a different name...

retronym referenced this issue in pathikrit/better-files Sep 18, 2015
@tpolecat

Thanks @lihaoyi for writing the novel above. Agree 100%.

@pathikrit it would be great to assemble a team and start looking at writing an awesome IO library, but I don't see why this should be done under a SLIP. It's also important to recognize that there are now two largely disjoint Scala canons, and I think you will find substantial and likely intractable disagreement among the "great minds" on how such a library should work.

@pathikrit

@tpolecat: If it's not done under the official blessing of the SLIPs (i.e. a typesafe/scala/@odersky-like entity), it may not necessarily get the attention/mindshare/buy-in it deserves (which is fine for most libraries but may not be for critical ones like an I/O library, which every Scala library/company reinvents internally). I am not an expert in such community processes; I will let @dickwall chime in.

Either way, would be happy to contribute once we get something going.

now two largely disjoint Scala canons

Haha, one can code under scala.io.mutable._ and the other under scala.io.immutable._ =)

@retronym
Member

In case others are interested, @pathikrit and I continued the discussion of the pros and cons of only using extension methods vs providing a parallel hierarchy of data types over here: pathikrit/better-files@346b982#commitcomment-13302644

@retronym
Member

Anyway, let me help out @dickwall a little here by repeating his gentle instructions, before we all get too deep into the nitty gritty of API design.

First steps: Please organize the first expert group meeting

@pathikrit

Something else?

What about a scala.io.Path or a scala.io.File which simply wraps java.nio.file.Path (that's what better.files.File does)? Paths are the more correct term here than files IMO, but developers usually think about files and not paths, so it's a matter of nomenclature.

I really like ammonite's distinction between relative vs absolute paths (makes certain operations safer) but it may violate "let's not introduce any type-hierarchy"?

Similarly for files, I grappled with type-safety, e.g. should you be able to call .list on a regular file or call .readBytes on a directory? Should we have types to help our code be safer? e.g.

File("/tmp/foo") match {
  case d: Directory => d.list()
  case f: RegularFile => f.readBytes
  case SymbolicLink(d: Directory) => d.list()
  case _ =>  // something else e.g. UNIX pipes/processes/devices etc
}

If our goal is to simply wrap NIO, I would say no and let those additional type-safety be provided by external libraries like ammonite (type-safe paths) and better-files (type-safe files).
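
If the type-safety layer lives outside the core, it could be sketched as plain extractors over java.nio.file.Path. This is an illustration only, not how better-files or ammonite implement it: `FileKinds`, `Directory`, `RegularFile` and `describe` are hypothetical names, and the check happens at match time, so the file system may still change between the match and the subsequent operation.

```scala
import java.nio.file.{Files, Path}

// Hypothetical extractors; they add no new data types, only pattern-matching support.
object FileKinds {
  object Directory {
    // Matches only if the path is a directory at the moment of the match.
    def unapply(p: Path): Option[Path] =
      if (Files.isDirectory(p)) Some(p) else None
  }

  object RegularFile {
    // Matches only if the path is a regular file at the moment of the match.
    def unapply(p: Path): Option[Path] =
      if (Files.isRegularFile(p)) Some(p) else None
  }

  def describe(p: Path): String = p match {
    case Directory(d)   => s"directory: $d"
    case RegularFile(f) => s"regular file: $f"
    case _              => s"something else or missing: $p"
  }
}
```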

Also, if we go down the path of "let's wrap NIO", how exception-happy should we be? The Java NIO directory.walk() for example throws errors if one of the files in the directory is unreadable. Should we tolerate that in Scala?

@lihaoyi

lihaoyi commented Sep 25, 2015

how exception-happy should we be?

I think we should throw exceptions willy-nilly. Exceptions are great, well understood, familiar, and can trivially be wrapped in more principled abstractions via try-catch. The Scala standard library only has Try, which I think isn't that appropriate, and further research into fancier-while-still-usable abstractions is still just that: abstractions.

The problem with files is that they're halfway between statically-known and unknown. E.g. if I'm dealing with files I know are on disk, and can see them in front of me and know what they are, having everything return Options would just make me call .get everywhere.
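
The "throw by default, wrap when you want to" approach can be sketched like this. It is a minimal illustration with hypothetical names (`SizeSketch`, `size`, `sizeTry`); `Try` is used only because it is what the standard library offers, not because it is being recommended here.

```scala
import java.nio.file.{Files, Path}
import scala.util.Try

// Hypothetical helper object for illustration.
object SizeSketch {
  // Primary API: throws (e.g. NoSuchFileException) if the file cannot be read.
  def size(p: Path): Long = Files.size(p)

  // Opt-in wrapper for callers who prefer a value over an exception.
  def sizeTry(p: Path): Try[Long] = Try(size(p))
}
```

Callers who know the file is there use `size` directly and never touch `.get`; callers handling untrusted paths can pattern match on the result of `sizeTry`.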

@bs76

bs76 commented Sep 27, 2015

Here are my 2cents:

  • IO is not just files; resource state management is completely missing from io.* and that is an obstacle; using Try, where do you close resources? When you combine reads/writes on multiple file resources, your code becomes a complete mess. I wrote withResources so many times, it's not even funny;
  • IO is about resources, there needs to be a clear way to manage them, and handle errors; 'files I clearly see' do not exist; networks fail etc.
  • io.Source class is harmless and marginally usable; to read in a file in 'one line' is good enough
  • a DSL on top of files will always fail and never be done right. It's a point-of-view matter. Whereas sometimes an OO approach fits, some might prefer a pipes-and-combinators approach
  • files/paths are complex: there are paths (with/without files), files may have paths (virtual,logical,physical), there are links (physical,logical) and all of that on top of an OS;

Here's what I would suggest:

  • leave io.Source as is for now, do not deprecate
  • take java.nio/java.io and pimp it to make it more usable e.g. InputStream / Reader to read a String, convert to (Seq ?)
  • make opening files simpler; with pimped java.io classes, manipulation will be simpler
  • add resource management into io. and provide style guidelines how to manage resource safely to be on par with java's try(Closeable ...)
  • pimp java.nio.Path to be more usable
  • let 3rd party libraries extend and build on top of the API, adopt usable abstractions
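
The resource-management point can be illustrated with a small loan-pattern helper. This is a simplified sketch, not a proposed API: `Resources`, `using` and `firstLine` are hypothetical names, and unlike Java's try-with-resources it does not record suppressed exceptions if both the body and `close()` throw.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical helper object for illustration.
object Resources {
  // Loans the resource to `body` and always closes it afterwards,
  // whether `body` returns normally or throws.
  def using[R <: AutoCloseable, A](resource: R)(body: R => A): A =
    try body(resource)
    finally resource.close()

  // Example use: read only the first line, closing the reader in every case.
  def firstLine(path: Path): Option[String] =
    using(Files.newBufferedReader(path, UTF_8)) { reader =>
      Option(reader.readLine()) // readLine returns null on an empty file
    }
}
```

Nesting two `using` calls handles reads and writes on multiple resources while keeping each `close()` tied to the scope that opened it.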

@som-snytt

At least we now know when a project has run out of steam: "Aligned Scala logo." https://github.com/scala-incubator/scala-io/commit/8b5467d66760536d34b6bcb36f69a1b7f67f68b5

By coincidence, I'm aligning the logos on my desk this very minute.

@ghost

ghost commented Oct 8, 2015

There was a discussion (and earlier) and my suggestion that we could use this issue as a test-bed on separating the std-lib interface and implementation and see how naming conventions work etc. But this need not be part of the final solution.

By naming, for example, should an implementation have the std namespace and/or its own:

import scala.io          // std lib import, as defined in a library dependency in SBT
import scala.std.io    // the scalac implementation
import nodejs.std.io  // My own node version

@dickwall
Contributor Author

dickwall commented Oct 8, 2015

Oops - didn't mean to edit Haoyi's post but reply to him - reply is below (with context)

Just catching up with this very long thread now - I was on vacation so sue me :-)

For what it's worth, the conversation on this issue is exactly what I had hoped would happen.

Does that mean that @dickwall's original post was a cunning trick sufficiently wrong to get people to respond? =D

Not a cunning trick, but it certainly has led to a healthy discussion. The original post offers some options but certainly makes no demands or assumptions on what the EG should decide. My only request is that such discussions are held in the open (which this one seems to be)

Just for the record, I am hands off any decision making on the technical side because I don't believe I can attempt to get a working process bootstrapped and also influence the decisions made within that process without a huge conflict of interest. As @non points out, formation of an expert group says nothing about whether an IO library should be forthcoming, only that the discussion should occur. The original post closes with:

  • Please organize the first expert group meeting and provide details of the decisions made and action items. Would suggest following the Either expert group's lead and holding the discussion in the open on Google hangouts-on-air or similar so that the recording is publicly available to all interested. If you are involved with the EG, please post any progress in comments on this issue.

If the EG decides no action is the correct action, aside from being very Zen, then that is what the EG decides. The discussion here is obviously healthy, but the point I am trying to get across is that right now we are trying to get the process bootstrapped (certainly that's my aim) not to influence anything about the outcome.

That said, I am looking forward to the time when the process is trusted better and I can actually get involved in working on the opinion side of things as well.

For now I am being as hands off and objective as I know how.

Getting the word out about EGs and the messaging around the process is still something that I am very much interested in. How can we improve the messaging so that people are less surprised when issues like this come up (I don't always have time to email everyone individually so we need to find a common place where the message gets out there without being too surprising to people).

@dickwall
Contributor Author

Tomorrow (Monday 12th) being the next SLIP committee meeting, any updates or summaries to add for this issue? Thanks

@dickwall dickwall added this to the Oct 2015 SLIP mtg milestone Oct 11, 2015
@dickwall
Contributor Author

Re-reading this thread prior to the meeting tomorrow, the most insightful posting is probably this one:

Here's a few questions that really should be asked before we run off to organize an "expert group" to "overhaul" the io library, and certainly should be answered before we start discussing nitty-gritty API details like whether to use functions or extension methods, or whether to work with exceptions or Eithers:

How long should this take? 1 month? 6 months? 12 months? 36 months? If we're throwing something in now we should probably stop talking, write something passable and land it. But if we have a few months that's enough time to work with some existing friendly project to try and port them onto our API as a POC, or put it on maven central for a while for people to try out before fossilizing it in the std lib.

Why did all the other projects in the past fail? Why does almost nobody use rapture.io? Why does nobody speak about scalax.io except in confusion whether it's alive or dead? Why did scala-arm die off? I don't have the answers to these, but presumably if we don't want this project to die it's worth finding out. Post-mortems take a bit of time but not as much time as 3 years and 400 more commits

Assuming we botch the whole thing, what's our strategy to realize that as early as possible (i.e. not after 3 years and 400 commits), and with as little damage as possible (i.e. not leaving things like sys.process lying around the std lib)? This probably rules out "working on own our awesome code in our own awesome github repo forever" or "YOLO landing stuff in master".

What is this library meant to do anyway? If it's IO, does that include sockets and HTTP like Rapture does? If it's File IO, does it include non-read/write filesystem management like better-files or ammonite-ops does? Does it include "in-memory IO" like working with InputStreams and OutputStreams? Does it work with text only, or binary data, or both? Streaming API or batch API or both?

Are we going for convenience (e.g. open("file.txt").read()) or shared-interfaces (Source, Target, ...) in the API? These are both valuable, but totally orthogonal. Having both is great but either alone is already useful. From the posts so far, some people want one and some people want the other.

Are we sure it's worth putting in all this effort to avoid java.nio, when we could just add java.nio.file.Files and 2-3 implicits to Predef.scala, and be able to leverage the non-trivial amount of documentation and familiarity out in the community w.r.t. how to use NIO? v.s. having to re-document and re-educate everyone ourselves if we make our own API, in addition to making sure our API is sufficiently cohesive and consistent and correct. Maybe we decide enough people are running Scala.js/Node.js to make our own API worthwhile, or the Oracle Legal Risk is too great. Or maybe we decide using Java APIs is just fine.

I agree with this set of questions/priorities 100%, the only difference I have is that why can't the expert group itself answer these? They are, after all, going to be affected by the answers. I think there is some misunderstanding of what an expert group is (or can be). Answering these questions would appear to be an ideal starting point for the group, and that group has full power and responsibility to choose as they see fit. I certainly can't think of any better choice of people to ponder these than the people that have an interest in the IO library.

Also please note that being suggested for involvement in an EG does not mean you have to volunteer, nor does it limit the potential membership. It is instead merely a way to notify potentially interested parties that such a thing is being considered.

I will be writing up a blog post for the Scala blog about some of these concepts in the near future.

@pathikrit

Thanks for the summary @dickwall. Regarding this:

I agree with this set of questions/priorities 100%, the only difference I have is that why can't the expert group itself answer these?

Can we choose based on "what is the least amount of work we can do for the maximum benefit to the programmer"? To maximize "bang for the buck", wrapping all the utils in java.nio.file.Files into a sensible Scala File class makes the most sense (proof of concept).

There are also valid concerns about the standard lib being the "graveyard of code". IMO, this mitigates some of those concerns: the less code we put in the std lib, the less we put in the graveyard :)

@dickwall dickwall modified the milestones: Oct 2015 SLIP mtg, Nov 2015 SLIP mtg Oct 12, 2015
@lihaoyi

lihaoyi commented Oct 13, 2015

Can we choose based on "what is the least amount of work we can do for the maximum benefit to the programmer"? To maximize "bang for the buck", wrapping all the utils in java.nio.file.Files into a sensible Scala File class makes the most sense (proof of concept).

IMHO you can get very far with a lot less bucks

implicit def stringPaths(p: String): java.nio.file.Path = java.nio.file.Paths.get(p)
implicit def filePaths(p: java.io.File): java.nio.file.Path = p.toPath

Here we're paying two lines of code instead of 150 in your POC. I don't think we really get 75x more value out of wrapping things v.s. just using the methods directly. I mean, is it really worth spending 148 lines of code wrapping every single operation in our own definition, just so we can call f.delete() instead of Files.delete(f)? Especially given any Java programmer will already be 100% familiar with the latter.
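For illustration, here is a minimal sketch of how such conversions would be used with java.nio.file.Files directly. The object and method names (PathImplicits, stringToPath, fileToPath, ImplicitPathDemo) are hypothetical and chosen only for this example; nothing here is an existing standard-library API.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.language.implicitConversions

// Hypothetical holder for the two conversions; in the proposal above
// they would live in Predef instead.
object PathImplicits {
  implicit def stringToPath(p: String): Path = Paths.get(p)
  implicit def fileToPath(f: java.io.File): Path = f.toPath
}

object ImplicitPathDemo {
  import PathImplicits._

  def main(args: Array[String]): Unit = {
    val dir: Path = Files.createTempDirectory("slip19")
    // A plain String from here on; every Files call converts it implicitly.
    val name: String = dir.resolve("hello.txt").toString

    Files.createFile(name)
    println(Files.exists(name))

    // A java.io.File is accepted the same way.
    println(Files.isDirectory(new java.io.File(dir.toString)))

    Files.delete(name)
    Files.delete(dir)
  }
}
```

The trade-off is the one being debated here: the call sites stay plain NIO, so existing Java documentation applies, but none of NIO's sharp edges are hidden.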

@pathikrit

@lihaoyi I disagree :) We absolutely need to wrap java.nio.file.Files.

just so we can call f.delete() instead of Files.delete(f) ?

java.nio.file.Files has devious traps for us if we are not careful. For example, Files.delete does not actually delete non-empty directories; you have to do that yourself (otherwise you get a nice DirectoryNotEmptyException at run time). Sure, any self-respecting Scala programmer can recurse and delete a directory in her sleep, but try that with Files.copy, which cannot copy directories recursively (it silently makes an empty folder with that name); doing that correctly is entirely non-obvious. Similarly, with Files.move you have to be careful when the target exists, and Files.size is not that useful for directories, where you may want the total size of the directory's contents rather than the size of the inode entry.
Something simple like chown should have been file.setOwner(owner) - instead you have to write something ridiculous like: Files.setOwner(path, path.getFileSystem.getUserPrincipalLookupService.lookupPrincipalByName(owner))
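As a rough sketch of what handling the directory cases by hand involves, the following uses Files.walkFileTree to delete and copy whole trees. RecursiveOps, deleteRecursively and copyRecursively are hypothetical names for this example only, and the copy assumes the target does not already contain conflicting files.

```scala
import java.io.IOException
import java.nio.file.{FileVisitResult, Files, Path, SimpleFileVisitor}
import java.nio.file.attribute.BasicFileAttributes

object RecursiveOps {
  /** Deletes `root` and everything below it (children before parents). */
  def deleteRecursively(root: Path): Unit = {
    Files.walkFileTree(root, new SimpleFileVisitor[Path] {
      override def visitFile(file: Path, attrs: BasicFileAttributes): FileVisitResult = {
        Files.delete(file)
        FileVisitResult.CONTINUE
      }
      override def postVisitDirectory(dir: Path, exc: IOException): FileVisitResult = {
        if (exc != null) throw exc
        Files.delete(dir) // the directory is empty by the time we get here
        FileVisitResult.CONTINUE
      }
    })
    ()
  }

  /** Copies the tree under `source` into `target`, creating directories as needed. */
  def copyRecursively(source: Path, target: Path): Unit = {
    Files.walkFileTree(source, new SimpleFileVisitor[Path] {
      override def preVisitDirectory(dir: Path, attrs: BasicFileAttributes): FileVisitResult = {
        Files.createDirectories(target.resolve(source.relativize(dir)))
        FileVisitResult.CONTINUE
      }
      override def visitFile(file: Path, attrs: BasicFileAttributes): FileVisitResult = {
        Files.copy(file, target.resolve(source.relativize(file)))
        FileVisitResult.CONTINUE
      }
    })
    ()
  }
}
```

Roughly thirty lines for two operations that most users expect to be one call each, which is the kind of rough edge a wrapper could absorb.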

Do you want to count lines in a file using Java NIO? Files.lines(myFile).count seems pretty innocuous but it is not! Files.lines returns a java.util.stream.Stream which needs to be closed!
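A minimal sketch of counting lines without leaking the underlying file handle, assuming a UTF-8 file; LineCount and countLines are hypothetical names used only for this example.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object LineCount {
  /** Counts the lines of `file`, always closing the stream returned by Files.lines. */
  def countLines(file: Path): Long = {
    val lines = Files.lines(file, StandardCharsets.UTF_8) // java.util.stream.Stream[String]
    try lines.count()
    finally lines.close()
  }
}
```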

Why would we burden Scala programmers with all these pitfalls or make them waste their time looking up on StackOverflow how to do trivial things like get an Iterator[Char] from a file when we can sanely wrap java.nio.file.Files? Sure, many of them would be 1-liner hand-offs; but, in other cases, we can make life a lot better with few extra lines of code around whatever Java gives us to smoothen the rough edges of java.nio.file.Files.

@jeantil

jeantil commented Oct 13, 2015

As a user of the better.files library I strongly support @pathikrit's position. This library is a huge relief when having to do filesystem operations. I don't really care if it's included in the std lib or not but it is definitely much better than anything that's currently available in either java or scala standard libraries.

@jsuereth
Contributor

@pathikrit I'm surprised you forgot to mention that on Windows sometimes you can't delete a file immediately (because something like a virus scanner holds it), so to be 'safe' you actually need to call delete multiple times with some kind of time-out/retry. We have most of this in the sbt.IO class as well, and I agree it's basically a necessity for those not writing really low-latency/low-level file code who just want it to "work".
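To make the retry idea concrete, here is a small sketch of a delete that tries again after a short pause. It is not the sbt.IO implementation; RetryDelete, deleteWithRetry and the default attempt count and delay are assumptions made up for this example.

```scala
import java.io.IOException
import java.nio.file.{Files, Path}
import scala.annotation.tailrec

object RetryDelete {
  /** Tries to delete `path` up to `attempts` times, sleeping `delayMillis` between tries.
    * Returns true once the path is gone (deleted or already absent), false otherwise. */
  @tailrec
  def deleteWithRetry(path: Path, attempts: Int = 5, delayMillis: Long = 100L): Boolean = {
    val gone =
      try { Files.deleteIfExists(path); true }
      catch { case _: IOException => false } // e.g. file temporarily held by another process
    if (gone) true
    else if (attempts <= 1) false
    else {
      Thread.sleep(delayMillis)
      deleteWithRetry(path, attempts - 1, delayMillis)
    }
  }
}
```

The blocking sleep is exactly the sort of "correctness vs. speed" choice at issue: convenient for scripts, questionable for latency-sensitive code.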

However, I'd argue that for a general-purpose standard-library file API, I'm not 100% certain all the "correctness vs. speed" tradeoffs should be made for me. I can totally see this from a utility library.

@pathikrit

@jsuereth : Good point about drawing a line between a "util" library and a std library API. IMO, if you want low-level, run with scissors APIs, we already have the java.nio.file in the std lib. The Scala one should not even pretend to be a replacement for that and make that abundantly clear in the docs. Instead, it should strive to be the more intuitive and pragmatic "util" wrapper around the former.

@mdedetrich

My standard take on this

  • We generally need to start looking at doing stdlib implementations in pure Scala, rather than doing light wrappers over the Java versions
  • File IO is basically a must-have that needs to be standardised; there should be a proper, standardised, idiomatic Scala implementation that isn't just java.nio.file
  • This means stuff like async file IO, should ideally be returning stuff like Future[File]
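As one possible reading of the last point, the sketch below bridges NIO's callback-based AsynchronousFileChannel to a Scala Future via a Promise. It returns Future[Array[Byte]] rather than the Future[File] mentioned above, it issues a single read (a complete implementation would loop until end of file), and AsyncRead and readBytes are hypothetical names for this example only.

```scala
import java.io.IOException
import java.nio.ByteBuffer
import java.nio.channels.{AsynchronousFileChannel, CompletionHandler}
import java.nio.file.{Path, StandardOpenOption}
import scala.concurrent.{Future, Promise}

object AsyncRead {
  /** Reads up to `maxBytes` from the start of `path` without blocking the caller. */
  def readBytes(path: Path, maxBytes: Int): Future[Array[Byte]] = {
    val promise = Promise[Array[Byte]]()
    try {
      val channel = AsynchronousFileChannel.open(path, StandardOpenOption.READ)
      def closeQuietly(): Unit =
        try channel.close() catch { case _: IOException => () }

      val buffer = ByteBuffer.allocate(maxBytes)
      // The attachment (here the path itself) is passed back to the handler unchanged.
      channel.read(buffer, 0L, path, new CompletionHandler[Integer, Path] {
        def completed(bytesRead: Integer, attachment: Path): Unit = {
          closeQuietly()
          buffer.flip()
          val bytes = new Array[Byte](buffer.remaining())
          buffer.get(bytes)
          promise.success(bytes)
        }
        def failed(error: Throwable, attachment: Path): Unit = {
          closeQuietly()
          promise.failure(error)
        }
      })
    } catch {
      case e: IOException => promise.failure(e) // e.g. the file does not exist
    }
    promise.future
  }
}
```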

I am also in favour of doing a proper, clean room implementation. The current state of file IO in Scala is a mess, and everyone is using a combination of java.io/java.nio/scala.io.Source and then stuff like https://github.com/pathikrit/better-files. Stuff like Scala.js (and future backends that may come as a result of dotty/TASTY, such as LLVM) really scream for Scala idiomatic implementations of stdlib, rather than falling back to Java all the time.

In terms of design, I am happy with stuff like better-files, with additions to using stuff like Future[File] with proper async IO.

This puts us in a good position to create a new package (under a different name).

I also completely agree with @pathikrit: we need to properly wrap all of java.nio, since there are so many corner cases when doing file IO, for the reasons he stated.

@dickwall
Contributor Author

dickwall commented Nov 2, 2015

One week to the next SLIP meeting. Not that I want these things to just become SLIP meeting driven (in terms of dates/deadlines), but if there are any updates on this issue in the next week, we will pick them up in that meeting.

@velvia

velvia commented Nov 5, 2015

+1 to everything that @mdedetrich said. A clean room implementation (more from a clean, idiomatic Scala API perspective) would provide the greatest return in the long run, esp w.r.t. Scala.js etc. Plus, file I/O is something people expect in a standard library....

@lihaoyi

lihaoyi commented Dec 6, 2015

Looking back, there's a lot of interesting discussion in this thread, but the one thing that's clear to me is that the community failed to come to a consensus. People have differing use cases, requirements, and styles, problem scopes, and it seems doubtful we'll come to a consensus in the foreseeable future.

If we accept that we have not converged on any technical solution, now is the time to start thinking about the meta-solution: given we can't agree or decide, how can we get to a place where we could agree or decide at some point in the future? Even if scala-team/EPFL/soon-to-not-be-called-Typesafe don't bless/pick/write any IO library right-here-right-now, there are things they can do that would speed up the process of coming to a decision.

For example, if we decided that

"we'll wait and see who picks up adoption"

They could add links to the docs/tutorials/main-website like

"If you want to do more things with files, here's a list of 6 libraries you could try"

This would funnel new users towards the candidates, so the various libraries all get a steady stream of people vetting them and deciding they like them or not. If we decided the process was

"wait till people send PRs to port PlayFramework+SBT+whatever onto their own IO library, and do code-reviews then to decide which one we like"

Then there would be a different set of actions we could take to smoothen/speed-up that process

This is a reason why an explicit null-decision would be useful, v.s. just not deciding: deciding "we won't pick one now" would let us move on confidently to the next topic of discussion: how would we structure such a selective process and define the ending conditions? How would we make it fair, fast, and hopefully encourage the right kinds of behavior that optimize for the things we want?

This then becomes a very managerial question, and arguably throwing a bunch of "people who write libraries" together wouldn't be the most effective way to answer it =P

@mdedetrich

I think the biggest thing to get out of an IO library is to end the confusion, for new users, about what IO to use. @lihaoyi, the talk you gave at Scala By The Bay perfectly demonstrates the problem: to do silly IO stuff, users end up having to search Stack Overflow. There are around 4-5 solutions, some coming from Java, some from stuff like Apache Commons, some from Scala's Source (which some people now accept as not that good of a library), and all are fairly verbose.

Looking back, there's a lot of interesting discussion in this thread, but the one thing that's clear to me is that the community failed to come to a consensus. People have differing use cases, requirements, and styles, problem scopes, and it seems doubtful we'll come to a consensus in the foreseeable future.

The whole "wait for people to use a common IO library" doesn't really hold water, it hasn't happened in some long time. I am sure, for example, that Rapture IO may be a great IO library; however, I only found out about it a few months ago. The other thing is that other frameworks/libraries do not use this library, so we risk ending up in the same perverse situation that we have with JSON.

We should have an IO library, where as a new user, I can go to the scala website, and the docs will go something like

import scala.io.File

val f: File = File.open(".someFile")
val asyncF: Future[File] = File.openAsync(".someFile")

And then a bunch of your expected operations. I don't think anyone here is asking for a hyper-specialized, high-performance IO library to be used for load balancers or something along those lines; there will always be a case for the community making their own IO libraries for specialized circumstances. I believe the idea is to create an idiomatic, non-Java Scala IO library that the majority of users are happy with.

@lihaoyi

lihaoyi commented Dec 6, 2015

The whole "wait for people to use a common IO library" doesn't really hold water, it hasn't happened in some long time

I don't know why you quoted me because this has nothing to do with what I said =P

I never proposed inaction. Just a step back from the blind, single minded "let's just do something, community!" strategy that clearly hasn't worked.

I mean, it's great that you're so sure you know what to do to fix everything, but clearly lots of people disagree about things. What next? Arguing "This is what we should do, it's so obvious" just goes in circles.

@mdedetrich

I don't know why you quoted me because this has nothing to do with what I said =P

Sorry if I wasn't clear. I was just confirming your point that "letting the community do it" didn't really work

gebner added a commit to gapt/gapt that referenced this issue Jul 29, 2016
The scala.io.Source code is somewhat deprecated, see
scala/slip#19

As an additional bonus, better-files contains nice functions to write
files, so you can now do the following on the CLI:

  file"buss3.p" < TPTPFOLExporter(BussTautology(3)).toString
@SethTisue
Member

This could be revived under the new Scala Platform Process (http://www.scala-lang.org/blog/2016/11/28/spp.html).

@scala scala locked and limited conversation to collaborators Nov 30, 2016