Consider removing ability to filter on file extensions #74
Comments
My apologies for assuming it came from HTML Forms. They looked very similar. I would rather have callers convert the file extension to a guessed MIME type than support file extensions. For example, a docx file could be converted to a MIME type and sent to the web app. Why require support for file extensions in the entire pipeline? I also wonder if we could look at what Android is doing. My understanding is that the share API isn't very far from the intent mechanism in Android, and it seems to be mostly based on MIME types, isn't it? |
Android intent filters do seem to only deal with MIME types, not file extensions. However, Android is much less focused on files in general (and editors for particular file types) than a desktop environment, and that is essentially what we're trying to build. (Even though Web Share Target isn't particularly desktop-focused, we will be with File Handlers which will use the same syntax.)
The problem with this approach is that it centralizes registration of new file types, and also hands control of it to individual user agents (not standardized). There are two key issues here:
It's not the "entire pipeline". We are proposing a very simple thing, which is a suffix match on the filename. |
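As a rough illustration of the suffix match described above, here is a minimal Python sketch. The function name `matches_extension` and the case-insensitive comparison are assumptions made for illustration; the thread does not specify either.

```python
from typing import Iterable


def matches_extension(filename: str, accepted_extensions: Iterable[str]) -> bool:
    """Return True if the filename ends with any accepted extension.

    Assumption: extensions are written with a leading dot (".docx") and
    compared case-insensitively. Neither detail comes from the thread.
    """
    lowered = filename.lower()
    return any(lowered.endswith(ext.lower()) for ext in accepted_extensions)
```

For example, `matches_extension("Report.DOCX", [".docx"])` is true under these assumptions, while `matches_extension("notes.txt", [".docx"])` is false.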
I don't know what other browsers do, but when you open foo.ext, Chrome will first try to find out if it is aware of the MIME type associated with ".ext". If it can't find anything, it will ask the system if it is aware of something. All the work done after that is based on the MIME type. I wonder whether, in order to solve 1. above, we may want to add an API that registers a MIME/extension association. This may have benefits larger than just this API. Regarding 2., as mentioned, it sounds like this is already what Chrome relies on. I would expect that this is also what Android does because in the Files application, it must convert the file name into a MIME type in order to call the right VIEW intent. I do not know if Android has an API to add extension/MIME type associations though. Something worth considering too is that if we allow extensions and MIME types, we would need to support both on all platforms. With MIME types being first-class citizens on Android, do we know if we can implement decent extension-based filtering on the platform? |
Implementations can do what makes sense on each platform. "An implementation MUST support filtering on MIME types or filtering on file extensions, or both." |
This is also being worked on - the explainer was presented to the Service Worker and Web Platform working groups at TPAC. This allows installed web applications to register themselves to handle specific MIME types and/or file extensions. I will be sending an Intent to Implement soon. |
I think what @mounirlamouri means is not an API for registering an app as a MIME type handler, but an API for registering a {file extension, MIME type} association pair, so that instead of having extensions here, we just have MIME types here and rely on the dynamically updated mapping table to map extensions to MIMEs. I don't see how that fits here, though, or how it would work in general. Would this be a global mapping (where the user agent keeps track of all the associations in a table, and sites can add new entries)? Or per-site? If it's a global mapping, I don't like it, because it means that the way share targets / file handlers behave on site A is influenced by whether you've previously been to site B. If it's per-site, then I don't really see the point of adding the extra layer of indirection when you could just say "this share target handles files with these extensions."
This argument is basically "Android apps that accept specific MIME types must rely on a proprietary ext -> MIME table in the Files app, therefore why can't Web apps do so too?" Answer: Because it's the Web and we shouldn't rely on any proprietary table for apps to work properly. If you write an Android app, you only have to make sure that your file extension is mapped correctly by the Android Files app in order to make sure it works. With a Web Share Target, you'd have to make sure it's mapped correctly by all the OS/user agents' mapping tables, and there would be no guarantee of it mapping correctly on all platforms. Further, Android apps using the share system aren't likely to invent their own new file extensions since sharing is all about interop between apps. Whereas when we do file handlers, that's much more likely since you might just want to associate your own proprietary file type with your app (not something you really have on Android). |
I think it's incorrect to say that an implementation must support one OR the other. An implementation must support BOTH, because it would be valid for a page to only support |
No, the spec isn't written to achieve compat in which share targets are presented to a user, e.g.
We could emphasize with a non-normative note that if a web app supports only |
We could require each files entry to specify at least one mime type if it specifies any extensions, and to specify at least one extension if it specifies any MIME type more specific than |
Would it be reasonable to have something like:
With a format like this, we could guarantee that each entry has at least one valid content type and a valid mime type (named |
There is no obvious extension for applications supporting |
If we went this route, I assume "*/*" would implicitly map to extension "*" without any need to specify it. For "type/*", we'd want to let you specify a list of extensions. For example, if your app is accepting "image/*", on systems that don't know about MIME types, you'd want that to translate to ["gif", "jpg", "jpeg", "png"] and possibly other image formats your app supports. |
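A minimal Python sketch of the translation suggested in the comment above, for a platform that only understands extensions. The function name `extensions_for` and the `declared` dictionary shape are hypothetical; the only behaviour taken from the comment is that `"*/*"` implicitly maps to `"*"` and that a pattern such as `"image/*"` uses a developer-supplied list.

```python
from typing import Dict, List


def extensions_for(pattern: str, declared: Dict[str, List[str]]) -> List[str]:
    """Translate an accepted MIME pattern into a list of file extensions.

    `declared` is a hypothetical developer-supplied table keyed by MIME
    pattern, e.g. {"image/*": ["gif", "jpg", "jpeg", "png"]}.
    """
    if pattern == "*/*":
        # Accept-everything needs no explicit extension list.
        return ["*"]
    # Anything else relies on what the developer listed; empty if nothing.
    return list(declared.get(pattern, []))
```

With `declared = {"image/*": ["gif", "jpg", "jpeg", "png"]}`, a call for `"image/*"` returns the four listed extensions, and a pattern the developer did not list yields an empty list, which would mean the target never matches on such a platform.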
Please correct me if I'm wrong, but I think that we're trying to solve a few problems here:
Is that correct? Am I missing anything? I'd like to ask if (1) is really true? Are there signs that desktop OS's are heading in this direction also? Even putting that aside though, I agree that the web is primarily mime type based (from my very limited expertise in the area) so it may be good to base things around mime types primarily. |
Would the following approach meet all the requirements?
Each extension may only appear once in the extension mapping. Every MIME type in an accept must appear in the extension mapping. If an accept contains a This guides developers towards specifying share targets that should work on all platforms. If the platform only supports share target selection based on MIME types, then |
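The first two rules stated above can be checked mechanically. The following Python sketch is only an illustration: the manifest format proposed in this comment did not survive in the thread, so the list-of-pairs shape, the function name `validate`, and the error strings are all assumptions. The third rule is cut off in the text, so wildcard entries are simply skipped here rather than guessed at.

```python
from typing import List, Tuple


def validate(accept: List[str], extension_mapping: List[Tuple[str, str]]) -> List[str]:
    """Return a list of rule violations (empty when the entry is valid).

    `extension_mapping` is assumed to be (extension, MIME type) pairs.
    """
    errors = []
    seen = set()
    # Rule 1: each extension may only appear once in the mapping.
    for ext, _mime in extension_mapping:
        if ext in seen:
            errors.append(f"extension {ext!r} appears more than once")
        seen.add(ext)
    # Rule 2: every MIME type in accept must appear in the mapping.
    mapped = {mime for _ext, mime in extension_mapping}
    for mime in accept:
        if "*" in mime:
            continue  # Assumption: wildcard handling is out of scope here.
        if mime not in mapped:
            errors.append(f"MIME type {mime!r} has no extension in the mapping")
    return errors
```

Returning a list of errors instead of raising on the first one is a design choice for the sketch, so that a developer would see every problem in one pass.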
It's not really "head toward" but as you pointed out later in your reply, the platform is rarely using extensions. I think there is a valid use case around custom files here though.
👍
I think the UA will handle this for regular files as it does today. The real UC is the point below.
👍
I'm not sure what that means but in general I like having low level APIs that we can compose together. I realise that it's not easy/worth it in this case.
Why? I like your proposal but I wonder if we could leave |
We could leave it optional, and then any web share targets that omit the mapping would not appear in the share menu on platforms like Windows that use extensions. I don't think we should be introducing a new web standard for the common MIME types and their common extensions. Every time a browser added support for a new image or video format, they would want it added to the list. We'd need a new repo dedicated to those conversations (which could degenerate into Wikipedia notability discussions). If the default mapping only included file types that are understood by all browsers, then "image/*" would accept more image types on platforms that use MIME types (they'd ignore the mapping and use the MIME type) than on platforms that use file extensions and the default mapping, and it wouldn't be obvious why. Developers might not be aware of the default mapping. There are a couple of lists with the extensions people might want: MDN's "Complete list of MIME types" or Apache's "mime.types" |
@inexorabletash what do you think? We will want Web Share Target and File Handling to use the same approach. We have a few main options:-
|
@benfredwells for thoughts also Thanks @ericwilligers. So the main point of contention seems to be whether we bake in a default mapping of file extensions to mime types into the platform? The downsides you listed are:
The main pro to a default mapping of extension<->mime type is that it means that developers don't have to worry about specifying extensions for common mime types. |
Long reply after a long weekend - sorry for the wall of text... Given the issues we've seen with developers struggling with mime types due to inconsistencies across platforms, browsers, and individual devices, I'm very skeptical of attempting to elevate mime types above extensions. I think they're a lovely idea, much like other additional file metadata in directories/auxiliary streams/resource forks, which breaks down rapidly in the face of sharing content across heterogeneous systems where a file being (name, bytes) is all you can count on. File extensions are widely understood by users, operating systems, and tools, even if they are often hidden from users, and you can view them as a hack - smuggling metadata for type in the name field - but it's been a widely successful hack.
Yeah, so I filed w3c/FileAPI#51 years ago. Seemed like a good idea at the time. Looking at it from another perspective, it's inventing a way to turn data we have (extensions) into data we're missing (mime type). This might help developers in some cases, but I think practically they're going to rely on extensions (to filter client-side in the UI) + content verification (server-side). I'm pretty firmly in the "lists containing MIME types and/or file extensions (current spec, reviewed by TAG, same syntax as HTML forms)" camp. My reaction to the other options: having to declare a mapping is a burden on developers; they will likely end up copying/pasting a chunk of the manifest, and shaking fists at us dumb browser developers. Building a mapping into browsers first couples this feature to one that hasn't even been designed yet, and seems like something that could take a long time to get agreement on and ship, with challenges around versioning/iterating. I think even with the "list types and/or extensions" option there's lots of room for evolution by the UA to improve the experience: if ".jpg" is listed, user agents could infer that the desired canonical type is "image/jpeg" via a UA/OS mapping and offer ".jpeg" files or even transcode PNG files. |
@inexorabletash there is already a mapping in browsers and they otherwise rely on the system mappings. I am not sure that in practice this would be a burden for authors as they would in most cases rely on known extensions/mime types already handled by the browsers. It would only apply when they have to deal with custom files. |
Current browser/system mappings are inconsistent across browsers, operating systems, and even individual user's systems. For example, on Windows, the mapping of extension to mime type depends on what applications are installed. "docx" may be known on a machine where Microsoft Office is installed, or be an unknown extension on another. Developers will not be aware that "docx" needs to be treated as a custom file type; testing that "application/vnd.openxmlformats-officedocument.wordprocessingml.document" will work on Chrome on one Windows 7 machine is insufficient, which complicates the testing matrix even beyond the pain we inflict on developers today. |
The goal from my perspective is to avoid cases in which files cannot be shared because the developer missed a particular MIME type or extension that is only applicable on a minority platform. In my opinion the mapping table suggested by @ewilligers mitigates this problem but only if we don't fall back to system defaults because as @inexorabletash points out those expand the testing matrix. The behavior of the manifest should be completely described by the specifications and the manifest itself. |
@inexorabletash my intent is for things like docx to be handled by the author (with an extension/mime type association in the manifest) while basic web platform types like jpg, gif, png, webm, wav wouldn't need to have an association as browsers know about these for certain. The intent here is to avoid a design that fully relies on file extensions or requires verbose mandatory mappings. I like @reillyeon's suggestion but the cost of adding the mapping to a spec, checking compat with tests, and likely changing Chromium's behaviour isn't trivial. |
The default mapping could be empty, so all file extensions need to be manually specified by the developer. This doesn't actually seem so bad because native file selection APIs already require developers to list the file extensions they are interested in. This only adds the requirement that they must be mapped to a MIME type which helps us with the desire for all content entering the platform to be appropriately tagged with a content type. |
The mapping can be a bit tedious as it can lead to mistakes or omissions (is .jpeg included?). Also, it can make wildcard a bit hard to handle: how can one map |
The problem as I see it is that extensions are not standardized anywhere so it has always been up to developers to know what extensions they want to support. Yes, some will forget to include We could perhaps make it less tedious by allowing multiple extensions to be mapped to the same MIME type in a single entry if there are enough cases (i.e. The mapping is extension → MIME type. Mapping to |
@raymeskhoury asked my opinion so I'll give it, but y'all should trust the experienced specs folks more than me. My opinion is that the original proposal "lists containing MIME types and/or file extensions (current spec, reviewed by TAG, same syntax as HTML forms)" is both simpler and clearer than the mapping that has been outlined. Chrome Apps used this and we had zero complaints: it just worked, and it worked across a variety of platforms. I understand that MIME types are in many ways better (e.g. more flexible) than file extensions, but practically they both have their place and I worry that developers will bear the brunt of us trying to make MIME types the primary primitive for idealistic reasons. To explain what I mean by 'clearer', it isn't obvious to me what happens to files that don't have extensions in the map but which are recognized by the system as a listed particular MIME type. I assume the match is actually an OR: things the UA recognizes as matching the MIME type or things which match the extension, but that is basically the original proposal expressed in a confusing way. |
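The OR reading described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the spec's algorithm: the function name `is_accepted`, the `detected_mime` parameter (standing in for whatever type the UA or OS recognizes, if any), and the leading-dot convention for extensions are all hypothetical.

```python
from typing import List, Optional


def is_accepted(filename: str, detected_mime: Optional[str], accept: List[str]) -> bool:
    """Match a file against a mixed list of MIME types and extensions.

    A file is accepted if ANY entry matches: an extension entry by
    filename suffix, or a MIME entry by the type the platform detected.
    """
    for entry in accept:
        if entry.startswith("."):
            # Extension entry: case-insensitive suffix match (assumption).
            if filename.lower().endswith(entry.lower()):
                return True
        elif entry.endswith("/*"):
            # Wildcard MIME entry such as "image/*".
            if detected_mime and detected_mime.startswith(entry[:-1]):
                return True
        elif detected_mime == entry:
            return True
    return False
```

Under this reading, a `.psd` file with no recognized MIME type still matches an accept list containing `".psd"`, and an extensionless file recognized as `image/jpeg` still matches `"image/jpeg"`.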
Thanks all for chiming in.
Hmm, I think that would only make it marginally less tedious. Thinking about it some more, it seems to me that we shouldn't go down the path of forcing developers to specify a full mapping. We know that most apps are going to use the same common types and we're asking developers to put the same mapping down every time.
I think that's a good way to summarise the benefit of forcing a mapping. If a developer only specified .docx in the manifest, the app would receive .docx on Windows but not on Android (since it only supports mime-types). That seems bad because it makes the web app feel like it's not cross-platform. I only really see 2 ways to address this:
(1) is simpler because we don't maintain a centralized list, but it carries some risk of inconsistencies across platforms as has been mentioned. Each platform will have a different set of these and to really make sure their app works the way they want it, developers would have to test on all the platforms in theory. I think the risk is very low because the vast majority of apps are going to use standard mime types that all UAs would be able to know about. But it's still a risk. I wonder if there is a way we can start with (1) and move to (2) later if it turns out we need to do the work (due to too much platform inconsistency). We could start with a similar proposal to @ericwilligers. You would only specify mime types in the manifest and the UA would guess the extensions which correspond when needed, based on its own list: The vast majority of developers would only need to specify this syntax and would not need to specify extensions ever. In rare cases where weird mime types are being handled, they could also specify extensions: Later down the road, if we find that there is too much inconsistency across platforms, we could spec out the default mapping. Maybe this is still all overkill though and the risk for (1) is low enough that we don't need to ever do (2) and we can just stick with the current spec. Thoughts? |
If we do attempt (2) a look into the history of |
To clarify, I didn't suggest having a mapping in the spec but having the UA keep a default mapping (I do not believe having it defined in the spec is mandatory, given that the Web already needs this to work). What I suggested is more for the website to be able to specify its custom mapping. In other words, when a file is received, the UA could check its hard-coded mapping, the system mapping and the site mapping. I think the added amount of work is mostly if this spec has to maintain the "official" mapping. |
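A minimal Python sketch of the layered lookup described above. The function name `guess_mime` and the dictionary shapes are hypothetical, and the precedence (UA first, then system, then site) simply follows the order in which the comment lists the three sources; the thread does not say which should win on conflict.

```python
import os
from typing import Mapping, Optional


def guess_mime(
    filename: str,
    ua_mapping: Mapping[str, str],
    system_mapping: Mapping[str, str],
    site_mapping: Mapping[str, str],
) -> Optional[str]:
    """Resolve a filename's extension to a MIME type across three tables.

    Each table maps a lowercase, dot-prefixed extension to a MIME type.
    Returns None when the file has no extension or no table knows it.
    """
    _, ext = os.path.splitext(filename)
    ext = ext.lower()
    if not ext:
        return None
    # Assumed precedence: UA table, then system table, then site table.
    for mapping in (ua_mapping, system_mapping, site_mapping):
        if ext in mapping:
            return mapping[ext]
    return None
```

For instance, with a UA table that knows only `.png` and a site table that maps `.psd` to `application/x-photoshop`, a file named `layers.psd` resolves through the site table.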
@marcoscaceres @aliams do other browser vendors have thoughts on this? I'm going to number the proposals we've had so far:
In the example above, the mime types image/gif and image/png are in the standard mapping, so browsers would all know what types they map to. application/x-photoshop isn't in the mapping so listing it would raise an error unless the developer also specified what extensions mapped to it.
Pro: We ensure developers consider both platforms that only deal with mime types as well as platforms that only deal with extensions. No need to maintain the standard mapping. |
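To make the error case in the standard-mapping option concrete (a MIME type outside the mapping is rejected unless the developer supplies extensions), here is a hedged Python sketch. `STANDARD_MAPPING` is a two-entry stand-in, not a real or proposed standard list, and the function name and input shape are hypothetical.

```python
from typing import Dict, List

# Illustrative stand-in only; no such standard list exists in the thread.
STANDARD_MAPPING: Dict[str, List[str]] = {
    "image/gif": [".gif"],
    "image/png": [".png"],
}


def resolve_extensions(accept: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Fill in extensions for each accepted MIME type.

    `accept` maps a MIME type to developer-specified extensions (possibly
    empty). Types missing from the standard mapping must bring their own.
    """
    resolved = {}
    for mime, extensions in accept.items():
        if extensions:
            resolved[mime] = list(extensions)
        elif mime in STANDARD_MAPPING:
            resolved[mime] = list(STANDARD_MAPPING[mime])
        else:
            raise ValueError(f"{mime} is not in the standard mapping; list its extensions")
    return resolved
```

Calling it with `{"image/png": [], "application/x-photoshop": [".psd"]}` succeeds, whereas `{"application/x-photoshop": []}` raises `ValueError`.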
Thank you @raymeskhoury for enumerating the options discussed as well as providing summaries of the pros and cons. I think I like option 1 as it seems to keep things simple and does not involve automatically determining values on behalf of the developer as you would get with option 2. I think options 3 and 4 can make it challenging for web developers to keep track of the various extensions and content type matches and they may not always get them right. It may perhaps be useful to get some insight from various web developers and see what they think about these different options. |
@jakearchibald can you suggest some developers who would want to use web share target/file handlers who might have an opinion on the above? Thanks! |
I have opinions! 😄 I've only skimmed the thread (a summary might be useful if you're looking for others to join in), so apologies if I'm retreading old ground. Mixing mimetypes & extensions already happens with the accept attribute on input elements. Inferring mimetypes from extensions already happens with Being able to specify file extensions feels important for types the browser doesn't understand. We hit this with squoosh.app, where we wanted the browser to be able to open I think option 2 is best since the format matches |
+1 to @jakearchibald's proposal to align the behavior of Great point in Jake's comment that apps through wasm or JavaScript can indeed be able to master more than the browser itself, like the |
Btw, I like the idea of option 3, but only if it can be provided to other file-accepting APIs. It'd be terrible if we end up in a place where I can Share Target a file, but can't I guess you could have a format like Since a solution to this problem should target file inputs and drag & drop, it seems fine for share target to do the same thing as other file inputs (option 2) until a solution can be agreed & implemented. |
Thanks all, it seems like we're converging on option 2 as a step forward and working out something like option 3 in the longer term. We will need to update the spec slightly to give browsers the freedom to guess the matching mime type/extension but otherwise I think we're ok. @jakearchibald any suggestions on where we should create an issue to track that? |
https://github.com/whatwg/html/, since that's where https://html.spec.whatwg.org/multipage/input.html#attr-input-accept is defined. |
Raised whatwg/html#4459. |
See discussion starting on this Chromium code review. Pasting relevant quotes:
@mounirlamouri:
@ericwilligers:
@mgiuca:
My position remains the same as in the above quote.