By: Yash Khandelwal & Greg Whitworth
Editing video within the browser is currently a complex task because there is no straightforward way to decode an encoded video file into a raw stream on which common editing operations, such as trimming or concatenation, can be performed. Web developers typically handle client-side editing in one of three ways:
- They create their own client-side pipeline that decodes the video file(s) to access the stream(s), applies the edits, and then re-encodes the video into the desired format.
- They let the user make artificial edits, keeping a JS-based data structure of the edits and moving the player position to give the illusion that the adjustments have been applied on the client. When the user saves, this document of edit events is sent to the server, where the actual video editing occurs.
- They capture video content via MediaRecorder, which provides a Blob, and then use the slice method to trim the content where desired.
Each of these approaches has pros and cons. The first requires either knowing the video formats the application will work with or bundling a full library, such as ffmpeg compiled to WASM, to handle multiple codecs. This can result in large downloads (at times up to 7MB zipped) just to enable client-side editing. It does, however, cover both trimming and concatenation.
The second approach likewise handles the use cases above, and it does so without having to download the larger files. The drawbacks are that the server-side solution can produce bottlenecks in an editing queue and incurs the cost of dedicated servers for the video edits. It may also produce numerous redundant edits, since every save adds another editing job to the queue. This can lead to increased server-side costs and a slow turnaround time for the end user.
The final approach avoids both downloading a large library and sending the video to a server, but it requires that the editing occur at 1x speed. For example, if you have a 60 minute video and want to trim it to 20 minutes, you'll need to wait 20 minutes for the new blob to be created. With our early prototypes, this same work can be done in less than 3 seconds.
This API is a starting point for video editing on the client: it enables the capabilities listed above, without the overhead of the approaches described, for the most common web-based video editing scenarios.
We have worked with Flipgrid to validate that this approach tackles their video editing needs and significantly improves their user experience.
We're proposing a MediaBlob that extends the regular Blob and a MediaBlobOperation that will be used to batch the proposed media editing operations. Initial feedback from customers who need this technology indicated that concatenation and trimming were the capabilities they needed, so that is what we started with.
[Exposed=(Window,Worker), Serializable]
interface MediaBlob : Blob {
  constructor(Blob blob);
  readonly attribute long long duration;
};
When the MediaBlob constructor is invoked, the User Agent MUST run the following steps:
- Let blob be the constructor's first argument
- Run the steps in Handling MimeTypes
- If the return value is true, return the new MediaBlob
- else throw the DOMException that was returned.
- When the duration attribute is read, the User Agent MUST return the duration of the media in the Blob, in milliseconds
let mediaBlob = new MediaBlob(blob); // blob is a Blob object for a valid media
console.log(mediaBlob.duration) // Outputs 480000 = 8 minutes
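Since MediaBlob is a proposal rather than a shipped interface, page code would probably want to check for it before use. The following is a minimal sketch of such a check; the helper name `supportsMediaBlob` and the fallback message are illustrative assumptions, not part of the proposal.

```javascript
// Hypothetical helper: returns true only when both proposed constructors exist
// on `scope`. `scope` defaults to the global object; passing it in explicitly
// keeps the check easy to exercise outside a browser.
function supportsMediaBlob(scope = globalThis) {
  return typeof scope.MediaBlob === 'function' &&
         typeof scope.MediaBlobOperation === 'function';
}

if (!supportsMediaBlob()) {
  // Fall back to one of the existing approaches (server-side edits, WASM, ...).
  console.log('MediaBlob is not available in this environment.');
}
```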
[Exposed=(Window,Worker), Serializable]
interface MediaBlobOperation {
  constructor(MediaBlob mediaBlob);
  void trim(long long startTime, long long endTime);
  void split(long long time);
  void concat(sequence<MediaBlob> blobs);
  Promise<sequence<MediaBlob>> finalize(optional DOMString mimeType);
};
When the MediaBlobOperation constructor is invoked, the User Agent MUST run the following steps:
- Let mediaBlob be the constructor's first argument
- If mediaBlob is not undefined, return the new MediaBlobOperation
- else throw a "DataError" DOMException
The MediaBlobOperation methods trim, concat and split do not modify the MediaBlob when invoked. They are tracked and executed only when finalize is called. Batching the operations this way saves memory and improves efficiency. Due to the nature of the split operation, it should always be the last method called before finalize.
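The deferred behaviour can be modelled in plain JavaScript. The sketch below is not the proposed implementation: `OperationBatch`, its `apply` method and the `{ type, duration }` clip objects are hypothetical stand-ins, and the model tracks only durations rather than media bytes. Its `concat` takes an array, following the sequence argument in the interface definition, and error checking is left out.

```javascript
// Hypothetical model: a clip is a plain object { type, duration } (duration in ms).
class OperationBatch {
  constructor(clip) {
    this.source = clip;  // the input clip is never mutated
    this.pending = [];   // operations are only recorded until finalize()
  }

  trim(startTime, endTime) {
    this.pending.push({ name: 'trim', startTime, endTime });
  }

  concat(clips) {
    this.pending.push({ name: 'concat', clips });
  }

  split(time) {
    this.pending.push({ name: 'split', time });
  }

  // Runs the recorded operations in call order and returns the resulting clips.
  apply() {
    let results = [{ ...this.source }];
    for (const op of this.pending) {
      const current = results[0];
      if (op.name === 'trim') {
        results = [{ ...current, duration: op.endTime - op.startTime }];
      } else if (op.name === 'concat') {
        const added = op.clips.reduce((sum, clip) => sum + clip.duration, 0);
        results = [{ ...current, duration: current.duration + added }];
      } else if (op.name === 'split') {
        results = [
          { ...current, duration: op.time },
          { ...current, duration: current.duration - op.time },
        ];
      }
    }
    return results;
  }

  // Mirrors the promise-returning shape of the proposed finalize().
  finalize() {
    return Promise.resolve(this.apply());
  }
}

const batch = new OperationBatch({ type: 'video/webm', duration: 480000 });
batch.trim(240000, 360000); // recorded only; nothing has been executed yet
batch.split(60000);         // recorded last, as recommended above
batch.finalize().then((clips) => {
  console.log(clips.map((clip) => clip.duration)); // two clips of 60000 ms each
});
```

Keeping the recorded operations in a simple array is what lets all checks and all work happen in one pass when finalize is called.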
The trim method is utilized to create the segment of time that the author would like to keep; the remaining content on either end, if any, is removed.
startTime
: The starting time position in milliseconds. Required.
endTime
: The ending time position in milliseconds. Required.
- Let x be the byte-order position, with the zeroth position representing the first byte
- Let O represent the blob to be trimmed
The User Agent will execute the following when finalize is called.
- Check for errors
- Move x to the startTime within O
- Consume all of the bytes between the startTime and the endTime and place these bytes in a new MediaBlob object trimmedBlob
let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.trim(240000, 360000);
mbo.finalize().then(function(mediaBlobs) {
// mediaBlobs[0] will be the trimmed blob of 2 min duration
});
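Because trim is expressed in media time, it can help to picture the media as a list of timestamped chunks rather than raw bytes. The sketch below assumes a hypothetical array of `{ timestamp, duration }` chunks (in milliseconds) and a hypothetical `trimChunks` helper. It ignores chunks that straddle a boundary and codec constraints such as keyframes, which a real User Agent would have to handle.

```javascript
// Hypothetical model: media is an array of chunks { timestamp, duration } in ms.
function trimChunks(chunks, startTime, endTime) {
  return chunks
    // Keep only chunks that lie entirely inside [startTime, endTime].
    .filter((c) => c.timestamp >= startTime && c.timestamp + c.duration <= endTime)
    // Rebase timestamps so the trimmed media starts at 0.
    .map((c) => ({ ...c, timestamp: c.timestamp - startTime }));
}

// Six one-second chunks covering 0 s to 6 s.
const sourceChunks = Array.from({ length: 6 }, (_, i) => ({
  timestamp: i * 1000,
  duration: 1000,
}));

const trimmedChunks = trimChunks(sourceChunks, 2000, 5000);
console.log(trimmedChunks.length); // 3
```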
The split method allows the author to split a blob into two separate MediaBlobs at a given time. Due to the nature of this operation, it should be the last operation before calling finalize().
time
: The time, in milliseconds, at which the blob is to be split into two separate MediaBlobs.
- Let time represent the split location
- Let O represent the blob to be split
The User Agent will execute the following when finalize is called.
- Check for errors
- Consume all of the content prior to the split location and place into mediaBlob1
- Place the remaining content into mediaBlob2
- Place both mediaBlob1 and mediaBlob2 into a sequence
let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.split(2000);
mbo.finalize().then(function(mediaBlobs) {
// mediaBlobs will be an array of two MediaBlobs split at 2 seconds
});
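The split steps can be pictured on a list of timestamped chunks. This is only a sketch under stated assumptions: the `{ timestamp, duration }` chunk shape and the `splitChunks` helper are hypothetical, and the split time is assumed to fall on a chunk boundary.

```javascript
// Hypothetical model: media is an array of chunks { timestamp, duration } in ms.
function splitChunks(chunks, time) {
  // Everything that starts before the split point goes to the first result.
  const first = chunks.filter((c) => c.timestamp < time);
  // The remainder goes to the second result, rebased to start at 0.
  const second = chunks
    .filter((c) => c.timestamp >= time)
    .map((c) => ({ ...c, timestamp: c.timestamp - time }));
  return [first, second];
}

// Four one-second chunks, split at 2 seconds.
const clipChunks = Array.from({ length: 4 }, (_, i) => ({
  timestamp: i * 1000,
  duration: 1000,
}));
const [beforeSplit, afterSplit] = splitChunks(clipChunks, 2000);
console.log(beforeSplit.length, afterSplit.length); // 2 2
```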
The concat method allows you to take two MediaBlobs and concatenate them.
blob
: This is the MediaBlob to concatenate with the current MediaBlob
- Let m1 represent the first MediaBlob which will be the MediaBlob from the MediaBlobOperation that has the concat method called upon
- Let m2 represent the second MediaBlob which will be the MediaBlob that will be concatenated with m1
The User Agent will execute the following when finalize is called.
- Check for errors
- Produce a new MediaBlob and copy the bytes from m1 into this new blob, followed by m2
let mbo = new MediaBlobOperation(new MediaBlob(blob1));
mbo.concat(new MediaBlob(blob2));
mbo.finalize().then(function(mediaBlobs) {
// mediaBlobs[0] will be a concatenated MediaBlob of blob1 and blob2
});
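One detail the byte-copy description leaves implicit is timing: for the result to play as a single clip, the second clip's timestamps presumably need to follow on from the end of the first. The sketch below illustrates that idea on a hypothetical list of `{ timestamp, duration }` chunks; `concatChunks` is an illustrative helper, not part of the proposal.

```javascript
// Hypothetical model: media is an array of chunks { timestamp, duration } in ms.
function concatChunks(m1, m2) {
  // The end time of m1 becomes the offset for every chunk of m2.
  const last = m1[m1.length - 1];
  const offset = last ? last.timestamp + last.duration : 0;
  const shifted = m2.map((c) => ({ ...c, timestamp: c.timestamp + offset }));
  return [...m1, ...shifted];
}

const firstClip = [
  { timestamp: 0, duration: 1000 },
  { timestamp: 1000, duration: 1000 },
];
const secondClip = [{ timestamp: 0, duration: 500 }];

const joinedClip = concatChunks(firstClip, secondClip);
console.log(joinedClip[2].timestamp); // 2000
```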
The finalize method executes all the tracked operations and returns a promise that resolves to a sequence of MediaBlob objects encoded according to the mimeType value.
mimeType
: DOMString representation of the mimetype [RFC2046] as the expected output
- Let O be the MediaBlobOperation context object on which the finalize method is being called.
- The User Agent will perform error checking.
- If mimeType is provided, run the steps in Handling MimeTypes
- If the return value is true, continue
- else reject the promise with the DOMException that was returned.
- If no errors, the User Agent will execute all the tracked operations and get a sequence of MediaBlobs.
- The operations will be executed sequentially, in the order in which they were added, and it is up to web developers to batch the operations in the most optimized way.
- Executing them in a fixed order is necessary to provide better error handling.
- The User Agent will create a new sequence of MediaBlob based on the mime type provided.
- Resolve the promise with the sequence of MediaBlob
// let the mimeType of the blob be 'video/webm; codecs=vp8,opus;'
let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.finalize('video/mp4; codecs=h264,aac;').then(function(mediaBlobs) {
// mediaBlobs[0] will be a MediaBlob object encoded with H.264 video codec and AAC audio codec
});
let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.trim(4000, 360000);
mbo.concat(new MediaBlob(blob2));
mbo.finalize().then(function(mediaBlobs) {
// mediaBlobs[0] will be a concatenated MediaBlob of blob (trimmed to the range from 4 s to 360 s) and blob2
});
When finalize() is called, the User Agent will perform these basic checks for the operations that are batched. This error checking should be done before executing any of the operations.
For trim()
- Let O represent the blob to be trimmed
- If startTime is less than 0 OR endTime is greater than the O.duration OR startTime is greater than the endTime:
- Reject promise with an "InvalidStateError" DOMException
For split()
- Let O represent the blob to be split
- If time is less than 0 OR is greater than O.duration OR this is not the last operation before finalize() was called:
- Reject promise with an "InvalidStateError" DOMException
For concat()
- Let m1 represent the first MediaBlob which will be the MediaBlob from the MediaBlobOperation that has the concat method called upon
- Let m2 represent the MediaBlob that is passed in to concat method to be concatenated with m1
- If the mimeType of m1 does not equal the mimeType of m2:
- Reject promise with an "InvalidStateError" DOMException
The DOMException.message must contain:
- Operation name
- The sequence number indicating the position of the operation
Example:
let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.trim(0,5000); // Trim from 0 to 5 secs
mbo.split(7000); // Split the MediaBlob at 7 secs
mbo.finalize().then(function(mediaBlobs) { })
.catch((error) => {
// sample error.message: "Split called on sequence 2: The time provided is greater than the duration of the MediaBlob."
});
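The checks and the message format can be sketched as a single pass over the batched operations. This is a model under stated assumptions, not the User Agent's algorithm: `checkOperations` and the plain operation objects are hypothetical, media is described only by `{ type, duration }`, and the reason strings are invented for illustration. The sketch also assumes each check runs against the duration left by the preceding operations, which is what the sample message above implies.

```javascript
// Hypothetical model: operations are plain objects such as
// { name: 'trim', startTime, endTime }; media is { type, duration } in ms.
function checkOperations(source, operations) {
  let duration = source.duration; // duration after the operations checked so far

  for (let i = 0; i < operations.length; i++) {
    const op = operations[i];
    // Messages carry the operation name and its 1-based position in the batch.
    const fail = (reason) =>
      new DOMException(`${op.name} called on sequence ${i + 1}: ${reason}`, 'InvalidStateError');

    if (op.name === 'trim') {
      if (op.startTime < 0) return fail('The start time is less than 0.');
      if (op.endTime > duration) return fail('The end time is greater than the duration of the MediaBlob.');
      if (op.startTime > op.endTime) return fail('The start time is greater than the end time.');
      duration = op.endTime - op.startTime;
    } else if (op.name === 'split') {
      if (op.time < 0) return fail('The time provided is less than 0.');
      if (op.time > duration) return fail('The time provided is greater than the duration of the MediaBlob.');
      if (i !== operations.length - 1) return fail('split must be the last operation before finalize().');
    } else if (op.name === 'concat') {
      for (const clip of op.clips) {
        if (clip.type !== source.type) return fail('The MIME types of the MediaBlobs do not match.');
        duration += clip.duration;
      }
    }
  }
  return null; // no errors: the batch may be executed
}

// Mirrors the example above, assuming a 10 second source.
const checkError = checkOperations({ type: 'video/webm', duration: 10000 }, [
  { name: 'trim', startTime: 0, endTime: 5000 },
  { name: 'split', time: 7000 },
]);
console.log(checkError.message);
// split called on sequence 2: The time provided is greater than the duration of the MediaBlob.
```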
The finalize method can take a DOMString of the MIME type the author would like returned from the method. To determine whether the MIME type is supported, do the following:
- Determine the mime type of the blob by using MIME sniffing
- If the mime type is not a valid mime type
- OR the mime type contains a media type or media subtype that the User Agent cannot render:
- return a "NotSupportedError" DOMException
- else
- return true
mimeType specifies the media type and container format for the recording via a type/subtype combination, with the codecs and/or profiles parameters [RFC6381] specified where ambiguity might arise. Individual codecs might have further optional specific parameters.
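To make the shape of such a string concrete, the sketch below splits a value like 'video/mp4; codecs=h264,aac' into its type, subtype and codec list. `parseMediaType` is a hypothetical helper and a deliberately simplified parser (it does not handle a ';' inside a quoted parameter value, for example). It only checks the type/subtype structure; whether the User Agent can actually render the result is a separate question.

```javascript
// Hypothetical helper: returns { type, subtype, codecs } or null when the
// string does not contain a type/subtype pair.
function parseMediaType(mimeType) {
  const [essence, ...params] = mimeType
    .split(';')
    .map((part) => part.trim())
    .filter(Boolean); // drops empty pieces, e.g. after a trailing ';'

  const [type, subtype] = (essence || '').toLowerCase().split('/');
  if (!type || !subtype) return null;

  let codecs = [];
  for (const param of params) {
    const eq = param.indexOf('=');
    if (eq === -1) continue; // ignore malformed parameters
    const key = param.slice(0, eq).trim().toLowerCase();
    const value = param.slice(eq + 1).trim().replace(/^"|"$/g, '');
    if (key === 'codecs') {
      codecs = value.split(',').map((codec) => codec.trim()).filter(Boolean);
    }
  }
  return { type, subtype, codecs };
}

const parsedType = parseMediaType('video/mp4; codecs=h264,aac;');
console.log(parsedType.subtype, parsedType.codecs.length); // mp4 2
```

In a browser, existing methods such as `MediaRecorder.isTypeSupported()` or `HTMLMediaElement.canPlayType()` could be consulted for the support question, although the proposal does not say which mechanism a User Agent would use.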