Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CON-537 "poor-mans UUID" Ensure each track file gets a unique CID to avoid re-upload collisions #4631

Merged
merged 4 commits into from
Jan 19, 2023
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
80 changes: 57 additions & 23 deletions creator-node/src/ffmpeg.ts
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ import ffmpegPath from 'ffmpeg-static'
// The typing for the ffmpeg-static model is incorrect
// so this line is here to fix that
const ffmpeg = (ffmpegPath as unknown as { path: string }).path
const uuid = require('uuid/v4')

/**
* Segments file into equal size chunks without re-encoding.
Expand Down Expand Up @@ -98,47 +99,74 @@ export function segmentFile(
})
}

/**
* Call `ffmpeg -i <filepath>` to get audio file information with ffmpeg
* NOTE - ffmpeg requires an output file always, but for our purposes we don't need one
* For now function always resolves using stderr, which is where the current command pipes output
* This function can be made more robust by either adding an output file to the ffmpeg command, or using ffprobe-static instead
*/
export async function getFileInformation(filePath: string) {
Copy link
Contributor Author

@SidSethi SidSethi Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only used in tests - was gonna put in test file but wasn't working for some reason so just left here

return new Promise((resolve) => {
const proc = spawn(ffmpeg, ['-i', filePath])

// capture output
let stderr = ''
proc.stderr.on('data', (data) => (stderr += data.toString()))

proc.on('close', () => {
resolve(stderr)
})
})
}

/**
* Transcode file into 320kbps mp3 and store in same directory.
* @date 01-27-2022
* @param {Object} params
* @param {string} params.fileDir the directory of the uploaded track artifact
* @param {string} params.fileName the uploaded track artifact filename
* @param {LogContext} params.logContext the log context used to instantiate a logger
* @returns {Promise<string>} the path to the transcode
* @param fileDir the directory of the uploaded track artifact
* @param fileName the uploaded track artifact filename
* @param logContext the log context used to instantiate a logger
* @returns the path to the newly created transcoded file
*/
export async function transcodeFileTo320(
fileDir: string,
fileName: string,
{ logContext }: { logContext: LogContext }
) {
): Promise<string> {
const logger = genericLogger.child(logContext)

const sourcePath = path.resolve(fileDir, fileName)
const targetPath = path.resolve(fileDir, fileName.split('.')[0] + '-dl.mp3')
logger.info(`Transcoding file ${sourcePath}...`)
const destinationPath = path.resolve(
fileDir,
fileName.split('.')[0] + '-dl.mp3'
)
logger.info(
`Transcoding file at ${sourcePath} and saving to ${destinationPath}...`
)

// Exit if dl-copy file already exists at target path.
if (await fs.pathExists(targetPath)) {
logger.info(`Downloadable copy already exists at ${targetPath}.`)
return targetPath
// Exit if dl-copy file already exists at target path
if (await fs.pathExists(destinationPath)) {
logger.info(`Downloadable copy already exists at ${destinationPath}.`)
return destinationPath
}

return new Promise((resolve, reject) => {
// https://ffmpeg.org/ffmpeg-formats.html#hls-2
const args = [
'-i',
sourcePath,
'-metadata',
`fileName="${fileName}"`,
'-metadata',
`uuid="${uuid()}"`,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these 4 lines are the only SUBSTANTIVE changes in this PR

'-ar',
'48000', // TODO - move to configs
'-b:a',
'320k',
// "-vn" flag required to allow track uploading with album art
// https://stackoverflow.com/questions/20193065/how-to-remove-id3-audio-tag-image-or-metadata-from-mp3-with-ffmpeg
'-vn', // skip inclusion of video, process only the audio file without "video"
targetPath
destinationPath
]
logger.debug(`Spawning: ffmpeg ${args}`)
const proc = spawn(ffmpeg, args)

// capture output
Expand All @@ -150,18 +178,24 @@ export async function transcodeFileTo320(
proc.on('close', (code) => {
async function asyncFn() {
if (code === 0) {
if (await fs.pathExists(targetPath)) {
logger.info(`Transcoded file ${targetPath}`)
resolve(targetPath)
if (await fs.pathExists(destinationPath)) {
logger.info(`Transcoded file ${destinationPath}`)
resolve(destinationPath)
} else {
logger.error('Error when processing file with ffmpeg')
logger.error('Command stdout:', stdout, '\nCommand stderr:', stderr)
reject(new Error('FFMPEG Error'))
const logMsg =
'Error when processing file with ffmpeg' +
`\nCommand stdout: ${stdout}` +
`\nCommand stderr: ${stderr}`
logger.error(logMsg)
reject(new Error(`FFMPEG Error: ${logMsg}`))
}
} else {
logger.error('Error when processing file with ffmpeg')
logger.error('Command stdout:', stdout, '\nCommand stderr:', stderr)
reject(new Error('FFMPEG Error'))
const logMsg =
'Error when processing file with ffmpeg' +
`\nCommand stdout: ${stdout}` +
`\nCommand stderr: ${stderr}`
logger.error(logMsg)
reject(new Error(`FFMPEG Error: ${logMsg}`))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and above - only changes are to return error msg out of promise in addition to just logging it

}
}
asyncFn()
Expand Down
111 changes: 106 additions & 5 deletions creator-node/test/ffmpeg.test.js
Original file line number Diff line number Diff line change
@@ -1,13 +1,17 @@
const { segmentFile } = require('../src/ffmpeg')

const path = require('path')
const fs = require('fs-extra')
const assert = require('assert')
const { libs } = require('@audius/sdk')
const Utils = libs.Utils

const { segmentFile, transcodeFileTo320, getFileInformation } = require('../src/ffmpeg')

describe('test segmentFile()', () => {
// Create the segments directory to store segments in.
// Middleware would normally handle this, however, in this test
// context, segmentFile() is unit tested directly without the middleware.
/**
* Create the segments directory to store segments in.
* Middleware would normally handle this, however, in this test
* context, segmentFile() is unit tested directly without the middleware.
*/
before(async () => {
const segmentsDirPath = path.join(__dirname, 'segments')
if (!(await fs.pathExists(segmentsDirPath))) {
Expand Down Expand Up @@ -89,6 +93,103 @@ describe('test segmentFile()', () => {

// check that all the expected segments were found
assert.deepStrictEqual(allSegmentsSet.size, 0)
})
})

function computeTranscodeDestinationPath (fileDir, fileName) {
return path.resolve(fileDir, fileName.split('.')[0] + '-dl.mp3')
}

/**
* Ignores milliseconds bc of inaccuracy in comparison due to how we transcode
* e.g. Input file info shows:
* [mp3 @ 0x72367c0] Estimating duration from bitrate, this may be inaccurate
* Duration: 00:03:07.44, start: 0.000000, bitrate: 320 kb/s
* and output file shows:
* Duration: 00:03:07.46, start: 0.023021, bitrate: 320 kb/s
* Note these durations are the same after accounting for the start offset (not sure why thats there)
*/
async function getAudioFileInformation (filePath) {
const info = await getFileInformation(filePath)
if (!info) throw new Error('Failed to get file information')

let duration = /Duration: (\d{2}:\d{2}:\d{2}\.\d{2})/g.exec(info.toString())
if (!duration) throw new Error('Failed to find file duration')
duration = duration[1].split('.')[0]

let metadata = {}
// Extract the metadata properties using a regular expression
const properties = /^\s{4}(\w+)\s+:(.+)/gm
let match
while ((match = properties.exec(info.toString())) !== null) {
metadata[match[1]] = match[2].trim()
}

return {
info,
duration,
metadata
}
}

describe('test transcodeFileTo320()', () => {
const fileDir = __dirname
const fileName = 'testTrack.mp3'
const inputFilePath = path.resolve(fileDir, fileName)

// Ensure no file exists at destinationPath
beforeEach(async () => {
const destinationPath = computeTranscodeDestinationPath(fileDir, fileName)
await fs.remove(destinationPath)
})

// Ensure no file exists at destinationPath
after(async () => {
const destinationPath = computeTranscodeDestinationPath(fileDir, fileName)
await fs.remove(destinationPath)
})

it('Happy path - Transcode file to 320kbps, ensure duration matches input file, and metadata properties correctly defined', async () => {
const { duration: inputFileDuration } = await getAudioFileInformation(inputFilePath)
const transcodeFilePath = await transcodeFileTo320(fileDir, fileName, {})
const { duration: outputFileDuration } = await getAudioFileInformation(transcodeFilePath)
assert.strictEqual(inputFileDuration, outputFileDuration)
})

it('Happy path - Ensure works without overrideIfExists param set', async () => {
const { duration: inputFileDuration } = await getAudioFileInformation(inputFilePath)
const transcodeFilePath = await transcodeFileTo320(fileDir, fileName, {})
const { duration: outputFileDuration } = await getAudioFileInformation(transcodeFilePath)

assert.strictEqual(inputFileDuration, outputFileDuration)
})

it('Confirm same file transcoded with different metadata has different CID but same duration & fileSize', async () => {
const filePath1 = await transcodeFileTo320(fileDir, fileName, {})
const { duration: duration1, metadata: metadata1 } = await getAudioFileInformation(filePath1)
const { size: size1 } = await fs.stat(filePath1)
const cid1 = await Utils.fileHasher.generateNonImageCid(filePath1)

// Remove file before re-transcoding
await fs.remove(filePath1)

const filePath2 = await transcodeFileTo320(fileDir, fileName, {})
const { duration: duration2, metadata: metadata2 } = await getAudioFileInformation(filePath2)
const { size: size2 } = await fs.stat(filePath2)
const cid2 = await Utils.fileHasher.generateNonImageCid(filePath2)

assert.strictEqual(filePath1, filePath2)

// Assert duration and size are same
assert.strictEqual(duration1, duration2)
assert.strictEqual(size1, size2)

// Assert metadata values for fileName and encoder are same, but uuid is different
assert.strictEqual(metadata1.fileName, metadata2.fileName)
assert.strictEqual(metadata1.encoder, metadata2.encoder)
assert.notStrictEqual(metadata1.uuid, metadata2.uuid)

// Assert CIDs are different
assert.notStrictEqual(cid1, cid2)
})
})