Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CON-537 "poor-mans UUID" Ensure each track file gets a unique CID to avoid re-upload collisions #4631

Merged
merged 4 commits into from
Jan 19, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion creator-node/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,6 @@
"@opentelemetry/semantic-conventions": "^1.5.0",
"@opentelemetry/tracing": "^0.24.0",
"@solana/web3.js": "1.31.0",
"JSONStream": "^1.3.5",
"async-retry": "^1.3.3",
"axios": "^0.19.2",
"base64-url": "^2.3.3",
Expand Down Expand Up @@ -76,6 +75,7 @@
"hashids": "2.2.10",
"ioredis": "5.2.4",
"jimp": "^0.6.1",
"JSONStream": "^1.3.5",
"lodash": "4.17.21",
"multer": "^1.4.0",
"pg": "^8.0.3",
Expand Down
80 changes: 57 additions & 23 deletions creator-node/src/ffmpeg.ts
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ import ffmpegPath from 'ffmpeg-static'
// The typing for the ffmpeg-static model is incorrect
// so this line is here to fix that
const ffmpeg = (ffmpegPath as unknown as { path: string }).path
const uuid = require('uuid/v4')

/**
* Segments file into equal size chunks without re-encoding.
Expand Down Expand Up @@ -98,47 +99,74 @@ export function segmentFile(
})
}

/**
* Call `ffmpeg -i <filepath>` to get audio file information with ffmpeg
* NOTE - ffmpeg requires an output file always, but for our purposes we don't need one
* For now function always resolves using stderr, which is where the current command pipes output
* This function can be made more robust by either adding an output file to the ffmpeg command, or using ffprobe-static instead
*/
export async function getFileInformation(filePath: string) {
Copy link
Contributor Author

@SidSethi SidSethi Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only used in tests - was gonna put in test file but wasn't working for some reason so just left here

return new Promise((resolve) => {
const proc = spawn(ffmpeg, ['-i', filePath])

// capture output
let stderr = ''
proc.stderr.on('data', (data) => (stderr += data.toString()))

proc.on('close', () => {
resolve(stderr)
})
})
}

/**
* Transcode file into 320kbps mp3 and store in same directory.
* @date 01-27-2022
* @param {Object} params
* @param {string} params.fileDir the directory of the uploaded track artifact
* @param {string} params.fileName the uploaded track artifact filename
* @param {LogContext} params.logContext the log context used to instantiate a logger
* @returns {Promise<string>} the path to the transcode
* @param fileDir the directory of the uploaded track artifact
* @param fileName the uploaded track artifact filename
* @param logContext the log context used to instantiate a logger
* @returns the path to the newly created transcoded file
*/
export async function transcodeFileTo320(
fileDir: string,
fileName: string,
{ logContext }: { logContext: LogContext }
) {
): Promise<string> {
const logger = genericLogger.child(logContext)

const sourcePath = path.resolve(fileDir, fileName)
const targetPath = path.resolve(fileDir, fileName.split('.')[0] + '-dl.mp3')
logger.info(`Transcoding file ${sourcePath}...`)
const destinationPath = path.resolve(
fileDir,
fileName.split('.')[0] + '-dl.mp3'
)
logger.info(
`Transcoding file at ${sourcePath} and saving to ${destinationPath}...`
)

// Exit if dl-copy file already exists at target path.
if (await fs.pathExists(targetPath)) {
logger.info(`Downloadable copy already exists at ${targetPath}.`)
return targetPath
// Exit if dl-copy file already exists at target path
if (await fs.pathExists(destinationPath)) {
logger.info(`Downloadable copy already exists at ${destinationPath}.`)
return destinationPath
}

return new Promise((resolve, reject) => {
// https://ffmpeg.org/ffmpeg-formats.html#hls-2
const args = [
'-i',
sourcePath,
'-metadata',
`fileName="${fileName}"`,
'-metadata',
`uuid="${uuid()}"`,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these 4 lines are the only SUBSTANTIVE changes in this PR

'-ar',
'48000', // TODO - move to configs
'-b:a',
'320k',
// "-vn" flag required to allow track uploading with album art
// https://stackoverflow.com/questions/20193065/how-to-remove-id3-audio-tag-image-or-metadata-from-mp3-with-ffmpeg
'-vn', // skip inclusion of video, process only the audio file without "video"
targetPath
destinationPath
]
logger.debug(`Spawning: ffmpeg ${args}`)
const proc = spawn(ffmpeg, args)

// capture output
Expand All @@ -150,18 +178,24 @@ export async function transcodeFileTo320(
proc.on('close', (code) => {
async function asyncFn() {
if (code === 0) {
if (await fs.pathExists(targetPath)) {
logger.info(`Transcoded file ${targetPath}`)
resolve(targetPath)
if (await fs.pathExists(destinationPath)) {
logger.info(`Transcoded file ${destinationPath}`)
resolve(destinationPath)
} else {
logger.error('Error when processing file with ffmpeg')
logger.error('Command stdout:', stdout, '\nCommand stderr:', stderr)
reject(new Error('FFMPEG Error'))
const logMsg =
'Error when processing file with ffmpeg' +
`\nCommand stdout: ${stdout}` +
`\nCommand stderr: ${stderr}`
logger.error(logMsg)
reject(new Error(`FFMPEG Error: ${logMsg}`))
}
} else {
logger.error('Error when processing file with ffmpeg')
logger.error('Command stdout:', stdout, '\nCommand stderr:', stderr)
reject(new Error('FFMPEG Error'))
const logMsg =
'Error when processing file with ffmpeg' +
`\nCommand stdout: ${stdout}` +
`\nCommand stderr: ${stderr}`
logger.error(logMsg)
reject(new Error(`FFMPEG Error: ${logMsg}`))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and above - only changes are to return error msg out of promise in addition to just logging it

}
}
asyncFn()
Expand Down
77 changes: 72 additions & 5 deletions creator-node/test/ffmpeg.test.js
Original file line number Diff line number Diff line change
@@ -1,13 +1,19 @@
const { segmentFile } = require('../src/ffmpeg')

const path = require('path')
const fs = require('fs-extra')
const assert = require('assert')
const { libs } = require('@audius/sdk')
const Utils = libs.Utils

const { computeTranscodeDestinationPath, getAudioFileInformation } = require('./lib/helpers')

const { segmentFile, transcodeFileTo320 } = require('../src/ffmpeg')

describe('test segmentFile()', () => {
// Create the segments directory to store segments in.
// Middleware would normally handle this, however, in this test
// context, segmentFile() is unit tested directly without the middleware.
/**
* Create the segments directory to store segments in.
* Middleware would normally handle this, however, in this test
* context, segmentFile() is unit tested directly without the middleware.
*/
before(async () => {
const segmentsDirPath = path.join(__dirname, 'segments')
if (!(await fs.pathExists(segmentsDirPath))) {
Expand Down Expand Up @@ -89,6 +95,67 @@ describe('test segmentFile()', () => {

// check that all the expected segments were found
assert.deepStrictEqual(allSegmentsSet.size, 0)
})
})

describe('test transcodeFileTo320()', () => {
const fileDir = __dirname
const fileName = 'testTrack.mp3'
const inputFilePath = path.resolve(fileDir, fileName)

// Ensure no file exists at destinationPath
beforeEach(async () => {
const destinationPath = computeTranscodeDestinationPath(fileDir, fileName)
await fs.remove(destinationPath)
})

// Ensure no file exists at destinationPath
after(async () => {
const destinationPath = computeTranscodeDestinationPath(fileDir, fileName)
await fs.remove(destinationPath)
})

it('Happy path - Transcode file to 320kbps, ensure duration matches input file, and metadata properties correctly defined', async () => {
const { duration: inputFileDuration } = await getAudioFileInformation(inputFilePath)
const transcodeFilePath = await transcodeFileTo320(fileDir, fileName, {})
const { duration: outputFileDuration } = await getAudioFileInformation(transcodeFilePath)
assert.strictEqual(inputFileDuration, outputFileDuration)
})

it('Happy path - Ensure works without overrideIfExists param set', async () => {
const { duration: inputFileDuration } = await getAudioFileInformation(inputFilePath)
const transcodeFilePath = await transcodeFileTo320(fileDir, fileName, {})
const { duration: outputFileDuration } = await getAudioFileInformation(transcodeFilePath)

assert.strictEqual(inputFileDuration, outputFileDuration)
})

it('Confirm same file transcoded with different metadata has different CID but same duration & fileSize', async () => {
const filePath1 = await transcodeFileTo320(fileDir, fileName, {})
const { duration: duration1, metadata: metadata1 } = await getAudioFileInformation(filePath1)
const { size: size1 } = await fs.stat(filePath1)
const cid1 = await Utils.fileHasher.generateNonImageCid(filePath1)

// Remove file before re-transcoding
await fs.remove(filePath1)

const filePath2 = await transcodeFileTo320(fileDir, fileName, {})
const { duration: duration2, metadata: metadata2 } = await getAudioFileInformation(filePath2)
const { size: size2 } = await fs.stat(filePath2)
const cid2 = await Utils.fileHasher.generateNonImageCid(filePath2)

assert.strictEqual(filePath1, filePath2)

// Assert duration and size are same
assert.strictEqual(duration1, duration2)
assert.strictEqual(size1, size2)

// Assert metadata values for fileName and encoder are same, but uuid is different
assert.strictEqual(metadata1.fileName, metadata2.fileName)
assert.strictEqual(metadata1.encoder, metadata2.encoder)
assert.notStrictEqual(metadata1.uuid, metadata2.uuid)

// Assert CIDs are different
assert.notStrictEqual(cid1, cid2)
})
})
41 changes: 40 additions & 1 deletion creator-node/test/lib/helpers.js
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ const uuid = require('uuid/v4')
const path = require('path')
const crypto = require('crypto')

const { getFileInformation } = require('../../src/ffmpeg')
const DiskManager = require('../../src/diskManager')

const {
Expand Down Expand Up @@ -54,8 +55,46 @@ const computeFilesHash = function (multihashes) {
return filesHash
}

const computeTranscodeDestinationPath = function (fileDir, fileName) {
return path.resolve(fileDir, fileName.split('.')[0] + '-dl.mp3')
}

/**
* Ignores milliseconds bc of inaccuracy in comparison due to how we transcode
* e.g. Input file info shows:
* [mp3 @ 0x72367c0] Estimating duration from bitrate, this may be inaccurate
* Duration: 00:03:07.44, start: 0.000000, bitrate: 320 kb/s
* and output file shows:
* Duration: 00:03:07.46, start: 0.023021, bitrate: 320 kb/s
* Note these durations are the same after accounting for the start offset (not sure why thats there)
*/
const getAudioFileInformation = async function (filePath) {
const info = await getFileInformation(filePath)
if (!info) throw new Error('Failed to get file information')

let duration = /Duration: (\d{2}:\d{2}:\d{2}\.\d{2})/g.exec(info.toString())
if (!duration) throw new Error('Failed to find file duration')
duration = duration[1].split('.')[0]

let metadata = {}
// Extract the metadata properties using a regular expression
const properties = /^\s{4}(\w+)\s+:(.+)/gm
let match
while ((match = properties.exec(info.toString())) !== null) {
metadata[match[1]] = match[2].trim()
}

return {
info,
duration,
metadata
}
}

module.exports = {
uploadTrack,
saveFileToStorage,
computeFilesHash
computeFilesHash,
computeTranscodeDestinationPath,
getAudioFileInformation
}
18 changes: 6 additions & 12 deletions creator-node/test/pollingTracks.test.js
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ const {
} = require('./lib/dataSeeds')
const { getLibsMock } = require('./lib/libsMock')
const { sortKeys } = require('../src/apiSigning')
const { saveFileToStorage, computeFilesHash } = require('./lib/helpers')
const { saveFileToStorage, computeFilesHash, computeTranscodeDestinationPath, getAudioFileInformation } = require('./lib/helpers')
const { computeFilePathAndEnsureItExists } = require('../src/utils')

const testAudioFilePath = path.resolve(__dirname, 'testTrack.mp3')
Expand Down Expand Up @@ -990,18 +990,12 @@ describe('test Polling Tracks with real files', function () {

const { track_segments: trackSegments, transcodedTrackCID } = resp

// check that the generated transcoded track is the same as the transcoded track in /tests
const transcodedTrackAssetPath = path.join(
__dirname,
'testTranscoded320Track.mp3'
)
const transcodedTrackAssetBuf = await fs.readFile(transcodedTrackAssetPath)
// check that the generated transcoded track has same duration as the transcoded track in /tests
const transcodedTrackAssetPath = path.join(__dirname, 'testTranscoded320Track.mp3')
const transcodedTrackPath = await computeFilePathAndEnsureItExists(transcodedTrackCID)
const transcodedTrackTestBuf = await fs.readFile(transcodedTrackPath)
assert.deepStrictEqual(
transcodedTrackAssetBuf.compare(transcodedTrackTestBuf),
0
)
const { duration: inputDuration } = await getAudioFileInformation(transcodedTrackAssetPath)
const { duration: outputDuration } = await getAudioFileInformation(transcodedTrackPath)
assert.strictEqual(inputDuration, outputDuration)

// Ensure 32 segments are returned, each segment has a corresponding file on disk,
// and each segment disk file is exactly as expected
Expand Down