qualification and profiling tool support rolled and compressed event logs for CSPs and Apache Spark #2732

Merged
merged 12 commits into from
Jun 22, 2021

Conversation

tgravescs
Collaborator

The main part of this PR is to support compressed and rolled event logs from the various CSPs and Apache Spark. It also includes some cleanup: consolidating duplicate code, moving the event log path parsing code into a common class, compressing some of the test files, and a few style changes.

Databricks event logs differ from Apache Spark's, so special handling was added for them.

fixes #2690

@tgravescs tgravescs added the feature request New feature or request label Jun 17, 2021
@tgravescs tgravescs added this to the June 7 - June 18 milestone Jun 17, 2021
@tgravescs tgravescs self-assigned this Jun 17, 2021
@tgravescs
Collaborator Author

build

val EVENT_LOG_FILE_NAME_PREFIX = "events_"

def isEventLogDir(status: FileStatus): Boolean = {
status.isDirectory && status.getPath.getName.startsWith(EVENT_LOG_DIR_NAME_PREFIX)

nit: would be slightly less duplication to delegate the name check to isEventLogDir(path: String)
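
A minimal sketch of the delegation suggested above, not the code from this PR. `FileEntry` is a hypothetical stand-in for Hadoop's `FileStatus` so the snippet runs without Hadoop on the classpath, and the prefix value is assumed to match the one Apache Spark uses for rolled event log directories:

```scala
object EventLogDirCheck {
  // Assumed value: the directory prefix Apache Spark uses for rolled event logs.
  val EVENT_LOG_DIR_NAME_PREFIX = "eventlog_v2_"

  // Hypothetical stand-in for org.apache.hadoop.fs.FileStatus.
  final case class FileEntry(name: String, isDirectory: Boolean)

  // The only place that knows the naming rule.
  def isEventLogDir(name: String): Boolean =
    name.startsWith(EVENT_LOG_DIR_NAME_PREFIX)

  // Adds the directory check and delegates the name check to the String overload.
  def isEventLogDir(status: FileEntry): Boolean =
    status.isDirectory && isEventLogDir(status.name)
}
```

With this shape, a change to the prefix rule only has to be made in the `String` overload.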

@tgravescs
Collaborator Author

build

@tgravescs
Collaborator Author

Investigating test failures. It seems that in this environment something is being deleted while it's still open; locally there are no issues and everything is cleaned up.

@tgravescs
Collaborator Author

build

@tgravescs
Collaborator Author

@gerashegalov @nartal1

@gerashegalov gerashegalov left a comment

LGTM, some nits

The tool does not support nested directories. Event log files or event log directories should be
at the top level when specifying a directory.

Note: Spark event logs can be downloaded from Spark UI using a "Download" button on the right side,

// assume this is the current log and we want that one to be read last
LocalDateTime.now()
} else {
val date = fileParts(0).split("-")

nit: usually prefer pattern matching to indexed access to the tune of

val Array(_, yearStr, monthStr, dayStr,  _*) = Array("something", "2021", "06", "22", "something", "else")
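
Applied to the date parsing, that could look roughly like the sketch below. It is an illustration only: `RolledLogDate.parse` is a hypothetical helper, and the file name layout (`eventlog-2021-06-14--20-00.gz`) is an assumption about how the rolled logs are named. Returning an `Option` leaves the "treat it as the current log" fallback to the caller.

```scala
import java.time.LocalDateTime
import scala.util.Try

object RolledLogDate {
  // Hypothetical helper. Assumes names shaped like "eventlog-2021-06-14--20-00.gz";
  // returns None for anything else (for example the still-open "eventlog" file).
  def parse(name: String): Option[LocalDateTime] =
    // Drop the extension, then split the date half from the time half.
    name.takeWhile(_ != '.').split("--") match {
      case Array(datePart, timePart) =>
        (datePart.split("-"), timePart.split("-")) match {
          case (Array(_, year, month, day), Array(hour, minute, _*)) =>
            // Try guards against non-numeric or out-of-range fields.
            Try(LocalDateTime.of(
              year.toInt, month.toInt, day.toInt, hour.toInt, minute.toInt)).toOption
          case _ => None
        }
      case _ => None
    }
}
```

A caller that wants the current log read last can then write `RolledLogDate.parse(name).getOrElse(LocalDateTime.now())`.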

def openEventLogInternal(log: Path, fs: FileSystem): InputStream = {
EventLogFileWriter.codecName(log) match {
case c if (c.isDefined && c.get.equals("gz")) =>
val in = new BufferedInputStream(fs.open(log))

BufferedInputStream is not needed for GZIPInputStream: in = fs.open(log)
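
A rough sketch of that shape, using plain `java.io` streams in place of the Hadoop `FileSystem` so it runs standalone. `GzOpen` and its methods are hypothetical names, and falling back to the raw stream for every non-gz codec is a simplification for the example, not what the tool does:

```scala
import java.io.{ByteArrayOutputStream, InputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object GzOpen {
  // Matching on Some("gz") replaces the isDefined/get guard, and the raw
  // stream is handed straight to GZIPInputStream without extra buffering.
  def open(codec: Option[String], raw: InputStream): InputStream = codec match {
    case Some("gz") => new GZIPInputStream(raw)
    case _ => raw // simplification: other codecs are not handled in this sketch
  }

  // Helper for trying it out: gzip a string into a byte array.
  def gzip(text: String): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(bytes)
    try out.write(text.getBytes("UTF-8")) finally out.close()
    bytes.toByteArray
  }
}
```

For example, `GzOpen.open(Some("gz"), new java.io.ByteArrayInputStream(GzOpen.gzip("hello")))` yields a stream that reads back `hello`.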


// at this point all paths should be valid event logs or event log dirs
val fs = eventlog.getFileSystem(new Configuration())

Prefer wiring the Hadoop conf through from SparkContext over creating brand-new instances.

@tgravescs
Collaborator Author

Thanks @gerashegalov, I'll incorporate the nits in my next PR.

@tgravescs tgravescs merged commit 09e8390 into NVIDIA:branch-21.08 Jun 22, 2021
@tgravescs tgravescs deleted the compressedRolledNew branch June 22, 2021 17:43
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Profiling tool doesn't properly read rolled log files
3 participants