This repository has been archived by the owner on Apr 1, 2022. It is now read-only.

Add support for gzip-compressed RPM files #154

Merged
merged 4 commits into from
Nov 4, 2020

Conversation

cnr
Contributor

@cnr cnr commented Nov 2, 2020

codec-rpm's payloadContentsC function only supports lzma-compressed RPM files. This adds support for gzip-compressed RPM files as well.
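
For readers unfamiliar with streaming decompression in conduit, here is a minimal sketch of choosing a decompressor by name. It is not the code from this PR: decompressorNamed and UnsupportedCompressor are hypothetical names invented for illustration, and only the gzip case is handled, using ungzip from conduit-extra's Data.Conduit.Zlib. How the PR's decompressorFor actually decides which format an RPM uses is not shown here.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Control.Exception (Exception)
import Control.Monad.Catch (MonadThrow, throwM)
import Control.Monad.Primitive (PrimMonad)
import Data.ByteString (ByteString)
import Data.Conduit.Zlib (gzip, ungzip)

-- Hypothetical error type for compressors this sketch does not handle.
newtype UnsupportedCompressor = UnsupportedCompressor String
  deriving (Show)

instance Exception UnsupportedCompressor

-- Hypothetical helper: pick a streaming decompressor from a compressor name.
-- Only "gzip" is supported in this sketch; anything else throws.
decompressorNamed
  :: (PrimMonad m, MonadThrow m)
  => String
  -> ConduitT ByteString ByteString m ()
decompressorNamed "gzip" = ungzip
decompressorNamed other  = throwM (UnsupportedCompressor other)

main :: IO ()
main = do
  -- Compress a small payload, then decompress it again through the helper.
  roundTripped <- runConduit $
    yield "hello rpm payload" .| gzip .| decompressorNamed "gzip" .| foldC
  print (roundTripped :: ByteString)
```

Because the helper returns an ordinary conduit, it can sit between a payload source and a CPIO reader without buffering the whole archive in memory.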


-- | Extract RPM entries to a directory
extractEntries :: (PrimMonad m, MonadThrow m, MonadIO m) => Path Abs Dir -> RPMTypes.RPM -> ConduitT i o m ()
extractEntries dir rpm =
  yield rpm
    .| RPM.payloadC
    .| decompressorFor rpm
    .| CPIO.readCPIO
    .| filterC (not . CPIO.isEntryDirectory)
    .| sinkDir dir
Contributor Author

@cnr cnr Nov 2, 2020

We're using conduit here, which allows us to do streaming decompression on the RPM entries. Some notes about reading conduit code:

Whenever you see a type like ConduitT i o m a, the type variables mean:

  • i - the type of elements coming from the upstream
  • o - the type of elements we're passing downstream
  • m - the underlying monad (ConduitT is a monad transformer)
  • a - the arbitrary result type of this computation (like any other monad) -- this is not affected by the elements in the pipe

The way to read .| is to think of it as a Unix pipe: each function between the pipes transforms the elements moving downstream.
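
A small sketch of the notes above, assuming only the conduit package's Conduit module. The names labelEvens and countElems are made up for illustration and do not appear in this PR.

```haskell
import Conduit

-- i = Int, o = String, a = (): consumes Ints from upstream and
-- passes Strings downstream, returning nothing of interest.
labelEvens :: Monad m => ConduitT Int String m ()
labelEvens = filterC even .| mapC (\n -> "even: " ++ show n)

-- a = Int: the result type is independent of the element types i and o.
countElems :: Monad m => ConduitT i o m Int
countElems = lengthC

main :: IO ()
main = do
  -- Read left to right, like `seq 1 6 | only-evens | label`.
  let labels = runConduitPure (yieldMany [1 .. 6 :: Int] .| labelEvens .| sinkList)
  mapM_ putStrLn labels
  -- Prints 3: the number of elements that flowed through, not the elements.
  print (runConduitPure (yieldMany "abc" .| countElems))
```

Note that labelEvens is itself built with .|, so a pipeline stage can be a smaller pipeline.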


Also, ignore the vomit of constraints on extractEntries; the libraries we're using can't settle on a single approach. For our purposes here, PrimMonad is effectively the same thing as saying IO.

Member

@zlav zlav left a comment

This looks good, I don't have any large concerns, just one small question. Tests would be good to add in this repo or the integration tests repo as well.

sendIO . runResourceT . runExceptT . runConduit $
readRPMEntries rpmFile .| filterC (not . CPIO.isEntryDirectory) .| sinkDir
sendIO . runResourceT . runExceptT . runConduit $ do
sourceFileBS (toFilePath rpmFile) .| RPM.parseRPMC .| awaitForever (extractEntries dir)
Member

Is this function in any danger of hanging with the use of awaitForever? I have seen us come across unsupported file formats when doing RPM extraction in the RPM fetcher that caused extraction to hang. I'm not aware of a better option here, but is this a valid concern? If so, it may be an acceptable failure mode for now.

Contributor Author

awaitForever is a bit of a misnomer: it just repeatedly tries to await input from the upstream pipe, passing each element to the provided function (extractEntries dir), until no more input is available.

await in conduit doesn't actually make a thread wait or anything; everything in a conduit happens in the same thread. await is just a way to pass control to the upstream until it can give you back an element, at which point control flow moves back to you.
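
To make the control flow concrete, here is a small sketch using only the conduit package. noisySource and noisySink are hypothetical names for illustration. Running it should show the upstream and downstream messages interleaved one element at a time, followed by the final message once the upstream has nothing left to give.

```haskell
import Conduit

-- Upstream: logs each element just before yielding it.
noisySource :: ConduitT i Int IO ()
noisySource = mapM_ step [1, 2, 3]
  where
    step n = do
      liftIO (putStrLn ("upstream yields " ++ show n))
      yield n

-- Downstream: awaitForever runs the handler once per element and
-- returns as soon as the upstream is exhausted.
noisySink :: ConduitT Int o IO ()
noisySink = do
  awaitForever (\n -> liftIO (putStrLn ("downstream got " ++ show n)))
  liftIO (putStrLn "upstream exhausted; awaitForever returned")

main :: IO ()
main = runConduit (noisySource .| noisySink)
```

A hang would therefore have to come from a stage that never finishes or never yields (for example, a parser stuck on malformed input), not from awaitForever itself.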

@cnr cnr merged commit 106b4fd into master Nov 4, 2020
@cnr cnr deleted the gzip-rpm branch November 4, 2020 02:42
2 participants