Skip to content

Commit

Permalink
LRS-72 Handle deduplicated attachment multiparts (#81)
Browse files Browse the repository at this point in the history
* LRS-70 sketch out current multipart validation behavior with some tests

* LRS-70 add cljs dispatch for exinfo

* fix requires in lrs-test

* LRS-72 clean up and refactor in prep to decouple sig checking

* LRS-72 don't slurp in cljs

* LRS-72 make an xapi fn for pulling att objects generally with statement + path

* LRS-72 additional testing on current multipart behavior

* LRS-72 validate multiparts, handle dup input and dedupe output

* LRS-72 impl notes for attachment batches

* LRS-72 added to impl note

* LRS-72 extract big blobs from sig test

* LRS-72 better error for missing attachments + test

* LRS-72 refactor reduce-fn for validate-multiparts
  • Loading branch information
milt authored May 12, 2022
1 parent bcc8fea commit 6a61ad9
Show file tree
Hide file tree
Showing 8 changed files with 391 additions and 222 deletions.
14 changes: 14 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,20 @@ To use the dev profile, which contains all dev/repl/test stuff, use the `:dev` a

LRS Applications implementing the `com.yetanalytics.lrs.protocol/-get-statements-async` method return a channel containing results. Since the body is streamed it is not possible to change the status of the request if an error is encountered. Applications can immediately terminate the stream (resulting in a malformed body) by passing `:com.yetanalytics.lrs.protocol/async-error` to the channel. This is preferable to returning a structurally valid response that is missing data. See [this PR](https://github.com/yetanalytics/lrs/pull/78) for more information.

### Statement Attachment Handling & Normalization

#### `PUT/POST /statements`

xAPI clients can send statement data with arbitrary file attachments using the `multipart/mixed` Content-Type [per the spec](https://github.com/adlnet/xAPI-Spec/blob/master/xAPI-Communication.md#requirements-for-attachment-statement-batches). An attachment referenced in an xAPI statement (or substatement) that does not have a `fileUrl` property must be included *at least once* in the request body as a part identified by hash. In addition, multiple attachment objects, either in a single statement or across multiple statements in the same request, can refer to the same attachment, as recommended by the spec.

In practice this means that duplicate attachments *may* be present on the request. `lrs` will normalize/deduplicate attachments before sending them to implementation code such that every attachment in the list has a distinct hash. See [this PR](https://github.com/yetanalytics/lrs/pull/81) for more information.

Note that requests containing attachments *not* referenced in statement data will fail with a 400 status.

#### `GET /statements`

LRS implementations should return attachments in the same normalized form mentioned above, though this is not checked or enforced by `lrs`.

## Bench Testing

Facilities to bench test any LRS with [DATASIM](https://github.com/yetanalytics/datasim) are available. For instance, to bench the in-memory LRS:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -186,8 +186,8 @@
(if-let [statement-data (get-in ctx [:request :json-params])]
(try (condp s/valid? statement-data
::xs/statement
(let [[_ valid-multiparts]
(attachment/validate-statements-multiparts
(let [valid-multiparts
(attachment/validate-multiparts
[statement-data]
multiparts)
attachment-data
Expand All @@ -199,8 +199,8 @@
::xs/statement statement-data
:xapi.statements/attachments attachment-data))
::xs/statements
(let [[_ valid-multiparts]
(attachment/validate-statements-multiparts
(let [valid-multiparts
(attachment/validate-multiparts
statement-data
multiparts)
attachment-data
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -85,17 +85,6 @@
[multiparts]
(mapv save-attachment multiparts))

(defn statements-attachments
"For each statement, get any attachment objects,
and make of them a seq of seqs."
[statements]
(for [{:strs [attachments
object]
:as statement} statements]
(->> (get object "attachments")
(concat attachments)
(cons statement))))

(defn delete-attachments!
"Delete all tempfiles for a sequence of attachments"
[attachments]
Expand Down Expand Up @@ -133,145 +122,116 @@

(defn validate-sig
"Validate a signed statement against a multipart and return if valid."
[statement {:keys [#?(:clj ^InputStream input-stream
:cljs ^String input-stream)] :as multipart}]
(let [^String jws
#?(:clj (with-open [in input-stream]
(slurp in :encoding "UTF-8"))
:cljs input-stream)
{:keys [headers payload]}
(decode-sig jws)
{:keys [alg _typ]}
headers]
(cond
(not (#{"RS256" "RS384" "RS512"}
alg))
(throw (ex-info "JWS Algorithm MUST be RS256, RS384, or RS512"
{:type ::invalid-signature-alg
:jws jws
:statement statement}))
(not (ss/statements-immut-equal?
(dissoc statement "attachments")
payload))
(throw (ex-info "Statement signature does not match statement"
{:type ::invalid-signature-mismatch
:jws jws
:statement statement}))
:else ; If everything's good, return the multipart w/ a new input stream
(assoc multipart
:input-stream
#?(:clj (ByteArrayInputStream. (.getBytes jws "UTF-8"))
:cljs jws)))))

(defn multipart-map
"Given a list of multiparts, make a map of vectors of them by xapi hash"
[multiparts]
(reduce (fn [m {:keys [headers] :as multipart}]
(update m
(get headers "X-Experience-API-Hash")
(fnil conj [])
multipart))
{}
multiparts))
[statement
attachment-object
{:keys [#?(:clj ^InputStream input-stream
:cljs ^String input-stream)] :as multipart}]
(if-not (= (:content-type multipart)
(get attachment-object "contentType")
"application/octet-stream")
(throw
(ex-info
"Statement signature attachment contentType must be application/octet-stream"
{:type ::invalid-signature-attachment-content-type
:attachment-object attachment-object
:attachment-multipart multipart}))
(let [^String jws
#?(:clj (with-open [in input-stream]
(slurp in :encoding "UTF-8"))
:cljs input-stream)
{:keys [headers payload]}
(decode-sig jws)
{:keys [alg _typ]}
headers]
(cond
(not (#{"RS256" "RS384" "RS512"}
alg))
(throw (ex-info "JWS Algorithm MUST be RS256, RS384, or RS512"
{:type ::invalid-signature-alg
:jws jws
:statement statement}))
(not (ss/statements-immut-equal?
;; TODO: make sure this doesn't trigger on substatement paths
(dissoc statement "attachments")
payload))
(throw (ex-info "Statement signature does not match statement"
{:type ::invalid-signature-mismatch
:jws jws
:statement statement}))
:else ; If everything's good, return the multipart w/ a new input stream
(assoc multipart
:input-stream
#?(:clj (ByteArrayInputStream. (.getBytes jws "UTF-8"))
:cljs jws))))))

(defn sig?
"Predicate, returns true if the given attachment object is a signature"
[attachment-object]
(= "http://adlnet.gov/expapi/attachments/signature"
(get attachment-object "usageType")))

(defn make-sha-multipart-pair
"Given the attachment object `att-obj`, return a pair of its SHA and the
multipart, or `nil` if `att-obj` is a fileURL."
[sig multi-parts att-obj]
;; If we match...
(if-let [[sha mps] (find multi-parts (get att-obj "sha2"))]
(let [mp (first mps)]
;; If it's a signature...
(if (sig? att-obj)
(if (= (:content-type mp)
(get att-obj "contentType")
"application/octet-stream")
;; If the ctype is valid, validate the sig
[sha (validate-sig sig mp)]
;; If that ctype was wrong, throw
(throw
(ex-info
"Statement signature attachment contentType must be application/octet-stream"
{:type ::invalid-signature-attachment-content-type
:attachment-object att-obj
:attachment-multipart mp})))
;; If not, return the sha and multipart
[sha mp]))
;; If we don't, this better be a fileURL
(when-not (get att-obj "fileUrl")
(throw (ex-info "Invalid multipart format"
{:type ::invalid-multipart-format})))))

(defn validate-statements-multiparts
"Validate and return statements and their multipart attachments"
[statements multiparts]
(let [;; collect the attachments per statement and reduce over them
{valid-statements :s-acc
valid-multiparts :a-acc
leftover-multiparts :mps}
(reduce
(fn [{:keys [mps] :as m}
[s & att-objs]]
;; match the attachment objects to a multipart
;; or assert that they are a fileURL
(let [;; Reduce over multiparts and find relevant attachments
[next-mps
valid-matched]
(reduce
(fn [[mps' acc :as state] ao]
(if-let [[sha mp :as match]
(make-sha-multipart-pair
s mps' ao)]
[(if (< 1 (count (get mps' sha)))
(update mps' sha #(into [] (rest %)))
(dissoc mps' sha))
(conj acc match)]
state))
[mps []]
att-objs)
(defn multipart-map-dedupe
"Given a list of multiparts, make a map of vectors of them by xapi hash.
Removes duplicate multiparts."
[multiparts]
(reduce (fn [m {:keys [headers] :as multipart}]
(let [sha (get headers "X-Experience-API-Hash")]
(if-let [extant (get m sha)]
m
(assoc m sha multipart))))
{}
multiparts))

valid-shas (map first valid-matched)]
(-> m
(update :s-acc conj s)
(update :a-acc into (map second valid-matched))
(assoc :mps
next-mps))))
{;; accumulators for the statements + attachments
:s-acc []
:a-acc []
;; the multiparts to join up, by Hash
:mps (multipart-map multiparts)}
(statements-attachments statements))]
(if (seq leftover-multiparts)
;; If we have leftovers, it's bad
(defn- attachment-reduce-fn
"Reduce over statement attachment references and attempt to match them to the
provided map of multiparts. Validates signatures when appropriate.
Throws on missing attachment or signature errors."
[{:keys [sha2s mpart-map]
:as state}
{:keys [statement
attachment-path]
{:strs [sha2 fileUrl]
:as att-obj} :attachment}]
(if-let [match-mp (get mpart-map sha2)]
(cond-> (update state :sha2s conj sha2)
;; Validate + recreate sig mp
(and (= "attachments"
(first attachment-path))
(sig? att-obj))
(update-in
[:mpart-map sha2]
(partial validate-sig
statement
att-obj)))
;; Allow attachments w/o a fileUrl to silently drop
(if fileUrl
state
(throw
(ex-info
"Statement references missing attachment and no fileUrl is present"
{:type ::statement-attachment-missing})))))

(defn validate-multiparts
"Given a list of statements and a list of multiparts, return valid multiparts,
deduplicated."
[statements
multiparts]
(let [{:keys [sha2s
mpart-map]} (reduce
attachment-reduce-fn
{:sha2s []
:mpart-map (multipart-map-dedupe
multiparts)}
(ss/all-attachment-objects statements))
sha2s-out (distinct sha2s)]
(if-let [leftover-multiparts (-> (apply dissoc
mpart-map
sha2s-out)
not-empty
vals)]
(throw (ex-info "Attachment sha2s differ from statement sha2s"
{:type ::statement-attachment-mismatch
:leftover-multiparts (into [] leftover-multiparts)}))
;; If not, let's return the statements and multiparts
[valid-statements valid-multiparts])))

;; The following comment block is left for dev purposes
(comment

(def s-json "{\"actor\":{\"objectType\":\"Agent\",\"name\":\"xAPI mbox\",\"mbox\":\"mailto:xapi@adlnet.gov\"},\"verb\":{\"id\":\"http://adlnet.gov/expapi/verbs/attended\",\"display\":{\"en-GB\":\"attended\",\"en-US\":\"attended\"}},\"object\":{\"objectType\":\"Activity\",\"id\":\"http://www.example.com/meetings/occurances/34534\"},\"id\":\"2e2f1ad7-8d10-4c73-ae6e-2842729e25ce\",\"attachments\":[{\"usageType\":\"http://adlnet.gov/expapi/attachments/signature\",\"display\":{\"en-US\":\"Signed by the Test Suite\"},\"description\":{\"en-US\":\"Signed by the Test Suite\"},\"contentType\":\"application/octet-stream\",\"length\":796,\"sha2\":\"f7db3634a22ea2fe4de1fc519751046a3bdf1e5605a316a19343109bd6daa388\"}]}")

(def sig "eyJhbGciOiJSUzI1NiJ9.eyJhY3RvciI6eyJvYmplY3RUeXBlIjoiQWdlbnQiLCJuYW1lIjoieEFQSSBtYm94IiwibWJveCI6Im1haWx0bzp4YXBpQGFkbG5ldC5nb3YifSwidmVyYiI6eyJpZCI6Imh0dHA6Ly9hZGxuZXQuZ292L2V4cGFwaS92ZXJicy9hdHRlbmRlZCIsImRpc3BsYXkiOnsiZW4tR0IiOiJhdHRlbmRlZCIsImVuLVVTIjoiYXR0ZW5kZWQifX0sIm9iamVjdCI6eyJvYmplY3RUeXBlIjoiQWN0aXZpdHkiLCJpZCI6Imh0dHA6Ly93d3cuZXhhbXBsZS5jb20vbWVldGluZ3Mvb2NjdXJhbmNlcy8zNDUzNCJ9LCJpZCI6IjJlMmYxYWQ3LThkMTAtNGM3My1hZTZlLTI4NDI3MjllMjVjZSJ9.roBpi7viDC4DyNikcWtjuvfXEfrVqNtukVfOjoj-VEGbskcxc9H21GKQBsw3LxnpblIpiDPithCs2AOZK7RFy4vB9wsL5HmX8jpxGvGnYCWNEbVRGoYyntFWjF3wFtTaJMHvZLnirL6k1qhxdfJPcV2C-uc-FXC9AR4__xYbJioJDb37wvPtetD8x8YTdkMkM7nlv20GjV3YF-wa_cxt9hWVS-8LDikCswY6PpMLFR6eYeqIqrZxJQtqDhsZK3k28eHDxAnNB-dGoYeiSeFSbcToyVh4iz2lZGNUmfkltiVs7mLTVJNilU0Z41JIFrdYEGXEfYQwFmiIf5denL5_lg")

(def s-parsed (parse-string s-json))

(= (dissoc s-parsed "attachments") (:payload (decode-sig sig)))

(keys s-parsed)

(validate-sig s-parsed
{:input-stream sig})

#?(:clj
(= (json/parse-string-strict s-json)
(json/parse-string s-json))))
(into []
(for [sha2 sha2s-out]
(get mpart-map sha2))))))
Loading

0 comments on commit 6a61ad9

Please sign in to comment.