Add JSONLines Support to the SparkML Container #16

Merged · 4 commits · Aug 12, 2020
2 changes: 1 addition & 1 deletion Dockerfile
@@ -9,7 +9,7 @@ WORKDIR /sagemaker-sparkml-model-server

RUN mvn clean package

-RUN cp ./target/sparkml-serving-2.3.jar /usr/local/lib/sparkml-serving-2.3.jar
+RUN cp ./target/sparkml-serving-2.4.jar /usr/local/lib/sparkml-serving-2.4.jar
RUN cp ./serve.sh /usr/local/bin/serve.sh

RUN chmod a+x /usr/local/bin/serve.sh
36 changes: 18 additions & 18 deletions README.md
@@ -223,20 +223,20 @@ Calling `CreateModel` is required for creating a `Model` in SageMaker with this
SageMaker works with Docker images stored in [Amazon ECR](https://aws.amazon.com/ecr/). The SageMaker team has prepared and uploaded Docker images for the SageMaker SparkML Serving Container in all regions where SageMaker operates.
The Region to ECR container URL mapping is listed below. For a mapping from Region to Region Name, please see [here](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html).

-* us-west-1 = 746614075791.dkr.ecr.us-west-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* us-west-2 = 246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.2
-* us-east-1 = 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* us-east-2 = 257758044811.dkr.ecr.us-east-2.amazonaws.com/sagemaker-sparkml-serving:2.2
-* ap-northeast-1 = 354813040037.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* ap-northeast-2 = 366743142698.dkr.ecr.ap-northeast-2.amazonaws.com/sagemaker-sparkml-serving:2.2
-* ap-southeast-1 = 121021644041.dkr.ecr.ap-southeast-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* ap-southeast-2 = 783357654285.dkr.ecr.ap-southeast-2.amazonaws.com/sagemaker-sparkml-serving:2.2
-* ap-south-1 = 720646828776.dkr.ecr.ap-south-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* eu-west-1 = 141502667606.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* eu-west-2 = 764974769150.dkr.ecr.eu-west-2.amazonaws.com/sagemaker-sparkml-serving:2.2
-* eu-central-1 = 492215442770.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* ca-central-1 = 341280168497.dkr.ecr.ca-central-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* us-gov-west-1 = 414596584902.dkr.ecr.us-gov-west-1.amazonaws.com/sagemaker-sparkml-serving:2.2
+* us-west-1 = 746614075791.dkr.ecr.us-west-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* us-west-2 = 246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.4
+* us-east-1 = 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* us-east-2 = 257758044811.dkr.ecr.us-east-2.amazonaws.com/sagemaker-sparkml-serving:2.4
+* ap-northeast-1 = 354813040037.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* ap-northeast-2 = 366743142698.dkr.ecr.ap-northeast-2.amazonaws.com/sagemaker-sparkml-serving:2.4
+* ap-southeast-1 = 121021644041.dkr.ecr.ap-southeast-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* ap-southeast-2 = 783357654285.dkr.ecr.ap-southeast-2.amazonaws.com/sagemaker-sparkml-serving:2.4
+* ap-south-1 = 720646828776.dkr.ecr.ap-south-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* eu-west-1 = 141502667606.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* eu-west-2 = 764974769150.dkr.ecr.eu-west-2.amazonaws.com/sagemaker-sparkml-serving:2.4
+* eu-central-1 = 492215442770.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* ca-central-1 = 341280168497.dkr.ecr.ca-central-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* us-gov-west-1 = 414596584902.dkr.ecr.us-gov-west-1.amazonaws.com/sagemaker-sparkml-serving:2.4

With [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk)
------------------------------------------------------------------------
@@ -263,7 +263,7 @@ First you need to ensure that you have installed [Docker](https://www.docker.com/) on your machine.
In order to build the Docker image, you need to run a single Docker command:

```
-docker build -t sagemaker-sparkml-serving:2.2 .
+docker build -t sagemaker-sparkml-serving:2.4 .
```

#### Running the image locally
@@ -272,7 +272,7 @@ In order to run the Docker image, you need to run the following command. Please
The command will start the server on port 8080 and will also pass the schema as an environment variable to the Docker container. Alternatively, you can edit the `Dockerfile` to add `ENV SAGEMAKER_SPARKML_SCHEMA=schema` as well before building the Docker image.

```
-docker run -p 8080:8080 -e SAGEMAKER_SPARKML_SCHEMA=schema -v /tmp/model:/opt/ml/model sagemaker-sparkml-serving:2.2 serve
+docker run -p 8080:8080 -e SAGEMAKER_SPARKML_SCHEMA=schema -v /tmp/model:/opt/ml/model sagemaker-sparkml-serving:2.4 serve
```

#### Invoking with a payload
@@ -287,7 +287,7 @@ or
curl -i -H "content-type:application/json" -d "{\"data\":[feature_1,\"feature_2\",feature_3]}" http://localhost:8080/invocations
```
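With the `application/jsonlines` endpoint added by this PR, the same local container can presumably also be invoked with a JSON Lines payload, where every line is an independent `{"data": [...]}` object. A hedged sketch (feature values and port are illustrative, not from this PR):

```shell
# Build a two-line JSON Lines payload: each line is a standalone {"data": [...]} object.
payload=$(printf '%s\n%s' '{"data":[1.0,"a",2.0]}' '{"data":[3.0,"b",4.0]}')
printf '%s\n' "$payload"

# With the container from the docker run step above listening on 8080, the
# payload could then be posted using the new jsonlines content type:
#   curl -i -H "content-type:application/jsonlines" --data-binary "$payload" \
#        http://localhost:8080/invocations
```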

-The `Dockerfile` can be found at the root directory of the package. SageMaker SparkML Serving Container tags the Docker images using the Spark major version it is compatible with. Right now, it only supports Spark 2.2 and as a result, the Docker image is tagged with 2.2.
+The `Dockerfile` can be found at the root directory of the package. SageMaker SparkML Serving Container tags the Docker images using the Spark major version it is compatible with. Right now, it only supports Spark 2.4 and as a result, the Docker image is tagged with 2.4.

In order to save the effort of building the Docker image every time you make a code change, you can also install [Maven](http://maven.apache.org/) and run `mvn clean package` at your project root to verify that the code compiles and the unit tests run without any issue.

@@ -310,7 +310,7 @@ aws ecr get-login --region us-west-2 --registry-ids 246618743249 --no-include-em
* Download the Docker image with the following command:

```
-docker pull 246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.2
+docker pull 246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.4
```

For running the Docker image, please see the "Running the image locally" section above.
6 changes: 3 additions & 3 deletions ci/buildspec.yml
@@ -9,10 +9,10 @@ phases:
commands:
- echo Build started on `date`
- echo Building the Docker image...
-- docker build -t sagemaker-sparkml-serving:2.3 .
-- docker tag sagemaker-sparkml-serving:2.3 515193369038.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.3
+- docker build -t sagemaker-sparkml-serving:2.4 .
+- docker tag sagemaker-sparkml-serving:2.4 515193369038.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.4
post_build:
commands:
- echo Build completed on `date`
- echo Pushing the Docker image...
-- docker push 515193369038.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.3
+- docker push 515193369038.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.4
4 changes: 2 additions & 2 deletions pom.xml
@@ -24,7 +24,7 @@
<modelVersion>4.0.0</modelVersion>
<groupId>org.amazonaws.sagemaker</groupId>
<artifactId>sparkml-serving</artifactId>
-<version>2.3</version>
+<version>2.4</version>
<build>
<plugins>
<plugin>
@@ -154,7 +154,7 @@
<dependency>
<groupId>ml.combust.mleap</groupId>
<artifactId>mleap-runtime_2.11</artifactId>
-<version>0.13.0</version>
+<version>0.14.0</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
2 changes: 1 addition & 1 deletion serve.sh
@@ -1,3 +1,3 @@
#!/bin/bash
# This is needed to make sure Java correctly detects CPU/Memory set by the container limits
-java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -jar /usr/local/lib/sparkml-serving-2.3.jar
+java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -jar /usr/local/lib/sparkml-serving-2.4.jar
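One aside on the serve line: `-XX:+UseCGroupMemoryLimitForHeap` is a Java 8 experimental flag that was removed in later JDKs, where container CPU/memory limits are honored via `-XX:+UseContainerSupport` (enabled by default since JDK 10). If the base image were ever moved to a newer JDK (an assumption, not something this PR does), a hypothetical serve line would look like:

```shell
# Hypothetical serve.sh for a JDK 10+ base image (not part of this PR):
# container limits are detected natively, so no experimental flags are needed.
java -XX:+UseContainerSupport -jar /usr/local/lib/sparkml-serving-2.4.jar
```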
@@ -21,6 +21,7 @@

import com.amazonaws.sagemaker.dto.BatchExecutionParameter;
import com.amazonaws.sagemaker.dto.DataSchema;
+import com.amazonaws.sagemaker.dto.SageMakerRequestListObject;
import com.amazonaws.sagemaker.dto.SageMakerRequestObject;
import com.amazonaws.sagemaker.helper.DataConversionHelper;
import com.amazonaws.sagemaker.helper.ResponseHelper;
@@ -29,11 +30,13 @@
import com.amazonaws.sagemaker.utils.ScalaUtils;
import com.amazonaws.sagemaker.utils.SystemUtils;
import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Preconditions;
import com.google.common.collect.Lists;
import java.io.IOException;
+import java.util.Arrays;
import java.util.List;
import ml.combust.mleap.runtime.frame.ArrayRow;
import ml.combust.mleap.runtime.frame.DefaultLeapFrame;
@@ -102,7 +105,7 @@ public ResponseEntity returnBatchExecutionParameter() throws JsonProcessingException
* Implements the invocations POST API for application/json input
*
* @param sro, the request object
-* @param accept, accept parameter from request
+* @param accept, indicates the content types that the http method is able to understand
* @return ResponseEntity with body as the expected payload JSON & proper statuscode based on the input
*/
@RequestMapping(path = "/invocations", method = POST, consumes = MediaType.APPLICATION_JSON_VALUE)
@@ -123,10 +126,10 @@ public ResponseEntity<String> transformRequestJson(@RequestBody final SageMakerRequestObject
}

/**
-* Implements the invocations POST API for application/json input
+* Implements the invocations POST API for text/csv input
*
* @param csvRow, data in row format in CSV
-* @param accept, accept parameter from request
+* @param accept, indicates the content types that the http method is able to understand
* @return ResponseEntity with body as the expected payload JSON & proper statuscode based on the input
*/
@RequestMapping(path = "/invocations", method = POST, consumes = AdditionalMediaType.TEXT_CSV_VALUE)
Expand All @@ -148,6 +151,40 @@ public ResponseEntity<String> transformRequestCsv(@RequestBody final byte[] csvR
}
}

/**
* Implements the invocations POST API for application/jsonlines input
*
* @param jsonLines, lines of json values
* @param accept, indicates the content types that the http method is able to understand
* @return ResponseEntity with body as the expected payload JSON & proper statuscode based on the input
*/
@RequestMapping(path = "/invocations", method = POST, consumes = AdditionalMediaType.APPLICATION_JSONLINES_VALUE)
public ResponseEntity<String> transformRequestJsonLines(
@RequestBody final byte[] jsonLines,
@RequestHeader(value = HttpHeaders.ACCEPT, required = false)
final String accept) {

if (jsonLines == null) {
LOG.error("Input passed to the request is null");
return ResponseEntity.badRequest().build();

} else if (jsonLines.length == 0) {

LOG.error("Input passed to the request is empty");
return ResponseEntity.noContent().build();
}

try {
final String acceptVal = this.retrieveAndVerifyAccept(accept);
return this.processInputDataForJsonLines(new String(jsonLines), acceptVal);

} catch (final Exception ex) {

LOG.error("Error in processing current request", ex);
return ResponseEntity.badRequest().body(ex.getMessage());
}
}
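The guard clauses above map a null request body to 400 Bad Request and an empty body to 204 No Content before any parsing happens. A standalone sketch of just that mapping (class and method names are hypothetical, not from the PR):

```java
// Sketch of the null/empty-body handling in the new JSONLines endpoint:
// null -> 400 Bad Request, empty -> 204 No Content, otherwise processing continues.
public final class JsonLinesGuardSketch {

    static int statusFor(byte[] body) {
        if (body == null) {
            return 400; // ResponseEntity.badRequest().build()
        }
        if (body.length == 0) {
            return 204; // ResponseEntity.noContent().build()
        }
        return 200; // request proceeds to processInputDataForJsonLines
    }

    public static void main(String[] args) {
        System.out.println(statusFor(null));        // 400
        System.out.println(statusFor(new byte[0])); // 204
    }
}
```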

@VisibleForTesting
protected String retrieveAndVerifyAccept(final String acceptFromRequest) {
final String acceptVal = checkEmptyAccept(acceptFromRequest) ? SystemUtils
@@ -181,6 +218,72 @@ private ResponseEntity<String> processInputData(final List<Object> inputData, fi

}

/**
* Helper method to interpret the JSONLines input and return the response in the expected output format.
*
* @param jsonLinesAsString
* The JSON lines input.
*
* @param acceptVal
* The output format in which the response is to be returned.
*
* @return
The transformed output for the JSONLines input.
*
* @throws IOException
* If there is an exception during object mapping and validation.
*
*/
ResponseEntity<String> processInputDataForJsonLines(
final String jsonLinesAsString, final String acceptVal) throws IOException {

final String[] lines = jsonLinesAsString.split("\\r?\\n");
final ObjectMapper mapper = new ObjectMapper();

// first line is special since it could contain the schema as well. Extract the schema.
final SageMakerRequestObject firstLine = mapper.readValue(lines[0], SageMakerRequestObject.class);
final DataSchema schema = this.retrieveAndVerifySchema(firstLine.getSchema(), mapper);

List<List<Object>> inputDatas = Lists.newArrayList();

// Each line may hold either multiple rows (SageMakerRequestListObject)
// or a single row (SageMakerRequestObject); try the list form first.
for (String jsonStringLine : lines) {
try {

final SageMakerRequestListObject sro = mapper.readValue(jsonStringLine, SageMakerRequestListObject.class);

for(int idx = 0; idx < sro.getData().size(); ++idx) {
inputDatas.add(sro.getData().get(idx));
}

} catch (final JsonMappingException ex) {

final SageMakerRequestObject sro = mapper.readValue(jsonStringLine, SageMakerRequestObject.class);
inputDatas.add(sro.getData());
}
}

List<ResponseEntity<String>> responseList = Lists.newArrayList();

// Process each input separately and add response to a list
for (int idx = 0; idx < inputDatas.size(); ++idx) {
responseList.add(this.processInputData(inputDatas.get(idx), schema, acceptVal));
}

// Merge response body to a new output response
List<List<String>> bodyList = Lists.newArrayList();

// All responses should be valid if no exception was caught,
// in which case all headers should be the same; extract the first one to construct the ResponseEntity
HttpHeaders headers = responseList.get(0).getHeaders();

// combine the bodies from responseList
for (ResponseEntity<String> response: responseList) {
bodyList.add(Lists.newArrayList(response.getBody()));
}

return ResponseEntity.ok().headers(headers).body(bodyList.toString());
}
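The method above splits the payload on `\r?\n`, processes each line, and serializes the merged result as `List.toString()` of singleton lists. A standalone sketch of that split-and-merge behavior (class name and response values are illustrative, not from the PR):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of how the new JSONLines handler splits its input and merges responses.
public final class JsonLinesMergeSketch {

    public static void main(String[] args) {
        // The payload is split on \r?\n, so Unix and Windows line endings both work.
        String payload = "{\"data\":[1]}\r\n{\"data\":[2]}";
        String[] lines = payload.split("\\r?\\n");
        System.out.println(lines.length); // 2

        // Each per-line response body is wrapped in a singleton list; the final
        // body is List.toString() of the collected lists.
        List<List<String>> bodyList = new ArrayList<>();
        bodyList.add(List.of("0.5"));
        bodyList.add(List.of("0.7"));
        System.out.println(bodyList); // [[0.5], [0.7]]
    }
}
```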

private boolean checkEmptyAccept(final String acceptFromRequest) {
//Spring may send the Accept as "*\/*" (star/star) in case accept is not passed via request
return (StringUtils.isBlank(acceptFromRequest) || StringUtils.equals(acceptFromRequest, MediaType.ALL_VALUE));
49 additions: new file src/main/java/com/amazonaws/sagemaker/dto/SageMakerRequestListObject.java
@@ -0,0 +1,49 @@
/*
* Copyright 2010-2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License").
* You may not use this file except in compliance with the License.
* A copy of the License is located at
*
* http://aws.amazon.com/apache2.0
*
* or in the "license" file accompanying this file. This file is distributed
* on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
*
*/

package com.amazonaws.sagemaker.dto;

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.google.common.base.Preconditions;

import java.util.List;

/**
* Request object POJO to which the data field of an input request in JSONLINES format will be mapped by Spring (using Jackson).
* For sample input, please see test/resources/com/amazonaws/sagemaker/dto
*/
public class SageMakerRequestListObject {

private DataSchema schema;
private List<List<Object>> data;

@JsonCreator
public SageMakerRequestListObject(@JsonProperty("schema") final DataSchema schema,
@JsonProperty("data") final List<List<Object>> data) {
// schema can be retrieved from an environment variable as well, hence it is not enforced to be non-null
this.schema = schema;
this.data = Preconditions.checkNotNull(data);
}

public DataSchema getSchema() {
return schema;
}

public List<List<Object>> getData() {
return data;
}
}
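The constructor above allows `schema` to be null (it can come from the `SAGEMAKER_SPARKML_SCHEMA` environment variable instead) while rejecting a missing `data` field immediately. A plain-Java sketch of that contract, with Guava's `Preconditions.checkNotNull` swapped for `java.util.Objects.requireNonNull` (class and field names here are illustrative, not the PR's actual types):

```java
import java.util.List;
import java.util.Objects;

// Sketch of the POJO's constructor contract: schema optional, data mandatory.
public final class RequestListSketch {

    private final Object schema;           // stands in for DataSchema; may be null
    private final List<List<Object>> data; // must be present

    public RequestListSketch(Object schema, List<List<Object>> data) {
        this.schema = schema;                     // null allowed
        this.data = Objects.requireNonNull(data); // fails fast if "data" is missing
    }

    public static void main(String[] args) {
        new RequestListSketch(null, List.of(List.of(1, 2))); // fine: schema omitted
        try {
            new RequestListSketch(null, null); // throws: data is required
        } catch (NullPointerException expected) {
            System.out.println("data is required");
        }
    }
}
```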