A Java client for calling KServe inference services which implement one of the predict v1 or v2 protocols.
It let's you easily configure the endpoint of the inference service which should be called. The data shape of both the request and response can be modeled using Java classes. The library includes a retry mechanism to automatically retry requests to the inference service in case it's scaled to zero upon the first request.
You can find a blog post on medium where the kserve-client is used in the demo application.
You can add kserve-client via Maven Central.
compile group: 'com.bakdata.kserve', name: 'kserve-client', version: '1.0.1'
<dependency>
<groupId>com.bakdata.kserve</groupId>
<artifactId>kserve-client</artifactId>
<version>1.0.1</version>
</dependency>
For other build tools or versions, refer to the latest version in MvnRepository.
This usage example is extracted from a blog post on medium where the kserve-client is used. In the inference service, we use an Argos Translate model to obtain a translation for an input text.
A KServe inference service supporting the protocol version v2 is expected to run on localhost:8080
with model argos-translator-en-es
so that the endpoint localhost:8080/v2/models/argos-translator-en-es/infer
can be used for requests.
This inference service knows how to deal with the fields defined in TextToTranslate.java
and Translation.java
for the request input and output data, respectively.
A usage example compatible with protocol version v1 can be constructed analogously using the KServeClientFactoryV1
class.
TextToTranslate.java
@Data
@AllArgsConstructor
@NoArgsConstructor
@Builder
public class TextToTranslate {
private String textToTranslate;
}
Translation.java
@Data
@AllArgsConstructor
@NoArgsConstructor
@Builder
public class Translation {
private String originalText;
private String translatedText;
}
TranslatorResponse.java
public class TranslatorResponse extends InferenceResponse<Translation> {
}
KServeRequester.java
public class KServeRequester<I, O> {
private final KServeClient<I> kServeClient;
public KServeRequester() {
this.kServeClient = (KServeClient<I>) new KServeClientFactoryV2().getKServeClient(
"localhost:8080",
"argos-translator-en-es",
Duration.ofSeconds(2),
false
);
}
protected Optional<O> requestInferenceService(final I jsonObject) {
try {
return (Optional<O>) this.kServeClient.makeInferenceRequest(
jsonObject,
TranslatorResponse.class,
"");
} catch (final IOException e) {
throw new IllegalArgumentException(
"Error occurred when sending the inference request or receiving the response", e);
}
}
}
App.java
public final class App {
private static Translation getTranslation(final TextToTranslate input) {
return new KServeRequester<InferenceRequest<TextToTranslate>, TranslatorResponse>()
.requestInferenceService(InferenceRequest.<TextToTranslate>builder()
.inputs(List.of(
RequestInput.<TextToTranslate>builder()
.name("Translation")
.datatype("BYTES")
.shape(List.of(1))
.datatype("BYTES")
.parameters(Parameters.builder()
.contentType("str")
.build())
.data(input)
.build()
))
.build())
.map(InferenceResponse::getOutputs)
.stream()
.flatMap(Collection::stream)
.map(ResponseOutput::getData)
.findFirst()
.orElseThrow();
}
public static void main(final String[] args) {
final Translation translation = getTranslation(TextToTranslate.builder().textToTranslate("Hello World").build());
System.out.println(translation.getTranslatedText());
// Hola Mundo
}
}
If you want to contribute to this project, you can simply clone the repository and build it via Gradle. All dependencies should be included in the Gradle files, there are no external prerequisites.
> git clone git@github.com:bakdata/kserve-client.git
> cd kserve-client && ./gradlew build
Please note, that we have code styles for Java. They are basically the Google style guide, with some small modifications.
We are happy if you want to contribute to this project. If you find any bugs or have suggestions for improvements, please open an issue. We are also happy to accept your PRs. Just open an issue beforehand and let us know what you want to do and why.
This project is licensed under the MIT license. Have a look at the LICENSE for more details.