'X-Nuclio-Function-Namespace' header is set incorrectly when invoking a Nuclio function for auto annotation. #5626

Closed
2 tasks done
AKShaw opened this issue Jan 25, 2023 · 1 comment · Fixed by #5917

AKShaw commented Jan 25, 2023

My actions before raising this issue

Summary

When attempting to use CVAT's auto annotation feature, the UI shows the following error:

Error: Inference status for the task 17 is failed. requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://cvat-nuclio-dashboard:8070/api/function_invocations

This likely arises because the X-Nuclio-Function-Namespace header is set incorrectly for invocation calls, as shown by these logs from the Nuclio dashboard:

23.01.25 13:02:57.127 dashboard.server (D) Handled request {"requestID": "cvat-nuclio-dashboard-cc87f88b4-6sqvh/D0ZRRIZvjJ-000373", "requestMethod": "GET", "requestPath": "/api/functions/blackgrass-retinanet", "requestHeaders": {"Accept":["*/*"],"Accept-Encoding":["gzip, deflate"],"Connection":["close"],"Cookie":["[redacted]"],"User-Agent":["python-requests/2.26.0"],"X-Nuclio-Function-Namespace":["cvat"],"X-Nuclio-Invoke-Via":["domain-name"],"X-Nuclio-Project-Name":["cvat"],"X-V3io-Session-Key":["[redacted]"]}, "requestBody": "", "responseStatus": 200, "responseTime": "14.161512ms"}
23.01.25 13:02:57.367 .api/function_invocations (W) Failed to invoke function {"requestID": "cvat-nuclio-dashboard-cc87f88b4-6sqvh/D0ZRRIZvjJ-000374", "err": "Function not found: blackgrass-retinanet @ nuclio"}
23.01.25 13:02:57.368 dashboard.server (D) Handled request {"requestID": "cvat-nuclio-dashboard-cc87f88b4-6sqvh/D0ZRRIZvjJ-000374", "requestMethod": "POST", "requestPath": "/api/function_invocations", "requestHeaders": {"Accept":["*/*"],"Accept-Encoding":["gzip, deflate"],"Connection":["close"],"Content-Length":["3535881"],"Content-Type":["application/json"],"Cookie":["[redacted]"],"User-Agent":["python-requests/2.26.0"],"X-Nuclio-Function-Name":["blackgrass-retinanet"],"X-Nuclio-Function-Namespace":["nuclio"],"X-Nuclio-Invoke-Via":["domain-name"],"X-Nuclio-Path":["/"],"X-Nuclio-Project-Name":["cvat"],"X-V3io-Session-Key":["[redacted]"]}, "requestBody": "{\"image\": \"Image B64\"}", "responseStatus": 404, "responseTime": "47.503302ms", "responseBody": "{\"error\": \"Failed to invoke function: Function not found: blackgrass-retinanet @ nuclio\"}"}

Here, the request to get the function, which succeeded (200), has the X-Nuclio-Function-Namespace header set to cvat, which is the correct namespace. The unsuccessful (404) request to invoke the function has the X-Nuclio-Function-Namespace header set to nuclio, a namespace that doesn't exist in my Kubernetes cluster.

If the function is deployed to the cvat namespace (for discovery by CVAT), and the nuclio namespace (for invocation), the auto annotation succeeds. If the function is only deployed to the nuclio namespace, no models are visible in CVAT.
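For anyone wanting to check their own dashboard logs for the same mismatch, here is a minimal sketch that pulls the namespace header out of "Handled request" lines. It assumes the log format shown above, where the structured part of each line is a JSON object. The helper name `namespace_header` is hypothetical, and the two sample lines are shortened copies of the real ones with most fields dropped.

```python
import json


def namespace_header(log_line):
    """Return (request path, namespace header, response status) for one log line."""
    # Assumption: everything from the first "{" onwards is a single JSON object,
    # as in the "Handled request" lines from the Nuclio dashboard above.
    payload = json.loads(log_line[log_line.index("{"):])
    namespace = payload["requestHeaders"]["X-Nuclio-Function-Namespace"][0]
    return payload["requestPath"], namespace, payload["responseStatus"]


# Shortened copies of the two "Handled request" lines (illustrative only).
discovery = (
    '23.01.25 13:02:57.127 dashboard.server (D) Handled request '
    '{"requestPath": "/api/functions/blackgrass-retinanet", '
    '"requestHeaders": {"X-Nuclio-Function-Namespace": ["cvat"]}, '
    '"responseStatus": 200}'
)
invocation = (
    '23.01.25 13:02:57.368 dashboard.server (D) Handled request '
    '{"requestPath": "/api/function_invocations", '
    '"requestHeaders": {"X-Nuclio-Function-Namespace": ["nuclio"]}, '
    '"responseStatus": 404}'
)

for line in (discovery, invocation):
    path, namespace, status = namespace_header(line)
    print(f"{path}: namespace={namespace} status={status}")
```

If the two requests report different namespaces, as they do here, the function is being discovered in one namespace and invoked in another.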

Steps to Reproduce (for bugs)

I'm running CVAT and Nuclio on Kubernetes via Helm. These are the Helm charts used.

This is the command used to install CVAT and Nuclio.

helm upgrade -n cvat cvat -i ./helm-chart --atomic --timeout 120s -f ./helm-chart/values.yaml \
-f ./helm-chart/values.override.yaml \
--set nuclio.registry.credentials.password="" \
--set nuclio.registry.loginUrl="" \
--set postgresql.external.host="" \
--set postgresql.external.user="" \
--set postgresql.external.dbname="" \
--set postgresql.external.password=""

Here is my values.override.yaml:

#Value overrides for helm. See https://opencv.github.io/cvat/docs/administration/advanced/k8s_deployment_with_helm/#configuration

nuclio:
  enabled: true
  registry:
    credentials:
      username: AWS

postgresql:
  enabled: false
  external:
    port: 5432

redis:
  enabled: false
  external:
    host: redis
  secret:
    password: ""

ingress:
  enabled: false

cvat:
  backend:
    permissionFix:
      enabled: false
    defaultStorage:
      enabled: false
    server:
      additionalVolumes:
        - name: cvat-backend-data
          persistentVolumeClaim:
            claimName: cvat-data
    worker:
      default:
        additionalVolumes:
          - name: cvat-backend-data
            persistentVolumeClaim:
              claimName: cvat-data
      low:
        additionalVolumes:
          - name: cvat-backend-data
            persistentVolumeClaim:
              claimName: cvat-data
    utils:
      additionalVolumes:
        - name: cvat-backend-data
          persistentVolumeClaim:
            claimName: cvat-data

traefik:
  logs:
    general:
      level: DEBUG

To deploy my Nuclio function, I am using the following commands:

NAMESPACE=cvat
nuctl --namespace $NAMESPACE --platform kube create project cvat
nuctl deploy --project-name cvat --namespace $NAMESPACE --path ./src_detection/nuclio --platform kube \
--registry "" \
--env MLFLOW_USER="" \
--env MLFLOW_PASSWORD="" \
--env MLFLOW_TRACKING_URI="" \
--env AWS_ACCESS_KEY_ID="" \
--env AWS_SECRET_ACCESS_KEY=""

I am unable to provide source code for the function as this is proprietary but it is just a simple function that decodes an image from Base64 before passing it through a model loaded via MLFlow and returning the predictions.

Due to upgrading from CVAT 1.6, I have made some config changes to fit the deployment into our existing infra (as detailed here):

  • Using an external Postgres instance
  • Using an external Redis instance
  • Using a pre-existing PVC
  • Disabling the default ingress and using a pre-existing one, updated with the routes from here. This terminates our TLS.

Expected Behaviour

When using auto annotation, calls to invoke the Nuclio function use the correct X-Nuclio-Function-Namespace header and don't 404.

Current Behaviour

When using auto annotation, calls to invoke the Nuclio function don't use the correct X-Nuclio-Function-Namespace header, leading to a 404.

Possible Solution

Set the X-Nuclio-Function-Namespace header to the same value used for discovering functions (see logs from the Nuclio dashboard in the summary).

I have also got this to work by deploying the function twice: once in the cvat namespace so it can be discovered, and once in the nuclio namespace for the actual invocation. This is only a temporary workaround.
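To illustrate the first suggestion, here is a rough sketch of building the headers for both kinds of request from one shared namespace value. This is not CVAT's actual code: the function `nuclio_headers` and its parameters are hypothetical, and only the header names and values are taken from the dashboard logs in the summary.

```python
def nuclio_headers(namespace, function_name=None):
    """Build request headers for the Nuclio dashboard (hypothetical helper)."""
    # Header names and values mirror those seen in the dashboard logs.
    headers = {
        "X-Nuclio-Project-Name": "cvat",
        "X-Nuclio-Function-Namespace": namespace,
        "X-Nuclio-Invoke-Via": "domain-name",
    }
    if function_name is not None:
        # Extra headers that only appear on the invocation request in the logs.
        headers["X-Nuclio-Function-Name"] = function_name
        headers["X-Nuclio-Path"] = "/"
    return headers


# One namespace value shared by discovery and invocation.
NAMESPACE = "cvat"
discovery_headers = nuclio_headers(NAMESPACE)
invocation_headers = nuclio_headers(NAMESPACE, function_name="blackgrass-retinanet")
```

With a single source for the namespace, the discovery and invocation requests cannot disagree about where the function lives.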

Context

Unable to run auto annotation on tasks without deploying the same function twice in separate namespaces.

Your Environment

  • Git hash commit (git log -1): fb4af9c
  • Docker version docker version (e.g. Docker 17.0.05): N/A
  • Are you using Docker Swarm or Kubernetes? Kubernetes via Helm
  • Operating System and version (e.g. Linux, Windows, MacOS): Ubuntu 22.04.1 LTS
  • Other diagnostic information / logs: See summary above.

Many thanks!


SpecLad commented Mar 23, 2023

Thank you for the detailed report; it was very helpful for identifying the issue. I created a PR to fix this.

nmanovic pushed a commit that referenced this issue Mar 24, 2023
The `CVAT_NUCLIO_FUNCTION_NAMESPACE` needs to be defined consistently in
order for Nuclio integration to work. Currently, it's set to `cvat` for
the main CVAT server process, but not for any other CVAT process (which
means it defaults to `nuclio` in those processes). Since it's the
annotation worker process that actually invokes the Nuclio functions,
the invocation fails.

Fix it by synchronizing the Nuclio environment variables across all
backend deployments. Technically, I think only the server and annotation
worker deployments need these variables, but since they're accessed by
`cvat/settings/base.py` in every process that loads Django, define them
everywhere to be sure.

Fixes #5626.
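The failure mode described in the commit message can be sketched as follows. This is an illustration only, not the contents of `cvat/settings/base.py`: the helper `function_namespace` is hypothetical, and the only details taken from the commit message are the variable name and the `nuclio` fallback.

```python
def function_namespace(environ):
    """Resolve the Nuclio namespace for one process (hypothetical helper)."""
    # Assumption: the setting is read from the environment and falls back to
    # "nuclio" when unset, as the commit message describes.
    return environ.get("CVAT_NUCLIO_FUNCTION_NAMESPACE", "nuclio")


# Before the fix: only the server deployment defines the variable.
server_env = {"CVAT_NUCLIO_FUNCTION_NAMESPACE": "cvat"}
worker_env = {}
before = (function_namespace(server_env), function_namespace(worker_env))

# After the fix: every backend deployment defines the same value.
worker_env_fixed = {"CVAT_NUCLIO_FUNCTION_NAMESPACE": "cvat"}
after = (function_namespace(server_env), function_namespace(worker_env_fixed))

print("before:", before)
print("after:", after)
```

Because the server lists functions while the annotation worker invokes them, the two processes must resolve the same namespace for auto annotation to succeed.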
mikhail-treskin pushed a commit to retailnext/cvat that referenced this issue Jul 1, 2023