Client method to dump cluster state #5470

Merged (13 commits), Nov 10, 2021

Conversation

@fjetter (Member) commented Oct 27, 2021

This adds a client method to dump the entire cluster state to a file for debugging purposes. It has been incredibly handy for the deadlock scenarios I've been debugging recently.
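
Roughly, the intended usage looks like this (a sketch only; Client.dump_cluster_state is the entrypoint discussed in the review below, and the filename argument and scheduler address are placeholders rather than the final signature):

    from distributed import Client

    client = Client("tcp://127.0.0.1:38993")   # connect to an existing scheduler
    # Write a snapshot of scheduler and worker state to disk for offline debugging
    client.dump_cluster_state("cluster-dump")  # filename is a placeholder argument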

This method is called automatically when a test runs into a timeout, and the content is persisted as a GH artefact. This should help us debug spurious, flaky test failures.

I implemented the test dump as YAML for readability, but for real-world examples YAML is not well suited. In my experience these dumps can grow to several MB or even GB, and a feasible approach so far has been msgpack + gzip. None of that is set in stone.
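
A rough sketch of that msgpack + gzip approach (not the code in this PR; it assumes the dumped state has already been collected into a plain dict):

    import gzip
    import msgpack

    def write_state(state: dict, path: str) -> None:
        # msgpack keeps the dump compact; gzip helps once dumps reach GB scale.
        # default=str falls back to a string repr for objects msgpack can't encode.
        with gzip.open(path, "wb") as f:
            f.write(msgpack.packb(state, default=str))

    def read_state(path: str) -> dict:
        with gzip.open(path, "rb") as f:
            return msgpack.unpackb(f.read(), raw=False)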

The implementation is not very elegant: I added a to_dict method (similar to identity, but more verbose) to the most relevant classes. If somebody has an idea for a more elegant approach, I'm all ears.
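
As a sketch of the general shape (attribute names below are illustrative, not the actual implementation; the methods added here were later renamed to _to_dict):

    from typing import Container

    def to_dict(self, *, exclude: Container[str] = ()) -> dict:
        # Start from the compact identity() summary and layer on the more verbose
        # per-object state that is useful when debugging deadlocks.
        info = self.identity()
        verbose = {
            "status": str(self.status),
            "tasks": {key: repr(ts) for key, ts in self.tasks.items()},
            "log": list(self.log),
        }
        info.update({k: v for k, v in verbose.items() if k not in exclude})
        return info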

Example output
scheduler_info:
  address: tcp://127.0.0.1:38993
  events:
    Client-cb05ef34-372a-11ec-881b-e9dd5dbb64c5: !!python/tuple
    - '(1635341747.6048865, {''action'': ''add-client'', ''client'': ''Client-cb05ef34-372a-11ec-881b-e9dd5dbb64c5''})'
    all: !!python/tuple
    - !!python/tuple
      - '1635341747.5759628'
      - action: add-worker
        worker: tcp://127.0.0.1:41249
    - !!python/tuple
      - '1635341747.5792954'
      - action: add-worker
        worker: tcp://127.0.0.1:42413
    - !!python/tuple
      - '1635341747.6048865'
      - action: add-client
        client: Client-cb05ef34-372a-11ec-881b-e9dd5dbb64c5
    stealing: !!python/tuple []
    tcp://127.0.0.1:41249: !!python/tuple
    - !!python/tuple
      - '1635341747.5758505'
      - action: heartbeat
        bandwidth:
          total: '100000000'
          types: {}
          workers: {}
        cpu: '0.0'
        executing: '0'
        in_flight: '0'
        in_memory: '0'
        memory: '134877184'
        num_fds: '24'
        read_bytes: '0.0'
        read_bytes_disk: '0.0'
        ready: '0'
        spilled_nbytes: '0'
        time: '1635341747.557952'
        write_bytes: '0.0'
        write_bytes_disk: '0.0'
    - !!python/tuple
      - '1635341747.5759585'
      - action: add-worker
    - !!python/tuple
      - '1635341747.5858035'
      - action: worker-status-change
        prev-status: undefined
        status: running
    - !!python/tuple
      - '1635341748.5936973'
      - action: heartbeat
        bandwidth:
          total: '100000000'
          types: {}
          workers: {}
        cpu: '4.0'
        executing: '0'
        in_flight: '0'
        in_memory: '0'
        memory: '135020544'
        num_fds: '39'
        read_bytes: '0.0'
        read_bytes_disk: '0.0'
        ready: '0'
        spilled_nbytes: '0'
        time: '1635341748.5843215'
        write_bytes: '0.0'
        write_bytes_disk: '0.0'
    - !!python/tuple
      - '1635341749.5884714'
      - action: heartbeat
        bandwidth:
          total: '100000000'
          types: {}
          workers: {}
        cpu: '4.0'
        executing: '0'
        in_flight: '0'
        in_memory: '0'
        memory: '135135232'
        num_fds: '43'
        read_bytes: '0.0'
        read_bytes_disk: '0.0'
        ready: '0'
        spilled_nbytes: '0'
        time: '1635341749.5846045'
        write_bytes: '0.0'
        write_bytes_disk: '0.0'
    tcp://127.0.0.1:42413: !!python/tuple
    - !!python/tuple
      - '1635341747.579201'
      - action: heartbeat
        bandwidth:
          total: '100000000'
          types: {}
          workers: {}
        cpu: '0.0'
        executing: '0'
        in_flight: '0'
        in_memory: '0'
        memory: '134877184'
        num_fds: '25'
        read_bytes: '0.0'
        read_bytes_disk: '0.0'
        ready: '0'
        spilled_nbytes: '0'
        time: '1635341747.5610273'
        write_bytes: '0.0'
        write_bytes_disk: '0.0'
    - !!python/tuple
      - '1635341747.579291'
      - action: add-worker
    - !!python/tuple
      - '1635341747.5859537'
      - action: worker-status-change
        prev-status: undefined
        status: running
    - !!python/tuple
      - '1635341748.5940423'
      - action: heartbeat
        bandwidth:
          total: '100000000'
          types: {}
          workers: {}
        cpu: '11.4'
        executing: '0'
        in_flight: '0'
        in_memory: '0'
        memory: '135020544'
        num_fds: '39'
        read_bytes: '16806.444267185172'
        read_bytes_disk: '0.0'
        ready: '0'
        spilled_nbytes: '0'
        time: '1635341748.085588'
        write_bytes: '26748.096587213608'
        write_bytes_disk: '0.0'
    - !!python/tuple
      - '1635341749.5887933'
      - action: heartbeat
        bandwidth:
          total: '100000000'
          types: {}
          workers: {}
        cpu: '4.0'
        executing: '0'
        in_flight: '0'
        in_memory: '0'
        memory: '135135232'
        num_fds: '43'
        read_bytes: '0.0'
        read_bytes_disk: '0.0'
        ready: '0'
        spilled_nbytes: '0'
        time: '1635341749.5856717'
        write_bytes: '0.0'
        write_bytes_disk: '0.0'
  id: Scheduler-87138d38-3606-49a3-b896-5d2d5d31bff7
  log: !!python/tuple []
  services:
    dashboard: '33755'
  started: '1635341747.549502'
  status: running
  tasks: {}
  thread_id: '140209022129984'
  transition_log: !!python/tuple []
  type: Scheduler
  workers:
    tcp://127.0.0.1:41249:
      host: 127.0.0.1
      id: '0'
      last_seen: '1635341749.5884442'
      local_directory: /home/runner/work/distributed/distributed/dask-worker-space/worker-eru8w8qn
      memory_limit: '7291699200'
      metrics:
        bandwidth:
          total: '100000000'
          types: {}
          workers: {}
        cpu: '4.0'
        executing: '0'
        in_flight: '0'
        in_memory: '0'
        memory: '135135232'
        num_fds: '43'
        read_bytes: '0.0'
        read_bytes_disk: '0.0'
        ready: '0'
        spilled_nbytes: '0'
        time: '1635341749.5846045'
        write_bytes: '0.0'
        write_bytes_disk: '0.0'
      name: '0'
      nanny: null
      nthreads: '1'
      resources: {}
      services:
        dashboard: '34057'
      type: Worker
    tcp://127.0.0.1:42413:
      host: 127.0.0.1
      id: '1'
      last_seen: '1635341749.588775'
      local_directory: /home/runner/work/distributed/distributed/dask-worker-space/worker-uj4ir590
      memory_limit: '7291699200'
      metrics:
        bandwidth:
          total: '100000000'
          types: {}
          workers: {}
        cpu: '4.0'
        executing: '0'
        in_flight: '0'
        in_memory: '0'
        memory: '135135232'
        num_fds: '43'
        read_bytes: '0.0'
        read_bytes_disk: '0.0'
        ready: '0'
        spilled_nbytes: '0'
        time: '1635341749.5856717'
        write_bytes: '0.0'
        write_bytes_disk: '0.0'
      name: '1'
      nanny: null
      nthreads: '2'
      resources: {}
      services:
        dashboard: '41997'
      type: Worker
worker_info:
  tcp://127.0.0.1:41249:
    address: tcp://127.0.0.1:41249
    config:
      array:
        chunk-size: 128MiB
        rechunk-threshold: '4'
        slicing:
          split-large-chunks: null
        svg:
          size: '120'
      dataframe:
        parquet:
          metadata-task-size-local: '512'
          metadata-task-size-remote: '16'
        shuffle-compression: null
      distributed:
        adaptive:
          interval: 1s
          maximum: inf
          minimum: '0'
          target-duration: 5s
          wait-count: '3'
        admin:
          event-loop: tornado
          log-format: '%(name)s - %(levelname)s - %(message)s'
          log-length: '10000'
          max-error-length: '10000'
          pdb-on-err: 'False'
          system-monitor:
            interval: 500ms
          tick:
            interval: 20ms
            limit: 3s
        client:
          heartbeat: 5s
          scheduler-info-interval: 2s
        comm:
          compression: auto
          default-scheme: tcp
          offload: 10MiB
          recent-messages-log-length: '0'
          require-encryption: null
          retry:
            count: '0'
            delay:
              max: 20s
              min: 1s
          shard: 64MiB
          socket-backlog: '2048'
          timeouts:
            connect: 5s
            tcp: 30s
          tls:
            ca-file: null
            ciphers: null
            client:
              cert: null
              key: null
            scheduler:
              cert: null
              key: null
            worker:
              cert: null
              key: null
          ucx:
            cuda_copy: 'False'
            infiniband: 'False'
            net-devices: null
            nvlink: 'False'
            rdmacm: 'False'
            reuse-endpoints: null
            tcp: 'False'
          websockets:
            shard: 8MiB
          zstd:
            level: '3'
            threads: '0'
        dashboard:
          export-tool: 'False'
          graph-max-items: '5000'
          link: '{scheme}://{host}:{port}/status'
          prometheus:
            namespace: dask
        deploy:
          cluster-repr-interval: 500ms
          lost-worker-timeout: 15s
        diagnostics:
          computations:
            ignore-modules: !!python/tuple
            - distributed
            - dask
            - xarray
            - cudf
            - cuml
            - prefect
            - xgboost
            max-history: '100'
          nvml: 'True'
        nanny:
          environ:
            MALLOC_TRIM_THRESHOLD_: '65536'
            MKL_NUM_THREADS: '1'
            OMP_NUM_THREADS: '1'
          preload: !!python/tuple []
          preload-argv: !!python/tuple []
        rmm:
          pool-size: null
        scheduler:
          active-memory-manager:
            interval: 2s
            policies: !!python/tuple
            - class: distributed.active_memory_manager.ReduceReplicas
            start: 'False'
          allowed-failures: '3'
          allowed-imports: !!python/tuple
          - dask
          - distributed
          bandwidth: '100000000'
          blocked-handlers: !!python/tuple []
          dashboard:
            bokeh-application:
              allow_websocket_origin: !!python/tuple
              - '*'
              check_unused_sessions_milliseconds: '500'
              keep_alive_milliseconds: '500'
            status:
              task-stream-length: '1000'
            tasks:
              task-stream-length: '100000'
            tls:
              ca-file: null
              cert: null
              key: null
          default-data-size: 1kiB
          default-task-durations:
            rechunk-split: 1us
            split-shuffle: 1us
          events-cleanup-delay: 1h
          events-log-length: '100000'
          http:
            routes: !!python/tuple
            - distributed.http.scheduler.prometheus
            - distributed.http.scheduler.info
            - distributed.http.scheduler.json
            - distributed.http.health
            - distributed.http.proxy
            - distributed.http.statics
          idle-timeout: null
          locks:
            lease-timeout: 30s
            lease-validation-interval: 10s
          pickle: 'True'
          preload: !!python/tuple []
          preload-argv: !!python/tuple []
          transition-log-length: '100000'
          unknown-task-duration: 500ms
          validate: 'False'
          work-stealing: 'True'
          work-stealing-interval: 100ms
          worker-ttl: null
        version: '2'
        worker:
          blocked-handlers: !!python/tuple []
          connections:
            incoming: '10'
            outgoing: '50'
          daemon: 'True'
          http:
            routes: !!python/tuple
            - distributed.http.worker.prometheus
            - distributed.http.health
            - distributed.http.statics
          lifetime:
            duration: null
            restart: 'False'
            stagger: 0 seconds
          memory:
            pause: '0.8'
            rebalance:
              measure: optimistic
              recipient-max: '0.6'
              sender-min: '0.3'
              sender-recipient-gap: '0.1'
            recent-to-old-time: 30s
            spill: '0.7'
            target: '0.6'
            terminate: '0.95'
          multiprocessing-method: spawn
          preload: !!python/tuple []
          preload-argv: !!python/tuple []
          profile:
            cycle: 1000ms
            interval: 10ms
            low-level: 'False'
          resources: {}
          use-file-locking: 'True'
          validate: 'False'
      optimization:
        fuse:
          active: null
          ave-width: '1'
          max-depth-new-edges: null
          max-height: inf
          max-width: null
          rename-keys: 'True'
          subgraphs: null
      scheduler: dask.distributed
      shuffle: tasks
      temporary-directory: null
      tokenize:
        ensure-deterministic: 'False'
    constrained: !!python/tuple []
    executing_count: '0'
    id: Worker-5bd7c8a8-6ca3-4457-ac0f-0c49fa1e56f5
    in_flight_tasks: '0'
    in_flight_workers: {}
    incoming_transfer_log: !!python/tuple []
    log: !!python/tuple []
    logs: !!python/tuple
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:41249'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:41249'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -          dashboard at:            127.0.0.1:34057'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:38993'
    - !!python/tuple
      - INFO
      - distributed.worker - INFO - -------------------------------------------------
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -               Threads:                          1'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -                Memory:                   6.79
        GiB'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -       Local Directory: /home/runner/work/distributed/distributed/dask-worker-space/worker-eru8w8qn'
    - !!python/tuple
      - INFO
      - distributed.worker - INFO - -------------------------------------------------
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:42413'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:42413'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -          dashboard at:            127.0.0.1:41997'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:38993'
    - !!python/tuple
      - INFO
      - distributed.worker - INFO - -------------------------------------------------
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -               Threads:                          2'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -                Memory:                   6.79
        GiB'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -       Local Directory: /home/runner/work/distributed/distributed/dask-worker-space/worker-uj4ir590'
    - !!python/tuple
      - INFO
      - distributed.worker - INFO - -------------------------------------------------
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:38993'
    - !!python/tuple
      - INFO
      - distributed.worker - INFO - -------------------------------------------------
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:38993'
    - !!python/tuple
      - INFO
      - distributed.worker - INFO - -------------------------------------------------
    long_running: !!python/tuple []
    memory_limit: '7291699200'
    memory_pause_fraction: '0.8'
    memory_spill_fraction: '0.7'
    memory_target_fraction: '0.6'
    ncores: '1'
    nthreads: '1'
    outgoing_transfer_log: !!python/tuple []
    ready: !!python/tuple []
    scheduler: tcp://127.0.0.1:38993
    status: '<Status.running: ''running''>'
    tasks: {}
    thread_id: '140209022129984'
    type: Worker
  tcp://127.0.0.1:42413:
    address: tcp://127.0.0.1:42413
    config:
      array:
        chunk-size: 128MiB
        rechunk-threshold: '4'
        slicing:
          split-large-chunks: null
        svg:
          size: '120'
      dataframe:
        parquet:
          metadata-task-size-local: '512'
          metadata-task-size-remote: '16'
        shuffle-compression: null
      distributed:
        adaptive:
          interval: 1s
          maximum: inf
          minimum: '0'
          target-duration: 5s
          wait-count: '3'
        admin:
          event-loop: tornado
          log-format: '%(name)s - %(levelname)s - %(message)s'
          log-length: '10000'
          max-error-length: '10000'
          pdb-on-err: 'False'
          system-monitor:
            interval: 500ms
          tick:
            interval: 20ms
            limit: 3s
        client:
          heartbeat: 5s
          scheduler-info-interval: 2s
        comm:
          compression: auto
          default-scheme: tcp
          offload: 10MiB
          recent-messages-log-length: '0'
          require-encryption: null
          retry:
            count: '0'
            delay:
              max: 20s
              min: 1s
          shard: 64MiB
          socket-backlog: '2048'
          timeouts:
            connect: 5s
            tcp: 30s
          tls:
            ca-file: null
            ciphers: null
            client:
              cert: null
              key: null
            scheduler:
              cert: null
              key: null
            worker:
              cert: null
              key: null
          ucx:
            cuda_copy: 'False'
            infiniband: 'False'
            net-devices: null
            nvlink: 'False'
            rdmacm: 'False'
            reuse-endpoints: null
            tcp: 'False'
          websockets:
            shard: 8MiB
          zstd:
            level: '3'
            threads: '0'
        dashboard:
          export-tool: 'False'
          graph-max-items: '5000'
          link: '{scheme}://{host}:{port}/status'
          prometheus:
            namespace: dask
        deploy:
          cluster-repr-interval: 500ms
          lost-worker-timeout: 15s
        diagnostics:
          computations:
            ignore-modules: !!python/tuple
            - distributed
            - dask
            - xarray
            - cudf
            - cuml
            - prefect
            - xgboost
            max-history: '100'
          nvml: 'True'
        nanny:
          environ:
            MALLOC_TRIM_THRESHOLD_: '65536'
            MKL_NUM_THREADS: '1'
            OMP_NUM_THREADS: '1'
          preload: !!python/tuple []
          preload-argv: !!python/tuple []
        rmm:
          pool-size: null
        scheduler:
          active-memory-manager:
            interval: 2s
            policies: !!python/tuple
            - class: distributed.active_memory_manager.ReduceReplicas
            start: 'False'
          allowed-failures: '3'
          allowed-imports: !!python/tuple
          - dask
          - distributed
          bandwidth: '100000000'
          blocked-handlers: !!python/tuple []
          dashboard:
            bokeh-application:
              allow_websocket_origin: !!python/tuple
              - '*'
              check_unused_sessions_milliseconds: '500'
              keep_alive_milliseconds: '500'
            status:
              task-stream-length: '1000'
            tasks:
              task-stream-length: '100000'
            tls:
              ca-file: null
              cert: null
              key: null
          default-data-size: 1kiB
          default-task-durations:
            rechunk-split: 1us
            split-shuffle: 1us
          events-cleanup-delay: 1h
          events-log-length: '100000'
          http:
            routes: !!python/tuple
            - distributed.http.scheduler.prometheus
            - distributed.http.scheduler.info
            - distributed.http.scheduler.json
            - distributed.http.health
            - distributed.http.proxy
            - distributed.http.statics
          idle-timeout: null
          locks:
            lease-timeout: 30s
            lease-validation-interval: 10s
          pickle: 'True'
          preload: !!python/tuple []
          preload-argv: !!python/tuple []
          transition-log-length: '100000'
          unknown-task-duration: 500ms
          validate: 'False'
          work-stealing: 'True'
          work-stealing-interval: 100ms
          worker-ttl: null
        version: '2'
        worker:
          blocked-handlers: !!python/tuple []
          connections:
            incoming: '10'
            outgoing: '50'
          daemon: 'True'
          http:
            routes: !!python/tuple
            - distributed.http.worker.prometheus
            - distributed.http.health
            - distributed.http.statics
          lifetime:
            duration: null
            restart: 'False'
            stagger: 0 seconds
          memory:
            pause: '0.8'
            rebalance:
              measure: optimistic
              recipient-max: '0.6'
              sender-min: '0.3'
              sender-recipient-gap: '0.1'
            recent-to-old-time: 30s
            spill: '0.7'
            target: '0.6'
            terminate: '0.95'
          multiprocessing-method: spawn
          preload: !!python/tuple []
          preload-argv: !!python/tuple []
          profile:
            cycle: 1000ms
            interval: 10ms
            low-level: 'False'
          resources: {}
          use-file-locking: 'True'
          validate: 'False'
      optimization:
        fuse:
          active: null
          ave-width: '1'
          max-depth-new-edges: null
          max-height: inf
          max-width: null
          rename-keys: 'True'
          subgraphs: null
      scheduler: dask.distributed
      shuffle: tasks
      temporary-directory: null
      tokenize:
        ensure-deterministic: 'False'
    constrained: !!python/tuple []
    executing_count: '0'
    id: Worker-bc9ba117-6ebd-4c25-8ff7-c181040cf688
    in_flight_tasks: '0'
    in_flight_workers: {}
    incoming_transfer_log: !!python/tuple []
    log: !!python/tuple []
    logs: !!python/tuple
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:41249'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:41249'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -          dashboard at:            127.0.0.1:34057'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:38993'
    - !!python/tuple
      - INFO
      - distributed.worker - INFO - -------------------------------------------------
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -               Threads:                          1'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -                Memory:                   6.79
        GiB'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -       Local Directory: /home/runner/work/distributed/distributed/dask-worker-space/worker-eru8w8qn'
    - !!python/tuple
      - INFO
      - distributed.worker - INFO - -------------------------------------------------
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:42413'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:42413'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -          dashboard at:            127.0.0.1:41997'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:38993'
    - !!python/tuple
      - INFO
      - distributed.worker - INFO - -------------------------------------------------
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -               Threads:                          2'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -                Memory:                   6.79
        GiB'
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -       Local Directory: /home/runner/work/distributed/distributed/dask-worker-space/worker-uj4ir590'
    - !!python/tuple
      - INFO
      - distributed.worker - INFO - -------------------------------------------------
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:38993'
    - !!python/tuple
      - INFO
      - distributed.worker - INFO - -------------------------------------------------
    - !!python/tuple
      - INFO
      - 'distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:38993'
    - !!python/tuple
      - INFO
      - distributed.worker - INFO - -------------------------------------------------
    long_running: !!python/tuple []
    memory_limit: '7291699200'
    memory_pause_fraction: '0.8'
    memory_spill_fraction: '0.7'
    memory_target_fraction: '0.6'
    ncores: '2'
    nthreads: '2'
    outgoing_transfer_log: !!python/tuple []
    ready: !!python/tuple []
    scheduler: tcp://127.0.0.1:38993
    status: '<Status.running: ''running''>'
    tasks: {}
    thread_id: '140209022129984'
    type: Worker

@jrbourbeau (Member) left a comment

Thanks for putting this together @fjetter. Overall this looks like a nice addition

Resolved review threads: distributed/worker.py, distributed/client.py
Comment on lines 386 to 388
def to_dict(
    self, comm: Comm = None, *, exclude: Container[str] = None
) -> dict[str, str]:

Do we want users directly interacting with this (and other) to_dict methods? If not, I'd prefer to prepend a leading underscore to the method name. My sense is Client.dump_cluster_state is the main user-facing entrypoint for this feature

Resolved review threads: distributed/scheduler.py, distributed/tests/test_client.py, .github/workflows/tests.yaml, distributed/client.py (two threads), distributed/utils.py, distributed/utils_test.py
@jrbourbeau (Member)

Would including the output of client.get_versions() be useful here? This is often one of the first questions I ask users reporting issues
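
For reference, this is a single extra call on the client; the result is a dict with "client", "scheduler" and "workers" sections:

    # Python, dask, distributed, etc. versions across the whole cluster
    versions = client.get_versions()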

@fjetter self-assigned this Nov 5, 2021
@fjetter (Member, Author) commented Nov 5, 2021

I still get broken docs builds due to typing_extensions, even though it has already been imported elsewhere 🤔
Note: I had the TYPE_CHECKING guard in place. I'll wait for another build since I pushed more changes.

Now it works

@fjetter mentioned this pull request Nov 9, 2021
@fjetter (Member, Author) commented Nov 9, 2021

Added the version, good point.

I think the only topic left to address is whether or not we prefix to_dict with an underscore. I have a slight preference not to, but if that's a blocker, I will change it.

Friendly ping @jrbourbeau if that's ok for you

@jrbourbeau (Member)

Thanks for all the updates @fjetter. For the sake of trying to be more intentional about our public API, I'd prefer to use leading underscores. I pushed a small commit to make the to_dict -> _to_dict changes. It will be relatively straightforward to move these methods into the public API in the future (I'm happy to handle that too)
