Adapt to v2.0.0 core #19

Draft
wants to merge 9 commits into
base: master
4 changes: 4 additions & 0 deletions .gitignore
@@ -141,3 +141,7 @@ _logs/
_rendered/
*instances.yaml
nohup.out

# cache
enwiki-latest-abstract.xml
wiki_dump.gz
3 changes: 2 additions & 1 deletion distributed/data.py
@@ -5,7 +5,8 @@
from typing import Dict, Callable

from jina.parsers import set_client_cli_parser
from jina.clients import Client, WebSocketClient
from jina.clients import Client
from jina.clients.websocket import WebSocketClient
from pydantic import validate_arguments

from logger import logger
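
The import change above moves `WebSocketClient` from `jina.clients` to `jina.clients.websocket`. If the scripts have to run against both the old and the new core during the migration, a fallback import is one option. The following is a minimal sketch, not part of this PR; `import_first` is a hypothetical helper name, and the candidate module paths are the two that appear in the diff.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that provides it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # this module layout does not exist in the installed version
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f'{name} not found in any of: {", ".join(module_paths)}')


# Hypothetical usage for the import moved in this diff (requires jina):
# WebSocketClient = import_first(
#     'WebSocketClient', ['jina.clients.websocket', 'jina.clients']
# )
```

Listing the new location first means the old path is only tried when the new one is missing.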
3 changes: 2 additions & 1 deletion distributed/helper.py
@@ -16,7 +16,8 @@
import yaml
import chevron
import numpy as np
from jina import Document, Request
from jina import Document
from jina.types.request import Request
from jinacld_tools.aws.services.s3 import S3Bucket
from pydantic import FilePath, validate_arguments

Empty file added distributed/wiki/__init__.py
Empty file.
38 changes: 0 additions & 38 deletions distributed/wiki/annoy_indexer.yml

This file was deleted.

23 changes: 23 additions & 0 deletions distributed/wiki/app.py
@@ -0,0 +1,23 @@
from jina import Flow, Document, DocumentArray

f = Flow.load_config(
    './local/index.yml'
)

d1 = Document(id=1, text='foo1 is foo fool full fu')
d2 = Document(id=2, text='foo2 is foo fool full fu')
d3 = Document(id=3, text='foo3 is foo fool full fu')


def print_matches(req):  # the callback function invoked when task is done
    for idx, d in enumerate(req.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.score.value:2f}: "{d.text}"')


with f:
    f.index(inputs=DocumentArray([d1, d2, d3]))

with Flow.load_config(
Contributor:

Do we need this file in the PR? It looks like a local test that won't be used in the actual stress test. If you think it's handy to have as a local test entry point, maybe generalize it; otherwise remove it?

Member (@bwanglzu, Jun 14, 2021):

@jacobowitz This will be removed once the flow runs successfully. It is only for testing and shouldn't be committed to the PR. Good catch!

Author:

Thanks for your review. This PR is still in progress locally.

    './local/query.yml'
) as f:
    f.search(inputs=d2, on_done=print_matches)
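
The review comment on this file suggests generalizing it if it is kept as a local test entry point. One way that could look is sketched below. This is an assumption about a possible shape, not code from the PR: the flag names (`--index-flow`, `--query-flow`, `--top-k`, `--texts`) and the functions `parse_args`, `make_printer`, and `main` are hypothetical, and the Jina calls are limited to the ones `app.py` already uses.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; the defaults mirror the values hard-coded in app.py."""
    parser = argparse.ArgumentParser(description='Local smoke test for the wiki flows')
    parser.add_argument('--index-flow', default='./local/index.yml')
    parser.add_argument('--query-flow', default='./local/query.yml')
    parser.add_argument('--top-k', type=int, default=3)
    parser.add_argument(
        '--texts',
        nargs='+',
        default=[
            'foo1 is foo fool full fu',
            'foo2 is foo fool full fu',
            'foo3 is foo fool full fu',
        ],
    )
    return parser.parse_args(argv)


def make_printer(top_k):
    """Build an on_done callback that prints the first `top_k` matches."""
    def print_matches(req):
        for idx, d in enumerate(req.docs[0].matches[:top_k]):
            print(f'[{idx}]{d.score.value:.2f}: "{d.text}"')
    return print_matches


def main(argv=None):
    args = parse_args(argv)
    # Imported lazily so that argument parsing works without jina installed.
    from jina import Flow, Document, DocumentArray

    documents = [Document(text=text) for text in args.texts]
    with Flow.load_config(args.index_flow) as f:
        f.index(inputs=DocumentArray(documents))
    with Flow.load_config(args.query_flow) as f:
        f.search(inputs=documents[0], on_done=make_printer(args.top_k))
```

Calling `main()` with no arguments would reproduce what the current script does; passing other flow files or texts reuses the same entry point for different local checks.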
36 changes: 0 additions & 36 deletions distributed/wiki/chunk_indexer.yml

This file was deleted.

10 changes: 0 additions & 10 deletions distributed/wiki/chunk_merger.yml

This file was deleted.

32 changes: 0 additions & 32 deletions distributed/wiki/doc.yml

This file was deleted.

15 changes: 0 additions & 15 deletions distributed/wiki/encoder.yml

This file was deleted.

43 changes: 13 additions & 30 deletions distributed/wiki/local/index.yml
@@ -1,39 +1,22 @@
jtype: Flow
version: '1'
with:
rest_api: {{ JINA_GATEWAY_REST }}
port_expose: {{ JINA_GATEWAY_PORT_EXPOSE }}
workspace: $JINA_WORKDIR
py_modules:
- wiki_executors.py
pods:
- name: segmenter
polling: any
shards: {{ JINA_SEGMENTER_SHARDS }}
Contributor:

Why are we removing the sharding? I think testing it as part of the stress test is one of the added values here.

Member:

Same as above.

Author:

Maybe I should convert this PR to a draft. Let me fix its status.

uses: segment.yml
scheduling: {{ JINA_SCHEDULING }}
read_only: true
timeout_ready: 100000
uses:
jtype: Segmenter
- name: encoder
polling: any
scheduling: {{ JINA_SCHEDULING }}
uses: encoder.yml
shards: {{ JINA_ENCODER_SHARDS }}
timeout_ready: 100000
read_only: true
uses:
jtype: TextEncoder
- name: vec_idx
polling: any
scheduling: {{ JINA_SCHEDULING }}
uses: annoy_indexer.yml
shards: {{ JINA_VEC_INDEXER_SHARDS }}
timeout_ready: 100000
uses:
jtype: AnnoyIndexer
- name: doc_idx
polling: any
scheduling: {{ JINA_SCHEDULING }}
uses: doc.yml
shards: {{ JINA_KV_INDEXER_SHARDS }}
needs: gateway
timeout_ready: 100000
uses:
jtype: KeyValueIndexer
needs: segmenter
- name: join_all
method: needs
uses: _merge
needs: [ doc_idx, vec_idx ]
read_only: true
timeout_ready: 100000
needs: [vec_idx, doc_idx]
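
The review thread in this file asks about keeping sharding in the stress test. The shard counts in the original YAML are `{{ NAME }}` placeholders, so they are filled in before the Flow is loaded. The sketch below illustrates that substitution step with the standard library only. It is a stand-in, assuming simple `{{ NAME }}` placeholders, and is not the repository's actual rendering code (`distributed/helper.py` imports `chevron`); the `render` function is hypothetical, and the pod fields are copied from lines in this diff.

```python
import re

# Matches placeholders of the form {{ NAME }} with optional inner whitespace.
PLACEHOLDER = re.compile(r'\{\{\s*(\w+)\s*\}\}')


def render(template, values):
    """Replace each {{ NAME }} with str(values[NAME]); raise KeyError on a missing name."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f'no value supplied for placeholder {key}')
        return str(values[key])
    return PLACEHOLDER.sub(substitute, template)


# Pod entry built from fields that appear in this diff.
pod_template = (
    '- name: segmenter\n'
    '  polling: any\n'
    '  shards: {{ JINA_SEGMENTER_SHARDS }}\n'
)

rendered = render(pod_template, {'JINA_SEGMENTER_SHARDS': 2})
print(rendered)
```

Failing on a missing name, rather than rendering an empty value, keeps a forgotten shard setting from silently producing an invalid Flow file.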
43 changes: 15 additions & 28 deletions distributed/wiki/local/query.yml
@@ -1,36 +1,23 @@
jtype: Flow
version: '1'
with:
read_only: true
rest_api: {{ JINA_GATEWAY_REST }}
port_expose: {{ JINA_GATEWAY_PORT_EXPOSE }}
workspace: $JINA_WORKDIR
py_modules:
- wiki_executors.py
pods:
- name: segmenter
polling: all
shards: {{ JINA_SEGMENTER_SHARDS }}
uses: segment.yml
read_only: true
uses:
jtype: Segmenter
- name: encoder
polling: all
scheduling: {{ JINA_SCHEDULING }}
uses: encoder.yml
shards: {{ JINA_ENCODER_SHARDS }}
uses_after: chunk_merger.yml
timeout_ready: -1
read_only: true
uses:
jtype: TextEncoder
- name: vec_idx
scheduling: {{ JINA_SCHEDULING }}
uses: annoy_indexer.yml
shards: {{ JINA_VEC_INDEXER_SHARDS }}
polling: all
uses_after: chunk_merger.yml
timeout_ready: -1
- name: ranker
polling: all
shards: {{ JINA_RANKER_SHARDS }}
uses: ranker.yml
uses:
jtype: AnnoyIndexer
- name: doc_idx
uses: doc.yml
shards: {{ JINA_KV_INDEXER_SHARDS }}
polling: all
timeout_ready: 100000
uses:
jtype: KeyValueIndexer
- name: ranker
uses:
jtype: AggregateRanker

10 changes: 0 additions & 10 deletions distributed/wiki/ranker.yml

This file was deleted.

5 changes: 0 additions & 5 deletions distributed/wiki/segment.yml

This file was deleted.

38 changes: 0 additions & 38 deletions distributed/wiki/segmenters.py

This file was deleted.
