
Add new hydrophones to ML pipeline & document process #128

Open
2 tasks
scottveirs opened this issue Aug 2, 2023 · 24 comments
Assignees
Labels
2023-hackathon: Goals or topics for the 2023 annual Microsoft hackathon
2024-hackathon: Goals or issues for the 2024 annual Microsoft hackathon
documentation: Improvements or additions to documentation
inference system: Code to perform inference with the trained model(s)

Comments

@scottveirs
Member

scottveirs commented Aug 2, 2023

In 2024, we are excited to add the North San Juan Channel hydrophone, which was just repaired and resumed streaming last week!

In 2023, the number of active nodes in the network increased from 3 to 7, with these locations ready for production:

[Screenshot, 2023-08-01: map of the active hydrophone locations]

By the time of the 2023 Microsoft hackathon, the current nodes and some of their metadata should be accessible programmatically via a new Orcasound API.

@scottveirs scottveirs self-assigned this Aug 2, 2023
@scottveirs scottveirs added the documentation, inference system, and 2023-hackathon labels Aug 2, 2023
@micya
Member

micya commented Aug 2, 2023

Steps involved per location:

  1. Add configuration file: see Port Townsend config for reference. Place new file in same directory.
  2. Modify last line of Dockerfile to point to new config (NOTE: we should move away from having to bake the config file into the docker image so that we can build one image and specify the relevant configs externally).
  3. Build docker container: https://github.com/orcasound/aifororcas-livesystem/tree/main/InferenceSystem#building-the-docker-container-for-production
  4. Push docker image to Azure Container Registry: https://github.com/orcasound/aifororcas-livesystem/tree/main/InferenceSystem#pushing-your-image-to-azure-container-registry
  5. Deploy to Azure Kubernetes Service: https://github.com/orcasound/aifororcas-livesystem/tree/main/InferenceSystem#deploying-an-updated-docker-build-to-azure-kubernetes-service (create namespace, secret, deployment)
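Steps 3-5 above might look roughly like the following. This is a hedged sketch, not the project's actual commands: ACR_NAME, LOCATION, and the manifest path are placeholder values I've invented for illustration.

```shell
# Hypothetical sketch of steps 3-5. ACR_NAME, LOCATION, and the manifest
# path are placeholders, not the project's actual names.
ACR_NAME=myregistry
LOCATION=sunset_bay

# 3. Build the production image (the config is currently baked in via the Dockerfile)
docker build -t "inference-system:$LOCATION" ./InferenceSystem

# 4. Tag and push the image to Azure Container Registry
az acr login --name "$ACR_NAME"
docker tag "inference-system:$LOCATION" "$ACR_NAME.azurecr.io/inference-system:$LOCATION"
docker push "$ACR_NAME.azurecr.io/inference-system:$LOCATION"

# 5. Deploy to Azure Kubernetes Service: namespace, secret, then deployment
kubectl create namespace "$LOCATION"
kubectl create secret generic inference-secret --namespace "$LOCATION" --from-env-file=.env
kubectl apply --namespace "$LOCATION" -f "deploy/$LOCATION.yaml"
```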

@micya
Member

micya commented Aug 2, 2023

Need to check with @micowan on whether anything needs to be done for moderator portal.

@micowan
Collaborator

micowan commented Aug 3, 2023 via email

@micya
Member

micya commented Aug 3, 2023

From the description, I don't believe the inference system has been brought up yet, so there are no records in Cosmos DB yet.

No additional handling is needed for the inference system -> Cosmos DB path, since Cosmos DB is really just storing a blob of JSON, which accepts any arbitrary string.

@catskids3
Contributor

Checked the code. We did in fact turn the locations into a config setting last go-round, so from the UI perspective, adding locations should be as simple as updating that config with the new ones. I know Scott showed a spreadsheet or API or something during last week's discussion that listed the locations. If they are updating that themselves and we can pull from it, we could make the list "live" instead of a config setting. But that is just a thought.

@scottveirs
Member Author

Hey @micowan et al! I see two possible routes to updating the config file, or more dynamically managing the ML pipeline:

  1. The orcasite wiki lists a recent dump of the feeds table, which I could update this weekend for the hackathon.
  2. Recent Orcasound backend improvements make it possible to access the feeds table itself programmatically, e.g. via https://beta.orcasound.net/graphiql with queries like:

{
  feeds {
    nodeName
  }
}
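As a sketch, the same query could also be sent non-interactively with curl. Note the /graphql endpoint path is my assumption based on the /graphiql browser UI and may differ in the actual backend.

```shell
# Hypothetical: POST the feeds query to the GraphQL endpoint.
# The /graphql path is assumed from the /graphiql UI and may differ.
curl -s -X POST "https://beta.orcasound.net/graphql" \
  -H "Content-Type: application/json" \
  -d '{"query": "{ feeds { nodeName } }"}'
```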

@scottveirs
Member Author

Also @micowan, I mentioned to @skanderm that your existing config file held JSON, so he said he could work on a new API endpoint that could provide JSON to you...

@skanderm

skanderm commented Sep 9, 2023

You should be able to get an updated list here: https://beta.orcasound.net/api/json/feeds

You may need to set these headers as well:
curl -s -H "Content-Type: application/vnd.api+json" -H "Accept: application/vnd.api+json" https://beta.orcasound.net/api/json/feeds
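The same request could be made from Python, mirroring the curl command above. This is an illustrative sketch using only the standard library; the helper names are mine, not part of any existing codebase.

```python
import json
import urllib.request

FEEDS_PATH = "/api/json/feeds"
JSONAPI_HEADERS = {
    "Content-Type": "application/vnd.api+json",
    "Accept": "application/vnd.api+json",
}

def build_feeds_request(base_url="https://beta.orcasound.net"):
    """Build a request equivalent to the curl command above."""
    return urllib.request.Request(base_url + FEEDS_PATH, headers=JSONAPI_HEADERS)

def fetch_feeds(base_url="https://beta.orcasound.net"):
    """Fetch and decode the feeds list as JSON."""
    with urllib.request.urlopen(build_feeds_request(base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```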

@catskids3
Contributor

@scottveirs and @skanderm, the url: https://beta.orcasound.net/api/json/feeds was absolutely perfect!

I have already added this to a new hydrophones endpoint in the API so that we can access it from the UI. I also brought in the URL and HTML fields in case it makes sense to add them to the UI somewhere.

Thanks!!!

@skanderm

skanderm commented Sep 9, 2023

Glad you found it useful! Will the config be modifiable? We’re planning to deploy the changes to https://live.orcasound.net at some point.

@catskids3
Contributor

If I am understanding the question correctly, yes. We will be able to change the URL we are pointing to on the fly by updating the configuration setting in Azure.

@catskids3
Contributor

@scottveirs and @skanderm, a quick question: there is a hydrophone location you call Orcasound Lab. Can you confirm that this is the Haro Strait hydrophone that we reference in the Cosmos DB? And if so, which is the correct name/label? We may need coding/configuration changes on our end if it is "Orcasound Lab".

@scottveirs
Member Author

scottveirs commented Sep 13, 2023 via email

@micowan
Collaborator

micowan commented Sep 13, 2023

OK, great. Since we are changing the partition strategy, which requires a rebuild of the data set, I can take care of that one-off during the migration. We will need to speak with @micya or @pastorep about how it is marked coming out of the ML pipeline. Thanks for the feedback and quick turnaround.

@micya
Member

micya commented Sep 13, 2023

> OK, great. Since we are changing the partition strategy, which requires a rebuild of the data set, I can take care of that one-off during the migration. We will need to speak with @micya or @pastorep about how it is marked coming out of the ML pipeline. Thanks for the feedback and quick turnaround.

I found that location information is hardcoded in the inference system script:

ORCASOUND_LAB_LOCATION = {"id": "rpi_orcasound_lab", "name": "Haro Strait", "longitude": -123.17357, "latitude": 48.55833}
PORT_TOWNSEND_LOCATION = {"id": "rpi_port_townsend", "name": "Port Townsend", "longitude": -122.76045, "latitude": 48.13569}
BUSH_POINT_LOCATION = {"id": "rpi_bush_point", "name": "Bush Point", "longitude": -122.6039, "latitude": 48.03371}
source_guid_to_location = {"rpi_orcasound_lab" : ORCASOUND_LAB_LOCATION, "rpi_port_townsend" : PORT_TOWNSEND_LOCATION, "rpi_bush_point": BUSH_POINT_LOCATION}

We should probably pull that out and configure it via an environment variable.
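A minimal sketch of that suggestion, assuming a hypothetical INFERENCE_LOCATIONS environment variable holding the same table as JSON. The variable name is illustrative, not an existing setting, and the defaults mirror the hardcoded table above.

```python
import json
import os

# Defaults mirror the table currently hardcoded in the inference script.
DEFAULT_LOCATIONS = {
    "rpi_orcasound_lab": {"id": "rpi_orcasound_lab", "name": "Haro Strait",
                          "longitude": -123.17357, "latitude": 48.55833},
    "rpi_port_townsend": {"id": "rpi_port_townsend", "name": "Port Townsend",
                          "longitude": -122.76045, "latitude": 48.13569},
    "rpi_bush_point": {"id": "rpi_bush_point", "name": "Bush Point",
                       "longitude": -122.6039, "latitude": 48.03371},
}

def load_locations():
    """Read the location table from the (hypothetical) INFERENCE_LOCATIONS
    env var, a JSON object keyed by hydrophone id; fall back to the
    hardcoded defaults when the variable is unset."""
    raw = os.environ.get("INFERENCE_LOCATIONS")
    if raw:
        return json.loads(raw)
    return DEFAULT_LOCATIONS
```

With this shape, adding a new hydrophone becomes a deployment-time configuration change rather than a code change and image rebuild.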

@micowan
Collaborator

micowan commented Sep 13, 2023

Michelle, thanks for finding that. Also, if you are going to be changing the data port, we will want to incorporate the changes I requested earlier: remove the reviewed and SRKWFound properties (I may have these spelled wrong) and replace them with a new property called "state", populated with the term "Unreviewed"; "state" is also the new partition key. We also need a new property called "locationName" at the top level of the JSON that duplicates the name in the Location portion of the JSON.
Thanks.
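A hedged sketch of the document change being requested above. The property names ("reviewed", "SRKWFound") are taken from the comment and may be spelled differently in the live schema, as noted.

```python
# Hypothetical migration for one Cosmos DB detection document, per the
# schema change described above. Property names may differ in the live data.
def migrate_detection(doc: dict) -> dict:
    doc = dict(doc)                      # shallow copy; don't mutate the input
    doc.pop("reviewed", None)            # drop the old moderation flags
    doc.pop("SRKWFound", None)
    doc["state"] = "Unreviewed"          # new partition key value
    # Duplicate the location name at the top level of the JSON.
    doc["locationName"] = doc.get("location", {}).get("name")
    return doc
```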

@scottveirs
Member Author

@micowan and @micya --

@salsal97 is teaching David and me here in Redmond how to add the new Sunset Bay location to the ML pipeline.

If the new model deployment creates a candidate, will it now show up in the Moderator portal auto-magically? Or is some hardcoding of the new location needed within the UI portal code (i.e. the Sunset Bay metadata that's now available via the API provided by Skander)?

It looks like your recent pull request, Mike, might be the answer to my question?

Maybe Tara or someone else who knows C# could review the PR?

@catskids3
Contributor

catskids3 commented Sep 14, 2023 via email

@salsal97
Contributor

This PR should be a step toward getting this issue squared away: #136

@skanderm

skanderm commented Nov 1, 2023

Hi everyone! We've updated the live site. As referenced here: #128 (comment), please update the endpoint to https://live.orcasound.net/api/json/feeds. Thank you!

@micowan
Collaborator

micowan commented Nov 3, 2023

@skanderm, thanks for this. I have replaced the beta URL with the new one in the codebase I am working on.

@scottveirs scottveirs added the 2024-hackathon label Jul 20, 2024
@tanviraja24
Contributor

Based on https://live.orcasound.net/listen, are there any new hydrophones available to add?

@micowan
Collaborator

micowan commented Sep 16, 2024

Scott gave me a URL last year, https://live.orcasound.net/api/json/feeds, which lists 7 hydrophones (including Haro Strait as Orcasound Lab). I have changed the API to pull this list for all Moderator features (picklists, etc.).

@scottveirs
Member Author

scottveirs commented Sep 18, 2024

Before taking the steps that Michelle outlined, we need to account for a change that was recently made to the Amazon S3 buckets where the live audio data are stored. In the process of moving the data streams and archive to Amazon-sponsored buckets (and dramatically reducing our storage and egress costs), we had to rename the streaming data bucket.

  • The old name of the audio data bucket was streaming-orcasound-net
  • The new name of the bucket from which OrcaHello should acquire data is audio-orcasound-net

My understanding is that the S3 bucket URI is hard-coded into the Docker images for each location. Ideally, we'd move the audio data source URI/URL outside of the image and into a configuration file.

The other place I see the S3 bucket name hard-coded is in the Orchestrator.py code:

hydrophone_stream_url = 'https://s3-us-west-2.amazonaws.com/streaming-orcasound-net/' + hls_hydrophone_id
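One way to lift the bucket name out of Orchestrator.py, as a sketch: read it from an environment variable, defaulting to the new bucket. S3_AUDIO_BUCKET is a hypothetical variable name I'm using for illustration, not an existing setting.

```python
import os

# Hypothetical: take the bucket name from an env var instead of hardcoding
# it, defaulting to the new Amazon-sponsored bucket.
S3_AUDIO_BUCKET = os.environ.get("S3_AUDIO_BUCKET", "audio-orcasound-net")

def hydrophone_stream_url(hls_hydrophone_id):
    """Build the HLS stream URL for a hydrophone from the configured bucket."""
    return "https://s3-us-west-2.amazonaws.com/{}/{}".format(
        S3_AUDIO_BUCKET, hls_hydrophone_id)
```

This way a future bucket rename becomes a config change instead of an image rebuild per location.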

Steps involved per location:

  1. Add configuration file: see Port Townsend config for reference. Place new file in same directory.
  2. Modify last line of Dockerfile to point to new config (NOTE: we should move away from having to bake the config file into the docker image so that we can build one image and specify the relevant configs externally).
  3. Build docker container: https://github.com/orcasound/aifororcas-livesystem/tree/main/InferenceSystem#building-the-docker-container-for-production
  4. Push docker image to Azure Container Registry: https://github.com/orcasound/aifororcas-livesystem/tree/main/InferenceSystem#pushing-your-image-to-azure-container-registry
  5. Deploy to Azure Kubernetes Service: https://github.com/orcasound/aifororcas-livesystem/tree/main/InferenceSystem#deploying-an-updated-docker-build-to-azure-kubernetes-service (create namespace, secret, deployment)

Projects
Status: In Progress
7 participants