
cpt4.jar UMLS Endpoints #299

Open
wtroddy opened this issue Aug 11, 2022 · 12 comments
Labels
cpt4 tool issues around the CPT4 tool packaged with every vocabulary download

Comments

@wtroddy

wtroddy commented Aug 11, 2022

I'm trying to reconstitute the CPT4 vocabularies using the cpt4.sh file and am getting this error:

Exception in thread "main" org.odhsi.utils.cpt.Cpt4Exception: Cannot process CONCEPT.csv file. You can find more details in the logs/logfile.log file.
Reason: cannot request TGT
        at org.odhsi.utils.cpt.Application.main(Application.java:45)

I'm doing this in a VM that has a limited allowlist and suspect this is the problem. I've tested the same download/API key on another machine that isn't restricted and it works fine.

Right now we've allowlisted these base URLs, but I'm guessing I'm missing one (or some intermediate hops?):
https://uts-ws.nlm.nih.gov/
https://utslogin.nlm.nih.gov/
http://umlsks.nlm.nih.gov/

Is there any documentation on which UMLS API endpoints are required by cpt4.jar? If not, can you point me to source code so I can dig through the API calls?

Thanks in advance!

@alex-odysseus
Contributor

So far this is the correct list of base URLs:

https://uts-ws.nlm.nih.gov
http://umlsks.nlm.nih.gov
https://utslogin.nlm.nih.gov
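Firewall allowlists are usually written as host and port pairs, and the three base URLs above mix `https://` (port 443 by default) and `http://` (port 80 by default). A minimal sketch, assuming plain TCP reachability is the first thing you want to confirm: it derives the host:port each URL implies and then tries a direct connection from inside the JVM using `Proxy.NO_PROXY`, so no proxy settings are involved. The class name and the 5-second timeout are illustrative choices, not anything cpt4.jar itself uses.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;
import java.net.URI;
import java.util.List;

public class AllowlistCheck {
    // Base URLs named in this thread.
    static final List<String> BASE_URLS = List.of(
            "https://uts-ws.nlm.nih.gov",
            "http://umlsks.nlm.nih.gov",
            "https://utslogin.nlm.nih.gov");

    // Uses the explicit port if the URL has one, otherwise the scheme's default.
    static int portOf(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    // The "host:port" pair an allowlist rule would need to cover.
    static String endpointOf(String url) {
        URI uri = URI.create(url);
        return uri.getHost() + ":" + portOf(uri);
    }

    // Opens a direct TCP connection (explicitly bypassing any proxy) and reports the outcome.
    static String probe(String url, int timeoutMillis) {
        URI uri = URI.create(url);
        try (Socket socket = new Socket(Proxy.NO_PROXY)) {
            socket.connect(new InetSocketAddress(uri.getHost(), portOf(uri)), timeoutMillis);
            return endpointOf(url) + " reachable directly";
        } catch (IOException e) {
            return endpointOf(url) + " NOT reachable directly: " + e;
        }
    }

    public static void main(String[] args) {
        for (String url : BASE_URLS) {
            System.out.println(probe(url, 5000));
        }
    }
}
```

If `curl` or a Python script can reach these hosts while this probe reports timeouts, that would suggest those tools are going through a proxy (Python's `requests`, for example, honors the `HTTPS_PROXY` environment variable by default), which a JVM does not pick up from the environment on its own.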

@wtroddy
Author

wtroddy commented Aug 24, 2022

I've tested each of these URLs individually and can access them. I've also been able to generate a TGT against the same endpoints from a Python script, but I'm still getting an error with cpt4.jar. Here's the log file: logfile.log

I still suspect the timeout is related to our restricted allowlist. Is the source code for the jar file public anywhere? I can't seem to find it, but I'd like to add logging to see which HTTP request it is getting stuck on. Alternatively, do you know of another way to get more verbose logs showing where it gets stuck?
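The source of cpt4.jar isn't linked in this thread, so the sketch below does not show what the tool actually does. It is a standalone Java 11+ reproduction of the ticket-granting step, assuming the flow used in UMLS's published authentication samples: a form-encoded POST of `apikey` to `https://utslogin.nlm.nih.gov/cas/v1/api-key`, with the TGT URL returned in the `action` attribute of an HTML form. Progress is logged with timestamps so a stall shows where it happens, and because it runs in a JVM it exercises the JVM's networking, proxy, and TLS configuration, which a Python script does not. The `UMLS_API_KEY` environment variable name and the timeout values are assumptions for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TgtProbe {
    // Assumed endpoint, as used in UMLS's published authentication samples.
    static final URI TGT_ENDPOINT = URI.create("https://utslogin.nlm.nih.gov/cas/v1/api-key");

    // Matches the action attribute of the first <form> element in the response body.
    private static final Pattern FORM_ACTION =
            Pattern.compile("<form[^>]*action=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Form-encoded POST of the API key that asks for a ticket-granting ticket (TGT).
    static HttpRequest buildTgtRequest(String apiKey) {
        String body = "apikey=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(TGT_ENDPOINT)
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Extracts the TGT URL, assuming it is returned as the form's action attribute.
    static Optional<String> extractTgtUrl(String html) {
        Matcher m = FORM_ACTION.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Timestamped progress line, so a stall shows which step it happened in.
    static void log(String step) {
        System.out.println(Instant.now() + "  " + step);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String apiKey = System.getenv("UMLS_API_KEY"); // hypothetical variable name
        if (apiKey == null || apiKey.isEmpty()) {
            System.err.println("Set UMLS_API_KEY first.");
            return;
        }
        // No proxy() call: the client falls back to the JVM's default ProxySelector,
        // which reads -Dhttps.proxyHost / -Dhttps.proxyPort for https:// URLs.
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(15))
                .build();
        log("connecting and sending POST " + TGT_ENDPOINT);
        HttpResponse<String> response =
                client.send(buildTgtRequest(apiKey), HttpResponse.BodyHandlers.ofString());
        log("received HTTP " + response.statusCode());
        // Only report whether a TGT was found; the TGT itself is a credential.
        log(extractTgtUrl(response.body()).isPresent()
                ? "TGT URL found in response"
                : "no TGT URL found in response");
    }
}
```

If this hangs at the "connecting" line and then fails with a connect timeout, the problem is reaching the host at all rather than anything specific to the ticket request.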

@mik-ohdsi
Copy link

Hi @wtroddy - in this forum post, the TGT error was due to an invalid certificate.

@wtroddy
Author

wtroddy commented Aug 25, 2022

Hi @mik-ohdsi, awesome - thanks! I'd searched the forums but somehow missed this thread. I commented to see if they know which certificate was the problem but will do some investigation on our end in the interim.

@mik-ohdsi

Hi @wtroddy - were you able to fix it?
You could also try a new download from Athena, as CPT4.jar was just recently updated slightly.
If it works for you now, could you close the issue? If it doesn't, please tell us what is still not working.

@wtroddy
Author

wtroddy commented Sep 2, 2022

Hi @mik-ohdsi - thanks for the update. I've been out of the office and haven't gotten back to this. I'll be back in the office after the long weekend in the US. When I'm back, I'll try the updated JAR file and update this issue accordingly. Thanks!

@wtroddy
Author

wtroddy commented Sep 9, 2022

Just a quick update - I've tried the updated CPT4.jar and am still getting the same error. We're investigating the certificate possibility now but don't have any updates yet. Is there a way to get a more verbose message about why (or where in the process) the application can't request the TGT? I see a SocketTimeoutException in the log, but any additional information would help, since I can connect to the endpoints successfully with other programs.

Thanks.
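One way to separate the network hypothesis from the certificate one: in Java, a connection that never completes typically surfaces as a `SocketTimeoutException`, whereas a server certificate the JRE does not trust usually surfaces during the TLS handshake as an `SSLHandshakeException` (often with a "PKIX path building failed" message). The sketch below is a hypothetical helper, not part of cpt4.jar: it opens an HTTPS connection with explicit timeouts, prints the server's certificate chain if the handshake succeeds, and otherwise walks the exception's cause chain to label the failure. The categories are heuristics. For more detail from the JDK itself, the standard JSSE option `-Djavax.net.debug=ssl:handshake` can be added to any `java` command line to print handshake details.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.UnknownHostException;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLHandshakeException;

public class FailureCheck {
    // Heuristic label for a connection failure, based on the exception types in its cause chain.
    static String classify(Throwable failure) {
        // Walk the cause chain (bounded), since HTTP libraries often wrap the original exception.
        Throwable t = failure;
        for (int depth = 0; t != null && depth < 10; depth++, t = t.getCause()) {
            if (t instanceof SocketTimeoutException) {
                return "timeout: no timely answer (firewall, allowlist, or missing proxy?)";
            }
            if (t instanceof SSLHandshakeException) {
                return "tls: handshake failed (certificate not trusted by this JRE?)";
            }
            if (t instanceof UnknownHostException) {
                return "dns: host name could not be resolved";
            }
            if (t instanceof ConnectException) {
                return "connect: connection refused or failed at the OS level";
            }
        }
        return "other: " + failure;
    }

    public static void main(String[] args) {
        // Expects an https:// URL; defaults to the UMLS login host from this thread.
        String url = args.length > 0 ? args[0] : "https://utslogin.nlm.nih.gov";
        try {
            HttpsURLConnection conn = (HttpsURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(10_000);
            conn.connect(); // performs the TCP connect and the TLS handshake
            for (Certificate cert : conn.getServerCertificates()) {
                if (cert instanceof X509Certificate) {
                    X509Certificate x509 = (X509Certificate) cert;
                    System.out.println("subject=" + x509.getSubjectX500Principal()
                            + " issuer=" + x509.getIssuerX500Principal());
                }
            }
            conn.disconnect();
        } catch (IOException e) {
            System.out.println(classify(e));
            e.printStackTrace();
        }
    }
}
```

If the handshake succeeds but the printed issuer belongs to your organization or vendor rather than a public CA, that would hint at TLS inspection on the network path, which is one way the "invalid certificate" scenario from the forum post can arise.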

@mik-ohdsi

Hi @wtroddy - I looked at the logfile again, and I agree that the problem is not so much in the ticket-requesting logic as in connecting the socket needed to execute it. For reference, this is the UMLS documentation about the API and here are the Java samples. I still suspect the problem is not solvable on our side but rather at the level of the firewall you have built around that VM. Do you have logging that monitors network traffic from that VM to the outside and shows which ports may need to be opened? Maybe you could also run the shell script with elevated privileges and try logging the calls inside the VM? Or perhaps the JRE in the VM is restricted in its outbound communication. Are the other programs that you used successfully Java-based? And what OS is the system running, by the way?

@wtroddy
Author

wtroddy commented Nov 10, 2022

Hi @mik-ohdsi - please pardon the delay on this. We're working with a platform vendor and it's taken some time for them to investigate on their end.

To your last two questions: my other programs were not Java-based (we're starting to test this now), and we're using Ubuntu 20.04.

The latest recommendation from our vendor was to pass our proxy host and port as arguments when running the Java app. I've tried running the jar file with something like this:

java -Dhttp.proxyHost=$PROXY_HOST -Dhttp.proxyPort=$PROXY_PORT -Dumls-apikey=$UMLS_API_KEY -jar cpt4.jar 5

I'm still getting similar results, though. I've asked the vendor to confirm that this fix works for other allowlisted sites from a Java program, to try to isolate the problem.

In the meantime, I thought I'd check here: do you know of any reason the jar file wouldn't use the additional Java flags when connecting to UMLS?
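A general JDK detail that may be relevant, though nothing in this thread confirms it is the cause: the standard `http.proxyHost`/`http.proxyPort` properties only apply to `http://` URLs, while `https://` URLs (including utslogin.nlm.nih.gov and uts-ws.nlm.nih.gov) are governed by the separate `https.proxyHost`/`https.proxyPort` properties. Separately, a tool only benefits from either pair if its HTTP client consults the JVM's default `ProxySelector`, and whether cpt4.jar does is not known from this thread. The sketch below asks the default selector which proxy it would pick for each endpoint, so you can see what the `-D` flags actually resolve to.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

public class ProxyCheck {
    // Proxies the JVM-wide default ProxySelector returns for a URL ([DIRECT] means no proxy).
    static List<Proxy> proxiesFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url));
    }

    public static void main(String[] args) {
        // Endpoints from this thread; the /cas/v1/api-key path is an assumption for illustration.
        for (String url : List.of(
                "https://utslogin.nlm.nih.gov/cas/v1/api-key",
                "https://uts-ws.nlm.nih.gov",
                "http://umlsks.nlm.nih.gov")) {
            System.out.println(url + " -> " + proxiesFor(url));
        }
    }
}
```

Run with only `-Dhttp.proxyHost=$PROXY_HOST -Dhttp.proxyPort=$PROXY_PORT` (for example `java -Dhttp.proxyHost=... -Dhttp.proxyPort=... ProxyCheck.java`), it should report `[DIRECT]` for the two `https://` URLs; adding `-Dhttps.proxyHost=$PROXY_HOST -Dhttps.proxyPort=$PROXY_PORT` should switch them to the proxy. The same `https.*` flags could be tried on the cpt4.jar command line, with the caveat about the tool's HTTP client noted above.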

@irbraun

irbraun commented Nov 30, 2022

As a small update on this: the platform vendor clarified that they do run other Java programs on the VM where this problem is occurring, and in those cases they use the proxy host and port arguments to work around this allowlist issue.

@wtroddy
Author

wtroddy commented Dec 6, 2022

Hi @mik-ohdsi and @alex-odysseus - it sounds like the issue might be around the proxy host/port arguments. Do you know of any reason cpt4.jar wouldn't accept or use those parameters, or do you have any other ideas about what the problem might be?

@mik-ohdsi

mik-ohdsi commented Dec 12, 2022

Hi @wtroddy - I guess we simply haven't implemented support for proxy host/port arguments in the tool. Let me check with @alex-odysseus whether we can put this on the wish list. Meanwhile, as a workaround (and I know this is a bit inconvenient), could you run the CPT concept-name resolution outside your VM and then move the processed csv files there afterwards?

@mik-ohdsi mik-ohdsi added the cpt4 tool issues around the CPT4 tool packaged with every vocabulary download label Jun 9, 2023
Projects
None yet
Development

No branches or pull requests

4 participants