datalink-heasarc.md

1. Introduction

This notebook contains a simple example of using datalinks to serve data from the cloud.

This example uses the HEASARC SIA service. The changes needed on the server side for this to work are:


  1. In the SIA service (or any other service that supports datalinks), add a <PARAM> element inside the <GROUP> element of the adhoc:service <RESOURCE>, as defined in the DataLink standard document. The <PARAM> element has the name source and lists the sources from which the data can be accessed. The default is main-server, which indicates that data is accessed from on-prem servers.

The following example shows data that can be accessed from four sources:

- On-prem servers (`value="main-server"`)
- AWS US east1 (`value="aws:us-east1"`)
- AWS US east2 (`value="aws:us-east2"`)
- Google Cloud (`value="gc"`)

<RESOURCE utype="adhoc:service" type="meta">
    <PARAM datatype="char" arraysize="*" name="standardID" value="ivo://ivoa.net/std/DataLink#links-1.0"/>
    <PARAM datatype="char" arraysize="*" name="accessURL" value="http://localhost:8080/xamin/vo/datalink/chanmaster"/>
    <GROUP name="inputParams">
        <PARAM ref="DataLinkID" datatype="char" arraysize="*" name="id" value=""/>
        <PARAM datatype="char" arraysize="*" name="source" value="main-server">
            <VALUES>
                <OPTION name="On prem servers" value="main-server"/>
                <OPTION name="AWS region 1" value="aws:us-east1"/>
                <OPTION name="AWS some other region" value="aws:us-east2"/>
                <OPTION name="GC some region" value="gc"/>
            </VALUES>
        </PARAM>
    </GROUP>
</RESOURCE>

  2. The datalink service should interpret the source parameter that the client sends with the datalink request and serve the appropriate access_url. For example, a request to the datalink URL with &source=main-server should return something like:
<TABLE>
    <FIELD datatype="char" arraysize="*" ucd="meta.id;meta.main" name="ID"/>
    <FIELD datatype="char" arraysize="*" ucd="meta.ref.url" name="access_url"/>
    ...
    <DATA>
        <TABLEDATA>
            <TR>
                <TD>[SOME_ID]</TD>
                <TD>https://someurl/path/to/some/file.fits</TD>
                ...
            </TR>
        </TABLEDATA>
    </DATA>
</TABLE>

Passing &source=aws:us-east1, for example, would return:

<TABLE>
    <FIELD datatype="char" arraysize="*" ucd="meta.id;meta.main" name="ID"/>
    <FIELD datatype="char" arraysize="*" ucd="meta.ref.url" name="access_url"/>
    ...
    <DATA>
        <TABLEDATA>
            <TR>
                <TD>[SOME_ID]</TD>
                <TD>s3://somebucket/path/to/some/file.fits</TD>
                ...
            </TR>
        </TABLEDATA>
    </DATA>
</TABLE>
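
The server-side behavior described in step 2 can be sketched as a small lookup. This is only an illustration, not the HEASARC implementation: the function name `resolve_access_url`, the `SOURCES` table, and the host and bucket names are hypothetical placeholders taken from the example responses above. It assumes each source maps to a URL prefix and that any source without an entry falls back to the `main-server` default.

```python
# Hypothetical mapping from the `source` option to a URL prefix.
# The host and bucket names are placeholders, not real endpoints.
SOURCES = {
    "main-server": "https://someurl",
    "aws:us-east1": "s3://somebucket",
}
DEFAULT_SOURCE = "main-server"


def resolve_access_url(path, source=DEFAULT_SOURCE):
    """Return the access_url for `path` as served from `source`.

    Sources without an entry in SOURCES (for example an option that is
    advertised but not implemented) fall back to the default.
    """
    prefix = SOURCES.get(source, SOURCES[DEFAULT_SOURCE])
    return f"{prefix}/{path.lstrip('/')}"


print(resolve_access_url("path/to/some/file.fits"))
print(resolve_access_url("path/to/some/file.fits", source="aws:us-east1"))
print(resolve_access_url("path/to/some/file.fits", source="gc"))
```

Keeping the fallback in one place means a client that requests an unknown source still receives a usable URL rather than an error.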

2. Set Up an SIA Query

import pyvo
from astropy.coordinates import SkyCoord

# set some sky position to use in the queries
pos = SkyCoord.from_name('NGC 4151')
# make a simple SIA query. To use a service other than the HEASARC, change sia_url.
#xaminUrl = 'http://localhost:8080/xamin'
xaminUrl = 'https://heasarc.gsfc.nasa.gov/xamin_aws'
sia_url = f'{xaminUrl}/vo/sia?table=chanmaster'

sia_result = pyvo.dal.sia.search(sia_url, pos=pos, resultmax=2)
# explore the returned SIA result
#sia_result.votable.to_xml('sia_result.xml')
sia_result.to_table()
Table length=2

| obsid | status | name | ra | dec | time | detector | grating | exposure | type | pi | public_date | datalink | t_min | t_resolution | t_max | t_exptime | em_res_power | s_region | s_ra | s_dec | s_resolution | access_estsize | s_fov | o_ucd | access_url | obs_publisher_did | obs_id | obs_collection | target_name | instrument_name | facility_name | pol_states | calib_level | access_format | dataproduct_type | em_min | em_max | SIA_title | SIA_scale | SIA_naxis | SIA_naxes | SIA_format | SIA_reference | SIA_ra | SIA_dec | SIA_instrument | cloud_access |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  |  |  | deg | deg | mjd |  |  | s |  |  | mjd |  | d | s | d | s |  | deg | deg | deg | arcsec | kbyte | deg |  |  |  |  |  |  |  |  |  |  |  |  | m | m |  |  |  |  |  |  |  |  |  |  |
| object | object | object | float64 | float64 | float64 | object | object | float64 | object | object | int32 | object | float64 | float64 | float64 | float64 | float64 | object | float64 | float64 | float32 | int32 | float64 | object | object | object | object | object | object | object | object | object | int32 | object | object | float64 | float64 | object | object | object | int32 | object | object | float64 | float64 | object | object |
| 15158 | archived | RBS1066 | 181.29000 | 39.34700 | 56363.8531 | ACIS-I | NONE | 8080 | GO | Reiprich | 56729 | 18244:chandra.obs.img | 56363.8531481481 | -- | 64443.8531481481 | 8080.0 | -- |  | 181.29 | 39.347 | -- | 32447 | -- |  | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8/15158/primary/acisf15158N003_cntr_img2.fits.gz | HEASARC | 15158 | CHANDRA ACIS-I | RBS1066 | ACIS-I | Chandra |  | 3 | image/fits | Image | 1.24e-10 | 1.24e-08 | acisf15158N003_cntr_img2.fits | [-0.0013666666666667 0.0013666666666667] | [1024 1024] | 2 | image/fits | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8/15158/primary/acisf15158N003_cntr_img2.fits.gz | 181.29 | 39.347 | CHANDRA ACIS-I | {"aws": { "bucket_name": "dh-fornaxdev", "region": "us-east-1", "access": "region", "key": "/FTP/chandra/data/byobsid/8/15158/primary/acisf15158N003_cntr_img2.fits.gz" }} |
| 15158 | archived | RBS1066 | 181.29000 | 39.34700 | 56363.8531 | ACIS-I | NONE | 8080 | GO | Reiprich | 56729 | 18244:chandra.obs.img | 56363.8531481481 | -- | 64443.8531481481 | 8080.0 | -- |  | 181.29 | 39.347 | -- | 228059 | -- |  | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8/15158/primary/acisf15158N003_cntr_img2.jpg | HEASARC | 15158 | CHANDRA ACIS-I | RBS1066 | ACIS-I | Chandra |  | 3 | image/jpeg | Image | 1.24e-10 | 1.24e-08 | acisf15158N003_cntr_img2.jpg | [-0.0013666666666667 0.0013666666666667] | [1024 1024] | 2 | image/jpeg | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8/15158/primary/acisf15158N003_cntr_img2.jpg | 181.29 | 39.347 | CHANDRA ACIS-I | {"aws": { "bucket_name": "dh-fornaxdev", "region": "us-east-1", "access": "region", "key": "/FTP/chandra/data/byobsid/8/15158/primary/acisf15158N003_cntr_img2.jpg" }} |
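
Each row of the SIA result also carries a `cloud_access` column holding a JSON string. As a minimal sketch, assuming the JSON layout shown in the output above (an `aws` entry with `bucket_name`, `region`, `access`, and `key`), the S3 location can be extracted with the standard library. The helper name `s3_uri_from_cloud_access` is hypothetical, and the string below is copied from the first row rather than read from a live query.

```python
import json

# cloud_access value copied from the first row of the SIA result above
cloud_access = (
    '{"aws": { "bucket_name": "dh-fornaxdev", "region": "us-east-1", '
    '"access": "region", "key": "/FTP/chandra/data/byobsid/8/15158/primary/'
    'acisf15158N003_cntr_img2.fits.gz" }}'
)


def s3_uri_from_cloud_access(text):
    """Return (s3_uri, region) built from a cloud_access JSON string."""
    aws = json.loads(text)["aws"]
    # the key starts with '/', so strip it to avoid a double slash after the bucket
    uri = f"s3://{aws['bucket_name']}/{aws['key'].lstrip('/')}"
    return uri, aws["region"]


uri, region = s3_uri_from_cloud_access(cloud_access)
print(uri)
print(region)
```

With a live result, the same helper could presumably be applied to `sia_result[0]['cloud_access']`, provided the column holds a string of this form.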

3. A Standard Datalink Query from the SIA Result

# get the datalink for the first row
dlink = sia_result[0].getdatalink()

# explore the returned datalink result
#dlink.votable.to_xml('datalink_result.xml')
dlink.to_table()
Table length=4

| ID | access_url | service_def | error_message | description | semantics | content_type | content_length | cloud_access |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  |  |  |  |  |  |  | byte |  |
| object | object | object | object | object | object | object | int64 | object |
| 18244:chandra.obs.img | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.fits.gz |  |  | Center Image | https://localhost:8080/xamin/jsp/products.jsp#chandra.obs.img.cntr.fits | application/fits | -- | {"aws": { "bucket_name": "dh-fornaxdev", "region": "us-east-1", "access": "region", "key": "/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.fits.gz" }} |
| 18244:chandra.obs.img | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_full_img2.fits.gz |  |  | Full Image | https://localhost:8080/xamin/jsp/products.jsp#chandra.obs.img.full.fits | application/fits | -- | {"aws": { "bucket_name": "dh-fornaxdev", "region": "us-east-1", "access": "region", "key": "/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_full_img2.fits.gz" }} |
| 18244:chandra.obs.img | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.jpg |  |  | Center Image | https://localhost:8080/xamin/jsp/products.jsp#chandra.obs.img.cntr.jpg | image/jpeg | -- | {"aws": { "bucket_name": "dh-fornaxdev", "region": "us-east-1", "access": "region", "key": "/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.jpg" }} |
| 18244:chandra.obs.img | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_full_img2.jpg |  |  | Full Image | https://localhost:8080/xamin/jsp/products.jsp#chandra.obs.img.full.jpg | image/jpeg | -- | {"aws": { "bucket_name": "dh-fornaxdev", "region": "us-east-1", "access": "region", "key": "/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_full_img2.jpg" }} |

4. Process the New Cloud Information

Read the cloud information from the datalink resource in the SIA result.

To do this, we reproduce the steps that getdatalink() performs inside pyvo and add the handling of the extra source parameter.

# expose what goes on inside pyvo when doing getdatalink()
dlink_resource = sia_result.get_adhocservice_by_ivoid(pyvo.dal.adhoc.DATALINK_IVOID)

# Look for the 'source' <PARAM> element inside the inputParams <GROUP> element.
# pyvo already handles part of this.
source_elem = [p for p in dlink_resource.groups[0].entries if p.name == 'source'][0]
print(type(source_elem))
print(source_elem)
<class 'astropy.io.votable.tree.Param'>
<PARAM ID="source" arraysize="*" datatype="char" name="source" value="main-server"/>
# list the available options in the `source` element:
access_options = source_elem.values.options

print(f'There are {len(access_options)} options:')
for opt in access_options:
    print(f'\t{opt[1]:13}: {opt[0]}')
There are 4 options:
	main-server  : On prem servers
	aws:us-east1 : AWS region 1
	aws:us-east2 : AWS some other region
	gc           : GC some region

Given these options, we can request the datalink we want by adding the source parameter to the query, with its value set to one of the values in access_options.
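
The cells that follow pick options by position (for example `access_options[1][1]`), which depends on the order in which the server lists them. A minimal sketch of selecting by value instead, assuming `access_options` is a list of `(name, value)` tuples as printed above. The helper name `pick_source` is hypothetical, and the list is hard-coded here so the block runs without a live service.

```python
# (name, value) pairs as printed above; hard-coded for illustration
access_options = [
    ("On prem servers", "main-server"),
    ("AWS region 1", "aws:us-east1"),
    ("AWS some other region", "aws:us-east2"),
    ("GC some region", "gc"),
]


def pick_source(options, wanted, default="main-server"):
    """Return `wanted` if the server advertises it, otherwise `default`."""
    values = [value for _name, value in options]
    return wanted if wanted in values else default


print(pick_source(access_options, "aws:us-east1"))
print(pick_source(access_options, "not-listed"))
```

The returned string can then be passed as the `source` keyword in the queries below.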

a. Use the main-server option (default):

## main-server; this is the default
source_1 = access_options[0][1]
query_1  = pyvo.dal.adhoc.DatalinkQuery.from_resource(
                sia_result[0], dlink_resource, sia_result._session, source=source_1
            )
result_1 = query_1.execute()

print(f'access option: {source_1}')
print('access_url: ')
print(result_1[0].access_url)
access option: main-server
access_url: 
https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.fits.gz
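
The query above passes `source` to the datalink service as an extra request parameter. A minimal sketch of what such a request URL could look like, assuming the `accessURL` and the `id` and `source` parameter names from the `<RESOURCE>` example in the introduction. The exact request that pyvo sends may differ, and `build_datalink_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_datalink_url(access_url, datalink_id, source="main-server"):
    """Append the id and source query parameters to a datalink accessURL."""
    query = urlencode({"id": datalink_id, "source": source})
    return f"{access_url}?{query}"


url = build_datalink_url(
    "http://localhost:8080/xamin/vo/datalink/chanmaster",  # accessURL from the example
    "18244:chandra.obs.img",
    source="aws:us-east1",
)
print(url)

# round trip: decoding the query string recovers the original values
params = parse_qs(urlsplit(url).query)
print(params["id"][0], params["source"][0])
```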

b. Use the aws:us-east1 option:

Note that access_url is now an S3 URI.

## aws:us-east1
source_2 = access_options[1][1]
query_2  = pyvo.dal.adhoc.DatalinkQuery.from_resource(
                sia_result[0], dlink_resource, sia_result._session, source=source_2
            )
result_2 = query_2.execute()

print(f'access option: {source_2}')
print('access_url: ')
print(result_2[0].access_url)
access option: aws:us-east1
access_url: 
s3://dh-fornaxdev/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.fits.gz
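
S3-aware tools usually need the bucket and the key separately. As a minimal sketch using only the standard library, the URI printed above can be split as follows. The helper name `split_s3_uri` is hypothetical, and how the object is then downloaded (for example with boto3 or s3fs) depends on the environment and is not shown.

```python
from urllib.parse import urlsplit


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key); raise ValueError otherwise."""
    parts = urlsplit(uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3 URI: {uri!r}")
    # drop only the leading slash; the rest of the key is left unchanged
    return parts.netloc, parts.path.lstrip("/")


bucket, key = split_s3_uri(
    "s3://dh-fornaxdev/FTP/chandra/data/byobsid/8//15158/primary/"
    "acisf15158N003_cntr_img2.fits.gz"
)
print(bucket)
print(key)
```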

c. Use the gc option:

This option is not supported yet, so the server falls back to the default.

## gc; GC is not implemented, so the server defaults to the http URL from the main server
source_3 = access_options[3][1]
query_3  = pyvo.dal.adhoc.DatalinkQuery.from_resource(
                sia_result[0], dlink_resource, sia_result._session, source=source_3
            )
result_3 = query_3.execute()

print(f'access option: {source_3}')
print('access_url: ')
print(result_3[0].access_url)
access option: gc
access_url: 
https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.fits.gz
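
The output above gives no explicit sign that a fallback happened, so a client that needs cloud access may want to check what it actually received. A minimal sketch, assuming that cloud sources return a non-HTTP scheme such as `s3://` and that the on-prem default returns `http(s)://`. `was_fallback` is a hypothetical helper, not part of pyvo.

```python
from urllib.parse import urlsplit


def was_fallback(requested_source, access_url, default="main-server"):
    """Return True if a non-default source was requested but an HTTP(S)
    URL came back, which suggests the server used its default."""
    scheme = urlsplit(access_url).scheme
    return requested_source != default and scheme in ("http", "https")


https_url = (
    "https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8//15158/"
    "primary/acisf15158N003_cntr_img2.fits.gz"
)
s3_url = (
    "s3://dh-fornaxdev/FTP/chandra/data/byobsid/8//15158/"
    "primary/acisf15158N003_cntr_img2.fits.gz"
)

print(was_fallback("gc", https_url))           # requested gc, received https
print(was_fallback("aws:us-east1", s3_url))    # requested aws, received s3
print(was_fallback("main-server", https_url))  # default requested
```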