.. _pyvo-registry:

**************************
Registry (`pyvo.registry`)
**************************

This is an interface to the Virtual Observatory Registry, a collection
of metadata records of the VO's “resources” (“resource” is jargon for: a
collection of datasets, usually with a service in front of it).  For a
wider background, see `2014A&C.....7..101D`_ for the general
architecture and `2015A&C....10...88D`_ for the search interfaces.

.. _2014A&C.....7..101D: https://ui.adsabs.harvard.edu/abs/2014A%26C.....7..101D/abstract
.. _2015A&C....10...88D: https://ui.adsabs.harvard.edu/abs/2015A%26C....10...88D/abstract

There are two fundamental modes of searching in the VO:

(a) Data discovery: This is when you are looking for some sort of data
    collection based on its metadata; a classical example would be
    something like “I need redshifts of supernovae”.

(b) Service discovery: This is what you need when you want to query all
    services of a certain kind (e.g., “all spectral services claiming to
    have infrared data”), which in turn is the basis of all-VO *dataset*
    discovery (“give me all infrared spectra of 3C273”).

Both modes are supported by this module.

.. _registry-basic-interface:

Basic interface
===============

The main interface for the module is :py:meth:`pyvo.registry.search`;
the examples below assume::

  >>> from pyvo import registry

This function accepts one or more search constraints, which can be
specified either as constraint objects passed as positional arguments
or as keyword arguments.  The following constraints are available:

* :py:class:`~pyvo.registry.Freetext` (``keywords``): one or more
  freetext words, matched in the title, description or subject of the
  resource.
* :py:class:`~pyvo.registry.Servicetype` (``servicetype``): constrain to
  one of tap, ssa, sia, conesearch (or full ivoids for other service
  types).  This is the constraint you want to use for service discovery.
* :py:class:`~pyvo.registry.UCD` (``ucd``): constrain by one or more UCD
  patterns; resources match when they serve columns having a matching
  UCD (e.g., ``phot.mag;em.ir.%`` for “any infrared magnitude”).
* :py:class:`~pyvo.registry.Waveband` (``waveband``): one or more terms
  from the vocabulary at http://www.ivoa.net/rdf/messenger giving the
  rough spectral location of the resource.
* :py:class:`~pyvo.registry.Author` (``author``): an author (“creator”).
  This is a single SQL pattern, and because author names are written
  inconsistently across the VO, you should use wildcards generously.
* :py:class:`~pyvo.registry.Datamodel` (``datamodel``): one of obscore,
  epntap, or regtap: only return TAP services having tables of this
  kind.
* :py:class:`~pyvo.registry.Ivoid` (``ivoid``): exactly match a single
  IVOA identifier (that is, in effect, the primary key in the VO).
* :py:class:`~pyvo.registry.Spatial` (``spatial``): match resources
  covering, enclosed in, or overlapping a certain geometry (point,
  circle, polygon, or MOC).  *RegTAP 1.2 Extension*
* :py:class:`~pyvo.registry.Spectral` (``spectral``): match resources
  covering a certain part of the spectrum (usually, but not limited to,
  the electromagnetic spectrum).  *RegTAP 1.2 Extension*
* :py:class:`~pyvo.registry.Temporal` (``temporal``): match resources
  covering some point or interval in time.  *RegTAP 1.2 Extension*

Multiple constraints are combined conjunctively (“AND”).

Constraints marked with *RegTAP 1.2 Extension* are not available on all
IVOA RegTAP services (they are on pyVO's default RegTAP endpoint,
though).  Also refer to the class documentation for further caveats on
these.

Hence, to look for resources with UV data mentioning white dwarfs you
could either run:

.. doctest-remote-data::

  >>> resources = registry.search(keywords="white dwarf", waveband="UV")

or:

.. doctest-remote-data::

  >>> resources = registry.search(registry.Freetext("white dwarf"),
  ...                             registry.Waveband("UV"))

or a mixture of the two.  Constructing the query from explicit
constraint objects is generally preferable for more complex queries.
Where the constraints accept multiple arguments, you can pass in
sequences to the keyword arguments; for instance:

.. doctest-remote-data::

  >>> resources = registry.search(registry.Waveband("Radio", "Millimeter"))

is equivalent to:

.. doctest-remote-data::

  >>> resources = registry.search(waveband=["Radio", "Millimeter"])

There is also :py:meth:`~pyvo.registry.get_RegTAP_query`, accepting the
same arguments as :py:meth:`pyvo.registry.search`.  This function simply
returns the ADQL query that ``search`` would execute.  This may be useful
to construct custom RegTAP queries, which could then be executed on TAP
services implementing the ``regtap`` data model.


Data Discovery
==============

In data discovery, you look for resources matching your constraints and
then figure out in a second step how to query them.  For instance, to
look for resources giving redshifts in connection with supernovae, you
would say:

.. doctest-remote-data::

  >>> resources = registry.search(registry.UCD("src.redshift"),
  ...                             registry.Freetext("supernova"))

After that, ``resources`` is an instance of
:py:class:`~pyvo.registry.regtap.RegistryResults`, which you can iterate
over.  In interactive data discovery, however, it is usually preferable
to use the ``to_table`` method for an overview of the resources
available:

.. doctest-remote-data::

  >>> resources.to_table()  # doctest: +IGNORE_OUTPUT
  title ... interfaces
  str67 ... str24
  --------------------------------------------------------------- ... ------------------------
  Asiago Supernova Catalogue (Barbon et al., 1999-) ... conesearch, tap#aux, web
  Asiago Supernova Catalogue (Version 2008-Mar) ... conesearch, tap#aux, web
  Sloan Digital Sky Survey-II Supernova Survey (Sako+, 2018) ... conesearch, tap#aux, web
  ...

To look for TAP resources *in* a specific cone, you would do:

.. doctest-remote-data::

  >>> from astropy.coordinates import SkyCoord
  >>> registry.search(registry.Servicetype("tap"),
  ...                 registry.Spatial((SkyCoord("23d +3d"), 3), intersect="enclosed"),
  ...                 includeaux=True)  # doctest: +IGNORE_OUTPUT
  ivoid res_type short_name res_title ... intf_types intf_roles alt_identifier
  ...
  object object object object ... object object object
  ------------------------------ ----------------- ------------- ------------------------------------------- ... ------------ ---------- --------------------------------
  ivo://cds.vizier/j/apj/835/123 vs:catalogservice J/ApJ/835/123 Globular clusters in NGC 474 from CFHT obs. ... vs:paramhttp std doi:10.26093/cds/vizier.18350123

Here, ``intersect`` can take the following values:

* ``'covers'`` is the default and returns resources that cover the
  geometry provided,
* ``'enclosed'`` is for services in the given region,
* ``'overlaps'`` returns services intersecting with the region.

The idea is that in notebook-like interfaces you can pick resources by
title, description, and perhaps the access mode (“interface”) offered.
In the list of interfaces, you will sometimes spot an ``#aux`` after a
standard id; this is a minor VO technicality that you can in practice
ignore.  For instance, you can simply construct
:py:class:`~pyvo.dal.TAPService`-s from ``tap#aux`` interfaces.

Once you have found a resource you would like to query, you can pick it
by index; however, this will not be stable across multiple executions.
Hence, RegistryResults also supports referencing results by short name,
which is the style we recommend.  Using full ivoids is possible, too,
and safer because these are guaranteed to be unique (which short names
are not), but it is rather clunky, and in the real VO short name
collisions should be very rare.

Use the ``get_service`` method of
:py:class:`~pyvo.registry.regtap.RegistryResource` to obtain a DAL
service object for a particular sort of interface.
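
Since short names are not guaranteed to be unique, you may want to check
a result set for collisions before relying on them.  The following is
only a minimal sketch: it assumes that each result exposes
``short_name`` and ``ivoid`` attributes (named like their RegTAP
counterparts), and the helper ``short_name_collisions`` as well as the
``Record`` stand-ins are hypothetical names introduced here for
illustration, not part of pyVO.

```python
from collections import defaultdict, namedtuple


def short_name_collisions(resources):
    """Map each short name shared by several resources to their ivoids."""
    by_name = defaultdict(list)
    for res in resources:
        by_name[res.short_name].append(res.ivoid)
    # Keep only the short names that occur more than once.
    return {name: ivoids for name, ivoids in by_name.items()
            if len(ivoids) > 1}


# Hypothetical stand-in records so the sketch runs without network access;
# with pyVO you would pass the result of registry.search(...) instead.
Record = namedtuple("Record", ["short_name", "ivoid"])
records = [
    Record("II/283", "ivo://example.org/ii/283"),
    Record("B/sn", "ivo://example.org/b/sn"),
    Record("II/283", "ivo://example.net/mirror/ii/283"),
]

collisions = short_name_collisions(records)
print(collisions)
```

If the returned dictionary is empty, indexing by short name is
unambiguous for that result set; otherwise, fall back to the full ivoid
for the affected resources.
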
To query one of the matches, here the one with the short name
``II/283``, using simple cone search, you would say:

.. doctest-remote-data::

  >>> resources["II/283"].get_service("conesearch").search(pos=(120, 73), sr=1)
    _RAJ2000     _DEJ2000      _r    recno ... NED    RAJ2000      DEJ2000
      deg          deg                     ...
    float64      float64    float64  int32 ... str3    str12        str12
  ------------ ------------ -------- ----- ... ---- ------------ ------------
  117.98645833  73.00961111 0.588592   986 ...  NED 07 51 56.750 +73 00 34.60

To operate TAP services, you need to know what tables make up a
resource; you could construct a TAP service and access its ``tables``
attribute, but you can take a shortcut and call a RegistryResource's
``get_tables`` method for a rather similar result:

.. doctest-remote-data::

  >>> tables = resources["II/283"].get_tables()  # doctest: +IGNORE_WARNINGS
  >>> list(tables.keys())
  ['II/283/sncat']
  >>> sorted(c.name for c in tables['II/283/sncat'].columns)
  ['band', 'bmag', 'deg', 'dej2000', 'disc', 'epmax', 'galaxy', 'hrv', 'i', 'logd25', 'maxmag', 'mtype', 'n_bmag', 'n_sn', 'n_x', 'n_y', 'ned', 'pa', 'rag', 'raj2000', 'recno', 'simbad', 'sn', 't', 'type', 'u_epmax', 'u_maxmag', 'u_sn', 'u_y', 'u_z', 'x', 'y', 'z']

In this case, this is a table with one of VizieR's somewhat funky names.
To run a TAP query based on this metadata, do something like:

.. doctest-remote-data::

  >>> resources["II/283"].get_service("tap#aux").run_sync(
  ...   'SELECT sn, z FROM "J/A+A/437/789/table2" WHERE z>0.04')
    SN      z
  object float64
  ------ -------
  1992bh   0.045
  1992bp   0.079
  1993ag   0.049
   1993O   0.051

A special sort of access mode is ``web``, which represents some facility
related to the resource that works in a web browser.  You can ask for a
“service” for it, too; you will then receive an object that has a
``search`` method, and when you call it, a browser window should open
with the query facility (this uses Python's ``webbrowser`` module):

.. doctest-skip::

  >>> resources["II/283"].get_service("web").search()  # doctest: +IGNORE_OUTPUT

Note that for interactive data discovery in the VO Registry, you may
also want to have a look at Aladin's discovery tree, TOPCAT's VO menu,
or at services like DataScope_ or WIRR_ in your web browser.

.. _DataScope: https://heasarc.gsfc.nasa.gov/cgi-bin/vo/datascope/init.pl
.. _WIRR: https://dc.g-vo.org/WIRR


Service Discovery
=================

Service discovery is what you typically want in connection with a search
for datasets, as in “Give me all infrared spectra of Bellatrix”.  To do
that, you want to run the same DAL query against all the services of a
given sort.  This means that you will have to include a ``servicetype``
constraint such that all resources in your registry results can be
queried in the same way.

When that is the case, you can use each RegistryResource's ``service``
attribute, which contains a DAL service instance.  The opening example
could be written like this:

.. This one is too expensive to run as part of CI/testing

.. doctest-skip::

  >>> from astropy.coordinates import SkyCoord
  >>> my_obj = SkyCoord.from_name("Bellatrix")
  >>> for res in registry.search(waveband="infrared", servicetype="spectrum"):
  ...     print(res.service.search(pos=my_obj, size=0.001))
  ...

In reality, you will have to add some error handling to this kind of
all-VO query: in a wide and distributed network, some service is always
down.  See `Appendix: Robust All-VO Queries`_.

The central point is: With a ``servicetype`` constraint, each result has
a well-defined ``service`` attribute that contains some subclass of
dal.Service and that can be queried in a uniform fashion.

TAP services may provide tables in well-defined data models, like
EPN-TAP or obscore.  These can be queried in similar loops, although in
some cases you will have to adapt the queries to the resources found.

In the obscore case, an all-VO query would look like this:

.. Again, that's too expensive for CI/testing

.. doctest-skip::

  >>> for svc_rec in registry.search(datamodel="obscore"):
  ...     print(svc_rec.service.run_sync(
  ...         "SELECT DISTINCT dataproduct_type FROM ivoa.obscore"))

Again, in production this needs explicit handling of failing services.
For an example of what this might look like, see `GAVO's plate
tutorial`_.

.. _GAVO's plate tutorial: http://docs.g-vo.org/gavo_plates.pdf


More examples
-------------

Discover archives
^^^^^^^^^^^^^^^^^

You can use the registry ``search`` function (available as ``regsearch``
from the ``pyvo`` package) to discover archives that may have X-ray
images and then query those archives to find what X-ray images they have
of Cas A.  For the arguments you will enter ``'image'`` for the service
type and ``'x-ray'`` for the waveband.  The position is provided by the
Astropy library.

The query returns a :py:class:`~pyvo.registry.regtap.RegistryResults`
object, which is a container holding a table of matching services.  In
this example it returns 33 matching services.

.. doctest-remote-data::

  >>> import pyvo as vo
  >>> from astropy.coordinates import SkyCoord
  >>>
  >>> import warnings
  >>> warnings.filterwarnings('ignore', module="astropy.io.votable.*")
  >>>
  >>> archives = vo.regsearch(servicetype='image', waveband='x-ray')
  >>> pos = SkyCoord.from_name('Cas A')
  >>> len(archives)  # doctest: +IGNORE_OUTPUT
  33

There are also other types of services that you can choose via the
``servicetype`` parameter; for more details see
:py:class:`~pyvo.registry.Servicetype`.

You can learn more about the archives by printing their titles and
access URLs:

.. doctest-remote-data::

  >>> for service in archives:
  ...     print(service.res_title, service.access_url)
  Chandra X-ray Observatory Data Archive https://cda.harvard.edu/cxcsiap/queryImages?
  Chandra Source Catalog http://cda.cfa.harvard.edu/cscsiap/queryImages?
  Chandra Source Catalog Release 1 http://cda.cfa.harvard.edu/csc1siap/queryImages?
  ...
You do not need to keep track of the URL, because you can search for
images directly from the registry record.  For example, pick the Chandra
X-ray Observatory Data Archive (short name ``CDA``) and call its
``search`` method with the position and size for the desired object:

.. doctest-remote-data::

  >>> images = archives["CDA"].search(pos=pos, size=0.25)
  >>> len(images)  # doctest: +IGNORE_OUTPUT
  822

Sometimes you are looking for a type of object.  For this purpose, the
``keywords`` parameter is useful.  For example, you want to find all
catalogs related to blazars observed with Fermi:

.. doctest-remote-data::

  >>> cats = vo.regsearch(keywords=['blazar', 'Fermi'])
  >>> len(cats)  # doctest: +IGNORE_OUTPUT
  551

Or you already know the particular catalog but not the base URL for that
service.  For example, you want to get cutout images from the NRAO VLA
Sky Survey (NVSS):

.. doctest-remote-data::

  >>> colls = vo.regsearch(keywords=['NVSS'], servicetype='sia')
  >>> for coll in colls:
  ...     print(coll.res_title, coll.access_url)
  NRA) VLA Sky Survey https://skyview.gsfc.nasa.gov/cgi-bin/vo/sia.pl?survey=nvss&
  Sydney University Molonglo Sky Survey https://skyview.gsfc.nasa.gov/cgi-bin/vo/sia.pl?survey=sumss&


Search results
==============

What comes back from ``registry.search`` is a
:py:class:`pyvo.registry.regtap.RegistryResults` instance, which is
rather similar to :ref:`pyvo-resultsets`; just remember that for
interactive use there is the ``to_table`` method discussed above.

The individual items are instances of
:py:class:`~pyvo.registry.regtap.RegistryResource`, which expose many
pieces of metadata (e.g., title, description, creators) in attributes
named like their RegTAP counterparts (see the class documentation).
Some attributes deserve a second look.

.. doctest-remote-data::

  >>> import pyvo as vo
  >>> colls = vo.regsearch(keywords=["NVSS"], servicetype='sia')
  >>> nvss = colls["NVSS"]
  >>> nvss.res_title
  'NRA) VLA Sky Survey'

If you are looking for a particular data collection or catalog, as we
did above when we looked for the NVSS archive, often simply reviewing
the titles is sufficient.  Other times, particularly when you are not
sure what you are looking for, it helps to look deeper.

A selection of the resource metadata, including the title, short name
and description, can be printed out in a summary form with the
``describe`` method.

.. doctest-remote-data::

  >>> nvss.describe(verbose=True)
  NRA) VLA Sky Survey
  Short Name: NVSS
  IVOA Identifier: ivo://nasa.heasarc/skyview/nvss
  Access modes: sia
  Base URL: https://skyview.gsfc.nasa.gov/cgi-bin/vo/sia.pl?survey=nvss&
  ...

The ``verbose`` option in ``describe`` will output more information
about the content of the resource, if available.  Possible added entries
are the authors of the resource, an associated DOI, a URL where more
information is provided, or a reference to a related paper.

The ``service`` attribute will, for resources that only have a single
capability, contain a DAL service object ready for querying using the
respective protocol.  You should only use that attribute when the
original registry query constrained the service type, because otherwise
there is no telling what kind of service you will get back.

.. doctest-remote-data::

  >>> nvss = colls["NVSS"].service  # converts record to service object
  >>> nvss.search(pos=(350.85, 58.815),size=0.25,format="image/fits")
  Survey    Ra   ... LogicalName
  object float64 ...    object
  ------ ------- ... -----------
    nvss  350.85 ...           1

With this service object, we can either call its ``search`` method
directly or create query objects to get cutouts for a whole list of
sources.

.. doctest-remote-data::

  >>> cutouts1 = nvss.search(pos=(148.8888, 69.065), size=0.2)
  >>> nvssq = nvss.create_query(size=0.2)  # or create a query object
  >>> nvssq.pos = (350.85, 58.815)
  >>> cutouts2 = nvssq.execute()

Our discussion of service metadata offers an opportunity to highlight
another important property, the service's *IVOA Identifier* (sometimes
referred to as its *ivoid*).  This is a globally-unique identifier that
takes the form of a URI:

.. doctest-remote-data::

  >>> colls = vo.regsearch(keywords=["NVSS"], servicetype='sia')
  >>> for coll in colls:
  ...     print(coll.ivoid)
  ivo://nasa.heasarc/skyview/nvss
  ivo://nasa.heasarc/skyview/sumss

This identifier can be used to retrieve a specific service from the
registry.

.. doctest-remote-data::

  >>> nvss = vo.registry.search(ivoid='ivo://nasa.heasarc/skyview/nvss')[0].get_service('sia')
  >>> nvss.search(pos=(350.85, 58.815),size=0.25,format="image/fits")
  Survey    Ra   ... LogicalName
  object float64 ...    object
  ------ ------- ... -----------
    nvss  350.85 ...           1

When the registry query did not constrain the service type, you can use
the ``access_modes`` method to see what capabilities are available.  For
instance with this identifier:

.. doctest-remote-data::

  >>> res = registry.search(ivoid="ivo://org.gavo.dc/flashheros/q/ssa")[0]
  >>> res.access_modes()  # doctest: +IGNORE_OUTPUT
  {'ssa', 'datalink#links-1.0', 'tap#aux', 'web', 'soda#sync-1.0'}

This service can be accessed through SSA, TAP, a web interface, and two
special capabilities that pyVO cannot produce services for (mainly
because standalone service objects do not make much sense for them).

To obtain a service for one of the access modes pyVO does support, use
``get_service(mode)``.  For ``web``, this returns an object that opens a
web browser window when its ``search`` method is called.

RegistryResources also have a ``get_contact`` method.  Use this if the
service is down or seems to have bugs; you should in general get at
least an e-mail address:

.. doctest-remote-data::

  >>> res.get_contact()
  'GAVO Data Center Team (++49 6221 54 1837) '

Finally, the registry has an idea of what kind of tables are published
through a resource, much like the VOSI tables endpoint (as a matter of
fact, the Registry should contain exactly what is there, as VOSI tables
in effect just gives a part of the registry record).  Not all publishers
properly provide table metadata to the Registry, but most do these days,
and then you can run:

.. doctest-remote-data::

  >>> res.get_tables()  # doctest: +IGNORE_OUTPUT
  {'flashheros.data': ... 29 columns ..., 'ivoa.obscore': ... 0 columns ...}


Alternative Registries
======================

There are several RegTAP services in the VO.  PyVO by default uses the
one at the TAP access URL http://reg.g-vo.org/tap.  You can use
alternative ones, for instance, because they are nearer to you or
because the default endpoint is down.

You can pre-select the URL by setting the ``IVOA_REGISTRY`` environment
variable to the TAP access URL of the service you would like to use.  In
a bash-like shell, you would say::

  export IVOA_REGISTRY="http://vao.stsci.edu/RegTAP/TapService.aspx"

before starting Python (or the notebook processor).

Within a Python session, you can use the
`pyvo.registry.choose_RegTAP_service` function, which also takes the TAP
access URL.

As long as you have one working registry endpoint, you can find the
other RegTAP services using:

.. We probably shouldn't test the result of the next code block; this
   will change every time someone registers a new RegTAP service...

.. doctest-remote-data::

  >>> res = registry.search(datamodel="regtap")
  >>> print("\n".join(sorted(r.get_interface("tap").access_url
  ...     for r in res)))
  http://dc.zah.uni-heidelberg.de/tap
  http://gavo.aip.de/tap
  http://voparis-rr.obspm.fr/tap
  https://vao.stsci.edu/RegTAP/TapService.aspx


Reference/API
=============

.. automodapi:: pyvo.registry

.. automodapi:: pyvo.registry.regtap

.. automodapi:: pyvo.registry.rtcons


Appendix: Robust All-VO Queries
===============================

The VO contains many services, and even if all of them had 99.9% uptime
(which not all do), at any time you would always see failures, some of
them involving long timeouts.  Hence, if you run all-VO queries, you
should catch errors and, at least in interactive sessions, provide some
way to interrupt overly long queries.

Here is an example of how to query all obscore services; remove the
``break`` at the end of the loop to actually do the global query (it's
there so that you don't blindly run all-VO queries without reading at
least this sentence):

.. doctest-remote-data::

  >>> from astropy.table import vstack
  >>> from pyvo import registry
  >>>
  >>> QUERY = "SELECT TOP 1 s_ra, s_dec from ivoa.obscore"
  >>>
  >>> results = []
  >>> for i, svc_rec in enumerate(registry.search(datamodel="obscore", servicetype="tap")):
  ...     # print("Querying {}".format(svc_rec.res_title))
  ...     try:
  ...         svc = svc_rec.get_service(service_type="tap", lax=True)
  ...         results.append(
  ...             svc.run_sync(QUERY).to_table())
  ...     except KeyboardInterrupt:
  ...         # someone lost their patience with a service.  Query next.
  ...         pass
  ...     except Exception as msg:
  ...         # some service is broken; you *should* complain, for instance:
  ...         # print("  Broken: {} ({}).  Complain to {}.\n".format(
  ...         #     svc_rec.ivoid, msg, svc_rec.get_contact()))
  ...         pass
  ...     if i == 5:
  ...         break
  >>> total_result = vstack(results)  # doctest: +IGNORE_WARNINGS
  >>> total_result  # doctest: +IGNORE_OUTPUT
         s_ra               s_dec
         deg                 deg
       float64             float64
  ------------------ -------------------
            350.4619            -9.76139
    208.360833592735    52.3611106494996
    148.204840298431    29.1690999975089
          243.044008          -51.778222
  321.63278049999997 -54.579285999999996
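
The loop above silently drops failing services.  A variation is to
collect the failures together with the contact information, so that you
can report them once the run is over.  The following is only a sketch of
that pattern under stated assumptions: ``query_all``, ``FakeService``
and ``FakeRecord`` are hypothetical names introduced here, and the fake
classes merely imitate the ``get_service``, ``get_contact``, ``ivoid``
and ``run_sync`` members used above so that the example runs without
network access.

```python
def query_all(service_records, query):
    """Run ``query`` on every record; return (results, failures).

    Each record is assumed to offer ``get_service(service_type=..., lax=...)``,
    ``get_contact()`` and an ``ivoid`` attribute, as in the loop above.
    """
    results, failures = [], []
    for rec in service_records:
        try:
            svc = rec.get_service(service_type="tap", lax=True)
            results.append(svc.run_sync(query))
        except KeyboardInterrupt:
            # Someone lost patience with this service; note it and move on.
            failures.append((rec.ivoid, "interrupted", rec.get_contact()))
        except Exception as exc:
            # Some service is broken; remember whom to complain to.
            failures.append((rec.ivoid, str(exc), rec.get_contact()))
    return results, failures


# --- Hypothetical stand-ins (not pyVO classes) for an offline demo ---
class FakeService:
    def __init__(self, rows=None, error=None):
        self.rows, self.error = rows, error

    def run_sync(self, query):
        if self.error is not None:
            raise self.error
        return self.rows


class FakeRecord:
    def __init__(self, ivoid, service):
        self.ivoid, self._service = ivoid, service

    def get_service(self, service_type=None, lax=False):
        return self._service

    def get_contact(self):
        return "operator of " + self.ivoid


records = [
    FakeRecord("ivo://example/a", FakeService(rows=[(350.46, -9.76)])),
    FakeRecord("ivo://example/b", FakeService(error=RuntimeError("timeout"))),
    FakeRecord("ivo://example/c", FakeService(rows=[(208.36, 52.36)])),
]

results, failures = query_all(
    records, "SELECT TOP 1 s_ra, s_dec FROM ivoa.obscore")
for ivoid, reason, contact in failures:
    print("Broken: {} ({}). Complain to {}.".format(ivoid, reason, contact))
```

With real registry results you would pass
``registry.search(datamodel="obscore", servicetype="tap")`` as
``service_records`` and convert each result with ``to_table()`` before
stacking, as in the doctest above.
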