Content syndication

With content syndication, data can be seamlessly moved between different Snow Owl Terminology Server deployments.

This functionality is useful when content created in a central deployment (upstream) needs to be distributed to one or more read-only downstream instances. Resource distribution is uni-directional and semi-automated: an operator has to configure each new downstream instance before it can receive data from the central unit.

Configure upstream

To be able to access the upstream server and its content the following items are required:

  • the HTTP port of Elasticsearch has to be accessible for the downstream Snow Owl and Elasticsearch instances (configured via the http.port property, the default is 9200)

  • the REST API of Snow Owl has to be accessible for the downstream Snow Owl servers

  • an Elasticsearch API key with sufficient privileges for authentication and authorization

  • a Snow Owl API key with sufficient privileges for authentication and authorization

  • the selected terminology resources have to be configured as distributable

Access Elasticsearch

In case Snow Owl uses a self-hosted Elasticsearch instance the HTTP port can be opened by modifying the container settings in the docker-compose.yml file. Make sure to remove the localhost IP prefix from the port declaration:

docker-compose.yml
...
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELASTICSEARCH_VERSION}
    container_name: elasticsearch
...
    ports:
-      - "127.0.0.1:9200:9200"
+      - "9200:9200"

When opening up a self-hosted Elasticsearch instance, make sure to strengthen its security with secure HTTP (HTTPS) and username/password access.

A detailed guide on Elasticsearch security can be found here.

In the case of a hosted Elasticsearch instance there is nothing to do; it will already be accessible from outside.

Access Snow Owl

The default reverse proxy configuration (shipped in the released package) exposes the Snow Owl REST API via the URL: http(s)://upstream-snow-owl-url/snowowl

Other than that no additional configuration is needed.

Obtain an Elasticsearch API key

Creating a new API key for Elasticsearch is possible either through its API key API or - in the case of a hosted instance - from within Kibana.

The content syndication operation requires the following permissions:

  • cluster privilege: monitor

  • index privilege: read

Here is an example request body for the API key API:

POST /_security/api_key
{
  "name": "syndication-api-key",
  "expiration": "30d",
  "role_descriptors": { 
    "syndicate-role": {
      "cluster": [
        "monitor"
      ],
      "indices": [
        {
          "names": [
            "*"
          ],
          "privileges": [
            "read"
          ]
        }
      ]
    }
  }
}

This request returns the following response:

{
  "id" : "<token_id>",
  "name" : "syndication-api-key",
  "expiration" : 0,
  "api_key" : "<api_key>",
  "encoded" : "<encoded_api_key>"
}

Take note of the encoded API Key, which is the one that will be used later on.
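
As an illustration of how the encoded key is used: Elasticsearch accepts API keys in the Authorization header with the ApiKey scheme, and the encoded value is the Base64 form of id:api_key. The following is a minimal Python sketch; the function names and the placeholder credentials are hypothetical and not part of Snow Owl or Elasticsearch.

```python
import base64


def encode_api_key(key_id: str, api_key: str) -> str:
    """Rebuild the `encoded` value from the `id` and `api_key` response fields."""
    raw = f"{key_id}:{api_key}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def authorization_header(encoded_api_key: str) -> dict:
    """Build the HTTP header that authenticates a request with an API key."""
    return {"Authorization": f"ApiKey {encoded_api_key}"}


# Placeholder values, not real credentials.
encoded = encode_api_key("my-key-id", "my-secret")
print(authorization_header(encoded))
```

In practice the encoded field of the response can be used directly; rebuilding it from id and api_key is only needed when the encoded value was not saved.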

To obtain an API key using Kibana, follow this guide with the same settings from above.

Obtain a Snow Owl API Key

To request an API key from the upstream Snow Owl Terminology Server the following REST API endpoint must be used:

Request an API key

POST https://upstream-snow-owl-url/snowowl/token

Request Body

Name          Type           Description
username*     String         The username to authenticate with
password*     String         The password belonging to the username
token         String         Previous token to renew
expiration    String         Expiration interval, e.g. 1d or 2h
permissions   List<String>   List of permissions

{
    "token": "<snow-owl-api-key>"
}
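
A minimal Python sketch of preparing this request with the standard library follows. It assumes the request body is sent as JSON, which the parameter table above does not state explicitly; the helper name build_token_request and the sample credentials are hypothetical.

```python
import json
import urllib.request


def build_token_request(base_url, username, password, expiration=None):
    """Prepare (but do not send) a POST <base_url>/snowowl/token request."""
    body = {"username": username, "password": password}
    if expiration is not None:
        body["expiration"] = expiration  # e.g. "1d" or "2h"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/snowowl/token",
        data=json.dumps(body).encode("utf-8"),  # assumption: JSON request body
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_token_request("https://upstream-snow-owl-url", "user", "secret", "30d")
print(request.get_method(), request.full_url)
```

Sending the prepared request with urllib.request.urlopen(request) and reading the token field from the JSON response would yield the API key.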

Select distributable resources

All three major terminology resource types can be configured as distributable. Resources have a settings map that can be updated via their specific REST API endpoints:

  • PUT /codesystems/{codeSystemId}

  • PUT /valuesets/{valueSetId}

  • PUT /conceptmaps/{conceptMapId}

A setting called distributable has to be set with a value of either true or false. Here is an example update request to make the 'Example Code System' distributable:

PUT /codesystems/example_codesystem_id
{
  "settings": {
    "distributable": true
  }
}
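
The same update can be prepared for any of the three resource types. Below is a small Python sketch that derives the method, path and body from the endpoints listed above; the short type names and the function name are hypothetical, chosen only for this example.

```python
import json

# Endpoint prefixes from the list above, keyed by a hypothetical short type name.
ENDPOINTS = {
    "codesystem": "codesystems",
    "valueset": "valuesets",
    "conceptmap": "conceptmaps",
}


def distributable_update(resource_type, resource_id, distributable=True):
    """Return (method, path, body) for toggling the distributable setting."""
    if resource_type not in ENDPOINTS:
        raise ValueError(f"Unknown resource type: {resource_type!r}")
    path = f"/{ENDPOINTS[resource_type]}/{resource_id}"
    body = json.dumps({"settings": {"distributable": distributable}})
    return "PUT", path, body


print(distributable_update("codesystem", "example_codesystem_id"))
```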

Configure downstream

Elasticsearch

There is one configuration property that must be set before provisioning a new downstream Snow Owl Terminology Server.

Any potential upstream Elasticsearch instance must be listed as an allowed source of information for the downstream Elasticsearch instances via a configuration parameter in the elasticsearch.yml file.

The property is called reindex.remote.whitelist:

elasticsearch.yml
...
http.port: 9200
...
reindex.remote.whitelist: ["upstream-elasticsearch-url.com:9200", "other-upstream-elasticsearch-url.com:9200"]

The whitelisted URL must contain the upstream HTTP port and must not contain the scheme.
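
Because the whitelist format differs from the full URL used elsewhere in this guide, deriving one from the other can avoid mistakes. This is a minimal Python sketch; the function name is hypothetical, and it deliberately rejects URLs without an explicit port instead of guessing one.

```python
from urllib.parse import urlsplit


def to_whitelist_entry(url: str) -> str:
    """Turn an Elasticsearch URL into a host:port whitelist entry (no scheme)."""
    # Prefix scheme-less input with "//" so urlsplit treats it as a network location.
    parts = urlsplit(url if "://" in url else f"//{url}")
    if not parts.hostname:
        raise ValueError(f"No host found in {url!r}")
    if parts.port is None:
        raise ValueError(f"The upstream HTTP port must be given explicitly: {url!r}")
    return f"{parts.hostname}:{parts.port}"


print(to_whitelist_entry("https://upstream-elasticsearch-url.com:9200"))
```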

Provision a new downstream server

Provisioning a new downstream server consists of the following steps:

  • start with an empty dataset

  • collect all terminology resource identifiers that need to be syndicated

  • get all the necessary credentials to communicate with upstream

  • initiate the resource syndication and verify the result

Collect terminology resources for syndication

To populate a downstream server with terminology resources via an upstream source, one must collect the required resource identifiers or resource version identifiers beforehand.

Resource identifiers must be in their simple form, e.g.:

  • SNOMED-CT

  • ICD-10

  • LOINC

Resource version identifiers must be in the following form: <resource_id>/<version_id>, e.g.:

  • SNOMED-CT/2020-01-31

  • ICD-10/v2019

  • LOINC/v2.72
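
The two identifier forms can be told apart by the presence of a single slash. A short Python sketch of sorting a mixed list into the two groups is shown below; the function name is hypothetical, and the check is purely syntactic (it does not verify that a resource exists upstream).

```python
def split_identifiers(identifiers):
    """Separate simple resource identifiers from <resource_id>/<version_id> ones."""
    resources, versions = [], []
    for identifier in identifiers:
        parts = identifier.split("/")
        if len(parts) == 1 and parts[0]:
            resources.append(identifier)
        elif len(parts) == 2 and all(parts):
            versions.append(identifier)
        else:
            raise ValueError(f"Unexpected identifier format: {identifier!r}")
    return resources, versions


print(split_identifiers(["SNOMED-CT", "ICD-10/v2019", "LOINC/v2.72"]))
```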

To determine which resources are available for syndication, the following upstream REST API endpoint can be used. It returns an Atom feed of resource versions from which the required identifiers can be collected.

Retrieve syndication resource feed

GET https://upstream-snow-owl-url/snowowl/syndication/feed.xml

Retrieves the feed of all distributable resources

Query Parameters

Name            Type           Description
resource        List<String>   The resource identifier(s) to include in the feed
resourceType    List<String>   The types of resources to include in the feed (e.g. conceptmaps, valuesets, codesystems)
resourceUrl     List<String>   The URLs of the resources to include in the feed
packageTypes    List<String>   The types of packages to include in the feed. Only BINARY is supported at the moment
effectiveTime   String         The effective time value to match (yyyyMMdd) or an effective time range value to match (yyyyMMdd...yyyyMMdd), inclusive range
createdAt       Long           Exact match filter for the resource version created at field
createdAtFrom   Long           Greater than or equal to filter for the resource version created at field
createdAtTo     String         Less than or equal to filter for the resource version created at field
limit*          int            The maximum number of items to return

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>urn:uuid:ddce3cd6-2efe-3142-9cce-62e73d3031ca</id>
  <title>Snow Owl® Terminology Server Syndication Feed</title>
  ...
  <entry>
    <id>valuesets/1234/V1.0</id>
    ...
    <title>Valueset example</title>
    <category term="BINARY" scheme="https://b2ihealthcare.com/snow-owl/syndication/binary/1.0.0" label="Binary index"/>
    ...
  </entry>
</feed>
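
A minimal Python sketch of collecting the entry identifiers from such a feed with the standard library follows. The sample document is a shortened copy of the response above, and the function name is hypothetical; only the Atom elements visible in the example (id, title, category) are read.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Shortened copy of the example response above.
SAMPLE_FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>urn:uuid:ddce3cd6-2efe-3142-9cce-62e73d3031ca</id>
  <entry>
    <id>valuesets/1234/V1.0</id>
    <title>Valueset example</title>
    <category term="BINARY" label="Binary index"/>
  </entry>
</feed>"""


def feed_entries(feed_xml: bytes):
    """Return id, title and package types for every entry in the feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        entries.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM),
            "packages": [c.get("term") for c in entry.findall("atom:category", ATOM)],
        })
    return entries


print(feed_entries(SAMPLE_FEED))
```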

It is not required to list all resource version identifiers for an already selected resource. E.g.:

  • If SNOMED-CT is selected as a resource, it is not required to select all its versions among the resource version identifiers.

  • If a specific version is selected (SNOMED-CT/2020-01-31) and the resource is not listed among the selected resources, then only versions created until 2020-01-31 will be syndicated.

Syndicate resources

To kick off a syndication process the following parameters are required:

  • the list of resource identifiers

  • the list of resource version identifiers

  • the upstream Snow Owl URL without its REST API root context:

    • e.g. https://upstream-snow-owl-url.com

  • the API key to authenticate with the upstream Snow Owl server

  • the upstream Elasticsearch URL, including the scheme and port:

    • e.g. https://upstream-elasticsearch-url.com:9200

  • the API key to authenticate with the upstream Elasticsearch

When there are no existing resources on the downstream server yet, at least one resource identifier or one resource version identifier must be selected.

Snow Owl resolves all resource dependencies and handles syndication requests strictly. If, for example, a Value Set depends on a specific SNOMED CT version and that version is not among the selected resources - or does not exist on the downstream server yet - the syndication run will fail, noting that a dependency is missing. It is always required to list all dependencies of the selected resources for a given syndication run.

The above parameters should be fed to the following downstream Snow Owl REST API endpoint:

Syndicate resource(s)

POST https://downstream-snow-owl-url/snowowl/syndication/syndicate

Syndicate resources from a remote Snow Owl instance. In case no resource identifiers are provided, all existing resources will be syndicated to their latest version.

Request Body

Name                 Type           Description
resource             List<String>   List of resource identifiers
version              List<String>   List of resource version identifiers
upstreamUrl*         String         The URL of the upstream Snow Owl
upstreamToken*       String         API key for the upstream Snow Owl
upstreamDataUrl*     String         The URL of the upstream Elasticsearch
upstreamDataToken*   String         API key for the upstream Elasticsearch
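
The parameters above can be assembled and checked before the request is sent. The Python sketch below assumes the body is JSON and that the list-typed parameters are sent as JSON arrays; neither is confirmed by the table, and the function name and sample values are hypothetical.

```python
import json


def build_syndication_body(upstream_url, upstream_token, upstream_data_url,
                           upstream_data_token, resources=(), versions=()):
    """Assemble the syndication request body and check the required fields."""
    required = {
        "upstreamUrl": upstream_url,
        "upstreamToken": upstream_token,
        "upstreamDataUrl": upstream_data_url,
        "upstreamDataToken": upstream_data_token,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"Missing required parameter(s): {', '.join(missing)}")
    body = dict(required)
    # Optional selectors; omitted entirely when empty.
    if resources:
        body["resource"] = list(resources)
    if versions:
        body["version"] = list(versions)
    return body


body = build_syndication_body(
    "https://upstream-snow-owl-url.com", "<snow-owl-api-key>",
    "https://upstream-elasticsearch-url.com:9200", "<encoded_api_key>",
    resources=["SNOMED-CT-US"], versions=["SNOMED-CT/2021-01-31"],
)
print(json.dumps(body, indent=2))
```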

The syndication process starts in the background as an asynchronous job. It can be tracked by calling the following endpoint with the job identifier returned in the Location response header:

Retrieve syndication job

GET https://downstream-snow-owl-url/snowowl/syndication/{id}

Returns the specified syndication run's configuration and status.

Path Parameters

Name   Type     Description
id*    String   The identifier of a syndication run

{
    // Response
}

The returned result object will contain all information related to the given syndication run:

  • status of the run (RUNNING, FINISHED, FAILED)

  • list of successfully syndicated resource versions

  • additional details about created or updated Elasticsearch indices
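
Since the job runs asynchronously, a client typically polls until the run leaves the RUNNING state. Below is a minimal Python sketch of such a loop. How the status is fetched is left to a caller-supplied function, because the exact response layout is not shown here; the function name, the polling interval and the simulated statuses are assumptions made for this example.

```python
import time


def wait_for_syndication(fetch_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the run reports FINISHED or FAILED, then return that status.

    fetch_status is a caller-supplied function that calls
    GET /snowowl/syndication/{id} and returns the run's status string.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in ("FINISHED", "FAILED"):
            return status
        if status != "RUNNING":
            raise ValueError(f"Unexpected status: {status!r}")
        sleep(poll_seconds)
    raise TimeoutError("Syndication run did not complete within the polling limit")


# Simulated run: two RUNNING responses followed by FINISHED (no real server involved).
statuses = iter(["RUNNING", "RUNNING", "FINISHED"])
final_status = wait_for_syndication(lambda: next(statuses), sleep=lambda seconds: None)
print(final_status)
```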

Examples of resource selection

Code Systems

There is a need to syndicate the SNOMED-CT US extension. It depends on the SNOMED CT International version 2021-01-31. Provide the following resource identifier and resource version identifier configuration:

{
  "resource": "SNOMED-CT-US",
  "version": "SNOMED-CT/2021-01-31"
}

This will syndicate all versions of SNOMED-CT-US and all international versions until 2021-01-31.

If the configuration is changed to:

{
  "resource": "SNOMED-CT-US, SNOMED-CT",
  "version": ""
}

This will syndicate all versions of SNOMED-CT-US and SNOMED-CT international, including all international versions even after 2021-01-31.

Value Sets

There is a Value Set with an identifier of VS and members from SNOMED-CT/2020-07-31:

{
  "resource": "VS",
  "version": "SNOMED-CT/2020-07-31"
}

Concept Maps

There is a Concept Map with an identifier of CM mapping concepts between LOINC/v2.72 and ICD-10/v2019:

{
  "resource": "CM",
  "version": "LOINC/v2.72, ICD-10/v2019"
}

Keeping a downstream server up-to-date

If a given downstream server already contains the desired resources and the goal is to keep the content up-to-date, it is not required to fill in the resource and resource version identifiers for the syndication request.

One can call the POST /syndication/syndicate endpoint with all the credentials and URLs but without specifying any resource or version identifier. The server will automatically determine - based on the set of existing downstream resources - if there are any new resource versions available for syndication.

To check whether any updates are available, call the following endpoint:

Retrieve a list of resource versions which are available for syndication

GET https://downstream-snow-owl-url/snowowl/syndication/list

Returns the full list of resource versions to be syndicated based on the search criteria. If no filters are provided, updates are calculated for all existing resources.

Query Parameters

Name             Type           Description
resource         List<String>   The resource identifier(s) to syndicate, e.g. SNOMEDCT (== latest version)
version          List<String>   The version identifier(s) to syndicate, e.g. SNOMEDCT/2022-07-31
upstreamUrl*     String         The URL of the upstream Snow Owl server
upstreamToken*   String         The token to authenticate with the upstream Snow Owl server
limit*           int            The number of resource versions to return if there are any

{
    "items": [
        {
            "id": "SNOMED-CT/2022-01-31",
            "version": "2022-01-31",
            "description": "2022-01-31",
            "effectiveTime": "2022-01-31",
            "resource": "codesystems/SNOMED-CT"
        },
        {
            "id": "SNOMED-CT/2022-07-31",
            "version": "2022-07-31",
            "description": "2022-07-31",
            "effectiveTime": "2022-07-31",
            "resource": "codesystems/SNOMED-CT"
        }
    ],
    "limit": 50,
    "total": 2
}

If there are any updates, this endpoint returns a list of versions; if there are none, it returns an empty result.
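
To see at a glance which resources have pending updates, the response can be grouped by resource. This is a minimal Python sketch based only on the fields visible in the example response (items, resource, version); the function name is hypothetical and the sample is a shortened copy of that response.

```python
import json
from collections import defaultdict

# Shortened copy of the example response above.
SAMPLE_RESPONSE = """
{
    "items": [
        {"id": "SNOMED-CT/2022-01-31", "version": "2022-01-31", "resource": "codesystems/SNOMED-CT"},
        {"id": "SNOMED-CT/2022-07-31", "version": "2022-07-31", "resource": "codesystems/SNOMED-CT"}
    ],
    "limit": 50,
    "total": 2
}
"""


def updates_by_resource(response_json: str) -> dict:
    """Group the available version identifiers by the resource they belong to."""
    response = json.loads(response_json)
    grouped = defaultdict(list)
    for item in response.get("items", []):
        grouped[item["resource"]].append(item["version"])
    return dict(grouped)


print(updates_by_resource(SAMPLE_RESPONSE))
```

An empty dictionary means that no new resource versions are available for syndication.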
