Handling Permissions

As per the Beacon specification, there are three types of permissions:

  • PUBLIC - data available to anyone;
  • REGISTERED - data available to users registered with a service that grants special credentials, e.g. ELIXIR bona_fide or researcher status. Requires a JWT token;
  • CONTROLLED - data available to users who have been granted access to a protected resource by a Data Access Committee (DAC).

Note

This page illustrates permissions according to the GA4GH Authentication and Authorization Infrastructure (AAI) OpenID Connect Profile.

Registered Data

To retrieve REGISTERED permissions, the token is forwarded to another server (e.g. the ELIXIR userinfo endpoint), which validates that the token belongs to a registered user and returns a JSON message containing data on the Bona Fide status. Custom servers can be set up to mimic this functionality.

    {
        "ga4gh_visa_v1": {
            "type": "ResearcherStatus",
            "value": "https://doi.org/10.1038/s41431-018-0219-y",
            "source": "https://ga4gh.org/duri/no_org",
            "by": "peer",
            "asserted": 1539017776,
            "expires": 1593165413
        }
    }
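
The forwarding step described above could be sketched roughly as follows. This is only an illustration using the Python standard library: the endpoint URL and the helper name build_userinfo_request are hypothetical, and the request is prepared but not sent.

```python
from urllib.request import Request

# Hypothetical endpoint; substitute the userinfo URL of your AAI provider.
USERINFO_URL = "https://aai.example.org/oidc/userinfo"


def build_userinfo_request(token: str, url: str = USERINFO_URL) -> Request:
    """Prepare a GET request that forwards the bearer token to the userinfo endpoint."""
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


userinfo_request = build_userinfo_request("example.jwt.token")
# Sending it would be done with urllib.request.urlopen(userinfo_request),
# after which the JSON body can be parsed with json.load().
```

The token travels in the Authorization header as a bearer token, so the receiving server can validate it and reply with the user's claims.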

The function below then checks for the existence of the ga4gh.AcceptedTermsAndPolicies and ga4gh.ResearcherStatus keys, which indicate that the user has agreed to follow ethical researcher practices and has been recognised by another esteemed researcher.

async def get_ga4gh_bona_fide(passports):
    """Retrieve Bona Fide status from GA4GH JWT claim."""
    LOG.info("Parsing GA4GH bona fide claims.")

    # User must have agreed to terms, and been recognized by a peer to be granted Bona Fide status
    terms = False
    status = False

    for passport in passports:
        # Check for the `type` of visa to determine if to look for `terms` or `status`
        #
        # CHECK FOR TERMS
        passport_type = passport[2].get('ga4gh_visa_v1', {}).get('type')
        passport_value = passport[2].get('ga4gh_visa_v1', {}).get('value')
        if passport_type == 'AcceptedTermsAndPolicies' and passport_value == OAUTH2_CONFIG.bona_fide_value:
            # This passport has the correct type and value, next step is to validate it
            #
            # Decode passport and validate its contents
            # If the validation passes, terms will be set to True
            # If the validation fails, an exception will be raised
            # (and ignored since it's not fatal), and terms will remain False
            await validate_passport(passport)
            # The token is validated, therefore the terms are accepted
            terms = True
        #
        # CHECK FOR STATUS
        if passport_value == OAUTH2_CONFIG.bona_fide_value and passport_type == 'ResearcherStatus':
            # Check if the visa contains a bona fide value
            # This passport has the correct type and value, next step is to validate it
            #
            # Decode passport and validate its contents
            # If the validation passes, status will be set to True
            # If the validation fails, an exception will be raised
            # (and ignored since it's not fatal), and status will remain False
            await validate_passport(passport)
            # The token is validated, therefore the status is accepted
            status = True

    # User has agreed to terms and has been recognized by a peer, return True for Bona Fide status
    return terms and status

Note

The ga4gh.AcceptedTermsAndPolicies and ga4gh.ResearcherStatus keys’ values must be equal to those mandated by GA4GH.
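
To make the rule concrete, here is a minimal, self-contained sketch of the same decision applied to already-decoded visa payloads. It assumes the bona fide value shown in the sample visa earlier on this page and skips signature validation entirely; the helper name has_bona_fide_status is hypothetical and not part of the beacon code.

```python
# Value taken from the sample visa on this page; real deployments read it from configuration.
BONA_FIDE_VALUE = "https://doi.org/10.1038/s41431-018-0219-y"


def has_bona_fide_status(visas, bona_fide_value=BONA_FIDE_VALUE):
    """Return True only if both required visa types carry the expected value."""
    found = set()
    for visa in visas:
        claim = visa.get("ga4gh_visa_v1", {})
        # Only visas whose value matches the mandated one count
        if claim.get("value") == bona_fide_value:
            found.add(claim.get("type"))
    return {"AcceptedTermsAndPolicies", "ResearcherStatus"} <= found


terms_visa = {"ga4gh_visa_v1": {"type": "AcceptedTermsAndPolicies", "value": BONA_FIDE_VALUE}}
status_visa = {"ga4gh_visa_v1": {"type": "ResearcherStatus", "value": BONA_FIDE_VALUE}}
```

A user presenting only one of the two visas, or a visa with a different value, is not treated as Bona Fide in this sketch.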

Controlled Data

Note

See https://tools.ietf.org/html/rfc7519 for more information on claims and JWT. A short introduction to JSON Web Tokens is available at: https://jwt.io/introduction/
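
For readers new to JWTs, the sketch below shows how the claims sit inside the token: three base64url segments (header, payload, signature) joined by dots. It uses only the standard library, builds an unsigned demo token with made-up claims, and deliberately does not verify the signature, so it is suitable for inspection only and never for authorization.

```python
import base64
import json


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop the base64 padding, so restore it before decoding
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def peek_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(_b64url_decode(payload))


# Unsigned demo token with made-up claims (hypothetical issuer and subject)
demo_token = ".".join([
    _b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode()),
    _b64url_encode(json.dumps({"iss": "https://aai.example.org", "sub": "user@example.org"}).encode()),
    "",
])
claims = peek_jwt_claims(demo_token)
```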

In order to retrieve permissions for the CONTROLLED datasets via a JWT token, we added a permissions module, beacon_api.permissions(), that acts as a platform where add-ons can be placed for processing different styles of permissions claims.

The main reason for handling dataset permissions this way is that there is no standard way of delivering access to datasets via JWT tokens, and each AAI authority provides different claims with different structures.

By default we include the beacon_api.permissions.ga4gh() add-on, which offers the means to retrieve permissions following the GA4GH format via a token provided by ELIXIR AAI.

If a token contains the ga4gh_userinfo_claims JWT claim with ga4gh.ControlledAccessGrants, these are parsed and retrieved as illustrated below:

async def get_ga4gh_controlled(passports):
    """Retrieve dataset permissions from GA4GH passport visas."""
    LOG.info("Parsing GA4GH dataset permissions.")
    # We only want to get datasets once, thus the set which prevents duplicates
    datasets = set()

    for passport in passports:
        # Decode passport and validate its contents
        validated_passport = await validate_passport(passport)
        # Extract dataset id from validated passport
        # The dataset value will be of form `https://institution.org/urn:dataset:1000`
        # the extracted dataset will always be the last list element when split with `/`
        dataset = validated_passport.get('ga4gh_visa_v1', {}).get('value').split('/')[-1]
        # Add dataset to set
        datasets.add(dataset)

    return datasets
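
The extraction step in the function above can be tried in isolation. The sketch below works on already-validated visa payloads and adds two defensive checks that are assumptions on our part (a type check and a guard against a missing value); the helper name dataset_from_visa is hypothetical.

```python
def dataset_from_visa(visa: dict):
    """Return the dataset ID from a ControlledAccessGrants visa, or None."""
    claim = visa.get("ga4gh_visa_v1", {})
    if claim.get("type") != "ControlledAccessGrants":
        return None
    value = claim.get("value")
    if not value:
        return None
    # The dataset ID is the last element when the value is split on `/`
    return value.rstrip("/").split("/")[-1]


visas = [
    {"ga4gh_visa_v1": {"type": "ControlledAccessGrants",
                       "value": "https://institution.org/urn:dataset:1000"}},
    {"ga4gh_visa_v1": {"type": "ControlledAccessGrants",
                       "value": "https://institution.org/urn:dataset:1000"}},
    {"ga4gh_visa_v1": {"type": "ResearcherStatus",
                       "value": "https://doi.org/10.1038/s41431-018-0219-y"}},
]
# A set keeps each dataset only once, as in the function above
datasets = {d for d in map(dataset_from_visa, visas) if d is not None}
```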

The permissions are then attached to the request in beacon_api.utils.validate(), as illustrated below:

# for now the permissions just reflects that the data can be decoded from token
# the bona fide status is checked against ELIXIR AAI by default or the URL from config
# the bona_fide_status is specific to ELIXIR Tokens
# Retrieve GA4GH Passports from /userinfo and process them into dataset permissions and bona fide status
dataset_permissions, bona_fide_status = set(), False
dataset_permissions, bona_fide_status = await check_ga4gh_token(decoded_data, token, bona_fide_status, dataset_permissions)
# currently we offer module for parsing GA4GH permissions, but multiple claims and providers can be utilised
# by updating the set, meaning replicating the line below with the permissions function and its associated claim
# For GA4GH DURI permissions (ELIXIR Permissions API 2.0)
controlled_datasets = set()
controlled_datasets.update(dataset_permissions)
all_controlled = list(controlled_datasets) if bool(controlled_datasets) else None
request["token"] = {"bona_fide_status": bona_fide_status,
                    # permissions key will hold the actual permissions found in the token/userinfo e.g. GA4GH permissions
                    "permissions": all_controlled,
                    # additional checks can be performed against this authenticated key
                    # currently if a token is valid that means request is authenticated
                    "authenticated": True}

If the token carries no GA4GH permissions claim, nothing is added to controlled_datasets.

More datasets can be added to the controlled_datasets set by updating it:

controlled_datasets.update(custom_add_on())

where custom_add_on() is a function one could add in beacon_api.permissions().

An example of such a function is beacon_api.permissions.ga4gh(), which parses one specific JWT claim.
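
As a rough sketch of what such an add-on might look like, the example below reads dataset IDs from a made-up claim and merges them with permissions from another source. The claim name example_datasets and the function signature are hypothetical; only the update-the-set pattern comes from the code above.

```python
def custom_add_on(decoded_claims: dict) -> set:
    """Hypothetical add-on: read dataset IDs from a made-up `example_datasets` claim."""
    return set(decoded_claims.get("example_datasets", []))


controlled_datasets = set()
# Permissions from one source, e.g. GA4GH visas
controlled_datasets.update({"urn:dataset:1000"})
# Permissions from the custom add-on; duplicates collapse because this is a set
controlled_datasets.update(custom_add_on(
    {"example_datasets": ["urn:dataset:2000", "urn:dataset:1000"]}
))
# Same convention as above: None when no permissions were found
all_controlled = sorted(controlled_datasets) if controlled_datasets else None
```

Returning a set from each add-on keeps the merge order-independent and lets several providers contribute permissions to the same request.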

Attention

The JWT is validated against an AAI OAuth2 signing authority using its public key. This public key can be provided either by a JWK server or via the environment variable PUBLIC_KEY. See also: OAuth2 Configuration.
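
The two key sources could be combined along the following lines. This is a sketch under assumptions: the order of precedence (environment variable first, JWK server second) and the helper name get_signing_key are ours, and the JWK response is returned as parsed JSON without further processing.

```python
import json
import os
from urllib.request import urlopen


def get_signing_key(jwk_url=None, environ=None):
    """Return the public key from PUBLIC_KEY, else fetch it from a JWK server."""
    environ = os.environ if environ is None else environ
    key = environ.get("PUBLIC_KEY")
    if key:
        # Assumed precedence: an explicitly configured key wins
        return key
    if jwk_url is None:
        raise RuntimeError("Set PUBLIC_KEY or provide a JWK server URL")
    with urlopen(jwk_url) as response:
        return json.load(response)


# Passing a mapping instead of os.environ keeps the example side-effect free
key = get_signing_key(environ={"PUBLIC_KEY": "-----BEGIN PUBLIC KEY-----..."})
```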

Access Resolution

In the tables below we illustrate how the beacon server handles access to datasets. We have integrated tests for these use cases that can be found at: beacon-python GitHub deploy tests.

Table Legend

  • colours:

    • green is for PUBLIC datasets;
    • orange is for REGISTERED datasets;
    • red is for CONTROLLED datasets;
    • blue is for errors in retrieving datasets, currently done via HTTP error statuses;
  • [] - all available datasets are requested;

  • if a cell is empty it means no datasets are requested;

  • ✓ - is used to represent that:

    • a JWT TOKEN is present in the request - used for retrieving CONTROLLED datasets from the JWT claim;
    • a user’s BONA FIDE status can be retrieved - used for REGISTERED datasets;
    • if the ✓ is not present, then (depending on the column) there is no TOKEN or the BONA FIDE status is not provided;
  • the PERMISSIONS column reflects the dataset permissions found in the JWT TOKEN claim; if the column is empty, that claim contains no datasets.

Default cases (no dataset IDs specified)

Most queries to the beacon do not specify dataset IDs, meaning the request does not contain the datasetIds parameter. For such cases we handle permissions as illustrated below.

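+----------------------------------+----------------------------------------------------+
| Requested datasets               | DB: 1, 2, 3, 4, 5, 6                               |
+--------+------------+------------+-------+-------------+-----------+------------------+
| PUBLIC | REGISTERED | CONTROLLED | TOKEN | PERMISSIONS | BONA FIDE | RESPONSE         |
+========+============+============+=======+=============+===========+==================+
| []     | []         | []         |       |             |           | 1, 2             |
| []     | []         | []         | ✓     |             |           | 1, 2             |
| []     | []         | []         | ✓     |             | ✓         | 1, 2, 3, 4       |
| []     | []         | []         | ✓     | 5, 6        |           | 1, 2, 5, 6       |
| []     | []         | []         | ✓     | 5, 6        | ✓         | 1, 2, 3, 4, 5, 6 |
+--------+------------+------------+-------+-------------+-----------+------------------+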

Specific cases (dataset IDs specified)

For cases in which the dataset IDs are specified we handle permissions as in the table below.

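+----------------------------------+----------------------------------------------------+
| Requested datasets               | DB: 1, 2, 3, 4, 5, 6                               |
+--------+------------+------------+-------+-------------+-----------+------------------+
| PUBLIC | REGISTERED | CONTROLLED | TOKEN | PERMISSIONS | BONA FIDE | RESPONSE         |
+========+============+============+=======+=============+===========+==================+
|        |            | 5, 6       | ✓     | 5           |           | 5                |
| 1      |            | 5          |       |             |           | 1                |
|        | 4          | 7          | ✓     |             | ✓         | 4                |
|        | 3          |            |       |             |           | 401 Unauthorized |
|        |            | 5          |       |             |           | 401 Unauthorized |
|        | 4          |            | ✓     |             |           | 403 Forbidden    |
|        |            | 6          | ✓     | 7           |           | 403 Forbidden    |
| 2      |            | 6          | ✓     | 7           |           | 2                |
+--------+------------+------------+-------+-------------+-----------+------------------+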
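
The rules behind both tables can be condensed into a short sketch. It is not the server's implementation: the split of datasets into 1, 2 (PUBLIC), 3, 4 (REGISTERED) and 5, 6 (CONTROLLED) is inferred from the RESPONSE columns, and the function name resolve_access and its return shape are hypothetical.

```python
# Dataset split inferred from the RESPONSE columns of the tables (an assumption)
PUBLIC = {1, 2}
REGISTERED = {3, 4}
CONTROLLED = {5, 6}


def resolve_access(requested=None, token=False, permissions=(), bona_fide=False):
    """Return (HTTP status, sorted accessible dataset IDs).

    `requested=None` models a query without datasetIds (all datasets requested).
    """
    everything = PUBLIC | REGISTERED | CONTROLLED
    wanted = everything if requested is None else set(requested)
    # Public datasets are always accessible
    granted = wanted & PUBLIC
    # Registered datasets need a token and Bona Fide status
    if token and bona_fide:
        granted |= wanted & REGISTERED
    # Controlled datasets need a token whose permissions list them
    if token:
        granted |= wanted & CONTROLLED & set(permissions)
    if granted:
        return 200, sorted(granted)
    # Nothing accessible: 401 without a token, 403 with one
    return (403 if token else 401), []
```

For example, resolve_access([2, 6], token=True, permissions=[7]) yields (200, [2]), matching the last row of the table: the public dataset is returned even though the controlled one is refused.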