Checks Lifecycle: Deep Dive

This section explains how a background check unfolds over time and which circumstances affect its execution.

This is a complex section that uses terms defined earlier. We recommend reading the Understanding Background Checks and the Datasets, Scores and Types sections before this one.

Check Statuses: Overview

Every check has a status that indicates which part of the lifecycle it is in. We explain each part in turn, with its corresponding status, and recap at the end of this section.

Queues and Priorities

When a check is created, it can be enqueued and remain enqueued for some time. While the check is enqueued, its status is not_started. This queue system lets us monitor in real time how our data sources are faring under load, and it prevents us from exceeding their capacity.

The check’s priority determines how the queue system handles it. More specifically, the priority decides whether the check is enqueued and how long it remains enqueued.

There are 3 priorities:

High: The fastest way to execute a check. It skips all queues and is limited only by how many other high-priority checks are currently running.
Medium: The default priority when none is specified. Checks with this priority are enqueued. There is no limit on how many checks can be enqueued at any given time.
Low: Similar to medium, but these checks go into a lower-priority queue, so they remain enqueued longer than checks created with medium priority.
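
The three priorities can be summarized in a short Python sketch. The names Priority, DEFAULT_PRIORITY and is_enqueued are hypothetical and chosen for illustration; they are not part of any published client library.

```python
from enum import Enum


class Priority(Enum):
    """The three check priorities described above (hypothetical type)."""

    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Medium is the default priority when none is specified.
DEFAULT_PRIORITY = Priority.MEDIUM


def is_enqueued(priority: Priority = DEFAULT_PRIORITY) -> bool:
    """Return True if a check with this priority waits in a queue first."""
    # High-priority checks skip the queues; medium and low are enqueued.
    return priority is not Priority.HIGH
```

This sketch only captures whether a check is enqueued. How long a check actually waits depends on the current load and is not modeled here.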

Here is a diagram detailing the information explained above:

Check lifecycle diagram - queues

Check execution and data sources

When a check starts, we begin collecting data from the data sources specified by the datasets on the check type. At this point, the check changes its status to in_progress.

The check remains in_progress as long as at least one data source is still collecting results. When the last data source yields results, the check status changes to completed.

Note that data sources also have statuses, so if you only need information from specific data sources, you should watch for their status changing to completed.
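
As a minimal sketch of watching only the data sources you need, the Python helper below takes a mapping from data source name to status and reports whether every required one has completed. The mapping shape, the function name and the data source names are assumptions made for illustration; adapt them to the structure of the check details you actually receive.

```python
from typing import Iterable, Mapping


def required_sources_completed(
    source_statuses: Mapping[str, str], required: Iterable[str]
) -> bool:
    """Return True once every required data source reports `completed`."""
    # A data source missing from the mapping is treated as not completed.
    return all(source_statuses.get(name) == "completed" for name in required)


# Hypothetical data source names, for illustration only.
statuses = {"criminal_records": "completed", "traffic_fines": "delayed"}

print(required_sources_completed(statuses, ["criminal_records"]))  # True
print(required_sources_completed(statuses, ["criminal_records", "traffic_fines"]))  # False
```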

This diagram depicts this process:

Check lifecycle diagram - check start (simple)

Things can go wrong

Unfortunately, a check’s processing isn’t always as smooth as shown above. This is due to the volatile nature of the public data sources we query. For example, some data sources could be down or slow at the moment the check starts. Truora does not control these data sources, so any setback with a data source is beyond our control.

When a data source is down, we retry the data collection for that data source up to a limit. If this retry limit is reached, we mark the status of that data source as error. In addition, when a check finishes and more than 30% of its data sources end in status error, the overall check status changes to error. Checks that end in error status are not charged.
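
The 30% rule can be illustrated with a small Python sketch. It assumes that every data source on the check counts toward the total, including skipped ones, which this section does not confirm. The function name is hypothetical.

```python
from typing import Sequence


def final_check_status(source_statuses: Sequence[str]) -> str:
    """Return the status of a finished check from its data source statuses.

    Assumption: all data sources count toward the total, whatever their status.
    """
    total = len(source_statuses)
    if total == 0:
        raise ValueError("a check needs at least one data source")
    errors = sum(1 for status in source_statuses if status == "error")
    # "More than 30%" written with integers: errors / total > 3 / 10.
    if errors * 10 > total * 3:
        return "error"
    return "completed"


# Exactly 30% in error is not "more than 30%", so the check completes.
print(final_check_status(["completed"] * 7 + ["error"] * 3))  # completed
print(final_check_status(["completed"] * 6 + ["error"] * 4))  # error
```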

Similarly, when a data source is slow, its status changes to delayed. Each priority has a time frame: if some data sources have still not finished when it ends, the overall check status changes to delayed.

The time frame starts at the moment the check begins collecting data, and its duration depends on the priority:

Priority	Duration before `delayed` status
`high`	2 minutes
`medium`	8 minutes
`low`	20 minutes
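
As an illustration of the table above, this Python sketch computes the moment a check would switch to delayed, given when it began collecting data. The names DELAY_THRESHOLDS and delayed_at are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Durations taken from the table above, keyed by priority.
DELAY_THRESHOLDS = {
    "high": timedelta(minutes=2),
    "medium": timedelta(minutes=8),
    "low": timedelta(minutes=20),
}


def delayed_at(collection_started: datetime, priority: str = "medium") -> datetime:
    """Return when an unfinished check would change to `delayed`."""
    return collection_started + DELAY_THRESHOLDS[priority]


started = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(delayed_at(started, "high").isoformat())  # 2024-01-01T12:02:00+00:00
```

The clock starts when data collection begins, not when the check is created, so any time spent enqueued does not count toward these durations.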

The delayed check status means that the slow data sources probably won’t yield any results. In these cases, instead of waiting for the check to complete, it might be appropriate to make a decision with the information the check currently has, or to create a new check later if the information from that particular data source is crucial to you.
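
One possible client-side policy for a delayed check is sketched below in Python. It is an illustration, not a prescribed workflow: the function name, the returned labels and the data source names are all hypothetical.

```python
from typing import Iterable, Mapping


def action_when_delayed(
    source_statuses: Mapping[str, str], crucial: Iterable[str]
) -> str:
    """Suggest what to do with a check whose status is `delayed`."""
    # Crucial data sources that have not delivered their data yet.
    missing = [
        name for name in crucial if source_statuses.get(name) != "completed"
    ]
    if missing:
        # The crucial information is not available: try again with a new check.
        return "create_check_later"
    # Everything crucial is already present: decide with the current data.
    return "decide_with_current_data"


# Hypothetical data source names, for illustration only.
statuses = {"criminal_records": "completed", "traffic_fines": "delayed"}
print(action_when_delayed(statuses, ["criminal_records"]))  # decide_with_current_data
print(action_when_delayed(statuses, ["traffic_fines"]))  # create_check_later
```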

When a check enters the delayed status, it can take up to 3 days to finish, depending on the country. When this timeout occurs, the check is forced to finish, and we count the delayed data sources as errors and set the final score accordingly.

The worst case scenario, where one data source never yields any information, is described by the following diagram:

Check lifecycle diagram - check start (worst case scenario)

The complete lifecycle

Finally, here is a diagram of the whole check lifecycle:

Check lifecycle diagram - complete

In addition, the next diagram shows all the possible statuses of a check and the transitions between them:

Check statuses state machine

To sum up the whole lifecycle, these are some brief descriptions of the check and data source statuses:

Check Statuses

Status	Description
`not_started`	The check is enqueued and the data collection has not started yet.
`in_progress`	Data is being collected but some data sources may have finished already.
`delayed`	One or more data sources are taking a long time to return data. Most data sources will have already finished collecting data.
`completed`	The check finished and 70% or more of the data sources did not end in status error.
`error`	The check finished and more than 30% of the data sources ended in status error.
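
The check statuses above can be read as a small state machine. The Python sketch below encodes the transitions as inferred from the prose in this section, not from the state machine diagram, so treat the transition table as an assumption. The names TRANSITIONS, can_transition and is_finished are hypothetical.

```python
# Transitions inferred from the text of this section (an assumption).
TRANSITIONS = {
    "not_started": {"in_progress"},
    "in_progress": {"delayed", "completed", "error"},
    "delayed": {"completed", "error"},
    "completed": set(),
    "error": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if a check may move from `current` to `new`."""
    return new in TRANSITIONS[current]


def is_finished(status: str) -> bool:
    """Return True for statuses with no outgoing transitions."""
    return not TRANSITIONS[status]


print(can_transition("in_progress", "delayed"))  # True
print(is_finished("delayed"))  # False
```

A client that polls a check could stop polling once is_finished returns True, since completed and error are the two statuses in which the check has finished.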

Data source statuses

Status	Description
`not_started`	The data source data collection was triggered.
`skipped`	The data source does not fetch any data as it does not have the required inputs to do so.
`delayed`	The data source is taking a long time to query the data.
`completed`	The data source’s data was fetched successfully and is present in the check details.
`error`	We could not fetch data from that data source.