Datasets, Scores and Types

This section describes three key concepts behind checks: datasets, check types and scores.

In the Understanding Background Checks section we explained some of the vocabulary we are going to use. We recommend reading that section before jumping into this one.

Datasets

Datasets are categories of information used to group related data sources and the information they return.

For example, the Personal Identity dataset will only contain details about the subject’s identity information (names, last names, date of birth, …), collected from different data sources. Similarly, the International Background dataset will only contain details about the subject appearing in international sanctions lists like those for terrorism or financial crimes.

Our datasets are:

Affiliations and Insurances
Alert in Media
Business Background
Criminal Record
Driving Licenses
International Background
Legal Background
Personal Identity
Professional Background
Taxes and Finances
Traffic Fines
Vehicle Information
Vehicle Permits

Check Types

When a check is created, how does it know which datasets will be searched? That’s where check types come in. Check types define which datasets will be queried in the check and how relevant each dataset will be when calculating the global score.

How check types affect the scores is described in the Scores section below.

Truora has 3 default check types:

Type	Included datasets	Description
`person`	Personal Identity, Criminal Record, Legal Background, International Background, Alert in Media. Only available in some countries: Professional Background, Affiliations and Insurances, Taxes and Finances.	Used to perform a background check on a person.
`vehicle`	Vehicle Information, Personal Identity, Traffic Fines, Driving Licenses, Criminal Record, International Background. Only available in some countries: Vehicle Permits.	Used to perform a background check on a driver and their vehicle.
`company`	Business Background, International Background, Alert in Media. Only available in some countries: Criminal Record, Legal Background, Taxes and Finances.	Used to perform a background check on a company.

You can define custom types to query specific datasets. This is further explained in the Creating a custom type section.

Scores

Check scores are a quantitative measure of how confident a background check’s results are: the higher the score, the cleaner the check results. This makes scores useful for automating decisions. Scores range from 0 to 1, or from 0 to 10 when displayed in the dashboard.
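As a rough illustration of how a score could drive an automated decision, the sketch below converts a score on the 0 to 1 scale to the 0 to 10 dashboard scale and compares it against a threshold. The function names and the 0.7 threshold are hypothetical, and the conversion assumes the dashboard value is simply the score multiplied by 10.

```python
def to_dashboard_scale(score):
    """Convert a 0-1 score to the 0-10 dashboard scale (assumed to be linear)."""
    if not 0 <= score <= 1:
        raise ValueError("score must be between 0 and 1")
    return score * 10


def passes(score, threshold=0.7):
    """Hypothetical decision rule: approve when the score reaches the threshold."""
    return score >= threshold


print(to_dashboard_scale(0.5))  # 5.0
print(passes(0.85))  # True
print(passes(0.4))   # False
```

The right threshold depends on your own risk tolerance; the value used here is only a placeholder.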

Dataset scores

Apart from grouping the check results, each dataset is assigned a score when a background check is performed. This way, each dataset has a score representing how confident its results are.

Global score

Every check has a unique global score, calculated as a weighted sum of all the dataset scores. Each dataset's weight is determined by the check type. As a consequence, some datasets will contribute more than others to the global score.

For instance, let’s suppose we have a custom type with 3 datasets, with the following weight configuration:

Dataset	Weight
Personal Identity	0.2
Criminal Record	0.4
International Background	0.4

And the check yields these scores for those datasets:

Dataset	Score
Personal Identity	1
Criminal Record	0.4
International Background	0.6

The global score will be the sum of each dataset score multiplied by its weight:

Dataset	Score	Weight	Contribution (score * weight)
Personal Identity	1	0.2	0.2
Criminal Record	0.4	0.4	0.16
International Background	0.6	0.4	0.24
Global score	---	---	0.6
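
The calculation in the table above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `global_score` and the dictionary layout are illustrative assumptions, not part of the Truora API. Only the weights and scores come from the example.

```python
def global_score(scores, weights):
    """Return the weighted sum of dataset scores.

    Both arguments map dataset names to numbers. The names and the
    dict layout are assumptions made for this example.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(scores[name] * weight for name, weight in weights.items())


weights = {
    "Personal Identity": 0.2,
    "Criminal Record": 0.4,
    "International Background": 0.4,
}
scores = {
    "Personal Identity": 1,
    "Criminal Record": 0.4,
    "International Background": 0.6,
}

# Rounded to hide floating-point noise in the sum.
print(round(global_score(scores, weights), 2))  # 0.6
```

Because the weights in this example add up to 1, the global score stays within the same 0 to 1 range as the dataset scores.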

Scores and Homonyms

When querying information for a check, some data sources require identification numbers as input, while others require names and last names. Data sources that require an identification number are said to perform a search by ID. Data sources that require names and last names are said to perform a search by name.

This distinction is important because the precision of the results returned by a data source depends on whether it performs a search by ID or by name. More specifically, data sources that perform a search by ID return records that belong exactly to the person represented by that identification number. In contrast, the records yielded by data sources that perform a search by name could belong to any* person who has the same name as the subject of the background check (a homonym).

*: Results found via search by name are validated against the information found in other parts of the background check, which tends to reduce the number of homonyms in the results.

Because checks include data sources that perform both types of searches, a check result can sometimes be mixed up with that of a homonym. For this reason, both scores described above (dataset and global) come with two extra scores each: a score by name and a score by ID. These represent what the score would be if only the details found via search by name or search by ID, respectively, were taken into account.

Depending on the country and the datasets you are using for your check types, you may also want to take these scores into account when making an automated decision.
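
One possible way to use the score by ID and the score by name in an automated decision is to treat a low score by ID as decisive and a low score by name as a reason for manual review, since records found by name may belong to a homonym. The sketch below is only an illustration: the `CheckScores` structure, its field names, the thresholds and the three outcomes are all assumptions, not values defined by Truora.

```python
from dataclasses import dataclass


@dataclass
class CheckScores:
    # Hypothetical container; the field names are assumptions for this example.
    id_score: float
    name_score: float


def decide(scores, reject_below=0.5, review_below=0.7):
    """Return "reject", "review" or "approve" under assumed thresholds."""
    # Records found by ID belong to the subject, so a low score is decisive.
    if scores.id_score < reject_below:
        return "reject"
    # Records found by name may belong to a homonym, so send them to a person.
    if scores.name_score < review_below:
        return "review"
    return "approve"


print(decide(CheckScores(id_score=1.0, name_score=0.9)))  # approve
print(decide(CheckScores(id_score=1.0, name_score=0.3)))  # review
print(decide(CheckScores(id_score=0.2, name_score=0.9)))  # reject
```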