StAR Releases a Technical Guide on Automated Risk Analysis of Anti-Corruption Declarations

The publication stresses that more than 160 countries have introduced declaration systems as an anti-corruption instrument. Although in most of them declarations are still submitted on paper and processed manually, some countries already use electronic disclosure systems.

An electronic disclosure system can include various elements: registration of declarants in the system, submission of electronic asset and/or interest declaration forms, validation of the electronic form, control of submission, automated risk analysis, recording of actions taken to process declarations, management of verification cases, online publication of declaration data for public use, and data exchange with external databases and entities. However, most countries that have introduced electronic disclosure systems so far use only the most basic element, electronic filing (this is also the mechanism employed in the Russian Federation). Yet electronic processing of declarations is far more important for fighting corruption effectively.

The StAR Guide is intended to give countries that wish to use the full potential of electronic disclosure systems general information on one of the elements of such a system: automated risk analysis of declarations.

Automated risk analysis

Automated risk analysis is the assessment, by software, of the data submitted in a declaration form against a predetermined set of risk indicators. Its goal is to raise “red flags” that may indicate violations such as illicit enrichment, conflicts of interest, incompatibility of public office with other activities, prohibited gifts or sponsorships, prohibited financial interests, and violations of post-employment restrictions.

This analysis can include:

analysing data within one declaration; the risk indicators in this case can include, for instance, real estate/vehicles acquired in the reporting period with no value indicated; land plots/apartments with a total area exceeding X square meters; a residential building, farmhouse or other building belonging to the declarant or family members with no plot of land declared;
comparing with data in the same declarant's previous declarations; the risk indicators in this case can include, for example, more new real estate items/vehicles than in the declarant's previous declaration; an asset mentioned in a previous declaration is missing from the declaration under review, and no income from its disposal is reported;
comparing with declarations of other declarants to detect unusual disparities; the risk indicators can include, in particular, a situation where the total income in the declaration exceeds, by a certain percentage, the average income declared by officials of the same category or by all declarants in the same district/region;
comparing with declarations of declarants' family members or other associates; the risk indicators in this case include assets disclosed by those persons that are missing from the official's declaration;
comparing data with external databases (public registers, open and commercial databases); the risk indicators can include the following: data in the declaration does not match data in the vehicle register, company register, national ID register, etc.; data in the declaration matches data in the company register and/or tax database in a way that points to a possible conflict of interest (for example, cross-checks show that the declarant or a family member acquired shares in, or control of, an entity conducting commercial activity in the declarant's area of work) or a possible violation of post-employment restrictions (for example, a declarant who used to work in the tax administration received income as a tax consultant after leaving office, or a declarant who worked at a regulator acquired shares in a regulated entity shortly after leaving office).
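To make the first three kinds of checks more concrete, here is a minimal Python sketch of how they might be coded as rules. It is not taken from the Guide: the record layout (`Asset`, `Declaration` and their fields), the indicator labels and the 50% peer-income excess are hypothetical choices made for illustration, and a real system would work from the country's own declaration schema and thresholds.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class Asset:
    asset_id: str             # hypothetical stable identifier (cadastral number, VIN, ...)
    kind: str                 # "land", "building", "apartment", "vehicle", ...
    value: Optional[float]    # None when the declarant left the value blank
    acquired_this_period: bool = False


@dataclass
class Declaration:
    declarant: str
    category: str             # category of official, used for peer comparison
    income: float
    assets: list[Asset] = field(default_factory=list)
    # ids of assets whose disposal (and the resulting income) is reported in this declaration
    disposal_income_for: set[str] = field(default_factory=set)


def within_declaration_flags(d: Declaration) -> list[str]:
    """Indicators computed from a single declaration."""
    flags = []
    if any(a.acquired_this_period and a.value is None for a in d.assets):
        flags.append("new asset without value")
    kinds = {a.kind for a in d.assets}
    if "building" in kinds and "land" not in kinds:
        flags.append("building without land plot")
    return flags


def previous_declaration_flags(cur: Declaration, prev: Declaration) -> list[str]:
    """Assets that vanished since the previous declaration with no disposal income reported."""
    current_ids = {a.asset_id for a in cur.assets}
    return [
        f"asset {a.asset_id} disappeared without disposal income"
        for a in prev.assets
        if a.asset_id not in current_ids and a.asset_id not in cur.disposal_income_for
    ]


def peer_income_flag(d: Declaration, peers: list[Declaration], excess: float = 0.5) -> bool:
    """True if income exceeds the same-category peer average by more than `excess` (hypothetical 50%)."""
    incomes = [p.income for p in peers if p.category == d.category]
    return bool(incomes) and d.income > mean(incomes) * (1 + excess)


# Usage with made-up data; `peers` excludes the declaration under review
prev = Declaration("A", "judge", 50_000, [Asset("car-1", "vehicle", 10_000)])
cur = Declaration("A", "judge", 120_000, [Asset("house-7", "building", None, True)])
peers = [Declaration("B", "judge", 60_000), Declaration("C", "judge", 40_000)]
print(within_declaration_flags(cur))          # ['new asset without value', 'building without land plot']
print(previous_declaration_flags(cur, prev))  # ['asset car-1 disappeared without disposal income']
print(peer_income_flag(cur, peers))           # True
```

Each rule returns plain labels, so indicators can later be weighted and combined independently of how they were computed.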

Each indicator has a weight that reflects the likelihood of a violation when the indicator is triggered. To bring a degree of objectivity to the assignment of weights, the Guide's authors suggest using a set of criteria and give an example matrix of six criteria:

the indicator is established through an automated calculation,
the discrepancy is significant (above an established threshold of X),
other violations of law are possible,
the case could lead to a criminal prosecution,
the indicator is triggered in a number of all declarations,
there is a high probability of successful follow-up verification based on previous analysis.

Each affirmative answer adds one point, so each indicator carries a weight of zero to six points.
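The weighting scheme is simple arithmetic: six yes/no criteria, one point each. The sketch below encodes it; the labels in `CRITERIA` are hypothetical short names for the six criteria listed above, and the example answers for a “land plot abroad” indicator are invented.

```python
# Hypothetical labels for the six criteria in the Guide's example matrix.
CRITERIA = (
    "established_by_automated_calculation",
    "discrepancy_above_threshold",
    "other_violations_possible",
    "criminal_prosecution_possible",
    "triggered_share_of_declarations",
    "verification_likely_successful",
)


def indicator_weight(answers: dict[str, bool]) -> int:
    """Weight of one indicator: one point per affirmative answer, so 0 to 6."""
    unknown = set(answers) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    # A criterion that was not answered counts as "no".
    return sum(1 for c in CRITERIA if answers.get(c, False))


# Usage: an illustrative assessment of a "land plot abroad" indicator
land_abroad = indicator_weight({
    "established_by_automated_calculation": True,
    "other_violations_possible": True,
    "criminal_prosecution_possible": True,
    "verification_likely_successful": True,
})
print(land_abroad)  # 4
```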

After applying the risk analysis rules to a declaration, the electronic system calculates its risk value as a number. This value is a probability score for irregularities that may be detected during subsequent verification.

Depending on the approach chosen nationally, the next step can take one of several forms: mandatory manual verification only of declarations whose risk value is above a pre-established threshold; verification of declarations below the threshold as well, but only for the risks identified; or an additional verification procedure for declarations whose analysis triggered certain “red flags” (for example, a loan above a certain value received from a family member).
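The Guide, as summarised here, does not give a formula for the risk value. The following sketch assumes the simplest reading, that the risk value is the sum of the weights of the triggered indicators, and shows how the three domestic approaches just described could route a declaration. The indicator names, weights, threshold and `Approach` labels are hypothetical. If the system is meant to output a probability, the raw sum would need to be normalised or calibrated against verification outcomes.

```python
from enum import Enum


class Approach(Enum):
    THRESHOLD_ONLY = "verify only declarations above the threshold"
    IDENTIFIED_RISKS = "below-threshold declarations are checked for identified risks only"
    SPECIFIC_FLAGS = "certain red flags trigger an additional procedure"


def risk_value(triggered: set[str], weights: dict[str, int]) -> int:
    # Assumption: the risk value is the plain sum of the weights of triggered indicators.
    return sum(weights[name] for name in triggered)


def route(triggered: set[str], weights: dict[str, int], threshold: int,
          approach: Approach, special: frozenset[str] = frozenset()) -> tuple[str, int]:
    """Decide what happens to one declaration under the chosen national approach."""
    score = risk_value(triggered, weights)
    if score > threshold:
        return "full verification", score
    if approach is Approach.IDENTIFIED_RISKS and triggered:
        return "verification of identified risks only", score
    if approach is Approach.SPECIFIC_FLAGS and triggered & special:
        return "additional verification", score
    return "no verification", score


# Usage with hypothetical indicator names and weights
weights = {"new asset without value": 3, "building without land plot": 2,
           "loan from family above limit": 4}
print(route({"building without land plot"}, weights, 5, Approach.THRESHOLD_ONLY))
# ('no verification', 2)
print(route({"building without land plot"}, weights, 5, Approach.IDENTIFIED_RISKS))
# ('verification of identified risks only', 2)
print(route({"loan from family above limit"}, weights, 5, Approach.SPECIFIC_FLAGS,
            frozenset({"loan from family above limit"})))
# ('additional verification', 4)
```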

Benefits of automated risk analysis and challenges for its introduction

Applying automated risk analysis to declarations has a number of benefits. First, it allows declarations to be filtered and verification to be prioritised (which is relevant for countries where large numbers of officials are obliged to submit declarations). Second, it minimises manual processing, making the whole process more impartial and credible. Finally, it not only yields information about possible violations but also provides valuable insights that can help policy makers improve disclosure regulations.

At the same time, it should be understood that detecting “red flags” does not confirm that a violation took place: it simply shows that further verification or other corrective action is needed. Conversely, the absence of “red flags” in the automated analysis does not mean that there are no violations. Therefore, automated risk analysis of declarations can serve only as an additional instrument for detecting indicators of corruption offences.

It should also be borne in mind that the Guide provides only general recommendations on implementing automated risk analysis of declarations: when deciding to develop such software, each country should take its own circumstances into account, including the possibility of obtaining information from various databases and the rules on processing and protecting personal data and other restricted information.

Introducing automated risk analysis also requires considerable preliminary work: if a country has not yet introduced electronic filing of declarations, or introduced it only recently, a large volume of information will have to be digitised and/or converted into a format suitable for analysis. The quality of this work directly determines the effectiveness of the subsequent automated analysis, so it is crucial that the digitised information be complete and reliable.

Moreover, developers of automated risk analysis software may encounter problems such as:

lack of financial and technical resources to design, implement, and maintain the risk analysis system,
insufficient expertise in verifying declarations, or difficulty translating such expertise into the risk analysis framework,
lack of a clear legal framework that allows risk analysis to be used in the verification process,
lack of automated access to external data sources,
poor quality of external data used for the risk analysis (for example, gaps and mistakes in the data of other government registers).

Stages of development of the risk analysis framework

A wide range of stakeholders should be involved in developing software for the automated analysis of declarations for corruption risks: IT specialists, experts in asset and interest declarations, experts from related fields such as risk analysts working on fraud and money laundering prevention, as well as civil society and the private sector.

According to the Guide, the entire process of development and implementation of this software includes five major stages.

1. Preparation stage

In the first stage, the data available to the verification agency or the system developers should be reviewed, both the data submitted electronically and the data that can be obtained by digitising paper declarations, and its format, quality, coverage and volume assessed. Developers will also have to decide whether to use paper-based declarations, which sections of the declarations to analyse, and which aggregated values to calculate. For example, to assess information on land plots they could use the following data: total area of land owned by the declarant; total area of land owned by his/her family members; total number of land plots owned by the declarant and his/her family members; total number of land plots in a certain region (for instance, in a municipality where the declarant and/or his/her family members own land plots); total number of land plots bought by the declarant (his/her family members) since the beginning of public service; total number of land plots bought in the past two to five years.
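The land-plot aggregates listed above could be computed along these lines. This is a sketch under assumptions: the `LandPlot` record and its `owner` encoding are hypothetical, “plots in a certain region” is read as plots owned by the declarant or family in that region, acquisition is not distinguished from purchase, and the “past two to five years” window is exposed as the `recent_years` parameter.

```python
from dataclasses import dataclass


@dataclass
class LandPlot:
    owner: str          # "declarant" or "family" (hypothetical encoding)
    area_m2: float
    region: str         # e.g. municipality
    acquired_year: int


def land_aggregates(plots: list[LandPlot], region: str, service_start_year: int,
                    report_year: int, recent_years: int = 5) -> dict[str, float]:
    """Aggregated land-plot values for one declaration (declarant plus family members)."""
    return {
        "area_declarant_m2": sum(p.area_m2 for p in plots if p.owner == "declarant"),
        "area_family_m2": sum(p.area_m2 for p in plots if p.owner == "family"),
        "plots_total": len(plots),
        "plots_in_region": sum(1 for p in plots if p.region == region),
        "plots_since_service_start": sum(1 for p in plots if p.acquired_year >= service_start_year),
        # plots acquired within the last `recent_years` years, report year included
        "plots_recent": sum(1 for p in plots if p.acquired_year > report_year - recent_years),
    }


# Usage with made-up data
plots = [
    LandPlot("declarant", 600, "North", 2015),
    LandPlot("family", 1200, "North", 2021),
    LandPlot("family", 800, "South", 2022),
]
agg = land_aggregates(plots, region="North", service_start_year=2018,
                      report_year=2023, recent_years=3)
print(agg)
```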

2. Data extraction stage

Once the data set has been defined, the developers should modify the software to extract the required aggregated values from each declaration. The output of this stage could be a table in which each declaration is a row containing meta information (declarant, year of submission, position, office) and the aggregated values defined in the previous stage.

At the data extraction stage, information that could identify the declarant should be anonymised.
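A row of the extraction table, with the declarant's identity replaced by a token, might be produced as follows. The sketch assumes a keyed hash (HMAC-SHA256 from Python's standard library) as the anonymisation step; the key, field names and sample values are hypothetical. Strictly speaking, a keyed hash is pseudonymisation rather than anonymisation: whoever holds the key can re-link the tokens, and fields such as position and office may still narrow down who the declarant is.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical key; in practice it would be held only by the verification agency.
SECRET_KEY = b"replace-with-a-key-held-by-the-verification-agency"


def pseudonym(declarant_id: str) -> str:
    """Keyed hash of the declarant's ID: stable across years, not recomputable without the key."""
    return hmac.new(SECRET_KEY, declarant_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def extraction_row(meta: dict, aggregates: dict) -> dict:
    """One row of the extraction table: meta information followed by aggregated values."""
    row = {
        "declarant": pseudonym(meta["declarant_id"]),  # identifying ID replaced by a token
        "year": meta["year"],
        "position": meta["position"],
        "office": meta["office"],
    }
    row.update(aggregates)
    return row


def to_csv(rows: list[dict]) -> str:
    """Serialise rows so they can be loaded into an analytics tool."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [extraction_row(
    {"declarant_id": "ID-123", "year": 2023, "position": "inspector", "office": "Tax Office 5"},
    {"plots_total": 3, "area_family_m2": 2000},
)]
print(to_csv(rows))
```

Because the same ID always yields the same token, declarations by one person can still be compared across years in the exploration stage without the ID itself leaving the system.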

3. Data exploration stage

In this stage, developers should load the resulting tables into business analytics software so that they can analyse and visualise the data and explore different hypotheses without having to change the electronic declaration system software each time.

The publication points out that many such tools exist, including open-source ones (for example, Kibana, Metabase, Superset and Redash) and ones available free of charge (QlikView Personal Edition and Power BI Desktop). Because the information fed into such analytical software is anonymised, the authors consider the data security risks to be low.

4. Hypotheses formulation and verification stage

The expertise gained from manual verification of declarations (where it has already been conducted) and the insights from the previous stage should be converted into a list of “hypotheses”: risk indicators, their weights and thresholds.

For example, if “manual” verification often shows that having a land plot or real estate property abroad is a sign of potential unjustified wealth, owning a plot of land abroad could become a risk indicator.

To set an appropriate threshold for another risk indicator (an increase in income above a certain percentage), average income fluctuations over a number of years could be analysed.
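One way to turn “average income fluctuations” into a threshold is sketched below: pool the year-over-year percentage changes in declared income and treat as unusual any increase more than `k` standard deviations above the mean. The choice of `k = 2`, the data layout and the sample figures are assumptions made for illustration, not recommendations from the Guide.

```python
from statistics import mean, stdev


def yoy_changes(history: dict[str, list[float]]) -> list[float]:
    """Year-over-year percentage changes in declared income, pooled across declarants."""
    changes = []
    for incomes in history.values():
        for prev, cur in zip(incomes, incomes[1:]):
            if prev > 0:  # skip zero-income years to avoid division by zero
                changes.append((cur - prev) / prev * 100)
    return changes


def income_growth_threshold(history: dict[str, list[float]], k: float = 2.0) -> float:
    # Assumption: an increase is unusual if it exceeds the mean change by k standard deviations.
    changes = yoy_changes(history)
    return mean(changes) + k * stdev(changes)


def income_growth_flag(prev_income: float, cur_income: float, threshold: float) -> bool:
    """True if the year-over-year income increase (in %) is above the threshold."""
    return prev_income > 0 and (cur_income - prev_income) / prev_income * 100 > threshold


# Usage with made-up incomes per (pseudonymised) declarant, oldest year first
history = {"a": [100, 110, 121], "b": [200, 200, 220], "c": [50, 55]}
threshold = income_growth_threshold(history)
print(round(threshold, 2))                      # 16.94
print(income_growth_flag(100, 150, threshold))  # True
```

Because income changes are often skewed, a high percentile of the pooled changes, or thresholds computed separately for each category of official, may prove more robust than a mean-plus-deviation rule.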

By the end of this stage, the developers of the risk analysis framework should have a list of tested and validated risk indicators with adjusted thresholds and variables.

5. Final stage and monitoring of effectiveness

In the final stage, the full range of risk indicators is applied to all declarations to produce the total risk weight for each declaration.

To fine-tune the risk analysis framework, the verification agency should keep relevant statistics and regularly review the results of the risk analysis, making changes as shortcomings are detected, new insights emerge, more data become available, new methods of concealing assets are identified, legislation is amended, and so on.
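Keeping statistics on verification outcomes also feeds back into the weighting criteria, since one of them asks whether follow-up verification is likely to succeed. A minimal sketch, assuming each completed verification is recorded as the set of indicators it triggered plus whether a violation was confirmed; the indicator names and outcomes are invented.

```python
from collections import Counter


def indicator_hit_rates(outcomes: list[tuple[set[str], bool]]) -> dict[str, float]:
    """Share of verifications triggered by each indicator that confirmed a violation."""
    triggered: Counter[str] = Counter()
    confirmed: Counter[str] = Counter()
    for indicators, violation in outcomes:
        for name in indicators:
            triggered[name] += 1
            if violation:
                confirmed[name] += 1
    return {name: confirmed[name] / triggered[name] for name in triggered}


# Usage with made-up verification outcomes
outcomes = [
    ({"land abroad", "income jump"}, True),
    ({"income jump"}, False),
    ({"land abroad"}, True),
    ({"income jump"}, False),
]
rates = indicator_hit_rates(outcomes)
print({k: round(v, 2) for k, v in sorted(rates.items())})  # {'income jump': 0.33, 'land abroad': 1.0}
```

These rates only cover declarations that were actually verified, so they describe how indicators perform among selected cases rather than across all declarations.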

It should be noted that the StAR publication is general in nature and does not cover a number of aspects relevant to implementing an automated risk analysis framework. In particular, the Guide gives only a few of the most obvious examples of risk indicators, offers few recommendations on how to determine indicator weights, and does not address in depth such important matters as access to external registers and databases or the use and protection of personal data.

Moreover, the use of software for automated risk analysis is examined only within the narrow framework of electronic declaration systems. In our opinion, this instrument could be used more effectively as part of a broader IT system that would not only collect and process declarations but also manage the workflow of anti-corruption units electronically and detect corruption indicators in the performance of various public functions, for example in public procurement. This approach would make more sources of information available for analysis and so help reveal a wider range of corruption offences: for instance, information from declarations could be compared with various notifications filed by officials and with information about procurement participants and winning bidders.
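The cross-check with procurement data suggested above could start as simply as matching company registration numbers. The sketch below is illustrative only: the `Holding` and `Contract` records, the normalisation rule and the sample data are hypothetical, and real registers would require agreed identifiers and a legal basis for access.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    company_id: str   # registration number as written in the declaration
    holder: str       # "declarant" or "family"


@dataclass
class Contract:
    contract_id: str
    awarding_body: str
    winner_company_id: str


def normalise_id(raw: str) -> str:
    # Hypothetical rule: compare registration numbers ignoring spaces, dashes and case.
    return "".join(ch for ch in raw if ch.isalnum()).upper()


def procurement_matches(holdings: list[Holding], office: str,
                        contracts: list[Contract]) -> list[tuple[str, str]]:
    """Contracts awarded by the declarant's own office to a company in which
    the declarant or a family member declared an interest."""
    held = {normalise_id(h.company_id): h.holder for h in holdings}
    return [
        (c.contract_id, held[normalise_id(c.winner_company_id)])
        for c in contracts
        if c.awarding_body == office and normalise_id(c.winner_company_id) in held
    ]


# Usage with made-up data
holdings = [Holding("12-345-67", "family"), Holding("99 000 11", "declarant")]
contracts = [
    Contract("C1", "City Roads Dept", "1234567"),
    Contract("C2", "Health Dept", "9900011"),
    Contract("C3", "City Roads Dept", "5550000"),
]
print(procurement_matches(holdings, "City Roads Dept", contracts))  # [('C1', 'family')]
```

A match is only a “red flag” in the sense used throughout the Guide: it signals that the contract award deserves a closer look, not that a conflict of interest has been established.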