Privacy Protection in COVID Data Tracking: Textual Analysis of the Literature
1. Introduction
December 2019 marked the beginning of a global challenge with the emergence of the SARS-CoV-2 coronavirus, which forced sudden changes in the strategic, health and governance plans of many countries. Governments responded with lockdowns and movement restrictions, which reduced the spread of the epidemic but proved socially and economically costly in the long run. The lack of established data and protocols prompted a search for new interpretive and predictive models to understand and contain the epidemic.
However, predictive models have been limited by scarce data, drawn from heterogeneous sources and not always reliable. The pandemic highlighted the importance of data quality and availability and led several countries to create contact tracing apps. Contact tracing proved most effective where population adherence to these apps was highest.
This paper examines the approaches used in contact tracing apps, highlighting critical issues and limitations, particularly those related to privacy regulation. It also discusses blockchain technology, whose use in mobile applications may be perceived by end users as more secure than traditional methods for protecting and managing personal data.
2. Approaches and Methodologies
2.1. Materials
The present study analyzed the scientific output on COVID-19 published between 2020 and 2022, focusing on articles dealing with contact tracing methods and health data privacy. The search was first conducted with the keyword “COVID-19” on Google Scholar, PubMed, Scopus, and SIS, identifying articles that contained this term in their titles, abstracts, or keywords. The keywords “Sars” and “Cov2” were then added, yielding a total of 333,175 articles. Filtering for articles related to contact tracing and falling within the statistical scope reduced the number to 1172. Finally, adding the keyword “privacy” as a filter reduced it to 75 articles [1]-[75].
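The selection funnel described above can be sketched in a few lines. This is a minimal illustration, assuming each record exposes a title, an abstract and a keyword list, and that matching is a case-insensitive substring test; the `Record` class, the toy records and the `funnel` helper are hypothetical, and the “statistical scope” filter is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical bibliographic record with the fields searched in the study."""
    title: str
    abstract: str
    keywords: list = field(default_factory=list)


def mentions(record: Record, term: str) -> bool:
    """True if `term` occurs (case-insensitively) in the title, abstract or keywords."""
    term = term.lower()
    return any(term in text.lower() for text in [record.title, record.abstract, *record.keywords])


def funnel(records, stages):
    """Apply the stages in order. Within a stage the terms are alternatives (OR);
    a record must satisfy every stage (AND). Returns the survivors and the count
    remaining after each stage."""
    current, counts = list(records), []
    for terms in stages:
        current = [r for r in current if any(mentions(r, t) for t in terms)]
        counts.append(len(current))
    return current, counts


# Toy records standing in for database search results.
records = [
    Record("Contact tracing apps and privacy during COVID-19", "Survey of users", ["privacy"]),
    Record("SARS-CoV-2 hospital outcomes", "Cohort study"),
    Record("Digital contact tracing for Sars", "Bluetooth proximity", ["contact tracing"]),
    Record("Stock market volatility", "Time series"),
]
stages = [["COVID-19", "Sars", "Cov2"], ["contact tracing"], ["privacy"]]
survivors, counts = funnel(records, stages)
print(counts)
```

Treating the first stage as a union of terms mirrors the way “Sars” and “Cov2” widened the initial “COVID-19” search, while the later stages narrow it.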
A key point that emerged from the literature review is the importance of informed consent for the dissemination of health data, which is considered crucial when analyzing medical phenomena. At the same time, privacy protections can hinder complete and accurate data collection, making it difficult to build a comprehensive database on the phenomenon under study. The COVID-19 pandemic brought this challenge of health data availability into sharp focus.
2.2. Textual Analysis
Textual analysis of the reviewed bibliography was conducted using the T-LAB software. The first methodology applied to the 75 selected articles was co-occurrence analysis, through Sequence and Network Analysis, Word Association, and Comparison of Word Pairs. Sequence and Network Analysis considers the positions of keywords within the sentences of each article, so that each text can be represented as a network of word relationships. The relationships between nodes were then examined one by one, identifying predecessor and successor pairs through transition probabilities computed with Markov chains.
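A minimal sketch of how predecessor and successor probabilities can be estimated from word sequences, assuming sentences have already been lemmatized into token lists and that only adjacent words count as transitions (a first-order Markov chain). The function names and toy sentences are hypothetical; this is not T-LAB's internal implementation.

```python
from collections import Counter, defaultdict


def transition_probabilities(sentences):
    """Estimate first-order Markov transition probabilities P(next | current)
    from tokenized sentences (lists of lemmas)."""
    pair_counts = defaultdict(Counter)
    for tokens in sentences:
        for current, nxt in zip(tokens, tokens[1:]):
            pair_counts[current][nxt] += 1
    probs = {}
    for current, successors in pair_counts.items():
        total = sum(successors.values())
        probs[current] = {word: count / total for word, count in successors.items()}
    return probs


def predecessors(probs, word):
    """Words observed immediately before `word`, with P(word | predecessor)."""
    return {w: p[word] for w, p in probs.items() if word in p}


# Toy lemmatized sentences.
sentences = [
    ["centralized", "server", "match"],
    ["centralized", "app", "data"],
    ["decentralized", "app", "privacy"],
]
probs = transition_probabilities(sentences)
print(probs["centralized"])          # successors of "centralized"
print(predecessors(probs, "app"))    # predecessors of "app"
```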
Word Association checked the associations between selected keywords and the other words in the texts, using an association index to measure the degree of linkage between words. The results were represented in radial diagrams, with each keyword at the center and the associated words arranged around it at distances that reflect their degree of association.
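The sketch below shows one common form of the cosine association index, computed over elementary contexts treated as sets of words: the number of contexts containing both words, divided by the square root of the product of the numbers of contexts containing each word. We assume this is close to the index used in the study; the contexts are invented for illustration.

```python
from math import sqrt


def cosine_index(contexts, a, b):
    """Cosine association between words a and b over elementary contexts
    (each context is a set of words): co(a, b) / sqrt(n(a) * n(b))."""
    n_a = sum(1 for ctx in contexts if a in ctx)
    n_b = sum(1 for ctx in contexts if b in ctx)
    if n_a == 0 or n_b == 0:
        return 0.0
    co = sum(1 for ctx in contexts if a in ctx and b in ctx)
    return co / sqrt(n_a * n_b)


def associations(contexts, keyword):
    """Rank every other word by its cosine association with `keyword`,
    i.e. the ordering a radial diagram would display."""
    vocab = set().union(*contexts) - {keyword}
    scores = {w: cosine_index(contexts, keyword, w) for w in vocab}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical elementary contexts (e.g. sentences reduced to their lemmas).
contexts = [
    {"trust", "government", "app"},
    {"trust", "app", "participants"},
    {"security", "app", "data"},
    {"trust", "government"},
]
for word, score in associations(contexts, "trust")[:3]:
    print(word, round(score, 3))
```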
The Comparison of Word Pairs method related pairs of selected keywords to the other words in the texts, making it possible to evaluate the relationships and associations of each word pair.
Overall, these analyses provided an in-depth understanding of the relationships between the keywords and the other words in the texts, enabling the identification of significant patterns, trends and correlations within the data.
Significant keywords for the study included “security”, “app”, “privacy”, “trust”, “interoperability”, “centralized”, and “decentralized”.
Multiple Correspondence Analysis (MCA) was used to analyze the patterns of categorical variables, such as keywords or elementary contexts, found in the analyzed texts. This multidimensional statistical technique aims to identify “interdependence” relationships among qualitative variables.
Through MCA, new variables can be extracted that neatly summarize the relevant information contained in the data. This analysis allows visualization of the relationships between the categories analyzed through a factorial plane.
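For readers who want to see what MCA computes, the sketch below performs it as a correspondence analysis of the indicator (disjunctive) matrix, using NumPy's singular value decomposition. It is a minimal version under stated assumptions: the coding of text fragments into two categorical variables is hypothetical, and no inertia correction (such as Benzécri's) is applied.

```python
import numpy as np


def mca(rows):
    """Multiple Correspondence Analysis as a correspondence analysis of the
    complete disjunctive (indicator) matrix.

    rows: list of tuples, one categorical value per variable.
    Returns the principal inertias, the category labels, and the principal
    coordinates of each category on the factorial axes."""
    n_vars = len(rows[0])
    labels = sorted({(j, row[j]) for row in rows for j in range(n_vars)})
    col = {label: k for k, label in enumerate(labels)}
    Z = np.zeros((len(rows), len(labels)))      # one 0/1 column per category
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            Z[i, col[(j, value)]] = 1.0
    P = Z / Z.sum()                              # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)          # row and column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)       # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertias = sv ** 2                           # inertia carried by each axis
    coords = (Vt.T * sv) / np.sqrt(c)[:, None]   # category principal coordinates
    return inertias, labels, coords


# Hypothetical coding of five text fragments on two categorical variables.
rows = [
    ("privacy", "decentralized"),
    ("privacy", "decentralized"),
    ("hospitalization", "centralized"),
    ("hospitalization", "decentralized"),
    ("privacy", "centralized"),
]
inertias, labels, coords = mca(rows)
for label, (x, y) in zip(labels, coords[:, :2]):
    print(label, round(x, 3), round(y, 3))   # position on the factorial plane
```

A useful sanity check: for an indicator matrix with Q variables and J categories, the total inertia equals J/Q - 1, which here is 4/2 - 1 = 1.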
3. Contact Tracing App
Manual tracing, which had been effective in the SARS and MERS outbreaks, immediately showed its limitations in the COVID-19 pandemic because of the rapid spread of infection. The procedure involved identifying people exposed to the virus and following them up for at least 14 days after the last exposure, but it proved ineffective, especially for occasional contacts in public places, who could not easily be identified. In addition, the incubation period before symptom onset made it difficult to map the network of possibly exposed individuals and to estimate the rate of virus transmission.
To overcome these challenges, many governments developed mobile apps for digital contact tracing during the COVID-19 pandemic. These apps aimed to inform users promptly of exposure to an infected person, expediting isolation, testing and quarantine. Their effectiveness, however, depended on acceptance and dissemination, which had to reach at least 60 percent of the population. Most apps reached significantly lower usage rates and therefore could not make broader restrictive measures unnecessary.
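The 60 percent threshold is demanding because a contact can only be detected when both people run the app. A back-of-the-envelope sketch, assuming app users are spread uniformly at random through the population (so the two parties to a contact adopt independently), shows that coverage of contacts scales with the square of adoption:

```python
def pairwise_coverage(adoption: float) -> float:
    """Fraction of contacts in which BOTH people run the app, assuming app
    users are spread uniformly at random through the population."""
    if not 0.0 <= adoption <= 1.0:
        raise ValueError("adoption must be a fraction between 0 and 1")
    return adoption ** 2


for share in (0.2, 0.4, 0.6, 0.8):
    print(f"adoption {share:.0%} -> contacts covered {pairwise_coverage(share):.0%}")
```

Under this assumption, 60 percent adoption covers only about 36 percent of contacts, which helps explain why lower usage rates weakened the apps' effect.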
The literature review revealed that the low uptake of apps was attributable to citizens’ concerns about information security and privacy protection, lack of trust in governments, and lack of understanding of the benefits of apps.
3.1. Analysis of Co-Occurrence
A co-occurrence analysis using the Comparison of Word Pairs method (“Figure 1”) showed that trust is closely related to the government and the policies adopted, while security is associated with the digital capability of the software in terms of cybersecurity and privacy protection of the information collected. The difficulty in disseminating apps was most apparent in democratic countries within the EU, where stringent regulations restricting the free flow of information made privacy concerns a more significant obstacle than in countries outside the EU. Thus, in most democratic countries, individuals' control over the circulation of their own information took precedence over collective health protection, hindering both the monitoring of pandemic trends and the choice of policy strategies.
Figure 1. Co-occurrence analysis by comparison of word pairs with respect to the terms “Trust” and “Security”.
The literature review identified several factors that may incentivize the spread of tracking apps during the COVID-19 pandemic. These include:
1) Age and Gender: demographic characteristics such as age and gender can influence the adoption of tracking apps.
2) Presence of comorbid illnesses or higher risk of infection: people with special health conditions or higher risk may be more likely to use the apps.
3) Familiarity with smartphone use: familiarity with smartphone technology may influence app adoption.
4) Trust in government: confidence in the government’s ability to protect the population may play a significant role in app acceptance.
5) Opt-in installation: the ability for users to voluntarily choose to install apps, rather than being forced to, may influence their acceptance.
6) Education level: a higher level of education may be associated with greater understanding and acceptance of tracking apps.
7) Psychological factors: elements such as fear of contagion to self, family, friends, and community of choice may influence app adoption.
8) Perception of benefits: perception of the benefits of using apps can positively influence their adoption.
9) Interoperability and usability: the ease of use and ability of apps to interact with other systems may contribute to their acceptance.
Focusing on users' trust in apps, co-occurrence analysis by word association (“Figure 2”) revealed that trust is mainly associated with the words “apps”, “governments”, and “participants”. This underscores that users want secure apps guaranteed by governments committed to protecting the public, and trustworthy participants who use the information they collect responsibly. Lack of trust was identified as a major barrier to the adoption of contact tracing apps, while greater trust in government and technology was correlated with a greater propensity to install them.
Interoperability, that is, the ability of systems to exchange information and use it for research purposes, is crucial for the deployment of tracking apps. In this context, managing security and privacy issues is essential, and this can be addressed by providing better information about how data are stored, shared and used. This approach allows public authorities to maintain the trust of individuals [1] [2].
As shown in “Figure 3”, what is most required in terms of interoperability is an information system built on system security and information protection, the key requirements for ensuring user trust and the effective deployment of tracking apps.
In addition, app deployment has been limited by privacy protection regulations, in particular the General Data Protection Regulation (GDPR), which applies in the European Union member states as well as in Iceland, Liechtenstein, and Norway. The GDPR aims to protect personal data while preserving the confidentiality of the information released. Although it does not prohibit the use of personal data for analysis or other secondary purposes, it requires technical and organizational measures to protect the data. In addition, data should undergo anonymization to ensure protection throughout the entire cycle of use.
Figure 2. Co-occurrence analysis by word association with first-order cosine association index of the word “Trust”.
Figure 3. Co-occurrence analysis by word association with first-order cosine association index of the word “Interoperability”.
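One technical measure of the kind the GDPR calls for is pseudonymization with a keyed hash. The sketch below uses Python's standard `hmac` module; the record fields and identifiers are hypothetical. Under the GDPR, pseudonymized data remain personal data, because whoever holds the key can re-link them, so this is not the irreversible anonymization discussed later.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier always maps to the same pseudonym under the same key,
    so records can still be linked for analysis, but the mapping cannot be
    recomputed without the key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# The key must be stored separately from the pseudonymized dataset.
key = secrets.token_bytes(32)

# Hypothetical records; only the pseudonym and the test result are shared.
records = [
    {"patient_id": "IT-0001", "test_result": "positive"},
    {"patient_id": "IT-0002", "test_result": "negative"},
]
shared = [
    {"pseudonym": pseudonymize(r["patient_id"], key), "test_result": r["test_result"]}
    for r in records
]
print(shared[0])
```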
To overcome these limitations, the shared Pan-European Privacy-Preserving Proximity Tracing (PEPP-PT) platform was developed. Joined by more than 130 members from eight European countries, it aimed to identify technology solutions based on general standards for contact tracing, including proximity measurement with widespread devices such as smartphones, data encryption and anonymization, international interoperability, scalable architectures and technologies, and certified open-source code [3].
All contact tracing apps (CTAs) required local installation on the device, with the goal of detecting other smartphones, and therefore individuals, in close physical proximity. If the owner of one of these devices tested positive for COVID-19, the app sent an early warning to everyone who had been nearby, allowing them to isolate themselves and check whether they had contracted the virus [4] [5] [6].
App-based contact tracing relied on two main methodological approaches: location tracking via GPS and proximity tracking via Bluetooth. In the first, apps tracked contacts based on location: a central server obtained the location of each phone over time and used it to identify contacts between patients and other users. Location information could be obtained from the cellular service provider or reported by apps using the smartphone's GPS. In the second, tracking relied on Bluetooth Low Energy (BLE) beaconing, on which Google and Apple built a joint exposure notification protocol that recorded, via the phone, which users had been in contact, at what estimated distance, and for how long [7] [8].
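To make “at what distance and for how long” concrete, the sketch below scores a list of Bluetooth sightings the way exposure notification systems broadly do: attenuation (advertised transmit power minus received signal strength) serves as a proxy for distance, and exposure minutes are weighted by attenuation bucket. The thresholds, weights and the 15-minute cut-off are hypothetical placeholders, not the values of the Google/Apple framework or of any national app.

```python
from dataclasses import dataclass


@dataclass
class Scan:
    """One Bluetooth sighting of another phone's rotating identifier."""
    tx_power_dbm: int   # transmit power advertised by the other phone
    rssi_dbm: int       # received signal strength measured locally
    minutes: float      # how long this sighting lasted


# Hypothetical attenuation thresholds (dB) and bucket weights; real systems let
# health authorities configure such values.
NEAR_DB, MEDIUM_DB = 55, 63
WEIGHTS = {"near": 1.0, "medium": 0.5, "far": 0.0}


def bucket(scan: Scan) -> str:
    """Classify a sighting by attenuation = TX power - RSSI (higher = farther)."""
    attenuation = scan.tx_power_dbm - scan.rssi_dbm
    if attenuation <= NEAR_DB:
        return "near"
    if attenuation <= MEDIUM_DB:
        return "medium"
    return "far"


def weighted_exposure_minutes(scans) -> float:
    """Duration-weighted exposure: near minutes count fully, medium ones half."""
    return sum(WEIGHTS[bucket(s)] * s.minutes for s in scans)


def is_risky(scans, threshold_minutes: float = 15.0) -> bool:
    """Hypothetical rule: notify when weighted exposure reaches the threshold."""
    return weighted_exposure_minutes(scans) >= threshold_minutes


scans = [Scan(-10, -60, 10), Scan(-10, -70, 8), Scan(-10, -85, 30)]
print([bucket(s) for s in scans], weighted_exposure_minutes(scans), is_risky(scans))
```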
In addition, depending on where the collected information was stored and matched, two architectures emerged: centralized and decentralized. Both were designed to protect the personal information released.
The centralized approach involved sharing IDs with a central server operated by public health authorities, where they were matched with positive cases and triggered notifications. This allowed the authorities to have centralized control, as alerting was done by the central server in case of a match.
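A minimal sketch of the centralized design, assuming the server issues the identifiers and a positive user uploads the identifiers their phone has recorded. The `CentralServer` class and the device names are hypothetical; the point is only that matching happens on the server, which can therefore link every identifier to a registered device.

```python
import secrets


class CentralServer:
    """Hypothetical health-authority server in a centralized design."""

    def __init__(self):
        self._owner_of = {}   # identifier -> registered device (known to the server)

    def issue_id(self, device: str) -> str:
        """Issue a random identifier and remember which device it belongs to."""
        ident = secrets.token_hex(8)
        self._owner_of[ident] = device
        return ident

    def report_positive(self, device: str, heard_ids) -> set:
        """Match the identifiers uploaded by a positive user against the
        registry and return the devices the server should notify."""
        return {self._owner_of[i] for i in heard_ids if i in self._owner_of} - {device}


server = CentralServer()
id_ben = server.issue_id("ben")
id_cara = server.issue_id("cara")
server.issue_id("anna")
# Anna's phone heard Ben and Cara; she tests positive and uploads what it heard.
to_notify = server.report_positive("anna", [id_ben, id_cara])
print(sorted(to_notify))
```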
The Sequence and Network Analysis of the 75 articles showed that the centralized approach is contrasted with the decentralized approach and is strongly connected with data protection, servers, and apps. “Figure 4” shows the relationships between the word “centralized” and its associated words, highlighting predecessors (in blue) and successors (in orange), calculated through transition probabilities (Markov chains).
With the decentralized approach, matching is done directly on the user's smartphone, which compares the IDs it has recorded with a locally stored list of IDs associated with positive cases. This method preserves the confidentiality of personal data, as no recorded contact data is exchanged with the server unless the user has tested positive for COVID-19 and voluntarily decided to share this information.
Figure 4. Co-occurrence analysis using the sequences and network analysis of the centralized approach.
The analysis shows that both approaches, centralized and decentralized, preserve the confidentiality of personal data, but the decentralized one was considered more privacy-friendly: information is managed locally on the user's device, and sensitive data is shared only in case of infection and with the user's explicit consent.
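The decentralized design can be sketched in the spirit of DP-3T: each phone derives its broadcast identifiers from a secret day key, and only a positive user's key is published, so every other phone can regenerate those identifiers and match them locally. This is a simplified illustration under stated assumptions: HMAC-SHA256 stands in for DP-3T's actual key-derivation functions, the number of identifiers per day is hypothetical, and only a single day key is checked.

```python
import hashlib
import hmac
import secrets

EPHIDS_PER_DAY = 96  # hypothetical: one identifier per 15-minute slot


def next_day_key(day_key: bytes) -> bytes:
    """Rotate the secret day key by hashing it (SK_t = H(SK_{t-1}))."""
    return hashlib.sha256(day_key).digest()


def ephemeral_ids(day_key: bytes):
    """Derive the day's broadcast identifiers from the day key (simplified PRF)."""
    return [
        hmac.new(day_key, b"broadcast" + i.to_bytes(2, "big"), hashlib.sha256).digest()[:16]
        for i in range(EPHIDS_PER_DAY)
    ]


class Phone:
    def __init__(self):
        self.day_key = secrets.token_bytes(32)   # leaves the phone only if positive
        self.heard = set()                        # identifiers observed over Bluetooth

    def broadcast(self, slot: int) -> bytes:
        return ephemeral_ids(self.day_key)[slot]

    def observe(self, ephid: bytes) -> None:
        self.heard.add(ephid)

    def check_exposure(self, published_keys) -> bool:
        """Match locally: regenerate identifiers from the published keys."""
        return any(e in self.heard for k in published_keys for e in ephemeral_ids(k))


alice, bob, carol = Phone(), Phone(), Phone()
bob.observe(alice.broadcast(slot=10))
published = [alice.day_key]   # Alice tests positive and consents to upload her key
print(bob.check_exposure(published), carol.check_exposure(published))
```

The server only ever sees the keys of users who chose to report a positive test; the contact history stays on each phone.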
In “Figure 5”, we show that the word “decentralized” is strongly associated with the words “app”, “tracking”, “privacy”, “architecture”, and “security systems”. This association once again emphasizes the importance placed on privacy, indicating that the focus is on preserving the confidentiality of information during the implementation of decentralized tracking apps.
To explore the connections of the word “privacy”, a co-occurrence analysis by word association was conducted using the first-order cosine association index. The analysis explored potential users' main fears, to assess the population's concerns about, and resistance to, installing the apps. “Figure 6” shows that the population wants the personal information it releases to be protected and secure, and requires that releasing data have a positive impact on the whole community. It therefore seems essential that the population perceives an appropriate balance between the sacrifice of releasing personal information and the benefit of protecting the community.
Repeating the word association analysis on “privacy” with the second-order index, “Figure 7” shows that, in the 75 articles analyzed, the words with the highest similarity are “safety” and “security”.
Figure 5. Co-occurrence analysis using the sequences and network analysis of the decentralized approach.
Figure 6. Co-occurrence analysis with first-order cosine word association index “Privacy”.
Figure 7. Co-occurrence analysis with second-order word association index “Privacy”.
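Second-order similarity compares words by the company they keep rather than by direct co-occurrence. The sketch below assumes a common definition: each word's profile is its vector of first-order cosine indices with the other words, and two words are compared through the cosine of their profiles. The contexts are invented so that “safety” and “security” never co-occur yet share associates; the exact T-LAB computation may differ.

```python
from math import sqrt


def first_order(contexts, a, b):
    """First-order cosine index: co(a, b) / sqrt(n(a) * n(b))."""
    n_a = sum(a in c for c in contexts)
    n_b = sum(b in c for c in contexts)
    co = sum(a in c and b in c for c in contexts)
    return co / sqrt(n_a * n_b) if n_a and n_b else 0.0


def profile(contexts, word, vocab):
    """First-order association of `word` with every word in `vocab`."""
    return [first_order(contexts, word, v) for v in vocab]


def second_order(contexts, a, b):
    """Cosine between the association profiles of a and b, computed over the
    words other than a and b themselves."""
    vocab = sorted(set().union(*contexts) - {a, b})
    pa, pb = profile(contexts, a, vocab), profile(contexts, b, vocab)
    dot = sum(x * y for x, y in zip(pa, pb))
    norm = sqrt(sum(x * x for x in pa)) * sqrt(sum(y * y for y in pb))
    return dot / norm if norm else 0.0


# Hypothetical contexts: "safety" and "security" never appear together.
contexts = [
    {"privacy", "security", "data"},
    {"privacy", "safety", "data"},
    {"security", "protection"},
    {"safety", "protection"},
    {"hospitalization", "patients"},
]
print(first_order(contexts, "safety", "security"))              # direct association
print(round(second_order(contexts, "safety", "security"), 3))   # shared associates
```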
3.2. Multiple Correspondence Analysis
Finally, the patterns of categorical variables (keywords or elementary contexts) were analyzed using Multiple Correspondence Analysis, a multidimensional statistical method that aims to identify “interdependence” relationships among the qualitative variables analyzed. Correspondence analysis extracts new variables that summarize the relevant information in an orderly manner. The result is a plot, the factorial plane, that shows how the analyzed categories intersect and reveals the similarities and differences between the linguistic entities considered.
As shown in “Figure 8”, the word “hospitalization” has a higher incidence than the other words, whose relative positions are calculated from a similarity matrix and the frequencies in each article examined. No words relating to privacy or contact tracing appear in the factorial plane, suggesting that little attention was paid to this important aspect during the pandemic. Indeed, individual freedom allowed people to withhold their health data, which limited the effectiveness of applications designed for tracing.
3.3. Comparison of Different Countries
This section illustrates the approaches to app implementation adopted in various countries. The NHS COVID-19 app (NHS: National Health Service) developed in the UK is based on the decentralized proximity architecture developed by Apple and Google, which users consider more compliant with privacy regulations. It was widely adopted: it was downloaded to 21 million devices out of a population of 34.3 million potential users and, even more significantly, at least 16.5 million people used it regularly. This wide uptake has been attributed to the country's high mortality rate in 2020, one of the highest in the world, which made the population more aware of the severity of the problem and more inclined to take measures against the spread of the virus [9].
Figure 8. Multiple correspondence analysis of the word “hospitalization”.
The app introduced in Norway combined Bluetooth Low Energy (BLE) technology with GPS location tracking. Despite a significant number of downloads (1.6 million), the Norwegian Institute of Public Health was forced to abandon the app after the data protection authority concluded that there was insufficient evidence of its effectiveness to justify collecting people's location data; privacy and data protection concerns prevailed.
The COVID Tracker app developed in Ireland was used by a significant number of people: roughly 1.3 million, or 34 percent of the population over the age of 16 [10].
Germany built the Corona-Warn-App on the platform developed by Google and Apple, based on the decentralized DP-3T protocol. The app operated on a voluntary basis and used Bluetooth Low Energy (BLE) to detect physical proximity between users; it reached 26.5 million downloads by March 2021, representing 44 percent of potential smartphone users. Additional features, such as links with auxiliary apps, encouraged greater use, highlighting the importance of value-added services in increasing app usage [11].
France, with its TousAntiCovid app, was the only country in Western Europe to implement a centralized approach to contact tracing, built on the ROBERT (ROBust and privacy-presERving proximity Tracing) protocol, which was designed to preserve user privacy.
In Italy, the Immuni app was developed but had very limited uptake. An online study conducted on a sample of 448 individuals in May-June 2020 investigated their willingness to be vaccinated against COVID-19 and to use the app. The results identified trust in politics and science, together with altruistic motivation (a willingness to protect not only oneself but especially others), as relevant factors.
A study conducted in Singapore showed strong interest among the population in contributing to contact tracing against COVID-19 through the TraceTogether app, which was downloaded by more than 1.1 million users within a month of its launch despite concerns about battery consumption and other technical limitations.
Israel developed a GPS-based surveillance system that, after one month of operation, detected 36.8 percent of COVID-19 cases, with a false-positive rate of 5 percent. In Israel, as in South Korea, interference with individual privacy rights was deemed necessary to address the health emergency caused by COVID-19.
Qatar implemented and enforced the use of an app called “Ehteraz” (“precaution” in Arabic), which was, however, found to have serious data security vulnerabilities.
Japan introduced the COCOA app in June 2020 based on Bluetooth technology with a decentralized architecture, promoted through various strategies including offering economic incentives such as issuing shopping tokens. An online survey found that the main factors related to the use of the app were community attachment, concern about health risks, and trust in government, factors whose weight varied according to the age of individuals. Older people showed greater sensitivity to health risks and were more likely to download the app to protect themselves and other community members. For middle-aged individuals, concerns about privacy and data security prevailed. For young people, an effective way to promote app adoption required developing a greater sense of community attachment, achievable using social media and other Web-based tools.
Textual examination of the keyword “app”, using co-occurrence analysis and the first-order cosine association index, showed strong links with the words “contact tracing”, “adoption”, and “participants” (“Figure 9”). It follows that the success of a monitoring activity depends on a large number of participants and on ensuring that the information provided through the apps is perceived as critical to safeguarding individual health.
4. Health Data Protection under the GDPR
Article 8 of the European Convention on Human Rights enshrines the right to respect for private life but recognizes that interference may be permitted when it is necessary, proportionate and in accordance with the law. The General Data Protection Regulation (GDPR) upholds principles such as purpose limitation, data minimization, and storage limitation, and guarantees that data subjects can exercise their rights. Under the GDPR, processing is lawful if, among other grounds, it is necessary to protect the vital interests of the individual or to perform a task carried out in the public interest.
Figure 9. Co-occurrence analysis with the cosine word association index of the word “App”.
Health data belong to the special categories of personal data under Article 9 of the GDPR, and their processing is permitted only in specific cases, such as when it is necessary for reasons of public interest in the area of public health, for example to protect against serious cross-border threats to health or to ensure high standards of quality and safety of health care. Data anonymization is essential to respecting privacy, and the European Data Protection Board (EDPB) and the EU eHealth Network have recommended a decentralized approach to processing contact records.
While the United States has no comprehensive federal data privacy law, it has sector-specific regulations such as the Health Insurance Portability and Accountability Act (HIPAA), which sets rules for the protection of individual health information. Under HIPAA, health data can be de-identified through either the Safe Harbor or the Expert Determination method. To facilitate the deployment of tracking apps, eight comprehensive principles, including voluntariness, transparency, and fairness, have been agreed upon to guide the drafting of health data privacy regulations. However, real respect for privacy would require irreversible anonymization of the data, for which the GDPR lacks specific operational guidance [76] [77].
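As an illustration of the Safe Harbor method, the sketch below applies a few of its rules to a single record: direct identifiers are dropped, ZIP codes are truncated to three digits (or replaced with 000 for sparsely populated areas), dates are reduced to the year, and ages above 89 are aggregated. It is deliberately incomplete: Safe Harbor covers 18 categories of identifiers, and the field names and restricted-prefix set used here are hypothetical.

```python
from datetime import date

# Hypothetical set of 3-digit ZIP prefixes whose combined population is
# 20,000 or fewer; a real list must be derived from current Census data.
RESTRICTED_ZIP3 = {"036"}

# Hypothetical field names for direct identifiers to be removed outright.
DIRECT_IDENTIFIERS = {"name", "phone", "email", "ssn", "medical_record_number"}


def safe_harbor(record: dict) -> dict:
    """Apply a SUBSET of the HIPAA Safe Harbor rules to one record."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "zip" in out:
        zip3 = out.pop("zip")[:3]
        out["zip3"] = "000" if zip3 in RESTRICTED_ZIP3 else zip3
    if "test_date" in out:
        out["test_year"] = out.pop("test_date").year   # keep the year only
    if "age" in out and out["age"] > 89:
        out["age"] = "90+"                              # aggregate ages over 89
    return out


record = {
    "name": "Jane Doe",
    "zip": "02139",
    "test_date": date(2020, 11, 3),
    "age": 93,
    "test_result": "positive",
}
print(safe_harbor(record))
```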
Finally, to overcome data anonymity challenges and foster clinical research, it may be useful to inform people about the limits of anonymity, create open, privacy, and mixed experimental zones, and write clearer privacy notices that everyone can understand.
5. Blockchain and Future Developments
Blockchain is a key technology for ensuring security and data integrity in mobile applications for tracking COVID-19 infections and, more generally, for managing health data [78]. Its main feature is a distributed, transparent ledger: a chain of blocks containing transactions, shared among all nodes in the network. The ledger is visible to all participants, and once data is written into the blockchain it cannot be changed without the consent of the network. The data are therefore immutable: each block is identified by a unique cryptographic hash and also records the hash of the previous block, so any attempt to modify the information would be detected immediately. In addition, the consensus for entering transactions into the ledger is distributed across all nodes, eliminating the need for a central authority and increasing the security of the system.
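The immutability described above rests on hash chaining, which can be sketched with Python's standard library alone. This minimal example, whose block layout and event records are hypothetical, shows tamper evidence on a single copy of the ledger; replication across nodes and consensus are not modelled.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64   # placeholder "previous hash" for the first block


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block's contents."""
    payload = json.dumps(
        {k: block[k] for k in ("index", "prev_hash", "data")}, sort_keys=True
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, data) -> None:
    """Add a block that stores the hash of its predecessor."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    block = {"index": len(chain), "prev_hash": prev_hash, "data": data}
    block["hash"] = block_hash(block)
    chain.append(block)


def is_valid(chain: list) -> bool:
    """Recompute every hash and check each block points at its predecessor."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["prev_hash"] != expected_prev:
            return False
    return True


chain = []
append_block(chain, {"event": "consent_given", "subject": "pseudonym-1"})
append_block(chain, {"event": "test_result", "subject": "pseudonym-1", "result": "negative"})
append_block(chain, {"event": "test_result", "subject": "pseudonym-1", "result": "positive"})
print(is_valid(chain))
```

Editing a block's data breaks its own hash; recomputing that hash instead breaks the `prev_hash` link stored in the next block, so the tampering is still detected.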
Applied to the health sector, this technology would enable health data to be managed through a secure and transparent sharing mechanism. Data can be encrypted and shared securely among network participants, protecting patient privacy. For COVID-19 infection tracking, blockchain-based applications can enable anonymous contact tracing: users are notified if they have been near an infected person without that person's identity being revealed, which helps contain the spread of the virus quickly and effectively. Blockchain also ensures the immutability of clinical data, enabling healthcare professionals to access information in real time and making the entire patient history available securely. In this regard, see the survey proposed in [77], which evaluated the population's perception of safety in consenting to the use of their data during the COVID-19 pandemic. The online experiment, conducted on an international sample, aimed to highlight the importance of ensuring data security so that the population would be encouraged to use mobile applications for monitoring infections during the pandemic. To this end, the perceived level of personal data protection was analyzed, and the results confirmed the hypothesis. They offer important insights for managers and policy makers seeking to encourage the use of blockchain-based mobile applications to prevent and contain the spread of COVID-19.
In conclusion, the use of Blockchain in COVID-19 infection tracking applications and health data management provides a high level of security and transparency, thus increasing user confidence in using apps.
Blockchain mechanisms play a crucial role in data security, providing a tamper-proof environment for transactions and ensuring maximum protection of sensitive data, such as personal or medical information. The following are some key points regarding the use of Blockchain to ensure data security:
1) Transaction storage: the blockchain stores all processed transactions in chronological order, replicated across a set of computers, ensuring the integrity and immutability of the data.
2) Combination of blockchains: permissionless and permissioned blockchains can be combined to take advantage of both types, ensuring greater security and flexibility.
3) InterPlanetary File System (IPFS): blockchain is often combined with IPFS, which stores the content, and with smart contracts, which govern and manage the data and provide traceability and visibility of its history. This combination ensures the originality and authenticity of the data.
4) Decentralized consensus mechanisms: blockchain systems use decentralized consensus mechanisms such as Proof of Work (PoW), which requires solving a computationally difficult puzzle before a new block of data is accepted. These mechanisms eliminate the need for a trusted third-party authority and increase confidence in the system.
5) Security, reliability, and data assurance: the combination of these mechanisms ensures a very high level of security, reliability and data assurance, which are essential especially in sensitive contexts such as the management of health information during the COVID-19 pandemic.
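A toy version of Proof of Work shows the asymmetry that makes it useful: finding a valid nonce takes many hash evaluations, while checking one takes a single hash. This sketch is a simplification, assuming a difficulty expressed as leading hexadecimal zeros of a single SHA-256 digest; Bitcoin, for instance, compares a double SHA-256 hash of the block header against a numeric target. The block contents are hypothetical.

```python
import hashlib


def proof_of_work(block_data: str, difficulty: int) -> tuple:
    """Find a nonce such that SHA-256(block_data + nonce) starts with
    `difficulty` hex zeros. Expected work grows about 16x per extra zero."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a proof takes a single hash, however long finding it took."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()
    return digest.startswith("0" * difficulty)


block_data = "block 42: exposure-key batch"
nonce, digest = proof_of_work(block_data, difficulty=4)
print(nonce, digest)
```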
The COVID-19 pandemic has affected consumer priorities, subverting traditional scales of importance of needs. In a state of emergency like the present one, safety and health protection have assumed a predominant role, influencing the intentions of using technologies, such as applications based on Blockchain.
6. Conclusions
Tracing apps are a crucial tool for monitoring pandemic phenomena such as COVID-19, but their effectiveness depends on the rate of use by the population. The pandemic showed that populations, especially in Western countries, are inherently reluctant to provide personal information without security guarantees. To be effective, tracing technologies must first be understood and accepted by most of the population.
Individuals might be willing to sacrifice their privacy only when there is a significant health emergency, but they would be reluctant to share sensitive data, especially in democratic and neoliberal societies. Privacy concerns are influenced by trust in government and the use of data for secondary purposes. Trust in apps is related to trust in government, while security is more related to information protection.
It is important for policy makers to consider the demographics of the population when assessing the propensity to give consent to data processing. Some countries have conducted analyses to understand app acceptance, while others have overlooked privacy concerns to address the health emergency.
Apps should follow a privacy-by-design approach that actively engages users and reassures them about how their data are used. Users appreciate timely communication from apps, but apps should also separate clinical data from personal information and use decentralized architectures.
Implementing additional features, such as a vaccination certificate, could make the apps more attractive and promote their uptake. In addition, AI and machine learning could help monitor pandemics and identify hotspots, and Internet of Medical Things (IoMT) devices enable technologies such as thermal imaging to be used to monitor positive cases.
The use of blockchain technology, ensuring anonymization of data and thus privacy protection, could increase people’s perception of safety and make it easier for them to give consent for health data processing.
This paper is only a starting point: a timely and necessary review of the literature for subsequent work on the very important topic of health data processing and privacy protection. It offers an opportunity to reflect and, in a broader perspective, to highlight future directions for improving tracking applications, which is indispensable for the progress of society and of the health care system itself.