<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<?covid-19-tdm?>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">701966</article-id>
<article-id pub-id-type="doi">10.3389/fdata.2021.701966</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A Policy-Driven Approach to Secure Extraction of COVID-19 Data From Research Papers</article-title>
<alt-title alt-title-type="left-running-head">Elluri et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Policy-Driven Approach to Secure COVID-19 Data</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Elluri</surname>
<given-names>Lavanya</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1065332/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Piplai</surname>
<given-names>Aritran</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kotal</surname>
<given-names>Anantaa</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1176693/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Joshi</surname>
<given-names>Anupam</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Joshi</surname>
<given-names>Karuna Pande</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>IS Department, University of Maryland Baltimore County, <addr-line>Baltimore</addr-line>, <addr-line>MD</addr-line>, <country>United&#x20;States</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>CSEE Department, University of Maryland Baltimore County, <addr-line>Baltimore</addr-line>, <addr-line>MD</addr-line>, <country>United&#x20;States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/828760/overview">Chaminda Hewage</ext-link>, Cardiff Metropolitan University, United&#x20;Kingdom</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/560595/overview">Chittaranjan Hota</ext-link>, Birla Institute of Technology and Science, India</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1413835/overview">Elochukwu Ukwandu</ext-link>, Cardiff Metropolitan University, United&#x20;Kingdom</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Lavanya Elluri, <email>lelluri1@umbc.edu</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Cybersecurity and Privacy, a section of the journal Frontiers in Big&#x20;Data</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>12</day>
<month>08</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>4</volume>
<elocation-id>701966</elocation-id>
<history>
<date date-type="received">
<day>28</day>
<month>04</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>29</day>
<month>07</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Elluri, Piplai, Kotal, Joshi and Joshi.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Elluri, Piplai, Kotal, Joshi and Joshi</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>The entire scientific and academic community has been mobilized to gain a better understanding of the COVID-19 disease and its impact on humanity. Most research related to COVID-19 needs to analyze large amounts of data in very little time. This urgency has made Big Data Analysis, and related questions around the privacy and security of the data, an extremely important part of research in the COVID-19 era. The White House OSTP has, for example, released a large dataset of papers related to COVID research from which the research community can extract knowledge and information. We show an example system with a machine learning-based knowledge extractor that draws out key medical information from COVID-19 related academic research papers. We represent this knowledge in a Knowledge Graph that uses the Unified Medical Language System (UMLS). However, publicly available studies rely on datasets that might contain sensitive data. Extracting information from academic papers can potentially leak sensitive data, and protecting the security and privacy of this data is equally important. In this paper, we address the key challenges around the privacy and security of such information extraction and analysis systems. Policy regulations like HIPAA have updated the guidelines to access data, specifically, data related to COVID-19, securely. In the US, healthcare providers must also comply with the Office for Civil Rights (OCR) rules to protect data integrity in matters like plasma donation, media access to health care data, telehealth communications, etc. Privacy policies are typically short and unstructured HTML or PDF documents. We have created a framework to extract relevant knowledge from the health centers&#x2019; policy documents and also represent these as a knowledge graph. 
Our framework helps to understand the extent to which individual provider policies comply with regulations and define access control policies that enforce the regulation rules on data in the knowledge graph extracted from COVID-related papers. Along with being compliant, privacy policies must also be transparent and easily understood by the clients. We analyze the relative readability of healthcare privacy policies and discuss the impact. In this paper, we develop a framework for access control decisions that uses policy compliance information to securely retrieve COVID data. We show how policy compliance information can be used to restrict access to COVID-19 data and information extracted from research papers.</p>
</abstract>
<kwd-group>
<kwd>COVID-19</kwd>
<kwd>knowledge graph</kwd>
<kwd>privacy</kwd>
<kwd>UMLS</kwd>
<kwd>HIPAA</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>The COVID-19 pandemic is one of the most important global events in recent history. It has impacted human society at every level and created challenges to public health administration, medical research, and patient data management. It has led to significant cooperation among the scientific community to understand the disease and look for cures quickly. Medical researchers have released an unprecedented amount of data related to COVID-19 to understand the disease better. For example, the White House OSTP has released a large dataset of papers being published related to COVID research from which the research community is extracting knowledge and information. It is encouraging to see the global research community motivated to address the concerns of COVID-19. Data collection, data storage, data analytics, and data sharing are key to overcoming the pandemic. However, this wide sharing of potentially sensitive data raises critical questions about privacy and security. We address these concerns in this paper. We show a simple system that can extract useful information from papers, and express it in a knowledge graph (represented in OWL). We then show how policies controlling access can similarly be extracted from text descriptions and encoded in a knowledge graph to limit what can be done with the extracted knowledge.</p>
<p>Published articles on COVID-19 have a crucial role in our understanding of the pandemic. More than 23,000 unique published articles were indexed on Web of Science and Scopus between 1 January and 30 June 2020 (<xref ref-type="bibr" rid="B4">da Silva et&#x20;al., 2020</xref>). Gathering relevant information from such a large collection of published articles is a difficult task. It is time-intensive to read and investigate key published articles manually. As the global community is rushing to find a solution to the pandemic, we need a more efficient way to extract key information from COVID-19 related research papers. Approaches from Text Analysis and NLP are being developed to automatically read papers and extract key knowledge. In this paper, we show a prototype system with a machine learning-based knowledge extractor, which draws out key medical information from COVID-19 related academic research papers. We represent this knowledge in a Biomedical Knowledge Graph (BKG) that extends the information captured in the standard Unified Medical Language System (UMLS).</p>
<p>We used an established pipeline (<xref ref-type="bibr" rid="B28">Piplai et&#x20;al., 2020a</xref>; <xref ref-type="bibr" rid="B29">Piplai et&#x20;al., 2020b</xref>) for knowledge extraction, but retrained it for the medical domain, and populated the BKG that contains information from research papers on COVID-19. We used the UMLS (<xref ref-type="bibr" rid="B2">Bodenreider, 2004</xref>) to develop the knowledge graph schema. We also added necessary classes to define the sources for the data used in the medical experiments. This would help users search for the data sources that lead to the information present in the research paper and the BKG. They can also learn about the data collection methods&#x2019; privacy compliance from a related knowledge graph. This is described in greater detail in <italic>Extracting COVID-19 Knowledge From Published Research Paper</italic>.</p>
<p>While it is vital to share data related to COVID-19, including data about patients, treatments, and outcomes, it is necessary to ensure that this data is secured. Any analysis done on shared data should respect associated security and privacy policies. Ensuring security and privacy while processing the data and handling patient records has become a primary challenge, which we address in this paper. We propose a system to restrict access to controlled data. We use a published paper and the HIPAA regulation as an example to demonstrate the proposed framework. The Health Insurance Portability and Accountability Act (HIPAA) (<xref ref-type="bibr" rid="B9">Centers for Disease Control and Prevention, 2003</xref>) regulates the security and privacy of the data retained by healthcare providers in the US. It has provided specific guidance for COVID-19 data. All COVID-19 patient records in the United&#x20;States must comply with the new rules in HIPAA. The HIPAA COVID-19 Privacy Rule (<xref ref-type="bibr" rid="B24">OCR, 2020a</xref>) provides guidelines to securely access personally identifiable information (PII) of patients who have been affected by or exposed to COVID-19. The regulation also specifies the guidance for contacting former COVID-19 patients for plasma donation (<xref ref-type="bibr" rid="B26">OCR, 2020c</xref>). It further establishes the rules for disclosing personal health information to the media (<xref ref-type="bibr" rid="B25">OCR, 2020b</xref>). The law also addresses remote telehealth communication-related questions for COVID-19 patients (<xref ref-type="bibr" rid="B16">Lee et&#x20;al., 2020</xref>). We would like to automatically ensure that any analysis done abides by these and other rules about sensitive medical&#x20;data.</p>
<p>We developed a knowledge graph (and an associated ontology) to define COVID-related privacy and security rules, such as those detailed in HIPAA. This ontology extends our previous work in creating a HIPAA ontology for automatically populating HIPAA rules to access patient records (<xref ref-type="bibr" rid="B13">Joshi et&#x20;al., 2016</xref>). It helps distinguish healthcare domain-specific privacy and security measures. Our previous HIPAA ontology identifies the concepts specified in the regulation that are not related to COVID. By expanding this ontology with COVID rules and integrating the HIPAA and COVID compliance guidelines, we enable data sources (e.g., healthcare providers) and data analysts to quickly check and enforce HIPAA and COVID privacy requirements. We describe the enhanced and updated ontology in <italic>Developing HIPAA Ontology</italic>. Health centers or organizations utilizing COVID-19 patient data can use this ontology to ensure their privacy policies have all the rules stated by HIPAA-COVID compliance. The semantically rich, machine-processable knowledge graph developed using our methodology captures all the rules stated in HIPAA and COVID guidance. It can also help identify missing rules in the&#x20;organization&#x2019;s privacy policy, which can then be added as needed.</p>
<p>Privacy Policies also need to be understandable to the average user. Clients/patients should not have to agree to rules and obligations that they do not fully understand. The privacy policy should be unambiguous and easy to read. In the previous work by our group <xref ref-type="bibr" rid="B15">Kotal et&#x20;al. (2020)</xref>, we studied trends in privacy policies of popular e-services and developed a metric to measure the vagueness in such policies. We used the same model to measure the textual quality of privacy policies for organizations that collect, store, and/or use patient data related to COVID. Along with the regulation compliance study, this can help organizations create privacy policies that are comprehensive and useful for the reader.</p>
<p>In <italic>Introduction</italic> we explained the motive for this work and in <italic>Related Work</italic> we talk about the background and related work in this area. In <italic>Framework to Securely Access COVID-19 Data</italic>, we describe our methodology of building the HIPAA COVID knowledge graph and detail the ontology we have developed using OWL. Also in this section, we explain the NLP approaches we took to obtain the rules and populate policy documents of various healthcare providers as instances of our ontology, and present the results obtained from our validation. We end with the conclusions and future work in <italic>Conclusion and Future&#x20;Work</italic>.</p>
</sec>
<sec id="s2">
<title>Related Work</title>
<p>In this paper, we show a proof of concept of a pipeline that can extract key information from published papers and, in doing so, point out any privacy vulnerabilities in the data sharing process. In our pipeline, we parsed published articles on COVID-19 and privacy policy documents of healthcare centers. We extracted knowledge from documents in natural language and represented it in a machine-processable, semantic framework. The key techniques that help us in extracting and representing knowledge in published documents and policy articles are Named Entity Recognition (NER) and Knowledge Graph (KG). In this section, we discuss prior work related to these methods.</p>
<sec id="s2-1">
<title>Named Entity Recognition</title>
<p>In this section, we discuss how NER has been used previously for Information Extraction. Identifying information units like names, organization, location, time, date, etc. is critical to the task of information extraction. Named entity recognition can be broadly defined as the task of identifying references to these entities in the text (<xref ref-type="bibr" rid="B23">Nadeau and Sekine, 2007</xref>). NER has been used for the task of entity extraction in various domains including cybersecurity (<xref ref-type="bibr" rid="B5">Dasgupta et&#x20;al., 2020</xref>), law (<xref ref-type="bibr" rid="B6">Dozier et&#x20;al., 2010</xref>), biology (<xref ref-type="bibr" rid="B10">He and Kayaalp, 2008</xref>) etc. In our previous work in the cybersecurity domain (<xref ref-type="bibr" rid="B29">Piplai et&#x20;al., 2020b</xref>), we described a pipeline to represent and model CTI. We then used Stanford NER (<xref ref-type="bibr" rid="B18">Manning et&#x20;al., 2014</xref>)&#x20;and Regular Expressions to detect cyber-entities from the open-source text. Mittal et&#x20;al. in their paper (<xref ref-type="bibr" rid="B21">Mittal et&#x20;al., 2016</xref>), also used NER to automatically generate alerts from Twitter feeds relevant to cybersecurity. Stanford CoreNLP toolkit (<xref ref-type="bibr" rid="B18">Manning et&#x20;al., 2014</xref>) is an extensible pipeline that provides core NER analysis. This toolkit is widely used in research and commercial organizations for information extraction. To use the Stanford CoreNLP toolkit in the medical domain we needed a structured medical vocabulary and a dataset of medical texts annotated with the vocabulary. The Unified Medical Language System (UMLS) (<xref ref-type="bibr" rid="B2">Bodenreider, 2004</xref>) is a repository of biomedical vocabularies developed by the US National Library of Medicine. The UMLS integrates over two million names for some 900,000 concepts from more than 60 families of biomedical vocabularies, as well as 12 million relations among these concepts. <xref ref-type="bibr" rid="B22">Mohan and Li (2019)</xref> developed a training dataset for biomedical entity extraction that uses UMLS as the target ontology.</p>
</sec>
<sec id="s2-2">
<title>Knowledge Graph</title>
<p>A knowledge graph is a set of semantic triples, which are pairs of &#x201c;entities&#x201d; with &#x201c;relationships&#x201d; between them. It is useful for feeding intelligent systems and agents with formalized knowledge of the world (<xref ref-type="bibr" rid="B27">Paulheim, 2017</xref>). Knowledge graphs can be refined to contain knowledge about a specific domain. In our prior work, we used Cybersecurity Knowledge Graphs (CKGs) to represent Cyber Threat Intelligence (CTI) (<xref ref-type="bibr" rid="B28">Piplai et&#x20;al., 2020a</xref>; <xref ref-type="bibr" rid="B29">Piplai et&#x20;al., 2020b</xref>; <xref ref-type="bibr" rid="B30">Piplai et&#x20;al., 2020c</xref>). To build a Knowledge Graph specific to a domain, we need to define the ontology schema and entity relations in the domain. In our prior work, we created an ontology schema to extract and represent knowledge in GDPR and PCI DSS (<xref ref-type="bibr" rid="B8">Elluri et&#x20;al., 2018</xref>), and cloud privacy policies (<xref ref-type="bibr" rid="B12">Joshi et&#x20;al., 2020</xref>). <xref ref-type="bibr" rid="B13">Joshi et&#x20;al. (2016)</xref>, <xref ref-type="bibr" rid="B14">Kim and Joshi (2021)</xref> also defined an ontology to extract knowledge from HIPAA regulation, before the COVID-19 updates. In our pipeline, we extended the pre-COVID HIPAA ontology to include regulations that were added to address COVID-19. We used this ontology schema to create a Knowledge Graph for regulation compliance of healthcare privacy policy. To extract knowledge from medical articles on COVID-19, we used the entity-relations described in&#x20;UMLS.</p>
</sec>
</sec>
<sec id="s3">
<title>Framework to Securely Access COVID-19 Data</title>
<p>We developed a framework that can extract key information from published papers, identify privacy vulnerabilities in the data sharing process and allow access to securely extracted COVID-19 data. There are three key steps to securely accessing COVID-19 information. They are as follows:<list list-type="simple">
<list-item>
<p>1. Extracting COVID-19 Knowledge from Published Research Paper.</p>
</list-item>
<list-item>
<p>2. Extracting Privacy Compliance Requirements from HIPAA COVID-19 and Organizational Privacy Policies.</p>
</list-item>
<list-item>
<p>3. Making Access Control decisions to securely retrieve COVID-19&#x20;data.</p>
</list-item>
</list>
</p>
<p>
<xref ref-type="fig" rid="F1">Figure&#x20;1</xref> shows the overall architecture of our framework. In the following section, we describe the details of our framework and demonstrate how it works. <italic>Extracting COVID-19 Knowledge From Published Research Paper</italic> describes our method to create a Biomedical Knowledge Graph (BKG) with COVID-19 data. In <italic>Extracting Privacy Compliance Requirements From HIPAA COVID-19 and Organizational Privacy Policies</italic>, we give a brief description of the stages to extract compliance information for organizational privacy policies. We describe each stage in details in <italic>Developing HIPAA Ontology</italic> and <italic>Generating Compliance Information</italic>. In <italic>Making Access Control Decisions to Securely Retrieve COVID-19 Data</italic>, we describe how we can use compliance information to make access control decisions and allow secure access to COVID-19&#x20;data.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Architecture flow to securely access COVID-19 data from published research&#x20;paper.</p>
</caption>
<graphic xlink:href="fdata-04-701966-g001.tif"/>
</fig>
<sec id="s3-1">
<title>Extracting COVID-19 Knowledge From Published Research Paper</title>
<p>We extracted knowledge from a published research paper on COVID-19 into a Biomedical Knowledge Graph (BKG). In this section, we discuss the different components of the knowledge extraction pipeline that lead to the generation of the BKG. We also show how this BKG can be queried to extract COVID-19 information. Representing unstructured data in the form of a knowledge graph helps to extract important information from the data and derive relationships between them. This also helps end-users to query the knowledge graph and retrieve information without having to go through the data manually. We mined information from unstructured research papers, written in natural language, about COVID-19. We presented the mined information in a knowledge graph (BKG) that has reasoning capabilities and also an interface to query the populated&#x20;BKG.</p>
<sec id="s3-1-1">
<title>BKG Schema</title>
<p>The schema of our BKG is based on the UMLS. This is a well-recognized ontology for the medical domain, as it defines classes for medical entities and the possible relationships that can exist between them that are accepted by the medical community. We extended the UMLS schema, by adding another class called &#x201c;Data Source&#x201d; that helps in representing the origin of various facts present in the research paper. We also added necessary relationships to support the addition of the aforementioned class. The information extraction pipeline is based on a knowledge extraction pipeline that members of our team developed for cybersecurity. It consists of a Named Entity Recognizer (NER) that classifies words or groups of words to a particular entity class present in our modified UMLS-based BKG. This results in a set of entities and their corresponding entity-class labels. The next stage is the relationship extractor, which takes pairs of entities that have credible relationships between them and produces a relationship label as an output. We then take our entity-relationship set and assert that into our BKG. <xref ref-type="fig" rid="F2">Figure&#x20;2</xref> describes the different components of our pipeline.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Architecture flow to extract knowledge from medical research articles.</p>
</caption>
<graphic xlink:href="fdata-04-701966-g002.tif"/>
</fig>
</sec>
<sec id="s3-1-2">
<title>NER and RelExt for BKG</title>
<p>In our prior works (<xref ref-type="bibr" rid="B5">Dasgupta et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B29">Piplai et&#x20;al., 2020b</xref>), we have described different NER strategies in the domain of cybersecurity. We reuse the NER model that was described by <xref ref-type="bibr" rid="B29">Piplai et&#x20;al. (2020b)</xref>. This NER model is based on Stanford NER, which uses Conditional Random Fields and Gibbs sampling for the NER task. We used a public dataset (<xref ref-type="bibr" rid="B22">Mohan and Li, 2019</xref>) that consisted of 107,819 words and was annotated with UMLS classes. We also annotated the training set with an additional class called &#x201c;Data Source.&#x201d; A total of 124 UMLS classes (including the additional class &#x201c;Data Source&#x201d;) were used in our BKG. We trained the model for 343 iterations and used the trained model on COVID-19 research papers to identify the key medical terms and expressions. At the end of this stage, we are left with a list of extracted entities and their corresponding entity types, including the newly added&#x20;class.</p>
<p>The next stage of the pipeline is the Relationship Extractor. The Relationship Extractor takes pairs of entities and establishes a relationship between them. Since UMLS provides the entire schema for the ontology, we also have a list of possible relationships that can exist between pairs of entity classes. We used this to pre-process the candidate entity pairs that we provide as an input to the Relationship Extractor. We discarded pairs of entities that do not have a credible relationship between them according to our UMLS-based schema. We provide the rest as input to the Relationship Extractor. The Relationship Extractor is a four-layer neural network that takes the word2vec (<xref ref-type="bibr" rid="B19">Mikolov et&#x20;al., 2013</xref>) embeddings of the two candidate entities as input. We have a list of 46 relationships specified by our schema. We also have an additional class that signifies &#x201c;no relationship.&#x201d; The word2vec embedding has a dimension of 200. Two entities create a 400-dimensional input vector for the neural network. We then have three hidden layers of dimensions 200, 100, and 100 respectively. We have a final softmax layer that has 47 dimensions.</p>
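<p>The layer sizes above can be illustrated with a minimal sketch in plain Python. It is not the trained model: the weights are random, the ReLU activation on the hidden layers is an assumption (the activation is not stated here), and the class-pair table and stand-in embeddings are hypothetical placeholders for the UMLS-based schema and word2vec vectors.</p>
<preformat preformat-type="code">
```python
import math
import random

EMBED_DIM = 200  # word2vec dimension stated in the text
# Input (two concatenated embeddings), three hidden layers, softmax output.
LAYER_SIZES = [2 * EMBED_DIM, 200, 100, 100, 47]

# Hypothetical excerpt of schema-permitted class pairs (illustration only).
ALLOWED_CLASS_PAIRS = {
    ("Data Source", "Therapeutic or Preventive Procedure"),
    ("Pharmacologic Substance", "Disease or Syndrome"),
}


def filter_candidates(entities):
    """Keep ordered entity pairs whose classes may be related per the schema."""
    return [
        (a, b)
        for a, cls_a in entities
        for b, cls_b in entities
        if a != b and (cls_a, cls_b) in ALLOWED_CLASS_PAIRS
    ]


def init_layers(rng):
    """Random, untrained weights; a real model would learn these."""
    layers = []
    for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:]):
        weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, vec_a, vec_b):
    """Concatenate two embeddings and return a 47-way probability vector."""
    activations = vec_a + vec_b  # list concatenation: 200 + 200 = 400 inputs
    for index, (weights, biases) in enumerate(layers):
        activations = [
            sum(w * x for w, x in zip(row, activations)) + b
            for row, b in zip(weights, biases)
        ]
        if index != len(layers) - 1:  # ReLU on hidden layers (assumed)
            activations = [max(0.0, v) for v in activations]
    return softmax(activations)


rng = random.Random(0)
layers = init_layers(rng)
entities = [
    ("anecdotal report", "Data Source"),
    ("famotidine treatment", "Therapeutic or Preventive Procedure"),
]
pairs = filter_candidates(entities)
vec_a = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]  # stand-in embeddings
vec_b = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]
probs = forward(layers, vec_a, vec_b)
```
</preformat>
<p>In this sketch, the 47 output probabilities correspond to the 46 schema relationships plus the &#x201c;no relationship&#x201d; class, and the pre-filter keeps only those pairs whose classes are allowed to be related.</p>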
</sec>
<sec id="s3-1-3">
<title>Querying BKG to Retrieve COVID-19 Information</title>
<p>At this stage, we have an entity-relationship set that not only captures the information present in COVID-19 research papers but also associates the source of knowledge for the facts present in those papers. We use RDFLib, a Python library, to dynamically create a knowledge graph instance and populate it. <xref ref-type="fig" rid="F3">Figure&#x20;3</xref> describes a subset of classes and relationships that can exist between them. The classes are represented by circles and the relationships are represented by directed lines signifying the &#x201c;domain&#x201d; and &#x201c;range&#x201d; of the relationship. The class &#x201c;Data Source,&#x201d; which we added to include additional information, is marked by a red circle. We can see that this class &#x201c;indicates&#x201d; a &#x201c;Therapeutic or Preventive Measure.&#x201d; &#x201c;Data Source&#x201d; also has an additional relationship called &#x201c;data collected&#x201d; with a class called &#x201c;Research Activity.&#x201d;</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>A subset of our BKG schema as represented by the VOWL visualizer.</p>
</caption>
<graphic xlink:href="fdata-04-701966-g003.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>, we can see a part of a BKG populated from an early research paper about COVID-19 (<xref ref-type="bibr" rid="B17">Malone et&#x20;al., 2021</xref>). The rectangles with yellow circles on them indicate classes, and the rectangles with purple rhombuses on them indicate the entities. The arrows are color-coded and they represent individual relationships that exist between pairs of entities. The blue arrows going upwards signify the relationship &#x201c;subclass of&#x201d; that exists between all classes and the superclass &#x201c;owl:Thing.&#x201d; We see a few bold lines that signify all the relationships that exist for the entity &#x201c;COVID-19.&#x201d; This entity has been identified as a &#x201c;Disease or Syndrome,&#x201d; as shown by the bold purple line that connects the class with the entity. The dotted yellow arrows between &#x201c;COVID-19&#x201d; and each of &#x201c;non-specific clinical signs&#x201d; and &#x201c;chest pain&#x201d; indicate the relationship &#x201c;diagnoses.&#x201d; The bold grey dotted lines between &#x201c;COVID-19&#x201d; and each of &#x201c;remdesevir,&#x201d; &#x201c;hydroxychloroquine,&#x201d; and &#x201c;famotidine&#x201d; signify the relationship &#x201c;treats.&#x201d;</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>A populated BKG from one research paper about COVID-19. We discuss this graph in <italic>Querying BKG to Retrieve COVID-19 Information</italic>.</p>
</caption>
<graphic xlink:href="fdata-04-701966-g004.tif"/>
</fig>
<p>Next, we demonstrate some of the reasoning capabilities of the BKG with the help of some SPARQL queries. For example, if we are interested to know what data source indicates a therapeutic or preventive procedure, we run the following query. The query translates to &#x201c;Find all pairs of entities such that one of them is a Data Source, the other is a Therapeutic or Preventive Procedure, and the Data Source indicates the Procedure.&#x201d; The variables &#x201c;x&#x201d; and &#x201c;y&#x201d; indicate the particular entities we are interested in retrieving from the BKG. All the entity types and property or relationship types have a prefix &#x201c;BKG:&#x201d; added with them to show that they belong to the BKG. The first line says that we have to look for the entities &#x201c;x&#x201d; that belong to the class &#x201c;Data Source.&#x201d; The second line says that we have to look for another set of entities &#x201c;y&#x201d; that belong to the class Therapeutic or Preventive Procedure. The last line says that &#x201c;x&#x201d; (Data Source) should indicate &#x201c;y&#x201d; (Therapeutic or Preventive Procedure).</p>
<p>SELECT ?x ?y WHERE&#x20;{</p>
<p>?x a BKG:Data_Source.</p>
<p>?y a BKG:Therapeutic_or_Preventive_Procedure.</p>
<p>?x BKG:indicates ?y.</p>
<p>}</p>
<p>The above query returns the value &#x201c;anecdotal report&#x201d; indicates &#x201c;famotidine treatment.&#x201d;</p>
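<p>To make the matching behavior of this query concrete, the following is a minimal sketch in plain Python. It is a toy stand-in for a SPARQL engine rather than RDFLib itself: the triples, the underscore-joined entity identifiers, and the helper names <italic>match</italic> and <italic>select</italic> are illustrative assumptions, and &#x201c;rdf:type&#x201d; plays the role of the keyword &#x201c;a&#x201d; in the query.</p>
<preformat preformat-type="code">
```python
# Toy triple store of (subject, predicate, object) tuples. The entity names
# follow the examples in the text; the identifier spellings are assumptions.
TRIPLES = {
    ("anecdotal_report", "rdf:type", "BKG:Data_Source"),
    ("famotidine_treatment", "rdf:type",
     "BKG:Therapeutic_or_Preventive_Procedure"),
    ("anecdotal_report", "BKG:indicates", "famotidine_treatment"),
    ("COVID-19", "rdf:type", "BKG:Disease_or_Syndrome"),
    ("famotidine", "BKG:treats", "COVID-19"),
}


def match(pattern, binding):
    """Yield extended bindings for one triple pattern; '?name' is a variable."""
    for triple in TRIPLES:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is bound to another value.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate


def select(patterns, variables):
    """Join the patterns left to right, as a basic graph pattern would."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [new for old in bindings for new in match(pattern, old)]
    return sorted(tuple(b[v] for v in variables) for b in bindings)


rows = select(
    [
        ("?x", "rdf:type", "BKG:Data_Source"),
        ("?y", "rdf:type", "BKG:Therapeutic_or_Preventive_Procedure"),
        ("?x", "BKG:indicates", "?y"),
    ],
    ["?x", "?y"],
)
print(rows)
```
</preformat>
<p>Each pattern narrows the set of candidate bindings, so only pairs that satisfy all three lines of the query survive the join.</p>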
</sec>
</sec>
<sec id="s3-2">
<title>Extracting Privacy Compliance Requirements From HIPAA COVID-19 and Organizational Privacy Policies</title>
<p>In <italic>Extracting COVID-19 Knowledge From Published Research Paper</italic>, we show how to extract research data related to COVID-19, including data about patients, treatments, and outcomes. It is also necessary to ensure that this data is secured. Any analysis done on shared data should respect associated security and privacy policies. Ensuring security and privacy while processing the data and handling patient records has become a primary challenge. In this section, we describe a framework to extract policy compliance information on organizations that share and handle COVID-19 data. The policy compliance information is used in association with COVID-19 information extracted from a published research paper in <italic>Making Access Control Decisions to Securely Retrieve COVID-19 Data</italic> to make access control decisions on COVID-19 data. The policy compliance information comes both from individual organizations&#x2019; privacy policies and from HIPAA regulations for COVID-19. We extract knowledge from the organizational privacy policies and HIPAA COVID-19 regulations into a knowledge graph based on a HIPAA ontology. This knowledge graph can be queried to retrieve policy assertions. This, along with other compliance and integrity checks, is used to generate compliance information related to accessing COVID-19 information. The compliance information is eventually used to securely access COVID-19&#x20;data.</p>
</sec>
<sec id="s3-3">
<title>Developing HIPAA Ontology</title>
<p>The first step in gathering compliance information is to represent the policy rules in the HIPAA COVID-19 regulation and in organizational privacy policies in a knowledge graph. We developed an ontology for the knowledge graph (the HIPAA Ontology) and populated it with policy compliance information extracted from both sources. We describe our method in detail in the following sections.</p>
<sec id="s3-3-1">
<title>Key Term Extraction</title>
<p>The first stage in developing the HIPAA ontology was to extract key terms from the HIPAA COVID-19 regulation document. In this preprocessing stage, we extracted the rules from the HIPAA document that address COVID-19 regulations. The rules were then analyzed in a bag-of-words model. We removed stop words from the list of words. We also removed modal words like &#x201c;could,&#x201d; &#x201c;shall,&#x201d; &#x201c;must,&#x201d; &#x201c;will,&#x201d; &#x201c;should,&#x201d; and &#x201c;can.&#x201d; These modal words were instead used to extract rules represented in deontic logic from the organizational privacy policies, as described in <italic>Extracting Rules From Organizational Privacy Policies</italic>. From the remaining words in the HIPAA COVID-19 regulation, we collected the most frequently occurring terms. These words are the key terms in the HIPAA repository related to COVID-19. <xref ref-type="table" rid="T1">Table&#x20;1</xref> lists the top key terms extracted from the HIPAA repository related to COVID-19, along with their cumulative frequency. These key terms helped us generate the HIPAA ontology schema, as described in <italic>HIPAA Ontology Schema</italic>, and check the compliance of organizational privacy policies, as discussed in <italic>Frequency of HIPAA Key Terms</italic>.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Key terms from COVID-19 guidance rules from HIPAA.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Keyword</th>
<th align="center">Frequency</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">HIPAA</td>
<td align="char" char=".">56</td>
</tr>
<tr>
<td align="left">COVID-19</td>
<td align="char" char=".">52</td>
</tr>
<tr>
<td align="left">PHI</td>
<td align="char" char=".">41</td>
</tr>
<tr>
<td align="left">Public</td>
<td align="char" char=".">39</td>
</tr>
<tr>
<td align="left">Telehealth</td>
<td align="char" char=".">36</td>
</tr>
<tr>
<td align="left">Provider</td>
<td align="char" char=".">25</td>
</tr>
<tr>
<td align="left">Communication</td>
<td align="char" char=".">21</td>
</tr>
<tr>
<td align="left">Notification</td>
<td align="char" char=".">20</td>
</tr>
<tr>
<td align="left">Individual</td>
<td align="char" char=".">19</td>
</tr>
<tr>
<td align="left">Privacy</td>
<td align="char" char=".">16</td>
</tr>
<tr>
<td align="left">Treatment</td>
<td align="char" char=".">16</td>
</tr>
<tr>
<td align="left">Authorization</td>
<td align="char" char=".">16</td>
</tr>
<tr>
<td align="left">Remote</td>
<td align="char" char=".">15</td>
</tr>
<tr>
<td align="left">Disclose</td>
<td align="char" char=".">14</td>
</tr>
</tbody>
</table>
</table-wrap>
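<p>The key term extraction steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pipeline used in the study: the modal words are those named in the text, while the stop-word list, the tokenizer pattern, the function name, and the sample rules are hypothetical.</p>
<preformat><![CDATA[
```python
import re
from collections import Counter

# Modal words named in the text; they are set aside for rule extraction.
MODAL_WORDS = {"could", "shall", "must", "will", "should", "can"}
# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "or", "in", "for", "is", "by", "with"}


def extract_key_terms(rules, top_n=5):
    """Count the words that remain after dropping stop words and modal words."""
    counts = Counter()
    for rule in rules:
        # Assumed tokenizer: lowercase alphanumeric runs, hyphens kept inside words.
        for token in re.findall(r"[A-Za-z0-9][A-Za-z0-9-]*", rule.lower()):
            if token not in STOP_WORDS and token not in MODAL_WORDS:
                counts[token] += 1
    return counts.most_common(top_n)


# Hypothetical sample rules, not quotations from the HIPAA guidance.
SAMPLE_RULES = [
    "A covered entity can disclose PHI for telehealth.",
    "The provider must protect PHI.",
]
KEY_TERMS = extract_key_terms(SAMPLE_RULES, top_n=1)
```
]]></preformat>
<p>Applied to the full set of COVID-19 rules, a count of this kind would yield a ranked list comparable in form to <xref ref-type="table" rid="T1">Table&#x20;1</xref>.</p>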
</sec>
<sec id="s3-3-2">
<title>HIPAA Ontology Schema</title>
<p>Our prior work (<xref ref-type="bibr" rid="B13">Joshi et&#x20;al., 2016</xref>) described a semantically rich knowledge graph of HIPAA rules before COVID-19. In this paper, we expanded the ontology to include COVID-19 updates in the HIPAA regulation. Unlike the manual process we used earlier, we utilized the key terms extracted in <italic>Key Term Extraction</italic> to define the classes in the HIPAA ontology. These are the key terms in the HIPAA repository related to COVID-19. To build the HIPAA knowledge graph, we used the Prot&#xe9;g&#xe9; semantic web tool (<xref ref-type="bibr" rid="B31">Protege, 2020</xref>). A high-level illustration of the entities and relations in the HIPAA Ontology is shown in <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>. The primary classes in the HIPAA ontology are as follows:<list list-type="simple">
<list-item>
<p>&#x2022; The HIPAA Stakeholder class is the main class that signifies the key healthcare providers or organizations who deal with patients&#x2019; data and are affected by the health regulations. This class has three main subclasses: Business Associates, Exempt Entities, and Covered Entities. The word &#x201c;has&#x201d; means that these are the subclasses associated with a parent class. The three subclasses are mutually disjoint, so an individual cannot be an instance of more than one of them.</p>
</list-item>
<list-item>
<p>&#x2022; The HIPAA Regulation class is the top class that describes the regulation and its purpose. Health care providers must adhere to the requirements of HIPAA. The Business Associates, Exempt Entities, and Covered Entities classes are related to this class through the object property has Regulation.</p>
</list-item>
<list-item>
<p>&#x2022; The HIPAA Covid Rule class represents the rules that apply to health care providers that deal with or access COVID-19 patients&#x2019; data. This class is related to the HIPAA Regulation parent class through the object property has CovidRule. It has four subclasses, Media Access, Contacting Covid-19 patients, PHI to Law Enforcement, and Telehealth, which represent the various guidance under the COVID&#x20;rules.</p>
</list-item>
<list-item>
<p>&#x2022; The HIPAA Privacy and Security Rule classes represent privacy and security requirements for organizations that access health-related data. We identified these classes and subclasses as part of our previous work (<xref ref-type="bibr" rid="B13">Joshi et&#x20;al., 2016</xref>). They are associated with the HIPAA Regulation class through the object properties has PrivacyRule and has SecurityRule. Both privacy and security classes have a total of seven classes to describe the associated&#x20;rules.</p>
</list-item>
</list>
</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>HIPAA regulation knowledge&#x20;graph.</p>
</caption>
<graphic xlink:href="fdata-04-701966-g005.tif"/>
</fig>
</sec>
<sec id="s3-3-3">
<title>Extracting Rules From Organizational Privacy Policies</title>
<p>In the next step, we populated the HIPAA ontology with rules from organizational privacy policies. In the organizational privacy policy documents, the policy rules are structured as Deontic Logic Statements (DLS). A DLS is a statement in a document that expresses an idea of permission, obligation, dispensation, or prohibition. We utilized DLS to extract policy rules as semi-formal statements from natural language texts. To populate our HIPAA ontology with organizational privacy policy rules, we extracted DLS from the privacy policies that convey a sense of permission or obligation. <xref ref-type="fig" rid="F6">Figure&#x20;6</xref> illustrates the relative frequency of the categories of DLS in the HIPAA COVID guidance.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Modal verbs distribution in HIPAA-COVID repository.</p>
</caption>
<graphic xlink:href="fdata-04-701966-g006.tif"/>
</fig>
<p>We utilized modal verbs like &#x201c;will,&#x201d; &#x201c;can,&#x201d; &#x201c;could,&#x201d; &#x201c;should,&#x201d; and &#x201c;must&#x201d; for DLS extraction. These modal verbs helped us identify DLS in natural language text. They were also used to categorize each DLS as a permission or an obligation. Sentences with modal verbs like &#x201c;will,&#x201d; &#x201c;can,&#x201d; and &#x201c;could&#x201d; are categorized as permissions. Sentences with modal verbs like &#x201c;must&#x201d; and &#x201c;should&#x201d; are categorized as obligations. The following are examples of DLS in each category extracted from the HIPAA COVID guidance:<list list-type="simple">
<list-item>
<p>&#x2022; Permission: &#x201c;A covered entity may disclose PHI to a first responder who may have been exposed to COVID-19, or may otherwise be at risk of contracting or spreading COVID-19, if the covered entity is authorized by law, such as state law, to notify persons as necessary in the conduct of a public health intervention or investigation&#x201d; (<xref ref-type="bibr" rid="B11">HHS, 2020</xref>).</p>
</list-item>
<list-item>
<p>&#x2022; Obligation: &#x201c;The covered entity must make reasonable efforts to limit the use or disclosure of PHI to the minimum necessary to accomplish the intended purpose of the use or disclosure&#x201d; (<xref ref-type="bibr" rid="B11">HHS, 2020</xref>).</p>
</list-item>
</list>
</p>
<p>We extracted the permissions and obligations from organizational privacy policies. The extracted permissions and obligations determined how the HIPAA COVID-19 rules apply to healthcare provider privacy or organizational privacy policies dealing with COVID patient data. The extracted rules were populated into the HIPAA ontology.</p>
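<p>The modal-verb categorization described above can be sketched as follows. This is a minimal Python illustration, not the authors&#x2019; extraction code. The two verb groups follow the text; the inclusion of &#x201c;may&#x201d; among the permission verbs is an assumption made here because the quoted permission example relies on it, and the precedence rule, the function name, and the shortened sample sentences are likewise assumptions.</p>
<preformat><![CDATA[
```python
import re

# Modal verbs as categorized in the text. "may" is an assumption added here
# because the quoted permission example relies on it.
PERMISSION_MODALS = {"will", "can", "could", "may"}
OBLIGATION_MODALS = {"must", "should"}


def classify_dls(sentence):
    """Return 'obligation', 'permission', or None for a single sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    # Assumption: an obligation modal takes precedence when both kinds occur.
    if tokens.intersection(OBLIGATION_MODALS):
        return "obligation"
    if tokens.intersection(PERMISSION_MODALS):
        return "permission"
    return None


# Shortened versions of the two HHS examples quoted above.
PERMISSION_EXAMPLE = "A covered entity may disclose PHI to a first responder."
OBLIGATION_EXAMPLE = "The covered entity must make reasonable efforts to limit the use of PHI."

LABELS = [classify_dls(PERMISSION_EXAMPLE), classify_dls(OBLIGATION_EXAMPLE)]
```
]]></preformat>
<p>A keyword test of this kind is only a first filter; sentences without any listed modal verb are returned as None and would not be treated as DLS.</p>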
</sec>
</sec>
<sec id="s3-4">
<title>Generating Compliance Information</title>
<p>The HIPAA ontology was developed and populated with policy rules from HIPAA COVID-19 guidance and organization privacy policies. This ontology can be used to retrieve compliance information about organizations dealing with COVID-19 data. In this section, we describe how to retrieve critical compliance information from privacy policies and regulations. In our framework, we used four processes to retrieve the compliance information. We demonstrate these processes by including the results of these evaluations on 10 privacy policies of organizations or health centers dealing with COVID-19&#x20;data.</p>
<sec id="s3-4-1">
<title>Policy Assertion</title>
<p>In this section, we demonstrate the reasoning capabilities of the&#x20;HIPAA ontology and show how it can be used to make policy assertions. We queried the HIPAA ontology using SPARQL queries (<xref ref-type="bibr" rid="B32">Sirin and Parsia, 2007</xref>). <xref ref-type="fig" rid="F7">Figure&#x20;7</xref> shows the query results for the HIPAA COVID rules followed&#x20;by a specific organization. Rules that are missing in the privacy policy are shown as N/A. <xref ref-type="fig" rid="F8">Figure&#x20;8</xref> shows the query results for all the rules in the HIPAA regulation related to COVID-19. Organizations can query the HIPAA ontology to quickly check any rule in HIPAA. Based on this analysis, they can reexamine their policies to address HIPAA guidelines. This automated approach can alert providers to any possible compliance violation. In our framework, we used policy assertions to check whether the organizational policies are compliant with HIPAA regulations. This information forms a part of the compliance information of an organization&#x2019;s privacy policy.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>SPARQL query to check for HIPAA COVID rules of privacy policies.</p>
</caption>
<graphic xlink:href="fdata-04-701966-g007.tif"/>
</fig>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>SPARQL query to check for HIPAA COVID&#x20;rules.</p>
</caption>
<graphic xlink:href="fdata-04-701966-g008.tif"/>
</fig>
</sec>
<sec id="s3-4-2">
<title>Frequency of HIPAA Key Terms</title>
<p>As mentioned in <italic>Key Term Extraction</italic>, the key terms are the most important words in the HIPAA COVID-19 regulation. These key terms and the words associated with them have to be addressed in an organization&#x2019;s privacy policy. The frequency of these terms, or of words related to them, is an important indication of a policy&#x2019;s compliance with the HIPAA guidelines. In our framework, we evaluate the frequency of HIPAA key terms and related terms in a privacy policy. We used the vector representation of key terms to identify semantically similar terms in a document. We illustrate this process by evaluating the frequency of HIPAA key terms in the privacy policies of 10 organizations and health centers that deal with COVID-19 data. <xref ref-type="fig" rid="F9">Figure&#x20;9</xref> shows the frequency of semantically similar HIPAA key terms in the 10 organizational privacy policies. A higher frequency of HIPAA key terms or semantically similar words in an organization&#x2019;s privacy policy indicates that the privacy policy is more compliant with the HIPAA regulation. This is one piece of the compliance information included in our framework.</p>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>Frequency of semantically similar HIPAA key terms in organizational privacy policies.</p>
</caption>
<graphic xlink:href="fdata-04-701966-g009.tif"/>
</fig>
</sec>
<sec id="s3-4-3">
<title>Determining Similarity Score</title>
<p>Along with HIPAA key terms, the semantic similarity between the organizational privacy policy and HIPAA regulation is indicative of compliance (<xref ref-type="bibr" rid="B7">Elluri et&#x20;al., 2020</xref>). In our framework, we evaluated the semantic similarity between organizational privacy policies and the HIPAA regulation. We included the result of this analysis in the compliance information for the privacy policy. To demonstrate this process, we determined the semantic similarity scores for a corpus of 10 health provider privacy policies that use COVID-19 data. To measure the semantic similarity, we used the vector representation of the documents in the Doc2Vec model. The similarity score was evaluated in radians. A lower similarity score means that the document is semantically closer to HIPAA and thus more in compliance with the regulation. The results of our analysis for the 10 health center privacy policies are illustrated in <xref ref-type="fig" rid="F10">Figure&#x20;10</xref>. The similarity scores for these 10 documents are in the range of 0.44&#x2013;0.67. Organizations like the National Council and World Privacy Forum are more in compliance with the regulation. Five out of ten organizations have a score of about 0.5. Notably, none of the organizations has a score above 0.67, which is essential for health care providers.</p>
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption>
<p>Semantic similarity scores for privacy policies vs. HIPAA COVID&#x20;rules.</p>
</caption>
<graphic xlink:href="fdata-04-701966-g010.tif"/>
</fig>
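<p>One way to read a similarity score expressed in radians is as the angle between two document vectors, where a smaller angle means the documents are closer. The following minimal Python sketch illustrates that reading. It is an assumption about how such a score could be computed, not a description of the authors&#x2019; Doc2Vec setup, and the function name and the toy vectors are hypothetical.</p>
<preformat><![CDATA[
```python
import math


def angular_distance(u, v):
    """Angle in radians between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, cosine))
    return math.acos(cosine)


# Toy two-dimensional vectors standing in for document embeddings.
SAME_DIRECTION = angular_distance([1.0, 0.0], [1.0, 0.0])
ORTHOGONAL = angular_distance([1.0, 0.0], [0.0, 1.0])
```
]]></preformat>
<p>Under this reading, identical directions give an angle of zero and unrelated (orthogonal) directions give an angle of one quarter turn, so lower values indicate documents that are semantically closer.</p>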
</sec>
<sec id="s3-4-4">
<title>Assigning Vagueness Score for Privacy Policies</title>
<p>The fourth and final evaluation in our framework to extract compliance information for privacy policies is evaluating the vagueness score for privacy policies. Vagueness or lack of clarity in a text makes it difficult to interpret a text accurately. There are aspects of natural language that allow sentences to be grammatically sound but still unclear in their meaning. If a statement has multiple interpretations and there is no clarification towards the intended meaning, the statement is considered vague. For organizational privacy policies, vagueness contributes to a lack of clarity. This leaves room for misinterpretation. In our framework, we include information about the vagueness of a privacy policy in the compliance information. This helps in deciding the degree to which information extracted from an organizational privacy policy can be trusted.</p>
<p>We analyzed how words and sentence construction choices in English affect the vagueness in statements. We identified three linguistic markers that contribute to vagueness in privacy policy documents. These are:<list list-type="simple">
<list-item>
<p>1. Ambiguous Words: Ambiguous words are words whose meaning is not clear from the given context. We used the lexical database WordNet (<xref ref-type="bibr" rid="B20">Miller, 1998</xref>) to identify ambiguous words. In WordNet, words are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. If a word is associated with more than one synset, we conclude that it is ambiguous.</p>
</list-item>
<list-item>
<p>2. Vague Words: There are certain words in the English language that are inherently vague. <xref ref-type="table" rid="T2">Table&#x20;2</xref> provides the taxonomy of vague terms that we used in our&#x20;model.</p>
</list-item>
<list-item>
<p>3. Reading complexity: The average reading skill of US adults is believed to be at about the 8th-grade level. CalOPPA recommends that privacy policies &#x201c;be written in clear and concise language, be written at no greater than an 8th-grade reading level.&#x201d; Overall reading complexity is thus an important measure for lack of clarity in documents. The Dale&#x2013;Chall readability formula is a readability test that measures the comprehension difficulty that readers face when reading a&#x20;text.</p>
</list-item>
</list>
</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Categories of vague&#x20;terms.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th colspan="2" align="left">Vague terms</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="2" align="left">Modal verbs</td>
<td align="left">&#x201c;may,&#x201d; &#x201c;might,&#x201d; &#x201c;can&#x201d;</td>
</tr>
<tr>
<td align="left">&#x201c;could,&#x201d; &#x201c;would,&#x201d; &#x201c;likely,&#x201d; &#x201c;possible,&#x201d; &#x201c;possibly&#x201d;</td>
</tr>
<tr>
<td rowspan="3" align="left">Conditional terms</td>
<td valign="top" align="left">&#x201c;depending,&#x201d; &#x201c;necessary,&#x201d; &#x201c;appropriate&#x201d;</td>
</tr>
<tr>
<td align="left">&#x201c;inappropriate,&#x201d; &#x201c;as needed&#x201d;</td>
</tr>
<tr>
<td align="left">&#x201c;as applicable,&#x201d; &#x201c;otherwise reasonably,&#x201d; &#x201c;sometimes,&#x201d; &#x201c;from time to time&#x201d;</td>
</tr>
<tr>
<td rowspan="2" align="left">Generalization terms</td>
<td valign="top" align="left">&#x201c;generally,&#x201d; &#x201c;mostly,&#x201d; &#x201c;widely&#x201d;</td>
</tr>
<tr>
<td align="left">&#x201c;general,&#x201d; &#x201c;commonly,&#x201d; &#x201c;usually,&#x201d; &#x201c;normally,&#x201d; &#x201c;typically,&#x201d; &#x201c;largely,&#x201d; &#x201c;often,&#x201d; &#x201c;primarily,&#x201d; &#x201c;among other things&#x201d;</td>
</tr>
<tr>
<td rowspan="3" align="left">Generalizing numeric terms</td>
<td valign="top" align="left">&#x201c;anyone,&#x201d; &#x201c;certain,&#x201d; &#x201c;everyone&#x201d;</td>
</tr>
<tr>
<td align="left">&#x201c;numerous,&#x201d; &#x201c;some,&#x201d; &#x201c;most&#x201d;</td>
</tr>
<tr>
<td align="left">&#x201c;few,&#x201d; &#x201c;much,&#x201d; &#x201c;many,&#x201d; &#x201c;various,&#x201d; &#x201c;including but not limited to&#x201d;</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Using the measures described above, we evaluated the aggregated score of vagueness for a privacy policy. We rescaled the assigned score to the range (1, 3). A higher score indicates a privacy policy that is complex and hard to read. A lower score indicates a privacy policy that is relatively easy to read. We used this model to analyze vagueness in the privacy policy texts of 10 organizations that collect, store and/or use patients&#x2019; data related to COVID-19. The results from our experimental evaluation are provided in <xref ref-type="table" rid="T3">Table&#x20;3</xref>.</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Score of vagueness for organizations with COVID-19&#x20;data.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Organization</th>
<th align="center">Ambiguous words</th>
<th align="center">Vague terms</th>
<th align="center">Reading complexity</th>
<th align="center">Score of vagueness</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Akin Gump</td>
<td align="char" char=".">0.657</td>
<td align="char" char=".">0.167</td>
<td align="char" char=".">0.98</td>
<td align="char" char=".">1.804</td>
</tr>
<tr>
<td align="left">Brown and Brown Insurance</td>
<td align="char" char=".">0.602</td>
<td align="char" char=".">0.001</td>
<td align="char" char=".">0.129</td>
<td align="char" char=".">0.732</td>
</tr>
<tr>
<td align="left">Department of Education</td>
<td align="char" char=".">0.667</td>
<td align="char" char=".">0.219</td>
<td align="char" char=".">0.927</td>
<td align="char" char=".">1.813</td>
</tr>
<tr>
<td align="left">National Council</td>
<td align="char" char=".">0.689</td>
<td align="char" char=".">0.192</td>
<td align="char" char=".">0.471</td>
<td align="char" char=".">1.352</td>
</tr>
<tr>
<td align="left">Faegre Drinker</td>
<td align="char" char=".">0.629</td>
<td align="char" char=".">0.078</td>
<td align="char" char=".">0.889</td>
<td align="char" char=".">1.596</td>
</tr>
<tr>
<td align="left">Today&#x2019;s Wound Clinic</td>
<td align="char" char=".">0.669</td>
<td align="char" char=".">0.212</td>
<td align="char" char=".">1.481</td>
<td align="char" char=".">2.362</td>
</tr>
<tr>
<td align="left">Mercer</td>
<td align="char" char=".">0.638</td>
<td align="char" char=".">0.156</td>
<td align="char" char=".">0.567</td>
<td align="char" char=".">1.361</td>
</tr>
<tr>
<td align="left">The Network for Public Health Law</td>
<td align="char" char=".">0.653</td>
<td align="char" char=".">0.096</td>
<td align="char" char=".">0.064</td>
<td align="char" char=".">0.813</td>
</tr>
<tr>
<td align="left">Simpson Thatcher</td>
<td align="char" char=".">0.657</td>
<td align="char" char=".">0.153</td>
<td align="char" char=".">0.645</td>
<td align="char" char=".">1.455</td>
</tr>
<tr>
<td align="left">World Privacy Forum</td>
<td align="char" char=".">0.674</td>
<td align="char" char=".">0.225</td>
<td align="char" char=".">1.1278</td>
<td align="char" char=".">2.0268</td>
</tr>
</tbody>
</table>
</table-wrap>
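<p>Two parts of this evaluation can be sketched in a few lines of Python. The sketch is illustrative and rests on assumptions: the vague-term measure is taken here to be the fraction of tokens that appear in a subset of the single-word terms from <xref ref-type="table" rid="T2">Table&#x20;2</xref>, and the aggregation is taken to be a plain sum, which is what the rows of <xref ref-type="table" rid="T3">Table&#x20;3</xref> suggest. The function names and the sample sentence are hypothetical, and the WordNet and Dale&#x2013;Chall components are not reproduced.</p>
<preformat><![CDATA[
```python
import re

# A subset of the single-word vague terms listed in Table 2.
VAGUE_TERMS = {
    "may", "might", "can", "could", "would", "likely", "possible", "possibly",
    "generally", "mostly", "usually", "typically", "often",
    "some", "most", "few", "many", "various",
}


def vague_term_ratio(text):
    """Fraction of tokens that appear in the vague-term list (assumed measure)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in VAGUE_TERMS)
    return hits / len(tokens)


def vagueness_score(ambiguous, vague, reading):
    """Sum the three component scores, as the rows of Table 3 suggest."""
    return round(ambiguous + vague + reading, 4)


# Hypothetical policy sentence: 2 of its 5 tokens are vague terms.
SAMPLE_RATIO = vague_term_ratio("We may share some data")
# Component scores copied from the first row of Table 3.
FIRST_ROW_SCORE = vagueness_score(0.657, 0.167, 0.98)
```
]]></preformat>
<p>With the component values from the first row of <xref ref-type="table" rid="T3">Table&#x20;3</xref>, the sum reproduces the reported score of vagueness for that row.</p>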
</sec>
</sec>
<sec id="s3-5">
<title>Making Access Control Decisions to Securely Retrieve COVID-19 Data</title>
<p>In <italic>Extracting COVID-19 Knowledge From Published Research Paper</italic>, we extracted medical information from COVID-19 research papers. In <italic>Generating Compliance Information</italic>, we extracted compliance information on organizations that collect, share and/or access COVID-19 data. COVID-19 research papers rely on data sources that adhere to policy regulations like HIPAA. While the research papers are publicly available, extracting data from them and feeding it to a larger KG could potentially leak information that needs to be protected. As recent studies (Vadiya et&#x20;al.) show, inference attacks can infer data from de-identified sources. This is even more critical for a KG that extracts data from multiple published papers and has reasoning capabilities. The KG can be exploited for an inference attack even though the data in individual papers was anonymized. In this section, we show how to use compliance information to restrict access to COVID-19 data and information. Controlling access to information or data is necessary to maintain the integrity and security of data. <xref ref-type="fig" rid="F11">Figure&#x20;11</xref> gives an overview of the framework to securely access COVID-19 data from the BKG and HIPAA ontology.</p>
<fig id="F11" position="float">
<label>FIGURE 11</label>
<caption>
<p>Framework to generate securely accessible COVID-19 data from BKG and HIPAA ontology.</p>
</caption>
<graphic xlink:href="fdata-04-701966-g011.tif"/>
</fig>
<p>In <italic>Querying BKG to Retrieve COVID-19 Information</italic>, we show&#x20;how the BKG can be queried using SPARQL to retrieve COVID-19 information from published works. We also give an&#x20;example of a SPARQL query that outputs the data source for a specific piece of information. This data source is one of the organizations that share COVID-19 data. Hence, the&#x20;corresponding privacy policy for the organization was populated into the HIPAA ontology, as described in <italic>Extracting Rules From Organizational Privacy Policies</italic>, and verified against the HIPAA regulations. The compliance information for the data organization was used to make access control decisions on the COVID-19 information extracted from published&#x20;works.</p>
<p>We explain the details of the framework with an example. Consider the published work on &#x201c;Acute Heart Failure in Multisystem Inflammatory Syndrome in Children in the Context of Global SARS-CoV-2 Pandemic&#x201d; (<xref ref-type="bibr" rid="B1">Belhadjer et&#x20;al., 2020</xref>). In our framework, the data source for the information extracted from this paper was linked to the CDC (<xref ref-type="bibr" rid="B3">Covid et&#x20;al., 2020</xref>). We populated the CDC privacy policy<xref ref-type="fn" rid="fn1">
<sup>1</sup>
</xref> onto the HIPAA ontology. One of the rules that the HIPAA ontology retrieved from the CDC privacy policy was, &#x201c;We do not use or&#x20;share your information for commercial purposes and, except&#x20;as&#x20;described above, we do not exchange or otherwise disclose this information<xref ref-type="fn" rid="fn2">
<sup>2</sup>
</xref>.&#x201d; To provide access control, we also&#x20;queried the user&#x2019;s role. The access control stage asks,&#x20;&#x201c;Is the user a commercial agent?&#x201d; and &#x201c;Does the user intend to share this information for commercial purposes?&#x201d; If the&#x20;answer to both questions is &#x201c;No,&#x201d; the user is allowed access. Otherwise, the user is denied access because granting it would violate the privacy policy for the data. With this framework, we ensure that COVID-19 information is accessible only to agencies that have the right or permission to access it.</p>
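<p>The decision rule in this example can be sketched as follows. This is a minimal Python illustration of the two-question check for the CDC rule quoted above, not the authors&#x2019; access control component; the class, field, and function names are hypothetical.</p>
<preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Answers to the two questions asked at the access control stage."""
    is_commercial_agent: bool
    intends_commercial_sharing: bool


def allow_access(request):
    """Grant access only when the answer to both questions is 'No'."""
    return not (request.is_commercial_agent or request.intends_commercial_sharing)


# A non-commercial user with no commercial sharing intent is allowed.
RESEARCHER = AccessRequest(is_commercial_agent=False, intends_commercial_sharing=False)
# A single 'Yes' to either question is enough to deny access.
RESELLER = AccessRequest(is_commercial_agent=False, intends_commercial_sharing=True)

DECISIONS = (allow_access(RESEARCHER), allow_access(RESELLER))
```
]]></preformat>
<p>In a fuller system, the questions would be derived from the rules retrieved from the HIPAA ontology for the data source, rather than being fixed in code as they are here.</p>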
</sec>
</sec>
<sec id="s4">
<title>Conclusion and Future Work</title>
<p>Health regulations are updated regularly, so providers that use patients&#x2019; data have to update their privacy policies to address the latest rules. Throughout the globe, providers update their privacy policies to demonstrate their commitment to HIPAA compliance and announce modified editions of their policies that include the context of the latest regulation rules. Privacy policies are short documents available in textual format. Ensuring their compliance with the updated regulation rules therefore requires a significant amount of human labor and intervention. We anticipate that a semantically rich, machine-processable knowledge graph that captures health provider privacy policies dealing with COVID-19 data will substantially help automate this process and keep policies up to date with any new announcement.</p>
<p>In this paper, we extracted medical information from COVID-19 research papers. We used semantically similar keyword search&#x20;and text mining to extract compliance information on organizations that collect, share and/or access COVID-19 data.&#x20;We also showed how to use compliance information to restrict access to COVID-19 data and information. Controlling access to information or data is necessary to maintain the integrity and security of data. In future work, we aim to extract the topics related to short text using deep learning methods and classify them with the topics extracted from the HIPAA regulation.</p>
</sec>
</body>
<back>
<sec id="s5">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>LE collected the privacy policies and HIPAA-related data and&#x20;developed a Policy Compliance framework. AP developed the method to extract medical information from COVID-19 data.&#x20;AK developed the integrated framework to securely access COVID-19 data that satisfies policy compliance. AJ and&#x20;KJ were the principal investigators of the group that&#x20;collected data and developed the novel model presented in the paper.</p>
</sec>
<sec sec-type="COI-statement" id="s7">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="s8" sec-type="disclaimer">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<fn id="fn1">
<label>1</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://www.cdc.gov/other/privacy.html">https://www.cdc.gov/other/privacy.html</ext-link>
</p>
</fn>
<fn id="fn2">
<label>2</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://www.cdc.gov/other/privacy.html">https://www.cdc.gov/other/privacy.html&#x23;Sharing</ext-link>
</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Belhadjer</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>M&#xe9;ot</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bajolle</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Khraiche</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Legendre</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Abakka</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Acute Heart Failure in Multisystem Inflammatory Syndrome in Children in the Context of Global SARS-CoV-2 Pandemic</article-title>. <source>Circulation.</source> <volume>142</volume>, <fpage>429</fpage>&#x2013;<lpage>436</lpage>. <pub-id pub-id-type="doi">10.1161/circulationaha.120.048360</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bodenreider</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>The Unified Medical Language System (UMLS): Integrating Biomedical Terminology</article-title>. <source>Nucleic Acids Res.</source> <volume>32</volume>, <fpage>D267</fpage>&#x2013;<lpage>D270</lpage>. <pub-id pub-id-type="doi">10.1142/9789812702456_0008</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Covid</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Team</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Covid</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Team</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Covid</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Team</surname>
<given-names>R.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Coronavirus Disease 2019 in children&#x2014;United&#x20;States, February 12&#x2013;April 2, 2020</article-title>. <source>Morbidity Mortality Weekly Rep.</source> <volume>69</volume>, <fpage>422</fpage>. </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>da Silva</surname>
<given-names>J.&#x20;A. T.</given-names>
</name>
<name>
<surname>Tsigaris</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Erfanmanesh</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Publishing Volumes in Major Databases Related to Covid-19</article-title>. <source>Scientometrics.</source> <fpage>1</fpage>&#x2013;<lpage>12</lpage>. </citation>
</ref>
<ref id="B5">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Dasgupta</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Piplai</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kotal</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Joshi</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>A Comparative Study of Deep Learning Based Named Entity Recognition Algorithms for Cybersecurity</article-title>,&#x201d; in <conf-name>4th International Workshop on Big Data Analytics for Cyber Intelligence and Defense, IEEE International Conference on Big Data</conf-name>. <pub-id pub-id-type="doi">10.1109/bigdata50022.2020.9378482</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Dozier</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Kondadadi</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Light</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Vachher</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Veeramachaneni</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wudali</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2010</year>). &#x201c;<article-title>Named Entity Recognition and Resolution in Legal Text</article-title>,&#x201d; in <source>Semantic Processing of Legal Texts</source> (<publisher-name>Springer</publisher-name>), <fpage>27</fpage>&#x2013;<lpage>43</lpage>. </citation>
</ref>
<ref id="B7">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Elluri</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Joshi</surname>
<given-names>K. P.</given-names>
</name>
<name>
<surname>Kotal</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>Measuring Semantic Similarity Across EU GDPR Regulation and Cloud Privacy Policies</article-title>,&#x201d; in <conf-name>7th International Workshop on Privacy and Security of Big Data (PSBD 2020), in conjunction with 2020 IEEE International Conference on Big Data</conf-name> (<publisher-name>IEEE BigData 2020</publisher-name>). </citation>
</ref>
<ref id="B8">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Elluri</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Nagar</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Joshi</surname>
<given-names>K. P.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>An Integrated Knowledge Graph to Automate GDPR and PCI DSS Compliance</article-title>,&#x201d; in <conf-name>2018 IEEE International Conference on Big Data (Big Data)</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>1266</fpage>&#x2013;<lpage>1271</lpage>. </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<collab>Centers for Disease Control and Prevention</collab> (<year>2003</year>). <article-title>HIPAA Privacy Rule and Public Health. Guidance From CDC and the U.S. Department of Health and Human Services</article-title>. <source>Morbidity Mortality Weekly Rep.</source> <volume>52</volume>, <fpage>1</fpage>&#x2013;<lpage>17</lpage>. </citation>
</ref>
<ref id="B10">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Kayaalp</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2008</year>). &#x201c;<article-title>Biological Entity Recognition With Conditional Random Fields</article-title>,&#x201d; in <conf-name>AMIA Annual Symposium Proceedings</conf-name> (<publisher-name>American Medical Informatics Association</publisher-name>). <volume>2008</volume>, <fpage>293</fpage>. </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<collab>HHS</collab> (<year>2020</year>). <article-title>HIPAA COVID</article-title>. </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Joshi</surname>
<given-names>K. P.</given-names>
</name>
<name>
<surname>Elluri</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Nagar</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>An Integrated Knowledge Graph to Automate Cloud Data Compliance</article-title>. <source>IEEE Access.</source> <volume>8</volume>, <fpage>148541</fpage>&#x2013;<lpage>148555</lpage>. <pub-id pub-id-type="doi">10.1109/access.2020.3008964</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Joshi</surname>
<given-names>K. P.</given-names>
</name>
<name>
<surname>Yesha</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Finin</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>An Ontology for a HIPAA Compliant Cloud Service</article-title>,&#x201d; in <conf-name>4th International IBM Cloud Academy Conference ICACON 2016</conf-name>. </citation>
</ref>
<ref id="B14">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>D.-y. L.</given-names>
</name>
<name>
<surname>Joshi</surname>
<given-names>K. P.</given-names>
</name>
</person-group> (<year>2021</year>). &#x201c;<article-title>A Semantically Rich Knowledge Graph to Automate HIPAA Regulations for Cloud Health IT Services</article-title>,&#x201d; in <conf-name>7th IEEE International Conference on Big Data Security on Cloud (BigDataSecurity 2021)</conf-name>. <pub-id pub-id-type="doi">10.1109/bigdatasecurityhpscids52275.2021.00013</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kotal</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Joshi</surname>
<given-names>K. P.</given-names>
</name>
<name>
<surname>Joshi</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>Vicloud: Measuring Vagueness in Cloud Service Privacy Policies and Terms of Services</article-title>,&#x201d; in <conf-name>2020 IEEE 13th International Conference on Cloud Computing (CLOUD)</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>71</fpage>&#x2013;<lpage>79</lpage>. </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lee</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Kovarik</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Tejasvi</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Pizarro</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Lipoff</surname>
<given-names>J.&#x20;B.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Telehealth: Helping Your Patients and Practice Survive and Thrive During the COVID-19 Crisis With Rapid Quality Implementation</article-title>. <source>J.&#x20;Am. Acad. Dermatol.</source> <volume>82</volume>, <fpage>1213</fpage>&#x2013;<lpage>1214</lpage>. <pub-id pub-id-type="doi">10.1016/j.jaad.2020.03.052</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Malone</surname>
<given-names>R. W.</given-names>
</name>
<name>
<surname>Tisdall</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Fremont-Smith</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>X.-P.</given-names>
</name>
<name>
<surname>White</surname>
<given-names>K. M.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>COVID-19: Famotidine, Histamine, Mast Cells, and Mechanisms</article-title>. <source>Front. Pharmacol.</source> <volume>12</volume>, <fpage>633680</fpage>. <pub-id pub-id-type="doi">10.3389/fphar.2021.633680</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Manning</surname>
<given-names>C. D.</given-names>
</name>
<name>
<surname>Surdeanu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bauer</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Finkel</surname>
<given-names>J.&#x20;R.</given-names>
</name>
<name>
<surname>Bethard</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>McClosky</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2014</year>). &#x201c;<article-title>The Stanford CoreNLP Natural Language Processing Toolkit</article-title>,&#x201d; in <conf-name>Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations</conf-name>, <fpage>55</fpage>&#x2013;<lpage>60</lpage>. </citation>
</ref>
<ref id="B19">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Mikolov</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Sutskever</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Corrado</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Dean</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2013</year>). &#x201c;<article-title>Distributed Representations of Words and Phrases and Their Compositionality</article-title>,&#x201d; in <conf-name>26th International Conference on Neural Information Processing Systems</conf-name>. <publisher-name>ACM</publisher-name>. <volume>2</volume>, <fpage>3111</fpage>&#x2013;<lpage>3119</lpage>. </citation>
</ref>
<ref id="B20">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Miller</surname>
<given-names>G. A.</given-names>
</name>
</person-group> (<year>1998</year>). <source>WordNet: An Electronic Lexical Database</source>. <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B21">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Mittal</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Das</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Mulwad</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Joshi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Finin</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>CyberTwitter: Using Twitter to Generate Alerts for Cybersecurity Threats and Vulnerabilities</article-title>,&#x201d; in <conf-name>IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining</conf-name> (<publisher-name>IEEE Press</publisher-name>). <pub-id pub-id-type="doi">10.1109/asonam.2016.7752338</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mohan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>MedMentions: A Large Biomedical Corpus Annotated With UMLS Concepts</article-title>. <comment>arXiv preprint arXiv:1902.09476</comment>. <pub-id pub-id-type="doi">10.1287/0481a5c2-9e88-4d8e-84f1-f92285ee0c95</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nadeau</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Sekine</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>A Survey of Named Entity Recognition and Classification</article-title>. <source>Lingvisticae Investigationes</source> <volume>30</volume>, <fpage>3</fpage>&#x2013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1075/li.30.1.03nad</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ocr</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2020a</year>). <article-title>COVID-19 and HIPAA: Disclosures to Law Enforcement, Paramedics, Other First Responders and Public Health Authorities</article-title> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ocr</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2020b</year>). <article-title>Guidance on Covered Health Care Providers and Restrictions on Media Access to Protected Health Information About Individuals in Their Facilities</article-title> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ocr</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2020c</year>). <article-title>Updated Guidance on HIPAA and Contacting Former COVID-19 Patients About Plasma Donation</article-title> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Paulheim</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods</article-title>. <source>Semantic Web.</source> <volume>8</volume>, <fpage>489</fpage>&#x2013;<lpage>508</lpage>. </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Piplai</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Mittal</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Abdelsalam</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gupta</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Joshi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Finin</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2020a</year>). <article-title>Knowledge Enrichment by Fusing Representations for Malware Threat Intelligence and Behavior</article-title>. <source>UMBC Fac. Collection.</source> <pub-id pub-id-type="doi">10.1109/isi49825.2020.9280512</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Piplai</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Mittal</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Joshi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Finin</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Holt</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zak</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2020b</year>). <article-title>Creating Cybersecurity Knowledge Graphs From Malware After Action Reports</article-title>. <source>IEEE Access.</source> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Piplai</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ranade</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Kotal</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Mittal</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Narayanan</surname>
<given-names>S. N.</given-names>
</name>
<name>
<surname>Joshi</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2020c</year>). <article-title>Using Knowledge Graphs and Reinforcement Learning for Malware Analysis</article-title>. <pub-id pub-id-type="doi">10.1109/bigdata50022.2020.9378491</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<collab>Protege</collab> (<year>2020</year>). <article-title>Protege Tool</article-title>. </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sirin</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Parsia</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>SPARQL-DL: SPARQL Query for OWL-DL</article-title>. <source>OWLED.</source> <volume>258</volume>. </citation>
</ref>
</ref-list>
</back>
</article>