Research

The Institute for Biomedical Informatics conducts research projects in collaboration with other UK groups and external collaborators.

Biomedical Ontology Quality Assurance

We develop methods and tools for exhaustive analysis and curation of large biomedical ontologies such as SNOMED CT and FMA. These include lattice-based structural auditing, cycle detection, relation reversal detection, self-similarity, cross-terminological system alignment, and search and visualization interfaces for ontological systems.
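As an illustration of one of these checks, the following is a minimal sketch of cycle detection over an is-a hierarchy, using a standard depth-first search. It is not the institute's actual tooling: the `find_cycle` function, the child-to-parents dictionary layout, and the concept names are assumptions made for the example.

```python
from typing import Dict, List, Optional

def find_cycle(is_a: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in a child -> parents graph, or None if it is acyclic.

    Hypothetical helper: `is_a` maps each concept to its direct parents.
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []           # concepts on the current DFS path

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for parent in is_a.get(node, []):
            state = color.get(parent, WHITE)
            if state == GRAY:
                # The parent is already on the current path: cycle found.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for concept in list(is_a):
        if color.get(concept, WHITE) == WHITE:
            cycle = visit(concept)
            if cycle:
                return cycle
    return None

# Toy hierarchy (invented names); the last edge wrongly closes a loop.
toy = {"Finding": ["Disorder"], "Disorder": ["Condition"], "Condition": ["Finding"]}
print(find_cycle(toy))
```

The recursive form is kept for readability; for an ontology with hundreds of thousands of concepts, an iterative traversal would avoid Python's recursion limit.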

Data Analytics and Visualizations

Data Analytics and Visualizations is the study of visual representations of abstract data, using methods designed to help users understand large amounts of data at once. DELVE (Document ExpLoration and Visualization Engine) is a prototype for performing literature-based searches with the aid of interactive visualizations, and a framework for quickly implementing such visualizations as modular web applications. The goal of DELVE is to better satisfy the information needs of researchers and to help them explore and understand the state of research in the scientific literature.

Data Mining

Biomedical Data Mining is an area of informatics that involves identifying patterns and discovering new knowledge in biomedical datasets (numerical/categorical sets, sequences, or graphs) through computational methods based on machine learning or statistical predictive modeling. The fundamental tasks in data mining are clustering, mining of statistically significant item sets (or sequences or graphs) and rules, and classification. Sample problems being explored on this topic include computational code extraction, predicting health outcomes such as readmissions or first diagnoses of particular diseases, identifying suitable structures of health information diffusion in online social networks, and linking pairs or groups of biomedical entities in knowledge graphs extracted from text using graph/path mining, which can lead to potential new knowledge. Related research involves human-computer information retrieval and collaborative information visualization.
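To make the item set mining task concrete, here is a small sketch that counts how often items co-occur across records and keeps the sets that meet a minimum support threshold. It uses brute-force enumeration rather than an optimized algorithm such as Apriori, and it checks frequency only, not statistical significance. The function name `frequent_itemsets`, its parameters, and the toy diagnosis records are all assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

def frequent_itemsets(
    transactions: List[Iterable[str]],
    min_support: float,
    max_size: int = 2,
) -> Dict[Tuple[str, ...], float]:
    """Return item sets (as sorted tuples) whose support is >= min_support.

    Support is the fraction of transactions containing every item in the set.
    """
    counts: Counter = Counter()
    for record in transactions:
        items = sorted(set(record))          # drop duplicates, fix an order
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    total = len(transactions)
    return {
        itemset: count / total
        for itemset, count in counts.items()
        if count / total >= min_support
    }

# Toy records (invented): each list is the set of diagnoses for one patient.
records = [
    ["diabetes", "hypertension"],
    ["diabetes", "hypertension", "ckd"],
    ["hypertension"],
    ["diabetes", "ckd"],
]
for itemset, support in sorted(frequent_itemsets(records, 0.5).items()):
    print(itemset, support)
```

Enumerating every combination grows quickly with record size, which is why `max_size` is capped here; real workloads would typically prune candidates level by level.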

Electronic Data Capture

We develop complementary data extraction and presentation tools, such as OPIC and OnWARD. These tools use an ontology-driven, secure, rapidly deployed, web-based framework to support data capture for large-scale multi-center clinical research. We develop these tools using agile methodology to provide a flexible, user-centered dynamic form generator, which can be quickly deployed and customized for any clinical study without the need for deep technical expertise. Because of the flexible framework, these data management systems can be extended to accommodate a large variety of data types, including genetic, genomic, and proteomic data.

Health Literacy

Health literacy (HL) is the capability to read, understand, and use health information to make informed healthcare decisions. With the astounding volume of health information and advanced health IT, individuals now face a major challenge in collecting, storing, and organizing personal health information for their own care. Unfortunately, individuals and healthcare systems are not yet ready to support the development and use of personal health devices. From a biomedical informatics perspective, scientifically sound and empirically designed health literacy intervention and outcomes studies are critical to understanding how people manage their own health information. Current research focuses on developing an intervention study to improve college students' HL through informatics instruction.

Knowledge Discovery

Knowledge discovery describes the process of automatically searching large volumes of data for patterns that can be considered knowledge about the data. Knowledge discovery developed out of the data mining domain and is closely related to it in both methodology and terminology. Our efforts in knowledge discovery include Translational Bioinformatics and Public Health Informatics research.

Knowledge Organization

Knowledge Organization (KO) is a field of information science that studies information representation and retrieval by analyzing subject content and classifying information-bearing entities for optimal retrieval. For instance, describing health websites with pre-defined metadata elements makes content that would otherwise not be discoverable searchable and easy to navigate. Applying a metadata-driven approach to organize massive volumes of data remains challenging. In my research, biomedical informatics methodologies such as clinical data mining, content-based medical image retrieval, and clinical natural language processing have been applied to better predict mortality in critical care patients and to detect morphological characteristics of lobular carcinoma in situ of the breast.

Natural Language Processing

Natural Language Processing (NLP) is the process of converting textual data into, ideally, 'actionable' information. Often it also includes converting unstructured text into structured data that is more straightforward to process using computers. In biomedical domains, free text arises in multiple forms, including scientific publications, clinical narratives (discharge summaries, pathology notes, progress notes), interview narratives (drug abuse, relationship counseling), and social media posts (Twitter, Facebook, and blogs). NLP involves extracting information from free text that can help researchers in biological, medical, and clinical domains answer new questions and expedite the discovery process, and that can help hospitals provide better healthcare to patients and their families. Although the actual information extracted depends on the particular problem at hand, the general tasks in NLP include part-of-speech tagging, parsing, named entity recognition, triple/relationship extraction, word sense disambiguation, and sentiment analysis. In biomedical domains, these problems are complicated by significant variation in medical terms and by domain-specific writing styles. Once the mining task is complete, the extracted information can be used to solve other problems such as biomedical information retrieval, augmenting and auditing standard vocabularies, clinical cohort selection, quality control, and decision support.
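As a simple illustration of turning free text into structured data, the sketch below performs dictionary-based named entity recognition: it looks up terms from a small lexicon in a clinical-style sentence and returns typed spans. This is only one basic approach, not a description of any specific system mentioned here. The `tag_entities` function, the lexicon, and the sample sentence are invented for the example.

```python
import re
from typing import Dict, List, Tuple

def tag_entities(text: str, lexicon: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, surface_text, label) for each lexicon term found.

    Matching is case-insensitive and prefers longer terms, so
    "type 2 diabetes" wins over "diabetes" at the same position.
    """
    if not lexicon:
        return []
    # Longest terms first so the regex alternation tries them before shorter ones.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    labels = {term.lower(): label for term, label in lexicon.items()}
    return [
        (m.start(), m.end(), m.group(0), labels[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]

# Hypothetical lexicon and note text.
lexicon = {
    "chest pain": "SYMPTOM",
    "metformin": "DRUG",
    "type 2 diabetes": "DISEASE",
    "diabetes": "DISEASE",
}
note = "Patient denies chest pain; started on Metformin for type 2 diabetes."
for start, end, surface, label in tag_entities(note, lexicon):
    print(start, end, surface, label)
```

A lookup like this misses spelling variants and abbreviations, and it ignores context such as negation ("denies chest pain"), which is part of why term variation and domain-specific writing styles make biomedical NLP difficult.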

Ontology-guided Data Integration

Modern medicine generates and uses tremendous volumes of data in a wide variety of formats. Ontology-guided data integration strives to develop novel, flexible informatics methodologies, tools and infrastructure to facilitate the collection, management, and analysis of clinical, physiological, and genomic data. We use a user-centered development approach and incorporate visual, ontological, searchable and explorative features in three interrelated components: Query Builder, Query Manager and Query Explorer. This allows us to integrate electrophysiological data from multiple sites and create resources such as the National Sleep Research Resource.