The Phenomic Analytics and Clinical Data Core (PACDC) provides collaborative phenotype development and research data support to Geisinger investigators and clinicians. Its services include data extraction from multiple data sources, data organization and modeling, survey data capture, data visualization, terminology and ontology management, and electronic phenotype development.
Data Modeling
The PACDC maintains multiple data models, both for local research and for participation in research networks. It develops and maintains these models using data from CDIS and other disparate data sources within GHS. Currently, the core manages the Health Care Systems Research Network (HCSRN) Virtual Data Warehouse (VDW), the National Patient-Centered Clinical Research Network (PCORnet) Common Data Model (CDM), Informatics for Integrating Biology and the Bedside (i2b2), and a homegrown data model called the Phenomics Initiative Database (PIDB).
Exploratory Feasibility Analysis
Investigators with specific research questions in mind can interrogate the data to determine whether a study is feasible. A preparatory-to-research form is required; it is submitted to the PACDC, which then extracts counts or specific data on a patient population to determine whether an adequate sample size exists and whether the data needed to complete the study are available.
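A feasibility count of this kind can be sketched as follows. This is only an illustration in Python: the record layout, the diagnosis-code prefix, the age threshold, and the required sample size are all made-up assumptions, not the PACDC's actual data model or criteria.

```python
from datetime import date

# Hypothetical, made-up records; a real request would query a research data model.
patients = [
    {"patient_id": 1, "birth_date": date(1950, 3, 1), "diagnosis_codes": {"E11.9", "I10"}},
    {"patient_id": 2, "birth_date": date(1985, 7, 15), "diagnosis_codes": {"E11.65"}},
    {"patient_id": 3, "birth_date": date(1990, 1, 20), "diagnosis_codes": {"J45.909"}},
    {"patient_id": 4, "birth_date": date(2010, 5, 5), "diagnosis_codes": {"E11.9"}},
]


def age_on(birth_date, as_of):
    """Return age in whole years on the as_of date."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


def feasibility_count(records, code_prefix, min_age, as_of):
    """Count patients aged >= min_age with any diagnosis code starting with code_prefix."""
    return sum(
        1
        for r in records
        if age_on(r["birth_date"], as_of) >= min_age
        and any(code.startswith(code_prefix) for code in r["diagnosis_codes"])
    )


AS_OF = date(2020, 1, 1)      # hypothetical reference date
REQUIRED_SAMPLE_SIZE = 2      # hypothetical target from a power calculation

count = feasibility_count(patients, "E11", 18, AS_OF)
feasible = count >= REQUIRED_SAMPLE_SIZE
print(f"eligible patients: {count}, feasible: {feasible}")
```

Only aggregate counts are produced here, which matches the purpose of a feasibility check: deciding whether a study is worth pursuing before any detailed data are requested.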
Electronic Phenotype Development
Defining traits of patients (disease, treatment, response, etc.) is an iterative and collaborative process among programmers, investigators, and/or clinicians. Multiple methods exist to develop electronic phenotypes, ranging from manually curated, algorithmically defined phenotypes to various regression and machine-learning approaches. To increase throughput and efficiency, the PACDC uses methods that produce reproducible electronic phenotypes, which are then validated through chart review.
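As a rough illustration of an algorithmically defined phenotype and its validation against chart review, consider the Python sketch below. The rule (two qualifying diagnosis codes, or one code plus a qualifying medication), the field names, and the sample data are hypothetical and chosen only to show the workflow; they are not a PACDC phenotype definition.

```python
def has_phenotype(patient):
    """Hypothetical rule: >= 2 qualifying diagnoses, or 1 diagnosis plus a qualifying medication."""
    dx_count = sum(1 for code in patient["dx"] if code.startswith("E11"))
    on_med = "metformin" in patient["meds"]
    return dx_count >= 2 or (dx_count >= 1 and on_med)


def validate(patients, chart_review):
    """Compare the rule's output with chart-review labels; return (PPV, sensitivity)."""
    tp = fp = fn = 0
    for p in patients:
        predicted = has_phenotype(p)
        actual = chart_review[p["id"]]
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual and not predicted:
            fn += 1
    ppv = tp / (tp + fp) if (tp + fp) else None
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    return ppv, sensitivity


# Made-up patients and made-up chart-review results (True = reviewer confirmed the trait).
patients = [
    {"id": 1, "dx": ["E11.9", "E11.65"], "meds": []},
    {"id": 2, "dx": ["E11.9"], "meds": ["metformin"]},
    {"id": 3, "dx": ["E11.9"], "meds": []},
    {"id": 4, "dx": ["E11.9", "E11.9"], "meds": []},
    {"id": 5, "dx": ["I10"], "meds": ["metformin"]},
]
chart_review = {1: True, 2: True, 3: True, 4: False, 5: False}

ppv, sensitivity = validate(patients, chart_review)
print(f"PPV: {ppv:.2f}, sensitivity: {sensitivity:.2f}")
```

Because the rule is an ordinary function, it can be rerun unchanged on new data, which is what makes the phenotype reproducible. If chart review shows poor agreement, the rule is revised and validated again, reflecting the iterative process described above.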
Data Manipulation and Organization
The PACDC supports research studies that require data to be structured and organized for analysis. It can assist in organizing and manipulating data from multiple data sources and documents to create analysis-ready datasets. Core staff often work closely with biostatisticians and informaticians to facilitate the completion of analyses.
Natural Language Processing
Specific and general requests for Natural Language Processing (NLP) of clinical notes or exam results can be fulfilled using our NLP pipeline. The pipeline currently uses Apache cTAKES™ and UMLS 2014AB concepts (SNOMED, RxNorm, and NCI). Additional terminologies, whether contained in UMLS or locally developed, can be implemented upon request.
Terminology Management Solutions
Local codes are mapped to standard terminologies for many concepts. Change management is used to track changes to code systems over time. The core currently focuses on LOINC, ICD-9/10, SNOMED CT, and RxNorm. Other, smaller terminologies are used to align with Meaningful Use guidelines and with the data model requirements of research networks.
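One common way to combine code mapping with change tracking is to give each mapping row an effective date range. The Python sketch below assumes that approach purely for illustration; the local codes (`DX_DM2`, `LAB_GLU`), the table layout, and the `lookup` helper are hypothetical and do not describe the PACDC's actual implementation.

```python
from datetime import date

# Hypothetical mapping table: each row maps a local code to a standard code
# for a range of dates. valid_to=None means the mapping is still current.
mappings = [
    {"local": "DX_DM2", "system": "ICD-9-CM", "code": "250.00",
     "valid_from": date(2000, 1, 1), "valid_to": date(2015, 9, 30)},
    {"local": "DX_DM2", "system": "ICD-10-CM", "code": "E11.9",
     "valid_from": date(2015, 10, 1), "valid_to": None},
    {"local": "LAB_GLU", "system": "LOINC", "code": "2345-7",
     "valid_from": date(2000, 1, 1), "valid_to": None},
]


def lookup(local_code, on_date):
    """Return (system, code) in effect for local_code on on_date, or None if unmapped."""
    for m in mappings:
        if m["local"] != local_code:
            continue
        started = m["valid_from"] <= on_date
        not_ended = m["valid_to"] is None or on_date <= m["valid_to"]
        if started and not_ended:
            return m["system"], m["code"]
    return None


print(lookup("DX_DM2", date(2014, 6, 1)))
print(lookup("DX_DM2", date(2016, 1, 1)))
```

Keeping superseded rows rather than overwriting them means historical data can still be translated with the mapping that was in effect when the data were recorded.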
Electronic Survey Instrument Deployment
The PACDC also provides services to deploy survey instruments and survey tracking in conjunction with the Survey Research and Recruitment Core. Forms are currently developed in MS Access, and the data are stored in MS SQL Server. REDCap is a new tool that is being deployed and will be available for development shortly.