Seminars in Arthritis and Rheumatism
Volume 40, Issue 5, Pages 413-420, April 2011

Validation of Psoriatic Arthritis Diagnoses in Electronic Medical Records Using Natural Language Processing

  • Thorvardur Jon Love, MD (corresponding author)
    Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
    Address reprint requests to Thorvardur Jon Love, MD, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115.
  • Tianxi Cai, ScD
    Harvard School of Public Health, Boston, Massachusetts
  • Elizabeth W. Karlson, MD
    Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts

Published online 11 August 2010.

Objectives

To test whether data extracted from full-text patient visit notes in an electronic medical record would improve the classification of psoriatic arthritis (PsA) compared with an algorithm based on codified data alone.

Methods

From the >1,350,000 adults in a large academic electronic medical record, all 2318 patients with a billing code for PsA were identified, and 550 were randomly selected for chart review and algorithm training. Thirty-one predictors were derived from codified data and from phrases extracted from narrative data using natural language processing, and 3 random forest algorithms were trained using coded, narrative, and combined predictors. The receiver operating characteristic (ROC) curve was used to identify the optimal algorithm, and a cut-point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and was finally validated in a random sample of 300 cases predicted to have PsA.
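
The cut-point rule described above (maximum sensitivity subject to a minimum PPV) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `choose_cut_point`, the toy scores, and the toy labels are all hypothetical, and the scores stand in for whatever predicted probability a trained classifier would output.

```python
from typing import List, Optional, Tuple


def choose_cut_point(
    scores: List[float],
    labels: List[int],
    min_ppv: float = 0.90,
) -> Optional[Tuple[float, float, float]]:
    """Return (cut_point, sensitivity, ppv) with the highest sensitivity
    among cut-points whose PPV is at least min_ppv, or None if no
    cut-point qualifies. A case is predicted positive when score >= cut."""
    total_pos = sum(labels)  # chart-confirmed cases (label 1)
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        if total_pos == 0 or tp + fp == 0:
            continue  # sensitivity or PPV is undefined
        ppv = tp / (tp + fp)
        sensitivity = tp / total_pos
        if ppv >= min_ppv and (best is None or sensitivity > best[1]):
            best = (cut, sensitivity, ppv)
    return best


# Hypothetical toy data: classifier scores and chart-review labels.
toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(choose_cut_point(toy_scores, toy_labels, min_ppv=0.90))
```

Lowering `min_ppv` generally admits lower cut-points and therefore higher sensitivity, which is the trade-off the 90% PPV target fixes.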

Results

The PPV of a single PsA code was 57% (95% CI 55%-58%). Using a combination of coded data and natural language processing (NLP), the random forest algorithm reached a PPV of 90% (95% CI 86%-93%) at a sensitivity of 87% (95% CI 83%-91%) in the training data. The PPV was 93% (95% CI 89%-96%) in the validation set. Adding NLP predictors to codified data increased the area under the receiver operating characteristic curve (P < 0.001).
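
For readers unfamiliar with how a PPV and its 95% confidence interval are obtained from chart-review counts, the following Python sketch may help. It is illustrative only: the abstract does not state which interval method the authors used, so the Wilson score interval here is an assumption, and the counts (90 confirmed cases out of 100 reviewed) are hypothetical rather than taken from the study.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.
    z = 1.96 corresponds to an approximate 95% confidence level."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: 90 of 100 predicted cases confirmed on chart review.
confirmed, reviewed = 90, 100
ppv = confirmed / reviewed
low, high = wilson_interval(confirmed, reviewed)
print(f"PPV = {ppv:.0%}, 95% CI {low:.1%} to {high:.1%}")
```

Other interval methods (for example, an exact binomial interval) give slightly different bounds, so this sketch should not be expected to reproduce the published intervals.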

Conclusions

Using NLP with text notes from electronic medical records significantly improved the performance of the prediction algorithm. Random forests were a useful tool for accurately classifying psoriatic arthritis cases to enable epidemiological research.

Keywords: psoriatic arthritis, epidemiology, random forests, algorithm, natural language processing, electronic medical record, database, validation, locating, identifying, NLP

 The authors have no conflicts of interest to disclose.

PII: S0049-0172(10)00075-2

doi:10.1016/j.semarthrit.2010.05.002
