NLP-Driven Clinical Data Extraction

YASHICOM delivered a natural language processing system that extracts structured data from unstructured clinical notes, supporting faster research and better patient outcomes for a UK health organisation.

Clinical notes, discharge summaries, and referral letters contain a wealth of information that is critical for research, quality improvement, and care coordination—but much of it is locked in free text. A UK health organisation needed to turn this unstructured content into structured data without adding burden to clinicians. They partnered with YASHICOM to design and implement an NLP-driven extraction pipeline that could run securely over their existing data estate.

The challenge

The client held large volumes of de-identified clinical text from multiple care settings. Manual coding was too slow and expensive for research and reporting; keyword searches were incomplete and missed nuance. They wanted a solution that could identify diagnoses, procedures, medications, and key outcomes from narrative text, with output suitable for both downstream analytics and integration with clinical systems. Any solution had to comply with NHS data governance and information security standards.

YASHICOM’s approach

YASHICOM began with a scoping phase, working with clinical and informatics leads to define the target entities and the required output schema. We then built a pipeline combining pre-trained biomedical language models with custom fine-tuning on the client’s own de-identified data, so that the system could handle local terminology and documentation styles. An entity-linking step mapped extracted concepts to standard terminologies (e.g. SNOMED CT where appropriate), enabling consistent aggregation and reporting.
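
To illustrate the entity-linking step, here is a minimal sketch in Python. It is not the delivered pipeline: the lookup table, the `LinkedEntity` record, and the `CONCEPT-…` identifiers are hypothetical placeholders (they are not real SNOMED CT codes), and a production system would more plausibly query a terminology service than a hard-coded dictionary.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup from local surface forms to concept identifiers.
# The identifiers are placeholders, NOT real SNOMED CT codes.
TERMINOLOGY = {
    "myocardial infarction": "CONCEPT-001",
    "heart attack": "CONCEPT-001",
    "type 2 diabetes": "CONCEPT-002",
    "t2dm": "CONCEPT-002",
}


@dataclass(frozen=True)
class LinkedEntity:
    text: str                  # span exactly as written in the note
    category: str              # e.g. "diagnosis", "procedure", "medication"
    concept_id: Optional[str]  # None when no concept matched


def normalise(span: str) -> str:
    """Lowercase and collapse whitespace so surface variants compare equal."""
    return " ".join(span.lower().split())


def link_entity(span: str, category: str) -> LinkedEntity:
    """Attach a concept identifier to an extracted span, if one is known."""
    return LinkedEntity(span, category, TERMINOLOGY.get(normalise(span)))
```

In this sketch, "Heart  Attack" and "myocardial infarction" resolve to the same identifier, which is what allows records written in different local styles to be aggregated under one concept. Unmatched spans keep `concept_id=None` rather than being dropped, so they remain visible for review.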

Human review was built in for low-confidence extractions and for a sample of high-confidence ones, so that quality could be monitored and the models improved over time. The pipeline was deployed within the client’s secure environment, with no raw clinical text leaving the organisation.
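
The review routing described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the client’s implementation: the `Extraction` record, the 0.8 confidence threshold, and the 5% audit rate are all hypothetical values chosen for the example.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Extraction:
    note_id: str
    label: str
    confidence: float  # model score in [0, 1]


def split_for_review(
    items: List[Extraction],
    threshold: float = 0.8,     # hypothetical cut-off for "low confidence"
    audit_rate: float = 0.05,   # hypothetical share of confident items sampled
    seed: Optional[int] = None,
) -> Tuple[List[Extraction], List[Extraction]]:
    """Return (to_review, auto_accepted).

    Every extraction below the threshold goes to a human reviewer. Of the
    rest, a random sample is also reviewed so that quality can be monitored.
    """
    rng = random.Random(seed)
    to_review: List[Extraction] = []
    auto_accepted: List[Extraction] = []
    for item in items:
        low_confidence = item.confidence < threshold
        sampled = (not low_confidence) and rng.random() < audit_rate
        if low_confidence or sampled:
            to_review.append(item)
        else:
            auto_accepted.append(item)
    return to_review, auto_accepted
```

Sampling confident extractions as well as uncertain ones matters because a model can be confidently wrong; reviewing only low-confidence items would never surface that kind of error.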

Results

After piloting on selected specialties and scaling across the organisation:

  • Structured data from unstructured notes at scale — research and audit teams could query by diagnosis, procedure, and outcome without manual chart review for initial screening.
  • Faster research and quality improvement — time from “idea” to “dataset” fell sharply, enabling more rapid feedback into clinical practice.
  • Better patient outcomes — improved visibility of conditions and treatments supported care coordination and reduced duplication of tests and referrals.

Why YASHICOM

YASHICOM has deep experience in NLP and in working with sensitive, regulated data in the public and health sectors. We delivered a solution that was not only technically strong but also aligned with governance, security, and clinical acceptability. If your organisation is looking to unlock value from clinical or operational free text with NLP, we’d be glad to discuss how we can help.
