Marilyn Holt, PhD, on the My Cancer Genome Clinical Trial Database
– The scalable precision oncology knowledge base helped clinicians understand current research trends, study showed
This Reading Room is a collaboration between MedPage Today® and:
Automating the My Cancer Genome (MCG) clinical trial knowledgebase to update disease, biomarker, drug, and clinical trial data was found to help clinicians better understand current research trends as well as existing gaps in cancer care.
The original MCG knowledgebase and website were launched in 2011 to guide clinicians in applying genomic testing results to the treatment of cancer patients, using a wiki-style approach (i.e., with information collaboratively edited and managed via a web browser). This process proved time-consuming and slow, however, so researchers developed reusable templates to make it easier to, as the team explained, "transform assertions into reconfigurable text and generate data pages."
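The article does not describe MCG's actual template format. As a rough illustration of what turning an assertion into "reconfigurable text" can mean, here is a minimal Python sketch, assuming a hypothetical `Assertion` record with four fields and one sentence template built with the standard library's `string.Template`. The field names, template wording, and "EGFR variant X" are placeholders, not MCG's schema or content.

```python
from dataclasses import asdict, dataclass
from string import Template


# Hypothetical assertion record; field names are illustrative assumptions,
# not My Cancer Genome's actual schema.
@dataclass(frozen=True)
class Assertion:
    biomarker: str
    effect: str       # e.g. "resistance" or "sensitivity"
    drug_class: str
    disease: str


# One reusable template can render every assertion that has these fields.
THERAPEUTIC_TEMPLATE = Template(
    "$biomarker is associated with $effect to $drug_class in $disease."
)


def render(assertion: Assertion) -> str:
    """Fill the template with the assertion's fields to produce page text."""
    return THERAPEUTIC_TEMPLATE.substitute(asdict(assertion))


# Placeholder variant name, for illustration only; not a real clinical assertion.
example = Assertion(
    "EGFR variant X", "resistance", "EGFR inhibitors", "non-small cell lung cancer"
)
print(render(example))
```

Because the wording lives in the template rather than in each page, adding a new assertion produces new text without anyone writing a page by hand.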
A recent study in JCO Clinical Cancer Informatics explained the evolution of the process and what the researchers learned.
In the following interview, Marilyn Holt, PhD, a staff scientist at Vanderbilt-Ingram Cancer Center in Nashville, answered questions about the team's findings.
What are the highlights and the applicability to precision and molecular oncology?
Holt: We designed an assertion-based data model that allows the MCG knowledgebase and website to expand with the field of precision oncology, which is not possible with manually curated resources.
You can now search any mutation on the website and click through to see all the clinical trials associated with that mutation and which biomarkers were included or excluded in those trials. It's a powerful resource that oncologists may not know is available.
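The mutation-to-trials lookup Holt describes can be sketched as a simple filter. This is a minimal Python illustration under stated assumptions: the `Trial` record, its inclusion/exclusion fields, and the `trials_for_mutation` helper are hypothetical, and the trial IDs and biomarker names are made up; it is not how the MCG website is implemented.

```python
from dataclasses import dataclass, field


# Hypothetical trial record; real trial data would come from a registry.
@dataclass
class Trial:
    trial_id: str
    inclusion_biomarkers: set[str] = field(default_factory=set)
    exclusion_biomarkers: set[str] = field(default_factory=set)


def trials_for_mutation(mutation: str, trials: list[Trial]) -> dict[str, list[str]]:
    """Group trial IDs by whether `mutation` is an inclusion or exclusion criterion."""
    result: dict[str, list[str]] = {"included": [], "excluded": []}
    for trial in trials:
        if mutation in trial.inclusion_biomarkers:
            result["included"].append(trial.trial_id)
        if mutation in trial.exclusion_biomarkers:
            result["excluded"].append(trial.trial_id)
    return result


# Made-up trials and biomarker names, for illustration only.
trials = [
    Trial("TRIAL-A", inclusion_biomarkers={"GENE1 V100E"}),
    Trial("TRIAL-B", exclusion_biomarkers={"GENE1 V100E"}),
    Trial("TRIAL-C", inclusion_biomarkers={"GENE2 fusion"}),
]
print(trials_for_mutation("GENE1 V100E", trials))
# {'included': ['TRIAL-A'], 'excluded': ['TRIAL-B']}
```

Keeping inclusion and exclusion criteria as separate fields is what lets a search report not just which trials mention a mutation but in which direction.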
I personally use it when I look at variants: What's the clinical trial space like? Are there many mutations or genes involved? This makes it easy to find the answers.
Why was it important to update the wiki-style approach, which relied on manual curation and synthesis of evidence?
Holt: We can maintain the knowledgebase and website with fewer person-hours than with a wiki-style approach. When an individual writes up a page for the knowledgebase, it takes a few hours, and the page then undergoes expert review. Using the original model, we generated and maintained content for only 1,338 genes and alterations and 25 diseases over a period of 8 years. In contrast, we have now generated 30,000 pages of content in only 18 months.
As we add more assertions, the database will automatically turn this into content on the website. It just goes much faster.
When is the database most useful?
Holt: The database is useful to get a sense of the landscape of clinical trials or the prevalence of a gene. We have pulled in publicly available data from different resources, including the AACR's Project GENIE, which contains genomic sequencing data for more than 100,000 patients from a consortium of institutions.
We have compiled and aggregated information, so it's one-stop shopping. If a patient with lung cancer has a certain mutation, an oncologist can find out how common the mutation is, and if there are clinical trial options for this patient. Our site provides an easy way to get information quickly.
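"How common the mutation is" comes down to a ratio over aggregated patient data. The minimal Python sketch below assumes a hypothetical de-identified `Patient` record and a `prevalence` helper; the toy cohort is invented and the resulting percentage is not a real prevalence figure.

```python
from dataclasses import dataclass


# Hypothetical de-identified patient record from an aggregated cohort.
@dataclass(frozen=True)
class Patient:
    disease: str
    mutations: frozenset[str]


def prevalence(mutation: str, disease: str, patients: list[Patient]) -> float:
    """Fraction of patients with `disease` whose profile includes `mutation`."""
    cohort = [p for p in patients if p.disease == disease]
    if not cohort:
        return 0.0  # no patients with this disease in the data
    carriers = sum(1 for p in cohort if mutation in p.mutations)
    return carriers / len(cohort)


# Toy cohort for illustration; these are not real prevalence figures.
patients = [
    Patient("lung cancer", frozenset({"GENE1 V100E"})),
    Patient("lung cancer", frozenset()),
    Patient("lung cancer", frozenset({"GENE2 fusion"})),
    Patient("lung cancer", frozenset()),
    Patient("breast cancer", frozenset({"GENE1 V100E"})),
]
print(f"{prevalence('GENE1 V100E', 'lung cancer', patients):.0%}")  # 25%
```

Restricting the denominator to patients with the same disease matters: the same mutation can be common in one cancer type and rare in another.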
How can available data sets be used to generate structured content?
Holt: All assertions come from publicly available sources. Some assertions are directly imported -- for example, from Project GENIE. We have curated and organized content so it's easier to find the information you need. We also distinguish between manually curated and computational assertions, and describe how each is generated and which category it falls into.
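One simple way to keep the manual-versus-computational distinction visible is to store it with every assertion. This Python sketch is an assumption-laden illustration: the `Curation` enum, the `SourcedAssertion` record, the `describe` helper, and the placeholder statements and data set names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Curation(Enum):
    """How an assertion was produced (category names are assumptions)."""
    MANUAL = "manually curated"
    COMPUTATIONAL = "computational"


@dataclass(frozen=True)
class SourcedAssertion:
    statement: str
    source: str          # the public data set the assertion came from
    curation: Curation


def describe(a: SourcedAssertion) -> str:
    """Render an assertion together with its provenance label."""
    return f"{a.statement} [{a.curation.value}; source: {a.source}]"


# Placeholder statements and sources, for illustration only.
assertions = [
    SourcedAssertion("Placeholder statement 1", "public data set A", Curation.COMPUTATIONAL),
    SourcedAssertion("Placeholder statement 2", "public data set B", Curation.MANUAL),
]
for a in assertions:
    print(describe(a))

# Tally how many assertions fall into each category.
by_category = Counter(a.curation for a in assertions)
print(by_category[Curation.MANUAL], by_category[Curation.COMPUTATIONAL])
```

Using an enum rather than free text keeps the categories fixed, so every assertion falls into exactly one of them.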
How can the MCG assertions be integrated with applications beyond the website?
Holt: Essentially, our assertions and knowledge management system are like a warehouse of structured data. For example, there are assertions that list variants conferring resistance to EGFR inhibitors in non-small cell lung cancer (NSCLC), which we display as therapeutic assertions on the MCG website pages for NSCLC, the variants, and the EGFR inhibitors.
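The "warehouse" idea, in which one stored assertion surfaces on the disease, variant, and drug pages, can be sketched as an index from page to assertion IDs. This minimal Python example uses a hypothetical `index_by_page` helper and placeholder records ("EGFR variant X/Y"); it illustrates the pattern, not MCG's storage layer.

```python
from collections import defaultdict


def index_by_page(assertions: list[dict[str, str]]) -> dict[tuple[str, str], list[str]]:
    """Map each (page type, page name) to the IDs of assertions shown on that page.

    A single assertion is indexed under its disease, variant, and drug pages,
    so one stored fact can appear in several places on the site.
    """
    pages: defaultdict[tuple[str, str], list[str]] = defaultdict(list)
    for a in assertions:
        for page_type in ("disease", "variant", "drug"):
            pages[(page_type, a[page_type])].append(a["id"])
    return dict(pages)


# Hypothetical records; field names and values are placeholders.
assertions = [
    {"id": "A1", "disease": "NSCLC", "variant": "EGFR variant X", "drug": "EGFR inhibitors"},
    {"id": "A2", "disease": "NSCLC", "variant": "EGFR variant Y", "drug": "EGFR inhibitors"},
]
pages = index_by_page(assertions)
print(pages[("drug", "EGFR inhibitors")])  # ['A1', 'A2']
print(pages[("variant", "EGFR variant X")])  # ['A1']
```

Since pages only reference assertion IDs, correcting an assertion once updates every page that displays it.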
We recently analyzed breast cancer and acute myeloid leukemia-related trials using structured, curated clinical trial data from the MCG clinical trial knowledgebase. We performed detailed analytics on 1,128 breast cancer and 483 acute myeloid leukemia trial sets to highlight the top biomarkers, drug classes, and drugs, giving a full view of the biomarkers, biomarker groups, and drugs currently being explored in these diseases.
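Ranking the "top biomarkers, drug classes, and drugs" across a trial set is, at its simplest, a frequency count. The sketch below assumes each trial is a hypothetical dict of term lists and uses Python's `collections.Counter`; the `top_terms` helper and the toy trials with placeholder gene names are illustrative only, not the analysis the team ran.

```python
from collections import Counter


def top_terms(trials: list[dict[str, list[str]]], key: str, n: int = 3) -> list[tuple[str, int]]:
    """Count how many trials mention each term under `key`, most common first.

    Each term is counted at most once per trial, so the result reads as
    "number of trials exploring this biomarker (or drug class)".
    """
    counts: Counter[str] = Counter()
    for trial in trials:
        counts.update(set(trial.get(key, [])))
    return counts.most_common(n)


# Made-up trials with placeholder biomarker names, for illustration only.
trials = [
    {"biomarkers": ["GENE1", "GENE2"], "drug_classes": ["class A"]},
    {"biomarkers": ["GENE1"], "drug_classes": ["class A", "class B"]},
    {"biomarkers": ["GENE3", "GENE1"], "drug_classes": ["class C"]},
    {"biomarkers": ["GENE2"], "drug_classes": ["class B"]},
]
print(top_terms(trials, "biomarkers"))
# [('GENE1', 3), ('GENE2', 2), ('GENE3', 1)]
```

Deduplicating within each trial keeps a trial that lists a biomarker twice from inflating its rank.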
What are the challenges?
Holt: It's very challenging to put together data or web pages that require synthesis of information in an automated way. Our pathway pages provide schematic information about different aspects of each potential pathway. These pages have to be updated manually. We are currently searching for computational shortcuts to reduce the time we spend on these pages, and thinking about which pages work with straight facts and which ones should synthesize information.
What are the future plans?
Holt: We are planning more assertion types that are not included in the initial roll-out, and there are sets of diagnostic and prognostic assertions that we would love to include on the website. We would also like to include levels of evidence to allow readers to look at a therapeutic assertion and know how good the assertion is.
What's the bottom-line message for practicing oncologists?
Holt: Using an assertion-based structure for a knowledgebase allowed us to rapidly scale research that is 100% computationally curated. Also, I highly recommend our clinical trial search function.
Holt reported a relationship with GenomOncology.
Primary Source
JCO Clinical Cancer Informatics
Source Reference: