Documentation
Welcome to the documentation for OpenCodelists, part of the OpenSAFELY project from the Bennett Institute for Applied Data Science at the University of Oxford.
OpenCodelists is an open platform for creating and sharing codelists of clinical terms and drugs.
Table of contents
- Status of the project
- What is a codelist?
- Viewing a codelist
- Creating an account
- Organisations
- Creating a codelist from scratch with the codelist builder
- Creating a codelist from a CSV file
- Selecting appropriate codes for your codelist
- Editing a codelist
- Viewing your codelists
- Converting Pseudo-BNF codelists to dm+d
- Using a codelist in OpenSAFELY research
- Reporting bugs, requesting features, and asking for help
Status of the project
OpenCodelists has been used to create 8632 codelists since April 2020, and it is in active development by staff at the Bennett Institute.
Codelists created with OpenCodelists are used in research projects all over the UK, both within and outside the OpenSAFELY research ecosystem.
Anybody can use OpenCodelists to create and share codelists.
To report bugs or suggest improvements, please raise an issue via the issue tracker on GitHub.
What is a codelist?
A codelist is a set of codes which can be recorded in clinical systems, representing data such as:
- Patient Demographics - e.g. Age, Ethnicity
- Medicines - e.g. Paracetamol, Morphine
- Condition Diagnoses - e.g. Crohn's Disease, Bipolar disorder
- Symptoms - e.g. Headache, Blood in urine
- Test Results - e.g. Potassium level, Abnormal ECG
- Procedures - e.g. Coronary artery bypass graft, Hysterectomy
- Activities - e.g. Medication review, Consultation via video
Codelists are used in almost all studies in OpenSAFELY - and other health data research - to select patients with activities and conditions of interest within the dataset(s) being used.
- Codelists each use a coding system such as SNOMED CT, dm+d (dictionary of medicines and devices), CTv3 (Clinical Terms v3), etc.
- Codelists can often be large, in order to capture the many possible codes that could represent a certain activity or condition.
- Creating or selecting a codelist can often be very nuanced, e.g. whether a codelist for "diabetes" should include or exclude gestational diabetes may vary according to the study.
- Sometimes a combination of codelists of different types may be required to fully capture patients with certain conditions, e.g. medications for asthma (dm+d), diagnoses of asthma in primary care (SNOMED CT), diagnoses of asthma in a hospital admission (ICD-10).
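As a rough illustration of how a codelist is used to select patients, the sketch below models a codelist as a named set of codes and picks out patients with at least one matching event. The `Codelist` class, the `patients_with_any_code` helper, the event layout and the codes are all hypothetical placeholders; this is not how OpenSAFELY itself queries data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codelist:
    """A minimal, hypothetical model of a codelist."""
    name: str
    coding_system: str
    codes: frozenset


def patients_with_any_code(events, codelist):
    """Return the IDs of patients with at least one event coded in the codelist.

    `events` is an iterable of (patient_id, code) pairs.
    """
    return {patient_id for patient_id, code in events if code in codelist.codes}


# Placeholder codes, not real SNOMED CT identifiers.
asthma_diagnoses = Codelist(
    name="asthma-diagnoses",
    coding_system="snomedct",
    codes=frozenset({"CODE-1", "CODE-2"}),
)
events = [(1, "CODE-1"), (2, "CODE-9"), (3, "CODE-2"), (3, "CODE-9")]

matched = patients_with_any_code(events, asthma_diagnoses)
print(sorted(matched))
```

Patient 2 is not selected because their only recorded code is not in the codelist, which is why a missing code can change study results.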
Viewing a codelist
The homepage of a codelist shows:
- information about what the codelist contains,
- how the codelist was created,
- links to any references
- details of who created the codelist
Codelist IDs and versions
The codelist homepage also displays:
- the codelist's ID
- current version information.
A codelist has a Codelist ID, which is the canonical ID for the codelist and determines the URL of its homepage. For example, a codelist with the ID user/bob/asplenia, if it existed, would be found at https://opencodelists.org/codelist/user/bob/asplenia. This URL will always go to the latest visible version; if you are logged in and you have a version of the codelist in draft or review, this will be shown. Otherwise, it will go to the latest published version.
A codelist can have multiple versions, each of which has a Version ID (and also a Version Tag for some older codelists). The ID and tag (if applicable) for the version that you are currently viewing are also displayed under the Codelist ID.
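The relationship between a Codelist ID, its URL and the version that is shown can be sketched as follows. This is only an illustration under stated assumptions: the function names, the status strings ("published", "draft", "under review") and the version IDs are hypothetical, and the real site may decide visibility differently.

```python
BASE_URL = "https://opencodelists.org/codelist/"


def codelist_url(codelist_id):
    """Build the homepage URL for a Codelist ID such as 'user/bob/asplenia'."""
    return BASE_URL + codelist_id.strip("/")


def latest_visible_version(versions, viewer_can_see_unpublished):
    """Pick the newest version the viewer is allowed to see.

    `versions` is a list of (version_id, status) pairs ordered oldest to newest.
    `viewer_can_see_unpublished` stands in for "logged in and owns a draft or
    in-review version" (an assumption made for this sketch).
    """
    for version_id, status in reversed(versions):
        if status == "published" or viewer_can_see_unpublished:
            return version_id
    return None


# Hypothetical version IDs and statuses.
versions = [("v1", "published"), ("v2", "published"), ("v3", "draft")]

print(codelist_url("user/bob/asplenia"))
print(latest_visible_version(versions, viewer_can_see_unpublished=False))
print(latest_visible_version(versions, viewer_can_see_unpublished=True))
```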
If there are multiple versions of a codelist, links to these will be displayed.
The codelist details tabs
The Full list tab shows a searchable list of codes and terms.
Most codelists have a Tree tab, showing all of the codes in the codelist in the context of other codes in the coding system. This is helpful for seeing whether there are any accidental gaps in the codelist.
- Codes that are in the codelist have an "Included" label
- Codes that are not in the codelist are shown greyed out and have an "Excluded" label
For codelists that were created with the builder, there will also be a Searches tab, showing the search terms that were used to create the codelist.
For all codelists, there are links for downloading a CSV of the codelist and a CSV containing a definition of the codelist.
Creating an account
You do not need an account to view codelists, but you do need one to create codelists.
Anybody can create an account. To do so, click the Sign up menu option.
Organisations
If you are a member of an OpenCodelists organisation, you will see a My organisations menu option.
My organisations allows you to view codelists that are owned by your organisations, or that are waiting for review.
Any OpenCodelists user with an account can create a codelist. However, to create or edit codelists on behalf of an organisation, you must be a member of that organisation.
To join an organisation:
- Contact your organisation administrator.
- Your administrator will need your OpenCodelists username or email address in order to add you to the organisation.
Creating a codelist from scratch with the codelist builder
Our codelist builder tool helps you create a codelist from scratch, by searching terms and choosing which matching concepts should be included.
When signed in, click the My codelists menu option.
Then click Create a codelist.
This will take you to a form to create a new codelist.
Choose a name for the codelist, and select a coding system. We currently support codelists using the following coding systems:
- SNOMED CT
- ICD-10
- CTv3 (Read V3)
- BNF (British National Formulary codes)
- Dictionary of Medicines and Devices
If you are a member of an organisation, you can also choose an owner for the codelist (your own account or an organisation account).
Then click Create.
You'll be taken to the codelist builder tool, with instructions displayed.
Build your codelist by searching for terms or codes, and then choosing which of the matching concepts should be included in the codelist. Choose between searching for a code and a term with the Term and Code buttons above the search text entry.
Any concepts that match your search term, and their descendants, are shown in a hierarchy.
To keep the page manageable, only two levels of the hierarchy are initially visible.
You can further expand the hierarchy by clicking the ⊞ button next to concepts.
Include a concept by clicking the + button, and exclude a concept by clicking the − button.
When you include or exclude a concept, all of its descendants are also included or excluded.
Explicitly included/excluded concepts have buttons highlighted blue; their descendants have buttons highlighted grey.
To undo inclusion or exclusion of a result, click on the include or exclude button again.
Sometimes a code can be in conflict. This happens when one of its ancestors is included and another is excluded. For instance, Arthritis of elbow is in conflict because it is both a descendant of the included Arthropathy of elbow, and the excluded Elbow joint inflamed.
Hover on a conflicted code and click More info to see the conflict details.
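The way inclusion and exclusion flow down the hierarchy, and how a conflict arises, can be sketched in a few lines. This is a simplified model and not the builder's actual implementation: the function names, the status strings and the idea of looking only at the nearest explicitly marked ancestor on each path are assumptions made for illustration.

```python
def inherited_marks(concept, parents, explicit):
    """Collect the marks of the nearest explicitly marked ancestors on each path."""
    marks = set()
    for parent in parents.get(concept, ()):
        if parent in explicit:
            marks.add(explicit[parent])
        else:
            marks |= inherited_marks(parent, parents, explicit)
    return marks


def concept_status(concept, parents, explicit):
    """Return 'included', 'excluded', 'conflict' or 'unresolved' for a concept."""
    if concept in explicit:
        return explicit[concept]
    marks = inherited_marks(concept, parents, explicit)
    if marks == {"included", "excluded"}:
        return "conflict"
    if marks:
        return marks.pop()
    return "unresolved"


# Child -> parents mapping for the example in the text.
parents = {
    "Arthritis of elbow": ["Arthropathy of elbow", "Elbow joint inflamed"],
}
# Concepts the user clicked + or - on.
explicit = {
    "Arthropathy of elbow": "included",
    "Elbow joint inflamed": "excluded",
}

print(concept_status("Arthritis of elbow", parents, explicit))
```

In this model, explicitly marking Arthritis of elbow itself would resolve the conflict, because an explicit mark takes precedence over anything inherited.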
Under Concepts found there are links to filter the builder view to conflicted or unresolved concepts.
All the searches used to build the codelist are displayed under Previous searches. View the results of specific searches again by clicking on them. The show all button returns to the combined results of all searches.
Delete a search by clicking the Remove button next to it. If you have already included some concepts from that search, they will not be removed.
The save buttons are at the top of the builder.
If you have not yet completed your codelist, you can Save draft at any time, and return to edit it later. Once you have included or excluded every search result and have no remaining conflicts, the Save for review button will be enabled. The Save for review button takes you to the codelist's homepage, where you can edit metadata to provide:
- a description
- methodology
- links to references
- codelist sign offs
At this stage the codelist is not yet publicly available, so that it can be reviewed and signed off. You can copy its URL and send it to another OpenCodelists user to review. The reviewer signs off by editing the codelist's metadata, and adding their user and the date in the sign offs section.
Procedures for reviewing and signing off codelists may vary between organisations. For more information on procedures for building codelists to use in OpenSAFELY research, see the OpenSAFELY documentation.
Once the codelist is reviewed, it can be published using the Publish version button on the codelist's homepage.
Publishing a codelist version will make that version permanent, and will delete any other draft or in-review versions.
Notes on building dm+d codelists
The builder functions largely the same with the dm+d coding system as with any other coding system, save for a few minor details listed here.
Searches will be executed across Ingredient, VTM (Virtual Therapeutic Moiety), VMP (Virtual Medicinal Product), and AMP (Actual Medicinal Product) entities' codes or names (and descriptions, where available).
Whilst Ingredients are searched, they are not displayed in the results. However, any VMPs containing an Ingredient that matches a search will be displayed (along with their related VTMs and AMPs).
Codes are arranged in the tree view based on a hierarchy of VTM -> VMP -> AMP.
Due to the large number of AMPs for many VMPs, this part of the tree is not expanded by default, but it can be expanded by clicking the small plus arrow next to the VMP whose AMPs you wish to view.
Historical dm+d codelists uploaded from CSV or converted from Pseudo-BNF codelists are not fully enabled for editing in the builder.
By default, it is only possible to create new versions of these codelists by uploading a new CSV file, or by re-running a conversion from Pseudo-BNF.
We can, on request, convert these historical codelists into ones that are fully enabled in the builder for creation of new versions.
Creating a codelist from a CSV file
As well as creating a codelist from scratch, you can create one by uploading a CSV file.
From the My codelists page, click Create a codelist.
- Choose a name for the codelist.
- Select a coding system.
- Choose a file to upload from your hard drive.
To create an OPCS-4 codelist, please see the section Adding an OPCS-4 codelist on this documentation page.
If you are a member of an organisation, you can also choose an owner for the codelist (your own account or an organisation account).
Requirements for uploading codelists to OpenCodelists
- Store the final codelist in CSV format.
- Store codes in exactly one column.
- Remove the header row. Standardised headers will automatically be added for you.
- There is currently a soft requirement that the first column must contain codes in the chosen coding system. These should preferably be named according to the CSV column names provided in the table above. (We plan to eventually remove this requirement.) The second column is typically a text description of the code.
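If you prepare your codelist in code rather than in a spreadsheet, a file in this shape can be written with Python's standard `csv` module. The sketch below is an assumption-laden example: the `write_codelist_csv` helper and the codes and terms are hypothetical placeholders. It writes codes as text in the first column, a description in the second, and no header row.

```python
import csv
import io


def write_codelist_csv(rows, out):
    """Write (code, term) pairs as a headerless two-column CSV.

    Codes are converted to strings so they are never treated as numbers.
    """
    writer = csv.writer(out)
    for code, term in rows:
        writer.writerow([str(code), term])


# Placeholder codes and terms, for illustration only.
rows = [
    ("CODE-1", "First hypothetical term"),
    ("CODE-2", "Second hypothetical term"),
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_codelist_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

When writing to a real file, opening it with `newline=""` lets the `csv` module control line endings.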
More about codelist columns
The codelist page allows you to upload two columns:
- a code
- a text description of the code
However, some codelists may require a 'classification' or 'type' column, which classifies the codes into subcategories. For example, when using a codelist for venous thromboembolism, you may wish to classify these codes into deep vein thromboses and pulmonary embolisms. By using subcategories, you can keep all the codes in a single codelist, rather than uploading separate lists for each clinical subcategory. The OpenSAFELY documentation has guidance for using category columns with OpenSAFELY ehrQL dataset definitions.
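To make the idea of a category column concrete, the sketch below groups the codes of a single three-column codelist by category. The column layout, the category labels ("dvt", "pe"), the `codes_by_category` helper and the codes are hypothetical; see the OpenSAFELY documentation for how category columns are actually used in ehrQL.

```python
from collections import defaultdict


def codes_by_category(rows):
    """Group codes by the value in a third 'category' column."""
    grouped = defaultdict(set)
    for code, _term, category in rows:
        grouped[category].add(code)
    return dict(grouped)


# Placeholder rows: (code, term, category).
vte_rows = [
    ("CODE-1", "Hypothetical deep vein thrombosis term", "dvt"),
    ("CODE-2", "Another hypothetical deep vein thrombosis term", "dvt"),
    ("CODE-3", "Hypothetical pulmonary embolism term", "pe"),
]

grouped = codes_by_category(vte_rows)
print(sorted(grouped["dvt"]))
print(sorted(grouped["pe"]))
```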
Uploading more than two columns is currently only possible for the OpenSAFELY core team. If your study requires this feature, please get in touch.
Potential issues when editing codelists in spreadsheet software, such as Excel
Avoid:
- filtering on an include or exclude column when finalising a codelist. Applied filters are lost in CSV conversion: all of the codes will be uploaded.
- editing SNOMED CT codelists. The codes get rounded.
When you click Create the codelist will be created and you will be taken to the codelist homepage.
From here, you can edit any metadata. You can also edit the codelist.
We are aware of an issue whereby Excel can truncate or round dm+d IDs, turning them into invalid IDs. This is due to Excel's interpretation of this column as a number type insufficiently large to contain a dm+d ID. For this reason, opening dm+d codelist files in Excel should be avoided wherever possible.
Where it is unavoidable to do so, rather than opening a dm+d codelist CSV file directly in Excel (such as through the Open dialogue or from the file explorer), we recommend opening a blank Excel workbook and using the "Import data from Text/CSV" feature. To avoid the problematic rounding/truncation behaviour described above, specify the data type of the dm+d ID column as "Text" during the import process.
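The same problem can occur in code whenever a long ID is parsed as a floating-point number. The sketch below uses a made-up 17-digit ID (not a real dm+d code) to show that reading codes as text preserves them, while a round trip through a float does not. It illustrates the general precision issue rather than Excel's exact behaviour.

```python
import csv
import io

# A single-row CSV with a made-up 17-digit ID, for illustration only.
csv_text = "12345678901234567,Hypothetical product\r\n"

# The csv module returns every field as a string, so the ID is kept intact.
rows = list(csv.reader(io.StringIO(csv_text)))
code_as_text = rows[0][0]

# Parsing the same ID as a float loses precision for values this large.
code_via_float = str(int(float(code_as_text)))

print(code_as_text)
print(code_as_text == code_via_float)
```

If you use a dataframe library to read codelists, the equivalent precaution is to request a string type for the code column when reading the file.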
Adding an OPCS-4 codelist
Note: OPCS-4 codelists cannot currently be created using the form described above.
To add an OPCS-4 codelist, navigate to https://www.opencodelists.org/codelist/{ACCOUNT}/add/, where {ACCOUNT} in the URL is substituted with one of the following options:
- Either user/{username}, where username is your OpenCodelists username, to add the codelist to your personal account
- Or the name of the organisation your account is associated with, to add the codelist under the organisation.
The OPCS-4 codes you upload should NOT include the decimal point.
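As a small sketch of the two steps above, the helpers below build the add URL for an account and strip the decimal point from codes before upload. Both function names are hypothetical, "user/bob" is an example account, and "A12.3" is only a code-shaped placeholder rather than a code chosen for its clinical meaning.

```python
def opcs4_add_url(account):
    """Build the add URL for an account such as 'user/bob' or an organisation name."""
    return f"https://www.opencodelists.org/codelist/{account}/add/"


def strip_decimal_point(code):
    """Remove the decimal point from an OPCS-4 style code, e.g. 'A12.3' -> 'A123'."""
    return code.replace(".", "")


print(opcs4_add_url("user/bob"))
print([strip_decimal_point(code) for code in ["A12.3", "A124"]])
```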
Selecting appropriate codes for your codelist
Choosing which codes to include in your codelist can be challenging without understanding their usage in clinical practice.
Some clinical activities are represented by a single code, while others may require a comprehensive list of codes to accurately capture the intended clinical activity. Even a single incorrectly included or omitted code could potentially lead to vastly different results when using the codelist to query electronic health record data.
Some of the common pitfalls when selecting appropriate codes include:
- Including similar-sounding but unrelated codes. For example, ocular hypertension pertains to high fluid pressure within the eye, and is not appropriate for a codelist intended to capture high blood pressure.
- Omitting synonyms. For example, when defining a codelist for sore throat, it is essential to include clinical codes which describe pharyngitis as well.
- Misunderstanding study intent. Selecting an appropriate codelist requires careful consideration of which patients are relevant to the research aims. For example, the decision to include or exclude gestational diabetes in a diabetes codelist may vary depending on the specific study or context.
- Use of non-specific codes. Some codes might be useful to improve the sensitivity of a study, but care needs to be taken to consider the potential negative impact on specificity. For example, sore throat is a potential symptom of Group A Strep infection but is also a symptom of many other conditions. If including this in a study, it is likely that other codelists (for example, antibiotics for Group A Strep treatment) would need to be used to maintain an appropriate level of specificity.
To avoid these pitfalls we recommend:
- Clearly defining your clinical feature of interest. This may include specific features you do not want to capture.
- Specifying the synonyms that may be used for your clinical feature of interest.
- Considering the balance between sensitivity and specificity of the selected codes.
- Looking for similar codelists on OpenCodelists and understanding what methodology they used for their selection of codes.
- Where available, using published data on code usage to understand how a clinical area is coded in practice. This doesn't exist for all code terminologies, but helpfully, NHS Digital provides a dataset on SNOMED CT code usage in primary care, which includes data since 2011. This dataset includes annual counts of how often each SNOMED CT code is recorded in GP patient records across England. You can explore recorded usage of individual codes or entire codelists, including those on OpenCodelists, using this prototype SNOMED CT code usage explorer. Note, however, that low or no usage for a code may not be indicative of its future use.
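The trade-off between sensitivity and specificity can be made concrete with the standard definitions: sensitivity is the proportion of true cases that the codelist captures, and specificity is the proportion of non-cases that it correctly leaves out. The counts below are invented purely to illustrate the calculation and do not come from any real study.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of true cases that are captured."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of non-cases that are correctly not captured."""
    return true_negatives / (true_negatives + false_positives)


# Invented counts for a narrow and a broad version of a codelist,
# applied to the same 100 true cases and 900 non-cases.
narrow = {"tp": 80, "fn": 20, "tn": 890, "fp": 10}
broad = {"tp": 95, "fn": 5, "tn": 800, "fp": 100}

for label, counts in [("narrow", narrow), ("broad", broad)]:
    sens = sensitivity(counts["tp"], counts["fn"])
    spec = specificity(counts["tn"], counts["fp"])
    print(f"{label}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this invented example, adding non-specific codes raises sensitivity but lowers specificity, which is the pattern the pitfalls above warn about.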
You can read more about codelists and their construction in the Bennett Institute blog series on clinical codes.
Editing a codelist
You can edit a codelist that you own, or that is owned by an organisation that you belong to. To do this, click Create new version on the codelist homepage.
This will open the builder, with all of the codes from your codelist selected. Additionally, if the codelist was created through the builder, then any terms that were searched for will be present.
You can search for new terms, and you can change whether any concepts are included or excluded.
You can discard your changes by clicking Discard. You can also save a draft version of your codelist by clicking Save as draft. And when all concepts have been resolved, you can save your changes and create a new version for review by clicking Save for review.
Importantly, the original version of your codelist is still accessible at the same URL.
Viewing your codelists
You can view your codelists on the My codelists page.
This page shows a list of all your published codelists, organisation codelists that you have created or edited, and your codelists that are currently in draft or under review.
Converting Pseudo-BNF codelists to dm+d
Pseudo-BNF and the NHS Dictionary of Medicines and Devices (dm+d) are both medication coding systems in regular use in the UK.
The NHS regularly publishes a file which maps BNF codes to dm+d codes, which we ingest into OpenCodelists, allowing you to convert your Pseudo-BNF codelists to dm+d.
To convert a Pseudo-BNF codelist to dm+d:
- Go to the page for the Pseudo-BNF codelist you wish to convert.
- If there are multiple versions of the Pseudo-BNF codelist, select the relevant codelist version.
- Click the "Convert to dm+d" button.
This will create a new codelist with the same name as your Pseudo-BNF codelist but with a "-dmd" suffix, and you will be taken to its page. The methodology statement of this dm+d codelist contains a link back to your original Pseudo-BNF codelist. As with any other codelist, you are free to edit this statement, and all other codelist metadata.
If you wish to update this converted codelist (for example, after an update to the Pseudo-BNF list, the Pseudo-BNF coding system, or the Pseudo-BNF to dm+d mappings):
- Return to the Pseudo-BNF codelist.
- Click the "Convert to dm+d" button again.
A new version of the dm+d codelist will be created.
If this new version is identical to an existing version of the codelist (i.e. there are no changes in the dm+d codes), you will be shown an error and the new version will not be created.
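The conversion behaviour described above can be sketched as follows. This is an illustrative model only, not the OpenCodelists implementation: the `convert_to_dmd` function, the `IdenticalVersionError` exception, the one-to-many mapping shape and all codes are hypothetical assumptions.

```python
class IdenticalVersionError(Exception):
    """Raised when a conversion would duplicate an existing dm+d version."""


def convert_to_dmd(bnf_name, bnf_codes, mapping, existing_versions):
    """Map Pseudo-BNF codes to dm+d codes and name the result with a '-dmd' suffix.

    `mapping` maps each BNF code to zero or more dm+d codes (an assumed shape).
    `existing_versions` is a list of code sets from earlier conversions.
    """
    dmd_codes = set()
    for code in bnf_codes:
        dmd_codes.update(mapping.get(code, ()))
    if any(dmd_codes == version for version in existing_versions):
        raise IdenticalVersionError("No changes in the dm+d codes")
    return bnf_name + "-dmd", frozenset(dmd_codes)


# Placeholder codes and mapping, for illustration only.
mapping = {"BNF-1": ["DMD-1", "DMD-2"], "BNF-2": ["DMD-3"]}

name, first_version = convert_to_dmd("example", ["BNF-1", "BNF-2"], mapping, [])
print(name, sorted(first_version))

try:
    convert_to_dmd("example", ["BNF-1", "BNF-2"], mapping, [first_version])
except IdenticalVersionError as error:
    print("Not created:", error)
```

Re-running the conversion only produces a new version when the resulting set of dm+d codes differs, for example after the mapping file has been updated.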
Using a codelist in OpenSAFELY research
Codelists are central to the research that is carried out in OpenSAFELY.
For more information about using codelists in OpenSAFELY research, see the OpenSAFELY documentation.
Reporting bugs, requesting features, and asking for help
If you've found a bug or would like to request a feature, please raise an issue in the issue tracker on GitHub.
If you'd like support, try asking in the OpenSAFELY discussion forum.