Documentation
Welcome to the documentation for OpenSAFELY Codelists, part of the OpenSAFELY project from the Bennett Institute for Applied Data Science at the University of Oxford.
OpenSAFELY Codelists is an open platform for creating and sharing codelists of clinical terms and drugs.
- Status of the project
- What is a codelist?
- Viewing a codelist
- Creating an account
- Organisations
- Creating a codelist from scratch with the builder
- Creating a codelist from a CSV file
- Selecting appropriate codes for your codelist
- Editing a codelist
- Viewing your codelists
- Using a codelist in OpenSAFELY research
- Future plans
- Reporting bugs, requesting features, and asking for help
Status of the project
OpenSAFELY Codelists is in active development. It's still rough around the edges, but has been used to create 6953 codelists since April 2020.
Anybody can use OpenSAFELY Codelists to create and share codelists.
See the section on future plans for upcoming features.
What is a codelist?
A codelist is a set of codes which can be recorded in clinical systems, representing data such as:
- Patient Demographics - e.g. Age, Ethnicity
- Medicines - e.g. Paracetamol, Morphine
- Condition Diagnoses - e.g. Crohn's Disease, Bipolar disorder
- Symptoms - e.g. Headache, Blood in urine
- Test Results - e.g. Potassium level, Abnormal ECG
- Procedures - e.g. Coronary artery bypass graft, Hysterectomy
- Activities - e.g. Medication review, Consultation via video
Codelists are used in almost all studies in OpenSAFELY - and other health data research - to select patients with activities and conditions of interest within the dataset(s) being used.
- Codelists each use a coding system such as SNOMED CT, dm+d (dictionary of medicines and devices), CTV3 (Clinical Terms Version 3), etc.
- Codelists can often be large, in order to capture the many possible codes that could represent a certain activity or condition.
- Creating or selecting a codelist can often be very nuanced, e.g. whether a codelist for "diabetes" should include or exclude gestational diabetes may vary according to the study.
- Sometimes a combination of codelists of different types may be required to fully capture patients with certain conditions, e.g. medications for asthma (dm+d), diagnoses of asthma in primary care (SNOMED CT), diagnoses of asthma in a hospital admission (ICD-10).
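As a rough illustration of the asthma example above, the Python sketch below checks patient records against several codelists at once, keyed by coding system, so that each code is only matched against the codelist for its own system. The record layout (patient ID, coding system, code), the system names and the codes are all hypothetical placeholders rather than real clinical codes, and this is not how OpenSAFELY applies codelists in practice.

```python
from typing import Iterable

# Hypothetical codelists for one condition, keyed by coding system.
# The codes are placeholders, not real clinical codes.
asthma_codelists: dict[str, set[str]] = {
    "dmd": {"DMD_1", "DMD_2"},       # asthma medications
    "snomedct": {"SCT_1", "SCT_2"},  # diagnoses in primary care
    "icd10": {"ICD_1"},              # diagnoses in hospital admissions
}


def patients_matching_any(
    records: Iterable[tuple[str, str, str]],
    codelists: dict[str, set[str]],
) -> set[str]:
    """Return IDs of patients with at least one record whose code appears in
    the codelist for that record's coding system."""
    matched = set()
    for patient_id, system, code in records:
        if code in codelists.get(system, set()):
            matched.add(patient_id)
    return matched


# Each record is (patient ID, coding system, code).
records = [
    ("p1", "dmd", "DMD_1"),       # matches the dm+d codelist
    ("p2", "snomedct", "SCT_9"),  # code not in any codelist
    ("p3", "icd10", "ICD_1"),     # matches the ICD-10 codelist
    ("p4", "snomedct", "DMD_1"),  # right code, wrong coding system: no match
]
matched = patients_matching_any(records, asthma_codelists)
print(sorted(matched))  # ['p1', 'p3']
```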
Viewing a codelist
The homepage of a codelist shows information about what the codelist contains and how it was created, as well as links to any references and details of who created the codelist.
The left hand side of the page displays the codelist's ID and current version information.
A codelist has a Codelist ID, which is the canonical ID for the codelist and determines the URL of its homepage. For example, the OpenSAFELY asplenia codelist can be found at https://opencodelists.org/codelist/opensafely/asplenia. This URL always goes to the latest visible version: if you are logged in and you have a version of the codelist in draft or under review, that version is shown; otherwise, the latest published version is shown.
A codelist can have multiple versions, each of which has a Version ID (and also a Version Tag for some older codelists). The ID and tag (if applicable) for the version that you are currently viewing is also displayed under the Codelist ID.
The Full list tab shows a searchable list of codes and terms.
Most codelists have a Tree tab, showing all of the codes in the codelist in the context of other codes in the coding system. Codes that are not in the codelist are shown greyed out. This is helpful for seeing whether there are any accidental gaps in the codelist.
For codelists that were created with the builder, there will also be a Searches tab, showing the search terms that were used to create the codelist.
For all codelists, there are links for downloading a CSV of the codelist and a CSV containing a definition of the codelist.
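If you use a downloaded codelist CSV in your own analysis code, read the codes as text rather than numbers, so that long codes are not rounded and leading zeros are kept. A minimal Python sketch using the standard csv module follows; the column name "code" is an assumption, so check the header row of your own download and pass the right column name.

```python
import csv
import io
from typing import TextIO


def read_codes(f: TextIO, code_column: str = "code") -> set[str]:
    """Return the set of codes in a codelist CSV, read as strings."""
    reader = csv.DictReader(f)
    return {row[code_column].strip() for row in reader}


# Stand-in for a downloaded file; with a real download, use
# open("codelist.csv", newline="") instead of io.StringIO.
sample = io.StringIO("code,term\n22298006,Myocardial infarction\n195967001,Asthma\n")
codes = read_codes(sample)
print(sorted(codes))  # ['195967001', '22298006']
```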
If there are multiple versions of a codelist, links to these will be displayed on the left hand side.
Creating an account
You do not need an account to view codelists, but you do to create one.
Anybody can create an account. To do so, click Sign up in the top right of any page.
Organisations
If you are a member of an OpenCodelists organisation, you will see a My organisations menu option, where you can view codelists that are owned by your organisations, or that are waiting for review.
Any OpenCodelists user with an account can create a codelist. However, in order to create or edit codelists on behalf of an organisation, you must be a member of that organisation. To join an organisation, please contact your organisation administrator. Your administrator will need your OpenCodelists username or email address in order to add you to the organisation.
Creating a codelist from scratch with the builder
Our codelist builder tool helps you create a codelist from scratch, by searching terms and choosing which matching concepts should be included.
When signed in, click My codelists in the top right of any page.
Then click Create a codelist.
Choose a name for the codelist, and select a coding system from the dropdown. If you are a member of an organisation, you can also choose an owner for the codelist (your own account or an organisation account).
We currently support codelists using the following coding systems:
- SNOMED CT
- ICD-10
- CTV3 (Read V3)
- BNF (British National Formulary codes)
Then click Create.
You'll be taken to a page that is nearly blank.
You build your codelist by searching for terms or codes, and then choosing which of the matching concepts should be included in the codelist. To search for a code, prefix it with "code:" in the search field.
Any concepts that match your search term, and their descendants, are shown in a hierarchy.
In order to keep the page manageable, only two levels of the hierarchy are initially visible.
You can drill down the hierarchy by clicking the ⊞ button next to a concept.
You include a concept by clicking the + button, and exclude a concept by clicking the - button.
When you include or exclude a concept, all of its descendants are also included or excluded.
Explicitly included or excluded concepts have buttons highlighted blue; their descendants have buttons highlighted grey.
If you have included or excluded a result by mistake, you can undo this by clicking on the include or exclude button again.
Sometimes a code can be in conflict. This happens when one of its ancestors is included and another is excluded. For instance, Acute severe exacerbation of mild persistent allergic asthma is a descendant of both Mild asthma and Acute asthma, so including one of these and excluding the other puts it in conflict.
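The include and exclude rules above can be sketched in Python as follows. This is a simplified model written for this page, not OpenCodelists' own implementation: each concept lists its parents, an explicit include or exclude always wins, and otherwise a concept inherits from its parents, becoming a conflict if they disagree and staying unresolved if none of them is included or excluded. For simplicity, the exacerbation concept is treated as a direct child of both Mild asthma and Acute asthma.

```python
def resolve_statuses(
    parents: dict[str, list[str]],
    included: set[str],
    excluded: set[str],
) -> dict[str, str]:
    """Work out the status of every concept in a (simplified) hierarchy."""
    statuses: dict[str, str] = {}

    def status(concept: str) -> str:
        if concept in statuses:
            return statuses[concept]
        if concept in included:
            result = "included"
        elif concept in excluded:
            result = "excluded"
        else:
            # Collect what each parent passes down (explicit or inherited).
            inherited = set()
            for parent in parents.get(concept, []):
                parent_status = status(parent)
                if parent_status.startswith("included"):
                    inherited.add("included")
                elif parent_status.startswith("excluded"):
                    inherited.add("excluded")
                elif parent_status == "conflict":
                    inherited.add("conflict")
            if "conflict" in inherited or {"included", "excluded"} <= inherited:
                result = "conflict"
            elif "included" in inherited:
                result = "included (inherited)"
            elif "excluded" in inherited:
                result = "excluded (inherited)"
            else:
                result = "unresolved"
        statuses[concept] = result
        return result

    for concept in parents:
        status(concept)
    return statuses


exacerbation = "Acute severe exacerbation of mild persistent allergic asthma"
parents = {
    "Asthma": [],
    "Mild asthma": ["Asthma"],
    "Acute asthma": ["Asthma"],
    exacerbation: ["Mild asthma", "Acute asthma"],
}
statuses = resolve_statuses(parents, included={"Mild asthma"}, excluded={"Acute asthma"})
print(statuses[exacerbation])  # conflict
```

Explicitly including or excluding the conflicted concept itself resolves the conflict in this model, because an explicit choice overrides anything inherited from its ancestors.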
You can click on the ... to the right of the term to see more details.
There is a link to see just the concepts that are unresolved or in conflict.
All the searches used to build the codelist are displayed, and you can view the results of specific searches again by clicking on them. Show all returns to the combined results of all searches.
You can also delete a search by clicking the x next to it. If you have already included some concepts from that search, they will not be removed.
Once you have included or excluded every search result, you can save your changes. If you have not finished building your codelist, you can click Save draft and come back to finish editing it later.
If you're done building the codelist, you can save it for review. Note that the Save for review button will be disabled if you have unresolved or conflicting codes in the codelist. Save for review takes you to the codelist's homepage, where you can edit metadata to provide a description, methodology, links to references, and sign offs.
The codelist is still not publicly available at this stage, to allow it to be reviewed and signed off. You can copy its URL and send it to another OpenCodelists user to review. The reviewer signs off by editing the codelist's metadata and adding their user and the date in the sign offs section.
Procedures for reviewing and signing off codelists may vary between organisations. For more information on procedures for building codelists to use in OpenSAFELY research, see the OpenSAFELY documentation.
Once the codelist is reviewed, it can be published, using the Publish version link from the codelist's homepage. Publishing a codelist version will make that version permanent, and will delete any other draft or in-review versions.
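The review and publishing rules above can be summarised with a small, hypothetical Python model. The Version class, the status strings and the function names are invented for illustration and are not part of OpenCodelists; the sketch only encodes two rules stated on this page: a version with unresolved or conflicting concepts cannot be saved for review, and publishing one version removes every other draft or in-review version.

```python
from dataclasses import dataclass


@dataclass
class Version:
    version_id: str
    status: str  # "draft", "under review" or "published"


def save_for_review(version: Version, unresolved_or_conflicting: int) -> None:
    """Move a draft to review, mirroring the disabled Save for review button."""
    if unresolved_or_conflicting > 0:
        raise ValueError("resolve all concepts before saving for review")
    version.status = "under review"


def publish(versions: list[Version], version_id: str) -> list[Version]:
    """Publish one version; keep earlier published versions and drop all
    other draft and in-review versions."""
    kept = []
    for version in versions:
        if version.version_id == version_id:
            version.status = "published"
            kept.append(version)
        elif version.status == "published":
            kept.append(version)
    return kept


versions = [Version("v1", "published"), Version("v2", "draft"), Version("v3", "draft")]
save_for_review(versions[1], unresolved_or_conflicting=0)
versions = publish(versions, "v2")
print([(v.version_id, v.status) for v in versions])
# [('v1', 'published'), ('v2', 'published')]
```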
Creating a codelist from a CSV file
As well as creating a codelist from scratch, you can create one by uploading a CSV file.
From the My codelists page, click Create a codelist.
Choose a name for the codelist, select a coding system from the dropdown, and choose a file to upload from your hard drive.
To create an OPCS-4 or dm+d codelist, please see the section Adding an OPCS-4 or dm+d codelist below.
If you are a member of an organisation, you can also choose an owner for the codelist (your own account or an organisation account).
Requirements for uploading codelists to OpenCodelists
- Store the final codelist in CSV format.
- Store codes in exactly one column.
- Remove the header row. Standardised headers will automatically be added for you.
- There is currently a soft requirement that the first column must contain codes in the chosen coding system. These should preferably be named according to the CSV column names provided in the table above. (We plan to eventually remove this requirement.) The second column is typically a text description of the code.
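If you are preparing the upload file with a script rather than a spreadsheet, here is a minimal Python sketch of a file meeting these requirements: no header row, the code in the first column and a text description in the second. The function name is hypothetical; keeping codes as strings throughout avoids the rounding problems that spreadsheet software can introduce.

```python
import csv
import io


def to_upload_csv(rows: list[tuple[str, str]]) -> str:
    """Return header-less CSV text with one (code, description) pair per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for code, description in rows:
        # Codes stay as strings, so nothing is rounded or loses leading zeros.
        writer.writerow([code, description])
    return buffer.getvalue()


text = to_upload_csv([("22298006", "Myocardial infarction"), ("195967001", "Asthma")])
print(text, end="")
# 22298006,Myocardial infarction
# 195967001,Asthma

# To save it for upload:
# with open("codelist.csv", "w", newline="") as f:
#     f.write(text)
```

Descriptions that contain commas are quoted automatically by the csv module, so they stay in the second column.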
More about codelist columns
The codelist page allows you to upload two columns:
- a code
- a text description of the code
However, some codelists may require a 'classification' or 'type' column, which classifies the codes into subcategories. For example, when using a codelist for venous thromboembolism, you may wish to classify these codes into deep vein thromboses and pulmonary embolisms. By using subcategories, you can keep all the codes in a single codelist, rather than uploading separate lists for each clinical subcategory. In your study definition, the filter_codes_by_category functionality allows access to the subcategories of a codelist.
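To make the idea of subcategories concrete, here is a plain-Python sketch of a categorised codelist and a filter over it. It is not the filter_codes_by_category implementation (see the OpenSAFELY documentation for that function's actual usage); the codes and the "dvt" and "pe" category labels are hypothetical placeholders.

```python
# Hypothetical venous thromboembolism codelist: (code, category) pairs.
vte_codelist = [
    ("DVT_CODE_1", "dvt"),
    ("DVT_CODE_2", "dvt"),
    ("PE_CODE_1", "pe"),
]


def codes_in_categories(codelist: list[tuple[str, str]], include: set[str]) -> list[str]:
    """Return the codes whose category is one of those in `include`."""
    return [code for code, category in codelist if category in include]


print(codes_in_categories(vte_codelist, {"pe"}))  # ['PE_CODE_1']
print(codes_in_categories(vte_codelist, {"dvt", "pe"}))  # all three codes
```

Keeping the category alongside each code means one codelist can serve both the combined definition and each subcategory.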
Uploading more than two columns is currently only possible for the OpenSAFELY core team. If your study requires this feature, please get in touch.
Potential issues when editing codelists in spreadsheet software, such as Excel
Avoid:
- filtering on an include or exclude column when finalising a codelist. Applied filters are lost in CSV conversion: all of the codes will be uploaded.
- editing SNOMED CT codelists. Long SNOMED CT codes can get rounded, because Excel treats them as numbers.
When you click Create, the codelist will be created and you will be taken to the codelist homepage.
From here, you can edit any metadata. You can also edit the codelist.
We are aware of an issue whereby Excel can truncate or round dm+d IDs, turning them into invalid IDs. This happens because Excel interprets the column as numbers, and Excel keeps at most 15 significant digits of a number, which is too few for longer dm+d IDs. For this reason, opening dm+d codelist files in Excel should be avoided wherever possible. Where it is unavoidable, rather than opening a dm+d codelist CSV file directly in Excel (such as through the Open dialogue or from the file explorer), we recommend opening a blank Excel workbook and using the "Import data from Text/CSV" feature. To avoid the rounding and truncation described above, set the data type of the dm+d ID column to "Text" during the import process.
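One way to catch codes that have already been damaged by a spreadsheet is to validate them before uploading. SNOMED CT identifiers are 6 to 18 digits long and end in a Verhoeff check digit, so a rounded or truncated ID will usually fail the check. The Python sketch below implements that check. It assumes dm+d IDs follow the same SNOMED CT identifier format, and because a check digit cannot catch every possible corruption, a pass means "plausible" rather than "correct".

```python
import re

# Verhoeff dihedral-group multiplication table (D) and permutation table (P).
D = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 2, 3, 4, 0, 6, 7, 8, 9, 5),
    (2, 3, 4, 0, 1, 7, 8, 9, 5, 6),
    (3, 4, 0, 1, 2, 8, 9, 5, 6, 7),
    (4, 0, 1, 2, 3, 9, 5, 6, 7, 8),
    (5, 9, 8, 7, 6, 0, 4, 3, 2, 1),
    (6, 5, 9, 8, 7, 1, 0, 4, 3, 2),
    (7, 6, 5, 9, 8, 2, 1, 0, 4, 3),
    (8, 7, 6, 5, 9, 3, 2, 1, 0, 4),
    (9, 8, 7, 6, 5, 4, 3, 2, 1, 0),
)
P = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 5, 7, 6, 2, 8, 3, 0, 9, 4),
    (5, 8, 0, 3, 7, 9, 6, 1, 4, 2),
    (8, 9, 1, 6, 0, 4, 3, 5, 2, 7),
    (9, 4, 5, 3, 1, 2, 6, 8, 7, 0),
    (4, 2, 8, 6, 5, 7, 3, 9, 0, 1),
    (2, 7, 9, 3, 8, 0, 6, 4, 1, 5),
    (7, 0, 4, 6, 9, 1, 3, 2, 5, 8),
)


def verhoeff_ok(digits: str) -> bool:
    """True if the last digit is a valid Verhoeff check digit for the rest."""
    check = 0
    for position, char in enumerate(reversed(digits)):
        check = D[check][P[position % 8][int(char)]]
    return check == 0


def looks_like_sctid(code: str) -> bool:
    """Plausibility check for a SNOMED CT (and, by assumption, dm+d) ID."""
    code = code.strip()
    return re.fullmatch(r"[0-9]{6,18}", code) is not None and verhoeff_ok(code)


codes = ["22298006", "195967001", "22298007", "1.95967E+08"]
print([c for c in codes if not looks_like_sctid(c)])  # ['22298007', '1.95967E+08']
```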
Adding an OPCS-4 or dm+d codelist
Note: OPCS-4 and dm+d codelists cannot currently be created using the form described above.
To add an OPCS-4 or dm+d codelist, navigate to the following URL, replacing {ACCOUNT} with one of the options listed below:
https://www.opencodelists.org/codelist/{ACCOUNT}/add/
- user/{username}, where {username} is your OpenCodelists username, to add the codelist to your personal account; or
- the name of the organisation your account is associated with, to add the codelist under the organisation.
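For example, the add URL can be built like this in Python. The function name and the example username jsmith are hypothetical, and the sketch assumes an organisation is identified by the same short name that appears in its codelist URLs, such as opensafely in the asplenia example earlier on this page.

```python
from typing import Optional

BASE_URL = "https://www.opencodelists.org/codelist"


def add_codelist_url(username: Optional[str] = None, organisation: Optional[str] = None) -> str:
    """Build the add-codelist URL for a personal account or an organisation."""
    if (username is None) == (organisation is None):
        raise ValueError("give exactly one of username or organisation")
    account = f"user/{username}" if username is not None else organisation
    return f"{BASE_URL}/{account}/add/"


print(add_codelist_url(username="jsmith"))
# https://www.opencodelists.org/codelist/user/jsmith/add/
print(add_codelist_url(organisation="opensafely"))
# https://www.opencodelists.org/codelist/opensafely/add/
```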
The OPCS-4 codes you upload should NOT include the decimal point.
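If your OPCS-4 codes are written with a decimal point, a small clean-up before uploading is enough. A minimal Python sketch follows; the function name is hypothetical and the example values are simply strings in the OPCS-4 letter-and-digits format.

```python
def strip_opcs4_decimal(code: str) -> str:
    """Remove surrounding whitespace and the decimal point from an OPCS-4 code."""
    return code.strip().replace(".", "")


codes = ["K40.1", "K40.2", "K401"]
print([strip_opcs4_decimal(c) for c in codes])  # ['K401', 'K402', 'K401']
```

Codes that differed only by the decimal point become identical after this step, so you may want to remove duplicates before uploading.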
Selecting appropriate codes for your codelist
Choosing which codes to include in your codelist can be challenging without understanding their usage in clinical practice. Some clinical activities are represented by a single code, while others may require a comprehensive list of codes to accurately capture the intended clinical activity. Even a single incorrectly included or omitted code could potentially lead to vastly different results when using the codelist to query electronic health record data.
Some of the common pitfalls when selecting appropriate codes include:
- Including similar-sounding but unrelated codes. For instance, ocular hypertension refers to high fluid pressure within the eye, and is not appropriate for a codelist intended to capture high blood pressure.
- Omitting synonyms. For example, when defining a codelist for sore throat, it is essential to include clinical codes which describe pharyngitis as well.
- Misunderstanding study intent. Selecting an appropriate codelist requires careful consideration of which patients are relevant to the research aims. For example, whether to include or exclude gestational diabetes in a diabetes codelist may vary depending on the specific study or context.
- Use of non-specific codes. Some codes might improve the sensitivity of a study, but care needs to be taken to consider the potential negative impact on specificity. For example, sore throat is a potential symptom of Group A Strep infection but is also a symptom of many other conditions. If it is included in a study, other codelists (for example, antibiotics used to treat Group A Strep) would likely be needed to maintain an appropriate level of specificity.
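The trade-off in the last point can be stated with the standard definitions of sensitivity and specificity, read here in codelist terms: TP is the number of patients with the clinical feature of interest whom the codelist captures, FN those with the feature whom it misses, TN those without the feature whom it correctly leaves out, and FP those without the feature whom it captures anyway.

```latex
\text{sensitivity} = \frac{TP}{TP + FN}
\qquad\qquad
\text{specificity} = \frac{TN}{TN + FP}
```

Adding a non-specific code such as sore throat tends to move patients from FN to TP, raising sensitivity, but it also moves patients from TN to FP, lowering specificity.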
To avoid these pitfalls we recommend:
- Clearly defining your clinical feature of interest. This may include specific features you do not want to capture.
- Specifying the synonyms that may be used for your clinical feature of interest.
- Considering the balance between the sensitivity and specificity of the selected codes.
- Looking for similar codelists on OpenCodelists and understanding what methodology they used for their selection of codes.
- Where available, using published data on code usage to understand how a clinical area is coded in practice. This doesn't exist for all code terminologies, but helpfully, NHS Digital provides a dataset on SNOMED CT code usage in primary care, which includes data since 2011. This dataset includes annual counts of how often each SNOMED CT code is recorded in GP patient records across England. You can explore recorded usage of individual codes or entire codelists, including those on OpenCodelists, using this prototype SNOMED CT code usage explorer. Note, however, that low or no usage for a code may not be indicative of its future use.
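Once you have usage counts to hand, a quick check of a draft codelist against them might look like the Python sketch below. It assumes the counts have already been loaded into a dictionary mapping each code to its count for one year; the layout of the NHS Digital files is not described here, and the codes and counts shown are hypothetical placeholders.

```python
def summarise_usage(codelist: set[str], usage: dict[str, int]) -> tuple[int, list[str]]:
    """Return total recorded usage across the codelist, and the codes with no recorded usage."""
    total = sum(usage.get(code, 0) for code in codelist)
    unused = sorted(code for code in codelist if usage.get(code, 0) == 0)
    return total, unused


usage = {"CODE_A": 1200, "CODE_B": 0}      # hypothetical annual counts
codelist = {"CODE_A", "CODE_B", "CODE_C"}  # CODE_C does not appear in the usage data
total, unused = summarise_usage(codelist, usage)
print(total, unused)  # 1200 ['CODE_B', 'CODE_C']
```

An unused code is a prompt to double-check, not a reason to drop it automatically: as noted above, low or no usage may not indicate how a code will be used in future.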
You can read more about codelists and their construction in the Bennett Institute blog series on clinical codes.
Editing a codelist
You can edit a codelist that you own, or that is owned by an organisation that you belong to. To do this, click Create new version on the left hand side of the codelist homepage.
This will open the builder, with all of the codes from your codelist selected. Additionally, if the codelist was created through the builder, then any terms that were searched for will be present.
You can search for new terms, and you can change whether any concepts are included or excluded.
You can discard your changes by clicking Discard. You can also save a draft version of your codelist by clicking Save as draft. And when all concepts have been resolved, you can save your changes and create a new version for review by clicking Save for review.
Importantly, the original version of your codelist is still accessible at the same URL.
Viewing your codelists
You can view your codelists on the My codelists page.
This page shows a list of all your published codelists, organisation codelists that you have created or edited, and your codelists that are currently in draft or under review.
Using a codelist in OpenSAFELY research
Codelists are central to the research that is carried out in OpenSAFELY.
For more information about using codelists in OpenSAFELY research, see the OpenSAFELY documentation.
Future plans
Our future plans include:
- improving curation of codelists
- allowing users to record, within the tool, the reasons for their decisions
- adding a system to let users subscribe to a codelist to be notified when it has changed
- making it possible to build a new codelist based on one that somebody else has made
- building a system to ensure that codelist creators get credit for their work
- and much more
Reporting bugs, requesting features, and asking for help
If you've found a bug or would like to request a feature, please raise an issue in the issue tracker on GitHub.
If you'd like support, try asking in the OpenSAFELY discussion forum.