Representation of human genes in SNOMED CT - updated

Briefing Note Purpose

The purpose of this briefing note is to introduce a proposed structure for the representation of human gene names and their associated chromosomal locations in the context of body structures, specifically as subcellular structures. This shift from the current location of genes within SNOMED CT reflects the biological reality of genes as physical, organized, and functional units within chromosomes, aligns SNOMED CT with contemporary clinical practice, and future-proofs the terminology for the era of precision medicine.

It does not intend to replicate all of the genomic relationships that might be represented in a gene resource such as that maintained by the HUGO Gene Nomenclature Committee (HGNC), but only those necessary to associate clinically relevant genes with their respective chromosomes. This proposal provides non-normative examples (Appendix 1) of how gene names may be used in modeling other content, such as Observable entities, Clinical findings, or Diseases.

This proposal also does not address the notion of a gene as an informational artifact (an alternative representation supported by organizations such as NCBI), but focuses on the structural relationship of a gene to its associated chromosome.

Date created 03 Sept 2025
Action Review content
Status Open
Disposition Awaiting feedback
Feedback by 31 Dec 2025

20250903 Briefing note_ Representation of Human Genes in SNOMED CT version 1.pdf (1.2 MB)

I am unable to access the previous Briefing Note from the Member Forum topic ‘Representation of Human Genes in SNOMED CT’ on SNOMED Forums. There were a number of comments made by Members on that Briefing Note (including responses from SNOMED International) which are not currently visible on the new platform. Having visibility of the previous BN and the comments/responses will help us with our review.

Kind regards

Paul Wright
UK Member Forum Representative

Hi @paul.wright39, thank you for pointing that out. The link on the related MF topic has now been updated, and you will be able to get to the old discussion in Spaces.


Thanks Rory

Hi Jim, I forwarded the briefing note to DHD, the Dutch organisation that maintains thesauri (linked to SNOMED) for diagnoses and procedures for hospital registration. One of the medical specialities included is clinical genetics.

They think any negative impact on their thesauri would be small and they do see advantages for clinical genetics. They would like to be able to code the gene and the disorder separately, yet establish equivalency with the precoordinated concept.

Personally, I see the value of an attribute to link disorder with gene, but I have my reservations about the addition of all those genes as separate SNOMED concepts. I think scope creep is a highly likely consequence.

I would prefer to find a way to use HGNC without replicating its content in SNOMED. Do these genes have a hierarchy themselves? If not, could we treat them as datatype properties, with the HGNC name as their String value?

Feikje,

We have discussed the alternatives of adding genes to SNOMED versus developing a mechanism to link directly to HGNC. If I understand your suggestion correctly, the issue with adding the HGNC name as a data property (i.e. a concrete string) is that it would require more maintenance than adding the gene names as concepts in SNOMED. This is because gene names and symbols change over time, and storing them as data properties would require updating every concept that uses the string, as opposed to updating only the gene concept.
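To make the maintenance argument concrete, here is a minimal Python sketch, assuming a simplified model in which each defining concept either stores the gene symbol as a concrete string or points to a single gene concept. The identifiers, symbols and the `GeneConcept` type are hypothetical placeholders, not SNOMED CT or HGNC content.

```python
from dataclasses import dataclass


@dataclass
class GeneConcept:
    """A single gene concept that carries the current symbol (concept-based design)."""
    concept_id: str
    symbol: str


def rename_in_strings(definitions: dict[str, str], old: str, new: str) -> int:
    """String-based design: each definition stores the symbol as a concrete string,
    so every definition holding the old string must be edited. Returns the edit count."""
    edits = 0
    for concept_id, symbol in definitions.items():
        if symbol == old:
            definitions[concept_id] = new  # replacing a value does not resize the dict
            edits += 1
    return edits


def rename_gene_concept(gene: GeneConcept, new: str) -> int:
    """Concept-based design: definitions reference the gene concept, so one edit suffices."""
    gene.symbol = new
    return 1


# Hypothetical placeholders, not real SNOMED CT or HGNC content.
string_defs = {"disorder-A": "OLDSYM", "observable-B": "OLDSYM", "finding-C": "OLDSYM"}
gene = GeneConcept("gene-X", "OLDSYM")
concept_defs = {"disorder-A": gene, "observable-B": gene, "finding-C": gene}

print(rename_in_strings(string_defs, "OLDSYM", "NEWSYM"))        # 3
print(rename_gene_concept(gene, "NEWSYM"))                       # 1
print(all(g.symbol == "NEWSYM" for g in concept_defs.values()))  # True
```

The number of edits in the string-based design grows with the number of concepts that use the gene, while the concept-based design stays at one edit per renamed gene.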
As for the scope expansion, our plan is to only add those genes necessary to define existing content (i.e. Disorders, Observables, etc.). There is already a recognized need for gene names in the LOINC and NPU extension work.

The other issue with using concrete string values is that it eliminates the benefit of linking the genes to their resident chromosomes, and any analytical benefits coming from that hierarchy.

Adding gene names based on modeling need seems tractable, as we are not adding variants or any other element from HUGO.
Happy to discuss further.

Hi,

The briefing note has been sent to the Head/Chair of Genomic Medicine Sweden, who then distributed it to all his academic and clinical associates who are experts in genetics. It was also sent to the National Advisory Group for Rare Diseases, the National Group for Diagnostic Medicine, and the Swedish Society for Pathologists. Members of the NPU pilot project also received the briefing note.

Comments and a concern were received from the National Advisory Group for Rare Diseases. In their opinion, the proposal sounds logical, and the argument put forth by SNOMED International is sound and well balanced. However, they too have concerns about concepts representing genomic/chromosomal aberrations, annotations, and the diseases (disorders) associated with those aberrations, as expressed earlier by Feikje. They would like to know how SNOMED International intends to resolve the issue, mainly out of curiosity. They would prefer that the HGNC ID be retained and connected to the respective SNOMED concept, rather than relying on the description (string) associated with the genetic concept, since disease/disorder names change over time.

The Regional National Quality Register for Cancer had other comments regarding representation of human genes in SNOMED CT.

The National Quality Registry for Cancer uses a national platform called INCA (Information network for Cancer Care), where all data from cancer care plans are gathered to improve patient care and for research purposes. They have a model that depicts the data being collected, e.g. Gene: BRCA1 (HGNC:1100), Changes: Mutation (the type of mutation is then specified separately), and Status: Detected/Present.

They were concerned that creating concepts such as 412734009 |BRCA1 gene mutation detected (finding)| would lead to an explosion of similar concepts representing all the findings related to all genetically associated disorders, which would not be practical to maintain. Their recommendation is that such concepts be modelled with a distinct separation between Gene structure (the entity), Observable entity (the entity being observed or measured), and Clinical finding (the result) to represent the mutation detected, improving granularity in documentation for precision medicine.

The committee for Genomic Medicine Sweden will hold their strategic meeting on the 9th of December. They will discuss their insights and questions about the briefing note among themselves, and will also let us know whether they would like a call with Jim Case.

In the working group for the NPU pilot study, we did not discuss the issue in depth, but one of the NPU experts mentioned that it seems logical to place gene concepts under Body structure.

The New Zealand National Release Centre, along with New Zealand’s CanShare team are in support of this proposal and eager for SNOMED International to consider the promotion of the gene structure concepts we have developed in the NZ extension.

Kind regards

Lea Miharsa

New Zealand National Release Centre

Hi Jim,

I agree that using a datatype property to specify the gene, copying it from HGNC, would be even worse. What I would prefer is to link the SNOMED disorder to the HGNC concept for the gene, using its unique URI (which I assume it has, being an ontology?). Or, as Keng-Lin suggests, we could use the HGNC-ID.

Your point about the link between gene and resident chromosome is certainly valid, and I see it’s required by the first use case listed in the briefing note. Referring to HGNC-ID would not support this link, but adding HGNC as a dependency would.

Adding another ontology as a dependency is a perfectly valid thing to do in OWL; in fact it is best practice, because it promotes re-use. Historically it has been impossible for SNOMED, but now that all axioms are maintained in OWL, is it still?

I realise that this would cause all kinds of problems and take years to implement. Clearly, any serious exploration of this idea should involve the MAG. Even so - let’s not underestimate the consequences of continuing the practice of copying all kinds of content into SNOMED.

So, for our next meeting, let’s discuss:

  • How important is the link between gene and chromosome? How vital are the use cases that depend on it?
  • What are the reasons for and against a major change on SNOMED that allows dependencies on external ontologies?
  • If we add SNOMED concepts for genes, how do we ensure it stays entirely in sync with HGNC?

This proposal is an improvement over the previous version but still has a number of issues.

Scope and intended utility.

It remains unclear what the purpose of the SNOMED CT gene representations will be.

The majority of the proposal emphasises their value in contributing to sufficient definitions of finding, disorder and observable entity concepts. However, the document also states that their addition will provide:

“…a structured list of clinically relevant genes names…that can be used as values in molecular testing…”

Surely this is a very different use case, (a) requiring a larger set of genes to be included in SNOMED CT than that required for defining content from other chapters and (b) providing users with a potentially unhelpful ‘choice’ of coding schemes for recording gene names as test results? It would be helpful for members making strategic design decisions if SI could make a clear statement of what SNOMED CT will not set out to support.

Specifics of the proposal.

‘Finding site’ vs. ‘Associated gene’.

The proposal states that “…adding Gene names as Body Structures…allows for the use of the existing FINDING SITE attribute for Clinical findings to represent the location and type of changes to the gene associated with a disorder…”

All well and good.

However in the appendix examples there are multiple uses of a new attribute named ‘associated gene’ between the same domain-range pair. Its inclusion seems to be introduced by the later “…will provide a mechanism to model genes associated with a disorder (will require a new attribute, See examples in Appendix 1)…” entry, but no explanation is given as to when each modelling approach should be used. The 412734009 |BRCA1 gene mutation detected (finding) | example uses ‘finding_site’ plus a gene value, whereas all the other examples use ‘associated_gene’ for gene values and ‘finding_site’ for macroscopic anatomy. How does this approach square with the earlier “…allows for the use of the existing FINDING SITE attribute…” quote?

As an aside, regarding the candidate name ‘associated gene’: whilst consistent with its ‘associated morphology’ neighbour, it seems to contradict the “Attributes should be named as verb senses…” clause of the editorial guide.

Chromosome banding part of the proposal.

The case for addition of concepts for chromosome bands is persuasive, however:

(1) If an automated approach is to be used, why limit the bands added? Surely it will be simpler to add the full set (appropriately associated with each new named chromosome arm concept) than to try to filter those added (and then extend them over time) according to which bands are actually referenced in the data. Plenty of macroscopic body structures sit in the data without yet being used in modelling, so why are chromosome bands any different?

The proposal states that it will only add bands “…to support specific disease concepts…” (citing 699305004 |1q21.1 microdeletion syndrome (disorder)| as an example). However, the ‘Diagrammatic representation of proposal’ diagram includes ‘Structure of chromosome band 7q34’. Whilst 7q34 is indeed the location of the BRAF gene, it is not currently named in any ‘specific disease concepts’ in SNOMED CT. If it is to be included, what is the editorial criterion for its inclusion?

(2) Many karyotype-named disorders do not have neat chromosome + arm + region + band abnormality patterns. Notably, many are named with band spans, such as 1228886008 |9q33.3q34.11 microdeletion syndrome (disorder)| and 880086001 |12q24.31-q24.32 deletion syndrome (disorder)|. Will these concepts be modelled with multiple chromosome band sites, or would the spans be modelled as individual concepts?
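The span notations quoted in this thread differ only in punctuation (with or without a hyphen, with or without the chromosome repeated), which matters if spans are ever stored or compared as values. A minimal Python sketch, assuming a deliberately narrow grammar (chromosome, arm, band, then an optional second arm and band) that covers only the forms quoted here and is not a full ISCN parser; `parse_span` and `BandSpan` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Narrow, assumed grammar: not a full ISCN parser.
SPAN_RE = re.compile(
    r"^(?P<chrom>1[0-9]|2[0-2]|[1-9]|X|Y)"                   # chromosome
    r"(?P<arm1>[pq])(?P<band1>\d+(?:\.\d+)?)"                # first arm + band
    r"(?:-?(?:(?P=chrom))?(?P<arm2>[pq])(?P<band2>\d+(?:\.\d+)?))?$"  # optional end of span
)


class BandSpan(NamedTuple):
    chromosome: str
    start: str            # arm + band, e.g. "q33.3"
    end: Optional[str]    # None when a single band is named

    def canonical(self) -> str:
        """Render one normalised form, e.g. '9q33.3-q34.11'."""
        if self.end is None:
            return f"{self.chromosome}{self.start}"
        return f"{self.chromosome}{self.start}-{self.end}"


def parse_span(text: str) -> BandSpan:
    """Parse a single band or band span, raising ValueError if unrecognised."""
    m = SPAN_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised band notation: {text!r}")
    start = m["arm1"] + m["band1"]
    end = m["arm2"] + m["band2"] if m["arm2"] else None
    return BandSpan(m["chrom"], start, end)


for label in ["9q33.3q34.11", "9q33.3-9q34.11", "12q24.31-q24.32", "7q34"]:
    print(label, "->", parse_span(label).canonical())
# 9q33.3q34.11 -> 9q33.3-q34.11
# 9q33.3-9q34.11 -> 9q33.3-q34.11
# 12q24.31-q24.32 -> 12q24.31-q24.32
# 7q34 -> 7q34
```

Normalising to one form means the two spellings of the 9q33.3 to 9q34.11 span compare as equal, whichever of the modelling options (multiple band sites or span concepts) is eventually chosen.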

(3) Assuming karyotype abnormality concepts are modelled with ‘chromosome band’ values, and gene abnormality concepts are modelled with gene values, is there not a risk - given the same or related morphologic abnormality values - that ‘gene abnormality disorders’ will classify as kinds of ‘band abnormality disorders’? If it does occur, this arrangement might be explained away by appeal to the ‘Logical versus vernacular’ argument (‘strange things happen’), but it really should be avoided. An individual with an isolated gene abnormality does not imply an instance of a syndrome caused by an abnormality of the wider surrounding chromosome band - if they are related at all, the relationship would be the other way around.

Perhaps there is a need to consider modelling karyotype abnormalities in terms of ‘entire named chromosome bands’ to avoid such false inferencing - in which case you’ll need to add these too.
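The false-subsumption risk can be illustrated with a toy structural check. The sketch below assumes SEP-style semantics, in which a part of X is treated as a kind of ‘Structure of X’ while ‘Entire X’ does not subsume its parts, and treats a disorder definition as a (finding site, morphology) pair. The hierarchy fragment and labels are hypothetical (only the placement of BRAF at 7q34 comes from the proposal), and the check is a simplification for illustration, not how the SNOMED CT OWL classifier actually works.

```python
from collections import deque

# Hypothetical is-a fragment (child -> direct parents). SEP-style assumption:
# a part of X (the BRAF gene) is a kind of "Structure of X"; "Entire X" is too.
BODY_IS_A = {
    "BRAF gene structure": {"Structure of chromosome band 7q34"},
    "Entire chromosome band 7q34": {"Structure of chromosome band 7q34"},
    "Structure of chromosome band 7q34": {"Structure of long arm of chromosome 7"},
    "Structure of long arm of chromosome 7": set(),
}
MORPH_IS_A = {"Deletion": {"Structural change"}, "Structural change": set()}


def ancestors_or_self(term: str, is_a: dict[str, set[str]]) -> set[str]:
    """All terms reachable from `term` via is-a, including `term` itself (breadth-first)."""
    seen = {term}
    queue = deque([term])
    while queue:
        for parent in is_a.get(queue.popleft(), set()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def subsumes(general: tuple[str, str], specific: tuple[str, str]) -> bool:
    """Toy structural subsumption over (finding site, morphology) definitions."""
    g_site, g_morph = general
    s_site, s_morph = specific
    return (g_site in ancestors_or_self(s_site, BODY_IS_A)
            and g_morph in ancestors_or_self(s_morph, MORPH_IS_A))


gene_disorder = ("BRAF gene structure", "Deletion")
band_disorder_structure = ("Structure of chromosome band 7q34", "Deletion")
band_disorder_entire = ("Entire chromosome band 7q34", "Deletion")

print(subsumes(band_disorder_structure, gene_disorder))  # True: the unwanted inference
print(subsumes(band_disorder_entire, gene_disorder))     # False
```

Under these assumptions, defining the band-level disorder with the ‘Structure of’ band concept makes the gene-level disorder a subtype, while the ‘Entire’ band concept blocks that inference.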

(4) It’s hard to explain in text, but it may be worth comparing the treatment of 70488008 | Chromosome pair 7 (cell structure) | in the overview diagram, the suggested modelling of 699308002 |Microdeletion of chromosome 15q24 (disorder)| in example 4, and the current classification of 699308002 |Microdeletion of chromosome 15q24 (disorder)| as a kind of 362984008 | Anomaly of chromosome pair (disorder) |.

In order for 699308002 |Microdeletion of chromosome 15q24 (disorder)| to retain its position as a kind of ‘anomaly of chromosome pair’ (which would seem desirable) it would appear to need two finding sites - one explicitly declaring the numbered chromosome pair involved and one declaring the specific chromosome band (unrelated to any ‘pair’ structure). Is this correct?

Process

The ‘proposed solution’ contains a number of changes that have already been enacted (e.g. inactivation of 91272006 |Chromosome (cell structure)|, exhaustive addition of ‘Chromosome N structure’ and ‘Entire chromosome N’ content). If these changes are part of the proposal and awaiting consultation feedback, why have they already been made?

Impact

The impact is framed entirely from a terminology developer perspective. Without adequate experimentation it is not possible to know the classification, data entry and analytic consequences of the changes proposed. We need to see the outcome of such experiments before we can understand the likely impact beyond the narrow terminology viewpoint.

Kind regards

Ed

@fhielkema

These are good questions and we shall discuss them. In the immediate term, however, ontology linkage is not an established feature in SNOMED, and waiting to include genes until this feature is available (if ever) would indefinitely postpone what many see as a valuable addition to the terminology.

@echeetham

Ed,

Thanks for your detailed and thorough analysis of the revised proposal. I am happy to see we are going in the right direction. Your comments have been very helpful in identifying areas of the proposal that require additional clarification and detail. I hope the responses below address the concerns you listed; the document will also be revised to reflect them.

1. Scope and intended utility

This proposal defines a structural representation for clinically relevant human gene names within SNOMED CT, solely to support the authoring, definition, and classification of SNOMED CT content such as Clinical findings, Disorders, and Observable entities. It is not intended to provide a comprehensive gene catalogue, to support primary recording or reporting of molecular test results, or to replace external gene nomenclature or reporting standards (e.g. HGNC), although it seeks to align with external genomic standards. The wording “Provide a structured list of clinically relevant gene names associated with their respective chromosomes that can be used as values in molecular testing” was misleading. It was intended to describe the use of gene names in the modeling and definition of gene-specific content in SNOMED CT, such as Observable entities, Clinical findings, and Disorders, and not to imply support for the direct recording or reporting of gene names as molecular test results. This will be corrected in the next version of the document. I hope this clarifies the scope.

2. Finding site vs. Associated gene

We acknowledge that the proposal does not clearly distinguish between the use of Finding site and the new “Associated gene” attribute, which results in ambiguity in the examples. Finding site is intended for use only when the specific chromosomal structure (i.e. arm or band) or gene itself is the locus of the abnormality (e.g. a mutation or deletion affecting the chromosome or gene structure). In other cases where a gene is referenced to indicate etiological involvement rather than the anatomical site of change, the gene should be modeled using the “associated gene” attribute. The proposal will be updated to make the editorial guidance regarding use of gene names explicit to ensure consistent modeling.
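A minimal sketch of the editorial rule just described, assuming each gene reference in a definition can be flagged as either the locus of change or an etiological association. ‘Associated gene’ is the provisional attribute name from the proposal (its final name is still under review), and `GeneReference`, `attribute_for`, `role_group` and the `GENEX` placeholder are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneReference:
    """A gene mentioned in a concept definition (hypothetical helper type)."""
    gene_structure: str
    is_locus_of_change: bool  # True when the mutation/deletion affects the gene itself


def attribute_for(ref: GeneReference) -> str:
    """Finding site when the gene is the locus of the abnormality;
    otherwise the provisional 'Associated gene' attribute."""
    return "Finding site" if ref.is_locus_of_change else "Associated gene"


def role_group(ref: GeneReference) -> dict[str, str]:
    """Return the single attribute-value pair contributed by this gene reference."""
    return {attribute_for(ref): ref.gene_structure}


mutation = GeneReference("BRCA1 gene structure", is_locus_of_change=True)
etiology = GeneReference("GENEX gene structure", is_locus_of_change=False)  # placeholder gene
print(role_group(mutation))  # {'Finding site': 'BRCA1 gene structure'}
print(role_group(etiology))  # {'Associated gene': 'GENEX gene structure'}
```

Keeping the decision in one place mirrors the aim of making the editorial guidance explicit, so that the same kind of gene reference is always modelled with the same attribute.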

Regarding the name of the recommended new attribute, the existing name in the proposal is an example only. We are reviewing attribute names in other terminologies to assist in assigning a semantically correct name to this attribute, along with its text definition. The new proposed name will conform (as you suggest) to the verb form, as well as being explicit about the nature of the relationship of the gene to the associated disorder. This will be included in the next version of the proposal.

3. Chromosome banding

3.1 Why not add all bands?

The proposal does not assert that unused chromosome band concepts are inappropriate. The initial implementation prioritizes the bands required to support existing international content and known use cases, with the expectation that additional bands may be added incrementally over time. This approach is consistent with how SNOMED CT has historically managed growth within the Anatomy and Morphology hierarchies. The International System for Human Cytogenomic Nomenclature (ISCN 2024) defines banding resolutions ranging from approximately 400 major bands to approximately 850 major bands and sub-bands. Not all of these are necessary to locate clinically relevant genes or to support karyotype-related findings within the scope of this proposal. Limiting the initial addition of chromosome band concepts to those required for defined use cases is a pragmatic editorial approach. If large-scale addition of bands is identified as a use case, SNOMED will investigate sources that can facilitate bulk inclusion of this content.


3.2 Band spans (e.g. 9q33.3–9q34.11)

SNOMED CT does not generally introduce explicit concepts for anatomical or structural ranges, particularly where the potential set of ranges is open-ended. In light of this, disorders named using chromosomal band ranges (e.g. 1228886008 |9q33.3-9q34.11 microdeletion syndrome (disorder)|) do not require the introduction of explicit “band range” structure concepts. These disorders can remain primitive and be modeled using the appropriate chromosome arm (e.g. long arm of chromosome 9), which correctly represents the structural locus at the level of granularity supported by the terminology. This approach is consistent with existing SNOMED CT editorial practice and avoids the need to introduce a potentially unbounded set of range concepts.
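Under this approach a band or band span is abstracted to the chromosome arm that contains it. A minimal Python sketch, assuming bands are given as arm plus band number (e.g. "q33.3") and that a span crossing the centromere (p to q) cannot be reduced to a single arm; the function name `arm_for_span` and the label pattern it returns are hypothetical, not SNOMED CT descriptions.

```python
from typing import Optional

ARM_NAMES = {"p": "short arm", "q": "long arm"}


def arm_for_span(chromosome: str, start: str, end: Optional[str] = None) -> Optional[str]:
    """Return an arm-level label for a band or band span on one chromosome.

    `start` and `end` are arm + band strings such as "q33.3". Returns None when
    the span runs from one arm to the other, since no single arm contains it.
    """
    arms = {start[0]} if end is None else {start[0], end[0]}
    if not arms.issubset(ARM_NAMES):
        raise ValueError(f"unexpected arm in {start!r}/{end!r}")
    if len(arms) != 1:
        return None
    return f"Structure of {ARM_NAMES[arms.pop()]} of chromosome {chromosome}"


print(arm_for_span("9", "q33.3", "q34.11"))    # Structure of long arm of chromosome 9
print(arm_for_span("12", "q24.31", "q24.32"))  # Structure of long arm of chromosome 12
print(arm_for_span("4", "p16.3", "q12"))       # None (hypothetical span across the centromere)
```

The band-level detail (for example q33.3 to q34.11) is not carried into the returned value, which is the trade-off this approach accepts in exchange for avoiding an open-ended set of range concepts.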

3.3 Risk of false subsumption: gene abnormality <> band abnormality

We agree that inappropriate subsumption between gene abnormality disorders and karyotype abnormality disorders must be avoided. This is addressed by modeling gene abnormalities using gene structures as the locus of change, while karyotype abnormalities reference chromosome- or band-level structures, typically in combination with the affected chromosome pair. This approach maintains a clear separation of concerns between gene-level and chromosome-level abnormalities, preserving the intended inference direction and aligning with existing SNOMED CT modeling practice. This, of course, will be part of the review of the modeling impact.

3.4 Chromosome pairs vs. bands vs. current classification

The Appendix 1 example for Microdeletion of chromosome 15q24 (disorder) inadvertently omits the chromosome pair role group that is present in current international modeling of karyotype abnormalities. This will be corrected in the next version of the document to align the example with existing SNOMED CT content patterns. The omission does not reflect a change in modeling approach, nor does it imply that chromosome pair involvement would be lost under the proposed structure.

4. Process - Changes already enacted

Updates to chromosomal anatomy, including the addition of specific chromosome structures and chromosome arms, have already been implemented to address longstanding gaps in the body structure hierarchy. These changes were independently justified to improve modeling of existing content and are not contingent on the outcome of this proposal. They also establish a structural foundation for the potential addition of chromosome bands and gene names, which remain subject to consultation. The document will be updated to reflect the completion of the affected elements of the proposal.


5. Impact beyond terminology development, i.e. testing

This proposal is necessarily focused on the authoring and modeling of gene and chromosomal content, as establishing a coherent and semantically sound structure is a prerequisite for evaluating downstream use. Consideration of classification behavior, data entry, analytics, and interoperability occurs once proposed content has been modeled and tested, and forms part of established SNOMED CT quality assurance and release processes. This evaluation will also involve any existing Clinical finding, Disorder, or Observable entity concepts that are remodeled to use the new gene and chromosome band structures. It is also recognized that the absence of gene concepts in SNOMED CT has already limited the quality and precision of modeling in these areas, which itself represents a downstream impact. Following this review, any identified issues will be addressed prior to content promotion to the International release.

Dear Jim

Thank you for this response. The statement regarding gene representation is welcome and makes much clearer which use cases/requirements SNOMED CT will be able and unable to support.

I continue to disagree with your stance on band representation! Your statement that “…The initial implementation prioritizes those bands required to support existing international content and known use cases…” and that “…this approach is consistent with how SNOMED CT has historically managed growth within the Anatomy and Morphology hierarchies…” is not, to my mind, supported by the data. By my calculations, the proportion of body structure concepts used in modelling was 16% (4300/26000) in 2002 and had gradually risen to 23% (8400/36000) by 2024. The proportions are higher for morphology values (28% rising to 51%) but still nowhere near 100%.

I looked closely at karyotype band usage when preparing my ePoster for the 2025 Expo (‘An investigation into the use of chromosome band nomenclature in SNOMED CT disorder concepts’), so I have quite good figures. These suggest that between 200 and 250 unique bands (the higher number includes band spans) can be identified in clinical finding descriptions and definition text. Compared with the 850 high-resolution bands (I used the data from here to locate disorders on ideogram.js in the ePoster), this suggests that somewhere between 23% and 30% of bands may well already be of value for modelling, a proportion comparable to that of body structures historically. I would therefore still strongly suggest that a complete set of long and short chromosome arm concepts be added, and that SI then perform a one-time automated process to associate each 850-resolution band with its respective arm. That way the work is done, and bands can be referenced as needed now and in the future.
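The proportions quoted in this post can be checked directly from the stated counts. A minimal Python sketch; the counts are the approximate figures given above and are not independently sourced, and `share` is a hypothetical helper name.

```python
def share(used: int, total: int) -> float:
    """Percentage of a set that is used, rounded to one decimal place."""
    return round(100 * used / total, 1)


print(share(4300, 26000))  # body structures used in modelling, 2002 -> 16.5
print(share(8400, 36000))  # body structures used in modelling, 2024 -> 23.3
print(share(200, 850))     # bands named in disorders, lower estimate -> 23.5
print(share(250, 850))     # bands named in disorders, higher estimate (includes spans) -> 29.4
```

The two band estimates come out at roughly 23.5% and 29.4%, in line with the 23% to 30% range quoted and overlapping the historical body structure figures.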

I would also suggest that the ‘band spans’ challenge is given a little more thought. I’m not particularly wedded to a concept-by-concept representation of band spans, but simply abstracting away to the chromosome arm may be unnecessarily lossy. I would also point out that a proportion of gene loci are given as spans in the HGNC data. For example:

  • The concept 1187194006 |Chronic enteropathy associated with solute carrier organic anion transporter family member 2A1 gene (disorder)| has a synonym of |Chronic enteropathy associated with SLCO2A1 gene|
    The tables here give gene symbol SLCO2A1 a location of 3q22.1-q22.2
  • Likewise, the concept 1230376005 |Contactin associated protein 2-related developmental and epileptic encephalopathy (disorder)| has a synonym of |CNTNAP2-related developmental and epileptic encephalopathy|
    The HGNC tables give gene symbol CNTNAP2 a location of 7q35-q36.1 (this same span is included in the Orphanet derived text definition for 1230376005).

I’m not sure I understand the approach you describe in section 3.3 to avoid false subsumption - hopefully this will become clearer with worked examples.

Finally I welcome all the planned analysis including “…consideration of classification behavior, data entry, analytics, and interoperability…” steps prior to content promotion to the International release. This will be important for us to understand the practical impact of the proposed changes.

Kind regards - Ed

Thanks, Ed — I appreciate you sharing the data from your Expo work and the detailed analysis of band usage, as well as your analysis of both anatomy and morphology concept usage. That additional evidence is helpful and reinforces that chromosome bands already play a meaningful role in existing SNOMED content.

To clarify, the p- and q-arm level concepts are already in the process of being added. The remaining open questions relate to the scope and timing of broader band inclusion, including the handling of band spans and the absence of a comprehensive, license-free authoritative source that could support automated creation of band concepts. These areas warrant further investigation, and your suggestions provide useful input to that work.

In the meantime, I think it remains important that we proceed with the agreed elements of the proposal, including ongoing analysis of classification behavior, data entry, analytics, and interoperability with respect to the addition of gene names and chromosomal abnormalities. Resolution of the band issues can proceed in parallel, and selected bands can of course be added immediately. This is more a resource issue than a disagreement about the value of adding all the bands.

The initial scope of the proposal was focused on the addition of gene names, but it has since (necessarily) expanded to include chromosomal aspects in general, which has greatly enlarged the scope. This will require a phased approach, which will be communicated through the normal channels, such as additional briefing notes and early visibility notifications.
