Standardised Set of Allowed Characters in SNOMED CT Descriptions

Hi @pwilliams,

This looks really good.

The only range that may be too restrictive to cover LOINC, UCUM, and the SNOMED organism hierarchy would be:

  • U+0370–U+03FF DISALLOW # Greek and Coptic
    ** U+03B1 (e.g., α-carbon, α-helix for chemical names)
    ** U+03B2 (e.g., β-lactam for organisms)
    ** U+03BC (e.g., μ, the micro prefix for units of measure in UCUM)
    ** U+03A9 (e.g., Ω, the ohm as a unit of measure in UCUM)

There are probably more, but these are some examples.
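
To make the suggestion concrete, here is a rough Python sketch of a block-level disallow rule with per-character exceptions. The names (`GREEK_AND_COPTIC`, `ALLOWED_GREEK`, `disallowed_characters`, `describe`) are my own and hypothetical, the exception set contains only the four examples above, and I am not assuming this is how the actual validation is implemented.

```python
import unicodedata

# Assumption: the rule disallows the whole Greek and Coptic block (U+0370-U+03FF).
GREEK_AND_COPTIC = range(0x0370, 0x0400)

# Assumption: exceptions limited to the four examples listed above.
# Note that U+00B5 MICRO SIGN is a separate code point outside this block.
ALLOWED_GREEK = {0x03B1, 0x03B2, 0x03BC, 0x03A9}


def disallowed_characters(term: str) -> list[str]:
    """Return the characters in `term` that this sketch rule would reject."""
    return [
        ch for ch in term
        if ord(ch) in GREEK_AND_COPTIC and ord(ch) not in ALLOWED_GREEK
    ]


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' for a validation report."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"


if __name__ == "__main__":
    for term in ["β-lactam", "γ-globulin"]:
        rejected = [describe(ch) for ch in disallowed_characters(term)]
        print(term, "->", rejected or "ok")
```

With these assumptions "β-lactam" passes, while "γ-globulin" is rejected because U+03B3 is not in the exception set, which is the kind of gap I would expect to turn up with more examples.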

The only other question, which may not be valid, is whether we need specific allow rules for Unicode characters used by East Asian languages that are encoded as UTF-8, or whether that validation is handled separately. It may be better handled in association with a specific language refset, but I don’t know the intricacies of the back-end system design.

Thanks
John