Homoglyph Domains
Domains that substitute visually identical or near-identical Unicode characters for Latin letters to impersonate legitimate domains while appearing genuine to end users.
Definition
A homoglyph domain exploits the visual similarity between characters from different Unicode scripts to register a domain that looks identical or nearly identical to a trusted domain in most fonts. A common example is substituting the Cyrillic small letter a (U+0430) for the Latin small letter a (U+0061), making a domain like pаypal.com (Cyrillic a) indistinguishable from paypal.com in many email clients and browsers. These domains are technically distinct at the DNS level but visually deceptive to human readers.
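The difference described above can be made visible by inspecting code points. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not taken from any particular tool.

```python
import unicodedata

legit = "paypal.com"
lookalike = "p\u0430ypal.com"  # second character is U+0430, not U+0061

# The two strings render almost identically but are not equal.
print(legit == lookalike)  # False

# Listing each code point in the first label exposes the substituted character.
for ch in lookalike.split(".")[0]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# DNS carries the ASCII-compatible (Punycode) form of the label, prefixed "xn--".
ace_label = "xn--" + lookalike.split(".")[0].encode("punycode").decode("ascii")
print(ace_label + ".com")
```

The Punycode form is what is actually registered and resolved, which is why the two names are distinct at the DNS level even when they look the same on screen.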
Why It Matters
Homoglyph domains are a primary technique in spear-phishing and business email compromise campaigns because they bypass naive string-comparison filters while reliably deceiving users. Standard authentication controls like SPF and DKIM will pass for a homoglyph domain if the attacker has correctly configured DNS records for the lookalike domain they registered, because these controls verify that mail is authorized by the sending domain, not that the sending domain is the brand it resembles. The email therefore appears both visually legitimate and technically authenticated.
How It Works
Attackers identify characters in Unicode that are visually confusable with each character in the target domain, a set formally defined by the Unicode Consortium in UTS #39 (Unicode Security Mechanisms). They register the resulting domain, configure mail infrastructure, and often obtain a TLS certificate, since domain-validated issuance typically confirms control of the domain rather than checking its visual similarity to other names. Detection starts by decoding any Punycode labels (the xn-- encoding used for internationalized domain names, or IDNs), because many email clients display the decoded Unicode form. The decoded domain is then converted to Unicode Normalization Form D (NFD), each character is mapped to its prototype using the UTS #39 confusables tables to produce a skeleton, and the skeleton is compared against the skeletons of known brand domains. Edit-distance algorithms such as Damerau-Levenshtein are applied on top of this to catch near-miss variants, such as transposed, inserted, or omitted characters, that confusable mapping alone may not surface.
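The detection steps above can be sketched as follows. This is an illustrative approximation, not the implementation of any specific product: the CONFUSABLES table is a hypothetical, deliberately tiny subset (a real detector would load the full confusables data published with UTS #39), the function names and the max_distance threshold are assumptions, and the distance function is the optimal string alignment variant of Damerau-Levenshtein.

```python
import unicodedata

# Hypothetical mini-table; real detectors load the full UTS #39 confusables data.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}

def decode_label(label):
    """Decode an xn-- (Punycode) label to Unicode; return other labels lowercased."""
    label = label.lower()
    if label.startswith("xn--"):
        try:
            return label[4:].encode("ascii").decode("punycode")
        except UnicodeError:
            return label  # malformed Punycode: leave the label as-is
    return label

def skeleton(domain):
    """Approximate a UTS #39 skeleton: decode, NFD, map confusables, NFD again."""
    text = ".".join(decode_label(part) for part in domain.split("."))
    text = unicodedata.normalize("NFD", text)
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFD", mapped)

def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

def check(domain, brands, max_distance=1):
    """Return (verdict, brand) for the first brand the domain resembles."""
    candidate = skeleton(domain)
    for brand in brands:
        if domain.lower() == brand:
            continue  # the genuine domain itself is not a lookalike
        target = skeleton(brand)
        if candidate == target:
            return ("homoglyph", brand)
        if osa_distance(candidate, target) <= max_distance:
            return ("near-match", brand)
    return ("clean", None)

print(check("p\u0430ypal.com", ["paypal.com"]))
print(check("papyal.com", ["paypal.com"]))
```

Comparing skeleton to skeleton, rather than skeleton to the raw brand string, matters once the full confusables data is loaded, because that data also remaps some ASCII characters. The distance threshold is a tuning choice: a higher value catches more variants at the cost of more false positives on short domains.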
DFIR Platform
Phishing Email Checker
The Phishing Email Checker applies Unicode confusable character mapping per UTS #39 combined with Damerau-Levenshtein distance scoring to detect homoglyph and near-homoglyph sender domains.