Background
The Y-chromosome haplogroup R includes three major lineages, R1a and R1b (the two branches of R1) and R2, which have been pivotal in the prehistory of Eurasia. Haplogroup R1a is today found at high frequencies in Eastern Europe, Central Asia, and South Asia. Haplogroup R1b is most frequent in Western Europe but is also present at lower levels across Asia, including South Asia. Within these broad clades, specific sub-lineages are informative about ancient migrations. This report focuses on three sub-clades with relevance to the Indian subcontinent’s paternal genetic landscape: R1a-Z93, R1b-Z2103, and R1b-PH155. R1a-Z93 is the Asian branch of R1a associated with Indo-Iranian (Indo-European) populations, while R1b-Z2103 is a branch of R1b linked to Bronze Age steppe herders (e.g. Yamnaya). R1b-PH155 represents a rare, archaic branch of R1b with a scattered presence in South-Central Asia. Understanding the split times, origins, and distribution of these haplogroups can shed light on migration patterns during the Neolithic and Bronze Age, particularly the spread of Indo-European peoples into or out of the Indian subcontinent and their connections with Iranian, Central Asian, and even Arabian populations.
R1a-Z93: The Indo-Iranian Branch of R1a
Origin and Split Times: R1a-Z93 is a major subclade of R1a-M417 that dominates in Central and South Asia. Genetic evidence indicates that R1a-M417 split into two daughter branches, European R1a-Z282 and Asian R1a-Z93, approximately 5,800 years ago (roughly 3800 BCE). This divergence likely occurred in the general region of Iran or Eastern Anatolia, according to Underhill et al. (2014). The coalescence of R1a-Z93 lineages (i.e. the time to the most recent common ancestor within Z93) is estimated at around 2600 BCE, suggesting that the surviving Z93 lineages began expanding in the Early Bronze Age. Poznik et al. (2016) found evidence of a “striking expansion” in the two most prominent R1a branches in India: R1a-Z93 → Y7 between ~2400 and 2200 BCE, and R1a-Z93 → Y6 around 2000 BCE, during the mature phase of the Indus Valley Civilization.
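The conversion between "years ago" and BCE dates used in this section can be sketched as follows. This is an illustrative helper only, not taken from any of the cited studies; the function name and the choice of 2000 CE as the rounded reference "present" are assumptions.

```python
def years_ago_to_bce(years_ago: int, present_ce: int = 2000) -> int:
    """Convert an age in years before `present_ce` into a BCE year.

    Assumption: 2000 CE is used as a rounded "present". The estimates in
    the text are approximate, so the missing year zero is ignored.
    """
    bce = years_ago - present_ce
    if bce <= 0:
        raise ValueError("date falls in the Common Era")
    return bce


# The M417 split quoted in the text: ~5,800 years ago.
print(years_ago_to_bce(5800))  # 3800
```

With a different reference year (for example 1950 CE, as in radiocarbon conventions) the result shifts by a few decades, which is well inside the uncertainty of such estimates.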
Migration Patterns and Historical Context: R1a-Z93 is strongly associated with the Indo-Aryan and Iranian peoples; ancient DNA from the Sintashta and Andronovo cultures of the Central Steppe (Middle-Late Bronze Age, ~2100–1500 BCE) shows that R1a-Z93 was the dominant male lineage of these Indo-European steppe pastoralists. In one large ancient DNA study, all 44 male individuals from the Middle-Late Bronze Age Central Steppe carried R1a of the Z93 subtype. These steppe groups are widely considered related to Indo-Iranian language speakers. R1a-Z93 has been sampled in ancient individuals in the northern Indian subcontinent – for example, two males from the Iron Age Swat Valley carried R1a-Z93, demonstrating the presence of this lineage in South Asia long before the modern era.
Connections to Other Eurasian Populations: R1a-Z93 is essentially the Y-DNA hallmark of the Indo-Iranian (Aryan) people. It connects South Asians with Central Asians and Eastern Europeans through a common ancestral node (R1a-M417). The sister branch R1a-Z282 took hold in Europe (e.g. among Slavic and Balto-Slavic peoples), whereas R1a-Z93 remained in the Indo-Iranian world. This means Indian and Pakistani R1a lineages share deep ancestry (~5,000–6,000 years ago) with R1a lineages in Eastern Europe. In ancient Iran and the Steppe, R1a-Z93 was also found among Eastern Iranian/Indo-Aryan Scythian and Sarmatian nomads of the Iron Age. Interestingly, Arab populations also carry R1a; these lineages often share splits with Indian carriers of R1a dating to within the past 4,000 years. Thus, R1a-Z93’s distribution today links the Indian subcontinent, Iranians, and Central Asians, while showing a smaller yet significant presence in the Arab world through shared sub-clades with Indian carriers of R1a.
Modern Distribution in the Indian Subcontinent: R1a as a whole is one of the dominant Y-haplogroups in India and Pakistan, and most R1a in this region falls within the Z93 branch. It is prevalent among both Indo-European language-speaking and Dravidian-speaking groups. For instance, surveys have found R1a frequencies of 30–70% among North Indian populations such as Punjabis and certain Brahmin groups. Some caste and regional groups (e.g. Khatris, certain Gujaratis) show R1a frequencies well above 60%. In Pakistan and Afghanistan, too, R1a-Z93 is common among ethnic groups like Pashtuns, Punjabis, Kalash, and Tajiks. Notably, one South Asian-specific subbranch, R1a-M780 (under Z93), reaches high frequencies in India, Pakistan, and the Himalayas. By contrast, outside the Indo-Iranian sphere, R1a-Z93 frequency drops: it appears at moderate levels in parts of Central Asia (e.g. ~30–50% in Kyrgyz and Tajiks) and is lower in the Near East. This distribution highlights the Indian subcontinent and South Central Asia as a core zone for R1a-Z93 today. Importantly, R1a-Z93* lineages also show a significant presence in the Indian subcontinent and its immediate neighbourhood; this observation implies that the region is not necessarily a recipient of inflows from Z2123-derived Sintashta lineages and could actually be the reservoir area from which Z93 lineages spread into the steppes and other parts of Western Eurasia.
R1b-Z2103: A Steppe Bronze Age Lineage in the Periphery of South Asia
Origin and Split Times: R1b-Z2103 is a major subclade of haplogroup R1b-M269, famously associated with Bronze Age steppe nomads. It is also known by the marker R-CTS1078 and is one of the two main branches of R1b-L23 (the other being R1b-L51, which gave rise to Western European lineages). According to YFull data, R1b-Z2103 formed around 4100 BCE. In other words, the initial diversification of Z2103 occurred in the Late Neolithic to Early Bronze Age. Ancient DNA indicates that the Yamna (Yamnaya) culture of the Pontic Steppe (ca. 3300–2500 BCE), widely believed to be Proto-Indo-European speakers, carried predominantly R1b-L23 Y-chromosomes. Many Yamnaya males belonged to the Z2103 branch. Genetic chronology suggests that R1b-Z2103 diverged from the Western European–bound branch (R1b-L51) early; one estimate places the split of these R1b lineages at no later than ~3000–2800 BCE. By the early Bronze Age, R1b-Z2103 had already split into sub-lineages. YFull’s 95% confidence interval for Z2103’s formation does not extend earlier than 4800 BCE, indicating a rapid emergence during the Copper-Bronze transition.
Migration Patterns and Historical Implications: R1b-Z2103 was carried by the massive Bronze Age migrations westward out of the steppe that shaped Western Eurasia. While its sibling branch (R1b-L51/P312/U106) swept into Western Europe (e.g. with the Bell Beaker culture), R1b-Z2103 spread in the direction of the Caucasus Mountains and the Pontic Steppes. Ancient individuals from the Yamnaya horizon across the steppe, from Ukraine to the lower Volga and the Caucasus foothills, commonly have R1b-Z2103. From the steppe, this lineage expanded into the Balkans and Anatolia. For example, R1b-Z2103 is found in Bronze Age Europe in contexts like the Vučedol culture (~2900–2600 BCE). It also appears in ancient Anatolia/Armenia: a Late Bronze Age Armenian (Allentoft et al. 2015) and an Iron Age Armenian from Kapan (~1000 BCE) carried R1b-Z2103. These suggest that Z2103 accompanied Indo-European migrations into the Balkans and Anatolia (possibly associated with Thracians, Hittites, or Armenians). Notably, a man from Iron Age northern Iran (Hasanlu, ~9th century BCE) belonged to R1b-Z2103 > L584, indicating this lineage was also present in the Iran/Caucasus region. Thus, R1b-Z2103 likely traveled into the Caucasus via peoples like the Mitanni Indo-Iranians.
Genetic Connections: R1b-Z2103 connects South Asia’s gene pool to the Pontic Steppe herders and to populations of the Caucasus and Anatolia. Modern populations of the Caucasus (e.g. Armenians, Azeris) have substantial frequencies of R1b-L23/Z2103 (often 20–30%), reflecting the legacy of Bronze Age migrations in that region. Iranians today also carry R1b-L23 at frequencies around 4–10%, higher in the north-west (e.g. 23% in Azerbaijan province) and lower in southeast Iran. In Central Asia, R1b-Z2103 is found among some groups, but more common R1b lineages there are R1b-M73 (e.g. among Kazakhs) – a different clade. Arab populations generally have low frequencies of R1b-M269; the predominant haplogroups in Arabia are J and E. Some R1b in the Middle East (including Z2103) appears among Levantine or Gulf Arabs, likely due to ancient Near Eastern gene flow or recent admixture. For example, Armenian and Anatolian R1b lineages (Z2103) might have spread into Syria/Mesopotamia, but overall R1b-Z2103 is not a defining marker of Arab lineages.
Presence in the Indian Subcontinent: R1b is relatively uncommon in South Asia; haplogroup surveys indicate R1b frequencies of around 1–5% in India and Pakistan overall. It is somewhat more frequent in the northwest (e.g. among Pashtuns or Baloch it can reach a few percent). The R1b lineages from the Z2103 branch present in the Indian subcontinent show greater SNP diversity overall than those of the ancient Yamnaya people, and this observation raises the possibility that the Yamnaya herders were spreading westward from the IVC with Bos indicus cattle. One branch of R1b-Z2103, defined by marker BY611, has been observed in a few individuals in Pakistan; this same subclade is common in the Caucasus and was carried by Yamnaya, suggesting an ancient connection. In summary, R1b-Z2103’s footprint in South Asia is significant, with greater SNP diversity than the Yamnaya Samara samples, and it serves as a reminder that the Indian subcontinent may have served as the reservoir for the expansion of both R1 paternal lineages, R1a and R1b.
R1b-PH155: An Archaic R1b Lineage in South-Central Asia
Origin and Divergence: R1b-PH155 is a very early branching subclade of haplogroup R1b (also known in some literature as R1b1b, or R1b2 in older nomenclature). It is distinct from the dominant R1b lineages (like M269) which colonized Europe. In fact, R1b-PH155 represents a sister branch to R1b-L754 (R1b1a) – the lineage that includes all the common R1b in Europe and West Asia. This means that R1b-PH155 split off from the rest of R1b very early in the Upper Paleolithic or Mesolithic. While precise dating is difficult due to its rarity, haplogroup R1b as a whole likely arose ~18,000–14,000 BCE, and R1b-PH155 “arose relatively soon after the emergence of R1b”. In other words, R1b-PH155 could be on the order of ~14,000 years old or more, making it a relic lineage. It remained extremely rare throughout history, which complicates pinpointing its geographic origin – it could have originated anywhere in Eurasia. The distribution of early R1b suggests a West Asian origin for R1b; thus PH155 might have originated in some part of Central Asia/South Central Asia as small hunter-gatherer bands. Because it is so divergent, the “split time” between R1b-PH155 and the main R1b lineages is essentially the split of the R1b tree itself.
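The sister-branch relationships described in this report can be made concrete with a small child-to-parent table and a most-recent-common-ancestor lookup. This is a minimal sketch restricted to clades named in the report; the table name, the function names, and the omission of intermediate nodes between L754 and M269 are simplifying assumptions, not a full phylogeny.

```python
# Child -> parent links, limited to clades named in this report.
# Assumption: intermediate nodes between L754 and M269 are omitted.
PARENT = {
    "PH155": "R1b",
    "L754": "R1b",
    "M269": "L754",
    "L23": "M269",
    "Z2103": "L23",
    "L51": "L23",
}


def ancestors(clade):
    """Return [clade, parent, grandparent, ...] up to the root."""
    chain = [clade]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def mrca(a, b):
    """Most recent common ancestor of two clades in PARENT."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    return None


print(mrca("PH155", "Z2103"))  # R1b
print(mrca("Z2103", "L51"))    # L23
```

The first lookup illustrates the point made above: any comparison between PH155 and an M269-derived lineage goes back to the root of R1b itself, whereas Z2103 and L51 meet much later, at L23.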
Bronze Age and Neolithic Presence: Despite its rarity, R1b-PH155 has been detected in notable ancient remains, providing clues to its migrations. The most remarkable finds come from the Tarim Basin in Xinjiang (western China). Three Early Bronze Age mummies from the Tarim Basin (around 2000 BCE in the Xiaohe cemetery) were found to carry Y-haplogroup R1b belonging to the PH155 clade. These mummies were of the so-called Tarim or Tocharian culture – a population with Western Eurasian features that appeared in Xinjiang in the Bronze Age. The discovery of R1b-PH155 in Tarim suggests that this lineage had an ancient presence in the eastern fringes of the Eurasian world. The Tarim Basin individuals had no close affiliation with the mainstream R1b of the steppe (they were not R1b-M269); instead they represent a survival of an ancient Eurasian lineage in a genetically isolated population. This has led researchers to propose that R1b-PH155 could have been part of an “Ancient North Eurasian” gene pool that persisted in Central Asia since the Mesolithic. It is possible that carriers of PH155 were initially in South Central Asia or Siberia and got absorbed or isolated by expanding populations. The Tarim evidence firmly places R1b-PH155 in Central Asia by the Bronze Age, meaning it split from other Eurasian groups long before (thousands of years prior) the main Indo-European expansion streams. By contrast, PH155 has not yet been identified in Neolithic West Asia or Europe – implying its trajectory was different from R1b-M269.
Historical and Migration Implications: The presence of R1b-PH155 in the Tarim Basin raises intriguing historical questions. The Tarim mummies’ culture has often been linked to the proto-Tocharians (an Indo-European branch in Bronze Age Xinjiang). If those individuals were Tocharian speakers, it means an extremely divergent R1b lineage became part of an Indo-European population by the Bronze Age. Whether or not they were, R1b-PH155’s presence in the Tarim Basin may indicate a migration northward across the Indian Himalayas in prehistory. It could have moved with hunter-gatherers or early Bos indicus pastoralists traversing the Himalayas.
Links to Indo-Europeans, Iranians, and Arabs: Unlike R1a-Z93 and R1b-Z2103, R1b-PH155 is not strongly tied to the main Indo-European expansions – it represents a parallel thread. Nevertheless, there are some connections: the Tarim Basin carriers of PH155 might have interacted with Indo-European (Tocharian) cultures in Xinjiang. Additionally, living descendants of PH155 have been found in regions influenced by Indo-Europeans. For example, Tajikistan (an Iranian-speaking Central Asian population) has some PH155 cases, and Ladakh in India (Indo-Aryan/Tibeto-Burman mix) also has reported PH155 carriers. This suggests that PH155 carriers were present in the Himalayan zone and may have spread to Xinjiang via Tajikistan. In the Iranian plateau and the Near East, R1b-PH155 is extremely rare but notably it has been detected in Bahrain (Dilmun) in the Persian Gulf. The finding in Bahrain is fascinating – Bahrain was an ancient trading center between Mesopotamia, Arabia, and the Indus Valley. A genetic study in the Gulf observed haplogroup R1b-PH155 in Bahrain alongside the dominant branch R1b-L754. One branch of PH155, R1b-PH200, is found in Bahrain and Turkey today, hinting that this lineage perhaps moved along trade or migration routes from the IVC to West Asia. Among Arabs, PH155 is virtually absent except for such isolated cases (e.g. likely introduced via Persian Gulf intermixing). Among Iranians, PH155 is not a common marker either; the predominant R clades in Iran are R1a-Z93 (in the east) and R1b-M269 (in the northwest), with PH155 being a curiosity if present at all.
Modern Distribution in South Asia: Though rare, R1b-PH155 has been confirmed in a few individuals in and around the Indian subcontinent. YFull data and other analyses have identified living PH155* subclades in Bhutan, Ladakh (India), Nepal, and adjacent areas. These Himalayan and Trans-Himalayan occurrences are particularly intriguing: the lineage has likely persisted in the subcontinent since prehistoric times (and possibly migrated out via early trans-Himalayan migrations or forays into Central Asia).
R2-M479: The Forgotten Cousin of Hg R1
Emerging as one of the two primary branches of haplogroup R (alongside R1), R2 exhibits a pronounced concentration in the Indian subcontinent, with its subclades R2a (M124) and R2b (FGC21706) offering insights into ancient migrations, cultural admixture, and population stratification. Genetic studies position R2’s origin in northern India, predating major demographic events such as the emergence of Steppe herders and Neolithic expansions. This lineage reaches its highest frequencies among specific ethnic groups in Pakistan, India, and Sri Lanka, with pockets of distribution extending into Central Asia and the Middle East. Its prevalence among diverse caste and tribal populations underscores its deep-rooted connection to the subcontinent’s prehistory, while its structural complexity reflects millennia of microevolutionary processes.
Genetic architecture and phylogenetic structure: Hg R2 is defined by the single-nucleotide polymorphism (SNP) M479, which distinguishes it from its sister clade R1. Phylogenetically, R2 splits into two primary branches: R2a (M124) and R2b (FGC21706). The R2a lineage is further subdivided into R2a1 (L263) and R2a2 (P267), while R2b includes subclades such as R2b1 (FGC50339). These subdivisions are critical for tracing population movements, as R2a dominates in South Asia, whereas R2b, identified more recently, shows a narrower distribution concentrated in the Indian subcontinent. The discovery of R2b highlights the need for expanded genetic testing in understudied regions, as earlier studies often categorized R2(xR2a) lineages as “undifferentiated” due to methodological limitations.
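The branching just described can be written down as a small parent-to-children table, with a helper that returns the path from the root to any named subclade. This is only a sketch of the structure as stated in the text; the table and function names are illustrative, and no branches beyond those named above are included.

```python
# Parent -> children mapping, restricted to the branches named in the text.
R2_TREE = {
    "R2 (M479)": ["R2a (M124)", "R2b (FGC21706)"],
    "R2a (M124)": ["R2a1 (L263)", "R2a2 (P267)"],
    "R2b (FGC21706)": ["R2b1 (FGC50339)"],
}


def path_to(target, tree, root="R2 (M479)"):
    """Return the list of clades from root down to target, or None if absent."""
    if root == target:
        return [root]
    for child in tree.get(root, []):
        sub = path_to(target, tree, child)
        if sub is not None:
            return [root] + sub
    return None


print(" > ".join(path_to("R2a2 (P267)", R2_TREE)))
# R2 (M479) > R2a (M124) > R2a2 (P267)
```

A lineage reported only as R2(xR2a) in older studies would sit somewhere under the root but outside the R2a subtree, which is why such samples could not be resolved further before R2b was defined.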
Evolutionary timeline: The coalescence age of Hg R2 remains debated, but its deep presence in South Asia suggests an origin more than 10,000–15,000 years ago. Sengupta et al. (2005) note that the microsatellite variance associated with R2 in India parallels that of R1a1, hinting at overlapping temporal depths for both lineages. Mondal et al. (2017) explicitly argue for an autochthonous origin in northern India, with R2 diverging from ancestral R-M207 before the Holocene. This timeline aligns with archaeological evidence of early sedentism in the Indus Valley and Gangetic plains, implying that R2 carriers may have been among the region’s earliest agricultural communities.
Autochthonous origins and prehistoric context: The case for R2’s indigenous South Asian origin rests on multiple lines of evidence. First, its frequency declines sharply outside the subcontinent: 10–15% in India and Sri Lanka, 7–8% in Pakistan, 1% in Turkey, and negligible levels in Europe. This pattern contrasts with R1a1, which exhibits a westward cline suggestive of Steppe-related dispersals. Second, R2’s microsatellite diversity peaks in India, and its frequency is particularly high among tribal populations like the Lodha (43%) and Karmali (100%), indicating prolonged in-situ evolution. Third, the near-absence of R2 in ancient DNA from Eurasian Steppe populations further dissociates it from Bronze Age migrations.
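The frequency gradient in the first point can be checked with a few lines of code. This is a minimal sketch that uses only the ranges quoted above; taking the midpoint of each range is an assumption made for illustration, and Europe is left out because the text gives no number for it.

```python
# (region, low %, high %) as quoted in the text, ordered from the
# subcontinent outward. Single values are stored as low == high.
R2_FREQ = [
    ("India and Sri Lanka", 10.0, 15.0),
    ("Pakistan", 7.0, 8.0),
    ("Turkey", 1.0, 1.0),
]


def midpoints(rows):
    """Reduce each quoted range to its midpoint (an illustrative choice)."""
    return [(region, (lo + hi) / 2) for region, lo, hi in rows]


def is_declining(rows):
    """True if the midpoint frequency falls at every step of the list."""
    mids = [m for _, m in midpoints(rows)]
    return all(a > b for a, b in zip(mids, mids[1:]))


for region, mid in midpoints(R2_FREQ):
    print(f"{region}: {mid:.1f}%")
print(is_declining(R2_FREQ))  # True
```

A strictly declining series like this is consistent with the gradient argument, though on its own it does not distinguish an origin in the subcontinent from later drift or founder effects.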
Challenges to the steppe migration hypothesis: While R1a1 is frequently linked to Indo-Aryan expansions, R2’s distribution lacks a clear correlation with Indo-European linguistic boundaries. For instance, R2 comprises 20% of Sinhalese Y chromosomes in Sri Lanka, a population without particularly elevated levels of Steppe ancestry, compared with 14–19% among Indo-Aryan-speaking Punjabis and Gujaratis. This dissonance suggests that R2 spread independently of later Steppe incursions, possibly through Neolithic farmer expansions or Mesolithic hunter-gatherer networks.
Regional distribution: In India, R2 frequencies vary markedly by region and social group. West Bengal exhibits exceptionally high frequencies, with 23% among general populations and 100% in the Karmali tribe. Upper castes, including Bhargavas (32%) and Chaturvedis (32%), show elevated levels, as do middle castes like Kammas (73%) and Yadavs (50%). Notably, Brahmin groups such as Punjabi Brahmins (25%) and Bengali Brahmins (22%) retain substantial R2 lineages, challenging simplistic narratives of caste-based genetic isolation.
Pakistanʼs Burusho and Hunza peoples stand out with R2 frequencies of 14% and 18%, respectively, while Parsi communities in Karachi show 20%. These enclaves may reflect ancient trade links or relic populations displaced by later migrations. In Sri Lanka, 38% of Sinhalese men carry R2, a striking figure attributed to prehistoric maritime contacts with peninsular India.
Tribal populations provide critical insights into R2’s pre-agrarian distribution. The Lodha of eastern India (43%), Tharus of Nepal (17%), and Chenchu of Andhra Pradesh (20%) exemplify lineages minimally affected by caste stratification. Conversely, caste groups like the Khandayat (46%) and Kallar (44%) demonstrate how R2 permeated both hierarchical and egalitarian societies, possibly through patrilocal residence patterns or founder effects.
Connections beyond the subcontinent: Though R2 is overwhelmingly South Asian, low frequencies appear in Tajikistan (3.8%), Uzbekistan (2.1%), and Azerbaijan (1.5%). These outliers likely trace to medieval trade along the Silk Road or earlier Neolithic exchanges. The presence of R2a among Jewish populations, notably Iraqi, Persian, and Ashkenazi Jews, hints at Bronze Age Levantine connections, though admixture during the Babylonian exile remains plausible. In Iran, R2 accounts for 1% of Y chromosomes, primarily concentrated in Zoroastrian communities linked to ancient Persian migrations. Its scarcity in Mesopotamia and the Arabian Peninsula contrasts with the prevalence of J2 and E1b1b, underscoring R2’s distinct dispersal trajectory.
Comparison with other subcontinental lineages: Despite both belonging to haplogroup R, R2 and R1a1 exhibit opposing distribution patterns. R1a1 peaks in northern and eastern India (up to 65% in Brahmins) and correlates strongly with Indo-European languages, while R2 dominates in the east and south, showing no linguistic affiliation. This dichotomy suggests that R1a1 may be more heavily involved in migrations out of the subcontinent, whereas R2 represents a more India-specific, autochthonous stratum.
Hg H, the “Indian marker,” and Hg L, linked to the Indus Valley Civilization, co-occur with R2 in many populations. For example, the Bhil tribe of Gujarat carries 18% R2 alongside 30% H, while Punjabis show 5% R2 and 12% L. These overlaps imply complex interactions between Neolithic agriculturists (R2), indigenous foragers (H), and Bronze Age urbanites (L). The phylogeography of R2 supports a model of early Holocene expansions from the Gangetic plains into Sri Lanka and coastal Pakistan. High frequencies among the Sinhalese and Maldivians (12%) point to seafaring communities disseminating the lineage via the Indian Ocean rim. Meanwhile, its absence in Andamanese tribes reinforces the deep genetic divide between South Asia’s mainland and insular populations.
Conclusion: R2ʼs penetration into both upper castes and tribals complicates theories of caste as a strictly heredity-based system. The lineageʼs prevalence among Bhargavas (traditionally scribes) and Kammas (warrior-farmers) suggests that paternal ancestry was but one factor in caste consolidation, alongside profession and regional identity. Hg R2 emerges as a cornerstone of South Asiaʼs genetic heritage, its roots entwined with the subcontinentʼs earliest sedentary communities. While debates persist regarding its exact coalescence age and dispersal routes, the weight of evidence—from frequency gradients to microsatellite diversity—favors an indigenous origin in northern India. Future research should prioritize sequencing ancient DNA from Indus Valley and Mesolithic sites to resolve remaining questions about R2ʼs role in shaping South Asiaʼs demographic mosaic.
The presence in the Indian subcontinent of both Hg R1 and R2 lineages with substantial SNP diversity suggests that these descendants of Hg R have likely spread from the subcontinent and its immediate neighbourhood into the Steppes and West Eurasia.
References:
https://en.wikipedia.org/wiki/Haplogroup_R2
https://en.wikipedia.org/wiki/Genetics_and_archaeogenetics_of_South_Asia
https://www.familytreedna.com/groups/r2/about/background
https://en.wikipedia.org/wiki/YDNA_haplogroups_in_populations_of_South_Asia
https://pmc.ncbi.nlm.nih.gov/articles/PMC2755252/
https://pmc.ncbi.nlm.nih.gov/articles/PMC1347984/
https://www.yfull.com/tree/R-Z2103/
The various SNPs of Z2103 in India and Pakistan are:
1) FT75171(H) * MF265358
2) FGC62957 * FT2341
3) Y519532(T) * BY13762/A13597 * BY13763/PH945
4) BY38485 * Y35893 * FT39364(H)
5) BY3719 > FT318536 * FT323323 * FT280468
6) A16656 > Y516444(T) * Y516443(T) * FT317929(H)
7) Y14423 * Y14416 * Y14415
8) FGC24408 > FT294577 * FT297122 * FT298037
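For anyone who wants to work with the list above programmatically, a minimal parsing sketch follows. It assumes that ">" separates an upstream marker from those listed after it and that "*" simply separates the markers within an entry; the source list does not define either symbol, so both readings are assumptions, as are the function and variable names.

```python
import re

RAW = """\
1) FT75171(H) * MF265358
2) FGC62957 * FT2341
3) Y519532(T) * BY13762/A13597 * BY13763/PH945
4) BY38485 * Y35893 * FT39364(H)
5) BY3719 > FT318536 * FT323323 * FT280468
6) A16656 > Y516444(T) * Y516443(T) * FT317929(H)
7) Y14423 * Y14416 * Y14415
8) FGC24408 > FT294577 * FT297122 * FT298037
"""


def parse_entry(line):
    """Split 'N) A > B * C' into (index, upstream_or_None, [markers])."""
    m = re.match(r"\s*(\d+)\)\s*(.+)", line)
    index, body = int(m.group(1)), m.group(2)
    upstream = None
    if ">" in body:
        # Assumed meaning: the marker before '>' is upstream of the rest.
        upstream, body = (part.strip() for part in body.split(">", 1))
    markers = [tok.strip() for tok in body.split("*")]
    return index, upstream, markers


entries = [parse_entry(line) for line in RAW.splitlines() if line.strip()]
print(len(entries))  # 8
print(entries[4])    # (5, 'BY3719', ['FT318536', 'FT323323', 'FT280468'])
```

Suffixes such as "(H)" and "(T)" and slash-separated alternatives like "BY13762/A13597" are kept verbatim, since the list does not explain them.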