Phonological history of Hindustani

The inherited, native lexicon of the Hindustani language exhibits a large number of extensive sound changes from its Middle Indo-Aryan and Old Indo-Aryan ancestors. Many sound changes are shared with other Indo-Aryan languages such as Marathi, Punjabi, and Bengali.

Indo-Aryan etymologizing

The history of the Hindustani language is marked by a large number of borrowings at all stages.^[1]^[2] Native grammarians have devised a set of etymological classes for modern Indo-Aryan vocabulary:

Tadbhava (Sanskrit: तद्भव, "arising from that") refers to terms that are inherited from vernacular Apabhraṃśa (Sanskrit: अपभ्रंश, "corrupted"), from the dramatic Prakrits, and further from Sanskrit. An example is Hindustani jībh "tongue", inherited through Prakrit jibbhā, from Sanskrit jihvā. Such words are the focus of this article.
Tatsama (Sanskrit: तत्सम, "same as that") refers to words that are borrowed into Hindi or Old Hindi directly from Sanskrit with minor phonological modification (e.g. lack of pronunciation of the final schwa). The Hindi register of Hindustani is associated with a large number of tatsama words through Sanskritisation. An example is Hindustani jihvāmūlīy "guttural", directly from Sanskrit jihvāmūlīya.
Ardhatatsama (Sanskrit: अर्धतत्सम, "half-same as that") refers to words that are semi-learned borrowings from Sanskrit. That is, words that underwent some tadbhava sound changes, but were adapted on the basis of a Sanskrit word. An example is Hindustani sūraj "sun", which is from Prakrit sujja, from Sanskrit sūrya. We would expect Hindustani *sūj from Prakrit, but the -r- was added later on after the Sanskrit word. Such adaptation to Sanskrit occurred continuously and as early as the Middle Indo-Aryan stage. Adapted words were crucial to determining the date and chronology of sound changes.^[3]
Deśaj (Sanskrit: देशज, "indigenous") refers to words that may or may not be derived from Prakrit, but cannot be shown to have a clear Sanskrit etymon. This is sometimes complicated by Sanskrit re-borrowing of Prakrit words. Such words sometimes derive from Non-Indo-Aryan languages—primarily Austroasiatic (Munda) languages, as well as Dravidian and Tibeto-Burman languages.^[4] An example is Hindustani ōṛhnā "to cover up, veil", from Prakrit ǒḍḍhaṇa "covering, cloak", from Dravidian, whence Tamil உடு (uṭu, "to wear").

In the context of Hindustani, other etymological classes of relevance are:

Perso-Arabic loanwords, which came to Old Hindi from Classical Persian. The pronunciation is closer to Classical Persian, rather than modern Iranian Persian. The Urdu register of Hindustani is associated with a large number of Perso-Arabic loanwords. An example is Hindustani zubān "tongue, language", from Classical Persian zubān (whence Persian zobân).
Borrowings from Northwestern Indo-Aryan. Modern Hindustani, while based primarily on the language of the Khariboli region, comes from a dialectal mixture. Many of the Western Hindi dialects are transitional to Punjabi and the Northwestern Indo-Aryan languages, and have donated words to Hindustani that underwent Northwestern sound changes. We often encounter doublets like Hindustani makkhan "butter", borrowed from Northwestern dialects (compare Punjabi makkhaṇ), and Hindustani mākhan, the native tadbhava term which is now archaic/obsolete outside of fossilized phrases.^[5]

As in many other languages, many phenomena in the historical evolution of Hindustani are better explained by the wave model than by the tree model. In particular, the oldest changes like the retroflexion of dental stops and loss of ṛ have been subject to a great deal of dialectal variance and borrowing. In the face of doublets like Hindustani baṛhnā "to increase" and badhnā "to increase" where one has undergone retroflexion and the other has not, it is difficult to know exactly under what conditions the sound change operated.^[6]^[7] One often encounters sound changes described as "spontaneous" or "sporadic" in the literature (such as "spontaneous nasalization"). This means that the sound change's context and/or isogloss (i.e. dialects in which the sound change operated) have been sufficiently obscured by inter-dialect borrowing, semi-learned adaptations to Classical Sanskrit or Prakrits, or analogical leveling.

From Vedic Sanskrit to Early Middle-Indo-Aryan

This section summarizes the changes occurring between Vedic Sanskrit (ca. 600 BCE) and the first attestations of Early Middle-Indo-Aryan in Pali or Ashokan Prakrit (ca. 280 BCE).^[8]

Early changes common to Dardic

The following changes are common to Middle Indo-Aryan and Dardic:

Pali, Prakrit, Hindustani, and many other Indo-Aryan languages partially preserve some conservative features of Proto-Indo-Aryan (PIA) lost in Vedic Sanskrit, though spontaneous changes produce many counter-examples:^[8]
- PIA kṣ, gẓʰ merge to Sanskrit kṣ, but remain distinguished later as kh, jh.^[9]
  - PIA Hákṣi > Sanskrit akṣi > Hindustani ā̃kh "eye"
  - PIA gẓʰárana- > Sanskrit kṣaraṇa- > Hindustani jharnā "to cascade"
- Proto-Indo-European *r and *l are generally merged to r in Sanskrit, but were somewhat preserved in parallel dialects.
A dental spontaneously cerebralizes to a retroflex stop in the environment of a rhotic. (This rule originated in the east and later spread to the north and northwest; it was less common in the west.)^[10] Some scholars like Wackernagel argue that the original cases (or borrowings from eastern dialects) with a retroflex stop in the environment of a rhotic, like prati- > paṭi- and mēḍhra "ram, penis" (already retroflex in Proto-Indo-Aryan *Hmáyẓḍʰram), influenced later analogical formation.
- Sanskrit ardha > Hindustani ādhā "half", but sārdha > Hindustani sāṛhe "and a half"
- Many cerebralized words were old enough to be borrowed back into Classical Sanskrit, like paṭh- "to read" (from older pṛth- "to spread") with a specialized meaning.^[11]
Loss of ṛ is common to Dardic, Pali, and Prakrit (whence Hindustani), but operates differently in each. In Central Indo-Aryan:
- Initial ṛ > ri-
  - Sanskrit ṛṇa > Hindustani rin "debt"
- Elsewhere, ṛ > i usually
  - Sanskrit kṛta > Hindustani kiyā "done"
  - Sanskrit mṛtyu > Prakrit miccu > Hindustani mīc "death"
- Alternatively, non-initial ṛ > a, u, perhaps due to dialectal influence, analogical leveling, umlaut, or assimilation to a preceding labial
  - Sanskrit śṛṇōti > Prakrit suṇaï > Hindustani sune "hears", where ṛ > u perhaps influenced by Prakrit sua "heard" (< Sanskrit śruta)
  - Sanskrit pṛcchā > Prakrit pucchā > Hindustani pūch "question"
  - Sanskrit nṛtya > Prakrit ṇacca > Hindustani nāc "dance"

Middle Indo-Aryan assimilations

After the split of Dardic languages, the following changes are common to Pali and Prakrit:

The sibilants ś, ṣ, s merge to s (Sanskrit dēśa > Hindustani des "nation")
aya monophthongizes to ē
Occasionally, ava monophthongizes to ō (Sanskrit avara "lower" > Pali and Prakrit ōra, ōraṃ "to this side" > Hindustani or "side")

Several changes below will yield a very distinct phonotactic structure in MIA that almost resembles that of Dravidian languages.^[8] Regarding the assimilations of Old Indo-Aryan consonant conjuncts, the Jayadhavalā (ca. ninth century AD) writes:

dīsaṁti doṇṇi vaṇṇā saṁjuttā aha va tiṇṇi cattāri
tāṇaṁ duvvala-lōvaṁ kāūṇa kamō pajuttavvō
"When two, or three or four, consonants appear in combination, elide the weakest one, and continue the process"^[12]

Here, "weakest" refers to sounds of higher sonority, and "elide" refers to either true elision/loss or total assimilation of the weaker sound to the stronger sound. Specifically, the sonority scale of Prakrit is (weakest) h < r < y < v < l < sibilants < nasals < stops (strongest). It will be helpful to keep this notion of "stronger" and "weaker" sounds in mind through the following sound changes. The relevant changes (organized by approximate chronology) are:

By palatalization, ty, thy, dy, dhy > *cy, *chy, *jy, *jhy and ts, ps > *cs.
Occasionally, t, d, dh > p, b, bh / _v, _m.
mr, ml > ṃbr, ṃbl (which becomes Prakrit ṃb through later sound changes, as in Sanskrit āmra > Prakrit aṃba > Hindustani ā̃b, ām "mango")

Sibilant aspiration: The sibilant s becomes h in cluster with stronger sounds (nasals or stops) before further assimilations.

Loss of word-final consonants: Final nasals n and m become the anusvara ṃ. The final sequence -aḥ, which already has the variant -ō in sandhi, becomes -ō. Elsewhere, the final consonant is lost without a trace.
Initial cluster simplification: Only the strongest sound in a word-initial cluster is retained — Sanskrit grāma > Prakrit gāma > Hindustani gā̃v "village"
Two-mora rule: All over-long (>2 morae) syllables are simplified to 2 morae syllables. For purposes of syllabification, syllable onsets can be only one consonant (and not an aspirated nasal, which always counts as two separate consonants word-medially). In syllable codas consisting of more than one consonant, the weakest consonants are lost without a trace. Then, if there is a long vowel followed by a syllable coda, the long vowel is shortened.
- The Classical Sanskrit long vowels ē /eː/ and ō /oː/ cannot occur with a consonant coda, so they are shortened to ĕ /e/ and ŏ /o/. Since short /e/ and /o/ are in complementary distribution with their long forms, the short vowels were generally also represented with the same graphemes; their representation as ĕ, ŏ with a breve mark is merely a matter of romanization of Pali and Prakrit. — Sanskrit nētra > Prakrit nĕtta "eye"
- The "overlong" Vedic vowels ai and au (really /aːi/ and /aːu/) are trimoraic and thus always shortened to ē̆ (/e(ː)/) and ō̆ (/o(ː)/), respectively.
Medial cluster simplification: In medial clusters, the weaker sound assimilates to the stronger sound, whilst maintaining its aspiration. Then cases of -ChC- and -ChCh- become -CCh-.
- Sanskrit sapta > Prakrit satta > Hindustani sāt "seven"
- Sanskrit dugdha > Prakrit duddha > Hindustani dūdh "milk"
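
The strength scale and the medial assimilation rule above can be sketched in Python. This is only an illustrative sketch, not a full model of Prakrit phonology: the function names, the encoding of aspirates as base consonant plus "h", the treatment of any unlisted consonant as a stop, and the tie-break (the second of two equally strong consonants wins, as in sapta > satta) are all assumptions made for the example.

```python
# Strength ranks following the scale given in the text:
# h < r < y < v < l < sibilants < nasals < stops.
WEAK_RANKS = {"h": 0, "r": 1, "y": 2, "v": 3, "l": 4}
SIBILANTS = {"s", "ś", "ṣ"}
NASALS = {"n", "ṇ", "m"}


def strength(base):
    """Return the strength rank of an unaspirated consonant."""
    if base in WEAK_RANKS:
        return WEAK_RANKS[base]
    if base in SIBILANTS:
        return 5
    if base in NASALS:
        return 6
    return 7  # assumption: every other consonant is a stop


def split_aspiration(consonant):
    """Split e.g. 'gh' into ('g', True); a bare 'h' stays ('h', False)."""
    if len(consonant) > 1 and consonant.endswith("h"):
        return consonant[:-1], True
    return consonant, False


def assimilate_pair(first, second):
    """Assimilate a two-consonant medial cluster to a geminate.

    The weaker consonant copies the stronger one, any aspiration in the
    cluster is kept, and it ends up on the second half (-ChC- > -CCh-).
    Assumption: on a tie in strength, the second consonant wins.
    """
    base1, asp1 = split_aspiration(first)
    base2, asp2 = split_aspiration(second)
    winner = base2 if strength(base2) >= strength(base1) else base1
    # A bare h that loses to a stronger sound survives as aspiration.
    lost_h = "h" in (base1, base2) and winner != "h"
    aspirated = asp1 or asp2 or lost_h
    return winner + winner + ("h" if aspirated else "")
```

With these assumptions, `assimilate_pair("p", "t")` gives `"tt"` (sapta > satta), `assimilate_pair("g", "dh")` gives `"ddh"` (dugdha > duddha), and `assimilate_pair("gh", "r")` gives `"ggh"` (as in vaggha). Vowel shortening under the two-mora rule and the later fortition of *vvh, *yyh are deliberately left out.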

Some interesting cases and further sound changes:

Sibilants and nasals are considered of very similar sonority. Typically, -sm-, -sn- > *-hm-, *-hn- > Pali and Prakrit -mh-, -ṇh-, but both cases can also assimilate to -ss-. — Sanskrit vismara- > Prakrit vissara- > Hindustani bisarnā "to forget", but also note the Prakrit variants vimhara-, visumara-, etc.
ṇh, mh, rh, lh are not considered phonemes but rather sequences of two sounds (as in Sanskrit); consequently:
- Intervocalically we see -ṇh-, -mh-, -rh-, and -lh-, rather than *-ṇṇh-, *-mmh-, etc
- Initially, the outcome of Sanskrit sn- and sm- is varied. In some cases, initial sn- > ṇ(h)- is tolerated (Sanskrit snēha > Prakrit ṇ(h)ēha, siṇēha, saṇēha > Hindustani neh "oiliness, grease, love"). Elsewhere, the situation is resolved by anaptyxis either before the assimilation (Sanskrit smaraṇa > Prakrit samaraṇa > poetic/archaic Hindustani sãvarnā "to remember") or after the assimilation (Sanskrit snāna > *ṇhāṇa > Prakrit ṇahāṇa > Hindustani nahānā "bathing").
*yyh, *vvh > jjh, bbh by fortition. — Sanskrit mahya > Prakrit majjha- "me" (> Hindustani mujh)
The sequence -sr- can sometimes yield -ṃs-, rather than -ss- — Sanskrit aśru > Prakrit assu, aṃsu > Hindustani ā̃sū "tear"
The sequence -mh- can sometimes fortify to -ṃbh-.
In the rarer Sanskrit clusters -CsN-, either the consonant C or the nasal N is deleted, rather than the sibilant. — Sanskrit tīkṣṇa "sharp" > Prakrit tikkha, tiṇha > Hindustani tīkhā "spicy", Sanskrit jyotsnā- > Prakrit jŏṇha > Hindustani junhāī "moonlight"
Anaptyxis, rather than assimilation, can sometimes be applied to break a heterogeneous cluster. This is most common in cases of a stop followed by sonorant, as in -tn-, -dm-, -kl-, etc — Sanskrit ratna > Pali ratana or Sanskrit klēśa > Prakrit kilēsa > Hindustani kales "grief".

The above sound changes are rather sweeping and complex, so it helps to walk through certain examples:

Sanskrit vyāghra > *vāghra (initial cluster simplification) > *vāgh•ra (syllabification) > *vagh•ra (two-mora rule, long vowel shortened) > *vagh•ga (r assimilates to stronger sound) > *vag•gha (-ChC- > -CCh-) > Pali and Prakrit vaggha "tiger", whence Hindustani bāgh.
Sanskrit jihvā > *jih•vā (syllabification) > *jivh•vā (h assimilates to stronger sound) > *jiv•vhā (-ChC- > -CCh-) > *jib•bhā (fortition) > Pali and Prakrit jibbhā "tongue", whence Hindustani jībh "tongue".
Sanskrit saurāṣṭra "pertaining to Saurashtra" > *saurāhṭra (sibilant aspiration) > *sau•rāhṭ•ra (syllabification) > *sō•raṭh•ra (two-mora rule) > *sō•raṭh•ṭa (assimilation) > *sō•raṭ•ṭha > Prakrit sōraṭṭha, whence Hindustani soraṭh (name of a particular raga)

Changes after the split of Pali and Prakrit

The following changes are only seen in Prakrit and not in Pali (other Pali-specific changes do also occur beyond this point):

Fortition of y: y /j/ > j /dʒ/ initially and yy /jː/ > jj /dːʒ/ everywhere
- Sanskrit yaḥ, yō > Pali yō but Prakrit jō, whence Hindustani jo "that, what"
Cases of -ēy- and -ī̆y- are sometimes re-analyzed as having a geminate glide and undergo the above rule as well.
- Sanskrit kālēya > Prakrit kālēya, kālijja, *kālĕjja, whence Hindustani kalējā "liver"
In Pali, Sanskrit -jñ- becomes -ññ- (geminate of the palatal nasal). In Prakrit, the result is usually -jj-, but is sometimes -ṇṇ-/-nn- (probably semi-learned)
- Sanskrit rājñī > Prakrit rāṇī, whence Hindustani rānī.
In Pali, geminate -vv- > -bb-, but this never occurred in Prakrit.
As noted before, the reflex of Sanskrit ṛ is different in Pali, Prakrit, and Dardic (e.g. initial ṛ > Prakrit ri- always, but Pali and Dardic a-, i-, u-). Also, the reflex of Sanskrit clusters involving a sibilant and sonorant is unstable between Pali and Prakrit.

Orthographic changes

Before a consonant, ṅ, ñ, ṇ, n, m, and the anusvara ṃ are in complementary distribution. In Sanskrit, each different nasal consonant is typically written out. In later languages, all pre-consonant nasals are written as the anusvara ṃ.
The Sanskrit long vowels ē and ō are sometimes romanized as e and o (without the macron), since Sanskrit did not have short versions of these vowels and there is therefore no ambiguity. The romanization of long ē, ō and short ĕ, ŏ in Prakrit has been discussed above.

Up to Dramatic Prakrits

These changes occur after Pali and Early Prakrit, but before the development of the dramatic regional Prakrits like Maharashtri Prakrit and Shauraseni Prakrit (ca. 200 AD):

Merging of nasals ṇ, n > ṇ, represented as a retroflex nasal. Whether the actual place of articulation of this sound was truly retroflex or was dental (and just orthographically represented as a retroflex nasal) is debated. Regardless, this sound regularly becomes Hindustani dental n later on (but intervocalically, the sound becomes ṇ in other languages like Marathi, Gujarati, and Punjabi).^[8]
Lenition of intervocalic stops over time, through various attested stages. First, all single intervocalic unvoiced stops become voiced. Then, non-retroflex stops spirantize (one possibility is g, gʱ, dʒ, d, dʱ, b, bʱ > ɣ, ɣʱ, ʑ, ð, ðʱ, β, βʱ / V_V). Per Chatterji, this stage is represented by vacillation between writing a voiced stop, semivowel, or nothing. The retroflex voiced stops ḍ, ḍh likely become flaps intervocalically (the reflex ultimately in Hindustani), but this distinction is not represented orthographically.^[8] Finally, aspirated spirants debuccalize (ɣʱ, ðʱ, βʱ > ɦ), the spirant β > ʋ (romanized as v), and remaining spirants ɣ, ʑ, ð are lost, leaving the surrounding two vowels in hiatus. Between two ā̆ vowels, hiatus is usually resolved by what Hemachandra, in his grammar of Prakrit, calls a “lightly pronounced y-sound” (laghuprayatnatarayakāraśrutiḥ).^[12] As far as orthography/romanization is concerned, this results in the optional inclusion of epenthetic -y- or less likely -v- between the ā̆ vowels; occasionally -d- became -r- as in the numbers from 11-18. This orthographic choice should not be confused with the older genuine /j/ phoneme. Similarly, after a front vowel, euphonic/orthographic -y- appears. Elsewhere, hiatus is fully tolerated. After ā̆, the diaeresis is often used in romanization (e.g. aï, aü) to differentiate this sound from the older overlong vowels. Examples:
- Sanskrit śōka > Pali sōka /soː.kɐ/ > Early Dramatic Prakrit sōga /soː.gɐ/ > /soː.ɣɐ/ > Prakrit sōa /soː.ɐ/ "sorrow"
- Sanskrit śata /ɕɐ.t̪ɐ/ > Prakrit sada /sɐ.d̪ɐ/ > /sɐ.ɐ/ > /sɐ.ʋɐ/ > old Hindustani sau /sɐʊ̯/, whence Hindustani sô /sɔː/ "hundred"
- Sanskrit ekādaśa /eː.kaː.d̪ɐ.ɕɐ/ > Prakrit egārasa /eː.ɡaː.ɾɐ.sɐ/ > /eː.ɡaː.ɾɐ.ɦɐ/ > /ɪ.ɡaː.ɾɐ.ɦɐ/, whence with metathesis Hindustani gyārah /ɡjaː.ɾɐɦ/ > /ɡjaː.ɾaː/ "eleven"
- Sanskrit tapaka > Pali tapaka /t̪ɐ.pɐ.kɐ/ > /t̪ɐ.bɐ.gɐ/ > /t̪ɐ.βɐ.ɣɐ/ > Prakrit tava(y)a /t̪ɐ.ʋɐ.(j)ɐ/ "tawa", whence Hindustani tavā
- Sanskrit/Pali kathana /kɐ.t̪ʰɐ.n̪ɐ/ > /kɐ.d̪ʱɐ.nɐ/ > /kɐ.ðʱɐ.nɐ/ > Prakrit kahaṇa /kɐ.ɦɐ.nɐ/, whence Hindustani kahnā "to say, narrate"
- Sanskrit/Pali paṭhaṇa /pɐ.ʈʰɐ.ɳɐ/ > /pɐ.ɖʱɐ.nɐ/ > Prakrit paḍhaṇa /pɐ.ɽʱɐ.nɐ/, whence Hindustani paṛhnā "to read"
- Sanskrit caturtha > Pali catuttha /tʃɐ.t̪ut̪.t̪ʰɐ/ > Early or Shauraseni Dramatic Prakrit caduttha /tʃɐ.d̪ut̪.t̪ʰɐ/ > /tʃɐ.ðut̪.t̪ʰɐ/ > Prakrit caüttha /tʃɐ.ut̪.t̪ʰɐ/
Lenition of intervocalic y /j/, similarly to the above change. The optional inclusion of epenthetic -y- sometimes makes this confusing, but at this point /j/ is no longer phonemic in Prakrit; it is merely an epenthetic hiatus-filler. — Sanskrit nayana > Prakrit ṇa(y)aṇa > Hindustani nainā "eyes"
Lenition of intervocalic v /ʋ/ between ā̆ and a glide. — Sanskrit praviṣṭa > Prakrit païṭṭha > Hindustani paiṭhā "entered", but Sanskrit nava > Prakrit ṇava > Hindustani nau "nine" with retention of -v-.
Occasionally, the sequences aï and aü contracted early on in Prakrit to ē̆ and ō̆. This is a separate change from the later coalescence of vowels in hiatus. — Sanskrit sthavira > Earlier Prakrit ṭhavira > *ṭhaïra > Later Prakrit ṭhēra "old"
Prakrit ḍ, ḷ, l, and r often alternate with each other, particularly in words loaned from non-Indo-Aryan sources.
- PIA *swaẓḍaśa > Sanskrit ṣoḍaśa > Prakrit solasa /soː.lɐ.sɐ/ > /soː.lɐ.ɦɐ/, whence Hindustani solah "sixteen"
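
The net effect of the nasal merger and the lenition of single intervocalic consonants described above can be sketched in Python. This is a simplified illustration under stated assumptions: words are given as pre-segmented lists, the intermediate voiced and spirant stages are collapsed into a single lookup table, hiatus is left unfilled (no epenthetic -y-), and the names `LENITION` and `lenite` are hypothetical.

```python
VOWELS = set("aāiīuūeēoō")

# Net outcome for a single consonant between two vowels (assumed table
# summarizing the stages in the text; intermediate spirants are skipped).
LENITION = {
    # plain non-retroflex stops and y are lost, leaving hiatus
    "k": "", "g": "", "c": "", "j": "", "t": "", "d": "", "y": "",
    # labial stops end up as v
    "p": "v", "b": "v",
    # aspirated non-retroflex stops debuccalize to h
    "kh": "h", "gh": "h", "th": "h", "dh": "h", "ph": "h", "bh": "h",
    # retroflex stops are only voiced
    "ṭ": "ḍ", "ṭh": "ḍh",
}


def lenite(segments):
    """Apply the nasal merger and intervocalic lenition to a segment list."""
    out = []
    for i, seg in enumerate(segments):
        intervocalic = (
            0 < i < len(segments) - 1
            and segments[i - 1] in VOWELS
            and segments[i + 1] in VOWELS
        )
        if intervocalic and seg in LENITION:
            seg = LENITION[seg]
        if seg == "n":  # merger of n into ṇ in every position
            seg = "ṇ"
        out.append(seg)
    return "".join(out)
```

For example, `lenite(["s", "ō", "k", "a"])` returns `"sōa"`, and `lenite(["k", "a", "th", "a", "n", "a"])` returns `"kahaṇa"`, matching the Prakrit forms cited above. Geminates and clusters are untouched because their members are not flanked by vowels on both sides.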

Pleonastic suffixes

Another change worth noting here that will become more prevalent by late MIA and early NIA is the extension of Old Indo-Aryan nominals and roots with pleonastic suffixes. The consensus, implied by the name, is that these innovative suffixes have little semantic purpose and mainly serve to distinguish homophones (created by the sweeping sound changes between Sanskrit and Prakrit). They are applied after nominal and verb stems, before inflecting suffixes. Some are recognizable as the reflexes of Old Indo-Aryan diminutive suffixes.^[13]

The most important suffixes are feminine -iā- (< earlier -igā < Sanskrit -ikā) and masculine -a- (< earlier -ga < Sanskrit -ka). The equivalent Sanskrit endings were already common in Old Indo-Aryan as diminutives, but become more popular at this stage and ultimately become the "marked" declension of nouns in Hindustani and other Indo-Aryan languages.

(Sanskrit karpaṭa >) Prakrit kappaḍa + -a- > *kappaḍa(y)a > Hindustani kapṛā "clothing"
(Sanskrit kaṭa "twist of straw" >) Prakrit kaḍa + -iā- > *kaḍiā > Hindustani kaṛī "chain link"
(Sanskrit naptṛ "grandson" >) Prakrit natti + -a- > nattia > Hindustani nātī "grandson"

The other common suffixes are -kka-, -ḍa-, -illa-, -la-, -lla-, -ulla-, and -ra-. These suffixes are very often combined with each other:

(Sanskrit markaṭa >) Prakrit makkaḍa + -ḍa- + -iā > *makkaḍiā > Hindustani makṛī "spider"
(Sanskrit matsya >) Prakrit maccha + -lla- + -iā > *macchaliā > Hindustani machlī "fish" (but also unextended Prakrit maccha > Hindustani māch)

Up to Apabhraṃśa

These changes occur after the dramatic Prakrits, and characterize the Late Prakrit, or Apabhraṃśa, stage (ca. 900 AD). Some of these changes start to differentiate Hindustani dialects (part of the central Indo-Aryan zone) from other Indo-Aryan languages.

Intervocalic -m- > -ṃv-. This change notably did not occur in the Western zone (e.g. Gujarati).^[6]
- Sanskrit grāma > Pali/Prakrit gāma > Apa. *gaṃva > Hindustani gā̃v "village", but Gujarati gām
Final long vowels are shortened: ā > a, ē, ī > i, and ō, ū > u
Intervocalic -v- is lost after a glide.
Long ū is shortened before another vowel — Sanskrit kūpaka > Prakrit kūvaya > Apa. *kuaa > Hindustani kuā "well"

Development of a Latin-like stress system

Abandonment of Vedic lexical stress in favor of a Latin-like positional stress system. Stress falls on the penultimate syllable if it is heavy, failing which it falls on the antepenultimate syllable if it is heavy, failing which it falls on the fourth syllable from the end.

This system retroactively came to characterize Classical Sanskrit, but it can be considered a MIA development that was only fully completed around the Apabhraṃśa stage. Once it had developed in languages like Gujarati and Hindustani, it affected many sound changes which occurred afterwards. It is not seen in Pali, and happened late enough that some modern languages like Marathi, which have vestiges/reflexes of Vedic stress, do not appear to be included in this development.^[8]
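
The positional stress rule described above can be written as a short Python function. This is a minimal sketch: the encoding of a word as a string of "H" (heavy) and "L" (light) syllables and the function name are assumptions, and the fallback to the first syllable for words shorter than four syllables is not stated in the text.

```python
def stressed_syllable(weights):
    """Return the 0-based index of the stressed syllable.

    `weights` is a string with one character per syllable:
    "H" for a heavy syllable, "L" for a light one.
    """
    n = len(weights)
    if n == 0:
        raise ValueError("a word needs at least one syllable")
    if n >= 2 and weights[-2] == "H":
        return n - 2  # heavy penultimate
    if n >= 3 and weights[-3] == "H":
        return n - 3  # heavy antepenultimate
    # Otherwise the fourth syllable from the end; shorter words fall
    # back to their first syllable (assumption, not from the text).
    return max(n - 4, 0)
```

Taking Old Hindi miṭhāī as light-heavy-heavy, `stressed_syllable("LHH")` returns `1`, and taking khēlanā as heavy-light-heavy, `stressed_syllable("HLH")` returns `0`. Both results agree with the accent marks on miṭhā́ī and khḗlanā used later in this article; the syllable weights themselves are the editor's reading of those forms.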

Up to Hindustani

Changes after this point set the New Indo-Aryan (NIA) era apart from the MIA period. Many of these changes distinguish Hindi from nearby languages like Marathi, Gujarati, and Punjabi.

Before, it was convenient to use the nominal/verbal stem as the "dictionary" form in describing sound changes (e.g. ending in -a for the nominative masculine a-stem). In Hindustani, the dictionary form (e.g. ending in -ā for many masculine nouns) actually descends from the Prakrit nominative case (e.g. ending in -aō, from Sanskrit -akaḥ, rather than ending in -aya from Sanskrit -aka). The nominative form for nouns in Prakrit will be used below unless otherwise specified.

Retroflex ṇ, ḷ are dentalized to n, l
Intervocalic -v- is lost around -ī̆-. This explains why we have Hindustani tavā "tawa" (< Prakrit tavaa) but taī "griddle" (< Prakrit taviā), both from the same root. Compare Marathi, Punjabi, and Gujarati tavī "griddle". In some cases, like Hindustani dī̆yā < Prakrit dīvau, the variant in -v- (dīvā) is found in Modern Hindustani as a regional variant. In Hindustani, this process went much further than in other regions, and analogical leveling sometimes caused the -v- to be lost altogether. — Prakrit ṇavaa, ṇaviā > Old Hindi navā ~ naī or nayā ~ naī > Hindustani nayā ~ naī "new" (with dialectal/archaic navā). By contrast, Marathi, Gujarati, and Punjabi have navā ~ navī.
Initial v- > b- and medial geminate -vv- > -bb- — Prakrit vāla > Old Hindi bāla "hair", but Gujarati vāḷ.
ī is shortened before a vowel. — Prakrit dīvaya > dī(v)ā > Hindustani diyā "lamp"

New-Indo-Aryan vowel coalescence

Several of the following processes were already underway in Late Apabhraṃśa.

Concerning diphthongs:

The sequences aü and aï become diphthongs au /ɐu/ and ai /ɐi/. — Prakrit païjjō > Old Hindi páija > Hindustani paij "vow"
The glides i and u in hiatus after ā give rise to new overlong diphthongs (/ɐːu/ and /ɐːi/). The sequences āya and āva also weakened to these overlong diphthongs, and both are written as āya and āva.

Concerning glides:

Glides in hiatus of like quality coalesce. — Prakrit duuṇaō > Apa. dúuṇau > Old Hindi dū́nā > Hindustani dūnā "twice"
In the case of unlike vowels in succession:
- If the first is unstressed i or u and the second vowel is stressed, the first vowel becomes a new glide. — Prakrit pivāsō > Apa. piā́su > Hindustani pyās "thirst"
- If the first is ī̆, ū̆, e, or o and the second vowel is short and unstressed, the second vowel is lost and the first vowel is lengthened if short.
  - Prakrit sīalō > Apa. sī́alu > Hindustani sīl "cold"
  - Prakrit pāṇiō > Apa. pā́ṇiu > Hindustani pānī "water"

Concerning ā̆:

Generally, a + ā, ā + a, and ā + ā (where short a is not part of a diphthong) all coalesce into ā. — Prakrit cittaāra > Hindustani citāra "painter"
- Ahead of some suffixes like -ra and -la with short vowels, there is more pressure to separate the suffix from the root, and so the -y- appears to intervene. — Prakrit sā(y)ara > Old Hindi sāyara "sea"^[14]^[15]

The sequence a + a (where short a is not part of a diphthong) generally becomes ai first, and can also contract even further to ē.
- Prakrit mayaṇaō > Old Hindi mainā > Hindustani mainā, mēnā "myna"
- Prakrit kayalaō > *kailā > Hindustani kēlā "banana"
Similarly, ava (where final short a is not part of a diphthong) contracted to either au or further to ō.
- Prakrit khavaṇaü > *kháunā > Hindustani khonā "to lose", but Prakrit avara > Old Hindi áura > Hindustani aur "and"

Other sequences of vowels in hiatus require medial -y-.

Prakrit gayaō > Pre-Old Hindi *gayau > Hindustani gayā "gone". a + au cannot contract so intervening -y- appears.

Turner explains the occasional further contraction of ai > e and au > o (at least for Gujarati) in terms of inherited words versus semi-learned words: in the former the process has had time to go further. Similarly, the cases where -y- possessed more reality could be attributed to word frequency, dialectal borrowing, and semi-learned borrowings.

Vowel lengthening and shortening rules

VCː > VːC rule: This is one of the core sound changes of the NIA period, and is almost pan-Indo-Aryan.^[8] The change states that MIA geminates are de-geminated, and the preceding short vowel undergoes compensatory lengthening. — Prakrit satta > Old Hindi sāta "seven", whence Hindustani sāt
A similar process (VṃC > ṼːC) occurs for clusters with a homorganic nasal, which results in long nasalized vowels. — Prakrit saṃjha > Old Hindi sā̃jha "evening", whence Hindustani sā̃jh
Compensatory lengthening from older geminates was sometimes accompanied by spontaneous (and regionally random) nasalization of the vowel. In some cases this goes back to Prakrit or is otherwise reflected in many NIA languages. — Prakrit akkhi > Old Hindi ā̃kha "eye", whence Hindustani ā̃kh.
Pre-tonic vowel shortening: A very important set of Central Indo-Aryan shifts that are not seen in languages like Marathi and Bengali. In a pre-tonic position, heavy/long vowels are shortened. In many cases, this change is fed by the vowel lengthening rules above. It results in Hindustani's distinctive ablaut system, since adding heavy/stressed suffixes to a root with a long vowel forces the root's vowel to shorten.^[7]
- ā > a — Old Hindi mī́ṭhā "sweet" but miṭhā́ī (not *mīṭhā́ī)
- au, ō, ū > u — Old Hindi chōṛanā "to leave" but chuṛānā "to cause to leave, release" (not *chōṛānā, but compare retained -ō- in Old Marathi cognate sōḍavaṇē̃)
- ai, ē, ī > i — Old Hindi khḗlanā "to play" but khilā́nā "to cause to play" (not *khēlānā, but compare retained -ē- in Old Marathi cognate khēḷavaṇē)
- Pre-tonic nasalized vowels typically become short nasal vowels, though they can also lose nasalization. — Prakrit paṃcāsa > Old Hindi pacā́sa "fifty", whence Hindustani pacās
Word rhythm shortening: Another Central Indo-Aryan sound change not seen in other regions. The long vowels ā, ī, and ū are shortened before a consonant, followed later by another consonant and a heavy vowel (i.e. long vowel or diphthong).^[7] This explains several alternations present in modern Hindustani:
- Hindustani nīcā "low" but niclā "lower" (not *nīclā)
- Hindustani pūt "child" but putlā "mannequin" (not *pūtlā)
- Hindustani pāgal "crazy man" but paglī "crazy woman" (not *pāglī)
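
The two lengthening rules at the start of this list (VCː > VːC and VṃC > ṼːC) can be sketched in Python. This is a hedged illustration only: words are given as pre-segmented lists, aspirated geminates are encoded as a plain consonant followed by its aspirate (e.g. "d", "dh"), nasalization is written with a combining tilde, and the function name is hypothetical. Spontaneous nasalization and the later shortening rules are not modelled.

```python
SHORT_TO_LONG = {"a": "ā", "i": "ī", "u": "ū"}
VOWELS = set("aāiīuūeēoō")
ANUSVARA = "ṃ"
TILDE = "\u0303"  # combining tilde marking a nasalized vowel


def is_consonant(segment):
    return segment not in VOWELS and segment != ANUSVARA


def compensatory_lengthening(segments):
    """Apply VCC > V:C and V+anusvara+C > nasalized V:C to a segment list."""
    out = []
    i = 0
    while i < len(segments):
        seg = segments[i]
        following = segments[i + 1:i + 3]
        if seg in SHORT_TO_LONG and len(following) == 2:
            first, second = following
            # Geminate: tt, or t + th for an aspirated geminate.
            if is_consonant(first) and second in (first, first + "h"):
                out += [SHORT_TO_LONG[seg], second]
                i += 3
                continue
            # Nasal cluster: anusvara followed by a consonant.
            if first == ANUSVARA and is_consonant(second):
                out += [SHORT_TO_LONG[seg] + TILDE, second]
                i += 3
                continue
        out.append(seg)
        i += 1
    return "".join(out)
```

With these assumptions, `compensatory_lengthening(["s", "a", "t", "t", "a"])` returns `"sāta"`, the Old Hindi form cited above; the final -a is lost only later, by schwa deletion.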

Counter-examples to vowel rules

The above rules and their caveats still do not sufficiently explain all cases of vowel length and gemination encountered in Hindustani, but they follow most closely the ordering of the rules that Turner proposes in his analyses of Gujarati, Marathi, and Hindi. More complex phenomena must be employed to explain the counter-examples.^[7] The first set of counter-examples are cases where gemination appears to have been lost early on, predating the VCː > VːC rule. These are confined to:

The Prakrit participle suffix -aṃta(ō), which loses nasalization and becomes Old Hindi -atā > Hindustani -tā, as in kartā "doing"
Geminate consonants in pleonastic suffixes, e.g. -akka-, -illa-, -ulla-, -aṭṭa-, etc. — Prakrit pālakka > pālaka > Hindustani pālak "spinach", rather than pālakka > *pālāka > Hindustani *palāk.
Geminates after prefixes (e.g. from Sanskrit ud-, nis-, and vi-), unless the prefix syllable carried the positional stress of the word.

The second set of examples are from semi-learned adaptation to Sanskrit. For instance, from Prakrit aṃdhaa we predict Hindustani *ā̃dhā but find andhā "blind", under influence of the Sanskrit etymon andha. From Prakrit suddhi we predict Old Hindi *sūdha (> Hindustani *sūdh) but find sudha "memory, sense" (> Hindustani sudh), under influence of the Sanskrit etymon śuddhi.

The third set of examples are from analogy and morphological processes. In the case of verbs with an expected long vowel in the root, there is competition throughout the paradigm due to word rhythm shortening. Based on the participle in -atā and infinitive in -anā, the root's vowel should be shortened; elsewhere, it should stay lengthened. The result of this is usually a short vowel which has been analogically leveled throughout the paradigm. There was also a tendency to associate short root vowels with intransitive verbs and long vowels with transitive verbs, which is inherited from the Sanskrit tendency (compare Sanskrit tapyatē "is heated" and tāpayati "causes to heat up"). Hence, based on Prakrit tappaï "is heated", we find both Hindustani tapnā "is heated" and tāpnā "heats (sthg.) up", where the long-vowel form has been analogically created. Other verbs with a long vowel in the root have either been re-lengthened or evaded rhythmic shortening based analogically on the de-verbal nominal form. For instance, we have Hindustani nācnā "to dance" (with nāc "dancing") and bā̃dhnā "to bind" (with bā̃dh "bond").

The fourth set of examples are borrowings from the northwest (whence Punjabi and Sindhi). The vowel lengthening rules did not take place in the northwestern region (words with this sound change in Punjabi and Sindhi are themselves borrowings from other Indo-Aryan languages, like Hindustani).^[8] These borrowings, likely from a Western Hindi dialect transitional to Punjabi,^[8] result in a large number of doublets in Hindustani, where in many cases the native word has been or is being eclipsed by the borrowed word:

Prakrit	Hindustani native term	Hindustani borrowed term	Meaning
makkhaṇa	mākhan	makkhan	"butter"
haḍḍa	hāṛ	haḍḍā	"bone"
acchaa	āchā	acchā	"clear, good"
sacca	sāc, sā̃cā	sac, saccā	"true"
maṭṭi, miṭṭi	māṭī	miṭṭī	"soil"
pakkaa	pākā	pakkā	"ripened, full"

The final set of examples occurs in unstressed small words (e.g. postpositions) that were reduced without lengthening. This is probably due to rhythmic vowel shortening across a larger phrase. Compare reductions of English the, a, etc. in unstressed environments. Such words include Hindustani sab "all" (< Prakrit savva), tujh "you (oblique)" (< Prakrit tujjha), and is "this (oblique)" (< Apa. ĕssa < Prakrit ēassa).

Sound changes from Old Hindi through modern Hindustani

Final nominative -au > -ā. -au is retained in the second-person plural suffix (from where it later becomes Hindustani -o).

Attenuation of post-tonic and final short vowels to /ǝ/. A number of words are saved from this lenition by semi-learned lengthening of the final vowel. For instance, Sanskrit guru > Prakrit guru > Old Hindi gura, alongside the semi-learned variant gurū "teacher, guide".
Suffix weakening: During the Old Hindi stage, final unstressed -ai and -au monophthongized to -e and -o, respectively.^[16] Hence, the general third-person singular ending developed as Sanskrit -ati > Prakrit -adi > Apabhraṃśa -aï > Old Hindi -ai > Hindustani -e, but where it was stressed, in the monosyllabic Old Hindi hai, it remained unsimplified, giving Hindustani hai "is".
Indo-Aryan schwa deletion: ə → ∅ / VC_CV, though the application of this rule (particularly when there are many schwas in sequence) is dependent on the morphological boundaries of the word. This change is not indicated in the Devanagari script for Hindustani. — Old Hindi rāta > Hindustani rāt "night"
- This results in some ablaut alternations within a single verbal paradigm. For example, the infinitive utarnā "to descend" has the past participle utrā "descended", where the intertonic vowel of Old Hindi utarā has been lost.
-nr- > -ndr- by epenthesis, where cases of -nr- arise from schwa deletion. — Prakrit paṇṇaraha > Old Hindi panaraha > *panrah (schwa deletion) > Hindustani pandrah, pandrā "fifteen"
Unstressed (short) vowels are also lost in other positions, particularly initial vowels in words of 3 or more syllables or intertonic short vowels. — Old Hindi aḍhā́ī > Hindustani ḍhāī "two and a half"
Lenition of Ṽbh > Vmh and Ṽb > Vm: This change was a dialectal feature, and in regional Hindi variants the archaic form persists. In some cases, the regional variant which did not undergo this change ended up supplanting the main-dialect form, at least in writing.
- Old Hindi tā̃ba > Hindustani tām "copper (in compounds)", with regional variant tā̃b
- Old Hindi kũbhāra > Hindustani kumhār "potter", with regional variant kũbhār
- Old Hindi ā̃ba > Hindustani ām "mango", with regional variant ā̃b (compare Marathi āmbā, where this sound change never occurred)
- Old Hindi sãbhālanā > Hindustani sãbhālnā, with the pronunciation-spelling variant samhālnā
- The common root samajh- "to understand", from Prakrit saṃbujjh-, should be treated as an irregular case. The ṃbh > mh > m shift and the shift of stress to the first syllable (hence the confusion of post-tonic u > a) occurred in Pre-New-Indo-Aryan, so the form is present in Old Hindi and in languages like Marathi, which usually do not have this lenition rule.
Loss of nasal aspiration when not prevocalic: This rule is fed by schwa deletion and by the lenition of Ṽb(h). It explains why Hindustani has mh in tumhārā "your" but no h in tum "you" (< *tumh < older tumha).
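
Several of the changes above feed one another, which can be sketched as an ordered list of rewrite rules applied to romanized Old Hindi forms. This is only an illustration under simplifying assumptions: every consonant is a single letter, nasal vowels and stress are ignored, word-final schwa loss is split off as its own rule and ordered before medial deletion, and the monosyllable test for hai is approximated by requiring an earlier vowel in the word. The names `RULES` and `derive` are hypothetical.

```python
import re
import unicodedata

V = "[aāiīuūeo]"    # vowel letters of the romanization (nasal vowels left out)
C = "[^aāiīuūeo]"   # any other single letter, treated as one consonant

# (label, pattern, replacement), applied once each, in this order
RULES = [
    ("final unstressed -ai > -e", re.compile(rf"({V}.*{C})ai$"), r"\1e"),
    ("final unstressed -au > -o", re.compile(rf"({V}.*{C})au$"), r"\1o"),
    ("final schwa deletion", re.compile(rf"(?<={C})a$"), ""),
    ("medial schwa deletion, VC_CV", re.compile(rf"(?<={V}{C})a(?={C}{V})"), ""),
    ("-nr- > -ndr-", re.compile("nr"), "ndr"),
    ("mh > m when not prevocalic", re.compile(rf"mh(?!{V})"), "m"),
]

def derive(old_hindi: str) -> list[str]:
    """Return the successive forms, starting from the Old Hindi input."""
    form = unicodedata.normalize("NFC", old_hindi)
    steps = [form]
    for _label, pattern, repl in RULES:
        new = pattern.sub(repl, form)
        if new != form:          # record only the stages that changed something
            steps.append(new)
            form = new
    return steps

for word in ["rāta", "utarā", "panaraha", "upajai", "tumha", "tumhārā", "hai"]:
    print(" > ".join(derive(word)))
```

In this sketch panaraha passes through panarah and panrah before epenthesis gives pandrah, and tumha loses its h only after schwa deletion has removed the following vowel, while tumhārā and hai are left unchanged.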

Sounds from loanwords: The sounds /f, z, ʒ, q, x, ɣ/ are loaned into Hindi-Urdu from Persian, English, and Portuguese.
- In Hindi, /f/ and /z/ are the most well established, but can be /pʰ/ or /bʰ/ in rustic speech. /q, x, ɣ/ are variably (by dialect) assimilated into /k, kʰ, g/, respectively, and /ʒ/ is almost never pronounced, being replaced by /ʃ/ or /dʒʰ/.^[17]
- /pʰ/ is starting to merge into /f/ in a number of Hindustani dialects.
- Sanskrit ṛ is borrowed into Hindustani as /rɪ/, but is pronounced more like /ru/ in languages like Marathi.
Monophthongization of ai to /ɛː ~ æː/ and au to /ɔː/ in many non-Eastern dialects.^[18] A separate /æː/ arguably exists in Hindustani through English loanwords.
Shifts before /ɦ/: Before h followed by a short vowel or a deleted schwa, short a shifts allophonically to short [ɛ], or to [ɔ] if the following short vowel is u. This change is part of the prestige dialect of Delhi, but may not occur to the full degree for every speaker. Often, the process goes further: the short vowel after /ɦ/ assimilates to [ɛ] or [ɔ], and then /ɦ/ is lost and the vowels coalesce into long /ɛː/ or /ɔː/. In some cases, different inflections of the same word have differing outcomes.^[18]
- Hindustani bahut /bǝ.ɦʊt̪/ > [bɔ.ɦʊt̪] > [bɔ.ɦɔt̪] > [bɔːt̪] "a lot, many"
- Hindustani pahlā /pǝɦ.läː/ > [pɛɦ.läː] > [pɛː.läː] "first"
- Hindustani bahan /bǝ.ɦǝn/ > [bɛ.ɦǝn] > [bɛ.ɦɛn] > [bɛːn] "sister"
- Hindustani kahnā /kǝɦ.näː/ > [kɛɦ.näː] > [kɛː.näː] "to say", but kahegā "he will say" is still pronounced [kǝ.ɦeː.gäː]
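
The three steps of the shift before /ɦ/ can be sketched as staged regular-expression rewrites. The transcription here is simplified as an assumption of the sketch: syllable dots, the dental diacritic, and the centralization mark on ä are dropped, long vowels are written with ː, and any symbol outside the listed vowel set counts as a consonant. The name `h_shift_stages` is hypothetical, and real speakers may stop at any intermediate stage.

```python
import re

VOWELS = "əɪʊaeiouɛɔ"
NOT_LONG = "(?!ː)"             # the preceding vowel must be short
CONS = f"[^{VOWELS}ː]"         # assumption: anything else is a consonant

STAGES = [
    # 1. ə shifts before ɦ + short vowel or ɦ + consonant (deleted schwa)
    [(rf"ə(?=ɦʊ{NOT_LONG})", "ɔ"),
     (rf"ə(?=ɦ(?:[əɪ]{NOT_LONG}|{CONS}))", "ɛ")],
    # 2. the short vowel after ɦ assimilates to the shifted vowel
    [(rf"(?<=ɛɦ)[əɪ]{NOT_LONG}", "ɛ"),
     (rf"(?<=ɔɦ)ʊ{NOT_LONG}", "ɔ")],
    # 3. ɦ is lost and the vowels coalesce into a long vowel
    [(r"ɛɦɛ", "ɛː"), (r"ɔɦɔ", "ɔː"), (rf"ɛɦ(?={CONS})", "ɛː")],
]

def h_shift_stages(word: str) -> list[str]:
    """Return the distinct forms a word passes through, stage by stage."""
    forms = [word]
    for rules in STAGES:
        new = forms[-1]
        for pattern, repl in rules:
            new = re.sub(pattern, repl, new)
        if new != forms[-1]:
            forms.append(new)
    return forms

for w in ["bəɦʊt", "pəɦlaː", "bəɦən", "kəɦnaː", "kəɦeːgaː"]:
    print(" > ".join(h_shift_stages(w)))
```

Because the first rule only fires before a short vowel or a consonant, the long eː of kahegā blocks the shift, which matches the differing outcomes within one paradigm noted above.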

Examples of sound changes

The following table shows a possible sequence of changes for some basic vocabulary items, leading from Sanskrit to Modern Hindustani. All entries are romanized. An empty cell means no change at the given stage for the given item. Only sound changes that had an effect on one or more of the vocabulary items are shown. Words may not be attested at each stage.


Gloss	juhi	tiger	donkey	dusky	it grows	two and a half	to support
Sanskrit (nominative)	yūthikā	vyāghraḥ	gardabhakaḥ	śyāmalakaḥ	utpadyati	ardhatṛtīyaḥ	sambhālanam
Sandhi (e.g. final -aḥ > -ō)		vyāghrō	gardabhakō	śyāmalakō		ardhatṛtīyō	saṃbhālanaṃ
Early Cerebralization						arḍhatṛtīyō
Loss of ṛ						arḍhatatīyō
Sibilant merger				syāmalakō
C + y, s palatalization					utpajyati
Initial cluster simplif.		vāghrō		sāmalakō
Two-mora rule		vaghrō
Medial cluster simplif.		vagghō	gaddabhakō		uppajjati	aḍḍhatatīyō
Pali	yūthikā	vagghō	gaddabhakō	sāmalakō	uppajjati	aḍḍhatatīyō	saṃbhālanaṃ
Init. y > j, med. yy > jj	jūthikā
Merging of nasals							saṃbhālaṇaṃ
Intervocalic lenitions	jūhiā		gaddahaō	sāmalaō	uppajjaï	aḍḍhaaīō
Pleonastic suffix additions							saṃbhālaṇaō
Prakrit	jūhiā	vagghō	gaddahaō	sāmalaō	uppajjaï	aḍḍhaaīō	saṃbhālaṇaō
-VmV- > -VṃvV-				saṃvalaō
Shorten final long vowels	jūhia	vagghu	gaddahaü	saṃvalaü		aḍḍhaaīu	saṃbhālaṇaü
Positional stress	jū́hia	vágghu	gáddahaü	sáṃvalaü	uppájjaï	aḍḍháaīu	saṃbhā́laṇaü
Dentalization of ṇ, ḷ							saṃbhā́lanaü
vv > bb and initial v > b		bágghu
Vowels in hiatus coalesce	jū́hī		gáddahau	sáṃvalau		aḍḍhā́ī	saṃbhā́lanau
VCː > VːC or VṃC > ṼːC		bā́ghu	gā́dahau	sā̃valau	ūpā́jaï	āḍhā́ī	sā̃bhā́lanau
Pre/post-tonic vowel shortens					upā́jaï	aḍhā́ī	sãbhā́lanau
Word rhythm shortening			gádahau		upájaï
Final nominative -au > -ā			gádahā	sā̃valā			sãbhā́lanā
Final short vowels > /ǝ/		bā́gha
Old Hindi	jūhī	bāgha	gadahā	sā̃valā	upajai	aḍhāī	sãbhālanā
Final -ai, -au > -e, -o					upaje
Schwa deletion		bāgh	gadhā	sā̃vlā	upje		sãbhālnā
Unstressed initial vowel loss						ḍhāī
-Ṽbh-, -Ṽb- > -Vmh-, -Vm-							samhālnā
Hindustani Romanized	jūhī	bāgh	gadhā	sā̃vlā	upje	ḍhāī	samhālnā
Hindustani Devanagari	जूही	बाघ	गधा	साँवला	उपजे	ढाई	सम्हालना
Hindustani Urdu	جوہی	باگھ	گدھا	سانولا	اپجے	ڈھائی	سمہالنا
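
Since an empty cell in the table means "no change at this stage", the full form of a word at any stage can be recovered by carrying the last non-empty cell forward. The sketch below does this for an excerpt of the "tiger" and "donkey" columns. The function name `fill_forward` and the row layout are assumptions for the illustration, and each row is assumed to be padded with empty strings to the full number of columns.

```python
def fill_forward(rows):
    """rows: list of (stage, [cell, ...]); '' means no change at that stage.

    Returns the same rows with every empty cell replaced by the most
    recent non-empty form in that column.
    """
    filled, current = [], None
    for stage, cells in rows:
        if current is None:
            current = list(cells)   # the first row supplies the starting forms
        else:
            current = [new or old for new, old in zip(cells, current)]
        filled.append((stage, list(current)))
    return filled

# Excerpt of the table above: columns "tiger" and "donkey"
ROWS = [
    ("Sanskrit (nominative)", ["vyāghraḥ", "gardabhakaḥ"]),
    ("Sandhi", ["vyāghrō", "gardabhakō"]),
    ("Initial cluster simplif.", ["vāghrō", ""]),
    ("Two-mora rule", ["vaghrō", ""]),
    ("Medial cluster simplif.", ["vagghō", "gaddabhakō"]),
]

for stage, forms in fill_forward(ROWS):
    print(stage, forms)
```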

References

^ "A Guide to Hindi". BBC - Languages - Hindi. BBC. Retrieved 11 December 2015.
^ Kumar, Nitin (28 June 2011). "Hindi & Its Origin". Hindi Language Blog. Retrieved 11 December 2015.
^ Masica, Colin P. (1993). The Indo-Aryan Languages. Cambridge University Press. ISBN 978-0-521-29944-2.
^ Grierson, George (1920). "Indo-Aryan Vernaculars (Continued)". Bulletin of the School of Oriental Studies. 3 (1): 51–85. doi:10.1017/S0041977X00087152. S2CID 161798254. at pp. 67-69.
^ Turner, Ralph Lilley, ed. (1969–1985). A Comparative Dictionary of the Indo-Aryan Languages. London: Oxford University Press. p. 599. OCLC 503920810.
^ ^a ^b J. Bloch (1970). Formation of the Marathi Language. Motilal Banarsidass. pp. 33, 180. ISBN 978-81-208-2322-8.
^ ^a ^b ^c ^d Turner, Ralph Lilley (1975). Collected Papers, 1912-1973. Oxford University Press. ISBN 9780197135822.
^ ^a ^b ^c ^d ^e ^f ^g ^h ^i Masica, Colin P. (1993). The Indo-Aryan Languages. Cambridge University Press. p. 167. ISBN 978-0-521-29944-2.
^ Kobayashi, Masato (2004). Historical Phonology of Old Indo-Aryan Consonants. Study of Languages and Cultures of Asia and Africa Monograph Series. Vol. 42. pp. 60–65. ISBN 4-87297-894-3.
^ J. Bloch (1970). Formation of the Marathi Language. Motilal Banarsidass. p. 6. ISBN 978-81-208-2322-8.
^ J. Bloch (1970). Formation of the Marathi Language. Motilal Banarsidass. pp. 129, 130. ISBN 978-81-208-2322-8.
^ ^a ^b https://prakrit.info/prakrit/grammar.html?r=phonology
^ "The -kk- verbal extension in Indo-Aryan". 3 May 2022.
^ Jaroslav Strnad (2013). Morphology and syntax of Old Hindī: edition and analysis of one hundred Kabīr vānī poems from Rājasthān. Brill. p. 191.
^ Thomas Oberlies (2005). A Historical Grammar of Hindi. Leykam. p. 5.
^ Jaroslav Strnad (2013). Morphology and syntax of Old Hindī: edition and analysis of one hundred Kabīr vānī poems from Rājasthān. Brill. p. 384.
^ Shapiro 2003, p. 260.
^ ^a ^b Shapiro 1989, pp. 9–21.