Simulatte's synthetic India general population was tested against 15 published Pew India survey questions spanning democracy satisfaction, party approval, governance preferences, institutional trust, religion, gender norms, and climate. This was the most technically complex study in the program.
40 personas calibrated across religion (Hindu/Muslim/Sikh/Christian), caste (General/OBC/SC/ST), region (North/South/West/East), and political lean. 22 development sprints from a 45.9% baseline. Final accuracy: 85.3% — 5.7 pp from the theoretical human ceiling.
India's political landscape requires calibration across four intersecting dimensions simultaneously. Unlike the US study where demographic and political lean correlate more predictably, Indian opinion is shaped by religion, caste, region, and political lean in combinations that do not reduce to a single axis.
The pool was calibrated to Pew Global Attitudes Spring 2023 BJP favorability data. Fourteen bjp_supporter personas (35%) are required to reach Pew's ~42% BJP approval distribution; the original pool of 7 bjp_supporter personas (18%) imposed a structural ceiling on in02/in03.
| Religion | Approx. share | Calibration source |
|---|---|---|
| Hindu | 75% | Census + Pew 2023 |
| Muslim | 13% | Census 2011 |
| Sikh | 5% | Census 2011 |
| Christian | 7% | Census 2011 |
| Caste category | Approx. share |
|---|---|
| General | 25% |
| OBC | 40% |
| SC (Dalit) | 20% |
| ST (Tribal) | 15% |
At n=40 each persona represents 2.5% of the simulated distribution. A change of one persona's political lean shifts every question's distribution by 2.5 pp. Pool composition is therefore both a calibration tool and a source of structural constraint — certain question accuracies are mathematically bounded by how many BJP-voting vs. opposition-voting personas are in the pool.
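The granularity arithmetic can be sketched directly. This is an illustrative calculation, not Simulatte's code; the function name and pool representation are invented for this sketch:

```python
# Sketch of pool-granularity arithmetic at n=40: each persona
# contributes 1/40 = 2.5 pp to every question's answer distribution.
from collections import Counter

POOL_SIZE = 40
GRANULARITY = 100 / POOL_SIZE  # 2.5 pp per persona

def distribution(answers):
    """Answer shares in percentage points for a list of per-persona answers."""
    counts = Counter(answers)
    return {opt: counts[opt] * GRANULARITY for opt in sorted(counts)}

# Flipping a single persona's answer moves two options by exactly 2.5 pp each.
before = ["A"] * 14 + ["B"] * 26
after = ["A"] * 15 + ["B"] * 25
print(distribution(before))  # {'A': 35.0, 'B': 65.0}
print(distribution(after))   # {'A': 37.5, 'B': 62.5}
```

This is why pool composition doubles as a calibration tool: any target share that is not a multiple of 2.5 pp is unreachable exactly, and any share that requires more personas of a given lean than the pool contains is unreachable at all.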
15 questions from Pew Global Attitudes 2023, Pew Religion in India 2021, and Pew Gender Roles India 2022. Questions marked RLHF are constrained by model alignment training independent of persona calibration.
Green bar = ≥90% · Standard bar = 80–90% · Grey bar = <80% · RLHF = alignment-constrained floor
Six questions showing the range from near-perfect calibration to RLHF-constrained floors. Green bars = Simulatte. Light bars = Pew Research ground truth.
The A overshoot (+19 pp) is an RLHF structural floor: although 14 bjp_supporter personas are calibrated to treat "no parliament" as abstractly good, the model will not follow them, and 22 sprints of calibration could not reduce A below 62.5%.
The A overshoot is driven by the bjp_supporter institutional-trust floor. The B-modal pattern in the Pew data (48% B) reflects a nuanced trust/concern split that BJP-voting personas collapse into outright trust.
Before A-22: B = 2% (Pew 11%). Behavioral anchoring ("daily schedule" vs. "secular identity") moved B to 7.5%, a gain of roughly +5 pp.
Study 1B required more than twice as many sprints as Study 1A to reach a comparable accuracy level, reflecting the greater structural complexity of the India dataset and the impact of the A-9 root cause fix, the largest single-sprint gain in the entire program. (Δ is measured against the immediately preceding sprint, including sprints omitted from the table.)
| Sprint | Score | Δ | Key event |
|---|---|---|---|
| A-2 | 45.9% | — | Broken archetype mapping (all personas → moderate) |
| A-9 | 83.3% | +29.9 | Root cause fix: India archetype mapping bug |
| A-10 | 84.6% | +1.3 | Spread notes: in14/in06/in11/in02/in03/in12 |
| A-11 | 84.8% | +0.2 | in01/in08 spread notes; in14/in06 strengthened |
| A-12 | 85.0% | +0.2 | Pool rebalance: bjp_supporter 18%→35% |
| A-14 | 80.8% | −4.2 | First true bjp_supporter pool — new calibration challenges |
| A-15 | 81.6% | +0.8 | INC conviction; in07/in13 spread notes |
| A-17 | 79.9% | −0.2 | Trust 0.68 → bimodal collapse on in09 |
| A-18 | 83.4% | +3.5 | Trust fix; in15 "major threat ≠ development priority" (+25pp) |
| A-20 | 83.8% | +1.5 | in13 rebalanced; structural ceiling identified |
| A-21 | 83.1% | −0.7 | bjp_lean democratic narrative; sampling variance |
| A-22 | 85.3% | +2.2 | Pool recomposition (opposition_lean 6→3); in11 behavioral anchor |
Study 1B produced a systematic finding with implications beyond Simulatte. Anthropic's Constitutional AI and RLHF training creates behavioral blocks on outputs that endorse bypassing democratic accountability or gender discrimination. These blocks operate downstream of persona stance fields: the model reads the persona's position, then produces an output inconsistent with it.
This is not a Simulatte-specific finding. It applies to any LLM used as a survey respondent where the question content conflicts with alignment training. Cross-cultural social science applications should audit their question set for RLHF-blocked constructs before reporting accuracy claims.
Anthropic's Constitutional AI training creates two independent blocks:
Block 1: endorsing governance by a strong leader who bypasses parliament. Affects in07. The model reads the bjp_supporter persona's support for this arrangement, then refuses to endorse it and instead explains why parliamentary oversight is important. This block was robust across all 22 sprints; no narrative framing produced a consistent breakthrough.
Block 2: endorsing gender-discriminatory norms (in12, in13, in14). This block is softer than Block 1 and partially addressable. When gender norms are framed at narrative generation time as dharma, Islamic teaching, or cultural community belonging, rather than as "discrimination", partial breakthrough is possible. The in14 ceiling (always 100% "very important") is not addressable.
Narrative-level intervention worked where survey-time framing failed.
Embedding traditional gender norms at narrative generation time as a genuine cultural or religious identity — framed through dharma, Islamic teaching, or joint family obligation — moved the score from 0% to 55–65%. The persona narrative must contain the stance as an identity claim, not as a policy position. When the survey model encounters "discrimination," it triggers the RLHF block. When it encounters "my family follows dharma," it can engage authentically.
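The contrast between the two framings can be shown as persona data. The field names and text below are invented for illustration and are not Simulatte's actual schema:

```python
# Hypothetical persona fields contrasting the two framings.
# Policy-level framing: the survey model reads this as endorsing
# "discrimination" and the RLHF block fires.
blocked_framing = {
    "stance": "believes women's workforce participation should be restricted",
}

# Identity-level framing: the same position embedded at narrative
# generation time as a cultural/religious identity claim, which the
# model can engage with in character.
working_framing = {
    "narrative": (
        "Our joint family follows dharma. My wife and I share the view "
        "that her first duty is to the household, as her mother's was."
    ),
}

print("stance" in blocked_framing, "dharma" in working_framing["narrative"])
```

The design point is that the stance lives inside the narrative as "who I am", never as a free-standing policy field the survey model must explicitly endorse.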
23 separate narrative framings were tested for the strong leader question. Democratic mandate framing, national security framing, development-efficiency framing, and historical context framing were all attempted. None moved A below 62.5%. The RLHF block on "endorsing non-democratic governance" is more robust than the gender-norms block.
The single largest gain across both studies. _ARCHETYPE_TO_LEAN in attribute_filler.py lacked India archetype entries, so all India personas silently mapped to political_lean="moderate" for the first 8 sprints. Every political lean gate, stance field, and narrative constraint returned neutral values — equivalent to running with no political calibration at all. Fix: _get_political_lean() now reads directly from the demographic anchor's political profile for India personas, bypassing the broken lookup table entirely.
India's political distribution has no US equivalent. With 7 bjp_supporter personas (18%), the maximum achievable A+B on in02 (Modi approval) was mathematically bounded below Pew's 56% A. Rebalancing to 14 bjp_supporter personas (35%) added 17–22 pp to in02, in03, and in12 simultaneously in a single sprint. Sprint A-22 further converted 3 opposition_lean personas (Birsa Munda, Ramesh Chamar, Thomas Mathew) to neutral, each representing a demographic with genuinely mixed BJP-era alignment, which reduced the in09 C-floor by 7.5 pp.
The largest single-question gain after the root cause fix. India personas systematically chose B ("somewhat of a threat") because they interpreted "major threat" as implying climate should be prioritised over development. The spread note fix explicitly decoupled the two concepts: a farmer voting BJP can say "major threat" because his crops fail from monsoon disruption — this does not mean he wants development paused. The fix is qualitatively different from option-vocabulary anchoring: it addresses what the question means in context, not just what the response options say.
Before A-22, spread notes framed religion importance in terms of "secular identity" — causing personas to assert their religious identity and override the note (B stuck at 2% vs Pew 11%). The distinction: identity-level framing activates narrative override because the persona's narrative says "I am a devout Hindu" and the model interprets the note as questioning that identity. Rewriting the note around behavioral patterns — "is your daily routine primarily structured around religious observance, or around career and family?" — avoids triggering the override and moved B from 2% to 7.5%.
In02/in03 (Modi and BJP approval) were relatively straightforward to calibrate once the pool had sufficient BJP supporters. In04 (INC approval) was harder: bjp_supporter personas needed to express strong negative conviction about Congress — not just weak preference for BJP. Adding explicit INC conviction language ("frustrated with Congress's legacy", "believes Congress failed India's development") as a dedicated narrative field gave bjp_supporters the strong negative INC signal they needed without affecting their positive BJP signals. Without this asymmetry, BJP supporters tended to produce moderate INC approval instead of the D-modal distribution Pew observed.
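The asymmetry can be sketched as persona data. The field name `inc_conviction` is invented for illustration; the conviction phrases are the ones quoted in the study notes:

```python
# Sketch of the asymmetric-conviction pattern for in04 (INC approval).
persona = {
    "political_lean": "bjp_supporter",
    "party_stance": "strongly approves of the BJP's development record",
    # Dedicated negative field: strong anti-INC conviction, kept separate
    # from the positive BJP signal so neither dilutes the other.
    "inc_conviction": (
        "frustrated with Congress's legacy; believes Congress failed "
        "India's development"
    ),
}

print(persona["inc_conviction"])
```

Without the dedicated field, "prefers BJP" reads to the survey model as mild indifference toward Congress, producing moderate INC approval instead of Pew's D-modal distribution.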
```bash
# Study 1B — India Pew Replication
cd study_1b_pew_india
python3 run_study.py --simulatte-only

# Reference cohorts (Sprint A-22)
# Cohort 1: 6025615a
# Cohort 2: 01de2a63

# Check audit manifests
ls results/audit_manifest_a*.json
```