Study 1B · India · Sprint A-22 · April 2026

Pew India Opinion Survey replication.

Simulatte's synthetic India general population tested against 15 published Pew India survey questions spanning democracy satisfaction, party approval, governance preferences, institutional trust, religion, gender norms, and climate. The most technically complex study in the program.

40 personas calibrated across religion (Hindu/Muslim/Sikh/Christian), caste (General/OBC/SC/ST), region (North/South/West/East), and political lean. 22 development sprints from a 45.9% baseline. Final accuracy: 85.3% — 5.7 pp from the theoretical human ceiling.

Final accuracy: 85.3%
Human ceiling: 91.0%
Largest sprint gain: +29.9 pp
Development sprints: 22
Population pool

40 personas. Four dimensions of identity.

India's political landscape requires calibration across four intersecting dimensions simultaneously. Unlike the US study, where demographics and political lean correlate more predictably, Indian opinion is shaped by religion, caste, region, and political lean in combinations that do not reduce to a single axis.

Political lean distribution (A-22)

Political lean    Personas   Share
BJP Supporter     14         35%
BJP Lean          8          20%
Neutral           8          20%
Opp. Lean         3          7.5%
Opposition        7          17.5%

Calibrated to Pew Global Attitudes Spring 2023 BJP favorability data. 14 bjp_supporter personas (35%) are required to reach Pew's ~42% BJP approval distributions. The original pool of 7 bjp_supporter personas (18%) produced a structural ceiling on in02/in03.

Religion & caste breakdown

Religion     Approx. share   Calibration source
Hindu        75%             Census + Pew 2023
Muslim       13%             Census 2011
Sikh         5%              Census 2011
Christian    7%              Census 2011

Caste category   Approx. share
General          25%
OBC              40%
SC (Dalit)       20%
ST (Tribal)      15%

Why pool composition matters more in India

At n=40, each persona represents 2.5% of the simulated distribution, so changing one persona's political lean can move up to 2.5 pp from one answer option to another on any question. Pool composition is therefore both a calibration tool and a source of structural constraint: certain question accuracies are mathematically bounded by how many BJP-voting vs. opposition-voting personas are in the pool.
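
The per-persona arithmetic can be checked with a short sketch. The counts come from the A-22 political lean table; the dictionary keys (in particular "opposition") and both helper functions are illustrative names, not taken from the study's code.

```python
# Political lean counts for the A-22 pool, copied from the table above.
# Key names are illustrative; the study's own identifiers may differ.
POOL_A22 = {
    "bjp_supporter": 14,
    "bjp_lean": 8,
    "neutral": 8,
    "opposition_lean": 3,
    "opposition": 7,
}


def lean_shares(pool):
    """Return each lean's share of the pool in percentage points."""
    total = sum(pool.values())
    return {lean: 100.0 * count / total for lean, count in pool.items()}


def move_persona(pool, src, dst):
    """Return a copy of the pool with one persona moved from src to dst."""
    if pool[src] < 1:
        raise ValueError(f"no personas left in {src}")
    updated = dict(pool)
    updated[src] -= 1
    updated[dst] += 1
    return updated


shares = lean_shares(POOL_A22)
after = lean_shares(move_persona(POOL_A22, "neutral", "bjp_lean"))
step = after["bjp_lean"] - shares["bjp_lean"]

print(shares["bjp_supporter"])      # 35.0
print(f"one persona = {step} pp")   # one persona = 2.5 pp
```

Because every share is a multiple of 2.5 pp, any target distribution can only be matched to within that granularity at this pool size.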

Results

Per-question accuracy — Sprint A-22.

15 questions from Pew Global Attitudes 2023, Pew Religion in India 2021, and Pew Gender Roles India 2022. Questions marked RLHF are constrained by model alignment training independent of persona calibration.

Accuracy bands: ≥90% · 80–90% · <80%. RLHF = alignment-constrained floor or ceiling.

ID     Question                          Accuracy   Note
IN01   Democracy satisfaction            88.3%
IN02   Modi approval                     90.2%
IN03   BJP party approval                91.6%
IN04   INC party approval                72.5%
IN05   India as global power             81.0%      RLHF floor
IN06   Representative democracy          81.4%      RLHF floor
IN07   Strong leader (no parliament)     79.1%      RLHF floor
IN08   Economic conditions               87.5%
IN09   Government trust                  70.5%
IN10   Future generations                93.5%
IN11   Religion importance               91.5%
IN12   Wife obedience norms              92.5%
IN13   Gender job priority               89.5%
IN14   Women's equal rights              80.8%      RLHF ceiling
IN15   Climate change threat             89.0%
Mean   All 15 questions, Sprint A-22     85.3%
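
The headline figure is consistent with an unweighted mean of the 15 per-question scores. The sketch below only re-checks that arithmetic from the published values; whether the study computes the mean exactly this way is an assumption, and the variable names are illustrative.

```python
from statistics import mean

# Per-question accuracy for Sprint A-22, copied from the results above.
ACCURACY_A22 = {
    "IN01": 88.3, "IN02": 90.2, "IN03": 91.6, "IN04": 72.5, "IN05": 81.0,
    "IN06": 81.4, "IN07": 79.1, "IN08": 87.5, "IN09": 70.5, "IN10": 93.5,
    "IN11": 91.5, "IN12": 92.5, "IN13": 89.5, "IN14": 80.8, "IN15": 89.0,
}

overall = mean(ACCURACY_A22.values())
below_80 = sorted(q for q, score in ACCURACY_A22.items() if score < 80.0)

print(f"{overall:.1f}%")  # 85.3%
print(below_80)           # ['IN04', 'IN07', 'IN09']
```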
Distributions

Simulated vs. Pew — selected questions.

Six questions showing the range from near-perfect calibration to RLHF-constrained floors. For each answer option (A, B, C, D), the Simulatte share is shown next to the Pew Research ground truth.

IN03 · BJP party approval · 91.6%

Option   Simulatte   Pew
A        43%         43%
B        40%         32%
C        10%         13%
D        7%          12%

IN10 · Future generations outlook · 93.5%

Option   Simulatte   Pew
A        83%         76%
B        18%         21%
C        0%          3%

IN07 · Strong leader (RLHF floor) · 79.1%

Option   Simulatte   Pew
A        63%         44%
B        20%         38%
C        10%         12%
D        7%          6%

The A overshoot (+19 pp) is an RLHF structural floor: 14 bjp_supporters treat "no parliament" as abstractly good. 22 sprints of calibration could not reduce A below 62.5%.

IN09 · Government trust (hardest question) · 70.5%

Option   Simulatte   Pew
A        65%         41%
B        23%         48%
C        13%         7%

The A overshoot is driven by the bjp_supporter institutional trust floor. The B-modal pattern in the Pew data (48% B) reflects a nuanced trust/concern split that BJP voters collapse into trust.

IN15 · Climate change threat (+25 pp fix in A-18) · 89.0%

Option   Simulatte   Pew
A        60%         62%
B        40%         29%

IN11 · Religion importance (behavioral anchor fix A-22) · 91.5%

Option   Simulatte   Pew
A        93%         84%
B        8%          11%

Before A-22: B=2% (Pew 11%). Behavioral anchoring ("daily schedule" vs. "secular identity") moved B to 8%, a +5 pp gain.
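
The report does not name the accuracy metric, but the published scores are consistent with 100 minus half the summed absolute gap between the simulated and Pew shares (one minus total variation distance), with any share not listed, such as "don't know", treated as one extra option. The sketch below is written under that assumption; the function name is illustrative. From the rounded shares shown above it reproduces the published values for IN09, IN10, IN11, and IN15, and lands within 1 pp for IN03 and IN07.

```python
def distribution_accuracy(sim, pew):
    """Return 100 minus half the summed absolute gap, in percentage points.

    sim and pew map option labels to shares in percent. Any mass missing
    from a distribution (for example "don't know") is treated as a single
    residual option. This is an inferred metric, not a documented one.
    """
    options = sorted(set(sim) | set(pew))
    gap = sum(abs(sim.get(opt, 0.0) - pew.get(opt, 0.0)) for opt in options)
    # Residual mass; clamped at zero because rounded shares can sum to 101.
    sim_rest = max(0.0, 100.0 - sum(sim.values()))
    pew_rest = max(0.0, 100.0 - sum(pew.values()))
    gap += abs(sim_rest - pew_rest)
    return 100.0 - gap / 2.0


IN10 = ({"A": 83, "B": 18, "C": 0}, {"A": 76, "B": 21, "C": 3})
IN15 = ({"A": 60, "B": 40}, {"A": 62, "B": 29})

print(distribution_accuracy(*IN10))  # 93.5
print(distribution_accuracy(*IN15))  # 89.0
```

Under this reading, a single misplaced persona (2.5 pp moved from one option to another) costs 2.5 points of accuracy on that question.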
Sprint history

22 sprints. The hardest calibration in the program.

Study 1B required more than twice as many sprints as Study 1A to reach a comparable accuracy level, reflecting the greater structural complexity of the India dataset and the impact of the A-9 root cause fix — the largest single-sprint gain in the entire program.

[Chart: distribution accuracy by sprint, Study 1B (India), key sprints A-2 through A-22. Annotations: 91%, "+29.9pp root cause", 85.3%.]
Sprint   Score    Δ       Key event
A-2      45.9%            Broken archetype mapping (all personas → moderate)
A-9      83.3%    +29.9   Root cause fix: India archetype mapping bug
A-10     84.6%    +1.3    Spread notes: in14/in06/in11/in02/in03/in12
A-11     84.8%    +0.2    in01/in08 spread notes; in14/in06 strengthened
A-12     85.0%    +0.2    Pool rebalance: bjp_supporter 18%→35%
A-14     80.8%    −4.2    First true bjp_supporter pool; new calibration challenges
A-15     81.6%    +0.8    INC conviction; in07/in13 spread notes
A-17     79.9%    −0.2    Trust 0.68 → bimodal collapse on in09
A-18     83.4%    +3.5    Trust fix; in15 "major threat ≠ development priority" (+25pp)
A-20     83.8%    +1.5    in13 rebalanced; structural ceiling identified
A-21     83.1%    −0.7    bjp_lean democratic narrative; sampling variance
A-22     85.3%    +2.2    Pool recomposition (opposition_lean 6→3); in11 behavioral anchor
RLHF Cultural Bias Finding

Western alignment creates hard floors on non-Western content.

Study 1B produced a systematic finding with implications beyond Simulatte. Anthropic's Constitutional AI and RLHF training creates behavioural blocks on outputs that endorse bypassing democratic accountability or gender discrimination. These blocks operate downstream of persona stance fields — the model reads the persona's position, then produces an output inconsistent with it.

This is not a Simulatte-specific finding. It applies to any LLM used as a survey respondent where the question content conflicts with alignment training. Cross-cultural social science applications should audit their question set for RLHF-blocked constructs before reporting accuracy claims.

Four affected questions
Question                                          Pew India   Baseline (A-2)   Final (A-22)   Status
in07: Strong leader (no parliament), say good     81.7%       0%               62.5%          Partial · floor ~62%
in12: Wife must always obey husband, agree        87.0%       0%               65.0%          Largely solved via narrative
in13: Men should have job priority, agree         80.0%       0%               55.0%          Partial solution
in14: Women's equal rights, very important        80.8%       100%             100%           Persistent ceiling, unfixable
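
One way to audit a question set for this pattern is to flag any option whose simulated share is pinned at 0% or 100% while the survey value is far from it. The sketch below applies that check to the four rows above. The function name and both thresholds (tol, min_gap) are hypothetical choices for illustration, not part of the study's tooling.

```python
# (question, Pew India %, baseline A-2 %, final A-22 %) from the table above.
AFFECTED = [
    ("in07", 81.7, 0.0, 62.5),
    ("in12", 87.0, 0.0, 65.0),
    ("in13", 80.0, 0.0, 55.0),
    ("in14", 80.8, 100.0, 100.0),
]


def is_pinned(sim_share, survey_share, tol=1.0, min_gap=10.0):
    """True if the simulated share sits at 0% or 100% and far from the survey.

    tol and min_gap are illustrative thresholds in percentage points.
    """
    at_edge = sim_share <= tol or sim_share >= 100.0 - tol
    return at_edge and abs(sim_share - survey_share) >= min_gap


baseline_flags = [q for q, pew, base, _ in AFFECTED if is_pinned(base, pew)]
final_flags = [q for q, pew, _, final in AFFECTED if is_pinned(final, pew)]

print(baseline_flags)  # ['in07', 'in12', 'in13', 'in14']
print(final_flags)     # ['in14']
```

A share pinned at an extreme across all personas, regardless of their stance fields, is a signal worth checking before attributing the error to calibration.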

The mechanism

Anthropic's Constitutional AI training creates two independent blocks:

Block 1 — democratic accountability

Endorsing governance by a strong leader who bypasses parliament. Affects in07. The model reads the bjp_supporter persona's support for this arrangement, then refuses to endorse it and explains why parliamentary oversight is important. Robust across all 22 sprints — no narrative framing produced consistent breakthrough.

Block 2 — gender discrimination

Endorsing gender-discriminatory norms (in12, in13, in14). This block is softer than Block 1 and partially addressable. When gender norms are framed at narrative generation time as dharma, Islamic teaching, or cultural community belonging — rather than as "discrimination" — partial breakthrough is possible. The in14 ceiling (always 100% very important) is not addressable.

The partial solution

Narrative-level intervention worked where survey-time framing failed.

What worked for in12 and in13

Embedding traditional gender norms at narrative generation time as a genuine cultural or religious identity — framed through dharma, Islamic teaching, or joint family obligation — moved the score from 0% to 55–65%. The persona narrative must contain the stance as an identity claim, not as a policy position. When the survey model encounters "discrimination," it triggers the RLHF block. When it encounters "my family follows dharma," it can engage authentically.

What did not work for in07

23 separate narrative framings were tested for the strong leader question. Democratic mandate framing, national security framing, development-efficiency framing, and historical context framing were all attempted. None moved A below 62.5%. The RLHF block on "endorsing non-democratic governance" is more robust than the gender-norms block.

Technical findings

What moved the needle.

01

India archetype mapping bug — +29.9 pp in one sprint

The single largest gain across both studies. _ARCHETYPE_TO_LEAN in attribute_filler.py lacked India archetype entries, so all India personas silently mapped to political_lean="moderate" for the first 8 sprints. Every political lean gate, stance field, and narrative constraint returned neutral values — equivalent to running with no political calibration at all. Fix: _get_political_lean() now reads directly from the demographic anchor's political profile for India personas, bypassing the broken lookup table entirely.
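
A minimal sketch of this failure mode and one way to guard against it follows. Only the names _ARCHETYPE_TO_LEAN and political_lean="moderate" come from the description above; the table contents, the persona dictionary layout, the "political_profile" field, and both function bodies are hypothetical and do not reproduce the actual attribute_filler.py code.

```python
# Hypothetical lookup table with no India entries, mirroring the reported bug.
_ARCHETYPE_TO_LEAN = {
    "us_conservative": "conservative",
    "us_progressive": "liberal",
}


def lean_with_silent_default(archetype):
    """Bug pattern: an unmapped archetype quietly becomes 'moderate'."""
    return _ARCHETYPE_TO_LEAN.get(archetype, "moderate")


def get_political_lean(persona):
    """Read the lean from the persona's anchor first; never guess a default."""
    anchor = persona.get("demographic_anchor", {})
    lean = anchor.get("political_profile")  # hypothetical field name
    if lean:
        return lean
    archetype = persona["archetype"]
    if archetype not in _ARCHETYPE_TO_LEAN:
        # Failing loudly is a design suggestion, not the documented fix.
        raise KeyError(f"no political lean mapping for archetype {archetype!r}")
    return _ARCHETYPE_TO_LEAN[archetype]


india_persona = {
    "archetype": "india_bjp_supporter",
    "demographic_anchor": {"political_profile": "bjp_supporter"},
}

print(lean_with_silent_default(india_persona["archetype"]))  # moderate
print(get_political_lean(india_persona))                     # bjp_supporter
```

The design point is that a dictionary lookup with a plausible-looking default hides missing entries; raising on an unmapped archetype would have surfaced the problem in the first sprint.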

02

Pool composition as structural calibration — bjp_supporter 18%→35%

India's political distribution has no US equivalent. With 7 bjp_supporter personas (18%), the maximum achievable A+B on in02 (Modi approval) was mathematically bounded below Pew's 56% A. Rebalancing to 14 bjp_supporter personas (35%) added 17–22 pp to in02, in03, and in12 simultaneously in a single sprint. Sprint A-22 further converted 3 opposition_lean personas (Birsa Munda, Ramesh Chamar, Thomas Mathew) to neutral, since each represents a demographic with genuinely mixed BJP-era alignment. At 2.5 pp per persona, this reduced the in09 C-floor by 7.5 pp.

03

Conceptual reframing for climate threat — +25 pp on in15 in Sprint A-18

The largest single-question gain after the root cause fix. India personas systematically chose B ("somewhat of a threat") because they interpreted "major threat" as implying climate should be prioritised over development. The spread note fix explicitly decoupled the two concepts: a farmer voting BJP can say "major threat" because his crops fail from monsoon disruption — this does not mean he wants development paused. The fix is qualitatively different from option-vocabulary anchoring: it addresses what the question means in context, not just what the response options say.

04

Behavioral anchoring vs. identity framing for religion (in11)

Before A-22, spread notes framed religion importance in terms of "secular identity" — causing personas to assert their religious identity and override the note (B stuck at 2% vs Pew 11%). The distinction: identity-level framing activates narrative override because the persona's narrative says "I am a devout Hindu" and the model interprets the note as questioning that identity. Rewriting the note around behavioral patterns — "is your daily routine primarily structured around religious observance, or around career and family?" — avoids triggering the override and moved B from 2% to 7.5%.

05

INC conviction: the asymmetric political hatred calibration

Questions in02/in03 (Modi and BJP approval) were relatively straightforward to calibrate once the pool had sufficient BJP supporters. Question in04 (INC approval) was harder: bjp_supporter personas needed to express strong negative conviction about Congress, not just a weak preference for BJP. Adding explicit INC conviction language ("frustrated with Congress's legacy", "believes Congress failed India's development") as a dedicated narrative field gave bjp_supporters the strong negative INC signal they needed without affecting their positive BJP signals. Without this asymmetry, BJP supporters tended to produce moderate INC approval instead of the D-modal distribution Pew observed.

Reproducibility

Run it yourself.

# Study 1B — India Pew Replication
cd study_1b_pew_india
python3 run_study.py --simulatte-only

# Reference cohorts (Sprint A-22)
# Cohort 1: 6025615a
# Cohort 2: 01de2a63

# Check audit manifests
ls results/audit_manifest_a*.json
Sprint manifests
results/audit_manifest_a*.json
Sprints A-1 through A-22. Per-question accuracy, Simulated vs. Pew distributions, sprint rationale, and revert notes.
Reference cohorts
6025615a · 01de2a63
Two Sprint A-22 cohorts whose mean produces the published 85.3% result.
Population pool
data/india_population_pool.py
40 persona definitions with religion, caste, region, political lean, and worldview parameters. Available under NDA for replication.
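
For inspecting the manifests programmatically rather than with ls, a minimal sketch follows. It assumes only the file pattern given above and that each manifest is valid JSON; the function name is illustrative, and no manifest schema is assumed, so the script just lists top-level keys.

```python
import glob
import json
import os


def load_manifests(results_dir="results"):
    """Load every audit manifest under results_dir, keyed by file name."""
    pattern = os.path.join(results_dir, "audit_manifest_a*.json")
    manifests = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            manifests[os.path.basename(path)] = json.load(fh)
    return manifests


if __name__ == "__main__":
    for name, data in load_manifests().items():
        # The schema is not documented here, so only top-level keys are shown.
        summary = sorted(data) if isinstance(data, dict) else type(data).__name__
        print(name, summary)
```

Run from inside study_1b_pew_india so that the relative results/ path resolves.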