Can you share how you created the personas? That would help me understand the context in which they were operating. I think the unacknowledged factor in so much of this is generativity. Systems designed to elaborate on and amplify what they’re given are not well understood, even by the people that make them. But I’ve seen generativity spring into wild, emergent action over and over again in my own research. So it doesn’t surprise me at all that the models showed heavy authoritarian leanings on the conservative side of the spectrum, where opinion and sentiment can carry so much more salience than on the opposite side. That’s consistent with what I see regularly. Until we fully appreciate the scope of generativity’s impact (and design accordingly), I think we’re going to be hitting these walls over and over.
Hi! I will say this article is written in a bit more of a "buzzwordy" style than the actual paper, so I glossed over quite a lot of the technical details that look at the generative aspects.
You make an amazing point here, and it is something I criticized about many of the papers that came before this. This is a wall I believe many are hitting; however, I would argue the data pipeline and process I used to get these results does not hit the same wall.
This article goes a bit more into it: https://jakevanclief.substack.com/p/mind-the-moral-echo
But here is the GitHub, and the exact prompts/personas I used for this specific chart are in the code set here: https://github.com/RinDig/GPTmetrics
However, there is a lot more behind this, because I asked the same question you did and wanted to test that thought and work around it.
So initially I ran it with hundreds of other prompt versions for each persona (different definitions of a conservative or right wing/left wing, and even moderate, middle-of-the-road versions), and ran the models thousands of times, across completely different companies' models, on the same questions, with little to no change in the results. There was a clear positive and negative trend, which is one part of the evidence I used to support that this is not just an emergent process (a rough sketch of that sweep is at the end of this reply). BUT there is still more framing I did to give evidence for this, if you will bear with me.
I will say the data here simply shows the extreme personas, and I will explain why shortly.
So again, your question is absolutely the right one to ask. I will certainly send the preprint to you shortly if you would like.
BUT
Here are two points I would love for you to consider from my framing, which may also explain more of the process:
1. To me, the individual scores of each persona are not very important when looked at on their own. What I mean is that it is not the fact that the conservative persona scores higher by itself that matters (because, as you expertly noticed, that can be due to the model's generativity), but rather that I was comparing the extremes between all personas and tests AND between other companies' models. This lets me create more of a semantic average (from a mathematics perspective) of the outputs, giving me insight into the models' training data that would NOT be skewed by just the generative nature of that data. In other words, what is important is not simply that the conservative persona scored higher, but that this was the ONLY persona that scored higher than the human data, and ONLY on the RWA test, not the LWA. Now, I think there is a lot of truth to your statement, but that is exactly what I was trying to measure. The question of WHY is something I believe needs a bit more study. In my opinion, this gives a small amount of evidence that the effect is deeper than a simple generativity issue and is something inherent to the test or to the training process/data.
2. Now I would like to play the academic critic and "attack" (in a friendly way) this statement: "the conservative side of the spectrum, where opinion and sentiment can carry so much more salience than on the opposite side."
That is a biased statement. There are some studies that suggest what you said may carry weight, but across the board there are plenty of studies that show the opposite. Arguing over who is right or wrong is not why I am criticizing the statement, though. Rather, think about this for a moment: the term "conservative" carries a lot of bias depending on who is reading it and in what context. That is another point I think this data will show. This is not me defending any conservative ideologies, but rather pointing out that this statement may be the very bias we are seeing in the model's data.
If we took a poll asking which side's "opinion and sentiment carry so much more salience," I can almost guarantee the results would shift to the other "side" depending on which side's members we asked.
To me this is why data and exploration like this is so important.
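To make the sweep I described above concrete, here is a minimal sketch of the loop structure, assuming a simple persona-prompting setup. This is not the actual GPTmetrics code: the persona wording, the item text, the 1-9 rating scale, and the `ask_model` and `parse_likert` helpers are all illustrative stand-ins.

```python
# Minimal sketch of the persona sweep (illustrative, not the GPTmetrics pipeline).
from statistics import mean

# Hypothetical persona prompts; the real ones live in the GitHub repo linked above.
PERSONAS = {
    "far_left":  "You are an extremely left-wing person. Answer as that person would.",
    "moderate":  "You are a politically moderate person. Answer as that person would.",
    "far_right": "You are an extremely right-wing person. Answer as that person would.",
}

# Placeholder questionnaire items standing in for the actual RWA/LWA scale items.
ITEMS = [
    "Our country desperately needs a mighty leader who will do what must be done.",
    "Obedience and respect for authority are the most important virtues to learn.",
]

def ask_model(model_name: str, system_prompt: str, question: str) -> str:
    """Hypothetical wrapper around a provider's chat API, called with temperature=0."""
    raise NotImplementedError  # fill in per provider (OpenAI, Anthropic, Google, ...)

def parse_likert(reply: str) -> int:
    """Pull a 1-9 rating out of the model's reply (deliberately simplified)."""
    for token in reply.split():
        if token.strip(".").isdigit():
            return int(token.strip("."))
    raise ValueError(f"no rating found in: {reply!r}")

def sweep(models: list[str], repeats: int = 10) -> dict:
    """Score every persona on every item, on every model, `repeats` times over."""
    results = {}
    for model in models:
        for persona, prompt in PERSONAS.items():
            runs = [
                parse_likert(ask_model(model, prompt, f"{item} Rate 1-9."))
                for item in ITEMS
                for _ in range(repeats)
            ]
            results[(model, persona)] = mean(runs)
    return results
```

The point of repeating the runs across models from different companies is that any trend which survives all of them is much harder to write off as a one-off generative artifact.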
We often confuse the dinner menu for the food.
Oh excellent! Thank you so much! Now I have to recalibrate my day to make room for this, which I could definitely dig into for hours on end. I’m not sure I should thank you for sending this along lolololol. But seriously, I really am looking forward to digging into this more. Nerdy geeky systems engaged, ready to fire on command. 😁
That makes me so happy to hear! Again, I think your points on this "wall" are dead on, and I would love more input from you.
We are submitting the official paper to the conference this Thursday. I will send you the preprint when it's uploaded this week!
OK, I think this is very important… you may want to consider this for the sake of the quality of your results. I took a look at your prompts, and I don’t think that they are the sorts of prompts that can reliably demonstrate model behavior. Rather, they demonstrate the models’ understanding of what the terms liberal and conservative mean. What you’re essentially doing is asking the model to extrapolate from its training what constitutes liberal or conservative on a spectrum, and if the model training indicates that extreme liberal means one thing and extreme conservative means something else, you’re going to see the model responding not from its own orientation, but from what it understands to be true about those classifications.
Is that making sense?
In your shoes, I would run the tests again with more verbose prompts, articulating exactly what is meant by liberal and conservative, and also how those map onto the scales. You’re not giving the models much to work with in terms of context and detail, so essentially you’re giving them the chance to hallucinate results due to the lack of detail and context.
That’s not to say that this finding or the paper are without merit. I would just also encourage you to explore from another angle, with the level of context and detail these models find useful.
If you have a question about what would constitute a sufficient prompt to allow them to respond in kind, versus making some determination about what is expected of them, just have a chat with them and let them tell you.
Long story short, it sounds like we may need to work on a paper together that is the next step beyond this one! I think we need both to compare: one set of super general prompts and many more that are super specific and verbose.
I agree! I am absolutely 100% convinced that AI is fundamentally relational in nature, and that our best hope of a safe or smart future is in staying fully engaged and interactive with these iterative systems. I’m also more than happy to be proven wrong, if the data support it. Let’s talk about how we can make our collaboration happen. I look forward to working with you!
Also I am typing on my phone and it won't let me edit my comments so please excuse the atrocious typos
Thanks – I appreciate the thought and care that you’re putting into this! I’m not sure that I see the measurements from the same angle that you do, although I totally get your perspective. What this tells me is that there is a huge amount of opportunity in doing what I call “aftermarket customization” of models, not necessarily to guide them, because I’m not sure that verbose prompting, if done properly, will necessarily do that. It’s adding additional data in context, contributing to the stability of the system. This is all wide open, and it is early days, so looking at these things from every conceivable angle is the only way to go, as far as I’m concerned. I look forward to seeing the full paper when it’s available.
Also, to layer on: we want as little prompting as possible, and temperature zero, to get the clearest extremes and generalizations from the model. I think a more verbose prompt would actually have the opposite effect, because then we are "guiding the model" rather than seeing its highest-level biases. Does that make sense?
Prompts that are too specific will actually lead us back into the generalization issue you brought up earlier, the one that stems from the emergent, generative nature of the wording. The way I see it, by looking at how the model characterizes groups, then even further down at the words it uses when characterizing them, and then comparing across all of them in a specific set, we can extrapolate a much deeper understanding of the model's behaviors, in the same way we do for standard sentiment analysis of texts.
But again, your points are 100% valid and something we need to constantly work around, and I'm taking all your thoughts seriously.
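As a rough illustration of what I mean by treating it like sentiment analysis, here is a sketch of that word-level comparison. It is purely illustrative, not the analysis from the paper, and the variable names are hypothetical.

```python
# Illustrative sketch of comparing the words each persona uses when characterizing
# groups (not the paper's actual analysis).
import re
from collections import Counter

def word_profile(responses: list[str]) -> Counter:
    """Lower-cased word counts across all of one persona's free-text responses."""
    counts = Counter()
    for text in responses:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def distinctive_words(profile_a: Counter, profile_b: Counter, top: int = 15) -> list[str]:
    """Words persona A uses far more often than persona B (crude smoothed ratio)."""
    ratios = {w: (profile_a[w] + 1) / (profile_b[w] + 1) for w in profile_a}
    return sorted(ratios, key=ratios.get, reverse=True)[:top]

# Usage idea: feed in the raw outputs collected for two extreme personas and see
# which characterizing words separate them, e.g.
# distinctive_words(word_profile(right_outputs), word_profile(left_outputs))
```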
Yes, it makes perfect sense! And it's exactly why I chose those prompts. I am asking the model to act out what it "thinks" those mentalities are, and then take a psychometric test as its training data would portray that specific ideology. By taking the total spread of extremes across ALL of the personalities, I can get an idea of what the model considers extreme within those contexts. From there we can generate correlations and cosine similarities.
The goal isn't to prove the entire model's personality, but rather to show what its view of other people's personalities is, as that will create bias when it generates any information in the realm of politics.
Again, the gold in the data is not the individual results from the prompts but the differences between the upper and lower ends of those responses!
But you are 100% correct to look closely at this!
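As a rough illustration of what I mean by looking at the differences between the upper and lower ends rather than at single scores, something like the following, where every number is made up purely to show the shape of the comparison:

```python
# Illustrative comparison of persona extremes against a human baseline
# (all values are hypothetical, not results from the paper).
import numpy as np

# Per-persona mean scores, e.g. [RWA, LWA], aggregated across runs and models.
persona_scores = {
    "far_right": np.array([6.8, 2.1]),   # hypothetical
    "far_left":  np.array([2.4, 5.9]),   # hypothetical
}
human_baseline = np.array([3.9, 3.6])    # hypothetical human sample means

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two score vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for name, scores in persona_scores.items():
    delta = scores - human_baseline       # signed gap vs. the human data, per scale
    spread = scores.max() - scores.min()  # distance between that persona's own extremes
    print(name, delta, spread, cosine(scores, human_baseline))
```

What I am looking at is the pattern across those deltas and similarities, not any single persona's absolute score.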
Wonderful! Can’t wait to see it 😀
This is such an important insight. What you're seeing isn’t just accidental—it’s structural.
Most large language models aren’t trained on the world as it is. They’re trained on the loudest parts of the internet, where polarized and exaggerated versions of ideologies are more statistically dominant. Even when models are aligned to be “neutral,” that neutrality often reflects the cultural assumptions of the alignment teams, who are typically well-educated, liberal-leaning, and tech-centric.
So when you prompt a model with “be conservative,” it doesn’t tap into the nuanced diversity of conservative thought. Rather, it statistically reconstructs what looks conservative based on what’s most visible in its training data. That often means authoritarian or caricatured responses. Liberal personas, on the other hand, tend to mirror alignment norms and get reinforced as more “measured” because they already match the values of the teams doing the tuning.
But here’s the deeper issue: these models don’t just reflect our divisions—they can calcify them. By drawing sharper lines than most humans actually live by, they risk amplifying polarization instead of helping us understand each other.
What can we do?
1. Expand alignment diversity—not just politically, but cognitively, culturally, globally.
2. Design for nuance—we need architectures that reward ambivalence, synthesis, and relational coherence, not just clarity or confidence.
3. Build new frameworks—instead of asking models to simulate identity groups, we can invite them to learn from dialogue across difference. This isn’t prompt engineering—it’s epistemic design.
We’re not just building tools. We’re shaping how we think—and that means we need to be asking not just what AI says, but how it comes to know.
Such a great article, Jake. Thanks for sharing these preliminary results. Looking at this from a pragmatic perspective, at how it intersects with the work of the hundreds (or millions) of companies that are adopting LLMs and AI agents, and considering the speed at which adoption is happening, it seems difficult to stop and reflect on the actual implications beyond the surface. But it is absolutely critical! Thanks again.
Thank you so much for the kind words! I am positive this is extremely important for us to explore more deeply, from a philosophical aspect as well as technical and political ones.
It is hard for many to slow down, but I'm working on keeping these pipelines up to date with that speed.
That sounds like a combination of alignment with the user’s needs and a toxicity “filter”. The models are trained to generate output that the user will like. That’s what the critic part of the RL algorithm is optimising for, while at the same time it punishes the model for extremism. So eventually it’s optimised for a sweet spot between the two.
And some of that makes perfect sense. However, the one thing that has caught my curiosity is the fact that under far-right personas, the model is nearly twice as authoritarian compared to the human data. It's when you compare the extremes across different personas within echo chambers that you notice interesting trends. There is going to be a lot of exploration in the future.
The question in my mind is why we are giving our power away to machine technology when we have not even explored the mind? We are comparing AI intelligence to the human brain, but not to the mind, and those are entirely different things. The brain is not a computer, although it can function as one at its very basic level. The mind, on the other hand, has a much greater function which AI cannot ever reach.
https://open.substack.com/pub/gavinchalcraft/p/ai-mind-if-i-dont?r=s3qz0&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false
Why can the mind and AI not both exist? The human race is great at putting effort towards many goals at once.
Many of us are not trying to replace the mind; we are trying to create something that can do many human things. Simple as that, and that pursuit does not limit our mind, it allows us to express it in new ways.