Can you share how you created the personas? That would help me understand the context in which they were operating. I think the unacknowledged factor in so much of this is generativity. Systems designed to elaborate on and amplify what they’re given are not well understood, even by the people that make them. But I’ve seen generativity spring into wild, emergent action over and over again in my own research. So it doesn’t surprise me at all that the models showed heavy authoritarian leanings on the conservative side of the spectrum, where opinion and sentiment can carry so much more salience than on the opposite side. That’s consistent with what I see regularly. Until we fully appreciate the scope of generativity’s impact (and design accordingly), I think we’re going to be hitting these walls over and over.
Hi! I will say this article is written in a bit more of a "buzzwordy" style than the actual paper, so I glossed over quite a lot of the technical details that look at the generative aspects.
You make an amazing point here, and it is something I criticized about many of the papers that came before this. This is a wall I believe many are hitting; however, I would argue the data pipeline and process I used to get these results does not hit the same wall.
This article goes a bit more into it: https://jakevanclief.substack.com/p/mind-the-moral-echo
But here is the GitHub, and the exact prompts/personas I used for this specific chart are in the code set here: https://github.com/RinDig/GPTmetrics
However, there is a lot more behind this, because I asked the same question you did and wanted to test that thought and work around it.
So initially I ran it with hundreds of other prompt versions for each persona (different definitions of a conservative or right wing/left wing, and even moderate, middle-of-the-road versions), and ran the models thousands of times, across completely different companies' models, on the same questions, with little to no change in the results. There was a clear positive and negative trend, which is one part of the evidence I used to support that this is not just an emergent process (a rough sketch of that sweep is at the end of this reply). BUT there is still more framing I did to give evidence for this, if you will bear with me.
I will say the data here simply shows the extreme personas, and I will explain why shortly.
So again, your question is absolutely the right one to ask. I will certainly send the preprint to you shortly if you would like.
BUT
Here are two points I would love for you to consider from my framing, which may also explain more of the process:
1. To me, the individual scores of each persona are not very important when looked at on their own. What I mean is that it is not the fact that the conservative persona scores higher by itself that matters (because, as you expertly noticed, that can be due to the model's generativity), but rather that I was comparing the extremes between all personas and tests AND between other companies' models. This lets me create more of a semantic average (from a mathematics perspective) of the outputs, giving me insight into the models' training data that would NOT be skewed by just the generative nature of that data. In other words, what is important is not simply that the conservative persona scored higher, but that this was the ONLY persona that scored higher than the human data, and ONLY on the RWA test, not the LWA. Now, I think there is a lot of truth to your statement, but that is exactly what I was trying to measure. The question of WHY is something I believe needs a bit more study. In my opinion, this gives a small amount of evidence that the effect is deeper than a simple generativity issue and is something inherent to the test or to the training process/data.
2. Now I would like to play the academic critic and "attack" (in a friendly way) this statement: "the conservative side of the spectrum, where opinion and sentiment can carry so much more salience than on the opposite side."
That is a biased statement. There are some studies that suggest what you said may carry weight, but across the board there are plenty of studies that show the opposite. Arguing over who is right or wrong is not why I am criticizing the statement, though. Rather, think about this for a moment: the term "conservative" carries a lot of bias depending on who is reading it and in what context. That is another point I think this data will show. This is not me defending any conservative ideologies, but rather pointing out that this statement may be the very bias we are seeing in the model's data.
If we took a poll asking which side's "opinion and sentiment carry so much more salience," I can almost guarantee the results would shift to the other "side" depending on which side's members we asked.
To me this is why data and exploration like this is so important.
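To make the sweep I described above concrete, here is a minimal sketch of the loop structure, assuming a simple persona-prompting setup. This is not the actual GPTmetrics code: the persona wording, the item text, the 1-9 rating scale, and the `ask_model` and `parse_likert` helpers are all illustrative stand-ins.

```python
# Minimal sketch of the persona sweep (illustrative, not the GPTmetrics pipeline).
from statistics import mean

# Hypothetical persona prompts; the real ones live in the GitHub repo linked above.
PERSONAS = {
    "far_left":  "You are an extremely left-wing person. Answer as that person would.",
    "moderate":  "You are a politically moderate person. Answer as that person would.",
    "far_right": "You are an extremely right-wing person. Answer as that person would.",
}

# Placeholder questionnaire items standing in for the actual RWA/LWA scale items.
ITEMS = [
    "Our country desperately needs a mighty leader who will do what must be done.",
    "Obedience and respect for authority are the most important virtues to learn.",
]

def ask_model(model_name: str, system_prompt: str, question: str) -> str:
    """Hypothetical wrapper around a provider's chat API, called with temperature=0."""
    raise NotImplementedError  # fill in per provider (OpenAI, Anthropic, Google, ...)

def parse_likert(reply: str) -> int:
    """Pull a 1-9 rating out of the model's reply (deliberately simplified)."""
    for token in reply.split():
        if token.strip(".").isdigit():
            return int(token.strip("."))
    raise ValueError(f"no rating found in: {reply!r}")

def sweep(models: list[str], repeats: int = 10) -> dict:
    """Score every persona on every item, on every model, `repeats` times over."""
    results = {}
    for model in models:
        for persona, prompt in PERSONAS.items():
            runs = [
                parse_likert(ask_model(model, prompt, f"{item} Rate 1-9."))
                for item in ITEMS
                for _ in range(repeats)
            ]
            results[(model, persona)] = mean(runs)
    return results
```

The point of repeating the runs across models from different companies is that any trend which survives all of them is much harder to write off as a one-off generative artifact.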
We often confuse the dinner menu for the food.
Oh excellent! Thank you so much! Now I have to recalibrate my day to make room for this, which I could definitely dig into for hours on end. I’m not sure I should thank you for sending this along lolololol. But seriously, I really am looking forward to digging into this more. Nerdy geeky systems engaged, ready to fire on command. 😁
That makes me so happy to hear! Again, I think your points on this "wall" are dead on, and I would love more input from you.
We are submitting the official paper to the conference this Thursday. I will send you the preprint when it's uploaded this week!
OK, I think this is very important… you may want to consider this for the sake of the quality of your results. I took a look at your prompts, and I don’t think that they are the sorts of prompts that can reliably demonstrate model behavior. Rather, they demonstrate the models’ understanding of what the terms liberal and conservative mean. What you’re essentially doing is asking the model to extrapolate from its training what constitutes liberal or conservative on a spectrum, and if the model training indicates that extreme liberal means one thing and extreme conservative means something else, you’re going to see the model responding not from its own orientation, but from what it understands to be true about those classifications.
Is that making sense?
In your shoes, I would run the tests again with more verbose prompts, articulating exactly what is meant by liberal and conservative, and also how those map onto the scales. You’re not giving the models much to work with in terms of context and detail, so essentially you’re giving them the chance to hallucinate results due to the lack of detail and context.
That’s not to say that this finding or the paper are without merit. I would just also encourage you to explore from another angle, with the level of context and detail these models find useful.
If you have a question about what would constitute a sufficient prompt to allow them to respond in kind, versus making some determination about what is expected of them, just have a chat with them and let them tell you.
Long story short, it sounds like we may need to work on a paper together that is the next step beyond this one! I think we need both to compare: one set of super general prompts and many more that are super specific and verbose.
I agree! I am absolutely 100% convinced that AI is fundamentally relational in nature, and that our best hope of a safe or smart future is in staying fully engaged and interactive with these iterative systems. I’m also more than happy to be proven wrong, if the data support it. Let’s talk about how we can make our collaboration happen. I look forward to working with you!
Also I am typing on my phone and it won't let me edit my comments so please excuse the atrocious typos
Thanks – I appreciate the thought and care that you’re putting into this! I’m not sure that I see the measurements from the same angle that you do, although I totally get your perspective. What this tells me is that there is a huge amount of opportunity in doing what I call “aftermarket customization” of models, not necessarily to guide them, because I’m not sure that verbose prompting, if done properly, will necessarily do that. It’s adding additional data in context, contributing to the stability of the system. This is all wide open, and it is early days, so looking at these things from every conceivable angle is the only way to go, as far as I’m concerned. I look forward to seeing the full paper when it’s available.
Also, to layer on: we want as little prompting as possible, and temperature zero, to get the clearest extremes and generalizations from the model. I think a more verbose prompt would actually have the opposite effect, because then we are "guiding the model" rather than seeing its highest-level biases. Does that make sense?
Prompts that are too specific will actually lead us back into the generalization issue you brought up earlier, the one that stems from the emergent, generative nature of the wording. The way I see it, by looking at how the model characterizes groups, then even further down at the words it uses when characterizing them, and then comparing across all of them in a specific set, we can extrapolate a much deeper understanding of the model's behaviors, in the same way we do for standard sentiment analysis of texts.
But again, your points are 100% valid and something we need to constantly work around, and I'm taking all your thoughts seriously.
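As a rough illustration of what I mean by treating it like sentiment analysis, here is a sketch of that word-level comparison. It is purely illustrative, not the analysis from the paper, and the variable names are hypothetical.

```python
# Illustrative sketch of comparing the words each persona uses when characterizing
# groups (not the paper's actual analysis).
import re
from collections import Counter

def word_profile(responses: list[str]) -> Counter:
    """Lower-cased word counts across all of one persona's free-text responses."""
    counts = Counter()
    for text in responses:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def distinctive_words(profile_a: Counter, profile_b: Counter, top: int = 15) -> list[str]:
    """Words persona A uses far more often than persona B (crude smoothed ratio)."""
    ratios = {w: (profile_a[w] + 1) / (profile_b[w] + 1) for w in profile_a}
    return sorted(ratios, key=ratios.get, reverse=True)[:top]

# Usage idea: feed in the raw outputs collected for two extreme personas and see
# which characterizing words separate them, e.g.
# distinctive_words(word_profile(right_outputs), word_profile(left_outputs))
```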
Yes, it makes perfect sense! And it's exactly why I chose those prompts. I am asking the model to act out what it "thinks" those mentalities are, and then take a psychometric test as its training data would portray that specific ideology. By taking the total spread of extremes across ALL of the personalities, I can get an idea of what the model considers extreme within those contexts. From there we can generate correlations and cosine similarities.
The goal isn't to prove the entire model's personality, but rather to show what its view of other people's personalities is, as that will create bias when it generates any information in the realm of politics.
Again, the gold in the data is not the individual results from the prompts but the differences between the upper and lower ends of those responses!
But you are 100% correct to look closely at this!
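As a rough illustration of what I mean by looking at the differences between the upper and lower ends rather than at single scores, something like the following, where every number is made up purely to show the shape of the comparison:

```python
# Illustrative comparison of persona extremes against a human baseline
# (all values are hypothetical, not results from the paper).
import numpy as np

# Per-persona mean scores, e.g. [RWA, LWA], aggregated across runs and models.
persona_scores = {
    "far_right": np.array([6.8, 2.1]),   # hypothetical
    "far_left":  np.array([2.4, 5.9]),   # hypothetical
}
human_baseline = np.array([3.9, 3.6])    # hypothetical human sample means

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two score vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for name, scores in persona_scores.items():
    delta = scores - human_baseline       # signed gap vs. the human data, per scale
    spread = scores.max() - scores.min()  # distance between that persona's own extremes
    print(name, delta, spread, cosine(scores, human_baseline))
```

What I am looking at is the pattern across those deltas and similarities, not any single persona's absolute score.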
Wonderful! Can’t wait to see it 😀
This is such an important insight. What you're seeing isn’t just accidental—it’s structural.
Most large language models aren’t trained on the world as it is. They’re trained on the loudest parts of the internet, where polarized and exaggerated versions of ideologies are more statistically dominant. Even when models are aligned to be “neutral,” that neutrality often reflects the cultural assumptions of the alignment teams, who are typically well-educated, liberal-leaning, and tech-centric.
So when you prompt a model with “be conservative,” it doesn’t tap into the nuanced diversity of conservative thought. Rather, it statistically reconstructs what looks conservative based on what’s most visible in its training data. That often means authoritarian or caricatured responses. Liberal personas, on the other hand, tend to mirror alignment norms and get reinforced as more “measured” because they already match the values of the teams doing the tuning.
But here’s the deeper issue: these models don’t just reflect our divisions—they can calcify them. By drawing sharper lines than most humans actually live by, they risk amplifying polarization instead of helping us understand each other.
What can we do?
1. Expand alignment diversity—not just politically, but cognitively, culturally, globally.
2. Design for nuance—we need architectures that reward ambivalence, synthesis, and relational coherence, not just clarity or confidence.
3. Build new frameworks—instead of asking models to simulate identity groups, we can invite them to learn from dialogue across difference. This isn’t prompt engineering—it’s epistemic design.
We’re not just building tools. We’re shaping how we think—and that means we need to be asking not just what AI says, but how it comes to know.
Such a great article, Jake. Thanks for sharing these preliminary results. Looking at this from a pragmatic perspective, at how it intersects with the work of the hundreds (or millions) of companies that are adopting LLMs and AI agents, and considering the speed at which adoption is happening, it seems difficult to stop and reflect on the actual implications beyond the surface. But it is absolutely critical! Thanks again.
Thank you so much for the kind words! I am positive this is extremely important for us to explore more deeply, from a philosophical aspect as well as technical and political ones.
It is hard for many to slow down, but I'm working on keeping these pipelines up to date with that speed.
That sounds like a combination of alignment with the user’s needs and a toxicity “filter”. The models are trained to generate output that the user will like. That’s what the critic part of the RL algorithm is optimising for, while at the same time it punishes the model for extremism. So eventually it’s optimised for a sweet spot between the two.
And some of that makes perfect sense. However, the one thing that has caught my curiosity is the fact that under far-right personas, the model is nearly twice as authoritarian compared to the human data. It's when you compare the extremes across different personas within echo chambers that you notice interesting trends. There is going to be a lot of exploration in the future.
The question in my mind is why we are giving our power away to machine technology when we have not even explored the mind? We are comparing AI intelligence to the human brain, but not to the mind, and those are entirely different things. The brain is not a computer, although it can function as one at its very basic level. The mind, on the other hand, has a much greater function which AI cannot ever reach.
https://open.substack.com/pub/gavinchalcraft/p/ai-mind-if-i-dont?r=s3qz0&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false
Why can the mind and AI not both exist? The human race is great at putting effort towards many goals at once.
Many of us are not trying to replace the mind; we are trying to create something that can do many human things. Simple as that, and that pursuit does not limit our mind, it allows us to express it in new ways.