Oliver Baumann – University of Southern Denmark
J.P. Eggers – New York University
Nils Stieglitz – Frankfurt School of Finance & Management
Josiah Drewry – George Washington University
Thomas Taiyi Yan – University of Maryland
Question 1. The juxtaposition of benchmarking against peers versus one’s historical record is a fascinating topic. When operationalizing social comparison, you chose the mean performance of all units as the reference point. While you cited prior work for making this decision, did you guys explore more “dynamic” forms of benchmarking? For example, work on competition and rivalry (e.g., Kilduff, Elfenbein & Staw, 2010) has shown that competition is often directed not towards a general group, but towards very selective targets (e.g., those who are similar and geographically close). In this sense, we wonder whether you considered or explored different configurations of social comparison in which units benchmark against specific opponents. For instance, does the centralization of competition (i.e., every unit competes against every other vs. all units compete against one) influence exploration-exploitation mechanisms and, eventually, long-term organizational performance?
This point raises a key question underlying the model: to what extent does social comparison emerge organically (in which case we would expect it to emerge as in Gavin Kilduff’s work, focusing on similar others), as opposed to emerging from policies or data transparency within the organization itself (in which case we would expect mean performance to be a good benchmark)? Both are certainly possible, and they likely co-occur in most organizations. It would be really important to better understand how and why social comparisons emerge. A lot of research talks about them as organic processes (papers like Nickerson & Zenger (2008) come to mind), but in work like the Kacperczyk et al. (2015) paper on mutual fund managers, it seems likely that social comparisons emerge in many firms because there are explicit performance (or promotion) incentives tied to relative performance. Certainly, any organization that uses stack ranking is making this comparison explicit and encouraging it. The idea that the dynamics of comparison (and benchmark selection) would differ depending on how comparisons emerge is a really interesting topic to explore.
Specifically, in our paper, the model shows that intra-organizational heterogeneity makes social comparisons perform worse. This suggests that benchmarking against more similar units would be advantageous, though doing so shrinks the comparison set and makes noise more salient. We didn’t explicitly explore such alternative benchmarking structures or configurations in the final model, but they could definitely represent a path for future modeling work in this domain. Ideally, a model exploring this would include (qualitative or quantitative) empirical data to complement the modeling approach.
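To make the tradeoff concrete, here is a minimal Python sketch contrasting the benchmark described above (the mean performance of all units) with a hypothetical peer benchmark that averages only the k most similar units. The single `traits` similarity score, the parameter `k`, and both function names are illustrative assumptions; the peer variant is not part of the published model.

```python
from statistics import mean
from typing import List


def org_mean_aspiration(performance: List[float]) -> float:
    """Benchmark as described in the interview: mean performance of all units."""
    return mean(performance)


def peer_aspiration(i: int, performance: List[float],
                    traits: List[float], k: int) -> float:
    """Hypothetical alternative: mean performance of the k units most similar to unit i.

    Similarity is assumed to be distance on one trait (e.g., market or technology).
    """
    others = [j for j in range(len(performance)) if j != i]
    peers = sorted(others, key=lambda j: abs(traits[j] - traits[i]))[:k]
    return mean(performance[j] for j in peers)


# Two clusters of units: a low-opportunity pair and a high-opportunity pair.
performance = [10.0, 12.0, 30.0, 32.0]
traits = [0.0, 1.0, 10.0, 11.0]

print(org_mean_aspiration(performance))            # benchmark shared by every unit
print(peer_aspiration(2, performance, traits, 1))  # unit 2 compared only with unit 3
```

Against the organization-wide mean of 21, unit 2 (performance 30) looks successful, while against its nearest peer (32) it falls short. With k = 1, however, a single noisy peer drives the benchmark, which is the sample-size cost noted above.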
Question 2. The concept of “emergent ambidexterity” was extremely interesting. To the extent that this paper delineated the aspiration-setting mechanism through which benchmarking paradigms and contextual variables influence the entire organization’s upstream ability to be ambidextrous, it contributed significant insights as to how we might use generative/complexity science to study complex phenomena. What is your view on how generative/complexity science may uniquely help advance our understanding of organizations?
This finding was very intriguing to us, too. What we find particularly cool about this is how a very simple mechanism (social comparisons don’t require much information or information processing) can actually help regulate behavior in a rather intelligent way, by making a complex organization ambidextrous in a dynamic sense. What is “emergent” here is how this aggregate organizational behavior results from a set of units that are structurally independent and only behaviorally interdependent. Another way to label this would be to call it a phenomenon of aggregation: how lower-level behavior and interactions give rise to higher-level behavior and performance.
The term “emergence” has been defined in many different ways, and a physicist, for instance, might require some sort of phase transition before calling something emergent, which is not the case in our model. However, we think it’s fair to label such aggregation phenomena “emergent,” because they are usually not obvious ex ante and sometimes not even intuitive. We actually think that much of what organizations do is aggregate their members’ contributions, and while some of these processes are well understood, others are not, especially those where uncertainty, heterogeneity, and dynamics matter. In that sense, we clearly think that a “complexity science” perspective is extremely useful for thinking about these systems and perhaps “discovering” novel aggregation mechanisms. A call for more work on aggregation processes is thus also a call for further models of organizations as complex adaptive systems, for lab experiments that examine aggregation processes under controlled conditions, and for careful qualitative empirical work that might serve as inspiration for, or illustration of, these mechanisms.
So much of what we do in large-scale empirical work is to try to remove confounding effects in order to focus on a single main effect (or, at most, a simple interaction). While this is certainly appropriate in some contexts and provides a nice means of theory testing, the reality is that human interaction is complex and messy. As a result, focusing ONLY on simple main effects and trying to remove all other complexity sometimes obscures some of the most interesting human interactions. Complexity-based approaches, including both modeling and empirical methods such as new machine learning tools, provide new opportunities to understand the world around us.
Question 3. An important aspect of your model is that, for historical aspirations, changes in exploration and exploitation depend so heavily on the rate of updating. However, it seems unlikely that many units would choose to adjust this parameter, or that they would arrive independently at the rate that your model would indicate is optimal. Firms are certainly aware of the dangers of over-reacting to short-term underperformance when investing in new technology, but they also need to make decisions quickly. Given this tension, how would you advise real-world business units to refine their approach, if they use historical aspirations?
Much of the credit for this definitely has to go to the reviewers. When we wrote the paper initially, we were very focused on grounding our modeling assumptions in empirical data. (Side note: Dan Levinthal always talks about two ways to build models. You either build a model that produces an outcome visible in the real world and then see what got you there, or make a number of behaviorally plausible assumptions and see what interesting outcomes you can discover. This paper is in the latter camp.) So we looked at the few papers that estimate the updating (alpha) parameter in a historical aspirations model and felt that 0.5 (half new data and half historical) was a reasonable value. But the reviewers wanted to know how the model behaved under different parameter values, which forced us to go back and examine the role of updating speed more systematically.
It is definitely interesting that historical aspirations seem to perform best in our model when alpha is high, i.e., when firms incorporate new feedback only to a very limited degree and rely mostly on old information. This was true even in relatively dynamic environments, where one would expect faster updating to be more valuable. We found this interesting, and were happy the reviewers pushed us to consider these issues more deeply.
In terms of the practical implications, we think the main way to interpret this is that using short-term changes in performance to set your benchmarks is likely to be problematic. If firms face dynamic environments with changing new technologies, even just looking at last period’s performance as a benchmark is likely to be inappropriate. Managers need to be more forward-looking, and develop projections and goals based on something more prospective than simply raw historical data. But the model would suggest that, if a firm is going to rely solely on historical data to set goals, long-run averages are probably more useful than just looking at last period’s results.
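The updating rule discussed here can be sketched in a few lines of Python. We assume the common exponentially weighted form, in which alpha weights the prior aspiration and (1 - alpha) the latest outcome, so alpha = 0.5 is the “half new data, half historical” case and a high alpha relies mostly on old information. The function names and the outcome series are hypothetical, and the published model may differ in detail.

```python
from typing import List


def update_aspiration(aspiration: float, performance: float, alpha: float) -> float:
    """Assumed historical-aspiration rule: alpha on the old aspiration, 1 - alpha on new performance."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * aspiration + (1.0 - alpha) * performance


def aspiration_path(initial: float, outcomes: List[float], alpha: float) -> List[float]:
    """Aspiration levels: the initial value followed by the value after each outcome."""
    path = [initial]
    for outcome in outcomes:
        path.append(update_aspiration(path[-1], outcome, alpha))
    return path


# Steady results with a single bad period (hypothetical numbers).
outcomes = [10.0, 10.0, 2.0, 10.0, 10.0]

last_period = aspiration_path(10.0, outcomes, alpha=0.0)  # benchmark = last period's result
long_run = aspiration_path(10.0, outcomes, alpha=0.9)     # benchmark close to a long-run average

for t, (fast, slow) in enumerate(zip(last_period, long_run)):
    print(t, fast, round(slow, 2))
```

With alpha = 0, one bad period drags the benchmark down to 2, so an ordinary result of 10 in the next period looks like a large success; with alpha = 0.9, the benchmark barely moves (to about 9.2).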
Two caveats accompany this conclusion, though. The first is that we are talking about stationary environments. That is, even highly dynamic environments in our model are “stable” in the sense that the way they change does not itself change. In such environments, slow updating of targets works well. In nonstationary environments, in contrast, faster updating might be necessary to learn something about the world before it changes in fundamental ways again. The second caveat is that firms in our model do not grow in a traditional sense: their resource endowment doesn’t change with accumulated long-run performance. So our findings and suggestions might apply more strongly to larger, more stable firms than to rapidly growing organizations. At the very least, the dynamics become a good deal more complex once we allow for growth. There may be an opportunity to incorporate growth through a mechanism more like the one in Nelson & Winter’s (1982) simulations, which would shed more light on questions like this. There’s only so much that one can incorporate into a single model…
Question 4. Intra-organizational comparison necessitates updating aspirations each period, which makes it fundamentally different from historical aspirations. This raises interesting questions about how units should respond to feedback if their performance is roughly average, especially when there is some ambiguity in how they are ranked. Taking the extreme case, if “winning” and “losing” are binary, as they are in your model, then a unit at the 49th percentile would behave very differently from a unit at the 51st percentile, which may make little sense strategically or in terms of resource allocation. Considering the numerous assumptions that must be made in a model like yours, how did you choose to set and adjust parameters for intra-organizational comparisons and historical aspirations?
We experimented with allocating resources by share, as opposed to allocating all resources to either exploration or exploitation, and the results were pretty similar. Doing so makes some units at the 40th percentile allocate a small share of their resources to exploitation, but it also makes a unit at the 60th percentile allocate some to exploration. When we assessed the tradeoff, the differences in the results didn’t seem to justify the additional complexity in the model.
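The contrast between the all-or-nothing rule and the share-based variant can be sketched as follows. Performance is expressed here as a within-firm percentile with the benchmark set at the 50th percentile purely for illustration. The logistic mapping from the performance gap to an exploitation share, its `steepness` parameter, and the function names are hypothetical; the interview does not specify the functional form the authors tested.

```python
import math


def binary_exploit_share(performance: float, aspiration: float) -> float:
    """All-or-nothing rule: exploit fully at or above aspiration, explore fully below it."""
    return 1.0 if performance >= aspiration else 0.0


def smooth_exploit_share(performance: float, aspiration: float,
                         steepness: float = 1.0) -> float:
    """Hypothetical share rule: exploitation share is logistic in the performance gap."""
    gap = performance - aspiration
    return 1.0 / (1.0 + math.exp(-steepness * gap))


benchmark = 50.0  # illustrative benchmark at the 50th percentile
for pct in (40.0, 49.0, 51.0, 60.0):
    print(pct,
          binary_exploit_share(pct, benchmark),
          round(smooth_exploit_share(pct, benchmark, steepness=0.2), 2))
```

Under the binary rule, moving from the 49th to the 51st percentile flips a unit from pure exploration to pure exploitation; under the share rule, both units split resources roughly evenly (about 45% vs. 55% exploitation), while units at the 40th and 60th percentiles still devote a small share to the other activity.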
Returning to the idea of grounding the model in empirical data, we relied on the classic finding from Cyert & March (1963) that organizations (and individuals) typically classify outcomes discretely as wins and losses. Think about analysts and earnings: we see headlines about Apple beating earnings expectations or Facebook missing them, with far less emphasis on how much they missed by. (Is missing by $0.02 a lot or a little?) This is of course something of an exaggeration, but it captures the real dynamics of how decisions get made in many cases. It also captures the idea that it is hard to engage in simultaneous exploration and exploitation within the same unit. Different units within the firm can specialize, and a unit can shift its focus over time. But again, we felt that this approach was a plausible model of real-world behavior.
At the same time, it does seem somewhat absurd that a quarter-to-quarter slide from the 51st to the 49th percentile within the firm would mean a complete shift in R&D strategy. If this did happen, one argument could be that the unit at the 51st percentile was already discussing the need to reallocate, and the slide became the straw that broke the camel’s back. But we have very limited data that REALLY shows this – we can use fixed effects models with aspirations data to track changes, but observing actual shifts in resource allocation right around the margin is difficult.
Question 5. Finally, what were the most enjoyable aspects of completing this study? As for the computational model, what were the major challenges in setting it up? If you could include just one more contingency or write an entire paper about it, what would it be, and how might it pave the way for yet more new theory or empirical investigations?
In many ways, the most enjoyable aspect was also the most difficult. At this point, most organizational models are effectively NK models or multi-armed bandit models, two widely used and well-understood modeling forms. But while there are flavors of multi-armed bandits in our model, doing this paper right required building something new out of the “bones” of Levinthal & March (1981). This allowed us to stay true to the research questions and the empirical data on organizations, as opposed to forcing our theory to conform to existing model formats. The approach clearly has pros and cons. It took lots of work and made the model difficult for audiences to understand (many experienced modelers complained that it was just too complex), but it allowed us to do something new and explore questions that current models struggle to tackle. The fact that we were willing to take our time (this paper emerged from a 2011 discussion in Denmark) made this approach more feasible.
The question about an additional contingency is actually easy. One facet of the model that we became really interested in playing with was the way resources were allocated within the firm. Should all units get an equal share? Should the best units get more resources? Should the WORST units get more resources? While we found this really interesting conceptually, it was such a different question from the one about social comparisons in this paper that we just couldn’t justify including it here. But we’ve already been working on extending the model to examine different corporate resource allocation strategies, and how those strategies shape both search and performance outcomes. Ideally, managers could figure out the marginal return on any given resource allocated across units, but doing so is exceptionally hard. So firms tend to default either to equal (or size-proportional) allocations or to performance-based allocations, in which the best-performing units get more resources. But it is very unclear whether that is a good heuristic for managers to follow. We hope to explore this more fully in our next project.
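The allocation heuristics mentioned above (equal, size-proportional, rewarding the best units, or supporting the worst) can be made concrete with a small Python sketch. The rule names, the linear weighting, and the +1 offset that keeps every unit funded are hypothetical choices for illustration, not the extended model the authors describe.

```python
from typing import List, Optional


def allocate(budget: float, performance: List[float], rule: str,
             sizes: Optional[List[float]] = None) -> List[float]:
    """Split a corporate budget across units under one of several hypothetical rules."""
    n = len(performance)
    if rule == "equal":
        weights = [1.0] * n
    elif rule == "size":
        if sizes is None:
            raise ValueError("size-proportional allocation needs unit sizes")
        weights = list(sizes)
    elif rule in ("best", "worst"):
        lo, hi = min(performance), max(performance)
        if rule == "best":
            # Better performers get proportionally more; +1 keeps the weakest unit funded.
            weights = [p - lo + 1.0 for p in performance]
        else:
            # Weaker performers get proportionally more; +1 keeps the strongest unit funded.
            weights = [hi - p + 1.0 for p in performance]
    else:
        raise ValueError(f"unknown rule: {rule}")
    total = sum(weights)
    return [budget * w / total for w in weights]


performance = [1.0, 2.0, 3.0]
for rule in ("equal", "best", "worst"):
    print(rule, allocate(60.0, performance, rule))
print("size", allocate(60.0, performance, "size", sizes=[1.0, 1.0, 2.0]))
```

A performance-weighted rule amplifies existing differences between units, whereas the compensatory rule does the opposite; which heuristic serves the firm better is the open question raised above.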
Cyert, R. M., & March, J. G. (1963). A behavioral theory of the firm. Englewood Cliffs, NJ: Prentice Hall.
Kacperczyk, A., Beckman, C. M., & Moliterno, T. P. (2015). Disentangling risk and change: Internal and external social comparison in the mutual fund industry. Administrative Science Quarterly, 60(2), 228-262.
Kilduff, G. J., Elfenbein, H. A., & Staw, B. M. (2010). The psychology of rivalry: A relationally dependent analysis of competition. Academy of Management Journal, 53(5), 943-969.
Levinthal, D., & March, J. G. (1981). A model of adaptive organizational search. Journal of Economic Behavior & Organization, 2(4), 307-333.
Nelson, R. R., & Winter, S. G. (1982). An evolutionary theory of economic change. Cambridge, MA: Belknap Press.
Nickerson, J. A., & Zenger, T. R. (2008). Envy, comparison costs, and the economic theory of the firm. Strategic Management Journal, 29(13), 1429-1449.