  1. A. M. Petersen
    Quantifying the distribution of editorial power and manuscript decision bias at the mega-journal PLOS ONE (pdf)
    Submitted. SSRN e-print:2901272 (2017) Abstract We analyzed the longitudinal activity of nearly 7,000 editors at the mega-journal PLOS ONE over the 10-year period 2006-2015. Using the article-editor associations, we develop editor-specific measures of power, activity, article acceptance time, citation impact, and editorial remuneration (an analogue to self-citation). We observe remarkably high levels of power inequality among the PLOS ONE editors, with the top-10 editors responsible for 3,366 articles, corresponding to 2.4% of the 141,986 articles we analyzed; the Gini index of this power distribution is 0.583, which is comparable to some of the highest levels of wealth inequality in the world. Such high inequality levels suggest the presence of unintended incentives, which may reinforce unethical behavior in the form of decision-level biases at the editorial level. Due to the size and complexity associated with managing such a large mega-journal, our results indicate that editors may become apathetic in judging the quality of articles and susceptible to modes of power-driven misconduct. We used the longitudinal dimension of editor activity to develop two panel regression models which test and verify the presence of editor-level bias. In both models we clustered the articles within each editor's profile and used editor fixed effects to isolate the individual-level trends over time: in the first model we analyzed the citation impact of articles, and in the second model we modeled the decision time between an article being submitted and ultimately accepted by the editor. We focused on two variables that represent social factors that capture potential conflicts of interest: (i) we accounted for the social ties between editors and authors by developing a measure of repeat authorship among an editor's article set, and (ii) we accounted for the rate of citations directed towards the editor's own publications in the reference list of each article he/she oversaw.
Our results indicate that these two factors play a significant role in the editorial decision process, pointing to the misuse of power. Moreover, these two effects appear to increase with editor age, which is consistent with behavioral studies concerning the evolution of misbehavior and response to temptation in power-driven environments. Finally, we analyze "editor remuneration": the number of citations one might receive by adopting biases towards certain scientific peers as well as self-citations from scientific strangers. By applying quantitative evaluation to the gatekeepers of scientific knowledge, we shed light on various issues crucial to science policy, in particular the management of large mega-journals.

  2. R. K. Pan, A. M. Petersen, F. Pammolli, S. Fortunato
    The Memory of Science: Inflation, Myopia, and the Knowledge Network (pdf)    (Supporting Information)
    Submitted. arXiv e-print:1607.05606 (2016) Abstract Science is a growing system, exhibiting 4% annual growth in publications and 1.8% annual growth in the number of references per publication. Together these growth factors correspond to a 12-year doubling period in the total supply of references, thereby challenging traditional methods of evaluating scientific production, from researchers to institutions. Against this background, we analyzed a citation network comprising 837 million references produced by 32.6 million publications over the period 1965-2012, allowing for a detailed analysis of the "attention economy" in science. Unlike previous studies, we analyzed the entire probability distribution of reference ages (the time difference between a citing and a cited paper), thereby capturing previously overlooked trends. Over this half-century period we observe a narrowing range of attention: both classic and recent literature are being cited increasingly less, pointing to the important role of socio-technical processes. To better understand how these patterns fit together, we developed a network-based model of the scientific enterprise, featuring exponential growth, the redirection of scientific attention via publications' reference lists, and the crowding out of old literature by the new. We validate the model against several empirical benchmarks. We then use the model to test the causal impact of paradigm shifts in science, thereby providing theoretical guidance for science policy analysis. In particular, we show how perturbations to the growth rate of scientific output, such as those following from the new layer of rapid online publications, affect the reference age distribution and the functionality of the vast science citation network as an aid for the search & retrieval of knowledge.
In order to account for the inflation of science, our study points to the need for a systemic overhaul of the counting methods used to evaluate citation impact, especially in the case of evaluating science careers, which can span several decades and thus several doubling periods.

  3. O. A. Doria Arrieta, F. Pammolli, A. M. Petersen
    Quantifying the negative impact of brain drain on the integration of European science (pdf)
    In press, Science Advances (2017). DOI:10.1126/sciadv.1602232

  4. A. M. Petersen, M. Puliga
    High-skilled labour mobility in Europe before and after the 2004 enlargement (pdf)
    Journal of the Royal Society Interface 14, 20170030 (2017). DOI:10.1098/rsif.2017.0030 Abstract The extent to which international high-skilled mobility channels are forming is a question of great importance in an increasingly global knowledge-based economy. One factor facilitating the growth of high-skilled labor markets is the standardization of certifiable degrees meriting international recognition. Within this context, we analyzed an extensive high-skilled mobility database comprising roughly 382,000 individuals from 5 broad profession groups (Medical, Education, Technical, Science & Engineering, and Business & Legal) over the period 1997-2014, using the 13-country expansion of the European Union (EU) to provide insight into labor market integration. We compare the periods before and after the 2004 enlargement, showing the emergence of a new East-West migration channel between the 13 mostly eastern EU entrants (E) and the rest of the western European countries (W). Indeed, we observe a net directional loss of human capital from E->W, representing 29% of the total mobility after 2004. Nevertheless, the counter-migration from W->E is 7% of the total mobility over the same period, signaling the emergence of brain circulation within the EU. Our analysis of the country-country mobility networks and the country-profession bipartite networks provides timely quantitative evidence for the convergent integration of the EU, and highlights the central role of the UK and Germany as high-skilled labor hubs. We conclude with two data-driven models to explore the structural dynamics of the mobility networks. First, we operationalize a redistribution model to explore the potential ramifications of Brexit, showing the extent to which a 'hard' Brexit, i.e. complete disintegration from the EU, may benefit the overall homogeneity of the European mobility network.
Second, we use a panel regression model to explain empirical high-skilled mobility rates in terms of various economic 'push-pull' factors, the results of which show that government expenditure on education, per-capita wealth, geographic proximity, and labor force size are significant attractive features of destination countries.

    -European mobility and the potential consequences of Brexit, The Royal Society, Ruth Milne

  5. A. M. Petersen
    Quantifying the impact of weak, strong, and super ties in scientific careers (pdf)    (short summary)
    Proceedings of the National Academy of Sciences USA 112, E4671-E4680 (2015). DOI:10.1073/pnas.1501444112 Abstract Scientists are frequently faced with the important decision to start or terminate a creative partnership. This process can be influenced by strategic motivations, as early career researchers are pursuers, whereas senior researchers are typically attractors, of new collaborative opportunities. Focusing on the longitudinal aspects of scientific collaboration, we analyzed 473 collaboration profiles using an ego-centric perspective which accounts for researcher-specific characteristics and provides insight into a range of topics, from career achievement and sustainability to team dynamics and efficiency. From more than 166,000 collaboration records, we quantify the frequency distributions of collaboration duration and tie-strength, showing that collaboration networks are dominated by weak ties characterized by high turnover rates. We use analytic extreme-value thresholds to identify a new class of indispensable 'super ties', the strongest of which commonly exhibit >50% publication overlap with the central scientist. The prevalence of super ties suggests that they arise from career strategies based upon cost, risk, and reward sharing and complementary skill matching. We then use a combination of descriptive and panel regression methods to compare the subset of publications coauthored with a super tie to the subset without one, controlling for pertinent features such as career age, prestige, team size, and prior group experience. We find that super ties contribute to above-average productivity and a 17% citation increase per publication, thus identifying these partnerships (the analog of life partners) as a major factor in science career development.

    -Dynamic duos in science can reap rewards of academic partnerships, Times Higher Education
    -Lifetime collaborators reap the benefits, Nature
    -Quantifying scientific collaboration, Physics Today
    -Publishing Partners, The Scientist
    -Collaboration and scientific career development, PNAS Highlight
    -Study suggests long term collaborations result in more productive scientific careers,
    -Collaboration Fosters More Productive Scientific Careers than Competition,

  6. A. M. Petersen, D. Rotolo, L. Leydesdorff
    A Triple Helix Model of Medical Innovation: Supply, Demand, and Technological Capabilities in terms of Medical Subject Headings (pdf)
    Research Policy 45(3), 666-681 (2016). DOI:10.1016/j.respol.2015.12.004 Abstract We develop a model of innovation that enables us to trace the interplay among three key dimensions of the innovation process: (i) demand for and (ii) supply of innovation, and (iii) technological capabilities available to generate innovation in the forms of products, processes, and services. Building on triple helix research, we use entropy statistics to elaborate an indicator of mutual information among these dimensions that can indicate a reduction of uncertainty. To do so, we focus on the medical context, where uncertainty poses significant challenges to the governance of innovation. We use the Medical Subject Headings (MeSH) of MEDLINE/PubMed to identify publications classified within the categories "Diseases" (C), "Drugs and Chemicals" (D), and "Analytic, Diagnostic, and Therapeutic Techniques and Equipment" (E), and use these as knowledge representations of demand, supply, and technological capabilities, respectively. Three case-studies of medical research areas are used as representative 'entry perspectives' of the medical innovation process. These are: (i) human papilloma virus, (ii) RNA interference, and (iii) magnetic resonance imaging. We find statistically significant periods of synergy among demand, supply, and technological capabilities (C-D-E) that point to three-dimensional interactions as a fundamental perspective for the understanding and governance of the uncertainty associated with medical innovation. Among the pairwise configurations in these contexts, the demand-technological capabilities (C-E) provided the strongest link, followed by the supply-demand (D-C) and the supply-technological capabilities (D-E) channels.

  7. L. Leydesdorff, A. M. Petersen, I. Ivanova
    Self-Organization of Meaning and the Reflexive Communication of Information (pdf)
    Social Science Information 56(1), 4-27 (2017). DOI:10.1177/0539018416675074 Abstract Following a suggestion of Warren Weaver, we extend the Shannon model of communication piecemeal into a complex systems model in which communication is differentiated both vertically and horizontally. This model enables us to bridge the divide between Niklas Luhmann's theory of the self-organization of meaning in communications and empirical research using information theory. First, we distinguish between communication relations and correlations between patterns of relations. The correlations span a vector space in which relations are positioned and thus provided with meaning. Second, positions provide reflexive perspectives. Whereas the different meanings are integrated locally, each instantiation opens horizons of meaning that can be codified along eigenvectors of the communication matrix. The next-order specification of codified meaning can generate redundancies (as feedback on the forward arrow of entropy production). The horizontal differentiation among the codes of communication enables us to quantify the creation of new options as mutual redundancy. Increases in redundancy can then be measured as local reduction of prevailing uncertainty (in bits). The generation of options can also be considered as a hallmark of the knowledge-based economy: new knowledge provides new options. Both the communication-theoretical and the operational (information-theoretical) perspectives can thus be further developed.

  8. A. Morescalchi, F. Pammolli, O. Penner, A. M. Petersen, M. Riccaboni
    The evolution of networks of innovators within and across borders: Evidence from patent data (pdf)
    Research Policy 44(3), 651-668 (2015). DOI:10.1016/j.respol.2014.10.015 Abstract Recent studies on the geography of knowledge networks have documented a negative impact of physical distance and institutional borders upon research and development (R&D) collaborations. Though it is widely recognized that geographic constraints hamper the diffusion of knowledge, less attention has been devoted to the temporal evolution of these constraints. In this study we use data on patents filed with the European Patent Office (EPO) for 50 countries to analyze the impact of physical distance and country borders on inter-regional links in four different networks over the period 1988-2009: (1) co-inventorship, (2) patent citations, (3) inventor mobility and (4) the location of R&D laboratories. We find that the constraint imposed by country borders and distance decreased until the mid-1990s and then started to grow, particularly for distance. The intensity of European cross-country inventor collaborations increased at a higher pace than that of their non-European counterparts until 2004, with no significant relative progress afterwards. Moreover, when analyzing networks of geographical mobility, multinational R&D activities and patent citations, we do not detect any substantial progress in European research integration aside from the influence of common global trends.

  9. C. Schulz, A. Mazloumian, A. M. Petersen, O. Penner, D. Helbing
    Exploiting citation networks for large-scale author name disambiguation (pdf)
    EPJ Data Science 3, 11 (2014). DOI:10.1140/epjds/s13688-014-0011-3 Abstract We present a novel algorithm and validation method for disambiguating author names in very large bibliographic data sets and apply it to the full Web of Science (WoS) citation index. Our algorithm relies only upon the author and citation graphs available for the whole period covered by the WoS. A pair-wise publication similarity metric, which is based on common co-authors, self-citations, shared references and citations, is established to perform a two-step agglomerative clustering that first connects linked papers and then merges similar clusters. This parameterized model is optimized towards an h-index based recall, which favors the inclusion of well-cited publications, and a name-initials-based precision using WoS metadata and cross-referenced Google Scholar profiles. Despite the use of limited metadata, we reach a recall of 87% and a precision of 88% with a preference for researchers with high h-index values. The 47 million articles of WoS can be disambiguated on a single machine in less than a day. We develop an h-index distribution model, confirming that the prediction is in excellent agreement with the empirical data, and yielding insight into the utility of the h-index in real academic ranking scenarios.

  10. A. M. Petersen, O. Penner
    Inequality and cumulative advantage in science careers: a case study of high-impact journals (pdf)    (short summary)
    EPJ Data Science 3, 24 (2014). DOI:10.1140/epjds/s13688-014-0024-y Abstract Analyzing a large data set of publications drawn from the most competitive journals in the natural and social sciences we show that research careers exhibit the broad distributions of individual achievement characteristic of systems in which cumulative advantage plays a key role. While most researchers are personally aware of the competition implicit in the publication process, little is known about the levels of inequality at the level of individual researchers. Here we analyzed both productivity and impact measures for a large set of researchers publishing in high-impact journals, accounting for censoring biases in the publication data by using distinct researcher cohorts defined over non-overlapping time periods. For each researcher cohort we calculated Gini inequality coefficients, with average Gini values around 0.48 for total publications and 0.73 for total citations. For perspective, these observed values are well in excess of the inequality levels observed for personal income in developing countries. Investigating possible sources of this inequality, we identify two potential mechanisms that act at the level of the individual that may play defining roles in the emergence of the broad productivity and impact distributions found in science. First, we show that the average time interval between a researcher's successive publications in top journals decreases with each subsequent publication. Second, after controlling for the time dependent features of citation distributions, we compare the citation impact of subsequent publications within a researcher's publication record. We find that as researchers continue to publish in top journals, there is more likely to be a decreasing trend in the relative citation impact with each subsequent publication. 
This pattern highlights the difficulty of repeatedly producing research findings in the highest citation-impact echelon, as well as the role played by finite career and knowledge life-cycles, and the intriguing possibility that confirmation bias plays a role in the evaluation of scientific careers.

    -Scientific networks and success in science, EPJ Data Science Editorial

  11. I. Pavlidis, A. M. Petersen, I. Semendeferi
    Together we stand (pdf)
    Nature Physics 10, 700-702 (2014). DOI:10.1038/nphys3110 Abstract During the past 70 years science has been transforming from the solitary operation that for centuries it used to be into an endeavor characterized by ever-increasing team size. The importance of this transformation to our technology-driven society cannot be overestimated. As science undergoes this phenomenal evolution, one might expect that the scientific community and its main host, academia, would develop new norms that better serve a new stage. Moth metamorphosis is an example of a natural process that does exactly this, brilliantly adapting form to function. Alas, social constructs are not as flexible as natural processes. The academic career structure, originally conceived to reward self-sufficient singletons, continues to be implemented in a system dominated by teams and characterized by symbiotic relationships. To make matters worse, increasingly specialized education leaves academics ill prepared to cope with this challenge. When, how, and why did this malformation start, where does it lead, and how can it be ameliorated? By addressing these questions we bring to the fore the causal links and future projections of the problem, informing a policy and moral dialogue for its resolution.

    -Team Science Is Tied to Growth in Grants With Multiple Recipients, The Chronicle of Higher Education
    -Researchers say academia can learn from Hollywood,

  12. A. M. Petersen, S. Fortunato, R. K. Pan, K. Kaski, O. Penner, A. Rungi, M. Riccaboni, H. E. Stanley, F. Pammolli
    Reputation and Impact in Academic Careers (pdf)    (Supporting Information)
    Proceedings of the National Academy of Sciences USA 111, 15316-15321 (2014). DOI:10.1073/pnas.1323111111 Abstract Reputation is an important social construct in science, which enables informed quality assessments of both publications and careers of scientists in the absence of complete systemic information. However, the relation between reputation and career growth of an individual remains poorly understood, despite recent proliferation of quantitative research evaluation methods. Here we develop an original framework for measuring how a publication's citation rate Δc depends on the reputation of its central author i, in addition to its net citation count c. To estimate the strength of the reputation effect, we perform a longitudinal analysis on the careers of 450 highly-cited scientists, using the total citations C_i of each scientist as his/her reputation measure. We find a citation crossover c_x which distinguishes the strength of the reputation effect. For publications with c < c_x, the author's reputation is found to dominate the annual citation rate. Hence, a new publication may gain a significant early advantage corresponding to roughly a 66% increase in the citation rate for each tenfold increase in C_i. However, the reputation effect becomes negligible for highly cited publications, meaning that for c ≥ c_x the citation rate measures scientific impact more transparently. In addition, we have developed a stochastic reputation model, which is found to reproduce numerous statistical observations for real careers, thus providing insight into the microscopic mechanisms underlying cumulative advantage in science.

    -Recognition: Build a reputation, Nature Jobs
    -Being a big name in science brings benefits, Nature
    -Researchers prefer citing researchers of good reputation,
    -Scientists' reputations and citation rates, PNAS Highlight

  13. A. M. Petersen, I. Pavlidis, I. Semendeferi
    A quantitative perspective on ethics in large team science (pdf)
    Science & Engineering Ethics 20, 923-945 (2014). DOI: 10.1007/s11948-014-9562-8 Abstract The gradual crowding out of singleton and small team science by large team endeavors is challenging key features of research culture. It is therefore important for the future of scientific practice to reflect upon the scientists' ethical responsibilities within teams. To facilitate this reflection we show labor force trends in the US revealing a skewed growth in academic ranks and increased levels of competition for promotion within the system; we analyze teaming trends across disciplines and national borders demonstrating why it is becoming difficult to distribute credit and to avoid conflicts of interest; and we use more than a century of Nobel prize data to show how science is outgrowing its old institutions of singleton awards. Of particular concern within the large team environment is the weakening of the mentor-mentee relation, which undermines the cultivation of virtue ethics across scientific generations. These trends and emerging organizational complexities call for a universal set of behavioral norms that transcend team heterogeneity and hierarchy. To this end, our expository analysis provides a survey of ethical issues in team settings to inform science ethics education and science policy.

    - Family values, Philip Ball (Chemistry World, April 17, 2014)

  14. O. Penner, R. K. Pan, A. M. Petersen, K. Kaski, S. Fortunato
    On the Predictability of Future Impact in Science (pdf)
    (Nature) Scientific Reports 3, 3052 (2013). DOI: 10.1038/srep03052 Abstract Correct assessment of a scientist's past research impact and potential for future impact is fundamental to all personnel recruitment decisions in science. Quantitative measures for impact of previous work are already, formally and informally, involved in the recruitment and evaluation process. Of greater concern in the recruitment process is what a candidate will do in the future. Attempts have recently been made to develop models capable of predicting a scientist's future impact by way of his or her future h-index. Here we present a cross-sectional analysis of 762 longitudinal careers drawn from three disciplines: physics, biology, and mathematics. By applying future impact models to these careers we identify a number of subtle, but critical, flaws in current models. Specifically, cumulative non-decreasing measures like the h-index contain intrinsic spurious autocorrelation, resulting in a significant overestimation of their "predictive power". When applied to a scientist's annual h-index change (a non-cumulative measure), the models exhibit far less predictive power. Moreover, the predictive power of these models varies greatly with the career age of scientists, producing the least accurate estimates for already risk-burdened early career researchers. These results place in doubt the suitability of linear regression models of future h-index for real application in recruitment decisions and indicate that more effort is needed to develop and benchmark career predictability algorithms.

    - Models to predict scientists' future impact often fail, Phys.Org (Oct.30, 2013)
    - Divinations of academic success may be flawed, Nature

  15. A. M. Petersen, S. Succi
    The Z-index: A geometric representation of productivity and impact which accounts for information in the entire rank-citation profile (pdf)
    J. Informetrics 7, 823-832 (2013). DOI: 10.1016/j.joi.2013.07.003 Abstract We present a simple generalization of Hirsch's h-index, Z = sqrt(h^2 + C)/5, where C is the total number of citations. Z is aimed at correcting the potentially excessive penalty made by h on a scientist's highly cited papers, because for the majority of scientists analyzed, we find the excess citation fraction (C-h^2)/C to be distributed closely around the value 0.75, meaning that 75 percent of the author's impact is neglected. Additionally, Z is less sensitive to local changes in a scientist's citation profile, namely perturbations which increase h while only marginally affecting C. Using real career data for 476 physicist careers and 488 biologist careers, we analyze both the distribution of Z and the rank stability of Z with respect to the Hirsch index h and the Egghe index g. We analyze careers distributed across a wide range of total impact, including top-cited physicists and biologists for benchmark comparison. In practice, the Z-index requires the same information needed to calculate h and could be effortlessly incorporated within career profile databases, such as Google Scholar and ResearcherID. Because Z incorporates information from the entire publication profile while being more robust than h and g to local perturbations, we argue that Z is better suited for ranking comparisons in academic decision-making scenarios comprising a large number of scientists.

  16. O. Penner, A. M. Petersen, R. K. Pan, S. Fortunato
    The case for caution in predicting scientists' future impact (pdf)
    Physics Today 66, 8-9 (2013). DOI: 10.1063/PT.3.1928 Abstract To further examine dimensions of career predictability as proposed by Acuna et al. [Nature 489, 201-202 (2012)], we applied their model to a longitudinal career data set of 100 assistant professors in physics, two from each of the top 50 physics departments in the US. We use the Acuna model to calculate the predictive power of the model as a function of the number of years into the future we are attempting to predict as well as the career age of the scientists. The Acuna model does a respectable job of predicting h(t+Δt), say, 3 or 4 years into the future when aggregating all age cohorts together. However, when calculated for subsets of specific age cohorts, we find that the model's predictive power significantly decreases, especially when applied to researchers in the first three years of their career. In those cases the model does a much worse job of predicting future success, and hence, exposes a serious limitation. The limitation is particularly concerning as early career decisions make up a significant portion, if not the majority, of cases where quantitative approaches are likely to be applied.

  17. A. Chessa, A. Morescalchi, F. Pammolli, O. Penner, A. M. Petersen, M. Riccaboni
    Is Europe Evolving Toward an Integrated Research Area? (pdf)
    Science 339, 650-651 (2013). DOI: 10.1126/science.1227970 Abstract An integrated European Research Area (ERA) is a critical component for a more competitive and open European R&D system. However, the impact of EU-specific integration policies aimed at overcoming innovation barriers associated with national borders is not well understood. Here we analyze 2.4 × 10^6 patent applications filed with the European Patent Office (EPO) over the 25-year period 1986-2010 along with a sample of 2.6 × 10^5 records from the ISI Web of Science to quantitatively measure the role of borders in international R&D collaboration and mobility. From these data we construct five different networks for each year analyzed: (i) the patent co-inventor network, (ii) the publication co-author network, (iii) the co-applicant patent network, (iv) the patent citation network, and (v) the patent mobility network. We use methods from network science and econometrics to perform a comparative analysis across time and between EU and non-EU countries to determine the "treatment effect" resulting from EU integration policies. Using non-EU countries as a control set, we provide quantitative evidence that, despite decades of efforts to build a European Research Area, there has been little integration above global trends in patenting and publication. This analysis provides concrete evidence that Europe remains a collection of national innovation systems.

    - European Research: Still Fragmented After All These Years, AlphaGalileo Foundation
    - Europe still has a way to go to achieve true unity, Research Europe, Issue 359
    - Ricerca europea, l'integrazione ancora non c'è, Le Scienze (Scientific American, Italy)
    - Unione europea, ancora non cadono le frontiere della ricerca, Wired

  18. A. M. Petersen, J. Tenenbaum, S. Havlin, H. E. Stanley, M. Perc
    Languages cool as they expand: Allometric scaling and the decreasing need for new words (pdf)
    (Nature) Scientific Reports 2, 943 (2012). DOI: 10.1038/srep00943 Abstract Language is the hallmark of our cumulative culture, by which means we are able to continuously improve on the achievements of previous generations. According to the most recent estimates, the size of the English lexicon has grown by roughly 88% during the 20th century alone. But what is the utility of so many new words? What is the reach of each new word? Since many new words are technical, what is the likelihood of encountering them, and alternatively, what is the use in remembering them? Underlying these questions is the pressure applied by technological change, which is fundamentally altering the ways in which humans communicate, store, and recall information. We test the stability of the large-scale statistical properties of written language over the 209-year period 1800-2008, analyzing Zipf's law, Heaps' law, and the size-variance relation quantifying language growth patterns for all Google 1-gram databases comprising 7 different languages. We find that the annual growth fluctuations of word use have a decreasing trend as the corpus size increases, indicating a slowdown in linguistic evolution following language expansion. This "cooling pattern" forms the basis of a third statistical regularity, which, unlike Zipf's law and Heaps' law, is dynamical in nature.

    - Choice Words: Graphing the evolution of language, arts&sciences Fall 2013 Magazine (Annual BU Research Highlight)
    - How big is your language?, The Hindu, (Dec. 20, 2012)
    - Physicists Explore The Rise And Fall Of Words, Inside Science News Service (ISNS)
    - When physicists do linguistics, The Boston Globe / International Herald Tribune (Feb. 10/11, 2013)

  19. A. M. Petersen, M. Riccaboni, H. E. Stanley, F. Pammolli.
    Persistence and Uncertainty in the Academic Career (pdf)
    Proceedings of the National Academy of Sciences USA 109, 5213-5218 (2012). DOI: 10.1073/pnas.1121429109 Abstract Recent shifts in the business structure of universities and a bottleneck in the supply of tenure-track positions are two issues that threaten to change the longstanding patronage system in academia. Understanding how institutional changes within academia may affect the overall potential of science requires a better quantitative understanding of how careers evolve over time. Since knowledge spillovers, cumulative advantage, and collaboration are distinctive features of the academic profession, the employment relationship should be designed to account for these factors. We quantify the impact of these factors in the production n_i(t) of a given scientist i by analyzing the longitudinal career data of 300 scientists and compare our results with 21,156 sports careers comprising a non-academic labor force. The increase in the typical size of scientific collaborations has made the task of allocating funding and assigning recognition increasingly difficult. We use measures of the scientific collaboration radius, which can change dramatically over the course of a career, to provide insight into the role of collaboration in production efficiency. We introduce a model of proportional growth to provide insight into the complex relation between knowledge spillovers, competition, and uncertainty at the individual scale. Our model shows that high competition levels can make careers vulnerable to "sudden death" termination relatively early in the career as a result of negative production fluctuations and not necessarily due to lack of individual persistence.

    - Short-term contracts may hinder young scientists, PNAS Highlight

  20. A. M. Petersen, J. Tenenbaum, S. Havlin, H. E. Stanley.
    Statistical Laws Governing Fluctuations in Word Use from Word Birth to Word Death (pdf)
    (Nature) Scientific Reports 2, 313 (2012). DOI: 10.1038/srep00313 Abstract In this aggregate analysis of the growth rates of millions of words, we demonstrate significant signatures of competition-driven systems in the linguistic arena of English, Spanish, and Hebrew. How often a given word is used, relative to other words, can convey information about the word's linguistic utility. Analyzing Google word data for 3 languages over the 209-year period 1800-2008, we found an anomalous recent change in the birth and death rates of words, which indicates a shift towards increased levels of competition between words as a result of new standardization technology. We demonstrate unexpected analogies between the growth dynamics of word use and the growth dynamics of economic institutions. Our results support the intriguing concept that a language's lexicon is a generic arena for competition, which evolves according to selection laws that are related to social, technological, and political trends. Specifically, the aggregate properties of language show pronounced differences during periods of world conflict, e.g., World War II.

    - F1000 Evaluated Article, Faculty of 1000 post-publication peer review
    - The New Science of the Birth and Death of Words, Wall Street Journal (Mar. 17, 2012)
    - Languages Lose Vocab to Science and Spell-Check, InnovationNewsDaily
    - Digital Spell-Checking May Be Killing Off Words, LiveScience / MSNBC
    - Modern era brings death to words, Science News
    - Study tracks births, deaths of words, United Press International (UPI)
    - Study reveals words' Darwinian struggle for survival, The Guardian
    - La guerra de las palabras, El Espectador (Colombia)
    - Word Extinction, A nice blog summary by Dev Gualtieri (Aug. 11, 2011)

  21. A. M. Petersen, W-S. Jung, J-S. Yang, H. E. Stanley.
    Quantitative and Empirical Demonstration of the Matthew Effect in a Study of Career Longevity (pdf)
    Proceedings of the National Academy of Sciences USA 108, 18-23 (2011). DOI: 10.1073/pnas.1016733108 Abstract In many competitive systems, there are typically only a few "big winners." This largely reflects the everyday fact that obtaining future opportunities often depends on an individual's record of achievement, since employment opportunities are limited to a finite number of competitors. We solve exactly a longevity model which predicts the distribution of career length P(x) for professions characterized by high selectivity and uncertainty. We confirm the model's prediction for P(x) using extensive empirical data for the careers of both scientists (publishing in high-impact journals such as Nature, Science, etc.) and professional athletes (playing in MLB, NBA, Premier League, and Korean Professional Baseball). This study uncovers a remarkably simple statistical law which describes the frequencies of the extremely short careers of 'one-hit wonders' as well as the extremely long careers of the 'iron horses'. Our model highlights the importance of early career development, showing that many careers are stunted by the relative disadvantage associated with inexperience.

  22. A. M. Petersen, H. E. Stanley, S. Succi.
    Statistical regularities in the rank-citation profile of scientists (pdf)
    (Nature) Scientific Reports 1, 181 (2011). DOI: 10.1038/srep00181 Abstract We analyze the individual career publication statistics of 200 'stellar' physicists and 100 Assistant professors in order to better understand success, productivity, and the h-index. In order to analyze the entire set of publications of a given scientist at once, we analyze the rank-citation curve c(r) using the Zipf ranking technique. Incredibly, we observe a universal feature: although every scientist has a distinct h value, each scientist also has a similar (two-parameter) curve c(r)! Using the properties of this universal curve, we show that the total number of citations C scales with an author's h-index as C ~ h^(1+β), where β is a high-rank power-law scaling exponent for c(r). That the human endeavors of these scientists produce a common representative curve suggests that scientific careers are governed by the statistical laws of competition and cumulative advantage. Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress.

  23. A. M. Petersen, O. Penner.
    A method for the unbiased comparison of MLB and NBA career statistics across era (pdf)
    Presented at the MIT Sloan Sports Analytics Conference 2012 (2012). Abstract An extension of "Methods for detrending success metrics to account for inflationary and deflationary factors" to National Basketball Association (NBA) career statistics. Includes extensive tables listing re-ranked top-50 achievements for points, rebounds and assists, at the season and career level.

  24. A. M. Petersen, O. Penner, H. E. Stanley.
    Methods for detrending success metrics to account for inflationary and deflationary factors (pdf)
    Eur. Phys. J. B 79, 67-78 (2011). DOI: 10.1140/epjb/e2010-10647-1
    Pre-print title: Detrending career statistics in professional Baseball: accounting for the Steroids Era and beyond Abstract We compare both career and seasonal achievements of 130+ years of baseball players (e.g., addressing the question of who effectively hit more home runs: Babe Ruth or Barry Bonds?), using statistical methods to account for time-dependent factors that inflate success measures. We provide non-technical top-50 record tables for career HR, H, RBI, W, K and season HR, H, RBI, K, focusing on the accessible measures found in newspaper box scores and on the back of baseball cards.

    - Complexity Theory and the National Baseball Hall of Fame, the European Physical Journal News Highlights
    - New Statistical Method Ranks Sports Players From Different Eras, MIT Technology Review
    - Boston University clip, The Daily Free Press
    - A Physics Curveball, arts&sciences Fall 2010 Magazine (Annual BU Research Highlight)
    - Baseball Greats Reranked, BU Today, April 8, 2011

  25. A. M. Petersen.
    Applications of Statistical Physics to the Social and Economic Sciences (pdf)
    PhD Thesis, Boston University (2011). Thesis Advisor: H. Eugene Stanley

  26. B. Podobnik, D. Horvatic, A. M. Petersen, B. Urosevic, H. E. Stanley.
    Bankruptcy risk model and empirical tests (pdf)
    Proceedings of the National Academy of Sciences USA 107, 18325 (2010). DOI: 10.1073/pnas.1011942107 Abstract We compare bankrupt companies with non-bankrupt companies using Zipf ranking techniques to analyze the debt-to-assets leverage ratio R. Using the distribution of R for bankrupt versus non-bankrupt companies, we estimate the bankruptcy risk of an existing company conditional on its current R value and find that the probability of bankruptcy P(B) ~ R.

    - The relationship between bankruptcy and relative debt for U.S. companies, PNAS Highlight

  27. A. M. Petersen, F. Wang, S. Havlin, H. E. Stanley.
    Market dynamics immediately before and after financial shocks: quantifying the Omori, productivity and Bath laws (pdf)
    Physical Review E 82, 036114 (2010). DOI: 10.1103/PhysRevE.82.036114 Abstract Financial shocks (incoming information) can cause significant cascading (e.g., "market rallies"), so we use methods from earthquake physics to better understand the expected dynamics before and after shocks of characteristic main-shock magnitude M.

  28. A. M. Petersen, F. Wang, S. Havlin, H. E. Stanley.
    Quantitative law describing market dynamics before and after interest-rate change (pdf)
    Physical Review E 81, 066121 (2010). DOI: 10.1103/PhysRevE.81.066121 Abstract We analyze the financial "earthquake" that occurs every time the U.S. Federal Reserve makes an announcement to change the federal target interest rate, and estimate the magnitude of market 'anticipation' and 'surprise' using the fundamental relationship between the federal effective 'overnight' interest rate and the 6-month Treasury Bill.

    - Bernanke Announcement Leaves Quake Like Aftershocks, Inside Science News Service

  29. A. M. Petersen, B. Podobnik, D. Horvatic, H. E. Stanley.
    Scale-invariant properties of public-debt growth (pdf)
    Europhysics Letters 90, 38006 (2010). DOI: 10.1209/0295-5075/90/38006 Abstract Applying methods from macro-economic growth theory, we find 'convergence' in country debt-to-GDP leverage ratios over the last 30+ years.

  30. A. M. Petersen, F. Wang, H. E. Stanley.
    Methods for measuring the citations and productivity of scientists across time and discipline (pdf)
    Physical Review E 81, 036114 (2010). DOI: 10.1103/PhysRevE.81.036114 Abstract If we account for the time-dependent increase in paper citations as well as variations in paper collaboration group size, what do the distributions of (i) career total citations and (ii) career total number of publications look like for individual scientists in highly competitive journals? We also find evidence of cumulative advantage, demonstrated by the increasing publication rate of individual scientists with each new publication in his/her career.

  31. B. Podobnik, D. Horvatic, A. M. Petersen, M. Njavro, H. E. Stanley.
    Common scaling behavior in finance and macroeconomics (pdf)
    Eur. Phys. J. B 76, 487 (2010). DOI: 10.1140/epjb/e2009-00380-3 Abstract We analyze the growth rates of worldwide stock indices and relate the market capitalization (MC) of the index to the gross domestic product (GDP) of the index country.

  32. B. Podobnik, D. Horvatic, A. M. Petersen, H. E. Stanley.
    Quantitative relations between risk, return, and firm size (pdf)
    Europhysics Letters 85, 50003 (2009). DOI: 10.1209/0295-5075/85/50003 Abstract For individual companies comprising the Nasdaq (2002-2008) and S&P500 (2003-2008) indices, we analyze the logarithmic growth rate (return) R of the stock price. We also relate the annual market capitalization (MC) and the return-to-risk ratio ⟨R⟩/σ(R) for each company and find interesting differences between the Nasdaq and S&P500.

  33. B. Podobnik, D. Horvatic, A. M. Petersen, H. E. Stanley.
    Cross-Correlations between Volume Change and Price Change (pdf)
    Proceedings of the National Academy of Sciences USA 106, 22079 (2009). DOI: 10.1073/pnas.0911983106 Abstract In analogy to the analysis of price volatility in financial markets, we analyze the absolute logarithmic returns (volatility) of total volume at the 1-day time resolution for individual stocks as well as stock indices, and use Detrended Cross-Correlation Analysis (DCCA) to quantify the relation between price volatility and volume volatility.

  34. A. M. Petersen, W-S. Jung, H. E. Stanley.
    On the distribution of career longevity and the evolution of home run prowess in professional baseball (pdf)
    Europhysics Letters 83, 50010 (2008). DOI: 10.1209/0295-5075/83/50010 Abstract How is it that 3% of all fielders finish their career with one at-bat and 3% of all pitchers finish their career with less than one inning pitched, yet some careers span more than 10,000 at-bats and 3,000 innings pitched? Analyzing every Major League Baseball player career over the 80-year period 1920-2000, we find a beautiful statistical law which describes both the extremely short careers of 'one-hit wonders' and the extremely long careers of the 'iron horses'. Furthermore, analyzing home run rates, we find evidence consistent with performance-enhancing drugs during the 'Steroids Era' of the 1990s and 2000s.

  35. M. Mobilia, A. Petersen, S. Redner.
    On the role of Zealotry in the Voter Model (pdf)
    J. Stat. Mech. 08, P08029 (2007). DOI: 10.1088/1742-5468/2007/08/P08029 Abstract Why is it that, in the history of democratic elections (e.g., presidential elections), complete consensus (polarization) has never been achieved? For example, the largest share of the popular vote won by a U.S. presidential candidate was approximately 61%, by Johnson over Goldwater in 1964. We investigate a stochastic opinion model in which consensus is stymied by the presence of zealots, agents who are completely fixed in their opinion, even if all their neighbors are of the opposite opinion. Surprisingly, we find that the number, and not the density, of zealots determines the degree of consensus among the voters in our model.


You can also find preprint versions of my papers here on the arXiv


  • The computational social science of academic career growth (2015) (pdf)

  • Being Ethical in Large-Team Science: A Quantitative Historical Perspective (2013) (pdf)
    Presented at The History of Science Society 2013 Annual Meeting, Nov. 2013

  • Using big data to quantify the evolution of language at the micro and macro scale (2013) (pdf)
    R-rated version here (pdf), presented at Nerd Nite Milan, Oct. 30 2013

  • Ascent in competitive arenas: From Fenway Park to Mass Ave (2013) (pdf)
    Presented at the "Science of Success" Symposium, Northeastern Univ. & IQSS Harvard University

  • Multilevel networks in science: from individual careers to Europe (2013) (pdf)
    Presented at the "Econophysics and Networks Across Scales" Workshop, Lorentz Center (International Center for Workshops in the Sciences), Leiden University

  • Beyond the Asterisk* : Adjusting for Performance Inflation in Professional Sports (2012) (pdf)
    Presented at the "Sabermetrics, Scouting and the Science of Baseball" weekend seminar for the benefit of the Jimmy Fund

  • Persistency and uncertainty across the academic career (2012) (pdf)

  • Quantifying statistical regularities in the career achievements of scientists and professional athletes (2012) (pdf)

  • Quantitative laws describing market dynamics before and after interest-rate change and other financial shocks (2011) (pdf)