A short and informal replication of Petrovich and Buonomo 2018

We receive and gladly publish this post from Maximilian Noichl, an MA student at the University of Vienna (check his website and his recent publication).

I recently came across a very interesting study by Eugenio Petrovich and Valerio Buonomo (2018), in which they analyze co-citation networks for the last three decades of analytic philosophy. After a bit of conversation with Eugenio, I thought that I would do a little quasi-replication of their results, to try out some things.

So I downloaded the sample Petrovich and Buonomo used from the Web of Science. It consists of the years 1985-2014 from the following five journals:

  • Philosophical Review
  • Nous
  • Journal of Philosophy
  • Mind
  • Philosophy and Phenomenological Research

The question that Petrovich and Buonomo are interested in is whether analytic philosophy has become more diverse in these thirty years. There are various ways in which one could approach this question. Petrovich and Buonomo go for visual inspection, and I will try to present an approach to that later. But first and foremost, Eugenio and I were talking about formal evaluation of the co-citation graphs. In what follows, I took some inspiration from Tang, Cheng, and Chen (2017), who present a recommendable longitudinal study of the digital humanities.

Transitivities

The first thing I did was to compute the transitivities for the different samples. Transitivity is a very basic measure of how much a graph tends to cluster. It is three times the number of triangles in the graph (in our case a triangle consists of three sources of which every pair was co-cited at least once) divided by the number of connected triplets (three sources linked by at least two co-citation edges). A fully connected graph scores 1 on this measure, while a graph without any triangles scores 0. For this little exercise I will use a sliding-window approach, in which I always consider five years together. It results in the following picture:

[Figure: Transitivities]
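The transitivity definition and the sliding-window idea can be sketched in a few lines of plain Python. This is only a minimal illustration, not the code used for the figure: the function names, the `(year, references)` input format, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_graph(bibliographies):
    """Build an undirected co-citation graph as {source: set(neighbours)}.

    Two sources are linked if they appear together in at least one bibliography.
    """
    adj = defaultdict(set)
    for refs in bibliographies:
        for a, b in combinations(sorted(set(refs)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def transitivity(adj):
    """3 * triangles / connected triplets (0.0 if there are no triplets)."""
    closed = 0    # every triangle is counted once per corner, i.e. three times
    triplets = 0  # paths of length two, counted at their centre node
    for nbrs in adj.values():
        k = len(nbrs)
        triplets += k * (k - 1) // 2
        for u, v in combinations(nbrs, 2):
            if v in adj[u]:
                closed += 1
    return closed / triplets if triplets else 0.0


def window_transitivities(papers, first_year, last_year, width=5):
    """Transitivity for every `width`-year window between first_year and last_year."""
    results = {}
    for start in range(first_year, last_year - width + 2):
        end = start + width - 1
        refs = [r for year, r in papers if start <= year <= end]
        results[(start, end)] = transitivity(cocitation_graph(refs))
    return results


# Toy data (hypothetical): (publication year, cited sources) per citing paper.
toy_papers = [
    (1985, ["A", "B", "C"]),
    (1986, ["C", "D"]),
    (1990, ["D", "E"]),
]
toy_graph = cocitation_graph(refs for _, refs in toy_papers[:2])
print(transitivity(toy_graph))  # 0.6: one triangle (A, B, C), five connected triplets
windows = window_transitivities(toy_papers, 1985, 1990, width=5)
```

Graph libraries such as networkx offer the same measure ready-made (`networkx.transitivity`), which is the more practical choice for networks of realistic size.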

As we can see, the transitivity gets smaller over time, reaches its lowest point after 2000, and then rises again. I have also looked into the size of the so-called giant component of the network, which is the largest connected sub-graph. In all windows it contained more than 93% of the nodes, slowly approaching 98% over time, which means that nearly all cited sources were connected to the others via co-citation. In other words: there was only one large network of philosophy, not many small unconnected sub-networks. As this was mostly constant, I think we should be able to interpret the falling transitivity as diminished local co-citation: while clusters are very tightly knit at the beginning, they get more diverse over time. We can assume that it is the local connectivity that is getting smaller, because the average shortest-path length of the networks, which describes how many steps it usually takes to get from one randomly picked node to another, is pretty much constant over time at 3.16 steps (std: 0.06). This value is pretty small. Short average distances together with high transitivities suggest small-world networks, in which every node can be reached from every other node in only a few steps, because the network consists of tightly clustered subunits connected via hubs. This is a property we would generally expect from co-citation networks drawn from one discipline. It seems to me that we can interpret the slight decline in small-worldness as an indication of an increase in the scope of the kind of philosophy that appears in the surveyed journals.
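For readers who want to reproduce the two other quantities mentioned here, the share of nodes in the giant component and the average shortest-path length, the following is a rough standard-library sketch using breadth-first search. The adjacency dictionary and all names are hypothetical, and the average is taken inside the giant component only, which is an assumption about how such a number would be computed.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph as a list of sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    comp.add(nbr)
                    queue.append(nbr)
        components.append(comp)
    return components


def average_shortest_path(adj, nodes):
    """Mean BFS distance over all ordered pairs of `nodes` (one connected component)."""
    total, pairs = 0, 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy co-citation graph (hypothetical): one cluster of four sources, one isolated pair.
toy_adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"},
    "E": {"F"}, "F": {"E"},
}
giant = max(connected_components(toy_adj), key=len)
giant_share = len(giant) / len(toy_adj)          # 4 of 6 nodes
mean_distance = average_shortest_path(toy_adj, giant)
```

Running one BFS per node is fine for a sketch; for tens of thousands of reference nodes a graph library or sampling of source nodes would be the more realistic option.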

Gini Coefficients

We can do another thing to get an idea about dynamics in the discipline. In some sense, citations can be considered the currency of academia. And like regular currency, some receive more, some less. Indeed, as with regular currency, a select few receive a lot more than everyone else. This suggests that we can use the same tools that are used to quantify financial inequality in societies over time to quantify inequalities in citation counts over time. Below I have calculated the Gini coefficients over the same five-year windows used above (using a snippet by Olivia Guest). The Gini coefficient quantifies inequality on a scale from zero to one, in which zero means complete equality, while one indicates that everything is owned by only one person.
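As a small illustration of the measure, here is one common way to compute a Gini coefficient from a list of citation counts. It is a plain-Python sketch written for this explanation, not the snippet referred to above, and the example counts are made up.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal).

    For a finite sample of n items the maximum is (n - 1) / n, which
    approaches 1 as n grows.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum: larger values get larger ranks.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical citation counts for four sources.
equal_counts = [1, 1, 1, 1]
skewed_counts = [0, 0, 0, 10]
print(gini(equal_counts))   # 0.0
print(gini(skewed_counts))  # 0.75
```

Applied per window, `values` would be the number of citations each source receives within that five-year slice.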

[Figure: Gini coefficients]

I would tend to interpret high Gini coefficients as a sign of increased specialization, as they suggest that most articles focus on a similar set of authors. A lower Gini coefficient, on the other hand, might be indicative of diversification: as the circle of towering figures with very high citation counts is enlarged, it stands to reason that the thematic field also becomes more varied. By this measure, analytic philosophy, as depicted by our sample, experienced peak specialization in the early 2000s, but has become slightly more diverse since then. This seems to be somewhat at odds with our previous result, so I’m not sure what to make of it. I would like to check this against the actual content of the articles, though: given full texts, or at least word vectors, it would be easy to calculate similar measures.

Visualization

Now for the most fun part, the visual inspection. A neat trick when dealing with confusing networks is to lay out their minimum spanning tree instead of a (usually zealously) pruned version of the network itself. I am using the wonderful tmap library by Probst and Reymond (2019), and will also use faerun, a visualization framework developed by the same authors. I used the Leiden algorithm (Traag, Waltman, and van Eck 2019) to identify communities in the networks, which are represented by the colours below.

In the networks below we see the results. Because it would be annoying to browse through 23 graphics, I will only show each of the three decades, in the same way Petrovich and Buonomo do. To read the graphics, remember that the minimum-spanning-tree construction will try to put the sources with the strongest connections next to each other, which is why we have these little balls, usually around a primary source of major importance, with which all the others are co-cited. But the algorithm sometimes has to make trade-offs, so we cannot expect every node to be linked to its respective nearest neighbour.
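To make the spanning-tree idea a little more concrete, here is a toy version using Kruskal's algorithm on co-citation counts. It is deliberately not what tmap does internally; it only illustrates the principle of keeping the strongest links that still connect everything. Treating a high co-citation count as a short distance is an assumption of this sketch, and the edge list is invented.

```python
def strongest_spanning_tree(weighted_edges):
    """Kruskal's algorithm, preferring heavy edges.

    Equivalent to a minimum spanning tree on negated weights. Edges are
    (source, target, cocitation_count) tuples.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(weighted_edges, key=lambda e: -e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:       # the edge joins two separate parts: keep it
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


# Hypothetical co-citation counts between four sources.
toy_edges = [
    ("A", "B", 5), ("A", "C", 4), ("B", "C", 1), ("C", "D", 2), ("B", "D", 1),
]
toy_tree = strongest_spanning_tree(toy_edges)
print(toy_tree)  # [('A', 'B', 5), ('A', 'C', 4), ('C', 'D', 2)]
```

The tree keeps only three of the five links, so weaker co-citation relations such as B and C simply do not show up in the layout.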

I ran the YAKE keyword algorithm on the abstracts and titles of the citing papers associated with the communities, so we can learn a little bit more about them. Keep in mind that we only have titles for the first decade, which is why keyword quality is low there.

1985-1994

To explore an interactive version, try it out at this link.

1995-2004

To explore an interactive version, try it out at this link.

2005-2014

To explore an interactive version, try it out at this link.

I think these graphics tend to indicate that the intellectual landscape in these five journals has been broadened over time.

Length of Bibliographies

Eugenio noted that while the sample is nearly constant over time, as the five journals output similar numbers of papers every year, the length of bibliographies, and therefore the size of the co-citation networks, has increased strongly. This seems to agree with general trends in philosophy. From a larger dataset I had lying around (used in this visualization: https://homepage.univie.ac.at/noichlm94/full/zoom_final/index.html) I have extracted the length of bibliographies over time, and the picture suggests quite a considerable effect (depicted using ggpointdensity by Lukas Kremer, https://github.com/LKremer/ggpointdensity).
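A first look at this trend only needs the number of references per paper, averaged by year. The sketch below assumes the same hypothetical `(year, references)` records as a starting point; it is not the extraction used for the plot.

```python
from collections import defaultdict
from statistics import mean


def mean_bibliography_length(papers):
    """Average number of cited references per paper, keyed by publication year."""
    lengths_by_year = defaultdict(list)
    for year, refs in papers:
        lengths_by_year[year].append(len(refs))
    return {year: mean(lengths) for year, lengths in sorted(lengths_by_year.items())}


# Toy records (hypothetical): (publication year, cited sources).
toy_records = [
    (1985, ["A", "B"]),
    (1985, ["A", "B", "C", "D"]),
    (2014, ["A", "B", "C", "D", "E", "F"]),
]
lengths = mean_bibliography_length(toy_records)
print(lengths)  # {1985: 3, 2014: 6}
```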

At the moment it is not clear to us what drives this effect. On the one hand, technological advancements might have made the management of large amounts of literature far easier. But it might as well be connected with various cultural changes in the discipline. I’m very interested to hear opinions on how this effect should be treated when doing diachronic analyses of literature: Should it somehow be corrected for, to ease comparability? How can it be treated as a genuine feature of the data?

Literature

Petrovich, Eugenio, and Valerio Buonomo. 2018. “Reconstructing Late Analytic Philosophy. A Quantitative Approach.” Philosophical Inquiries 6 (1): 151–82. https://doi.org/10.4454/philinq.v6i1.184.

Probst, Daniel, and Jean-Louis Reymond. 2019. “Visualization of Very Large High-dimensional Data Sets as Minimum Spanning Trees.” ChemRxiv. https://doi.org/10.26434/chemrxiv.9698861.v1.

Tang, Muh-Chyun, Yun Jen Cheng, and Kuang Hua Chen. 2017. “A Longitudinal Study of Intellectual Cohesion in Digital Humanities Using Bibliometric Analyses.” Scientometrics 113 (2): 985–1008. https://doi.org/10.1007/s11192-017-2496-6.

Traag, V. A., L. Waltman, and N. J. van Eck. 2019. “From Louvain to Leiden: Guaranteeing Well-Connected Communities.” Scientific Reports 9 (1): 5233. https://doi.org/10.1038/s41598-019-41695-z.


1 Response to A short and informal replication of Petrovich and Buonomo 2018

  1. Eugenio Petrovich says:

    This is a very nice piece and I am grateful to Maximilian for sharing it with the DR2 community!

    I appreciate in particular the new, quantitative methods introduced to measure specialization/diversification of analytic philosophy: transitivity (based on network theory) and the Gini coefficient (drawn from economics). They add a quantitative spin to our article, which was mainly based on qualitative inspection of the maps by experts.

    As for the slight difference between our diachronic analysis and the trend of the two indexes presented here, I wonder if it has to do with the fact that, in our maps, we used a threshold (i.e., we included only the references cited >= 20 times, thus pruning the network), whereas, if I got it right, Maximilian considers the whole network.

    Is it possible that the picture we get when we work with the high-degree reference-nodes only (the “classics” of analytic philosophy) is different from the one we get considering all the 58,000+ reference-nodes? Perhaps the “classics” form a sort of “frame” which is more clustered than the overall network? I think it would be interesting to explore how the transitivity and the Gini coefficient change as a function of the threshold used to select the nodes.

    As for the increasing length of the bibliographies, in my article in Scientometrics, I noted that the trend began already in the 1960s, with a marked acceleration between the 1980s and 1990s (see Fig. 1). Probably, it is due to the interplay of three factors:
    1) a change in the editorial policies of journals: maybe editors started to encourage authors to turn implicit references into explicit citations;
    2) the emergence of the Internet, which has considerably simplified the literature search for philosophers, as it has done for scientists;
    3) the presence of a knowledge accumulation process (this is the focus of the Scientometrics article and it is widely discussed there).

    More research is needed to gauge the effect of 1 and 2 and, for sure, we also need to better understand how the changing length affects the network statistics.
