New DR2 member

We warmly welcome Pietro Lana, master's student at the philosophy department here in Turin, as a new member of the DR2 group. Meet him and the other DR2 affiliates in our People section.

Posted in Uncategorized | Leave a comment

Considerations about corpus-dependency of topic modelling with Mallet

By Sara Garzone and Nicola Ruschena

In the context of text mining, topic modelling analyses co-occurrence patterns in textual data in order to isolate clusters within the set of expressions occurring in a corpus. It aims at extracting the topics present in a corpus and at categorizing documents on the basis of their semantic content. Topic modelling is often an appealing approach for data-driven analysis in short-run projects, since it is an unsupervised method: it does not require training the algorithm on labelled data, whose production is quite a demanding task. Moreover, software packages executable from the command line or through a user interface have been developed to perform topic modelling, providing friendlier environments for researchers who are not well acquainted with programming.

Mallet is a tool for topic modelling: a Java-based package for statistical natural language processing, initially developed by Andrew McCallum at the University of Massachusetts. It allows users to perform topic modelling on textual corpora without requiring advanced knowledge of statistics or programming.

Mallet's topic modelling is based on Latent Dirichlet Allocation (LDA), a Bayesian probabilistic generative model first applied to text classification tasks by David Blei et al. in 2003, which has since become the standard for probabilistic text categorization under latent semantic hypotheses. Along with many other techniques in natural language processing, topic modelling relies on the so-called distributional hypothesis (Harris 1954), according to which words occurring in the same contexts tend to have similar meanings.

Clusters obtained from co-occurrence analysis can thus be expected to reflect relations of semantic proximity, i.e. topics. By applying probabilistic models, documents can then be categorized according to the probability of their belonging to the detected topics. The underlying assumption is that each document exhibits a probability distribution over all topics. With LDA-based topic modelling one can estimate which of the topics detected in the corpus are likely to be present in each document, given the terms that occur in it.
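The distributional hypothesis behind all of this can be illustrated with a minimal Python sketch (a toy example, not Mallet or LDA): we count, for each word, the words it co-occurs with in the same document, and then compare these context vectors with cosine similarity. The corpus and vocabulary below are entirely made up for illustration.

```python
from collections import Counter
from math import sqrt

# Toy corpus: each document is a list of tokens.
docs = [
    ["bank", "money", "loan"],
    ["bank", "money", "interest"],
    ["river", "bank", "water"],
    ["river", "water", "fish"],
]

def context_vector(word):
    """Count the words co-occurring with `word` in the same document."""
    counts = Counter()
    for doc in docs:
        if word in doc:
            counts.update(w for w in doc if w != word)
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    norm = lambda c: sqrt(sum(x * x for x in c.values()))
    return dot / (norm(u) * norm(v))

# Words sharing contexts come out more similar than words that never co-occur.
print(cosine(context_vector("money"), context_vector("loan")))  # ≈ 0.58
print(cosine(context_vector("loan"), context_vector("fish")))   # 0.0
```

Topic modelling goes well beyond this pairwise comparison, but the intuition is the same: shared contexts are taken as evidence of shared meaning.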
Continue reading

Posted in Data-Driven Research, Text mining | 2 Comments

Three-day online conference, Nov. 23–25, 2020

We are pleased to give notice of the next three-day conference organised by the Digital Humanities department at the University of Basel: “Digital Practices. Reading, writing and evaluation on the web”. The conference will take place online, November 23–25, 2020. 

Among the contents:

November 24, 14:30 – Simone Rebora discusses the limits and opportunities of operationalizing concepts from literary theory (Distant Reading Story World Absorption);

November 24, 16:50 – Charles Lassiter detects correlations between held beliefs and social conditions in the analysis of two survey-based datasets (Big Data and Naturalized Social Epistemology: a New Frontier for the Digital Humanities);

November 25, 11:00 – Amanda C.S. Pinheiro builds a database organizing information (names, dates, locations, parental relationships) extracted by means of the Handwritten Text Recognition (HTR) provided by Transkribus (Extracting Data from Baptismal Records through Coding);

November 25, 15:30 – Philipp Dreesen and Julia Krasselt apply topic modelling techniques to short textual units at sentence and paragraph level to analyse evaluative discourse (Evaluations of the Quran in Right-Wing Populist Media. Metapragmatic Sequence Analyses With Topic Modeling).
All talks will be given via Zoom. 

The complete program is available here: (PDF)

Registration is open to anyone interested > Free registration

Posted in Digital Humanities, Quantitative methods, Uncategorized | Leave a comment

DR2@STOREP: quantitative history of ideas between philosophy and economics

We are proud to announce that several DR2 members will be present at the 17th Annual Conference of STOREP (Associazione Italiana per la Storia dell'Economia Politica || Italian Association for the History of Economic Thought). This is by now a well-established tradition, as DR2 already took part in the STOREP conferences of 2018 and 2019.

In particular, DR2 members and the PRIN project Has economics finally become an immature science? Mapping economics at an epoch of fragmentation, by combining historical perspectives and new quantitative approaches organized a joint session on Quantitative methods in the history of ideas, held on October 2, 2020, from 9:00 to 11:00 (Central European Standard Time).


Check the conference program for more details.

Posted in Data-Driven Research, Economics | Tagged | Leave a comment

New DR2 Paper is out in Synthese

We are pleased to announce that a new paper by DR2 co-founders Guido Bonino and Paolo Tripodi, together with another DR2 affiliate member, Paolo Maffezioli, has been published in Synthese: "Logic in analytic philosophy: a quantitative analysis".

Abstract: Using quantitative methods, we investigate the role of logic in analytic philosophy from 1941 to 2010. In particular, a corpus of five journals publishing analytic philosophy is assessed and evaluated against three main criteria: the presence of logic, its role and level of technical sophistication. The analysis reveals that (1) logic is not present at all in nearly three-quarters of the corpus, (2) the instrumental role of logic prevails over the non-instrumental ones, and (3) the level of technical sophistication increases in time, although it remains relatively low. These results are used to challenge the view, widespread among analytic philosophers and labeled here “prevailing view”, that logic is a widely used and highly sophisticated method to analyze philosophical problems.


Posted in Data-Driven Research, Digital Humanities, Distant Reading, DR2, History of analytic philosophy, Quantitative methods | Leave a comment

Joint Paper by three DR2 Members

We are pleased to announce and share the publication of a joint paper by three DR2 members: "Reclutamento accademico: come tutelare il pluralismo epistemico? Un modello di simulazione ad agenti" ("Academic recruitment: how can epistemic pluralism be protected? An agent-based simulation model"), by Carlo Debernardi, Eleonora Priori and Marco Viola, Sistemi Intelligenti.

(Abstract ENG): According to some authors (e.g. Gillies 2014, Viola 2017), when researchers are called to express a judgment on their peers, they might exhibit an epistemic bias that makes them favour those who belong to their own School of Thought (SoT). A dominant SoT is also likely to provide some advantage to its members' bibliometric indexes, because more people potentially means more citations. In the long run, even a slight preference for one SoT over the others might lead to a monopoly, hampering the oft-invoked pluralism of research. In academic recruitment, given that those who are recruited to permanent positions will often become the recruiters of tomorrow, such biases might give rise to a self-reinforcing loop. However, the way in which this dynamic unfolds is affected by the institutional infrastructure that regulates academic recruitment. To explore how the impact of epistemic bias changes across various infrastructures, we built a simple Agent-Based Model using NetLogo 6.0.4, in which researchers belonging to rival SoTs compete to be promoted to professorships. The model makes it possible to represent the effect of epistemic and bibliometric biases, as well as to study how they are affected by the modification of several parameters.
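The self-reinforcing loop described in the abstract can be sketched in a few lines of Python. This is a toy illustration under assumed parameters (two schools, a fixed professoriate, a single bias term), not the authors' NetLogo model: at each turnover a random professor retires, and the majority school's chance of placing the new hire is amplified by a small in-group bias.

```python
import random

random.seed(42)

N = 100        # fixed number of professors
BIAS = 0.2     # strength of the in-group (epistemic) bias
ROUNDS = 5000  # number of turnover events to simulate

# Start from an even split between two rival schools of thought.
professors = ["A"] * 50 + ["B"] * 50

def step(professors):
    """One turnover: a random professor retires and the committee of the
    remaining ones hires a replacement. The majority school's chance of
    placing the hire is amplified by the epistemic bias."""
    retiree = random.randrange(len(professors))
    committee = professors[:retiree] + professors[retiree + 1:]
    share_a = committee.count("A") / len(committee)
    p_hire_a = min(1.0, max(0.0, share_a + BIAS * (share_a - 0.5)))
    professors[retiree] = "A" if random.random() < p_hire_a else "B"

for _ in range(ROUNDS):
    step(professors)

print(f"share of school A after {ROUNDS} turnovers:",
      professors.count("A") / N)
```

Because the bias pushes the hiring probability away from the current split, repeated runs tend to drift toward one school's dominance, which is the monopoly risk the paper investigates with a far richer model.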

Posted in Digital Humanities, DR2, Methodology, Pluralism, Quantitative methods | Leave a comment

Interesting tutorials @Programming Historian

Following the posts of the past few weeks, today we present an interesting website – and peer-reviewed journal – in the same vein.

It is Programming Historian, a useful, multilingual and open-access collection of tutorials on computational techniques for the humanities. Nearly one hundred guides range from simple introductions to advanced topics such as text mining, big data and network analysis. Definitely worth a look!

Posted in Digital Humanities, Distant Reading, History of ideas, Methodology, Quantitative methods, Text mining, Text-Mining, Tutorials | Tagged | Leave a comment

Science mapping – an in-depth review

A new entry in the Encyclopedia of Knowledge Organization (IEKO) has been published. This contribution – by Eugenio Petrovich – is an in-depth review of science mapping. It is also a must-read for anyone who wants to approach the core concepts of bibliometrics without reference to the formal machinery.

The entry is publicly available here.

Posted in Uncategorized | Leave a comment

Building a flow map from scratch

By Emiliano Tolusso

Maps are a pleasant and handy way of visualizing spatial data. Choropleths especially are widely employed in the visual description of spatial phenomena. However, as popular as traditional maps are, they suffer from a fundamental, insurmountable flaw: they are conceived as ways to represent static objects, well contained within some arbitrary border. How can we trace on a map something that is actually in motion across those borders?

Flow maps are a versatile hybrid between a traditional map and a flowchart. As such, flow maps are a well-fitted solution to display the motion of different objects in space in an orderly fashion.

Flow maps are, as a matter of fact, directed, georeferenced networks! They can represent the motion of virtually every kind of object on a plane surface: migrations, trade routes, money transfers. Getting more creative, and more on point with our focus on distant reading, flows may represent citations among geographically identifiable institutions retrieved from a set of papers. All you need is a pair of coordinates (X, Y; Long, Lat) for each node, and any flow can be effectively charted. The tool we present here is a useful resource to quickly build a flow map from scratch. It offers many advantages:

  1. You don’t need to write a single line of code or to open a GIS.
  2. It is based on Google Sheets.
  3. The maps are pretty cool!
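Before turning to the tool itself, it is worth seeing how little data a flow map actually needs. The sketch below uses hypothetical citation counts and institutions chosen purely for illustration: a table of locations with coordinates, and a table of directed flows between them, which is essentially the spreadsheet layout that sheet-based flow-mapping tools expect.

```python
# Hypothetical data: citation flows between three institutions.
# A flow map needs just two tables: locations (id, lat, lon)
# and flows (origin, destination, count).
locations = {
    "Turin": (45.07, 7.69),
    "Milan": (45.46, 9.19),
    "Basel": (47.56, 7.59),
}

flows = [
    ("Turin", "Milan", 12),  # e.g. 12 citations from Turin to Milan
    ("Turin", "Basel", 5),
    ("Milan", "Turin", 8),
    ("Basel", "Turin", 3),
]

# A flow map is a directed, georeferenced network: each flow becomes
# an arrow from the origin's coordinates to the destination's.
def to_segments(flows, locations):
    return [(locations[o], locations[d], n) for o, d, n in flows]

segments = to_segments(flows, locations)
print(segments[0])  # ((45.07, 7.69), (45.46, 9.19), 12)
```

Once the data sits in this shape, drawing the map is purely a rendering problem, which is exactly what the tool takes care of.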

Let’s see how it works.

Continue reading

Posted in Data-Driven Research, Maps | Leave a comment

Practices and Malpractices. What the Analysis of Retractions can Tell us about the Research Ethos of the Humanities

By Eugenio Petrovich

In the last decades, the number of retractions of scientific articles has grown significantly in all disciplines (Steen et al., 2013). Even prestigious journals such as Science are not immune to this growth (Wray & Andersen, 2018). The spread of the phenomenon, as well as its accelerating pace, gives rise to concern in the scientific community, as a rising proportion of retractions is due to the manipulation of data, the use of fabricated or fraudulent data, plagiarism, and other types of research misconduct (Fang et al., 2012). Some striking cases have even reached the general public, such as the infamous article by Andrew Wakefield claiming a connection between vaccines and autism, which was published in The Lancet in 1998 and retracted only twelve years later. Such cases are particularly troublesome since they risk seriously undermining society's trust in science.

Continue reading

Posted in Data-Driven Research, Quantitative methods | Leave a comment