On-line workshop, February 17-19 – Sentiment analysis on multilingual 18th-century corpora

We are pleased to give notice of the online workshop Sentiment Analysis in Literary Studies, organized by the Centre for Information Modelling of the University of Graz.

Sentiment analysis is a common task in literary studies, yet it sits outside the mainstream of analytic computational procedures applied to philosophical corpora. The critical facets of sentiment analysis for historical-philosophical research lie primarily in the tools’ dictionary dependency, from which follow difficulties in obtaining in-depth historical understanding and the risk of arbitrary biases in the interpretation of both the retrieved sentiment and its object. However, these reasons for sidelining the technique apply to the tools and workflows commonly deployed to perform sentiment analysis; the task itself is not a radically flawed endeavour per se, as long as researchers set up a well-grounded research framework.
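To see why dictionary dependency matters, consider a minimal sketch of dictionary-based scoring in Python. The lexicon, tokenizer and example sentence are all illustrative assumptions, not a tool used in the workshop; note how a historically shifted word sense, or the ignored negation in the example, would silently distort the score.

```python
# Minimal sketch of dictionary-based sentiment scoring (illustrative only).
# A real study of 18th-century texts would need a historically appropriate
# lexicon; the one below is a hypothetical modern stand-in.
import re

SENTIMENT_LEXICON = {
    "virtue": 1.0, "pleasant": 0.8, "agreeable": 0.6,
    "vice": -1.0, "folly": -0.7, "melancholy": -0.5,
}

def sentiment_score(text: str) -> float:
    """Average lexicon polarity over the tokens found in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = [SENTIMENT_LEXICON[t] for t in tokens if t in SENTIMENT_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

# Scores -0.2, although the sentence is positive: the negation "free of"
# is invisible to a pure dictionary lookup.
print(sentiment_score("An agreeable evening, free of folly and melancholy."))
```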

The workshop programme includes a thorough examination of existing tools, approaches and workflows, as well as of preliminary steps such as textual preprocessing, and it is unusually devoted to the analysis of 18th-century texts.

The workshop falls within the context of the project “DiSpecs – Distant Spectators. Distant Reading for Periodicals of the Enlightenment”, funded by CLARIAH-AT and carried out in cooperation with the Institute for Interactive Systems and Data Science, the Know-Center GmbH Graz, the Centre for Information Modelling – Austrian Centre for Digital Humanities (ZIM-ACDH) and the Institute for Romance Studies. The project aims to investigate the digitized, TEI-encoded and semantically enriched texts of The Spectators (http://gams.uni-graz.at/mws) with quantitative methods of data analysis referred to as distant reading and macroanalysis (topic modeling, stylometry, meme diffusion, sentiment analysis and community detection).
The 18th-century journalistic genre of “spectators” (or moralistic sheets) had a large audience of urban readers and played an essential role in the genesis of public opinion. This project endeavours to create a comprehensive database for all the moralistic press in French, Italian, Spanish, English, German, and Portuguese. In this context, the quantitative analysis of the Spectator discourses aims to enhance and improve the study of micro-narrations regarding the repetition of motifs throughout different journals.

Official information is reported below.

Sentiment Analysis in Literary Studies, 2021

Online, February 17-19, 2021

The workshop will introduce the concepts of sentiment analysis and give an overview of related methods and tools, with a special focus on their application to historical literary text corpora. The participants will be presented with the following content:

  • Introduction to sentiment analysis: methods, projects, tools, and first steps
  • Visualization
  • Hypothesis testing
  • Challenges of sentiment analysis in historical literary texts
  • Dictionary-based sentiment analysis
  • Preprocessing steps
  • Using a tool chain
  • Project presentations of the participants

Each workshop day will feature a keynote lecture from experts in the field. All keynotes will be free of charge and open to the public.

Participation

Participation in the hands-on workshop is free of charge and open to 20 students and scholars of all academic stages. No previous specific skills are required (although general computer literacy is expected).

To apply, we ask you to use the application form and provide a brief motivational note explaining why you would like to attend the workshop (max. 250 words).

APPLICATION FORM

We reserve the right to choose the individual participants based on their research/study interests and motivation.

Applications are welcome until January 15, 23:59 CET.

Acceptance will be communicated by January 25.

 

Questions?

Please contact

dispecs(at)uni-graz.at

https://informationsmodellierung.uni-graz.at

Organizing committee

Bernhard Geiger (Know-Center Graz)
Christina Glatz (University of Graz)
Elisabeth Hobisch (University of Graz)
Philipp Koncar (Graz University of Technology)
Sanja Sarić (University of Graz)
Martina Scholger (University of Graz)


Posted in Data-Driven Research, Digital Humanities, Distant Reading, Uncategorized

Happy new year!

The DR2 team wishes everybody a happy new year!

Posted in Uncategorized

PostDoc Fellowship Opportunity in Ontology for Industry – Laboratory for Applied Ontology (LOA), ISTC-CNR, Trento (Italy)

We are pleased to forward information about a PostDoc Fellowship in Ontology in Trento.

Research will focus on the study of ontology for industry, with the possibility to engage in both foundational ontology and the development of a domain ontology for a real industrial case. The research topics will be set according to your interests. The selection will be based on your academic qualifications and research record.

Application deadline: Jan. 13, 2021 (see below)

Job Description

Your research will focus on the use of applied ontology as a driver for data and information interoperability, following the FAIR data principles, in the industrial domain and in particular in areas like aerospace, manufacturing and material engineering. You will work on three types of activities (emphasis depends on your own interests): (1) the study and formalization of the relationships across existing top-level and middle-level ontologies, including the identification of commonalities and formal alignments among them; (2) collaboration in the development and implementation of an ecosystem of ontologies and knowledge bases for automatic or semi-automatic data exchange (preserving semantic interpretation); (3) collaboration on the topics of the European project “OntoCommons”.

Working at LOA

The position is fully funded for one year and may be renewed until the end of the project in Oct. 2023. You will receive an annual salary of EUR 26.000,00 and have access to all research facilities of the Laboratory for Applied Ontology. You will work in Trento, a touristic city in the Alps of Northern Italy, one hour from Verona and half an hour from South Tyrol by regular train. The Laboratory has a large network of collaborations across Europe and around the world (including the USA, Canada, Brazil, Japan and Korea).

Research at LOA

The Laboratory for Applied Ontology developed the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE), the first logically formalized foundational ontology; collaborated in the development of several other ontologies (such as BFO, YAMATO and UFO); and developed the OntoClean methodology for taxonomies. It also co-founded, and hosted for almost 10 years, the International Association for Ontology and its Applications (www.iaoao.org).

Requirements

  • A PhD in Computer Science, Information Science, Engineering, Mathematics, Physics, Philosophy or Communication Science
  • Two or more years of research experience
  • Research experience on the use of ontology
  • Knowledge of logical languages
  • Written and spoken English skills
  • Survival knowledge of Italian (all research activities are in English)

Important Dates

Deadline for application: 13 January 2021.

Interview: 20 January 2021.

Starting Date: 1 February 2021 (flexible).

Call and Contact

You can download the call in English at this link:

https://www.istc.cnr.it/sites/default/files/vacancies/bandi/notice_of_selection_n._istc-adr-290-2020-tn.docx

For further information, please write to: emilio.sanfilippo@cnr.it

Posted in Uncategorized

Analytic and Continental Philosophy: playing around with quantitative methods

by Pietro Lana

 

What can quantitative methods tell us about the differences between the Analytic and the Continental philosophical traditions? Attempts to define, characterize and distinguish the two have led to such a variety of positions that even the most cautious proposals have been put into question (Glock, 2008). The difficulties in identifying sufficient or necessary differentiation criteria – be they doctrinal, methodological, stylistic or thematic – seem to call for a different approach. It was 1993 when Michael Dummett, in his Origins of Analytical Philosophy, compared the analytic and the continental traditions to the Rhine and the Danube: rivers that, after rising close to one another, flow into different seas. What if, instead of attempting to provide a cartography of the two rivers, we were to dive into their waters? What follows is a brief exploration of some possible applications of quantitative methods to the study of the differences between the two traditions. Like any exploration, it is not meant to provide conclusive results on the subject, but rather to shed light on further possible routes.

The assessed corpus consists of all the articles published between 1980 and 2018 by four Anglophone philosophy journals, two belonging to the analytic tradition (“Philosophical Studies”, “Mind”) and two belonging to the continental tradition (“Continental Philosophy Review”, “Research in Phenomenology”). Because of the comparative nature of the study, the journals were chosen both on the basis of their relevance – in terms of average number of weighted citations – and on the basis of their representativeness: all four journals explicitly present themselves as being part of one of the two traditions. In conducting the textual analysis, the “analytic” and “continental” corpora have also been divided into further subcorpora by decade of publication, in order to allow the results to show possible changes over time.

The text mining software employed to assess the corpus is Lancsbox, developed at Lancaster University by Vaclav Brezina, Matthew Timperley and Anthony McEnery. It has been chosen because it allows for a variety of analyses of the language data present in a given corpus, such as type/token ratio, frequency, dispersion, keyword generation and collocation. The collocation graphs below have been generated using Gephi, a network exploration software developed by Mathieu Bastian and Eduardo Ramos Ibañez.

 

Type/token ratio

In a given text, the type/token ratio (TTR) is the total number of unique words (types) divided by the total number of words (tokens). The value of the type/token ratio is, therefore, directly proportional to the lexical richness of the text considered. The exploration begins by comparing the type/token ratios of the corpora, in order to generate a first, preliminary result concerning the stylistic differences between the two traditions. While reducing something as complex as the notion of style to a matter of variety in vocabulary may seem a dubious choice, it is nevertheless a first and necessary step in the direction of a deeper analysis. In conducting this first analysis, the number of texts contained in the significantly larger analytic subcorpora has been manually reduced to that of the continental subcorpora, in order to avoid the risk that a considerable difference in the number of tokens could affect the validity of a comparison of their ratios. The table and the graph below show the type/token ratios of the analytic (AA) and continental (CC) subcorpora.

 

type/token ratios of the analytic (AA) and continental (CC) subcorpora
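As an aside, the computation itself is straightforward. Below is a minimal sketch in Python, assuming plain-text subcorpus files; the file names and the token-sample size are hypothetical, and the truncation to a fixed number of tokens mirrors the manual size reduction described above.

```python
# Minimal TTR sketch (illustrative; not the author's actual pipeline).
import re

def type_token_ratio(text: str, n_tokens: int) -> float:
    """TTR over the first n_tokens tokens, so corpora of different
    sizes are compared on samples of equal length."""
    tokens = re.findall(r"[a-z]+", text.lower())[:n_tokens]
    return len(set(tokens)) / len(tokens)

# Hypothetical file names for one decade's subcorpora.
aa = open("AA_1980s.txt", encoding="utf-8").read()
cc = open("CC_1980s.txt", encoding="utf-8").read()

n = 200_000  # illustrative equalized sample size
print("AA:", type_token_ratio(aa, n), "CC:", type_token_ratio(cc, n))
```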

The graph shows that there is indeed a marked difference between the corpora of the two traditions: continental articles exhibit a substantially higher lexical richness than the analytic ones, throughout all of the decades taken into consideration. One possible interpretation of these results can be found in the positions expressed by, among others, D’Agostini and Marconi concerning the different stylistic approaches of the two traditions. In other words, the lower lexical richness observed in the analytic articles could be a consequence of a more pronounced aspiration to formal rigor and the use of explicit arguments, as opposed to the continental tendency towards a more varied exposition, closer to that of other fields in the humanities (D’Agostini, 1997; Marconi, 2011). In order to delve deeper into this possibility, a second textual analysis has been conducted by searching the corpora for terms that could give further hints in this direction.

 

Relative frequencies

Drawing from widespread characterizations of late analytic philosophy [1], a list of terms was built that could be traced back to the use of a rigorous and explicit argumentative style, so as to determine whether a difference in their average relative frequency could be observed between the different corpora. Below is the list of terms, followed by the results of the textual analysis.

The list: Argue Argues Argued Argument Arguments Objection Objections Defend Defends Defense Reject Rejects Justify Justifies Justified Reply Replies Assume Assumed Assumes Assumption Assumptions Example Examples Define Defines Definition Conclusion Conclusions Axiom Axioms Law Laws Norm Norms Principle Principles Condition Conditions Requirement Requirements Required Criterion Criteria Theory Theories Hypothesis Hypotheses Consequence Consequences Necessary Necessarily Logical Rational Reason Reasons.

average relative frequency of the listed terms in the analytic and continental corpora
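The computation behind such a figure can be sketched as follows. This is an illustrative reconstruction, not Lancsbox’s actual implementation; the file names are hypothetical and only an excerpt of the term list is shown.

```python
# Sketch of the relative-frequency analysis (illustrative only).
import re
from collections import Counter

# Excerpt of the full list above; the complete list would be used in practice.
ARGUMENT_TERMS = {
    "argue", "argues", "argued", "argument", "arguments",
    "objection", "objections", "assume", "assumption",
    "necessary", "logical", "reason", "reasons",
}

def relative_frequency(text: str, terms: set) -> float:
    """Occurrences of the listed terms per token of the corpus."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return sum(counts[t] for t in terms) / len(tokens)

for path in ("AA_1980s.txt", "CC_1980s.txt"):  # hypothetical file names
    text = open(path, encoding="utf-8").read()
    print(path, relative_frequency(text, ARGUMENT_TERMS))
```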

As it turns out, the terms in the list above are far more frequent in the analytic corpora than in the continental ones. The results of the textual analysis are not surprising at all: after all, the terms in the list were selected on the basis of extremely popular characterizations of late analytic philosophy, namely the use of explicit arguments and the aspiration to formal rigor. Nevertheless, the results are valuable from a methodological point of view, to the extent that they confirm the validity of the method employed by quantifying a difference that is often only expressed in discursive terms.

At the same time, it is important to bear in mind that, insofar as this analysis is concerned, the results only shed light on the use of explicit arguments, and do not say much about the presence of implicit ones: they can show a stylistic difference, but hardly one of content.

 

Lockwords and collocations

Among the various tools that Lancsbox offers to users, the lockword tool is specifically designed for the identification of similarities between corpora: it generates a list of the most frequent terms that display a similar relative frequency in two corpora, thus highlighting their common ground. Among the many rather vague and general terms identified by means of this tool, two in particular stood out because of their inherent complexity and relevance: “analysis” and “language”. In an attempt to examine their different uses in the context of the two traditions, an analysis of their collocates has been carried out. Unfortunately, the scale of the corpora made it impossible for the software to produce the graphs initially intended: the analysis only displays the main collocates (i.e. the most frequent terms within a range of 5 words from the lockword) but does not show the relations between the collocates themselves, making the results hard to read. Further topic modeling analyses might contribute to a better understanding of the perspectives that these graphs merely suggest.
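For readers who want to experiment, a window-based collocate count of this kind can be sketched in a few lines of Python. This is an illustrative reconstruction rather than Lancsbox’s implementation, and the corpus file name is hypothetical.

```python
# Sketch of collocate extraction: the most frequent terms within a window
# of 5 words on either side of a node word (here the lockword "analysis").
import re
from collections import Counter

def collocates(text: str, node: str, window: int = 5, top: int = 15):
    """Most frequent terms within `window` words of each occurrence of `node`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            span = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(t for t in span if t != node)
    return counts.most_common(top)

# Hypothetical file name for the continental corpus.
print(collocates(open("CC_all.txt", encoding="utf-8").read(), "analysis"))
```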

Nevertheless, for this very reason the following graphs offer an unexpectedly stimulating opportunity to engage in the interpretative exercise of contextualizing the generated linkages: aside from some obvious cases, many collocates require some reflection, and others are real head-scratchers. For example, looking at the graphs built around the term “analysis”, in the one based on the continental corpus it is possible to notice words related to Husserl’s phenomenological analysis and to Heidegger’s ontological analysis. On the other hand, interesting terms among the analytic collocates are “conceptual” and “concepts”, which seem at least in part to undermine Williamson’s opinion that late analytic philosophy is no longer interested in conceptual analysis (Williamson, 2008). Also, by examining the collocates of “language” in the graphs of the two traditions, it is possible to notice in both of them the presence of the term “ordinary”, which is not surprising in the analytic tradition but is somewhat unexpected in the continental context. These are only a few examples: the graphs are left below with no further comment, free for anyone willing to engage in the task to examine.

collocation graphs of “analysis” in the AA and CC corpora

collocation graphs of “language” in the AA and CC corpora

Bibliography

Bastian, Mathieu, and Eduardo Ramos Ibañez. Gephi (version 0.9.2), s.d.

Brezina, Vaclav, Matthew Timperley, and Anthony McEnery. Lancsbox (version 5.0.1). Lancaster University, s.d.

D’Agostini, Franca. Analitici e continentali. Guida alla filosofia degli ultimi trent’anni, 1997.

Dummett, Michael. Origins of Analytical Philosophy. A&C Black, 2014.

Glock, Hans-Johann. What Is Analytic Philosophy? Cambridge University Press, 2008.

Marconi, Diego. “Analytic Philosophy and Intrinsic Historicism.” Teorema: Revista Internacional de Filosofía 30, n. 1, Book Symposium: What is Analytic Philosophy? (2011): 23–32.


Williamson, Timothy. The Philosophy of Philosophy. John Wiley & Sons, 2008. 



[1] To find out more about our project on the history of late analytic philosophy, click here.

Posted in Data-Driven Research, Digital Humanities, Distant Reading, History of analytic philosophy, History of philosophy, Quantitative methods, Text mining

New DR2 member

We warmly welcome Pietro Lana, a master’s student at the philosophy department here in Turin, as a new member of the DR2 group. See him and the other DR2 affiliates in our People section.

Posted in Uncategorized

Considerations about corpus-dependency of topic modelling with Mallet

By Sara Garzone and Nicola Ruschena

In the context of text mining, topic modelling analyses co-occurrence patterns among textual data in order to isolate clusters from the set of expressions occurring in a corpus. Topic modelling aims at extracting the topics occurring in a corpus and categorizing documents on the basis of their semantic content. It often represents an appealing approach for data-driven analysis in short-run projects, for it is an unsupervised method: there is no need to train the algorithm on labelled data, whose production is quite a demanding task. Moreover, software programs executable from the command line or from a user interface have been developed to perform topic modelling, so as to provide friendlier environments for researchers who are not well acquainted with programming.

Mallet is a tool for topic modelling: a Java-based package for statistical natural language processing, initially developed by Andrew McCallum at the University of Massachusetts. It allows users to perform topic modelling on textual corpora without requiring advanced technical knowledge of statistics or programming.

Mallet’s topic modelling is based on the Latent Dirichlet Allocation (LDA) model, a Bayesian probabilistic generative model first applied to text classification tasks by David Blei et al. (2003), which has since become the standard for probabilistic text categorization under latent semantic hypotheses. Along with many other techniques in the field of natural language processing, topic modelling relies upon the so-called distributional hypothesis (Harris 1954), according to which words occurring in the same contexts tend to have similar meanings.

From co-occurrence analysis and clustering, clusters can thus be expected to reflect relations of semantic proximity, or topics. With advanced applications of probabilistic models, a categorization of documents can then be obtained on the basis of the probability of their belonging to the detected topics. The underlying assumption is that in each document a probabilistic distribution over all topics can be recognized. With LDA-based topic modelling, one can try to understand which of the topics detected in the corpus are likely to be present in each document, given the occurring terms.

Yet clusters do not correspond immediately to topics in the sense in which a generic reader would understand them. This kind of common-sense topic depends on human interests, especially when automated analyses are implemented for research purposes. It has to be noted that Mallet’s output produces a number k of topics that needs to be set by researchers in advance, and that Mallet would return k topics even if it were “fed” with phone directories or meaningless data. Some legwork is therefore required to experiment with different values of k and to evaluate which setting returns the most consistent clusters, according both to the researchers’ prior specialist knowledge and to Mallet’s Dirichlet parameter (see the output excerpts later in this post).
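Mallet itself is driven from the command line, but the same LDA workflow can be sketched in Python with the gensim library. This is an illustrative analogue, not Mallet’s own code: the toy documents stand in for a preprocessed corpus, and k is the parameter the researcher must fix in advance, exactly as discussed above.

```python
# LDA sketch with gensim, as an analogue of Mallet's topic modelling.
from gensim import corpora, models

# Toy documents standing in for tokenized, stopword-filtered articles;
# real preprocessing is assumed to have happened upstream.
docs = [
    ["power", "discipline", "prison", "punishment"],
    ["truth", "subject", "ethics", "care"],
    ["power", "knowledge", "discourse", "biopolitics"],
]

dictionary = corpora.Dictionary(docs)
bow_corpus = [dictionary.doc2bow(d) for d in docs]

k = 2  # number of topics, fixed in advance; the post settles on k = 8
       # for its third corpus only after several trials
lda = models.LdaModel(bow_corpus, num_topics=k, id2word=dictionary,
                      passes=10, random_state=0)

# One weighted word cluster per topic, analogous to Mallet's key files.
for topic_id, words in lda.print_topics(num_words=10):
    print(topic_id, words)
```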

Moreover, in order to obtain reliable results, researchers have to be aware in advance of some features of the investigated corpora, such as size, sparsity and degree of specialization, which may condition the effectiveness of topic identification.

In this brief post we report three topic modelling experiments conducted with Mallet, with the aim of extracting topics from three corpora that differ in content, size and composition. It is worth noticing that all three features determine differences in topic retrieval. For a thorough tutorial on Mallet, please see “Getting Started with Topic Modeling and MALLET” by Graham, Weingart and Milligan (2012), available on The Programming Historian website.

Case 1

The first corpus includes about 67,000 articles citing the philosopher Michel Foucault, in the field ‘Humanities’, from 1980 to 2019. This corpus covers a wide variety of issues, since it includes articles on philosophy, history, social sciences, gender studies, literature, etc. Besides, these articles are written not only by Foucault scholars but also by journalists, and addressed to a wider audience that is not necessarily competent in the philosophy of Foucault. The language is, therefore, extremely heterogeneous.

Case 2

The second corpus includes about 200 papers published in the journal Foucault Studies from 2004 to 2019. Unlike the first corpus, this one employs considerably more specialised terminology, because the articles have been collected from a single journal with a well-defined editorial line. Moreover, most of the authors are part of Foucauldian scholarship. For this reason, specific philosophical terms are more frequent in these articles, and the variety of topics is quite limited compared to case 1. This is the smallest corpus among the three considered.

Case 3

The third corpus includes about 700 articles citing the philosopher Baruch Spinoza, collected from French journals in the field of the social sciences, from 1980 to 2014. As in case 1, the disciplines are various (economics, politics, sociology, etc.) and the language is heterogeneous. The journals selected to build the corpus are the most renowned scientific journals of each discipline: therefore, even if the articles cover several topics, the terminology is nonetheless more technical than in corpus 1, which included generalist journals as well. As in case 2, the corpus is small.

As expected, the results of the execution of Mallet substantially depend on the size and composition of the analysed corpora. 

The output files are composed of clusters of 20 words, ordered by descending frequency in the corpus. In these files, the thematic heterogeneity of a corpus may result in extremely variegated clusters, which can contain associations of terms and themes that are not immediately intelligible for research purposes. This heterogeneity calls for a diversified approach to the results, since some clusters will require a significant interpretative effort. Indeed, on a vast and varied corpus like the first one, Mallet has an enormous amount of information to process while searching for frequent topics and patterns. The links found among words should thus be more consistent and reliable than in the case of a small corpus containing more limited information.

Examining the first corpus, the first result is that the clusters mainly include terms that are somewhat associated (or consistent) with one another, and in these cases we can recognise subjects with clear boundaries. After some trials, it was possible to establish a precise number of topics for Mallet to process, such that each topic corresponds to a specific subject. For example, in corpus 1:

 

0 0,45455  political politics war history cultural society power nation culture colonial identity religious modern global india public rights europe discourse post
1 0,45455 development policy power political economic global public security government local society management human planning environmental governance change politics urban economy
2 0,45455 religion religious god church jewish ancient medieval spiritual ritual modern christianity theology catholic divine tradition biblical christ classical theological history

 

Here it is possible to associate each cluster with a disciplinary field. 

The configuration of the outputs is similar for corpus 2, which contains a small number of articles but a much more homogeneous vocabulary:

 

0 0,45455  truth subject freedom ethics practice ethical practices subjectivity care existence essential ancient relationship parrhesia hermeneutics life critical aesthetics rabinow process
1 0,45455 law men police women system legal justice group laws black war lives rights punishment public panopticon prison order criminal groups
2 0,45455 power relations disciplinary sovereign resistance discourse techniques biopower knowledge biopolitical war practices modern body political production racism mechanisms discursive effects

 

The articles from the Foucault Studies journal, from which corpus 2 was built, have an internal consistency: the authors ponder the applications of Foucauldian philosophy to current political and social problems. So even if Mallet has fewer data to process in order to extract topics, the occurrence of specific terms and themes familiar to Foucault scholars makes the analysis more straightforward and effective. In this case, the interpretation of the results is a simpler task.

The same is not true in the case of the third corpus, where clusters sometimes contain terms from various disciplines:

 

0 0.625  jaspers moral kant language hegel knowledge hume frege rights justice husserl individuals descartes psychologie smith morality heidegger cognitive natural money
1 0.625 hegel politique conscience diderot éthique mouvement dieu hobbes pouvoir kant judaïsme travail peuple freud loi métaphysique puissance subjectivité rationalité guerre
2 0.625 politique sociale heidegger mouvements pouvoir utopie choix action nietzsche travail expérience droits guerre entreprise courant individu mouvement collective classe scientifiques

 

In each of these clusters there are philosophical terms relating to the most disparate themes, and citations of very diverse figures. On a corpus of reduced dimensions and wide thematic variety it is preferable to establish a reasonably low number of topics, but it is not possible to eliminate thematic divergence altogether. This variety requires more effort to interpret each cluster, but at the same time it produces original results. Indeed, an advantage of topic modelling on a corpus such as the third one is the prominence given to some unusual associations. This means that links between words of the same cluster which seem inconsistent or bizarre at first glance actually derive from articles presenting original analyses of the subject, which in the analysis of a large corpus would be overwhelmed by more frequent patterns. In tiny corpora, indeed, every term carries greater weight in the elaboration of topics.

We also have to admit that complex clusters require complementary investigations of the corpus, such as manually checking text strings in order to verify the reason for the association of apparently unrelated terms. However, the need for such an approach does not imply that the results are unreliable; it simply suggests that the interpretation may have to be guided by an expert in the field.

Further confirmation that small corpora do not necessarily yield unreliable results is the Dirichlet parameter obtained for the third corpus. In the output file tutorial.keys, Mallet indicates a number lower than 1, derived from Bayesian estimation; this parameter signals the probability of finding the various topics in the corpus, expressed as a decimal value between 0 and 1. By default, the Dirichlet parameter is symmetrical for all topics generated by Mallet. By doing several trials with different numbers of topics (the k parameter mentioned above), on this third corpus we managed to reach 0.625, which is acceptably high. We also have to acknowledge that we decided to work with 8 topics on the basis of the best configuration achievable for interpretation: with this number of topics we obtained both fair interpretability and a good score.

For larger corpora it is harder to obtain a good Dirichlet parameter, due to the enormous amount of data to process. However, this parameter is only a partial indicator of the reliability of the outputs: the evaluation of field experts remains the most relevant feedback, even for the choice of the number k of topics. Mallet, however, makes it possible to check which number of topics would generate higher consistency, through the coherence score: this score, in the range of 0 to 1, indicates the probability of finding the terms within the clusters together, as co-occurrences, in the corpus. The k with the best coherence score should be the one to work with, although in practice a compromise between a high score and good interpretability can be reached after several trials.
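A k-selection loop of this kind can be sketched with gensim’s CoherenceModel as a stand-in for the coherence function mentioned above; `docs`, `dictionary` and `bow_corpus` are reused from the earlier LDA sketch.

```python
# Sketch of choosing k by coherence score (illustrative; gensim's c_v
# measure stands in for the coherence function discussed above).
from gensim.models import CoherenceModel, LdaModel

best = None
for k in range(2, 10):
    lda = LdaModel(bow_corpus, num_topics=k, id2word=dictionary,
                   passes=10, random_state=0)
    cm = CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                        coherence="c_v")
    score = cm.get_coherence()  # roughly in [0, 1] for the c_v measure
    if best is None or score > best[1]:
        best = (k, score)
    print(k, round(score, 3))

# In practice, the best-scoring k is weighed against interpretability.
print("best k by coherence:", best[0])
```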

To make a long story short, Mallet is a piece of software executable on different types of corpora, but the most linear and consistent results will be obtained with larger amounts of textual documents. We have to recognise, however, that the complex composition of clusters is not a hard limit for research, since relevant and original results can hide behind this heterogeneity. Nevertheless, it sometimes requires more work to obtain a reliable interpretation, such as additional checks of unexpected co-occurrences within the topics.

 

References:

Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). “Latent Dirichlet Allocation”. Journal of Machine Learning Research 3, 993–1022. https://dl.acm.org/doi/10.5555/944919.944937

Graham, S., Weingart, S., Milligan, I. (2012). Getting Started with Topic Modeling and MALLET. The Programming Historian, 1. https://doi.org/10.46430/phen0017

Harris, Z. S. (1954) Distributional Structure. WORD, 10(2–3), 146–162. https://doi.org/10.1080/00437956.1954.11659520

McCallum, A. K. (2002)  MALLET: A Machine Learning for Language Toolkit. http://mallet.cs.umass.edu.

Posted in Data-Driven Research, Text mining

Three-day online conference, November 23–25, 2020

We are pleased to give notice of the next three-day conference organised by the Digital Humanities department at the University of Basel: “Digital Practices. Reading, writing and evaluation on the web”. The conference will take place online, November 23–25, 2020. 

Among the contents:

 

November 24, 14:30 – Discussion of the limits and opportunities behind the operationalization of concepts from literary theory, by Simone Rebora (Distant Reading Story World Absorption);

November 24, 16:50 – Detection of correlations between held beliefs and social conditions in the analysis of two survey-based datasets, by Charles Lassiter (Big Data and Naturalized Social Epistemology: A New Frontier for the Digital Humanities);

November 25, 11:00 – Building a database organizing information (names, dates, locations, parental relationships) extracted by means of HTR (Handwritten Text Recognition) provided by Transkribus, by Amanda C.S. Pinheiro (Extracting Data from Baptismal Records through Coding);

November 25, 15:30 – Topic modelling techniques applied to short textual units at sentence and paragraph level, and analysis of evaluative discourse, by Philipp Dreesen and Julia Krasselt (Evaluations of the Quran in Right-Wing Populist Media. Metapragmatic Sequence Analyses With Topic Modeling).

 

All talks will be given via Zoom. 

Here is the complete program: (PDF)

Registration is open to anyone interested > Free registration

Posted in Digital Humanities, Quantitative methods, Uncategorized

DR2@STOREP: quantitative history of ideas between philosophy and economics

We are proud to announce that several DR2 members will be present at the 17th Annual Conference of STOREP (Associazione Italiana per la Storia dell’Economia Politica – Italian Association for the History of Economic Thought). This is by now a well-established tradition, as DR2 already took part in the STOREP Conferences of 2018 and 2019.

In particular, DR2 members and the PRIN project Has economics finally become an immature science? Mapping economics at an epoch of fragmentation, by combining historical perspectives and new quantitative approaches organized a joint session on Quantitative methods in the history of ideas, held October 2, 2020, from 9:00 to 11:00 (Central European Standard Time).


Check the conference program for more details.

Posted in Data-Driven Research, Economics

New DR2 Paper is out in Synthese

We are pleased to announce that a new paper by DR2 co-founders Guido Bonino and Paolo Tripodi, together with DR2 affiliate member Paolo Maffezioli, has been published in Synthese: “Logic in analytic philosophy: a quantitative analysis”.

Abstract: Using quantitative methods, we investigate the role of logic in analytic philosophy from 1941 to 2010. In particular, a corpus of five journals publishing analytic philosophy is assessed and evaluated against three main criteria: the presence of logic, its role and level of technical sophistication. The analysis reveals that (1) logic is not present at all in nearly three-quarters of the corpus, (2) the instrumental role of logic prevails over the non-instrumental ones, and (3) the level of technical sophistication increases in time, although it remains relatively low. These results are used to challenge the view, widespread among analytic philosophers and labeled here “prevailing view”, that logic is a widely used and highly sophisticated method to analyze philosophical problems.

 

Posted in Data-Driven Research, Digital Humanities, Distant Reading, DR2, History of analytic philosophy, Quantitative methods

Joint Paper by three DR2 Members

We are pleased to announce and share the publication of this joint paper, written by three DR2 members: “Reclutamento accademico: come tutelare il pluralismo epistemico? Un modello di simulazione ad agenti” [“Academic recruitment: how to protect epistemic pluralism? An agent-based simulation model”], Carlo Debernardi, Eleonora Priori and Marco Viola, Sistemi Intelligenti, https://www.rivisteweb.it/doi/10.1422/97367.

(Abstract ENG): According to some authors (e.g. Gillies 2014, Viola 2017), when researchers are called to express a judgment on their peers, they might exhibit an epistemic bias that makes them favour those who belong to their own School of Thought (SoT). A dominant SoT is also likely to provide some advantage to its members’ bibliometric indexes, because more people potentially means more citations. In the long run, even a slight preference for one SoT over the others might lead to a monopoly, hampering the oft-invoked pluralism of research. In academic recruitment, given that those who are recruited to permanent positions will often become the recruiters of tomorrow, such biases might give rise to a self-reinforcing loop. However, the way in which these dynamics unfold is affected by the institutional infrastructure that regulates academic recruitment. To reason about how the import of epistemic bias changes across various infrastructures, we built a simple agent-based model using NetLogo 6.0.4, in which researchers belonging to rival SoTs compete to be promoted to professorships. The model makes it possible to represent the effect of epistemic and bibliometric biases, as well as to figure out how they are affected by the modification of several parameters.

Posted in Digital Humanities, DR2, Methodology, Pluralism, Quantitative methods