Thursday 01 Nov 2018

Entropic selection of concepts unveils hidden topics in documents corpora

Dr. Alessio Cardillo - University of Bristol

Harrison 170 14:30-15:30


The organization and evolution of science have themselves recently become an object of quantitative scientific investigation, thanks to the wealth of information that can be extracted from scientific documents, such as citations between papers and co-authorship between researchers. However, only a few studies have focused on the concepts that characterize full documents, which can be extracted and analyzed to reveal the deeper organization of scientific knowledge. Unfortunately, some concepts are so common across documents that they hinder the emergence of the underlying topical structure of the document corpus, because they give rise to a large number of spurious and trivial relations among documents. To identify and remove common concepts, I will introduce a method to gauge their relevance according to an objective information-theoretic measure related to the statistics of their occurrence across the document corpus. After progressively removing concepts that, according to this metric, can be considered generic, I will show how the topic organization of the corpus under scrutiny displays a correspondingly more refined structure.


Alessio Cardillo is currently a postdoctoral research fellow at the University of Bristol (UK). After obtaining his MSc in Physics at the University of Catania (Italy), he moved to Zaragoza (Spain) for his PhD. Before moving to Bristol, he was a postdoctoral fellow first at the Ecole Polytechnique Federale de Lausanne (EPFL) in Switzerland, and then at the Catalan Institute for Human Paleoecology and Social Evolution (IPHES) in Tarragona. His research interests focus on the analysis of the structure of networked systems, such as urban mobility and street patterns, scientific collaborations, collections of documents, and multiplex networks. He is also interested in the emergence of collective behaviours, such as cooperation or synchronization, by means of coevolutionary dynamics.
