Monday 04 Dec 2017: Learning Autonomously from Data Streams

Prof. Plamen Angelov - University of Lancaster

Queens LT6.2 13:30-14:30

The staggering proliferation of heterogeneous, large-scale data sets and streams is recognised as an untapped resource that offers new opportunities for extracting aggregated information to inform decision-making in policy and commerce. However, existing methods and techniques for data mining involve many prior assumptions, handcrafting and a range of other bottlenecks: i) scalability – vast amounts of data require high-throughput automated methods (e.g. manual labelling of data samples can be prohibitive); ii) complex, heterogeneous data (including signals, images and text that may be uncertain and unstructured); iii) dynamically evolving, non-stationary data patterns, and the shortcomings of the “standard” assumptions about data distributions; iv) the need to handcraft features and parameters or to set thresholds. As a result, a large proportion of the available data remains untapped. The key challenge now is to manage, process and gain value and understanding from the vast quantity of heterogeneous data without handcrafting and prior assumptions, at an industrial scale.

This talk introduces a newly emerging theoretical framework, which we call Empirical Data Analytics (EDA), and describes its relation to probability, density, centrality, etc. The traditional disciplines of Machine Learning, Data Mining, Pattern Recognition, System Modelling and Identification are well developed. However, current tools often require a number of restrictive assumptions, or handcrafting/manual selection of features, distribution types, parameters, thresholds, etc. Existing algorithms are usually iterative and include internal cycles. In traditional statistical approaches, averages play a more important role than the individual specifics. Even rapidly emerging AI and computational intelligence approaches require ad hoc assumptions and a priori decisions (e.g. network depth/architecture, membership function type and parameters). Furthermore, most existing algorithms assume fixed model structures. This hampers their application to dynamically evolving, non-stationary data streams and their ability to deal with shifts and drifts. For example, in cybersecurity, adversaries are often adaptive and intelligent; they exploit the vulnerabilities of traditional systems that are based on fixed prior assumptions and designed for stationary data streams and for data generated by the same distribution. Attacks on spam filtering may, for example, include spam messages that are obscured by random misspellings of trigger words; similar problems exist for detecting malware and biometric spoofing.

Motivated by the principle of Occam’s Razor [3], we suggest a complete departure from traditional approaches to large-scale data analysis: we advocate recognising the central importance and complexity of real-world data. Our aim is to establish a new paradigm for autonomous data analytics that is based on minimal prior assumptions. The guiding principles of this paradigm are that i) we should avoid assumptions about the statistical properties of the data; ii) the burden of human effort should be shifted away from the large amount of raw data to the top of the knowledge pyramid; iii) all new methods for data analytics should be scalable.

Within EDA we define cumulative proximity, typicality, eccentricity, and local and global, unimodal and multimodal density. Typicality is particularly interesting because it resembles (but differs from) the probability density function (pdf), information potential and other similar representations related to the description of system state and structure, and it has very close links with laws of physics involving gravitation, intensity and inverse-square distance. The talk will describe this new concept, together with a number of applications to various problems.
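The abstract itself gives no formulas for these quantities. As a rough illustration only, the sketch below assumes one formulation found in the EDA literature: cumulative proximity as the sum of squared distances from a sample to all samples, eccentricity as twice a sample's share of the total cumulative proximity, density as the inverse of the eccentricity scaled by the number of samples, and (discrete) typicality as density normalised to sum to one. The choice of squared Euclidean distance and all function names are assumptions made for this example, not the speaker's implementation.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_euclidean(a: Point, b: Point) -> float:
    """Squared Euclidean distance (an assumed choice of distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cumulative_proximity(points: Sequence[Point]) -> List[float]:
    """Sum of squared distances from each point to every point."""
    return [sum(squared_euclidean(p, q) for q in points) for p in points]


def eccentricity(points: Sequence[Point]) -> List[float]:
    """Twice each point's share of the total cumulative proximity.

    Under this assumed definition the values always sum to 2.
    """
    prox = cumulative_proximity(points)
    total = sum(prox)
    if total == 0:
        raise ValueError("need at least two distinct points")
    return [2.0 * p / total for p in prox]


def density(points: Sequence[Point]) -> List[float]:
    """Inverse of the eccentricity scaled by the number of points."""
    n = len(points)
    return [1.0 / (n * e) for e in eccentricity(points)]


def typicality(points: Sequence[Point]) -> List[float]:
    """Density normalised so that the values sum to 1."""
    dens = density(points)
    total = sum(dens)
    return [d / total for d in dens]


if __name__ == "__main__":
    # Three nearby samples and one far-away sample (hypothetical data).
    data = [(0.0,), (1.0,), (2.0,), (10.0,)]
    for point, ecc, typ in zip(data, eccentricity(data), typicality(data)):
        print(point, round(ecc, 3), round(typ, 3))
```

In this toy data set the isolated sample at 10 receives the largest eccentricity and the smallest typicality, while no distribution type, threshold or other parameter has to be chosen beforehand. This version recomputes everything from the full data set, so it illustrates the definitions rather than a streaming implementation.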

References:

[1] P. Angelov, Autonomous Learning Systems: From Data Streams to Knowledge in Real-time, John Wiley and Sons, Dec. 2012, ISBN: 978-1-1199-5152-0.

[2] P. Angelov et al., Empirical Data Analysis: A New Tool for Data Analytics, IEEE SMC Conf., Budapest, 2016.

[3] H. G. Gauch, Scientific Method in Practice, Cambridge Univ. Press, 2003.

Biographical data of the speaker:

Prof. Angelov (MEng 1989, PhD 1993, DSc 2015) is a Fellow of the IEEE, of the IET and of the HEA. He is Vice President of the International Neural Network Society (INNS) for Conferences and a Governor of the Systems, Man and Cybernetics Society of the IEEE. He has 30 years of professional experience in high-level research and holds a Personal Chair in Intelligent Systems at Lancaster University, UK. He leads the Data Science group at the School of Computing and Communications (this year on sabbatical), which includes over 20 academics, researchers and PhD students. He has authored or co-authored over 250 peer-reviewed publications in leading journals and conference proceedings, 6 patents and two research monographs (Wiley, 2012 and Springer, 2002); his work has been cited over 6100 times, with an h-index of 37 and an i10-index of 110. His single most cited paper has over 800 citations. He has an active research portfolio in computational intelligence and machine learning, with internationally recognised results in online and evolving learning and in algorithms for knowledge extraction in the form of human-intelligible fuzzy rule-based systems. Prof. Angelov leads numerous projects (including several multimillion ones) funded by UK research councils, the EU, industry and the UK MoD. His research was recognised by ‘The Engineer Innovation and Technology 2008 Special Award’ and by the award ‘For Outstanding Services’ (2013) from the IEEE and INNS. He is also the founding co-Editor-in-Chief of Springer’s journal Evolving Systems and Associate Editor of several leading international scientific journals, including IEEE Transactions on Fuzzy Systems (the IEEE Transactions with the highest impact factor), IEEE Transactions on Systems, Man and Cybernetics, Applied Soft Computing, Fuzzy Sets and Systems, and Soft Computing. He has given over a dozen plenary and keynote talks at high-profile conferences.

Prof. Angelov was General co-Chair of a number of high-profile conferences, including IJCNN2013, Dallas, TX; IJCNN2015, Killarney, Ireland; the inaugural INNS Conference on Big Data, San Francisco; the 2nd INNS Conference on Big Data, Thessaloniki, Greece; the 3rd INNS Big Data and Deep Learning Conference, Bali, April 2018; and a series of annual IEEE Symposia on Evolving and Adaptive Intelligent Systems (started in 2006). He is the founding Chair of the Technical Committee on Evolving Intelligent Systems of the SMC Society of the IEEE and previously chaired the Standards Committee of the Computational Intelligence Society of the IEEE (2010-2012). He has also been a member of the International Program Committees of over 100 international conferences (primarily IEEE). More details can be found at www.lancs.ac.uk/staff/angelov
