The University of Texas at Dallas, USA
Title: Cyber Security Meets Big Knowledge: Towards a Secure HACE Theorem
Abstract: The collection, storage, access, manipulation, and mining of massive amounts of data have resulted in the design and development of Big Data Management and Analytics (BDMA) technologies over the past decade. Subsequently, the HACE theorem emerged as a way to characterize big data. As stated in [1], “big data starts with large volume, heterogeneous, autonomous sources with distributed decentralized control, and seeks to explore complex and evolving relationships among data.” Over the years, the HACE theorem has become as fundamental to big data characterization as Newton’s laws are to physics. Associated with the HACE theorem is the Big Data Processing Framework, consisting of three tiers focusing on data access, data sharing, and data mining/analytics. The big data processing framework, dealing with heterogeneous, autonomous, complex and evolving data, has resulted in Big Knowledge, which deals with “fragmented knowledge from heterogeneous, autonomous information sources for complex and evolving relationships, in addition to domain expertise” [1].
While several advances have been made on BDMA based on the HACE theorem and the big data processing framework, the massive amounts of data collected, shared and analyzed have resulted in serious security and privacy violations. For example, the three-tier structure of the big data processing framework could be maliciously attacked, resulting in compromised models and techniques. Furthermore, novel access control models are needed to secure the heterogeneous data sources and the complex relationships. Finally, the analysis of the massive amounts of data could result in serious privacy challenges. Therefore, what is needed is a HACE Theorem and an associated Big Data Processing Framework for securing the big data and ensuring that the privacy of individuals is maintained. This presentation will discuss our approach towards achieving this goal.
At the heart of security and privacy are the policies enforced on the systems and activities. Therefore, we first discuss a Policy-Aware Big Data Processing Framework based on the three tiers discussed in [1]. In particular, we specify policies for (i) data collection, storage, access and deletion, (ii) data sharing and dissemination, and (iii) data mining/analysis and knowledge creation. Second, even if we develop an appropriate policy-aware big data processing framework, malicious individuals will learn about the data and knowledge we are using, as well as our techniques, and adapt their behavior so that they are not caught. This in turn will cause us to adapt our algorithms to thwart the adversary. The result is a game played between us and the adversary, and our goal is to win that game. This leads to an Adversarial Big Data Processing Framework. The presentation will discuss the issues and challenges in developing such a framework that will ensure security while at the same time protecting the privacy of individuals. We will also examine ways to adapt the HACE theorem to include security and privacy.
The final part of the presentation will discuss issues on securing the Big Knowledge that deals with the fragmented knowledge from heterogeneous, autonomous information sources for complex and evolving relationships, in addition to domain expertise. Such knowledge could be presented in the form of knowledge graphs. Therefore, we will examine various security models for the knowledge graphs and explore how different pieces of knowledge graphs could be combined to violate the privacy of the individuals. Furthermore, the knowledge graphs themselves could be attacked. Therefore, the knowledge graphs and their reasoning techniques have to be adapted to handle the adversarial attacks. The presentation will also explore ways to incorporate the secure knowledge management process into the Secure Big Data Processing Framework.
[1] Xindong Wu et al., “Data Mining with Big Data,” IEEE Transactions on Knowledge and Data Engineering, Vol. 26, No. 1, 2014.
Dr. Bhavani Thuraisingham is the Founders Chair Professor of Computer Science and the Executive Director of the Cyber Security Research and Education Institute at the University of Texas at Dallas. She is also a visiting Senior Research Fellow at King’s College, University of London, and an elected Fellow of the ACM, IEEE, the AAAS, the NAI and the BCS. She is currently the co-director of both the Women in Data Science and Women in Cyber Security Centers and was a Cyber Security Policy Fellow at the New America Foundation in 2017-2018. Her research interests for the past 34 years have been in integrating cyber security and artificial intelligence/data science (when it used to be computer security and data management/mining).
She has received several awards including the IEEE CS 1997 Technical Achievement Award, the ACM SIGSAC 2010 Outstanding Contributions Award, the IEEE ComSoc Communications and Information Security 2019 Technical Recognition Award, the IEEE CS Services Computing 2017 Research Innovation Award, the ACM CODASPY 2017 Lasting Research Award, the IEEE ISI 2010 Research Leadership Award, the IEEE ICDM 2018 Outstanding Service Award, the SDPS Transformative Achievement Gold Medal for interdisciplinary research, the ACM SACMAT 10-Year Test of Time Awards for 2018 and 2019 (for papers published in 2008 and 2009), and the 2017 DFW Business Journal Women in Technology Award. She co-chaired the Women in Cyber Security Conference (WiCyS) in 2016, delivered the featured address at the 2018 Women in Data Science (WiDS) conference at Stanford University, and has chaired several conferences for ACM and IEEE, including IEEE ICDM and ACM CCS.
Her 39-year career includes industry (Honeywell), a federal research laboratory (MITRE), the US government (NSF) and US academia. Her work has resulted in 130+ journal articles, 300+ conference papers, 140+ keynote and featured addresses, six US patents, and fifteen books, as well as technology transfer of the research to commercial and operational systems. She received her PhD from the University of Wales, Swansea, UK, and the prestigious earned higher doctorate (D.Eng.) from the University of Bristol, UK.
Vrije Universiteit Amsterdam
Title: Analysing, understanding and repairing knowledge graphs at scale
Abstract: Now that knowledge graphs routinely reach the scale of hundreds of millions or even billions of edges, they will often violate the standard logical semantics imposed by RDF or OWL. Instead, we will have to reach for other mathematical and computational tools to understand these graphs, to detect errors and to repair them. Techniques from network analysis, natural language understanding and machine learning give us a new view of the meaning captured in these very large knowledge graphs, providing new tools for querying, maintaining, visualising and explaining knowledge graphs.
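As a flavour of what such non-logical tools can look like, here is a toy sketch (our own illustration, not the speaker's implementation) of structural error detection: for each relation we learn the dominant (subject-type, object-type) signature and flag edges that deviate from it. All entity and relation names below are invented for the example.

```python
# Toy knowledge-graph error detection via simple structural statistics,
# in the spirit of network-analysis-based approaches.
from collections import Counter

# A tiny knowledge graph as (subject, predicate, object) triples.
triples = [
    ("amsterdam", "capital_of", "netherlands"),
    ("paris", "capital_of", "france"),
    ("berlin", "capital_of", "germany"),
    ("einstein", "capital_of", "physics"),   # an erroneous edge
    ("einstein", "field", "physics"),
]

# Approximate entity types; a real system would derive these from
# rdf:type assertions or learned embeddings.
types = {"amsterdam": "city", "paris": "city", "berlin": "city",
         "netherlands": "country", "france": "country", "germany": "country",
         "einstein": "person", "physics": "discipline"}

def dominant_signature(pred):
    """Most frequent (subject-type, object-type) pair observed for a predicate."""
    sigs = Counter((types[s], types[o]) for s, p, o in triples if p == pred)
    return sigs.most_common(1)[0][0]

def flag_errors():
    """Return triples whose type signature deviates from the predicate's norm."""
    return [(s, p, o) for s, p, o in triples
            if (types[s], types[o]) != dominant_signature(p)]

print(flag_errors())  # the einstein/capital_of edge stands out
```

Such statistical checks scale to graphs where full OWL reasoning is infeasible, at the cost of producing candidate errors rather than logical proofs of inconsistency.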
Frank van Harmelen (1960) is a professor in Knowledge Representation & Reasoning in the Computer Science department (Faculty of Science) at the Vrije Universiteit Amsterdam. After studying mathematics and computer science in Amsterdam, he moved to the Department of AI in Edinburgh, where he was awarded a PhD in 1989 for his research on meta-level reasoning. While in Edinburgh, he co-developed a logic-based toolkit for expert systems, and worked with Prof. Alan Bundy on proof planning for inductive theorem proving. After his PhD research, he moved back to Amsterdam where he worked from 1990 to 1995 in the SWI Department under Prof. Wielinga, on the use of reflection in expert systems, and on the formal underpinnings of the CommonKADS methodology for Knowledge-Based Systems. In 1995 he joined the AI research group at the Vrije Universiteit Amsterdam, where he now leads the Knowledge Representation and Reasoning Group.
Since 2000, he has played a leading role in the development of the Semantic Web, which aims to make data on the web semantically interpretable by machines through formal representations. He was co-PI on the first European Semantic Web project (OnToKnowledge, 1999), which laid the foundations for the Web Ontology Language OWL. OWL has become a worldwide standard; it is in wide commercial use and has become the basis for an entire research community. He co-authored the Semantic Web Primer, the first academic textbook of the field and now in its third edition, which is in worldwide use (translated into 5 languages, with 10,000 copies sold of the English edition alone). He was one of the architects of Sesame, an RDF storage and retrieval engine, which is in wide academic and industrial use with over 200,000 downloads. This work received the 10-year impact award at the 11th International Semantic Web Conference in 2012, the most prestigious award in the field.
In recent years, he has pioneered the development of large-scale reasoning engines. He was scientific director of the 10-million-euro EU-funded Large Knowledge Collider, a platform for distributed computation over semantic graphs with billions of edges. The prize-winning work with his student Jacopo Urbani has improved the state of the art by two orders of magnitude.
He is scientific director of The Network Institute. In this interdisciplinary research institute some 150 researchers from the Faculties of Social Science, Humanities and Computer Science collaborate on research topics in computational Social Science and e-Humanities.
He is a fellow of the European AI Society ECCAI (membership limited to 3% of all European AI researchers). In 2014 he was admitted as a member of the Academia Europaea (limited to the top 5% of researchers in each field), and in 2015 he was admitted as a member of the Royal Netherlands Society of Sciences and Humanities (450 members across all sciences). He is a guest professor at the University of Science and Technology in Wuhan, China.
Xi'an Jiaotong University
Title: On Presuppositions of Machine Learning: A Meta Theory
Abstract: Machine Learning (ML) has been developed and applied on the basis of a series of presuppositions, which have contributed both to the great success of AI and to the bottleneck in the further development of ML. These presuppositions include (i) the independence assumption of the loss function from the dataset (Hypothesis I); (ii) the large-capacity assumption on the hypothesis space containing the solution (Hypothesis II); (iii) the completeness assumption of high-quality training data (Hypothesis III); and (iv) the Euclidean assumption on the analysis framework and methodology (Hypothesis IV).
We report, in this presentation, the effort and advances made by my group on how to break through these presuppositions of ML and drive ML development. For Hypothesis I, we introduce the noise modeling principle to adaptively design the loss function of ML according to the distribution of the data samples, which then provides a general way to robustify any ML implementation. For Hypothesis II, we propose the model-driven deep learning approach to define the smallest hypothesis space of deep neural networks (DNNs), which yields not only highly efficient deep learning but also a novel way of DNN design, interpretation and connection with traditional optimization-based approaches. For Hypothesis III, we develop the axiomatic curriculum learning framework to learn the patterns from an incomplete dataset step by step and from easy to difficult, which then provides feasible ways to tackle very complex incomplete datasets. Finally, for Hypothesis IV, we introduce Banach space geometry in general, and the Xu-Roach theorem in particular, as a possibly useful tool to conduct non-Euclidean analysis of ML problems. In each case, we present the ideas, principles, application examples and literature.
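The noise modeling principle behind Hypothesis I can be illustrated with a deliberately simple sketch (our own toy example, not the speaker's method, and the threshold value is an arbitrary assumption): estimate the tail behavior of the residuals and pick the loss that is maximum-likelihood for the matching noise model, instead of fixing the loss in advance.

```python
# Toy illustration of choosing a loss function from the estimated
# noise distribution: heavy-tailed (Laplacian-like) residuals suggest
# a robust L1 loss; light-tailed (Gaussian-like) residuals suggest L2.
import statistics

def excess_kurtosis(residuals):
    """Sample excess kurtosis: about 0 for Gaussian noise, about 3 for Laplacian."""
    mu = statistics.fmean(residuals)
    var = statistics.fmean((r - mu) ** 2 for r in residuals)
    m4 = statistics.fmean((r - mu) ** 4 for r in residuals)
    return m4 / (var ** 2) - 3.0

def pick_loss(residuals, threshold=1.5):
    """Return an L1 loss for heavy-tailed residuals, else an L2 loss.
    The threshold is an illustrative choice, not a principled constant."""
    if excess_kurtosis(residuals) > threshold:
        return lambda r: abs(r)   # L1: maximum likelihood under Laplacian noise
    return lambda r: r * r        # L2: maximum likelihood under Gaussian noise

# Mostly small residuals with a few large outliers -> heavy tails -> L1.
heavy = [0.1, -0.1] * 10 + [5.0, -5.0]
# Evenly spread residuals -> light tails -> L2.
light = [-1.0, -0.5, 0.0, 0.5, 1.0]
```

In the actual line of work the noise distribution is modeled far more carefully, but the sketch shows the key inversion: the data's noise determines the loss, rather than the loss being assumed independently of the dataset.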
Zongben Xu, a member of the Chinese Academy of Sciences, is a mathematician and an expert in signal and information processing. He received his MS degree in mathematics in 1981 from Northwest University, China and PhD degree in applied mathematics in 1987 from Xi'an Jiaotong University, China. He now serves as a professor of mathematics and computer science, and the director of the Institute for Information and System Sciences. In 2007, he was appointed as a Chief Scientist of National Basic Research Program of China (973 Project).
His current research interests include intelligent information processing, machine learning and theories in numerical modeling. He proposed the L(1/2) regularization theory, which serves as a foundation for sparse microwave imaging. He also discovered and proved the Xu-Roach theorem in machine learning, which solves several difficult problems in neural networks and simulated evolutionary computation, and provides a general deduction criterion for machine learning and nonlinear analysis under a non-Euclidean framework. Lastly, he initiated new modeling theories and methodologies based on visual cognition, and formulated a series of new algorithms for clustering analysis, discriminant analysis, and latent variable analysis, which have been widely applied in science and engineering. He received the National Natural Science Award of China in 2007 and the CSIAM Su Buchin Applied Mathematics Prize in 2008. He delivered a sectional talk at the International Congress of Mathematicians (ICM 2010) upon the invitation of the congress committee.
Zongben Xu was the vice-president of Xi’an Jiaotong University. He currently holds several important positions in government and professional societies, including deputy director of the Department of Information and Technology Science of the Chinese Academy of Sciences, director of the Xi’an International Academy for Mathematics and Mathematical Technology, director of the National Engineering Laboratory for Big Data Analytics, member of the National Big Data Expert Advisory Committee, and member of the National Open Innovation Platform for New Generation Artificial Intelligence Strategic Advisory Committee.