Proceedings of the 2023 Workshop on Recent Advances in Resilient and Trustworthy ML Systems in Autonomous Networks
Last Update: 26 November 2023
SESSION: Keynote Talks
- Sandra Scott-Hayward
Space-air-ground integrated networks (SAGIN) or space-air-ground-sea integrated networks (SAGSIN) extend terrestrial network infrastructure to non-terrestrial networks (NTNs), integrating satellite communications, unmanned aerial vehicles (UAVs), high-altitude platforms (HAPs), and space systems. These integrated networks can provide connectivity to remote locations, enhance service delivery, and support new ventures. While machine learning (ML) and artificial intelligence (AI) offer clear benefits in managing the analysis of massive volumes of space and terrestrial network data and in providing intelligent systems management for space missions and terrestrial network services, the extension of terrestrial networking to integrate space systems merits analysis with respect to AI and security. ML and AI solutions are being explored to advance cooperative and autonomous behaviour, self-organisation, and optimization within the SAGIN. Specifically, there are questions around the impact of interactive systems in space, e.g., AI-driven satellite/UAV cooperation, and the security considerations of this dynamic environment. There is also the question of the global nature of space, e.g., how nations cooperate and collaborate, and how to regulate, or account for the lack of regulation in, space. Furthermore, there is the question of sustainability and the impact of expanding terrestrial systems into space. In this talk, we will explore the security, intelligence, and programmability aspects of SAGIN, with particular consideration of the dynamic interactions throughout the SAGIN, the global nature of space, and what this means for SAGIN design and operation.
- Konrad Rieck
Academia is thriving like never before. Thousands of papers are submitted to conferences on hot research topics, such as artificial intelligence and computer vision. To handle this growth, systems for automatic paper-reviewer assignments are increasingly used during the reviewing process. These systems employ statistical topic models from machine learning to characterize the content of papers and automate their assignment to reviewers.
In this keynote talk, we explore the attack surface introduced by entrusting the matching of reviewers to machine-learning algorithms. In particular, we introduce an attack that modifies a given paper so that it selects its own reviewers. Technically, this attack builds on a novel optimization strategy that alternates between fooling the topic model and preserving the semantics of the document. In an empirical evaluation with a (simulated) conference, our attack successfully selects and removes reviewers in different scenarios, while the tampered papers remain indistinguishable from innocuous submissions to human readers. The talk is based on a paper by Eisenhofer & Quiring et al. published at the USENIX Security Symposium in 2023.
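The alternating optimization described above can be caricatured in a toy sketch. Everything here is a hypothetical simplification for illustration: the "topic model" is a keyword counter, and a word budget stands in for the semantics-preservation step; this is not the actual attack from the USENIX Security 2023 paper.

```python
from collections import Counter

# Toy "topic model": each topic is a set of indicative keywords (hypothetical).
TOPICS = {
    "vision": {"image", "pixel", "segmentation"},
    "security": {"attack", "malware", "exploit"},
}

def topic_scores(words):
    counts = Counter(words)
    return {t: sum(counts[w] for w in kw) for t, kw in TOPICS.items()}

def assigned_topic(words):
    scores = topic_scores(words)
    return max(scores, key=scores.get)

def attack(doc, target_topic, budget):
    """Alternate between fooling the topic model (inserting target-topic
    keywords) and a crude stand-in for semantic preservation (a word budget)."""
    words = doc.split()
    added = 0
    for kw in sorted(TOPICS[target_topic]):
        while assigned_topic(words) != target_topic and added < budget:
            words.append(kw)   # "fool the topic model" step
            added += 1         # budget check stands in for the semantics step
        if assigned_topic(words) == target_topic:
            break
    return " ".join(words), assigned_topic(words)

paper = "image pixel segmentation image"
tampered, label = attack(paper, "security", budget=10)
```

In the real attack the semantics step is itself an optimization over the document text; here it is reduced to a modification budget purely to show the alternation.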
SESSION: Session 1: Resilient ML Systems
How Resilient is Privacy-preserving Machine Learning Towards Data-Driven Policy? Jakarta COVID-19 Patient Study Case
- Bahrul Ilmi Nasution
- Yudhistira Nugraha
- Irfan Dwiki Bhaswara
- Muhamad Erza Aminanto
With the rise of personal data laws in various countries, data privacy has become an essential issue. One of the best-known techniques for addressing privacy concerns during analysis is differential privacy. However, many studies have shown that differential privacy degrades machine learning model performance. This is problematic for any organization, such as a government, that must draw policy from accurate insights into citizen statistics while maintaining citizen privacy. This study reviews differential privacy in machine learning algorithms and evaluates its performance on real COVID-19 patient data, using Jakarta, Indonesia as a case study. We also validate our study on two additional datasets: the public Adult dataset from the University of California, Irvine, and an Indonesian socioeconomic dataset. We find that differential privacy tends to reduce accuracy and may lead to model failure on imbalanced data, particularly in more complex models such as random forests. These findings suggest that differential privacy is practical for trustworthy government use of personal data, but comes with distinct challenges. We discuss limitations and recommendations for any organization that works with personal data and aims to leverage differential privacy in the future.
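The privacy/accuracy trade-off the study evaluates can be illustrated with the classic Laplace mechanism (a textbook sketch, not the authors' exact experimental setup): noise scaled by sensitivity/ε is added to a released statistic, so a smaller ε buys stronger privacy at the cost of accuracy.

```python
import math
import random

def laplace_noise(scale, rng):
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(true_count, sensitivity, epsilon, rng):
    """Release a count with epsilon-differential privacy:
    smaller epsilon => larger noise => stronger privacy, lower accuracy."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(42)
noisy_strict = dp_count(1000, 1, 0.1, rng)   # strong privacy, noisy
noisy_loose = dp_count(1000, 1, 10.0, rng)   # weak privacy, accurate
```

Training a differentially private model (e.g. via DP-SGD) perturbs gradients rather than released counts, but the same scaling of noise with 1/ε drives the accuracy loss the paper reports.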
- Soohyun Jung
- Keisuke Furumoto
- Takeshi Takahashi
- Yoshiaki Shiraishi
Software vulnerabilities have emerged as a critical concern, leading to substantial financial losses for individuals, corporations, and diverse stakeholders. The application of machine learning models to automating vulnerability assessment has gained traction, yet the pervasive challenge of concept drift often hinders their efficacy. Concept drift, manifested as shifts in statistical data properties over time or influenced by external factors, undermines model accuracy and necessitates continuous adaptation. This paper addresses the pressing issue of concept drift in the context of automated vulnerability assessment. Our contribution lies in proposing a novel methodology to mitigate concept drift-induced performance degradation. We advocate interval-based training, in which models are trained on periods free of concept drift and a repository of diverse models is established. Subsequently, model selection is guided by a dynamic evaluation process using a modest amount of new-concept data. Through comprehensive experimentation, we demonstrate our approach's ability to adapt swiftly to evolving data distributions, thereby safeguarding the accuracy of vulnerability assessment models.
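The interval-based training and dynamic selection idea can be caricatured as follows. This is a deliberately minimal sketch with hypothetical mean-value "models", not the authors' vulnerability assessment models: each drift-free interval yields one stored model, and a small window of new-concept data picks the best one.

```python
def train_interval_model(values):
    """'Train' on one drift-free interval: here, just store the mean."""
    mean = sum(values) / len(values)
    return lambda: mean

def select_model(repository, recent_values):
    """Pick the stored model with the lowest squared error
    on a modest window of new-concept data."""
    def mse(model):
        return sum((v - model()) ** 2 for v in recent_values) / len(recent_values)
    return min(repository, key=mse)

# Two historical intervals drawn from different data distributions.
repo = [train_interval_model([1, 1, 2]),
        train_interval_model([9, 10, 11])]

# New-concept data resembles the second interval, so its model is selected.
best = select_model(repo, recent_values=[10, 9, 10])
```

The design choice this illustrates: rather than retraining one model continuously, the repository lets the system switch cheaply when the incoming distribution matches a previously seen regime.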
SESSION: Session 2: Robust ML Systems
- Romain Ilbert
- Thai V. Hoang
- Zonghua Zhang
- Themis Palpanas
Balancing the trade-off between accuracy and robustness is a long-standing challenge in time series forecasting. While most existing robust algorithms accept somewhat suboptimal performance on clean data, sustaining even that performance level in the presence of data perturbations remains extremely hard. In this paper, we study a wide array of perturbation scenarios and propose novel defense mechanisms against adversarial attacks using real-world telecom data. We compare our strategy against two existing adversarial training algorithms under a range of maximal allowed perturbations, defined using the ℓ∞-norm with magnitudes in [0.1, 0.4]. Our findings reveal that our hybrid strategy, which is composed of a classifier to detect adversarial examples, a denoiser to eliminate noise from the perturbed data samples, and a standard forecaster, achieves the best performance on both clean and perturbed data. Our optimal model can retain up to 92.02% of the performance of the original forecasting model in terms of Mean Squared Error (MSE) on clean data, while being more robust than standard adversarially trained models on perturbed data. Its MSE is 2.71× and 2.51× lower than those of competing methods on normal and perturbed data, respectively. In addition, the components of our model can be trained in parallel, resulting in better computational efficiency. Our results indicate that we can optimally balance the trade-off between the performance and robustness of forecasting models by improving the classifier and denoiser, even in the presence of sophisticated and destructive poisoning attacks.
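The routing logic of the hybrid strategy (classifier, denoiser, forecaster) can be sketched with toy components. The spike detector, moving-average denoiser, and last-value forecaster below are illustrative placeholders, not the paper's trained models:

```python
def is_adversarial(series, threshold=3.0):
    """Crude detector: flag the series if the last point jumps
    more than `threshold` from the previous one."""
    return abs(series[-1] - series[-2]) > threshold

def denoise(series, window=3):
    """Crude denoiser: replace each point by a trailing moving average."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        chunk = series[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def forecast(series):
    """Placeholder forecaster: predict the last observed value."""
    return series[-1]

def hybrid_forecast(series):
    # Clean inputs go straight to the forecaster; suspected
    # adversarial inputs are denoised first.
    if is_adversarial(series):
        series = denoise(series)
    return forecast(series)

clean = [10.0, 10.5, 10.2, 10.4]
perturbed = [10.0, 10.5, 10.2, 25.0]   # injected spike
```

Because the three components are independent modules, each could be trained (or replaced) separately, which mirrors the parallel-training benefit noted in the abstract.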
- Muhammad Akbar Husnoo
- Adnan Anwar
- Robin Doss
- Nasser Hosseinzadeh
The decentralization of modern power grid systems in this new era of advanced communication and information technology, aimed at enhancing efficacy and efficiency, has created a new class of privacy challenges that requires innovative approaches. In recent times, Federated Learning (FL) has surfaced as a promising privacy-preserving solution for misbehaviour detection in smart grids, enabling the collaborative learning of a model without requiring the sharing of raw, sensitive power-related data. Despite its virtues, recent literature has highlighted that FL-based approaches are inherently prone to Byzantine threats, which can compromise the integrity of the learning process and undermine the performance and reliability of misbehaviour detection models. To tackle these technical impediments, this manuscript puts forward a novel privacy-preserving and computationally efficient federated misbehaviour detection technique that discriminates between natural power system disturbances and cyberattack events. Specifically, our solution leverages a privacy-preserving gradient quantization scheme known as Differentially-Private Sign Stochastic Gradient Descent (DP-SIGNSGD) to improve the robustness of anomaly detection approaches against Byzantine attacks and to improve computational efficiency. Empirical evaluations of our proposed framework using publicly available industrial control system datasets reveal superior attack detection rates while remaining resilient to Byzantine threats and more computation-efficient than conventional FL strategies.
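The core of a sign-based federated update can be sketched as follows. This is a simplified SignSGD round with majority voting; the differential-privacy noise that DP-SIGNSGD adds to the signs is omitted here for clarity, so this is not the paper's full scheme:

```python
def sign(x):
    return 1 if x > 0 else (-1 if x < 0 else 0)

def client_update(gradient):
    """Each client transmits only the sign of each gradient
    coordinate: a 1-bit, communication-efficient message."""
    return [sign(g) for g in gradient]

def server_aggregate(sign_vectors):
    """The server takes a coordinate-wise majority vote, which bounds
    the influence of any single (possibly Byzantine) client."""
    n_coords = len(sign_vectors[0])
    return [sign(sum(v[i] for v in sign_vectors)) for i in range(n_coords)]

# Three honest clients and one Byzantine client flipping its signs.
grads = [[0.2, -0.5], [0.1, -0.4], [0.3, -0.6], [-9.0, 9.0]]
update = server_aggregate([client_update(g) for g in grads])
```

The majority vote is what confers the Byzantine resilience: the lone malicious client cannot flip a coordinate, no matter how large its reported gradient, because only its sign is counted.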
SESSION: Session 3: Explainable ML Systems
- Harry Chandra Tanuwidjaja
- Takeshi Takahashi
- Tsung-Nan Lin
- Boyi Lee
- Tao Ban
Intrusion Detection Systems (IDSs) play a major role in detecting suspicious activities and alerting users to potential malicious adversaries. Security operators investigate these alerts and attempt to mitigate the risks and damage. Many IDS-related studies have focused on improving detection accuracy and reducing false positives; however, operators also need to understand the rationale behind an IDS engine issuing an alert. In contrast to conventional rule-based engines, machine-learning-based engines use a black-box detection mechanism, i.e., one that is not designed to expose a rationale. Explainable AI (XAI) techniques, which explain how a model derives its decisions, have been studied to address this gap. In this paper, we introduce an explainable IDS (X-IDS) that incorporates well-established XAI techniques so that the system can explain its decisions. To this end, we use local interpretable model-agnostic explanations (LIME) and Shapley additive explanations (SHAP), and we evaluate their differing characteristics. We propose an explanation framework that consists of a variable importance plot, an individual value plot, and a partial dependence plot. We conclude by discussing open issues toward better explainable IDSs.
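A variable-importance score of the kind plotted in such a framework can be computed model-agnostically. The sketch below uses generic permutation importance with a hypothetical toy "IDS", not the paper's LIME/SHAP implementation: shuffling one feature column and measuring the accuracy drop reveals how much the detector relies on that feature.

```python
import random

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature, rng):
    """Drop in accuracy when one feature column is shuffled:
    a model-agnostic variable-importance score."""
    base = accuracy(model, X, y)
    column = [x[feature] for x in X]
    rng.shuffle(column)
    X_perm = [list(x) for x in X]
    for row, v in zip(X_perm, column):
        row[feature] = v
    return base - accuracy(model, X_perm, y)

# Toy "IDS": alerts whenever feature 0 (say, packet rate) exceeds 5;
# feature 1 is ignored by the model entirely.
model = lambda x: int(x[0] > 5)
X = [[1, 9], [8, 2], [2, 7], [9, 1]]
y = [0, 1, 0, 1]
rng = random.Random(0)
imp0 = permutation_importance(model, X, y, 0, rng)
imp1 = permutation_importance(model, X, y, 1, rng)
```

A feature the model never consults (feature 1 here) gets zero importance, which is exactly the sanity check an operator would want from a variable importance plot.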