Computer Science

New submissions

Submissions received from Thu 16 May 24 to Fri 17 May 24, announced Mon, 20 May 24

New submissions
Cross-lists
Replacements

[ total of 475 entries: 1-475 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Mon, 20 May 24

[1] arXiv:2405.10346 [pdf, other]: Title: AMCEN: An Attention Masking-based Contrastive Event Network for Two-stage Temporal Knowledge Graph Reasoning

Authors: Jing Yang, Xiao Wang, Yutong Wang, Jiawei Wang, Fei-Yue Wang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Temporal knowledge graphs (TKGs) can effectively model the ever-evolving nature of real-world knowledge, and their completeness and enhancement can be achieved by reasoning new events from existing ones. However, reasoning accuracy is adversely impacted due to an imbalance between new and recurring events in the datasets. To achieve more accurate TKG reasoning, we propose an attention masking-based contrastive event network (AMCEN) with local-global temporal patterns for the two-stage prediction of future events. In the network, historical and non-historical attention mask vectors are designed to control the attention bias towards historical and non-historical entities, acting as the key to alleviating the imbalance. A local-global message-passing module is proposed to comprehensively consider and capture multi-hop structural dependencies and local-global temporal evolution for the in-depth exploration of latent impact factors of different event types. A contrastive event classifier is used to classify events more accurately by incorporating local-global temporal patterns into contrastive learning. Therefore, AMCEN refines the prediction scope with the results of the contrastive event classification, followed by utilizing attention masking-based decoders to finalize the specific outcomes. The results of our experiments on four benchmark datasets highlight the superiority of AMCEN. Especially, the considerable improvements in Hits@1 prove that AMCEN can make more precise predictions about future occurrences.
[2] arXiv:2405.10347 [pdf, other]: Title: Networking Systems for Video Anomaly Detection: A Tutorial and Survey

Authors: Jing Liu, Yang Liu, Jieyu Lin, Jielin Li, Peng Sun, Bo Hu, Liang Song, Azzedine Boukerche, Victor C.M. Leung

Comments: Submitted to ACM Computing Surveys, under review,for more information and supplementary material, please see this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The increasing prevalence of surveillance cameras in smart cities, coupled with the surge of online video applications, has heightened concerns regarding public security and privacy protection, which propelled automated Video Anomaly Detection (VAD) into a fundamental research task within the Artificial Intelligence (AI) community. With the advancements in deep learning and edge computing, VAD has made significant progress and advances synergized with emerging applications in smart cities and video internet, which has moved beyond the conventional research scope of algorithm engineering to deployable Networking Systems for VAD (NSVAD), a practical hotspot for intersection exploration in the AI, IoVT, and computing fields. In this article, we delineate the foundational assumptions, learning frameworks, and applicable scenarios of various deep learning-driven VAD routes, offering an exhaustive tutorial for novices in NSVAD. This article elucidates core concepts by reviewing recent advances and typical solutions, and aggregating available research resources (e.g., literatures, code, tools, and workshops) accessible at https://github.com/fdjingliu/NSVAD. Additionally, we showcase our latest NSVAD research in industrial IoT and smart cities, along with an end-cloud collaborative architecture for deployable NSVAD to further elucidate its potential scope of research and application. Lastly, this article projects future development trends and discusses how the integration of AI and computing technologies can address existing research challenges and promote open opportunities, serving as an insightful guide for prospective researchers and engineers.
[3] arXiv:2405.10350 [pdf, other]: Title: Monitizer: Automating Design and Evaluation of Neural Network Monitors

Authors: Muqsit Azeem, Marta Grobelna, Sudeep Kanav, Jan Kretinsky, Stefanie Mohr, Sabine Rieder

Comments: accepted at CAV 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

The behavior of neural networks (NNs) on previously unseen types of data (out-of-distribution or OOD) is typically unpredictable. This can be dangerous if the network's output is used for decision-making in a safety-critical system. Hence, detecting that an input is OOD is crucial for the safe application of the NN. Verification approaches do not scale to practical NNs, making runtime monitoring more appealing for practical use. While various monitors have been suggested recently, their optimization for a given problem, as well as comparison with each other and reproduction of results, remain challenging. We present a tool for users and developers of NN monitors. It allows for (i) application of various types of monitors from the literature to a given input NN, (ii) optimization of the monitor's hyperparameters, and (iii) experimental evaluation and comparison to other approaches. Besides, it facilitates the development of new monitoring approaches. We demonstrate the tool's usability on several use cases of different types of users as well as on a case study comparing different approaches from recent literature.
[4] arXiv:2405.10357 [pdf, other]: Title: RGB Guided ToF Imaging System: A Survey of Deep Learning-based Methods

Authors: Xin Qiao, Matteo Poggi, Pengchao Deng, Hao Wei, Chenyang Ge, Stefano Mattoccia

Comments: To appear on International Journal of Computer Vision (IJCV)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Integrating an RGB camera into a ToF imaging system has become a significant technique for perceiving the real world. The RGB guided ToF imaging system is crucial to several applications, including face anti-spoofing, saliency detection, and trajectory prediction. Depending on the distance of the working range, the implementation schemes of the RGB guided ToF imaging systems are different. Specifically, ToF sensors with a uniform field of illumination, which can output dense depth but have low resolution, are typically used for close-range measurements. In contrast, LiDARs, which emit laser pulses and can only capture sparse depth, are usually employed for long-range detection. In the two cases, depth quality improvement for RGB guided ToF imaging corresponds to two sub-tasks: guided depth super-resolution and guided depth completion. In light of the recent significant boost to the field provided by deep learning, this paper comprehensively reviews the works related to RGB guided ToF imaging, including network structures, learning strategies, evaluation metrics, benchmark datasets, and objective functions. Besides, we present quantitative comparisons of state-of-the-art methods on widely used benchmark datasets. Finally, we discuss future trends and the challenges in real applications for further research.
[5] arXiv:2405.10370 [pdf, other]: Title: Grounded 3D-LLM with Referent Tokens

Authors: Yilun Chen, Shuai Yang, Haifeng Huang, Tai Wang, Ruiyuan Lyu, Runsen Xu, Dahua Lin, Jiangmiao Pang

Comments: Preprint

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Prior studies on 3D scene understanding have primarily developed specialized models for specific tasks or required task-specific fine-tuning. In this study, we propose Grounded 3D-LLM, which explores the potential of 3D large multi-modal models (3D LMMs) to consolidate various 3D vision tasks within a unified generative framework. The model uses scene referent tokens as special noun phrases to reference 3D scenes, enabling the handling of sequences that interleave 3D and textual data. It offers a natural approach for translating 3D vision tasks into language formats using task-specific instruction templates. To facilitate the use of referent tokens in subsequent language modeling, we have curated large-scale grounded language datasets that offer finer scene-text correspondence at the phrase level by bootstrapping existing object labels. Subsequently, we introduced Contrastive LAnguage-Scene Pre-training (CLASP) to effectively leverage this data, thereby integrating 3D vision with language models. Our comprehensive evaluation covers open-ended tasks like dense captioning and 3D QA, alongside close-ended tasks such as object detection and language grounding. Experiments across multiple 3D benchmarks reveal the leading performance and the broad applicability of Grounded 3D-LLM. Code and datasets will be released on the project page: https://groundedscenellm.github.io/grounded_3d-llm.github.io.
[6] arXiv:2405.10372 [pdf, other]: Title: Efficient model predictive control for nonlinear systems modelled by deep neural networks

Authors: Jianglin Lan

Comments: 8 pages, 5 figures

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)

This paper presents a model predictive control (MPC) for dynamic systems whose nonlinearity and uncertainty are modelled by deep neural networks (NNs), under input and state constraints. Since the NN output contains a high-order complex nonlinearity of the system state and control input, the MPC problem is nonlinear and challenging to solve for real-time control. This paper proposes two types of methods for solving the MPC problem: the mixed integer programming (MIP) method which produces an exact solution to the nonlinear MPC, and linear relaxation (LR) methods which generally give suboptimal solutions but are much computationally cheaper. Extensive numerical simulation for an inverted pendulum system modelled by ReLU NNs of various sizes is used to demonstrate and compare performance of the MIP and LR methods.
[7] arXiv:2405.10373 [pdf, other]: Title: A Transdisciplinary Approach to Cybersecurity: A Framework for Encouraging Transdisciplinary Thinking

Authors: Emily Kesler

Subjects: Cryptography and Security (cs.CR)

Classical cybersecurity is often perceived as a rigid science discipline filled with computer scientists and mathematicians. However, due to the rapid pace of technology development and integration, new criminal enterprises, new defense tactics, and the understanding of the human element, cybersecurity is quickly beginning to encompass more than just computers. Cybersecurity experts must broaden their perspectives beyond traditional disciplinary boundaries to provide the best protection possible. They must start to practice transdisciplinary cybersecurity. Taking influence from the Stakeholder Theory in business ethics, this paper presents a framework to encourage transdisciplinary thinking and assist experts in tackling the new challenges of the modern day. The framework uses the simple Think, Plan, Do approach to enable experts to develop their transdisciplinary thinking. The framework is intended to be used as an evaluation tool for existing cybersecurity practices or postures, as a development tool to engage with other disciplines to foster learning and create new methods, and as a guidance tool to encourage new ways of thinking about, perceiving, and executing cybersecurity practices. For each of those intended uses, a use case is presented as an example to showcase how the framework might be used. The ultimate goal of this paper is not the framework but transdisciplinary thinking. By using the tool presented here and developing their own transdisciplinary thinking, cybersecurity experts can be better prepared to face cybersecurity's unique and complex challenges.
[8] arXiv:2405.10375 [pdf, ps, other]: Title: Implementing a GRU Neural Network for Flood Prediction in Ashland City, Tennessee

Authors: George K. Fordjour, Alfred J. Kalyanapu

Comments: 13 pages, 3 figures, 3 tables

Subjects: Machine Learning (cs.LG)

Ashland City, Tennessee, located within the Lower Cumberland Sycamore watershed, is highly susceptible to flooding due to increased upstream water levels. This study aimed to develop a robust flood prediction model for the city, utilizing water level data at 30-minute intervals from ten USGS gauge stations within the watershed. A Gated Recurrent Unit (GRU) network, known for its ability to effectively process sequential time-series data, was used. The model was trained, validated, and tested using a year-long dataset (January 2021-January 2022), and its performance was evaluated using statistical metrics including Nash-Sutcliffe Efficiency (NSE), Root Mean Squared Error (RMSE), Percent Bias (PBIAS), Mean Absolute Error (MAE), and Coefficient of Determination (R^2). The results demonstrated a high level of accuracy, with the model explaining 98.2% of the variance in the data. Despite minor discrepancies between predicted and observed values, the GRU model proved to be an effective tool for flood prediction in Ashland City, with potential applications for enhancing disaster preparedness and response efforts in Ashland City.
[9] arXiv:2405.10376 [pdf, ps, other]: Title: Dealing Doubt: Unveiling Threat Models in Gradient Inversion Attacks under Federated Learning, A Survey and Taxonomy

Authors: Yichuan Shi, Olivera Kotevska, Viktor Reshniak, Abhishek Singh, Ramesh Raskar

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Federated Learning (FL) has emerged as a leading paradigm for decentralized, privacy preserving machine learning training. However, recent research on gradient inversion attacks (GIAs) have shown that gradient updates in FL can leak information on private training samples. While existing surveys on GIAs have focused on the honest-but-curious server threat model, there is a dearth of research categorizing attacks under the realistic and far more privacy-infringing cases of malicious servers and clients. In this paper, we present a survey and novel taxonomy of GIAs that emphasize FL threat models, particularly that of malicious servers and clients. We first formally define GIAs and contrast conventional attacks with the malicious attacker. We then summarize existing honest-but-curious attack strategies, corresponding defenses, and evaluation metrics. Critically, we dive into attacks with malicious servers and clients to highlight how they break existing FL defenses, focusing specifically on reconstruction methods, target model architectures, target data, and evaluation metrics. Lastly, we discuss open problems and future research directions.
[10] arXiv:2405.10377 [pdf, ps, other]: Title: Smart Routing with Precise Link Estimation: DSEE-Based Anypath Routing for Reliable Wireless Networking

Authors: Narjes Nourzad, Bhaskar Krishnamachari

Comments: ICMLCN 2024

Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)

In dynamic and resource-constrained environments, such as multi-hop wireless mesh networks, traditional routing protocols often falter by relying on predetermined paths that prove ineffective in unpredictable link conditions. Shortest Anypath routing offers a solution by adapting routing decisions based on real-time link conditions. However, the effectiveness of such routing is fundamentally dependent on the quality and reliability of the available links, and predicting these variables with certainty is challenging. This paper introduces a novel approach that leverages the Deterministic Sequencing of Exploration and Exploitation (DSEE), a multi-armed bandit algorithm, to address the need for accurate and real-time estimation of link delivery probabilities. This approach augments the reliability and resilience of the Shortest Anypath routing in the face of fluctuating link conditions. By coupling DSEE with Anypath routing, this algorithm continuously learns and ensures accurate delivery probability estimation and selects the most suitable way to efficiently route packets while maintaining a provable near-logarithmic regret bound. We also theoretically prove that our proposed scheme offers better regret scaling with respect to the network size than the previously proposed Thompson Sampling-based Opportunistic Routing (TSOR).
[11] arXiv:2405.10378 [pdf, other]: Title: A Polynomial-Time Approximation for Pairwise Fair $k$-Median Clustering

Authors: Sayan Bandyapadhyay, Eden Chlamtáč, Yury Makarychev, Ali Vakilian

Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this work, we study pairwise fair clustering with $\ell \ge 2$ groups, where for every cluster $C$ and every group $i \in [\ell]$, the number of points in $C$ from group $i$ must be at most $t$ times the number of points in $C$ from any other group $j \in [\ell]$, for a given integer $t$. To the best of our knowledge, only bi-criteria approximation and exponential-time algorithms follow for this problem from the prior work on fair clustering problems when $\ell > 2$. In our work, focusing on the $\ell > 2$ case, we design the first polynomial-time $(t^{\ell}\cdot \ell\cdot k)^{O(\ell)}$-approximation for this problem with $k$-median cost that does not violate the fairness constraints. We complement our algorithmic result by providing hardness of approximation results, which show that our problem even when $\ell=2$ is almost as hard as the popular uniform capacitated $k$-median, for which no polynomial-time algorithm with an approximation factor of $o(\log k)$ is known.
[12] arXiv:2405.10385 [pdf, other]: Title: AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning

Authors: Mina Ghashami, Soumya Smruti Mishra

Comments: Accepted at SemEval 2024 (Colocated with NAACL 2024)

Journal-ref: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)

The SemEval 2024 BRAINTEASER task represents a pioneering venture in Natural Language Processing (NLP) by focusing on lateral thinking, a dimension of cognitive reasoning that is often overlooked in traditional linguistic analyses. This challenge comprises of Sentence Puzzle and Word Puzzle subtasks and aims to test language models' capacity for divergent thinking.
In this paper, we present our approach to the BRAINTEASER task. We employ a holistic strategy by leveraging cutting-edge pre-trained models in multiple choice architecture, and diversify the training data with Sentence and Word Puzzle datasets. To gain further improvement, we fine-tuned the model with synthetic humor/jokes dataset and the RiddleSense dataset which helped augmenting the model's lateral thinking abilities. Empirical results show that our approach achieve 92.5\% accuracy in Sentence Puzzle subtask and 80.2\% accuracy in Word Puzzle subtask.
[13] arXiv:2405.10389 [pdf, other]: Title: Physics-Informed Heterogeneous Graph Neural Networks for DC Blocker Placement

Authors: Hongwei Jin, Prasanna Balaprakash, Allen Zou, Pieter Ghysels, Aditi S. Krishnapriyan, Adam Mate, Arthur Barnes, Russell Bent

Comments: Paper is accepted by PSCC 2024

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

The threat of geomagnetic disturbances (GMDs) to the reliable operation of the bulk energy system has spurred the development of effective strategies for mitigating their impacts. One such approach involves placing transformer neutral blocking devices, which interrupt the path of geomagnetically induced currents (GICs) to limit their impact. The high cost of these devices and the sparsity of transformers that experience high GICs during GMD events, however, calls for a sparse placement strategy that involves high computational cost. To address this challenge, we developed a physics-informed heterogeneous graph neural network (PIHGNN) for solving the graph-based dc-blocker placement problem. Our approach combines a heterogeneous graph neural network (HGNN) with a physics-informed neural network (PINN) to capture the diverse types of nodes and edges in ac/dc networks and incorporates the physical laws of the power grid. We train the PIHGNN model using a surrogate power flow model and validate it using case studies. Results demonstrate that PIHGNN can effectively and efficiently support the deployment of GIC dc-current blockers, ensuring the continued supply of electricity to meet societal demands. Our approach has the potential to contribute to the development of more reliable and resilient power grids capable of withstanding the growing threat that GMDs pose.
[14] arXiv:2405.10390 [pdf, ps, other]: Title: Two-point stress approximation: A simple and robust finite volume method for linearized (poro-)mechanics and Stokes flow

Authors: Jan Martin Nordbotten, Eirik Keilegavlen

Subjects: Numerical Analysis (math.NA)

We construct a simple and robust finite volume discretization for linearized mechanics, Stokes and poromechanics, based only on co-located, cell-centered variables. The discretization has a minimal stencil, using only the two neighboring cells to a face to calculate numerical stresses and fluxes.
We fully justify the method theoretically in terms of stability and convergence, both of which are robust in terms of the material parameters. Numerical experiments support the theoretical results, and shed light on grid families not explicitly treated by the theoretical results.
[15] arXiv:2405.10391 [pdf, other]: Title: Vision Transformers for End-to-End Vision-Based Quadrotor Obstacle Avoidance

Authors: Anish Bhattacharya, Nishanth Rao, Dhruv Parikh, Pratik Kunapuli, Nikolai Matni, Vijay Kumar

Comments: 8 pages, 10 figures, 3 tables

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

We demonstrate the capabilities of an attention-based end-to-end approach for high-speed quadrotor obstacle avoidance in dense, cluttered environments, with comparison to various state-of-the-art architectures. Quadrotor unmanned aerial vehicles (UAVs) have tremendous maneuverability when flown fast; however, as flight speed increases, traditional vision-based navigation via independent mapping, planning, and control modules breaks down due to increased sensor noise, compounding errors, and increased processing latency. Thus, learning-based, end-to-end planning and control networks have shown to be effective for online control of these fast robots through cluttered environments. We train and compare convolutional, U-Net, and recurrent architectures against vision transformer models for depth-based end-to-end control, in a photorealistic, high-physics-fidelity simulator as well as in hardware, and observe that the attention-based models are more effective as quadrotor speeds increase, while recurrent models with many layers provide smoother commands at lower speeds. To the best of our knowledge, this is the first work to utilize vision transformers for end-to-end vision-based quadrotor control.
[16] arXiv:2405.10392 [pdf, other]: Title: Transport based particle methods for the Fokker-Planck-Landau equation

Authors: Vasily Ilin, Jingwei Hu, Zhenfu Wang

Comments: 26 pages, 6 figures, code this https URL

Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Analysis of PDEs (math.AP)

We propose a particle method for numerically solving the Landau equation, inspired by the score-based transport modeling (SBTM) method for the Fokker-Planck equation. This method can preserve some important physical properties of the Landau equation, such as the conservation of mass, momentum, and energy, and decay of estimated entropy. We prove that matching the gradient of the logarithm of the approximate solution is enough to recover the true solution to the Landau equation with Maxwellian molecules. Several numerical experiments in low and moderately high dimensions are performed, with particular emphasis on comparing the proposed method with the traditional particle or blob method.
[17] arXiv:2405.10398 [pdf, other]: Title: Drone-type-Set: Drone types detection benchmark for drone detection and tracking

Authors: Kholoud AlDosari, AIbtisam Osman, Omar Elharrouss, Somaya AlMaadeed, Mohamed Zied Chaari

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The Unmanned Aerial Vehicles (UAVs) market has been significantly growing and Considering the availability of drones at low-cost prices the possibility of misusing them, for illegal purposes such as drug trafficking, spying, and terrorist attacks posing high risks to national security, is rising. Therefore, detecting and tracking unauthorized drones to prevent future attacks that threaten lives, facilities, and security, become a necessity. Drone detection can be performed using different sensors, while image-based detection is one of them due to the development of artificial intelligence techniques. However, knowing unauthorized drone types is one of the challenges due to the lack of drone types datasets. For that, in this paper, we provide a dataset of various drones as well as a comparison of recognized object detection models on the proposed dataset including YOLO algorithms with their different versions, like, v3, v4, and v5 along with the Detectronv2. The experimental results of different models are provided along with a description of each method. The collected dataset can be found in https://drive.google.com/drive/folders/1EPOpqlF4vG7hp4MYnfAecVOsdQ2JwBEd?usp=share_link
[18] arXiv:2405.10402 [pdf, other]: Title: Formulae and transformations for simplicial tensorial finite elements via polytopal templates

Authors: Adam Sky, Michael Neunteufel, Jack S. Hale, Andreas Zilian

Subjects: Numerical Analysis (math.NA)

We introduce a unified method for constructing the basis functions of a wide variety of partially continuous tensor-valued finite elements on simplices using polytopal templates. These finite element spaces are essential for achieving well-posed discretisations of mixed formulations of partial differential equations that involve tensor-valued functions, such as the Hellinger-Reissner formulation of linear elasticity. In our proposed polytopal template method, the basis functions are constructed from template tensors associated with the geometric polytopes (vertices, edges, faces etc.) of the reference simplex and any scalar-valued $H^1$-conforming finite element space. From this starting point we can construct the Regge, Hellan-Herrmann-Johnson, Pechstein-Sch\"oberl, Hu-Zhang, Hu-Ma-Sun and Gopalakrishnan-Lederer-Sch\"oberl elements. Because the Hu-Zhang element and the Hu-Ma-Sun element cannot be mapped from the reference simplex to a physical simplex via standard double Piola mappings, we also demonstrate that the polytopal template tensors can be used to define a consistent mapping from a reference simplex even to a non-affine simplex in the physical mesh. Finally, we discuss the implications of element regularity with two numerical examples for the Reissner-Mindlin plate problem.
[19] arXiv:2405.10410 [pdf, ps, other]: Title: The fast committor machine: Interpretable prediction with kernels

Authors: D. Aristoff, M. Johnson, G. Simpson, R. J. Webber

Comments: 10 pages, 7 figures

Subjects: Numerical Analysis (math.NA); Machine Learning (stat.ML)

In the study of stochastic dynamics, the committor function describes the probability that a process starting from an initial configuration $x$ will reach set $A$ before set $B$. This paper introduces a fast and interpretable method for approximating the committor, called the "fast committor machine" (FCM). The FCM is based on simulated trajectory data, and it uses this data to train a kernel model. The FCM identifies low-dimensional subspaces that optimally describe the $A$ to $B$ transitions, and the subspaces are emphasized in the kernel model. The FCM uses randomized numerical linear algebra to train the model with runtime that scales linearly in the number of data points. This paper applies the FCM to example systems including the alanine dipeptide miniprotein: in these experiments, the FCM is generally more accurate and trains more quickly than a neural network with a similar number of parameters.
[20] arXiv:2405.10421 [pdf, ps, other]: Title: Pointwise Metrics for Clustering Evaluation

Authors: Stephan van Staden

Subjects: Information Retrieval (cs.IR)

This paper defines pointwise clustering metrics, a collection of metrics for characterizing the similarity of two clusterings. These metrics have several interesting properties which make them attractive for practical applications. They can take into account the relative importance of the various items that are clustered. The metric definitions are based on standard set-theoretic notions and are simple to understand. They characterize aspects that are important for typical applications, such as cluster homogeneity and completeness. It is possible to assign metrics to individual items, clusters, arbitrary slices of items, and the overall clustering. The metrics can provide deep insights, for example they can facilitate drilling deeper into clustering mistakes to understand where they happened, or help to explore slices of items to understand how they were affected. Since the pointwise metrics are mathematically well-behaved, they can provide a strong foundation for a variety of clustering evaluation techniques. In this paper we discuss in depth how the pointwise metrics can be used to evaluate an actual clustering with respect to a ground truth clustering.
[21] arXiv:2405.10422 [pdf, other]: Title: A First Look at Immersive Telepresence on Apple Vision Pro

Authors: Ruizhi Cheng, Nan Wu, Matteo Varvello, Eugene Chai, Songqing Chen, Bo Han

Subjects: Networking and Internet Architecture (cs.NI)

Due to the widespread adoption of "work-from-home" policies, videoconferencing applications (e.g., Zoom) have become indispensable for remote communication. However, these systems lack immersiveness, leading to the so-called "Zoom fatigue" and degrading communication efficiency. The recent debut of Apple Vision Pro, a mixed reality headset that supports "spatial persona", aims to offer an immersive telepresence experience with these applications. In this paper, we conduct a first-of-its-kind in-depth and empirical study to analyze the performance of immersive telepresence with four applications, Apple FaceTime, Cisco Webex, Microsoft Teams, and Zoom, on Vision Pro. We find that only FaceTime provides a truly immersive experience with spatial personas, whereas other applications still operate 2D personas. Our measurement results reveal that (1) FaceTime delivers semantic information to optimize bandwidth consumption, which is even lower than that of 2D persona for other applications, and (2) it employs visibility-aware optimizations to reduce rendering overhead. However, the scalability of FaceTime remains limited, with a simple server allocation strategy that potentially leads to high network delay among users.
[22] arXiv:2405.10423 [pdf, other]: Title: Diversity-Aware Sign Language Production through a Pose Encoding Variational Autoencoder

Authors: Mohamed Ilyes Lakhal, Richard Bowden

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper addresses the problem of diversity-aware sign language production, where we want to give an image (or sequence) of a signer and produce another image with the same pose but different attributes (\textit{e.g.} gender, skin color). To this end, we extend the variational inference paradigm to include information about the pose and the conditioning of the attributes. This formulation improves the quality of the synthesised images. The generator framework is presented as a UNet architecture to ensure spatial preservation of the input pose, and we include the visual features from the variational inference to maintain control over appearance and style. We generate each body part with a separate decoder. This architecture allows the generator to deliver better overall results. Experiments on the SMILE II dataset show that the proposed model performs quantitatively better than state-of-the-art baselines regarding diversity, per-pixel image quality, and pose estimation. Quantitatively, it faithfully reproduces non-manual features for signers.
[23] arXiv:2405.10425 [pdf, other]: Title: Data Selection for Transfer Unlearning

Authors: Nazanin Mohammadi Sepahvand, Vincent Dumoulin, Eleni Triantafillou, Gintare Karolina Dziugaite

Subjects: Machine Learning (cs.LG)

As deep learning models are becoming larger and data-hungrier, there are growing ethical, legal and technical concerns over use of data: in practice, agreements on data use may change over time, rendering previously-used training data impermissible for training purposes. These issues have driven increased attention to machine unlearning: removing "the influence of" a subset of training data from a trained model. In this work, we advocate for a relaxed definition of unlearning that does not address privacy applications but targets a scenario where a data owner withdraws permission of use of their data for training purposes. In this context, we consider the important problem of \emph{transfer unlearning} where a pretrained model is transferred to a target dataset that contains some "non-static" data that may need to be unlearned in the future. We propose a new method that uses a mechanism for selecting relevant examples from an auxiliary "static" dataset, and finetunes on the selected data instead of "non-static" target data; addressing all unlearning requests ahead of time. We also adapt a recent relaxed definition of unlearning to our problem setting and demonstrate that our approach is an exact transfer unlearner according to it, while being highly efficient (amortized). We find that our method outperforms the gold standard "exact unlearning" (finetuning on only the "static" portion of the target dataset) on several datasets, especially for small "static" sets, sometimes approaching an upper bound for test accuracy. We also analyze factors influencing the accuracy boost obtained by data selection.
[24] arXiv:2405.10426 [pdf, other]: Title: Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems

Authors: Pietro Farina, Subrata Biswas, Eren Yıldız, Khakim Akhunov, Saad Ahmed, Bashima Islam, Kasım Sinan Yıldırım

Comments: This paper has been selected for publication at the 21st International Conference on Embedded Wireless Systems and Networks (EWSN'24)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Batteryless systems frequently face power failures, requiring extra runtime buffers to maintain inference progress and leaving only a memory space for storing ultra-tiny deep neural networks (DNNs). Besides, making these models responsive to stochastic energy harvesting dynamics during inference requires a balance between inference accuracy, latency, and energy overhead. Recent works on compression mostly focus on time and memory, but often ignore energy dynamics or significantly reduce the accuracy of pre-trained DNNs. Existing energy-adaptive inference works modify the architecture of pre-trained models and have significant memory overhead. Thus, energy-adaptive and accurate inference of pre-trained DNNs on batteryless devices with extreme memory constraints is more challenging than traditional microcontrollers. We combat these issues by proposing FreeML, a framework to optimize pre-trained DNN models for memory-efficient and energy-adaptive inference on batteryless systems. FreeML comprises (1) a novel compression technique to reduce the model footprint and runtime memory requirements simultaneously, making them executable on extremely memory-constrained batteryless platforms; and (2) the first early exit mechanism that uses a single exit branch for all exit points to terminate inference at any time, making models energy-adaptive with minimal memory overhead. Our experiments showed that FreeML reduces the model sizes by up to $95 \times$, supports adaptive inference with a $2.03-19.65 \times$ less memory overhead, and provides significant time and energy benefits with only a negligible accuracy drop compared to the state-of-the-art.
[25] arXiv:2405.10429 [pdf, other]: Title: Physics-Guided State-Space Model Augmentation Using Weighted Regularized Neural Networks

Authors: Yuhan Liu, Roland Tóth, Maarten Schoukens

Subjects: Systems and Control (eess.SY)

Physics-guided neural networks (PGNN) is an effective tool that combines the benefits of data-driven modeling with the interpretability and generalization of underlying physical information. However, for a classical PGNN, the penalization of the physics-guided part is at the output level, which leads to a conservative result as systems with highly similar state-transition functions, i.e. only slight differences in parameters, can have significantly different time-series outputs. Furthermore, the classical PGNN cost function regularizes the model estimate over the entire state space with a constant trade-off hyperparameter. In this paper, we introduce a novel model augmentation strategy for nonlinear state-space model identification based on PGNN, using a weighted function regularization (W-PGNN). The proposed approach can efficiently augment the prior physics-based state-space models based on measurement data. A new weighted regularization term is added to the cost function to penalize the difference between the state and output function of the baseline physics-based and final identified model. This ensures the estimated model follows the baseline physics model functions in regions where the data has low information content, while placing greater trust in the data when a high informativity is present. The effectiveness of the proposed strategy over the current PGNN method is demonstrated on a benchmark example.
[26] arXiv:2405.10431 [pdf, other]: Title: Thinking Fair and Slow: On the Efficacy of Structured Prompts for Debiasing Language Models

Authors: Shaz Furniturewala, Surgan Jandial, Abhinav Java, Pragyan Banerjee, Simra Shahid, Sumit Bhatia, Kokil Jaidka

Comments: The first two authors have equal contribution

Subjects: Computation and Language (cs.CL)

Existing debiasing techniques are typically training-based or require access to the model's internals and output distributions, so they are inaccessible to end-users looking to adapt LLM outputs for their particular needs. In this study, we examine whether structured prompting techniques can offer opportunities for fair text generation. We evaluate a comprehensive end-user-focused iterative framework of debiasing that applies System 2 thinking processes for prompts to induce logical, reflective, and critical text generation, with single, multi-step, instruction, and role-based variants. By systematically evaluating many LLMs across many datasets and different prompting strategies, we show that the more complex System 2-based Implicative Prompts significantly improve over other techniques demonstrating lower mean bias in the outputs with competitive performance on the downstream tasks. Our work offers research directions for the design and the potential of end-user-focused evaluative frameworks for LLM use.
[27] arXiv:2405.10435 [pdf, other]: Title: Two-Stage Stochastic Optimal Power Flow for Microgrids With Uncertain Wildfire Effects

Authors: Sifat Chowdhury, Yu Zhang

Subjects: Systems and Control (eess.SY)

Large-scale power outages caused by extreme weather events are one of the major factors weakening grid resilience. In order to prevent the critical infrastructure from cascading failure, power lines are often proactively de-energized under the threat of a progressing wildfire. In this context, the potential of microgrid (MG) functioning in islanded mode can be exploited to enhance the resiliency of the power grid. However, there are numerous uncertainties originating from these types of events and an accurate modeling of the MG is required to harness its full potential. In this paper, we consider the uncertainty in line outages depending on fire propagation and reduced solar power generation due to the particulate matter in wildfire smoke. We formulate a two-stage stochastic MG optimal power flow problem by utilizing a second-order cone relaxation of the DistFlow model. Leveraging an effective approximation of the resistive heat gain, we separate the complicating constraints of dynamic line rating from the resulting optimization problem. Extensive simulation results corroborate the merits of our proposed framework, which is tested on a modified IEEE 22-bus system.
[28] arXiv:2405.10436 [pdf, other]: Title: Positional encoding is not the same as context: A study on positional encoding for Sequential recommendation

Authors: Alejo Lopez-Avila, Jinhua Du, Abbas Shimary, Ze Li

Comments: 19 pages, 3 figures, 12 tables

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

The expansion of streaming media and e-commerce has led to a boom in recommendation systems, including Sequential recommendation systems, which consider the user's previous interactions with items. In recent years, research has focused on architectural improvements such as transformer blocks and feature extraction that can augment model information. Among these features are context and attributes. Of particular importance is the temporal footprint, which is often considered part of the context and seen in previous publications as interchangeable with positional information. Other publications use positional encodings with little attention to them. In this paper, we analyse positional encodings, showing that they provide relative information between items that are not inferable from the temporal footprint. Furthermore, we evaluate different encodings and how they affect metrics and stability using Amazon datasets. We added some new encodings to help with these problems along the way. We found that we can reach new state-of-the-art results by finding the correct positional encoding, but more importantly, certain encodings stabilise the training.
[29] arXiv:2405.10439 [pdf, other]: Title: Beyond Traditional Single Object Tracking: A Survey

Authors: Omar Abdelaziz, Mohamed Shehata, Mohamed Mohamed

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Single object tracking is a vital task of many applications in critical fields. However, it is still considered one of the most challenging vision tasks. In recent years, computer vision, especially object tracking, witnessed the introduction or adoption of many novel techniques, setting new fronts for performance. In this survey, we visit some of the cutting-edge techniques in vision, such as Sequence Models, Generative Models, Self-supervised Learning, Unsupervised Learning, Reinforcement Learning, Meta-Learning, Continual Learning, and Domain Adaptation, focusing on their application in single object tracking. We propose a novel categorization of single object tracking methods based on novel techniques and trends. Also, we conduct a comparative analysis of the performance reported by the methods presented on popular tracking benchmarks. Moreover, we analyze the pros and cons of the presented approaches and present a guide for non-traditional techniques in single object tracking. Finally, we suggest potential avenues for future research in single-object tracking.
[30] arXiv:2405.10440 [pdf, other]: Title: Retrieving and Refining: A Hybrid Framework with Large Language Models for Rare Disease Identification

Authors: Jinge Wu, Hang Dong, Zexi Li, Arijit Patra, Honghan Wu

Subjects: Computation and Language (cs.CL)

The infrequency and heterogeneity of clinical presentations in rare diseases often lead to underdiagnosis and their exclusion from structured datasets. This necessitates the utilization of unstructured text data for comprehensive analysis. However, the manual identification from clinical reports is an arduous and intrinsically subjective task. This study proposes a novel hybrid approach that synergistically combines a traditional dictionary-based natural language processing (NLP) tool with the powerful capabilities of large language models (LLMs) to enhance the identification of rare diseases from unstructured clinical notes. We comprehensively evaluate various prompting strategies on six large language models (LLMs) of varying sizes and domains (general and medical). This evaluation encompasses zero-shot, few-shot, and retrieval-augmented generation (RAG) techniques to enhance the LLMs' ability to reason about and understand contextual information in patient reports. The results demonstrate effectiveness in rare disease identification, highlighting the potential for identifying underdiagnosed patients from clinical notes.
[31] arXiv:2405.10441 [pdf, ps, other]: Title: Trajectory tracking control of a Remotely Operated Underwater Vehicle based on Fuzzy Disturbance Adaptation and Controller Parameter Optimization

Authors: Hanzhi Yang

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

The exploration of under-ice environments presents unique challenges due to limited access for scientific research. This report investigates the potential of deploying a fully actuated Remotely Operated Vehicle (ROV) for shallow area exploration beneath ice sheets. Leveraging advancements in marine robotics technology, ROVs offer a promising solution for extending human presence into remote underwater locations. To enable successful under-ice exploration, the ROV must follow precise trajectories for effective localization signal reception. This study develops a multi-input-multi-output (MIMO) nonlinear system controller, incorporating a Lyapunov-based stability guarantee and an adaptation law to mitigate unknown environmental disturbances. Fuzzy logic is employed to dynamically adjust adaptation rates, enhancing performance in highly nonlinear ROV dynamic systems. Additionally, a Particle Swarm Optimization (PSO) algorithm automates the tuning of controller parameters for optimal trajectory tracking. The report details the ROV dynamic model, the proposed control framework, and the PSO-based tuning process. Simulation-based experiments validate the efficacy of the methodology, with experimental results demonstrating superior trajectory tracking performance compared to baseline controllers. This work contributes to the advancement of under-ice exploration capabilities and sets the stage for future research in marine robotics and autonomous underwater systems.
[32] arXiv:2405.10443 [pdf, other]: Title: Simultaneous Masking, Not Prompting Optimization: A Paradigm Shift in Fine-tuning LLMs for Simultaneous Translation

Authors: Matthew Raffel, Victor Agostinelli, Lizhong Chen

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Large language models (LLMs) have achieved state-of-the-art performance in various language processing tasks, motivating their adoption in simultaneous translation. Current fine-tuning methods to adapt LLMs for simultaneous translation focus on prompting optimization strategies using either data augmentation or prompt structure modifications. However, these methods suffer from several issues, such as an unnecessarily expanded training set, computational inefficiency from dumping the KV cache, increased prompt sizes, or restriction to a single decision policy. To eliminate these issues, we propose a new paradigm in fine-tuning LLMs for simultaneous translation, called SimulMask. It utilizes a novel attention mask technique that models simultaneous translation during fine-tuning by masking attention connections under a desired decision policy. Applying the proposed SimulMask on a Falcon LLM for the IWSLT 2017 dataset, we have observed a significant translation quality improvement compared to state-of-the-art prompting optimization strategies on three language pairs when averaged across four different latency regimes while reducing the computational cost.
[33] arXiv:2405.10444 [pdf, other]: Title: A Novel Bounding Box Regression Method for Single Object Tracking

Authors: Omar Abdelaziz, Mohamed Sami Shehata

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Locating an object in a sequence of frames, given its appearance in the first frame of the sequence, is a hard problem that involves many stages. Usually, state-of-the-art methods focus on bringing novel ideas in the visual encoding or relational modelling phases. However, in this work, we show that bounding box regression from learned joint search and template features is of high importance as well. While previous methods relied heavily on well-learned features representing interactions between search and template, we hypothesize that the receptive field of the input convolutional bounding box network plays an important role in accurately determining the object location. To this end, we introduce two novel bounding box regression networks: inception and deformable. Experiments and ablation studies show that our inception module installed on the recent ODTrack outperforms the latter on three benchmarks: the GOT-10k, the UAV123 and the OTB2015.
[34] arXiv:2405.10446 [pdf, other]: Title: Tell me more: Intent Fulfilment Framework for Enhancing User Experiences in Conversational XAI

Authors: Anjana Wijekoon, David Corsar, Nirmalie Wiratunga, Kyle Martin, Pedram Salimi

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

The evolution of Explainable Artificial Intelligence (XAI) has emphasised the significance of meeting diverse user needs. The approaches to identifying and addressing these needs must also advance, recognising that explanation experiences are subjective, user-centred processes that interact with users towards a better understanding of AI decision-making. This paper delves into the interrelations in multi-faceted XAI and examines how different types of explanations collaboratively meet users' XAI needs. We introduce the Intent Fulfilment Framework (IFF) for creating explanation experiences. The novelty of this paper lies in recognising the importance of "follow-up" on explanations for obtaining clarity, verification and/or substitution. Moreover, the Explanation Experience Dialogue Model integrates the IFF and "Explanation Followups" to provide users with a conversational interface for exploring their explanation needs, thereby creating explanation experiences. Quantitative and qualitative findings from our comparative user study demonstrate the impact of the IFF in improving user engagement, the utility of the AI system and the overall user experience. Overall, we reinforce the principle that "one explanation does not fit all" to create explanation experiences that guide the complex interaction through conversation.
[35] arXiv:2405.10447 [pdf, other]: Title: Codes for Limited-Magnitude Probability Error in DNA Storage

Authors: Wenkai Zhang, Zhiying Wang

Comments: Part of work is published in ICC 2022-IEEE International Conference on Communications

Subjects: Information Theory (cs.IT)

DNA, with remarkable properties of high density, durability, and replicability, is one of the most appealing storage media. Emerging DNA storage technologies use composite DNA letters, where information is represented by probability vectors, leading to higher information density and lower synthesizing costs than regular DNA letters. However, it faces the problem of inevitable noise and information corruption. This paper explores the channel of composite DNA letters in DNA-based storage systems and introduces block codes for limited-magnitude probability errors on probability vectors. First, outer and inner bounds for limited-magnitude probability error correction codes are provided. Moreover, code constructions are proposed where the number of errors is bounded by t, the error magnitudes are bounded by l, and the probability resolution is fixed as k. These constructions focus on leveraging the properties of limited-magnitude probability errors in DNA-based storage systems, leading to improved performance in terms of complexity and redundancy. In addition, the asymptotic optimality for one of the proposed constructions is established. Finally, systematic codes based on one of the proposed constructions are presented, which enable efficient information extraction for practical implementation.
[36] arXiv:2405.10452 [pdf, other]: Title: Navigating Public Sentiment in the Circular Economy through Topic Modelling and Hyperparameter Optimisation

Authors: Junhao Song, Yingfang Yuan, Kaiwen Chang, Bing Xu, Jin Xuan, Wei Pang

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

To advance the circular economy (CE), it is crucial to gain insights into the evolution of public sentiments, cognitive pathways of the masses concerning circular products and digital technology, and recognise the primary concerns. To achieve this, we collected data related to the CE from diverse platforms including Twitter, Reddit, and The Guardian. This comprehensive data collection spanned across three distinct strata of the public: the general public, professionals, and official sources. Subsequently, we utilised three topic models on the collected data. Topic modelling represents a type of data-driven and machine learning approach for text mining, capable of automatically categorising a large number of documents into distinct semantic groups. Simultaneously, these groups are described by topics, and these topics can aid in understanding the semantic content of documents at a high level. However, the performance of topic modelling may vary depending on different hyperparameter values. Therefore, in this study, we proposed a framework for topic modelling with hyperparameter optimisation for CE and conducted a series of systematic experiments to ensure that topic models are set with appropriate hyperparameters and to gain insights into the correlations between the CE and public opinion based on well-established models. The results of this study indicate that concerns about sustainability and economic impact persist across all three datasets. Official sources demonstrate a higher level of engagement with the application and regulation of CE. To the best of our knowledge, this study is pioneering in investigating various levels of public opinions concerning CE through topic modelling with the exploration of hyperparameter optimisation.
[37] arXiv:2405.10456 [pdf, other]: Title: Region-level labels in ice charts can produce pixel-level segmentation for Sea Ice types

Authors: Muhammed Patel, Xinwei Chen, Linlin Xu, Yuhao Chen, K Andrea Scott, David A. Clausi

Comments: Published at ICLR 2024 Machine Learning for Remote Sensing (ML4RS) Workshop

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Fully supervised deep learning approaches have demonstrated impressive accuracy in sea ice classification, but their dependence on high-resolution labels presents a significant challenge due to the difficulty of obtaining such data. In response, our weakly supervised learning method provides a compelling alternative by utilizing lower-resolution regional labels from expert-annotated ice charts. This approach achieves exceptional pixel-level classification performance by introducing regional loss representations during training to measure the disparity between predicted and ice chart-derived sea ice type distributions. Leveraging the AI4Arctic Sea Ice Challenge Dataset, our method outperforms the fully supervised U-Net benchmark, the top solution of the AutoIce challenge, in both mapping resolution and class-wise accuracy, marking a significant advancement in automated operational sea ice mapping.
[38] arXiv:2405.10457 [pdf, other]: Title: Participle-Prepended Nominals Have Lower Entropy Than Nominals Appended After the Participle

Authors: Kristie Denlinger, Stephen Wechsler, Kyle Mahowald

Comments: Accepted to CogSci 2024, 6 pages, 2 figures

Subjects: Computation and Language (cs.CL)

English allows for both compounds (e.g., London-made) and phrasal paraphrases (e.g., made in London). While these constructions have roughly the same truth-conditional meaning, we hypothesize that the compound allows less freedom to express the nature of the semantic relationship between the participle and the pre-participle nominal. We thus predict that the pre-participle slot is more constrained than the equivalent position in the phrasal construction. We test this prediction in a large corpus by measuring the entropy of corresponding nominal slots, conditional on the participle used. That is, we compare the entropy of $\alpha$ in compound construction slots like $\alpha$-[V]ed to the entropy of $\alpha$ in phrasal constructions like [V]ed by $\alpha$ for a given verb V. As predicted, there is significantly lower entropy in the compound construction than in the phrasal construction. We consider how these predictions follow from more general grammatical properties and processing factors.
[39] arXiv:2405.10460 [pdf, other]: Title: The AI Collaborator: Bridging Human-AI Interaction in Educational and Professional Settings

Authors: Mohammad Amin Samadi, Spencer JaQuay, Jing Gu, Nia Nixon

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

AI Collaborator, powered by OpenAI's GPT-4, is a groundbreaking tool designed for human-AI collaboration research. Its standout feature is the ability for researchers to create customized AI personas for diverse experimental setups using a user-friendly interface. This functionality is essential for simulating various interpersonal dynamics in team settings. AI Collaborator excels in mimicking different team behaviors, enabled by its advanced memory system and a sophisticated personality framework. Researchers can tailor AI personas along a spectrum from dominant to cooperative, enhancing the study of their impact on team processes. The tool's modular design facilitates integration with digital platforms like Slack, making it versatile for various research scenarios. AI Collaborator is thus a crucial resource for exploring human-AI team dynamics more profoundly.
[40] arXiv:2405.10465 [pdf, ps, other]: Title: Error Analysis of Randomized Symplectic Model Order Reduction for Hamiltonian systems

Authors: Robin Herkert, Patrick Buchfink, Bernard Haasdonk, Johannes Rettberg, Jörg Fehr

Comments: 27 pages, 4 figures

Subjects: Numerical Analysis (math.NA)

Solving high-dimensional dynamical systems in multi-query or real-time applications requires efficient surrogate modelling techniques, as e.g., achieved via model order reduction (MOR). If these systems are Hamiltonian systems their physical structure should be preserved during the reduction, which can be ensured by applying symplectic basis generation techniques such as the complex SVD (cSVD). Recently, randomized symplectic methods such as the randomized complex singular value decomposition (rcSVD) have been developed for a more efficient computation of symplectic bases that preserve the Hamiltonian structure during MOR. In the current paper, we present two error bounds for the rcSVD basis depending on the choice of hyperparameters and show that with a proper choice of hyperparameters, the projection error of rcSVD is at most a constant factor worse than the projection error of cSVD. We provide numerical experiments that demonstrate the efficiency of randomized symplectic basis generation and compare the bounds numerically.
[41] arXiv:2405.10467 [pdf, other]: Title: Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents

Authors: Yue Liu, Sin Kit Lo, Qinghua Lu, Liming Zhu, Dehai Zhao, Xiwei Xu, Stefan Harrer, Jon Whittle

Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to takes a proactive, autonomous role to pursue users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing the agents considering challenges of goal-seeking (including generating instrumental goals and plans), such as hallucinations inherent in foundation models, explainability of reasoning process, complex accountability, etc. To address this issue, we have performed a systematic literature review to understand the state-of-the-art foundation model-based agents and the broader ecosystem. In this paper, we present a pattern catalogue consisting of 16 architectural patterns with analyses of the context, forces, and trade-offs as the outcomes from the previous literature review. The proposed catalogue can provide holistic guidance for the effective use of patterns, and support the architecture design of foundation model-based agents by facilitating goal-seeking and plan generation.
[42] arXiv:2405.10469 [pdf, other]: Title: Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions

Authors: Yu Xia, Sriram Narayanamoorthy, Zhengyuan Zhou, Joshua Mabry

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)

The development of open benchmarking platforms could greatly accelerate the adoption of AI agents in retail. This paper presents comprehensive simulations of customer shopping behaviors for the purpose of benchmarking reinforcement learning (RL) agents that optimize coupon targeting. The difficulty of this learning problem is largely driven by the sparsity of customer purchase events. We trained agents using offline batch data comprising summarized customer purchase histories to help mitigate this effect. Our experiments revealed that contextual bandit and deep RL methods that are less prone to over-fitting the sparse reward distributions significantly outperform static policies. This study offers a practical framework for simulating AI agents that optimize the entire retail customer journey. It aims to inspire the further development of simulation tools for retail AI systems.
[43] arXiv:2405.10473 [pdf, other]: Title: Infrastructure Engineering: A Still Missing, Undervalued Role in the Research Ecosystem

Authors: Vanessa Sochat

Comments: 13 pages, 1 figure, 1 table

Subjects: Software Engineering (cs.SE)

Research has become increasingly reliant on software, serving as the driving force behind bioinformatics, high performance computing, physics, machine learning and artificial intelligence, to name a few. While substantial progress has been made in advocating for the research software engineer, a kind of software engineer that typically works directly on software and associated assets that go into research, little attention has been placed on the workforce behind research infrastructure and innovation, namely compilers and compatibility tool development, orchestration and scheduling infrastructure, developer environments, container technologies, and workflow managers. As economic incentives are moving toward different models of cloud computing and innovating is required to develop new paradigms that represent the best of both worlds, an effort called "converged computing," the need for such a role is not just ideal, but essential for the continued success of science. While scattered staff in non-traditional roles have found time to work on some facets of this space, the lack of a larger workforce and incentive to support it has led to the scientific community falling behind. In this article we will highlight the importance of this missing layer, providing examples of how a missing role of infrastructure engineer has led to inefficiencies in the interoperability, portability, and reproducibility of science. We suggest that an inability to allocate, provide resources for, and sustain individuals to work explicitly on these technologies could lead to possible futures that are sub-optimal for the continued success of our scientific communities.
[44] arXiv:2405.10474 [pdf, ps, other]: Title: Rethinking ChatGPT's Success: Usability and Cognitive Behaviors Enabled by Auto-regressive LLMs' Prompting

Authors: Xinzhe Li, Ming Liu

Subjects: Computation and Language (cs.CL)

Over the last decade, a wide range of training and deployment strategies for Large Language Models (LLMs) have emerged. Among these, the prompting paradigms of Auto-regressive LLMs (AR-LLMs) have catalyzed a significant surge in Artificial Intelligence (AI). This paper aims to emphasize the significance of utilizing free-form modalities (forms of input and output) and verbal free-form contexts as user-directed channels (methods for transforming modalities) for downstream deployment. Specifically, we analyze the structure of modalities within both two types of LLMs and six task-specific channels during deployment. From the perspective of users, our analysis introduces and applies the analytical metrics of task customizability, transparency, and complexity to gauge their usability, highlighting the superior nature of AR-LLMs' prompting paradigms. Moreover, we examine the stimulation of diverse cognitive behaviors in LLMs through the adoption of free-form text and verbal contexts, mirroring human linguistic expressions of such behaviors. We then detail four common cognitive behaviors to underscore how AR-LLMs' prompting successfully imitate human-like behaviors using this free-form modality and channel. Lastly, the potential for improving LLM deployment, both as autonomous agents and within multi-agent systems, is identified via cognitive behavior concepts and principles.
[45] arXiv:2405.10476 [pdf, other]: Title: Analysis, Modeling and Design of Personalized Digital Learning Environment

Authors: Sanjaya Khanal, Shiva Raj Pokhrel

Comments: IEEE Trans on Education, 2024

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

This research analyzes, models and develops a novel Digital Learning Environment (DLE) fortified by the innovative Private Learning Intelligence (PLI) framework. The proposed PLI framework leverages federated machine learning (FL) techniques to autonomously construct and continuously refine personalized learning models for individual learners, ensuring robust privacy protection. Our approach is pivotal in advancing DLE capabilities, empowering learners to actively participate in personalized real-time learning experiences. The integration of PLI within a DLE also streamlines instructional design and development demands for personalized teaching/learning. We seek ways to establish a foundation for the seamless integration of FL into learning systems, offering a transformative approach to personalized learning in digital environments. Our implementation details and code are made public.
[46] arXiv:2405.10478 [pdf, other]: Title: GridapTopOpt.jl: A scalable Julia toolbox for level set-based topology optimisation

Authors: Zachary J. Wegert, Jordi Manyer, Connor Mallon, Santiago Badia, Vivien J. Challis

Subjects: Mathematical Software (cs.MS)

In this paper we present GridapTopOpt, an extendable framework for level set-based topology optimisation that can be readily distributed across a personal computer or high-performance computing cluster. The package is written in Julia and uses the Gridap package ecosystem for parallel finite element assembly from arbitrary weak formulations of partial differential equation (PDEs) along with the scalable solvers from the Portable and Extendable Toolkit for Scientific Computing (PETSc). The resulting user interface is intuitive and easy-to-use, allowing for the implementation of a wide range of topology optimisation problems with a syntax that is near one-to-one with the mathematical notation. Furthermore, we implement automatic differentiation to help mitigate the bottleneck associated with the analytic derivation of sensitivities for complex problems. GridapTopOpt is capable of solving a range of benchmark and research topology optimisation problems with large numbers of degrees of freedom. This educational article demonstrates the usability and versatility of the package by describing the formulation and step-by-step implementation of several distinct topology optimisation problems. The driver scripts for these problems are provided and the package source code is available at https://github$.$com/zjwegert/GridapTopOpt.jl.
[47] arXiv:2405.10479 [pdf, ps, other]: Title: Convexification for a Coefficient Inverse Problem for a System of Two Coupled Nonlinear Parabolic Equations

Authors: Michael V. Klibanov, Jingzhi Li, Zhipeng Yang

Comments: arXiv admin note: text overlap with arXiv:2310.08878

Subjects: Numerical Analysis (math.NA)

A system of two coupled nonlinear parabolic partial differential equations with two opposite directions of time is considered. In fact, this is the so-called "Mean Field Games System" (MFGS), which is derived in the mean field games (MFG) theory. This theory has numerous applications in social sciences. The topic of Coefficient Inverse Problems (CIPs) in the MFG theory is in its infant age, both in theory and computations. A numerical method for this CIP is developed. Convergence analysis ensures the global convergence of this method. Numerical experiments are presented.
[48] arXiv:2405.10480 [pdf, other]: Title: Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers

Authors: Rya Sanovar, Srikant Bharadwaj, Renee St. Amant, Victor Rühle, Saravan Rajmohan

Comments: 13 pages, 10 figures

Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)

Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. The size of the state-of-the-art models has increased steadily reaching billions of parameters. These huge models are memory hungry and incur significant inference latency even on cutting edge AI-accelerators, such as GPUs. Specifically, the time and memory complexity of the attention operation is quadratic in terms of the total context length, i.e., prompt and output tokens. Thus, several optimizations such as key-value tensor caching and FlashAttention computation have been proposed to deliver the low latency demands of applications relying on such large models. However, these techniques do not cater to the computationally distinct nature of different phases during inference.
To that end, we propose LeanAttention, a scalable technique of computing self-attention for the token-generation phase (decode-phase) of decoder-only transformer models. LeanAttention enables scaling the attention mechanism implementation for the challenging case of long context lengths by re-designing the execution flow for the decode-phase. We identify that the associative property of online softmax can be treated as a reduction operation thus allowing us to parallelize the attention computation over these large context lengths. We extend the "stream-K" style reduction of tiled calculation to self-attention to enable parallel computation resulting in an average of 2.6x attention execution speedup over FlashAttention-2 and up to 8.33x speedup for 512k context lengths.
[49] arXiv:2405.10481 [pdf, other]: Title: Multi-Evidence based Fact Verification via A Confidential Graph Neural Network

Authors: Yuqing Lan, Zhenghao Liu, Yu Gu, Xiaoyuan Yi, Xiaohua Li, Liner Yang, Ge Yu

Comments: 12pages

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Fact verification tasks aim to identify the integrity of textual contents according to the truthful corpus. Existing fact verification models usually build a fully connected reasoning graph, which regards claim-evidence pairs as nodes and connects them with edges. They employ the graph to propagate the semantics of the nodes. Nevertheless, the noisy nodes usually propagate their semantics via the edges of the reasoning graph, which misleads the semantic representations of other nodes and amplifies the noise signals. To mitigate the propagation of noisy semantic information, we introduce a Confidential Graph Attention Network (CO-GAT), which proposes a node masking mechanism for modeling the nodes. Specifically, CO-GAT calculates the node confidence score by estimating the relevance between the claim and evidence pieces. Then, the node masking mechanism uses the node confidence scores to control the noise information flow from the vanilla node to the other graph nodes. CO-GAT achieves a 73.59% FEVER score on the FEVER dataset and shows the generalization ability by broadening the effectiveness to the science-specific domain.
[50] arXiv:2405.10485 [pdf, ps, other]: Title: CNER: A tool Classifier of Named-Entity Relationships

Authors: Jefferson A. Peña Torres, Raúl E. Gutiérrez De Piñerez

Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)

We introduce CNER, an ensemble of capable tools for extraction of semantic relationships between named entities in Spanish language. Built upon a container-based architecture, CNER integrates different Named entity recognition and relation extraction tools with a user-friendly interface that allows users to input free text or files effortlessly, facilitating streamlined analysis. Developed as a prototype version for the Natural Language Processing (NLP) Group at Universidad del Valle, CNER serves as a practical educational resource, illustrating how machine learning techniques can effectively tackle diverse NLP tasks in Spanish. Our preliminary results reveal the promising potential of CNER in advancing the understanding and development of NLP tools, particularly within Spanish-language contexts.
[51] arXiv:2405.10489 [pdf, other]: Title: MixCut:A Data Augmentation Method for Facial Expression Recognition

Authors: Jiaxiang Yu, Yiyang Liu, Ruiyang Fan, Guobing Sun

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the facial expression recognition task, researchers always get low accuracy of expression classification due to a small amount of training samples. In order to solve this kind of problem, we proposes a new data augmentation method named MixCut. In this method, we firstly interpolate the two original training samples at the pixel level in a random ratio to generate new samples. Then, pixel removal is performed in random square regions on the new samples to generate the final training samples. We evaluated the MixCut method on Fer2013Plus and RAF-DB. With MixCut, we achieved 85.63% accuracy in eight-label classification on Fer2013Plus and 87.88% accuracy in seven-label classification on RAF-DB, effectively improving the classification accuracy of facial expression image recognition. Meanwhile, on Fer2013Plus, MixCut achieved performance improvements of +0.59%, +0.36%, and +0.39% compared to the other three data augmentation methods: CutOut, Mixup, and CutMix, respectively. MixCut improves classification accuracy on RAF-DB by +0.22%, +0.65%, and +0.5% over these three data augmentation methods.
[52] arXiv:2405.10492 [pdf, ps, other]: Title: Automatic News Generation and Fact-Checking System Based on Language Processing

Authors: Xirui Peng, Qiming Xu, Zheng Feng, Haopeng Zhao, Lianghao Tan, Yan Zhou, Zecheng Zhang, Chenwei Gong, Yingqiao Zheng

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

This paper explores an automatic news generation and fact-checking system based on language processing, aimed at enhancing the efficiency and quality of news production while ensuring the authenticity and reliability of the news content. With the rapid development of Natural Language Processing (NLP) and deep learning technologies, automatic news generation systems are capable of extracting key information from massive data and generating well-structured, fluent news articles. Meanwhile, by integrating fact-checking technology, the system can effectively prevent the spread of false news and improve the accuracy and credibility of news. This study details the key technologies involved in automatic news generation and factchecking, including text generation, information extraction, and the application of knowledge graphs, and validates the effectiveness of these technologies through experiments. Additionally, the paper discusses the future development directions of automatic news generation and fact-checking systems, emphasizing the importance of further integration and innovation of technologies. The results show that with continuous technological optimization and practical application, these systems will play an increasingly important role in the future news industry, providing more efficient and reliable news services.
[53] arXiv:2405.10496 [pdf, other]: Title: Electromagnetic Information Theory for Holographic MIMO Communications

Authors: Li Wei, Tierui Gong, Chongwen Huang, Zhaoyang Zhang, Wei E. I. Sha, Zhi Ning Chen, Linglong Dai, Merouane Debbah, Chau Yuen

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Holographic multiple-input multiple-output (HMIMO) utilizes a compact antenna array to form a nearly continuous aperture, thereby enhancing higher capacity and more flexible configurations compared with conventional MIMO systems, making it attractive in current scientific research. Key questions naturally arise regarding the potential of HMIMO to surpass Shannon's theoretical limits and how far its capabilities can be extended. However, the traditional Shannon information theory falls short in addressing these inquiries because it only focuses on the information itself while neglecting the underlying carrier, electromagnetic (EM) waves, and environmental interactions. To fill up the gap between the theoretical analysis and the practical application for HMIMO systems, we introduce electromagnetic information theory (EIT) in this paper. This paper begins by laying the foundation for HMIMO-oriented EIT, encompassing EM wave equations and communication regions. In the context of HMIMO systems, the resultant physical limitations are presented, involving Chu's limit, Harrington's limit, Hannan's limit, and the evaluation of coupling effects. Field sampling and HMIMO-assisted oversampling are also discussed to guide the optimal HMIMO design within the EIT framework. To comprehensively depict the EM-compliant propagation process, we present the approximate and exact channel modeling approaches in near-/far-field zones. Furthermore, we discuss both traditional Shannon's information theory, employing the probabilistic method, and Kolmogorov information theory, utilizing the functional analysis, for HMIMO-oriented EIT systems.
[54] arXiv:2405.10497 [pdf, other]: Title: SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge

Authors: Bo Wu, Peiye Liu, Wen-Huang Cheng, Bei Liu, Zhaoyang Zeng, Jia Wang, Qiushi Huang, Jiebo Luo

Comments: ACM Multimedia. arXiv admin note: text overlap with arXiv:1910.01795

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)

Social Media Popularity Prediction (SMPP) is a crucial task that involves automatically predicting future popularity values of online posts, leveraging vast amounts of multimodal data available on social media platforms. Studying and investigating social media popularity becomes central to various online applications and requires novel methods of comprehensive analysis, multimodal comprehension, and accurate prediction.
SMP Challenge is an annual research activity that has spurred academic exploration in this area. This paper summarizes the challenging task, data, and research progress. As a critical resource for evaluating and benchmarking predictive models, we have released a large-scale SMPD benchmark encompassing approximately half a million posts authored by around 70K users. The research progress analysis provides an overall analysis of the solutions and trends in recent years. The SMP Challenge website (www.smp-challenge.com) provides the latest information and news.
[55] arXiv:2405.10499 [pdf, ps, other]: Title: Predictive Monitoring with Strong Trace Prefixes

Authors: Zhendong Ang, Umang Mathur

Subjects: Programming Languages (cs.PL)

Runtime predictive analyses enhance coverage of traditional dynamic analyses based bug detection techniques by identifying a space of feasible reorderings of the observed execution and determining if any of these witnesses the violation of some desired safety property. The most popular approach for modelling the space of feasible reorderings is through Mazurkiewicz's trace equivalence. The simplicity of the framework also gives rise to efficient predictive analyses, and has been the de facto means for obtaining space and time efficient algorithms for monitoring concurrent programs. In this work, we investigate how to enhance the predictive power of trace-based reasoning, while still retaining the algorithmic benefits it offers. Towards this, we extend trace theory by naturally embedding a class of prefixes, which we call strong trace prefixes. We formally characterize strong trace prefixes using an enhanced dependence relation, study its predictive power and establish a tight connection to the previously proposed notion of synchronization preserving correct reorderings developed in the context of data race and deadlock prediction. We then show that despite the enhanced predictive power, strong trace prefixes continue to enjoy the algorithmic benefits of Mazurkiewicz traces in the context of prediction against co-safety properties, and derive new algorithms for synchronization preserving data races and deadlocks with better asymptotic space and time usage. We also show that strong trace prefixes can capture more violations of pattern languages. We implement our proposed algorithms and our evaluation confirms the practical utility of reasoning based on strong prefix traces.
[56] arXiv:2405.10502 [pdf, other]: Title: Enhancing DMI Interactions by Integrating Haptic Feedback for Intricate Vibrato Technique

Authors: Ziyue Piao, Christian Frisson, Bavo Van Kerrebroeck, Marcelo M.Wanderley

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD)

This paper investigates the integration of force feedback in Digital Musical Instruments (DMI), specifically evaluating the reproduction of intricate vibrato techniques using haptic feedback controllers. We introduce our system for vibrato modulation using force feedback, composed of Bend-aid (a web-based sequencer platform using pre-designed haptic feedback models) and TorqueTuner (an open-source 1 Degree-of-Freedom (DoF) rotary haptic device for generating programmable haptic effects). We designed a formal user study to assess the impact of each haptic mode on user experience in a vibrato mimicry task. Twenty musically trained participants rated their user experience for the three haptic modes (Smooth, Detent, and Spring) using four Likert-scale scores: comfort, flexibility, ease of control, and helpfulness for the task. Finally, we asked participants to share their reflections. Our research indicates that while the Spring mode can help with light vibrato, preferences for haptic modes vary based on musical training background. This emphasizes the need for adaptable task interfaces and flexible haptic feedback in DMI design.
[57] arXiv:2405.10504 [pdf, ps, other]: Title: Multi-scale Semantic Prior Features Guided Deep Neural Network for Urban Street-view Image

Authors: Jianshun Zeng, Wang Li, Yanjie Lv, Shuai Gao, YuChu Qin

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Street-view image has been widely applied as a crucial mobile mapping data source. The inpainting of street-view images is a critical step for street-view image processing, not only for the privacy protection, but also for the urban environment mapping applications. This paper presents a novel Deep Neural Network (DNN), multi-scale semantic prior Feature guided image inpainting Network (MFN) for inpainting street-view images, which generate static street-view images without moving objects (e.g., pedestrians, vehicles). To enhance global context understanding, a semantic prior prompter is introduced to learn rich semantic priors from large pre-trained model. We design the prompter by stacking multiple Semantic Pyramid Aggregation (SPA) modules, capturing a broad range of visual feature patterns. A semantic-enhanced image generator with a decoder is proposed that incorporates a novel cascaded Learnable Prior Transferring (LPT) module at each scale level. For each decoder block, an attention transfer mechanism is applied to capture long-term dependencies, and the semantic prior features are fused with the image features to restore plausible structure in an adaptive manner. Additionally, a background-aware data processing scheme is adopted to prevent the generation of hallucinated objects within holes. Experiments on Apolloscapes and Cityscapes datasets demonstrate better performance than state-of-the-art methods, with MAE, and LPIPS showing improvements of about 9.5% and 41.07% respectively. Visual comparison survey among multi-group person is also conducted to provide performance evaluation, and the results suggest that the proposed MFN offers a promising solution for privacy protection and generate more reliable scene for urban applications with street-view images.
[58] arXiv:2405.10505 [pdf, other]: Title: Local Time-Stepping for the Shallow Water Equations using CFL Optimized Forward-Backward Runge-Kutta Schemes

Authors: Jeremy R. Lilly, Giacomo Capodaglio, Darren Engwrida, Robert L. Higdon, Mark R. Petersen

Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)

The Courant-Friedrichs-Lewy (CFL) condition is a well known, necessary condition for the stability of explicit time-stepping schemes that effectively places a limit on the size of the largest admittable time-step for a given problem. We formulate and present a new local time-stepping (LTS) scheme optimized, in the CFL sense, for the shallow water equations (SWEs). This new scheme, called FB-LTS, is based on the CFL optimized forward-backward Runge-Kutta schemes from Lilly et al. (2023). We show that FB-LTS maintains exact conservation of mass and absolute vorticity when applied to the TRiSK spatial discretization (Ringler et al., 2010), and provide numerical experiments showing that it retains the temporal order of the scheme on which it is based (second order). In terms of computational performance, we show that when applied to a real-world test case on a highly-variable resolution mesh, the MPAS-Ocean implementation of FB-LTS is up to 10 times faster than the classical four-stage, fourth-order Runge-Kutta method (RK4), and 2.3 times faster than an existing strong stability preserving Runge-Kutta based LTS scheme (LTS3). Despite this significant increase in efficiency, the solutions produced by FB-LTS are qualitatively equivalent to those produced by both RK4 and LTS3.
[59] arXiv:2405.10506 [pdf, other]: Title: Lock-Free Augmented Trees

Authors: Panagiota Fatourou, Eric Ruppert

Comments: 35 pages, 12 figures

Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

Augmenting an existing sequential data structure with extra information to support greater functionality is a widely used technique. For example, search trees are augmented to build sequential data structures like order-statistic trees, interval trees, tango trees, link/cut trees and many others.
We study how to design concurrent augmented tree data structures. We present a new, general technique that can augment a lock-free tree to add any new fields to each tree node, provided the new fields' values can be computed from information in the node and its children. This enables the design of lock-free, linearizable analogues of a wide variety of classical augmented data structures. As a first example, we give a wait-free trie that stores a set $S$ of elements drawn from $\{1,\ldots,N\}$ and supports linearizable order-statistic queries such as finding the $k$th smallest element of $S$. Updates and queries take $O(\log N)$ steps. We also apply our technique to a lock-free binary search tree (BST), where changes to the structure of the tree make the linearization argument more challenging. Our augmented BST supports order statistic queries in $O(h)$ steps on a tree of height $h$. The augmentation does not affect the asymptotic running time of the updates.
For both our trie and BST, we give an alternative augmentation to improve searches and order-statistic queries to run in $O(\log |S|)$ steps (with a small increase in step complexity of updates). As an added bonus, our technique supports arbitrary multi-point queries (such as range queries) with the same time complexity as they would have in the corresponding sequential data structure.
[60] arXiv:2405.10508 [pdf, other]: Title: ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation

Authors: Pengzhi Li, Chengshuai Tang, Qinxuan Huang, Zhiheng Li

Comments: Accepted at CVPR 2024 Workshop on AI3DG

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we explore the existing challenges in 3D artistic scene generation by introducing ART3D, a novel framework that combines diffusion models and 3D Gaussian splatting techniques. Our method effectively bridges the gap between artistic and realistic images through an innovative image semantic transfer algorithm. By leveraging depth information and an initial artistic image, we generate a point cloud map, addressing domain differences. Additionally, we propose a depth consistency module to enhance 3D scene consistency. Finally, the 3D scene serves as initial points for optimizing Gaussian splats. Experimental results demonstrate ART3D's superior performance in both content and structural consistency metrics when compared to existing methods. ART3D significantly advances the field of AI in art creation by providing an innovative solution for generating high-quality 3D artistic scenes.
[61] arXiv:2405.10511 [pdf, ps, other]: Title: Defect Category Prediction Based on Multi-Source Domain Adaptation

Authors: Ying Xing, Mengci Zhao, Bin Yang, Yuwei Zhang, Wenjin Li, Jiawei Gu, Jun Yuan

Comments: 17 pages, in Chinese language, 8 figures (Due to length constraints of the abstract field, please refer to the original PDF file for the full content of abstract.)

Journal-ref: Journal of Software [2024]

Subjects: Software Engineering (cs.SE)

In recent years, defect prediction techniques based on deep learning have become a prominent research topic in the field of software engineering. These techniques can identify potential defects without executing the code. However, existing approaches mostly concentrate on determining the presence of defects at the method-level code, lacking the ability to precisely classify specific defect categories. Consequently, this undermines the efficiency of developers in locating and rectifying defects. Furthermore, in practical software development, new projects often lack sufficient defect data to train high-accuracy deep learning models. Models trained on historical data from existing projects frequently struggle to achieve satisfactory generalization performance on new projects. Hence, this paper initially reformulates the traditional binary defect prediction task into a multi-label classification problem, employing defect categories described in the Common Weakness Enumeration (CWE) as fine-grained predictive labels. To enhance the model performance in cross-project scenarios, this paper proposes a multi-source domain adaptation framework that integrates adversarial training and attention mechanisms. Specifically, the proposed framework employs adversarial training to mitigate domain (i.e., software projects) discrepancies, and further utilizes domain-invariant features to capture feature correlations between each source domain and the target domain. Simultaneously, the proposed framework employs a weighted maximum mean discrepancy as an attention mechanism to minimize the representation distance between source and target domain features, facilitating model in learning more domain-independent features. The experiments on 8 real-world open-source projects show that the proposed approach achieves significant performance improvements compared to state-of-the-art baselines.
[62] arXiv:2405.10512 [pdf, other]: Title: In-context Contrastive Learning for Event Causality Identification

Authors: Chao Liang, Wei Xiang, Bang Wang

Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Event Causality Identification (ECI) aims at determining the existence of a causal relation between two events. Although recent prompt learning-based approaches have shown promising improvements on the ECI task, their performance are often subject to the delicate design of multiple prompts and the positive correlations between the main task and derivate tasks. The in-context learning paradigm provides explicit guidance for label prediction in the prompt learning paradigm, alleviating its reliance on complex prompts and derivative tasks. However, it does not distinguish between positive and negative demonstrations for analogy learning. Motivated from such considerations, this paper proposes an In-Context Contrastive Learning (ICCL) model that utilizes contrastive learning to enhance the effectiveness of both positive and negative demonstrations. Additionally, we apply contrastive learning to event pairs to better facilitate event causality identification. Our ICCL is evaluated on the widely used corpora, including the EventStoryLine and Causal-TimeBank, and results show significant performance improvements over the state-of-the-art algorithms.
[63] arXiv:2405.10513 [pdf, other]: Title: Federated Learning With Energy Harvesting Devices: An MDP Framework

Authors: Kai Zhang, Xuanyu Cao

Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Federated learning (FL) requires edge devices to perform local training and exchange information with a parameter server, leading to substantial energy consumption. A critical challenge in practical FL systems is the rapid energy depletion of battery-limited edge devices, which curtails their operational lifespan and affects the learning performance. To address this issue, we apply energy harvesting technique in FL systems to extract ambient energy for continuously powering edge devices. We first establish the convergence bound for the wireless FL system with energy harvesting devices, illustrating that the convergence is impacted by partial device participation and packet drops, both of which depend on the energy supply. To accelerate the convergence, we formulate a joint device scheduling and power control problem and model it as a Markov decision process (MDP). By solving this MDP, we derive the optimal transmission policy and demonstrate that it possesses a monotone structure with respect to the battery and channel states. To overcome the curse of dimensionality caused by the exponential complexity of computing the optimal policy, we propose a low-complexity algorithm, which is asymptotically optimal as the number of devices increases. Furthermore, for unknown channels and harvested energy statistics, we develop a structure-enhanced deep reinforcement learning algorithm that leverages the monotone structure of the optimal policy to improve the training performance. Finally, extensive numerical experiments on real-world datasets are presented to validate the theoretical results and corroborate the effectiveness of the proposed algorithms.
[64] arXiv:2405.10514 [pdf, other]: Title: Secrecy Performance Analysis of Multi-Functional RIS-Assisted NOMA Networks

Authors: Yingjie Pei, Wanli Ni, Jin Xu, Xinwei Yue, Xiaofeng Tao, Dusit Niyato

Comments: 14 pages, 9 figures, submitted to IEEE transactions on wireless communication

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Although reconfigurable intelligent surface (RIS) can improve the secrecy communication performance of wireless users, it still faces challenges such as limited coverage and double-fading effect. To address these issues, in this paper, we utilize a novel multi-functional RIS (MF-RIS) to enhance the secrecy performance of wireless users, and investigate the physical layer secrecy problem in non-orthogonal multiple access (NOMA) networks. Specifically, we derive closed-form expressions for the secrecy outage probability (SOP) and secrecy throughput of users in the MF-RIS-assisted NOMA networks with external and internal eavesdroppers. The asymptotic expressions for SOP and secrecy diversity order are also analyzed under high signal-to-noise ratio (SNR) conditions. Additionally, we examine the impact of receiver hardware limitations and error transmission-induced imperfect successive interference cancellation (SIC) on the secrecy performance. Numerical results indicate that: i) under the same power budget, the secrecy performance achieved by MF-RIS significantly outperforms active RIS and simultaneously transmitting and reflecting RIS; ii) with increasing power budget, residual interference caused by imperfect SIC surpasses thermal noise as the primary factor affecting secrecy capacity; and iii) deploying additional elements at the MF-RIS brings significant secrecy enhancements for the external eavesdropping scenario, in contrast to the internal eavesdropping case.
[65] arXiv:2405.10515 [pdf, ps, other]: Title: Improved AdaBoost for Virtual Reality Experience Prediction Based on Long Short-Term Memory Network

Authors: Wenhan Fan, Zhicheng Ding, Ruixin Huang, Chang Zhou, Xuyang Zhang

Subjects: Machine Learning (cs.LG)

A classification prediction algorithm based on Long Short-Term Memory Network (LSTM) improved AdaBoost is used to predict virtual reality (VR) user experience. The dataset is randomly divided into training and test sets in the ratio of 7:3.During the training process, the model's loss value decreases from 0.65 to 0.31, which shows that the model gradually reduces the discrepancy between the prediction results and the actual labels, and improves the accuracy and generalisation ability.The final loss value of 0.31 indicates that the model fits the training data well, and is able to make predictions and classifications more accurately. The confusion matrix for the training set shows a total of 177 correct predictions and 52 incorrect predictions, with an accuracy of 77%, precision of 88%, recall of 77% and f1 score of 82%. The confusion matrix for the test set shows a total of 167 correct and 53 incorrect predictions with 75% accuracy, 87% precision, 57% recall and 69% f1 score. In summary, the classification prediction algorithm based on LSTM with improved AdaBoost shows good prediction ability for virtual reality user experience. This study is of great significance to enhance the application of virtual reality technology in user experience. By combining LSTM and AdaBoost algorithms, significant progress has been made in user experience prediction, which not only improves the accuracy and generalisation ability of the model, but also provides useful insights for related research in the field of virtual reality. This approach can help developers better understand user requirements, optimise virtual reality product design, and enhance user satisfaction, promoting the wide application of virtual reality technology in various fields.
[66] arXiv:2405.10516 [pdf, other]: Title: Language Models can Evaluate Themselves via Probability Discrepancy

Authors: Tingyu Xia, Bowen Yu, Yuan Wu, Yi Chang, Chang Zhou

Comments: ACL 2024 Findings

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this paper, we initiate our discussion by demonstrating how Large Language Models (LLMs), when tasked with responding to queries, display a more even probability distribution in their answers if they are more adept, as opposed to their less skilled counterparts. Expanding on this foundational insight, we propose a new self-evaluation method ProbDiff for assessing the efficacy of various LLMs. This approach obviates the necessity for an additional evaluation model or the dependence on external, proprietary models like GPT-4 for judgment. It uniquely utilizes the LLMs being tested to compute the probability discrepancy between the initial response and its revised versions. A higher discrepancy for a given query between two LLMs indicates a relatively weaker capability. Our findings reveal that ProbDiff achieves results on par with those obtained from evaluations based on GPT-4, spanning a range of scenarios that include natural language generation (NLG) tasks such as translation, summarization, and our proposed Xiaohongshu blog writing task, and benchmarks for LLM evaluation like AlignBench, MT-Bench, and AlpacaEval, across LLMs of varying magnitudes.
[67] arXiv:2405.10517 [pdf, other]: Title: Towards Better Question Generation in QA-Based Event Extraction

Authors: Zijin Hong, Jian Liu

Comments: Accepted to ACL2024

Subjects: Computation and Language (cs.CL)

Event Extraction (EE) is an essential information extraction task that aims to extract event-related information from unstructured texts. The paradigm of this task has shifted from conventional classification-based methods to more contemporary question-answering (QA)-based approaches. However, in QA-based EE, the questions' quality dramatically affects the extraction accuracy, and how to generate high-quality questions for QA-based EE still remains a challenge. In this work, to tackle this challenge, we suggest four criteria to evaluate the quality of a question and propose a reinforcement learning method for QA-Based EE that can generate fluent, generalizable, and context-dependent questions and provides clear guidance to QA models. The extensive experiments conducted on ACE and RAMS datasets have strongly validated our approach's effectiveness, which also demonstrates its robustness in scenarios with limited training data.
[68] arXiv:2405.10518 [pdf, ps, other]: Title: Enhancing Perception Quality in Remote Sensing Image Compression via Invertible Neural Network

Authors: Junhui Li, Xingsong Hou

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Decoding remote sensing images to achieve high perceptual quality, particularly at low bitrates, remains a significant challenge. To address this problem, we propose the invertible neural network-based remote sensing image compression (INN-RSIC) method. Specifically, we capture compression distortion from an existing image compression algorithm and encode it as a set of Gaussian-distributed latent variables via INN. This ensures that the compression distortion in the decoded image becomes independent of the ground truth. Therefore, by leveraging the inverse mapping of INN, we can input the decoded image along with a set of randomly resampled Gaussian distributed variables into the inverse network, effectively generating enhanced images with better perception quality. To effectively learn compression distortion, channel expansion, Haar transformation, and invertible blocks are employed to construct the INN. Additionally, we introduce a quantization module (QM) to mitigate the impact of format conversion, thus enhancing the framework's generalization and improving the perceptual quality of enhanced images. Extensive experiments demonstrate that our INN-RSIC significantly outperforms the existing state-of-the-art traditional and deep learning-based image compression methods in terms of perception quality.
[69] arXiv:2405.10521 [pdf, other]: Title: Generative AI for Secure and Privacy-Preserving Mobile Crowdsensing

Authors: Yaoqi Yang, Bangning Zhang, Daoxing Guo, Hongyang Du, Zehui Xiong, Dusit Niyato, Zhu Han

Subjects: Cryptography and Security (cs.CR)

Recently, generative AI has attracted much attention from both academic and industrial fields, which has shown its potential, especially in the data generation and synthesis aspects. Simultaneously, secure and privacy-preserving mobile crowdsensing (SPPMCS) has been widely applied in data collection/ acquirement due to an advantage on low deployment cost, flexible implementation, and high adaptability. Since generative AI can generate new synthetic data to replace the original data to be analyzed and processed, it can lower data attacks and privacy leakage risks for the original data. Therefore, integrating generative AI into SPPMCS is feasible and significant. Moreover, this paper investigates an integration of generative AI in SPPMCS, where we present potential research focuses, solutions, and case studies. Specifically, we firstly review the preliminaries for generative AI and SPPMCS, where their integration potential is presented. Then, we discuss research issues and solutions for generative AI-enabled SPPMCS, including security defense of malicious data injection, illegal authorization, malicious spectrum manipulation at the physical layer, and privacy protection on sensing data content, sensing terminals' identification and location. Next, we propose a framework for sensing data content protection with generative AI, and simulations results have clearly demonstrated the effectiveness of the proposed framework. Finally, we present major research directions for generative AI-enabled SPPMCS.
[70] arXiv:2405.10523 [pdf, other]: Title: Smart Expert System: Large Language Models as Text Classifiers

Authors: Zhiqiang Wang, Yiran Pang, Yanbin Lin

Comments: 11 pages, 3 figures, and 8 tables

Subjects: Computation and Language (cs.CL)

Text classification is a fundamental task in Natural Language Processing (NLP), and the advent of Large Language Models (LLMs) has revolutionized the field. This paper introduces the Smart Expert System, a novel approach that leverages LLMs as text classifiers. The system simplifies the traditional text classification workflow, eliminating the need for extensive preprocessing and domain expertise. The performance of several LLMs, machine learning (ML) algorithms, and neural network (NN) based structures is evaluated on four datasets. Results demonstrate that certain LLMs surpass traditional methods in sentiment analysis, spam SMS detection and multi-label classification. Furthermore, it is shown that the system's performance can be further enhanced through few-shot or fine-tuning strategies, making the fine-tuned model the top performer across all datasets. Source code and datasets are available in this GitHub repository: https://github.com/yeyimilk/llm-zero-shot-classifiers.
[71] arXiv:2405.10526 [pdf, ps, other]: Title: Guidelines for evaluation of complex multi agent test scenarios

Authors: Ana Isabel Garcia Guerra, Teng Sung Shiuan

Subjects: Robotics (cs.RO)

To support the testing of AVs, CETRAN has created a guideline for the evaluation of complex multi agent test scenarios presented in this report. This allows for a clear structured manner in evaluating complexity elements based on the corresponding difficulties an AV might encounter in Singapore traffic. This study aims to understand the source of complexity for AVs from traffic hazard, by breaking down the difficulties on AV capabilities as perception, situation awareness and decision-making. Guidelines created through this study are composed by a list of elements to be considered in the future as selection criteria to evaluate complexity of scenarios to support AV behaviour assessment. This study is intended to be a guide to understand the sources of complexity for Avs and can be used to challenge the risk management ability of autonomous vehicles in a scenario-based test approach or traffic situations faced on road trials.
The report includes the usage of the guidelines created as application to evaluate the complexity of a set of 5 real events that occur on Singapore roads from Resembler webtool which is a database of real human accidents/incidents. Four scenarios were also designed for creation in simulation by the CETRAN team, applying the guidelines for complexity elements created in this work, to illustrate the difficulties an ADS could experience with such scenarios.
[72] arXiv:2405.10529 [pdf, other]: Title: Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors

Authors: Jiachen Sun, Changsheng Wang, Jiongxiao Wang, Yiwei Zhang, Chaowei Xiao

Comments: 15 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Large language models have become increasingly prominent, also signaling a shift towards multimodality as the next frontier in artificial intelligence, where their embeddings are harnessed as prompts to generate textual content. Vision-language models (VLMs) stand at the forefront of this advancement, offering innovative ways to combine visual and textual data for enhanced understanding and interaction. However, this integration also enlarges the attack surface. Patch-based adversarial attack is considered the most realistic threat model in physical vision applications, as demonstrated in many existing literature. In this paper, we propose to address patched visual prompt injection, where adversaries exploit adversarial patches to generate target content in VLMs. Our investigation reveals that patched adversarial prompts exhibit sensitivity to pixel-wise randomization, a trait that remains robust even against adaptive attacks designed to counteract such defenses. Leveraging this insight, we introduce SmoothVLM, a defense mechanism rooted in smoothing techniques, specifically tailored to protect VLMs from the threat of patched visual prompt injectors. Our framework significantly lowers the attack success rate to a range between 0% and 5.0% on two leading VLMs, while achieving around 67.3% to 95.0% context recovery of the benign images, demonstrating a balance between security and usability.
[73] arXiv:2405.10530 [pdf, other]: Title: CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation

Authors: Mushui Liu, Jun Dan, Ziqian Lu, Yunlong Yu, Yingming Li, Xi Li

Comments: 5 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Due to the large-scale image size and object variations, current CNN-based and Transformer-based approaches for remote sensing image semantic segmentation are suboptimal for capturing the long-range dependency or limited to the complex computational complexity. In this paper, we propose CM-UNet, comprising a CNN-based encoder for extracting local image features and a Mamba-based decoder for aggregating and integrating global information, facilitating efficient semantic segmentation of remote sensing images. Specifically, a CSMamba block is introduced to build the core segmentation decoder, which employs channel and spatial attention as the gate activation condition of the vanilla Mamba to enhance the feature interaction and global-local information fusion. Moreover, to further refine the output features from the CNN encoder, a Multi-Scale Attention Aggregation (MSAA) module is employed to merge the different scale features. By integrating the CSMamba block and MSAA module, CM-UNet effectively captures the long-range dependencies and multi-scale global contextual information of large-scale remote-sensing images. Experimental results obtained on three benchmarks indicate that the proposed CM-UNet outperforms existing methods in various performance metrics. The codes are available at https://github.com/XiaoBuL/CM-UNet.
[74] arXiv:2405.10531 [pdf, other]: Title: Nonparametric Teaching of Implicit Neural Representations

Authors: Chen Zhang, Steven Tin Sui Luo, Jason Chun Lok Li, Yik-Chung Wu, Ngai Wong

Comments: ICML 2024 (24 pages, 13 figures)

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

We investigate the learning of implicit neural representation (INR) using an overparameterized multilayer perceptron (MLP) via a novel nonparametric teaching perspective. The latter offers an efficient example selection framework for teaching nonparametrically defined (viz. non-closed-form) target functions, such as image functions defined by 2D grids of pixels. To address the costly training of INRs, we propose a paradigm called Implicit Neural Teaching (INT) that treats INR learning as a nonparametric teaching problem, where the given signal being fitted serves as the target function. The teacher then selects signal fragments for iterative training of the MLP to achieve fast convergence. By establishing a connection between MLP evolution through parameter-based gradient descent and that of function evolution through functional gradient descent in nonparametric teaching, we show for the first time that teaching an overparameterized MLP is consistent with teaching a nonparametric learner. This new discovery readily permits a convenient drop-in of nonparametric teaching algorithms to broadly enhance INR training efficiency, demonstrating 30%+ training time savings across various input modalities.
[75] arXiv:2405.10534 [pdf, other]: Title: CMA-ES for Safe Optimization

Authors: Kento Uchida, Ryoki Hamano, Masahiro Nomura, Shota Saito, Shinichi Shirakawa

Comments: This paper has been accepted as a full paper at GECCO2024

Subjects: Neural and Evolutionary Computing (cs.NE)

In several real-world applications in medical and control engineering, there are unsafe solutions whose evaluations involve inherent risk. This optimization setting is known as safe optimization and formulated as a specialized type of constrained optimization problem with constraints for safety functions. Safe optimization requires performing efficient optimization without evaluating unsafe solutions. A few studies have proposed the optimization methods for safe optimization based on Bayesian optimization and the evolutionary algorithm. However, Bayesian optimization-based methods often struggle to achieve superior solutions, and the evolutionary algorithm-based method fails to effectively reduce unsafe evaluations. This study focuses on CMA-ES as an efficient evolutionary algorithm and proposes an optimization method termed safe CMA-ES. The safe CMA-ES is designed to achieve both safety and efficiency in safe optimization. The safe CMA-ES estimates the Lipschitz constants of safety functions transformed with the distribution parameters using the maximum norm of the gradient in Gaussian process regression. Subsequently, the safe CMA-ES projects the samples to the nearest point in the safe region constructed with the estimated Lipschitz constants. The numerical simulation using the benchmark functions shows that the safe CMA-ES successfully performs optimization, suppressing the unsafe evaluations, while the existing methods struggle to significantly reduce the unsafe evaluations.
[76] arXiv:2405.10536 [pdf, other]: Title: Time-Varying Constraint-Aware Reinforcement Learning for Energy Storage Control

Authors: Jaeik Jeong, Tai-Yeon Ku, Wan-Ki Park

Comments: ICLR 2024 Workshop: Tackling Climate Change with Machine Learning

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Energy storage devices, such as batteries, thermal energy storages, and hydrogen systems, can help mitigate climate change by ensuring a more stable and sustainable power supply. To maximize the effectiveness of such energy storage, determining the appropriate charging and discharging amounts for each time period is crucial. Reinforcement learning is preferred over traditional optimization for the control of energy storage due to its ability to adapt to dynamic and complex environments. However, the continuous nature of charging and discharging levels in energy storage poses limitations for discrete reinforcement learning, and time-varying feasible charge-discharge range based on state of charge (SoC) variability also limits the conventional continuous reinforcement learning. In this paper, we propose a continuous reinforcement learning approach that takes into account the time-varying feasible charge-discharge range. An additional objective function was introduced for learning the feasible action range for each time period, supplementing the objectives of training the actor for policy learning and the critic for value learning. This actively promotes the utilization of energy storage by preventing them from getting stuck in suboptimal states, such as continuous full charging or discharging. This is achieved through the enforcement of the charging and discharging levels into the feasible action range. The experimental results demonstrated that the proposed method further maximized the effectiveness of energy storage by actively enhancing its utilization.
[77] arXiv:2405.10542 [pdf, other]: Title: Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset

Authors: Jie Zhu, Junhui Li, Yalong Wen, Lifan Guo

Comments: Accepted by ACL 2024

Journal-ref: The 62nd Annual Meeting of the Association for Computational Linguistics(ACL),2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In light of recent breakthroughs in large language models (LLMs) that have revolutionized natural language processing (NLP), there is an urgent need for new benchmarks to keep pace with the fast development of LLMs. In this paper, we propose CFLUE, the Chinese Financial Language Understanding Evaluation benchmark, designed to assess the capability of LLMs across various dimensions. Specifically, CFLUE provides datasets tailored for both knowledge assessment and application assessment. In knowledge assessment, it consists of 38K+ multiple-choice questions with associated solution explanations. These questions serve dual purposes: answer prediction and question reasoning. In application assessment, CFLUE features 16K+ test instances across distinct groups of NLP tasks such as text classification, machine translation, relation extraction, reading comprehension, and text generation. Upon CFLUE, we conduct a thorough evaluation of representative LLMs. The results reveal that only GPT-4 and GPT-4-turbo achieve an accuracy exceeding 60\% in answer prediction for knowledge assessment, suggesting that there is still substantial room for improvement in current LLMs. In application assessment, although GPT-4 and GPT-4-turbo are the top two performers, their considerable advantage over lightweight LLMs is noticeably diminished. The datasets and scripts associated with CFLUE are openly accessible at https://github.com/aliyun/cflue.
[78] arXiv:2405.10543 [pdf, other]: Title: SMARD: A Cost Effective Smart Agro Development Technology for Crops Disease Classification

Authors: Tanoy Debnath, Shadman Wadith, Anichur Rahman

Subjects: Cryptography and Security (cs.CR)

Agriculture has a significant role in a country's economy. The "SMARD" project aims to strengthen the country's agricultural sector by giving farmers with the information and tools they need to solve common difficulties and increase productivity. The project provides farmers with information on crop care, seed selection, and disease management best practices, as well as access to tools for recognizing and treating crop diseases. Farmers can also contact the expert panel through text message, voice call, or video call to purchase fertilizer, seeds, and pesticides at low prices, as well as secure bank loans. The project's goal is to empower farmers and rural communities by providing them with the resources they need to increase crop yields. Additionally, the "SMARD" will not only help farmers and rural communities live better lives, but it will also have a good effect on the economy of the nation. Farmers are now able to recognize plant illnesses more quickly because of the application of machine learning techniques based on image processing categorization. Our experiments' results show that our system "SMARD" outperforms the cutting-edge web applications by attaining 97.3% classification accuracy and 96% F1-score in crop disease classification. Overall, our project is an important endeavor for the nation's agricultural sector because its main goal is to give farmers the information, resources, and tools they need to increase crop yields, improve economic outcomes, and improve livelihoods.
[79] arXiv:2405.10545 [pdf, other]: Title: Dynamic Cluster Analysis to Detect and Track Novelty in Network Telescopes

Authors: Kai Huang, Luca Gioacchini, Marco Mellia, Luca Vassio

Subjects: Networking and Internet Architecture (cs.NI)

In the context of cybersecurity, tracking the activities of coordinated hosts over time is a daunting task because both participants and their behaviours evolve at a fast pace. We address this scenario by solving a dynamic novelty discovery problem with the aim of both re-identifying patterns seen in the past and highlighting new patterns. We focus on traffic collected by Network Telescopes, a primary and noisy source for cybersecurity analysis. We propose a 3-stage pipeline: (i) we learn compact representations (embeddings) of hosts through their traffic in a self-supervised fashion; (ii) via clustering, we distinguish groups of hosts performing similar activities; (iii) we track the cluster temporal evolution to highlight novel patterns. We apply our methodology to 20 days of telescope traffic during which we observe more than 8 thousand active hosts. Our results show that we efficiently identify 50-70 well-shaped clusters per day, 60-70% of which we associate with already analysed cases, while we pinpoint 10-20 previously unseen clusters per day. These correspond to activity changes and new incidents, of which we document some. In short, our novelty discovery methodology enormously simplifies the manual analysis the security analysts have to conduct to gain insights to interpret novel coordinated activities.
[80] arXiv:2405.10546 [pdf, other]: Title: You Can't Solve These Super Mario Bros. Levels: Undecidable Mario Games

Authors: MIT Hardness Group, Hayashi Ani, Erik D. Demaine, Holden Hall, Ricardo Ruiz, Naveen Venkat

Subjects: Computational Complexity (cs.CC)

We prove RE-completeness (and thus undecidability) of several 2D games in the Super Mario Bros. platform video game series: the New Super Mario Bros. series (original, Wii, U, and 2), and both Super Mario Maker games in all five game styles (Super Mario Bros. 1 and 3, Super Mario World, New Super Mario Bros. U, and Super Mario 3D World). These results hold even when we restrict to constant-size levels and screens, but they do require generalizing to allow arbitrarily many enemies at each location and onscreen, as well as allowing for exponentially large (or no) timer. Our New Super Mario Bros. constructions fit within one standard screen size. In our Super Mario Maker reductions, we work within the standard screen size and use the property that the game engine remembers offscreen objects that are global because they are supported by "global ground". To prove these Mario results, we build a new theory of counter gadgets in the motion-planning-through-gadgets framework, and provide a suite of simple gadgets for which reachability is RE-complete.
[81] arXiv:2405.10547 [pdf, other]: Title: GPTs Window Shopping: An analysis of the Landscape of Custom ChatGPT Models

Authors: Benjamin Zi Hao Zhao, Muhammad Ikram, Mohamed Ali Kaafar

Comments: 9 pages

Subjects: Social and Information Networks (cs.SI)

OpenAI's ChatGPT initiated a wave of technical iterations in the space of Large Language Models (LLMs) by demonstrating the capability and disruptive power of LLMs. OpenAI has prompted large organizations to respond with their own advancements and models to push the LLM performance envelope. OpenAI has prompted large organizations to respond with their own advancements and models to push the LLM performance envelope. OpenAI's success in spotlighting AI can be partially attributed to decreased barriers to entry, enabling any individual with an internet-enabled device to interact with LLMs. What was previously relegated to a few researchers and developers with necessary computing resources is now available to all. A desire to customize LLMs to better accommodate individual needs prompted OpenAI's creation of the GPT Store, a central platform where users can create and share custom GPT models. Customization comes in the form of prompt-tuning, analysis of reference resources, browsing, and external API interactions, alongside a promise of revenue sharing for created custom GPTs. In this work, we peer into the window of the GPT Store and measure its impact. Our analysis constitutes a large-scale overview of the store exploring community perception, GPT details, and the GPT authors, in addition to a deep-dive into a 3rd party storefront indexing user-submitted GPTs, exploring if creators seek to monetize their creations in the absence of OpenAI's revenue sharing.
[82] arXiv:2405.10548 [pdf, other]: Title: Language Models can Exploit Cross-Task In-context Learning for Data-Scarce Novel Tasks

Authors: Anwoy Chatterjee, Eshaan Tanwar, Subhabrata Dutta, Tanmoy Chakraborty

Comments: Accepted at ACL 2024 Main

Subjects: Computation and Language (cs.CL)

Large Language Models (LLMs) have transformed NLP with their remarkable In-context Learning (ICL) capabilities. Automated assistants based on LLMs are gaining popularity; however, adapting them to novel tasks is still challenging. While colossal models excel in zero-shot performance, their computational demands limit widespread use, and smaller language models struggle without context. This paper investigates whether LLMs can generalize from labeled examples of predefined tasks to novel tasks. Drawing inspiration from biological neurons and the mechanistic interpretation of the Transformer architecture, we explore the potential for information sharing across tasks. We design a cross-task prompting setup with three LLMs and show that LLMs achieve significant performance improvements despite no examples from the target task in the context. Cross-task prompting leads to a remarkable performance boost of 107% for LLaMA-2 7B, 18.6% for LLaMA-2 13B, and 3.2% for GPT 3.5 on average over zero-shot prompting, and performs comparable to standard in-context learning. The effectiveness of generating pseudo-labels for in-task examples is demonstrated, and our analyses reveal a strong correlation between the effect of cross-task examples and model activation similarities in source and target input tokens. This paper offers a first-of-its-kind exploration of LLMs' ability to solve novel tasks based on contextual signals from different task examples.
[83] arXiv:2405.10554 [pdf, other]: Title: NeRO: Neural Road Surface Reconstruction

Authors: Ruibo Wang, Song Zhang, Ping Huang, Donghai Zhang, Haoyu Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In computer vision and graphics, the accurate reconstruction of road surfaces is pivotal for various applications, especially in autonomous driving. This paper introduces a novel method leveraging the Multi-Layer Perceptrons (MLPs) framework to reconstruct road surfaces in height, color, and semantic information by input world coordinates x and y. Our approach NeRO uses encoding techniques based on MLPs, significantly improving the performance of the complex details, speeding up the training speed, and reducing neural network size. The effectiveness of this method is demonstrated through its superior performance, which indicates a promising direction for rendering road surfaces with semantics applications, particularly in applications demanding visualization of road conditions, 4D labeling, and semantic groupings.
[84] arXiv:2405.10556 [pdf, other]: Title: Parameterized Complexity of Dominating Set Variants in Almost Cluster and Split Graphs

Authors: Dishant Goyal, Ashwin Jacob, Kaushtubh Kumar, Diptapriyo Majumdar, Venkatesh Raman

Comments: Some of the results appeared in proceedings of CSR 2018

Subjects: Data Structures and Algorithms (cs.DS)

We consider structural parameterizations of the fundamental Dominating Set problem and its variants in the parameter ecology program. We give improved FPT algorithms and lower bounds under well-known conjectures for dominating set in graphs that are k vertices away from a cluster graph or a split graph. These are graphs in which there is a set of k vertices (called the modulator) whose deletion results in a cluster graph or a split graph. We also call k as the deletion distance (to the appropriate class of graphs). When parameterized by the deletion distance k to cluster graphs - we can find a minimum dominating set (DS) in 3^k n^{O(1)}-time. Within the same time, we can also find a minimum independent dominating set (IDS) or a minimum dominating clique (DC) or a minimum efficient dominating set (EDS) or a minimum total dominating set (TDS). We also show that most of these variants of dominating set do not have polynomial sized kernel. Additionally, we show that when parameterized by the deletion distance k to split graphs - IDS can be solved in 2^k n^{O(1)}-time and EDS can be solved in 3^{k/2}n^{O(1)}.
[85] arXiv:2405.10557 [pdf, other]: Title: Resolving Symmetry Ambiguity in Correspondence-based Methods for Instance-level Object Pose Estimation

Authors: Yongliang Lin, Yongzhi Su, Sandeep Inuganti, Yan Di, Naeem Ajilforoushan, Hanqing Yang, Yu Zhang, Jason Rambach

Comments: 8 pages,10 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Estimating the 6D pose of an object from a single RGB image is a critical task that becomes additionally challenging when dealing with symmetric objects. Recent approaches typically establish one-to-one correspondences between image pixels and 3D object surface vertices. However, the utilization of one-to-one correspondences introduces ambiguity for symmetric objects. To address this, we propose SymCode, a symmetry-aware surface encoding that encodes the object surface vertices based on one-to-many correspondences, eliminating the problem of one-to-one correspondence ambiguity. We also introduce SymNet, a fast end-to-end network that directly regresses the 6D pose parameters without solving a PnP problem. We demonstrate faster runtime and comparable accuracy achieved by our method on the T-LESS and IC-BIN benchmarks of mostly symmetric objects. Our source code will be released upon acceptance.
[86] arXiv:2405.10558 [pdf, other]: Title: CACL: Community-Aware Heterogeneous Graph Contrastive Learning for Social Media Bot Detection

Authors: Sirry Chen, Shuo Feng, Songsong Liang, Chen-Chen Zong, Jing Li, Piji Li

Comments: Accepted by ACL 2024 findings

Subjects: Social and Information Networks (cs.SI)

Social media bot detection is increasingly crucial with the rise of social media platforms. Existing methods predominantly construct social networks as graph and utilize graph neural networks (GNNs) for bot detection. However, most of these methods focus on how to improve the performance of GNNs while neglecting the community structure within social networks. Moreover, GNNs based methods still face problems such as poor model generalization due to the relatively small scale of the dataset and over-smoothness caused by information propagation mechanism. To address these problems, we propose a Community-Aware Heterogeneous Graph Contrastive Learning framework (CACL), which constructs social network as heterogeneous graph with multiple node types and edge types, and then utilizes community-aware module to dynamically mine both hard positive samples and hard negative samples for supervised graph contrastive learning with adaptive graph enhancement algorithms. Extensive experiments demonstrate that our framework addresses the previously mentioned challenges and outperforms competitive baselines on three social media bot benchmarks.
[87] arXiv:2405.10563 [pdf, other]: Title: Function Extrapolation with Neural Networks and Its Application for Manifolds

Authors: Guy Hay, Nir Sharon

Comments: 32 pages, 11 figures

Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

This paper addresses the problem of accurately estimating a function on one domain when only its discrete samples are available on another domain. To answer this challenge, we utilize a neural network, which we train to incorporate prior knowledge of the function. In addition, by carefully analyzing the problem, we obtain a bound on the error over the extrapolation domain and define a condition number for this problem that quantifies the level of difficulty of the setup. Compared to other machine learning methods that provide time series prediction, such as transformers, our approach is suitable for setups where the interpolation and extrapolation regions are general subdomains and, in particular, manifolds. In addition, our construction leads to an improved loss function that helps us boost the accuracy and robustness of our neural network. We conduct comprehensive numerical tests and comparisons of our extrapolation versus standard methods. The results illustrate the effectiveness of our approach in various scenarios.
[88] arXiv:2405.10565 [pdf, other]: Title: Real-time Level-of-Detail Strand-based Hair Rendering

Authors: Tao Huang, Yang Zhou, Daqi Lin, Junqiu Zhu, Ling-Qi Yan, Kui Wu

Comments: 12 pages, 10 figures, 1 performance plot

Subjects: Graphics (cs.GR)

Strand-based hair rendering has become increasingly popular in production for its realistic appearance. However, the prevailing level-of-detail solution employing hair cards for distant hair models introduces a significant discontinuity in dynamics and appearance during the transition from strands to cards. We introduce an innovative real-time framework for strand-based hair rendering that ensures seamless transitions between different levels of detail (LOD) while maintaining a consistent hair appearance. Our method uses elliptical thick hairs that contain multiple hair strands at each LOD to maintain the shapes of hair clusters. In addition to geometric fitting, we formulate an elliptical Bidirectional Curve Scattering Distribution Functions (BCSDF) model for a thick hair, accurately capturing single scattering and multiple scattering within the hair cluster, accommodating a spectrum from sparse to dense hair distributions. Our framework, tested on various hairstyles with dynamics as well as knits, shows that it can produce highly similar appearances to full hair geometries at different viewing distances with seamless LOD transitions, while achieving up to a 3x speedup.
[89] arXiv:2405.10567 [pdf, other]: Title: Team Samsung-RAL: Technical Report for 2024 RoboDrive Challenge-Robust Map Segmentation Track

Authors: Xiaoshuai Hao, Yifan Yang, Hui Zhang, Mengchuan Wei, Yi Zhou, Haimei Zhao, Jing Zhang

Comments: ICRA 2024 RoboDrive Challenge Robust Map Segmentation Track 3rd Place Technical Report. arXiv admin note: text overlap with arXiv:2205.09743 by other authors

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this report, we describe the technical details of our submission to the 2024 RoboDrive Challenge Robust Map Segmentation Track. The Robust Map Segmentation track focuses on the segmentation of complex driving scene elements in BEV maps under varied driving conditions. Semantic map segmentation provides abundant and precise static environmental information crucial for autonomous driving systems' planning and navigation. While current methods excel in ideal circumstances, e.g., clear daytime conditions and fully functional sensors, their resilience to real-world challenges like adverse weather and sensor failures remains unclear, raising concerns about system safety. In this paper, we explored several methods to improve the robustness of the map segmentation task. The details are as follows: 1) Robustness analysis of utilizing temporal information; 2) Robustness analysis of utilizing different backbones; and 3) Data Augmentation to boost corruption robustness. Based on the evaluation results, we draw several important findings including 1) The temporal fusion module is effective in improving the robustness of the map segmentation model; 2) A strong backbone is effective for improving the corruption robustness; and 3) Some data augmentation methods are effective in improving the robustness of map segmentation models. These novel findings allowed us to achieve promising results in the 2024 RoboDrive Challenge-Robust Map Segmentation Track.
[90] arXiv:2405.10572 [pdf, ps, other]: Title: Resonances as a computational tool

Authors: Frédéric Rousset, Katharina Schratz

Subjects: Numerical Analysis (math.NA)

A large toolbox of numerical schemes for dispersive equations has been established, based on different discretization techniques such as discretizing the variation-of-constants formula (e.g., exponential integrators) or splitting the full equation into a series of simpler subproblems (e.g., splitting methods). In many situations these classical schemes allow a precise and efficient approximation. This, however, drastically changes whenever non-smooth phenomena enter the scene such as for problems at low regularity and high oscillations. Classical schemes fail to capture the oscillatory nature of the solution, and this may lead to severe instabilities and loss of convergence. In this article we review a new class of resonance-based schemes. The key idea in the construction of the new schemes is to tackle and deeply embed the underlying nonlinear structure of resonances into the numerical discretization. As in the continuous case, these terms are central to structure preservation and offer the new schemes strong properties at low regularity.
[91] arXiv:2405.10575 [pdf, other]: Title: Accurate Training Data for Occupancy Map Prediction in Automated Driving Using Evidence Theory

Authors: Jonas Kälble, Sascha Wirges, Maxim Tatarchenko, Eddy Ilg

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Automated driving fundamentally requires knowledge about the surrounding geometry of the scene. Modern approaches use only captured images to predict occupancy maps that represent the geometry. Training these approaches requires accurate data that may be acquired with the help of LiDAR scanners. We show that the techniques used for current benchmarks and training datasets to convert LiDAR scans into occupancy grid maps yield very low quality, and subsequently present a novel approach using evidence theory that yields more accurate reconstructions. We demonstrate that these are superior by a large margin, both qualitatively and quantitatively, and that we additionally obtain meaningful uncertainty estimates. When converting the occupancy maps back to depth estimates and comparing them with the raw LiDAR measurements, our method yields a MAE improvement of 30% to 52% on nuScenes and 53% on Waymo over other occupancy ground-truth data. Finally, we use the improved occupancy maps to train a state-of-the-art occupancy prediction method and demonstrate that it improves the MAE by 25% on nuScenes.
[92] arXiv:2405.10576 [pdf, other]: Title: An Efficient Learning Control Framework With Sim-to-Real for String-Type Artificial Muscle-Driven Robotic Systems

Authors: Jiyue Tao, Yunsong Zhang, Sunil Kumar Rajendran, Feitian Zhang, Dexin Zhao, Tongsheng Shen

Subjects: Robotics (cs.RO)

Robotic systems driven by artificial muscles present unique challenges due to the nonlinear dynamics of actuators and the complex designs of mechanical structures. Traditional model-based controllers often struggle to achieve desired control performance in such systems. Deep reinforcement learning (DRL), a trending machine learning technique widely adopted in robot control, offers a promising alternative. However, integrating DRL into these robotic systems faces significant challenges, including the requirement for large amounts of training data and the inevitable sim-to-real gap when deployed to real-world robots. This paper proposes an efficient reinforcement learning control framework with sim-to-real transfer to address these challenges. Bootstrap and augmentation enhancements are designed to improve the data efficiency of baseline DRL algorithms, while a sim-to-real transfer technique, namely randomization of muscle dynamics, is adopted to bridge the gap between simulation and real-world deployment. Extensive experiments and ablation studies are conducted utilizing two string-type artificial muscle-driven robotic systems including a two degree-of-freedom robotic eye and a parallel robotic wrist, the results of which demonstrate the effectiveness of the proposed learning control strategy.
[93] arXiv:2405.10577 [pdf, other]: Title: DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection

Authors: Zhe Huang, Yizhe Zhao, Hao Xiao, Chenyan Wu, Lingting Ge

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Recent advances in multi-view camera-only 3D object detection either rely on an accurate reconstruction of bird's-eye-view (BEV) 3D features or on traditional 2D perspective view (PV) image features. While both have their own pros and cons, few have found a way to stitch them together in order to benefit from "the best of both worlds". To this end, we explore a duo space (i.e., BEV and PV) 3D perception framework, in conjunction with some useful duo space fusion strategies that allow effective aggregation of the two feature representations. To the best of our knowledge, our proposed method, DuoSpaceNet, is the first to leverage two distinct feature spaces and achieves the state-of-the-art 3D object detection and BEV map segmentation results on nuScenes dataset.
[94] arXiv:2405.10578 [pdf, other]: Title: Jacobi Stability Analysis for Systems of ODEs Using Symbolic Computation

Authors: Bo Huang, Dongming Wang, Jing Yang

Comments: To appear in Proceedings of the 2024 International Symposium on Symbolic and Algebraic Computation (ISSAC 2024)

Subjects: Symbolic Computation (cs.SC)

The classical theory of Kosambi-Cartan-Chern (KCC) developed in differential geometry provides a powerful method for analyzing the behaviors of dynamical systems. In the KCC theory, the properties of a dynamical system are described in terms of five geometrical invariants, of which the second corresponds to the so-called Jacobi stability of the system. Different from that of the Lyapunov stability that has been studied extensively in the literature, the analysis of the Jacobi stability has been investigated more recently using geometrical concepts and tools. It turns out that the existing work on the Jacobi stability analysis remains theoretical and the problem of algorithmic and symbolic treatment of Jacobi stability analysis has yet to be addressed. In this paper, we initiate our study on the problem for a class of ODE systems of arbitrary dimension and propose two algorithmic schemes using symbolic computation to check whether a nonlinear dynamical system may exhibit Jacobi stability. The first scheme, based on the construction of the complex root structure of a characteristic polynomial and on the method of quantifier elimination, is capable of detecting the existence of the Jacobi stability of the given dynamical system. The second algorithmic scheme exploits the method of semi-algebraic system solving and allows one to determine conditions on the parameters for a given dynamical system to have a prescribed number of Jacobi stable fixed points. Several examples are presented to demonstrate the effectiveness of the proposed algorithmic schemes.
[95] arXiv:2405.10579 [pdf, other]: Title: A Hard Nut to Crack: Idiom Detection with Conversational Large Language Models

Authors: Francesca De Luca Fornaciari, Begoña Altuna, Itziar Gonzalez-Dios, Maite Melero

Subjects: Computation and Language (cs.CL)

In this work, we explore idiomatic language processing with Large Language Models (LLMs). We introduce the Idiomatic language Test Suite IdioTS, a new dataset of difficult examples specifically designed by language experts to assess the capabilities of LLMs to process figurative language at sentence level. We propose a comprehensive evaluation methodology based on an idiom detection task, where LLMs are prompted with detecting an idiomatic expression in a given English sentence. We present a thorough automatic and manual evaluation of the results and an extensive error analysis.
[96] arXiv:2405.10581 [pdf, other]: Title: Future Aware Safe Active Learning of Time Varying Systems using Gaussian Processes

Authors: Markus Lange-Hegermann, Christoph Zimmer

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Probability (math.PR)

Experimental exploration of high-cost systems with safety constraints, common in engineering applications, is a challenging endeavor. Data-driven models offer a promising solution, but acquiring the requisite data remains expensive and is potentially unsafe. Safe active learning techniques prove essential, enabling the learning of high-quality models with minimal expensive data points and high safety. This paper introduces a safe active learning framework tailored for time-varying systems, addressing drift, seasonal changes, and complexities due to dynamic behavior. The proposed Time-aware Integrated Mean Squared Prediction Error (T-IMSPE) method minimizes posterior variance over current and future states, optimizing information gathering also in the time domain. Empirical results highlight T-IMSPE's advantages in model quality through toy and real-world examples. State of the art Gaussian processes are compatible with T-IMSPE. Our theoretical contributions include a clear delineation which Gaussian process kernels, domains, and weighting measures are suitable for T-IMSPE and even beyond for its non-time aware predecessor IMSPE.
[97] arXiv:2405.10584 [pdf, ps, other]: Title: A Hybrid Deep Learning Framework for Stock Price Prediction Considering the Investor Sentiment of Online Forum Enhanced by Popularity

Authors: Huiyu Li, Junhua Hu

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Stock price prediction has always been a difficult task for forecasters. Using cutting-edge deep learning techniques, stock price prediction based on investor sentiment extracted from online forums has become feasible. We propose a novel hybrid deep learning framework for predicting stock prices. The framework leverages the XLNET model to analyze the sentiment conveyed in user posts on online forums, combines these sentiments with the post popularity factor to compute daily group sentiments, and integrates this information with stock technical indicators into an improved BiLSTM-highway model for stock price prediction. Through a series of comparative experiments involving four stocks on the Chinese stock market, it is demonstrated that the hybrid framework effectively predicts stock prices. This study reveals the necessity of analyzing investors' textual views for stock price prediction.
[98] arXiv:2405.10587 [pdf, other]: Title: RDRec: Rationale Distillation for LLM-based Recommendation

Authors: Xinfeng Wang, Jin Cui, Yoshimi Suzuki, Fumiyo Fukumoto

Comments: 10 pages. Accepted to ACL 2024 Main as a short paper

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Large language model (LLM)-based recommender models that bridge users and items through textual prompts for effective semantic reasoning have gained considerable attention. However, few methods consider the underlying rationales behind interactions, such as user preferences and item attributes, limiting the reasoning capability of LLMs for recommendations. This paper proposes a rationale distillation recommender (RDRec), a compact model designed to learn rationales generated by a larger language model (LM). By leveraging rationales from reviews related to users and items, RDRec remarkably specifies their profiles for recommendations. Experiments show that RDRec achieves state-of-the-art (SOTA) performance in both top-N and sequential recommendations. Our source code is released at https://github.com/WangXFng/RDRec.
[99] arXiv:2405.10589 [pdf, other]: Title: Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance

Authors: I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu, Ming-Hsuan Yang, Sy-Yen Kuo

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

Crowd counting and localization have become increasingly important in computer vision due to their wide-ranging applications. While point-based strategies have been widely used in crowd counting methods, they face a significant challenge, i.e., the lack of an effective learning strategy to guide the matching process. This deficiency leads to instability in matching point proposals to target points, adversely affecting overall performance. To address this issue, we introduce an effective approach to stabilize the proposal-target matching in point-based methods. We propose Auxiliary Point Guidance (APG) to provide clear and effective guidance for proposal selection and optimization, addressing the core issue of matching uncertainty. Additionally, we develop Implicit Feature Interpolation (IFI) to enable adaptive feature extraction in diverse crowd scenarios, further enhancing the model's robustness and accuracy. Extensive experiments demonstrate the effectiveness of our approach, showing significant improvements in crowd counting and localization performance, particularly under challenging conditions. The source codes and trained models will be made publicly available.
[100] arXiv:2405.10591 [pdf, other]: Title: GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision

Authors: Xin Tan, Wenbin Wu, Zhiwei Zhang, Chaojie Fan, Yong Peng, Zhizhong Zhang, Yuan Xie, Lizhuang Ma

Subjects: Computer Vision and Pattern Recognition (cs.CV)

3D occupancy perception holds a pivotal role in recent vision-centric autonomous driving systems by converting surround-view images into integrated geometric and semantic representations within dense 3D grids. Nevertheless, current models still encounter two main challenges: modeling depth accurately in the 2D-3D view transformation stage, and overcoming the lack of generalizability issues due to sparse LiDAR supervision. To address these issues, this paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception. Our approach is three-fold: 1) Integration of explicit lift-based depth prediction and implicit projection-based transformers for depth modeling, enhancing the density and robustness of view transformation. 2) Utilization of mask-based encoder-decoder architecture for fine-grained semantic predictions; 3) Adoption of context-aware self-training loss functions in the pertaining stage to complement LiDAR supervision, involving the re-rendering of 2D depth maps from 3D occupancy features and leveraging image reconstruction loss to obtain denser depth supervision besides sparse LiDAR ground-truths. Our approach achieves State-Of-The-Art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the most weightless image backbone compared with current models, marking an improvement of 3.3% due to our proposed contributions. Comprehensive experimentation also demonstrates the consistent superiority of our method over baselines and alternative approaches.
[101] arXiv:2405.10596 [pdf, other]: Title: CELA: Cost-Efficient Language Model Alignment for CTR Prediction

Authors: Xingmei Wang, Weiwen Liu, Xiaolong Chen, Qi Liu, Xu Huang, Defu Lian, Xiangyang Li, Yasheng Wang, Zhenhua Dong, Ruiming Tang

Comments: 10 pages, 5 figures

Subjects: Information Retrieval (cs.IR)

Click-Through Rate (CTR) prediction holds a paramount position in recommender systems. The prevailing ID-based paradigm underperforms in cold-start scenarios due to the skewed distribution of feature frequency. Additionally, the utilization of a single modality fails to exploit the knowledge contained within textual features. Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs). They design hard prompts to structure raw features into text for each interaction and then apply PLMs for text processing. With external knowledge and reasoning capabilities, PLMs extract valuable information even in cases of sparse interactions. Nevertheless, compared to ID-based models, pure text modeling degrades the efficacy of collaborative filtering, as well as feature scalability and efficiency during both training and inference. To address these issues, we propose \textbf{C}ost-\textbf{E}fficient \textbf{L}anguage Model \textbf{A}lignment (\textbf{CELA}) for CTR prediction. CELA incorporates textual features and language models while preserving the collaborative filtering capabilities of ID-based models. This model-agnostic framework can be equipped with plug-and-play textual features, with item-level alignment enhancing the utilization of external information while maintaining training and inference efficiency. Through extensive offline experiments, CELA demonstrates superior performance compared to state-of-the-art methods. Furthermore, an online A/B test conducted on an industrial App recommender system showcases its practical effectiveness, solidifying the potential for real-world applications of CELA.
[102] arXiv:2405.10597 [pdf, other]: Title: UniCL: A Universal Contrastive Learning Framework for Large Time Series Models

Authors: Jiawei Li, Jingshu Peng, Haoyang Li, Lei Chen

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Time-series analysis plays a pivotal role across a range of critical applications, from finance to healthcare, which involves various tasks, such as forecasting and classification. To handle the inherent complexities of time-series data, such as high dimensionality and noise, traditional supervised learning methods first annotate extensive labels for time-series data in each task, which is very costly and impractical in real-world applications. In contrast, pre-trained foundation models offer a promising alternative by leveraging unlabeled data to capture general time series patterns, which can then be fine-tuned for specific tasks. However, existing approaches to pre-training such models typically suffer from high-bias and low-generality issues due to the use of predefined and rigid augmentation operations and domain-specific data training. To overcome these limitations, this paper introduces UniCL, a universal and scalable contrastive learning framework designed for pretraining time-series foundation models across cross-domain datasets. Specifically, we propose a unified and trainable time-series augmentation operation to generate pattern-preserved, diverse, and low-bias time-series data by leveraging spectral information. Besides, we introduce a scalable augmentation algorithm capable of handling datasets with varying lengths, facilitating cross-domain pretraining. Extensive experiments on two benchmark datasets across eleven domains validate the effectiveness of UniCL, demonstrating its high generalization on time-series analysis across various fields.
[103] arXiv:2405.10598 [pdf, other]: Title: Learning Object-Centric Representation via Reverse Hierarchy Guidance

Authors: Junhong Zou, Xiangyu Zhu, Zhaoxiang Zhang, Zhen Lei

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Object-Centric Learning (OCL) seeks to enable Neural Networks to identify individual objects in visual scenes, which is crucial for interpretable visual comprehension and reasoning. Most existing OCL models adopt auto-encoding structures and learn to decompose visual scenes through specially designed inductive bias, which causes the model to miss small objects during reconstruction. Reverse hierarchy theory proposes that human vision corrects perception errors through a top-down visual pathway that returns to bottom-level neurons and acquires more detailed information, inspired by which we propose Reverse Hierarchy Guided Network (RHGNet) that introduces a top-down pathway that works in different ways in the training and inference processes. This pathway allows for guiding bottom-level features with top-level object representations during training, as well as encompassing information from bottom-level features into perception during inference. Our model achieves SOTA performance on several commonly used datasets including CLEVR, CLEVRTex and MOVi-C. We demonstrate with experiments that our method promotes the discovery of small objects and also generalizes well on complex real-world scenes. Code will be available at https://anonymous.4open.science/r/RHGNet-6CEF.
[104] arXiv:2405.10608 [pdf, other]: Title: ECATS: Explainable-by-design concept-based anomaly detection for time series

Authors: Irene Ferfoglia, Gaia Saveri, Laura Nenzi, Luca Bortolussi

Comments: 14 pages, 8 figures, submitted to 18th International Conference on Neural-Symbolic Learning and Reasoning (NeSy 2024)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Deep learning methods for time series have already reached excellent performances in both prediction and classification tasks, including anomaly detection. However, the complexity inherent in Cyber Physical Systems (CPS) creates a challenge when it comes to explainability methods. To overcome this inherent lack of interpretability, we propose ECATS, a concept-based neuro-symbolic architecture where concepts are represented as Signal Temporal Logic (STL) formulae. Leveraging kernel-based methods for STL, concept embeddings are learnt in an unsupervised manner through a cross-attention mechanism. The network makes class predictions through these concept embeddings, allowing for a meaningful explanation to be naturally extracted for each input. Our preliminary experiments with a simple CPS-based dataset show that our model is able to achieve great classification performance while ensuring local interpretability.
[105] arXiv:2405.10610 [pdf, other]: Title: Driving Referring Video Object Segmentation with Vision-Language Pre-trained Models

Authors: Zikun Zhou, Wentao Xiong, Li Zhou, Xin Li, Zhenyu He, Yaowei Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The crux of Referring Video Object Segmentation (RVOS) lies in modeling dense text-video relations to associate abstract linguistic concepts with dynamic visual contents at pixel-level. Current RVOS methods typically use vision and language models pre-trained independently as backbones. As images and texts are mapped to uncoupled feature spaces, they face the arduous task of learning Vision-Language~(VL) relation modeling from scratch. Witnessing the success of Vision-Language Pre-trained (VLP) models, we propose to learn relation modeling for RVOS based on their aligned VL feature space. Nevertheless, transferring VLP models to RVOS is a deceptively challenging task due to the substantial gap between the pre-training task (image/region-level prediction) and the RVOS task (pixel-level prediction in videos). In this work, we introduce a framework named VLP-RVOS to address this transfer challenge. We first propose a temporal-aware prompt-tuning method, which not only adapts pre-trained representations for pixel-level prediction but also empowers the vision encoder to model temporal clues. We further propose to perform multi-stage VL relation modeling while and after feature extraction for comprehensive VL understanding. Besides, we customize a cube-frame attention mechanism for spatial-temporal reasoning. Extensive experiments demonstrate that our method outperforms state-of-the-art algorithms and exhibits strong generalization abilities.
[106] arXiv:2405.10611 [pdf, ps, other]: Title: A Certified Proof Checker for Deep Neural Network Verification

Authors: Remi Desmartin, Omri Isac, Ekaterina Komendantskaya, Kathrin Stark, Grant Passmore, Guy Katz

Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

Recent advances in the verification of deep neural networks (DNNs) have opened the way for broader usage of DNN verification technology in many application areas, including safety-critical ones. DNN verifiers are themselves complex programs that have been shown to be susceptible to errors and imprecisions; this in turn has raised the question of trust in DNN verifiers. One prominent attempt to address this issue is enhancing DNN verifiers with the capability of producing proofs of their results that are subject to independent algorithmic certification (proof checking). Formulations of proof production and proof checking already exist on top of the state-of-the-art Marabou DNN verifier. The native implementation of the proof checking algorithm for Marabou was done in C++ and itself raised the question of trust in the code (e.g., in the precision of floating point calculations or guarantees for implementation soundness). Here, we present an alternative implementation of the Marabou proof checking algorithm in Imandra -- an industrial functional programming language and prover -- that allows us to obtain an implementation with formal guarantees, including proofs of mathematical results underlying the algorithm, such as the use of the Farkas lemma.
[107] arXiv:2405.10612 [pdf, other]: Title: Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers

Authors: Sheng Yang, Jiawang Bai, Kuofeng Gao, Yong Yang, Yiming Li, Shu-tao Xia

Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Given the power of vision transformers, a new learning paradigm, pre-training and then prompting, makes it more efficient and effective to address downstream visual recognition tasks. In this paper, we identify a novel security threat towards such a paradigm from the perspective of backdoor attacks. Specifically, an extra prompt token, called the switch token in this work, can turn the backdoor mode on, i.e., converting a benign model into a backdoored one. Once under the backdoor mode, a specific trigger can force the model to predict a target class. It poses a severe risk to the users of cloud API, since the malicious behavior can not be activated and detected under the benign mode, thus making the attack very stealthy. To attack a pre-trained model, our proposed attack, named SWARM, learns a trigger and prompt tokens including a switch token. They are optimized with the clean loss which encourages the model always behaves normally even the trigger presents, and the backdoor loss that ensures the backdoor can be activated by the trigger when the switch is on. Besides, we utilize the cross-mode feature distillation to reduce the effect of the switch token on clean samples. The experiments on diverse visual recognition tasks confirm the success of our switchable backdoor attack, i.e., achieving 95%+ attack success rate, and also being hard to be detected and removed. Our code is available at https://github.com/20000yshust/SWARM.
[108] arXiv:2405.10616 [pdf, other]: Title: Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization

Authors: Yixin Ji, Yang Xiang, Juntao Li, Wei Chen, Zhongyi Liu, Kehai Chen, Min Zhang

Comments: Accepted by 2024 ACL findings

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

In recent years, large language models (LLMs) have driven advances in natural language processing. Still, their growing scale has increased the computational burden, necessitating a balance between efficiency and performance. Low-rank compression, a promising technique, reduces non-essential parameters by decomposing weight matrices into products of two low-rank matrices. Yet, its application in LLMs has not been extensively studied. The key to low-rank compression lies in low-rank factorization and low-rank dimensions allocation. To address the challenges of low-rank compression in LLMs, we conduct empirical research on the low-rank characteristics of large models. We propose a low-rank compression method suitable for LLMs. This approach involves precise estimation of feature distributions through pooled covariance matrices and a Bayesian optimization strategy for allocating low-rank dimensions. Experiments on the LLaMA-2 models demonstrate that our method outperforms existing strong structured pruning and low-rank compression techniques in maintaining model performance at the same compression ratio.
[109] arXiv:2405.10618 [pdf, other]: Title: Distributed Event-Based Learning via ADMM

Authors: Guner Dilsad Er, Sebastian Trimpe, Michael Muehlebach

Comments: 29 pages, 12 figures

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

We consider a distributed learning problem, where agents minimize a global objective function by exchanging information over a network. Our approach has two distinct features: (i) It substantially reduces communication by triggering communication only when necessary, and (ii) it is agnostic to the data-distribution among the different agents. We can therefore guarantee convergence even if the local data-distributions of the agents are arbitrarily distinct. We analyze the convergence rate of the algorithm and derive accelerated convergence rates in a convex setting. We also characterize the effect of communication drops and demonstrate that our algorithm is robust to communication failures. The article concludes by presenting numerical results from a distributed LASSO problem, and distributed learning tasks on MNIST and CIFAR-10 datasets. The experiments underline communication savings of 50% or more due to the event-based communication strategy, show resilience towards heterogeneous data-distributions, and highlight that our approach outperforms common baselines such as FedAvg, FedProx, and FedADMM.
[110] arXiv:2405.10620 [pdf, other]: Title: MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains

Authors: Zhaohuan Zhan, Lisha Yu, Sijie Yu, Guang Tan

Subjects: Artificial Intelligence (cs.AI)

In the Vision-and-Language Navigation (VLN) task, the agent is required to navigate to a destination following a natural language instruction. While learning-based approaches have been a major solution to the task, they suffer from high training costs and lack of interpretability. Recently, Large Language Models (LLMs) have emerged as a promising tool for VLN due to their strong generalization capabilities. However, existing LLM-based methods face limitations in memory construction and diversity of navigation strategies. To address these challenges, we propose a suite of techniques. Firstly, we introduce a method to maintain a topological map that stores navigation history, retaining information about viewpoints, objects, and their spatial relationships. This map also serves as a global action space. Additionally, we present a Navigation Chain of Thoughts module, leveraging human navigation examples to enrich navigation strategy diversity. Finally, we establish a pipeline that integrates navigational memory and strategies with perception and action prediction modules. Experimental results on the REVERIE and R2R datasets show that our method effectively enhances the navigation ability of the LLM and improves the interpretability of navigation reasoning.
[111] arXiv:2405.10621 [pdf, other]: Title: Historically Relevant Event Structuring for Temporal Knowledge Graph Reasoning

Authors: Jinchuan Zhang, Bei Hui, Chong Mu, Ming Sun, Ling Tian

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Temporal Knowledge Graph (TKG) reasoning focuses on predicting events through historical information within snapshots distributed on a timeline. Existing studies mainly concentrate on two perspectives of leveraging the history of TKGs, including capturing evolution of each recent snapshot or correlations among global historical facts. Despite the achieved significant accomplishments, these models still fall short of (1) investigating the influences of multi-granularity interactions across recent snapshots and (2) harnessing the expressive semantics of significant links accorded with queries throughout the entire history, especially events exerting a profound impact on the future. These inadequacies restrict representation ability to reflect historical dependencies and future trends thoroughly. To overcome these drawbacks, we propose an innovative TKG reasoning approach towards \textbf{His}torically \textbf{R}elevant \textbf{E}vents \textbf{S}tructuring ($\mathsf{HisRES}$). Concretely, $\mathsf{HisRES}$ comprises two distinctive modules excelling in structuring historically relevant events within TKGs, including a multi-granularity evolutionary encoder that captures structural and temporal dependencies of the most recent snapshots, and a global relevance encoder that concentrates on crucial correlations among events relevant to queries from the entire history. Furthermore, $\mathsf{HisRES}$ incorporates a self-gating mechanism for adaptively merging multi-granularity recent and historically relevant structuring representations. Extensive experiments on four event-based benchmarks demonstrate the state-of-the-art performance of $\mathsf{HisRES}$ and indicate the superiority and effectiveness of structuring historical relevance for TKG reasoning.
[112] arXiv:2405.10622 [pdf, ps, other]: Title: Differentially Private Machine Learning-powered Combinatorial Auction Design

Authors: Arash Jamshidi, Seyed Mohammad Hosseini, Seyed Mahdi Noormousavi, Mahdi Jafari Siavoshani

Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)

We present a new approach to machine learning-powered combinatorial auctions, which is based on the principles of Differential Privacy. Our methodology guarantees that the auction mechanism is truthful, meaning that rational bidders have the incentive to reveal their true valuation functions. We achieve this by inducing truthfulness in the auction dynamics, ensuring that bidders consistently provide accurate information about their valuation functions.
Our method not only ensures truthfulness but also preserves the efficiency of the original auction. This means that if the initial auction outputs an allocation with high social welfare, our modified truthful version of the auction will also achieve high social welfare. We use techniques from Differential Privacy, such as the Exponential Mechanism, to achieve these results. Additionally, we examine the application of differential privacy in auctions across both asymptotic and non-asymptotic regimes.
[113] arXiv:2405.10623 [pdf, other]: Title: Model-free fast charging of lithium-ion batteries by online gradient descent

Authors: Hamed Taghavian, Malin Andersson, Mikael Johansson

Subjects: Systems and Control (eess.SY)

A data-driven solution is provided for the fast-charging problem of lithium-ion batteries with multiple safety and aging constraints. The proposed method optimizes the charging current based on the observed history of measurable battery quantities, such as the input current, terminal voltage, and temperature. The proposed method does not need any detailed battery model or full-charging training episodes. The theoretical convergence is proven under mild conditions and is validated numerically on several linear and nonlinear battery models, including single-particle and equivalent-circuit models.
[114] arXiv:2405.10624 [pdf, ps, other]: Title: Sample-Efficient Constrained Reinforcement Learning with General Parameterization

Authors: Washim Uddin Mondal, Vaneet Aggarwal

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We consider a constrained Markov Decision Problem (CMDP) where the goal of an agent is to maximize the expected discounted sum of rewards over an infinite horizon while ensuring that the expected discounted sum of costs exceeds a certain threshold. Building on the idea of momentum-based acceleration, we develop the Primal-Dual Accelerated Natural Policy Gradient (PD-ANPG) algorithm that guarantees an $\epsilon$ global optimality gap and $\epsilon$ constraint violation with $\mathcal{O}(\epsilon^{-3})$ sample complexity. This improves the state-of-the-art sample complexity in CMDP by a factor of $\mathcal{O}(\epsilon^{-1})$.
[115] arXiv:2405.10625 [pdf, other]: Title: Specialising and Analysing Instruction-Tuned and Byte-Level Language Models for Organic Reaction Prediction

Authors: Jiayun Pang, Ivan Vulić

Comments: Preprint

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Biomolecules (q-bio.BM)

Transformer-based encoder-decoder models have demonstrated impressive results in chemical reaction prediction tasks. However, these models typically rely on pretraining using tens of millions of unlabelled molecules, which can be time-consuming and GPU-intensive. One of the central questions we aim to answer in this work is: Can FlanT5 and ByT5, the encode-decoder models pretrained solely on language data, be effectively specialised for organic reaction prediction through task-specific fine-tuning? We conduct a systematic empirical study on several key issues of the process, including tokenisation, the impact of (SMILES-oriented) pretraining, fine-tuning sample efficiency, and decoding algorithms at inference. Our key findings indicate that although being pretrained only on language tasks, FlanT5 and ByT5 provide a solid foundation to fine-tune for reaction prediction, and thus become `chemistry domain compatible' in the process. This suggests that GPU-intensive and expensive pretraining on a large dataset of unlabelled molecules may be useful yet not essential to leverage the power of language models for chemistry. All our models achieve comparable Top-1 and Top-5 accuracy although some variation across different models does exist. Notably, tokenisation and vocabulary trimming slightly affect final performance but can speed up training and inference; The most efficient greedy decoding strategy is very competitive while only marginal gains can be achieved from more sophisticated decoding algorithms. In summary, we evaluate FlanT5 and ByT5 across several dimensions and benchmark their impact on organic reaction prediction, which may guide more effective use of these state-of-the-art language models for chemistry-related tasks in the future.
[116] arXiv:2405.10626 [pdf, other]: Title: Dynamic data sampler for cross-language transfer learning in large language models

Authors: Yudong Li, Yuhao Feng, Wen Zhou, Zhe Zhao, Linlin Shen, Cheng Hou, Xianxu Hou

Comments: Accepted by ICASSP 2024

Subjects: Computation and Language (cs.CL)

Large Language Models (LLMs) have gained significant attention in the field of natural language processing (NLP) due to their wide range of applications. However, training LLMs for languages other than English poses significant challenges, due to the difficulty in acquiring large-scale corpus and the requisite computing resources. In this paper, we propose ChatFlow, a cross-language transfer-based LLM, to address these challenges and train large Chinese language models in a cost-effective manner. We employ a mix of Chinese, English, and parallel corpus to continuously train the LLaMA2 model, aiming to align cross-language representations and facilitate the knowledge transfer specifically to the Chinese language model. In addition, we use a dynamic data sampler to progressively transition the model from unsupervised pre-training to supervised fine-tuning. Experimental results demonstrate that our approach accelerates model convergence and achieves superior performance. We evaluate ChatFlow on popular Chinese and English benchmarks, the results indicate that it outperforms other Chinese models post-trained on LLaMA-2-7B.
[117] arXiv:2405.10629 [pdf, other]: Title: DeepPavlov at SemEval-2024 Task 8: Leveraging Transfer Learning for Detecting Boundaries of Machine-Generated Texts

Authors: Anastasia Voznyuk, Vasily Konovalov

Comments: New best score from the leaderboard, to appear in SemEval-2024 Workshop proceedings

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection shared task in the SemEval-2024 competition aims to tackle the problem of misusing collaborative human-AI writing. Although there are a lot of existing detectors of AI content, they are often designed to give a binary answer and thus may not be suitable for more nuanced problem of finding the boundaries between human-written and machine-generated texts, while hybrid human-AI writing becomes more and more popular. In this paper, we address the boundary detection problem. Particularly, we present a pipeline for augmenting data for supervised fine-tuning of DeBERTaV3. We receive new best MAE score, according to the leaderboard of the competition, with this pipeline.
[118] arXiv:2405.10630 [pdf, other]: Title: Medical Dialogue: A Survey of Categories, Methods, Evaluation and Challenges

Authors: Xiaoming Shi, Zeming Liu, Li Du, Yuxuan Wang, Hongru Wang, Yuhang Guo, Tong Ruan, Jie Xu, Shaoting Zhang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This paper surveys and organizes research works on medical dialog systems, which is an important yet challenging task. Although these systems have been surveyed in the medical community from an application perspective, a systematic review from a rigorous technical perspective has to date remained noticeably absent. As a result, an overview of the categories, methods, and evaluation of medical dialogue systems remain limited and underspecified, hindering the further improvement of this area. To fill this gap, we investigate an initial pool of 325 papers from well-known computer science, and natural language processing conferences and journals, and make an overview. Recently, large language models have shown strong model capacity on downstream tasks, which also reshaped medical dialog systems' foundation. Despite the alluring practical application value, current medical dialogue systems still suffer from problems. To this end, this paper lists the grand challenges of medical dialog systems, especially of large language models.
[119] arXiv:2405.10632 [pdf, other]: Title: Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks

Authors: Lujain Ibrahim, Saffron Huang, Lama Ahmad, Markus Anderljung

Comments: 15 pages, 1 figure

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions or the process and the outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.
[120] arXiv:2405.10633 [pdf, other]: Title: Harnessing Collective Structure Knowledge in Data Augmentation for Graph Neural Networks

Authors: Rongrong Ma, Guansong Pang, Ling Chen

Subjects: Machine Learning (cs.LG)

Graph neural networks (GNNs) have achieved state-of-the-art performance in graph representation learning. Message passing neural networks, which learn representations through recursively aggregating information from each node and its neighbors, are among the most commonly-used GNNs. However, a wealth of structural information of individual nodes and full graphs is often ignored in such process, which restricts the expressive power of GNNs. Various graph data augmentation methods that enable the message passing with richer structure knowledge have been introduced as one main way to tackle this issue, but they are often focused on individual structure features and difficult to scale up with more structure features. In this work we propose a novel approach, namely collective structure knowledge-augmented graph neural network (CoS-GNN), in which a new message passing method is introduced to allow GNNs to harness a diverse set of node- and graph-level structure features, together with original node features/attributes, in augmented graphs. In doing so, our approach largely improves the structural knowledge modeling of GNNs in both node and graph levels, resulting in substantially improved graph representations. This is justified by extensive empirical results where CoS-GNN outperforms state-of-the-art models in various graph-level learning tasks, including graph classification, anomaly detection, and out-of-distribution generalization.
[121] arXiv:2405.10635 [pdf, other]: Title: Implementation of OpenAPI Wireshark Dissectors to Validate SBI Messages of 5G Core Networks

Authors: Lukas Schauer, Thorsten Horstmann, Steffen Druesedow, Michael Rademacher

Comments: Author's version of a paper accepted for publication in Proceedings of the 28th ITG-Symposium Mobile Communication - Technologies and Applications

Subjects: Networking and Internet Architecture (cs.NI)

This paper introduces a novel Wireshark dissector designed to facilitate the analysis of Service-Based Interface (SBI) communication in 5G Core Networks. Our approach involves parsing the OpenAPI schemes provided by the 5G specification to automatically generate the dissector code. Our tool enables the validation of 5G Core Network traces to ensure compliance with the specifications. Through testing against three open-source 5G Core Network projects, we identified several issues where messages deviate from specification standards, highlighting the significance of our implementation in ensuring protocol conformity and network reliability.
[122] arXiv:2405.10637 [pdf, other]: Title: Layer-Condensed KV Cache for Efficient Inference of Large Language Models

Authors: Haoyi Wu, Kewei Tu

Comments: Accepted to ACL2024 main conference

Subjects: Computation and Language (cs.CL)

Huge memory consumption has been a major bottleneck for deploying high-throughput large language models in real-world applications. In addition to the large number of parameters, the key-value (KV) cache for the attention mechanism in the transformer architecture consumes a significant amount of memory, especially when the number of layers is large for deep language models. In this paper, we propose a novel method that only computes and caches the KVs of a small number of layers, thus significantly saving memory consumption and improving inference throughput. Our experiments on large language models show that our method achieves up to 26$\times$ higher throughput than standard transformers and competitive performance in language modeling and downstream tasks. In addition, our method is orthogonal to existing transformer memory-saving techniques, so it is straightforward to integrate them with our model, achieving further improvement in inference efficiency. Our code is available at https://github.com/whyNLP/LCKV.
[123] arXiv:2405.10640 [pdf, other]: Title: COMET: NFT Price Prediction with Wallet Profiling

Authors: Tianfu Wang, Liwei Deng, Chao Wang, Jianxun Lian, Yue Yan, Nicholas Jing Yuan, Qi Zhang, Hui Xiong

Comments: Accepted by KDD 2024 (ADS Track)

Subjects: Social and Information Networks (cs.SI)

As the non-fungible token (NFT) market flourishes, price prediction emerges as a pivotal direction for investors gaining valuable insight to maximize returns. However, existing works suffer from a lack of practical definitions and standardized evaluations, limiting their practical application. Moreover, the influence of users' multi-behaviour transactions that are publicly accessible on NFT price is still not explored and exhibits challenges. In this paper, we address these gaps by presenting a practical and hierarchical problem definition. This approach unifies both collection-level and token-level task and evaluation methods, which cater to varied practical requirements of investors. To further understand the impact of user behaviours on the variation of NFT price, we propose a general wallet profiling framework and develop a COmmunity enhanced Multi-bEhavior Transaction graph model, named COMET. COMET profiles wallets with a comprehensive view and considers the impact of diverse relations and interactions within the NFT ecosystem on NFT price variations, thereby improving prediction performance. Extensive experiments conducted in our deployed system demonstrate the superiority of COMET, underscoring its potential in the insight toolkit for NFT investors.
[124] arXiv:2405.10642 [pdf, other]: Title: Hi-GMAE: Hierarchical Graph Masked Autoencoders

Authors: Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Dapeng Tao, Jia Wu, Wenbin Hu, Shirui Pan, Bo Du

Comments: 10 pages, 6 figures, 3 tables

Subjects: Machine Learning (cs.LG)

Graph Masked Autoencoders (GMAEs) have emerged as a notable self-supervised learning approach for graph-structured data. Existing GMAE models primarily focus on reconstructing node-level information, categorizing them as single-scale GMAEs. This methodology, while effective in certain contexts, tends to overlook the complex hierarchical structures inherent in many real-world graphs. For instance, molecular graphs exhibit a clear hierarchical organization in the form of the atoms-functional groups-molecules structure. Hence, the inability of single-scale GMAE models to incorporate these hierarchical relationships often leads to their inadequate capture of crucial high-level graph information, resulting in a noticeable decline in performance. To address this limitation, we propose Hierarchical Graph Masked AutoEncoders (Hi-GMAE), a novel multi-scale GMAE framework designed to handle the hierarchical structures within graphs. First, Hi-GMAE constructs a multi-scale graph hierarchy through graph pooling, enabling the exploration of graph structures across different granularity levels. To ensure masking uniformity of subgraphs across these scales, we propose a novel coarse-to-fine strategy that initiates masking at the coarsest scale and progressively back-projects the mask to the finer scales. Furthermore, we integrate a gradual recovery strategy with the masking process to mitigate the learning challenges posed by completely masked subgraphs. Diverging from the standard graph neural network (GNN) used in GMAE models, Hi-GMAE modifies its encoder and decoder into hierarchical structures. This entails using GNN at the finer scales for detailed local graph analysis and employing a graph transformer at coarser scales to capture global information. Our experiments on 15 graph datasets consistently demonstrate that Hi-GMAE outperforms 17 state-of-the-art self-supervised competitors.
[125] arXiv:2405.10645 [pdf, ps, other]: Title: ChatGPT in Classrooms: Transforming Challenges into Opportunities in Education

Authors: Harris Bin Munawar, Nikolaos Misirlis

Comments: 5 pages, 1 figure, 1 table

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

In the era of exponential technology growth, one unexpected guest has claimed a seat in classrooms worldwide, Artificial Intelligence. Generative AI, such as ChatGPT, promises a revolution in education, yet it arrives with a double-edged sword. Its potential for personalized learning is offset by issues of cheating, inaccuracies, and educators struggling to incorporate it effectively into their lesson design. We are standing on the brink of this educational frontier, and it is clear that we need to navigate this terrain with a lot of care. This is a major challenge that could undermine the integrity and value of our educational process. So, how can we turn these challenges into opportunities? When used inappropriately, AI tools can become the perfect tool for the cut copy paste mentality, and quickly begin to corrode critical thinking, creativity, and deep understanding, the most important skills in our rapidly changing world. Teachers feel that they are not equipped to leverage this technology, widening the digital divide among educators and institutions. Addressing these concerns calls for an in depth research approach. We will employ empirical research, drawing on the Technology Acceptance Model, to assess the attitudes toward generative AI among educators and students. Understanding their perceptions, usage patterns, and hurdles is the first crucial step in creating an effective solution. The present study will be used as a process manual for future researchers to apply, running their own data, based on the steps explained here
[126] arXiv:2405.10647 [pdf, other]: Title: Cyclical Weight Consolidation: Towards Solving Catastrophic Forgetting in Serial Federated Learning

Authors: Haoyue Song, Jiacheng Wang, Liansheng Wang

Comments: 12 pages, 8 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Federated Learning (FL) has gained attention for addressing data scarcity and privacy concerns. While parallel FL algorithms like FedAvg exhibit remarkable performance, they face challenges in scenarios with diverse network speeds and concerns about centralized control, especially in multi-institutional collaborations like the medical domain. Serial FL presents an alternative solution, circumventing these challenges by transferring model updates serially between devices in a cyclical manner. Nevertheless, it is deemed inferior to parallel FL in that (1) its performance shows undesirable fluctuations, and (2) it converges to a lower plateau, particularly when dealing with non-IID data. The observed phenomenon is attributed to catastrophic forgetting due to knowledge loss from previous sites. In this paper, to overcome fluctuation and low efficiency in the iterative learning and forgetting process, we introduce cyclical weight consolidation (CWC), a straightforward yet potent approach specifically tailored for serial FL. CWC employs a consolidation matrix to regulate local optimization. This matrix tracks the significance of each parameter on the overall federation throughout the entire training trajectory, preventing abrupt changes in significant weights. During revisitation, to maintain adaptability, old memory undergoes decay to incorporate new information. Our comprehensive evaluations demonstrate that in various non-IID settings, CWC mitigates the fluctuation behavior of the original serial FL approach and enhances the converged performance consistently and significantly. The improved performance is either comparable to or better than the parallel vanilla.
[127] arXiv:2405.10648 [pdf, other]: Title: Optimal Service Placement, Request Routing and CPU Sizing in Cooperative Mobile Edge Computing Networks for Delay-Sensitive Applications

Authors: Naeimeh Omidvar, Mahdieh Ahmadi, Seyed Mohammad Hosseini

Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

We study joint optimization of service placement, request routing, and CPU sizing in a cooperative MEC system. The problem is considered from the perspective of the service provider (SP), which delivers heterogeneous MEC-enabled delay-sensitive services, and needs to pay for the used resources to the mobile network operators and the cloud provider, while earning revenue from the served requests. We formulate the problem of maximizing the SP's total profit subject to the computation, storage, and communication constraints of each edge node and end-to-end delay requirements of the services as a mixed-integer non-convex optimization problem, and prove it to be NP-hard.
To tackle the challenges in solving the problem, we first introduce a design trade-off parameter for different delay requirements of each service, which maintains flexibility in prioritizing them, and transform the original optimization problem by the new delay constraints. Then, by exploiting a hidden convexity, we reformulate the delay constraints into an equivalent form. Next, to handle the challenge of the complicating (integer) variables, using primal decomposition, we decompose the problem into an equivalent form of master and inner sub-problems over the mixed and real variables, respectively. We then employ a cutting-plane approach for building up adequate representations of the extremal value of the inner problem as a function of the complicating variables and the set of values of the complicating variables for which the inner problem is feasible. Finally, we propose a solution strategy based on generalized Benders decomposition and prove its convergence to the optimal solution within a limited number of iterations. Extensive simulation results demonstrate that the proposed scheme significantly outperforms the existing mechanisms in terms of the SP's profit, cache hit ratio, running time, and end-to-end delay.
[128] arXiv:2405.10650 [pdf, other]: Title: SPOR: A Comprehensive and Practical Evaluation Method for Compositional Generalization in Data-to-Text Generation

Authors: Ziyao Xu, Houfeng Wang

Subjects: Computation and Language (cs.CL)

Compositional generalization is an important ability of language models and has many different manifestations. For data-to-text generation, previous research on this ability is limited to a single manifestation called Systematicity and lacks consideration of large language models (LLMs), which cannot fully cover practical application scenarios. In this work, we propose SPOR, a comprehensive and practical evaluation method for compositional generalization in data-to-text generation. SPOR includes four aspects of manifestations (Systematicity, Productivity, Order invariance, and Rule learnability) and allows high-quality evaluation without additional manual annotations based on existing datasets. We demonstrate SPOR on two different datasets and evaluate some existing language models including LLMs. We find that the models are deficient in various aspects of the evaluation and need further improvement. Our work shows the necessity for comprehensive research on different manifestations of compositional generalization in data-to-text generation and provides a framework for evaluation.
[129] arXiv:2405.10658 [pdf, other]: Title: Cost-Effective Fault Tolerance for CNNs Using Parameter Vulnerability Based Hardening and Pruning

Authors: Mohammad Hasan Ahmadilivani, Seyedhamidreza Mousavi, Jaan Raik, Masoud Daneshtalab, Maksim Jenihhin

Comments: 7 pages, 7 figures, 2 tables, 32 references, the paper is accepted at IOLTS 2024

Subjects: Machine Learning (cs.LG)

Convolutional Neural Networks (CNNs) have become integral in safety-critical applications, thus raising concerns about their fault tolerance. Conventional hardware-dependent fault tolerance methods, such as Triple Modular Redundancy (TMR), are computationally expensive, imposing a remarkable overhead on CNNs. Whereas fault tolerance techniques can be applied either at the hardware level or at the model levels, the latter provides more flexibility without sacrificing generality. This paper introduces a model-level hardening approach for CNNs by integrating error correction directly into the neural networks. The approach is hardware-agnostic and does not require any changes to the underlying accelerator device. Analyzing the vulnerability of parameters enables the duplication of selective filters/neurons so that their output channels are effectively corrected with an efficient and robust correction layer. The proposed method demonstrates fault resilience nearly equivalent to TMR-based correction but with significantly reduced overhead. Nevertheless, there exists an inherent overhead to the baseline CNNs. To tackle this issue, a cost-effective parameter vulnerability based pruning technique is proposed that outperforms the conventional pruning method, yielding smaller networks with a negligible accuracy loss. Remarkably, the hardened pruned CNNs perform up to 24\% faster than the hardened un-pruned ones.
[130] arXiv:2405.10659 [pdf, other]: Title: Realistic Evaluation of Toxicity in Large Language Models

Authors: Tinh Son Luong, Thanh-Thien Le, Linh Ngo Van, Thien Huu Nguyen

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) have become integral to our professional workflows and daily lives. Nevertheless, these machine companions of ours have a critical flaw: the huge amount of data which endows them with vast and diverse knowledge, also exposes them to the inevitable toxicity and bias. While most LLMs incorporate defense mechanisms to prevent the generation of harmful content, these safeguards can be easily bypassed with minimal prompt engineering. In this paper, we introduce the new Thoroughly Engineered Toxicity (TET) dataset, comprising manually crafted prompts designed to nullify the protective layers of such models. Through extensive evaluations, we demonstrate the pivotal role of TET in providing a rigorous benchmark for evaluation of toxicity awareness in several popular LLMs: it highlights the toxicity in the LLMs that might remain hidden when using normal prompts, thus revealing subtler issues in their behavior.
[131] arXiv:2405.10661 [pdf, other]: Title: Verification Algorithms for Automated Separation Logic Verifiers

Authors: Marco Eilers, Malte Schwerhoff, Peter Müller

Subjects: Programming Languages (cs.PL)

Most automated program verifiers for separation logic use either symbolic execution or verification condition generation to extract proof obligations, which are then handed over to an SMT solver. Existing verification algorithms are designed to be sound, but differ in performance and completeness. These characteristics may also depend on the programs and properties to be verified. Consequently, developers and users of program verifiers have to select a verification algorithm carefully for their application domain. Taking an informed decision requires a systematic comparison of the performance and completeness characteristics of the verification algorithms used by modern separation logic verifiers, but such a comparison does not exist.
This paper describes five verification algorithms for separation logic, three that are used in existing tools and two novel algorithms that combine characteristics of existing symbolic execution and verification condition generation algorithms. A detailed evaluation of implementations of these five algorithms in the Viper infrastructure assesses their performance and completeness for different classes of input programs. Based on the experimental results, we identify candidate portfolios of algorithms that maximize completeness and performance.
[132] arXiv:2405.10672 [pdf, other]: Title: Pragmatic Communication for Remote Control of Finite-State Markov Processes

Authors: Pietro Talli, Edoardo David Santi, Federico Chiariotti, Touraj Soleymani, Federico Mason, Andrea Zanella, Deniz Gündüz

Comments: Submitted for publication in the IEEE Journal on Selected Areas in Communications

Subjects: Multiagent Systems (cs.MA); Networking and Internet Architecture (cs.NI)

Pragmatic or goal-oriented communication can optimize communication decisions beyond the reliable transmission of data, instead aiming at directly affecting application performance with the minimum channel utilization. In this paper, we develop a general theoretical framework for the remote control of finite-state Markov processes, using pragmatic communication over a costly zero-delay communication channel. To that end, we model a cyber-physical system composed of an encoder, which observes and transmits the states of a process in real-time, and a decoder, which receives that information and controls the behavior of the process. The encoder and the decoder should cooperatively optimize the trade-off between the control performance (i.e., reward) and the communication cost (i.e., channel use). This scenario underscores a pragmatic (i.e., goal-oriented) communication problem, where the purpose is to convey only the data that is most valuable for the underlying task, taking into account the state of the decoder (hence, the pragmatic aspect). We investigate two different decision-making architectures: in pull-based remote control, the decoder is the only decision-maker, while in push-based remote control, the encoder and the decoder constitute two independent decision-makers, leading to a multi-agent scenario. We propose three algorithms to optimize our system (i.e., design the encoder and the decoder policies), discuss the optimality guarantees ofs the algorithms, and shed light on their computational complexity and fundamental limits.
[133] arXiv:2405.10674 [pdf, other]: Title: From Sora What We Can See: A Survey of Text-to-Video Generation

Authors: Rui Sun, Yumin Zhang, Tejal Shah, Jiahao Sun, Shuoying Zhang, Wenqi Li, Haoran Duan, Bo Wei, Rajiv Ranjan

Comments: A comprehensive list of text-to-video generation studies in this survey is available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

With impressive achievements made, artificial intelligence is on the path forward to artificial general intelligence. Sora, developed by OpenAI, which is capable of minute-level world-simulative abilities can be considered as a milestone on this developmental path. However, despite its notable successes, Sora still encounters various obstacles that need to be resolved. In this survey, we embark from the perspective of disassembling Sora in text-to-video generation, and conducting a comprehensive review of literature, trying to answer the question, \textit{From Sora What We Can See}. Specifically, after basic preliminaries regarding the general algorithms are introduced, the literature is categorized from three mutually perpendicular dimensions: evolutionary generators, excellent pursuit, and realistic panorama. Subsequently, the widely used datasets and metrics are organized in detail. Last but more importantly, we identify several challenges and open problems in this domain and propose potential future directions for research and development.
[134] arXiv:2405.10678 [pdf, ps, other]: Title: IT Strategic alignment in the decentralized finance (DeFi): CBDC and digital currencies

Authors: Carlos Alberto Durigan Junior, Fernando Jose Barbin Laurindo

Comments: Keywords: IT Strategic alignment, Decentralized Finance (DeFi), Cryptocurrency, Digital Economy

Subjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR)

Cryptocurrency can be understood as a digital asset transacted among participants in the crypto economy. Every cryptocurrency must have an associated Blockchain. Blockchain is a Distributed Ledger Technology (DLT) which supports cryptocurrencies, this may be considered as the most promising disruptive technology in the industry 4.0 context. Decentralized finance (DeFi) is a Blockchain-based financial infrastructure, the term generally refers to an open, permissionless, and highly interoperable protocol stack built on public smart contract platforms, such as the Ethereum Blockchain. It replicates existing financial services in a more open and transparent way. DeFi does not rely on intermediaries and centralized institutions. Instead, it is based on open protocols and decentralized applications (Dapps). Considering that there are many digital coins, stablecoins and central bank digital currencies (CBDCs), these currencies should interact among each other sometime. For this interaction the Information Technology elements play an important whole as enablers and IT strategic alignment. This paper considers the strategic alignment model proposed by Henderson and Venkatraman (1993) and Luftman (1996). This paper seeks to answer two main questions 1) What are the common IT elements in the DeFi? And 2) How the elements connect to the IT strategic alignment in DeFi? Through a Systematic Literature Review (SLR). Results point out that there are many IT elements already mentioned by literature, however there is a lack in the literature about the connection between IT elements and IT strategic alignment in a Decentralized Finance (DeFi) architectural network. After final considerations, limitations and future research agenda are presented. Keywords: IT Strategic alignment, Decentralized Finance (DeFi), Cryptocurrency, Digital Economy.
[135] arXiv:2405.10679 [pdf, ps, other]: Title: Off-the-Shelf Neural Network Architectures for Forex Time Series Prediction come at a Cost

Authors: Theodoros Zafeiriou, Dimitris Kalles

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Mathematical Finance (q-fin.MF)

Our study focuses on comparing the performance and resource requirements between different Long Short-Term Memory (LSTM) neural network architectures and an ANN specialized architecture for forex market prediction. We analyze the execution time of the models as well as the resources consumed, such as memory and computational power. Our aim is to demonstrate that the specialized architecture not only achieves better results in forex market prediction but also executes using fewer resources and in a shorter time frame compared to LSTM architectures. This comparative analysis will provide significant insights into the suitability of these two types of architectures for time series prediction in the forex market environment.
[136] arXiv:2405.10681 [pdf, other]: Title: Know in AdVance: Linear-Complexity Forecasting of Ad Campaign Performance with Evolving User Interest

Authors: XiaoYu Wang, YongHui Guo, Hui Sheng, Peili Lv, Chi Zhou, Wei Huang, ShiQin Ta, Dongbo Huang, XiuJin Yang, Lan Xu, Hao Zhou, Yusheng Ji

Comments: 12 pages, 4 figures, accepted at ACM SIGKDD 2024

Subjects: Information Retrieval (cs.IR)

Real-time Bidding (RTB) advertisers wish to \textit{know in advance} the expected cost and yield of ad campaigns to avoid trial-and-error expenses. However, Campaign Performance Forecasting (CPF), a sequence modeling task involving tens of thousands of ad auctions, poses challenges of evolving user interest, auction representation, and long context, making coarse-grained and static-modeling methods sub-optimal. We propose \textit{AdVance}, a time-aware framework that integrates local auction-level and global campaign-level modeling. User preference and fatigue are disentangled using a time-positioned sequence of clicked items and a concise vector of all displayed items. Cross-attention, conditioned on the fatigue vector, captures the dynamics of user interest toward each candidate ad. Bidders compete with each other, presenting a complete graph similar to the self-attention mechanism. Hence, we employ a Transformer Encoder to compress each auction into embedding by solving auxiliary tasks. These sequential embeddings are then summarized by a conditional state space model (SSM) to comprehend long-range dependencies while maintaining global linear complexity. Considering the irregular time intervals between auctions, we make SSM's parameters dependent on the current auction embedding and the time interval. We further condition SSM's global predictions on the accumulation of local results. Extensive evaluations and ablation studies demonstrate its superiority over state-of-the-art methods. AdVance has been deployed on the Tencent Advertising platform, and A/B tests show a remarkable 4.5\% uplift in Average Revenue per User (ARPU).
[137] arXiv:2405.10689 [pdf, ps, other]: Title: Revolutionizing Process Mining: A Novel Architecture for ChatGPT Integration and Enhanced User Experience through Optimized Prompt Engineering

Authors: Mehrdad Agha Mohammad Ali Kermani, Hamid Reza Seddighi, Mehrdad Maghsoudi

Subjects: Computation and Language (cs.CL)

In the rapidly evolving field of business process management, there is a growing need for analytical tools that can transform complex data into actionable insights. This research introduces a novel approach by integrating Large Language Models (LLMs), such as ChatGPT, into process mining tools, making process analytics more accessible to a wider audience. The study aims to investigate how ChatGPT enhances analytical capabilities, improves user experience, increases accessibility, and optimizes the architectural frameworks of process mining tools. The key innovation of this research lies in developing a tailored prompt engineering strategy for each process mining submodule, ensuring that the AI-generated outputs are accurate and relevant to the context. The integration architecture follows an Extract, Transform, Load (ETL) process, which includes various process mining engine modules and utilizes zero-shot and optimized prompt engineering techniques. ChatGPT is connected via APIs and receives structured outputs from the process mining modules, enabling conversational interactions. To validate the effectiveness of this approach, the researchers used data from 17 companies that employ BehfaLab's Process Mining Tool. The results showed significant improvements in user experience, with an expert panel rating 72% of the results as "Good". This research contributes to the advancement of business process analysis methodologies by combining process mining with artificial intelligence. Future research directions include further optimization of prompt engineering, exploration of integration with other AI technologies, and assessment of scalability across various business environments. This study paves the way for continuous innovation at the intersection of process mining and artificial intelligence, promising to revolutionize the way businesses analyze and optimize their processes.
[138] arXiv:2405.10690 [pdf, other]: Title: CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing

Authors: Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Weakly supervised audio-visual video parsing (AVVP) methods aim to detect audible-only, visible-only, and audible-visible events using only video-level labels. Existing approaches tackle this by leveraging unimodal and cross-modal contexts. However, we argue that while cross-modal learning is beneficial for detecting audible-visible events, in the weakly supervised scenario, it negatively impacts unaligned audible or visible events by introducing irrelevant modality information. In this paper, we propose CoLeaF, a novel learning framework that optimizes the integration of cross-modal context in the embedding space such that the network explicitly learns to combine cross-modal information for audible-visible events while filtering them out for unaligned events. Additionally, as videos often involve complex class relationships, modelling them improves performance. However, this introduces extra computational costs into the network. Our framework is designed to leverage cross-class relationships during training without incurring additional computations at inference. Furthermore, we propose new metrics to better evaluate a method's capabilities in performing AVVP. Our extensive experiments demonstrate that CoLeaF significantly improves the state-of-the-art results by an average of 1.9% and 2.4% F-score on the LLP and UnAV-100 datasets, respectively.
[139] arXiv:2405.10695 [pdf, ps, other]: Title: On the Design of Super Constellations

Authors: Thrassos K. Oikonomou, Dimitrios Tyrovolas, Sotiris A. Tegos, Panagiotis D. Diamantoulakis, Panagiotis Sarigiannidis, George K. Karagiannidis

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In the evolving landscape of sixth-generation (6G) wireless networks, which demand ultra high data rates, this study introduces the concept of super constellation communications. Also, we present super amplitude phase shift keying (SAPSK), an innovative modulation technique designed to achieve these ultra high data rate demands. SAPSK is complemented by the generalized polar distance detector (GPD-D), which approximates the optimal maximum likelihood detector in channels with Gaussian phase noise (GPN). By leveraging the decision regions formulated by GPD-D, a tight closed-form approximation for the symbol error probability (SEP) of SAPSK constellations is derived, while a detection algorithm with O(1) time complexity is developed to ensure fast and efficient SAPSK symbol detection. Finally, the theoretical performance of SAPSK and the efficiency of the proposed O(1) algorithm are validated by numerical simulations, highlighting both its superiority in terms of SEP compared to various constellations and its practical advantages in terms of fast and accurate symbol detection.
[140] arXiv:2405.10696 [pdf, other]: Title: Autonomous AI-enabled Industrial Sorting Pipeline for Advanced Textile Recycling

Authors: Yannis Spyridis, Vasileios Argyriou, Antonios Sarigiannidis, Panagiotis Radoglou, Panagiotis Sarigiannidis

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The escalating volumes of textile waste globally necessitate innovative waste management solutions to mitigate the environmental impact and promote sustainability in the fashion industry. This paper addresses the inefficiencies of traditional textile sorting methods by introducing an autonomous textile analysis pipeline. Utilising robotics, spectral imaging, and AI-driven classification, our system enhances the accuracy, efficiency, and scalability of textile sorting processes, contributing to a more sustainable and circular approach to waste management. The integration of a Digital Twin system further allows critical evaluation of technical and economic feasibility, providing valuable insights into the sorting system's accuracy and reliability. The proposed framework, inspired by Industry 4.0 principles, comprises five interconnected layers facilitating seamless data exchange and coordination within the system. Preliminary results highlight the potential of our holistic approach to mitigate environmental impact and foster a positive shift towards recycling in the textile industry.
[141] arXiv:2405.10700 [pdf, other]: Title: SynDy: Synthetic Dynamic Dataset Generation Framework for Misinformation Tasks

Authors: Michael Shliselberg, Ashkan Kazemi, Scott A. Hale, Shiri Dori-Hacohen

Journal-ref: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24), July 14--18, 2024, Washington, DC, USA

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)

Diaspora communities are disproportionately impacted by off-the-radar misinformation and often neglected by mainstream fact-checking efforts, creating a critical need to scale-up efforts of nascent fact-checking initiatives. In this paper we present SynDy, a framework for Synthetic Dynamic Dataset Generation to leverage the capabilities of the largest frontier Large Language Models (LLMs) to train local, specialized language models. To the best of our knowledge, SynDy is the first paper utilizing LLMs to create fine-grained synthetic labels for tasks of direct relevance to misinformation mitigation, namely Claim Matching, Topical Clustering, and Claim Relationship Classification. SynDy utilizes LLMs and social media queries to automatically generate distantly-supervised, topically-focused datasets with synthetic labels on these three tasks, providing essential tools to scale up human-led fact-checking at a fraction of the cost of human-annotated data. Training on SynDy's generated labels shows improvement over a standard baseline and is not significantly worse compared to training on human labels (which may be infeasible to acquire). SynDy is being integrated into Meedan's chatbot tiplines that are used by over 50 organizations, serve over 230K users annually, and automatically distribute human-written fact-checks via messaging apps such as WhatsApp. SynDy will also be integrated into our deployed Co-Insights toolkit, enabling low-resource organizations to launch tiplines for their communities. Finally, we envision SynDy enabling additional fact-checking tools such as matching new misinformation claims to high-quality explainers on common misinformation topics.
[142] arXiv:2405.10702 [pdf, ps, other]: Title: Empowering Prior to Court Legal Analysis: A Transparent and Accessible Dataset for Defensive Statement Classification and Interpretation

Authors: Yannis Spyridis, Jean-Paul, Haneen Deeb, Vasileios Argyriou

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

The classification of statements provided by individuals during police interviews is a complex and significant task within the domain of natural language processing (NLP) and legal informatics. The lack of extensive domain-specific datasets raises challenges to the advancement of NLP methods in the field. This paper aims to address some of the present challenges by introducing a novel dataset tailored for classification of statements made during police interviews, prior to court proceedings. Utilising the curated dataset for training and evaluation, we introduce a fine-tuned DistilBERT model that achieves state-of-the-art performance in distinguishing truthful from deceptive statements. To enhance interpretability, we employ explainable artificial intelligence (XAI) methods to offer explainability through saliency maps, that interpret the model's decision-making process. Lastly, we present an XAI interface that empowers both legal professionals and non-specialists to interact with and benefit from our system. Our model achieves an accuracy of 86%, and is shown to outperform a custom transformer architecture in a comparative study. This holistic approach advances the accessibility, transparency, and effectiveness of statement analysis, with promising implications for both legal practice and research.
[143] arXiv:2405.10703 [pdf, other]: Title: Safe Control using Occupancy Grid Map-based Control Barrier Function (OGM-CBF)

Authors: Golnaz Raja, Teemu Mökkönen, Reza Ghabcheloo

Subjects: Robotics (cs.RO)

Safe navigation in unknown environments stands as a significant challenge in the field of robotics. Control Barrier Function (CBF) is a strong mathematical tool to guarantee safety requirements. However, a common assumption in many works is that the CBF is already known and obstacles have predefined shapes. In this letter, we present a novel method called Occupancy Grid Map-based Control Barrier Function (OGM-CBF), which defines Control Barrier Function based on Occupancy Grid Maps. This enables generalization to unknown environments while generating online local or global maps of the environment using onboard perception sensors such as LiDAR or camera. With this method, the system guarantees safety via a single, continuously differentiable CBF per time step, which can be represented as one constraint in the CBF-QP optimization formulation while having an arbitrary number of obstacles with unknown shapes in the environment. This enables practical real-time implementation of CBF in both unknown and known environments. The efficacy of OGM-CBF is demonstrated in the safe control of an autonomous car in the CARLA simulator and a real-world industrial mobile robot.
[144] arXiv:2405.10706 [pdf, other]: Title: Challenging the Human-in-the-loop in Algorithmic Decision-making

Authors: Sebastian Tschiatschek, Eugenia Stamboliev, Timoth ee Schmude, Mark Coeckelbergh, Laura Koesten

Subjects: Machine Learning (cs.LG)

We discuss the role of humans in algorithmic decision-making (ADM) for socially relevant problems from a technical and philosophical perspective. In particular, we illustrate tensions arising from diverse expectations, values, and constraints by and on the humans involved. To this end, we assume that a strategic decision-maker (SDM) introduces ADM to optimize strategic and societal goals while the algorithms' recommended actions are overseen by a practical decision-maker (PDM) - a specific human-in-the-loop - who makes the final decisions. While the PDM is typically assumed to be a corrective, it can counteract the realization of the SDM's desired goals and societal values not least because of a misalignment of these values and unmet information needs of the PDM. This has significant implications for the distribution of power between the stakeholders in ADM, their constraints, and information needs. In particular, we emphasize the overseeing PDM's role as a potential political and ethical decision maker, who acts expected to balance strategic, value-driven objectives and on-the-ground individual decisions and constraints. We demonstrate empirically, on a machine learning benchmark dataset, the significant impact an overseeing PDM's decisions can have even if the PDM is constrained to performing only a limited amount of actions differing from the algorithms' recommendations. To ensure that the SDM's intended values are realized, the PDM needs to be provided with appropriate information conveyed through tailored explanations and its role must be characterized clearly. Our findings emphasize the need for an in-depth discussion of the role and power of the PDM and challenge the often-taken view that just including a human-in-the-loop in ADM ensures the 'correct' and 'ethical' functioning of the system.
[145] arXiv:2405.10707 [pdf, ps, other]: Title: HARIS: Human-Like Attention for Reference Image Segmentation

Authors: Mengxi Zhang, Heqing Lian, Yiming Liu, Kang Rong, Jie Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Referring image segmentation (RIS) aims to locate the particular region corresponding to the language expression. Existing methods incorporate features from different modalities in a \emph{bottom-up} manner. This design may get some unnecessary image-text pairs, which leads to an inaccurate segmentation mask. In this paper, we propose a referring image segmentation method called HARIS, which introduces the Human-Like Attention mechanism and uses the parameter-efficient fine-tuning (PEFT) framework. To be specific, the Human-Like Attention gets a \emph{feedback} signal from multi-modal features, which makes the network center on the specific objects and discard the irrelevant image-text pairs. Besides, we introduce the PEFT framework to preserve the zero-shot ability of pre-trained encoders. Extensive experiments on three widely used RIS benchmarks and the PhraseCut dataset demonstrate that our method achieves state-of-the-art performance and great zero-shot ability.
[146] arXiv:2405.10708 [pdf, other]: Title: Numerical Recovery of the Diffusion Coefficient in Diffusion Equations from Terminal Measurement

Authors: Bangti Jin, Xiliang Lu, Qimeng Quan, Zhi Zhou

Comments: 22 pages, 2 figures

Subjects: Numerical Analysis (math.NA)

In this work, we investigate a numerical procedure for recovering a space-dependent diffusion coefficient in a (sub)diffusion model from the given terminal data, and provide a rigorous numerical analysis of the procedure. By exploiting decay behavior of the observation in time, we establish a novel H{\"o}lder type stability estimate for a large terminal time $T$. This is achieved by novel decay estimates of the (fractional) time derivative of the solution. To numerically recover the diffusion coefficient, we employ the standard output least-squares formulation with an $H^1(\Omega)$-seminorm penalty, and discretize the regularized problem by the Galerkin finite element method with continuous piecewise linear finite elements in space and backward Euler convolution quadrature in time. Further, we provide an error analysis of discrete approximations, and prove a convergence rate that matches the stability estimate. The derived $L^2(\Omega)$ error bound depends explicitly on the noise level, regularization parameter and discretization parameter(s), which gives a useful guideline of the \textsl{a priori} choice of discretization parameters with respect to the noise level in practical implementation. The error analysis is achieved using the conditional stability argument and discrete maximum-norm resolvent estimates. Several numerical experiments are also given to illustrate and complement the theoretical analysis.
[147] arXiv:2405.10713 [pdf, ps, other]: Title: Development of Semantics-Based Distributed Middleware for Heterogeneous Data Integration and its Application for Drought

Authors: A Akanbi

Comments: 286 Pages

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

Drought is a complex environmental phenomenon that affects millions of people and communities all over the globe and is too elusive to be accurately predicted. This is mostly due to the scalability and variability of the web of environmental parameters that directly/indirectly causes the onset of different categories of drought. Since the dawn of man, efforts have been made to uniquely understand the natural indicators that provide signs of likely environmental events. These indicators/signs in the form of indigenous knowledge system have been used for generations. The intricate complexity of drought has, however, always been a major stumbling block for accurate drought prediction and forecasting systems. Recently, scientists in the field of agriculture and environmental monitoring have been discussing the integration of indigenous knowledge and scientific knowledge for a more accurate environmental forecasting system in order to incorporate diverse environmental information for a reliable drought forecast. Hence, in this research, the core objective is the development of a semantics-based data integration middleware that encompasses and integrates heterogeneous data models of local indigenous knowledge and sensor data towards an accurate drought forecasting system for the study areas. The local indigenous knowledge on drought gathered from the domain experts is transformed into rules to be used for performing deductive inference in conjunction with sensors data for determining the onset of drought through an automated inference generation module of the middleware. The semantic middleware incorporates, inter alia, a distributed architecture that consists of a streaming data processing engine based on Apache Kafka for real-time stream processing; a rule-based reasoning module; an ontology module for semantic representation of the knowledge bases.
[148] arXiv:2405.10714 [pdf, ps, other]: Title: Persian Pronoun Resolution: Leveraging Neural Networks and Language Models

Authors: Hassan Haji Mohammadi, Alireza Talebpour, Ahmad Mahmoudi Aznaveh, Samaneh Yazdani

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Coreference resolution, critical for identifying textual entities referencing the same entity, faces challenges in pronoun resolution, particularly identifying pronoun antecedents. Existing methods often treat pronoun resolution as a separate task from mention detection, potentially missing valuable information. This study proposes the first end-to-end neural network system for Persian pronoun resolution, leveraging pre-trained Transformer models like ParsBERT. Our system jointly optimizes both mention detection and antecedent linking, achieving a 3.37 F1 score improvement over the previous state-of-the-art system (which relied on rule-based and statistical methods) on the Mehr corpus. This significant improvement demonstrates the effectiveness of combining neural networks with linguistic models, potentially marking a significant advancement in Persian pronoun resolution and paving the way for further research in this under-explored area.
[149] arXiv:2405.10718 [pdf, other]: Title: SignLLM: Sign Languages Production Large Language Models

Authors: Sen Fang, Lei Wang, Ce Zheng, Yapeng Tian, Chen Chen

Comments: 33 pages, website at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

In this paper, we introduce the first comprehensive multilingual sign language dataset named Prompt2Sign, which builds from public data including American Sign Language (ASL) and seven others. Our dataset transforms a vast array of videos into a streamlined, model-friendly format, optimized for training with translation models like seq2seq and text2text. Building on this new dataset, we propose SignLLM, the first multilingual Sign Language Production (SLP) model, which includes two novel multilingual SLP modes that allow for the generation of sign language gestures from input text or prompt. Both of the modes can use a new loss and a module based on reinforcement learning, which accelerates the training by enhancing the model's capability to autonomously sample high-quality data. We present benchmark results of SignLLM, which demonstrate that our model achieves state-of-the-art performance on SLP tasks across eight sign languages.
[150] arXiv:2405.10722 [pdf, other]: Title: About the Burton-Miller factor in the low frequency region

Authors: Wolfgang Kreuzer

Comments: 12 pages, 9 figures

Subjects: Numerical Analysis (math.NA)

The Burton-Miller method is a widely used approach in acoustics to enhance the stability of the boundary element method for exterior Helmholtz problems at so-called critical frequencies. This method depends on a coupling parameter $\eta$ and it can be shown that as long as $\eta$ has an imaginary part different from 0, the boundary integral formulation for the Helmholtz equation has a unique solution at all frequencies. A popular choice for this parameter is $\eta = \frac{\mathrm{i}}{k}$, where $k$ is the wavenumber. It can be shown that this choice is quasi optimal. However, especially in the low frequency region, where the critical frequencies are still sparsely distributed, different choices for this factor result in a smaller condition number and a smaller error of the solution. In this work, alternative choices for this factor are compared based on numerical experiments. Additionally, a way to enhance the Burton-Miller solution with $\eta = \frac{\mathrm{i}}{k}$ for a sound hard scatterer in the low frequency region by an additional step of a modified Richardson iteration is introduced.
[151] arXiv:2405.10725 [pdf, other]: Title: INDUS: Effective and Efficient Language Models for Scientific Applications

Authors: Bishwaranjan Bhattacharjee, Aashka Trivedi, Masayasu Muraoka, Muthukumaran Ramasubramanian, Takuma Udagawa, Iksha Gurung, Rong Zhang, Bharath Dandala, Rahul Ramachandran, Manil Maskey, Kayleen Bugbee, Mike Little, Elizabeth Fancher, Lauren Sanders, Sylvain Costes, Sergi Blanco-Cuaresma, Kelly Lockhart, Thomas Allen, Felix Grazes, Megan Ansdel, Alberto Accomazzi, Yousef El-Kurdi, Davis Wertheimer, Birgit Pfitzmann, Cesar Berrospi Ramis, Michele Dolfi, Rafael Teixeira de Lima, Panos Vegenas, S. Karthik Mukkavilli, Peter Staar, Sanaz Vahidinia, Ryan McGranaghan, Armin Mehrabian, Tsendgar Lee

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics, planetary sciences and astrophysics domains and trained using curated scientific corpora drawn from diverse data sources. The suite of models include: (1) an encoder model trained using domain-specific vocabulary and corpora to address natural language understanding tasks, (2) a contrastive-learning-based general text embedding model trained using a diverse set of datasets drawn from multiple sources to address information retrieval tasks and (3) smaller versions of these models created using knowledge distillation techniques to address applications which have latency or resource constraints. We also created three new scientific benchmark datasets namely, CLIMATE-CHANGE-NER (entity-recognition), NASA-QA (extractive QA) and NASA-IR (IR) to accelerate research in these multi-disciplinary fields. Finally, we show that our models outperform both general-purpose encoders (RoBERTa) and existing domain-specific encoders (SciBERT) on these new tasks as well as existing benchmark tasks in the domains of interest.
[152] arXiv:2405.10729 [pdf, other]: Title: Contestable AI needs Computational Argumentation

Authors: Francesco Leofante, Hamed Ayoobi, Adam Dejl, Gabriel Freedman, Deniz Gorur, Junqi Jiang, Guilherme Paulino-Passos, Antonio Rago, Anna Rapberger, Fabrizio Russo, Xiang Yin, Dekai Zhang, Francesca Toni

Subjects: Artificial Intelligence (cs.AI)

AI has become pervasive in recent years, but state-of-the-art approaches predominantly neglect the need for AI systems to be contestable. Instead, contestability is advocated by AI guidelines (e.g. by the OECD) and regulation of automated decision-making (e.g. GDPR). In this position paper we explore how contestability can be achieved computationally in and for AI. We argue that contestable AI requires dynamic (human-machine and/or machine-machine) explainability and decision-making processes, whereby machines can (i) interact with humans and/or other machines to progressively explain their outputs and/or their reasoning as well as assess grounds for contestation provided by these humans and/or other machines, and (ii) revise their decision-making processes to redress any issues successfully raised during contestation. Given that much of the current AI landscape is tailored to static AIs, the need to accommodate contestability will require a radical rethinking, that, we argue, computational argumentation is ideally suited to support.
[153] arXiv:2405.10736 [pdf, other]: Title: StackOverflowVQA: Stack Overflow Visual Question Answering Dataset

Authors: Motahhare Mirzaei, Mohammad Javad Pirhadi, Sauleh Eetemadi

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent years, people have increasingly used AI to help them with their problems by asking questions on different topics. One of these topics can be software-related and programming questions. In this work, we focus on the questions which need the understanding of images in addition to the question itself. We introduce the StackOverflowVQA dataset, which includes questions from StackOverflow that have one or more accompanying images. This is the first VQA dataset that focuses on software-related questions and contains multiple human-generated full-sentence answers. Additionally, we provide a baseline for answering the questions with respect to images in the introduced dataset using the GIT model. All versions of the dataset are available at https://huggingface.co/mirzaei2114.
[154] arXiv:2405.10738 [pdf, other]: Title: Feature-Adaptive and Data-Scalable In-Context Learning

Authors: Jiahao Li, Quan Wang, Licheng Zhang, Guoqing Jin, Zhendong Mao

Comments: Accepted at ACL 2024 main conference

Subjects: Computation and Language (cs.CL)

In-context learning (ICL), which promotes inference with several demonstrations, has become a widespread paradigm to stimulate LLM capabilities for downstream tasks. Due to context length constraints, it cannot be further improved in spite of more training data, and general features directly from LLMs in ICL are not adaptive to the specific downstream task. In this paper, we propose a feature-adaptive and data-scalable in-context learning framework (FADS-ICL), which can leverage task-adaptive features to promote inference on the downstream task, with the supervision of beyond-context samples. Specifically, it first extracts general features of beyond-context samples via the LLM with ICL input form one by one, and introduces a task-specific modulator to perform feature refinement and prediction after fitting a specific downstream task. We conduct extensive experiments on FADS-ICL under varying data settings (4$\sim$128 shots) and LLM scale (0.8$\sim$70B) settings. Experimental results show that FADS-ICL consistently outperforms previous state-of-the-art methods by a significant margin under all settings, verifying the effectiveness and superiority of FADS-ICL. For example, under the 1.5B and 32 shots setting, FADS-ICL can achieve \textbf{+14.3} average accuracy from feature adaptation over vanilla ICL on 10 datasets, with \textbf{+6.2} average accuracy over the previous state-of-the-art method, and the performance can further improve with increasing training data. Code and data are publicly available at \url{https://github.com/jiahaozhenbang/FADS-ICL}.
[155] arXiv:2405.10739 [pdf, other]: Title: Efficient Multimodal Large Language Models: A Survey

Authors: Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang Ma

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry. Thus, studying efficient and lightweight MLLMs has enormous potential, especially in edge computing scenarios. In this survey, we provide a comprehensive and systematic review of the current state of efficient MLLMs. Specifically, we summarize the timeline of representative efficient MLLMs, research state of efficient structures and strategies, and the applications. Finally, we discuss the limitations of current efficient MLLM research and promising future directions. Please refer to our GitHub repository for more details: https://github.com/lijiannuist/Efficient-Multimodal-LLMs-Survey.
[156] arXiv:2405.10741 [pdf, other]: Title: SBAAM! Eliminating Transcript Dependency in Automatic Subtitling

Authors: Marco Gaido, Sara Papi, Matteo Negri, Mauro Cettolo, Luisa Bentivogli

Comments: Accepted to ACL 2024 main conference

Subjects: Computation and Language (cs.CL)

Subtitling plays a crucial role in enhancing the accessibility of audiovisual content and encompasses three primary subtasks: translating spoken dialogue, segmenting translations into concise textual units, and estimating timestamps that govern their on-screen duration. Past attempts to automate this process rely, to varying degrees, on automatic transcripts, employed diversely for the three subtasks. In response to the acknowledged limitations associated with this reliance on transcripts, recent research has shifted towards transcription-free solutions for translation and segmentation, leaving the direct generation of timestamps as uncharted territory. To fill this gap, we introduce the first direct model capable of producing automatic subtitles, entirely eliminating any dependence on intermediate transcripts also for timestamp prediction. Experimental results, backed by manual evaluation, showcase our solution's new state-of-the-art performance across multiple language pairs and diverse conditions.
[157] arXiv:2405.10743 [pdf, other]: Title: Occupancy-SLAM: Simultaneously Optimizing Robot Poses and Continuous Occupancy Map

Authors: Liang Zhao, Yingyu Wang, Shoudong Huang

Comments: This paper has been accpeted by Robotics: Science and Systems 2022

Journal-ref: Robotics: Science and Systems 2022

Subjects: Robotics (cs.RO)

In this paper, we propose an optimization based SLAM approach to simultaneously optimize the robot trajectory and the occupancy map using 2D laser scans (and odometry) information. The key novelty is that the robot poses and the occupancy map are optimized together, which is significantly different from existing occupancy mapping strategies where the robot poses need to be obtained first before the map can be estimated. In our formulation, the map is represented as a continuous occupancy map where each 2D point in the environment has a corresponding evidence value. The Occupancy-SLAM problem is formulated as an optimization problem where the variables include all the robot poses and the occupancy values at the selected discrete grid cell nodes. We propose a variation of Gauss-Newton method to solve this new formulated problem, obtaining the optimized occupancy map and robot trajectory together with their uncertainties. Our algorithm is an offline approach since it is based on batch optimization and the number of variables involved is large. Evaluations using simulations and publicly available practical 2D laser datasets demonstrate that the proposed approach can estimate the maps and robot trajectories more accurately than the state-of-the-art techniques, when a relatively accurate initial guess is provided to our algorithm. The video shows the convergence process of the proposed Occupancy-SLAM and comparison of results to Cartographer can be found at \url{https://youtu.be/4oLyVEUC4iY}.
[158] arXiv:2405.10745 [pdf, other]: Title: Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings

Authors: Albert Sawczyn, Jakub Binkowski, Piotr Bielak, Tomasz Kajdanowicz

Comments: Accepted for LREC-COLING 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Knowledge-intensive tasks pose a significant challenge for Machine Learning (ML) techniques. Commonly adopted methods, such as Large Language Models (LLMs), often exhibit limitations when applied to such tasks. Nevertheless, there have been notable endeavours to mitigate these challenges, with a significant emphasis on augmenting LLMs through Knowledge Graphs (KGs). While KGs provide many advantages for representing knowledge, their development costs can deter extensive research and applications. Addressing this limitation, we introduce a framework for enriching embeddings of small-scale domain-specific Knowledge Graphs with well-established general-purpose KGs. Adopting our method, a modest domain-specific KG can benefit from a performance boost in downstream tasks when linked to a substantial general-purpose KG. Experimental evaluations demonstrate a notable enhancement, with up to a 44% increase observed in the Hits@10 metric. This relatively unexplored research direction can catalyze more frequent incorporation of KGs in knowledge-intensive tasks, resulting in more robust, reliable ML implementations, which hallucinates less than prevalent LLM solutions.
Keywords: knowledge graph, knowledge graph completion, entity alignment, representation learning, machine learning
[159] arXiv:2405.10746 [pdf, other]: Title: Causality in the Can: Diet Coke's Impact on Fatness

Authors: Yicheng Qi, Ang Li

Comments: 9 pages, 4 figures, uses aaai24.sty

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

Artificially sweetened beverages like Diet Coke are often considered healthier alternatives, but the debate over their impact on obesity persists. Previous research has predominantly relied on observational data or randomized controlled trials (RCTs), which may not accurately capture the causal relationship between Diet Coke consumption and obesity. This study uses causal inference methods, employing data from the National Health and Nutrition Examination Survey (NHANES) to examine this relationship across diverse demographics. Instead of relying on RCT data, we constructed a causal graph and applied the back-door criterion with its adjustment formula to estimate the RCT distributions. We then calculated the counterfactual quantity, the Probability of Necessity and Sufficiency (PNS), using both NHANES data and estimated RCT data. We propose that PNS is the essential metric for assessing the impact of Diet Coke on obesity. Our results indicate that between 20% to 50% of individuals, especially those with poor dietary habits, are more likely to gain weight from Diet Coke. Conversely, in groups like young females with healthier diets, only a small proportion experience weight gain due to Diet Coke. These findings highlight the influence of individual lifestyle and potential hormonal factors on the varied effects of Diet Coke, providing a new framework for understanding its nutritional impacts on health.
[160] arXiv:2405.10748 [pdf, other]: Title: Deep Data Consistency: a Fast and Robust Diffusion Model-based Solver for Inverse Problems

Authors: Hanyu Chen, Zhixiu Hao, Liying Xiao

Comments: Codes: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion models have become a successful approach for solving various image inverse problems by providing a powerful diffusion prior. Many studies tried to combine the measurement into diffusion by score function replacement, matrix decomposition, or optimization algorithms, but it is hard to balance the data consistency and realness. The slow sampling speed is also a main obstacle to its wide application. To address the challenges, we propose Deep Data Consistency (DDC) to update the data consistency step with a deep learning model when solving inverse problems with diffusion models. By analyzing existing methods, the variational bound training objective is used to maximize the conditional posterior and reduce its impact on the diffusion process. In comparison with state-of-the-art methods in linear and non-linear tasks, DDC demonstrates its outstanding performance of both similarity and realness metrics in generating high-quality solutions with only 5 inference steps in 0.77 seconds on average. In addition, the robustness of DDC is well illustrated in the experiments across datasets, with large noise and the capacity to solve multiple tasks in only one pre-trained model.
[161] arXiv:2405.10750 [pdf, other]: Title: Parameter Identification for Electrochemical Models of Lithium-Ion Batteries Using Bayesian Optimization

Authors: Jianzong Pi, Samuel Filgueira da Silva, Mehmet Fatih Ozkan, Abhishek Gupta, Marcello Canova

Comments: 6 pages

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Efficient parameter identification of electrochemical models is crucial for accurate monitoring and control of lithium-ion cells. This process becomes challenging when applied to complex models that rely on a considerable number of interdependent parameters that affect the output response. Gradient-based and metaheuristic optimization techniques, although previously employed for this task, are limited by their lack of robustness, high computational costs, and susceptibility to local minima. In this study, Bayesian Optimization is used for tuning the dynamic parameters of an electrochemical equivalent circuit battery model (E-ECM) for a nickel-manganese-cobalt (NMC)-graphite cell. The performance of the Bayesian Optimization is compared with baseline methods based on gradient-based and metaheuristic approaches. The robustness of the parameter optimization method is tested by performing verification using an experimental drive cycle. The results indicate that Bayesian Optimization outperforms Gradient Descent and PSO optimization techniques, achieving reductions on average testing loss by 28.8% and 5.8%, respectively. Moreover, Bayesian optimization significantly reduces the variance in testing loss by 95.8% and 72.7%, respectively.
[162] arXiv:2405.10751 [pdf, other]: Title: Some remarks on a mathematical model for water flow in porous media with competition between transport and diffusion

Authors: Judita Runcziková, Jan Chleboun, Chiara Gavioli, Pavel Krejčí

Comments: 7 pages

Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)

The contribution deals with the mathematical modelling of fluid flow in porous media, in particular water flow in soils. The motivation is to describe the competition between gravity and capillarity, or, in other words, between transport and diffusion. The analysis is based on a mathematical model developed by B. Detmann, C. Gavioli, and P. Krej\v{c}\'i, in which the effects of gravity are included in a novel way. The model consists of a nonlinear partial differential equation describing both the gravitational transport and the capillary diffusion of water. Although analytical solutions can be obtained for some special cases, only numerical solutions are available in more general situations. The solving algorithm is based on a time discretisation and the finite element method, and is written in Matlab. The results of the numerical simulations are shown and the behaviour of the model is discussed.
[163] arXiv:2405.10757 [pdf, other]: Title: Rethinking Graph Backdoor Attacks: A Distribution-Preserving Perspective

Authors: Zhiwei Zhang, Minhua Lin, Enyan Dai, Suhang Wang

Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Graph Neural Networks (GNNs) have shown remarkable performance in various tasks. However, recent works reveal that GNNs are vulnerable to backdoor attacks. Generally, backdoor attack poisons the graph by attaching backdoor triggers and the target class label to a set of nodes in the training graph. A GNN trained on the poisoned graph will then be misled to predict test nodes attached with trigger to the target class. Despite their effectiveness, our empirical analysis shows that triggers generated by existing methods tend to be out-of-distribution (OOD), which significantly differ from the clean data. Hence, these injected triggers can be easily detected and pruned with widely used outlier detection methods in real-world applications. Therefore, in this paper, we study a novel problem of unnoticeable graph backdoor attacks with in-distribution (ID) triggers. To generate ID triggers, we introduce an OOD detector in conjunction with an adversarial learning strategy to generate the attributes of the triggers within distribution. To ensure a high attack success rate with ID triggers, we introduce novel modules designed to enhance trigger memorization by the victim model trained on poisoned graph. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method in generating in distribution triggers that can by-pass various defense strategies while maintaining a high attack success rate.
[164] arXiv:2405.10758 [pdf, other]: Title: Seeing is (Not) Believing: Practical Phishing Attacks Targeting Social Media Sharing Cards

Authors: Wangchenlu Huang, Shenao Wang, Yanjie Zhao, Guosheng Xu, Haoyu Wang

Subjects: Cryptography and Security (cs.CR)

In the digital era, Online Social Networks (OSNs) play a crucial role in information dissemination, with sharing cards for link previews emerging as a key feature. These cards offer snapshots of shared content, including titles, descriptions, and images. In this study, we investigate the construction and dissemination mechanisms of these cards, focusing on two primary server-side generation methods based on Share-SDK and HTML meta tags. Our investigation reveals a novel type of attack, i.e., Sharing Card Forgery (SCF) attack that can be exploited to create forged benign sharing cards for malicious links. We demonstrate the feasibility of these attacks through practical implementations and evaluate their effectiveness across 13 various online social networks. Our findings indicate a significant risk, as the deceptive cards can evade detection and persist on social platforms, thus posing a substantial threat to user security. We also delve into countermeasures and discuss the challenges in effectively mitigating these types of attacks. This study not only sheds light on a novel phishing technique but also calls for heightened awareness and improved defensive strategies in the OSN ecosystem.
[165] arXiv:2405.10765 [pdf, other]: Title: Fast Collision Probability Estimation for Automated Driving using Multi-circular Shape Approximations

Authors: Leon Tolksdorf, Christian Birkner, Arturo Tejada, Nathan van de Wouw

Comments: Accepted for the 2024 Intelligent Vehicles Symposium, 8 pages

Subjects: Robotics (cs.RO); Probability (math.PR)

Many state-of-the-art methods for safety assessment and motion planning for automated driving require estimation of the probability of collision (POC). To estimate the POC, a shape approximation of the colliding actors and probability density functions of the associated uncertain kinematic variables are required. Even with such information available, the derivation of the POC is in general, i.e., for any shape and density, only possible with Monte Carlo sampling (MCS). Random sampling of the POC, however, is challenging as computational resources are limited in real-world applications. We present expressions for the POC in the presence of Gaussian uncertainties, based on multi-circular shape approximations. In addition, we show that the proposed approach is computationally more efficient than MCS. Lastly, we provide a method for upper and lower bounding the estimation error for the POC induced by the used shape approximations.
[166] arXiv:2405.10767 [pdf, other]: Title: Evaluating Saliency Explanations in NLP by Crowdsourcing

Authors: Xiaotian Lu, Jiyi Li, Zhen Wan, Xiaofeng Lin, Koh Takeuchi, Hisashi Kashima

Comments: 13 pages, 4 figures, Accepted for LREC-Coling 2024 (Oral)

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Deep learning models have performed well on many NLP tasks. However, their internal mechanisms are typically difficult for humans to understand. The development of methods to explain models has become a key issue in the reliability of deep learning models in many important applications. Various saliency explanation methods, which give each feature of input a score proportional to the contribution of output, have been proposed to determine the part of the input which a model values most. Despite a considerable body of work on the evaluation of saliency methods, whether the results of various evaluation metrics agree with human cognition remains an open question. In this study, we propose a new human-based method to evaluate saliency methods in NLP by crowdsourcing. We recruited 800 crowd workers and empirically evaluated seven saliency methods on two datasets with the proposed method. We analyzed the performance of saliency methods, compared our results with existing automated evaluation methods, and identified notable differences between NLP and computer vision (CV) fields when using saliency methods. The instance-level data of our crowdsourced experiments and the code to reproduce the explanations are available at https://github.com/xtlu/lreccoling_evaluation.
[167] arXiv:2405.10768 [pdf, ps, other]: Title: What should be observed for optimal reward in POMDPs?

Authors: Alyzia-Maria Konsta, Alberto Lluch Lafuente, Christoph Matheja

Subjects: Artificial Intelligence (cs.AI)

Partially observable Markov Decision Processes (POMDPs) are a standard model for agents making decisions in uncertain environments. Most work on POMDPs focuses on synthesizing strategies based on the available capabilities. However, system designers can often control an agent's observation capabilities, e.g. by placing or selecting sensors. This raises the question of how one should select an agent's sensors cost-effectively such that it achieves the desired goals. In this paper, we study the novel optimal observability problem OOP: Given a POMDP M, how should one change M's observation capabilities within a fixed budget such that its (minimal) expected reward remains below a given threshold? We show that the problem is undecidable in general and decidable when considering positional strategies only. We present two algorithms for a decidable fragment of the OOP: one based on optimal strategies of M's underlying Markov decision process and one based on parameter synthesis with SMT. We report promising results for variants of typical examples from the POMDP literature.
[168] arXiv:2405.10774 [pdf, ps, other]: Title: Injective hardness condition for PCSPs

Authors: Demian Banakh, Marcin Kozik

Comments: To be published in LICS'24

Subjects: Computational Complexity (cs.CC)

We present a template for the Promise Constraint Satisfaction Problem (PCSP) which is NP-hard but does not satisfy the current state-of-the-art hardness condition [ACMTCT'21]. We introduce a new "injective" condition based on the smooth version of the layered PCP Theorem and use this new condition to confirm that the problem is indeed NP-hard. In the second part of the article, we establish a dichotomy for Boolean PCSPs defined by templates with polymorphisms in the set of linear threshold functions. The reasoning relies on the new injective condition.
[169] arXiv:2405.10779 [pdf, ps, other]: Title: Baseline Results for Selected Nonlinear System Identification Benchmarks

Authors: Max D. Champneys, Gerben I. Beintema, Roland Tóth, Maarten Schoukens, Maarten Schoukens, Timothy J. Rogers

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Nonlinear system identification remains an important open challenge across research and academia. Large numbers of novel approaches are seen published each year, each presenting improvements or extensions to existing methods. It is natural, therefore, to consider how one might choose between these competing models. Benchmark datasets provide one clear way to approach this question. However, to make meaningful inference based on benchmark performance it is important to understand how well a new method performs comparatively to results available with well-established methods. This paper presents a set of ten baseline techniques and their relative performances on five popular benchmarks. The aim of this contribution is to stimulate thought and discussion regarding objective comparison of identification methodologies.
[170] arXiv:2405.10787 [pdf, other]: Title: On the Application of Reliability Theory to Cellular Network Mobility Performance Analysis

Authors: Subhyal Bin Iqbal, Behnam Khodapanah, Philipp Schulz, Gerhard P. Fettweis

Comments: 6 pages, 4 figures. Submitted to IEEE Wireless Communication Letters for possible publication

Subjects: Networking and Internet Architecture (cs.NI)

Achieving connectivity reliability is one of the significant challenges for 5G and beyond 5G cellular networks. The present understanding of reliability in the context of mobile communication does not adequately cover the stochastic temporal aspects of the network, such as the duration and spread of packet errors that an outage session may cause. Rather, it simply confines the definition to the percentage of successful packet delivery. In this letter, we offer an elaborate modeling of the outage for a cellular mobile network by showcasing the different types of outages and their contiguity characteristic. Thereafter, using the outage metrics, we define two new key performance indicators (KPIs), namely mean outage time and mean time between outages as counterparts to akin KPIs that already exist in classical reliability theory, i.e., mean down time and mean time between failures. Using a system-level simulation where user mobility is a crucial component, it is shown that these newly defined KPIs can be used to quantify the reliability requirements of different user applications in cellular services.
[171] arXiv:2405.10788 [pdf, other]: Title: QEdgeProxy: QoS-Aware Load Balancing for IoT Services in the Computing Continuum

Authors: Ivan Čilić, Valentin Jukanović, Ivana Podnar Žarko, Pantelis Frangoudis, Schahram Dustdar

Comments: 7 pages, accepted for publication at IEEE EDGE 2024

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

While various service orchestration aspects within Computing Continuum (CC) systems have been extensively addressed, including service placement, replication, and scheduling, an open challenge lies in ensuring uninterrupted data delivery from IoT devices to running service instances in this dynamic environment, while adhering to specific Quality of Service (QoS) requirements and balancing the load on service instances. To address this challenge, we introduce QEdgeProxy, an adaptive and QoS-aware load balancing framework specifically designed for routing client requests to appropriate IoT service instances in the CC. QEdgeProxy integrates naturally within Kubernetes, adapts to changes in dynamic environments, and manages to seamlessly deliver data to IoT service instances while consistently meeting QoS requirements and effectively distributing load across them. This is verified by extensive experiments over a realistic K3s cluster with instance failures and network variability, where QEdgeProxy outperforms both Kubernetes built-in mechanisms and a state-of-the-art solution, while introducing minimal computational overhead.
[172] arXiv:2405.10793 [pdf, ps, other]: Title: CCTNet: A Circular Convolutional Transformer Network for LiDAR-based Place Recognition Handling Movable Objects Occlusion

Authors: Gang Wang, Chaoran Zhu, Qian Xu, Tongzhou Zhang, Hai Zhang, XiaoPeng Fan, Jue Hu

Subjects: Robotics (cs.RO)

Place recognition is a fundamental task for robotic application, allowing robots to perform loop closure detection within simultaneous localization and mapping (SLAM), and achieve relocalization on prior maps. Current range image-based networks use single-column convolution to maintain feature invariance to shifts in image columns caused by LiDAR viewpoint change.However, this raises the issues such as "restricted receptive fields" and "excessive focus on local regions", degrading the performance of networks. To address the aforementioned issues, we propose a lightweight circular convolutional Transformer network denoted as CCTNet, which boosts performance by capturing structural information in point clouds and facilitating crossdimensional interaction of spatial and channel information. Initially, a Circular Convolution Module (CCM) is introduced, expanding the network's perceptual field while maintaining feature consistency across varying LiDAR perspectives. Then, a Range Transformer Module (RTM) is proposed, which enhances place recognition accuracy in scenarios with movable objects by employing a combination of channel and spatial attention mechanisms. Furthermore, we propose an Overlap-based loss function, transforming the place recognition task from a binary loop closure classification into a regression problem linked to the overlap between LiDAR frames. Through extensive experiments on the KITTI and Ford Campus datasets, CCTNet surpasses comparable methods, achieving Recall@1 of 0.924 and 0.965, and Recall@1% of 0.990 and 0.993 on the test set, showcasing a superior performance. Results on the selfcollected dataset further demonstrate the proposed method's potential for practical implementation in complex scenarios to handle movable objects, showing improved generalization in various datasets.
[173] arXiv:2405.10799 [pdf, ps, other]: Title: Training Compute Thresholds: Features and Functions in AI Governance

Authors: Lennart Heim

Comments: Working paper

Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG)

This paper examines the use of training compute thresholds as a tool for governing artificial intelligence (AI) systems. We argue that compute thresholds serve as a valuable trigger for further evaluation of AI models, rather than being the sole determinant of the regulation. Key advantages of compute thresholds include their correlation with model capabilities and risks, quantifiability, ease of measurement, robustness to circumvention, knowability before model development and deployment, potential for external verification, and targeted scope. Compute thresholds provide a practical starting point for identifying potentially high-risk models and can be used as an initial filter in AI governance frameworks alongside other sector-specific regulations and broader governance measures.
[174] arXiv:2405.10800 [pdf, other]: Title: Heterogeneity-Informed Meta-Parameter Learning for Spatiotemporal Time Series Forecasting

Authors: Zheng Dong, Renhe Jiang, Haotian Gao, Hangchen Liu, Jinliang Deng, Qingsong Wen, Xuan Song

Comments: Accepted by KDD'24 Research Track

Subjects: Machine Learning (cs.LG)

Spatiotemporal time series forecasting plays a key role in a wide range of real-world applications. While significant progress has been made in this area, fully capturing and leveraging spatiotemporal heterogeneity remains a fundamental challenge. Therefore, we propose a novel Heterogeneity-Informed Meta-Parameter Learning scheme. Specifically, our approach implicitly captures spatiotemporal heterogeneity through learning spatial and temporal embeddings, which can be viewed as a clustering process. Then, a novel spatiotemporal meta-parameter learning paradigm is proposed to learn spatiotemporal-specific parameters from meta-parameter pools, which is informed by the captured heterogeneity. Based on these ideas, we develop a Heterogeneity-Informed Spatiotemporal Meta-Network (HimNet) for spatiotemporal time series forecasting. Extensive experiments on five widely-used benchmarks demonstrate our method achieves state-of-the-art performance while exhibiting superior interpretability. Our code is available at https://github.com/XDZhelheim/HimNet.
[175] arXiv:2405.10801 [pdf, ps, other]: Title: The Relational Machine Calculus

Authors: Chris Barrett, Daniel Castle, Willem Heijltjes

Comments: LICS paper, 15 pages excluding references

Subjects: Programming Languages (cs.PL)

This paper presents the Relational Machine Calculus (RMC): a simple, foundational model of first-order relational programming. The RMC originates from the Functional Machine Calculus (FMC), which generalizes the lambda-calculus and its standard call-by-name stack machine in two directions. One, "locations", introduces multiple stacks, which enable effect operators to be encoded into the abstraction and application constructs. The second, "sequencing", introduces the imperative notions of "skip" and "sequence", similar to kappa-calculus and concatenative programming languages. The key observation of the RMC is that the first-order fragment of the FMC exhibits a latent duality which, given a simple decomposition of the relevant constructors, can be concretely expressed as an involution on syntax. Semantically, this gives rise to a sound and complete calculus for string diagrams of Frobenius monoids. We consider unification as the corresponding symmetric generalization of beta-reduction. By further including standard operators of Kleene algebra, the RMC embeds a range of computational models: the kappa-calculus, logic programming, automata, Interaction Nets, and Petri Nets, among others. These embeddings preserve operational semantics, which for the RMC is again given by a generalization of the standard stack machine for the lambda-calculus. The equational theory of the RMC (which supports reasoning about its operational semantics) is conservative over both the first-order lambda-calculus and Kleene algebra, and can be oriented to give a confluent reduction relation.
[176] arXiv:2405.10802 [pdf, other]: Title: Reduced storage direct tensor ring decomposition for convolutional neural networks compression

Authors: Mateusz Gabor, Rafał Zdunek

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Convolutional neural networks (CNNs) are among the most widely used machine learning models for computer vision tasks, such as image classification. To improve the efficiency of CNNs, many CNNs compressing approaches have been developed. Low-rank methods approximate the original convolutional kernel with a sequence of smaller convolutional kernels, which leads to reduced storage and time complexities. In this study, we propose a novel low-rank CNNs compression method that is based on reduced storage direct tensor ring decomposition (RSDTR). The proposed method offers a higher circular mode permutation flexibility, and it is characterized by large parameter and FLOPS compression rates, while preserving a good classification accuracy of the compressed network. The experiments, performed on the CIFAR-10 and ImageNet datasets, clearly demonstrate the efficiency of RSDTR in comparison to other state-of-the-art CNNs compression approaches.
[177] arXiv:2405.10808 [pdf, other]: Title: ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios

Authors: Markus Bayer, Christian Reuter

Comments: 18 pages, 7 figures, 4 tables

Subjects: Computation and Language (cs.CL)

Active learning is designed to minimize annotation efforts by prioritizing instances that most enhance learning. However, many active learning strategies struggle with a 'cold start' problem, needing substantial initial data to be effective. This limitation often reduces their utility for pre-trained models, which already perform well in few-shot scenarios. To address this, we introduce ActiveLLM, a novel active learning approach that leverages large language models such as GPT-4, Llama 3, and Mistral Large for selecting instances. We demonstrate that ActiveLLM significantly enhances the classification performance of BERT classifiers in few-shot scenarios, outperforming both traditional active learning methods and the few-shot learning method SetFit. Additionally, ActiveLLM can be extended to non-few-shot scenarios, allowing for iterative selections. In this way, ActiveLLM can even help other active learning strategies to overcome their cold start problem. Our results suggest that ActiveLLM offers a promising solution for improving model performance across various learning setups.
[178] arXiv:2405.10814 [pdf, other]: Title: Data-Driven Symbol Detection for Intersymbol Interference Channels with Bursty Impulsive Noise

Authors: Boris Karanov, Chin-Hung Chen, Yan Wu, Alex Young, Wim van Houtum

Comments: This work has been submitted to the IEEE for possible publication

Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

We developed machine learning approaches for data-driven trellis-based soft symbol detection in coded transmission over intersymbol interference (ISI) channels in presence of bursty impulsive noise (IN), for example encountered in wireless digital broadcasting systems and vehicular communications. This enabled us to obtain optimized detectors based on the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm while circumventing the use of full channel state information (CSI) for computing likelihoods and trellis state transition probabilities. First, we extended the application of the neural network (NN)-aided BCJR, recently proposed for ISI channels with additive white Gaussian noise (AWGN). Although suitable for estimating likelihoods via labeling of transmission sequences, the BCJR-NN method does not provide a framework for learning the trellis state transitions. In addition to detection over the joint ISI and IN states we also focused on another scenario where trellis transitions are not trivial: detection for the ISI channel with AWGN with inaccurate knowledge of the channel memory at the receiver. Without access to the accurate state transition matrix, the BCJR- NN performance significantly degrades in both settings. To this end, we devised an alternative approach for data-driven BCJR detection based on the unsupervised learning of a hidden Markov model (HMM). The BCJR-HMM allowed us to optimize both the likelihood function and the state transition matrix without labeling. Moreover, we demonstrated the viability of a hybrid NN and HMM BCJR detection where NN is used for learning the likelihoods, while the state transitions are optimized via HMM. While reducing the required prior channel knowledge, the examined data-driven detectors with learned trellis state transitions achieve bit error rates close to the optimal full CSI-based BCJR, significantly outperforming detection with inaccurate CSI.
[179] arXiv:2405.10818 [pdf, ps, other]: Title: Modeling Supply Chain Interaction and Disruption: Insights from Real-world Data and Complex Adaptive System

Authors: Jiawei Feng, Mengsi Cai, Fangze Dai, Tianci Bu, Xiaoyu Zhang, Huijun Zheng, Xin Lu

Comments: arXiv admin note: text overlap with arXiv:2304.10428 by other authors

Subjects: Social and Information Networks (cs.SI)

In the rapidly evolving automotive industry, Systems-on-Chips (SoCs) are playing an increasingly crucial role in enhancing vehicle intelligence, connectivity, and safety features. For enterprises whose business encompasses automotive SoCs, the sustained and stable provision and receipt of SoC relevant goods or services are essential. Considering the imperative for a resilient and adaptable supply network, enterprises are concentrating their efforts on formulating strategies to address risks stemming from supply chain disruptions caused by technological obsolescence, natural disasters, and geopolitical tensions. This study presents an open supply knowledge extraction and complement approach and build a supply chain network of automotive SoC enterprises in China, which incorporates cross-domain named entity recognition under limited information, fuzzy matching of firm entities, and supply relation inferring based on knowledge graph. Subsequently, we exhibit the degree and registered capital distribution across firms, and analyze the correlations between centrality metrics in the supply chain network. Finally, based on recovery capacity and risk transfer, two interaction disruption models (IDMs) are developed to elucidate the adaptive behaviors and effect of network disruptions under various business and attack strategies. This research not only aids in exploring the complexities of Chinese automotive SoC supply chain but also enriches our understanding of the dynamics of firm behavior in this crucial industry sector.
[180] arXiv:2405.10822 [pdf, other]: Title: Generative modeling through internal high-dimensional chaotic activity

Authors: Samantha J. Fournier, Pierfrancesco Urbani

Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn)

Generative modeling aims at producing new datapoints whose statistical properties resemble the ones in a training dataset. In recent years, there has been a burst of machine learning techniques and settings that can achieve this goal with remarkable performances. In most of these settings, one uses the training dataset in conjunction with noise, which is added as a source of statistical variability and is essential for the generative task. Here, we explore the idea of using internal chaotic dynamics in high-dimensional chaotic systems as a way to generate new datapoints from a training dataset. We show that simple learning rules can achieve this goal within a set of vanilla architectures and characterize the quality of the generated datapoints through standard accuracy measures.
[181] arXiv:2405.10824 [pdf, other]: Title: Real-World Graph Analysis: Techniques for Static, Dynamic, and Temporal Communities

Authors: Davide Rucci

Subjects: Data Structures and Algorithms (cs.DS)

Graphs are widely used in various fields of computer science. They have also found application in unrelated areas, leading to a diverse range of problems. These problems can be modeled as relationships between entities in various contexts, such as social networks, protein interactions in cells, and route maps. Therefore it is logical to analyze these data structures with diverse approaches, whether they are numerical or structural, global or local, approximate or exact. In particular, the concept of community plays an important role in local structural analysis, as it is able to highlight the composition of the underlying graph while providing insights into what the organization and importance of the nodes in a network look like. This thesis pursues the goal of extracting knowledge from different kinds of graphs, including static, dynamic, and temporal graphs, with a particular focus on their community substructures. To tackle this task we use combinatorial algorithms that can list all the communities in a graph according to different formalizations, such as cliques, $k$-graphlets, and $k$-cores. We first develop new algorithms to enumerate subgraphs, using traditional and novel techniques such as push-out amortization, and CPU cache analysis to boost their efficiency. We then extend these concepts to the analysis of real-world graphs across diverse domains, ranging from social networks to autonomous systems modeled as temporal graphs. In this field, there is currently no widely accepted adaptation, even for straightforward subgraphs like $k$-cores, and the available data is expanding both in terms of quantity and scale. As a result, our findings advance the state of the art both from a theoretical and a practical perspective and can be used in a static or dynamic setting to further speed up and refine graph analysis techniques.
[182] arXiv:2405.10825 [pdf, other]: Title: Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

Authors: Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement of LLM techniques also offers promising opportunities to automate many tasks in the telecommunication (telecom) field. After pre-training and fine-tuning, LLMs can perform diverse downstream tasks based on human instructions, paving the way to artificial general intelligence (AGI)-enabled 6G. Given the great potential of LLM technologies, this work aims to provide a comprehensive overview of LLM-enabled telecom networks. In particular, we first present LLM fundamentals, including model architecture, pre-training, fine-tuning, inference and utilization, model evaluation, and telecom deployment. Then, we introduce LLM-enabled key techniques and telecom applications in terms of generation, classification, optimization, and prediction problems. Specifically, the LLM-enabled generation applications include telecom domain knowledge, code, and network configuration generation. After that, the LLM-based classification applications involve network security, text, image, and traffic classification problems. Moreover, multiple LLM-enabled optimization techniques are introduced, such as automated reward function design for reinforcement learning and verbal reinforcement learning. Furthermore, for LLM-aided prediction problems, we discussed time-series prediction models and multi-modality prediction problems for telecom. Finally, we highlight the challenges and identify the future directions of LLM-enabled telecom networks.
[183] arXiv:2405.10829 [pdf, ps, other]: Title: The participation of public in knowledge production: a citizen science projects overview

Authors: Nuria Bautista-Puig, Enrique Orduna-Malea, Philippe Mongeon

Comments: 11 pages, 4 figures, 1 table

Subjects: Digital Libraries (cs.DL)

Citizen Science (CS) is related to public engagement in scientific research. The tasks in which the citizens can be involved are diverse and can range from data collection and tagging images to participation in the planning and research design. However, little is known about the involvement degree of the citizens to CS projects, and the contribution of those projects to the advancement of knowledge (e.g. scientific outcomes). This study aims to gain a better understanding by analysing the SciStarter database. A total of 2,346 CS projects were identified, mainly from Ecology and Environmental Sciences. Of these projects, 91% show low participation of the citizens (Level 1 "citizens as sensors" and 2 "citizens as interpreters", from Haklay's scale). In terms of scientific output, 918 papers indexed in the Web of Science (WoS) were identified. The most prolific projects were found to have lower levels of citizen involvement, specifically at Levels 1 and 2.
[184] arXiv:2405.10830 [pdf, other]: Title: Combining Teacher-Student with Representation Learning: A Concurrent Teacher-Student Reinforcement Learning Paradigm for Legged Locomotion

Authors: Hongxi Wang, Haoxiang Luo, Wei Zhang, Hua Chen

Comments: This paper presents a novel concurrent teacher-student reinforcement learning architecture for legged locomotion over challenging terrains, based only on proprioceptive measurements in real-world deployment. The effectiveness of the proposed architecture is demonstrated through extensive indoor and outdoor experiments on quadrupedal robots and a point-foot bipedal robot

Subjects: Robotics (cs.RO)

Thanks to the explosive developments of data-driven learning methodologies recently, reinforcement learning (RL) emerges as a promising solution to address the legged locomotion problem in robotics. In this manuscript, we propose a novel concurrent teacher-student reinforcement learning architecture for legged locomotion over challenging terrains, based only on proprioceptive measurements in real-world deployment. Different from convectional teacher-student architecture that trains the teacher policy via RL and transfers the knowledge to the student policy through supervised learning, our proposed architecture trains teacher and student policy networks concurrently under the reinforcement learning paradigm. To achieve this, we develop a new training scheme based on conventional proximal policy gradient (PPO) method to accommodate the interaction between teacher policy network and student policy network. The effectiveness of the proposed architecture as well as the new training scheme is demonstrated through extensive indoor and outdoor experiments on quadrupedal robots and point-foot bipedal robot, showcasing robust locomotion over challenging terrains and improved performance compared to two-stage training methods.
[185] arXiv:2405.10832 [pdf, other]: Title: Open-Vocabulary Spatio-Temporal Action Detection

Authors: Tao Wu, Shuqiu Ge, Jie Qin, Gangshan Wu, Limin Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Spatio-temporal action detection (STAD) is an important fine-grained video understanding task. Current methods require box and label supervision for all action classes in advance. However, in real-world applications, it is very likely to come across new action classes not seen in training because the action category space is large and hard to enumerate. Also, the cost of data annotation and model training for new classes is extremely high for traditional methods, as we need to perform detailed box annotations and re-train the whole network from scratch. In this paper, we propose a new challenging setting by performing open-vocabulary STAD to better mimic the situation of action detection in an open world. Open-vocabulary spatio-temporal action detection (OV-STAD) requires training a model on a limited set of base classes with box and label supervision, which is expected to yield good generalization performance on novel action classes. For OV-STAD, we build two benchmarks based on the existing STAD datasets and propose a simple but effective method based on pretrained video-language models (VLM). To better adapt the holistic VLM for the fine-grained action detection task, we carefully fine-tune it on the localized video region-text pairs. This customized fine-tuning endows the VLM with better motion understanding, thus contributing to a more accurate alignment between video regions and texts. Local region feature and global video feature fusion before alignment is adopted to further improve the action detection performance by providing global context. Our method achieves a promising performance on novel classes.
[186] arXiv:2405.10835 [pdf, other]: Title: A Unified Search and Recommendation Framework Based on Multi-Scenario Learning for Ranking in E-commerce

Authors: Jinhan Liu, Qiyu Chen, Junjie Xu, Junjie Li, Baoli Li, Sulong Xu

Comments: Accepted by SIGIR 2024

Subjects: Information Retrieval (cs.IR)

Search and recommendation (S&R) are the two most important scenarios in e-commerce. The majority of users typically interact with products in S&R scenarios, indicating the need and potential for joint modeling. Traditional multi-scenario models use shared parameters to learn the similarity of multiple tasks, and task-specific parameters to learn the divergence of individual tasks. This coarse-grained modeling approach does not effectively capture the differences between S&R scenarios. Furthermore, this approach does not sufficiently exploit the information across the global label space. These issues can result in the suboptimal performance of multi-scenario models in handling both S&R scenarios. To address these issues, we propose an effective and universal framework for Unified Search and Recommendation (USR), designed with S&R Views User Interest Extractor Layer (IE) and S&R Views Feature Generator Layer (FG) to separately generate user interests and scenario-agnostic feature representations for S&R. Next, we introduce a Global Label Space Multi-Task Layer (GLMT) that uses global labels as supervised signals of auxiliary tasks and jointly models the main task and auxiliary tasks using conditional probability. Extensive experimental evaluations on real-world industrial datasets show that USR can be applied to various multi-scenario models and significantly improve their performance. Online A/B testing also indicates substantial performance gains across multiple metrics. Currently, USR has been successfully deployed in the 7Fresh App.
[187] arXiv:2405.10842 [pdf, ps, other]: Title: Automated Radiology Report Generation: A Review of Recent Advances

Authors: Phillip Sloan, Philip Clatworthy, Edwin Simpson, Majid Mirmehdi

Comments: 24 pages, 8 figures, 6 tables. Submitted to IEEE Reviews in Biomedical Engineering

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Increasing demands on medical imaging departments are taking a toll on the radiologist's ability to deliver timely and accurate reports. Recent technological advances in artificial intelligence have demonstrated great potential for automatic radiology report generation (ARRG), sparking an explosion of research. This survey paper conducts a methodological review of contemporary ARRG approaches by way of (i) assessing datasets based on characteristics, such as availability, size, and adoption rate, (ii) examining deep learning training methods, such as contrastive learning and reinforcement learning, (iii) exploring state-of-the-art model architectures, including variations of CNN and transformer models, (iv) outlining techniques integrating clinical knowledge through multimodal inputs and knowledge graphs, and (v) scrutinising current model evaluation techniques, including commonly applied NLP metrics and qualitative clinical reviews. Furthermore, the quantitative results of the reviewed models are analysed, where the top performing models are examined to seek further insights. Finally, potential new directions are highlighted, with the adoption of additional datasets from other radiological modalities and improved evaluation methods predicted as important areas of future development.
[188] arXiv:2405.10845 [pdf, other]: Title: Natural Language Processing for Requirements Traceability

Authors: Jin L.C. Guo, Jan-Philipp Steghöfer, Andreas Vogelsang, Jane Cleland-Huang

Comments: Book Chapter in the Handbook of Natural Language Processing for Requirements Engineering

Subjects: Software Engineering (cs.SE)

Traceability, the ability to trace relevant software artifacts to support reasoning about the quality of the software and its development process, plays a crucial role in requirements and software engineering, particularly for safety-critical systems. In this chapter, we provide a comprehensive overview of the representative tasks in requirement traceability for which natural language processing (NLP) and related techniques have made considerable progress in the past decade. We first present the definition of traceability in the context of requirements and the overall engineering process, as well as other important concepts related to traceability tasks. Then, we discuss two tasks in detail, including trace link recovery and trace link maintenance. We also introduce two other related tasks concerning when trace links are used in practical contexts. For each task, we explain the characteristics of the task, how it can be approached through NLP techniques, and how to design and conduct the experiment to demonstrate the performance of the NLP techniques. We further discuss practical considerations on how to effectively apply NLP techniques and assess their effectiveness regarding the data set collection, the metrics selection, and the role of humans when evaluating the NLP approaches. Overall, this chapter prepares the readers with the fundamental knowledge of designing automated traceability solutions enabled by NLP in practice.
[189] arXiv:2405.10847 [pdf, other]: Title: Model Predictive Contouring Control for Vehicle Obstacle Avoidance at the Limit of Handling Using Torque Vectoring

Authors: Alberto Bertipaglia, Davide Tavernini, Umberto Montanaro, Mohsen Alirezaei, Riender Happee, Aldo Sorniotti, Barys Shyrokau

Comments: Accepted at IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Boston, USA, 2024

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This paper presents an original approach to vehicle obstacle avoidance. It involves the development of a nonlinear Model Predictive Contouring Control, which uses torque vectoring to stabilise and drive the vehicle in evasive manoeuvres at the limit of handling. The proposed algorithm combines motion planning, path tracking and vehicle stability objectives, prioritising collision avoidance in emergencies. The controller's prediction model is a nonlinear double-track vehicle model based on an extended Fiala tyre to capture the nonlinear coupled longitudinal and lateral dynamics. The controller computes the optimal steering angle and the longitudinal forces per each of the four wheels to minimise tracking error in safe situations and maximise the vehicle-to-obstacle distance in emergencies. Thanks to the optimisation of the longitudinal tyre forces, the proposed controller can produce an extra yaw moment, increasing the vehicle's lateral agility to avoid obstacles while keeping the vehicle stable. The optimal forces are constrained in the tyre friction circle not to exceed the tyres and vehicle capabilities. In a high-fidelity simulation environment, we demonstrate the benefits of torque vectoring, showing that our proposed approach is capable of successfully avoiding obstacles and keeping the vehicle stable while driving a double-lane change manoeuvre, in comparison to baselines lacking torque vectoring or collision avoidance prioritisation.
[190] arXiv:2405.10849 [pdf, other]: Title: Generative AI for Test Driven Development: Preliminary Results

Authors: Moritz Mock, Jorge Melegati, Barbara Russo

Comments: Accepted to AI4ASE workshop at XP'24

Subjects: Software Engineering (cs.SE)

Test Driven Development (TDD) is one of the major practices of Extreme Programming for which incremental testing and refactoring trigger the code development. TDD has limited adoption in the industry, as it requires more code to be developed and experienced developers. Generative AI (GenAI) may reduce the extra effort imposed by TDD. In this work, we introduce an approach to automatize TDD by embracing GenAI either in a collaborative interaction pattern in which developers create tests and supervise the AI generation during each iteration or a fully-automated pattern in which developers only supervise the AI generation at the end of the iterations. We run an exploratory experiment with ChatGPT in which the interaction patterns are compared with the non-AI TDD regarding test and code quality and development speed. Overall, we found that, for our experiment and settings, GenAI can be efficiently used in TDD, but it requires supervision of the quality of the produced code. In some cases, it can even mislead non-expert developers and propose solutions just for the sake of the query.
[191] arXiv:2405.10852 [pdf, other]: Title: KernelSHAP-IQ: Weighted Least-Square Optimization for Shapley Interactions

Authors: Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki, Eyke Hüllermeier, Barbara Hammer

Comments: Accepted Paper at ICML 2024. This version is not the Camera Ready Version

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The Shapley value (SV) is a prevalent approach of allocating credit to machine learning (ML) entities to understand black box ML models. Enriching such interpretations with higher-order interactions is inevitable for complex systems, where the Shapley Interaction Index (SII) is a direct axiomatic extension of the SV. While it is well-known that the SV yields an optimal approximation of any game via a weighted least square (WLS) objective, an extension of this result to SII has been a long-standing open problem, which even led to the proposal of an alternative index. In this work, we characterize higher-order SII as a solution to a WLS problem, which constructs an optimal approximation via SII and $k$-Shapley values ($k$-SII). We prove this representation for the SV and pairwise SII and give empirically validated conjectures for higher orders. As a result, we propose KernelSHAP-IQ, a direct extension of KernelSHAP for SII, and demonstrate state-of-the-art performance for feature interactions.
[192] arXiv:2405.10853 [pdf, other]: Title: The Future of Large Language Model Pre-training is Federated

Authors: Lorenzo Sani, Alex Iacob, Zeyu Cao, Bill Marino, Yan Gao, Tomas Paulik, Wanru Zhao, William F. Shen, Preslav Aleksandrov, Xinchi Qiu, Nicholas D. Lane

Comments: 10 pages, 4 figures, pre-print

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Generative pre-trained large language models (LLMs) have demonstrated impressive performance over a wide range of tasks, thanks to the unprecedented amount of data they have been trained on. As established scaling laws indicate, LLMs' future performance improvement depends on the amount of computing and data sources we can leverage for pre-training. Federated learning (FL) has the potential to unleash the majority of the planet's data and computational resources, which are underutilized by the data-center-focused training methodology of current LLM practice. Our work presents a robust, flexible, reproducible FL approach that enables large-scale collaboration across institutions to train LLMs. This would mobilize more computational and data resources while matching or potentially exceeding centralized performance. We further show the effectiveness of the federated training scales with model size and present our approach for training a billion-scale federated LLM using limited resources. This will help data-rich actors to become the protagonists of LLMs pre-training instead of leaving the stage to compute-rich actors alone.
[193] arXiv:2405.10859 [pdf, other]: Title: A Nonlinear Model Predictive Control for Automated Drifting with a Standard Passenger Vehicle

Authors: Stan Meijer, Alberto Bertipaglia, Barys Shyrokau

Comments: Accepted at IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Boston, USA, 2024

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This paper presents a novel approach to automated drifting with a standard passenger vehicle, which involves a Nonlinear Model Predictive Control to stabilise and maintain the vehicle at high sideslip angle conditions. The proposed controller architecture is split into three components. The first part consists of the offline computed equilibrium maps, which provide the equilibrium points for each vehicle state given the desired sideslip angle and radius of the path. The second is the predictive controller minimising the errors between the equilibrium and actual vehicle states. The third is a path-following controller, which reduces the path error, altering the equilibrium curvature path. In a high-fidelity simulation environment, we validate the controller architecture capacity to stabilise the vehicle in automated drifting along a desired path, with a maximal lateral path deviation of 1 m. In the experiments with a standard passenger vehicle, we demonstrate that the proposed approach is capable of bringing and maintaining the vehicle at the desired 30 deg sideslip angle in both high and low friction conditions.
[194] arXiv:2405.10860 [pdf, other]: Title: ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains

Authors: Zhaopei Huang, Jinming Zhao, Qin Jin

Comments: Accepted by IJCAI 2024

Subjects: Computation and Language (cs.CL)

Understanding the process of emotion generation is crucial for analyzing the causes behind emotions. Causal Emotion Entailment (CEE), an emotion-understanding task, aims to identify the causal utterances in a conversation that stimulate the emotions expressed in a target utterance. However, current works in CEE mainly focus on modeling semantic and emotional interactions in conversations, neglecting the exploration of the emotion-generation process. This hinders the models from deeply understanding emotions, restricting their ability to produce explainable predictions. In this work, inspired by the emotion generation process of "stimulus-appraisal-emotion" in the cognitive appraisal theory, we introduce a step-by-step reasoning method, Emotion-Cause Reasoning Chain (ECR-Chain), to infer the stimulus from the target emotional expressions in conversations. Specifically, we first introduce the ECR-Chain to ChatGPT via few-shot prompting, which significantly improves its performance on the CEE task. We further propose an automated construction process to utilize ChatGPT in building an ECR-Chain set, which can enhance the reasoning abilities of smaller models through supervised training and assist the Vicuna-7B model in achieving state-of-the-art CEE performance. Moreover, our methods can enable these generative language models to effectively perform emotion-cause reasoning in an explainable manner. Our code, data and more details are at https://github.com/hzp3517/ECR-Chain.
[195] arXiv:2405.10861 [pdf, other]: Title: Tailoring Vaccine Messaging with Common-Ground Opinions

Authors: Rickard Stureborg, Sanxing Chen, Ruoyu Xie, Aayushi Patel, Christopher Li, Chloe Qinyu Zhu, Tingnan Hu, Jun Yang, Bhuwan Dhingra

Comments: NAACL Findings 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

One way to personalize chatbot interactions is by establishing common ground with the intended reader. A domain where establishing mutual understanding could be particularly impactful is vaccine concerns and misinformation. Vaccine interventions are forms of messaging which aim to answer concerns expressed about vaccination. Tailoring responses in this domain is difficult, since opinions often have seemingly little ideological overlap. We define the task of tailoring vaccine interventions to a Common-Ground Opinion (CGO). Tailoring responses to a CGO involves meaningfully improving the answer by relating it to an opinion or belief the reader holds. In this paper we introduce TAILOR-CGO, a dataset for evaluating how well responses are tailored to provided CGOs. We benchmark several major LLMs on this task; finding GPT-4-Turbo performs significantly better than others. We also build automatic evaluation metrics, including an efficient and accurate BERT model that outperforms finetuned LLMs, investigate how to successfully tailor vaccine messaging to CGOs, and provide actionable recommendations from this investigation.
Code and model weights: https://github.com/rickardstureborg/tailor-cgo Dataset: https://huggingface.co/datasets/DukeNLP/tailor-cgo
[196] arXiv:2405.10864 [pdf, other]: Title: Improving face generation quality and prompt following with synthetic captions

Authors: Michail Tarasiou, Stylianos Moschoglou, Jiankang Deng, Stefanos Zafeiriou

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Recent advancements in text-to-image generation using diffusion models have significantly improved the quality of generated images and expanded the ability to depict a wide range of objects. However, ensuring that these models adhere closely to the text prompts remains a considerable challenge. This issue is particularly pronounced when trying to generate photorealistic images of humans. Without significant prompt engineering efforts models often produce unrealistic images and typically fail to incorporate the full extent of the prompt information. This limitation can be largely attributed to the nature of captions accompanying the images used in training large scale diffusion models, which typically prioritize contextual information over details related to the person's appearance. In this paper we address this issue by introducing a training-free pipeline designed to generate accurate appearance descriptions from images of people. We apply this method to create approximately 250,000 captions for publicly available face datasets. We then use these synthetic captions to fine-tune a text-to-image diffusion model. Our results demonstrate that this approach significantly improves the model's ability to generate high-quality, realistic human faces and enhances adherence to the given prompts, compared to the baseline model. We share our synthetic captions, pretrained checkpoints and training code.
[197] arXiv:2405.10868 [pdf, other]: Title: Air Signing and Privacy-Preserving Signature Verification for Digital Documents

Authors: P. Sarveswarasarma, T. Sathulakjan, V. J. V. Godfrey, Thanuja D. Ambegoda

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

This paper presents a novel approach to the digital signing of electronic documents through the use of a camera-based interaction system, single-finger tracking for sign recognition, and multi commands executing hand gestures. The proposed solution, referred to as "Air Signature," involves writing the signature in front of the camera, rather than relying on traditional methods such as mouse drawing or physically signing on paper and showing it to a web camera. The goal is to develop a state-of-the-art method for detecting and tracking gestures and objects in real-time. The proposed methods include applying existing gesture recognition and object tracking systems, improving accuracy through smoothing and line drawing, and maintaining continuity during fast finger movements. An evaluation of the fingertip detection, sketching, and overall signing process is performed to assess the effectiveness of the proposed solution. The secondary objective of this research is to develop a model that can effectively recognize the unique signature of a user. This type of signature can be verified by neural cores that analyze the movement, speed, and stroke pixels of the signing in real time. The neural cores use machine learning algorithms to match air signatures to the individual's stored signatures, providing a secure and efficient method of verification. Our proposed System does not require sensors or any hardware other than the camera.
[198] arXiv:2405.10871 [pdf, other]: Title: BraTS-Path Challenge: Assessing Heterogeneous Histopathologic Brain Tumor Sub-regions

Authors: Spyridon Bakas, Siddhesh P. Thakur, Shahriar Faghani, Mana Moassefi, Ujjwal Baid, Verena Chung, Sarthak Pati, Shubham Innani, Bhakti Baheti, Jake Albrecht, Alexandros Karargyris, Hasan Kassem, MacLean P. Nasrallah, Jared T. Ahrendsen, Valeria Barresi, Maria A. Gubbiotti, Giselle Y. López, Calixto-Hope G. Lucas, Michael L. Miller, Lee A. D. Cooper, Jason T. Huse, William R. Bell

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Glioblastoma is the most common primary adult brain tumor, with a grim prognosis - median survival of 12-18 months following treatment, and 4 months otherwise. Glioblastoma is widely infiltrative in the cerebral hemispheres and well-defined by heterogeneous molecular and micro-environmental histopathologic profiles, which pose a major obstacle in treatment. Correctly diagnosing these tumors and assessing their heterogeneity is crucial for choosing the precise treatment and potentially enhancing patient survival rates. In the gold-standard histopathology-based approach to tumor diagnosis, detecting various morpho-pathological features of distinct histology throughout digitized tissue sections is crucial. Such "features" include the presence of cellular tumor, geographic necrosis, pseudopalisading necrosis, areas abundant in microvascular proliferation, infiltration into the cortex, wide extension in subcortical white matter, leptomeningeal infiltration, regions dense with macrophages, and the presence of perivascular or scattered lymphocytes. With these features in mind and building upon the main aim of the BraTS Cluster of Challenges https://www.synapse.org/brats2024, the goal of the BraTS-Path challenge is to provide a systematically prepared comprehensive dataset and a benchmarking environment to develop and fairly compare deep-learning models capable of identifying tumor sub-regions of distinct histologic profile. These models aim to further our understanding of the disease and assist in the diagnosis and grading of conditions in a consistent manner.
[199] arXiv:2405.10874 [pdf, other]: Title: Square-Root Inverse Filter-based GNSS-Visual-Inertial Navigation

Authors: Jun Hu, Xiaoming Lang, Feng Zhang, Yinian Mao, Guoquan Huang

Subjects: Robotics (cs.RO)

While Global Navigation Satellite System (GNSS) is often used to provide global positioning if available, its intermittency and/or inaccuracy calls for fusion with other sensors. In this paper, we develop a novel GNSS-Visual-Inertial Navigation System (GVINS) that fuses visual, inertial, and raw GNSS measurements within the square-root inverse sliding window filtering (SRI-SWF) framework in a tightly coupled fashion, which thus is termed SRI-GVINS. In particular, for the first time, we deeply fuse the GNSS pseudorange, Doppler shift, single-differenced pseudorange, and double-differenced carrier phase measurements, along with the visual-inertial measurements. Inherited from the SRI-SWF, the proposed SRI-GVINS gains significant numerical stability and computational efficiency over the start-of-the-art methods. Additionally, we propose to use a filter to sequentially initialize the reference frame transformation till converges, rather than collecting measurements for batch optimization. We also perform online calibration of GNSS-IMU extrinsic parameters to mitigate the possible extrinsic parameter degradation. The proposed SRI-GVINS is extensively evaluated on our own collected UAV datasets and the results demonstrate that the proposed method is able to suppress VIO drift in real-time and also show the effectiveness of online GNSS-IMU extrinsic calibration. The experimental validation on the public datasets further reveals that the proposed SRI-GVINS outperforms the state-of-the-art methods in terms of both accuracy and efficiency.
[200] arXiv:2405.10875 [pdf, other]: Title: Recursively Feasible Shrinking-Horizon MPC in Dynamic Environments with Conformal Prediction Guarantees

Authors: Charis Stamouli, Lars Lindemann, George J. Pappas

Subjects: Systems and Control (eess.SY); Machine Learning (stat.ML)

In this paper, we focus on the problem of shrinking-horizon Model Predictive Control (MPC) in uncertain dynamic environments. We consider controlling a deterministic autonomous system that interacts with uncontrollable stochastic agents during its mission. Employing tools from conformal prediction, existing works derive high-confidence prediction regions for the unknown agent trajectories, and integrate these regions in the design of suitable safety constraints for MPC. Despite guaranteeing probabilistic safety of the closed-loop trajectories, these constraints do not ensure feasibility of the respective MPC schemes for the entire duration of the mission. We propose a shrinking-horizon MPC that guarantees recursive feasibility via a gradual relaxation of the safety constraints as new prediction regions become available online. This relaxation enforces the safety constraints to hold over the least restrictive prediction region from the set of all available prediction regions. In a comparative case study with the state of the art, we empirically show that our approach results in tighter prediction regions and verify recursive feasibility of our MPC scheme.
[201] arXiv:2405.10877 [pdf, other]: Title: WEITS: A Wavelet-enhanced residual framework for interpretable time series forecasting

Authors: Ziyou Guo, Yan Sun, Tieru Wu

Comments: arXiv admin note: text overlap with arXiv:2310.09488 by other authors

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Time series (TS) forecasting has been an unprecedentedly popular problem in recent years, with ubiquitous applications in both scientific and business fields. Various approaches have been introduced to time series analysis, including both statistical approaches and deep neural networks. Although neural network approaches have illustrated stronger ability of representation than statistical methods, they struggle to provide sufficient interpretablility, and can be too complicated to optimize. In this paper, we present WEITS, a frequency-aware deep learning framework that is highly interpretable and computationally efficient. Through multi-level wavelet decomposition, WEITS novelly infuses frequency analysis into a highly deep learning framework. Combined with a forward-backward residual architecture, it enjoys both high representation capability and statistical interpretability. Extensive experiments on real-world datasets have demonstrated competitive performance of our model, along with its additional advantage of high computation efficiency. Furthermore, WEITS provides a general framework that can always seamlessly integrate with state-of-the-art approaches for time series forecast.
[202] arXiv:2405.10879 [pdf, other]: Title: One registration is worth two segmentations

Authors: Shiqi Huang, Tingfa Xu, Ziyi Shen, Shaheer Ullah Saeed, Wen Yan, Dean Barratt, Yipeng Hu

Comments: Early Accepted by MICCAI2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The goal of image registration is to establish spatial correspondence between two or more images, traditionally through dense displacement fields (DDFs) or parametric transformations (e.g., rigid, affine, and splines). Rethinking the existing paradigms of achieving alignment via spatial transformations, we uncover an alternative but more intuitive correspondence representation: a set of corresponding regions-of-interest (ROI) pairs, which we demonstrate to have sufficient representational capability as other correspondence representation methods.Further, it is neither necessary nor sufficient for these ROIs to hold specific anatomical or semantic significance. In turn, we formulate image registration as searching for the same set of corresponding ROIs from both moving and fixed images - in other words, two multi-class segmentation tasks on a pair of images. For a general-purpose and practical implementation, we integrate the segment anything model (SAM) into our proposed algorithms, resulting in a SAM-enabled registration (SAMReg) that does not require any training data, gradient-based fine-tuning or engineered prompts. We experimentally show that the proposed SAMReg is capable of segmenting and matching multiple ROI pairs, which establish sufficiently accurate correspondences, in three clinical applications of registering prostate MR, cardiac MR and abdominal CT images. Based on metrics including Dice and target registration errors on anatomical structures, the proposed registration outperforms both intensity-based iterative algorithms and DDF-predicting learning-based networks, even yielding competitive performance with weakly-supervised registration which requires fully-segmented training data.
[203] arXiv:2405.10880 [pdf, ps, other]: Title: The MESA Security Model 2.0: A Dynamic Framework for Mitigating Stealth Data Exfiltration

Authors: Sanjeev Pratap Singh, Naveed Afzal

Journal-ref: International Journal of Network Security & Its Applications (IJNSA) 2024

Subjects: Cryptography and Security (cs.CR)

The rising complexity of cyber threats calls for a comprehensive reassessment of current security frameworks in business environments. This research focuses on Stealth Data Exfiltration, a significant cyber threat characterized by covert infiltration, extended undetectability, and unauthorized dissemination of confidential data. Our findings reveal that conventional defense-in-depth strategies often fall short in combating these sophisticated threats, highlighting the immediate need for a shift in information risk management across businesses. The evolving nature of cyber threats, driven by advancements in techniques such as social engineering, multi-vector attacks, and Generative AI, underscores the need for robust, adaptable, and comprehensive security strategies. As we navigate this complex landscape, it is crucial to anticipate potential threats and continually update our defenses. We propose a shift from traditional perimeter-based, prevention-focused models, which depend on a static attack surface, to a more dynamic framework that prepares for inevitable breaches. This suggested model, known as MESA 2.0 Security Model, prioritizes swift detection, immediate response, and ongoing resilience, thereby enhancing an organizations ability to promptly identify and neutralize threats, significantly reducing the consequences of security breaches. This study suggests that businesses adopt a forward-thinking and adaptable approach to security management to stay ahead of the ever-changing cyber threat landscape.
[204] arXiv:2405.10883 [pdf, ps, other]: Title: Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review

Authors: Hongyi Yang, Fangyuan Chang, Dian Zhu, Muroi Fumie, Zhao Liu

Subjects: Artificial Intelligence (cs.AI)

This review aims to systematically assess the current status and prospects of artificial intelligence (AI) in the rehabilitation management of patients with schizophrenia and their impact on the rehabilitation process. We selected 70 studies from 2012 to the present, focusing on application, technology categories, products, and data types of machine learning, deep learning, reinforcement learning, and other technologies in mental health interventions and management. The results indicate that AI can be widely used in symptom monitoring, relapse risk prediction, and rehabilitation treatment by analyzing ecological momentary assessment, behavioral, and speech data. This review further explores the potential challenges and future directions of emerging products, technologies, and analytical methods based on AI, such as social media analysis, serious games, and large language models in rehabilitation. In summary, this study systematically reviews the application status of AI in schizophrenia rehabilitation management and provides valuable insights and recommendations for future research paths.
[205] arXiv:2405.10885 [pdf, other]: Title: FA-Depth: Toward Fast and Accurate Self-supervised Monocular Depth Estimation

Authors: Fei Wang, Jun Cheng

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Most existing methods often rely on complex models to predict scene depth with high accuracy, resulting in slow inference that is not conducive to deployment. To better balance precision and speed, we first designed SmallDepth based on sparsity. Second, to enhance the feature representation ability of SmallDepth during training under the condition of equal complexity during inference, we propose an equivalent transformation module(ETM). Third, to improve the ability of each layer in the case of a fixed SmallDepth to perceive different context information and improve the robustness of SmallDepth to the left-right direction and illumination changes, we propose pyramid loss. Fourth, to further improve the accuracy of SmallDepth, we utilized the proposed function approximation loss (APX) to transfer knowledge in the pretrained HQDecv2, obtained by optimizing the previous HQDec to address grid artifacts in some regions, to SmallDepth. Extensive experiments demonstrate that each proposed component improves the precision of SmallDepth without changing the complexity of SmallDepth during inference, and the developed approach achieves state-of-the-art results on KITTI at an inference speed of more than 500 frames per second and with approximately 2 M parameters. The code and models will be publicly available at https://github.com/fwucas/FA-Depth.
[206] arXiv:2405.10887 [pdf, ps, other]: Title: Preservation theorems on sparse classes revisited

Authors: Anuj Dawar, Ioannis Eleftheriadis

Comments: 16 pages

Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)

We revisit the work studying homomorphism preservation for first-order logic in sparse classes of structures initiated in [Atserias et al., JACM 2006] and [Dawar, JCSS 2010]. These established that first-order logic has the homomorphism preservation property in any sparse class that is monotone and addable. It turns out that the assumption of addability is not strong enough for the proofs given. We demonstrate this by constructing classes of graphs of bounded treewidth which are monotone and addable but fail to have homomorphism preservation. We also show that homomorphism preservation fails on the class of planar graphs. On the other hand, the proofs of homomorphism preservation can be recovered by replacing addability by a stronger condition of amalgamation over bottlenecks. This is analogous to a similar condition formulated for extension preservation in [Ateserias et al., SiCOMP 2008].
[207] arXiv:2405.10891 [pdf, other]: Title: Prioritising GitHub Priority Labels

Authors: James Caddy, Christoph Treude

Comments: 4 pages, 5 tables, 2 figures, appearing in PROMISE 2024

Subjects: Software Engineering (cs.SE)

Communities on GitHub often use issue labels as a way of triaging issues by assigning them priority ratings based on how urgently they should be addressed. The labels used are determined by the repository contributors and not standardised by GitHub. This makes it difficult for priority-related reasoning across repositories for both researchers and contributors. Previous work shows interest in how issues are labelled and what the consequences for those labels are. For instance, some previous work has used clustering models and natural language processing to categorise labels without a particular emphasis on priority. With this publication, we introduce a unique data set of 812 manually categorised labels pertaining to priority; normalised and ranked as low-, medium-, or high-priority. To provide an example of how this data set could be used, we have created a tool for GitHub contributors that will create a list of the highest priority issues from the repositories to which they contribute. We have released the data set and the tool for anyone to use on Zenodo because we hope that this will help the open source community address high-priority issues more effectively and inspire other uses.
[208] arXiv:2405.10892 [pdf, other]: Title: Neuroscheduling for Remote Estimation

Authors: Marcos M. Vasconcelos, Yifei Zhang

Comments: Submitted for presentation at the 2024 Asilomar Conference on Signals, Systems, and Computers

Subjects: Systems and Control (eess.SY)

Many modern distributed systems consist of devices that generate more data than what can be transmitted via a communication link in near real time with high-fidelity. We consider the scheduling problem in which a device has access to multiple data sources, but at any moment, only one of them is revealed in real-time to a remote receiver. Even when the sources are Gaussian, and the fidelity criterion is the mean squared error, the globally optimal data selection strategy is not known. We propose a data-driven methodology to search for the elusive optimal solution using linear function approximation approach called neuroscheduling and establish necessary and sufficient conditions for the optimal scheduler to not over fit training data. Additionally, we present several numerical results that show that the globally optimal scheduler and estimator pair to the Gaussian case are nonlinear.
[209] arXiv:2405.10893 [pdf, other]: Title: COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain

Authors: Dimitrios P. Panagoulias, Persephone Papatheodosiou, Anastasios P. Palamidas, Mattheos Sanoudos, Evridiki Tsoureli-Nikita, Maria Virvou, George A. Tsihrintzis

Comments: Technical Paper

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) constitute a breakthrough state-of-the-art Artificial Intelligence (AI) technology which is rapidly evolving and promises to aid in medical diagnosis either by assisting doctors or by simulating a doctor's workflow in more advanced and complex implementations. In this technical paper, we outline Cognitive Network Evaluation Toolkit for Medical Domains (COGNET-MD), which constitutes a novel benchmark for LLM evaluation in the medical domain. Specifically, we propose a scoring-framework with increased difficulty to assess the ability of LLMs in interpreting medical text. The proposed framework is accompanied with a database of Multiple Choice Quizzes (MCQs). To ensure alignment with current medical trends and enhance safety, usefulness, and applicability, these MCQs have been constructed in collaboration with several associated medical experts in various medical domains and are characterized by varying degrees of difficulty. The current (first) version of the database includes the medical domains of Psychiatry, Dentistry, Pulmonology, Dermatology and Endocrinology, but it will be continuously extended and expanded to include additional medical domains.
[210] arXiv:2405.10894 [pdf, other]: Title: Labelled Well Quasi Ordered Classes of Bounded Linear Clique Width

Authors: Aliaume Lopez

Comments: 24 pages

Subjects: Logic in Computer Science (cs.LO)

We provide an algorithm to decide whether a class of finite graphs that has bounded linear clique width is well-quasi-ordered by the induced subgraph relation in the presence of a labelling of the vertices, where the class is given by an $\mathsf{MSO}$-transduction from finite words. This study leverages tools from automata theory, and the proof scheme allows to derive a weak version of the Pouzet conjecture for classes of bounded linear clique-width. We also provide an automata based characterization of which classes of $\mathsf{NLC}$ graphs are labelled-well-quasi-ordered by the induced subgraph relation, where we recover the results of Daligault Rao and Thomass\'e by encoding the models into trees with the gap embedding relation of Dershowitz and Tzameret.
[211] arXiv:2405.10902 [pdf, other]: Title: Where do developers admit their security-related concerns?

Authors: Moritz Mock, Thomas Forrer, Barbara Russo

Comments: Accepted as an extended abstract for XP'24

Subjects: Software Engineering (cs.SE)

Developers use different means to document the security concerns of their code. Because of all of these opportunities, they may forget where the information is stored, or others may not be aware of it, and leave it unmaintained for so long that it becomes obsolete, if not useless. In this work, we analyzed different sources of code documentation from four large-scale, real-world, open-source projects in an industrial setting to understand where developers report their security concerns. In particular, we manually inspected 2.559 instances taken from source code comments, commit messages, and issue trackers. Overall, we found that developers prefer to document security concerns in source code comments and issue trackers. We also found that the longer the comments stay unfixed, the more likely they remain unfixed. Thus, to create awareness among developers, we implemented a pipeline to remind them about the introduction or removal of comments pointing to a security problem.
[212] arXiv:2405.10904 [pdf, other]: Title: Broadening Privacy and Surveillance: Eliciting Interconnected Values with a Scenarios Workbook on Smart Home Cameras

Authors: Richmond Y. Wong, Jason Caleb Valdez, Ashten Alexander, Ariel Chiang, Olivia Quesada, James Pierce

Comments: Proceedings of the 2023 ACM Designing Interactive Systems Conference (DIS '23)

Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)

We use a design workbook of speculative scenarios as a values elicitation activity with 14 participants. The workbook depicts use case scenarios with smart home camera technologies that involve surveillance and uneven power relations. The scenarios were initially designed by the researchers to explore scenarios of privacy and surveillance within three social relationships involving "primary" and "non-primary" users: Parents-Children, Landlords-Tenants, and Residents-Domestic Workers. When the scenarios were utilized as part of a values elicitation activity with participants, we found that they reflected on a broader set of interconnected social values beyond privacy and surveillance, including autonomy and agency, physical safety, property rights, trust and accountability, and fairness. The paper suggests that future research about ethical issues in smart homes should conceptualize privacy as interconnected with a broader set of social values (which can align or be in tension with privacy), and reflects on considerations for doing research with non-primary users.
[213] arXiv:2405.10906 [pdf, other]: Title: POSTER: Testing network-based RTK for GNSS receiver security

Authors: Marco Spanghero, Panos Papadimitratos

Comments: To appear in the 17th ACM Conference on Security and Privacy in Wireless and Mobile Networks

Subjects: Cryptography and Security (cs.CR)

Global Navigation Satellite Systems (GNSS) provide precise location, while Real Time Kinematics (RTK) allow mobile receivers (termed rovers), leveraging fixed stations, to correct errors in their Position Navigation and Timing (PNT) solution. This allows compensating for multi-path effects, ionospheric errors, and observation biases, enabling consumer receivers to achieve centimeter-level accuracy. While network distribution of correction streams can be protected with common secure networking practices, the reference stations can still be attacked by GNSS spoofing or jamming. This work investigates (i) the effect RTK reference station spoofing has on the rover's PNT solution quality and (ii) the potential countermeasures towards hardening the RTK infrastructure.
[214] arXiv:2405.10912 [pdf, other]: Title: Synthesis of Temporal Causality

Authors: Bernd Finkbeiner, Hadar Frenkel, Niklas Metzger, Julian Siber

Comments: 36th International Conference on Computer Aided Verification (CAV 2024)

Subjects: Logic in Computer Science (cs.LO)

We present an automata-based algorithm to synthesize omega-regular causes for omega-regular effects on executions of a reactive system, such as counterexamples uncovered by a model checker. Our theory is a generalization of temporal causality, which has recently been proposed as a framework for drawing causal relationships between trace properties on a given trace. So far, algorithms exist only for verifying a single causal relationship and, as an extension, cause synthesis through enumeration, which is complete only for a small fragment of effect properties. This work presents the first complete cause-synthesis algorithm for the class of omega-regular effects. We show that in this case, causes are guaranteed to be omega-regular themselves and can be computed as, e.g., nondeterministic B\"uchi automata. We demonstrate the practical feasibility of this algorithm with a prototype tool and evaluate its performance for cause synthesis and cause checking.
[215] arXiv:2405.10913 [pdf, other]: Title: Blackbox Adaptation for Medical Image Segmentation

Authors: Jay N. Paranjape, Shameema Sikder, S. Swaroop Vedula, Vishal M. Patel

Comments: Accepted early at MICCAI 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent years, various large foundation models have been proposed for image segmentation. There models are often trained on large amounts of data corresponding to general computer vision tasks. Hence, these models do not perform well on medical data. There have been some attempts in the literature to perform parameter-efficient finetuning of such foundation models for medical image segmentation. However, these approaches assume that all the parameters of the model are available for adaptation. But, in many cases, these models are released as APIs or blackboxes, with no or limited access to the model parameters and data. In addition, finetuning methods also require a significant amount of compute, which may not be available for the downstream task. At the same time, medical data can't be shared with third-party agents for finetuning due to privacy reasons. To tackle these challenges, we pioneer a blackbox adaptation technique for prompted medical image segmentation, called BAPS. BAPS has two components - (i) An Image-Prompt decoder (IP decoder) module that generates visual prompts given an image and a prompt, and (ii) A Zero Order Optimization (ZOO) Method, called SPSA-GC that is used to update the IP decoder without the need for backpropagating through the foundation model. Thus, our method does not require any knowledge about the foundation model's weights or gradients. We test BAPS on four different modalities and show that our method can improve the original model's performance by around 4%.
[216] arXiv:2405.10918 [pdf, other]: Title: GenToC: Leveraging Partially-Labeled Data for Product Attribute-Value Identification

Authors: D. Subhalingam, Keshav Kolluru, Mausam, Saurabh Singal

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)

In the e-commerce domain, the accurate extraction of attribute-value pairs from product listings (e.g., Brand: Apple) is crucial for enhancing search and recommendation systems. The automation of this extraction process is challenging due to the vast diversity of product categories and their respective attributes, compounded by the lack of extensive, accurately annotated training datasets and the demand for low latency to meet the real-time needs of e-commerce platforms. To address these challenges, we introduce GenToC, a novel two-stage model for extracting attribute-value pairs from product titles. GenToC is designed to train with partially-labeled data, leveraging incomplete attribute-value pairs and obviating the need for a fully annotated dataset. Moreover, we introduce a bootstrapping method that enables GenToC to progressively refine and expand its training dataset. This enhancement substantially improves the quality of data available for training other neural network models that are typically faster but are inherently less capable than GenToC in terms of their capacity to handle partially-labeled data. By supplying an enriched dataset for training, GenToC significantly advances the performance of these alternative models, making them more suitable for real-time deployment. Our results highlight the unique capability of GenToC to learn from a limited set of labeled data and to contribute to the training of more efficient models, marking a significant leap forward in the automated extraction of attribute-value pairs from product titles. GenToC has been successfully integrated into India's largest B2B e-commerce platform, IndiaMART.com, achieving a significant increase of 21.1% in recall over the existing deployed system while maintaining a high precision of 89.5% in this challenging task.
[217] arXiv:2405.10923 [pdf, other]: Title: Randomized Householder QR

Authors: Laura Grigori, Edouard Timsit

Subjects: Numerical Analysis (math.NA)

This paper introduces a randomized Householder QR factorization (RHQR). This factorization can be used to obtain a well conditioned basis of a set of vectors and thus can be employed in a variety of applications. The RHQR factorization of the input matrix $W$ is equivalent to the standard Householder QR factorization of matrix $\Psi W$, where $\Psi$ is a sketching matrix that can be obtained from any subspace embedding technique. For this reason, the RHQR algorithm can also be reconstructed from the Householder QR factorization of the sketched problem, yielding a single-synchronization randomized QR factorization (reconstructRHQR). In most contexts, left-looking RHQR requires a single synchronization per iteration, with half the computational cost of Householder QR, and a similar cost to Randomized Gram-Schmidt (RGS) overall. We discuss the usage of RHQR factorization in the Arnoldi process and then in GMRES, showing thus how it can be used in Krylov subspace methods to solve systems of linear equations. Based on Charles Sheffield's connection between Householder QR and Modified Gram-Schmidt (MGS), a BLAS2-RGS is also derived.
Numerical experiments show that RHQR produces a well conditioned basis whose sketch is numerically orthogonal even for the most difficult inputs, and an accurate factorization. The same results were observed with the high-dimensional operations made in half-precision. The reconstructed RHQR from the HQR factorization of the sketch was stabler than the standard Randomized Cholesky QR.
The first version of this work was made available on HAL on the 7th of July 2023 and can be found at https://hal.science/hal-04156310/
[218] arXiv:2405.10924 [pdf, other]: Title: Boosting Few-Pixel Robustness Verification via Covering Verification Designs

Authors: Yuval Shapira, Naor Wiesel, Shahar Shabelman, Dana Drachsler-Cohen

Subjects: Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)

Proving local robustness is crucial to increase the reliability of neural networks. While many verifiers prove robustness in $L_\infty$ $\epsilon$-balls, very little work deals with robustness verification in $L_0$ $\epsilon$-balls, capturing robustness to few pixel attacks. This verification introduces a combinatorial challenge, because the space of pixels to perturb is discrete and of exponential size. A previous work relies on covering designs to identify sets for defining $L_\infty$ neighborhoods, which if proven robust imply that the $L_0$ $\epsilon$-ball is robust. However, the number of neighborhoods to verify remains very high, leading to a high analysis time. We propose covering verification designs, a combinatorial design that tailors effective but analysis-incompatible coverings to $L_0$ robustness verification. The challenge is that computing a covering verification design introduces a high time and memory overhead, which is intensified in our setting, where multiple candidate coverings are required to identify how to reduce the overall analysis time. We introduce CoVerD, an $L_0$ robustness verifier that selects between different candidate coverings without constructing them, but by predicting their block size distribution. This prediction relies on a theorem providing closed-form expressions for the mean and variance of this distribution. CoVerD constructs the chosen covering verification design on-the-fly, while keeping the memory consumption minimal and enabling to parallelize the analysis. The experimental results show that CoVerD reduces the verification time on average by up to 5.1x compared to prior work and that it scales to larger $L_0$ $\epsilon$-balls.
[219] arXiv:2405.10927 [pdf, other]: Title: Using Degeneracy in the Loss Landscape for Mechanistic Interpretability

Authors: Lucius Bushnaq, Jake Mendel, Stefan Heimersheim, Dan Braun, Nicholas Goldowsky-Dill, Kaarel Hänni, Cindy Wu, Marius Hobbhahn

Subjects: Machine Learning (cs.LG)

Mechanistic Interpretability aims to reverse engineer the algorithms implemented by neural networks by studying their weights and activations. An obstacle to reverse engineering neural networks is that many of the parameters inside a network are not involved in the computation being implemented by the network. These degenerate parameters may obfuscate internal structure. Singular learning theory teaches us that neural network parameterizations are biased towards being more degenerate, and parameterizations with more degeneracy are likely to generalize further. We identify 3 ways that network parameters can be degenerate: linear dependence between activations in a layer; linear dependence between gradients passed back to a layer; ReLUs which fire on the same subset of datapoints. We also present a heuristic argument that modular networks are likely to be more degenerate, and we develop a metric for identifying modules in a network that is based on this argument. We propose that if we can represent a neural network in a way that is invariant to reparameterizations that exploit the degeneracies, then this representation is likely to be more interpretable, and we provide some evidence that such a representation is likely to have sparser interactions. We introduce the Interaction Basis, a tractable technique to obtain a representation that is invariant to degeneracies from linear dependence of activations or Jacobians.
[220] arXiv:2405.10928 [pdf, other]: Title: The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

Authors: Lucius Bushnaq, Stefan Heimersheim Nicholas Goldowsky-Dill, Dan Braun, Jake Mendel, Kaarel Hänni, Avery Griffin, Jörn Stöhler, Magdalena Wache, Marius Hobbhahn

Subjects: Machine Learning (cs.LG)

Mechanistic interpretability aims to understand the behavior of neural networks by reverse-engineering their internal computations. However, current methods struggle to find clear interpretations of neural network activations because a decomposition of activations into computational features is missing. Individual neurons or model components do not cleanly correspond to distinct features or functions. We present a novel interpretability method that aims to overcome this limitation by transforming the activations of the network into a new basis - the Local Interaction Basis (LIB). LIB aims to identify computational features by removing irrelevant activations and interactions. Our method drops irrelevant activation directions and aligns the basis with the singular vectors of the Jacobian matrix between adjacent layers. It also scales features based on their importance for downstream computation, producing an interaction graph that shows all computationally-relevant features and interactions in a model. We evaluate the effectiveness of LIB on modular addition and CIFAR-10 models, finding that it identifies more computationally-relevant features that interact more sparsely, compared to principal component analysis. However, LIB does not yield substantial improvements in interpretability or interaction sparsity when applied to language models. We conclude that LIB is a promising theory-driven approach for analyzing neural networks, but in its current form is not applicable to large language models.
[221] arXiv:2405.10931 [pdf, other]: Title: FitNets: An Adaptive Framework to Learn Accurate Traffic Distributions

Authors: Alexander Dietmüller, Laurent Vanbever

Subjects: Networking and Internet Architecture (cs.NI)

Learning precise distributions of traffic features (e.g., burst sizes, packet inter-arrival time) is still a largely unsolved problem despite being critical for management tasks such as capacity planning or anomaly detection. A key limitation nowadays is the lack of feedback between the control plane and the data plane. Programmable data planes offer the opportunity to create systems that let data- and control plane to work together, compensating their respective shortcomings.
We present FitNets, an adaptive network monitoring system leveraging feedback between the data- and the control plane to learn accurate traffic distributions. In the control plane, FitNets relies on Kernel Density Estimators which allow to provably learn distributions of any shape. In the data plane, FitNets tests the accuracy of the learned distributions while dynamically adapting data collection to the observed distribution fitness, prioritizing under-fitted features.
We have implemented FitNets in Python and P4 (including on commercially available programmable switches) and tested it on real and synthetic traffic traces. FitNets is practical: it is able to estimate hundreds of distributions from up to 60 millions samples per second, while providing accurate error estimates and adapting to complex traffic patterns.
[222] arXiv:2405.10934 [pdf, other]: Title: Reconstruction of Manipulated Garment with Guided Deformation Prior

Authors: Ren Li, Corentin Dumery, Zhantao Deng, Pascal Fua

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Modeling the shape of garments has received much attention, but most existing approaches assume the garments to be worn by someone, which constrains the range of shapes they can assume. In this work, we address shape recovery when garments are being manipulated instead of worn, which gives rise to an even larger range of possible shapes. To this end, we leverage the implicit sewing patterns (ISP) model for garment modeling and extend it by adding a diffusion-based deformation prior to represent these shapes. To recover 3D garment shapes from incomplete 3D point clouds acquired when the garment is folded, we map the points to UV space, in which our priors are learned, to produce partial UV maps, and then fit the priors to recover complete UV maps and 2D to 3D mappings. Experimental results demonstrate the superior reconstruction accuracy of our method compared to previous ones, especially when dealing with large non-rigid deformations arising from the manipulations.
[223] arXiv:2405.10936 [pdf, other]: Title: A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers

Authors: Kaiyu Huang, Fengran Mo, Hongliang Li, You Li, Yuanchi Zhang, Weijian Yi, Yulong Mao, Jinchen Liu, Yuzhuang Xu, Jinan Xu, Jian-Yun Nie, Yang Liu

Comments: 54 pages, Work in Progress

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing, attracting global attention in both academia and industry. To mitigate potential discrimination and enhance the overall usability and accessibility for diverse language user groups, it is important for the development of language-fair technology. Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient, where a comprehensive survey to summarize recent approaches, developments, limitations, and potential solutions is desirable. To this end, we provide a survey with multiple perspectives on the utilization of LLMs in the multilingual scenario. We first rethink the transitions between previous and current research on pre-trained language models. Then we introduce several perspectives on the multilingualism of LLMs, including training and inference methods, model security, multi-domain with language culture, and usage of datasets. We also discuss the major challenges that arise in these aspects, along with possible solutions. Besides, we highlight future research directions that aim at further enhancing LLMs with multilingualism. The survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
[224] arXiv:2405.10938 [pdf, other]: Title: Observational Scaling Laws and the Predictability of Language Model Performance

Authors: Yangjun Ruan, Chris J. Maddison, Tatsunori Hashimoto

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)

Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of training models across many different scales has limited their use. We propose an alternative, observational approach that bypasses model training and instead builds scaling laws from ~80 publically available models. Building a single scaling law from multiple model families is challenging due to large variations in their training compute efficiencies and capabilities. However, we show that these variations are consistent with a simple, generalized scaling law where language model performance is a function of a low-dimensional capability space, and model families only vary in their efficiency in converting training compute to capabilities. Using this approach, we show the surprising predictability of complex scaling phenomena: we show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models; we show that the agent performance of models such as GPT-4 can be precisely predicted from simpler non-agentic benchmarks; and we show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
[225] arXiv:2405.10939 [pdf, other]: Title: DINO as a von Mises-Fisher mixture model

Authors: Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten

Comments: Accepted to ICLR 2023

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Self-distillation methods using Siamese networks are popular for self-supervised pre-training. DINO is one such method based on a cross-entropy loss between $K$-dimensional probability vectors, obtained by applying a softmax function to the dot product between representations and learnt prototypes. Given the fact that the learned representations are $L^2$-normalized, we show that DINO and its derivatives, such as iBOT, can be interpreted as a mixture model of von Mises-Fisher components. With this interpretation, DINO assumes equal precision for all components when the prototypes are also $L^2$-normalized. Using this insight we propose DINO-vMF, that adds appropriate normalization constants when computing the cluster assignment probabilities. Unlike DINO, DINO-vMF is stable also for the larger ViT-Base model with unnormalized prototypes. We show that the added flexibility of the mixture model is beneficial in terms of better image representations. The DINO-vMF pre-trained model consistently performs better than DINO on a range of downstream tasks. We obtain similar improvements for iBOT-vMF vs iBOT and thereby show the relevance of our proposed modification also for other methods derived from DINO.

Cross-lists for Mon, 20 May 24

[226] arXiv:2405.09843 (cross-list from econ.TH) [pdf, other]: Title: Organizational Selection of Innovation

Authors: Lucas Böttcher, Ronald Klingebiel

Comments: 40 pages, 13 figures, 2 tables

Subjects: Theoretical Economics (econ.TH); Multiagent Systems (cs.MA); Physics and Society (physics.soc-ph); Applications (stat.AP)

Budgetary constraints force organizations to pursue only a subset of possible innovation projects. Identifying which subset is most promising is an error-prone exercise, and involving multiple decision makers may be prudent. This raises the question of how to most effectively aggregate their collective nous. Our model of organizational portfolio selection provides some first answers. We show that portfolio performance can vary widely. Delegating evaluation makes sense when organizations employ the relevant experts and can assign projects to them. In most other settings, aggregating the impressions of multiple agents leads to better performance than delegation. In particular, letting agents rank projects often outperforms alternative aggregation rules -- including averaging agents' project scores as well as counting their approval votes -- especially when organizations have tight budgets and can select only a few project alternatives out of many.
[227] arXiv:2405.09993 (cross-list from hep-th) [pdf, other]: Title: Learning BPS Spectra and the Gap Conjecture

Authors: Sergei Gukov, Rak-Kyeong Seong

Comments: 11 pages, 4 figures, 3 tables

Subjects: High Energy Physics - Theory (hep-th); Machine Learning (cs.LG); Mathematical Physics (math-ph); Geometric Topology (math.GT)

We explore statistical properties of BPS q-series for 3d N=2 strongly coupled supersymmetric theories that correspond to a particular family of 3-manifolds Y. We discover that gaps between exponents in the q-series are statistically more significant at the beginning of the q-series compared to gaps that appear in higher powers of q. Our observations are obtained by calculating saliencies of q-series features used as input data for principal component analysis, which is a standard example of an explainable machine learning technique that allows for a direct calculation and a better analysis of feature saliencies.
[228] arXiv:2405.10322 (cross-list from physics.soc-ph) [pdf, ps, other]: Title: Exploring the Independent Cascade Model and Its Evolution in Social Network Information Diffusion

Authors: Jixuan He, Yutong Guo, Jiacheng Zhao

Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI)

This paper delves into the paramount significance of information dissemination within the dynamic realm of social networks. It underscores the pivotal role of information communication models in unraveling the intricacies of data propagation in the digital age. By shedding light on the profound influence of these models, it not only lays the groundwork for exploring various hierarchies and their manifestations but also serves as a catalyst for further research in this formidable field.
[229] arXiv:2405.10327 (cross-list from physics.geo-ph) [pdf, other]: Title: WISER: multimodal variational inference for full-waveform inversion without dimensionality reduction

Authors: Ziyi Yin, Rafael Orozco, Felix J. Herrmann

Subjects: Geophysics (physics.geo-ph); Computational Engineering, Finance, and Science (cs.CE)

We present a semi-amortized variational inference framework designed for computationally feasible uncertainty quantification in 2D full-waveform inversion to explore the multimodal posterior distribution without dimensionality reduction. The framework is called WISER, short for full-Waveform variational Inference via Subsurface Extensions with Refinements. WISER leverages the power of generative artificial intelligence to perform approximate amortized inference that is low-cost albeit showing an amortization gap. This gap is closed through non-amortized refinements that make frugal use of acoustic wave physics. Case studies illustrate that WISER is capable of full-resolution, computationally feasible, and reliable uncertainty estimates of velocity models and imaged reflectivities.
[230] arXiv:2405.10329 (cross-list from stat.AP) [pdf, ps, other]: Title: Causal inference approach to appraise long-term effects of maintenance policy on functional performance of asphalt pavements

Authors: Lingyun You, Nanning Guo, Zhengwu Long, Fusong Wang, Chundi Si, Aboelkasim Diab

Subjects: Applications (stat.AP); Artificial Intelligence (cs.AI)

Asphalt pavements as the most prevalent transportation infrastructure, are prone to serious traffic safety problems due to functional or structural damage caused by stresses or strains imposed through repeated traffic loads and continuous climatic cycles. The good quality or high serviceability of infrastructure networks is vital to the urbanization and industrial development of nations. In order to maintain good functional pavement performance and extend the service life of asphalt pavements, the long-term performance of pavements under maintenance policies needs to be evaluated and favorable options selected based on the condition of the pavement. A major challenge in evaluating maintenance policies is to produce valid treatments for the outcome assessment under the control of uncertainty of vehicle loads and the disturbance of freeze-thaw cycles in the climatic environment. In this study, a novel causal inference approach combining a classical causal structural model and a potential outcome model framework is proposed to appraise the long-term effects of four preventive maintenance treatments for longitudinal cracking over a 5-year period of upkeep. Three fundamental issues were brought to our attention: 1) detection of causal relationships prior to variables under environmental loading (identification of causal structure); 2) obtaining direct causal effects of treatment on outcomes excluding covariates (identification of causal effects); and 3) sensitivity analysis of causal relationships. The results show that the method can accurately evaluate the effect of preventive maintenance treatments and assess the maintenance time to cater well for the functional performance of different preventive maintenance approaches. This framework could help policymakers to develop appropriate maintenance strategies for pavements.
[231] arXiv:2405.10343 (cross-list from q-bio.BM) [pdf, other]: Title: UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning

Authors: Shikun Feng, Yuyan Ni, Minghao Li, Yanwen Huang, Zhi-Ming Ma, Wei-Ying Ma, Yanyan Lan

Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI)

Recently, a noticeable trend has emerged in developing pre-trained foundation models in the domains of CV and NLP. However, for molecular pre-training, there lacks a universal model capable of effectively applying to various categories of molecular tasks, since existing prevalent pre-training methods exhibit effectiveness for specific types of downstream tasks. Furthermore, the lack of profound understanding of existing pre-training methods, including 2D graph masking, 2D-3D contrastive learning, and 3D denoising, hampers the advancement of molecular foundation models. In this work, we provide a unified comprehension of existing pre-training methods through the lens of contrastive learning. Thus their distinctions lie in clustering different views of molecules, which is shown beneficial to specific downstream tasks. To achieve a complete and general-purpose molecular representation, we propose a novel pre-training framework, named UniCorn, that inherits the merits of the three methods, depicting molecular views in three different levels. SOTA performance across quantum, physicochemical, and biological tasks, along with comprehensive ablation study, validate the universality and effectiveness of UniCorn.
[232] arXiv:2405.10345 (cross-list from q-bio.QM) [pdf, other]: Title: Machine Learning Driven Biomarker Selection for Medical Diagnosis

Authors: Divyagna Bavikadi, Ayushi Agarwal, Shashank Ganta, Yunro Chung, Lusheng Song, Ji Qiu, Paulo Shakarian

Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Recent advances in experimental methods have enabled researchers to collect data on thousands of analytes simultaneously. This has led to correlational studies that associated molecular measurements with diseases such as Alzheimer's, Liver, and Gastric Cancer. However, the use of thousands of biomarkers selected from the analytes is not practical for real-world medical diagnosis and is likely undesirable due to potentially formed spurious correlations. In this study, we evaluate 4 different methods for biomarker selection and 4 different machine learning (ML) classifiers for identifying correlations, evaluating 16 approaches in all. We found that contemporary methods outperform previously reported logistic regression in cases where 3 and 10 biomarkers are permitted. When specificity is fixed at 0.9, ML approaches produced a sensitivity of 0.240 (3 biomarkers) and 0.520 (10 biomarkers), while standard logistic regression provided a sensitivity of 0.000 (3 biomarkers) and 0.040 (10 biomarkers). We also noted that causal-based methods for biomarker selection proved to be the most performant when fewer biomarkers were permitted, while univariate feature selection was the most performant when a greater number of biomarkers were permitted.
[233] arXiv:2405.10348 (cross-list from q-bio.QM) [pdf, other]: Title: Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning

Authors: Lirong Wu, Yijun Tian, Haitao Lin, Yufei Huang, Siyuan Li, Nitesh V Chawla, Stan Z. Li

Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Protein-protein bindings play a key role in a variety of fundamental biological processes, and thus predicting the effects of amino acid mutations on protein-protein binding is crucial. To tackle the scarcity of annotated mutation data, pre-training with massive unlabeled data has emerged as a promising solution. However, this process faces a series of challenges: (1) complex higher-order dependencies among multiple (more than paired) structural scales have not yet been fully captured; (2) it is rarely explored how mutations alter the local conformation of the surrounding microenvironment; (3) pre-training is costly, both in data size and computational burden. In this paper, we first construct a hierarchical prompt codebook to record common microenvironmental patterns at different structural scales independently. Then, we develop a novel codebook pre-training task, namely masked microenvironment modeling, to model the joint distribution of each mutation with their residue types, angular statistics, and local conformational changes in the microenvironment. With the constructed prompt codebook, we encode the microenvironment around each mutation into multiple hierarchical prompts and combine them to flexibly provide information to wild-type and mutated protein complexes about their microenvironmental differences. Such a hierarchical prompt learning framework has demonstrated superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction and a case study of optimizing human antibodies against SARS-CoV-2.
[234] arXiv:2405.10355 (cross-list from physics.soc-ph) [pdf, other]: Title: Assessing the Impact of Case Correction Methods on the Fairness of COVID-19 Predictive Models

Authors: Daniel Smolyak, Saad Abrar, Naman Awasthi, Vanessa Frias-Martinez

Subjects: Physics and Society (physics.soc-ph); Computers and Society (cs.CY)

One of the central difficulties of addressing the COVID-19 pandemic has been accurately measuring and predicting the spread of infections. In particular, official COVID-19 case counts in the United States are under counts of actual caseloads due to the absence of universal testing policies. Researchers have proposed a variety of methods for recovering true caseloads, often through the estimation of statistical models on more reliable measures, such as death and hospitalization counts, positivity rates, and demographics. However, given the disproportionate impact of COVID-19 on marginalized racial, ethnic, and socioeconomic groups, it is important to consider potential unintended effects of case correction methods on these groups. Thus, we investigate two of these correction methods for their impact on a downstream COVID-19 case prediction task. For that purpose, we tailor an auditing approach and evaluation protocol to analyze the fairness of the COVID-19 prediction task by measuring the difference in model performance between majority-White counties and majority-minority counties. We find that one of the correction methods improves fairness, decreasing differences in performance between majority-White and majority-minority counties, while the other method increases differences, introducing bias. While these results are mixed, it is evident that correction methods have the potential to exacerbate existing biases in COVID-19 case data and in downstream prediction tasks. Researchers planning to develop or use case correction methods must be careful to consider negative effects on marginalized groups.
[235] arXiv:2405.10360 (cross-list from quant-ph) [pdf, other]: Title: Adversarial Robustness Guarantees for Quantum Classifiers

Authors: Neil Dowling, Maxwell T. West, Angus Southwell, Azar C. Nakhl, Martin Sevior, Muhammad Usman, Kavan Modi

Comments: 9+12 pages, 3 figures. Comments welcome

Subjects: Quantum Physics (quant-ph); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD)

Despite their ever more widespread deployment throughout society, machine learning algorithms remain critically vulnerable to being spoofed by subtle adversarial tampering with their input data. The prospect of near-term quantum computers being capable of running {quantum machine learning} (QML) algorithms has therefore generated intense interest in their adversarial vulnerability. Here we show that quantum properties of QML algorithms can confer fundamental protections against such attacks, in certain scenarios guaranteeing robustness against classically-armed adversaries. We leverage tools from many-body physics to identify the quantum sources of this protection. Our results offer a theoretical underpinning of recent evidence which suggest quantum advantages in the search for adversarial robustness. In particular, we prove that quantum classifiers are: (i) protected against weak perturbations of data drawn from the trained distribution, (ii) protected against local attacks if they are insufficiently scrambling, and (iii) protected against universal adversarial attacks if they are sufficiently quantum chaotic. Our analytic results are supported by numerical evidence demonstrating the applicability of our theorems and the resulting robustness of a quantum classifier in practice. This line of inquiry constitutes a concrete pathway to advantage in QML, orthogonal to the usually sought improvements in model speed or accuracy.
[236] arXiv:2405.10369 (cross-list from astro-ph.IM) [pdf, other]: Title: Reinforcement learning

Authors: Sarod Yatawatta

Comments: To appear, Astronomy & Computing

Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Observing celestial objects and advancing our scientific knowledge about them involves tedious planning, scheduling, data collection and data post-processing. Many of these operational aspects of astronomy are guided and executed by expert astronomers. Reinforcement learning is a mechanism where we (as humans and astronomers) can teach agents of artificial intelligence to perform some of these tedious tasks. In this paper, we will present a state of the art overview of reinforcement learning and how it can benefit astronomy.
[237] arXiv:2405.10399 (cross-list from stat.ML) [pdf, ps, other]: Title: A note on continuous-time online learning

Authors: Lexing Ying

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)

In online learning, the data is provided in a sequential order, and the goal of the learner is to make online decisions to minimize overall regrets. This note is concerned with continuous-time models and algorithms for several online learning problems: online linear optimization, adversarial bandit, and adversarial linear bandit. For each problem, we extend the discrete-time algorithm to the continuous-time setting and provide a concise proof of the optimal regret bound.
[238] arXiv:2405.10414 (cross-list from math.OC) [pdf, ps, other]: Title: A Reliability Theory of Compromise Decisions for Large-Scale Stochastic Programs

Authors: Shuotao Diao, Suvrajeet Sen

Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

Stochastic programming models can lead to very large-scale optimization problems for which it may be impossible to enumerate all possible scenarios. In such cases, one adopts a sampling-based solution methodology in which case the reliability of the resulting decisions may be suspect. For such instances, it is advisable to adopt methodologies that promote variance reduction. One such approach goes under a framework known as "compromise decision", which requires multiple replications of the solution procedure. This paper studies the reliability of stochastic programming solutions resulting from the "compromise decision" process. This process is characterized by minimizing an aggregation of objective function approximations across replications, presumably conducted in parallel. We refer to the post-parallel-processing problem as the problem of "compromise decision". We quantify the reliability of compromise decisions by estimating the expectation and variance of the "pessimistic distance" of sampled instances from the set of true optimal decisions. Such pessimistic distance is defined as an estimate of the largest possible distance of the solution of the sampled instance from the "true" optimal solution set. The Rademacher average of instances is used to bound the sample complexity of the compromise decision.
[239] arXiv:2405.10438 (cross-list from math.OC) [pdf, ps, other]: Title: Optimization-Aided Construction of Multivariate Chebyshev Polynomials

Authors: Mareike Dressler, Simon Foucart, Etienne de Klerk, Mioara Joldes, Jean Bernard Lasserre, Yuan Xu

Subjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)

This article is concerned with an extension of univariate Chebyshev polynomials of the first kind to the multivariate setting, where one chases best approximants to specific monomials by polynomials of lower degree relative to the uniform norm. Exploiting the Moment-SOS hierarchy, we devise a versatile semidefinite-programming-based procedure to compute such best approximants, as well as associated signatures. Applying this procedure in three variables leads to the values of best approximation errors for all mononials up to degree six on the euclidean ball, the simplex, and the cross-polytope. Furthermore, inspired by numerical experiments, we obtain explicit expressions for Chebyshev polynomials in two cases unresolved before, namely for the monomial $x_1^2 x_2^2 x_3$ on the euclidean ball and for the monomial $x_1^2 x_2 x_3$ on the simplex.
[240] arXiv:2405.10442 (cross-list from physics.flu-dyn) [pdf, ps, other]: Title: Data-driven low-dimensional model of a sedimenting flexible fiber

Authors: Andrew J Fox, Michael D. Graham

Subjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG)

The dynamics of flexible filaments entrained in flow, important for understanding many biological and industrial processes, are computationally expensive to model with full-physics simulations. This work describes a data-driven technique to create high-fidelity low-dimensional models of flexible fiber dynamics using machine learning; the technique is applied to sedimentation in a quiescent, viscous Newtonian fluid, using results from detailed simulations as the data set. The approach combines an autoencoder neural network architecture to learn a low-dimensional latent representation of the filament shape, with a neural ODE that learns the evolution of the particle in the latent state. The model was designed to model filaments of varying flexibility, characterized by an elasto-gravitational number $\mathcal{B}$, and was trained on a data set containing the evolution of fibers beginning at set angles of inclination. For the range of $\mathcal{B}$ considered here (100-10000), the filament shape dynamics can be represented with high accuracy with only four degrees of freedom, in contrast to the 93 present in the original bead-spring model used to generate the dynamic trajectories. We predict the evolution of fibers set at arbitrary angles and demonstrate that our data-driven model can accurately forecast the evolution of a fiber at both trained and untrained elasto-gravitational numbers.
[241] arXiv:2405.10448 (cross-list from cond-mat.mtrl-sci) [pdf, other]: Title: Dynamic In-context Learning with Conversational Models for Data Extraction and Materials Property Prediction

Authors: Chinedu Ekuma

Subjects: Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph)

The advent of natural language processing and large language models (LLMs) has revolutionized the extraction of data from unstructured scholarly papers. However, ensuring data trustworthiness remains a significant challenge. In this paper, we introduce PropertyExtractor, an open-source tool that leverages advanced conversational LLMs like Google Gemini-Pro and OpenAI GPT-4, blends zero-shot with few-shot in-context learning, and employs engineered prompts for the dynamic refinement of structured information hierarchies, enabling autonomous, efficient, scalable, and accurate identification, extraction, and verification of material property data. Our tests on material data demonstrate precision and recall exceeding 93% with an error rate of approximately 10%, highlighting the effectiveness and versatility of the toolkit. We apply PropertyExtractor to generate a database of 2D material thicknesses, a critical parameter for device integration. The rapid evolution of the field has outpaced both experimental measurements and computational methods, creating a significant data gap. Our work addresses this gap and showcases the potential of PropertyExtractor as a reliable and efficient tool for the autonomous generation of diverse material property databases, advancing the field.
[242] arXiv:2405.10449 (cross-list from econ.EM) [pdf, other]: Title: Optimal Text-Based Time-Series Indices

Authors: David Ardia, Keven Bluteau

Subjects: Econometrics (econ.EM); Artificial Intelligence (cs.AI); Computational Finance (q-fin.CP)

We propose an approach to construct text-based time-series indices in an optimal way--typically, indices that maximize the contemporaneous relation or the predictive performance with respect to a target variable, such as inflation. We illustrate our methodology with a corpus of news articles from the Wall Street Journal by optimizing text-based indices focusing on tracking the VIX index and inflation expectations. Our results highlight the superior performance of our approach compared to existing indices.
[243] arXiv:2405.10490 (cross-list from stat.ME) [pdf, ps, other]: Title: Neural Optimization with Adaptive Heuristics for Intelligent Marketing System

Authors: Changshuai Wei, Benjamin Zelditch, Joyce Chen, Andre Assuncao Silva T Ribeiro, Jingyi Kenneth Tay, Borja Ocejo Elizondo, Keerthi Selvaraj, Aman Gupta, Licurgo Benemann De Almeida

Comments: KDD 2024

Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Optimization and Control (math.OC)

Computational marketing has become increasingly important in today's digital world, facing challenges such as massive heterogeneous data, multi-channel customer journeys, and limited marketing budgets. In this paper, we propose a general framework for marketing AI systems, the Neural Optimization with Adaptive Heuristics (NOAH) framework. NOAH is the first general framework for marketing optimization that considers both to-business (2B) and to-consumer (2C) products, as well as both owned and paid channels. We describe key modules of the NOAH framework, including prediction, optimization, and adaptive heuristics, providing examples for bidding and content optimization. We then detail the successful application of NOAH to LinkedIn's email marketing system, showcasing significant wins over the legacy ranking system. Additionally, we share details and insights that are broadly useful, particularly on: (i) addressing delayed feedback with lifetime value, (ii) performing large-scale linear programming with randomization, (iii) improving retrieval with audience expansion, (iv) reducing signal dilution in targeting tests, and (v) handling zero-inflated heavy-tail metrics in statistical testing.
[244] arXiv:2405.10550 (cross-list from eess.IV) [pdf, other]: Title: LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion

Authors: Tong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, Luping Zhou

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Advances in endoscopy use in surgeries face challenges like inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement in the medical field. However, DDPMs are computationally demanding and slow, limiting their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It adopts a T-shape model architecture to capture global structural information using low-resolution images and gradually recover the details in subsequent denoising steps. We further prone the model to significantly reduce the model size while retaining performance. While discarding certain downsampling operations to save parameters leads to instability and low efficiency in convergence during the training, we introduce a Temporal Light Unit (TLU), a plug-and-play module, for more stable training and better performance. TLU associates time steps with denoised image features, establishing temporal dependencies of the denoising steps and improving denoising outcomes. Moreover, while recovering images using the diffusion model, potential spectral shifts were noted. We further introduce a Chroma Balancer (CB) to mitigate this issue. Our LighTDiff outperforms many competitive LLIE methods with exceptional computational efficiency.
[245] arXiv:2405.10552 (cross-list from stat.ML) [pdf, other]: Title: Data Science Principles for Interpretable and Explainable AI

Authors: Kris Sankaran

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Society's capacity for algorithmic problem-solving has never been greater. Artificial Intelligence is now applied across more domains than ever, a consequence of powerful abstractions, abundant data, and accessible software. As capabilities have expanded, so have risks, with models often deployed without fully understanding their potential impacts. Interpretable and interactive machine learning aims to make complex models more transparent and controllable, enhancing user agency. This review synthesizes key principles from the growing literature in this field.
We first introduce precise vocabulary for discussing interpretability, like the distinction between glass box and explainable algorithms. We then explore connections to classical statistical and design principles, like parsimony and the gulfs of interaction. Basic explainability techniques -- including learned embeddings, integrated gradients, and concept bottlenecks -- are illustrated with a simple case study. We also review criteria for objectively evaluating interpretability approaches. Throughout, we underscore the importance of considering audience goals when designing interactive algorithmic systems. Finally, we outline open challenges and discuss the potential role of data science in addressing them. Code to reproduce all examples can be found at https://go.wisc.edu/3k1ewe.
[246] arXiv:2405.10561 (cross-list from eess.IV) [pdf, other]: Title: Infrared Image Super-Resolution via Lightweight Information Split Network

Authors: Shijie Liu, Kang Yan, Feiwei Qin, Changmiao Wang, Ruiquan Ge, Kai Zhang, Jie Huang

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Single image super-resolution (SR) is an established pixel-level vision task aimed at reconstructing a high-resolution image from its degraded low-resolution counterpart. Despite the notable advancements achieved by leveraging deep neural networks for SR, most existing deep learning architectures feature an extensive number of layers, leading to high computational complexity and substantial memory demands. These issues become particularly pronounced in the context of infrared image SR, where infrared devices often have stringent storage and computational constraints. To mitigate these challenges, we introduce a novel, efficient, and precise single infrared image SR model, termed the Lightweight Information Split Network (LISN). The LISN comprises four main components: shallow feature extraction, deep feature extraction, dense feature fusion, and high-resolution infrared image reconstruction. A key innovation within this model is the introduction of the Lightweight Information Split Block (LISB) for deep feature extraction. The LISB employs a sequential process to extract hierarchical features, which are then aggregated based on the relevance of the features under consideration. By integrating channel splitting and shift operations, the LISB successfully strikes an optimal balance between enhanced SR performance and a lightweight framework. Comprehensive experimental evaluations reveal that the proposed LISN achieves superior performance over contemporary state-of-the-art methods in terms of both SR quality and model complexity, affirming its efficacy for practical deployment in resource-constrained infrared imaging applications.
[247] arXiv:2405.10570 (cross-list from eess.IV) [pdf, ps, other]: Title: Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

Authors: Yirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang Jin, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo Qu

Comments: 10 pages, 8 figures, 6 tables

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)

In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features a T2-refine fusion decoder for quantitative analysis, leveraging global features from the Transformer, and a segmentation decoder with multiple local region supervision for enhanced accuracy. A tight coupling module aligns and fuses CNN and Transformer branch features, enabling SQNet to focus on myocardium regions. Evaluation on healthy controls (HC) and acute myocardial infarction patients (AMI) demonstrates superior segmentation dice scores (89.3/89.2) compared to state-of-the-art methods (87.7/87.9). T2 quantification yields strong linear correlations (Pearson coefficients: 0.84/0.93) with label values for HC/AMI, indicating accurate mapping. Radiologist evaluations confirm SQNet's superior image quality scores (4.60/4.58 for segmentation, 4.32/4.42 for T2 quantification) over state-of-the-art methods (4.50/4.44 for segmentation, 3.59/4.37 for T2 quantification). SQNet thus offers accurate simultaneous segmentation and quantification, enhancing cardiac disease diagnosis, such as AMI.
[248] arXiv:2405.10649 (cross-list from eess.SP) [pdf, other]: Title: Recovery of Sparse Graph Signals

Authors: Gal Morgenstern, Tirza Routtenberg

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper investigates the recovery of a node-domain sparse graph signal from the output of a graph filter. This problem, often referred to as the identification of the source of a diffused sparse graph signal, is seminal in the field of graph signal processing (GSP). Sparse graph signals can be used in the modeling of a variety of real-world applications in networks, such as social, biological, and power systems, and enable various GSP tasks, such as graph signal reconstruction, blind deconvolution, and sampling. In this paper, we assume double sparsity of both the graph signal and the graph topology, as well as a low-order graph filter. We propose three algorithms to reconstruct the support set of the input sparse graph signal from the graph filter output samples, leveraging these assumptions and the generalized information criterion (GIC). First, we describe the graph multiple GIC (GM-GIC) method, which is based on partitioning the dictionary elements (graph filter matrix columns) that capture information on the signal into smaller subsets. Then, the local GICs are computed for each subset and aggregated to make a global decision. Second, inspired by the well-known branch and bound (BNB) approach, we develop the graph-based branch and bound GIC (graph-BNB-GIC), and incorporate a new tractable heuristic bound tailored to the graph and graph filter characteristics. Finally, we propose the graph-based first order correction (GFOC) method, which improves existing sparse recovery methods by iteratively examining potential improvements to the GIC cost function through replacing elements from the estimated support set with elements from their one-hop neighborhood. We conduct simulations that demonstrate that the proposed sparse recovery methods outperform existing methods in terms of support set recovery accuracy, and without a significant computational overhead.
[249] arXiv:2405.10691 (cross-list from eess.IV) [pdf, other]: Title: LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image Completion

Authors: Zihao Zhu, Tianli Tao, Yitian Tao, Haowen Deng, Xinyi Cai, Gaofeng Wu, Kaidong Wang, Haifeng Tang, Lixuan Zhu, Zhuoyang Gu, Jiawei Huang, Dinggang Shen, Han Zhang

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

The infant brain undergoes rapid development in the first few years after birth.Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infants brain development with higher accuracy, statistical power and flexibility.However, the collection of infant longitudinal magnetic resonance (MR) data suffers a notorious dropout problem, resulting in incomplete datasets with missing time points. This limitation significantly impedes subsequent neuroscience and clinical modeling. Yet, existing deep generative models are facing difficulties in missing brain image completion, due to sparse data and the nonlinear, dramatic contrast/geometric variations in the developing brain. We propose LoCI-DiffCom, a novel Longitudinal Consistency-Informed Diffusion model for infant brain image Completion,which integrates the images from preceding and subsequent time points to guide a diffusion model for generating high-fidelity missing data. Our designed LoCI module can work on highly sparse sequences, relying solely on data from two temporal points. Despite wide separation and diversity between age time points, our approach can extract individualized developmental features while ensuring context-aware consistency. Our experiments on a large infant brain MR dataset demonstrate its effectiveness with consistent performance on missing infant brain MR completion even in big gap scenarios, aiding in better delineation of early developmental trajectories.
[250] arXiv:2405.10705 (cross-list from eess.IV) [pdf, other]: Title: 3D Vessel Reconstruction from Sparse-View Dynamic DSA Images via Vessel Probability Guided Attenuation Learning

Authors: Zhentao Liu, Huangxuan Zhao, Wenhui Qin, Zhenghong Zhou, Xinggang Wang, Wenping Wang, Xiaochun Lai, Chuansheng Zheng, Dinggang Shen, Zhiming Cui

Comments: 12 pages, 13 figures, 5 tables

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Digital Subtraction Angiography (DSA) is one of the gold standards in vascular disease diagnosing. With the help of contrast agent, time-resolved 2D DSA images deliver comprehensive insights into blood flow information and can be utilized to reconstruct 3D vessel structures. Current commercial DSA systems typically demand hundreds of scanning views to perform reconstruction, resulting in substantial radiation exposure. However, sparse-view DSA reconstruction, aimed at reducing radiation dosage, is still underexplored in the research community. The dynamic blood flow and insufficient input of sparse-view DSA images present significant challenges to the 3D vessel reconstruction task. In this study, we propose to use a time-agnostic vessel probability field to solve this problem effectively. Our approach, termed as vessel probability guided attenuation learning, represents the DSA imaging as a complementary weighted combination of static and dynamic attenuation fields, with the weights derived from the vessel probability field. Functioning as a dynamic mask, vessel probability provides proper gradients for both static and dynamic fields adaptive to different scene types. This mechanism facilitates a self-supervised decomposition between static backgrounds and dynamic contrast agent flow, and significantly improves the reconstruction quality. Our model is trained by minimizing the disparity between synthesized projections and real captured DSA images. We further employ two training strategies to improve our reconstruction quality: (1) coarse-to-fine progressive training to achieve better geometry and (2) temporal perturbed rendering loss to enforce temporal consistency. Experimental results have demonstrated superior quality on both 3D vessel reconstruction and 2D view synthesis.
[251] arXiv:2405.10723 (cross-list from eess.IV) [pdf, other]: Title: Eddeep: Fast eddy-current distortion correction for diffusion MRI with deep learning

Authors: Antoine Legouhy, Ross Callaghan, Whitney Stee, Philippe Peigneux, Hojjat Azadbakht, Hui Zhang

Comments: submitted to MICCAI 2024

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Modern diffusion MRI sequences commonly acquire a large number of volumes with diffusion sensitization gradients of differing strengths or directions. Such sequences rely on echo-planar imaging (EPI) to achieve reasonable scan duration. However, EPI is vulnerable to off-resonance effects, leading to tissue susceptibility and eddy-current induced distortions. The latter is particularly problematic because it causes misalignment between volumes, disrupting downstream modelling and analysis. The essential correction of eddy distortions is typically done post-acquisition, with image registration. However, this is non-trivial because correspondence between volumes can be severely disrupted due to volume-specific signal attenuations induced by varying directions and strengths of the applied gradients. This challenge has been successfully addressed by the popular FSL~Eddy tool but at considerable computational cost. We propose an alternative approach, leveraging recent advances in image processing enabled by deep learning (DL). It consists of two convolutional neural networks: 1) An image translator to restore correspondence between images; 2) A registration model to align the translated images. Results demonstrate comparable distortion estimates to FSL~Eddy, while requiring only modest training sample sizes. This work, to the best of our knowledge, is the first to tackle this problem with deep learning. Together with recently developed DL-based susceptibility correction techniques, they pave the way for real-time preprocessing of diffusion MRI, facilitating its wider uptake in the clinic.
[252] arXiv:2405.10754 (cross-list from math.OC) [pdf, other]: Title: Stable Phase Retrieval with Mirror Descent

Authors: Jean-Jacques Godeme, Jalal Fadili, Claude Amra, Myriam Zerrad

Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)

In this paper, we aim to reconstruct an n-dimensional real vector from m phaseless measurements corrupted by an additive noise. We extend the noiseless framework developed in [15], based on mirror descent (or Bregman gradient descent), to deal with noisy measurements and prove that the procedure is stable to (small enough) additive noise. In the deterministic case, we show that mirror descent converges to a critical point of the phase retrieval problem, and if the algorithm is well initialized and the noise is small enough, the critical point is near the true vector up to a global sign change. When the measurements are i.i.d Gaussian and the signal-to-noise ratio is large enough, we provide global convergence guarantees that ensure that with high probability, mirror descent converges to a global minimizer near the true vector (up to a global sign change), as soon as the number of measurements m is large enough. The sample complexity bound can be improved if a spectral method is used to provide a good initial guess. We complement our theoretical study with several numerical results showing that mirror descent is both a computationally and statistically efficient scheme to solve the phase retrieval problem.
[253] arXiv:2405.10762 (cross-list from q-fin.RM) [pdf, ps, other]: Title: Research on Credit Risk Early Warning Model of Commercial Banks Based on Neural Network Algorithm

Authors: Yu Cheng, Qin Yang, Liyang Wang, Ao Xiang, Jingyu Zhang

Subjects: Risk Management (q-fin.RM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In the realm of globalized financial markets, commercial banks are confronted with an escalating magnitude of credit risk, thereby imposing heightened requisites upon the security of bank assets and financial stability. This study harnesses advanced neural network techniques, notably the Backpropagation (BP) neural network, to pioneer a novel model for preempting credit risk in commercial banks. The discourse initially scrutinizes conventional financial risk preemptive models, such as ARMA, ARCH, and Logistic regression models, critically analyzing their real-world applications. Subsequently, the exposition elaborates on the construction process of the BP neural network model, encompassing network architecture design, activation function selection, parameter initialization, and objective function construction. Through comparative analysis, the superiority of neural network models in preempting credit risk in commercial banks is elucidated. The experimental segment selects specific bank data, validating the model's predictive accuracy and practicality. Research findings evince that this model efficaciously enhances the foresight and precision of credit risk management.
[254] arXiv:2405.10763 (cross-list from cond-mat.dis-nn) [pdf, other]: Title: Integer Traffic Assignment Problem: Algorithms and Insights on Random Graphs

Authors: Rayan Harfouche, Giovanni Piccioli, Lenka Zdeborová

Comments: 37 pages, 15 figures

Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Discrete Mathematics (cs.DM); Optimization and Control (math.OC); Computation (stat.CO)

Path optimization is a fundamental concern across various real-world scenarios, ranging from traffic congestion issues to efficient data routing over the internet. The Traffic Assignment Problem (TAP) is a classic continuous optimization problem in this field. This study considers the Integer Traffic Assignment Problem (ITAP), a discrete variant of TAP. ITAP involves determining optimal routes for commuters in a city represented by a graph, aiming to minimize congestion while adhering to integer flow constraints on paths. This restriction makes ITAP an NP-hard problem. While conventional TAP prioritizes repulsive interactions to minimize congestion, this work also explores the case of attractive interactions, related to minimizing the number of occupied edges. We present and evaluate multiple algorithms to address ITAP, including a message passing algorithm, a greedy approach, simulated annealing, and relaxation of ITAP to TAP. Inspired by studies of random ensembles in the large-size limit in statistical physics, comparisons between these algorithms are conducted on large sparse random regular graphs with a random set of origin-destination pairs. Our results indicate that while the simplest greedy algorithm performs competitively in the repulsive scenario, in the attractive case the message-passing-based algorithm and simulated annealing demonstrate superiority. We then investigate the relationship between TAP and ITAP in the repulsive case. We find that, as the number of paths increases, the solution of TAP converges toward that of ITAP, and we investigate the speed of this convergence. Depending on the number of paths, our analysis leads us to identify two scaling regimes: in one the average flow per edge is of order one, and in another the number of paths scales quadratically with the size of the graph, in which case the continuous relaxation solves the integer problem closely.
[255] arXiv:2405.10780 (cross-list from eess.SP) [pdf, ps, other]: Title: Intelligent Neural Interfaces: An Emerging Era in Neurotechnology

Authors: Mahsa Shoaran, Uisub Shin, MohammadAli Shaeri

Subjects: Signal Processing (eess.SP); Hardware Architecture (cs.AR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)

Integrating smart algorithms on neural devices presents significant opportunities for various brain disorders. In this paper, we review the latest advancements in the development of three categories of intelligent neural prostheses featuring embedded signal processing on the implantable or wearable device. These include: 1) Neural interfaces for closed-loop symptom tracking and responsive stimulation; 2) Neural interfaces for emerging network-related conditions, such as psychiatric disorders; and 3) Intelligent BMI SoCs for movement recovery following paralysis.
[256] arXiv:2405.10782 (cross-list from physics.app-ph) [pdf, ps, other]: Title: Using oxides to compute with heat

Authors: Guillaume F. Nataf (1), Sebastian Volz (2), Jose Ordonez-Miranda (2), Jorge Íñiguez-González (3), Riccardo Rurali (4), Brahim Dkhil (5) ((1) GREMAN UMR 7347, CNRS, University of Tours, INSA Centre Val de Loire, Tours, France, (2) LIMMS, CNRS-IIS IRL 2820, The University of Tokyo, Tokyo, Japan, (3) Materials Research and Technology Department, LIST, Esch-sur-Alzette, Luxembourg, (4) Institut de Ciència de Materials de Barcelona, ICMAB-CSIC, Campus UAB, Bellaterra, Spain, (5) Université Paris-Saclay, CentraleSupélec, CNRS-UMR8580, Laboratoire SPMS, Gif-sur-Yvette, France)

Journal-ref: Nature Reviews Materials (2024)

Subjects: Applied Physics (physics.app-ph); Emerging Technologies (cs.ET)

One of the most innovative possibilities offered by oxides is the use of heat currents for computational purposes. Towards this goal, phase-change oxides, including ferroelectrics, ferromagnets and related materials, could reproduce sources, logic units and memories used in current and future computing schemes.
[257] arXiv:2405.10789 (cross-list from math.CO) [pdf, other]: Title: On Minimal Transversals of Maximal Cliques in Graphs

Authors: Endre Boros, Vladimir Gurvich, Martin Milanič, Dmitry Tikhanovsky, Yushi Uno

Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)

A hypergraph is conformal if it is the family of maximal cliques of a graph. In this paper we are interested in the problem of determining when is the family of minimal transversal of maximal cliques of a graph conformal. Such graphs are called clique dually conformal (CDC for short). As our main results, we completely characterize CDC graphs within the families of triangle-free graphs and split graphs. Both characterizations lead to polynomial-time recognition algorithms. We also show that the class of CDC graphs is closed under substitution, in the strong sense that substituting a graph $H$ for a vertex of a graph $G$ results in a CDC graph if and only if both $G$ and $H$ are CDC.
[258] arXiv:2405.10803 (cross-list from eess.IV) [pdf, other]: Title: A Large-scale Multi Domain Leukemia Dataset for the White Blood Cells Detection with Morphological Attributes for Explainability

Authors: Abdul Rehman, Talha Meraj, Aiman Mahmood Minhas, Ayisha Imran, Mohsen Ali, Waqas Sultani

Comments: Early Accept

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Earlier diagnosis of Leukemia can save thousands of lives annually. The prognosis of leukemia is challenging without the morphological information of White Blood Cells (WBC) and relies on the accessibility of expensive microscopes and the availability of hematologists to analyze Peripheral Blood Samples (PBS). Deep Learning based methods can be employed to assist hematologists. However, these algorithms require a large amount of labeled data, which is not readily available. To overcome this limitation, we have acquired a realistic, generalized, and large dataset. To collect this comprehensive dataset for real-world applications, two microscopes from two different cost spectrums (high-cost HCM and low-cost LCM) are used for dataset capturing at three magnifications (100x, 40x, 10x) through different sensors (high-end camera for HCM, middle-level camera for LCM and mobile-phone camera for both). The high-sensor camera is 47 times more expensive than the middle-level camera and HCM is 17 times more expensive than LCM. In this collection, using HCM at high resolution (100x), experienced hematologists annotated 10.3k WBC types (14) and artifacts, having 55k morphological labels (Cell Size, Nuclear Chromatin, Nuclear Shape, etc.) from 2.4k images of several PBS leukemia patients. Later on, these annotations are transferred to other 2 magnifications of HCM, and 3 magnifications of LCM, and on each camera captured images. Along with the LeukemiaAttri dataset, we provide baselines over multiple object detectors and Unsupervised Domain Adaptation (UDA) strategies, along with morphological information-based attribute prediction. The dataset will be publicly available after publication to facilitate the research in this direction.
[259] arXiv:2405.10812 (cross-list from q-bio.GN) [pdf, other]: Title: VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling

Authors: Siyuan Li, Zedong Wang, Zicheng Liu, Di Wu, Cheng Tan, Jiangbin Zheng, Yufei Huang, Stan Z. Li

Comments: ICML 2024. Preprint V1 with 16 pages and 5 figures

Subjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI)

Similar to natural language models, pre-trained genome language models are proposed to capture the underlying intricacies within genomes with unsupervised sequence modeling. They have become essential tools for researchers and practitioners in biology. However, the \textit{hand-crafted} tokenization policies used in these models may not encode the most discriminative patterns from the limited vocabulary of genomic data. In this paper, we introduce VQDNA, a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning. By leveraging vector-quantized codebook as \textit{learnable} vocabulary, VQDNA can adaptively tokenize genomes into \textit{pattern-aware} embeddings in an end-to-end manner. To further push its limits, we propose Hierarchical Residual Quantization (HRQ), where varying scales of codebooks are designed in a hierarchy to enrich the genome vocabulary in a coarse-to-fine manner. Extensive experiments on 32 genome datasets demonstrate VQDNA's superiority and favorable parameter efficiency compared to existing genome language models. Notably, empirical analysis of SARS-CoV-2 mutations reveals the fine-grained pattern awareness and biological significance of learned HRQ vocabulary, highlighting its untapped potential for broader applications in genomics.
[260] arXiv:2405.10815 (cross-list from math.OC) [pdf, other]: Title: A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization

Authors: Andrzej Ruszczyński, Shangzhe Yang

Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)

We consider stochastic optimization problems involving an expected value of a nonlinear function of a base random vector and a conditional expectation of another function depending on the base random vector, a dependent random vector, and the decision variables. We call such problems conditional stochastic optimization problems. They arise in many applications, such as uplift modeling, reinforcement learning, and contextual optimization. We propose a specialized single time-scale stochastic method for nonconvex constrained conditional stochastic optimization problems with a Lipschitz smooth outer function and a generalized differentiable inner function. In the method, we approximate the inner conditional expectation with a rich parametric model whose mean squared error satisfies a stochastic version of a {\L}ojasiewicz condition. The model is used by an inner learning algorithm. The main feature of our approach is that unbiased stochastic estimates of the directions used by the method can be generated with one observation from the joint distribution per iteration, which makes it applicable to real-time learning. The directions, however, are not gradients or subgradients of any overall objective function. We prove the convergence of the method with probability one, using the method of differential inclusions and a specially designed Lyapunov function, involving a stochastic generalization of the Bregman distance. Finally, a numerical illustration demonstrates the viability of our approach.
[261] arXiv:2405.10817 (cross-list from stat.ML) [pdf, ps, other]: Title: Restless Linear Bandits

Authors: Azadeh Khaleghi

Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)

A more general formulation of the linear bandit problem is considered to allow for dependencies over time. Specifically, it is assumed that there exists an unknown $\mathbb{R}^d$-valued stationary $\varphi$-mixing sequence of parameters $(\theta_t,~t \in \mathbb{N})$ which gives rise to pay-offs. This instance of the problem can be viewed as a generalization of both the classical linear bandits with iid noise, and the finite-armed restless bandits. In light of the well-known computational hardness of optimal policies for restless bandits, an approximation is proposed whose error is shown to be controlled by the $\varphi$-dependence between consecutive $\theta_t$. An optimistic algorithm, called LinMix-UCB, is proposed for the case where $\theta_t$ has an exponential mixing rate. The proposed algorithm is shown to incur a sub-linear regret of $\mathcal{O}\left(\sqrt{d n\mathrm{polylog}(n) }\right)$ with respect to an oracle that always plays a multiple of $\mathbb{E}\theta_t$. The main challenge in this setting is to ensure that the exploration-exploitation strategy is robust against long-range dependencies. The proposed method relies on Berbee's coupling lemma to carefully select near-independent samples and construct confidence ellipsoids around empirical estimates of $\mathbb{E}\theta_t$.
[262] arXiv:2405.10828 (cross-list from eess.SP) [pdf, other]: Title: Analysis of Impulsive Interference in Digital Audio Broadcasting Systems in Electric Vehicles

Authors: Chin-Hung Chen, Wen-Hung Huang, Boris Karanov, Alex Young, Yan Wu, Wim van Houtum

Comments: 44th Symposium on Information Theory and Signal Processing in the Benelux (SITB 2024), Delft, the Netherlands

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Recently, new types of interference in electric vehicles (EVs), such as converters switching and/or battery chargers, have been found to degrade the performance of wireless digital transmission systems. Measurements show that such an interference is characterized by impulsive behavior and is widely varying in time. This paper uses recorded data from our EV testbed to analyze the impulsive interference in the digital audio broadcasting band. Moreover, we use our analysis to obtain a corresponding interference model. In particular, we studied the temporal characteristics of the interference and confirmed that its amplitude indeed exhibits an impulsive behavior. Our results show that impulsive events span successive received signal samples and thus indicate a bursty nature. To this end, we performed a data-driven modification of a well-established model for bursty impulsive interference, the Markov-Middleton model, to produce synthetic noise realization. We investigate the optimal symbol detector design based on the proposed model and show significant performance gains compared to the conventional detector based on the additive white Gaussian noise assumption.
[263] arXiv:2405.10833 (cross-list from eess.IV) [pdf, other]: Title: Automatic segmentation of Organs at Risk in Head and Neck cancer patients from CT and MRI scans

Authors: Sébastien Quetin, Andrew Heschl, Mauricio Murillo, Murali Rohit, Shirin A. Enger, Farhad Maleki

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Background and purpose: Deep Learning (DL) has been widely explored for Organs at Risk (OARs) segmentation; however, most studies have focused on a single modality, either CT or MRI, not both simultaneously. This study presents a high-performing DL pipeline for segmentation of 30 OARs from MRI and CT scans of Head and Neck (H&N) cancer patients.
Materials and methods: Paired CT and MRI-T1 images from 42 H&N cancer patients alongside annotation for 30 OARs from the H&N OAR CT & MR segmentation challenge dataset were used to develop a segmentation pipeline. After cropping irrelevant regions, rigid followed by non-rigid registration of CT and MRI volumes was performed. Two versions of the CT volume, representing soft tissues and bone anatomy, were stacked with the MRI volume and used as input to an nnU-Net pipeline. Modality Dropout was used during the training to force the model to learn from the different modalities. Segmentation masks were predicted with the trained model for an independent set of 14 new patients. The mean Dice Score (DS) and Hausdorff Distance (HD) were calculated for each OAR across these patients to evaluate the pipeline.
Results: This resulted in an overall mean DS and HD of 0.777 +- 0.118 and 3.455 +- 1.679, respectively, establishing the state-of-the-art (SOTA) for this challenge at the time of submission.
Conclusion: The proposed pipeline achieved the best DS and HD among all participants of the H&N OAR CT and MR segmentation challenge and sets a new SOTA for automated segmentation of H&N OARs.
[264] arXiv:2405.10870 (cross-list from eess.IV) [pdf, other]: Title: Multicenter Privacy-Preserving Model Training for Deep Learning Brain Metastases Autosegmentation

Authors: Yixing Huang, Zahra Khodabakhshi, Ahmed Gomaa, Manuel Schmidt, Rainer Fietkau, Matthias Guckenberger, Nicolaus Andratschke, Christoph Bert, Stephanie Tanadini-Lang, Florian Putz

Comments: Submission to the Green Journal (Major Revision)

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Objectives: This work aims to explore the impact of multicenter data heterogeneity on deep learning brain metastases (BM) autosegmentation performance, and assess the efficacy of an incremental transfer learning technique, namely learning without forgetting (LWF), to improve model generalizability without sharing raw data.
Materials and methods: A total of six BM datasets from University Hospital Erlangen (UKER), University Hospital Zurich (USZ), Stanford, UCSF, NYU and BraTS Challenge 2023 on BM segmentation were used for this evaluation. First, the multicenter performance of a convolutional neural network (DeepMedic) for BM autosegmentation was established for exclusive single-center training and for training on pooled data, respectively. Subsequently bilateral collaboration was evaluated, where a UKER pretrained model is shared to another center for further training using transfer learning (TL) either with or without LWF.
Results: For single-center training, average F1 scores of BM detection range from 0.625 (NYU) to 0.876 (UKER) on respective single-center test data. Mixed multicenter training notably improves F1 scores at Stanford and NYU, with negligible improvement at other centers. When the UKER pretrained model is applied to USZ, LWF achieves a higher average F1 score (0.839) than naive TL (0.570) and single-center training (0.688) on combined UKER and USZ test data. Naive TL improves sensitivity and contouring accuracy, but compromises precision. Conversely, LWF demonstrates commendable sensitivity, precision and contouring accuracy. When applied to Stanford, similar performance was observed.
Conclusion: Data heterogeneity results in varying performance in BM autosegmentation, posing challenges to model generalizability. LWF is a promising approach to peer-to-peer privacy-preserving model training.
[265] arXiv:2405.10890 (cross-list from astro-ph.IM) [pdf, other]: Title: A Versatile Framework for Analyzing Galaxy Image Data by Implanting Human-in-the-loop on a Large Vision Model

Authors: Mingxiang Fu, Yu Song, Jiameng Lv, Liang Cao, Peng Jia, Nan Li, Xiangru Li, Jifeng Liu, A-Li Luo, Bo Qiu, Shiyin Shen, Liangping Tu, Lili Wang, Shoulin Wei, Haifeng Yang, Zhenping Yi, Zhiqiang Zou

Comments: 26 pages, 10 figures, to be published on Chinese Physics C

Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Astrophysics of Galaxies (astro-ph.GA); Artificial Intelligence (cs.AI)

The exponential growth of astronomical datasets provides an unprecedented opportunity for humans to gain insight into the Universe. However, effectively analyzing this vast amount of data poses a significant challenge. Astronomers are turning to deep learning techniques to address this, but the methods are limited by their specific training sets, leading to considerable duplicate workloads too. Hence, as an example to present how to overcome the issue, we built a framework for general analysis of galaxy images, based on a large vision model (LVM) plus downstream tasks (DST), including galaxy morphological classification, image restoration, object detection, parameter extraction, and more. Considering the low signal-to-noise ratio of galaxy images and the imbalanced distribution of galaxy categories, we have incorporated a Human-in-the-loop (HITL) module into our large vision model, which leverages human knowledge to enhance the reliability and interpretability of processing galaxy images interactively. The proposed framework exhibits notable few-shot learning capabilities and versatile adaptability to all the abovementioned tasks on galaxy images in the DESI legacy imaging surveys. Expressly, for object detection, trained by 1000 data points, our DST upon the LVM achieves an accuracy of 96.7%, while ResNet50 plus Mask R-CNN gives an accuracy of 93.1%; for morphology classification, to obtain AUC ~0.9, LVM plus DST and HITL only requests 1/50 training sets compared to ResNet18. Expectedly, multimodal data can be integrated similarly, which opens up possibilities for conducting joint analyses with datasets spanning diverse domains in the era of multi-message astronomy.
[266] arXiv:2405.10897 (cross-list from math.OC) [pdf, other]: Title: Efficient Line Search Method Based on Regression and Uncertainty Quantification

Authors: Sören Laue, Tomislav Prusina

Comments: To be featured in LION18 2024

Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

Unconstrained optimization problems are typically solved using iterative methods, which often depend on line search techniques to determine optimal step lengths in each iteration. This paper introduces a novel line search approach. Traditional line search methods, aimed at determining optimal step lengths, often discard valuable data from the search process and focus on refining step length intervals. This paper proposes a more efficient method using Bayesian optimization, which utilizes all available data points, i.e., function values and gradients, to guide the search towards a potential global minimum. This new approach more effectively explores the search space, leading to better solution quality. It is also easy to implement and integrate into existing frameworks. Tested on the challenging CUTEst test set, it demonstrates superior performance compared to existing state-of-the-art methods, solving more problems to optimality with equivalent resource usage.
[267] arXiv:2405.10916 (cross-list from math.AP) [pdf, other]: Title: Nearly self-similar blowup of generalized axisymmetric Navier-Stokes and Boussinesq equations

Authors: Thomas Y. Hou

Comments: 34 pages, 28 figures. arXiv admin note: text overlap with arXiv:2107.06509

Subjects: Analysis of PDEs (math.AP); Numerical Analysis (math.NA)

We perform numerical investigation of nearly self-similar blowup of generalized axisymmetric Navier-Stokes equations and Boussinesq system with a time-dependent fractional dimension. The dynamic change of the space dimension is proportional to the ratio R(t)/Z(t), where (R(t),Z(t)) is the position at which the maximum vorticity achieves its global maximum. This choice of space dimension is to ensure that the advection along the r-direction has the same scaling as that along the z-direction, thus preventing formation of two-scale solution structure. For the generalized axisymmetric Navier-Stokes equations with solution dependent viscosity, we show that the solution develops a self-similar blowup with dimension equal to 3.188 and the self-similar profile satisfies the axisymmetric Navier-Stokes equations with constant viscosity. We also study the nearly self-similar blowup of the axisymmetric Boussinesq system with constant viscosity. The generalized axisymmetric Boussinesq system preserves almost all the known properties of the 3D Navier-Stokes equations except for the conservation of angular momentum. We present convincing numerical evidence that the generalized axisymmetric Boussinesq system develops a stable nearly self-similar blowup solution with maximum vorticity increased by O(10^{30}).
[268] arXiv:2405.10925 (cross-list from stat.ME) [pdf, ps, other]: Title: High-dimensional multiple imputation (HDMI) for partially observed confounders including natural language processing-derived auxiliary covariates

Authors: Janick Weberpals, Pamela A. Shaw, Kueiyu Joshua Lin, Richard Wyss, Joseph M Plasek, Li Zhou, Kerry Ngan, Thomas DeRamus, Sudha R. Raman, Bradley G. Hammill, Hana Lee, Sengwee Toh, John G. Connolly, Kimberly J. Dandreo, Fang Tian, Wei Liu, Jie Li, José J. Hernández-Muñoz, Sebastian Schneeweiss, Rishi J. Desai

Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Multiple imputation (MI) models can be improved by including auxiliary covariates (AC), but their performance in high-dimensional data is not well understood. We aimed to develop and compare high-dimensional MI (HDMI) approaches using structured and natural language processing (NLP)-derived AC in studies with partially observed confounders. We conducted a plasmode simulation study using data from opioid vs. non-steroidal anti-inflammatory drug (NSAID) initiators (X) with observed serum creatinine labs (Z2) and time-to-acute kidney injury as outcome. We simulated 100 cohorts with a null treatment effect, including X, Z2, atrial fibrillation (U), and 13 other investigator-derived confounders (Z1) in the outcome generation. We then imposed missingness (MZ2) on 50% of Z2 measurements as a function of Z2 and U and created different HDMI candidate AC using structured and NLP-derived features. We mimicked scenarios where U was unobserved by omitting it from all AC candidate sets. Using LASSO, we data-adaptively selected HDMI covariates associated with Z2 and MZ2 for MI, and with U to include in propensity score models. The treatment effect was estimated following propensity score matching in MI datasets and we benchmarked HDMI approaches against a baseline imputation and complete case analysis with Z1 only. HDMI using claims data showed the lowest bias (0.072). Combining claims and sentence embeddings led to an improvement in the efficiency displaying the lowest root-mean-squared-error (0.173) and coverage (94%). NLP-derived AC alone did not perform better than baseline MI. HDMI approaches may decrease bias in studies with partially observed confounders where missingness depends on unobserved factors.
[269] arXiv:2405.10930 (cross-list from stat.ML) [pdf, other]: Title: Submodular Information Selection for Hypothesis Testing with Misclassification Penalties

Authors: Jayanth Bhargav, Mahsa Ghasemi, Shreyas Sundaram

Comments: 23 pages, 4 figures

Subjects: Machine Learning (stat.ML); Computational Complexity (cs.CC); Information Theory (cs.IT); Machine Learning (cs.LG); Optimization and Control (math.OC)

We consider the problem of selecting an optimal subset of information sources for a hypothesis testing/classification task where the goal is to identify the true state of the world from a finite set of hypotheses, based on finite observation samples from the sources. In order to characterize the learning performance, we propose a misclassification penalty framework, which enables non-uniform treatment of different misclassification errors. In a centralized Bayesian learning setting, we study two variants of the subset selection problem: (i) selecting a minimum cost information set to ensure that the maximum penalty of misclassifying the true hypothesis remains bounded and (ii) selecting an optimal information set under a limited budget to minimize the maximum penalty of misclassifying the true hypothesis. Under mild assumptions, we prove that the objective (or constraints) of these combinatorial optimization problems are weak (or approximate) submodular, and establish high-probability performance guarantees for greedy algorithms. Further, we propose an alternate metric for information set selection which is based on the total penalty of misclassification. We prove that this metric is submodular and establish near-optimal guarantees for the greedy algorithms for both the information set selection problems. Finally, we present numerical simulations to validate our theoretical results over several randomly generated instances.
[270] arXiv:2405.10933 (cross-list from quant-ph) [pdf, other]: Title: Learning low-degree quantum objects

Authors: Srinivasan Arunachalam, Arkopal Dutt, Francisco Escudero Gutiérrez, Carlos Palazuelos

Comments: 26+4 pages

Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Functional Analysis (math.FA)

We consider the problem of learning low-degree quantum objects up to $\varepsilon$-error in $\ell_2$-distance. We show the following results: $(i)$ unknown $n$-qubit degree-$d$ (in the Pauli basis) quantum channels and unitaries can be learned using $O(1/\varepsilon^d)$ queries (independent of $n$), $(ii)$ polynomials $p:\{-1,1\}^n\rightarrow [-1,1]$ arising from $d$-query quantum algorithms can be classically learned from $O((1/\varepsilon)^d\cdot \log n)$ many random examples $(x,p(x))$ (which implies learnability even for $d=O(\log n)$), and $(iii)$ degree-$d$ polynomials $p:\{-1,1\}^n\to [-1,1]$ can be learned through $O(1/\varepsilon^d)$ queries to a quantum unitary $U_p$ that block-encodes $p$. Our main technical contributions are new Bohnenblust-Hille inequalities for quantum channels and completely bounded~polynomials.
[271] arXiv:2405.10941 (cross-list from quant-ph) [pdf, other]: Title: Variational Quantum Algorithm Landscape Reconstruction by Low-Rank Tensor Completion

Authors: Tianyi Hao, Zichang He, Ruslan Shaydulin, Marco Pistoia, Swamit Tannu

Subjects: Quantum Physics (quant-ph); Hardware Architecture (cs.AR); Emerging Technologies (cs.ET)

Variational quantum algorithms (VQAs) are a broad class of algorithms with many applications in science and industry. Applying a VQA to a problem involves optimizing a parameterized quantum circuit by maximizing or minimizing a cost function. A particular challenge associated with VQAs is understanding the properties of associated cost functions. Having the landscapes of VQA cost functions can greatly assist in developing and testing new variational quantum algorithms, but they are extremely expensive to compute. Reconstructing the landscape of a VQA using existing techniques requires a large number of cost function evaluations, especially when the dimension or the resolution of the landscape is high. To address this challenge, we propose a low-rank tensor-completion-based approach for local landscape reconstruction. By leveraging compact low-rank representations of tensors, our technique can overcome the curse of dimensionality and handle high-resolution landscapes. We demonstrate the power of landscapes in VQA development by showcasing practical applications of analyzing penalty terms for constrained optimization problems and examining the probability landscapes of certain basis states.
[272] arXiv:2405.10944 (cross-list from physics.chem-ph) [pdf, other]: Title: Probabilistic transfer learning methodology to expedite high fidelity simulation of reactive flows

Authors: Bruno S. Soriano, Ki Sung Jung, Tarek Echekki, Jacqueline H. Chen, Mohammad Khalil

Subjects: Chemical Physics (physics.chem-ph); Machine Learning (cs.LG)

Reduced order models based on the transport of a lower dimensional manifold representation of the thermochemical state, such as Principal Component (PC) transport and Machine Learning (ML) techniques, have been developed to reduce the computational cost associated with the Direct Numerical Simulations (DNS) of reactive flows. Both PC transport and ML normally require an abundance of data to exhibit sufficient predictive accuracy, which might not be available due to the prohibitive cost of DNS or experimental data acquisition. To alleviate such difficulties, similar data from an existing dataset or domain (source domain) can be used to train ML models, potentially resulting in adequate predictions in the domain of interest (target domain). This study presents a novel probabilistic transfer learning (TL) framework to enhance the trust in ML models in correctly predicting the thermochemical state in a lower dimensional manifold and a sparse data setting. The framework uses Bayesian neural networks, and autoencoders, to reduce the dimensionality of the state space and diffuse the knowledge from the source to the target domain. The new framework is applied to one-dimensional freely-propagating flame solutions under different data sparsity scenarios. The results reveal that there is an optimal amount of knowledge to be transferred, which depends on the amount of data available in the target domain and the similarity between the domains. TL can reduce the reconstruction error by one order of magnitude for cases with large sparsity. The new framework required 10 times less data for the target domain to reproduce the same error as in the abundant data scenario. Furthermore, comparisons with a state-of-the-art deterministic TL strategy show that the probabilistic method can require four times less data to achieve the same reconstruction error.

Replacements for Mon, 20 May 24

[273] arXiv:1609.00004 (replaced) [pdf, other]: Title: On the initial value of PageRank

Authors: Krishanu Deyasi

Comments: 11 pages, 15 figures

Subjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI)
[274] arXiv:2012.02030 (replaced) [pdf, other]: Title: Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks

Authors: Ileana Rugina, Rumen Dangovski, Li Jing, Preslav Nakov, Marin Soljačić

Comments: Presented at LREC-COLING 2024: 12 pages, 4 figures, 11 tables

Subjects: Computation and Language (cs.CL)
[275] arXiv:2107.10764 (replaced) [pdf, other]: Title: Nonlinear transformation of complex amplitudes via quantum singular value transformation

Authors: Naixu Guo, Kosuke Mitarai, Keisuke Fujii

Subjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS)
[276] arXiv:2112.03242 (replaced) [pdf, other]: Title: Aspect Ratio Universal Rectangular Layouts

Authors: Stefan Felsner, Andrew Nathenson, Csaba D. Tóth

Comments: 25 pages, 12 figures, full version of a 12-page extended abstract to appear in WALCOM 2022

Subjects: Computational Geometry (cs.CG); Combinatorics (math.CO)
[277] arXiv:2202.12361 (replaced) [pdf, other]: Title: RescueNet: A High Resolution UAV Semantic Segmentation Benchmark Dataset for Natural Disaster Damage Assessment

Authors: Maryam Rahnemoonfar, Tashnim Chowdhury, Robin Murphy

Journal-ref: Sci Data 10, 913 (2023)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[278] arXiv:2204.13666 (replaced) [pdf, other]: Title: Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training

Authors: Miloš Nikolić, Enrique Torres Sanchez, Jiahui Wang, Ali Hadi Zadeh, Mostafa Mahmoud, Ameer Abdelhadi, Kareem Ibrahim, Andreas Moshovos

Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
[279] arXiv:2206.05850 (replaced) [pdf, other]: Title: Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm

Authors: Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal

Comments: The latest version fixed the error in the proof of Lemma 4 in AAAI2023

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
[280] arXiv:2207.12855 (replaced) [pdf, other]: Title: Efficient Learning of Accurate Surrogates for Simulations of Complex Systems

Authors: A. Diaw, M. McKerns, I. Sagert, L. G. Stanton, M. S. Murillo

Comments: 13 pages, 6 figures, submitted to Nature Machine Intelligence

Subjects: Machine Learning (cs.LG); Nuclear Theory (nucl-th); Computational Physics (physics.comp-ph); Data Analysis, Statistics and Probability (physics.data-an); Plasma Physics (physics.plasm-ph)
[281] arXiv:2209.00840 (replaced) [pdf, other]: Title: FOLIO: Natural Language Reasoning with First-Order Logic

Authors: Simeng Han, Hailey Schoelkopf, Yilun Zhao, Zhenting Qi, Martin Riddell, Wenfei Zhou, James Coady, David Peng, Yujie Qiao, Luke Benson, Lucy Sun, Alex Wardle-Solano, Hannah Szabo, Ekaterina Zubova, Matthew Burtell, Jonathan Fan, Yixin Liu, Brian Wong, Malcolm Sailor, Ansong Ni, Linyong Nan, Jungo Kasai, Tao Yu, Rui Zhang, Alexander R. Fabbri, Wojciech Kryscinski, Semih Yavuz, Ye Liu, Xi Victoria Lin, Shafiq Joty, Yingbo Zhou, Caiming Xiong, Rex Ying, Arman Cohan, Dragomir Radev

Subjects: Computation and Language (cs.CL)
[282] arXiv:2210.07376 (replaced) [pdf, other]: Title: ScionFL: Efficient and Robust Secure Quantized Aggregation

Authors: Yaniv Ben-Itzhak, Helen Möllering, Benny Pinkas, Thomas Schneider, Ajith Suresh, Oleksandr Tkachenko, Shay Vargaftik, Christian Weinert, Hossein Yalame, Avishay Yanai

Comments: Published in 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)

Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (cs.LG)
[283] arXiv:2211.14309 (replaced) [pdf, other]: Title: FutureHuman3D: Forecasting Complex Long-Term 3D Human Behavior from Video Observations

Authors: Christian Diller, Thomas Funkhouser, Angela Dai

Comments: Project Page: this https URL Video: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[284] arXiv:2301.00802 (replaced) [pdf, other]: Title: Deep Clustering of Tabular Data by Weighted Gaussian Distribution Learning

Authors: Shourav B. Rabbani, Ivan V. Medri, Manar D. Samad

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[285] arXiv:2301.05084 (replaced) [pdf, other]: Title: Local consistency as a reduction between constraint satisfaction problems

Authors: Victor Dalmau, Jakub Opršal

Subjects: Logic in Computer Science (cs.LO); Computational Complexity (cs.CC)
[286] arXiv:2302.05402 (replaced) [pdf, other]: Title: A Survey on Cyber-Resilience Approaches for Cyber-Physical Systems

Authors: Mariana Segovia-Ferreira, Jose Rubio-Hernan, Ana Rosa Cavalli, Joaquin Garcia-Alfaro

Comments: ACM Computing Surveys, 56(8):1--37, 36 pages, 2 figures, 1 table

Subjects: Cryptography and Security (cs.CR)
[287] arXiv:2302.13680 (replaced) [pdf, other]: Title: A multigrid solver for PDE-constrained optimization with uncertain inputs

Authors: Gabriele Ciaramella, Fabio Nobile, Tommaso Vanzan

Comments: 37, 3 figures

Subjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)
[288] arXiv:2303.00351 (replaced) [pdf, other]: Title: Leveraging SO(3)-steerable convolutions for pose-robust semantic segmentation in 3D medical data

Authors: Ivan Diaz, Mario Geiger, Richard Iain McKinley

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) this https URL

Journal-ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024)

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[289] arXiv:2303.07158 (replaced) [pdf, other]: Title: Uniform Pessimistic Risk and its Optimal Portfolio

Authors: Sungchul Hong, Jong-June Jeon

Subjects: Portfolio Management (q-fin.PM); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
[290] arXiv:2303.10833 (replaced) [pdf, ps, other]: Title: Linear Codes Constructed From Two Weakly Regular Plateaued Functions with Index (p-1)/2

Authors: Shudi Yang, Tonghui Zhang, Zheng-An Yao

Comments: 35 pages

Subjects: Information Theory (cs.IT)
[291] arXiv:2303.12307 (replaced) [pdf, other]: Title: Predicting and Enhancing the Fairness of DNNs with the Curvature of Perceptual Manifolds

Authors: Yanbiao Ma, Licheng Jiao, Fang Liu, Maoji Wen, Lingling Li, Wenping Ma, Shuyuan Yang, Xu Liu, Puhua Chen

Comments: 17pages, Accepted by CVPR 2023, Submitted to TPAMI

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[292] arXiv:2303.12557 (replaced) [pdf, other]: Title: Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems

Authors: Jemin Lee, Yongin Kwon, Sihyeong Park, Misun Yu, Jeman Park, Hwanjun Song

Comments: 14 pages, 9 figures, accepted in IEEE Internet of Things Journal

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[293] arXiv:2303.17593 (replaced) [pdf, other]: Title: Anatomically aware dual-hop learning for pulmonary embolism detection in CT pulmonary angiograms

Authors: Florin Condrea, Saikiran Rapaka, Lucian Itu, Puneet Sharma, Jonathan Sperl, A Mohamed Ali, Marius Leordeanu

Comments: Accepted to Computers in Biology and Medicine journal

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[294] arXiv:2304.00598 (replaced) [pdf, other]: Title: Stochastic Reachability of Uncontrolled Systems via Probability Measures: Approximation via Deep Neural Networks

Authors: Karthik Sivaramakrishnan, Vignesh Sivaramakrishnan, Rosalyn Alex Devonport, Meeko M.K. Oishi

Comments: 8 pages, 4 figures, 1 table, Submitted to the Conference on Decision and Control 2024

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[295] arXiv:2304.12751 (replaced) [pdf, other]: Title: Node Feature Augmentation Vitaminizes Network Alignment

Authors: Jin-Duk Park, Cong Tran, Won-Yong Shin, Xin Cao

Comments: 18 pages, 12 figures, 5 tables; its conference version was presented at the ACM International Conference on Information and Knowledge Management (CIKM 2022)

Subjects: Social and Information Networks (cs.SI); Information Theory (cs.IT); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)
[296] arXiv:2305.05994 (replaced) [pdf, other]: Title: ANALOGYKB: Unlocking Analogical Reasoning of Language Models with A Million-scale Knowledge Base

Authors: Siyu Yuan, Jiangjie Chen, Changzhi Sun, Jiaqing Liang, Yanghua Xiao, Deqing Yang

Comments: Accepted to ACL 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[297] arXiv:2305.09441 (replaced) [pdf, other]: Title: STLCCP: An Efficient Convex Optimization-based Framework for Signal Temporal Logic Specifications

Authors: Yoshinari Takayama, Kazumune Hashimoto, Toshiyuki Ohtsuka

Comments: 17 pages

Subjects: Systems and Control (eess.SY); Formal Languages and Automata Theory (cs.FL); Robotics (cs.RO)
[298] arXiv:2305.10611 (replaced) [pdf, other]: Title: ACRoBat: Optimizing Auto-batching of Dynamic Deep Learning at Compile Time

Authors: Pratik Fegade, Tianqi Chen, Phillip B. Gibbons, Todd C. Mowry

Subjects: Machine Learning (cs.LG)
[299] arXiv:2305.12100 (replaced) [pdf, other]: Title: How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features

Authors: Simone Bombari, Marco Mondelli

Comments: Revision after ICML2024 acceptance. Motivation of the paper changed from Privacy to Spurious Features. arXiv admin note: text overlap with arXiv:2302.01629

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[300] arXiv:2306.02040 (replaced) [pdf, ps, other]: Title: Getting More by Knowing Less: Bayesian Incentive Compatible Mechanisms for Fair Division

Authors: Vasilis Gkatzelis, Alexandros Psomas, Xizhi Tan, Paritosh Verma

Comments: IJCAI 2024, 27 pages

Subjects: Computer Science and Game Theory (cs.GT)
[301] arXiv:2306.05889 (replaced) [pdf, other]: Title: C(NN)FD -- a deep learning framework for turbomachinery CFD analysis

Authors: Giuseppe Bruni, Sepehr Maleki, Senthil K. Krishnababu

Journal-ref: IEEE Transactions on Industrial Informatics (2024)

Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Fluid Dynamics (physics.flu-dyn)
[302] arXiv:2306.06276 (replaced) [pdf, ps, other]: Title: Contrastive Learning for Predicting Cancer Prognosis Using Gene Expression Values

Authors: Anchen Sun, Elizabeth J. Franzmann, Zhibin Chen, Xiaodong Cai

Subjects: Machine Learning (cs.LG)
[303] arXiv:2306.08722 (replaced) [pdf, other]: Title: Learning to Stabilize High-dimensional Unknown Systems Using Lyapunov-guided Exploration

Authors: Songyuan Zhang, Chuchu Fan

Comments: 32 pages, 7 figures; Accepted by the 6th Annual Conference on Learning for Dynamics and Control (L4DC 2024)

Subjects: Systems and Control (eess.SY)
[304] arXiv:2307.04577 (replaced) [pdf, other]: Title: AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System

Authors: Yuzhe Qin, Wei Yang, Binghao Huang, Karl Van Wyk, Hao Su, Xiaolong Wang, Yu-Wei Chao, Dieter Fox

Comments: this http URL Robotics: Science and Systems 2023

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[305] arXiv:2307.06155 (replaced) [pdf, other]: Title: Relative Fractional Independence Number and Its Applications

Authors: Sharareh Alipour, Amin Gohari, Mehrshad Taziki

Subjects: Combinatorics (math.CO); Information Theory (cs.IT)
[306] arXiv:2308.05042 (replaced) [pdf, ps, other]: Title: An Electrical Grid with Discrete Energy Levels

Authors: H. Grebel

Comments: 15 pages, 7 figures

Subjects: Systems and Control (eess.SY)
[307] arXiv:2308.07476 (replaced) [pdf, ps, other]: Title: Dependent rounding with strong negative-correlation, and scheduling on unrelated machines to minimize completion time

Authors: David G. Harris

Journal-ref: SODA 2024

Subjects: Data Structures and Algorithms (cs.DS)
[308] arXiv:2308.11056 (replaced) [pdf, ps, other]: Title: Closeness and Residual Closeness of Harary Graphs

Authors: Hande Tuncel Golpek, Aysun Aytac

Comments: 23 pages preprint

Subjects: Discrete Mathematics (cs.DM)
[309] arXiv:2308.11786 (replaced) [pdf, other]: Title: Building Better Human-Agent Teams: Balancing Human Resemblance and Contribution in Voice Assistants

Authors: Samuel Westby, Richard J. Radke, Christoph Riedl, Brooke Foucault Welles

Comments: 12 pages, 4 figures

Subjects: Human-Computer Interaction (cs.HC)
[310] arXiv:2308.12577 (replaced) [pdf, other]: Title: REB: Reducing Biases in Representation for Industrial Anomaly Detection

Authors: Shuai Lyu, Dongmei Mo, Waikeung Wong

Comments: 14 pages, 7 figures, 7 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[311] arXiv:2309.00520 (replaced) [pdf, other]: Title: Robust Online Learning over Networks

Authors: Nicola Bastianello, Diego Deplano, Mauro Franceschelli, Karl H. Johansson

Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
[312] arXiv:2309.02967 (replaced) [pdf, other]: Title: Reviving Static Charts into Live Charts

Authors: Lu Ying, Yun Wang, Haotian Li, Shuguang Dou, Haidong Zhang, Xinyang Jiang, Huamin Qu, Yingcai Wu

Comments: Accepted by IEEE TVCG

Subjects: Human-Computer Interaction (cs.HC)
[313] arXiv:2309.05075 (replaced) [pdf, other]: Title: Secure Set-Based State Estimation for Linear Systems under Adversarial Attacks on Sensors

Authors: M. Umar B. Niazi, Michelle S. Chong, Amr Alanwar, Karl H. Johansson

Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[314] arXiv:2309.05516 (replaced) [pdf, other]: Title: Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Authors: Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Kaokao Lv, Yi Liu

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[315] arXiv:2309.09231 (replaced) [pdf, ps, other]: Title: ATM: a Logic for Quantitative Security Properties on Attack Trees

Authors: Stefano M. Nicoletti, Milan Lopuhaä-Zwakenberg, E. Moritz Hahn, Mariëlle Stoelinga

Subjects: Cryptography and Security (cs.CR); Logic in Computer Science (cs.LO)
[316] arXiv:2309.10621 (replaced) [pdf, other]: Title: Large language models can accurately predict searcher preferences

Authors: Paul Thomas, Seth Spielman, Nick Craswell, Bhaskar Mitra

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[317] arXiv:2309.10771 (replaced) [pdf, other]: Title: Redefining Qualitative Analysis in the AI Era: Utilizing ChatGPT for Efficient Thematic Analysis

Authors: He Zhang, Chuhao Wu, Jingyi Xie, Yao Lyu, Jie Cai, John M. Carroll

Subjects: Human-Computer Interaction (cs.HC)
[318] arXiv:2309.15817 (replaced) [pdf, other]: Title: Identifying the Risks of LM Agents with an LM-Emulated Sandbox

Authors: Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, Tatsunori Hashimoto

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[319] arXiv:2310.01217 (replaced) [pdf, other]: Title: ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale

Authors: Markus Frohmann, Carolin Holtermann, Shahed Masoudian, Anne Lauscher, Navid Rekabsaz

Comments: Accepted to Findings of the ACL: ACL 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[320] arXiv:2310.05872 (replaced) [pdf, other]: Title: ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models

Authors: Kaiwen Zhou, Kwonjoon Lee, Teruhisa Misu, Xin Eric Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[321] arXiv:2310.08779 (replaced) [pdf, ps, other]: Title: A Completeness Theorem for Probabilistic Regular Expressions

Authors: Wojciech Różowski, Alexandra Silva

Comments: Accepted for publication at LICS. Full version of the paper containing omitted proofs

Subjects: Logic in Computer Science (cs.LO); Formal Languages and Automata Theory (cs.FL)
[322] arXiv:2310.08949 (replaced) [pdf, other]: Title: EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs

Authors: Xiangyu Zhao, Bo Liu, Qijiong Liu, Guangyuan Shi, Xiao-Ming Wu

Comments: Accepted by ACL 2024, main conference

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[323] arXiv:2310.09319 (replaced) [pdf, other]: Title: Topological Data Analysis in smart manufacturing

Authors: Martin Uray, Barbara Giunti, Michael Kerber, Stefan Huber

Comments: Preprint still under review

Subjects: Machine Learning (cs.LG); Algebraic Topology (math.AT); Applications (stat.AP)
[324] arXiv:2310.09542 (replaced) [pdf, other]: Title: The Treewidth Boundedness Problem for an Inductive Separation Logic of Relations

Authors: Marius Bozga, Lucas Bueri, Radu Iosif, Florian Zuleger

Subjects: Logic in Computer Science (cs.LO); Formal Languages and Automata Theory (cs.FL)
[325] arXiv:2310.10155 (replaced) [pdf, other]: Title: Analysis and implementation of nanotargeting on LinkedIn based on publicly available non-PII

Authors: Ángel Merino, José González-Cabañas, Ángel Cuevas, Rubén Cuevas

Comments: 19 pages, 10 figures

Journal-ref: Proceedings of the CHI Conference on Human Factors in Computing Systems May 2024 Article No 1019 Pages 1 to 22

Subjects: Social and Information Networks (cs.SI)
[326] arXiv:2310.10962 (replaced) [pdf, other]: Title: Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning

Authors: Huiming Wang, Zhaodonghui Li, Liying Cheng, Soh De Wen, Lidong Bing

Comments: NAACL 2024

Subjects: Computation and Language (cs.CL)
[327] arXiv:2310.14691 (replaced) [pdf, other]: Title: Identifiability of total effects from abstractions of time series causal graphs

Authors: Charles K. Assaad, Emilie Devijver, Eric Gaussier, Gregor Gössler, Anouar Meynaoui

Comments: Accepted to the 40th Conference on Uncertainty in Artificial Intelligence (UAI) 2024, Barcelona, Spain

Subjects: Statistics Theory (math.ST); Artificial Intelligence (cs.AI)
[328] arXiv:2310.15887 (replaced) [pdf, other]: Title: AdaptiX -- A Transitional XR Framework for Development and Evaluation of Shared Control Applications in Assistive Robotics

Authors: Max Pascher, Felix Ferdinand Goldau, Kirill Kronhardt, Udo Frese, Jens Gerken

Comments: Proceedings of the ACM on Human-Computer Interaction (PACM HCI) as submission of the 16th ACM SIGCHI Symposium on Engineering Interactive Computing Systems (EICS'24)

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Robotics (cs.RO); Software Engineering (cs.SE)
[329] arXiv:2310.17582 (replaced) [pdf, other]: Title: Convergence of flow-based generative models via proximal gradient descent in Wasserstein space

Authors: Xiuyuan Cheng, Jianfeng Lu, Yixin Tan, Yao Xie

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
[330] arXiv:2311.00801 (replaced) [pdf, other]: Title: GIST: Generated Inputs Sets Transferability in Deep Learning

Authors: Florian Tambon, Foutse Khomh, Giuliano Antoniol

Comments: accepted for publication in the "ACM Transactions on Software Engineering and Methodology" journal

Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
[331] arXiv:2311.03682 (replaced) [pdf, ps, other]: Title: Incentive Design for Eco-driving in Urban Transportation Networks

Authors: M. Umar B. Niazi, Jung-Hoon Cho, Munther A. Dahleh, Roy Dong, Cathy Wu

Subjects: Systems and Control (eess.SY); Social and Information Networks (cs.SI); Optimization and Control (math.OC)
[332] arXiv:2311.04797 (replaced) [pdf, other]: Title: CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion

Authors: Jan Laukemann, Thomas Gruber, Georg Hager, Dossay Oryspayev, Gerhard Wellein

Comments: 19 pages including artifact appendix; 11 figures, 1 table; numerous corrections, esp. in Table 1

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
[333] arXiv:2311.04818 (replaced) [pdf, other]: Title: Cross-Silo Federated Learning Across Divergent Domains with Iterative Parameter Alignment

Authors: Matt Gorbett, Hossein Shirazi, Indrakshi Ray

Comments: Published at IEEE Big Data 2023

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
[334] arXiv:2311.05794 (replaced) [pdf, other]: Title: An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits

Authors: Biyonka Liang, Iavor Bojinov

Subjects: Methodology (stat.ME); Machine Learning (cs.LG)
[335] arXiv:2311.07389 (replaced) [pdf, other]: Title: Transpose Attack: Stealing Datasets with Bidirectional Training

Authors: Guy Amit, Mosh Levy, Yisroel Mirsky

Comments: NDSS24 paper, Transpose Attack, Transposed Model. NDSS version: this https URL

Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[336] arXiv:2311.09149 (replaced) [pdf, other]: Title: Temporal Knowledge Question Answering via Abstract Reasoning Induction

Authors: Ziyang Chen, Dongfang Li, Xiang Zhao, Baotian Hu, Min Zhang

Comments: Accepted by ACL 2024. 17 pages, 10 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[337] arXiv:2311.09375 (replaced) [pdf, other]: Title: Distributed Constrained Combinatorial Optimization leveraging Hypergraph Neural Networks

Authors: Nasimeh Heydaribeni, Xinrui Zhan, Ruisi Zhang, Tina Eliassi-Rad, Farinaz Koushanfar

Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC)
[338] arXiv:2311.09548 (replaced) [pdf, other]: Title: Universally Optimal Information Dissemination and Shortest Paths in the HYBRID Distributed Model

Authors: Yi-Jun Chang, Oren Hecht, Dean Leitersdorf, Philipp Schneider

Comments: This work combines the three preprints arXiv:2304.06317, arXiv:2304.07107, and arXiv:2306.05977 with some improved results

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
[339] arXiv:2311.10437 (replaced) [pdf, other]: Title: DSD-DA: Distillation-based Source Debiasing for Domain Adaptive Object Detection

Authors: Yongchao Feng, Shiwei Li, Yingjie Gao, Ziyue Huang, Yanan Zhang, Qingjie Liu, Yunhong Wang

Comments: Accepted by ICML2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[340] arXiv:2311.13889 (replaced) [pdf, other]: Title: SIMBa: System Identification Methods leveraging Backpropagation

Authors: Loris Di Natale, Muhammad Zakwan, Philipp Heer, Giancarlo Ferrari-Trecate, Colin N. Jones

Comments: First two authors contributed equally. Submitted to IEEE TCST

Subjects: Systems and Control (eess.SY)
[341] arXiv:2311.14363 (replaced) [pdf, other]: Title: High order unfitted finite element discretizations for explicit boundary representations

Authors: Pere A. Martorell, Santiago Badia

Subjects: Computational Engineering, Finance, and Science (cs.CE); Numerical Analysis (math.NA)
[342] arXiv:2311.14495 (replaced) [pdf, other]: Title: StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization

Authors: Shida Wang, Qianxiao Li

Comments: 27 pages, 7 figures, ICML 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Dynamical Systems (math.DS)
[343] arXiv:2311.16097 (replaced) [pdf, other]: Title: CG-HOI: Contact-Guided 3D Human-Object Interaction Generation

Authors: Christian Diller, Angela Dai

Comments: Project page: this https URL Video: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[344] arXiv:2311.17778 (replaced) [pdf, other]: Title: Unified Binary and Multiclass Margin-Based Classification

Authors: Yutong Wang, Clayton Scott

Comments: Accepted for publication in Journal of Machine Learning Research

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[345] arXiv:2312.02691 (replaced) [pdf, ps, other]: Title: Edge coloring of products of signed graphs

Authors: Robert Janczewski, Krzysztof Turowski, Bartłomiej Wróblewski

Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
[346] arXiv:2312.03579 (replaced) [pdf, ps, other]: Title: The Implication Problem for Functional Dependencies and Variants of Marginal Distribution Equivalences

Authors: Minna Hirvonen

Comments: 23 pages, improved version after reviewer comments, results unchanged

Subjects: Logic in Computer Science (cs.LO)
[347] arXiv:2312.06424 (replaced) [pdf, other]: Title: Cross Domain LifeLong Sequential Modeling for Online Click-Through Rate Prediction

Authors: Ruijie Hou, Zhaoyang Yang, Yu Ming, Hongyu Lu, Zhuobin Zheng, Yu Chen, Qinsong Zeng, Ming Chen

Comments: Accepted by KDD 2024

Subjects: Information Retrieval (cs.IR)
[348] arXiv:2312.06941 (replaced) [pdf, ps, other]: Title: Humans vs Large Language Models: Judgmental Forecasting in an Era of Advanced AI

Authors: MAhdi Abolghasemi, Odkhishig Ganbold, Kristian Rotaru

Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
[349] arXiv:2312.07533 (replaced) [pdf, other]: Title: VILA: On Pre-training for Visual Language Models

Authors: Ji Lin, Hongxu Yin, Wei Ping, Yao Lu, Pavlo Molchanov, Andrew Tao, Huizi Mao, Jan Kautz, Mohammad Shoeybi, Song Han

Comments: CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[350] arXiv:2312.07792 (replaced) [pdf, other]: Title: Differentially private projection-depth-based medians

Authors: Kelly Ramsay, Dylan Spicker

Comments: 44 pages, 1 figure

Subjects: Statistics Theory (math.ST); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Methodology (stat.ME)
[351] arXiv:2312.08140 (replaced) [pdf, other]: Title: GVE-LPA: Fast Label Propagation Algorithm (LPA) for Community Detection in Shared Memory Setting

Authors: Subhajit Sahu

Comments: 8 pages, 7 figures, 1 table. arXiv admin note: text overlap with arXiv:2312.04876

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
[352] arXiv:2312.12113 (replaced) [pdf, other]: Title: Variational Mode Decomposition-Based Nonstationary Coherent Structure Analysis for Spatiotemporal Data

Authors: Yuya Ohmichi

Journal-ref: Aerospace Science and Technology, Volume 149, June 2024, 109162

Subjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG); Dynamical Systems (math.DS)
[353] arXiv:2312.12321 (replaced) [pdf, other]: Title: Bypassing the Safety Training of Open-Source LLMs with Priming Attacks

Authors: Jason Vega, Isha Chaudhary, Changming Xu, Gagandeep Singh

Comments: ICLR Tiny Paper camera ready version

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[354] arXiv:2312.15682 (replaced) [pdf, other]: Title: A Grating Based High-Frequency Motion Stimulus Paradigm for Steady-State Motion Visual Evoked Potentials

Authors: Bartu Atabek, Efecan Yilmaz, Cengiz Acarturk, Murat Perit Cakir

Subjects: Human-Computer Interaction (cs.HC)
[355] arXiv:2401.00396 (replaced) [pdf, other]: Title: RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models

Authors: Cheng Niu, Yuanhao Wu, Juno Zhu, Siliang Xu, Kashun Shum, Randy Zhong, Juntong Song, Tong Zhang

Subjects: Computation and Language (cs.CL)
[356] arXiv:2401.04971 (replaced) [pdf, other]: Title: A Survey on Cross-Domain Sequential Recommendation

Authors: Shu Chen, Zitao Xu, Weike Pan, Qiang Yang, Zhong Ming

Comments: Accepted to the IJCAI 2024 Survey Track

Subjects: Information Retrieval (cs.IR)
[357] arXiv:2401.07494 (replaced) [pdf, other]: Title: Input Convex Lipschitz RNN: A Fast and Robust Approach for Engineering Tasks

Authors: Zihao Wang, Zhe Wu

Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)
[358] arXiv:2401.07780 (replaced) [pdf, other]: Title: Learning Soft Constrained MPC Value Functions: Efficient MPC Design and Implementation providing Stability and Safety Guarantees

Authors: Nicolas Chatzikiriakos, Kim P. Wabersich, Felix Berkel, Patricia Pauli, Andrea Iannelli

Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[359] arXiv:2401.07927 (replaced) [pdf, other]: Title: Are self-explanations from Large Language Models faithful?

Authors: Andreas Madsen, Sarath Chandar, Siva Reddy

Comments: The 62nd Annual Meeting of the Association for Computational Linguistics

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[360] arXiv:2401.08298 (replaced) [pdf, ps, other]: Title: Online Elasticity Estimation and Material Sorting Using Standard Robot Grippers

Authors: Shubhan P. Patni, Pavel Stoudek, Hynek Chlup, Matej Hoffmann

Comments: 22 pages, 17 figures

Journal-ref: The International Journal of Advanced Manufacturing Technology (2024)

Subjects: Robotics (cs.RO)
[361] arXiv:2401.08667 (replaced) [pdf, other]: Title: Data-Driven Physics-Informed Neural Networks: A Digital Twin Perspective

Authors: Sunwoong Yang, Hojin Kim, Yoonpyo Hong, Kwanjung Yee, Romit Maulik, Namwoo Kang

Subjects: Fluid Dynamics (physics.flu-dyn); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
[362] arXiv:2401.09091 (replaced) [pdf, other]: Title: Noise-Tolerant Quantum Algorithm for Ground State Energy Estimation

Authors: Erenay Karacan, Yanbin Chen, Christian B. Mendl

Subjects: Quantum Physics (quant-ph); Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
[363] arXiv:2401.09245 (replaced) [pdf, other]: Title: Uncertainty estimates for semantic segmentation: providing enhanced reliability for automated motor claims handling

Authors: Jan Küchler (1), Daniel Kröll (1), Sebastian Schoenen (1), Andreas Witte (1) ((1) ControlExpert GmbH, Langenfeld, Germany)

Comments: 11 pages, 10 figures, 3 tables

Journal-ref: Machine Vision and Applications 35, 66 (2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[364] arXiv:2401.09750 (replaced) [pdf, other]: Title: Exploration and Anti-Exploration with Distributional Random Network Distillation

Authors: Kai Yang, Jian Tao, Jiafei Lyu, Xiu Li

Comments: ICML 2024 accepted

Subjects: Machine Learning (cs.LG)
[365] arXiv:2401.14192 (replaced) [pdf, other]: Title: How Can Large Language Models Understand Spatial-Temporal Data?

Authors: Lei Liu, Shuo Yu, Runze Wang, Zhenxun Ma, Yanming Shen

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[366] arXiv:2401.14341 (replaced) [pdf, other]: Title: Efficient Construction of Long Orientable Sequences

Authors: Daniel Gabric, Joe Sawada

Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Information Theory (cs.IT); Combinatorics (math.CO)
[367] arXiv:2401.14857 (replaced) [pdf, other]: Title: LIV-GaussMap: LiDAR-Inertial-Visual Fusion for Real-time 3D Radiance Field Map Rendering

Authors: Sheng Hong, Junjie He, Xinhu Zheng, Chunran Zheng, Shaojie Shen

Subjects: Robotics (cs.RO)
[368] arXiv:2401.16640 (replaced) [pdf, other]: Title: TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese

Authors: Nicholas Kluge Corrêa, Sophia Falk, Shiza Fatimah, Aniket Sen, Nythamar de Oliveira

Comments: 21 pages, 5 figures

Journal-ref: Machine Learning With Applications, 16, 100558

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[369] arXiv:2401.16688 (replaced) [pdf, other]: Title: Characterization of Magnetic Labyrinthine Structures through Junctions and Terminals Detection Using Template Matching and CNN

Authors: Vinícius Yu Okubo, Kotaro Shimizu, B. S. Shivaram, Hae Yong Kim

Comments: 12 pages, 7 figures, submitted to IEEE Access

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[370] arXiv:2401.17536 (replaced) [pdf, other]: Title: PipeNet: Question Answering with Semantic Pruning over Knowledge Graphs

Authors: Ying Su, Jipeng Zhang, Yangqiu Song, Tong Zhang

Comments: 8 pages, 4 figures, accepted to *SEM 2024

Subjects: Computation and Language (cs.CL)
[371] arXiv:2401.17583 (replaced) [pdf, other]: Title: Agile But Safe: Learning Collision-Free High-Speed Legged Locomotion

Authors: Tairan He, Chong Zhang, Wenli Xiao, Guanqi He, Changliu Liu, Guanya Shi

Comments: Published at RSS 2024, Project website: this https URL

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Systems and Control (eess.SY)
[372] arXiv:2401.17921 (replaced) [pdf, ps, other]: Title: Optimizing T and CNOT Gates in Quantum Ripple-Carry Adders and Comparators

Authors: Maxime Remaud

Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
[373] arXiv:2402.00627 (replaced) [pdf, other]: Title: CapHuman: Capture Your Moments in Parallel Universes

Authors: Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang

Comments: Accepted by CVPR 2024. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[374] arXiv:2402.02969 (replaced) [pdf, other]: Title: Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features

Authors: Simone Bombari, Marco Mondelli

Comments: Revision after ICML2024 reviews

Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Machine Learning (cs.LG)
[375] arXiv:2402.03145 (replaced) [pdf, ps, other]: Title: SafEDMD: A certified learning architecture tailored to data-driven control of nonlinear dynamical systems

Authors: Robin Strässer, Manuel Schaller, Karl Worthmann, Julian Berberich, Frank Allgöwer

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
[376] arXiv:2402.03881 (replaced) [pdf, other]: Title: DEthna: Accurate Ethereum Network Topology Discovery with Marked Transactions

Authors: Chonghe Zhao, Yipeng Zhou, Shengli Zhang, Taotao Wang, Quan Z. Sheng, Song Guo

Comments: Accepted for publication in IEEE INFOCOM 2024

Subjects: Networking and Internet Architecture (cs.NI)
[377] arXiv:2402.05008 (replaced) [pdf, other]: Title: EfficientViT-SAM: Accelerated Segment Anything Model Without Accuracy Loss

Authors: Zhuoyang Zhang, Han Cai, Song Han

Comments: CVPR 2024 Workshop (Efficient Large Vision Models)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[378] arXiv:2402.06952 (replaced) [pdf, other]: Title: Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum Devices

Authors: Sovanmonynuth Heng, Myeongseong Go, Youngsun Han

Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI)
[379] arXiv:2402.07166 (replaced) [pdf, other]: Title: Social Evolution of Published Text and The Emergence of Artificial Intelligence Through Large Language Models and The Problem of Toxicity and Bias

Authors: Arifa Khan, P. Saravanan, S.K Venkatesan

Subjects: Artificial Intelligence (cs.AI)
[380] arXiv:2402.08249 (replaced) [pdf, other]: Title: SepRep-Net: Multi-source Free Domain Adaptation via Model Separation And Reparameterization

Authors: Ying Jin, Jiaqi Wang, Dahua Lin

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[381] arXiv:2402.08502 (replaced) [pdf, other]: Title: Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea

Authors: Hanna Krasowski, Matthias Althoff

Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[382] arXiv:2402.09108 (replaced) [pdf, other]: Title: Novel Long Distance Free Space Quantum Secure Direct Communication for Web 3.0 Networks

Authors: Yew Kee Wong, Yifan Zhou, Xinlin Zhou, Yan Shing Liang, Zi Yan Li

Comments: 17 pages, 6 figures

Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)
[383] arXiv:2402.10104 (replaced) [pdf, other]: Title: GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving

Authors: Jiaxin Zhang, Zhongzhi Li, Mingliang Zhang, Fei Yin, Chenglin Liu, Yashar Moshfeghi

Comments: Accepted in ACL 2024 Findings

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[384] arXiv:2402.11054 (replaced) [pdf, other]: Title: Resource Allocation in Mobile Networks: A Decision Model Of Jockeying in Queues

Authors: Anthony Kiggundu, Bin Han, Dennis Krummacker, Hans D. Schotten

Comments: To appear in IEEE MetidCom 2024

Subjects: Networking and Internet Architecture (cs.NI)
[385] arXiv:2402.12025 (replaced) [pdf, other]: Title: Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?

Authors: Marco Gaido, Sara Papi, Matteo Negri, Luisa Bentivogli

Comments: Accepted to the ACL 2024 main conference

Subjects: Computation and Language (cs.CL)
[386] arXiv:2402.12715 (replaced) [pdf, other]: Title: Spurious Correlations in Machine Learning: A Survey

Authors: Wenqian Ye, Guangtao Zheng, Xu Cao, Yunsheng Ma, Aidong Zhang

Comments: Version 2; Github Link: this https URL

Subjects: Machine Learning (cs.LG)
[387] arXiv:2402.13457 (replaced) [pdf, other]: Title: A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models

Authors: Zihao Xu, Yi Liu, Gelei Deng, Yuekang Li, Stjepan Picek

Comments: 18 pages, 9 figures, Accepted in ACL 2024

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[388] arXiv:2402.14298 (replaced) [pdf, other]: Title: Multi-modal Stance Detection: New Datasets and Model

Authors: Bin Liang, Ang Li, Jingqian Zhao, Lin Gui, Min Yang, Yue Yu, Kam-Fai Wong, Ruifeng Xu

Subjects: Computation and Language (cs.CL)
[389] arXiv:2402.15189 (replaced) [pdf, other]: Title: Biomedical Entity Linking as Multiple Choice Question Answering

Authors: Zhenxi Lin, Ziheng Zhang, Xian Wu, Yefeng Zheng

Comments: Accepted by COLING 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[390] arXiv:2402.17357 (replaced) [pdf, ps, other]: Title: A robust parameterized enhanced shift-splitting preconditioner for three-by-three block saddle point problems

Authors: Sk. Safique Ahmad, Pinki Khatun

Subjects: Numerical Analysis (math.NA)
[391] arXiv:2402.17614 (replaced) [pdf, other]: Title: Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation

Authors: Jonas Herzog

Comments: accepted to CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[392] arXiv:2402.18016 (replaced) [pdf, other]: Title: User Decision Guidance with Selective Explanation Presentation from Explainable-AI

Authors: Yosuke Fukuchi, Seiji Yamada

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
[393] arXiv:2403.01009 (replaced) [pdf, other]: Title: Design and Performance Evaluation of SEANet, a Software-defined Networking Platform for the Internet of Underwater Things

Authors: Deniz Unal, Sara Falleni, Kerem Enhos, Emrecan Demirors, Stefano Basagni, Tommaso Melodia

Comments: 14 pages, 18 figures

Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
[394] arXiv:2403.02905 (replaced) [pdf, other]: Title: MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model

Authors: Sen Wang, Jiangning Zhang, Weijian Cao, Xiaobin Hu, Moran Li, Xiaozhong Ji, Xin Tan, Mengtian Li, Zhifeng Xie, Chengjie Wang, Lizhuang Ma

Subjects: Multimedia (cs.MM)
[395] arXiv:2403.05086 (replaced) [pdf, other]: Title: UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Sets

Authors: Youngju Na, Woo Jae Kim, Kyu Beom Han, Suhyeon Ha, Sung-eui Yoon

Comments: accepted at CVPR 2024 project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[396] arXiv:2403.06592 (replaced) [pdf, other]: Title: Exploiting Style Latent Flows for Generalizing Deepfake Video Detection

Authors: Jongwook Choi, Taehoon Kim, Yonghyun Jeong, Seungryul Baek, Jongwon Choi

Comments: Preprint version, final version will be available at this https URL The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) (2024) Published by: IEEE & CVF

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[397] arXiv:2403.06668 (replaced) [pdf, other]: Title: PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor

Authors: Jaewon Jung, Hongsun Jang, Jaeyong Song, Jinho Lee

Comments: Accepted to CVPR 2024

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[398] arXiv:2403.06765 (replaced) [pdf, other]: Title: ConspEmoLLM: Conspiracy Theory Detection Using an Emotion-Based Large Language Model

Authors: Zhiwei Liu, Boyang Liu, Paul Thompson, Kailai Yang, Sophia Ananiadou

Comments: Work in progress

Subjects: Computation and Language (cs.CL)
[399] arXiv:2403.08374 (replaced) [pdf, other]: Title: Efficient Signature-Free Validated Agreement

Authors: Pierre Civit, Muhammad Ayaz Dzulfikar, Seth Gilbert, Rachid Guerraoui, Jovan Komatovic, Manuel Vidigueira, Igor Zablotchi

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[400] arXiv:2403.11678 (replaced) [pdf, other]: Title: Exploring 3D-aware Latent Spaces for Efficiently Learning Numerous Scenes

Authors: Antoine Schnepf, Karim Kassab, Jean-Yves Franceschi, Laurent Caraffa, Flavian Vasile, Jeremie Mary, Andrew Comport, Valérie Gouet-Brunet

Comments: Camera-ready version accepted at 3DMV-CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[401] arXiv:2403.11802 (replaced) [pdf, other]: Title: Counting-Stars: A Multi-evidence, Position-aware, and Scalable Benchmark for Evaluating Long-Context Large Language Models

Authors: Mingyang Song, Mao Zheng, Xuan Luo

Comments: work in progress

Subjects: Computation and Language (cs.CL)
[402] arXiv:2403.12864 (replaced) [pdf, other]: Title: A Comparison of Deep Learning Architectures for Spacecraft Anomaly Detection

Authors: Daniel Lakey, Tim Schlippe

Comments: accepted for IEEE Aeroconf 2024. Final version published IEEE Aerospace Conference 2024 (AeroConf 2024), access in IEEE Explore

Subjects: Machine Learning (cs.LG)
[403] arXiv:2403.13357 (replaced) [pdf, other]: Title: A Machine Learning Approach for Multiscale Modeling of Biological Tissues

Authors: Nishan Parvez, Jacob S. Merson

Comments: Data and code can be found at this https URL

Subjects: Biological Physics (physics.bio-ph); Soft Condensed Matter (cond-mat.soft); Computational Engineering, Finance, and Science (cs.CE)
[404] arXiv:2403.13680 (replaced) [pdf, other]: Title: Step-Calibrated Diffusion for Biomedical Optical Image Restoration

Authors: Yiwei Lyu, Sung Jik Cha, Cheng Jiang, Asadur Chowdury, Xinhai Hou, Edward Harake, Akhil Kondepudi, Christian Freudiger, Honglak Lee, Todd C. Hollon

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[405] arXiv:2403.15521 (replaced) [pdf, other]: Title: Exploring new territory: Calibration-free decoding for c-VEP BCI

Authors: J. Thielen, J. Sosulski, M. Tangermann

Comments: 6 pages, 2 figures, 9th Graz Brain-Computer Interface Conference 2024

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[406] arXiv:2403.15523 (replaced) [pdf, other]: Title: Towards auditory attention decoding with noise-tagging: A pilot study

Authors: H. A. Scheppink, S. Ahmadi, P. Desain, M. Tangermann, J. Thielen

Comments: 6 pages, 2 figures, 9th Graz Brain-Computer Interface Conference 2024

Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[407] arXiv:2403.15593 (replaced) [pdf, other]: Title: FairerCLIP: Debiasing CLIP's Zero-Shot Predictions using Functions in RKHSs

Authors: Sepehr Dehdashtian, Lan Wang, Vishnu Naresh Boddeti

Comments: The Twelfth International Conference on Learning Representations (ICLR) 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[408] arXiv:2403.16165 (replaced) [pdf, ps, other]: Title: Input-to-State Stability of Newton Methods for Generalized Equations in Nonlinear Optimization

Authors: Torbjørn Cunis, Ilya Kolmanovsky

Comments: Submitted to 2024 Conference on Decision and Control

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[409] arXiv:2403.17101 (replaced) [pdf, ps, other]: Title: AI Consciousness is Inevitable: A Theoretical Computer Science Perspective

Authors: Lenore Blum, Manuel Blum

Subjects: Artificial Intelligence (cs.AI)
[410] arXiv:2403.19021 (replaced) [pdf, other]: Title: IDGenRec: LLM-RecSys Alignment with Textual ID Learning

Authors: Juntao Tan, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Zelong Li, Yongfeng Zhang

Comments: Accepted in SIGIR 2024

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[411] arXiv:2404.00031 (replaced) [pdf, other]: Title: Towards gaze-independent c-VEP BCI: A pilot study

Authors: S. Narayanan, S. Ahmadi, P. Desain, J. Thielen

Comments: 6 pages, 3 figures, 9th Graz Brain-Computer Interface Conference 2024

Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[412] arXiv:2404.00082 (replaced) [pdf, other]: Title: Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines

Authors: Alessandro Ilic Mezza, Riccardo Giampiccolo, Enzo De Sena, Alberto Bernardini

Comments: The article has been submitted to EURASIP Journal on Audio, Speech, and Music Processing on Jan 02, 2024 and is currently under review

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[413] arXiv:2404.01319 (replaced) [pdf, other]: Title: Information Cascade Prediction under Public Emergencies: A Survey

Authors: Qi Zhang, Guang Wang, Li Lin, Kaiwen Xia, Shuai Wang

Comments: arXiv admin note: text overlap with arXiv:2007.09815 by other authors

Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[414] arXiv:2404.01933 (replaced) [pdf, other]: Title: PREGO: online mistake detection in PRocedural EGOcentric videos

Authors: Alessandro Flaborea, Guido Maria D'Amely di Melendugno, Leonardo Plini, Luca Scofano, Edoardo De Matteis, Antonino Furnari, Giovanni Maria Farinella, Fabio Galasso

Comments: Accepted at CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[415] arXiv:2404.05484 (replaced) [pdf, other]: Title: On Computational Modeling of Sleep-Wake Cycle

Authors: Xin Li

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[416] arXiv:2404.05840 (replaced) [pdf, ps, other]: Title: Attention-Driven Multi-Agent Reinforcement Learning: Enhancing Decisions with Expertise-Informed Tasks

Authors: Andre R Kuroswiski, Annie S Wu, Angelo Passaro

Comments: This paper was published at Proceedings of FLAIRS-37, May 19-21, Sandestin Beach, FL. The proceedings version is available at this https URL

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
[417] arXiv:2404.09604 (replaced) [pdf, other]: Title: Machine learning-based optimization workflow of the homogeneity of spunbond nonwovens with human validation

Authors: Viny Saajan Victor, Andre Schmeißer, Heike Leitte, Simone Gramsch

Subjects: Machine Learning (cs.LG)
[418] arXiv:2404.10228 (replaced) [pdf, other]: Title: Two-Stage Stance Labeling: User-Hashtag Heuristics with Graph Neural Networks

Authors: Joshua Melton, Shannon Reid, Gabriel Terejanu, Siddharth Krishnan

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Social and Information Networks (cs.SI)
[419] arXiv:2404.10260 (replaced) [pdf, other]: Title: HelixFold-Multimer: Elevating Protein Complex Structure Prediction to New Heights

Authors: Xiaomin Fang, Jie Gao, Jing Hu, Lihang Liu, Yang Xue, Xiaonan Zhang, Kunrui Zhu

Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI)
[420] arXiv:2404.10401 (replaced) [pdf, other]: Title: A Phone-based Distributed Ambient Temperature Measurement System with An Efficient Label-free Automated Training Strategy

Authors: Dayin Chen, Xiaodan Shi, Haoran Zhang, Xuan Song, Dongxiao Zhang, Yuntian Chen, Jinyue Yan

Journal-ref: IEEE Transactions on Mobile Computing,13 May 2024, 1 - 13

Subjects: Machine Learning (cs.LG)
[421] arXiv:2404.11205 (replaced) [pdf, other]: Title: Pose2Gest: A Few-Shot Model-Free Approach Applied In South Indian Classical Dance Gesture Recognition

Authors: Kavitha Raju, Nandini J. Warrier, Manu Madhavan, Selvi C., Arun B. Warrier, Thulasi Kumar

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[422] arXiv:2404.11671 (replaced) [pdf, other]: Title: A Study of Undefined Behavior Across Foreign Function Boundaries in Rust Libraries

Authors: Ian McCormack, Joshua Sunshine, Jonathan Aldrich

Comments: 26 pages without appendix supplement, preprint

Subjects: Software Engineering (cs.SE)
[423] arXiv:2404.13859 (replaced) [pdf, other]: Title: Unveiling and Mitigating Generalized Biases of DNNs through the Intrinsic Dimensions of Perceptual Manifolds

Authors: Yanbiao Ma, Licheng Jiao, Fang Liu, Lingling Li, Wenping Ma, Shuyuan Yang, Xu Liu, Puhua Chen

Comments: 8pages, 6figures, Submitted to TPAMI

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[424] arXiv:2404.14965 (replaced) [pdf, other]: Title: Vision Beyond Boundaries: An Initial Design Space of Domain-specific Large Vision Models in Human-robot Interaction

Authors: Yuchong Zhang, Yong Ma, Danica Kragic

Subjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO)
[425] arXiv:2404.15193 (replaced) [pdf, other]: Title: Structurally Flexible Neural Networks: Evolving the Building Blocks for General Agents

Authors: Joachim Winther Pedersen, Erwan Plantec, Eleni Nisioti, Milton Montero, Sebastian Risi

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[426] arXiv:2404.15772 (replaced) [pdf, other]: Title: Bi-Mamba+: Bidirectional Mamba for Time Series Forecasting

Authors: Aobo Liang, Xingguo Jiang, Yan Sun, Xiaohou Shi, Ke Li

Comments: New Mamba-based architecture. All experiments rerun

Subjects: Machine Learning (cs.LG)
[427] arXiv:2404.16061 (replaced) [pdf, ps, other]: Title: Dynamic Many Valued Logic Systems in Theoretical Economics

Authors: Daniel Lu

Subjects: Logic in Computer Science (cs.LO); Theoretical Economics (econ.TH)
[428] arXiv:2404.16267 (replaced) [pdf, ps, other]: Title: Dynamic PageRank: Algorithms and Lower Bounds

Authors: Rajesh Jayaram, Jakub Łącki, Slobodan Mitrović, Krzysztof Onak, Piotr Sankowski

Subjects: Data Structures and Algorithms (cs.DS)
[429] arXiv:2404.16296 (replaced) [pdf, ps, other]: Title: Research on Splicing Image Detection Algorithms Based on Natural Image Statistical Characteristics

Authors: Ao Xiang, Jingyu Zhang, Qin Yang, Liyang Wang, Yu Cheng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[430] arXiv:2404.17027 (replaced) [pdf, other]: Title: Player-Driven Emergence in LLM-Driven Game Narrative

Authors: Xiangyu Peng, Jessica Quaye, Weijia Xu, Portia Botchway, Chris Brockett, Bill Dolan, Nebojsa Jojic, Gabriel DesGarennes, Ken Lobb, Michael Xu, Jorge Leandro, Claire Jin, Sudha Rao

Journal-ref: IEEE Conference on Games 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[431] arXiv:2404.18348 (replaced) [pdf, ps, other]: Title: Bilinear optimal control for the Stokes-Brinkman equations: a priori and a posteriori error analyses

Authors: Alejandro Allendes, Gilberto Campaña, Enrique Otarola

Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
[432] arXiv:2404.18369 (replaced) [pdf, other]: Title: A SAT Scalpel for Lattice Surgery: Representation and Synthesis of Subroutines for Surface-Code Fault-Tolerant Quantum Computing

Authors: Daniel Bochen Tan, Murphy Yuezhen Niu, Craig Gidney

Comments: To appear in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)

Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
[433] arXiv:2404.18807 (replaced) [pdf, other]: Title: The Landscape of Unfolding with Machine Learning

Authors: Nathan Huetsch, Javier Mariño Villadamigo, Alexander Shmakov, Sascha Diefenbacher, Vinicius Mikuni, Theo Heimel, Michael Fenton, Kevin Greif, Benjamin Nachman, Daniel Whiteson, Anja Butter, Tilman Plehn

Subjects: High Energy Physics - Phenomenology (hep-ph); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex)
[434] arXiv:2405.00820 (replaced) [pdf, other]: Title: HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond

Authors: Stefan Abi-Karam, Rishov Sarkar, Allison Seigler, Sean Lowe, Zhigang Wei, Hanqiu Chen, Nanditha Rao, Lizy John, Aman Arora, Cong Hao

Comments: Edit to "Section V.E" for proper attribution of open-source HLSyn, AutoDSE, and the Merlin compiler

Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
[435] arXiv:2405.00942 (replaced) [pdf, other]: Title: LLaVA Finds Free Lunch: Teaching Human Behavior Improves Content Understanding Abilities Of LLMs

Authors: Somesh Singh, Harini S I, Yaman K Singla, Veeky Baths, Rajiv Ratn Shah, Changyou Chen, Balaji Krishnamurthy

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[436] arXiv:2405.01389 (replaced) [pdf, other]: Title: Invariant Risk Minimization Is A Total Variation Model

Authors: Zhao-Rong Lai, Weiwen Wang

Comments: ICML 2024

Subjects: Machine Learning (cs.LG)
[437] arXiv:2405.02288 (replaced) [pdf, other]: Title: Prospective Role of Foundation Models in Advancing Autonomous Vehicles

Authors: Jianhua Wu, Bingzhao Gao, Jincheng Gao, Jianhao Yu, Hongqing Chu, Qiankun Yu, Xun Gong, Yi Chang, H. Eric Tseng, Hong Chen, Jie Chen

Comments: 45 pages,8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[438] arXiv:2405.02385 (replaced) [pdf, other]: Title: Efficient Deep Learning with Decorrelated Backpropagation

Authors: Sander Dalm, Joshua Offergeld, Nasir Ahmad, Marcel van Gerven

Subjects: Machine Learning (cs.LG)
[439] arXiv:2405.02508 (replaced) [pdf, other]: Title: Rasterized Edge Gradients: Handling Discontinuities Differentiably

Authors: Stanislav Pidhorskyi, Tomas Simon, Gabriel Schwartz, He Wen, Yaser Sheikh, Jason Saragih

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[440] arXiv:2405.02924 (replaced) [pdf, other]: Title: Optimal Sampling for Uncertainty-of-Information Minimization in a Remote Monitoring System

Authors: Xiaomeng Chen, Aimin Li, Shaohua Wu

Subjects: Information Theory (cs.IT)
[441] arXiv:2405.03342 (replaced) [pdf, other]: Title: Doubly Robust Causal Effect Estimation under Networked Interference via Targeted Learning

Authors: Weilin Chen, Ruichu Cai, Zeqin Yang, Jie Qiao, Yuguang Yan, Zijian Li, Zhifeng Hao

Comments: Accepted by ICML 2024

Subjects: Machine Learning (cs.LG)
[442] arXiv:2405.04407 (replaced) [pdf, ps, other]: Title: Super-Exponential Regret for UCT, AlphaGo and Variants

Authors: Laurent Orseau, Remi Munos

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[443] arXiv:2405.04903 (replaced) [pdf, other]: Title: Imbalanced Graph Classification with Multi-scale Oversampling Graph Neural Networks

Authors: Rongrong Ma, Guansong Pang, Ling Chen

Subjects: Machine Learning (cs.LG)
[444] arXiv:2405.05170 (replaced) [pdf, other]: Title: Picking watermarks from noise (PWFN): an improved robust watermarking model against intensive distortions

Authors: Sijing Xie, Chengxin Zhao, Nan Sun, Wei Li, Hefei Ling

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[445] arXiv:2405.05886 (replaced) [pdf, other]: Title: Exploiting Autoencoder's Weakness to Generate Pseudo Anomalies

Authors: Marcella Astrid, Muhammad Zaigham Zaheer, Djamila Aouada, Seung-Ik Lee

Comments: SharedIt link: this https URL

Journal-ref: Neural Computing and Applications, pp.1-17 (2024)

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[446] arXiv:2405.06488 (replaced) [pdf, other]: Title: Can Neural Networks learn Finite Elements?

Authors: Julia Novo, Eduardo Terrés

Subjects: Numerical Analysis (math.NA)
[447] arXiv:2405.06624 (replaced) [pdf, other]: Title: Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Authors: David "davidad" Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, Joshua Tenenbaum

Subjects: Artificial Intelligence (cs.AI)
[448] arXiv:2405.07291 (replaced) [pdf, other]: Title: Robust Beamforming with Gradient-based Liquid Neural Network

Authors: Xinquan Wang, Fenghao Zhu, Chongwen Huang, Ahmed Alhammadi, Faouzi Bader, Zhaoyang Zhang, Chau Yuen, Merouane Debbah

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[449] arXiv:2405.07703 (replaced) [pdf, other]: Title: OpenLLM-Ro -- Technical Report on Open-source Romanian LLMs

Authors: Mihai Masala, Denis C. Ilie-Ablachim, Dragos Corlatescu, Miruna Zavelca, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian Rebedea

Subjects: Computation and Language (cs.CL)
[450] arXiv:2405.07836 (replaced) [pdf, other]: Title: Forecasting with Hyper-Trees

Authors: Alexander März, Kashif Rasul

Comments: Forecasting, Gradient Boosting, Hyper-Networks, LightGBM, Parameter Non-Stationarity, Time Series, XGBoost

Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
[451] arXiv:2405.07919 (replaced) [pdf, other]: Title: Exploring the Low-Pass Filtering Behavior in Image Super-Resolution

Authors: Haoyu Deng, Zijing Xu, Yule Duan, Xiao Wu, Wenjie Shu, Liang-Jian Deng

Comments: Accepted by ICML 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[452] arXiv:2405.08253 (replaced) [pdf, ps, other]: Title: Thompson Sampling for Infinite-Horizon Discounted Decision Processes

Authors: Daniel Adelman, Cagla Keceli, Alba V. Olivares-Nadal

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
[453] arXiv:2405.08299 (replaced) [pdf, other]: Title: Differentially Private Federated Learning: A Systematic Review

Authors: Jie Fu, Yuan Hong, Xinpeng Ling, Leixia Wang, Xun Ran, Zhiyu Sun, Wendy Hui Wang, Zhili Chen, Yang Cao

Comments: 37pages

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[454] arXiv:2405.08337 (replaced) [pdf, ps, other]: Title: Perivascular space Identification Nnunet for Generalised Usage (PINGU)

Authors: Benjamin Sinclair, Lucy Vivash, Jasmine Moses, Miranda Lynch, William Pham, Karina Dorfman, Cassandra Marotta, Shaun Koh, Jacob Bunyamin, Ella Rowsthorn, Alex Jarema, Himashi Peiris, Zhaolin Chen, Sandy R Shultz, David K Wright, Dexiao Kong, Sharon L. Naismith, Terence J. OBrien, Meng Law

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[455] arXiv:2405.08366 (replaced) [pdf, other]: Title: Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control

Authors: Aleksandar Makelov, George Lange, Neel Nanda

Subjects: Machine Learning (cs.LG)
[456] arXiv:2405.08403 (replaced) [pdf, other]: Title: TFWT: Tabular Feature Weighting with Transformer

Authors: Xinhao Zhang, Zaitian Wang, Lu Jiang, Wanfu Gao, Pengfei Wang, Kunpeng Liu

Comments: Accepted by IJCAI 2024

Subjects: Machine Learning (cs.LG)
[457] arXiv:2405.08988 (replaced) [pdf, other]: Title: Techniques for Showing the Decidability of the Boundedness Problem of Language Acceptors

Authors: Oscar H. Ibarra, Ian McQuillan

Comments: 23 pages,2 figures

Subjects: Formal Languages and Automata Theory (cs.FL)
[458] arXiv:2405.09062 (replaced) [pdf, other]: Title: Naturalistic Music Decoding from EEG Data via Latent Diffusion Models

Authors: Emilian Postolache, Natalia Polouliakh, Hiroaki Kitano, Akima Connelly, Emanuele Rodolà, Taketo Akama

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[459] arXiv:2405.09312 (replaced) [pdf, ps, other]: Title: Agnostic Active Learning of Single Index Models with Linear Sample Complexity

Authors: Aarshvi Gajjar, Wai Ming Tai, Xingyu Xu, Chinmay Hegde, Christopher Musco, Yi Li

Subjects: Machine Learning (cs.LG)
[460] arXiv:2405.09531 (replaced) [pdf, other]: Title: Ticket-based multi-strand method for increased efficiency in proof-of-work based blockchains

Authors: Elias Rudberg

Comments: 6 pages

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[461] arXiv:2405.09551 (replaced) [pdf, other]: Title: Towards Bi-Hemispheric Emotion Mapping through EEG: A Dual-Stream Neural Network Approach

Authors: David Freire-Obregón, Daniel Hernández-Sosa, Oliverio J. Santana, Javier Lorenzo-Navarro, Modesto Castrillón-Santana

Comments: Second place award at the Brain Responses to Emotional Avatars Challenge held by the 18th IEEE International Conference on Automatic Face and Gesture Recognition(FG2024)

Subjects: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC)
[462] arXiv:2405.09554 (replaced) [pdf, ps, other]: Title: Underdetermined DOA Estimation of Off-Grid Sources Based on the Generalized Double Pareto Prior

Authors: Yongfeng Huang, Zhendong Chen, Kun Ye, Lang Zhou, Haixin Sun

Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
[463] arXiv:2405.09591 (replaced) [pdf, other]: Title: A Comprehensive Survey on Data Augmentation

Authors: Zaitian Wang, Pengfei Wang, Kunpeng Liu, Pengyang Wang, Yanjie Fu, Chang-Tien Lu, Charu C. Aggarwal, Jian Pei, Yuanchun Zhou

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[464] arXiv:2405.09713 (replaced) [pdf, other]: Title: SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

Authors: Andong Wang, Bo Wu, Sunli Chen, Zhenfang Chen, Haotian Guan, Wei-Ning Lee, Li Erran Li, Chuang Gan

Comments: CVPR

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[465] arXiv:2405.09733 (replaced) [pdf, other]: Title: SCI 3.0: A Web-based Schema Curation Interface for Graphical Event Representations

Authors: Reece Suchocki, Mary Martin, Martha Palmer, Susan Brown

Subjects: Computation and Language (cs.CL)
[466] arXiv:2405.09735 (replaced) [pdf, other]: Title: An Analysis of Sentential Neighbors in Implicit Discourse Relation Prediction

Authors: Evi Judge, Reece Suchocki, Konner Syed

Subjects: Computation and Language (cs.CL)
[467] arXiv:2405.09814 (replaced) [pdf, other]: Title: Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis

Authors: Zeyi Zhang, Tenglong Ao, Yuyao Zhang, Qingzhe Gao, Chuan Lin, Baoquan Chen, Libin Liu

Comments: SIGGRAPH 2024 (Journal Track); Project page: this https URL

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[468] arXiv:2405.09817 (replaced) [pdf, other]: Title: Active Learning with Fully Bayesian Neural Networks for Discontinuous and Nonstationary Data

Authors: Maxim Ziatdinov

Comments: Fixed PGM in Figure 2 and update caption

Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
[469] arXiv:2405.09883 (replaced) [pdf, other]: Title: RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception

Authors: Xiaosu Zhu, Hualian Sheng, Sijia Cai, Bing Deng, Shaopeng Yang, Qiao Liang, Ken Chen, Lianli Gao, Jingkuan Song, Jieping Ye

Comments: Technical report. 32 pages, 21 figures, 13 tables. this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[470] arXiv:2405.09955 (replaced) [pdf, other]: Title: Dual-band feature selection for maturity classification of specialty crops by hyperspectral imaging

Authors: Usman A. Zahidi, Krystian Łukasik, Grzegorz Cielniak

Comments: Preprint: Paper submitted to the special issue of "Computers and Electronics in Agriculture"

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[471] arXiv:2405.10029 (replaced) [pdf, other]: Title: AsCL: An Asymmetry-sensitive Contrastive Learning Method for Image-Text Retrieval with Cross-Modal Fusion

Authors: Ziyu Gong, Chengcheng Mai, Yihua Huang

Comments: This work has been strong-accepted as the oral conference paper by IEEE International Conference on Multimedia & Expo (ICME) 2024

Subjects: Multimedia (cs.MM)
[472] arXiv:2405.10128 (replaced) [pdf, other]: Title: Red Teaming Language Models for Contradictory Dialogues

Authors: Xiaofei Wen, Bangzheng Li, Tenghao Huang, Muhao Chen

Comments: 18 pages, 5 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[473] arXiv:2405.10267 (replaced) [pdf, other]: Title: Sharpness-Aware Minimization in Genetic Programming

Authors: Illya Bakurov, Nathan Haut, Wolfgang Banzhaf

Comments: Submitted to the Genetic Programming Theory and Practice workshop 2024

Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
[474] arXiv:2405.10292 (replaced) [pdf, other]: Title: Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Authors: Yuexiang Zhai, Hao Bai, Zipeng Lin, Jiayi Pan, Shengbang Tong, Yifei Zhou, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey Levine

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[475] arXiv:2405.10320 (replaced) [pdf, other]: Title: Toon3D: Seeing Cartoons from a New Perspective

Authors: Ethan Weber, Riley Peterlinz, Rohan Mathur, Frederik Warburg, Alexei A. Efros, Angjoo Kanazawa

Comments: Please see our project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

New submissions
Cross-lists
Replacements

[ total of 475 entries: 1-475 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2405, contact, help (Access key information)

> cs

Computer Science

New submissions

New submissions for Mon, 20 May 24

Cross-lists for Mon, 20 May 24

Replacements for Mon, 20 May 24