Publications

Picking a specific object is an essential task in assistive robotics. While the majority of grasp detection approaches focus on grasp synthesis from a single depth image or point cloud, this approach is often not viable in an unstructured, uncontrolled environment. Due to occlusion, heavy noise, or simply because no collision-free grasp is visible from some perspectives, it is beneficial to collect additional information from other views before opting for grasp execution. We present a closed-loop approach that selects and navigates towards the next-best-view by minimizing the entropy of the volume under consideration. We use a local measure of the estimation uncertainty of the surface reconstruction to sample grasps and estimate their success probabilities in an online fashion. Our experiments show that our algorithm achieves better grasp success rates than comparable approaches when presented with challenging household objects.
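As a minimal sketch of the idea, the following example picks the candidate view that observes the most entropy in the volume; it assumes a voxel grid with per-voxel occupancy probabilities and a simplified per-view visibility model, and all names and the visibility heuristic are illustrative, not the paper's implementation:

```python
# Entropy-driven next-best-view selection over a voxel volume (toy sketch).
import numpy as np

def voxel_entropy(p):
    """Shannon entropy of per-voxel occupancy probabilities."""
    p = np.clip(p, 1e-6, 1.0 - 1e-6)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def next_best_view(occupancy, candidate_views, visible_mask):
    """Pick the view whose visible voxels carry the most entropy.

    occupancy:      (N,) occupancy probabilities of the voxels of interest
    candidate_views: list of view identifiers
    visible_mask:   (len(candidate_views), N) boolean visibility per view
    """
    h = voxel_entropy(occupancy)
    # Under the simplification that observed voxels become certain, the
    # view seeing the most entropy minimizes the remaining entropy.
    gains = visible_mask @ h
    return candidate_views[int(np.argmax(gains))]

# Toy usage: 1000 voxels, 5 candidate views with random visibility.
rng = np.random.default_rng(0)
occ = rng.uniform(0.0, 1.0, 1000)
vis = rng.uniform(size=(5, 1000)) < 0.3
print(next_best_view(occ, list(range(5)), vis))
```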
3D point clouds pose unique computational challenges due to their unstructured, sparse nature and non-uniform density. Recent 3D object detectors usually apply sparse convolutional networks for feature extraction owing to their efficiency. However, sparse CNNs suffer from the submanifold dilation problem of regular sparse convolutions, which yields more and more active sites in the feature maps, leading to reduced feature sparsity and a higher computational burden. We find that the geometric and spatial distributions a 3D object detector learns in its deeper feature maps can serve as an important cue for determining crucial areas at runtime and eliminating the extra computations in sparse convolutional networks. We propose a Bayesian feature selection approach that adds zero training overhead while streamlining active sites at the inference stage. We reveal inherent divergences between object and background distributions in high-dimensional feature space and exploit them to classify features and prune non-essential regions. We demonstrate that, solely by leveraging these learned distributions, our approach achieves significant memory savings with only minor performance drawbacks.
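The following is a minimal sketch of distribution-based feature pruning at inference time, assuming Gaussian class-conditional feature distributions fitted offline from the trained detector; the Gaussian model, the prior, and the decision threshold are illustrative stand-ins, not the paper's exact formulation:

```python
# Keep only active sites whose posterior odds favor "object" over "background".
import numpy as np

class FeatureGate:
    def __init__(self, mu_obj, var_obj, mu_bg, var_bg, prior_obj=0.1):
        self.mu_o, self.v_o = mu_obj, var_obj
        self.mu_b, self.v_b = mu_bg, var_bg
        self.log_prior = np.log(prior_obj / (1.0 - prior_obj))

    @staticmethod
    def log_likelihood(x, mu, var):
        """Diagonal-Gaussian log-likelihood per feature vector."""
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var,
                             axis=-1)

    def keep(self, feats):
        """Boolean mask over active sites: True where the posterior favors object."""
        log_odds = (self.log_likelihood(feats, self.mu_o, self.v_o)
                    - self.log_likelihood(feats, self.mu_b, self.v_b)
                    + self.log_prior)
        return log_odds > 0.0

# Toy usage with 64-dim features at 500 active sites.
rng = np.random.default_rng(1)
gate = FeatureGate(mu_obj=np.ones(64), var_obj=np.ones(64),
                   mu_bg=np.zeros(64), var_bg=np.ones(64))
sites = rng.normal(size=(500, 64))
print(gate.keep(sites).mean())  # fraction of sites kept
```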
Depth sensors in general, but especially consumer-level sensors, show significantly varying noise characteristics depending on the environment and the measured material. If only an empirically determined sensor model is used, the quality of some reconstructed surface areas might therefore be worse than estimated. To obtain an accurate surface approximation with meaningful corresponding estimation variances, the noise characteristics of surface elements must be treated individually. We propose a probabilistic approach that combines an empirically determined sensor noise model with an incrementally updated estimate of the local noise distribution. Experiments on a publicly available dataset demonstrate that our algorithm reconstructs challenging scenes with comparatively high accuracy and fewer outliers.
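A minimal sketch of one way to combine the two noise estimates, assuming Welford's online algorithm for the local statistics and inverse-variance weighting for the fusion; both choices are illustrative and not necessarily the paper's:

```python
# Per-surface-element fusion of an empirical sensor noise model with an
# incrementally updated local variance estimate (toy sketch).
class SurfelNoise:
    def __init__(self, sigma2_sensor):
        self.sigma2_sensor = sigma2_sensor  # empirical sensor-model variance
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def update(self, depth):
        """Welford update of the local depth statistics."""
        self.n += 1
        delta = depth - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (depth - self.mean)

    def variance(self):
        """Inverse-variance fusion of empirical and local estimates."""
        if self.n < 2:
            return self.sigma2_sensor
        local = self.m2 / (self.n - 1)
        return 1.0 / (1.0 / self.sigma2_sensor + 1.0 / max(local, 1e-12))

s = SurfelNoise(sigma2_sensor=4e-4)
for z in (0.501, 0.499, 0.503, 0.498):  # repeated depth readings in meters
    s.update(z)
print(s.mean, s.variance())
```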
LiDAR-based 3D point cloud segmentation is a crucial component in the perception system of many applications that rely on a thorough 3D environment understanding. LiDAR point clouds are unstructured, sparsely scattered in 3D space, and have non-uniform densities, which restricts the use of ordinary convolutional neural networks. To tackle the irregular format of point clouds, several approaches divide the 3D space into a discretised volumetric grid. However, voxelization inevitably abandons the 3D topology of point clouds and suffers from information loss. In this paper, we propose SSL-VoxPart, a novel approach of scan-pattern-related partitioning for voxel-wise semantic segmentation suited to MEMS-actuated solid-state LiDARs. We argue that scan-pattern-related voxel partitioning is worth considering by showing improvements in extracting semantic information for voxel-wise point cloud segmentation while reducing lossy cell-label encoding. We specifically address the problems of crowded-scene analysis: crowds represent locally high-density distributions of people with heavy occlusion effects, demanding fine-grained predictions. We introduce an SSL data generation pipeline tailored to surveillance scenarios and validate our proposed method on the resulting dataset.
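For context, the following toy sketch shows the lossy cell-label encoding that plain voxel grids entail (and which the partitioning above aims to reduce): all points in a cell are collapsed to one majority label, discarding minority classes in crowded regions. The grid resolution and the majority rule are assumptions for illustration:

```python
# Majority-vote voxel labeling: the standard, lossy baseline encoding.
import numpy as np
from collections import Counter

def voxelize_labels(points, labels, cell=0.2):
    """Map 3D points to voxel indices and majority-vote one label per cell."""
    idx = np.floor(points / cell).astype(int)
    cells = {}
    for key, lab in zip(map(tuple, idx), labels):
        cells.setdefault(key, []).append(lab)
    # Minority labels inside a cell are discarded here -- the information loss.
    return {k: Counter(v).most_common(1)[0][0] for k, v in cells.items()}

pts = np.random.default_rng(2).uniform(0, 1, size=(1000, 3))
labs = np.random.default_rng(3).integers(0, 3, size=1000)
print(len(voxelize_labels(pts, labs)))  # number of occupied voxels
```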
Multivariate time series can reveal a lot about the state of the system that produces them. Such data may be used for a variety of purposes, including tracking patient health information, recognizing server hacking attempts, or spotting unusual activity in industrial plants. In this work we introduce a novel approach for detecting anomalies in multivariate time series. We use a Generative Adversarial Network (GAN) that is conditioned on the input signal by a convolutional autoencoder consisting of one encoder and two decoders. The generator of the GAN reconstructs the input twice with different quality, which helps the discriminator distinguish better between the original sample and the generated ones. While other approaches use smaller time windows as input, we focus on longer sequences in order to identify and evaluate long-term dependencies in the time series. The anomaly score for each input is calculated as a weighted combination of the discriminator output and the latent space differences. Experiments show that our setup returns better results in terms of F1 scores on common anomaly detection datasets than networks with comparable input sample length.
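A minimal sketch of such a combined score, with an illustrative weighting parameter and distance measures that are not necessarily the paper's exact definitions:

```python
# Weighted combination of latent-space error and discriminator output.
import numpy as np

def anomaly_score(d_real_prob, z_input, z_recon, lam=0.7):
    """lam trades off the latent (reconstruction) term vs. the discriminator term.

    d_real_prob:       discriminator's probability that the input is real
    z_input, z_recon:  latent codes of the input and its reconstruction
    """
    latent_err = np.linalg.norm(z_input - z_recon)
    disc_err = 1.0 - d_real_prob  # high when the discriminator is suspicious
    return lam * latent_err + (1.0 - lam) * disc_err

print(anomaly_score(0.92, np.zeros(16), 0.1 * np.ones(16)))
```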
Deep classification networks play an important role as backbone networks in industrial AI applications. Since these applications are often cost- or safety-critical, explainability of the AI results is a highly demanded feature. We introduce CAM fostering, a method to improve the explainability of classification nets based on local layers such as convolutional or pooling layers. Several CAM interpretability measures are defined and used as additional loss terms. Even though the method requires second-order derivatives, we demonstrate that deep nets can be trained on large datasets without frozen parameters. The training parameters can be chosen such that the accuracy degradation remains acceptable in favor of the CAM interpretability improvement. We conclude by comparing the results of different training parameter configurations.
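The following self-contained sketch illustrates the principle with a Grad-CAM-style map and total variation as one conceivable smoothness measure; the tiny model, the measure, and the weight lam are illustrative assumptions, not the paper's exact CAM interpretability measures. Because the map is built from gradients, backpropagating through the combined loss is where the second-order derivatives arise:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Linear(32, n_classes)

    def loss_with_cam_term(self, x, target, lam=0.1):
        feats = self.features(x)                         # (B, C, H, W)
        logits = self.head(feats.mean(dim=(2, 3)))       # GAP + linear head
        score = logits.gather(1, target[:, None]).sum()  # target-class scores
        # Grad-CAM-style map; create_graph=True keeps the graph so the CAM
        # term itself can be backpropagated (second-order derivatives).
        grads, = torch.autograd.grad(score, feats, create_graph=True)
        cam = F.relu((grads.mean(dim=(2, 3), keepdim=True) * feats).sum(1))
        # Total variation penalizes noisy, scattered activation maps.
        tv = (cam[:, 1:, :] - cam[:, :-1, :]).abs().mean() \
           + (cam[:, :, 1:] - cam[:, :, :-1]).abs().mean()
        return F.cross_entropy(logits, target) + lam * tv

net = TinyNet()
loss = net.loss_with_cam_term(torch.randn(4, 3, 32, 32),
                              torch.randint(0, 10, (4,)))
loss.backward()  # differentiates through the gradient-based map
```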
Sensor data analysis can be used to detect anomalies, fraud, or attacks in various scenarios at an early stage. Identifying abnormal behavior in time series data often requires processing a long time span, because distant time steps might be highly contextually correlated. Current state-of-the-art methods for multivariate time-series anomaly detection use relatively small windows as input to their model, since these are computationally less expensive for most architectures. In this work, we propose a novel Transformer architecture based on a symmetrical U-shaped network using the Shifted Windows (Swin) technique, aiming to detect anomalies in multivariate time series data. We focus on long sequences to catch distant contexts, which unveils more anomalous behavior compared to state-of-the-art deep learning reconstruction-based anomaly detection algorithms while keeping computational costs low. Our input window is thereby up to 200 times larger than that of other models used to reconstruct signals in an industrial context. Extensive experiments on industry-related datasets show that most related architectures are designed to achieve good results with a specific input size but are not easily adaptable to larger sizes. Our algorithm can detect anomalies when trained with different input window sizes and achieves remarkable results when larger windows are used.
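A minimal sketch of the Shifted Windows idea transferred to 1D sequences, with illustrative window and shift sizes: attention is computed within local windows, and alternate layers shift the windows by half their length so that information can cross window borders:

```python
# 1D window partitioning for Swin-style attention over long time series.
import torch

def window_partition_1d(x, window, shift=0):
    """x: (B, T, C) -> (B * T // window, window, C), optionally shifted.

    T must be divisible by `window`; a cyclic shift (torch.roll) lets
    alternate layers attend across the borders of the previous windows.
    """
    if shift:
        x = torch.roll(x, shifts=-shift, dims=1)
    b, t, c = x.shape
    return x.view(b * (t // window), window, c)

x = torch.randn(2, 800, 32)                      # long input sequence
plain = window_partition_1d(x, 100)              # regular windows
shifted = window_partition_1d(x, 100, shift=50)  # shifted by half a window
print(plain.shape, shifted.shape)
```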
Many state-of-the-art grasping approaches are constrained to top-down grasps. Reliable robotic grasping in a human-centric environment, however, requires considering all six degrees of freedom. We use an end-effector-mounted depth camera to reconstruct the object's surface by fusing data gathered along a shaped trajectory. The utilization of a truncated signed distance function and an effective pose refinement algorithm counteracts typical sources of error. We propose a multi-view deep learning approach to vastly limit the search space of possible grasps and employ robust quality metrics to estimate their chances of success. To evaluate the performance of our approach, we conducted extensive real-world experiments and achieved an average success rate of 92.3%.
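For illustration, the standard weighted-average TSDF voxel update used when fusing depth measurements from several views looks roughly as follows; the truncation distance and observation weights are illustrative:

```python
# Weighted running-average update of one TSDF voxel across views.
import numpy as np

def tsdf_update(tsdf, weight, sdf_obs, trunc=0.01, w_obs=1.0):
    """Fuse one observed signed distance into a voxel's running average."""
    d = np.clip(sdf_obs / trunc, -1.0, 1.0)  # truncate and normalize
    new_w = weight + w_obs
    new_tsdf = (tsdf * weight + d * w_obs) / new_w
    return new_tsdf, new_w

v, w = 0.0, 0.0
for obs in (0.004, 0.006, 0.005):  # signed distances seen from three views
    v, w = tsdf_update(v, w, obs)
print(v, w)
```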
We present a lightweight method for improving the explainability of localized classification networks. The method considers (Grad)CAM maps during the training process by modifying the training loss and does not require additional structural elements. We demonstrate that the (Grad)CAM interpretability, as measured by several indicators, can be improved in this way. Since the method is meant to be applicable on embedded systems and on standard deeper architectures, it relies on second-order derivatives only during training and does not require additional model layers.
Reliable robotic grasping of a previously unknown object requires a three-dimensional volumetric scene representation. Recent successful approaches use convolutional neural networks to find grasp candidates in depth images. We propose to expand this strategy by using multiple real and virtual viewpoints and projecting predicted grasp quality information onto a surface representation of the object. This allows us to find 6-DOF grasp poses for arbitrary, unknown objects.
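A minimal sketch of projecting per-pixel grasp quality predictions from one (virtual) viewpoint onto surface points with a pinhole model; the intrinsics, extrinsics, and nearest-pixel lookup are illustrative simplifications:

```python
# Assign each 3D surface point the grasp-quality value at its projected pixel.
import numpy as np

def project_quality(points_world, quality_map, K, T_cam_world):
    """points_world: (N, 3); quality_map: (H, W); returns (quality, valid)."""
    n = points_world.shape[0]
    homog = np.hstack([points_world, np.ones((n, 1))])
    pts_cam = (T_cam_world @ homog.T).T[:, :3]      # world -> camera frame
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                     # perspective division
    h, w = quality_map.shape
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    # Mask out points behind the camera or outside the image.
    valid = (pts_cam[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    quality = np.full(n, np.nan)
    quality[valid] = quality_map[v[valid], u[valid]]
    return quality, valid

K = np.array([[525.0, 0, 320], [0, 525.0, 240], [0, 0, 1]])
T = np.eye(4)  # camera coincides with the world frame in this toy example
pts = np.array([[0.0, 0.0, 1.0], [0.1, -0.05, 0.8]])
qmap = np.random.default_rng(4).uniform(size=(480, 640))
print(project_quality(pts, qmap, K, T))
```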
Technological progress in robotics and artificial intelligence over the last decade holds great potential for robotic assistance systems in the medical and nursing domain. This potential is best exploited when the system is equipped with autonomous functionalities. After proposing a taxonomy of autonomous functionalities in the medical/nursing domain, we consider a demonstrator that shows the potential of autonomous sub-functionalities even for narrowly defined tasks. To this end, the task of autonomously scooping food with a spoon ("Löffeln") is implemented using a commercially available manipulator arm employed in the medical/nursing domain. The functioning of the system and its most important components is presented. After an experimental execution of the task, we conclude by considering the benefits as well as the additional costs incurred by the autonomous functions.