PAPERS & REPORTS
-
Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers
The Transformer model has a tendency to overfit various aspects of the training data, such as the overall sequence length. We study elementary string edit functions using a defined set of error indicators to interpret the behaviour of the sequence-to-sequence Transformer. We show that generalization to shorter sequences is often possible, but confirm that longer sequences are highly problematic, although partially correct answers are often obtained. Additionally, we find that other structural characteristics of the sequences, such as subsegment length, may be equally important. We hypothesize that the models learn algorithmic aspects of the tasks simultaneously with structural aspects, and that when the two come into conflict, the Transformer unfortunately often prefers to adhere to the structural aspects.
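Below is a minimal sketch of the length-split data generation such experiments rely on; the reversal task, the length ranges, and all function names are illustrative assumptions, not the paper's exact setup.

```python
import random

def make_reverse_example(length, alphabet="abcdefgh"):
    """One (source, target) pair for an elementary string edit task;
    reversal is used here purely as an illustration."""
    source = "".join(random.choice(alphabet) for _ in range(length))
    return source, source[::-1]

def make_split(n_examples, lengths):
    """Sample examples whose lengths are drawn uniformly from `lengths`."""
    return [make_reverse_example(random.choice(lengths)) for _ in range(n_examples)]

# Train only on a bounded length range, then test on strictly shorter
# and strictly longer sequences, separating algorithmic generalization
# from length (structural) generalization.
train = make_split(10_000, range(8, 17))
test_shorter = make_split(1_000, range(2, 8))
test_longer = make_split(1_000, range(17, 33))
```

-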
Fighting action recognition
Several fighting action classifiers based on RGB-D videos were developed. Motion History Images (MHIs) and Motion Energy Images (MEIs) were extracted and used together with SIFT bag-of-words (BoW) features. Both shallow neural networks and SVMs were outperformed by the KNN. Lastly, deep neural networks using both 2D and 3D convolution were trained directly on the RGB-D videos and on the MHI + MEI motion silhouettes. The 3D-2D CNN significantly outperformed all classical models and the 2D MHI + MEI CNN. The MHI + MEI silhouettes are argued to capture the relevant information insufficiently compared to the “trainable deep motion silhouettes” extracted by the 3D-2D CNN, and the SIFT-based BoW is argued to be ill-suited to actions involving only partial body movement. Performance-efficient CNN alternatives are proposed.
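A minimal NumPy sketch of the classic MHI/MEI computation (in the spirit of Bobick and Davis); the decay constant and motion threshold are illustrative assumptions, not the report's settings.

```python
import numpy as np

def mhi_mei(frames, tau=20, diff_threshold=30):
    """Compute a Motion History Image (MHI) and Motion Energy Image
    (MEI) from grayscale frames of shape (T, H, W), dtype uint8.

    MHI: pixels where motion occurred most recently hold the maximum
    value tau; older motion decays linearly by 1 per frame.
    MEI: binary union of motion within the last tau frames.
    """
    frames = frames.astype(np.int16)  # allow signed frame differences
    mhi = np.zeros(frames.shape[1:], dtype=np.float32)
    for prev, curr in zip(frames[:-1], frames[1:]):
        motion = np.abs(curr - prev) > diff_threshold
        mhi = np.where(motion, tau, np.maximum(mhi - 1, 0))
    mei = mhi > 0
    return mhi / tau, mei  # MHI normalised to [0, 1]
```

-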
Face mask detection using one-stage and two-stage CNNs
Multiple models have been trained on the WIDER FACE and MASKED FACE datasets. A two-stage CNN model has been trained separately on the two datasets and then used to enhance the MASKED FACE dataset. Further, one-stage YOLO detectors have been trained on the concatenation of the two datasets, with either the original or the enhanced MASKED FACE version. Both one-stage variants perform similarly well and outperform the two-stage model. The advantages and disadvantages of all approaches are discussed.
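A hedged sketch of the dataset concatenation step, assuming a YOLO-style layout with per-image .txt label files; the directory names and file layout are assumptions for illustration, not the distributed form of WIDER FACE or MASKED FACE.

```python
from pathlib import Path

def collect_image_list(*dataset_roots, out_file="train_combined.txt"):
    """Concatenate several detection datasets into a single training
    list of image paths, as consumed by common YOLO training tools.
    Assumes each root holds .jpg images with same-named YOLO-format
    .txt label files next to them (an assumed layout)."""
    paths = []
    for root in dataset_roots:
        for img in sorted(Path(root).rglob("*.jpg")):
            if img.with_suffix(".txt").exists():  # keep only labelled images
                paths.append(str(img))
    Path(out_file).write_text("\n".join(paths))
    return paths

# e.g. collect_image_list("wider_face/", "masked_face_enhanced/")
```

-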
Deep-learning movie recommendation
A modular movie recommendation algorithm for the TMDB 5000 dataset was designed, with a number of content-based feature pipelines. The features used range from very simple, such as rescaled numerical data or document TF-IDF, to deep-learning hidden states (BERT). All pipelines are found to be significantly better than random recommendations. Deep-learning-based and classical features perform similarly well. Advantages and limitations are discussed.
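As an illustration of the simpler end of the feature range, the sketch below builds a TF-IDF plus cosine-similarity recommender over movie overview texts; the function names and top-k scheme are assumptions, not the report's implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_recommender(overviews, k=5):
    """Content-based recommender from free-text movie overviews:
    TF-IDF vectors compared by cosine similarity. A deep-learning
    pipeline would swap the vectoriser for e.g. BERT hidden states."""
    tfidf = TfidfVectorizer(stop_words="english")
    matrix = tfidf.fit_transform(overviews)  # (n_movies, n_terms)
    sims = cosine_similarity(matrix)         # (n_movies, n_movies)

    def recommend(movie_idx):
        # Most similar movies first, excluding the query itself.
        order = sims[movie_idx].argsort()[::-1]
        return [i for i in order if i != movie_idx][:k]

    return recommend
```

-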
Visual search experiment report
This report lays out the basics of visual search and the evaluation of its performance. Participants were asked to take a short visual search test. The reaction time slope was found to be significantly larger when the target is absent and when a conjunction of multiple features is being considered. A discussion of the results is given.
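A minimal sketch of the standard slope analysis: regressing reaction time on set size, where flat slopes suggest parallel ("pop-out") search and steep slopes, especially on target-absent conjunction trials, suggest serial search. The numbers below are hypothetical, for illustration only.

```python
from scipy.stats import linregress

def rt_slope(set_sizes, reaction_times_ms):
    """Estimate the search slope (ms per additional display item)
    from per-trial set sizes and reaction times."""
    fit = linregress(set_sizes, reaction_times_ms)
    return fit.slope, fit.intercept

# Hypothetical trial data (set size, RT in ms):
sizes = [4, 4, 8, 8, 16, 16, 32, 32]
rts = [520, 540, 610, 650, 820, 860, 1240, 1300]
print(rt_slope(sizes, rts))  # ~27 ms/item, consistent with serial search
```

-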
ETRA 2019 challenge report
Positional eye-tracking data from the ETRA 2019 challenge dataset is analyzed to check for one-to-one search correspondence in puzzle-type experiments and for heterogeneities of visual attention with respect to target area color. For puzzle image types, compared to non-puzzle ones, left-image attention is found to be a significant predictor of right-image attention, suggesting one-to-one search patterns. The observed color heterogeneities, e.g. in the RGB red channel or the HSV value, are discussed.
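A hedged sketch of the correspondence test described above: regressing right-image attention on left-image attention over paired regions. The variable names, units, and the use of simple linear regression are assumptions for illustration.

```python
from scipy.stats import linregress

def attention_correspondence(left_dwell, right_dwell):
    """Test whether dwell time on a region of the left image predicts
    dwell time on the matching region of the right image, as a proxy
    for one-to-one search in find-the-differences style puzzles.
    Inputs are paired arrays of dwell times (e.g. seconds per region)."""
    fit = linregress(left_dwell, right_dwell)
    return {"slope": fit.slope, "r": fit.rvalue, "p": fit.pvalue}
```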