# Students’ day, Friday, 4 November 2022, 09:00 - 12:00
University of Warsaw, Faculty of Mathematics, Informatics and Mechanics
Banacha 2 Street – 02-097 Warsaw
An event dedicated to students taking first steps as researchers and speakers, consisting of a series of talks and networking sessions. More information can be found here
Attendance at this event is free of charge!
09:00 - 09:05 CET: Opening remarks
09:05 - 09:25 CET: Extraction of semantic relations using selected methods and data for the Polish language (Grzegorz Kwiatkowski)
09:25 - 09:45 CET: Linear probing of transformer models for Slavic languages (Aleksandra Mysiak)
09:45 - 10:05 CET: HyperSound: Generating Implicit Neural Representations of Audio Signals with Hypernetworks (Filip Szatkowski)
10:05 - 10:25 CET: How weakening the constraints on non-negative model leads to the more practical XAI (Michał Balicki)
10:25 - 10:45 CET: Esperanto constituency parser (Tomasz Michalik)
10:45 - 11:05 CET: SurvSHAP(t): Time-dependent explanations of machine learning survival models (Mateusz Krzyziński)
11:05 - 11:25 CET: Application of Advanced Text Data Analysis in Science (Aleksandra Kowalczuk)
11:25 - 11:45 CET: Universal Image Embedding (Konrad Szafer)
11:45 - 11:50 CET: Closing remarks
# Students’ day - NVIDIA Workshop, Friday, 4 November 2022, 08:00 - 14:00 (Room 4.05)
ul. Rektorska 4 00-614 Warszawa
During the workshop you will learn about the mechanics of deep learning and popular techniques, train your computer vision model and finally use transfer learning to train your own RNN model.
After the workshop you can take part in the exam, passing which you will be provided with the certificate that confirms you understood all of the topics.
Registration form: https://mlinplworkshop.paperform.co/
Note: we may close the form earlier in case we have an overwhelming number of eligible participants.
For more details: https://www.nvidia.com/en-us/training/instructor-led-workshops/fundamentals-of-deep-learning/
# Friday, 4 November 2022
University of Warsaw, Auditorium Maximum
Krakowskie Przedmieście 26/28 Street - 00-927 Warszawa
15:00 - 15:45 CET: Registration
15:45 - 16:00 CET: Opening remarks
The opening remarks of the project leaders Alicja Grochocka and Dima Zhylko.
16:00 - 17:00 CET: Keynote Lecture: Kyunghyun Cho
Lab-in-the-loop de novo antibody design - what are we missing from machine learning?
In this talk, I will give an overview of what we do at Prescient Design by introducing the concept of lab-in-the-loop de novo antibody design with particular emphasis on its computational side. I will then deconstruct the computational side of lab-in-the-loop design into three major components; generative modeling, computational oracles and active learning. I will then go over why each component is necessary, how each component still requires significant scientific research and what Prescient Design is doing to address these challenges.
17:05 - 18:20 CET: Strategic Sponsor’s Lecture: QuantumBlack, AI by McKinsey
Through this series of short talks, QuantumBlack/McKinsey practitioners will share how they work across industries to push AI's limits and create solutions that transform organizations. As part of this session, you'll hear how they used Reinforcement Learning to help Emirates Team New Zealand win the 36th America's Cup. They'll also demo some of their products including Customer1, a tool that helps optimize customer journeys, and Kedro, an open-sourced Python framework for creating maintainable and modular data science code. Finally, you will hear from one of QuantumBlack's brilliant data scientists about his career journey.
18:35 - 19:35 CET: Panel: Popularization of ML Research
During the panel, we will discuss how we should popularize research. What should we think about when presenting AI to the public and what are some guidelines which help to reach a wider audience? We will also address the misleading presentation of scientific results.
Panelists: Piotr Migdał (Quantum Flytrap), Yannick Kilcher (DeepJudge), Letitia Parcalabescu (Heidelberg University)
19:40 - 20:40 CET: Keynote Lecture: Petar Veličković
Amazing things that happen with Human-AI synergy
For the past few years, I have been working on a challenging project: teaching machines to assist humans with proving difficult theorems and conjecturing new approaches to long-standing open problems. Alongside our pure mathematician collaborators from the Universities of Oxford and Sydney, we have demonstrated that analyzing and interpreting the outputs of (graph) neural networks offers a concrete way of empowering human intuition. This allowed us to derive novel top-tier mathematical results in areas as diverse as representation theory and knot theory. The significance of these results has been recognised by the journal Nature, where our work featured on the cover page. Naturally, being on a project of this scale gets one thinking: what other kinds of amazing things can one do when AI and human domain experts synergistically interact? During this talk, I will offer my personal perspective on these findings, the key details of our modelling work, and also positioning them in the "bigger picture" context of synergistic Human-AI efforts.
# Saturday, 5 November 2022
University of Warsaw, Faculty of Mathematics, Informatics and Mechanics
Banacha 2 Street – 02-097 Warsaw
09:30 - 10:00 CET: Registration
10:00 - 11:15 CET: Keynote Lecture: Cheng Zhang (Room 3180)
Causal Decision Making
Causal inference is essential for data-driven decision-making across domains such as business engagement, medical treatment, or policymaking. Building a framework that can answer real-world causal questions at scale is critical. However, research on deep learning, causal discovery, and inference has evolved separately. In this talk, I will present a Deep End-to-end Causal Inference (DECI) framework, a single flow-based method that takes in observational data and can perform both causal discovery and inference, including conditional average treatment effect estimation (CATE). Moreover, I will talk about how such a framework can be used with different real-world data, including time series or considering latent confounders. In the end, I will cover different application scenarios with the Microsoft causal AI suite. We hope that our work bridges the causality and deep learning communities leading to real-world impact.
10:00 - 11:15 CET: Keynote Lecture: Andriy Mnih (Room 4420)
Deep Generative Models Through the Lens of Inference
Latent variable modelling provides a powerful and flexible framework for generative models. We will start by introducing this framework along with the concept of inference, which is central to it. We will then cover three types of deep generative models: flow-based models, variational autoencoders, and diffusion models. We will highlight the interdependence between the model structure and the inference process, and explain the trade-offs each model type involves.
11:30 - 12:20 CET: Sponsor’s Talk: deepsense.ai (Room 3180)
TrelBERT – a model that understands the language of social media
Wojciech Szmyd, Data Scientist (deepsense.ai)
Abstract: With the growing importance and popularity of social media, the application of natural language processing techniques to this domain is becoming an increasingly trendy topic both among NLP researchers and practitioners. In this talk I will explain what is so interesting in the language of social media and why this topic requires special attention. deepsense.ai actively follows the latest international developments in the field of NLP and as part of an R&D project has introduced TrelBERT - a language model for Polish social media which has achieved outstanding results in detecting cyberbullying.
Speaker's bio: Wojtek Szmyd is a data scientist at deepsense.ai with five years of experience in development of machine learning systems. He graduated in ICT from AGH University of Science and Technology, but his innate curiosity meant that he has recently begun studying philosophy at the Jagiellonian University. His professional interests are connected with natural language processing. In his free time… he has no free time, but if he had, he would travel more.
11:30 - 12:20 CET: Contributed Talks Session 1: Learning with Positive and Unlabeled Data (Room 4420)
One class classification approach to variational learning from Positive Unlabeled Data
Jan Mielniczuk (Institute of Computer science, PAS and Warsaw University of Technology)
We present Empirical Risk Minimization approach in conjunction with variational inference method to learn classifiers for biased Positive Unlabeled data. Labeled data is biased meaning that labeling depends on covariates and thus the labeled data is not a random sample from a positive class. This extends method VAE-PU introduced by Na et al. (2020, ACM IKM Conference Proceedings) by proposing another estimator of theoretical risk of a classifier to be minimized, which has important advantages over the previous proposal. This is based on one class classification approach which turns out to be an effective method of detecting positive samples among unlabeled ones. Experiments performed on real data sets show that the proposed VAE-PU+OCC algorithm works very promisingly in comparison to its competitors such as the original VAE-PU and SAR-EM method in terms of accuracy and F1 score.
How to learn classifier chains using positive-unlabelled multi-label data?
Paweł Teisseyre (Institute of Computer Science, Polish Academy of Sciences and Faculty of Mathematics and Information Sciences, Warsaw University of Technology)
Multi-label learning deals with data examples which are associated with multiple class labels simultaneously. The problem has attracted significant attention in recent years and dozens of algorithms have been proposed. However in traditional multi-label setting it is assumed that all relevant labels are assigned to the given instance. This assumption is not met in many real-world situations. In positive unlabelled multi-label setting, only some of relevant labels are assigned. The appearance of a label means that the instance is really associated with this label, while the absence of the label does not imply that this label is not proper for the instance. For example, when predicting multiple diseases in one patient, some diseases can be undiagnosed however it does not mean that the patient does not have these diseases. Among many existing multi-label methods, classifier chains gained the great popularity mainly due to their simplicity and high predictive power. However, it turns out that adaptation of classifier chains to positive unlabelled framework is not straightforward, due to the fact that the true target variables are observed only partially and therefore they cannot be used directly to train the models in the chain. The partial observability concerns not only the current target variable in the chain but also the feature space, which additionally increases the difficulty of the problem. We propose two modifications of classifier chains. In the first method we scale the output probabilities of the consecutive classifiers in the chain. In the second method we minimize weighted empirical risk, with weights depending on prior probabilities of the target variables. The predictive performance of the proposed methods is studied on real multi-label datasets for different positive unlabelled settings.
11:30 - 12:20 CET: Contributed Talks Session 2: Computer Vision (Room 5440)
NeRF – Generating 3d world from a bunch of images
Łukasz Pierścieniewski (NVIDIA)
Neural radiance fields (NeRFs) are an emerging area in computer vision research that focuses on generating novel views of complex 3D scenes based on a partial set of 2D images. Instead of optimizing model parameters to perform decision-based tasks on unseen data, the model learns the representation of the given sample within its parameters. This allows it to freely generate object renders from different angles and positions based on this learned representation. The results are striking, but the depth maps produced as a side effect are even more useful. NeRF provides practitioners with the ability to model virtual worlds through scenes from the physical world, or insert virtual objects in correct places in augmented reality environments. In this talk we are going to dive into this technology, its current state-of-the-art approaches, and open challenges for the future
How to win MAI'2022 Depth Estimation challenge: manual backbone optimization combined with smart model pruning
Łukasz Treszczotko (TCL Research Europe)
Mobile AI challenge is an international competition organized in connection with CVPR'2022 and ECCV'2022 conferences, where teams from best universities and companies are supposed to optimize AI models for practical deployments. Here, both model quality and execution time should be jointly optimized for target low-power hardware platforms. Amongst 5 tasks, we took part in the Depth Estimation challenge evaluated on the Raspberry Pi 4 device and we managed to win the competition. In this talk, we show the techniques that led us to this success. We demonstrate our in-house developed neural model compression framework that contains 1) a pruning algorithm that learns channel importance directly from feature maps; 2) an algorithm that groups the interdependent blocks of channels and 3) device-aware pruning modes that can be adjusted to specific hardware platforms. Using this frameworks we have been able to successfully compress a wide range of backbone models, e.g., EfficientNet family, MobileNetV3 family, ViT-based backbones for classification as well as networks for semantic segmentation, denoising, detection, depth estimation, to name a few. This fully automated pruning technique was combined with manual backbone adjustments and some other tricks described during the talk. As a result, we got the highest quality rating of all competitors and the second-fastest model execution time on Raspberry Pi device (46 ms), beating the second-best candidate by almost 30% in the final score. We will also briefly discuss real-world applications of model pruning and depth estimation models on TCL smartphones.
12:20 - 13:30 CET: Lunch Break
13:30 - 14:20 CET: Sponsor’s Talk: Amazon (Room 3180)
Challenges and evolution of Alexa Text-To-Speech
Adam Furmanek, Senior Software Engineer (Amazon)
Abstract: It was quite a hike to deliver a first world-class voice assistant, but nowadays it is common to control computers with your voice. What about the future? What else is there in the world of Text-To-Speech? In this talk we are going to see what challenges Alexa TTS is facing now. We’ll go through the history of TTS, technical solutions, and then focus on what future brings. We’ll focus on both technical and business problems we’re solving now, and how these things will affect our lives.
Speaker's bio: Adam Furmanek is a Senior Software Engineer at Alexa. He works on the backend side of Alexa Text-To-Speech. He specializes in machine learning and large scale distributed systems. In his free time, he watches The Office and plays piano.
13:30 - 14:20 CET: Contributed Talks Session 3: Probabilistic & Auto Machine Learning (Room 4420)
Interactive sequential analysis of a model improves the performance of human decision-making
Hubert Baniecki (University of Warsaw)
Evaluation of explainable machine learning, especially with human subjects, became mandatory for the trustworthy adoption of predictive models in various applications. This contribution focuses on reporting the results from a user study considering evaluating model explanations in a real-world medical use case; as described in Section 5 of the paper: Baniecki, D. Parzych, P. Biecek. The Grammar of Interactive Explanatory Model Analysis, 2022 (https://arxiv.org/abs/2005.00497v4). The contribution's title is mainly supported by Tables 4 & 5. Moreover, we find both: the user study design and the IEMA grammar's theoretical background worth discussing.
Lightweight Conditional Model Extrapolation for Streaming Data under Class Prior Shift (LIMES)
Paulina Tomaszewska (Warsaw University of Technology)
Many Machine Learning models don’t work well under large class imbalance. The problem gets even more difficult when the class distribution changes over time. It may happen in non-stationary data streams. We address this issue in the proposed method called LIMES which stands for Lightweight Model Extrapolation for Streaming data under Class-Prior Shift. The model works in a continuous manner. It is lightweight as it adds no trainable parameters and almost no memory or computational overhead compared to training a single model. The solution is inspired by the MAML network where one model that can be easily adjusted to different scenarios. In the case of MAML, the adaptation is done using few gradient updates, in the case of LIMES, we apply analytical formula derived from the Bayesian rule. The core of the method is bias correction term that allows to shift from the reference class distribution to the target one. At this point, the extrapolation step is also needed in the case when the exact target class distribution is not a priori known. We evaluated our method on Twitter data where the goal was to find out the country where the tweet was issued. The results that we got show that LIMES outperforms baselines especially in the most difficult scenarios.
13:30 - 14:20 CET: Contributed Talks Session 4: Deep Learning (Room 5440)
Introduction and analyzing of Diffusion-based Deep Generative Models
Kamil Deja (Warsaw University of Technology)
Diffusion-based Deep Generative Models (DDGMs) offer state-of-the-art performance in generative modeling. Their main strength comes from their unique setup in which a model (the backward diffusion process) is trained to reverse the forward diffusion process, which gradually adds noise to the input signal. Although DDGMs are well studied, it is still unclear how the small amount of noise is transformed during the backward diffusion process. In this talk we will introduce the general idea behind Diffusion-based generative models while focusing on denoising and generative capabilities of those models. We observe a fluid transition point that changes the functionality of the backward diffusion process from generating a (corrupted) image from noise to denoising the corrupted image to the final sample. Based on this observation, we postulate to divide a DDGM into two parts: a denoiser and a generator. The denoiser could be parameterized by a denoising auto-encoder, while the generator is a diffusion-based model with its own set of parameters. We experimentally validate our proposition, showing its pros and cons.
Session-aware Graph Neural Network-based Recommendations for Custom Advertisement Segment Generation Focused on Game players (remote)
Tomasz Palczewski (Samsung Research America)
Recommendation engines are commonly used across a broad range of verticals. One can find these solutions in e-commerce, advertisement, marketing, news, on TVs, and within streaming service content. They are extremely important to keep a high-level user engagement when users face an overwhelming number of items that can be potentially presented to them. In general, recommendation systems can be divided into two groups - 1) systems built on general interaction information and 2) systems built on sequential interaction data. As the field has progressed, the research has revolved around expanding both groups to include different types of auxiliary information. The sequence-based recommendation systems can be categorized as session-based solutions that rely on the user's actions in the current sessions. The session-aware recommendation systems also leverage user information in the historical sessions. In this paper, we propose a novel Session-awaRe grAph Neural Network-based Recommendation model (called SRA-NN-Rec for brevity) and test it using publicly available datasets as well as present its usefulness in a real-word case for custom advertisement segment generation focused on game players. The key components of SRA-NN-Rec are: 1) a graph neural network that consumes user and session information, 2) a self-attention network to enhance item embeddings based on session data, and 3) an attention network. The experiment results show that the SRA-NN-Rec outperforms the existing baseline models and is an effective model for session-aware recommendations.
14:35 - 15:50 CET: Panel: Women in ML in PL (Room 3180)
This event will see a discussion on participants’ research topics, their career paths and their views on current ML trends.
Panelists: Farah Shamout (NYU Abu Dhabi) , Inez Okulska (NASK National Research Institute), Aga Dobrowolska (Relation Therapeutics), Daria Szmurlo (QuantumBlack, AI by McKinsey)
14:35 - 15:50 CET: Contributed Talks Session 5: NCBR IDEAS (Room 4420)
Improved Feature Importance Computation for Tree Models Based on the BanzhafValue
Piotr Sankowski (IDEAS NCBR, MIM Solutions and University of Warsaw)
The Shapley value -- a fundamental game-theoretic solution concept -- has recently become one of the main tools used to explain predictions of tree ensemble models. Another well-known game-theoretic solution concept is the Banzhaf value. Although the Banzhaf value is closely related to the Shapley value, its properties w.r.t. feature attribution have not been understood equally well. This paper shows that, for tree ensemble models, the Banzhaf value offers some crucial advantages over the Shapley value while providing similar feature attributions.
In particular, we first give an optimal O(TL+n) time algorithm for computing the Banzhaf value-based attribution of a tree ensemble model's output. Here, T is the number of trees, L is the maximum number of leaves in a tree, and n is the number of features. In comparison, the state-of-the-art Shapley value-based algorithm runs in O(TLD^2+n) time, where D denotes the maximum depth of a tree in the ensemble.
Next, we experimentally compare the Banzhaf and Shapley values for tree ensemble models. Both methods deliver essentially the same average importance scores for the studied datasets using two different tree ensemble models (the sklearn implementation of Decision Trees or xgboost implementation of Gradient Boosting Decision Trees). However, our results indicate that, on top of being computable faster, the Banzhaf is more numerically robust than the Shapley value.
Interpretable Deep Learning with Prototypes
Bartosz Zieliński (IDEAS, Jagiellonian University)
The broad application of deep learning in fields like medical diagnosis and autonomous driving enforces models to explain the rationale behind their decisions. That is why explainers and self-explainable models are developed to justify neural network predictions. Some of them are inspired by mechanisms used by humans to explain their decisions, like matching image parts with memorized prototypical features that an object can possess. Recently, a self-explainable model called Prototypical Part Network (ProtoPNet) was introduced, employing feature matching learning theory. It focuses on crucial image parts and compares them with reference patterns (prototypical parts) assigned to classes.
In this presentation, we will present our papers concerning this topic published on the main track of three recent conferences (SIGKDD 2021, ECML PKDD 2022, and ECCV 2022). They discuss the topic of prototype sharing between classes and their adaptation to the weakly-supervised setting.
Fine-Grained Conditional Computation in Transformers
Sebastian Jaszczur (IDEAS NCBR, University of Warsaw)
Large Transformer models yield impressive results on many tasks, but are expensive to train, or even fine-tune, and so slow at decoding that the use and study of the largest models becomes out of reach for many researchers and end-users. Conditional computation, or sparsity, may help alleviate those problems.
In my work "Sparse is Enough in Scaling Transformers", done at Google Research and published at NeurIPS 2021, we showed that sparse layers leveraging fine-grained conditional computation can enable Transformers to scale efficiently and perform unbatched decoding much faster than standard Transformer. Importantly, in contrast to standard Mixture-of-Expert methods, this fine-grained sparsity achieves the speed-up without decreasing the model quality, and with the same number of model parameters.
My current work on this topic, done at IDEAS NCBR, focuses on adjusting those conditional computation methods to the training environment, with the goal of speeding up the training process as well as the inference. This can be achieved by a careful redesign of fine-grained conditional computation while using only dense tensor operations, which are efficient on modern accelerators. While this is still an ongoing work, the preliminary results show the promise of improving the training speed of Transformers on existing hardware, without degrading the quality of the model's predictions.
14:35 - 15:50 CET: Contributed Talks Session 6: Reinforcement Learning (Room 5440)
Experimental Evaluation of Fictitious Self-play Algorithm in Unity Dodgeball Environment
Jacek Cyranka (University of Warsaw)
We introduce the generalized fictitious co-play algorithm (GFCP) for team video games. Our algorithm constitutes two phases and can be applied on top of any multi-agent RL training procedure in a setting of a simultaneous team game. First, a pool of checkpoints is trained using a vanilla self-play algorithm. Second, further training is carried out via fictitious co-play, i.e., setting one of the agents to inference mode and sampling previously frozen checkpoints. We evaluate our algorithm in the Unity Dodgeball elimination 4vs4 environment. The experiments show the favorable properties of GFCP agents in terms of win rate when matched with weaker checkpoints and previously unseen checkpoints that were not used for the training. On top of that, we evaluate the capability of the GFCP agents to match in a team with human players within a hand-crafted computer game built in the Unity environment.
Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search
Michał Tyrolski (University of Warsaw)
Complex reasoning problems contain states that vary in the computational cost required to determine a good action plan. Taking advantage of this property, we propose Adaptive Subgoal Search (AdaSubS), a search method that adaptively adjusts the planning horizon. To this end, AdaSubS generates diverse sets of subgoals at different distances. A verification mechanism is employed to filter out unreachable subgoals swiftly and thus allowing to focus on feasible further subgoals. In this way, AdaSubS benefits from the efficiency of planning with longer subgoals and the fine control with the shorter ones. We show that AdaSubS significantly surpasses hierarchical planning algorithms on three complex reasoning tasks: Sokoban, the Rubik's Cube, and inequality proving benchmark INT, setting new state-of-the-art on INT.
Reinforcement learning in Datacenters
Alessandro Seganti (Equinix)
Did you know that datacenters are a highly controlled environment? That inside a datacenter we have thousands of PID controllers that are automating every aspect of the functioning? This together with Equinix expertise, allows our datacenters to operate without flaws (99.9999% uptime record).
In our presentation we share our experience in using reinforcement learning in a datacenter. Hopefully by the end of the presentation you will learn what are the challenges and advantages of using reinforcement learning in a real world scenario.
During this presentation we will show how data centers are controlled. We will also introduce you to the kind of data we are using and how we are processing it. Then we will explain how we used reinforcement learning to improve the energy efficiency of a simulated datacenter. Finally we will show how our experiments and models have been used in real use cases inside the company.
16:05 - 17:20 CET: Contributed Talks Session 7: Natural Language Processing (Room 3180)
Automated Harmful Content Detection Using Grammar-Focused Representations of Text Data
Daria Stetsenko (NASK PIB)
Harmful content detection is one of the most important topics in natural language processing. Every couple of years, surveys and thesis are published that summarize the state-of-the-art methods to approach the problem. In 2019 even one of the shared tasks in SemEval Competition focused solely on offensive language detection. Both classical and neural models have been proposed and compared in the literature, with different levels of analysis: the lexical level (including sentiment analysis), the meta-information (context), and, if applicable, the multimodal perspective. However, almost all the mentioned models relied on semantic word or sentence embeddings, putting aside the potential of grammar. And there is a lot of meaningful information to be extracted from the grammar layer.
This research project aims to find models to extract and process the grammar indicators of harmful content in written texts, and investigate possible ways of representing them, most probably combined with the lexical ones. The research includes work on sentence and document embeddings that preserve linguistic information interpretable for both machine-learning algorithms and humans. We strive to set the clear-cut linguistic boundaries of syntactic constructions and semantic features for the categorization of harmful content, defined as a general category including hate speech, violence description, and pornography. The goal would be to determine a linguistic representation of each of them.
Building more helpful and safe dialogue agents via targeted human judgements
Maja Trębacz (DeepMind)
We present Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless compared to prompted language model baselines. We use reinforcement learning from human feedback to train our models with two new additions to help human raters judge agent behavior. First, to make our agent more helpful and harmless, we break down the requirements for good dialogue into natural language rules the agent should follow, and ask raters about each rule separately. We demonstrate that this breakdown enables us to collect more targeted human judgements of agent behavior and allows for more efficient rule-conditional reward models. Second, our agent provides evidence from sources supporting factual claims when collecting preference judgements over model statements. For factual questions, evidence provided by Sparrow supports the sampled response 78% of the time. Sparrow is preferred more often than baselines while being more resilient to adversarial probing by humans, violating our rules only 8% of the time when probed. Finally, we conduct extensive analyses showing that though our model learns to follow our rules it can exhibit distributional biases.
Blogpost: https://dpmd.ai/sparrow, tech report: https://dpmd.ai/sparrow-paper
Uncertainty estimation in BERT-based Named Entity Recognition
Łukasz Rączkowski (Allegro Machine Learning Research)
Named Entity Recognition (NER) is one of the standard tasks in natural language processing and models optimized for it are routinely deployed for users. With the advent of deep learning in general, and Transformer-based architectures in particular, the SOTA for NER moved towards large parameter spaces. However, such models are known to be poorly calibrated black boxes, which do not provide any uncertainty estimates. These estimates are crucial for misclassification detection and out-of-distribution detection in deployed machine learning models. Despite the common application of NER, the issue of uncertainty estimation in that task is not well studied as of yet. This talk will cover a novel exploration of NER uncertainty estimation in the context of e-commerce product descriptions.
We pre-trained a BERT-based language model on our internal corpus of around 35 million product descriptions. We then used that pre-trained backbone to fine-tune NER classifiers in several product categories and utilized the variational dropout technique to measure the predictive entropy for each observation, which served as a general-purpose uncertainty metric.
With variational dropout, we were able to properly calibrate our NER models and the entropy uncertainty measure proved to be a good indicator of both misclassified and out-of-distribution observations. Moreover, thanks to the effective model averaging provided by variational dropout, we improved our internal NER evaluation metrics. The solution discussed in this talk is a relatively simple method of uncertainty estimation and can be effectively utilized in production-grade NER models.
16:05 - 17:20 CET: Remote Keynote Lecture: Marc Deisenroth (Room 4420, Remote)
Data-Efficient Machine Learning in Robotics
Data efficiency, i.e., learning from small datasets, is of practical importance in many real-world applications and decision-making systems. Data efficiency can be achieved in multiple ways, such as probabilistic modeling, where models and predictions are equipped with meaningful uncertainty estimates, Bayesian optimization, transfer learning, or the incorporation of valuable prior knowledge. In this talk, I will focus on how robot learning can benefit from data-efficient learning algorithms. In particular, I will motivate the use of Gaussian process models for learning good policies in three different settings: model-based reinforcement learning, transfer learning, and Bayesian optimization.
16:05 - 17:20 CET: Remote Keynote Lecture: Aleksander Mądry (Room 5440, Remote)
Datamodels: Predicting Predictions with Training Data
Machine learning models tend to rely on an abundance of training data. Yet, understanding the underlying structure of this data - and models' exact dependence on it - remains a challenge.
In this talk, we will present a new framework - called datamodeling - for directly modeling predictions as functions of training data. This datamodeling framework, given a dataset and a learning algorithm, pinpoints - at varying levels of granularity - the relationships between train and test point pairs through the lens of the corresponding model class. Even in its most basic version, datamodels enable many applications, including discovering subpopulations, quantifying model brittleness via counterfactuals, and identifying train-test leakage.
17:20 - 19:00 CET: Poster Session (Room 2180)
During this session, the authors will present their posters. We accepted 34 posters this year. A complete list of accepted posters can be found here.
17:25 - 17:50 CET: Lightning Talk: Samsung (Room 3180)
CompactAlign: A Tool for Large-Scale Near-Duplicate Text Alignment
Jakub Radoszewski (Samsung)
Detecting near-duplicate passages between a collection of source documents and a query document constitutes one of the crucial steps of plagiarism detection. This problem is computation-intensive, especially for long documents. Due to the high computation cost, many solutions primarily rely on heuristic rules, such as the "seeding-extension-filtering" pipeline, and involve many hard-to-tune hyper-parameters. We will describe an efficient tool for detecting near-duplicate passages called TxtAlign that was developed recently by Deng et al. (SIGMOD’21, SIGMOD’22) and a new tool, CompactAlign, that can be seen as a compact version of TxtAlign. Both tools use bottom-k sketching for approximate Jaccard index to assess the similarity measure of passages. CompactAlign outperforms TxtAlign in terms of both space usage and query time by orders of magnitude. The presentation will be partially based on joint work with Roberto Grossi, Costas S. Iliopoulos, and Zara Lim.
20:00 - ? CET: Conference Party
After a long day of talks, panels, and sessions, you can relax and network at the conference party. It will be hosted at River Club (https://goo.gl/maps/wNgp3qZxkCdJigor9) with pizza, finger food, drinks, and music included. At the entrance, every participant will receive a wristband.
# Sunday, 6 November 2022
University of Warsaw, Faculty of Mathematics, Informatics and Mechanics
Banacha 2 Street – 02-097 Warsaw
9:30 - 10:00 CET: Registration
10:00 - 11:15 CET: Keynote Lecture: Matthias Bethge (Room 3180)
Brain-like visual representations, decision making, and learning
Representations, decision making, and learning are closely interlinked components of intelligent systems that can be studied both in brains and machines. Since the advent of large datasets and extensive use of machine learning in computer vision, the range of tasks that machines can solve (often better than humans) has been rapidly growing. However, both the type of "knowledge" and the way how it is acquired, still seems fundamentally different between brains and machines, and the relation between neuroscience and machine learning is a tricky one. In this talk, I will present my own approach to research at the intersection of the two fields and argue that the emerging field of lifelong machine learning will be key to bringing the two fields closer together in the future.
10:00 - 11:15 CET: Keynote Lecture: Piotr Miłoś (Room 4420)
Sequential decision-making: its promises and challenges
Intelligence can be defined as the ability to set and achieve goals. To do this, an agent needs to execute a prolonged sequence of actions. Making an intelligent decision on which action to choose requires anticipating its future consequences and an accurate perception of the current state. Both tasks are cognitively demanding, even more so if the actions are to be swiftly rendered under computational or time constraints. The key to acquiring such skills is to learn them, leveraging past experiences and often deep learning techniques. Sequential decision-making techniques can be used to control complex systems. In my talk, I will illustrate what can be achieved with a few examples. Further, I will outline a few promising future research directions and related challenges.
11:30 - 12:20 CET: Sponsor’s Talk: Allegro (Room 3180)
Retrieval At Scale
We will show how we operate retrieval in different areas of the company by applying a diverse set of techniques that allow our models to run big and fast. In the presentation, we will show how one could tackle the problem of retrieval in the e-commerce domain using state-of-the-art deep learning models.
11:30 - 12:20 CET: Contributed Talks Session 8: Science-related ML (Room 4420)
From data acquisition to ML model for predicting outcomes of chemical reactions
Michał Sadowski (Molecule.one)
Outcomes of many chemical reactions are hard to predict for humans and computers alike. Despite significant progress in computational chemistry, we are still far from replacing the experiment with simulations. One of the promising approaches to predicting reaction outcomes is based on deep neural networks. It is however fundamentally limited by available data, which is especially problem in chemistry in which datasets of reactions are not diverse and consist almost only of positive reactions. In order to demonstrate commercially-viable data-driven learning about chemical reactions, we designed a methodology based on the data acquired in our own high-throughput lab. We tested out and used different active learning techniques for choosing which reactions should be performed. We also examined applicability (is input sufficiently similar to the reactions it saw during training) and uncertainty (how certain the model is when determining prediction) metrics for our deep learning models. In order to ensure usability and reliability of our models, we kept chemist-in-the-loop in all phases of the project. In this talk we will discuss challenges in building real world machine learning system from data acquisition, designing ML models to its final evaluation.
Sequential counterfactual prediction to support individualized decisions on treatment initiation
Paweł Morzywołek (Department of Applied Mathematics, Computer Science and Statistics, Ghent University)
Digitalization of patient records and increasing computational power have led to a paradigm shift in the field of medical decision-making from one-size-fits-all interventions to data-driven intervention strategies optimised for particular sub-populations or individuals. In this research, we make use of statistical techniques on counterfactual prediction and heterogeneous treatment effects (CATE) estimation to support patient-centered treatment decisions on the initiation of renal replacement therapy (RRT).
In particular, we applied sequential counterfactual prediction strategies to help physicians in making patient-centered decisions on treatment initiation. For this, we made use of linked databases from the Intensive Care Unit (ICU) and dialysis center of the Ghent University Hospital, containing longitudinal, highly granular records from all adult patients admitted to the ICU from 2013 to 2017. Based on these data, we inferred whether or not to initiate RRT for individual patients suffering acute kidney injury (AKI) on each day of their ICU stay, based on their measurements until that day.
Because RRT is not a feasible treatment option for many patients, standard counterfactual prediction strategies are fallible. In our analysis we therefore study the use of retargeted causal learning techniques, which reweight individuals to generate a population of patients for whom all treatment options are plausible, so that counterfactual prediction becomes feasible. The proposed approach leverages recent developments on orthogonal statistical learning literature, resulting in a class of Neyman-orthogonal weighted loss functions for counterfactual prediction. Our approach is guaranteed to deliver predictions in the support of the counterfactual outcome mean, and delivers oracle behavior due to orthogonality of the loss function (where orthogonality is relative to the infinite-dimensional propensity score and conditional outcome mean).
Our study involves a performance evaluation of state-of-the-art methods from the literature on heterogeneous treatment effects estimation, e.g. DR-Learner and R-Learner, which can be viewed as special cases resulting from the proposed retargeting framework for counterfactual prediction and CATE estimation.
11:30 - 12:20 CET: Contributed Talks Session 9: Probabilistic Neural Networks and Consolidated Learning (Room 5440)
Towards self-certified learning: Probabilistic neural networks trained by PAC-Bayes with Backprop
Omar Rivasplata (University College London)
This talk is based on the content of my paper "Tighter Risk Certificates for Neural Networks" (JMLR, 2021) which reports the results of empirical studies regarding training probabilistic (aka stochastic) neural networks using training objectives derived from PAC-Bayes bounds. In this context, the output of training is a probability distribution over network weights, rather than a fixed setting of the weights. We present two training objectives, used for the first time in connection with training neural networks, which were derived from tight PAC-Bayes bounds. We also re-implement a previously used training objective based on a classical PAC-Bayes bound, to compare the properties of the predictors learned using the different training objectives. We compute risk certificates for the learnt predictors, which are valid on unseen data from the same distribution that generated the training data. The risk certificates are computed using part of the data used to learn the predictors. We further experiment with different types of priors on the weights (both data-free and data-dependent priors) and neural network architectures. Our experiments on MNIST and CIFAR-10 show that our training methods produce stochastic neural network classifiers with competitive test set errors and non-vacuous risk bounds with much tighter values than previous results in the literature. Thus, our results show promise not only to guide the learning algorithm through bounding the risk but also for model selection. These observations suggest that the methods studied in our work might be good candidates for self-certified learning, in the sense of using the whole data set for learning a predictor and certifying its performance with a risk certificate that is valid on unseen data (from the same distribution as the training data) and reasonably tight, so that the certificate value is informative of the value of the true error on unseen data.
Consolidated learning – new approach to domain-specific strategy of hyperparameter optimization
Katarzyna Woźnica (Mi2, Faculty of Mathematics and Information Science, Warsaw University of Technology)
For many machine learning models, a choice of hyperparameters is a crucial step towards achieving high performance. Prevalent meta-learning approaches focus on obtaining good hyperparameters configurations with a limited computational budget for a completely new task based on the results obtained from the prior tasks. In this presentation, I will present a new formulation of the tuning problem, called consolidated learning, more suited to these practical challenges faced by ML developers creating models on similar datasets. In domain-specific ML applications, ones do not solve a single prediction problem, but a whole collection of them, and their data sets are composed of similar variables. In such settings, we are interested in the total optimization time rather than tuning for a single task. Consolidated learning assumes leveraging these relations and supporting meta-learning approaches. Providing the benchmark metaMIMIC, we show that consolidated learning enables an effective hyperparameter transfer even in a model-free optimization strategy. In the talk, we will show that the potential of consolidated learning is considerably greater due to its compatibility with many machine learning application scenarios. We investigate the extension of the application of consolidated learning through integrating diverse data sets using the ontology-based similarity of data sets.
12:20 - 13:30 CET: Lunch Break
13:30 - 14:45 CET: Panel: Industry and Academia (Room 3180)
During the panel, we will investigate the wide range of implications of artificial intelligence (AI), and delve deep into both positive and negative impacts on communities and science areas, and industry.
Panelists: Przemysław Biecek (Warsaw University of Technology and University of Warsaw), Robert Bogucki(deepsense.ai), Pablo Ribalta(Nvidia), Piotr Sankowski (IDEAS NCBR), Stanisław Jastrzębski (molecule.one)
13:30 - 14:45 CET: Keynote Lecture: Farah Shamout (Room 4420)
Towards Improved Health via Computational Precision Medicine: Learning from Diverse and Heterogeneous Data
In the era of big data and artificial intelligence, computational precision medicine can create opportunities that directly contribute to saving patient lives, improving efficiencies of care providers, and decreasing the high burden of disease and socioeconomic costs. In this talk, I will provide a high-level overview of the research conducted at the Clinical Artificial Intelligence Lab at NYU Abu Dhabi. I will then focus on one recent flagship project related to breast cancer detection in ultrasound imaging using deep learning and 5 million ultrasound images. Finally, I will describe the themes of ongoing work, in order to advance patient diagnosis and prognosis and promote multidisciplinary contributions in computing and health sciences in the region.
14:45 - 15:15 CET: Closing remarks (Room 3180)
The closing remarks of the project leaders Alicja Grochocka and Dima Zhylko.