Group Activities

Szymon Jaroszewicz and his collaborators conduct research on generalizing well-established machine learning methods to uplift modelling, which concerns modelling the effect of a treatment (e.g., a marketing campaign or a medical therapy) on individuals by taking into account a control group not subjected to the treatment. Specially tailored methods for such cases include adaptations of Support Vector Machines and of the Committees of Classifiers approach. The theory of linear models for the uplift case is also being developed.
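
The basic idea of comparing a treated group with a control group can be illustrated with a minimal sketch. It is not one of the group's methods: it only estimates uplift per segment as the difference between the success rates of the two groups. The function name `uplift_by_segment`, the segment labels and the toy records are hypothetical and serve illustration only.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """Estimate uplift per segment from (segment, treated, outcome) triples.

    outcome is 0 or 1; uplift is the success rate among treated cases
    minus the success rate among control cases in the same segment.
    """
    # counts[segment][treated] = [number of successes, number of cases]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, outcome in records:
        cell = counts[segment][bool(treated)]
        cell[0] += outcome
        cell[1] += 1

    uplift = {}
    for segment, by_group in counts.items():
        (s_t, n_t), (s_c, n_c) = by_group[True], by_group[False]
        if n_t == 0 or n_c == 0:
            continue  # no estimate without both a treated and a control group
        uplift[segment] = s_t / n_t - s_c / n_c
    return uplift


# Hypothetical toy data: (segment, treated, outcome)
records = (
    [("young", True, 1)] * 3 + [("young", True, 0)] * 1
    + [("young", False, 1)] * 1 + [("young", False, 0)] * 3
    + [("old", True, 1)] * 2 + [("old", True, 0)] * 2
    + [("old", False, 1)] * 2 + [("old", False, 0)] * 2
)
result = uplift_by_segment(records)
```

In this toy data the treatment raises the success rate in the "young" segment and leaves it unchanged in the "old" segment, which is the kind of distinction uplift models aim to learn from features rather than from predefined segments.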

Stan Matwin’s research is in the areas of text mining, data privacy, and applications of machine learning. In text mining, he works on text classification, sentiment analysis, and deep learning methods for text embedding representations. In data privacy, he focuses on data summary and synthetic data approaches to protecting sensitive data in data-sharing and data-publishing environments. In applications, his interests lie in extracting knowledge from traces of moving objects, e.g., GPS ship trajectories collected through the global AIS system. Finally, he has recently started a collaborative project on applying advanced machine learning methods to multi-modal, longitudinal brain imaging data.

The domains researched by Łukasz Dębowski include information-theoretic and probabilistic modelling of natural language. Of special interest here are discrete stochastic processes with specific types of dependence, which are quantified, e.g., by the rate of growth of the block entropy and the length of the maximal repetition. Such processes exhibit certain statistical properties that are close to those found in natural language production, e.g., they satisfy Hilberg's hypothesis about a power-law growth of mutual information.
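
As a rough illustration of the quantities mentioned above, the following sketch computes a plug-in estimate of the block entropy of a symbol sequence and, from it, the mutual information between two adjacent blocks of length n, taken here as 2H(n) - H(2n). The function names are hypothetical, and the plug-in estimator is an assumption chosen for simplicity; it is strongly biased when the block length is large relative to the sequence length.

```python
import math
from collections import Counter


def block_entropy(sequence, n):
    """Plug-in estimate (in bits) of the entropy of length-n blocks."""
    # Collect all overlapping blocks of length n.
    blocks = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    if not blocks:
        raise ValueError("sequence is shorter than the block length")
    total = len(blocks)
    counts = Counter(blocks)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def adjacent_block_information(sequence, n):
    """Estimated mutual information between two adjacent length-n blocks."""
    return 2 * block_entropy(sequence, n) - block_entropy(sequence, 2 * n)


# Hypothetical usage: a strictly periodic sequence has bounded block entropy.
h1 = block_entropy("abababab", 1)
h_const = block_entropy("aaaa", 2)
```

For a process satisfying Hilberg's hypothesis, the mutual information between adjacent blocks would be expected to grow like a power of n, in contrast to the periodic toy sequence used here.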

The research direction pursued by Paweł Teisseyre is classification for multivariate response variables. An intensely studied special case is so-called multi-label classification, in which the response is a multivariate variable with binary coordinates. Of particular interest is the construction of effective methods for high-dimensional data, where high dimensionality refers both to a large number of potential predictors and to the dimensionality of the response. The aim of the research is to develop algorithms for variable selection and prediction in this set-up and to analyse their performance theoretically. These algorithms rely, among other things, on regularization methods adapted to such models.

Variable selection is also studied for high-dimensional generalized linear and additive models by Jan Mielniczuk and Mariusz Kubkowski. Of interest here are two-step and multi-step procedures in which selection is based on information criteria after preliminary screening and/or ranking of the variables according to the values of their importance measures. The main results concern selection consistency when the assumed model for the data at hand is correctly specified. The analogous problem is also studied for the misspecified case, with the concept of selection consistency suitably modified. Moreover, variable selection approaches based on information-theoretic measures such as mutual information have been studied, in particular methods that take higher-order interactions into account.

Research on decision making (determining production, stock and distribution schedules that minimize costs) is being pursued by group member Stanisław Bylka. Optimal solution techniques, algorithms and heuristics are developed for problems of planning and forecast horizons in dynamic models with uncertain information about the future.

The Group maintains strong links with the Faculty of Mathematics and Information Sciences of the Warsaw University of Technology, where several of its members teach courses and pursue joint research. Subjects of Ph.D. theses are offered within the group's research interests.