Ray G. Butler

Automated Data Science at Dagstuhl

For one week, 38 professionals from 12 different countries gathered at Schloss Dagstuhl, one of the most important computer science research centers in Europe, to share experiences and learn about the best techniques for automating the advanced data analysis process.

The topics ranged from data wrangling to the effective explainability of results, combining the maximum degree of automation with the minimum necessary intervention from the expert user.

What is Schloss Dagstuhl?

The history of Schloss Dagstuhl (Dagstuhl Castle) dates back to 1760, when the then reigning prince, Count Anton von Öttingen-Soetern-Hohenbaldern, had a family residence built in the form of an exquisite manor house at the foot of the old Dagstuhl castle ruins.

Today, Schloss Dagstuhl – Leibniz Center for Informatics is one of the most renowned meeting centers supporting research in computer science, known especially for its innovative seminar concept.

Great challenges ...

It took me only 10 minutes to realise that the week would be immensely beneficial: that was when the main challenges were presented. Each of them represents a great opportunity to improve AutoDiscovery:

Automated data cleaning / wrangling: to what extent is it possible?
Automated data type discovery: how would this improve the automated selection of the proper algorithms?
Optimal software configuration: is it possible to reduce the configuration to zero?
Results explainability: is it really useful for improving the user's engagement? Would this transparency create additional problems?
Optimal user interaction / participation: where, when and why is the expert user's participation absolutely necessary?

Our contribution

One of the most interesting aspects of these seminars is the active contribution made by startups and specialized companies (such as KNIME or SICOS), not just to promote their commercial solutions but especially to present practical cases of effective technology transfer that generates value for the market.

In that sense, I gave a technical demonstration of AutoDiscovery applied to uveal melanoma to show how our software actually integrates many of these innovative research outcomes for the benefit of a specific application, such as biomedical research.

Believe me, the feedback throughout those days was amazing!

To top it off, it was a true pleasure for me to leave our handwritten testimony in the awesome Dagstuhl Book of Abstracts...

What did we get from it ...

If anything characterises these Dagstuhl seminars, it is an efficient combination of informality, dynamism and creativity. Just to give an example: the agenda of the seminar was built collaboratively by all the participants some weeks before the event through a shared online wiki. Moreover, we had plenty of room to rearrange the content depending on what we all considered best at any given moment!

All this helped create a climate of mutual trust, which ultimately led to a high level of productivity.

To put it in quite practical terms, we were able to lay the groundwork for future cooperation with some of the research groups to start developing a range of new features in AutoDiscovery, always aligned with its main goal: automating the scientific discovery process, making it even more efficient.

Among these opportunities, I'd like to highlight the following...

Automated data wrangling: Monte Carlo Tree Search for Algorithm Configuration (MOSAIC)

This is one of the main research lines of Dr. Michèle Sebag (Paris-Sud Univ.), and it could help AutoDiscovery clean the input data, a task which, as we already know, is an obstacle in any data analysis process.

For the more daring among you, this is the link to the open-source code repository on GitHub.

Automated selection of statistical methods: Data Type Discovery

As you already know, AutoDiscovery is able to select a proper statistical algorithm in each case, but what if it could improve that process thanks to better detection of the type of data stored in your files?

The research work of Rich Caruana / Chris Williams (The Alan Turing Institute) and Isabel Valera (Max Planck Institute for Intelligent Systems) could be extremely useful in this regard!
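To make the idea a bit more tangible, here is a toy Python sketch of how a detected data type could drive the choice of a family of statistical methods. It is only an illustration under my own assumptions: the function names `infer_type` and `suggest_method` are hypothetical, the rules are deliberately naive, and this is neither how AutoDiscovery works internally nor the approach of the researchers mentioned above.

```python
from typing import List


def infer_type(values: List[str]) -> str:
    """Guess a coarse data type for one column of raw string values.

    Hypothetical, naive rules (an assumption for illustration only):
    - no usable values      -> "unknown"
    - exactly two distinct  -> "binary"
    - every value is a float -> "numeric"
    - anything else         -> "categorical"
    """
    cleaned = [v.strip() for v in values if v.strip() != ""]
    if not cleaned:
        return "unknown"
    if len(set(cleaned)) == 2:
        return "binary"
    try:
        for v in cleaned:
            float(v)
    except ValueError:
        return "categorical"
    return "numeric"


def suggest_method(type_a: str, type_b: str) -> str:
    """Map a pair of column types to a broad family of statistical methods."""
    kinds = {type_a, type_b}
    if "unknown" in kinds:
        return "none"
    if kinds == {"numeric"}:
        return "correlation"
    if "numeric" in kinds:
        # One numeric column and one binary/categorical column.
        return "group comparison"
    # Both columns are binary or categorical.
    return "contingency table"


if __name__ == "__main__":
    # Made-up example columns, not real study data.
    thickness = ["1.5", "2.0", "3.1", "4.7"]
    relapse = ["yes", "no", "no", "yes"]
    print(suggest_method(infer_type(thickness), infer_type(relapse)))
```

Even in such a small example, a wrong guess in the first function (say, reading a numeric code as a measurement) changes the method chosen by the second one, which is why better data type detection matters so much.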

Actionability of the results: Explainability

Certainly, this was the buzzword throughout all the sessions: to what extent is it necessary to explain the results generated by data analysis software? And in what terms should that explanation be given to make it really effective?

We're currently working on all of this, so ... stay tuned! :)

Bis bald, Dagstuhl!

I can assure you that this experience has made me an activist for and a fan of Dagstuhl seminars. For those of us who work in this field, their agenda is simply spectacular!

Finally, and I will end here, I'd like to thank, personally and publicly, the Dagstuhl Scientific Committee and the organizers of this seminar for their efficiency and especially for their support in making it easier for me and Butler Scientifics to attend this event.

Bis bald, Dagstuhl!