Eleni attended the very first ACM Europe Summer School on Data Science that took place in Athens in July. The school covered a broad range of subjects in Big Data and Data Science in the form of six one-day-long courses. Each course consisted of formal lectures in the morning, and practical hands-on exercises in the afternoon. Below is a short overview of the different topics that were covered.
Spatio-temporal analytics, urban analytics: This course focused on how to use data to make cities more efficient and sustainable, and improve the lives of their residents. It identified several challenges related to the analysis of urban datasets, such as their heterogeneity, their spatio-temporal nature and data quality issues. An interesting idea that was presented in this course was using data to explain data by leveraging the abundance of datasets and their relations.
Visual analytics: This course focused on how humans and computers cooperate through graphics to process data. It presented guidelines for considering data systematically from multiple perspectives and demonstrated several workflows of analysis of real data sets on human mobility, city traffic, aviation, animal movement, and football.
Data science ethics and privacy-preserving analytics: Data-driven machine learning algorithms and automated decision-making systems are often “black boxes”, mapping individuals into a class or a ranking value without exposing the reasons. This course discussed how possible social biases and prejudices hidden in the training data can be learned by the algorithms, leading to discriminatory decisions or unfair actions. It also highlighted the importance of legal compliance and data ethics technologies and safeguards to protect privacy and anonymity, so that we can seize the opportunities of data science while controlling the associated risks.
Social network analytics: This course was an introduction to the emerging field of study that analyzes complex networks and addresses fundamental questions about how the social, economic, and technological worlds are connected. It mainly focused on the structure and function of the social network, describing how real networks differ from random networks, and introduced the main models of network science.
Streaming analytics: This course provided an overview of key algorithmic tools for supporting effective, real-time analytics over streaming data, focusing mainly on small-space sketch synopses. The course addressed both centralized and distributed settings. For the latter, it introduced a geometric approach that decomposes the monitoring task into local constraints on the individual streams.
Text analytics: This course provided an overview of important applications of mining text conversations and examined three topics in this area: topic modeling; natural language summarization; and extraction of rhetorical structure and relationships in text.
Health analytics: This course described some of the challenges involved in biomarker discovery, such as the lack of quality assessment tools for data generated by ever-evolving genomics platforms, and presented some data cleaning and pre-processing techniques.
User analytics for recommender systems: This course presented methods for deriving knowledge from user actions and generating successful recommendations for the users. It presented some real-life recommender systems, along with measures of “success” in recommender systems.
Keynotes: In addition to the courses, the school offered two keynote lectures. Joseph Sifakis, laureate of the 2007 Turing Award, gave a particularly interesting keynote about the nature and the importance of computing. Sifakis defined knowledge as “truthful information that embedded into the right network of conceptual interrelations can be used to understand a subject or solve a problem” and characterized computing as a domain of knowledge.
He then talked about two methodological principles that are shared across all domains of knowledge. First, a domain of knowledge uses abstraction hierarchies to cope with scale problems. Just as the physical world is abstracted in different layers, ranging from the whole universe down to individual particles, the computing world hierarchy ranges from the cyber-world and networked systems down to logic gates and transistors. Second, a domain of knowledge uses modularity to deal with complexity. However, there are some inherent limitations: when the components at one layer are combined, new properties emerge. As a result, it is difficult to understand holistically what a computing system does from the properties of its hardware.
Sifakis discussed two topics in particular: 1) how computing can be linked to physics; and 2) how artificial intelligence relates to natural intelligence. Computing and the physical sciences share a common objective: the study of dynamic systems. Physical systems are driven by uniform laws, while computing systems are driven by laws enforced by their designers. A big difference is that physical systems are inherently synchronous. Another important difference lies in the discrete nature of computing, which limits its ability to fully capture physical phenomena.
Comparing artificial intelligence with conscious thinking, we can identify some common characteristics, such as the use of memory and languages. However, computers have the advantage of computing far faster and with far higher precision, and as a result they can beat humans in very specialized tasks. Humans, on the other hand, have a semantic model of the world and also possess consciousness: they see themselves acting within this semantic model and make choices while evaluating the consequences of their actions. If we want computers to exhibit the same kind of behavior as humans, we should equip them with semantic models of the world, which is a very hard problem. We should also maintain our critical thinking when evaluating the potential threats of artificial intelligence to humanity, since “worrying about machines that are too smart, distracts us from the real and present threat from machines that are too dumb”. For anyone interested in learning more about Sifakis’ ideas, there is a YouTube video of another version of this talk, given at Universitat Politècnica de Catalunya (https://www.youtube.com/watch?v=soTCAqvQopM).
Professor Dame Wendy Hall gave the second keynote, introducing Web Science: the study of the Web as a socio-technical system. In the second part of her talk, she introduced web observatories, which allow researchers around the world to collaborate by sharing both data and tools in standard formats. There is a YouTube video of another version of this talk, given at Rutgers School of Communication and Information (https://www.youtube.com/watch?v=PrDckcXCP8E).
Finally, P. Fatourou, a member of the ACM Europe Council, and G. Eleftherakis, the chair of ACM’s Council of European Chapter Leaders, introduced ACM and ACM chapters to the summer school participants. P. Fatourou also talked about the different ways in which ACM supports women in computing. Her talk triggered several discussions among the participants about whether such special support for women actually helps promote gender equality in computer science.
Concluding remarks: Overall, the summer school was a great educational experience. Moreover, it was a great platform for meeting fellow researchers and exchanging ideas, as it offered many opportunities for social interaction not only with the other participants, but also with the instructors and the organizers.
The 1st ACM Europe Summer School on Data Science was successfully held in Athens
Text by Eleni Tzirita-Zacharatou