BALab yearly reports

Members

Faculty Members

Damianos Chatziantoniou

Maria Kechagia

Dimitris Mitropoulos

Panos (Panagiotis) Louridas

Diomidis Spinellis

Senior Researchers

Vasiliki Efstathiou

Stefanos Georgiou

Thodoris Sotiropoulos

Marios Fragkoulis

Associate Researchers

Stefanos Chaliasos

Antonios Gkortzis

Tushar Sharma

Konstantina Dritsa

Researchers

Alexandros Lattas

Aris Pattakos

Ioannis Batas

Efstathia Chioteli

Nikolas Doureliadis

Vitalis Salis

Christos Oikonomou

Overview in numbers

New Publications	Number
Monographs and Edited Volumes	1
PhD Theses	2
Journal Articles	10
Book Chapters	0
Conference Publications	6
Technical Reports	2
White Papers	0
Magazine Articles	1
Working Papers	0
Datasets	0
Total New Publications	22
Projects
New Projects	1
Ongoing Projects	1
Completed Projects	1
Members
Faculty Members	5
Senior Researchers	4
Associate Researchers	4
Researchers	7
Total Members	20
New Members	6
PhDs
Ongoing PhDs	5
Completed PhDs	2
New Seminars
New Seminars	12

New Publications

Monographs and Edited Volumes

Panos Louridas. Real-World Algorithms: A Beginner's Guide. MIT Press, Cambridge, MA, 2017. ISBN 978-0-262-03570-5.

PhD Theses

Maria Kechagia. Tools and Techniques for Building Reliable Application Programming Interfaces. PhD thesis, Athens University of Economics and Business, Athens, Greece, 2017.

Marios Fragkoulis. Technologies for Main Memory Data Analysis. PhD thesis, Athens University of Economics and Business, Athens, Greece, 2017.

Journal Articles

Diomidis Spinellis and Marios Fragkoulis. Extending Unix pipelines to DAGs. IEEE Transactions on Computers, 66(9):1547–1561, September 2017.

Diomidis Spinellis. The social responsibility of software development. IEEE Software, 34(2):4–6, March 2017.

Diomidis Spinellis. The elusiveness of smart healthcare. IEEE Software, 34(6):4–6, November 2017.

Diomidis Spinellis. State-of-the-art software testing. IEEE Software, 34(5):4–6, September 2017.

Diomidis Spinellis. Software-engineering the internet of things. IEEE Software, 34(1):4–6, January 2017.

Diomidis Spinellis. Software reliability redux. IEEE Software, 34(4):4–7, July 2017. Republished in Computing Edge, 3(12):20–23, December 2017.

Diomidis Spinellis. How abundance changes software engineering. IEEE Software, 34(3):4–7, May 2017. Republished in Computing Edge, 3(8):46–49, August 2017.

Diomidis Spinellis. A repository of Unix history and evolution. Empirical Software Engineering, 2017.

Dimitris Mitropoulos and Diomidis Spinellis. Fatal injection: a survey of modern code injection attack countermeasures. PeerJ Computer Science, article e136, November 2017.

Les Hatton, Diomidis Spinellis, and Michiel van Genuchten. The long-term growth rate of evolving software: empirical results and implications. Journal of Software: Evolution and Process, 2017.

Conference Publications

Tushar Sharma, Marios Fragkoulis, and Diomidis Spinellis. House of cards: code smells in open-source C# repositories. In ESEM 2017, November 2017.

Maria Kechagia and Diomidis Spinellis. Type checking for reliable APIs. In Proceedings of the 1st International Workshop on API Usage and Evolution, WAPI '17, 15–18. Piscataway, NJ, USA, May 2017. IEEE Press.

Maria Kechagia, Tushar Sharma, and Diomidis Spinellis. Towards a context dependent Java exceptions hierarchy. In ICSE '17: Poster Track Session, 347–349. IEEE Press, 2017.

Georgios Gousios and Diomidis Spinellis. Mining software engineering data from GitHub. In Proceedings of the 39th International Conference on Software Engineering Companion, ICSE-C '17, 501–502. Piscataway, NJ, USA, May 2017. IEEE Press. Technical Briefing.

Stefanos Georgiou, Maria Kechagia, and Diomidis Spinellis. Analyzing programming languages' energy consumption: an empirical study. In PCI 2017: Proceedings of the 21st Pan-Hellenic Conference on Informatics, ACM International Conference Proceeding Series. ACM Press, September 2017.

Alessandra Bagnato, Konstantinos Barmpis, Nik Bessis, Juri Di Rocco, Davide Di Ruscio, Gergely Tamás, Scott Hansen, Dimitrios S. Kolovos, Philippe Krief, Ioannis Korkontzelos, Stéphane Laurière, Jose Manrique Lopez de la Fuente, Pedro Maló, Richard F. Paige, Diomidis Spinellis, Cedric Thomas, and Jurgen Vinju. Developer-centric knowledge mining from large open-source software repositories (CROSSMINER). In STAF 2017: Software Technologies: Applications and Foundations. July 2017. Projects Showcase track. Lecture Notes in Computer Science 10748.

Technical Reports

Diomidis Spinellis. Research priorities in the area of software technologies. Available online at https://ec.europa.eu/digital-single-market/en/news/future-trends-and-research-priorities-area-software-technologies, March 2017. A report prepared for EU DG Communications Networks, Content and Technology.

Theofilos Petsios, Adrian Tang, Dimitris Mitropoulos, Salvatore J. Stolfo, Angelos D. Keromytis, and Suman Jana. Tug-of-war: observations on unified content handling. Technical Report, CoRR abs/1708.09334, 2017.

Magazine Articles

Dimitris Mitropoulos. How 1 million app calls can tell you a bit about malware. XRDS: Crossroads, The ACM Magazine for Students, 24(1):17–19, 2017.

Projects

New Projects

CROSSMINER - Developer-Centric Knowledge Mining from Large Open-Source Software Repositories

Ongoing Projects

SENECA - Software ENgineering in Enterprise Cloud Applications

Completed Projects

Action II - The "Meta-Life" of JavaScript

New Members

Stefanos Chaliasos

Aris Pattakos

Ioannis Batas

Efstathia Chioteli

Theodore Stassinopoulos (PhD student)

Vasiliki Efstathiou

Nikolas Doureliadis

Ongoing PhDs

Vaggelis Atlidakis Topic: Structure and Feedback in Cloud Service API Fuzzing

Theodore Stassinopoulos Topic: Data and Quality Metrics of System Configuration Code

Antonios Gkortzis Topic: Secure Systems on Cloud Computing Infrastructures

Stefanos Georgiou Topic: Energy and Run-Time Performance Practices in Software Engineering

Tushar Sharma Topic: Software Engineering in Enterprise Cloud Applications

Completed PhDs

Marios Fragkoulis Topic: Technologies for Main Memory Data Analysis

Maria Kechagia Topic: Tools and Techniques for Reliable Application Programming Interfaces

Seminars

Does your configuration code smell?

Date: 25 January 2017
Presenter: Tushar Sharma
Abstract

The wide adoption of configuration management and the increasing size and complexity of the associated code call for assessing, maintaining, and improving the quality of configuration code. We can leverage traditional software engineering knowledge and best practices to develop and maintain high-quality configuration code. This talk brings the smell metaphor to the configuration domain: it introduces configuration smells, their types with various examples, tools to detect them, and suggestions for refactoring them.

From pipelines to graphs: Escape the tyranny of the shell’s linear pipelines with dgsh

Date: 25 January 2017
Presenter: Diomidis Spinellis
Abstract

The Unix dgsh shell provides an expressive way to construct sophisticated and efficient data processing pipelines using standard Unix tools, as well as third-party and custom-built components. Dgsh allows the specification of pipelines of non-uniform, non-linear operations. For example, tee can feed three processes whose outputs are then collected by paste. The pipelines form a directed acyclic process graph, which is typically executed on multiple processor cores, thus increasing the task's processing throughput. We will see how to use dgsh in practice through a number of general data processing and domain-specific examples, and how to adapt tools for use with dgsh.

Software Engineering Research at The University of Alberta

Date: 08 February 2017
Presenter: Eleni Stroulia
Abstract

My team investigates two types of problems, aiming, first, to support software developers in their activities and, second, to design and develop software platforms that address specific service-delivery challenges in domains such as healthcare and education. In this seminar, I will report on three ongoing projects that best exemplify these two objectives. In the context of the SAVI project, we have been designing methods for supporting the migration of traditional relational web-based applications to the cloud, in order to enable large-scale analytics. In the context of the LRA project, we are examining the use of linked-data formalisms to enable large-scale REST service federation. Finally, in the Smart-Condo project, we are developing a hardware-software platform for unobtrusively recognizing the activities of people at home, in order to support the evaluation of their physical and cognitive function. We believe that the broad scope of our research agenda enriches the formulation of the technical problems we address and enhances the validity of our results.

Technologies for main memory data analysis

Date: 02 March 2017
Presenter: Marios Fragkoulis
Abstract

The absence of suitable analytical tools hinders knowledge extraction from software applications that do not need the support of a database system. Examples are applications whose data have a complex structure and are often stored in files, e.g., scientific applications in areas such as biology, and applications that do not maintain persistent data, such as data visualization applications and diagnostic tools. Databases offer widely used and recognized query interfaces, but applications that do not need the services of a database should not resort to one only to satisfy the need to analyze their data. The thesis studies methods and technologies for supporting queries on main memory data, and how the prevailing architecture of software systems currently affects these technologies. Based on the findings from the literature, we develop a method and a technology for performing interactive queries on data that reside in main memory. After an overview of the programming languages suited to data analysis, we choose SQL, the standard data manipulation language for decades. Our method represents programming data structures in relational terms, as SQL requires, and replaces the associations between structures with relationships between their relational representations. The result is a virtual relational schema of the programming data model, which we call a relational representation. The implementation, which we carried out in C/C++, includes a domain-specific language for describing relational representations, a compiler that generates the source code of the relational interface to the programming data structures from a relational specification, and an implementation of SQLite’s virtual table API. The overall evaluation of our approach involves integrating it into three C++ software applications, the Linux kernel, and Valgrind, where we also perform a user study with students.
We find a) that our approach is more expressive than C++ queries, b) real problems in the Linux kernel, c) opportunities for space and performance optimizations in applications instrumented by Valgrind, and d) that users took less time to draft queries with SQL than with Python.
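
The idea of replacing associations between data structures with relationships between relational representations can be illustrated with a small Java sketch. Everything in it is hypothetical: the `Task` and `OpenFile` records stand in for application data structures, the rows are materialized in Java lists, and the query is a hand-written nested-loop join rather than SQL. The thesis instead generates the relational interface from a DSL specification and answers SQL queries through SQLite's virtual table API. The sketch shows only how an object reference (a task holding its open files) becomes a foreign key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical application data structures: a task that references its open files. */
record OpenFile(String path, long bytesRead) {}
record Task(int pid, String name, List<OpenFile> openFiles) {}

/** Relational rows: the containment Task -> OpenFile becomes the foreign key taskPid. */
record TaskRow(int pid, String name) {}
record OpenFileRow(int taskPid, String path, long bytesRead) {}

final class RelationalRepresentation {
    private RelationalRepresentation() {}

    /** Builds the task relation: one row per task, without the nested collection. */
    static List<TaskRow> taskTable(List<Task> tasks) {
        List<TaskRow> rows = new ArrayList<>();
        for (Task t : tasks) {
            rows.add(new TaskRow(t.pid(), t.name()));
        }
        return rows;
    }

    /** Builds the open-file relation, replacing containment with a key to the owning task. */
    static List<OpenFileRow> openFileTable(List<Task> tasks) {
        List<OpenFileRow> rows = new ArrayList<>();
        for (Task t : tasks) {
            for (OpenFile f : t.openFiles()) {
                rows.add(new OpenFileRow(t.pid(), f.path(), f.bytesRead()));
            }
        }
        return rows;
    }

    /**
     * Same result as: SELECT t.name, SUM(f.bytesRead) FROM TaskRow t
     * JOIN OpenFileRow f ON f.taskPid = t.pid GROUP BY t.name
     * (tasks without files are absent, as with an inner join).
     */
    static Map<String, Long> bytesReadPerTask(List<TaskRow> taskRows, List<OpenFileRow> fileRows) {
        Map<String, Long> totals = new TreeMap<>();
        for (TaskRow t : taskRows) {
            for (OpenFileRow f : fileRows) {
                if (f.taskPid() == t.pid()) {
                    totals.merge(t.name(), f.bytesRead(), Long::sum);
                }
            }
        }
        return totals;
    }
}
```

Because the join condition compares plain keys, the same query works however the original objects were linked in memory; that decoupling is what makes a relational query language applicable to program data.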

Metrics of successful websites and companies

Date: 17 March 2017
Presenter: Danai Avratoglou
Abstract

In the global online environment, understanding how enterprises adopt websites is becoming increasingly important. This study investigates the correlation between the implementation of specific website metrics on a company's home page and the revenues of the most successful U.S. companies. The metrics examined relate to the website's quality and usability, as well as to user satisfaction. The companies examined are taken from the 2016 Fortune 500 list. The results indicate that, regardless of the industry a firm belongs to, specific metrics are associated with an enterprise's success (in terms of its revenue), depending on whether they are implemented or avoided.

Type Checking for Reliable APIs

Date: 12 April 2017
Presenter: Maria Kechagia
Abstract

We propose configuring, at compile time, the checking associated with API methods that can receive malformed values (e.g., erroneous user input or problematic records retrieved from databases) and thus cause application failures. To achieve this, we design a type system, implemented as a pluggable checker for the Java compiler, that finds at compile time insufficient-checking bugs that can lead to application crashes on malformed inputs. Our goal is to wrap methods that receive external inputs so that they throw checked instead of unchecked exceptions. We believe that our approach can improve Java developers’ productivity, by using exception handling only where it is required, and can ensure the stability of client applications. We plan to evaluate our checker by applying it to the source code of Java projects from the Apache ecosystem, and to analyze stack traces to validate the failures our checker identifies.
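
The wrapping idea can be illustrated with a minimal Java sketch. It assumes a hypothetical checked exception `MalformedExternalInputException` and a hypothetical wrapper `CheckedApi.parseExternalInt` around the standard `Integer.parseInt`, which reports bad input with the unchecked `NumberFormatException`. It shows only the effect of wrapping, not the type system or the compiler plug-in that decides where wrapping is needed.

```java
/** Hypothetical checked exception signalling that an external input is malformed. */
class MalformedExternalInputException extends Exception {
    MalformedExternalInputException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical wrapper API: callers passing external input must handle a checked exception. */
final class CheckedApi {
    private CheckedApi() {}

    /**
     * Wraps Integer.parseInt, which throws the unchecked NumberFormatException
     * (also for null input), and rethrows the failure as a checked exception.
     */
    static int parseExternalInt(String externalInput) throws MalformedExternalInputException {
        try {
            return Integer.parseInt(externalInput);
        } catch (NumberFormatException e) {
            throw new MalformedExternalInputException("Not an integer: " + externalInput, e);
        }
    }
}
```

A call such as `CheckedApi.parseExternalInt(userInput)` no longer compiles unless the caller catches or declares `MalformedExternalInputException`, while internal code that parses trusted constants can keep calling `Integer.parseInt` directly without handling code.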

YALCOM - Yet Another LCOM Metric

Date: 12 April 2017
Presenter: Tushar Sharma
Abstract

High cohesion is a desired property of object-oriented abstractions. LCOM (Lack of Cohesion in Methods) is a metric traditionally used to measure the degree of lack of cohesion among methods. The software engineering community has proposed many variants of this metric. However, these variants fail to represent the degree of lack of cohesion correctly in certain cases. In this presentation, I will highlight these deficiencies and propose a new method to compute LCOM.
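
For readers unfamiliar with the metric, the following minimal Java sketch computes one traditional variant, the Chidamber and Kemerer (1994) definition; it is not the new method proposed in the talk. As an assumption of this sketch, a class is modelled as a map from method names to the sets of instance fields each method accesses. P counts method pairs that share no field, Q counts pairs that share at least one, and LCOM is |P| - |Q| when that is positive and 0 otherwise.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Lcom {
    private Lcom() {}

    /**
     * Chidamber and Kemerer LCOM: max(|P| - |Q|, 0), where P are method pairs
     * sharing no field and Q are method pairs sharing at least one field.
     *
     * @param fieldsUsed maps each method name to the instance fields it accesses
     */
    static int chidamberKemerer(Map<String, Set<String>> fieldsUsed) {
        List<Set<String>> usages = List.copyOf(fieldsUsed.values());
        int disjointPairs = 0; // |P|
        int sharingPairs = 0;  // |Q|
        for (int i = 0; i < usages.size(); i++) {
            for (int j = i + 1; j < usages.size(); j++) {
                if (shareField(usages.get(i), usages.get(j))) {
                    sharingPairs++;
                } else {
                    disjointPairs++;
                }
            }
        }
        return Math.max(disjointPairs - sharingPairs, 0);
    }

    /** True if the two methods access at least one common field. */
    private static boolean shareField(Set<String> a, Set<String> b) {
        for (String field : a) {
            if (b.contains(field)) {
                return true;
            }
        }
        return false;
    }
}
```

The clamping to zero hides some non-cohesive designs. If methods `a`, `b`, and `c` all use field `x` while `d` uses only `y`, then |P| = |Q| = 3 and this variant reports 0, even though `d` shares nothing with the rest of the class.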

Mining Natural Language in Code Review Comments

Date: 15 June 2017
Presenter: Vasiliki Efstathiou
Abstract

The growing availability of open software repositories has advanced research on mining software engineering data. Besides code-specific data, interest in developer communication data has emerged, aiming to uncover features that affect the software development lifecycle. Code reviews, in particular, provide rich textual communicative information directly coupled with edits in source code. This talk will discuss possible unexplored directions for analyzing natural language in code review comments by adapting simple ideas from linguistics. The proposed research aims to identify natural language patterns that imply higher-level semantics related to the underlying reasoning and intentions (such as necessity and probability) of the message conveyed in the comment. The ultimate goal in this context is to uncover associations between the high-level semantics of comments and the revisions they suggest, as well as potential effects on comment usefulness.
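
One simple linguistic idea of this kind is to look for modal cue words. The Java sketch below is purely illustrative: the class `ModalityTagger` and both cue lexicons are hypothetical choices for this example, not the method or word lists of the proposed research. It tags a review comment with the modalities (necessity, probability) whose cue words it contains.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical lexicon-based tagger for modality cues in code review comments. */
final class ModalityTagger {
    enum Modality { NECESSITY, PROBABILITY }

    // Illustrative cue words only; a real study would derive and validate its own lexicon.
    private static final Set<String> NECESSITY_CUES =
            Set.of("must", "should", "need", "needs", "required", "necessary");
    private static final Set<String> PROBABILITY_CUES =
            Set.of("might", "may", "could", "probably", "perhaps", "possibly", "likely");

    private ModalityTagger() {}

    /** Returns the modalities whose cue words occur in the comment, ignoring case. */
    static EnumSet<Modality> tag(String comment) {
        EnumSet<Modality> found = EnumSet.noneOf(Modality.class);
        // Split on any run of characters that are neither letters nor apostrophes.
        for (String token : comment.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (NECESSITY_CUES.contains(token)) {
                found.add(Modality.NECESSITY);
            }
            if (PROBABILITY_CUES.contains(token)) {
                found.add(Modality.PROBABILITY);
            }
        }
        return found;
    }
}
```

For example, "This should probably be a constant." is tagged with both modalities. A bare word list is a deliberately naive starting point: it cannot, for instance, tell the modal "may" from the month name.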

Analyzing Programming Languages' Energy Consumption: An Empirical Study

Date: 27 September 2017
Presenter: Stefanos Georgiou
Abstract

Motivation: The energy efficiency of IT-related products, from the software perspective, has gained considerable attention in recent years and has opened a new research field. However, there is limited research on the energy consumption of relatively small programming tasks. This knowledge is critical, especially where millions of small tasks run in parallel on multiple devices around the globe. Goal: In this preliminary study, we aim to identify the energy implications of small, independent tasks developed in different programming languages: compiled, semi-compiled, and interpreted ones. Method: To achieve our purpose, we collected, refined, compared, and analyzed a number of implemented tasks from Rosetta Code, a publicly available repository for programming chrestomathy. Results: Our analysis shows that, among compiled programming languages, C, C++, Java, and Go offer the highest energy efficiency for all of our tested tasks compared to C#, VB.NET, and Rust. Among interpreted programming languages, PHP, Ruby, and JavaScript exhibit the greatest energy savings compared to Swift, R, Perl, and Python.

House of Cards: Code Smells in Open-source C# Repositories

Date: 01 November 2017
Presenter: Tushar Sharma
Abstract

Many studies have explored the characteristics of code smells and analyzed their effects on software quality. I will present our empirical study on smells, which examines intra-category and inter-category correlations between design and implementation smells. The study mines 19 design smells and 11 implementation smells in 1,988 C# repositories containing more than 49 million lines of code, and presents our observations based on the collected data.

The "Meta-Life" of JavaScript

Date: 22 November 2017
Presenter: Vitalis Salis
Abstract

JavaScript is one of the most important elements of the web. It is used by the majority of websites and is supported by all modern browsers. We present the first large-scale study of client-side JavaScript code evolution. Specifically, we have been collecting and storing JavaScript code from Alexa’s top 10,000 websites on a daily basis (∼7.5 GB per day) for nine consecutive months. We have analyzed the resulting dataset to study how often developers deploy new scripts on the server side. Our results indicate that the lifespan of scripts is quite short: five days for external scripts and one day for internal JavaScript code. In addition, we have examined how common JavaScript code reuse is, especially reliance on third-party libraries. Furthermore, we observed how software bugs evolve over time. To do so, we employed well-known static analysis tools to identify potential software bugs in the various scripts and then observed whether they increase or decrease over time.

Mining Software Repositories and Search Based Software Engineering Tools and Infrastructures

Date: 22 December 2017
Presenter: Diomidis Spinellis
Abstract

Mining software repositories and search-based software engineering typically benefit from tools that aid text processing, interaction data collection, evolutionary computation, testing, code analysis, and repository analysis. The corresponding data come from data collections, source and binary code repositories, fault and failure datasets, and process details. Key issues for research in this area concern adherence to best practices and reproducibility. We will also cover key readings and challenges for the future.

This is work jointly performed with Tse-Hsun (Peter) Chen, Yasutaka Kamei, Masanari Kondo, Neil Walkinshaw, Xin Xia, and Shin Yoo based on an NII Shonan Meeting working group formed to examine this topic.

Note: Data before 2017 may refer to grandparented work conducted by BALab's members at its progenitor laboratory, ISTLab.

Yearly Report 2017