Technical News

Jun 03 2019

Production Traffic For Testing


Publication: Information and Software Technology
Authors: Jeff Anderson, Maral Azizi, Saeed Salem, and Hyunsook Do

Title: On the Use of Usage Patterns from Telemetry Data for Test Case Prioritization

In an original work about Production Traffic for Testing, Jeff Anderson, Maral Azizi, Saeed Salem, and Hyunsook Do present a new opportunity in the area of regression testing techniques. Here is the abstract:

Context: Modern applications contain pervasive telemetry to ensure reliability and enable monitoring and diagnosis. This presents a new opportunity in the area of regression testing techniques, as we now have the ability to consider usage profiles of the software when making decisions on test execution.

Objective: The results of our prior work on test prioritization using telemetry data showed improvement rate on test suite reduction, and test execution time. The objective of this paper is to further investigate this approach and apply prioritization based on multiple prioritization algorithms in an enterprise level cloud application as well as open source projects. We aim to provide an effective prioritization scheme that practitioners can implement with minimum effort. The other objective is to compare the results and the benefits of this technique factors with code coverage-based prioritization approaches, which is the most commonly used test prioritization technique.

Method: We introduce a method for identifying usage patterns based on telemetry, which we refer to as “telemetry fingerprinting.” Through the use of various algorithms to compute fingerprints, we conduct empirical studies on multiple software products to show that telemetry fingerprinting can be used to more effectively prioritize regression tests.

Results: Our experimental results show that the proposed techniques were able to reduce over 30 percent in regression test suite run times compared to the coverage-based prioritization technique in detecting discoverable faults. Further, the results indicate that fingerprints are effective in identifying usage patterns, and that the fingerprints can be applied to improve regression testing techniques.

Conclusion: In this research, we introduce the concept of fingerprinting software usage patterns through telemetry. We provide various algorithms to compute fingerprints and conduct empirical studies that show that fingerprints are effective in identifying distinct usage patterns. By applying these techniques, we believe that regression testing techniques can be improved beyond the current state-of-the-art, yielding additional cost and quality benefits.

Apr 02 2019

Search-Based Test Case Implantation for Testing Untested Configurations

Publication: Information and Software Technology
Authors: Dipesh Pradhan, Shuai Wang, Tao Yue, Shaukat Ali, Marius Liaaen

Title: Search-Based Test Case Implantation for Testing Untested Configurations

Modern large-scale software systems are highly configurable, and thus require a large number of test cases to be implemented and revised for testing a variety of system configurations. This makes testing highly configurable systems very expensive and time-consuming.

Driven by our industrial collaboration with a video conferencing company, we aim to automatically analyze and implant existing test cases (i.e., an original test suite) to test the untested configurations.

We propose a search-based test case implantation approach (named as SBI) consisting of two key components: 1) Test case analyzer that statically analyzes each test case in the original test suite to obtain the program dependence graph for test case statements and 2) Test case implanter that uses multi-objective search to select suitable test cases for implantation using three operators, i.e., selection, crossover, and mutation (at the test suite level) and implants the selected test cases using a mutation operator at the test case level including three operations (i.e., addition, modification, and deletion).

We empirically evaluated SBI with an industrial case study and an open source case study by comparing the implanted test suites produced by three variants of SBI with the original test suite using evaluation metrics such as statement coverage (SC), branch coverage (BC), and mutation score (MS). Results show that for both the case studies, the test suites implanted by the three variants of SBI performed significantly better than the original test suites. The best variant of SBI achieved on average 19.3% higher coverage of configuration variable values for both the case studies. Moreover, for the open source case study, the best variant of SBI managed to improve SC, BC, and MS with 5.0%, 7.9%, and 3.2%, respectively.

SBI can be applied to automatically implant a test suite with the aim of testing untested configurations and thus achieving higher configuration coverage.

Mar 22 2019

Configuration Tests: the JHipster web development stack use case


Title: Test them all, is it worth it? Assessing configuration sampling on the JHipster Web development stack
Authors: Axel Halin, Alexandre Nuttinck, Mathieu Acher, Xavier Devroey, Gilles Perrouin, Benoit Baudry
Publication: Springer Empirical Software Engineering, April 2019, Vol24

A group of software researchers, partially involved in the EU-funded STAMP project, has published interesting results based on configuration tests on JHipster, a popular open source application generator to create Spring Boot and Angular/React projects. 

Abstract: Many approaches for testing configurable software systems start from the same assumption: it is impossible to test all configurations.
This motivated the definition of variability-aware abstractions and sampling techniques to cope with large configuration spaces. Yet, there is no theoretical barrier that prevents the exhaustive testing of all configurations by simply enumerating them if the effort required to do so remains acceptable. Not only this: we believe there is a lot to be learned by systematically and exhaustively testing a configurable system. In this case study, we report on the first ever endeavour to test all possible configurations of the industry-strength, open source configurable software system JHipster, a popular code generator for web applications. We built a testing scaffold for the 26,000+ configurations of JHipster using a cluster of 80 machines during 4 nights for a total of 4,376 hours (182 days) CPU time. We find that 35.70% configurations fail and we identify the feature interactions that cause the errors. We show that sampling strategies (like dissimilarity and 2-wise):
(1) are more effective to find faults than the 12 default configurations used in the JHipster continuous integration;
(2) can be too costly and exceed the available testing budget. We cross this quantitative analysis with the qualitative assessment of JHipster’s lead developers.

Read the full article on the Springer website
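
To make the contrast between exhaustive testing and 2-wise sampling concrete, here is a minimal Python sketch of a greedy pairwise sampler. It is not the sampling implementation used in the paper: the option names and values in OPTIONS are hypothetical and far smaller than JHipster's real configuration space, and the greedy selection is only one simple way to build a 2-wise sample.

```python
from itertools import combinations, product

# Hypothetical feature model: option names and values are illustrative only,
# not JHipster's real configuration space.
OPTIONS = {
    "database": ["mysql", "postgresql", "mongodb"],
    "build_tool": ["maven", "gradle"],
    "auth": ["jwt", "session", "oauth2"],
    "cache": ["ehcache", "none"],
}


def all_configurations(options):
    """Enumerate every configuration (the exhaustive strategy)."""
    names = list(options)
    return [dict(zip(names, values)) for values in product(*options.values())]


def pairs_of(config):
    """Return every pair of (option, value) settings present in one configuration."""
    return set(combinations(sorted(config.items()), 2))


def pairwise_sample(options):
    """Greedy 2-wise sampling: repeatedly pick the configuration that covers
    the most still-uncovered value pairs, until every pair is covered."""
    configs = all_configurations(options)
    uncovered = set().union(*(pairs_of(c) for c in configs))
    sample = []
    while uncovered:
        best = max(configs, key=lambda c: len(pairs_of(c) & uncovered))
        sample.append(best)
        uncovered -= pairs_of(best)
    return sample


if __name__ == "__main__":
    everything = all_configurations(OPTIONS)
    sample = pairwise_sample(OPTIONS)
    print(f"exhaustive: {len(everything)} configurations")
    print(f"2-wise sample: {len(sample)} configurations")
```

The sample still exercises every combination of two option values at least once, but with fewer configurations than the exhaustive enumeration. That is the trade-off the study measures against the cost of testing everything.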

Feb 27 2019

Five Machine Learning Usages in Software Testing

According to the Reqtest team, machine learning is a hot trend this year, bringing revolutionary changes in workflows and processes.
In software testing, machine learning can be used for:

  • Test suite optimization, to identify redundant and unique test cases.
  • Predictive analytics, to predict the key parameters of software testing processes on the basis of historical data.
  • Log analytics, to identify the test cases which need to be executed automatically.
  • Traceability, extracting keywords from the Requirements Traceability Matrix (RTM) to achieve test coverage.
  • Defect analytics, to identify high-risk areas of the application for the prioritization of regression test cases.

Read nine more recent testing trends from the Reqtest editors.

Feb 25 2019

Maven Central Top Libraries


Analysing the Maven Central Repository during the second half of 2018, a group of scientific researchers led by Benoit Baudry, Professor in Software Technology at the KTH Royal Institute of Technology, reveals that Maven Central contains more than 2.4 million artifacts, a real treasure of extraordinary software development. More than 17% of the libraries have several versions that are actively used by a large number of clients.
However, 1.3 million dependencies declared are actually not used. Also, a vast majority of APIs can be reduced to a small, compact core and still serve most of their clients. 

For a more accurate exploration of the Maven Central ecosystem, read Benoit Baudry's article, A journey at the heart of 2.4 million Maven artifacts

Feb 11 2019

Global vs Local Coverage


On the XWiki project, we've been pursuing a strategy of failing our Maven build automatically whenever the test coverage of each Maven module is below a threshold indicated in the pom.xml of that module. We're using Jacoco to measure this local coverage.

We've been doing this for over 6 years now and we've been generally happy about it. This has allowed us to raise the global test coverage of XWiki by a few percent every year.

More recently, I joined the STAMP European Research Project and one of our KPIs is the global coverage, so I got curious and wanted to look at precisely how much we're winning every year.

I realized that, even though we've been generally increasing our global coverage (computed using Clover), there are times when we actually reduce it or increase very little, even though at the local level all modules increase their local coverage...

Read the full post and learnings from Vincent Massol, XWiki CTO
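
A small worked example shows how every module can raise its local coverage while the global figure drops. This Python sketch uses invented line counts and shows only one possible mechanism: a module with low coverage grows faster than the others. It does not reproduce the XWiki numbers or the specific causes analysed in the post.

```python
def coverage(covered, total):
    """Coverage as a percentage of covered lines."""
    return 100.0 * covered / total


def global_coverage(modules):
    """Global coverage: all covered lines over all lines, across modules."""
    covered = sum(c for c, _ in modules.values())
    total = sum(t for _, t in modules.values())
    return coverage(covered, total)


# Hypothetical (covered lines, total lines) per module, before and after a year.
before = {"module-a": (900, 1000), "module-b": (100, 200)}
after = {"module-a": (910, 1000), "module-b": (520, 1000)}

if __name__ == "__main__":
    for name in before:
        print(f"{name}: {coverage(*before[name]):.1f}% -> {coverage(*after[name]):.1f}%")
    print(f"global: {global_coverage(before):.1f}% -> {global_coverage(after):.1f}%")
```

Here module-a goes from 90% to 91% and module-b from 50% to 52%, yet the global coverage falls from about 83.3% to 71.5%, because module-b now accounts for half of the code base instead of a sixth.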

Dec 21 2018

Short circuiting method executions to assess test quality

Today, a Medium article by Benoit Baudry, Professor in Software Technology at KTH and STAMP project coordinator, shares interesting results about the Descartes mutation testing tool. This software can automatically short-circuit covered methods and determine a list of pseudo-tested methods in Java projects. In an experiment with Descartes on 21 open source Java projects, a total of 28K+ methods were analyzed, with three main results:

  • short circuiting the complete execution of methods provides valuable feedback to developers. The developers have a clear goal when writing a test: to make this method not pseudo-tested anymore. Developers are more comfortable reasoning at the granularity of a method than at the statement level (fine grained traditional mutation testing).
  • short circuiting methods has revealed the presence of pseudo-tested methods in all the projects that we have analyzed, even the ones with very high code coverage. Development teams of all Java projects can benefit from this type of analysis to assess their test suites and improve them.
  • interviews with developers reveal that some pseudo-tested methods actually reveal major weaknesses in the test suite. We have collected empirical evidence of test suites fixed after running a short-circuiting experiment.

Dec 17 2018

The ElasTest testing platform targets large distributed systems

Elastest Architecture

A recent Elastest white paper raises issues of concern such as System Under Test deployments, IoT testing services and root cause analysis. The H2020 Elastest project is a STAMP related project. We are currently exploring integration opportunities between our tools.

With four demonstrators covering different application domains such as 5G networks, web applications, WebRTC, and IoT, the ElasTest platform is designed to improve the efficiency and effectiveness of the testing process, and ultimately to improve the quality of large software systems deployed in the Cloud. Testers and developers of large distributed systems will learn more about the Elastest platform architecture and benefits in the Elastest white paper

Dec 07 2018

Commit Assistant: the Ubisoft bug detection bot


What if a development bot could help you detect software bugs automatically, then provide probable causes for each issue along with fix suggestions? Identifying patterns in past bugs to better intercept new bugs might save software development teams significant debugging time and cost.

At Ubisoft La Forge Research Lab in Montreal, Technical Architect Mathieu Nayrolles collaborates on such a learning bot with Concordia University expert Abdelwahab Hamou-Lhadj at the Electrical and Computer Engineering Department. Using the innovative CLEVER approach, they can detect commits that are likely to introduce bugs, with an average of 79.10% precision and a 65.61% recall. 

CLEVER combines code metrics, clone detection techniques, and project dependency analysis to detect risky commits within and across projects. CLEVER operates at commit-time, before the commits reach the central code repository. Also, because it relies on code comparison, CLEVER does not only detect risky commits but also makes recommendations to developers on how to fix them. 

You can find more details on the risky commit detector online:

Nov 16 2018

DSpot Study on Ten Mature Open Source Projects

Improving existing Java test cases and giving the improvements back to developers as patches or pull requests: the idea is attractive. But is it yet an efficient and proven code optimisation process?

A scientific paper from Benjamin Danglot (Inria), co-signed with three more STAMP project contributors, suggests that it is.
The PhD candidate provides a thorough study of ten notable and mature open source projects in which all test methods from 40 unit test classes have been amplified by DSpot. This demonstrates the STAMP tool's ability to strengthen real unit test classes in Java.

More test automation will be offered in the future, requiring more understanding and comparison of test purposes. Moreover, DSpot can be placed in a continuous integration service (CI) where test classes would be amplified on-the-fly. This would greatly improve the industrial applicability of this software engineering research, conclude the authors.

Nov 13 2018

How can software code perturbation strengthen its reliability?

A recent IEEE blog article by a group of researchers involved in the STAMP project reveals that, when facing state perturbations, software might be more stable and reliable than expected.


This fascinating phenomenon is called "Correctness Attraction" in reference to the concepts of “stable equilibrium” and “attraction basin” in physics. It underlines input points for which a software system eventually reaches the same fixed and correct point according to a perturbation model.

Moreover, this could lead to new “bug absorbing zones” in software applications where software engineering techniques would improve the correctness attraction.

Discover the reasons behind correctness attraction in this blog post:

Nov 06 2018

Luc Esape, artificial software fixer, unmasked by The Register


Luc Esape, aka Repairnator, is unmasked! The Java software fixer recently earned a world class reputation as a smart bot, thanks to an article posted on The Register by Thomas Claburn, a real editor based in San Francisco (California). 

The Register article is entitled: The mysterious life of Luc Esape, bug fixer extraordinaire. His big secret? He's not human

For the INRIA researchers at University of Lille within the Spirals team, this international recognition underlines the ability of open source software to fix bugs through automatically generated patches, provided within minutes during continuous integration and continuous delivery.

A quote from KTH Professor Martin Monperrus, Repairnator and STAMP contributor, confirms the bot's track record. In a few weeks, Repairnator has produced five patches that have been accepted by human developers and merged into their respective code bases: "This is a milestone for human-competitiveness in software engineering research on automatic program repair", he explains.

The online article, along with multiple comments, also raises unsolved questions about legal responsibility for patches and future DevOps careers.

Sep 20 2018

Mutation testing is a serious game


Thanks to this tweet from Arie Van Deursen, Head of Software Technical Department at TU DELFT and STAMP project partner, we are glad to share with you this online resource where you can learn about mutation testing through a serious game.
Pick your side carefully between attackers who are mutating the software code, and defenders who are adding new tests. And let us know about your gamification experience...


Sep 19 2018

Repairnator to repair software bugs on a large scale


Repairnator is an innovative bot to repair software bugs on a large scale, and an open source solution available to all developers now with a STAMP connection!

This development comes from the Spirals team, a joint team between Inria and the University of Lille within UMR 9189, CNRS-Centrale Lille-University of Lille, CRIStAL. More…

Sep 18 2018

Facebook SapFix and Sapienz to find and fix software bugs

Facebook engineers are investigating code automation using Artificial Intelligence in Sapienz and mutation-based fix in SapFix.
Both tools are designed to speed up the deployment of new software through distributed codes that are pre-tested and as stable as possible.
According to a recent article, they are intended for open source release in the future, once additional engineering work is completed, but no date is mentioned. More…

Sep 17 2018

Google Test Efficacy: running software at scale

Peter Spragins, Google Software Engineer and Teaching Assistant at UCSD Math Department, summarizes almost four years of experience in running software tests at scale with several colleagues in Mountain View (California).
"The two key numbers for the system's performance are sensitivity, the percentage of failing tests we actually execute, and specificity, the percentage of passing tests we actually skip. The two numbers go hand in hand."
Discover how Machine Learning is now part of the Google process of committing code. Read his article about Efficacy Presubmit Service
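
The two numbers in the quote can be computed directly from a record of which tests were selected. The Python sketch below is only an illustration of those definitions: the Outcome type, the test names and the data are invented, and nothing here reflects how the Google service is implemented.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    name: str
    would_fail: bool  # ground truth: does the test fail on this change?
    executed: bool    # decision: did the selection system run it?


def sensitivity(outcomes):
    """Percentage of failing tests that were actually executed."""
    failing = [o for o in outcomes if o.would_fail]
    return 100.0 * sum(o.executed for o in failing) / len(failing)


def specificity(outcomes):
    """Percentage of passing tests that were actually skipped."""
    passing = [o for o in outcomes if not o.would_fail]
    return 100.0 * sum(not o.executed for o in passing) / len(passing)


# Invented data: 4 failing tests (3 executed), 6 passing tests (4 skipped).
# Both functions assume at least one failing and one passing test.
OUTCOMES = (
    [Outcome(f"fail_{i}", True, i < 3) for i in range(4)]
    + [Outcome(f"pass_{i}", False, i >= 4) for i in range(6)]
)

if __name__ == "__main__":
    print(f"sensitivity: {sensitivity(OUTCOMES):.1f}%")
    print(f"specificity: {specificity(OUTCOMES):.1f}%")
```

Running every test gives 100% sensitivity and 0% specificity, and skipping every test gives the reverse, which is why the two numbers have to be read together.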

Sep 12 2018

Inauguration of Castor Research Center

Castor center Inauguration
A collaboration between KTH, Saab and Ericsson, the CASTOR Software Research Center was inaugurated today at Östermalm (Sweden), with over 50 guests including KTH professors, researchers, industry representatives and employees from the French embassy and Vinnova. 

Prof. Benoit Baudry underlined the aim of delivering outstanding research in software engineering and also expressed his wishes to increase collaboration through more co-developments of open source software tools. The goal is also to increase the number of industry PhD students running the core research activities of the center, and to help reduce the cultural gap that currently exists between industry and the lab when it comes to software technology.

Ingemar Söderquist (Saab) and Diarmuid Corcoran (Ericsson) shared their vision about the challenges and opportunities for software technology in their respective application areas (defense and telecom).

Robert Feldt, Professor of Software Engineering at Chalmers University of Technology in Gothenburg, talked about his experience in setting up collaborations with the industry on software research in Sweden.

Kristina Höök, Professor in Interaction Design at KTH, presented her insights after having led for more than 10 years the “Mobile Life” research center at KTH.

The official opening was made by Pontus de Laval – CTO SAAB, Dr. Magnus Frodigh – Acting Head of Research at Ericsson, and Prof. Annika Trigell – KTH Vice-President for research, and was followed by a reception dinner.

Check out the Castor Research Center inauguration presentations and photos 

Sep 05 2018

STAMP and DeFlaker approach compared


Flaky tests raise a major testing problem in the software industry, in terms of performance overhead.

Automatically detecting flaky tests, DeFlaker provides a new milestone in order to cope with them in a principled way.
There is no need to re-run the failed tests anymore. 

In the STAMP project, we follow a similar philosophy: we target major testing problems (missing assertions, crash reproduction) and we invent principled tools that target those problems and that are evaluated on large and complex software projects. These projects come both from the STAMP project partners and from international open source members of the OW2 community. More…

Sep 03 2018

Descartes Tutorial at ASE 2018

Place: Montpellier Corum Conference Center
Conference: ASE 2018
Instructors: Benoît Baudry (KTH), Vincent Massol (XWiki), Oscar Luis Vera Pérez (INRIA)

Let the CI spot the holes in tested code with the Descartes tool

Bring your laptop, your favorite Java project (with JUnit tests) and find out how much of the covered code is actually specified by the test suite!

In this tutorial, we introduce the intriguing concept of pseudo-tested methods, i.e. methods that are covered by the test suite, yet no test case fails when the method body is removed. We show that such methods can be found in mature, well-tested projects and we discuss some possible root causes. Attendees have the opportunity to experiment hands-on with our tool, Descartes, which automatically detects pseudo-tested methods in Java projects. More…
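
The idea of a pseudo-tested method can be illustrated with a toy Python sketch. Descartes itself works on Java projects as a mutation engine for PIT, so this is not how the tool is implemented: the Cart class, its test and the stub values are all invented here to show the principle of short-circuiting a covered method and checking whether any test notices.

```python
class Cart:
    def __init__(self):
        self.items = []
        self.audit_log = []

    def add(self, price):
        self.items.append(price)
        self.log(f"added {price}")

    def total(self):
        return sum(self.items)

    def log(self, message):
        self.audit_log.append(message)


def test_total():
    cart = Cart()
    cart.add(5)
    cart.add(7)
    assert cart.total() == 12  # covers add(), total() and log(), checks only the total


TESTS = [test_total]

# Value returned by each method once its body has been short-circuited.
STUBS = {"add": None, "total": 0, "log": None}


def suite_passes():
    """Run every test; report False as soon as one of them fails."""
    for test in TESTS:
        try:
            test()
        except Exception:
            return False
    return True


def find_pseudo_tested(cls, stubs):
    """Short-circuit each method in turn; a method is pseudo-tested
    if the whole suite still passes without its body."""
    pseudo = []
    for name, value in stubs.items():
        original = getattr(cls, name)
        setattr(cls, name, lambda self, *args, _v=value: _v)
        try:
            if suite_passes():
                pseudo.append(name)
        finally:
            setattr(cls, name, original)  # always restore the real method
    return pseudo


if __name__ == "__main__":
    print(find_pseudo_tested(Cart, STUBS))  # prints ['log']
```

All three methods are executed by the test, so a coverage report would show them as covered. Only log() can lose its body without any test failing, which tells the developer exactly which behaviour still needs an assertion.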

Jul 20 2018

Resolving Maven Artifacts with ShrinkWrap... or Not

While trying to generate custom XWiki WARs directly from the unit tests, Vincent Massol, XWiki CTO, gave the ShrinkWrap Resolver a try.
Follow his work in his article about Resolving Maven Artifacts with ShrinkWrap... or Not

Jun 25 2018

Environment Testing Experimentations

As part of the STAMP Project, Vincent Massol, XWiki CTO, is conducting five testing experiments and reports on their issues and limitations.
Read more insights in his article about Environment Testing Experimentations  

May 09 2018

Automatic Test Generation DSpot

DSpot is a mutation testing tool that automatically generates new tests from existing test suites. It's being developed as part of the STAMP European research project, in which XWiki SAS is participating.
Read this article from Vincent Massol, CTO of XWiki, about DSpot Automatic Test Generation

Nov 17 2017

Controlling Test Quality

How to verify the quality of your tests?
Vincent Massol, XWiki CTO, suggests a strategy for Test Quality Control

Nov 08 2017

Flaky tests handling with Jenkins and JIRA

Flaky tests are a plague because they lower the credibility of your CI strategy by sending false positive notification emails.

Vincent Massol, XWiki Technical Director, suggests a new Flaky test strategy.

Oct 29 2017

Creating your own project's Quality Dashboard

Conference: SoftShake 2017, Geneva

Offered at SoftShake 2017 in Geneva, by Vincent Massol, Technical Director of XWiki SAS, this brand new presentation explains how to use XWiki to create a custom Quality Dashboard by aggregating metrics from other sites (Jenkins, SonarQube, JIRA and GitHub), saving them locally to draw history graphs and sending emails when combined metric thresholds are crossed. More…

Sep 28 2017

Mutation testing with PIT and Descartes

Vincent Massol, Technical Director of XWiki SAS, wrote an article about a recent experimentation with Descartes, a mutation engine for PIT, in the framework of the STAMP project. 

Here's an example of running Descartes on an XWiki module:

For more information, click on the Pit test report and read the Vincent Massol blog post, published here: Mutation testing with PIT and Descartes

Sep 17 2017

Using Docker and Jenkins to test configurations

XWiki SAS is part of the STAMP research project and one domain of this research is improving configuration testing.

In this article, Vincent Massol, Technical Director of XWiki SAS, suggests a new architecture that should allow XWiki to be tested on various configurations, including various supported databases and versions, various Servlet containers and versions, and various browsers and versions.

Aug 30 2017

Mutate and Test Your Tests

by Benoit Baudry

I am extremely proud and happy that my talk on mutation testing got accepted as an early bird for EclipseCon Europe 2017.

We will talk a lot about software testing at the project quality day. In this talk, I will focus on qualitative evaluation of a unit test suite. Statement coverage is commonly used to quantify the quality of a test suite: it measures the ratio of source code statements that are executed at least once when running the test suite. However, statement coverage is known to be a rather weak quality indicator. For example, a test suite that covers 100% of the statements and that has absolutely no assertion is a very bad test suite, yet is considered of excellent quality according to statement coverage.


Jun 06 2017

Jenkins Pipelines: Attach Failing Test Screenshot

How to attach failing test screenshot to a Jenkins Pipeline?
Read the article by Vincent Massol, XWiki CTO, about Jenkins Pipelines

May 10 2017

TPC Strategy Check

Read the article by Vincent Massol, XWiki CTO, about TPC Strategy Check

Dec 10 2016

Full Automated Test Coverage with Jenkins and Clover

Generating full coverage report for a multi-reactor project is a complex task.
Fortunately, Vincent Massol, XWiki CTO, provides a script with clear explanations for that need.
Ready to jump?
Read his article on test coverage reports generation with Jenkins and Clover