Readings



XWiki Leverages STAMP Software Testing Suite

[Photo: Vincent Massol]

What is it that characterises XWiki application lifecycle management?

XWiki is an open source project born 16 years ago and continuously improving, with some 40 new versions per year. This enterprise wiki provides a platform for application development and offers 700 extensions. Altogether, it contains more than one million lines of Java and JavaScript code. Every month, nearly 50 contributors work on XWiki (including translations), including around 10 developers regularly involved in the core of the platform. The build phase is very tool-intensive: many tests and verifications are carried out to ensure optimal quality and compatibility with previous versions.

Which of the automated testing tools in the STAMP project are used regularly by XWiki?

We perform integration tests, unit tests and functional tests on several browsers, with several versions of Java, several DBMS and various servlet engines. The testing tools developed during the STAMP project help us validate our test coverage and improve quality. Any new code that joins an XWiki module is checked during the build to ensure that the quality of the module will be equal or superior to that of the previous version.
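
As a rough illustration (not XWiki's actual test code), here is a minimal JUnit 5 sketch of the kind of parameterized test that such a multi-DBMS build matrix can run; all class and method names are hypothetical.

    // Hypothetical sketch: one parameterized test exercised against several
    // DBMS identifiers, in the spirit of XWiki's multi-environment test matrix.
    import org.junit.jupiter.params.ParameterizedTest;
    import org.junit.jupiter.params.provider.ValueSource;

    import static org.junit.jupiter.api.Assertions.assertEquals;

    class DocumentStoreTest {

        @ParameterizedTest
        @ValueSource(strings = {"hsqldb", "mysql", "postgresql"})
        void savedDocumentCanBeReloaded(String dbms) {
            // In a real build the CI matrix would supply a live connection here.
            InMemoryStore store = new InMemoryStore(dbms);
            store.save("Sandbox.WebHome", "Hello, world");
            assertEquals("Hello, world", store.load("Sandbox.WebHome"));
        }

        /** Trivial stand-in for a real store, so the sketch is self-contained. */
        static class InMemoryStore {
            private final java.util.Map<String, String> docs = new java.util.HashMap<>();
            InMemoryStore(String dbms) { /* backend selection elided */ }
            void save(String id, String content) { docs.put(id, content); }
            String load(String id) { return docs.get(id); }
        }
    }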

More

Botpress NLP Open Source Stack


In a recent podcast with Software Engineering Daily, Sylvain Perron, CEO of Botpress, explains how Botpress is different from other bot platforms.

"The biggest competitors are Google and Microsoft, where they offer natural language as a service. And I think it's very difficult to get something high-quality out of those services. And that's kind of like the Firebase approach versus Postgres, where with Firebase you don't really control all the configuration and options.

And the way Botpress works is, and we're sort of the only ones that do that, it's an open source stack. You run it on your computer. You can actually customize everything behind it. And so you can really get the extra juice out of the engine. You can really fine-tune anything you want. The other advantage is that you can host the platform anywhere you want. So if you want to deploy on AWS or on Azure, you can do that, whereas if you go with a major cloud platform, you're actually stuck with that vendor.

And so it's not very flexible. Imagine you're a bank or a healthcare provider: the idea of streaming all of your customers' interactions over to Google might be frightening. So for any kind of application, I think developers want this kind of experience where they have control over the stack. And I don't think it feels natural to use an HTTP service that does all of that for you while you have no control. It's like a black box and anything can break at any moment. With Botpress, it's much more natural. It feels like regular software."

Adaptation of Cartesian Genetic Programming for Automatic Repair of Software Regression Faults


Title: CGenProg: Adaptation of cartesian genetic programming with migration and opposite guesses for automatic repair of software regression faults
Authors: Alireza Khalilian, Ahmad Baraani-Dastjerdi, Bahman Zamani
Journal: Expert Systems with Applications
Date: 1 May 2021
Read the full paper

Highlights

  • CGenProg proposed for automatic repair of software regression faults in Java programs.
  • Cartesian genetic programming as the core evolutionary algorithm was adapted and modified.
  • Biogeography-based optimization (migration) as the crossover was adapted.
  • Opposition-based learning (opposite guesses) as the mutation was adapted.
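
To make the last two highlights more concrete: in opposition-based learning, a value x drawn from a range [min, max] has the "opposite guess" min + max - x. The sketch below applies that idea as a mutation over an integer genome; the genome layout and ranges are illustrative, not CGenProg's actual encoding.

    // Hedged sketch of an "opposite guess" used as mutation: each gene lies in
    // a known range [min, max], and its opposite value is min + max - gene.
    import java.util.Arrays;
    import java.util.Random;

    public class OppositeMutation {

        /** Returns a copy of the genome with one random gene replaced by its opposite. */
        static int[] mutate(int[] genome, int[] min, int[] max, Random rnd) {
            int[] child = Arrays.copyOf(genome, genome.length);
            int i = rnd.nextInt(child.length);
            child[i] = min[i] + max[i] - child[i];  // opposition-based "opposite guess"
            return child;
        }

        public static void main(String[] args) {
            int[] genome = {3, 7, 1};
            int[] min = {0, 0, 0};
            int[] max = {9, 9, 4};
            System.out.println(Arrays.toString(mutate(genome, min, max, new Random(42))));
        }
    }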

Towards multi-modal causability with Graph Neural Networks enabling information fusion for explainable AI


Title: Towards multi-modal causability with Graph Neural Networks enabling information fusion for explainable AI
Authors: Andreas Holzinger, Bernd Malle, Anna Saranti, Bastian Pfeifer (2021)
Journal: Information Fusion
Publisher: Elsevier

The authors describe a novel, holistic approach to an automated medical decision pipeline, building on state-of-the-art Machine Learning research yet integrating the human-in-the-loop via an innovative, interactive, exploration-based explainability technique called counterfactual graphs. They outline the necessity of computing a joint multi-modal representation space in a decentralized fashion, for reasons of scalability and performance as well as ever-evolving data protection regulations. This effort is intended as a motivation for the international research community and a launchpad for further work in the fields of multi-modal embeddings, interactive explainability, counterfactuals, causability, and the necessary foundations for effective future human–AI interfaces.

More: https://featurecloud.eu/wp-content/uploads/2021/03/Holzinger-et-al_2021_Towards-multi-model-causability.pdf

Sustainable computational science: the ReScience initiative


Title: Sustainable computational science: the ReScience initiative
Authors: Nicolas Rougier, Konrad Hinsen and others
Journal: PeerJ Computer Science
Publisher: PeerJ Inc.

Computer science offers a large set of tools for prototyping, writing, running, testing, validating, sharing and reproducing results; computational science, however, lags behind. In the best case, authors may provide their source code as a compressed archive and feel confident that their research is reproducible. But this is not exactly true.
Jonathan Buckheit and David Donoho argued more than two decades ago that an article about computational results is advertising, not scholarship: the actual scholarship is the full software environment, code, and data that produced the result. This implies new workflows, in particular in peer review. Existing journals have been slow to adapt: source code is rarely requested and is hardly ever actually executed to check that it produces the results advertised in the article.
ReScience is a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research can be replicated from its description. To achieve this goal, the whole publishing chain is radically different from that of traditional scientific journals: ReScience resides on GitHub, where each new implementation of a computational study is made available together with comments, explanations, and software tests.

More: https://www.labri.fr/perso/nrougier/papers/10.7717.peerj-cs.142.pdf

Hardware Versus Software Fault Injection of Modern Undervolted SRAMs


Researchers from the Barcelona Supercomputing Center (Spain) and Abdullah Gul University in Kayseri (Turkey) share an approach for applying real undervolting SRAM fault maps to a simulated system and observing the resiliency of applications.
They compare this hardware-guided fault injection approach with a random fault injection approach. Significant differences appear in the coarse categorization of the resiliency of the application, and they become more obvious as the number of faulty bits increases. There are also differences in the quality of the output between the two techniques. This is because in a realistic system not all fault locations are equally likely to exhibit faults, so from the software perspective the faults propagate to only a limited number of software structures.
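
As an illustration of the difference between the two strategies (this is not the authors' code), the sketch below flips bits in a byte array either uniformly at random or according to a per-bit fault-probability map of the kind a hardware characterization would provide.

    // Illustrative sketch: random vs. map-guided bit-flip injection in memory.
    import java.util.Random;

    public class FaultInjector {

        /** Flips a fixed number of bits at uniformly random locations. */
        static void randomInjection(byte[] mem, int faults, Random rnd) {
            for (int i = 0; i < faults; i++) {
                int bit = rnd.nextInt(mem.length * 8);
                mem[bit / 8] ^= (byte) (1 << (bit % 8));
            }
        }

        /** Flips bits according to a measured per-bit fault probability map. */
        static void mapGuidedInjection(byte[] mem, double[] faultProbability, Random rnd) {
            // faultProbability[i] is the measured chance that bit i fails at low voltage.
            for (int bit = 0; bit < mem.length * 8; bit++) {
                if (rnd.nextDouble() < faultProbability[bit]) {
                    mem[bit / 8] ^= (byte) (1 << (bit % 8));
                }
            }
        }

        public static void main(String[] args) {
            byte[] mem = new byte[4];
            double[] probs = new double[32];
            probs[5] = 0.9;  // one weak cell dominates, as a real fault map might show
            mapGuidedInjection(mem, probs, new Random(1));
            System.out.printf("mem[0] = 0x%02x%n", mem[0]);
        }
    }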

More

Corrective Commit Probability Code Quality Metric


An article signed by Idan Amit and Dror G. Feitelson from the Department of Computer Science at the Hebrew University of Jerusalem presents a code quality metric, the Corrective Commit Probability (CCP).

This metric measures the probability that a commit reflects corrective maintenance. The authors argue that this metric agrees with developers' concept of quality and is informative and stable. Corrective commits are identified by applying a linguistic model to the commit messages. The team computed the CCP of all large active GitHub projects (7,557 projects with 200+ commits in 2019). This leads to a quality scale, suggesting that the bottom 10% of projects spend at least 6 times more effort on fixing bugs than the top 10%. Analysis of project attributes shows that lower CCP (higher quality) is associated with smaller files, lower coupling, use of languages like JavaScript and C# as opposed to PHP and C++, fewer developers, lower developer churn, better onboarding, and better productivity. Among other things, these results support the "Quality is Free" claim and suggest that achieving higher quality need not require higher expenses.
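
As a toy illustration of the metric itself, the sketch below flags commits whose messages look corrective using a simple keyword heuristic and takes the fraction; the paper applies a trained linguistic model rather than keyword matching.

    // Toy CCP: fraction of commits whose messages look corrective.
    import java.util.List;
    import java.util.regex.Pattern;

    public class CcpSketch {

        private static final Pattern CORRECTIVE = Pattern.compile(
                "\\b(fix(es|ed)?|bug|defect|error|fault)\\b", Pattern.CASE_INSENSITIVE);

        static double correctiveCommitProbability(List<String> commitMessages) {
            long corrective = commitMessages.stream()
                    .filter(m -> CORRECTIVE.matcher(m).find())
                    .count();
            return (double) corrective / commitMessages.size();
        }

        public static void main(String[] args) {
            List<String> log = List.of(
                    "Fix NPE in parser",
                    "Add CSV export",
                    "fix off-by-one in pagination",
                    "Update documentation");
            System.out.printf("CCP = %.2f%n", correctiveCommitProbability(log));  // prints CCP = 0.50
        }
    }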

MongoDB, A Database For Document Stores


A potential acquisition target for Oracle or Microsoft, MongoDB leads the document store market and is now ranked #5 among all DBMS (source: DB-Engines). It is at the heart of the DECODER PKM and also of multiple single-page websites built on the MEAN stack (MongoDB, Express, Angular, Node.js).

In a recent article, Eric Weiss, an analyst at several large banks, sees MongoDB as the clear-cut leader within the high-growth, non-relational database SaaS sector. "MongoDB has been and will continue to be an indirect beneficiary of high-growth megatrends such as AI, Machine Learning, IoT (Internet-of-Things) and digitalization. Each of these trends has sparked an exponential growth in the supply of unstructured data, resulting in an increasing demand for non-relational (NoSQL) database solutions. Such databases can much more efficiently handle this new flow of data workloads compared to more traditional relational, SQL-based solutions."
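
For readers new to document stores, here is a minimal sketch using the MongoDB Java driver (mongodb-driver-sync); the connection string and the database and collection names are placeholders.

    // Minimal sketch: store and query a schemaless JSON-like document.
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    import static com.mongodb.client.model.Filters.eq;

    public class MongoExample {
        public static void main(String[] args) {
            // Placeholder connection string; assumes a local MongoDB server.
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> articles =
                        client.getDatabase("pkm").getCollection("articles");

                // Documents need no predeclared schema.
                articles.insertOne(new Document("title", "Readings")
                        .append("tags", java.util.List.of("nosql", "document-store")));

                Document found = articles.find(eq("title", "Readings")).first();
                System.out.println(found != null ? found.toJson() : "not found");
            }
        }
    }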

Big Code has a direct impact on the business outcomes


For developers, code releases are "emotional" events. Many feel fear and anxiety at the moment they release code or submit it for review, fearing they will break dependencies.

Indeed, managing large and complex code bases (Big Code) can become laborious, time-consuming and costly. Joe McKendrick's article refers to a 2020 survey of 500 North American professional developers compiled by Dimensional Data and underwritten by Sourcegraph. The Emergence of Big Code survey highlights a dramatic growth in the volume and complexity of software code.

It's almost unanimous: 99% of respondents report that Big Code has a direct impact on the business outcomes of software development efforts. Challenges include new hires taking longer to become productive (62%), code breaking due to a lack of understanding of dependencies (57%), and difficulties managing changes to code (50%).

Read the full article in ZDnet: https://www.zdnet.com/article/low-and-no-code-are-wonderful-but-a-big-code-world-lurks-underneath/

Machine Learning for Cybersecurity


Automated Vulnerability Detection in Source Code Using Minimum Intermediate Representation

Vulnerabilities are one of the root causes of network intrusion. An effective way to mitigate security threats is to discover and patch vulnerabilities before an attack. Traditional vulnerability detection methods rely on manual participation and incur a high false positive rate. Intelligent vulnerability detection methods, in turn, suffer from long-term dependency problems, out-of-vocabulary issues, coarse detection granularity and a lack of vulnerable samples.
This paper proposes an automated and intelligent vulnerability detection method for source code, based on minimum intermediate representation learning.
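
One generic preprocessing step used in this family of methods, and one way to tame the out-of-vocabulary problem mentioned above, is renaming user-defined identifiers to canonical placeholder tokens before learning. The sketch below shows that step in isolation; it is not the paper's exact pipeline.

    // Generic identifier normalization: rare user-defined names become VAR1,
    // VAR2, ... so the learned vocabulary stays small and closed.
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class TokenNormalizer {

        static String normalize(String[] tokens, java.util.Set<String> keep) {
            Map<String, String> renames = new LinkedHashMap<>();
            StringBuilder out = new StringBuilder();
            for (String t : tokens) {
                String token = t;
                // Rename identifiers that are not language keywords or known API names.
                if (t.matches("[A-Za-z_]\\w*") && !keep.contains(t)) {
                    token = renames.computeIfAbsent(t, k -> "VAR" + (renames.size() + 1));
                }
                out.append(token).append(' ');
            }
            return out.toString().trim();
        }

        public static void main(String[] args) {
            String[] tokens = {"if", "(", "userInput", ">", "bufferSize", ")",
                    "memcpy", "(", "dst", ",", "src", ",", "userInput", ")", ";"};
            java.util.Set<String> keep = java.util.Set.of("if", "memcpy");
            System.out.println(normalize(tokens, keep));
            // prints: if ( VAR1 > VAR2 ) memcpy ( VAR3 , VAR4 , VAR1 ) ;
        }
    }
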
More
