News
Next Seminar on 19.06.2024
Written on 18.06.2024 09:59 by Niklas Medinger
Dear All,
The next seminars take place on 19.06.2024 at 14:00, in two parallel sessions (Session A and Session B).
Session A: (14:00-14:30, 15:00-15:30)
Björn Karthein, Heyang Li
https://cispa-de.zoom.us/j/96786205841?pwd=M3FOQ3dSczRabDNLb3F1czVXVUpvdz09
Meeting-ID: 967 8620 5841
Passcode: BT!u5=
Session B: (14:00-15:30)
Ben Rosenzweig, Justus Sparenberg, Milan Conrad
https://cispa-de.zoom-x.de/j/66136901453?pwd=YVBSZU9peUpvUlk4bWp3MDR4cGlUUT09
Session A:
14:00 - 14:30
Speaker: Björn Karthein
Type of talk: Master Final
Advisors: Prof. Dr. Andreas Zeller, Dr. Cristian Staicu
Title: Exploring the Suitability of Input Invariants for Automated Testing of Web Forms
Research Area: RA5
Abstract:
Web-based applications are omnipresent in today’s world. Web applications often rely on user input to interact with, or get information from, the end user. Most modern websites employ client-side validation to verify user inputs directly inside the browser, improving the responsiveness and accessibility of the website. Due to their popularity, web applications are also employed in fields that require them to be secure and accessible to the broadest possible group of users, such as banking or healthcare. Testing these sorts of applications thoroughly is important to guarantee that the expected standards are met. In this thesis, we present a novel approach to automatically extract constraints on web form values from the client-side source code. The extracted constraints are merged into a specification that defines an input invariant over the expected web form input values by encoding their syntactic and semantic properties. The obtained specification uses well-known language standards, which makes it easy to understand and reason about. Furthermore, the specification is freely editable, which allows additional input properties to be encoded manually. A further contribution of this thesis is a solution to automatically test the web form with values that are generated on the basis of the specification. The approach allows for the generation of valid values that conform to the extracted specification, as well as invalid values that purposely violate it. We evaluate the reliability and correctness of our approach on web forms of real-world applications. We succeed in extracting a specification for all tested web forms and successfully identify and extract JavaScript validation constraints for two of the applications. In a subsequent experiment, we generate multiple valid and invalid test cases for the subject applications on the basis of the previously extracted specification. For every set of generated test inputs, we attempt to submit the form and check whether the values pass client-side validation or not. Across all subjects, we report an overall accuracy of 83%, with a precision of 69% and a recall of 96% for valid value generation, and a precision of 97% and a recall of 76% for invalid value generation.
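To make the idea of an input invariant more concrete, the following sketch shows a hypothetical, much-simplified per-field specification together with valid/invalid value generation. The field names, constraint format, and generation strategy are illustrative assumptions, not the thesis' actual specification format.

```python
# Illustrative sketch only: a hypothetical "input invariant" for two form fields
# and a naive generator of valid/invalid values.
import random
import re
import string

# Hypothetical specification: per-field constraints as they might be extracted
# from client-side validation code.
SPEC = {
    "age":   {"type": "int", "min": 18, "max": 120},
    "email": {"type": "str", "pattern": r"^[\w.+-]+@[\w-]+\.[\w.]+$", "max_len": 64},
}

def generate_valid(field):
    """Produce a value that conforms to the field's constraints."""
    c = SPEC[field]
    if c["type"] == "int":
        return random.randint(c["min"], c["max"])
    # Naive string generation that happens to satisfy the example pattern.
    local = "".join(random.choices(string.ascii_lowercase, k=8))
    return f"{local}@example.com"

def generate_invalid(field):
    """Produce a value that purposely violates one constraint."""
    c = SPEC[field]
    if c["type"] == "int":
        return c["min"] - 1          # below the allowed minimum
    return "not-an-email"            # violates the pattern

def passes_validation(field, value):
    """Re-check a value against the specification (stand-in for client-side validation)."""
    c = SPEC[field]
    if c["type"] == "int":
        return isinstance(value, int) and c["min"] <= value <= c["max"]
    return (isinstance(value, str) and len(value) <= c["max_len"]
            and re.match(c["pattern"], value) is not None)

if __name__ == "__main__":
    for field in SPEC:
        valid, invalid = generate_valid(field), generate_invalid(field)
        print(field, valid, passes_validation(field, valid),
              invalid, passes_validation(field, invalid))
```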
15:00 - 15:30
Speaker: Heyang Li
Type of Talk: Master Final
Advisors: Prof. Dr. Andreas Zeller, Fengmin (Paul) Zhu
Title: Monitoring Data Flow with Context-Free Grammars
Research Area: RA3
Abstract:
Monitoring is a lightweight and efficient formal method that provides correctness guarantees by observing an execution of the software system. Observing the behaviours of software systems is challenging: users either have to intrude into the monitored system or analyze a large amount of logs or interaction messages. However, existing non-intrusive monitoring methods have strict restrictions on the formats of logs and messages, so online monitoring is rather inflexible for arbitrary software systems.
In this thesis, we present a grammar-based monitoring method. In contrast to prior online monitoring methods for specific systems with restrictive log format requirements, our method is able to monitor any kind of logs and messages, as long as their formats can be encoded by context-free grammars. We propose a declarative specification language that can declare operations on a single log entry, as well as data dependencies and temporal relations between logs and messages. We also propose a monitoring algorithm to evaluate a series of logs or messages against specifications. Our method is able to evaluate complex dependencies between logs and messages without looking up predecessors and successors. We discuss the expressiveness of our language and evaluate our implementation on three case studies from different areas, demonstrating that our specification language is able to express real-world properties and that our monitoring algorithm is able to detect validations of and violations against the specification efficiently.
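As a rough illustration of grammar-based log monitoring, the sketch below parses toy log entries and checks one data dependency across entries. The entry format is encoded with a regular pattern here for brevity, whereas the approach described above supports full context-free grammars and a dedicated specification language.

```python
# Minimal illustrative sketch (not the thesis' actual specification language):
# parse structured log lines and monitor a data dependency across entries,
# e.g. every "close(ID)" must be preceded by a matching "open(ID)".
import re

# Toy log-entry format; richer formats would be described by a context-free grammar.
LOG_ENTRY = re.compile(r"^(?P<ts>\d+)\s+(?P<op>open|close)\((?P<res>\w+)\)$")

def parse(line):
    m = LOG_ENTRY.match(line.strip())
    if m is None:
        raise ValueError(f"unparsable log line: {line!r}")
    return int(m.group("ts")), m.group("op"), m.group("res")

def monitor(lines):
    """Online check: report violations as soon as they are observed."""
    open_resources = set()
    violations = []
    for line in lines:
        ts, op, res = parse(line)
        if op == "open":
            open_resources.add(res)
        elif op == "close":
            if res not in open_resources:      # data dependency violated
                violations.append((ts, f"close({res}) without prior open"))
            open_resources.discard(res)
    return violations

if __name__ == "__main__":
    log = ["1 open(f1)", "2 close(f1)", "3 close(f2)"]
    print(monitor(log))   # -> [(3, 'close(f2) without prior open')]
```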
Session B:
14:00 - 14:30
Speaker: Ben Rosenzweig
Type of talk: Bachelor Final
Advisor: Dr.-Ing. Aurore Fass
Title: Machine Learning-Based Approach for Detecting Malicious Browser Extensions
Research Area: RA5: Empirical and Behavioural Security
Abstract:
Millions of people use browser extensions to enhance the functionality of their web browsers. Some browser extensions require elevated privileges, which web pages typically do not have. Attackers can abuse these privileges, for example, to steal user data, inject unwanted additional advertisements into websites, or manipulate search results. To protect users from these threats, we create a system based on static analysis and machine learning to detect malicious browser extensions. Surprisingly, we show that relying merely on metadata (e.g., the number of JavaScript files included in an extension or the number of active users) is sufficient to identify malicious extensions.
We train and test our system on 70,738 Chrome extensions, achieving an accuracy of up to 98.37%, a false-positive rate of 1.29%, and a false-negative rate of 4.61%. Additionally, we evaluate our system on an extra set of 35,462 (unlabeled) extensions that were not used for training or testing, and identify 1,345 potentially malicious extensions. Given the high accuracy and low overhead of our approach, we envision that it could be added to the vetting process of the Chrome Web Store.
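The following sketch illustrates, with assumed feature names and synthetic data, how metadata-style features could feed an off-the-shelf classifier; the actual system, features, and model used in the talk may differ.

```python
# Hedged sketch: classifying extensions from metadata-style features with an
# off-the-shelf classifier. Feature names, data, and the choice of a random forest
# are illustrative assumptions, not the talk's actual pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in for metadata features, e.g. [number of JS files, active users, rating].
X = rng.integers(0, 1000, size=(2000, 3)).astype(float)

# Synthetic labels (1 = malicious) with 10% label noise; a real dataset would
# come from labeled extensions.
y = (X[:, 0] > 500).astype(int)
flip = rng.random(2000) < 0.1
y[flip] = 1 - y[flip]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```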
14:30 - 15:00
Speaker: Justus Sparenberg
Type of talk: Bachelor Final
Advisor: Sven Bugiel
Title: Detecting, Categorizing & Evaluating App Permission Rationales
Research Area: RA5: Empirical and Behavioural Security
Abstract: Access to user data and necessary smartphone functionalities, granted through user permissions, is vital in ensuring the functionality of services provided by apps. Access to the camera, the microphone, the contacts, or any other private data may be relevant. Whether it is Google or an independent developer, local laws, and thereby the restrictions set by the app stores, make it mandatory for developers to consider user consent when the app requires access to private data. Whether developers ask when the app is used for the first time or use runtime permissions to ask only when the data is needed for the next task, they want to ensure that the user consents to the required permission requests. It is beneficial for developers to provide the user with a rationale that justifies the app's need for private data. Rationales have been shown to be a vital tool for developers that influences users' willingness to grant permissions. Therefore, they should be studied further so that they can help developers improve how they convey their legitimate need for a user's private data.
This thesis aims to build a pipeline to extract and categorize developer rationales directly from Android APKs, with the goal of analyzing not only individual apps but also collecting data for large-scale studies of user permission rationales. This is achieved with the help of machine-learning transformers that identify and categorize the rationales from the string data extracted from the APKs. In the process, this thesis also looks at the limitations and problems that can occur with this method and explores ways to deal with some of these, such as a lack of labeled data. There is still potential for improvement, but the data collection method used in this thesis showcases promising results.
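As an illustrative sketch of transformer-based rationale categorization, the snippet below applies an off-the-shelf zero-shot classifier to candidate strings; the model, labels, and example strings are assumptions for illustration, not the thesis' actual pipeline or taxonomy.

```python
# Hedged sketch: categorizing candidate rationale strings with an off-the-shelf
# transformer via zero-shot classification (model and labels are placeholders).
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Candidate strings as they might be extracted from an APK's string resources.
candidates = [
    "We need access to your camera to scan QR codes.",
    "Tap the button to start a new game.",
]
labels = ["permission rationale", "unrelated UI text"]

for text in candidates:
    result = classifier(text, candidate_labels=labels)
    print(text, "->", result["labels"][0])
```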
15:00 - 15:30
Speaker: Milan Conrad
Type of talk: Master Final
Advisor: Tural Mammadov
Supervisor: Andreas Zeller
Title: Learning UI Models from Web Apps
Research Area: RA3: Threat Detection & Defense
Abstract:
In the dynamic interplay between web developers and users, misalignments in intent
and expectations can significantly impact the user experience. Addressing this challenge,
our research introduces a predictive model utilizing transformer-based architectures
to accurately anticipate changes in the Document Object Model (DOM) as a direct
consequence of user actions, such as clicks and hovers.
This approach aims to bridge the gap between developer intentions and user interactions by enabling
web applications to adapt dynamically in real-time, without requiring direct user input.
In order to identify difficulties in training a model with the objective of accurately predicting DOM transformations, we trained and evaluated three different transformer-based models on various datasets of different sizes and complexities. Afterwards, we conducted a detailed analysis of the models' performance as well as a failure analysis, allowing us to identify the challenges and limitations of current state-of-the-art models in predicting DOM transformations.
A critical finding of our exploration is the enhanced efficiency and predictive accuracy achieved through fine-tuning large language models (LLMs), such as Mistral7B and Llama3-8B. This method significantly outperforms the traditional approach of training transformer models based on the GPT-2 architecture from scratch on the same datasets as used for fine-tuning, demonstrating the advantages of applying pre-trained models to the specific domain of predicting user-induced DOM transformations. While training GPT-2 from scratch failed completely at predicting diffs induced by user interactions, reaching exact matches in 0% of the test cases, fine-tuning Mistral7B and Llama3-8B achieved a significant increase, reaching exact-match proportions of up to 68%.
Our research examines how various factors, such as the level of detail in the DOM representation, the complexity of the changes to be predicted, an advanced prompting strategy, and the language of the content, affect model performance. These investigations demonstrate the complex nature of web page dynamics, revealing the underlying challenges involved in accurately predicting the effects of user actions on the DOM.
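To illustrate the general prediction setup, the sketch below prompts a causal language model with a serialized DOM snippet and a user action and asks it to generate the resulting diff; the prompt format and the stand-in model are assumptions, not the exact setup used in the thesis.

```python
# Hedged sketch of the prediction setup: a causal LM is prompted with a serialized
# DOM snippet plus a user action and asked to generate the resulting DOM diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the thesis fine-tunes larger models such as Mistral7B / Llama3-8B
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical prompt format: DOM state + user action, with the diff to be generated.
prompt = (
    "DOM: <button id='menu'>Open</button><div id='panel' hidden></div>\n"
    "ACTION: click #menu\n"
    "DIFF:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id)

# Print only the newly generated tokens, i.e. the model's predicted diff.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```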