Project 2 - Deadline Extension and Some Hints
Given that lots of you seem to have problems reaching the target coverage for Part II, we have decided to extend the deadline to Friday (13th of December) midnight.
I think some of you might be confused by the format of the lecture, so I want to restate some of the organizational principles: The lecture's evaluation is project based. Unlike exercises, the projects are not designed to make you simply apply the basic notions presented in the lecture but rather to make you think more in depth about them. As such, projects can be more difficult and this is why I expect you to ask questions and discuss the project with me if you encounter difficulties or if you are unsure how to solve a particular task. I have been here at the lectures, on Askbot and answering emails for this very purpose.
Our goal is to help you succeed and learn more about fuzzing, not to have you grow frustrated because you cannot reach a target coverage. As a matter of fact, this project is a simplified version of the original one, and the difficulty was considered appropriate for a project lasting over three weeks. Feedback on the difficulty level and the problems you encounter would have been appreciated well before yesterday.
Here are a few additional hints that might help you solve Part II:
- The target program processes the seeds at a high level of abstraction (boxes, placement properties, ...), so inserting random character-level mutations has a low chance of helping you achieve (much) more coverage. In contrast, maintaining the structural integrity of the seeds might help you a lot.
- Having a character-level mutator does not mean you are restricted to "dumb" mutations, just that you cannot parse the seed into a structured form or insert structured elements. You are perfectly allowed to detect and replace, for example, alphanumeric words, numbers, etc. You are also allowed to have a dictionary of keywords (not tags) that can be used for the mutation.
- I already stated in the project that you are allowed to look at the target's source code to learn more about the input features that it uses. This includes having a look at the coverage that you have already achieved. The coverage module used in the project provides HTML reports with line-by-line coverage information.
- The notion of a fragment is pretty vague; it only means a part of the seed. You can choose what is relevant for your use case. Treating all HTML tags as equal when extracting fragments is probably not a good idea (for example, what good does it do to replace the top-level html tag with a random other fragment?).
- You are not restricted to having only one kind of fragment; you can build categories.
- Plotting the rate at which the coverage grows might help you figure out whether your changes are effective without performing the entire 1000 runs.
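To illustrate the hint about non-"dumb" character-level mutations, here is a minimal sketch that detects numbers and alphanumeric words with regular expressions and replaces them, without parsing the seed. Everything in it is an assumption for illustration: the function names, the replacement values, and in particular the `KEYWORDS` list, which you would fill with words you find in the target's source code.

```python
import random
import re

# Hypothetical keyword dictionary (an assumption, not part of the project):
# replace it with keywords you discover in the target's source code.
KEYWORDS = ["left", "right", "center", "top", "bottom"]

# A run of digits.
NUMBER = re.compile(r"\d+")
# A word that does not directly follow '<' or '/', so tag names such as
# "<div" and "</div" are left alone. This is only a heuristic, not a parser.
WORD = re.compile(r"(?<![<\w/])[A-Za-z]\w*")


def mutate_number(seed: str, rng: random.Random) -> str:
    """Replace one randomly chosen number with another value."""
    matches = list(NUMBER.finditer(seed))
    if not matches:
        return seed  # nothing to mutate
    m = rng.choice(matches)
    new = str(rng.choice([0, 1, 255, 65535, int(m.group()) * 2]))
    return seed[:m.start()] + new + seed[m.end():]


def mutate_word(seed: str, rng: random.Random) -> str:
    """Replace one randomly chosen word with a dictionary keyword."""
    matches = list(WORD.finditer(seed))
    if not matches:
        return seed  # nothing to mutate
    m = rng.choice(matches)
    return seed[:m.start()] + rng.choice(KEYWORDS) + seed[m.end():]


def mutate(seed: str, rng: random.Random) -> str:
    """Apply one of the two mutations, chosen at random."""
    return rng.choice([mutate_number, mutate_word])(seed, rng)
```

Because only the matched word or number is swapped, the surrounding brackets and quotes of the seed stay untouched, which is one way to keep its structure intact.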
You are of course totally free to solve this differently, but these might help achieve the required coverage.
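For the hint about watching the coverage growth, a small sketch like the following might be enough. It assumes that you can obtain, for each run, the set of covered locations as `(filename, line)` pairs; how you extract these from the coverage module depends on your setup, and the function names here are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Assumed representation of one covered location: (filename, line number).
Location = Tuple[str, int]


def cumulative_coverage(runs: Iterable[Set[Location]]) -> List[int]:
    """Return the number of distinct locations covered after each run."""
    seen: Set[Location] = set()
    growth: List[int] = []
    for covered in runs:
        seen |= covered
        growth.append(len(seen))
    return growth


def runs_since_last_gain(growth: List[int]) -> int:
    """Count how many of the most recent runs added no new coverage."""
    if not growth:
        return 0
    last = growth[-1]
    stalled = 0
    for value in reversed(growth[:-1]):
        if value < last:
            break
        stalled += 1
    return stalled
```

Plotting the list returned by `cumulative_coverage` against the run index for two variants of your mutator over, say, the first 100 runs already shows which curve rises faster, and a large value from `runs_since_last_gain` suggests that the fuzzer has stalled.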