Should large source files be split into smaller ones for workflow efficiency?
I am working on a study of risk associated with code changes made in the Mozilla Firefox browser.
Here is the ten-thousand-foot view of my approach:
- Get the list of all bugs that were fixed in Firefox in the past year [from Bugzilla].
- Assign a score to each bug fix.
- Example: if a source file was modified to fix a security bug in the most recent quarter, give it 20 points; if the fix landed four quarters ago, give it 5 points. In other words, a security fix scores 20, 15, 10, or 5 points depending on when it landed within the last 365 days.
If a source file was modified to fix a regression bug in the most recent quarter, give it 16 points; if the fix landed four quarters ago, give it 4 points. A regression fix therefore scores 16, 12, 8, or 4 points.
By the same logic, a general bug fix scores 12, 9, 6, or 3 points depending on whether it was fixed 0-90, 91-180, 181-270, or 271-365 days ago.
- Add up the scores for every source file and stack-rank the files in descending order.
- Each source file gets a cumulative score based on how many bugs are fixed in it, what types of bugs are fixed in it and when.
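The scoring scheme above can be sketched in a few lines of Python. This is a minimal illustration only; the record format, file names, and exact quarter boundaries are my assumptions, not the actual Bugzilla data model:

```python
from collections import defaultdict

# Base points per bug type; the score decays by quarter of age:
# full points at 0-90 days, then 3/4, 1/2, 1/4 of the base.
BASE_POINTS = {"security": 20, "regression": 16, "general": 12}

def fix_score(bug_type, days_ago):
    """Score one bug fix by type and age in days (0 outside 365 days)."""
    if not 0 <= days_ago <= 365:
        return 0
    if days_ago <= 90:
        quarter = 0
    elif days_ago <= 180:
        quarter = 1
    elif days_ago <= 270:
        quarter = 2
    else:
        quarter = 3
    return BASE_POINTS[bug_type] * (4 - quarter) // 4

def rank_files(fixes):
    """fixes: iterable of (file_path, bug_type, days_ago) records.
    Returns files stack-ranked by cumulative score, descending."""
    scores = defaultdict(int)
    for path, bug_type, days_ago in fixes:
        scores[path] += fix_score(bug_type, days_ago)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical records: (file, type, days since the fix landed)
fixes = [
    ("nsEditor.cpp", "security", 30),     # most recent quarter -> 20
    ("nsEditor.cpp", "regression", 200),  # third quarter back  -> 8
    ("nsDocShell.cpp", "general", 100),   # second quarter back -> 9
]
print(rank_files(fixes))  # -> [('nsEditor.cpp', 28), ('nsDocShell.cpp', 9)]
```

A real run would pull the fix records from Bugzilla and the touched file paths from the version-control history; the aggregation itself stays this simple.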
Note that this study covers only the fixed bugs in Bugzilla, not all reported bugs.
Without getting into too many details: the top 10 files in the stack-ranked order of fix score account for 12% of all bug fixes. Since the executable is built from thousands of files, that is a significant concentration.
All of these top-scoring files are large; for example, the number-one file contains fifteen thousand lines of code.
But if I drill into a source file and look at which methods were actually modified to fix the bugs, the hotspots are concentrated into a few segments of the file. A typical pattern is that about 20% of the source file contains 80% of the fixes.
It seems fairly obvious that the top 10 files should have additional review mechanisms in place, given that they have been centers of so much bug-fix activity. But the actual bug fixing in those files is not spread uniformly across them.
Here is my question: since the bug-fix hotspots within these files are concentrated, does it make sense to recommend splitting each large file into smaller ones, so that only the (small) files containing the hotspot segments are tracked for additional review, while the safer portions follow the normal check-in procedures?
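One way to quantify the "20% of the file holds 80% of the fixes" observation is to measure, per file, the smallest set of methods covering most fixes. A hypothetical sketch; the method names, sizes, and fix counts below are illustrative, not real Firefox data:

```python
def hotspot_fraction(functions, target=0.80):
    """functions: list of (name, line_count, fix_count) per method.
    Greedily take the most-fixed methods until `target` of all fixes
    are covered; return the fraction of the file's lines they occupy."""
    total_fixes = sum(f[2] for f in functions)
    total_lines = sum(f[1] for f in functions)
    covered_fixes = covered_lines = 0
    for _name, lines, fixes in sorted(functions, key=lambda f: f[2], reverse=True):
        covered_fixes += fixes
        covered_lines += lines
        if covered_fixes >= target * total_fixes:
            break
    return covered_lines / total_lines

# A 15,000-line file where three hot methods hold 80 of 100 fixes.
methods = [("HotA", 1500, 40), ("HotB", 900, 25), ("HotC", 600, 15),
           ("ColdA", 6000, 10), ("ColdB", 6000, 10)]
print(hotspot_fraction(methods))  # -> 0.2: 20% of the lines hold 80% of fixes
```

If this fraction stays small across the top-ranked files, then splitting out (or just tagging) the hotspot methods would put only a small slice of the code under the heavier review process.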
Comments
I think we should do it at function granularity, so that risky functions carry an attribute, e.g.

nsresult NEEDS_EXTRA_REVIEW
foo() {
}

where NEEDS_EXTRA_REVIEW is a macro that expands to nothing at compile time.
But splitting methods out of a file makes coding and reviewing harder. The nsEditor-related code, for example, has the methods of its classes spread across many different files, and following that code is far too difficult.
We should just merge those different files into one.
An interesting question is whether you can connect regressions to the bug fixes that introduced them, and then see whether fixes in particular files or places are more likely to cause regressions. That might indicate that the code is fragile and should be refactored or reviewed.
In fact, it would probably be worth evaluating anything you wanted to tag with NEEDS_EXTRA_REVIEW for possible refactoring.