Tuesday 17 February 2015

Bug and instability propagation, working with numbers

Look at the following numbers, which show the number of lines of code in some well-known software applications. I got them from informationisbeautiful.net:

Average iPhone app:            50,000 =   50K
Camino Web browser:           200,000 =  200K
Photoshop CS6:              4,500,000 =  4.5M
Google Chrome:              5,000,000 =  5.0M
Firefox:                    9,500,000 =  9.5M
MySQL:                     12,000,000 = 12.0M
Boeing 787:                14,000,000 = 14.0M
Apache Open Office:        22,000,000 = 22.0M
Windows 7:                 39,500,000 = 39.5M
Large Hadron Collider:     50,000,000 = 50.0M
Facebook:                  62,000,000 = 62.0M
Mac OS X Tiger:            85,000,000 = 85.0M
Human Genome:           3,300,000,000 =  3.3B (!!!)

I don't know how accurate these numbers are, but even if they are only 50% accurate, there is still something we have to think about:

You can't just start writing such an application on the fly, can you?

I'm talking to those guys (even good programmers) who think that, because they can write complex Java/C++/... classes or some code to interface with the Linux kernel and so on, they can write or manage medium, large or even huge applications like the ones in the list above the same way they have written code before.

Sorry guys, you can't do this.
You'll fail if you don't pay attention to how you are going to do it. It takes a lot of attention up front, and I'm sorry for you if your boss doesn't understand that and asks you to start writing code immediately.

Use your experience or search the Internet and you'll find that a good class or software module averages around 250 lines of code. Some say it should be even less, something like 100, but let's go with 250. So for the list above, Google Chrome would have about 20K software modules or classes, Facebook around 250K, and so on. Now, how can you build such software without any software communication topology?
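To make the arithmetic concrete, here is a minimal sketch that divides a line count by the assumed 250 lines per module. The function name estimate_modules and the constant LINES_PER_MODULE are made up for illustration, and the line counts are simply the ones from the list above.

```python
# Rough module-count estimate: lines of code divided by an assumed module size.
LINES_PER_MODULE = 250  # average module size assumed in the text

# Line counts taken from the list above (accuracy not verified).
LINES_OF_CODE = {
    "Average iPhone app": 50_000,
    "Google Chrome": 5_000_000,
    "Facebook": 62_000_000,
}


def estimate_modules(lines, lines_per_module=LINES_PER_MODULE):
    """Return the approximate number of modules, rounded up."""
    return -(-lines // lines_per_module)  # ceiling division on integers


for name, lines in LINES_OF_CODE.items():
    print(f"{name}: about {estimate_modules(lines):,} modules")
```

With 250 lines per module this gives 20,000 modules for Google Chrome and 248,000 for Facebook. If you prefer 100 lines per module, every count grows by a factor of 2.5.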

I've seen guys who even have a software design but have no idea what size of application they are dealing with. You need to work with numbers and define some rules and strategies to eliminate bug or instability propagation. Otherwise your small units or modules work on their own, but in the real application they get stuck or behave abnormally, and suddenly you'll see one of them get out of control.

Q: How to make sure software doesn't get out of control? 
A: Study how the human body works.

Have you ever seen how a neuron works? Its structure, its modules, how the submodules in it communicate with each other, ... it is the most amazing and wisely designed modular communication system you may find to study and learn from. Not just neurons: the whole human body is the most complicated machine, or in a way the most complicated modular system, that we can learn from.

[Figure: neuron cell diagram]
Recently I studied in depth how neurons work, how they communicate with each other and how we humans learn things. It is amazing; give it a try and read about it, and you'll learn many things that help you design and build better software. There is a hierarchy in the human body, and a large software application should have such a hierarchy too. Look at the neuron detail here ... , you need to have such a hierarchy or you will get lost somewhere.

Now how can you manage a medium-sized software application that you think is going to be around 1,000,000 lines of code? That is about 4K units or modules. No module in our body does everything; each one usually does a limited set of things but cooperates with the others, so design the software like that. You don't need a module that does everything by itself.

So you need to set some limits on the design for yourself and not cross them in any situation. For example, in order to design the hierarchy, you have to accept that any module is going to have a maximum of 250 lines of code, can belong to a fully connected mesh network of at most 10 siblings, and can have a maximum of 10 children in its body. This gives you a maximum of 10(10-1)/2 or 45 communication lines between the modules of one such network. That is still a lot if you ask me; not everybody can handle such a connection network well. In this case, if the application we are designing is going to have 5 top-level functions, the hierarchy will be something like the following:

Level 0:       1                    =    1          
Level 1:       1 x 5                =    5           
Level 2:       1 x 5 x 10           =   50          
Level 3:       1 x 5 x 10 x 10      =  500           
Level 4:       1 x 5 x 10 x 10 x 10 = 5000           
Total Modules:                      5556
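The level counts above can be reproduced with a few lines of Python. This is only a sketch of the counting rule (one root, then the top-level functions, then at most 10 children per module on each level below); the function name level_sizes and its parameters are made up for illustration.

```python
def level_sizes(top_functions, max_children, depth):
    """Return the module count per level, from level 0 (the root) to `depth`."""
    sizes = [1]  # level 0: the application itself
    if depth >= 1:
        sizes.append(top_functions)  # level 1: one module per top-level function
    for _ in range(2, depth + 1):
        sizes.append(sizes[-1] * max_children)  # each module gets max_children children
    return sizes


sizes = level_sizes(top_functions=5, max_children=10, depth=4)
total_modules = sum(sizes)
total_lines = total_modules * 250  # 250 lines per module, as assumed in the text

for level, count in enumerate(sizes):
    print(f"Level {level}: {count}")
print(f"Total modules: {total_modules}")
print(f"Maximum lines of code: {total_lines:,}")
```

Changing max_children or depth shows quickly how much room a given set of limits leaves for the application you have in mind.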

So such a hierarchy can support up to 5,556 modules, which is about 1.4M lines of code. We also have from equation (6) that the maximum chain reaction propagation for each module will be:

(6): ψ/(ρ-1) (n-1)^2 (Σ E + Σ Ni), n = 10  →  (6): 81 ψ/(ρ-1) (Σ E + Σ Ni)

Whether 81 is high or not we still don't know, but what we do know is that if you chose 20 instead of 10 you'd have 361, which is much worse, and if you chose 4 you'd have 9, which is much better. The difference between these numbers is that choosing a small number of modules per small network requires more discipline in your work and project management, which leads to a more stable result.
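Here is a small sketch that compares network sizes side by side. It only computes the two parts that depend on n: the number of links in a fully connected mesh, n(n-1)/2, and the (n-1)^2 coefficient from equation (6). The remaining terms of (6), ψ/(ρ-1) and (Σ E + Σ Ni), are left out because they don't change with n. The function names are made up for illustration.

```python
def mesh_links(n):
    """Number of links in a fully connected mesh of n modules: n(n-1)/2."""
    return n * (n - 1) // 2


def propagation_coefficient(n):
    """The (n-1)^2 coefficient that multiplies the other terms of equation (6)."""
    return (n - 1) ** 2


for n in (4, 10, 20):
    print(f"n = {n:2d}: {mesh_links(n):3d} links, coefficient {propagation_coefficient(n):3d}")
```

Going from 10 to 20 modules per network roughly quadruples both numbers, while dropping to 4 cuts the coefficient from 81 to 9.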