Saturday 21 February 2015

Sparse Distributed Representation

I don't know why, but we have learned that we somehow always are in out of storage memory or disk space situation. We have learned we always have to compress data, for example, to show 256 simple ASCII characters all we need is having 8 bits. Even this much of representation doesn't satisfy us and we go for compressing data using mathematics formulas and ....

I never forget the time I started to study and work on designing databases, tables, fields ... all was about normalizing and minimizing sizes as much as possible. Why should you keep duplicate information? This is a bad idea to have some pre-calculated fields in tables! and ... But today, you can't have applications with tons of data and insisting on having normalized data design or even relational design.

In fact, you may have heard that relational database design is better for programmers, make programming easier, while if your concern is dealing with huge amount of data, then the relational model doesn't work and you need to go for something else. Even the way we are keeping information in memory.

5 active bits in dense and sparse memory.

Dense Representation
This means if you want to store for example 256 different ASCII characters as 0 and 1 bits all you need is 8 bit or a Byte (2^8 = 256). Yes it is good for minimizing the storage space and is somehow a good coding because you can easily represent 'A' as 0x41 or 01000001 and 'B' as 0x42 or 01000010 and ... a simple coder/encoder can help you to store and retrieve the data. But this simple coder and encoder may not give you the best result all the time. What if you want to compare a string with a long list of strings? ... Or you want to recognize (which is also a comparison) a face of person in a database of pictures stored as dense bytes? Do you still want to compare bytes?

In dense representation (DR) the position of bits has no inherent meaning, they just have some mathematical order. The coder/decoder uses this mathematical position to give a meaning to the whole bits in a Byte. You can extend DR from Byte to Word (16 bits) or Double Word or Long (32 bits) or Double Long or Quad (64 bits) or ... all you are doing is just trying to represent as much as possible states in as less as bits.

Sparse Distributed Representation
In 1988 Pentti Kanerva who was working on human memory models introduced a mathematical model for how human memory stores data. For a minute think about all the things you know, do you think you can have a search through them if you store them as a computer do? How can you look for some specific data in the huge memory you have kept during your life? No you know better than me, it doesn't work.

What Kanerva introduced was that human memory stores data in a way that the address itself represents the data. Weird!? Not much, it has many benefits and one of its best is that it is much more efficient to have access to the thing you are looking for than the typical random access memories with search algorithms. Here addresses which are similar to each other are having related content too.

In this model of data representation, you use many bits instead of few, for example, 5,000 bits vs 8 or 32 or 64 bits we usually have. But the only small number of these bits are set to 1 and most of the are 0, this is in fact what sparse means. You can have all the possible states a Byte or Word or Long ... have, but each bit itself has inherent or semantic meaning.

To have a sparse memory, you have to have not more than 2% of the bits active so for 1,000 bits we can have 100 active bits. Which gives us the following number of states:

(1,000-0) x (1,000-1) x (1,000-2)  x ... x (1,000-99) ~ 1,000^100 = 10^3^100 = 10^300

Compare this number with the number of stars in Milky Way which is about 4x10^11!!! or number of starts in the whole universe which is about 10^29!!!

So you see this 1,000 bit gives us a huge amount of states or events or .... to keep while it may give you also to get easily searched too. SDR is the way all mammals keep track of stored events, pictures, .... in their brain.

All we talk was about what sparse meaning is, now let us see what distributed in SDR means. Remember we talked about that each bit in SDR has a meaning, so the meaning of the memory content is distributed in the memory and the active bit distribution doesn't have a pattern and is distributed across all bits, so you can't say if ...0000100 represents 'A' then ...0000101 represents 'B'.

Also, note that since each bit has a meaning, if the content of some bit changes from 1 to 0, the whole memory still has the meaning more or less like before.

This may be one of the reasons why we are able to recognize a friend even after 20 years when you see him/her in the street! 

And again because of this sparse distribution you don't need to compare all bits of two memory to make sure they are equal (remember the number 125x10^100).