Entropy is a rather cryptic, often misunderstood scientific term whose exact definition depends on the specific field and context; intuitively it can be interpreted as an amount of disorder, uncertainty or randomness. There are two main kinds of entropy: information entropy (information theory) and thermodynamic entropy (physics).

Information entropy is a basic concept in information theory -- watch out, this kind of entropy differs from entropy in physics (described below). We use entropy to express an "amount of hidden information" in events, messages, codes etc. This can be used e.g. to design compression algorithms, to use bandwidth more efficiently etc.

Let's first define what information means in this context (note that *information* here has a mathematical meaning, not exactly equal to the meaning of *information* in common speech). For a random event (such as a coin toss) with probability *p*, the amount of information we get by observing it is

*I(p) = log2(1/p) = -log2(p)*

The unit of information here is the bit (note the base 2 of the logarithm -- other bases can be used too, but then the units have different names), in information theory also known as the *shannon*. Let's see how the definition behaves: the less probable an event is, the more information its observation gives us (an event with probability 0, i.e. an impossible event, would theoretically give infinite information), while probability 1 gives zero information (observing something we know will happen tells us literally nothing).
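The behavior described above can be checked with a minimal Python sketch of the definition (the function name `information` is just an illustrative choice, not standard):

```python
from math import log2

def information(p):
    # information (in bits, i.e. shannons) gained by
    # observing an event that has probability p
    return log2(1 / p)

print(information(0.5))   # fair coin toss: 1.0 bit
print(information(1.0))   # certain event: 0.0 bits
print(information(1/8))   # rare event: 3.0 bits
```

Note that `information(0)` would fail with a division by zero, matching the "infinite information" limit mentioned above.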

Now the **entropy of a random variable** *X*, which can take values *x1*, *x2*, *x3*, ..., *xn* with probabilities *q1*, *q2*, *q3*, ..., *qn*, is defined as

*H(X) = sum(qi * I(qi)) = sum(qi * log2(1/qi))*

**How does entropy differ from information?** They are measured in the same units (bits); the difference is in interpretation -- in the current context information is basically what we know, while entropy is what we don't know, i.e. the uncertainty. So the entropy of a message (or rather of the probability distribution of possible messages to receive) says how much information will be gained by receiving it -- once we receive the message, the entropy kind of "turns into information", so the amount of information and the amount of entropy are actually the same. Perhaps the relationship is similar to that of energy and work in physics -- both are measured in the same units; energy is the potential for work and can be converted into it.

Entropy is greater if unpredictability ("randomness") is greater -- it is at its maximum when all possible values of the random variable are equally likely. For example the entropy of a fair coin toss is 1 bit, as both outcomes are equally likely (if one outcome were more likely than the other, entropy would go down).
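The coin toss claim can be verified with a short Python sketch of the entropy formula (terms with zero probability are skipped, since they contribute nothing):

```python
from math import log2

def entropy(probs):
    # Shannon entropy (in bits) of a probability distribution,
    # H = sum of q * log2(1/q) over all outcome probabilities q
    return sum(q * log2(1 / q) for q in probs if q > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit (the maximum for 2 outcomes)
print(entropy([0.9, 0.1]))   # biased coin: about 0.47 bits
print(entropy([1.0]))        # certain outcome: 0.0 bits
```

As the comments show, skewing the distribution away from uniform indeed lowers the entropy.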

More predictable events have lower entropy -- for example English text has quite low entropy per character because it is pretty easy to predict missing letters from the surrounding letters (there is a lot of redundancy in human language). Thanks to this we can compress the text, e.g. with Huffman coding -- compression reduces size, i.e. removes redundancy/correlation/predictability, and so increases entropy per bit of the compressed data.
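We can get a crude feel for this with a Python sketch that estimates entropy per character from single-letter frequencies (this is only an upper-bound estimate -- it ignores correlations between neighboring letters, which make the true entropy of English even lower):

```python
from collections import Counter
from math import log2

def text_entropy_per_char(text):
    # estimate entropy per character from single-letter
    # frequencies: H = sum of p * log2(1/p) over letter probabilities
    counts = Counter(text)
    n = len(text)
    return sum((c / n) * log2(n / c) for c in counts.values())

print(text_entropy_per_char("hello there, how are you doing?"))  # below log2(alphabet size)
print(text_entropy_per_char("abcdefgh" * 4))                     # uniform 8 letters: 3.0 bits/char
```

A distribution where all 8 letters are equally frequent gives exactly 3 bits per character, while real text, with its uneven letter frequencies, gives less.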

**Example**: consider a weather forecast for a specific area, day and hour -- our weather model predicts rain with 55% probability, cloudy with 30% probability and sunny with 15% probability. Once the specific day and hour comes, we will receive a message about the ACTUAL weather in the area. What entropy does such a message have? According to the formula above: *H = 0.55 * log2(1/0.55) + 0.3 * log2(1/0.3) + 0.15 * log2(1/0.15) ~= 1.41 bits*. That is the entropy and amount of information such a message gives us.
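The arithmetic of the example is easy to check in Python:

```python
from math import log2

# probabilities from the forecast example: rain, cloudy, sunny
probs = [0.55, 0.30, 0.15]

# entropy of the weather message: H = sum of p * log2(1/p)
H = sum(p * log2(1 / p) for p in probs)

print(round(H, 2))  # about 1.41 bits
```

Note this is less than log2(3) ~= 1.58 bits, the maximum possible for three outcomes, because the outcomes are not equally likely.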

**How is information entropy related to the physics entropy?**

TODO

**But WHY does entropy increase in the time-forward direction?** One may ask: if the laws of nature are time-symmetric, why is the forward direction of time special in that entropy increases in it? Just WHY is it so? Well, it really isn't -- entropy simply increases in both the time-forward and time-backward directions from a point of low entropy. Such a point of low entropy may be e.g. the Big Bang, since which entropy has been increasing in the time direction leading from the Big Bang towards us. Or the low entropy point may be a compressed gas: if we let such gas expand, its entropy will increase towards the future, but we may also look to the past, in which the gas had high entropy before we compressed it, i.e. here entropy locally increases towards the past as well. This is shown in the following image:

```
time
^ future
| . . . . . .. . . higher entropy (gas has expanded)
| . . . . . .
| . . . . . .
| .. . .. ..
| . .. ..
|_________....__________ low entropy (gas is compressed)
| .. . .
| . . .. .
| .. .. . .
| . .. . . . .
| . . . .. . . .. . higher entropy (we start compressing)
v past
```

All content available under CC0 1.0 (public domain). Send comments and corrections to drummyfish at disroot dot org.