Pixel, tokra has pretty much nailed down how the converter works. Two important details should be added to the explanation, however. First, the converter does not first reduce the image to 16 colours and then apply the restrictions of the VIC. Rather, it applies the restrictions first and only then reduces each character cell (with dithering!) to the colours that remain available in it (2 or 4). Second, I made sure that the converter processes *linear* intensities of the R, G and B channels.
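To illustrate the "restrict first, then reduce with dithering" order, here is a minimal sketch of dithering a single cell against only the 2 or 4 colours that cell can actually show. Floyd-Steinberg error diffusion and plain squared RGB distance are my assumptions for illustration, not necessarily what the converter uses; `dither_cell` is a hypothetical name. The error is diffused only within the cell, and pixel values are assumed to be linear intensities.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]  # linear R, G, B intensities in [0, 1]


def sq_dist(a: RGB, b: RGB) -> float:
    # Plain squared Euclidean distance; the converter's real weighting is not known.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dither_cell(cell: List[List[RGB]], allowed: Sequence[RGB]) -> List[List[int]]:
    """Reduce one attribute cell to its allowed colours (2 for hires, 4 for
    multicolour) with Floyd-Steinberg error diffusion kept inside the cell.
    Returns an index into `allowed` for every pixel."""
    h, w = len(cell), len(cell[0])
    work = [[list(p) for p in row] for row in cell]  # copy that accumulates diffused error
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            want = tuple(work[y][x])
            idx = min(range(len(allowed)), key=lambda i: sq_dist(want, allowed[i]))
            out[y][x] = idx
            err = [want[c] - allowed[idx][c] for c in range(3)]
            # Push the quantisation error onto not-yet-visited neighbours in this cell.
            for dx, dy, weight in ((1, 0, 7 / 16), (-1, 1, 3 / 16), (0, 1, 5 / 16), (1, 1, 1 / 16)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and ny < h:
                    for c in range(3):
                        work[ny][nx][c] += err[c] * weight
    return out
```

Because `allowed` is already the cell's restricted colour set, the dither can never produce a colour the cell is unable to display, which is the point of doing the reduction after the restrictions. The sketch ignores the double-width pixels of multicolour cells.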
The other details of the error/distance function between two colours (such as the colour weightings) and the accuracy of the palette used are much less important. But if you don't take gamma into account, darker areas of the image are inadvertently 'lit up' with stray pixels. Furthermore, the colour balance is disturbed, which is especially noticeable with weakly saturated colours (i.e., greys) in the source image.
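A small worked example of why ignoring gamma lights up dark areas, assuming the source image is sRGB-encoded (the post doesn't say which transfer curve the converter assumes). A black/white dither that should match a given grey needs a white-pixel share equal to that grey's *linear* intensity. Computed on the encoded values instead, the dither uses far too many white pixels in dark regions. `white_fraction` is a hypothetical helper.

```python
def srgb_to_linear(c: float) -> float:
    """Decode an sRGB-encoded channel value in [0, 1] to linear intensity."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(v: float) -> float:
    """Encode a linear intensity in [0, 1] back to an sRGB channel value."""
    return v * 12.92 if v <= 0.0031308 else 1.055 * v ** (1 / 2.4) - 0.055


def white_fraction(grey8: int, linear: bool) -> float:
    """Share of white pixels a black/white dither needs to match an 8-bit grey.

    Black and white are 0 and 1 in both encodings, so the share is just the
    target value, either as encoded or after linearisation."""
    c = grey8 / 255
    return srgb_to_linear(c) if linear else c


for g in (64, 128):
    print(g, round(white_fraction(g, False), 3), round(white_fraction(g, True), 3))
```

For grey 64 the gamma-unaware share (about 0.251) is roughly five times the linear one (about 0.051), so the dark area fills up with stray white pixels. Conversely, a 50 % checkerboard displays as 8-bit grey 188, not 128.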
For a given set of 3 global colours, the converter then uses the distance function to minimize the error of each attribute cell, trying out all 8 foreground colours in both hires and multicolour mode (16 combinations per cell). This can be done independently of the other attribute cells, so there's no combinatorial problem involved. In the end, the set of 3 global colours with the least total error over all attribute cells is output.
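The search described above can be sketched as an outer loop over global colour triples and an independent per-cell loop over foreground colour and mode. This is a minimal sketch under several assumptions: the MG mapping is taken as hires showing background/foreground and multicolour showing background/border/auxiliary/foreground; foreground colours are palette indices 0 to 7; every global slot may take any palette colour; and the per-cell error is nearest-colour error without dithering (for brevity). `best_cell` and `best_globals` are hypothetical names.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[float, float, float]  # linear intensities
Cell = List[RGB]  # pixels of one attribute cell, flattened


def sq_dist(a: RGB, b: RGB) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cell_error(cell: Cell, allowed: Sequence[RGB]) -> float:
    # Simplification: nearest-colour error, no dithering.
    return sum(min(sq_dist(p, c) for c in allowed) for p in cell)


def best_cell(cell: Cell, palette: Sequence[RGB], bg: int, border: int, aux: int,
              fg_choices: Sequence[int] = range(8)) -> Tuple[float, int, bool]:
    """Try every foreground colour in hires and multicolour; return (error, fg, multi)."""
    best: Optional[Tuple[float, int, bool]] = None
    for fg in fg_choices:
        for multi in (False, True):
            # Assumed mapping: hires = bg/fg, multicolour = bg/border/aux/fg.
            idx = (bg, border, aux, fg) if multi else (bg, fg)
            err = cell_error(cell, [palette[i] for i in idx])
            if best is None or err < best[0]:
                best = (err, fg, multi)
    assert best is not None
    return best


def best_globals(cells: Sequence[Cell], palette: Sequence[RGB],
                 fg_choices: Sequence[int] = range(8)) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustive search over (background, border, auxiliary).

    Each cell is optimised on its own, so the cost is linear in the number of cells."""
    best: Optional[Tuple[float, Tuple[int, ...]]] = None
    for triple in product(range(len(palette)), repeat=3):
        total = sum(best_cell(c, palette, *triple, fg_choices=fg_choices)[0] for c in cells)
        if best is None or total < best[0]:
            best = (total, triple)
    assert best is not None
    return best
```

Since no cell's choice affects another's, the work per global triple is just (number of cells) × 16 evaluations rather than 16 raised to the number of cells.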
I didn't release the converter, however, because often enough the restrictions of the underlying graphics mode (MG bitmap) *are* actually so severe that the converter just can't find a good set of 3 global colours. The result then becomes just a colourful pixel pulp, with the dithering helplessly roaming around trying to minimize the error. A "debug" version of the converter allows me to "clamp" one, two or all global colours, but even that doesn't always lead to improved results.
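The "clamp" option isn't described in detail. One plausible reading is that a clamped global colour is simply fixed, so only the remaining slots are searched. A minimal sketch of that idea, with a hypothetical helper `global_candidates` and a hypothetical 16-colour palette with no per-slot restrictions:

```python
from itertools import product
from typing import Iterator, Optional, Sequence, Tuple


def global_candidates(
    n_colours: int, clamp: Sequence[Optional[int]] = (None, None, None)
) -> Iterator[Tuple[int, ...]]:
    """Yield (background, border, auxiliary) index triples to search.

    A clamp entry of None leaves that slot free; an int fixes the slot
    to that palette index."""
    slots = [range(n_colours) if c is None else (c,) for c in clamp]
    return product(*slots)
```

Clamping one slot cuts 4096 candidate triples to 256, and clamping all three leaves a single triple. It narrows the search, but it cannot loosen the per-cell limits of the mode itself.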
So there's still a good opportunity for pixel artists to better the results of the converter. That's the good side of this.