Tuesday, December 23, 2008

Tabs vs. Spaces

You know that 'tab' key on your keyboard? I hate it. Not because it's not useful; I actually like it for the purpose of navigating through fields of a form. I hate it for programming. Why? Because it's another way of making invisible spaces.

Non-programmers are probably already losing interest at this point. I don't blame them; there's not really any controversy about tabs vs. spaces in normal text editing. So let me explain a bit about why it matters to programmers.

Source code (the stuff programmers write) is generally stored in a source-code management system (commonly referred to as an SCM). The SCM is in charge of storing versions of source files, so that as you add features or fix bugs, you can go back and see what changed.

The problem comes in when you switch between spaces and tabs: the SCM sees that lines have changed, even though only the invisible, non-functional whitespace on those lines is different. And each programmer has their own view on how tabs should be used, so sometimes one will 'fix' the spacing written by another.
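To make that concrete, here is a rough sketch using Python's standard difflib module as a stand-in for what an SCM's diff does. The two-line "file" and the names before and after are made up for illustration; a real SCM has its own diff engine, but the effect is the same.

```python
import difflib

# Hypothetical two-line source file, before and after someone "fixes" the spacing.
before = ["if ready:\n", "\tstart()\n"]    # indented with a tab
after = ["if ready:\n", "    start()\n"]   # same code, indented with 4 spaces

diff = list(difflib.unified_diff(before, after, "before", "after"))

# Keep only the removed/added lines, skipping the "---" and "+++" file headers.
changed = [line for line in diff
           if line[0] in "+-" and not line.startswith(("+++", "---"))]

# Ignoring leading whitespace, the code is identical.
same_code = [l.lstrip() for l in before] == [l.lstrip() for l in after]

print("".join(diff))
print("lines reported as changed:", len(changed))
print("code actually identical:", same_code)
```

The diff reports one line removed and one line added, even though nothing a compiler would care about is different.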

Source code files are written in a very structured format that includes code 'blocks', which are groupings of lines that are related to each other. These code blocks are indented from the left margin to easily distinguish them visually while scanning through the text.

The 'standard' view on tabs, which I disagree with, is that a tab is worth 8 spaces. Beyond that, indentation levels should be in increments of half a tab. The result is that your lines will start with 4 spaces, then 1 tab, then 1 tab plus 4 spaces, then 2 tabs... Alignment of adjacent lines within a block is done the same way, using tabs until you can only use spaces to finish. So basically, tabs are used as a lazy way of writing 8 spaces.
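If it helps to see that scheme spelled out, here is a small sketch of how the leading whitespace works out at each nesting level. The function name mixed_indent and the two constants are my own labels for illustration, not anything an editor actually exposes.

```python
TAB_WIDTH = 8     # the 'standard' tab width
INDENT_STEP = 4   # one indentation level = half a tab

def mixed_indent(level):
    """Leading whitespace for a nesting level under the mixed tab/space scheme."""
    columns = level * INDENT_STEP
    # Use as many whole tabs as fit, then pad the remainder with spaces.
    tabs, spaces = divmod(columns, TAB_WIDTH)
    return "\t" * tabs + " " * spaces

for level in range(1, 5):
    print(level, repr(mixed_indent(level)))
```

Levels 1 through 4 come out as 4 spaces, 1 tab, 1 tab plus 4 spaces, and 2 tabs, which is exactly the sequence described above.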

That completely wastes the possibilities of having a different type of spacing character.

My view on tabs is that they should only be used to indent blocks of code. Beyond the initial code block indentation, only spaces should be used to align adjacent lines within a block. The benefit here is that if I like indentation levels of 4 spaces, but you prefer indentation levels of 3 spaces, we can each set our editor tab width appropriately and it will just work. There's no reason to change someone else's code to match your preference, because everything stays aligned no matter what your tab width is set to.
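Here is a quick way to check that claim, sketched in Python. The str.expandtabs method renders tabs at a given width, roughly the way an editor would. The sample call (compute, first_argument, second_argument) is invented for the example, and is_aligned is just a helper I made up to compare columns.

```python
CALL = "result = compute("   # hypothetical code; line two aligns under the first argument

# Tabs for the block indentation, spaces only for the alignment.
tabs_then_spaces = [
    "\t" + CALL + "first_argument,",
    "\t" + " " * len(CALL) + "second_argument)",
]

# Mixed scheme (tab = 8, indent = 4): tabs are also used to reach the alignment column.
mixed = [
    "    " + CALL + "first_argument,",
    "\t\t" + " " * 5 + "second_argument)",
]

def is_aligned(lines, tab_width):
    """True if both arguments start in the same column at this tab width."""
    first, second = (line.expandtabs(tab_width) for line in lines)
    return first.index("first_argument") == second.index("second_argument")

for width in (3, 4, 8):
    print(width, is_aligned(tabs_then_spaces, width), is_aligned(mixed, width))
```

The first layout stays aligned at tab widths of 3, 4, and 8, while the mixed layout only lines up at the one width it was written for.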

But since the two main drivers of source code formatting are the Linux and Microsoft developers... the standard is not going to change. Linux standard practice is 1 tab = 8 spaces, with indentation levels of 1/2 tab width. Microsoft is a little bit better (which is painful for me to say), with standard practice being 1 tab = 4 spaces, indentation levels of 1 tab width. If Microsoft would just go on to say that alignment within a block must be done with spaces, we'd be partway to a better world.

Not that the Linux developers would care; they've hardcoded their editors to do it their way, and seeing Microsoft push something will only keep them from doing it out of spite. But maybe, just maybe, a few of them would realize that the additional flexibility would be beneficial and not just a Microsoft ploy to take over the world. But it probably would be, so never mind.