I've been one of the main people driving automatic formatting for the Nix language forward. In that role, one point of contention has come up again and again: where to put the commas separating function arguments, before or after each item?
This question is not new, though, and the leading-comma style can occasionally be observed in other languages as well, Haskell for example.
In this text I will, mostly using the Nix language as an example, show you that commas or other separators need to be put after the item. I will also go into the few rare exceptions. I approach the topic purely from a code formatting perspective, and try to discuss it as exhaustively as possible. I will thus not go into other topics affected by this, such as parsing and code generation, or languages where this encodes a semantic difference.
TL;DR:
- If your language supports trailing delimiters on the last item, using them is the strictly superior solution
- Therefore, every language should support them in all cases where this issue may arise
- In languages which don't, leading commas are the best possible workaround
- Having a leading delimiter is fine as long as it is also allowed in front of the first item, for example a YAML list starting each element with "-".
The issue
Let's say you have a list of items separated by a delimiter. I chose commas because this is where you see this issue arise the most often, but this generalizes to any token-separated list of expressions.
Let's start with lists as an example, because they are simple and easy: [ 1, 2, 3 ].
Now, let's say the list is getting a bit long and you want to put every item onto its own line:
[
1,
2,
3
]
A common thing to do is to append an item at the end:
[
1,
2,
- 3
+ 3,
+ 4
]
As you can see, to do this, one has to touch the line of the previous last element, despite nothing changing on it.
Now, if you have a language which allows a trailing comma on the last element, like Rust or Go, you can easily fix this:
[
1,
2,
3,
+ 4,
]
The diff only has one line. But if you are on a language which does not support this, like Haskell or JSON, you are out of luck. Since this is such a common problem in many situations and many languages, people have converged on a common workaround:
[ 1
, 2
, 3
]
[ 1
, 2
, 3
+, 4
]
In situations where a trailing comma is not allowed, this is very likely the best possible way to format the code. That's why people have independently converged on it again and again. However, as I will show, compared to simply having a trailing comma on the last item, this still has a lot of downsides.
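The difference in diff size can also be checked mechanically. The following is a small sketch in Python (chosen only because difflib ships with its standard library). The helper changed_lines and the styles table are names made up for this illustration. It counts how many lines a line-based diff touches when appending the item 4 in each of the three layouts shown above.

```python
import difflib


def changed_lines(before, after):
    """Count the lines a line-based diff marks as removed or added.

    Hypothetical helper for this illustration, built on difflib.SequenceMatcher.
    """
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            total += (i2 - i1) + (j2 - j1)
    return total


# Each entry: the list before and after appending the item 4,
# written as the lines of the file.
styles = {
    "separator only": (
        ["[", "1,", "2,", "3", "]"],
        ["[", "1,", "2,", "3,", "4", "]"],
    ),
    "trailing comma": (
        ["[", "1,", "2,", "3,", "]"],
        ["[", "1,", "2,", "3,", "4,", "]"],
    ),
    "leading comma": (
        ["[ 1", ", 2", ", 3", "]"],
        ["[ 1", ", 2", ", 3", ", 4", "]"],
    ),
}

for name, (before, after) in styles.items():
    print(f"{name}: {changed_lines(before, after)} changed line(s)")
```

Running it reports 3 changed lines for the plain separator layout and 1 for both the trailing-comma and the leading-comma layout.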
Case study: Function arguments in Nix
Okay, so let's design a format for function arguments with leading commas and try to make it as good as possible. I'll cover all relevant design questions. (More may come up during an actual implementation attempt, but this should cover most of the ground.)
Basics
Let's start simple: Just three arguments, all single-line. There are already two possibilities to choose from.
# 1
{
arg1
, arg2 ? null
, arg3
}:
# 2
{ arg1
, arg2 ? null
, arg3
}:
[]
You may say "obviously we want the second one", but this already produces an inconsistency: In lists and attribute sets, we don't start the first element on the same line as the opening bracket/brace.
[ first
second
third
]
{ one = "1";
two = "2";
three = "3";
}
Of course we could decide to format lists and attribute sets like this for consistency, but pushing this further down the line would create weird situations. In these examples, one cannot easily start the first element on the same line as the opening bracket/brace:
rec {
one = "1";
two = "2";
three = "3";
}
function call [
first
second
third
]
function call # long line
[ first
second
third
]
This may even have ripple effects onto other syntax constructs like parentheses. So it looks like accepting the one inconsistency in function declarations is the less bad option here.
Now let's have a look at the same situation, but with trailing commas. Same decision as last time:
# 1
{
arg1,
arg2 ? null,
arg3,
}:
# 2
{ arg1,
arg2 ? null,
arg3,
}:
[]
This time it is easy, though: we can simply pick style #1, where no consistency issues arise.
Multiline arguments
Things are already getting tricky.
# 1
{ one ? [
1
1
]
, two ?
if "two" == 2 then 2 else "two"
, three ? function [
1
2
]
}:
# 2
{ one ? [
1
1
]
, two ?
if "two" == 2 then 2 else "two"
, three ? function [
1
2
]
}:
# 3
{ one ?
[
1
1
]
, two ?
if "two" == 2 then 2 else "two"
, three ?
function [
1
2
]
}:
# 4
{ one
? [
1
1
]
, two
? if "two" == 2 then 2 else "two"
, three
? function [
1
2
]
}:
[]
There are many other variations to handle the indentation here, but the main issue in all of these is that it is ambiguous where the "base indentation" is. Does indentation start at the comma or at the argument?
On the trailing commas side, there is little to debate. The syntax looks so similar to bindings in attrsets that we can simply copy the rules over 1:1 (replacing "=" with "?" and ";" with ","). Yay for consistency!
# 1
{
one ? [
1
1
],
two ?
if "two" == 2 then 2 else "two",
three ? function [
1
2
],
}:
[]
@-pattern
In Nix, you can have the @-pattern either before or after the argument destructuring. Both are equivalent.
# 1
args@{
one
, two
, three
}:
# 2
{ one
, two
, three
}@args:
# 3
{ one
, two
, three
}
@args:
# 4
{ one
, two
, three
}
@ args:
# 5
{ one
, two
, three
} @ args:
[]
Currently in Nix, leading @ is the more prevalent convention (estimated through some quick greps through nixpkgs), but clearly it doesn't work here: the first argument without a leading comma just looks too inconsistent. (This is the same issue as three sections up coming to bite us again.)
So the only sensible option is to normalize them to the trailing version during formatting. (Note that this is not as trivial as it may look at first glance, since any token may have comments associated with it which need proper handling as well. But it should be doable.)
On the trailing commas side, it does not matter whether the @-pattern is before or after the arguments:
# 1
args@{
one,
two,
three,
}:
{
one,
two,
three,
}@args:
# 2
args @ {
one,
two,
three,
}:
{
one,
two,
three,
} @ args:
# 3
args@
{
one,
two,
three,
}:
{
one,
two,
three,
}
@args:
# 4
args @
{
one,
two,
three,
}:
{
one,
two,
three,
}
@ args:
[]
There are many possibilities on how to place the whitespace here, but in all cases this works without having to normalize the code.
Comments
Arguments may have comments associated with them. These may come in various forms, #, /*, /**, single-line or spanning multiple lines. Trailing comments after the argument name are trivial and don't need to be discussed.
# 1
{ /* comment */
one
, /*
multiline
comment
*/
two
, # comment
three
, # multiline
# comment
four
}:
# 2
/* comment */
{ one
/*
multiline
comment
*/
, two
/* comment */
, three
# multiline
# comment
, four
}:
[]
Style #1 has the disadvantage that the comma gets increasingly distanced from the argument it is associated with (comments may be long). It also loses the clear vertical line to the left which is one of the big advantages of a leading comma style.
Style #2 looks more promising though. When using # comments it even has a clear vertical line for visual reference. It may thus be tempting to normalize all /* comments to # for that purpose. This may be a controversial change on its own, but more importantly it cannot be done for /** doc comments. Additionally, style #2 has the issue that the first comment collides with a potential comment for the entire function, causing ambiguity which would break any documentation tooling.
Therefore, using style #1 for comments is the only viable option here.
On the trailing commas side, nothing really to discuss again:
# 1
{
/* comment */
one,
/*
multiline
comment
*/
two,
# comment
three,
# multiline
# comment
four,
}:
[]
Digression: Leading delimiters all the way down
There is an interesting observation to be made. Sometimes, expressions are not separated by a delimiter, but they are terminated by one. Think of the semicolon ending statements. Similarly in Nix:
let
foo = 1;
bar = x: x;
in
{}
1. The semicolon on the last item is not optional.
2. The trailing comma style with a comma after the last item mirrors this.
So, what if we flipped this on its head? After all, we are still trying out a style with leading commas.
{
, foo
, bar
}:
[]
Okay, this looks very weird, but one can clearly see that it solves one of the major issues with the usual leading commas style which comes up around the first element. One could even indent the commas by one level to solve the indentation ambiguities discussed earlier. This concept also generalizes to attribute sets:
{
; foo = 1
; bar = x: x
}
Probably the main reason you find this weird is that the comma is an unusual delimiter to start statements. Same concept in another context and suddenly it doesn't look that weird anymore:
- foo
- bar
Alex Rogozhnikov has a great writeup on this topic: Delimiter-first code. So while this idea requires syntax changes and thus is out of scope from a pure formatting perspective, if you find yourself designing a language, consider pausing briefly to ponder whether starting your statements with a delimiter, instead of terminating them with one, might be reasonable.
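One way to make the first-element special case concrete is to look at what happens when an argument is prepended. The following rough Python sketch uses only difflib from the standard library. The helper changed_lines and the argument name new are hypothetical and exist only for this illustration. It compares the usual leading-comma layout with the delimiter-first layout from above.

```python
import difflib


def changed_lines(before, after):
    """Count the lines a line-based diff marks as removed or added.

    Hypothetical helper for this illustration, built on difflib.SequenceMatcher.
    """
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            total += (i2 - i1) + (j2 - j1)
    return total


# Usual leading-comma layout: the first argument shares a line with the brace.
usual_before = ["{ foo", ", bar", "}:"]
usual_after = ["{ new", ", foo", ", bar", "}:"]

# Delimiter-first layout: every argument, including the first, starts with a comma.
# (This is not valid Nix today; it is the hypothetical syntax from the digression.)
first_before = ["{", ", foo", ", bar", "}:"]
first_after = ["{", ", new", ", foo", ", bar", "}:"]

print("usual leading commas:", changed_lines(usual_before, usual_after))
print("delimiter-first:", changed_lines(first_before, first_after))
```

With these inputs, prepending touches 3 lines in the usual leading-comma layout (the old first line is rewritten) and 1 line in the delimiter-first layout.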
Conclusion
When trying to create a good style with leading commas, at every corner we are faced with tough decisions between sub-optimal options and allowing inconsistencies. The only reason to use them in the first place is when they are the less bad option due to language restrictions. In fact, I don't know of any in-the-wild examples of people using leading commas despite the language supporting a trailing comma on the last item. (If you know of one, let me know, but examples where language support was added after the style was "established" don't count!)
Related work
Other people have written about this topic as well. For now I'll just collect it down here, maybe if my text is not convincing enough one of these will do. If you find something, send it to me!