Short variable names prohibited by grammar
Naming things using a single letter is consistently identified as a bad practice, and is even acknowledged as such by those who admit to sometimes "slipping up" and doing it themselves. So why not solve this by eliminating single-letter names in the grammar altogether?
Many languages adopt a rule that says, roughly, "identifiers must start with a letter which can be followed by one or more letters and digits". (Some allow for special characters like `_` and `$`, too.) Or, in EBNF:
ident = letter { letter | digit };
Initially, we might suggest changing the rule to "identifiers must start with a letter which must be followed by one or more letters, digits, or symbols", which means the minimum length for a valid identifier is 2. With two-letter identifiers, though, single-letter programmers will likely end up throwing in another consonant or tacking on an underscore, thereby satisfying the language's rules but subverting their spirit. I think the tipping point is 3. With a minimum length of 3, the ridiculousness of trying to thwart the rules without actually increasing the readability of the code becomes apparent even to the stalwarts, which should result in few holdouts.
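As a rough illustration of the three-character rule, here is a minimal sketch in Python of the check a lexer might apply. The names `MIN_IDENT_LENGTH` and `is_valid_identifier` are hypothetical, and treating `_` and `$` as letter-like is an assumption borrowed from the languages that allow them, not part of the rule itself.

```python
import re

# The "tipping point" argued for above; a language designer could pick another value.
MIN_IDENT_LENGTH = 3

# Assumption: '_' and '$' are accepted wherever a letter is, as some languages do.
_IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def is_valid_identifier(name: str, min_length: int = MIN_IDENT_LENGTH) -> bool:
    """Return True if `name` has the usual identifier shape and is long enough."""
    return len(name) >= min_length and _IDENT.fullmatch(name) is not None
```

Under this sketch `x`, `xy`, and `x_` are all rejected, while `_x_` and `viewer` pass, which is the workaround-versus-real-name tradeoff described above.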
Considerations
First, math-heavy algorithms frequently use single-letter and two-letter names to great effect. With a defensive grammar as outlined here, a parameter list `x, y` can trivially become `_x_, _y_`, although those names aren't easy to work with in the function body. I already think that languages should feature variable aliasing, in which case this mostly becomes a non-issue.
Second, it's easy to design this into the language from the start, but nearly impossible to do afterward unless the language has a clearly defined experimentation period, like Rust's pre-1.0 releases or Swift's early development, where the Swift folks famously got rid of `++` and `--`.
Third, given that most new languages' biggest problem is adoption, it's possible that those against this kind of change are the ones who have enough influence to affect adoption. (Or more perniciously, those who are subtly turned off by it but never voice their concerns, so it's impossible for leaders to be reactive or attribute failure during post-mortems to the correct mix of circumstances.)
Type-named objects
Consider the following snippet:
PROCEDURE PassFocus* (V: Viewer);
VAR M: ControlMessage;
BEGIN
M.id := defocus;
FocusViewer.handle(M);
FocusViewer := V;
END PassFocus;
(This is Oberon. It has flaws—annoying ones. Oberon is not my favorite language. I'm comfortable presenting the examples here in Oberon, however, because this snippet should be more or less understandable even to those who've never seen its syntax, and if I'm going to present any example, I'm going to do it in a dead language that no one really uses, so as not to play favorites and put undue focus on the one chosen.)
Note the use of the single-letter identifier `V` in the parameter list and the local variable `M`. Our `V` can be easily changed to `viewer`, and that would probably be the prescription in most code reviews where the initial naming would be seen as a problem. However, we're now running afoul of an awful lot of repetition, which is a frequent criticism of many languages with static type systems. It's often pointed out with classic Java, for example, that almost any time you do something, you end up repeating yourself, sometimes up to three times. E.g.:
FrobbedFoo frobbedFoo = new FrobbedFoo(bar);
This is why C#'s `var` keyword is seen as an improvement, and JVM languages have by now adopted similar constructs.
It's also said that naming things is one of the hardest things in CS. The line above raises other questions, too. For our `frobbedFoo`, should we perhaps be giving the local variable another name that describes it as something else? We're obviously dealing with a `FrobbedFoo`, and it is redundant to refer to it as such, so should we prefer to name it after its purpose in this context, i.e., what its role is in the procedure, rather than what kind of thing it is?
With type-named objects, we answer this hand-wringing by acknowledging that in many cases, the type alone is sufficient—not merely sufficient for the machine, but for the human reader, too. In languages with support for type-named objects, we therefore need not always give an object an explicit name. Instead we unambiguously refer to it in the local context using its type.
For example, one approach to designing a language with type-named objects would be to disambiguate with the keyword `the`. The example above becomes:
PROCEDURE PassFocus* (Viewer);
VAR ControlMessage;
BEGIN
(the ControlMessage).id := defocus;
FocusViewer.handle(the ControlMessage);
FocusViewer := the Viewer;
END PassFocus;
Compared to our single-letter identifiers in the preceding snippet, this results in more typing, but the programmer isn't pressed to stop and think of intermediate names to give to the two objects local to the procedure. This allows for maintaining an uninterrupted train of thought, and despite the higher demand for "human IO", type-named objects should be more productive and viewed as a programmer convenience.
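To make the lookup rule concrete, here is a minimal sketch in Python of how a compiler's scope table might resolve `the`. The names `Scope`, `declare_unnamed`, `the`, and `AmbiguousTypeNamedObject` are hypothetical, and rejecting a second unnamed object of the same type is an assumption of this sketch rather than a settled design.

```python
class AmbiguousTypeNamedObject(Exception):
    """Raised when a scope would hold two unnamed objects of one type."""


class Scope:
    """Hypothetical per-procedure table of unnamed (type-named) objects."""

    def __init__(self):
        # Maps a type name to the single unnamed object of that type.
        self._unnamed = {}

    def declare_unnamed(self, type_name, value):
        # Assumption: at most one unnamed object per type per scope.
        if type_name in self._unnamed:
            raise AmbiguousTypeNamedObject(
                f"'the {type_name}' would be ambiguous; give one object a name"
            )
        self._unnamed[type_name] = value

    def the(self, type_name):
        # Resolve `the T` to the unique unnamed object of type T.
        try:
            return self._unnamed[type_name]
        except KeyError:
            raise LookupError(f"no unnamed {type_name} in this scope") from None


# Mirrors PassFocus: one unnamed Viewer parameter, one unnamed ControlMessage local.
scope = Scope()
scope.declare_unnamed("Viewer", {"title": "editor"})
scope.declare_unnamed("ControlMessage", {"id": None})
scope.the("ControlMessage")["id"] = "defocus"
```

The point of the sketch is only that `the Viewer` needs nothing beyond a type-keyed lookup in the local scope; a real implementation would resolve this at compile time rather than with a runtime dictionary.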
Considerations
There are no accommodations to discriminate between multiple type-named objects of the same type. This is because type-named objects are primarily a tool for maintaining programmer focus when performing brief excursions to write small utility functions and glue code, as above. At the point where you're juggling multiple locals with the same type, there should be enough of a conceptual distinction between them that the programmer can readily come up with names, likely patterned after the distinction.
Language support for type-named objects is not expected to result in any feature abuse that will require project owners to bear a higher burden of policing the quality of incoming changes. In fact, it may reduce that burden, as is the case with the elimination of the single-letter identifiers for the example given here.
Member access can be made cumbersome. For example, `(the ControlMessage).id` may be awkward to write and, although not difficult to read and understand, may be superficially unattractive and unwelcome, especially if it occurs regularly. It's feasible to design most grammars so that the parens may be omitted, allowing `the ControlMessage.id` to work, but `the` binding more tightly to `ControlMessage` than `ControlMessage` to `.id` would probably weird people out, given that it's happening across a whitespace boundary. Introducing a special form modelled after the English possessive might help:
the ControlMessage's id
Why not? Well, constructs involving a lone ASCII single quote can make the job of the parser more difficult, when single quote is already significant within the language (such as for denoting character or string literals).
Additionally, some people simply might not like this. (It's a little bit too natural; some people demand that their programming language look like a programming language, which means aiming for a contrived syntax.) They might however be okay with using an arbitrary symbol:
the ControlMessage # id
We are now at the brink of contriving a syntax that is less readable and less welcome than the problem we are trying to solve. Let's look for a solution in the following proposal for an inverted selector syntax.
Inverted selectors
Many languages have a `receiver.member` selector syntax, to select slot `member` of `receiver`. This is used both to access fields of records/structs/objects and to reference functions or other procedures (i.e., methods). Here we discuss an "inverted" selector syntax, so that the `receiver.member` above can become `member @ receiver`. This on its own is probably no significant benefit, but consider it in the context of a subroutine, paired with language support for type-named objects:
PROCEDURE PassFocus* (Viewer);
VAR ControlMessage;
BEGIN
id @ the ControlMessage := defocus;
FocusViewer.handle(the ControlMessage);
FocusViewer := the Viewer;
END PassFocus;
This `@`-notation is generalizable. I've wondered before why I don't see many (any?) languages offer a "passive" form to refer to members.
If the culture of the language under discussion is one that involves an overall pursuit to avoid magic symbols (e.g., Python, Ada, and Wirth languages like Pascal), then the keyword `from` might be used, viz.
id from the ControlMessage
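One way to see that the inverted form carries no new semantics is to treat it as sugar for the conventional selector. The following is a minimal sketch in Python of such a rewrite; the function name `desugar_inverted_selector` is hypothetical, it works on plain strings rather than a real token stream, and the chaining rule (the innermost receiver is written last) is an assumption of this sketch.

```python
def desugar_inverted_selector(expr: str, keyword: str = "@") -> str:
    """Rewrite 'm1 KEYWORD m2 KEYWORD receiver' as 'receiver.m2.m1'."""
    parts = [part.strip() for part in expr.split(f" {keyword} ")]
    *members, receiver = parts
    if not receiver or any(not member.isidentifier() for member in members):
        raise ValueError(f"not a valid inverted selector: {expr!r}")
    if not members:
        # Nothing to invert; the expression is just a receiver.
        return receiver
    # Parenthesize multi-word receivers such as 'the ControlMessage'.
    base = f"({receiver})" if " " in receiver else receiver
    # Members are written outermost-first, so apply them in reverse order.
    return base + "".join(f".{member}" for member in reversed(members))
```

For instance, `desugar_inverted_selector("id @ the ControlMessage")` and `desugar_inverted_selector("id from the ControlMessage", keyword="from")` both yield `(the ControlMessage).id`.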
Considerations
The `from` keyword, if not already present in the language grammar (for use in some other context), may be problematic: it's hard to add keywords to a language, because it can end up making code that worked in version n-1 suddenly invalid (a reserved word used as an identifier). Contrast this with the suggestion regarding `the` for discriminating type-named objects; I expect use of `the` as an identifier in the wild to be rare. So in the case of `from`, a semantically similar word like `of` might be used in its place. Failing that, `for`, although it reads slightly awkwardly, wouldn't be a completely inappropriate choice, and it's likely to already be a reserved word. The language designers just need to be comfortable allowing it to appear in two constructs, in each of which it has a completely different meaning.