Benjamin Eberlei

I do a lot of Code Reviews, and without proper automation of most of the low level items that you are usually "remarking" to colleagues it is a frustrating experience for everyone involved and takes more time that needed.

My natural tendency is towards action over planning, so I tend to find arguments over coding style are a waste of time if sufficient standards exist that you can just adopt. In my opinion this is is something team leads and managers need to rule on to avoid that their teams become unproductive in little coding-standard and style discussions.

Choosing a set of strict rules is very important and enforcing them helps reducing time spent on unimportant details during code review and allows more junior developers to prepare their changes in a way that most feedback can be automated.

Coding Styles alone are not enough in most cases. There should also be rules with respect to what type of language constructs and building blocks are preferred over others.

Equipped with a fixed ruleset, you can automate the low-level parts of code review in such a way that every developer can perform it themselves and fix violations before you even get to see the code in the first place.

In this post I give an (probably incomplete) overview of rules and patterns we follow for Tideways code base.

The first thing I put in place when Tideways hired its first engineer was PHP Code Sniffer with a slightly adopted Doctrine Standard (PSR-2 + a lot more), coupled with static analysis using Psalm. Combined with Github Checks Run API and Scotts diff filter tool all the violations are shown directly within Github commits and Pull Requests.

Using the phpcbf command and a lot of manual work we are down to zero violations. If you start using a standard from the beginning this won't happen, but strict enforcement of coding standard is not something I was used to before.

Violation check runs on Github include a message with a command to copy paste and auto-fix violations if available for PHP Code Sniffer, ESLint and TSLint.

We also have PHPStorm configurations to integrate each tool into the IDE.

In my opinion PSR-2 is a good first step, but it leaves a lot of details open that the Doctrine Coding Standard locks further down. A lot those rules are very strict and we had to disable them on our code base. If you are interested our phpcs rule definitions are in this Gist.

Besides "simple" code style checks, we have some clean code guidelines that we don't check with phpcs, because they are not black and white but require some deliberation.

For Javascript we enforce the strict defaults of ESLint/TSLint and for Go the default coding standard that `go fmt` produces.

We use Psalm for static code analysis. It automatically tries to find out all types of all variables in your application and then checks hundrets of different rules that might indicate bugs or improve the design of your application.

While type safety is awesome, without generics forcing Psalm or PHP-Stan at the highest level leads to a lot of clutter and boilerplate code that is only necessary to make the tool happy.

This is why I recommend to only fail the build on the obvious rules and leave the other ones as Warnings for the developer visible from the IDE.

We use a lenient psalm.xml and the Baseline feature to get down to zero errors. Every error fails the build.

A lot of accidental complexity in code comes from exposing implementation details to the outer layers of your application. This is mostly due to function arguments or return values that expose the implementation details.

This concept is called Information Hiding.

When adding arguments or return values think about the client calling the function:

Do they only get the information required to work, or does it return additional information (extra array keys, objects) that are unncessary?
Does a function argument expose the internal implementation of a datastructure? Can you turn it into a simpler input that is converted to the desired variable internally?

This is extremely difficult skill to master and takes a lot of practice.

One of the most influential blog posts on me has been To new or not to new by Misko Hevery.

It clearly defines which types of objects can be constructed with new in place (data objects, entities, value objects) and which ones should also be injected and centrally constructed by a factory object or a dependency injection container.

Blocks of statements and blocks of declarations usually get separated by an empty line, to make a very visual distinction. Also statements with a different context tend to be split by empty lines as well. See this example from one our controllers, where each statement and declaration is in its own paragraph.

<?php

 public function deleteAction(PageContext $context, $serverName)
 {
     $server = $this->findServer($serverName, $context);

     $this->serverRepository->remove($server);

     $this->jobQueue->queue(new BalanceServerTask(['applicationId' => $context->application->getId()]));

     return $context->redirectRoute('XhprofProfilerBundle.Server.environment');
 }

I would define paragraphs are statements or declarations that belong to each other. This concept closely follows paragraphs in natural languages and is easy to grasp and follow.

Code almost never needs the else keyword. It can be written in a way to use early exits with return or continue instead. This following code could have been written with an else. This rule keeps the main/happy path of the code on the main level of the method.

<?php

 public function formatQueryString(string $query): string
 {
     if (strlen($query) === 0) {
         return '';
     }

     parse_str($query, $queryArgs);

     $queryArgs = $this->truncateArrayValues($queryArgs);

     return json_encode($queryArgs, JSON_PRETTY_PRINT);
 }

For loops we continue to the next iteration for early exits:

<?php

 private function parseErrorTrace(string $trace, $removeArguments = false)
 {
     $traceResult = [];

     $parts = explode("\n", $trace);
     foreach ($parts as $line) {
         if (strpos($line, '{main}') !== false) {
             continue;
         }

         if (strpos($line, '#') !== 0) {
             continue;
         }

         // more
     }
 }

Another rule I try to follow is "`only one level of indentation per method <https://williamdurand.fr/2013/06/03/object-calisthenics/#1-only-one-level-of-indentation-per-method>`_" from the Object Calisthenics rulebook.

Guilherme from the Doctrine team introduced me to Object Calisthenics many years ago and I found it a very good guidestick to improve my code without having to think too much.

The deleteAction in the controller example above inhibits this rule, having no additional level of indentation. This second example shows a controller with one level of indentation, using the "Guard clause" pattern instead of nested conditions (See the refactoring)

<?php

 public function process(Task $task)
 {
     $application = $this->organizationRepository->findApplication($task->applicationId);

     if (!$application) {
         return;
     }

     $retentionDays = $application->getRetentionDays();
     $daysDiff = (new \DateTime('now'))->diff($task->date)->days;

     if ($daysDiff > $retentionDays) {
         return;
     }

     $this->historicalDataAction->compute($task->applicationId, $task->date, $task->overwrite);
 }

When methods use loops two levels of indention are ok, if the second level is used only for guard clauses.

<?php
 $lastProductionCommunication = null;

 foreach ($servers as $server) {
     if ($server->getEnvironment() !== 'production') {
         continue;
     }

     if (!$server->getLastCommunication()) {
         continue;
     }

     if ($lastProductionCommunication !== null && $lastProductionCommunication > $date)
         continue;
     }

     $lastProductionCommunication = $date;
 }

It is not easy to follow this rule to the dot, and we do have a lot of code in Tideways that doesn't follow this rule as we don't enforce it. But I view the violation of the one level rule as an automtaic, immediate signal of both complexity and potential for refactoring.

This rule is easier in PHP than in Go (by design). As we use both languages, we force ourselves to write out what a variable, class, constant or function is doing, and avoid abbreviations.

func (previous PreviousSpans) GetLastIndex(category, summary string) int {
    if entry, ok := previous[category]; ok {
        if entry.Summary == summary {
            return entry.Index
        }
    }
    return -1
}

Certain names still get abbreviated though, for example i, idx, j and other common ones.

We still work with a lot of arrays, because PHP makes it much easier to use them than complex types (objects). The risk of using arrays is not knowing if a key exists, and code being littered with if (isset($data['key'])) { statements all over the place. It is much better to use the null coalesce operator ?? to define default values for keys that you don't know exist on an array:

<?php
 $hostName = $serverNames[$currentOccurrance['sId']] ?? 'unknown';

If something doesn't exist it is NULL or false right? Wrong, if you care about simplicity of your code.

If a method is supposed to return an object but can't, it may be better to throw an exception.
If an array is empty, then it is empty, not null or false. Iterating over an empty array can correctly do nothing.

I prefer methods written in a way, where they either only handle the default (happy path) by throwing exceptions otherwise, or by defining all variables in a way where even the "unhappy/error" paths can run successfully through the happy code path and yield the exactly right result.

PHP allows this, but it can lead to very confusing code if you re-use the same variable with different types in a method or function. Avoid at all costs.

<?php
 public function deleteAction(PageContext $context, string $serverName)
 {
     $server = $this->findServer($serverName, $context);

In the previous block the argument $serverName could easily have also be called $server and therefore be both a string and an object of type Server within different parts of the method.

PHP has no way to typehint the items of collections (for example arrays, iterators). You either typehint array, \Doctrine\Common\Collections\Collection or any other class in your methods, with no way to specify type of the key or value once you start iterating with foreach.

There are various docblock supported styles, to define these types, which I recommend using on class properties and return values such as:

<?php
 class XhprofReport extends Struct
 {
     /** @var array<string> \*/
     public $namespaces = [];
 }

But to help both PHPStorm and Psalm (our static code analyser) to understand items of a collection, we use assert() with instanceof, if we can't provide the types ourselves on return values where we cannot define the typehint, for example when using Doctrine repositories:

<?php
 $organizations = $organizationRepository->findAll();

 foreach ($organizations as $organization) {
     assert($organization instanceof Organization);

     // ..
 }

To avoids typical complex array problems such as not knowing what keys exist or what types values have, arrays should always be wrapped in objects extending from a Struct base class.

The struct base class maps an array to properties of the class in the constructor:

<?php
 abstract class Struct
 {
     public function __construct(array $data = [])
     {
         foreach ($data as $property => $value) {
             if (!property_exists($this, $property)) {
                 throw new \RuntimeException();
             }

             $this->$property = $value;
         }
     }
 }

Often data in Tideways is best represented in domain objects that doesn't fit the entity or database model 100%. In those cases we manually map the database to the desired objects using this Struct pattern for simplicity.

Take this ServerConfiguration struct which is a combination of data from three Doctrine entities and is assembled in a repository from custom SQL.

<?php
class ServerConfiguration extends Struct
{
    /** @var int */
    public $id;
    /** @var string */
    public $environment;
    /** @var int */
    public $tracesPerMinute;
    /** @var bool */
    public $disabled;
    /** @var bool */
    public $newlyCreated = false;
    /*\*/
}

I am really looking forward when we deploy PHP 7.4 with typed properties where this pattern will become much more powerful and will allow us to write a "mini orm" for mapping SQL results to typed objects.

You can read more about Struct objects on the Qafoo blog and look into Kore's DataObject package that implements this pattern in a reusable Composer package.

Published: 2020-02-25

Article

Clean Code and Object Calisthenics Rules I try to Follow