Made Tech Blog

Data Structures at the heart of your system

You should always prefer to create complex data structures over complex program logic. In fact, making complex data structures that are intelligent will eliminate the need for complex program logic. This will lead to a more robust system that’s easier to reason about and has less code to maintain.

Data structures are language agnostic: programmers are free to discuss them without being tied down to a specific programming language or framework.

Revealing intent

Just by looking at a data structure, you should be able to tell what its role in the system is.

Exhibit A:

[100.00, 'Foo']

What is its purpose? There is no way of telling. The person reading this can only assume.

If it were passed in as a method argument, the method would be coupled to the fact that the values occur in different positions. This is not the right way to package this data, especially if it is meant to represent an important concept in the system.

Exhibit B:

{
  price: 100.00,
  name: 'Foo'
}

The data is enriched with keys that we can reference. You could say that this data structure has more value in the system.
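To make the difference between Exhibit A and Exhibit B concrete, here is a minimal sketch of a method consuming each one. The method names describe_positional and describe_keyed are hypothetical and exist only for illustration.

```ruby
# Hypothetical helpers, not part of the original exhibits.

# Coupled to position: the caller must know that index 0 is the price
# and index 1 is the name.
def describe_positional(product)
  "#{product[1]} costs #{format('%.2f', product[0])}"
end

# Coupled only to key names: the order of the entries no longer matters.
def describe_keyed(product)
  "#{product[:name]} costs #{format('%.2f', product[:price])}"
end

puts describe_positional([100.00, 'Foo'])
puts describe_keyed({ name: 'Foo', price: 100.00 })
```

Swapping the two array elements silently breaks the first method, while the second keeps working however the hash is written.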

Exhibit C:

Product = Struct.new(:price, :name) do
  def discount_price
    price * 0.5
  end
end

...

This data structure conveys a lot more meaning. Note that the discount_price method does not reach through other objects to do what it needs to; it operates directly on ‘price’, the data it sits next to. This frees other application code from having to deal with the calculation.
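As a rough usage sketch, this is how Exhibit C might be exercised. The struct is repeated so the snippet runs on its own; the values are the ones from the earlier exhibits.

```ruby
# Exhibit C again, followed by an illustrative usage.
Product = Struct.new(:price, :name) do
  # Half-price discount, computed directly from the struct's own data.
  def discount_price
    price * 0.5
  end
end

product = Product.new(100.00, 'Foo')
puts product.name            # reads a named field, not a position
puts product.discount_price  # the behaviour lives next to the data
```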

Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowchart; it’ll be obvious.

– Fred Brooks, The Mythical Man-Month.

Being close to the data

Being far from the data manifests itself in what are commonly referred to as train wrecks: chained method calls that violate the Law of Demeter. They are an indication that your code is not close to the data.

You never want to reach vast distances across your program to execute functionality. You are at the wrong level of abstraction.

order.customer.profile.display_name

vs.

user.display_name
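One way to shorten such a chain is to let each object answer for the data it holds. The sketch below assumes hypothetical Profile, Customer and Order classes and a hypothetical customer_name method; none of these come from a real codebase.

```ruby
# Hypothetical classes sketching one way to keep callers close to the data.
class Profile
  attr_reader :display_name

  def initialize(display_name)
    @display_name = display_name
  end
end

class Customer
  def initialize(profile)
    @profile = profile
  end

  # The customer answers for its own profile, so callers need not know it exists.
  def display_name
    @profile.display_name
  end
end

class Order
  def initialize(customer)
    @customer = customer
  end

  # One hop for the caller instead of order.customer.profile.display_name.
  def customer_name
    @customer.display_name
  end
end

order = Order.new(Customer.new(Profile.new('Ada')))
puts order.customer_name
```

Each class now talks only to its immediate collaborator, so a change to how profiles store names touches one place.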

The Rule of Representation

Based on this UNIX rule, you should fold knowledge into data so program logic can be stupid and robust. Data is more tractable than procedural logic.

Starting your program without data structures and adding application code as you go will lead to a bloated design. Start with your data structures, and you will not be forced to write unnecessary code to compensate for missing functionality.
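As an illustration of folding knowledge into data, here is a small sketch in which the rules for moving an order between states live in a table rather than in branching code. The states and the transition_allowed? method are assumptions made up for this example.

```ruby
# Hypothetical order states; the table holds the knowledge, the method stays trivial.
TRANSITIONS = {
  pending:   [:paid, :cancelled],
  paid:      [:shipped, :refunded],
  shipped:   [:delivered],
  delivered: [],
  cancelled: [],
  refunded:  []
}.freeze

# Returns true when moving from one state to another is allowed.
# Unknown states fall back to an empty list, so they allow nothing.
def transition_allowed?(from, to)
  TRANSITIONS.fetch(from, []).include?(to)
end

puts transition_allowed?(:pending, :paid)
puts transition_allowed?(:shipped, :pending)
```

Adding a new rule means adding a line to the table; the method itself never changes.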

Fixing problems at the source

Never add more code to your program to address a problem that could be fixed at the source (the data). This may seem obvious, but it happens. Make it as easy as possible for the next consumer of the data to do what it needs to do.

Example:

Suppose that you have a controller responding to an HTTP request and you want to instantiate a new object with the incoming parameters.

For this example I will use an Order object. Let’s say that we need to set its currency, which we can get from a separate current_currency method. The wrong approach would be to fix the problem only at the point where you encounter it:

@order_params = params[:order]
@order_params[:currency] = current_currency
order = Order.new(@order_params)

While this seems innocent enough, we have unnecessarily convoluted the code dealing with the data structure. And it could be worse. Think about the origin of this data: could it be improved to make our lives easier when we get to this point?

This is not always possible, but a lot of the time it is. In this case, we can modify the params hash to come pre-packaged with the currency.

By changing the data, we have reduced the complexity of the program code.

order = Order.new(params[:order])
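How the params come to include the currency depends on where they originate. As a loose sketch only, assume a hypothetical build_order_payload method standing in for whatever produces the request, and a simplified Order that reads its attributes from a hash.

```ruby
# Hypothetical sketch: the payload is assembled with the currency at its origin,
# so the consumer can pass it straight through.
class Order
  attr_reader :currency, :total

  def initialize(attributes)
    @currency = attributes.fetch(:currency)
    @total    = attributes.fetch(:total)
  end
end

# Stands in for whatever produces the request (a form, an API client, ...).
def build_order_payload(total, currency)
  { order: { total: total, currency: currency } }
end

params = build_order_payload(25.0, 'GBP')
order  = Order.new(params[:order]) # no patching of the hash at the point of use
puts order.currency
```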

Unnecessary intermediate representations of data are a form of accidental complexity and should be avoided. Accidental complexity tends to snowball, and so systems grow.

The data structures can accommodate a lot of data but need to remain cohesive.

We put the valid? method on a value object so that its validity does not need to be computed outside of the object.

class Order
  ...

  def valid?
    paid? && !expired?
  end
end

...

do_stuff if order.valid?

As opposed to:

do_stuff if order.paid? && !order.expired?

Managing the visibility of the data that an object exposes is known as encapsulation.
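A fuller sketch of this idea might look like the following. The paid flag, the expires_at timestamp and the optional now argument are assumptions added so the example can run; the original Order is elided, so its real attributes are unknown.

```ruby
# Hypothetical Order: the paid flag and expiry time are assumptions for illustration.
class Order
  def initialize(paid:, expires_at:)
    @paid = paid
    @expires_at = expires_at
  end

  # The only question callers need to ask.
  def valid?(now = Time.now)
    paid? && !expired?(now)
  end

  private

  # Details the rest of the program does not need to see.
  def paid?
    @paid
  end

  def expired?(now)
    now >= @expires_at
  end
end

order = Order.new(paid: true, expires_at: Time.now + 3600)
puts order.valid?
```

Making paid? and expired? private is a design choice in this sketch: callers can no longer rebuild the validity rule themselves, so it can change in one place.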

Cyclomatic complexity

State and cyclomatic complexity are two big sources of confusion in programming. Mix the two together and the cognitive overhead soon becomes too much to deal with.

Considering your data structures first will help reduce the cyclomatic complexity of your program.

Example:

Let’s say we want to send emails to different email accounts based on the type of request that has been submitted. First thing that comes to mind is conditional logic. Right?

if params[:contact][:type] == :enquiry
  email = 'enquiries@foobar.com'
else
  email = 'admin@foobar.com'
end

send_email email

This does what we want, but it is a good example of unnecessary logic that could be handled by a data structure. Let’s build up a hash, which conveniently supports a default value (passed to Hash.new and readable through the .default method), and push this logic down into the data structure. If the key is not found in the hash, the lookup returns the catch-all default value.

admin_emails = Hash.new('admin@foobar.com')
admin_emails[:enquiry] = 'enquiries@foobar.com'
send_email admin_emails[params[:contact][:type]]

We have removed our own conditional logic and replaced it with a key lookup and fallback provided for free by the data structure.

Whenever you see conditional logic in your code, consider whether it could be solved with a more intelligent data structure. Correctly chosen data structures promote elegant minimal solutions.
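Here is a self-contained sketch of the same lookup that can be run outside a web request. The constant ADMIN_EMAILS and the method recipient_for are hypothetical names; the addresses are the ones from the example above.

```ruby
# Hypothetical constant and helper names; the addresses come from the example above.
ADMIN_EMAILS = Hash.new('admin@foobar.com')
ADMIN_EMAILS[:enquiry] = 'enquiries@foobar.com'
ADMIN_EMAILS.freeze # the routing table should not change at runtime

# Looks up the recipient for a request type, falling back to the hash default.
def recipient_for(type)
  ADMIN_EMAILS[type]
end

puts recipient_for(:enquiry)        # a key that is present
puts recipient_for(:complaint)      # a missing key returns the default
puts ADMIN_EMAILS.default           # the fallback can be inspected directly
puts ADMIN_EMAILS.key?(:complaint)  # a default lookup does not store the key
```

Supporting another request type becomes a one-line change to the table rather than another branch.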

Conclusion

Designing good data structures upfront will have many positive side effects, including a less convoluted codebase.

It is important that you choose a programming language that facilitates the creation of effective data structures. Clojure and Haskell (to name but a couple) provide a vast array of useful data structures out of the box. These data structures are also immutable, which is another example of complexity being hosted by the data structure itself.
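Ruby's core collections are mutable by default, but freeze gives a rough approximation of the idea. The sketch below uses a hypothetical PRICES table and with_price method; it is not a substitute for the persistent data structures those languages provide.

```ruby
# Hypothetical price table, frozen so that it cannot be modified in place.
PRICES = { foo: 100.00 }.freeze

# Returns a new frozen hash instead of modifying the original.
def with_price(prices, name, price)
  prices.merge(name => price).freeze
end

updated = with_price(PRICES, :bar, 25.00)
puts PRICES.key?(:bar)   # the original is untouched
puts updated.key?(:bar)  # the change lives in the new hash

begin
  PRICES[:bar] = 25.00
rescue RuntimeError => e # FrozenError is a subclass of RuntimeError
  puts "rejected: #{e.class}"
end
```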

Once you have constructed your foundations out of data structures, the application code to glue this together will be simple. You start thinking about the relationships between the data structures, not the procedural steps to get from A to B.

Developers should choose to make data more complicated rather than the procedural logic of the program when faced with the choice, because it is easier for humans to understand complex data compared with complex logic. This rule aims to make programs more readable for any developer working on the project, which allows the program to be maintained.

– Eric S. Raymond on the Rule of Representation in UNIX.

About the Author

Emile Swarts

Lead Software Engineer at Made Tech

All about big beards, beers and text editors from the seventies.