Minimal DTOs

What is this?

This is my argument in favor of writing Data Transfer Objects (DTOs) with as little code as possible.

There is a common sentiment in Object-Oriented Programming that "the logic should live with the data". I agree that this is a good practice in what I would call the "business layer" of the code.

But we can benefit from drawing a line between the business layer and the data transfer layer. Business logic in the data transfer layer is usually misplaced, and can even be harmful.

If an object's main responsibility is to hold data, I believe that should be its only responsibility.

Why is this?

Are minimal DTOs better for the cloud? No. Are they better for performance? Not inherently. Are they better for use with JSON? With RabbitMQ? IoC? Cosmos? No, across the board.

I will not claim that minimal DTOs are more fit for some specific purpose. I am claiming that they are easier to write, easier to read, easier to understand, and easier to change.

Technical debt is real, and it correlates strongly with code. The more code we write, the more maintenance it requires. The more advanced our classes are, the more they shift from "easy to use" to "easy to use once you understand them". The more our logic spreads out, the more our maintainers must run around to keep the company moving forward.

If we are to be kind to our maintainers, and our company, everything we write must justify its existence, and its placement. The rest of this page is dedicated to code I have seen in DTOs, and why I believe (most of) it belonged elsewhere.


How Minimal?

Serialization hints 😒

A class can carry serialization attributes, which specify what the class should look like when serialized. But:

In the following example, the [Serializable], [XmlRoot], [XmlElement], [XmlArray], and [XmlArrayItem] attributes, along with the name arguments to [XmlAttribute], can all be omitted with no change to the serialization behavior.


using System;
using System.Collections.Generic;
using System.Xml.Serialization;

[Serializable]
[XmlRoot("Foo")]
public class Foo
{
	[XmlElement("Bar")]
	public Bar Bar { get; set; }

	[XmlArray("Quuxum")]
	[XmlArrayItem("Quux")]
	public List<Quux> Quuxum { get; set; }
}

[Serializable]
public class Bar
{
	[XmlAttribute("Name")]
	public string Name { get; set; }
}

[Serializable]
public class Quux
{
	[XmlAttribute("Id")]
	public int Id { get; set; }
}
		

Furthermore, the class will still serialize correctly if we take out the rest of the hints, keeping only bare [XmlAttribute] markers:


using System.Collections.Generic;
using System.Xml.Serialization;

public class Foo
{
	public Bar Bar { get; set; }
	public List<Quux> Quuxum { get; set; }
}

public class Bar
{
	[XmlAttribute]
	public string Name { get; set; }
}

public class Quux
{
	[XmlAttribute]
	public int Id { get; set; }
}
		

Going further and dropping those [XmlAttribute] hints as well would come at the cost of backwards compatibility: instead of <Quux Id="123"></Quux> we would see <Quux><Id>123</Id></Quux>. Of course, backwards compatibility is a non-issue when authoring new DTOs for internal use.

Ben's opinion: If you don't need it, skip it

Overriding Equals() 😠

The default implementation of object.Equals compares references: two objects are equal only if they are literally the same object. So even otherwise identical objects would not be considered equal, unless this is overridden to consider the individual properties of the object. But:

Here is an example of a DTO where Equals has been overridden, in a way the consumer cannot use. Note how the consumer still compares the properties manually:


public class Foo
{
	public int Id { get; set; }
	public string Name { get; set; }
	public DateTime RegistrationDate { get; set; }
	public DateTime ModifyDate { get; set; }

	public override bool Equals(object obj)
	{
		var other = obj as Foo;
		return other != null
			&& Id == other.Id
			&& Name == other.Name
			&& RegistrationDate == other.RegistrationDate
			&& ModifyDate == other.ModifyDate;
	}
}

public class FooProcessor
{
	public void Process(Foo incoming)
	{
		var existing = _database.GetFoo(incoming.Id);

		if (existing is null)
		{
			_database.InsertFoo(incoming);
		}
		else if (incoming.Name == existing.Name
			&& incoming.RegistrationDate == existing.RegistrationDate)
		{
			return;
		}
		else
		{
			_database.UpdateFoo(incoming);
		}
	}
}
		

Since Foo.Equals considers ModifyDate, and the consumer does not want to consider ModifyDate, the properties the consumer does want to compare are listed manually.

This is not as pretty as incoming.Equals(existing)! While "prettiness" is a subjective judgement, I believe most developers would agree. But I believe it's more important that, in order to understand FooProcessor's behavior, a maintainer only needs to read FooProcessor. That's a tradeoff I will gladly take.

Ben's opinion: Skip it

Overriding GetHashCode() 😠

The default behavior of object.GetHashCode() is about as useful as that of object.Equals(). An object's hash code will be used if that object is used as a key in a Dictionary, or if that object is stored in a hash-based collection like a HashSet. But:

Here is an example of a DTO that overrides GetHashCode, so that it can be put into a clever collection that ensures only unique objects are added:


public class MyUniqueCollection<T> : IEnumerable<T> where T : IEquatable<T>
{
	private HashSet<T> _items = new HashSet<T>();

	public MyUniqueCollection(IEnumerable<T> items)
	{
		foreach (var item in items)
		{
			_items.Add(item);
		}
	}

	public void Add(T item)
	{
		// System.Collections.Generic.HashSet<T> leverages T.GetHashCode
		if (!_items.Contains(item))
		{
			_items.Add(item);
		}
	}

	public IEnumerator<T> GetEnumerator()
	{
		return _items.GetEnumerator();
	}

	IEnumerator IEnumerable.GetEnumerator()
	{
		return GetEnumerator();
	}
}

public class Foo : IEquatable<Foo>
{
	public int Bar { get; set; }
	public string Baz { get; set; }
	public string Quux { get; set; }
	public DateTime RegistrationDate { get; set; }

	public bool Equals(Foo other)
	{
		return other != null
			&& Bar == other.Bar
			&& Baz.ToUpperInvariant() == other.Baz.ToUpperInvariant();
	}

	public override int GetHashCode()
	{
		unchecked // Overflow is fine, just wrap
		{
			return (17 + Bar.GetHashCode()) * 23
				+ Baz.ToUpperInvariant().GetHashCode();
		}
	}
}

public class FooProcessor
{
	public void ProcessFoos(IEnumerable<Foo> foos)
	{
		var uniqueFoos = new MyUniqueCollection<Foo>(foos);
		foreach (var foo in uniqueFoos)
		{
			// do some work
		}
	}
}
		

Here I've rewritten that example to put the uniqueness code into the consumer. This demonstrates how we can use LINQ to keep that code nice and brief, and shows that putting the burden of uniqueness onto the consumer is not onerous:


public class Foo
{
	public int Bar { get; set; }
	public string Baz { get; set; }
	public string Quux { get; set; }
	public DateTime RegistrationDate { get; set; }
}

public class FooProcessor
{
	public void ProcessFoos(IEnumerable<Foo> foos)
	{
		var uniqueFoos = foos
			.GroupBy(f => new { f.Bar, Baz = f.Baz.ToUpperInvariant() })
			.Select(group => group.First());

		foreach (var foo in uniqueFoos)
		{
			// do some work
		}
	}
}
		

Ben's opinion: Skip it

Implementing CompareTo() 😠

By default, objects are not inherently sortable. We can define sortability on them by implementing IComparable, with the CompareTo method. But:

Here is an example of a DTO that implements CompareTo, and a consumer that makes use of it.


public class Foo : IComparable<Foo>
{
	public int Bar { get; set; }
	public string Baz { get; set; }
	public string Quux { get; set; }
	public DateTime RegistrationDate { get; set; }

	public int CompareTo(Foo other)
	{
		if (Bar < other.Bar) return -1;
		if (Bar > other.Bar) return 1;
		return -1 * Baz.CompareTo(other.Baz);
	}
}

public class FooConsumer
{
	public void Process(List<Foo> foos)
	{
		foos.Sort();

		foreach (var foo in foos)
		{
			// do some work
		}
	}
}
		

Note how it is not clear how the collection is sorted, if you read only the consumer code. Even once you read the DTO code, it still may not be clear; does returning -1 mean this comes before that? Or after?

Note also how a consumer wishing to sort by some other property, like RegistrationDate, cannot make use of the CompareTo implementation here. They would have to start at square one.

Contrast to this example, where we skip CompareTo:


public class Foo
{
	public int Bar { get; set; }
	public string Baz { get; set; }
	public string Quux { get; set; }
	public DateTime RegistrationDate { get; set; }
}

public class FooConsumer
{
	public void Process(IEnumerable<Foo> foos)
	{
		var sortedFoos = foos
			.OrderBy(f => f.Bar)
			.ThenByDescending(f => f.Baz);

		foreach (var foo in sortedFoos)
		{
			// do some work
		}
	}
}
		

The job of specifying the properties to sort by falls on the consumer, but it is not burdensome. And the behavior of FooConsumer is obvious from reading FooConsumer alone.

Operator overloading 😡

In C#, it is possible to implement custom versions of operators like == and <. But:
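
A minimal sketch of the pattern (the Foo type and its property are illustrative):


public class Foo
{
	public int Bar { get; set; }

	public static bool operator ==(Foo a, Foo b)
	{
		// "is null" must be used here; "a == null" would recurse into this operator
		if (ReferenceEquals(a, b)) return true;
		if (a is null || b is null) return false;
		return a.Bar == b.Bar;
	}

	// The compiler requires != alongside ==, and warns (CS0660/CS0661)
	// that Equals and GetHashCode should be overridden as well
	public static bool operator !=(Foo a, Foo b) => !(a == b);
}


Now, before trusting any comparison, a reader has to know whether == on Foo means reference equality or value equality; the answer lives in the DTO, not at the call site.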

Ben's opinion: Awful

Overriding ToString() 😒

The default implementation of object.ToString() tells you the type and nothing more. This can be overridden to print individual properties of the object. But:
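
A minimal sketch of the pattern (Foo and its properties are illustrative):


public class Foo
{
	public int Bar { get; set; }
	public string Baz { get; set; }

	// Handy in the debugger and in log messages, but one more member
	// to keep in sync as properties are added, renamed, or removed
	public override string ToString() => $"Foo(Bar={Bar}, Baz={Baz})";
}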

Ben's opinion: Doesn't hurt much... doesn't help much

Implementing Clone() 😒

Making a copy of an object is a hard problem that the designers of most languages have left to the developers using that language. This is because every implementation involves significant tradeoffs in time and memory, depending on how deeply the object is copied (or if the word "copy" even makes sense, as for a database connection).

In the rare case that I do need to make a copy of an object, I will make a copy of that object. I will deep-copy the properties I want copied deeply, shallow-copy the properties I want shallow-copied, and omit the properties that don't affect me.

Since cloning is such a rare (and, in my opinion, strange) need, leaving the implementation to the consumer is not a large burden.
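
As a sketch, that consumer-side copy might look like this (the Foo shape and its Tags list are assumed for illustration):


// The consumer copies exactly what it needs, where it needs it
var copy = new Foo
{
	Id = original.Id,
	Name = original.Name,
	// deep-copy only the one collection this consumer mutates
	Tags = new List<string>(original.Tags),
	// ModifyDate deliberately omitted; this consumer never reads it
};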

Does implementing Clone on the DTO have a downside? Yes, I believe so. Once it exists, a maintainer tasked with adding a new field to the DTO has no way to know whether their new field should be included in the Clone. If they do include it, their one-line change becomes a two-line change, quite possibly for no benefit. If they don't, we now have an inconsistency. The next maintainer will not be able to tell if the list of fields in Clone is deliberately incomplete. If they want to unravel the mystery, they must search all applications that might use this class, for all uses of Clone, and track each of the copies down to see which fields are used and how. I'm not making this up; it has happened to me.

In short: Clone (and methods like it) accelerate code rot.

Ben's opinion: Skip it

Providing static TestData 😒

A DTO with many properties could provide a static method that constructs a "typical" instance, for use in testing. But:

Here is an example of what a TestData property might look like. Notice how easily that test-related code could be moved into the tests themselves. Also notice how each test assumes certain values will be present in the TestData. These assumptions make the tests more difficult to read, and more brittle.


public class Foo
{
	public int Bar { get; set; }
	public string Baz { get; set; }

	public static Foo TestData
	{
		get
		{
			return new Foo
			{
				Bar = 123,
				Baz = "abc",
			};
		}
	}
}

[TestClass]
public class FooProcessorTests
{
	[TestMethod]
	public void ProcessFoo_ShouldInsertIfMissing()
	{
		// ARRANGE
		var mockDB = new Mock<IDatabase>();
		mockDB.Setup(db => db.GetFooWithId(123))
			.Returns((Foo)null);

		var processor = new FooProcessor(mockDB.Object);

		// ACT
		processor.ProcessFoo(Foo.TestData);

		// ASSERT
		mockDB.Verify(db => db.InsertFoo(It.IsAny<Foo>()));
	}

	[TestMethod]
	public void ProcessFoo_ShouldUpdateIfChanged()
	{
		// ARRANGE
		var mockDB = new Mock<IDatabase>();
		mockDB.Setup(db => db.GetFooWithId(123))
			.Returns(new Foo { Bar = 123, Baz = "xyz" });

		var processor = new FooProcessor(mockDB.Object);

		// ACT
		processor.ProcessFoo(Foo.TestData);

		// ASSERT
		mockDB.Verify(db => db.UpdateFoo(It.IsAny<Foo>()));
	}

	[TestMethod]
	public void ProcessFoo_ShouldDoNothingIfUnchanged()
	{
		// ARRANGE
		var mockDB = new Mock<IDatabase>();
		mockDB.Setup(db => db.GetFooWithId(123))
			.Returns(new Foo { Bar = 123, Baz = "abc" });

		var processor = new FooProcessor(mockDB.Object);

		// ACT
		processor.ProcessFoo(Foo.TestData);

		// ASSERT
		mockDB.Verify(db => db.InsertFoo(It.IsAny<Foo>()), Times.Never());
		mockDB.Verify(db => db.UpdateFoo(It.IsAny<Foo>()), Times.Never());
	}
}
		

For the sake of completeness, here is how the tests would look if each built their own test instance. Note how each test can be read in isolation from the others; a maintainer investigating a broken test will not need to jump around to find setup, teardown, or anything else.


public class Foo
{
	public int Bar { get; set; }
	public string Baz { get; set; }
}

[TestClass]
public class FooProcessorTests
{
	[TestMethod]
	public void ProcessFoo_ShouldInsertIfMissing()
	{
		// ARRANGE
		var mockDB = new Mock<IDatabase>();
		mockDB.Setup(db => db.GetFooWithId(123))
			.Returns((Foo)null);

		var processor = new FooProcessor(mockDB.Object);

		// ACT
		processor.ProcessFoo(new Foo { Bar = 123, Baz = "abc" });

		// ASSERT
		mockDB.Verify(db => db.InsertFoo(It.IsAny<Foo>()));
	}

	[TestMethod]
	public void ProcessFoo_ShouldUpdateIfChanged()
	{
		// ARRANGE
		var mockDB = new Mock<IDatabase>();
		mockDB.Setup(db => db.GetFooWithId(123))
			.Returns(new Foo { Bar = 123, Baz = "xyz" });

		var processor = new FooProcessor(mockDB.Object);

		// ACT
		processor.ProcessFoo(new Foo { Bar = 123, Baz = "abc" });

		// ASSERT
		mockDB.Verify(db => db.UpdateFoo(It.IsAny<Foo>()));
	}

	[TestMethod]
	public void ProcessFoo_ShouldDoNothingIfUnchanged()
	{
		// ARRANGE
		var mockDB = new Mock<IDatabase>();
		mockDB.Setup(db => db.GetFooWithId(123))
			.Returns(new Foo { Bar = 123, Baz = "abc" });

		var processor = new FooProcessor(mockDB.Object);

		// ACT
		processor.ProcessFoo(new Foo { Bar = 123, Baz = "abc" });

		// ASSERT
		mockDB.Verify(db => db.InsertFoo(It.IsAny<Foo>()), Times.Never());
		mockDB.Verify(db => db.UpdateFoo(It.IsAny<Foo>()), Times.Never());
	}
}
		

A DTO with more properties, or nested objects, would obviously break up the flow of the test if constructed inline. For those, I personally would probably put the full initialization in the "ARRANGE" section of each test - but I wouldn't mind seeing it in a helper method inside the test class.

Ben's opinion: Skip it

Enums 😐

It is common practice to capture a fixed set of values as an enum, giving them names and a sort of protection from typos. This is always better than using a set of "magic numbers" to represent the values, and often better than using a set of "magic strings". But:

Since this affects all layers of the stack, the example code was too large for me to include it on this page (and you've seen the level of detail I'm willing to include here)! So the example code for enums is on a separate page.

I will fully admit that this is one of the weakest arguments I make on this page, if not the weakest. If I haven't convinced you to abandon the use of enums in your DTOs, I will not be offended - if you agree that the case for enums is pretty weak too. The tradeoffs are fairly minor, in the scheme of things. Please, continue reading.

Ben's opinion: Could take it or leave it

Automatic properties 😎

C# has a powerful feature called properties, which lets us wrap private backing fields in a very transparent way. Essentially, it is compiler-supported shorthand for the encapsulation pattern popular in object-oriented languages. It allows us to replace this code:


public class Foo
{
	private int _bar = 0;

	public int GetBar() {
		// Perhaps some calculation, then:
		return _bar;
	}

	public void SetBar(int value) {
		// Perhaps some validation, then:
		_bar = value;
	}
}

// In consuming code:
var baz = myFoo.GetBar();
myFoo.SetBar(quux);
		

With this code:


public class Foo
{
	private int _bar = 0;

	public int Bar {
		get
		{
			// Perhaps some calculation, then:
			return _bar;
		}
		set
		{
			// Perhaps some validation, then:
			_bar = value;
		}
	}
}

// In consuming code:
var baz = myFoo.Bar;
myFoo.Bar = quux;
		

If there is no calculation or validation to be done, the code can be shortened even further to:


public class Foo
{
	public int Bar { get; set; }
}
		

This, of course, has almost no advantage over using a bare field:


public class Foo
{
	public int Bar;
}			
		

"Really, no advantage?" Yes. Obviously properties can do much more than fields, but if the properties don't contain any logic or access restrictions, then the only advantage of using them is a theoretical scenario where 1) you want to add some logic or restriction, and once you've done so 2) you cannot afford to recompile the consuming applications. That is, you intend to hot-swap your new DLL into the applications that are using it.

I cannot imagine that ever being necessary.

At this point you might reasonably expect me to say "If all you're doing is { get; set; }, just use a field instead!"

But I don't really care at all. The consumers can't tell the difference, and the impact to readability is near zero. I've gotten into the habit of using properties for all fields myself, and I can't be bothered to try to break it.

Ben's opinion: Cool with me

Setter validation 😎

Suppose we store names in a database column with a limit of 50 characters. The names enter our system from a web form. How should we enforce that limit?

  A. Let the database throw exceptions when we try to store data that won't fit
  B. Silently truncate data that won't fit, just before putting it in the database
  C. Throw an exception from the data access layer, just before attempting to store data that won't fit
  D. Throw an exception when setting the property of the DTO that corresponds to that database column
  E. Check the data when we first receive the web request, returning an HTTP 400 if it won't fit in our database
  F. Warn the user when they attempt to type a name that is too long, preventing them from submitting the form
  G. Disable further typing in the name textbox when the length limit is reached

Apart from B, I believe all of these answers are valid. It's probably best to use a combination of them. Personally as a user, I find F less infuriating than G... although I'm not sure that would help any, if I were being told my name is invalid.

Most relevant for this discussion, of course, is D. Setters are an excellent place for this sort of validation. It might look something like this:


/// <remarks>
/// Stored in MyDB.dbo.Accounts
/// </remarks>
public class Account
{
	/// <remarks> DB type is VARCHAR(50) </remarks>
	public string Name
	{
		get => _name;
		set
		{
			if (value?.Length > 50) throw new ArgumentOutOfRangeException(
				paramName: nameof(Name),
				actualValue: value,
				message: $"{nameof(Name)} cannot exceed 50 characters");
			_name = value;
		}
	}
	private string _name;
}
		

Even with the proper validation on the frontend (and assuming no one is using cURL to hit our API), this error will inevitably be thrown someday. Perhaps when the frontend is replaced, or when some other part of the system starts creating accounts. Even when that happens, we know that the database would ultimately protect itself...

But putting this exception in the DTO itself moves that error further up toward the initiator, which is always good. If an un-storable Account object can't even be created, then it can't get stuck in a queue, for example. And the error message and stack trace will be a nice informative one:


[System.ArgumentOutOfRangeException: Name cannot exceed 50 characters
Parameter name: Name
Actual value was The William Henry Gates III Living Trust, established 1955.]
   at Account.set_Name(String value) :line 15
		

I don't do this on all of my DTOs, yet. But the practice is growing on me.

Ben's opinion: Cool with me

Setter sanitization 😠

Suppose the following example throws an exception on the second line:


foo.Bar = input;
Assert.AreEqual(input, foo.Bar);
		

How could that happen? The DTO said it would hold the input (no exception was thrown from the setter!) but it stored something else instead. Let's call this sanitization.

Perhaps the input was a string with leading whitespace, and the DTO trimmed it away. Perhaps the input was a number, and the DTO clamped it to a range. Perhaps the input was a null, and the DTO stored today's date instead.
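
For illustration, a trimming setter might look like this sketch; after foo.Bar = "  abc  ", reading foo.Bar returns "abc":


public class Foo
{
	public string Bar
	{
		get => _bar;
		// the assignment silently stores something other than what the caller provided
		set => _bar = value?.Trim();
	}
	private string _bar;
}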

I am of the strong opinion that code which tells you it's doing something, but silently does something else, is deceptive code. If the input is not valid, throw an exception! Tell me I'm doing something wrong! Tell me to trim the inputs on my frontend, or to apply a default, or to fix my config file!

Silently sanitizing bad inputs allows the root cause to persist. The sanitization becomes a de facto standard, which must then be carried forward forever. It is a lossy transformation that makes the original input unknowable for downstream systems. It is impossible for maintainers to know, without reading the DTO code, whether and how a field is sanitized.

To be clear, I am directly rejecting the Robustness Principle. It tolerates bad actors in the system, and tolerance is an implicit blessing.

Ben's opinion: Avoid

Setter cascades 😒

Suppose the following example throws an exception on the third line:


Assert.AreEqual(bar, foo.Bar);
foo.Baz = input;
Assert.AreEqual(bar, foo.Bar);
		

How could that happen? The DTO was holding something in its left hand, we gave something to its right hand, and its left hand changed! Let's call this a cascade.

Cascading might be used to implement business logic like "when an order's status is set to Delivered, its DeliveryDate should be populated with today's date". I believe this logic belongs in the consumer code; when the consumer sets the status to Delivered, let it set the DeliveryDate on the next line.
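
A minimal sketch of the two approaches (the Order type and a Delivered status member are assumed for illustration):


public class Order
{
	public DateTime? DeliveryDate { get; set; }

	// Cascading setter: assigning Status silently changes DeliveryDate too
	public OrderStatusType Status
	{
		get => _status;
		set
		{
			_status = value;
			if (value == OrderStatusType.Delivered)
			{
				DeliveryDate = DateTime.Today;
			}
		}
	}
	private OrderStatusType _status;
}

// The consumer-side alternative keeps both assignments visible at the call site:
order.Status = OrderStatusType.Delivered;
order.DeliveryDate = DateTime.Today;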

As usual, my reasoning is the maintainer's perspective. When someone asks you, "when is DeliveryDate populated?" I want you to be able to answer by searching the code for "DeliveryDate". I want you to see the reference in the AccountApi.MarkAsDelivered() endpoint and have your answer. I don't want you to also have to search the code for all references to Order.Status, to find all the places the status might be set to Delivered, to trace all of those back to their callers.

Ben's opinion: Skip it if you can

Setter parsing 😡

Suppose we have a DTO with an enum property. Our deserializer should handle translating an incoming string to a value of the enum. But what if the sending system isn't using our enum, or isn't using our serializer settings, and the incoming value doesn't quite match to an enum value?

We could make our DTO more flexible, like this:


using System.Xml.Serialization;

public class Order
{
	public int Id { get; set; }

	[XmlIgnore]
	public OrderStatusType Status { get; set; }

	[XmlAttribute("Status")]
	public string StatusString
	{
		get
		{
			return _statusString;
		}
		set
		{
			var status = OrderStatus.Parse(value);
			if (status.HasValue)
			{
				Status = status.Value;
				_statusString = status.ToString();
			}
		}
	}
	private string _statusString;
}

public enum OrderStatusType
{
	Pending, Fulfilled, Cancelled
}

public static class OrderStatus
{
	public static OrderStatusType? Parse(string str)
	{
		var s = $"{str}".Trim().ToUpperInvariant();
		switch (s)
		{
			case "PENDING":
			case "PENDINGDELIVERY":
			case "P":
				return OrderStatusType.Pending;

			case "FULFILLED":
			case "DELIVERED":
			case "F":
				return OrderStatusType.Fulfilled;

			case "CANCELLED":
			case "CANCELED":
			case "DONOTFULFILL":
			case "C":
				return OrderStatusType.Cancelled;

			default:
				return null;
		}
	}
}
		

We have definitely gotten more flexible in one direction. We can "robustly" handle many different inputs. But we have also created a puzzle for our maintainers. And worse, we have set them a trap. How long before the next member needs to be added to this enum? How likely is it that the maintainer will miss the parse function, which may be in a different repository from where they are generating the new string?

I am not being theoretical. I am not proud of this, but I have lost days to bugs on my stories, due to exactly this cause.

Once again I will invoke the Robustness Principle, an example of good intentions leading to major pitfalls.

Please, be restrictive in what you accept. If you do choose to use enums, leave the parsing to the serializer. If you must parse it yourself, stick to Enum.Parse. If an external system sends non-conforming values, translate their values to ours at the system boundary.
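
For example, a boundary translation that leans on Enum.TryParse might look like this sketch (the incoming string and order variable are assumed):


// Accepts only the enum's own member names (case-insensitively here),
// so adding a member to OrderStatusType automatically updates parsing
if (!Enum.TryParse<OrderStatusType>(incoming, ignoreCase: true, out var status))
{
	throw new ArgumentException($"Unrecognized order status: '{incoming}'");
}
order.Status = status;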

Ben's opinion: Avoid if at all possible

Lazy getters 😒

It's possible to implement lazy properties on a DTO, with code like the following:


public class Foo
{
	public Bar Bar
	{
		get
		{
			if (_bar == null)
			{
				_bar = new Bar();
			}
			return _bar;
		}
	}
	private Bar _bar = null;
}
		

That example accomplishes approximately the same thing as this non-lazy code:


public class Foo
{
	public Bar Bar { get; } = new Bar();
}
		

The objects held by properties are usually not expensive to instantiate, so the simpler code usually wins on that basis alone.

What if instantiating that property is very expensive? Odds are good we will need it eventually, so why not use the simpler code to pay that cost up front?

What if you know that that property is very expensive, and you know that it usually won't be needed? Well I'd guess we're not looking at a DTO anymore... But even if the laziness is appropriate, I'd personally prefer code that's a little more explicit about its intentions, and a little less verbose:


public class Foo
{
	public Bar Bar
	{
		get
		{
			return _lazyBar.Value;
		}
	}
	private Lazy<Bar> _lazyBar = new Lazy<Bar>(() => new Bar());
}
		

Ben's opinion: Skip it

Custom collection types 😒

Prior to the introduction of generics in C#, it was common to create custom list-like classes. There is little reason to do so anymore. Between System.Collections.Generic and System.Linq, whatever logic might have lived in a custom class will fit just as nicely in consuming code.

Take this example of a FooCollection, that implements an indexer to search the collection by name:


class FooCollection : List<Foo>
{
	public Foo this[string name]
	{
		get
		{
			foreach (var foo in this)
			{
				if (foo.Name == name)
				{
					return foo;
				}
			}
			return null;
		}
	}
}

public class FooConsumer
{
	public void ProcessFoos(FooCollection foos, List<string> names)
	{
		foreach (var name in names)
		{
			var foo = foos[name];
			if (foo != null)
			{
				ProcessFoo(foo);
			}
		}
	}
}
		

Let's see that without the custom collection, using LINQ:


public class FooConsumer
{
	public void ProcessFoos(List<Foo> foos, List<string> names)
	{
		foreach (var name in names)
		{
			var foo = foos.FirstOrDefault(f => f.Name == name);
			if (foo != null)
			{
				ProcessFoo(foo);
			}
		}
	}
}
		

Using LINQ adds only a single line of code to the consumer, and it makes clear what the property formerly on FooCollection did. This version also brings to mind a quick refactoring to reduce the nesting level and skip a null check. Was the following version obvious to you when reading the FooCollection version?


public class FooConsumer
{
	public void ProcessFoos(List<Foo> foos, List<string> names)
	{
		var relevantFoos = foos.Where(f => names.Contains(f.Name));
		foreach (var foo in relevantFoos)
		{
			ProcessFoo(foo);
		}
	}
}
		

Ben's opinion: Skip it

Interfaces 😒

If a DTO contains only data, then the only thing an interface can do is describe that data. That's not useful for abstraction: a consumer using the DTO will already know they're using the DTO; there's no point in hiding it.

It's not useful for dependency injection in tests either, as the data will need to be provided in some object. Why not just use the DTO itself in your tests?

The only use I've seen for an interface on a DTO is a pattern like this:


public interface ICreatable
{
	DateTime CreateDate { get; }
}

public class Foo : ICreatable
{
	public int Bar { get; set; }
	public string Baz { get; set; }
	public DateTime CreateDate { get; set; }
}

public class Quux : ICreatable
{
	public int Fee { get; set; }
	public string Foe { get; set; }
	public DateTime CreateDate { get; set; }
}

public static class Extensions
{
	public static void AddCreateDateParam(this SqlCommand command, ICreatable obj)
	{
		command.Parameters.AddWithValue("@CreateDate", obj.CreateDate);
	}
}

public class MyDao
{
	public void UpdateFoo(Foo foo)
	{
		using (var command = new SqlCommand(_connection))
		{
			command.Parameters.AddWithValue("@Bar", foo.Bar);
			command.Parameters.AddWithValue("@Baz", foo.Baz);
			command.AddCreateDateParam(foo);
			command.ExecuteNonQuery();
		}
	}

	public void UpdateQuux(Quux quux)
	{
		using (var command = new SqlCommand(_connection))
		{
			command.Parameters.AddWithValue("@Fee", quux.Fee);
			command.Parameters.AddWithValue("@Foe", quux.Foe);
			command.AddCreateDateParam(quux);
			command.ExecuteNonQuery();
		}
	}
}
		

Compare that to the version where we skip the interface:


public class Foo
{
	public int Bar { get; set; }
	public string Baz { get; set; }
	public DateTime CreateDate { get; set; }
}

public class Quux
{
	public int Fee { get; set; }
	public string Foe { get; set; }
	public DateTime CreateDate { get; set; }
}

public class MyDao
{
	public void UpdateFoo(Foo foo)
	{
		using (var command = new SqlCommand(_connection))
		{
			command.Parameters.AddWithValue("@Bar", foo.Bar);
			command.Parameters.AddWithValue("@Baz", foo.Baz);
			command.Parameters.AddWithValue("@CreateDate", foo.CreateDate);
			command.ExecuteNonQuery();
		}
	}

	public void UpdateQuux(Quux quux)
	{
		using (var command = new SqlCommand(_connection))
		{
			command.Parameters.AddWithValue("@Fee", quux.Fee);
			command.Parameters.AddWithValue("@Foe", quux.Foe);
			command.Parameters.AddWithValue("@CreateDate", quux.CreateDate);
			command.ExecuteNonQuery();
		}
	}
}
		

This code is obviously simpler. We could expand on this example, with more classes implementing ICreatable. We could add more fields to ICreatable, perhaps CreatedBy, or even more. Eventually we would arrive at a point where the interface + extension method was a net savings in lines of code.

But the interface will never provide a net savings in complexity, in terms of what we require our maintainers to understand. So I say the explicit, inline method will always be a better choice.

Ben's opinion: Skip it

Constructors 😒

The downside of a DTO with { get; set; } on every property is that it cannot be made immutable. This problem can be solved with a constructor. For example this:


public class Foo
{
	public int Bar { get; set; }
	public string Baz { get; set; }
}
		

Might be made immutable like this:


public class Foo
{
	public int Bar { get; }
	public string Baz { get; }

	public Foo(int bar, string baz)
	{
		Bar = bar;
		Baz = baz;
	}
}
		

However:

Here's an example of an immutable DTO gone wrong:


public class Gree
{
	public int Alpha { get; }
	public int Beta { get; }
	public string Gamma { get; }
	public int Delta { get; }
	public decimal Epsilon { get; }
	public string Zeta { get; }
	public int Eta { get; }
	public int Theta { get; }
	public int Iota { get; }
	public decimal Kappa { get; }
	public int Lambda { get; }
	public DateTime? Mu { get; }
	public DateTime? Nu { get; }
	public DateTime? Xi { get; }
	public int Omicron { get; }
	public Guid Pi { get; }
	public int? Rho { get; }
	public int Sigma { get; }
	public int? Tau { get; }
	public int Upsilon { get; } // Not in the constructor??
	public string Phi { get; }
	public DateTime Chi { get; }
	public string Psi { get; }
	public DateTime Omega { get; }

	public Gree(
		int alpha,
		int beta,
		string gamma,
		int delta,
		decimal epsilon,
		string zeta,
		int eta, // Never set??
		int theta,
		int iota,
		decimal kappa,
		DateTime? mu,
		DateTime? nu,
		DateTime? xi,
		int lambda, // Out of order??
		int omicron,
		Guid pi,
		int? rho,
		int sigma,
		int? tau,
		string phi,
		DateTime chi,
		string psi,
		DateTime omega
	)
	{
		Alpha = alpha;
		Beta = beta;
		Gamma = gamma;
		Delta = delta;
		Epsilon = epsilon;
		Zeta = zeta;
		Theta = theta;
		Iota = iota;
		Kappa = kappa;
		Mu = mu;
		Nu = nu;
		Xi = xi;
		Lambda = lambda;
		Omicron = omicron;
		Pi = pi;
		Rho = rho;
		Sigma = sigma;
		Tau = tau;
		Phi = phi;
		Chi = chi;
		Psi = psi;
		Omega = omega;
	}
}
		

Gross, right? I think we're better off using a strong convention of "you shouldn't modify DTOs" than using constructors to try to ensure "you can't modify this DTO".

Ben's opinion: Skip it

Loggers 😠

Loggers are great for showing errors, unexpected input or results, program flow, and diagnostic information. But:
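
A sketch of what this looks like in practice (ILogger stands in for whatever logging abstraction is at hand):


public class Foo
{
	private readonly ILogger _logger;

	// The logger dependency means Foo can no longer be created by a
	// deserializer, an object initializer, or a one-line test setup
	public Foo(ILogger logger)
	{
		_logger = logger;
	}

	public int Bar { get; set; }
}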

Ben's opinion: Skip it


FAQ

What about Records?

Records are an exciting feature introduced by C# 9. They are ideal for building minimal DTOs. They have concise syntax for declaration, and much of the functionality that could make a DTO "fat" is built-in. For example, every Record automatically implements "nice" versions of Equals, GetHashCode, and ToString. They look "nice" in the Visual Studio debugger, too (with no extra mouse movement).
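
For example, a single line of C# 9 (names illustrative) declares an immutable DTO with all of that built in:


public record Foo(int Bar, string Baz);

// In consuming code:
var a = new Foo(123, "abc");
var b = new Foo(123, "abc");
Console.WriteLine(a == b); // True
Console.WriteLine(a);      // Foo { Bar = 123, Baz = abc }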

I am really looking forward to these. When functionality like that is built-in, it will be unsurprising to consumers and maintainers. And it will come at no maintenance cost! Unfortunately I can't use them at my job yet. We're still, for the most part, stuck with C# 7.

Are you telling me to go rewrite all of my DTOs to be more minimal?

First of all, I'm not your boss. I can't tell you to do anything.

Secondly, even if I were your boss, I wouldn't want to mandate a certain code style. Not at this level of detail, at least.

Thirdly, rewrites for their own sake are a luxury for those with an overabundance of free time. I'm not burdened with that much time myself, so I can't presume you are either.

But. I hope I've been persuasive enough that, for the next DTO you write, you'll choose to leave the business logic to the consumers. Hopefully you'll even be able to do that without any guilt for offloading that complexity to them.

Why did you set up such obvious strawman examples?

There were no strawmen. All of the examples on this page are taken from code that I really had to (or have to) maintain. If an example seems particularly verbose or overwrought, that's one of the ones I trimmed down for clarity. But before you judge too harshly, please bear in mind that some of the features we take for granted (like LINQ, and generics) were not always available in C#.

Come to that, the majority of the points I make on this page are going to look very silly to anyone who starts their .NET career today, with all the conveniences of C# 10. "Why did Ben spend so much of his life writing about details that don't matter," they will ask. "Why not spend those precious hours with his friends, or his cats, or just reading a book instead? Especially since he made it so long that there's no way anyone will ever read the whole thing." Those are great questions, anonymous theoretical Zoomer. I wish I had answers.

Do you have secret ulterior motives for preferring minimal DTOs?

Yep.

Do you expect me to be convinced by your slippery slope arguments?

I know that "X leads to Y, and Y is bad, so avoid X" is not the strongest rhetorical device. But I need you to understand that I'm talking to you from the bottom of the slope. I have personally had to deal with the consequences of these patterns too many times to just shrug and hope they won't happen next time.

Why do you care how I write my DTOs? We don't even work together.

I don't care. But I truly and sincerely believe that your maintainers will.

I truly believe that the separation of data from business logic is a boon to those reading the code after the fact. I truly believe that practices like inlining an OrderBy lambda do not add undue noise to the business logic code. I truly believe that asking maintainers to intuit the behavior of a "sanitizing" setter is asking too much - especially because if they guess wrong, the actual behavior may be in a library in a different repository that they cannot find and may not even have access to.

We know that code is read far more often than it is written. We know that code is far easier to understand as you're writing it than when you're reading it. So please: make your code as plain, as obvious, as unsurprising, as explicit as possible.