Email is one of the very few truly ancient computer protocols that have survived into the present day. Even though nearly everyone uses it nearly every day, surprisingly few people actually understand how it works under the hood. To use anything effectively, one needs to understand how it works and what it does. This is my attempt to unravel email.

Because email is terribly old and fundamentally decentralized, it helps to look at its development over time and at how that history affects the way it is used today. Given its widespread use, it's pretty certain not to go anywhere any time soon, so a good understanding of email should remain useful well into the future.

The RFCs

The core of how email works is described in so-called 'Requests for Comments' (RFCs), which really is a weird way of saying 'standards', but that's what they have eventually become. The RFC series is essentially a collection of memos written by the people of the Internet Engineering Task Force, which is one of the bodies that write recommendations on how to interconnect computers. On that note, I highly recommend you read RFC1149.

Here it is important to note that these documents do not describe specific programs (so-called 'implementations'), but rather how different programs on different computers can work together (by communicating with each other) to form a distributed system (like the internet or email). It is then up to developers to write such programs. Some RFCs act as definitions of the languages used for that communication. Other RFCs describe how such distributed systems should be set up: they define what kinds of programs should exist and what responsibilities each has within the larger distributed system. They may also describe the purpose of such a distributed system and its features and capabilities. This is all to say that one should not read RFCs like software manuals, but rather as architecture descriptions, overviews and protocol definitions.

I think that the [rfc-editor](https://www.rfc-editor.org) makes for a nicer experience than requesting the plain RFC directly from the IETF.
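Since I refer to RFCs by names like 'RFC1149' throughout, here is a rough sketch of how such a reference could be turned into a link on the rfc-editor site. The helper name `rfc_url` is made up for this post, and the URL layout (`/rfc/rfcNNNN.txt`) is an assumption based on how the site serves plain-text RFCs at the time of writing.

```python
import re

# Assumed base path for plain-text RFCs on the rfc-editor site.
RFC_EDITOR_BASE = "https://www.rfc-editor.org/rfc"


def rfc_url(reference: str) -> str:
    """Turn a reference such as 'RFC1149' or 'rfc 524' into a URL (hypothetical helper)."""
    # Accept an optional 'RFC' prefix, optional spaces and leading zeros.
    match = re.fullmatch(r"\s*(?:rfc)?\s*0*(\d+)\s*", reference, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not an RFC reference: {reference!r}")
    return f"{RFC_EDITOR_BASE}/rfc{match.group(1)}.txt"


if __name__ == "__main__":
    for name in ("RFC1149", "rfc 524", "RFC0561"):
        print(name, "->", rfc_url(name))
```

The function only builds the address; fetching the text is left to a browser or any HTTP client.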

A simple search for 'Email' in the RFC index gives 73 results, but on closer inspection this turns out to be a bit misleading. It's misleading because these are only the RFCs that talk about the email itself (that is, how to represent an email), not about the architecture of the distributed system. They do touch on aspects of the architecture, but they focus mostly on filtering out spam and on the internationalization of email. The RFC numbers also show that these RFCs were all written after email had become widespread.

So to understand email, I'll try to trace its historic evolution through the different RFCs.

One of the first RFCs on the specific topic of what is now referred to as email is RFC524. In it, the author proposes to introduce an XMAIL command space inside the FTP command space (the MAIL command space was already taken). However, RFC524 references earlier work done on a more ad hoc basis.

Command spaces are easier to understand when looking at what was common during the early 70s. To connect to a computer, a 'terminal' comprising an electronic typewriter and a keyboard was used to send individual characters to the computer. A program on the computer (the shell, for locally connected users) would then handle the input from that typewriter. What ARPANET essentially did was provide each of two connected computers with a keyboard and a typewriter attached to the other. So a command can literally be thought of as a computer typing a sequence of ASCII characters on its virtual typewriter. Commands may therefore be context sensitive (i.e. only available once a command space has been entered by issuing a previous context-switching command). The way commands are transferred from one host to the other is based on the well-known TELNET protocol proposed in RFC318, which effectively describes the above in far more technical detail.
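To make the idea of context-sensitive commands a bit more concrete, here is a toy sketch of an interpreter that receives typed lines and only accepts certain commands once a command space has been entered. All command names (`ENTER-MAIL`, `TO`, `TEXT`, `LEAVE`, `QUIT`) and the reply strings are invented for illustration; they are not taken from RFC524, FTP or TELNET.

```python
class CommandInterpreter:
    """Toy line-based interpreter with a top-level space and a nested 'mail' space."""

    # Commands available in each command space (all names are hypothetical).
    SPACES = {
        "top": {"ENTER-MAIL", "QUIT"},
        "mail": {"TO", "TEXT", "LEAVE"},
    }

    def __init__(self) -> None:
        # Every session starts in the top-level command space.
        self.space = "top"

    def handle(self, line: str) -> str:
        """Process one typed line and return the reply that would be typed back."""
        parts = line.strip().split(" ", 1)
        command = parts[0].upper()
        argument = parts[1] if len(parts) > 1 else ""

        # A command is only valid inside the space that defines it.
        if command not in self.SPACES[self.space]:
            return f"ERROR {command} not available in {self.space} space"

        # Context-switching commands change which commands are valid next.
        if command == "ENTER-MAIL":
            self.space = "mail"
            return "OK now in mail space"
        if command == "LEAVE":
            self.space = "top"
            return "OK back in top space"

        # Everything else is simply acknowledged in this sketch.
        return f"OK {command} {argument}".rstrip()


if __name__ == "__main__":
    interpreter = CommandInterpreter()
    for typed in ("TO alice", "ENTER-MAIL", "TO alice", "LEAVE", "TO alice"):
        print(f"{typed!r:14} -> {interpreter.handle(typed)}")
```

The same line, `TO alice`, is rejected before `ENTER-MAIL` and after `LEAVE`, but accepted in between. That is all 'context sensitive' means here.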

Another early RFC is RFC724, which simply sets out to describe how computers on the ARPANET in the US should communicate with each other. The CAHCOM (Committee on Computer-Aided Human Communication) is constituted and tasked with defining a header (an envelope of sorts) for all ARPANET 'messages', to unify them across the whole of ARPANET.

Just as context, the ARPANET in 1977 was probably not much more than a hundred or so computers connected together, as at the time 'computer' meant a huge mainframe (terribly slow compared to the ones we have today) owned by a university. The goal was to unify the 'postal systems' that had developed between different university mainframes and that, as a result, all used similar but incompatible message formats. The unified system is called the ARPA Network "mail" service.

These 'pre-mail' mail systems actually used FTP to transport files over the network.

These files simply contained the messages, but their contents were arbitrary and had no standardized format until RFC561 proposed one (which looks very similar to the one still in use today).
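To give a feel for that kind of format, here is a small sketch that parses a message made of 'Name: value' header lines, a blank line, and then the text. The sample message and its addresses are made up, and the code uses Python's standard `email` package, which follows today's format rather than the exact RFC561 proposal.

```python
from email import message_from_string

# A made-up message: header lines, one blank line, then the body.
RAW_MESSAGE = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Date: Tue, 24 Jul 2018 15:27:00 +0000\n"
    "Subject: Lunch\n"
    "\n"
    "Shall we meet at noon?\n"
)


def summarize(raw: str) -> dict:
    """Split a raw message into its headers and its body text."""
    message = message_from_string(raw)
    return {
        # items() yields (name, value) pairs in their original order.
        "headers": dict(message.items()),
        # For a simple non-multipart message the payload is the body string.
        "body": message.get_payload(),
    }


if __name__ == "__main__":
    summary = summarize(RAW_MESSAGE)
    for name, value in summary["headers"].items():
        print(f"{name:8} = {value}")
    print("body     =", summary["body"].strip())
```

The point is only that the file has a predictable shape: once every system agrees on it, any program can pick out the sender or the subject without guessing.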

The RFCs relevant for email are (in no particular order):

  • RFC5598 - This one talks about the architecture of E-mail
  • RFC3297 - This one talks about MIME type
  • RFC3598 - spam filtering
  • RFC3685 - more spam filtering
  • RFC4870 - email authentication (making sure the mail came from where it says it came from)
  • RFC4952 - making email more international (allowing languages other than English, etc.)
  • RFC
  • RFC5228 - specifies a language for filtering emails
  • RFC5335 - internationalized email headers
  • RFC5336 - internationalized email addresses
  • RFC5442 - Introduces a thing called LEMONADE for mobile email
  • RFC5703 - spam filtering based on MIME information
  • RFC6530 - Internationalization framework (6531 and 6532 expand on some details)