Missing Salamanders: Matrix Media can be decrypted to multiple valid plaintexts using different keys

End-to-end encryption in chat protocols is a complex topic with lots of pitfalls.

One such pitfall is failing to verify that a ciphertext is meant to be decrypted with a specific key, or that a ciphertext is meant to decrypt to a specific plaintext, especially when an attacker can specify the key.

This post is about how Matrix does not perform authenticated encryption at all for media sent in end-to-end encrypted chats.

Timeline

I am unsure when I discovered this issue.
Around 2024-05-15: I took another look at the security issues known to me in Matrix and constructed an initial PoC.
2024-05-18: I sent security@matrix.org a report
2024-07-13: I announced the disclosure date to security@matrix.org.
2024-08-02: The security team recommended publishing the findings and filing an issue against the spec repository.
2024-08-18: This article goes live.

How encrypted media is stored in Matrix

Matrix stores encrypted media as just the plain ciphertext, with the message specifying the URI, encryption key, encryption method, and file hash.
That means that the ciphertext is completely detached from the encryption parameters, and could be swapped out without others being able to easily tell.

The fields of main concern in this context are the version field and the hashes field.

The cryptographic algorithms that are intended to be used are AES256-CTR with a SHA256 hash for the ciphertext. Given that it is plain SHA256, you can’t store that with the ciphertext, as otherwise the server could modify it. My assumption here is that they are intending to use this as a MAC, as this value is intended to be protected by megolm/olm, where you couldn’t verify a message without the “authentication key” [OLM Message Key].

Gripes with the spec

The spec for encrypted attachments is Section 10.12.1.7 of the client-server spec, which does intend to define what I have described above. However, it sometimes seems to use “should” for absolute requirements, and while it never defers to RFC2119 or RFC8174, it uses RFC2119 keywords in some places.

UPDATE: 2024-08-27: I found out today that Matrix started adopting RFC2119 in its spec process on the 14th, 4 days before the post went live. This is not reflected in the latest version of the spec as of today.

For the rest of the article I am going to interpret “should” as RFC2119 “SHOULD”. This does lead to some fun consequences for clients, as that means they could:

hardcode the key
hardcode the IV
not randomly generate key/iv
Use other 256 bit ciphers with an IV, like AES256-CBC, instead of AES256-CTR for encryption

It is also valid for an implementation to start an asynchronous media upload, pass the MXC-URI, and never actually upload an encrypted file.

It is also unclear if the requirement of unique key/IV pairs refers to the iv property of the final object, the first 8 bytes of the iv property, or just the IV generated through the optional key/IV generation process. Two of these three interpretations allow for key/IV reuse.

For a cryptographic protocol, this needs to be clarified, as this kind of uncertainty can introduce security issues.

Additionally, while this protocol is versioned [good!], it duplicates the encryption parameters inside of the key field. Encoding redundant information is frowned upon, as this can lead to attacks caused by improper validation of data. Seemingly, the reason why this format was chosen was so that it could be directly fed into the WebCrypto APIs in browsers, which sounds like an anti-feature when almost every field of this is required to be set to specific values. That is not to mention that this isn’t a natural representation of AES keys outside of web browsers.^[1]

Also, as a side note, the spec doesn’t specify what kind of hash is to be used. You are required to provide a hash, but clients aren’t required to support any particular hashing algorithm. The recommended and only one that is specified is sha256. As such, the scheme may not actually provide the intended resistance against malicious modification, since crc16, parity, or const [] are hashing algorithms too.

Invisible Salamanders: Decrypting Authenticated Messages with multiple keys

In the paper “Fast Message Franking: From Invisible Salamanders to Encryptment” a novel attack on AES-GCM was published, where an attacker could construct a ciphertext-auth tag pair that decrypts to two (or more!) separate plaintexts.

The reason that this is possible is that GMAC, the MAC used by AES-GCM, is not a cryptographic hash function, and as such it is possible to construct hash collisions in a reasonable timeframe.

AES-GCM of course isn’t alone in this. Poly1305 is similarly not a cryptographic hash function and you can construct similar collisions in it.

The property these two authenticated encryption algorithms lack is commitment: they are neither message- nor key-committing. As such, it is impossible to verify whether a plaintext is the intended plaintext for a given (ciphertext, auth tag) pair, or whether a key k corresponds to a (ciphertext, auth tag) pair.

One way you could mitigate this is to use a cryptographic MAC instead, like HMAC-SHA256, or HMAC-BLAKE2b. This will commit the message which would be sufficient in this context.

Alternatively, you could generate the encryption key, authentication key, and a special ”key commitment” value from the secret key, which would provide key-commitment.
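A minimal sketch of that second option, assuming HMAC-SHA256 with distinct labels as the derivation function. The labels and the names `derive_keys` and `check_commitment` are made up for illustration and are not taken from any spec.

```python
import hashlib
import hmac


def derive_keys(secret_key: bytes) -> tuple[bytes, bytes, bytes]:
    """Derive an encryption key, an authentication key and a key commitment.

    Assumption: HMAC-SHA256 keyed with the secret key, with one hypothetical
    label per output, stands in for whatever KDF a real scheme would use.
    """
    def derive(label: bytes) -> bytes:
        return hmac.new(secret_key, label, hashlib.sha256).digest()

    return (
        derive(b"encryption key"),
        derive(b"authentication key"),
        derive(b"key commitment"),
    )


def check_commitment(secret_key: bytes, commitment: bytes) -> bool:
    """Return True only if the commitment was derived from this secret key."""
    expected = derive_keys(secret_key)[2]
    return hmac.compare_digest(expected, commitment)
```

The commitment value would be stored next to the ciphertext. A recipient holding a different key fails `check_commitment` before any decryption happens.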

Charlotte Raccoon sitting in front of a laptop wearing sunglasses, typing furiously.

I looked into whether Matrix had already solved this problem, and they got close.
Megolm messages are MAC-ed with HMAC-SHA256. The issue is that the MAC is truncated to 8 bytes, which can be brute-forced. I am not sure *why* they did this, as being able to force collisions would defeat the purpose of including a MAC in this position, especially since the entire ciphertext is also signed.
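To get a feeling for why truncation matters, here is a small sketch that searches for two messages with the same truncated HMAC-SHA256 tag under a known key. It uses a 2-byte tag so it finishes instantly; the same birthday search against an 8-byte tag needs on the order of $2^{32}$ tags. The message format and function names are hypothetical, and this is not an attack on Megolm itself.

```python
import hashlib
import hmac
from itertools import count


def truncated_mac(key: bytes, message: bytes, tag_len: int) -> bytes:
    """HMAC-SHA256, truncated to the first tag_len bytes."""
    return hmac.new(key, message, hashlib.sha256).digest()[:tag_len]


def find_collision(key: bytes, tag_len: int) -> tuple[bytes, bytes]:
    """Birthday search for two distinct messages with the same truncated tag."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        message = b"message %d" % i
        tag = truncated_mac(key, message, tag_len)
        if tag in seen:
            # An earlier, different message already produced this tag.
            return seen[tag], message
        seen[tag] = message
```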

Missing Salamanders: Decrypting Unauthenticated Messages with multiple keys

Of course, the Invisible Salamanders attack affects authenticated encryption. So how does Matrix’s media encryption fare?

The hashes field only takes the ciphertext into account. As long as the ciphertext is the same, any key and IV are accepted. That means that for every ciphertext there are $2^{384}$ valid key/IV pairs (a 256-bit key and a 128-bit IV), resulting in up to $2^{384}$ different plaintexts.
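The following sketch illustrates the problem. To stay within the standard library it uses a toy counter-mode stream cipher built from SHA-256 as a stand-in for AES-CTR, so the function names and the cipher itself are assumptions. The point carries over: the hash covers only the ciphertext, so it cannot tell keys apart.

```python
import base64
import hashlib


def toy_ctr_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy counter-mode keystream from SHA-256 (a stand-in, NOT AES-CTR)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_ctr_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing the data with the keystream."""
    keystream = toy_ctr_keystream(key, iv, len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))


def ciphertext_hash_ok(ciphertext: bytes, expected_b64: str) -> bool:
    """The only integrity check: unpadded base64 SHA-256 of the ciphertext."""
    digest = hashlib.sha256(ciphertext).digest()
    return base64.b64encode(digest).decode().rstrip("=") == expected_b64


key_a, key_b = bytes(32), b"\x01" * 32
iv = bytes(16)
ciphertext = toy_ctr_xor(key_a, iv, b"a perfectly harmless file")
sha256_b64 = base64.b64encode(hashlib.sha256(ciphertext).digest()).decode().rstrip("=")

# The hash check passes regardless of which key the recipient was given.
plaintext_a = toy_ctr_xor(key_a, iv, ciphertext)
plaintext_b = toy_ctr_xor(key_b, iv, ciphertext)
```

Both decryptions are "valid" as far as the recipient can tell; only `plaintext_a` is the file that was originally encrypted.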

This isn’t particularly surprising if you think about how this is constructed, and also not particularly interesting unless we can create a useful example file that exploits this.

Preparing the Proof of Concept

Different file formats naturally have different byte patterns in different locations. One fun consequence of this is that you can create an amalgamation of multiple different file formats that effectively contains multiple different files per file.

The trivial case of it is if these file formats all happen to just be a zip file with a special suffix. This isn’t particularly interesting, as you can just create such a zip file by extracting multiple of these archives into the same directory and create a new zip file with all of the extracted files as contents.

It gets a lot more interesting with non-zip file formats, as there such a merge is not a well defined or meaningful operation.

Unlike with the plaintext version of these kinds of files, in this case it is possible to select files that overlap in very small chunks [on the order of at most 8 bytes], as you can brute-force a ciphertext and two keys that result in two specific byte patterns appearing after decryption. I decided not to do this, as it would be time consuming, especially when doing trial and error.

Suitable file types

PNG is a format that starts at byte offset 0 in the file, with the magic bytes 89 50 4e 47 0d 0a 1a 0a. As this is a common file format, this would be our first format to use.

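The second is ZIP, which is read starting from the end of the file: the end-of-central-directory record there has a magic of 50 4b 05 06 and points at the central directory, whose entries have a magic of 50 4b 01 02. Also, all of its internal offsets are relative, so you can prepend any data to the front. This would be our second format.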

JFIF, commonly known as JPEG, starts with the bytes FF D8. For some decoders, this doesn’t have to be at the beginning of the file, however it must be the first FF in the file.
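As a rough sketch of how a lenient parser might recognise each of the three formats: the PNG signature and the JPEG marker are the bytes given above, while the ZIP check looks for the end-of-central-directory signature (50 4b 05 06) that readers search for near the end of the file. The function names are made up, and real decoders check far more than this.

```python
PNG_SIGNATURE = bytes.fromhex("89504e470d0a1a0a")
JPEG_SOI = bytes.fromhex("ffd8")
ZIP_EOCD = b"PK\x05\x06"  # end-of-central-directory record signature


def looks_like_png(data: bytes) -> bool:
    """PNG must start at byte offset 0."""
    return data.startswith(PNG_SIGNATURE)


def looks_like_lenient_jpeg(data: bytes) -> bool:
    """True if the first FF byte in the data begins the FF D8 marker."""
    first_ff = data.find(b"\xff")
    return first_ff != -1 and data[first_ff:first_ff + 2] == JPEG_SOI


def looks_like_zip(data: bytes) -> bool:
    """Search for the record within the last 22 + 65535 bytes of the file."""
    return data.rfind(ZIP_EOCD, max(0, len(data) - 65557)) != -1
```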

JPEG File

I settled for this sample image, as this showcases the features of the format quite nicely:

Charlotte Raccoon washing cotton candy in a water bowl, with massive jpeg artifacting
Lineart by PulexArt

Due to the way I plan on constructing the file, there is no need to prepare it in any special way.

PNG File

PNGs use multiple chunks to define both the internal metadata and the place the actual image data is stored. These chunks start with the length of the chunk, followed by the chunk name, then length bytes of data, and then a CRC32 checksum for the entire chunk (excluding length).
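The chunk layout can be sketched in a few lines of Python. This is an illustrative parser, not a full PNG reader; `make_chunk` and `iter_chunks` are names chosen for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one chunk: length, type, data, CRC32 over type + data."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data, crc_ok) for every chunk after the signature."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    offset = len(PNG_SIGNATURE)
    while offset + 12 <= len(png):
        (length,) = struct.unpack_from(">I", png, offset)
        if offset + 12 + length > len(png):
            raise ValueError("truncated chunk")
        chunk_type = png[offset + 4:offset + 8]
        data = png[offset + 8:offset + 8 + length]
        (stored_crc,) = struct.unpack_from(">I", png, offset + 8 + length)
        # The CRC covers the chunk type and data, but not the length field.
        yield chunk_type, data, zlib.crc32(chunk_type + data) == stored_crc
        offset += 12 + length
```

With a 13-byte IHDR payload, the signature plus the IHDR chunk take up 8 + 4 + 4 + 13 + 4 = 33 = 0x21 bytes.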

After the PNG signature, the first chunk is the IHDR chunk, which defines metadata necessary to decode a PNG, such as its size, pixel format, and encoding options.

There are several types of chunks, but the standard chunks all have formats that a typical PNG reader will probably verify. Thankfully, PNG lets you create custom chunks following the naming conventions of PNG chunks. How a chunk is to be interpreted by a parser depends on the casing of each letter in the chunk name.

In the aforementioned IHDR chunk, all characters are uppercase, meaning this is a chunk that is critical, defined by the PNG standard, valid in current version PNGs, and unsafe to just copy by editors. Following this example, I chose to name my chunk jfIF, which means it is ancillary and custom.
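The casing rules can be decoded mechanically, since the case of an ASCII letter is bit 5 of its byte value. A small sketch, with the function name and dictionary keys chosen for this example:

```python
def chunk_properties(chunk_type: bytes) -> dict[str, bool]:
    """Decode the four property bits from the case of each letter (bit 5)."""
    if len(chunk_type) != 4 or not chunk_type.isalpha():
        raise ValueError("chunk type must be four ASCII letters")
    lowercase = [bool(byte & 0x20) for byte in chunk_type]
    return {
        "ancillary": lowercase[0],     # uppercase means critical
        "private": lowercase[1],       # uppercase means defined by the PNG standard
        "reserved": lowercase[2],      # must be uppercase in current PNG versions
        "safe_to_copy": lowercase[3],  # uppercase means unsafe to copy
    }
```

For `b"jfIF"` this reports an ancillary, private chunk with the reserved bit unset that is not safe to copy.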

As the ~~victim~~ test subject, I chose this image:

A plush of Charlotte Raccoon with a clueless look looking at the reader with tongue sticking out. French text reading “ceci n’est pas un graphisme portable de réseau” is overlaid on the image ^[2]
Sticker by Sammy the Tanuki

Now to combine the PNG with the JPEG, we just splice the files together. For that I used the following python script:

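import struct

# Read the original PNG and the JPEG that gets embedded into it.
with open("not-png-orig.png", "rb") as f:
    png_data = f.read()

with open("washingFood.jpg", "rb") as f:
    jpeg_data = f.read()

with open("not-png.png", "wb") as f:
    # 0x21 bytes = 8 byte PNG signature + 25 byte IHDR chunk
    f.write(png_data[:0x21])
    # Custom jfIF chunk: big-endian length, chunk name, then the JPEG as data
    f.write(struct.pack(">I", len(jpeg_data)))
    f.write(b"jfIF")
    f.write(jpeg_data)
    # Placeholder for the CRC32, which gets fixed up later
    f.write(bytes(4))
    # The remaining chunks of the original PNG
    f.write(png_data[0x21:])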

The checksum is incorrect, but we will deal with it later, as this depends on the encryption keys we choose to use.

ZIP File

ZIP files are much simpler to combine with other formats, as they start from the end of the file. As such it is usually possible to just concatenate any other file and a zip file and get a hybrid file.
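A quick way to convince yourself of this is Python's `zipfile` module, which tolerates arbitrary data in front of an archive. The helper names and file names below are made up for the example, and other archivers may be stricter.

```python
import io
import zipfile


def make_zip(name: str, content: bytes) -> bytes:
    """Create an in-memory ZIP archive containing a single file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(name, content)
    return buffer.getvalue()


def read_from_hybrid(prefix: bytes, zip_bytes: bytes, name: str) -> bytes:
    """Concatenate prefix + archive and read a member back out of the result."""
    hybrid = prefix + zip_bytes
    with zipfile.ZipFile(io.BytesIO(hybrid)) as archive:
        return archive.read(name)
```

Calling `read_from_hybrid` with the bytes of any other file as `prefix` still returns the archived content, because the reader locates the archive from the end of the data.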

The sample I chose is the EICAR antivirus test file, stored inside of a zip.

After concatenating, this is the resulting hybrid file we are going to use.

Encryption

Now that we have a hybrid file, we can encrypt it with AES-CTR. For simplicity, I will use a key of all zeros, and a nonce of 0, 1, and 2 for the JPEG, PNG, and ZIP respectively.

This does reuse nonces, but you can definitely interpret the spec to allow for that.

I generate the 3 files with:

openssl aes-256-ctr -K 0000000000000000000000000000000000000000000000000000000000000000 -iv 00000000000000000000000000000000 -in hybrid.zip -out hybrid-1.bin
openssl aes-256-ctr -K 0000000000000000000000000000000000000000000000000000000000000000 -iv 00000000000000000000000000000001 -in hybrid.zip -out hybrid-2.bin
openssl aes-256-ctr -K 0000000000000000000000000000000000000000000000000000000000000000 -iv 00000000000000000000000000000002 -in hybrid.zip -out hybrid-3.bin

Resulting in three separate files.

Then I splice the files together based on file indexes from the original file. I didn’t bother doing so systematically. This results in an almost-final spliced file.
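The splicing step could be done systematically along these lines. CTR mode encrypts each byte independently of its neighbours, so a byte range copied from one encryption still decrypts correctly under that encryption's key and IV. The `splice` function and its range format are hypothetical, and the actual offsets depend on the files involved.

```python
def splice(variants: list[bytes], ranges: list[tuple[int, int, int]]) -> bytes:
    """Build one ciphertext from several equal-length encryptions of a file.

    Each entry of ranges is (start, end, variant_index): the bytes from start
    up to (but excluding) end are copied from variants[variant_index].
    """
    length = len(variants[0])
    if any(len(variant) != length for variant in variants):
        raise ValueError("all variants must have the same length")
    out = bytearray(length)
    covered = 0
    for start, end, index in sorted(ranges):
        if start != covered or not start < end <= length:
            raise ValueError("ranges must tile the file without gaps or overlaps")
        out[start:end] = variants[index][start:end]
        covered = end
    if covered != length:
        raise ValueError("ranges must cover the whole file")
    return bytes(out)
```

Each region of the result decrypts to the intended bytes only under the key/IV pair it was taken from, and to noise under the others.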

Now I need to fix the PNG CRC. For that I decrypted the spliced file with the PNG key. Thankfully, someone created a tool for exactly this, intended for use in fuzzers. This even keeps the trailing data!
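If you don't have such a tool at hand, fixing a single chunk's CRC is simple enough to sketch by hand. This is a stand-in for the tool mentioned above, not its actual implementation; `fix_chunk_crc` is a made-up name, and the offset 0x21 in the usage note is where the earlier splicing script placed the jfIF chunk.

```python
import struct
import zlib


def fix_chunk_crc(png: bytes, chunk_offset: int) -> bytes:
    """Recompute the CRC32 of the chunk that starts at chunk_offset."""
    (length,) = struct.unpack_from(">I", png, chunk_offset)
    body_start = chunk_offset + 4         # the chunk type begins here
    crc_offset = body_start + 4 + length  # the CRC follows type + data
    if crc_offset + 4 > len(png):
        raise ValueError("chunk extends past the end of the file")
    crc = zlib.crc32(png[body_start:crc_offset])  # type + data, length excluded
    # Everything after the CRC, including trailing non-PNG data, is kept as-is.
    return png[:crc_offset] + struct.pack(">I", crc) + png[crc_offset + 4:]
```

Applied to the file as decrypted with the PNG key, `fix_chunk_crc(decrypted, 0x21)` would make the jfIF chunk pass a CRC check.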

Finally, I can re-encrypt the fixed file with the same key yet again, resulting in the final hybrid.bin.

The Proof of Concept

I uploaded the file constructed in the previous section to an unencrypted chat. Technically you can just upload it to the API directly, but this was easier and faster. The MXC URI of the uploaded file is mxc://matrix.chir.rs/b32beebfa9d6d10a4167605dd2a604125d4cd53a1816818523461124096.

The messages sharing the same file are as follows:

{
  "body": "hybrid.png",
  "file": {
    "hashes": {
      "sha256": "VYel1Xrqyq6DEXvfOmPi2+gGcqum+LQgk0L0Vj3h0eM"
    },
    "iv": "AAAAAAAAAAAAAAAAAAAAAQ",
    "key": {
      "alg": "A256CTR",
      "ext": true,
      "k": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
      "key_ops": [
        "encrypt",
        "decrypt"
      ],
      "kty": "oct"
    },
    "url": "mxc://matrix.chir.rs/b32beebfa9d6d10a4167605dd2a604125d4cd53a1816818523461124096",
    "v": "v2"
  },
  "info": {
    "h": 512,
    "w": 512,
    "mimetype": "image/png",
    "size": 109899
  },
  "m.mentions": {},
  "msgtype": "m.image"
}

{
  "body": "hybrid.jpg",
  "file": {
    "hashes": {
      "sha256": "VYel1Xrqyq6DEXvfOmPi2+gGcqum+LQgk0L0Vj3h0eM"
    },
    "iv": "AAAAAAAAAAAAAAAAAAAAAA",
    "key": {
      "alg": "A256CTR",
      "ext": true,
      "k": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
      "key_ops": [
        "encrypt",
        "decrypt"
      ],
      "kty": "oct"
    },
    "url": "mxc://matrix.chir.rs/b32beebfa9d6d10a4167605dd2a604125d4cd53a1816818523461124096",
    "v": "v2"
  },
  "info": {
    "h": 512,
    "w": 512,
    "mimetype": "image/jpg",
    "size": 109899
  },
  "m.mentions": {},
  "msgtype": "m.image"
}

{
  "body": "hybrid.zip",
  "file": {
    "hashes": {
      "sha256": "VYel1Xrqyq6DEXvfOmPi2+gGcqum+LQgk0L0Vj3h0eM"
    },
    "iv": "AAAAAAAAAAAAAAAAAAAAAg",
    "key": {
      "alg": "A256CTR",
      "ext": true,
      "k": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
      "key_ops": [
        "encrypt",
        "decrypt"             
      ],
      "kty": "oct"
    },
    "url": "mxc://matrix.chir.rs/b32beebfa9d6d10a4167605dd2a604125d4cd53a1816818523461124096",
    "v": "v2"
  },
  "info": {
    "mimetype": "application/x-zip",
    "size": 109899
  },
  "m.mentions": {},
  "msgtype": "m.file"
}

The differences between the JSON files are as follows:

All 3 have differing body text. This is usually the default filename when downloading a file in matrix clients.
All 3 have different MIME types.
All 3 have different IVs.
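The `iv` and `k` fields are unpadded base64, so decoding them shows how they line up with the `openssl` invocations. A small sketch, with `decode_unpadded_base64` being a helper name chosen here:

```python
import base64


def decode_unpadded_base64(value: str) -> bytes:
    """Decode unpadded base64 by restoring the stripped '=' padding."""
    return base64.b64decode(value + "=" * (-len(value) % 4))


# The iv fields of the three messages: 21 'A' characters plus one final character.
ivs = {
    "hybrid.jpg": "A" * 21 + "A",
    "hybrid.png": "A" * 21 + "Q",
    "hybrid.zip": "A" * 21 + "g",
}
counters = {
    name: int.from_bytes(decode_unpadded_base64(iv), "big")
    for name, iv in ivs.items()
}

# The k field: 43 'A' characters, which decode to 32 zero bytes.
key = decode_unpadded_base64("A" * 43)
```

The decoded IVs are the 128-bit integers 0, 1, and 2 for the JPEG, PNG, and ZIP messages, and the key is all zeros.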

There are some caveats however:

The JPEG doesn’t load correctly in most viewers. You need to display it in ffmpeg or something similar.
Some archivers, for example Ark on KDE, can’t open the ZIP file, even though it is a ZIP archive.

Original Proof of Concept

The PoC I sent to the matrix.org security team was quite different. I decided to create a file that would read “good” under one key and “EVIL” under another. I didn’t think that was interesting enough for a full article, so I changed my plans a bit.

Why this is a problem

The Invisible Salamanders paper goes into detail about this. It allows you to send a file to someone [for example a malicious file], and if requested by admins, send them the same file but this time with the “good” keys. It also lets you hide malicious content in plain sight, only visible to those who know a secret second key. While steganography will always be possible, detecting this would require an admin to be in possession of a secret 384 bit key that isn’t specified in the message.

Fixing this issue

In the section about invisible salamanders, I already posted a link about potential remediation for this issue. The v2 of this code could be adapted pretty directly, but would constitute a breaking change to end-to-end encrypted file support.

The key mentioned below is a uniformly distributed 32 byte (256 bits) cryptographic key. The key MUST be randomly generated using a CSPRNG and MUST only be used to encrypt a single file.

The EncryptedFile structure is replaced as follows:

| Parameter | Type | Description |
|---|---|---|
| `url` | `string` | REQUIRED: MXC URI of the encrypted file. |
| `key` | `string` | REQUIRED: A 256 bit key encoded as unpadded base64. The client MUST verify the length to be exactly 32 bytes after decoding. |
| `v` | `string` | REQUIRED: The literal value `"rs.chir.matrix.media.v3"`. The client MUST verify the version. |

This structure is not extensible. Additional fields MUST NOT be added without changing the version field.
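A client-side check for this structure could look roughly like the following. This is a sketch of the validation rules listed above, not a reference implementation; `parse_encrypted_file` and the example URL in the usage note are hypothetical.

```python
import base64
import binascii

VERSION = "rs.chir.matrix.media.v3"


def parse_encrypted_file(obj: dict) -> tuple[str, bytes]:
    """Validate the proposed EncryptedFile structure and return (url, key)."""
    # The structure is not extensible: exactly these three fields, no more.
    if set(obj) != {"url", "key", "v"}:
        raise ValueError("unexpected or missing fields")
    if obj["v"] != VERSION:
        raise ValueError("unsupported version")
    if not isinstance(obj["url"], str) or not obj["url"].startswith("mxc://"):
        raise ValueError("url is not an MXC URI")
    if not isinstance(obj["key"], str) or obj["key"].endswith("="):
        raise ValueError("key must be unpadded base64")
    try:
        padded = obj["key"] + "=" * (-len(obj["key"]) % 4)
        key = base64.b64decode(padded, validate=True)
    except binascii.Error as error:
        raise ValueError("key is not valid base64") from error
    if len(key) != 32:
        raise ValueError("key must be exactly 32 bytes")
    return obj["url"], key
```

For example, `parse_encrypted_file({"url": "mxc://example.org/abc", "key": "A" * 43, "v": VERSION})` returns the URL and a 32-byte all-zero key, while any extra field such as `iv` is rejected.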

The file is encrypted and decrypted using the method described in the aforementioned article, with two changes:

The version is changed from v2 to rs.chir.matrix.media.v3. The client MUST verify that the version number matches up between the EncryptedFile struct and the uploaded file.
The AAD is generated using Paseto’s Pre-authentication Encoding as follows: PAE([VERSION, nonce]). The rationale for this change is that JSON canonicalization is not a can of worms I am going to get into here.
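For reference, Paseto's Pre-Authentication Encoding is small enough to sketch in full. The `build_aad` helper is an assumption about how the two pieces would be passed in, and the nonce length is left to the underlying encryption method.

```python
import struct

VERSION = b"rs.chir.matrix.media.v3"


def le64(n: int) -> bytes:
    """Encode n as a little-endian 64-bit integer with the top bit cleared."""
    return struct.pack("<Q", n & 0x7FFFFFFFFFFFFFFF)


def pae(pieces: list[bytes]) -> bytes:
    """Pre-Authentication Encoding: piece count, then each length and piece."""
    out = le64(len(pieces))
    for piece in pieces:
        out += le64(len(piece)) + piece
    return out


def build_aad(nonce: bytes) -> bytes:
    """AAD for the proposed scheme: PAE([VERSION, nonce])."""
    return pae([VERSION, nonce])
```

Because every piece is length-prefixed, `pae([b"ab", b"c"])` and `pae([b"a", b"bc"])` produce different encodings, which is the ambiguity that plain concatenation would leave open.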

If this was standardized, the version would be v3 instead.

Going further: Deriving the key from the Megolm ratchet

With Media Access Authentication being adopted into the standard, and MSC3911 existing, it might be useful to cryptographically link uploaded files to the events they are sent with. The big difference from the above is that the key is generated from the per-message key, and that the version and AAD are different.

We then take the Megolm ratchet Ri at position i, and for each embedded file the file key is generated as follows:

Kf = HKDF(ikm = Ri, info = f"rs.chir.matrix.media.v4: {media_id} media key", salt = None, length = 32)

Where media_id is one of the following values:

| File ID | File Type |
|---|---|
| `rs.chir.matrix.media.v4.file` | The main file or image |
| `rs.chir.matrix.media.v4.thumbnail` | The thumbnail of the file |

If this was standardized, these would be m.file and m.thumbnail respectively.
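The derivation can be sketched with the standard library alone. The formula above does not name the hash function, so SHA-256 is an assumption here, as are the function names; the info string follows the formula.

```python
import hashlib
import hmac
from typing import Optional


def hkdf_sha256(ikm: bytes, info: bytes, length: int, salt: Optional[bytes] = None) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    if length > 255 * 32:
        raise ValueError("requested output is too long")
    # Extract: a missing salt is replaced by a string of zeros of hash length.
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output has been produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_file_key(ratchet_value: bytes, media_id: str) -> bytes:
    """Derive the 32-byte file key for one embedded file (hash is an assumption)."""
    info = f"rs.chir.matrix.media.v4: {media_id} media key".encode()
    return hkdf_sha256(ratchet_value, info, 32)
```

Since the media ID is part of the info string, the main file and its thumbnail get independent keys from the same ratchet value.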

The EncryptedFile structure is replaced as follows:

| Parameter | Type | Description |
|---|---|---|
| `url` | `string` | REQUIRED: MXC URI of the encrypted file. |
| `v` | `string` | REQUIRED: The literal value `"rs.chir.matrix.media.v4"`. The client MUST verify the version. |

This structure is not extensible. Additional fields MUST NOT be added without changing the version field.

This method would use the same method as above, but with the version field updated and the AAD of PAE([VERSION, nonce, media_id])

It would also be possible to add the event ID into the AAD, however this would significantly increase the complexity of the upload procedure, and also limit this scheme to files that can be encrypted and uploaded in about 30 seconds. This is because the client is unable to calculate the event ID of an event before sending it to the homeserver.

Closing Remarks

This is my second long-form vulnerability writeup, and there might be more coming in the future.

One of these proposals will probably be turned into an MSC in the future. If you have ideas for improvement, feel free to comment on the corresponding MSC. Once I have created it, I will add the link here.

The blog currently has no comment functionality, so in the meantime consider leaving a comment on the fediverse.

Special thanks to Soatok, without whom I would not have created this writeup.

If you liked this article and want to buy me a coffee, I have a Ko-Fi.

1. Most AES implementations do let you pass raw AES keys, but take a look at how JWK stores ECDSA keys. They are just raw X and Y coordinates. I have not seen an ECDSA implementation accept them as is. That is ignoring that these aren’t compressed points, so you have to take extra care to avoid invalid curve attacks.
2. I am aware that no French person would ever write PNG like this, but I don’t want to garner the wrath of the immortals.