Cryptanalysis of the Oceantoo cipher
I saw this video of creating an encryption algorithm on the channel Gary Explains:
Gary's video goes through creating a new symmetric crypto algorithm. He says multiple times that this is just an experiment and not to be used with anything important. This is because crypto is challenging, it can't be proven to be unbreakable. The best it can be is "no known exploits are published, but multiple experts have studied it". I am not a crypto expert, and neither is Gary by his own admission, but this is still useful to learn the challenges of designing a crypto algorithm.
From his video, I learned about the existing attacks against LFSRs and proposed countermeasures. I found that interesting and worth the time to watch the video.
To be fair to Gary, he may know all the points below and wasn't able to fit them in to a reasonable length video. This is meant to educate, not criticize.
The code for Oceantoo is at https://github.com/garyexplains/oceantoo
Let's take an example picture and encrypt it.
The Makefile I used to generate these pictures is below. The "gray" image format in ImageMagick is just the raw pixel data with no compression or metadata. I'm using it in 1 bit per pixel mode. The output looks reasonably random to me, there's no obvious patterns that I can see.
SIZE=400x400
DRAW="rectangle 50,50 350,350; rectangle 100,100 300,300; rectangle 150,150 250,250"
DEPTH=1
image: example.png example.enc.png
example.enc.png: example.enc
convert -size $(SIZE) -depth $(DEPTH) gray:example.enc example.enc.png
example.enc: example.gray
../oceantoo -p "some random password" example.gray example.enc
example.png: example.gray
convert -size $(SIZE) -depth $(DEPTH) example.gray example.png
example.gray: Makefile
convert -size $(SIZE) -depth $(DEPTH) canvas:white -fill none -stroke black -strokewidth 10 -draw $(DRAW) example.gray
For comparison, this cipher's output looks much better than ECB mode:
So that's a good start. Let's dig into some possible problems.
The first problem is the lack of an initialization vector. There's a workaround by giving a random sequence offset, but relying on users to remember to never use overlapping sequences is asking a lot of them.
The second problem is passwords. The average user picks terrible passwords and reuses them all over the place. Breaking the password of a file might give you the user's password to their Facebook or email account if they are not disciplined when it comes to password management.
An improvement to both of those problems would be to use a brute force resistant password hash function. Typically password hashes also accept a salt, which can be stored in a header in the encrypted file. The salt would be randomly generated for each file and make the hash output for that password unique to just that file. This would eliminate the risk of reusing the same password for multiple files.
Wikipedia also has a list of other things to consider with a stream cipher, including:
- "keys must never be used twice" (addressed above)
- "valid decryption should never be relied on to indicate authenticity" - adding a Message authentication code would protect against the data being changed by an attacker, which could otherwise cause many different kinds of problems
Lastly, not every key is equally secure. If your LFSRs happen to end up with all 0's, your encrypted file is an exact copy of your plaintext. The chance of getting this key is very unlikely, but the code could protect against that case.
If your LFSR initial state ends up having a lot of zeros, the beginning of your plaintext will leak through:
# plaintext, first pixels are all white 0xffff
$ xxd example.gray | head
00000000: ffff ffff ffff ffff ffff ffff ffff ffff ................
00000010: ffff ffff ffff ffff ffff ffff ffff ffff ................
00000020: ffff ffff ffff ffff ffff ffff ffff ffff ................
00000030: ffff ffff ffff ffff ffff ffff ffff ffff ................
00000040: ffff ffff ffff ffff ffff ffff ffff ffff ................
00000050: ffff ffff ffff ffff ffff ffff ffff ffff ................
00000060: ffff ffff ffff ffff ffff ffff ffff ffff ................
00000070: ffff ffff ffff ffff ffff ffff ffff ffff ................
00000080: ffff ffff ffff ffff ffff ffff ffff ffff ................
00000090: ffff ffff ffff ffff ffff ffff ffff ffff ................
# encrypted, leaked plaintext when given bad LSFR key
$ ../oceantoo --force-lfsr-init-value 0x1 example.gray /dev/stdout | xxd | head
00000000: ffff ffff ffff fffd ffff ffff ffff ffff ................
00000010: ffff ffff ffff b98f ffff ffff ffff ffff ................
00000020: ffff ffff dfd5 97ff ffff ffff ffff ffff ................
00000030: ef1a 8b71 a14f ffff ffff ffff ffff f7ff ...q.O..........
00000040: ff7f 7ffe d7ff ffff fffb c7ff b804 7bc7 ..............{.
00000050: ffb8 04ff ffff fff5 ddd5 dd7f dff5 7ddd ..............}.
00000060: 5dd7 ffec 8c50 7fff c6a2 d810 e949 ffff ]....P.......I..
00000070: ffdf dfdf ffff f7ff ffff fbe7 dfdf fffb ................
00000080: 8cc7 ffff fee3 fffd ef0f 0d1d 5557 57ff ............UWW.
00000090: ffff 7fd4 c68b de90 5c50 a7ff ffff fff9 ........\P......
Knowing which password would lead to a "bad key" would be difficult for an average user.
It is my hope that this cryptanalysis was educational and that you found it interesting.