Files and Data

To understand the utilities and procedures used in the next challenges, you must first learn the basics of data encoding. The key point is that a computer system stores and manipulates data only as binary data. This statement refers to the low-level definition of a file: a finite sequence of discrete values with only two possible states, usually labeled zero and one.

So far, we have been dealing with files made entirely of printable characters. These are often referred to as ASCII files in the high-level sense of the term. At the low level, however, in the way data is arranged on disk or in memory, those files are obviously made of binary data as well. With this distinction in mind, note that people usually use the term binary file in the high-level sense too, meaning a file that is not plain text.
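
To make the low-level view concrete, here is a minimal sketch that dumps the bytes behind a short piece of text. It assumes od from coreutils is available; the sample string and the variable name text_bytes are made up for illustration.

```shell
# Dump the bytes behind a short "text" string as hexadecimal values.
# -An hides the offset column, -tx1 prints one byte per column.
text_bytes=$(printf 'Hi\n' | od -An -tx1)

# Prints the three bytes 48 69 0a: 'H', 'i' and the newline character.
echo "$text_bytes"
```

Each two-digit value stands for one byte, that is, eight of the zeros and ones mentioned above.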

This distinction makes it clear that a file can hold other kinds of data besides printable characters. A computer file can store encoded data, machine code, directory entries, etc. It can also contain a mixture of two or more of these data types.

The following levels introduce several tools. Most of them display the contents of files in various formats. They help you find an interpretation for the unknown data in a given file.

LEVEL 9 -> LEVEL 10

The password for the next level is stored in the file data.txt in one of the few human-readable strings, preceded by several ‘=’ characters.

Commands you may need to solve this level
grep, sort, uniq, strings, base64, tr, tar, gzip, bzip2, xxd

The quest is for a particular string of characters in a given file. Printing the file with cat fills the screen with meaningless characters, because the file contains more than ASCII text. The level description states that it holds a few readable strings, but the rest can be any type of data.

The suggested command, strings, prints the sequences of printable characters found in a file. Run on data.txt, it still outputs many printable lines. To filter them, rely on the second hint: the password is preceded by several ‘=’ characters. Piping the output of strings to grep keeps only the lines that match that pattern.

$ strings data.txt | grep "=="
========== the
bu========== password
4iu========== is
b~==P
========== G7w8LIi6J3kTb8A7j9LgrywtEUlyyp6s
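
To get a feeling for what strings does, the sketch below approximates it with grep on a small sample file. This is only an approximation under stated assumptions: the sample content is made up, and the real strings program has its own rules for what counts as printable and uses a minimum length of 4 by default.

```shell
# Build a small sample with unprintable bytes around two readable runs.
sample=$(mktemp)
printf '\001\002hello world\003\377\n==== secret\004ab\005\n' > "$sample"

# -a treats binary input as text, -o prints only the matching part,
# and the pattern keeps runs of 4 or more printable characters.
readable=$(LC_ALL=C grep -a -o -E '[[:print:]]{4,}' "$sample")
echo "$readable"

# Filter the readable runs the same way the level was solved.
matches=$(echo "$readable" | grep '==')
echo "$matches"

rm -f "$sample"
```

The short run "ab" is dropped because it has fewer than four characters, which is also why strings hides most of the noise in data.txt.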

LEVEL 10 -> LEVEL 11

The password for the next level is stored in the file data.txt which contains base64 encoded data.

Commands you may need to solve this level
grep, sort, uniq, strings, base64, tr, tar, gzip, bzip2, xxd

Base64 is a binary-to-text encoding scheme. The Base64 alphabet consists of the alphanumeric characters plus the + and / symbols, and reserves the = sign for padding. Encoded data can therefore often be recognised by one or two trailing equal signs, although padding only appears when the input length is not a multiple of three bytes.
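
The padding rule is easy to observe on tiny inputs. The following sketch assumes the base64 command from coreutils; the variable names are made up for illustration.

```shell
# Encode 1, 2 and 3 bytes; only a complete 3-byte group needs no padding.
one=$(printf 'a' | base64)      # YQ==
two=$(printf 'ab' | base64)     # YWI=
three=$(printf 'abc' | base64)  # YWJj
echo "$one $two $three"

# Decoding with -d reverses the process.
decoded=$(printf '%s' "$two" | base64 -d)
echo "$decoded"
```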

The command base64 allows for encoding and decoding in Base64. For decoding, you need to use the flag -d.

$ cat data.txt
VGhlIHBhc3N3b3JkIGlzIElGdWt3S0dzRlc4TU9xM0lSRnFyeEUxaHhUTkViVVBSCg==
$ base64 -d data.txt
The password is IFukwKGsFW8MOq3IRFqrxE1hxTNEbUPR

LEVEL 11 -> LEVEL 12

The password for the next level is stored in the file data.txt, where all lowercase (a-z) and uppercase (A-Z) letters have been rotated by 13 positions

Commands you may need to solve this level
grep, sort, uniq, strings, base64, tr, tar, gzip, bzip2, xxd

The password for the next level is in an encrypted file. The plain data has been processed with the rot13 algorithm. This is a simple substitution cipher that replaces every letter in its input with the one 13 positions ahead in the alphabet, wrapping around after z. Below is a table with some of the function’s mappings:

Input  Output
a      n
b      o
c      p
d      q
A      N
B      O
C      P
Y      L
Z      M

This function has the particular property of being its own inverse. This means when you apply the function twice to the same character you ‘return’ to its original value. Observe, for instance, that “Bad” ciphers to “Onq” when applying the algorithm and then “Onq” becomes “Bad” again when applying it a second time.

To decrypt this data, the ciphertext has to be passed through the same function used for encryption:

Map all characters in the set:
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
To the corresponding characters of the set
'NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm'

The command tr (translate) can translate from one set to the other. Piping the output of cat to tr with the two sets as arguments yields the password for the next level:

$ cat data.txt | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' 'NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm'
The password is JVNBBFSmZwKKOP0XbFXOoW8chDz5yVRv
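
The same mapping can be written more compactly with character ranges, and the self-inverse property is easy to check. In this sketch the function name rot13 is a hypothetical helper, not a standard command.

```shell
# Character ranges describe the same two sets as the long strings above.
rot13() { tr 'A-Za-z' 'N-ZA-Mn-za-m'; }

once=$(printf 'Bad' | rot13)          # Onq
twice=$(printf '%s' "$once" | rot13)  # Bad
echo "$once $twice"
```

Applying the helper twice returns the original text, which is why the same command both encrypts and decrypts.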

LEVEL 12 -> LEVEL 13

The password for the next level is stored in the file data.txt, which is a hexdump of a file that has been repeatedly compressed. For this level it may be useful to create a directory under /tmp in which you can work using mkdir. For example: mkdir /tmp/myname123. Then copy the datafile using cp, and rename it using mv (read the manpages!)

Commands you may need to solve this level
grep, sort, uniq, strings, base64, tr, tar, gzip, bzip2, xxd, mkdir, cp, mv, file

The advice for this level is to start by copying the file into a temporary folder, so that the password can be obtained in an orderly manner.

Let’s make a directory on the temporary folder:

$ mkdir /tmp/foobarbaz

Then copy the file to the created folder:

$ cp data.txt /tmp/foobarbaz

Change into the directory:

$ cd /tmp/foobarbaz

Use the command mv to rename the file:

$ mv data.txt data
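
As an alternative to picking a directory name by hand, the setup steps above can be sketched with mktemp, which chooses a unique name for you. The sketch assumes it runs as a script; the variable name workdir and the stand-in file are made up for illustration.

```shell
# mktemp -d creates a uniquely named directory and prints its path.
workdir=$(mktemp -d)

# Remove the directory automatically when the script exits.
trap 'rm -rf "$workdir"' EXIT

# Stand-in for the level's data.txt so the sketch runs anywhere.
printf 'sample\n' > "$workdir/data.txt"
mv "$workdir/data.txt" "$workdir/data"
ls "$workdir"
```

A unique name avoids clashing with other users who work under /tmp on the same machine.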

The file to be manipulated is a hexdump: a text representation of binary data in which each byte (a group of 8 bits) is written as a two-digit hexadecimal number, usually to make the data easier to inspect.

Linux systems usually ship several utilities for creating and manipulating hexdumps. After exploring the manpages of the suggested commands, xxd seems to be the proper tool. It is a utility capable of creating hexadecimal representations of binary files, and vice versa. To reverse a hexdump into a binary file, pass the flag -r.

$ xxd -r data data2
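
A small round trip shows that the hexdump and the binary carry the same information. This is a sketch on made-up sample data, assuming xxd is installed as it is on the game server.

```shell
# Bytes -> hexdump text: the dump lists an offset, the hex bytes
# (4869 0a for this input) and a printable preview.
dump=$(printf 'Hi\n' | xxd)
echo "$dump"

# Hexdump text -> bytes again, using the reverse flag.
restored=$(printf '%s\n' "$dump" | xxd -r)
echo "$restored"
```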

This writes a new file, data2, to the folder, which should be a compressed file. The file command determines the type of compression or contents of this data.

$ file data2
data2: gzip compressed data, was "data2.bin", last modified: Thu May  7 18:14:30 2020, max compression, from Unix

Knowing that its type is gzip tells you how to treat this file. But there is more to the file command. One of its flags, -Z, makes it try to look inside compressed files:

$ file -Z data2
data2: bzip2 compressed data, block size = 900k

More compression. Let’s move on and decompress data2 with gzip -d, which is equivalent to the gunzip utility:

$ gzip -d data2
gzip: data2: unknown suffix -- ignored

A small error. A search reveals that gzip is one of the tools that do care about file name suffixes. Use mv to add an extension before trying to decompress:

$ mv data2 data2.bin.gz
$ gzip -d data2.bin.gz
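
Renaming is one way around the suffix check. Another, sketched below on made-up sample data, is to feed the file through standard input, so that gzip never sees a file name at all. The names work, layer and plain are hypothetical.

```shell
# Sample data compressed into a file without any suffix.
work=$(mktemp -d)
printf 'hello\n' | gzip > "$work/layer"

# Reading from stdin sidesteps the suffix check; the decompressed
# output goes wherever it is redirected.
gzip -d < "$work/layer" > "$work/plain"
plain=$(cat "$work/plain")
echo "$plain"

rm -rf "$work"
```

Unlike gzip -d on a named file, this form keeps the compressed original in place.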

Now run file again, and also file -Z to peek inside:

$ file data2.bin
data2.bin: bzip2 compressed data, block size = 900k
$ file -Z data2.bin
data2.bin: gzip compressed data, was "data4.bin", last modified: Thu May  7 18:14:30 2020, max compression, from Unix

You must use bunzip2 to decompress this type of file:

$ bunzip2 data2.bin
bunzip2: Can't guess original name for data2.bin -- using data2.bin.out

The message warns that bunzip2 could not derive an output name from the original file name. Nevertheless, the task has been successful, and the decompressed file is named data2.bin.out. Run file to check its type and contents:

$ file data2.bin.out
data2.bin.out: gzip compressed data, was "data4.bin", last modified: Thu May  7 18:14:30 2020, max compression, from Unix
$ file -Z data2.bin.out
data2.bin.out: POSIX tar archive (GNU)

This gzip layer wraps two POSIX tar archives, one inside the other. GNU tar detects the compression by itself when reading an archive, so tar -xf can extract data2.bin.out directly:

$ tar -xf data2.bin.out

The extraction produces data5.bin, which is the inner tar archive:

$ tar -xf data5.bin
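
Before extracting an archive it can help to list what is inside. The sketch below builds a made-up gzip-compressed tar archive with a misleading name and lists it with -t. It assumes a tar implementation that detects compression on its own when reading, as GNU tar does.

```shell
work=$(mktemp -d)
printf 'inner\n' > "$work/note.txt"

# Create a gzip-compressed tar archive whose name gives no hint.
tar -czf "$work/blob.out" -C "$work" note.txt

# -t lists the members without extracting anything.
listing=$(tar -tf "$work/blob.out")
echo "$listing"

rm -rf "$work"
```

Listing first shows which file names an extraction will create in the working directory.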

The procedure yields a file named data6.bin, of type bzip2, which compresses yet another tar archive:

$ bunzip2 data6.bin
bunzip2: Can't guess original name for data6.bin -- using data6.bin.out
$ tar -xf data6.bin.out

Now data8.bin may be very close to solving this puzzle. It is a gzip file containing an ASCII text file.

$ file data8.bin 
data8.bin: gzip compressed data, was "data9.bin", last modified: Thu May  7 18:14:30 2020, max compression, from Unix
$ file -Z data8.bin 
data8.bin: ASCII text

After adding an extension and decompressing the file, the final step is making use of cat to read the text it stores:

$ mv data8.bin data8.z
$ gunzip data8.z
$ ls
data  data2.bin.out  data5.bin  data6.bin.out  data8
$ cat data8
The password is wbWdlBxEir4CaE8LaPhauuOo6pwRmrDw
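
The manual steps above follow one pattern: ask file for the type, undo one layer, repeat. A rough sketch of that loop follows. It is not the method the level expects. The function name unwrap is hypothetical, it assumes file, gzip, bzip2 and tar are installed, that every tar layer holds exactly one member, and that the descriptions printed by file look like the ones shown in this level.

```shell
# unwrap FILE: peel gzip, bzip2 and tar layers until something else remains,
# then print that remaining content.
unwrap() {
    current=$1
    while :; do
        kind=$(file -b "$current")   # -b omits the file name prefix
        case $kind in
            gzip*)  gzip -d < "$current" > "$current.next" ;;
            bzip2*) bzip2 -d < "$current" > "$current.next" ;;
            *"tar archive"*)
                # Assumption: the archive holds a single member.
                member=$(tar -tf "$current" | head -n 1)
                tar -xOf "$current" "$member" > "$current.next" ;;
            *)      cat "$current"; return ;;
        esac
        current=$current.next
    done
}

# Demo on made-up data: text inside gzip, inside tar, inside gzip.
demo=$(mktemp -d)
printf 'secret\n' | gzip > "$demo/a"
tar -cf "$demo/b" -C "$demo" a
gzip -c "$demo/b" > "$demo/c"
result=$(unwrap "$demo/c")
echo "$result"
rm -rf "$demo"
```

Every layer is written to a new file with .next appended, so nothing is overwritten and each intermediate step can still be inspected with file.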

Finally, you should clean up and delete the whole folder used. rm with the options -r and -f removes the folder and its files recursively, without ever prompting.

$ cd ~
$ rm -rf /tmp/foobarbaz