
OWASP FSTM, stage 4: Extracting the filesystem

Extracting the filesystem is a fundamental phase in the analysis of a device's firmware. Many IoT devices run an embedded Linux operating system that is included in the firmware image, along with the corresponding file systems. This article discusses how to identify and extract the filesystem from a firmware image.

The file system contains the executables, configuration files, scripts and services run by the operating system, so accessing it allows an in-depth analysis of how an IoT device works and what its characteristics are. The process can be divided into an initial reconnaissance phase, the identification of the file systems present in the firmware, and their extraction or mounting.

The fourth stage of the OWASP Firmware Security Testing Methodology aims to identify the file systems that can be found in a firmware image, detect the format, and extract their contents for further analysis.

In the previous steps, the firmware of the IoT device under study has been obtained and analyzed. It is common to find embedded Linux systems in these firmware images, adapted to IoT devices, with specific software and file systems. Therefore, one of the most important phases of the analysis is the identification and extraction of the filesystem, which will contain the executables, configuration files, scripts, and services of the device.

Subsequent analysis of this file system provides detailed knowledge of the device’s boot process and operation, which can lead to the identification of vulnerable executables or services and delimit the attack surface.

The file systems contained in the firmware may be stored unencrypted, compressed or encrypted. In the first two cases, it is only necessary to identify the format and use the appropriate tool to extract or mount it in the analysis environment. For an encrypted file system, further research into the firmware and the manufacturer will be needed.

The following sections of the article detail the general steps needed to obtain the contents of the file system, and present some good practices and a set of useful tools for file system analysis.

In the examples, both firmware images available in the IoTGoat project and images extracted from other IoT devices are used to illustrate some of the possible scenarios.

Firmware image format identification

Before trying to locate the sections that contain file systems, it is useful to identify the format of the firmware image as a whole. The file utility, available on Linux systems, tries to determine the type of the file passed as an argument:

$ file hola.txt
hola.txt: ASCII text

To do this, file runs three kinds of tests on the file: filesystem tests (using information from the stat system call), magic number tests and language tests. More information about this can be found in the previous article.

In cases where the file system appears at the beginning of the extracted image, file can help to identify it:

$ file squashfs
squashfs: Squashfs filesystem, little endian, version 4.0, xz compressed, 3946402 bytes, 1333 inodes, blocksize: 262144 bytes, created: Wed Jan 30 12:21:02 2019

In most cases, however, the firmware image starts with a bootloader image or a blank section, so file cannot identify the embedded file systems this way.
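As a rough illustration of why file only helps when the file system sits at the very start of the image, the following Python sketch mimics its magic test by comparing only the first bytes of a buffer against a small table. The table is an illustrative assumption built from the little-endian magic numbers listed in the next section. It is not libmagic's real database.

```python
# Sketch of a file-style magic test: only offset 0 is examined.
# MAGIC_AT_START is a small illustrative table, not libmagic's database.
MAGIC_AT_START = {
    b"hsqs": "Squashfs filesystem, little endian",
    b"sqsh": "Squashfs filesystem, big endian",
    bytes.fromhex("453DCD28"): "CramFS filesystem, little endian",
    bytes.fromhex("31181006"): "UBIFS, little endian",
    bytes.fromhex("8519"): "JFFS2, little endian",
}


def identify_start(data: bytes) -> str:
    """Describe data if it begins with a known magic number, else 'data'."""
    for magic, description in MAGIC_AT_START.items():
        if data.startswith(magic):
            return description
    return "data"  # generic fallback, as file prints for unrecognized binaries


if __name__ == "__main__":
    print(identify_start(b"hsqs" + bytes(60)))  # file system at offset 0
    print(identify_start(bytes(64) + b"hsqs"))  # same magic after 64 bytes of padding
```

The second call falls back to "data" even though the buffer contains a SquashFS magic number, because the magic is not at offset 0. This is the situation of most full firmware images.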

Search for signatures and magic numbers

Searching for signatures and magic numbers that reference file types and formats is a very useful technique in identifying sections of firmware, as discussed in the previous article in the series, especially for file system searching.

A useful tool for this is the well-known strings utility, which displays the sequences of printable characters found in a file:

$ strings IoTGoat-raspberry-pi2.img

For the IoTGoat-raspberry-pi2.img firmware, the following interesting strings are found for file system lookup:

  • hsqs: magic number of SquashFS file systems in little endian.
  • 7zXZ: part of the magic number of xz-compressed data (the xz format uses LZMA2 compression).
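The string search above can be approximated in Python. The following is a sketch: the function collects runs of at least four printable ASCII bytes together with their offsets, roughly like strings -t d (GNU strings additionally treats tab as printable). It then keeps only the runs that contain the two markers mentioned. The INTERESTING tuple is an illustrative assumption.

```python
import re


def strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for runs of printable ASCII (0x20-0x7E), like strings -t d."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


# Illustrative markers for file system and compression lookup.
INTERESTING = ("hsqs", "7zXZ")


def interesting_strings(data: bytes) -> list:
    """Keep only the printable runs that contain one of the markers."""
    return [(off, s) for off, s in strings(data) if any(m in s for m in INTERESTING)]


if __name__ == "__main__":
    # Synthetic bytes standing in for a firmware image.
    sample = b"\x00\x01hsqs\x05\x00\x00\xfd7zXZ\x00"
    for offset, text in interesting_strings(sample):
        print(f"{offset:#010x}  {text}")
```

On a real image, the same function would be applied to the bytes read from IoTGoat-raspberry-pi2.img. Reporting the offset is what makes the result useful for locating the section later.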

It can also be useful to search for magic numbers in hexadecimal, since, in some cases, magic numbers do not consist of printable characters. For this, you can use a hex editor, such as hexedit, which allows searching byte strings. Some magic numbers corresponding to common file systems in IoT devices are as follows:

– CramFS: 45 3D CD 28
– UBIFS: 31 18 10 06
– JFFS2: 85 19
– SquashFS: 73 71 73 68 (sqsh), 68 73 71 73 (hsqs)
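Searching for these byte sequences does not strictly require a hex editor. The following Python sketch assumes that the whole image fits in memory and reports every offset at which one of the file system magic numbers above appears. Short patterns such as the 2-byte JFFS2 value will match by chance quite often, so every hit still has to be checked by hand.

```python
# Magic numbers from the list above (little-endian forms, plus SquashFS "sqsh").
FS_MAGICS = {
    "CramFS": bytes.fromhex("453DCD28"),
    "UBIFS": bytes.fromhex("31181006"),
    "JFFS2": bytes.fromhex("8519"),
    "SquashFS (sqsh)": b"sqsh",
    "SquashFS (hsqs)": b"hsqs",
}


def find_all(data: bytes, needle: bytes):
    """Yield every offset of needle in data, including overlapping matches."""
    pos = data.find(needle)
    while pos != -1:
        yield pos
        pos = data.find(needle, pos + 1)


def scan_magics(data: bytes, magics: dict = FS_MAGICS) -> list:
    """Return (offset, name) pairs for all magic-number hits, sorted by offset."""
    hits = [(off, name) for name, magic in magics.items() for off in find_all(data, magic)]
    return sorted(hits)


if __name__ == "__main__":
    image = bytes(16) + b"hsqs" + bytes.fromhex("8519") + bytes(10)
    for offset, name in scan_magics(image):
        print(f"{offset:#x}  {name}")
```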

In the IoTGoat-raspberry-pi2.img firmware itself there are also FAT16 and FAT32 tags, but these file systems do not contain files of interest. They are used to allow writing the image to a USB flash drive.

Other magic numbers of interest may be those related to compressed files, such as the following:

– zip: 50 4B 03 04 (PK..)
– rar: 52 61 72 21 1A 07 01 00 (Rar!….)
– 7z: 37 7A BC AF 27 1C (7z¼¯’.)
– xz: FD 37 7A 58 5A 00 (ý7zXZ.)

When searching for a signature or magic number, keep in mind that firmware images may be in little endian or big endian, which affects the byte order within the signature.
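The effect of byte order can be checked directly. In this sketch, the 32-bit SquashFS magic value 0x73717368 (SQUASHFS_MAGIC in the Linux kernel's SquashFS headers) is serialized in both byte orders. Little endian produces hsqs and big endian produces sqsh, and the same swap applies to the 16-bit JFFS2 value 0x1985. A signature search should therefore look for both orders.

```python
def magic_in_both_orders(value: int, size: int) -> dict:
    """Return the on-disk bytes of a magic value in little- and big-endian order."""
    return {
        "little": value.to_bytes(size, "little"),
        "big": value.to_bytes(size, "big"),
    }


SQUASHFS_MAGIC = 0x73717368  # value as defined in the SquashFS headers
JFFS2_MAGIC = 0x1985

print(magic_in_both_orders(SQUASHFS_MAGIC, 4))  # {'little': b'hsqs', 'big': b'sqsh'}
print(magic_in_both_orders(JFFS2_MAGIC, 2))     # {'little': b'\x85\x19', 'big': b'\x19\x85'}
```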

In addition, for certain file systems and compression formats, non-standard signatures may be encountered. Many device manufacturers use modified signatures to indicate the format. For example, the open-source DD-WRT firmware for routers may use the tqsh signature to indicate a SquashFS (big endian) file system.

Entropy study

In some cases, sections within the firmware may be encrypted or compressed. If compressed, it is common to find some signature identifying the format, although it does not always exist. However, identifying an encrypted section requires another type of analysis.

In information theory, the entropy of a data source is a measure of the average amount of information carried by each symbol. By the very design of encryption algorithms, a sample of encrypted data should have an entropy very close to 1, the maximum value on a scale normalized from 0 to 1. Sections of code and unencrypted data, by contrast, typically show a variable entropy of roughly 0.3 to 0.8. Compression algorithms also produce high-entropy output. Studying the entropy across a firmware image can therefore reveal encrypted or compressed sections.
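The idea can be sketched in Python as block-wise Shannon entropy, divided by 8 so that 1.0 means 8 bits of information per byte. The sketch also includes a simple edge detector that reports where the entropy crosses a rising or falling threshold. The 4096-byte block size and the 0.95/0.85 thresholds are illustrative assumptions and are not necessarily what binwalk uses internally. High values alone cannot distinguish compressed data from encrypted data.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a block, normalized to 0..1 (bits per byte / 8)."""
    if not block:
        return 0.0
    n = len(block)
    bits = sum(c / n * math.log2(n / c) for c in Counter(block).values())
    return bits / 8


def entropy_profile(data: bytes, block_size: int = 4096) -> list:
    """(offset, entropy) for each consecutive block of the image."""
    return [(off, shannon_entropy(data[off:off + block_size]))
            for off in range(0, len(data), block_size)]


def entropy_edges(profile: list, rising: float = 0.95, falling: float = 0.85) -> list:
    """Report where entropy crosses the rising/falling thresholds (with hysteresis)."""
    edges, high = [], False
    for off, value in profile:
        if not high and value >= rising:
            edges.append((off, "Rising entropy edge", value))
            high = True
        elif high and value <= falling:
            edges.append((off, "Falling entropy edge", value))
            high = False
    return edges


if __name__ == "__main__":
    # Synthetic image: zeros, then uniformly distributed bytes, then zeros again.
    image = bytes(8192) + bytes(range(256)) * 32 + bytes(8192)
    for off, label, value in entropy_edges(entropy_profile(image)):
        print(f"{off:<13} {off:#x}  {label} ({value:.6f})")
```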

The binwalk firmware analysis tool has an entropy study function, which produces a result like the following:

$ binwalk -E IoTGoat-raspberry-pi2.img

0             0x0             Falling entropy edge (0.002664)
4718592       0x480000        Falling entropy edge (0.833424)
4997120       0x4C4000        Falling entropy edge (0.837713)
5095424       0x4DC000        Falling entropy edge (0.840429)
5341184       0x518000        Falling entropy edge (0.839935)
5570560       0x550000        Falling entropy edge (0.849444)
5636096       0x560000        Falling entropy edge (0.834985)
5799936       0x588000        Falling entropy edge (0.840472)
5849088       0x594000        Falling entropy edge (0.840706)
5996544       0x5B8000        Falling entropy edge (0.849569)
6275072       0x5FC000        Falling entropy edge (0.849042)
6373376       0x614000        Falling entropy edge (0.848267)
6553600       0x640000        Falling entropy edge (0.848343)
6701056       0x664000        Falling entropy edge (0.678427)
6914048       0x698000        Rising entropy edge (0.965015)
6930432       0x69C000        Falling entropy edge (0.619229)
7356416       0x704000        Falling entropy edge (0.831099)
7487488       0x724000        Falling entropy edge (0.842073)
7585792       0x73C000        Falling entropy edge (0.836944)
7667712       0x750000        Falling entropy edge (0.593631)
7798784       0x770000        Falling entropy edge (0.667160)
12058624      0xB80000        Rising entropy edge (0.950634)
12075008      0xB84000        Falling entropy edge (0.560117)
29360128      0x1C00000       Rising entropy edge (0.998248)

In the terminal, binwalk shows the addresses where the rising and falling entropy edges are located, which can be useful to delimit the sections. The entropy graph shows several sections of unencrypted information at the beginning and a section of encrypted or compressed information at the end.

Filesystem extraction is key in a security audit

If the high-entropy section, which according to binwalk starts at address 0x1C00000, is opened in the hexedit hex editor, the following data are found:

filesystem hexdump

The signature hsqs, which was already detected in the string search, indicates a SquashFS file system at that address. In the dump it appears as hsqs5, because the next byte, 0x35, is the low byte of the inode count (1333 = 0x535). A few lines further on, 7zXZ indicates xz-compressed data. It is therefore not an encrypted region, but a compressed one.
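A signature hit can be confirmed by decoding the structure that should follow it. The Python sketch below parses the first 48 bytes of a little-endian SquashFS 4.x superblock (magic, inode count, creation time, block size, compression ID, version and bytes used). It assumes the 4.x layout and is not meant to handle the older 3.x format or big-endian images. To check the IoTGoat image, one could read 48 bytes at offset 0x1C00000 and pass them to the function. The fields should be consistent with what binwalk reports (version 4.0, xz, 1333 inodes, 262144-byte blocks, 3946402 bytes).

```python
import struct
from datetime import datetime, timezone

# Compression IDs used in SquashFS 4.x superblocks.
COMPRESSORS = {1: "gzip", 2: "lzma", 3: "lzo", 4: "xz", 5: "lz4", 6: "zstd"}
# Little-endian SquashFS 4.x superblock prefix (48 bytes): five u32, six u16, two u64.
SUPERBLOCK = struct.Struct("<IIIIIHHHHHHQQ")


def parse_squashfs4_superblock(data: bytes, offset: int = 0) -> dict:
    """Decode the main superblock fields found at `offset`."""
    (magic, inodes, mkfs_time, block_size, _fragments, compression,
     _block_log, _flags, _no_ids, major, minor, _root_inode,
     bytes_used) = SUPERBLOCK.unpack_from(data, offset)
    if magic != 0x73717368:  # b"hsqs" read as a little-endian u32
        raise ValueError("no little-endian SquashFS magic at this offset")
    return {
        "version": f"{major}.{minor}",
        "inodes": inodes,
        "block_size": block_size,
        "compression": COMPRESSORS.get(compression, f"unknown ({compression})"),
        "bytes_used": bytes_used,
        "created": datetime.fromtimestamp(mkfs_time, tz=timezone.utc),
    }
```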

The following example shows an entropy study for an encrypted firmware:

$ binwalk -E firmware

0             0x0             Rising entropy edge (0.971675)
4716544       0x47F800        Rising entropy edge (0.976452)

high entropy graph

In this case, we find only high-entropy regions, barely separated from each other. In hexedit, at address 0x0, there is some unencrypted information preceding a region of random-looking data, but no recognizable signature:

unencrypted low entropy dump without signature

At address 0x47F800, a similar situation is found:

IoT firmware entropy change section

These cases indicate an encrypted section in the firmware. To resolve this and access the information they contain, further investigation into the manufacturer, encryption formats it may use, leaked keys and previous versions of the firmware will be helpful. In some cases, these versions are unencrypted and can provide a lot of information about how the device works, including the encryption it uses.

In more complex cases, it may be necessary to wait for the dynamic and runtime analysis phases to obtain more information.

Extracting the filesystem

Depending on the type of file system found in the firmware, different tools will be required to extract it.

The binwalk tool attempts to automate the detection and extraction process for most file systems commonly found in firmware:

$ binwalk firmware

This command lists the code, files and file systems that the binwalk engine recognizes in the firmware sections. To do this, the tool traverses the image looking for magic numbers, signatures and strings that identify sections within the firmware. The following is the result for the IoTGoat sample firmware:

$ binwalk IoTGoat-raspberry-pi2.img

4253711       0x40E80F        Copyright string: “copyright does *not* cover…

4329472       0x421000        ELF, 32-bit LSB executable, version 1 (SYSV)
4762160       0x48AA30        AES Inverse S-Box
4763488       0x48AF60        AES S-Box

12061548      0xB80B6C        gzip compressed data, maximum compression, from Unix, last modified: 1970-01-01 00:00:00 (null date)
12145600      0xB953C0        CRC32 polynomial table, little endian
12852694      0xC41DD6        xz compressed data

29360128      0x1C00000       Squashfs filesystem, little endian, version 4.0, compression:xz, size: 3946402 bytes, 1333 inodes, blocksize: 262144 bytes, created: 2019-01-30 12:21:02

In this case, binwalk detects several compressed files and a SquashFS file system, which matches the previously detected signatures. binwalk also has an automatic extraction function that tries to extract the contents it finds while scanning the firmware. This is done with the following command:

$ binwalk -e firmware

The -e option extracts the contents. The results are stored in a directory named _firmware.extracted, where each extracted file system appears in a subdirectory named after its type (for example, squashfs-root).

binwalk can find and extract squashfs, ubifs, romfs, rootfs, jffs2, yaffs2, cramfs and initramfs file systems. However, because its analysis is signature-based and it relies on a different tool for each file system, false positives are also frequent. They are especially common with short signatures of 1 or 2 bytes, which can appear in a firmware image by chance without any section of that format being present. You should therefore always check binwalk's results with a hex editor such as hexedit and inspect the area where the signature was detected, especially when the results do not match the information collected previously.
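A quick back-of-the-envelope calculation shows why short signatures are unreliable. If the bytes were uniformly random, a fixed k-byte signature would be expected to match about (N - k + 1) / 256^k times in an N-byte image. Real firmware is not uniformly random, so this is only a rough estimate that fits compressed or encrypted regions best. The sketch uses the IoTGoat image size reported by fdisk later in this article.

```python
def expected_chance_hits(image_size: int, signature_len: int) -> float:
    """Expected matches of a fixed signature in uniformly random bytes."""
    positions = max(0, image_size - signature_len + 1)
    return positions / 256 ** signature_len


IMAGE_SIZE = 33306112  # size of IoTGoat-raspberry-pi2.img as reported by fdisk
for length in (2, 3, 4):
    print(length, round(expected_chance_hits(IMAGE_SIZE, length), 4))
```

For this image size, the estimate is roughly 508 chance matches for a 2-byte signature, about 2 for a 3-byte signature and well under 0.01 for a 4-byte one. This is why a lone 2-byte hit means very little.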

Also, binwalk can sometimes introduce errors when extracting a section of the firmware. If decompressing or mounting the extracted file fails with format errors, it is useful to extract the section manually with the dd tool and then decompress or mount the file system with the appropriate tool, as explained below.

For example, on a firmware extracted from another IoT device, binwalk yields the following result:

$ binwalk firmware.bin

5107699       0x4DEFF3        MySQL MISAM compressed data file Version 8
8532033       0x823041        Intel x86 or x64 microcode, sig 0xfc208000, pf_mask 0xf0c100, 1C18-01-30, rev 0x22000000, size 1796
8951861       0x889835        bix header, header size: 64 bytes, header CRC: 0x79079084, created: 1970-01-01 04:59:12, image size: 33591409 bytes, Data Address: 0x10183013, Entry Point: 0x102001F, data CRC: 0xE0208000, image type: Binary Flat Device Tree Blob image name: “”

The first signature indicates a MySQL MISAM compressed data file, which is suspicious both because of its location and because its signature is only a few bytes long (0xFE 0xFE 0xFE 0x07). If you open that address in hexedit, you can see that the preceding and following bytes do not match the structure of such a file:

MySQL MISAM wrong signature

MySQL MISAM byte structure
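To look at the bytes around a reported offset, such as the 0x4DEFF3 MySQL MISAM hit, without an interactive editor, a small hexedit-like view can be generated in Python. This is a sketch: the 16-byte row width and the before/after window sizes are arbitrary choices. For a real image, the data would be read into memory and passed to hexdump(data, 0x4DEFF3).

```python
def hexdump(data: bytes, offset: int, before: int = 32, after: int = 64, width: int = 16) -> str:
    """Render rows of hex and ASCII around `offset`, similar to a hex editor view."""
    start = max(0, (offset - before) // width * width)  # align to a row boundary
    end = min(len(data), offset + after)
    rows = []
    for row in range(start, end, width):
        chunk = data[row:row + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        rows.append(f"{row:08X}  {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(rows)


if __name__ == "__main__":
    sample = bytes(16) + b"hsqs" + bytes(12)
    print(hexdump(sample, 16, before=16, after=16))
```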

This type of error is very common. It is also possible that binwalk does not have a modified signature registered for a common file type. In these cases, looking for the device manufacturer's own signatures can be very useful. binwalk may also detect the location of a section but fail to extract it. In these cases, you can use the binwalk output, or manufacturer-specific information about formats, to extract the section containing the file system manually with dd:

$ binwalk firmware.img

0             0x0             uImage header, header size: 64 bytes, header CRC: 0x4EA03918, created: 2017-07-20 02:34:00, image size: 6164416 bytes, Data Address: 0x80000000, Entry Point: 0x80294000, data CRC: 0x8D40BD44, OS: Linux, CPU: MIPS, image type: OS Kernel Image, compression type: lzma, image name: “Linux Kernel Image-al-2.32”
64            0x40            LZMA compressed data, properties: 0x5D, dictionary size: 33554432 bytes, uncompressed size: 2818364 bytes
851968        0xD0000         Squashfs filesystem, little endian, non-standard signature, version 3.0, size: 5309286 bytes, 781 inodes, blocksize: 65536 bytes, created: 2017-07-20 02:33:58

$ dd if=firmware.img of=squashroot bs=1 skip=851968 count=5309286
5309286+0 records in
5309286+0 records out
5309286 bytes (5,3 MB, 5,1 MiB) copied, 7,25504 s, 732 kB/s
dd if=firmware.img of=squashroot bs=1 skip=851968   0,78s user 6,40s system 98% cpu 7,259 total
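The same carving can be sketched in Python. Copying in 1 MiB chunks avoids the one-read-one-write-per-byte cost of bs=1, which is why the dd run above only reached about 732 kB/s. The chunk size is an arbitrary assumption. With GNU dd, a similar effect can be obtained by combining a larger bs with iflag=skip_bytes,count_bytes. Mirroring the dd command above, the sketch would be called as carve("firmware.img", "squashroot", 851968, 5309286).

```python
def carve(src: str, dst: str, offset: int, size=None, chunk: int = 1 << 20) -> int:
    """Copy `size` bytes from `src`, starting at `offset`, into `dst`.

    If size is None, copy until the end of the source file.
    Returns the number of bytes written.
    """
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.seek(offset)
        while size is None or written < size:
            to_read = chunk if size is None else min(chunk, size - written)
            buf = fin.read(to_read)
            if not buf:  # reached the end of the source file
                break
            fout.write(buf)
            written += len(buf)
    return written
```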

With the file system section separated, the appropriate tool must be used to extract the files.

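For the SquashFS format, the file system can be decompressed with unsquashfs (part of squashfs-tools) or with sasquatch, a patched version of unsquashfs that supports many vendor-modified SquashFS variants, such as the non-standard magic in this example: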

$ sasquatch squashroot
SquashFS version [3.0] / inode count [781] suggests a SquashFS image of the same endianess
Non-standard SquashFS Magic: shsq
Parallel unsquashfs: Using 1 processor
Trying to decompress using default gzip decompressor…
Trying to decompress with lzma…
Detected lzma compression
688 inodes (901 blocks) to write

[=================================================================/] 901/901 100%

created 533 files
created 93 directories
created 155 symlinks
created 0 devices
created 0 fifos

Other tools for common formats are:

  • cpio for cpio formats.
  • jefferson for jffs2 formats.
  • uncramfs or cramfsck for cramfs formats.

As a result, you get the file system in a directory like squashfs-root.

It is also possible to find firmware images that directly contain partition tables with embedded file systems. This can occur on devices that require the use of systems such as FAT, NTFS, or ext. To detect this case, the fdisk tool is useful:

$ fdisk -l IoTGoat-raspberry-pi2.img
Disk IoTGoat-raspberry-pi2.img: 31.76 MiB, 33306112 bytes, 65051 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x5452574f

Device                             Boot  Start    End  Sectors  Size Id Type
iotgoat/IoTGoat-raspberry-pi2.img1 *          8192  49151    40960    20M  c W95 FAT32 (LBA)
iotgoat/IoTGoat-raspberry-pi2.img2           57344 581631   524288   256M 83 Linux

In the IoTGoat example image, you can see a partition table with two file systems directly contained in the firmware: a FAT32 partition and a partition with the Linux system image.
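The information that fdisk prints comes from the classic MBR (DOS) partition table. The table consists of four 16-byte entries starting at byte 446, followed by the 0x55AA signature at byte 510. The Python sketch below parses those entries and converts start sectors to byte offsets, assuming 512-byte sectors as reported above. For the IoTGoat image, partition 2 starts at sector 57344, that is, at byte 57344 * 512 = 29360128 = 0x1C00000, the same offset at which binwalk found the SquashFS file system. The sketch also flags partitions that extend past the end of the image.

```python
import struct

SECTOR_SIZE = 512  # as reported by fdisk for this image


def parse_mbr(data: bytes) -> list:
    """Return the non-empty primary partition entries of a DOS/MBR table."""
    if data[510:512] != b"\x55\xaa":
        raise ValueError("missing MBR boot signature 0x55AA")
    partitions = []
    for index in range(4):
        base = 446 + 16 * index
        boot_flag, part_type = data[base], data[base + 4]
        start, sectors = struct.unpack_from("<II", data, base + 8)  # LBA start, length
        if part_type == 0:  # unused slot
            continue
        partitions.append({
            "number": index + 1,
            "bootable": boot_flag == 0x80,
            "type": part_type,
            "start": start,
            "sectors": sectors,
            "byte_offset": start * SECTOR_SIZE,
        })
    return partitions


def out_of_bounds(partitions: list, image_size: int) -> list:
    """Numbers of the partitions whose end lies beyond the end of the image."""
    total_sectors = image_size // SECTOR_SIZE
    return [p["number"] for p in partitions if p["start"] + p["sectors"] > total_sectors]
```

Passing the first 512 bytes of IoTGoat-raspberry-pi2.img to parse_mbr and the result to out_of_bounds(..., 33306112) would be expected to flag partition 2. This is consistent with the "p2 size 524288 extends beyond EOD" kernel message shown further on.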

The kpartx tool can be used to create virtual devices (loop devices) for the partitions contained in the table. To create devices with the partitions in the firmware, use the -a option:

$ sudo kpartx -a IoTGoat-raspberry-pi2.img
device-mapper: reload ioctl on loop0p2 (254:1) failed: Invalid argument
create/reload failed on loop0p2
$ lsblk -f
├─loop0p1 vfat     FAT16       78D2-382B
└─loop0p2 squashfs 4.0

Although there is an error with partition p2, 2 loop devices are created: loop0p1 and loop0p2. These partitions can be mounted in the directory tree with the mount tool:

$ sudo mount /dev/mapper/loop0p1 /mnt/iotgoat/fat
$ ls -l /mnt/iotgoat/fat
total 9366
-rwxr-xr-x 1 root root   22493 Mar 29  2020 bcm2709-rpi-2-b.dtb
-rwxr-xr-x 1 root root   23588 Mar 29  2020 bcm2710-rpi-3-b.dtb
-rwxr-xr-x 1 root root   23707 Mar 29  2020 bcm2710-rpi-3-b-plus.dtb
-rwxr-xr-x 1 root root   22342 Mar 29  2020 bcm2710-rpi-cm3.dtb
-rwxr-xr-x 1 root root   52116 Mar 29  2020 bootcode.bin
-rwxr-xr-x 1 root root     133 Mar 29  2020 cmdline.txt
-rwxr-xr-x 1 root root   30725 Mar 29  2020 config.txt
-rwxr-xr-x 1 root root   18693 Mar 29  2020 COPYING.linux
-rwxr-xr-x 1 root root    2622 Mar 29  2020 fixup_cd.dat
-rwxr-xr-x 1 root root    6695 Mar 29  2020 fixup.dat
-rwxr-xr-x 1 root root 5817564 Mar 29  2020 kernel.img
-rwxr-xr-x 1 root root    1494 Mar 29  2020 LICENCE.broadcom
drwxr-xr-x 2 root root   10240 Mar 29  2020 overlays
-rwxr-xr-x 1 root root  678372 Mar 29  2020 start_cd.elf
-rwxr-xr-x 1 root root 2864164 Mar 29  2020 start.elf

When trying to mount the second partition, an error like the following occurs:

$ sudo mount /dev/loop0p2 /mnt/iotgoat/squashfs
mount: /mnt/iotgoat/squashfs: /dev/loop0p2 already mounted or mount point busy.
dmesg(1) may have more information after failed mount system call.
$ sudo dmesg | grep -v audit | tail
[ 7453.070938] loop2: detected capacity change from 0 to 7707
[ 8259.648960] /dev/loop0p2: Can’t open blockdev
[ 8281.520899] /dev/loop0p2: Can’t open blockdev
[ 8304.153145] loop0: detected capacity change from 0 to 65051
[ 8304.171992]  loop0: p1 p2
[ 8304.172392] loop0: p2 size 524288 extends beyond EOD, truncated
[ 8304.240350] device-mapper: table: 254:1: loop0 too small for target: start=57344, len=524288, dev_size=65051
[ 8304.240355] device-mapper: core: Cannot calculate initial queue limits
[ 8304.240357] device-mapper: ioctl: unable to set up device queue for new table.
[ 8316.660386] /dev/loop0p2: Can’t open blockdev

In this case, the problem is the size of the squashfs partition. The partition table declares 524288 sectors starting at sector 57344, which extends beyond the end of the 65051-sector image, so the partition cannot be set up as a loop device and mounted. However, if this partition is extracted to a file, as described in previous sections, that file can be mounted with the squashfuse tool:

$ sudo squashfuse -d squashfs /mnt/iotgoat/squashfs
FUSE library version: 2.9.9
nullpath_ok: 0
nopath: 0
utime_omit_ok: 0
unique: 2, opcode: INIT (26), nodeid: 0, insize: 104, pid: 0
INIT: 7.36
INIT: 7.19
unique: 2, success, outsize: 40

In another terminal, you can see the result:

$ sudo ls -l /mnt/iotgoat/squashfs
total 1
drwxr-xr-x  2 root root   0 Jan 30  2019 bin
drwxr-xr-x  2 root root   0 Jan 30  2019 dev
-rwxrwxrwx  1 root root 797 Jan 30  2019
drwxr-xr-x 18 root root   0 Jan 30  2019 etc
drwxr-xr-x 11 root root   0 Jan 30  2019 lib
drwxr-xr-x  2 root root   0 Jan 30  2019 mnt
drwxr-xr-x  2 root root   0 Jan 30  2019 overlay
drwxr-xr-x  2 root root   0 Jan 30  2019 proc
drwxr-xr-x  2 root root   0 Jan 30  2019 rom
drwxr-xr-x  2 root root   0 Jan 30  2019 root
drwxr-xr-x  2 root root   0 Jan 30  2019 sbin
drwxr-xr-x  2 root root   0 Jan 30  2019 sys
drwxrwxrwt  2 root root   0 Jan 30  2019 tmp
drwxr-xr-x  7 root root   0 Jan 30  2019 usr
lrwxrwxrwx  1 root root   3 Jan 30  2019 var -> tmp
drwxr-xr-x  4 root root   0 Jan 30  2019 www

This filesystem mount can also be performed for other formats. The first step is to create a loop device, either with kpartx or with other tools such as losetup (or directly with mount). The result is then mounted at a point in the directory tree.

There are also certain cases where the manufacturer modifies the signatures and format of a file system to adapt it to their devices or to obfuscate it to make analysis more difficult. In these cases, automatic tools such as binwalk will probably not be able to obtain consistent results and a manual analysis of the file will be necessary.

The data obtained about the manufacturer during the previous phases can be of great help, as can the analysis of any code found in the firmware. In some cases, there are forums dedicated to specific types of IoT device where you can find information discovered by other researchers and even extraction tools, although this is not common.

After the work of analyzing and extracting the filesystem hosted in the firmware, it is possible to move on to the phase of analyzing its contents, where the operation and internal characteristics will be analyzed from a static point of view.


As we have seen, analyzing and extracting the filesystem is a fundamental phase in the analysis of a device's firmware, and one of the steps that can be carried out when conducting an IoT security audit.

There are different formats that can contain a file system in a firmware image. The most popular are squashfs and cramfs, but it is also common to find jffs2, ubifs, romfs, cpio or compressed files. In some cases, file system images are also directly embedded in the firmware.

Automatic tools such as binwalk are very useful for analyzing and extracting the filesystem, but they often fail. Their results must therefore be checked manually with other tools such as file, strings, hexedit, dd and fdisk.

In cases where the firmware contains encrypted sections, it will be necessary either to investigate the manufacturer and the unencrypted sections further or to wait for the dynamic and runtime analysis phases. The results of this stage will be of great help for the subsequent analysis, so it is always worthwhile to extract as much information as possible.




More articles in this series about OWASP

This article is part of a series of articles about OWASP

  1. OWASP methodology, the beacon illuminating cyber risks
  2. OWASP: Top 10 Web Application Vulnerabilities
  3. IoT and embedded devices security analysis following OWASP
  4. OWASP FSTM, stage 1: Information gathering and reconnaissance
  5. OWASP FSTM, stage 2: Obtaining IoT device firmware
  6. OWASP FSTM, stage 3: Analyzing firmware
  7. OWASP FSTM, stage 4: Extracting the filesystem
  8. OWASP FSTM, stage 5: Analyzing filesystem contents
  9. OWASP FSTM step 6: firmware emulation
  10. OWASP FSTM, step 7: Dynamic analysis
  11. OWASP FSTM, step 8: Runtime analysis