
PNG Steganography from First Principles

Steganography is experiencing a revival as a wrapper for delivering payloads. Like most things in Red Teaming, what’s old is new again, and we’re following closely behind several threat actors who use stego for payload hosting. Last year Worok were found to be using LSB encoding in PNGs to hide payloads. Similarly, APT10 was found hiding a backdoor payload inside a BMP of an old Windows logo hosted on GitHub.

But when searching through posts on how steganography can be ported into your tooling, you’d be forgiven for getting a bit lost. Between posts which start “git clone X tool from GitHub” or “apt-get this tool on Kali Linux”, and tools which throw a base64-encoded string into a PNG and call it a day, the quality of the available resources is mixed.

Similarly for defenders, knowing how attackers may be hiding payloads within images is becoming more crucial as the trend continues. And if you’re anything like me, not knowing the byte-level details when applying a technique can be frustrating!

So, to satisfy my own curiosity and hopefully share some knowledge along the way, in this post we’ll go back to basics and show just how steganography can be applied to a PNG image using the common least significant bit (LSB) encoding technique. No magic… just raw information… and a little C++.

How is a PNG structured

To begin, let’s look at how a PNG file is actually structured. As with most file formats, everything starts with a magic signature. In a PNG file, that signature is the following 8 bytes (the middle three, 50 4E 47, spell “PNG” in ASCII):

89 50 4E 47 0D 0A 1A 0A

Following this header, we find a list of chunks which make up the rest of a PNG.
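As a minimal sketch (not code from the post’s repo; `hasPngSignature` and `kPngSignature` are hypothetical names), the signature check can be done byte by byte with `memcmp`, which avoids depending on the host’s endianness:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// The 8-byte PNG signature: 0x89, "PNG", CR LF, SUB (0x1A), LF
static const uint8_t kPngSignature[8] = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

// Returns true if the buffer is long enough and starts with the PNG signature
bool hasPngSignature(const uint8_t *buf, size_t len) {
  return len >= sizeof(kPngSignature) &&
         std::memcmp(buf, kPngSignature, sizeof(kPngSignature)) == 0;
}
```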

PNG Chunk

A chunk in a PNG has the following pseudo-layout:

struct Chunk {
  uint32_t length;
  uint32_t type;
  uint8_t data[LENGTH];
  uint32_t crc;
};

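The fields are thankfully quite self-explanatory. The length field is the size in bytes of the data field only (it does not count itself, the type, or the CRC). The type identifies what kind of chunk this is, and therefore how the data field should be interpreted. The crc is the CRC32 checksum of the type and data fields; the length field is not included. All multi-byte integers in a PNG, including length and crc, are stored big-endian.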
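To make the CRC coverage concrete, here is a small sketch of the PNG CRC32 (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF) computed over the type bytes followed by the data bytes. The function names are hypothetical, and the bit-by-bit loop favours clarity over speed:

```cpp
#include <cstddef>
#include <cstdint>

// Feeds bytes into a running CRC-32 register (no final XOR applied here)
uint32_t pngCrc32(const uint8_t *data, size_t len, uint32_t crc = 0xFFFFFFFFu) {
  for (size_t i = 0; i < len; i++) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; bit++) {
      crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
  }
  return crc;
}

// CRC stored at the end of a chunk: covers the 4 type bytes then the data,
// but NOT the length field
uint32_t chunkCrc(const uint8_t type[4], const uint8_t *data, size_t dataLen) {
  uint32_t crc = pngCrc32(type, 4);
  crc = pngCrc32(data, dataLen, crc);
  return crc ^ 0xFFFFFFFFu;
}
```

A handy sanity check: the empty IEND chunk always ends with the CRC bytes AE 42 60 82, i.e. `chunkCrc` over just the type "IEND".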

PNG Types

As shown above, each chunk is associated with a type, which indicates the data stored in the data field.

It should be mentioned that the type field also carries a few flags: bit 5 (the ASCII lowercase bit) of each of the four type bytes marks the chunk as ancillary, private, reserved, or safe to copy. We won’t dig into the specifics of the flags in this post, as we’ll be parsing an existing image with the fields already set (and simply passing through any chunks which aren’t useful for us in our quest). But if you’re interested, the specification describes these flags here.
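For the curious, decoding those four property bits is a one-liner per flag. This is a sketch with hypothetical names (`ChunkTypeFlags`, `decodeChunkType`), following the bit meanings given in the PNG specification:

```cpp
#include <cstdint>

// Property flags carried in bit 5 (0x20, the ASCII lowercase bit) of each type byte
struct ChunkTypeFlags {
  bool ancillary;   // byte 0 lowercase: decoders may safely ignore the chunk
  bool privateType; // byte 1 lowercase: not a registered public chunk type
  bool reserved;    // byte 2 lowercase: must be uppercase in current PNG files
  bool safeToCopy;  // byte 3 lowercase: editors may copy it even if unrecognised
};

ChunkTypeFlags decodeChunkType(const char type[4]) {
  return ChunkTypeFlags{
      (type[0] & 0x20) != 0,
      (type[1] & 0x20) != 0,
      (type[2] & 0x20) != 0,
      (type[3] & 0x20) != 0,
  };
}
```

So IHDR (all uppercase) is critical, public and unsafe to copy, while tEXt is ancillary and safe to copy.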

Some critical types that we’ll want to be aware of are:

IHDR - Contains information about the layout of the image data, such as the width and height of the image, and the number of bits per colour channel
IDAT - Contains the raw pixel data
PLTE - Palette entries which the pixel data in IDAT indexes into when the colour type is 3 (indexed colour)
IEND - Indicates the end of the PNG file

If you dig through some of the various tools available, you might find that they embed payloads by adding a new chunk with a type such as iTXt and simply filling it with base64-encoded data. Apart from the obvious disadvantage when faced with the humble strings command, we want to understand the PNG format more deeply, so we’re going to avoid this technique in this post. Instead, we’ll focus on hiding our data in the pixel data stored within the IDAT chunks.

Let’s start by pulling apart each chunk type so we know how to parse a PNG from scratch.

IHDR Chunk

The IHDR chunk provides important information about the structure of pixels located within a PNG, and has the following layout:

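// IHDR data field: 13 bytes on disk. The compiler will typically pad this
// struct to 16 bytes in memory, but the field offsets match the on-disk layout
typedef struct _IHDR {
  uint32_t width;             // big-endian on disk
  uint32_t height;            // big-endian on disk
  uint8_t bitDepth;
  uint8_t colorType;
  uint8_t compressionMethod;
  uint8_t filterMethod;
  uint8_t interlaceMethod;
} Chunk_IHDR;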

The fields within the IHDR chunk are:

width - The number of pixels wide that the image is
height - The number of pixels high that the image is
bitDepth - The number of bits per sample, i.e. per colour channel (or per palette index for colour type 3)
colorType - The format of data used for each pixel
compressionMethod - The type of compression used for the IDAT chunks. Currently this needs to be set to 0 as only deflate compression is supported
filterMethod - The method of filtering used by each scanline in an IDAT chunk. This needs to be set to 0, as only adaptive filtering (with five per-scanline filter types) is supported, which we’ll discuss later
interlaceMethod - Set to 0 for no interlacing, or 1 for Adam7 interlacing
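Because width and height are big-endian and the struct above may be padded in memory, a portable way to read the IHDR data is field by field. A minimal sketch with hypothetical names (`Ihdr`, `readBE32`, `parseIhdr`), taking the 13 bytes that follow the chunk’s length and type:

```cpp
#include <cstddef>
#include <cstdint>

struct Ihdr {
  uint32_t width;
  uint32_t height;
  uint8_t bitDepth;
  uint8_t colorType;
  uint8_t compressionMethod;
  uint8_t filterMethod;
  uint8_t interlaceMethod;
};

// PNG stores multi-byte integers big-endian (most significant byte first)
static uint32_t readBE32(const uint8_t *p) {
  return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
         (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Parses the 13-byte IHDR data field; returns false if it is too short
bool parseIhdr(const uint8_t *data, size_t len, Ihdr &out) {
  if (len < 13) return false;
  out.width = readBE32(data);
  out.height = readBE32(data + 4);
  out.bitDepth = data[8];
  out.colorType = data[9];
  out.compressionMethod = data[10];
  out.filterMethod = data[11];
  out.interlaceMethod = data[12];
  return true;
}
```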

Let’s explore a few of these further to fully understand their purpose.

The colorType field has several possible values, and determines what a pixel value actually represents when displayed.

As an example, if we set the value to 0, each pixel in the IDAT chunk will be a grayscale pixel value, with each pixel being of bitDepth bits.

If the value is set to 6, each pixel in the IDAT chunk will consist of a Red channel value (of bitDepth bits), followed by a Green channel value, a Blue channel value, and finally an Alpha value.

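The full list of colour types defined in the specification, with their allowed bit depths, is:

0 - Grayscale (1, 2, 4, 8, 16)
2 - RGB (8, 16)
3 - Palette index (1, 2, 4, 8)
4 - Grayscale + Alpha (8, 16)
6 - RGB + Alpha (8, 16)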

With the information taken from an IHDR chunk, we can tell if a chosen source image will be large enough to store our payload. For example, if we have an image with the dimensions of 1024 x 768, a colour type of 6, a bit depth of 8, and we are using the LSB of each colour channel byte to encode our payload, we can store:

1024 * 768 * 4 = 3145728 bits = 393216 bytes = 384 KiB

Note that the resulting 1024 x 768 image won’t need to be ~3 MB in size (the 1024 * 768 * 4 bytes of raw pixel data); compression will play a factor, as we’ll see later. Still, the amount of data we can store in a relatively small image is quite surprising.
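The capacity arithmetic above can be wrapped in a small helper. This is a sketch with hypothetical names, assuming one payload bit per byte of unfiltered pixel data and a bit depth of 8 or 16 (so every sample is a whole number of bytes):

```cpp
#include <cstdint>

// Number of samples (channels) per pixel for each PNG colour type; 0 if unknown
int channelsForColorType(uint8_t colorType) {
  switch (colorType) {
    case 0: return 1; // grayscale
    case 2: return 3; // RGB
    case 3: return 1; // palette index
    case 4: return 2; // grayscale + alpha
    case 6: return 4; // RGB + alpha
    default: return 0;
  }
}

// Payload capacity in bytes when hiding 1 bit in the LSB of every pixel data byte
uint64_t lsbCapacityBytes(uint32_t width, uint32_t height, uint8_t colorType, uint8_t bitDepth) {
  uint64_t bytesPerPixel = uint64_t(channelsForColorType(colorType)) * bitDepth / 8;
  uint64_t carrierBytes = uint64_t(width) * height * bytesPerPixel;
  return carrierBytes / 8; // 8 carrier bytes per payload byte
}
```

For the 1024 x 768 RGBA example this gives 393216 bytes, matching the calculation above.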

PLTE Chunk

The palette chunk contains an array of 3-byte entries in the form Red Green Blue and is used when the colour type is set to 3.

For the purposes of this post, we will avoid images which use a palette. We’re going to focus on LSB encoding in the IDAT chunk, and flipping the low bit of a palette index selects a different, potentially unrelated, palette entry, which is likely to distort and/or destroy the source image we are looking to masquerade as. That being said, there are several techniques out there for using a PLTE chunk to hide data, so feel free to share yours!

If you’re interested in how a PNG uses a palette for optimisation, check out this post.

IEND Chunk

The IEND chunk marks the end of the PNG. It holds no data, so it consists of just a length of 0, the IEND type, and the CRC (which is therefore always AE 42 60 82). Once we reach this chunk, processing of the PNG stops.

IDAT Chunk

The IDAT chunk is what encapsulates the raw pixel data in a PNG file and is where we’ll be spending most of our time hiding our warez.

The layout of the IDAT data is determined by the IHDR chunk’s bitDepth and colorType fields, as shown in the IHDR section. There is also a further value we need to be aware of, and that’s the “scanline” length.

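A scanline is a single row of pixels in an image. Its length in bytes is width * channels * bitDepth / 8 (rounded up for bit depths below 8). For example, if we have a colorType of 0 (one channel) and a bitDepth of 8, an image 100 pixels wide has a scanline length of: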

100 * 1 * 8 / 8 = 100 bytes

Or an image with a width of 100 pixels and a colorType of 6, with a bitDepth of 8 will have a scanline length of:

100 * 4 * 8 / 8 = 400 bytes

Here we can see how these values can dramatically impact the amount of hidden data that we can store in an image.
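A sketch of the scanline calculation (hypothetical function names), including the rounding needed when several pixels share a byte at bit depths below 8, and the extra filter-type byte that each scanline carries inside the decompressed IDAT data:

```cpp
#include <cstdint>

// Bytes of pixel data in one scanline, rounded up for bit depths below 8
uint32_t scanlinePixelBytes(uint32_t width, int channels, int bitDepth) {
  uint64_t bits = uint64_t(width) * channels * bitDepth;
  return uint32_t((bits + 7) / 8);
}

// Stride of one scanline in the decompressed IDAT stream:
// 1 leading filter-type byte plus the pixel bytes
uint32_t scanlineStride(uint32_t width, int channels, int bitDepth) {
  return 1 + scanlinePixelBytes(width, channels, bitDepth);
}
```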

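So now that we know what a scanline is, how is an IDAT chunk’s data arranged? In this example we’ll use a colorType of 6 and a bitDepth of 8. For an image with the dimensions of 4 x 4, the decompressed IDAT data consists of 4 scanlines, each made up of a filter byte (F) followed by 4 pixels of R, G, B, A bytes, giving 17 bytes per scanline and 68 bytes in total:

Scanline 0: F | R G B A | R G B A | R G B A | R G B A
Scanline 1: F | R G B A | R G B A | R G B A | R G B A
Scanline 2: F | R G B A | R G B A | R G B A | R G B A
Scanline 3: F | R G B A | R G B A | R G B A | R G B A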

Next let’s look at that filter value at the beginning of each scanline.

IDAT Filters

Each scanline is preceded by a filter-type byte which determines how the scanline’s bytes have been transformed. The possible filter types are:

0 - No Filter
1 - Sub Filter
2 - Up Filter
3 - Average Filter
4 - Paeth Filter

Filters are applied to scanlines to make the data more compressible: each byte is replaced by its difference from a predicted value, which tends to produce runs of small numbers. The encoder can choose a different filter for each scanline, picking whichever suits that row best, which further increases the effectiveness of compression.

Let’s look at each filter to see what we’re dealing with.

Sub Filter

The Sub filter takes each byte of a scanline and subtracts from it the corresponding byte of the pixel to its left (or 0 if that pixel would lie before the start of the scanline).

The function defined in the specification is:

Sub(x) = Raw(x) - Raw(x-bpp)

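Here, bpp means bytes per pixel, rounded up to 1 for bit depths below 8. Using it as the offset ensures that red channel values are calculated against other red channel values, green against green, and so on. This makes sense, as the same channel in neighbouring pixels is more likely to have similar values than different channels. All arithmetic is done modulo 256, so the result always fits in a byte.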

To visualise this, let’s take a look at this somewhat awkward scanline to give you a sense of what happens:

To apply the Sub filter to this scanline, we’d subtract from each colour channel value the same channel’s value in the previous pixel. This would give us a filtered result of:

Note that the first 4 values don’t change, as there are no previous values to subtract, so Raw(x-bpp) is 0.
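Here is a standalone sketch of Sub and its inverse on a vector of bytes (hypothetical names `filterSub` and `unfilterSubCopy`, not the repo’s methods). `uint8_t` arithmetic wraps modulo 256, which is exactly what the spec asks for:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sub(x) = Raw(x) - Raw(x - bpp), treating bytes before the scanline start as 0
std::vector<uint8_t> filterSub(const std::vector<uint8_t> &raw, size_t bpp) {
  std::vector<uint8_t> out(raw.size());
  for (size_t i = 0; i < raw.size(); i++) {
    uint8_t left = (i >= bpp) ? raw[i - bpp] : 0;
    out[i] = uint8_t(raw[i] - left);
  }
  return out;
}

// Inverse: Raw(x) = Sub(x) + Raw(x - bpp), working left to right so that
// Raw(x - bpp) has already been reconstructed
std::vector<uint8_t> unfilterSubCopy(const std::vector<uint8_t> &sub, size_t bpp) {
  std::vector<uint8_t> raw(sub.size());
  for (size_t i = 0; i < sub.size(); i++) {
    uint8_t left = (i >= bpp) ? raw[i - bpp] : 0;
    raw[i] = uint8_t(sub[i] + left);
  }
  return raw;
}
```

With bpp = 4, the two RGBA pixels {10, 20, 30, 255} and {12, 18, 35, 0} filter to {10, 20, 30, 255, 2, 254, 5, 1}: negative differences wrap around (18 - 20 becomes 254).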

Up Filter

The Up filter is similar to Sub, however we transform the scanline by subtracting from each byte the corresponding byte of the previous scanline.

First the specification function:

Up(x) = Raw(x) - Prior(x)

Again, if we take the following two sequential scanlines to see what happens:

After applying this filter to the second scanline, we’d get something like this:

If our scanline is the first in the image, and we apply the Up filter, our values will stay exactly the same, as Prior(x) will be 0.
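A standalone sketch of Up and its inverse (hypothetical names). For the first scanline, the caller passes an all-zero `prior` row, which leaves the bytes unchanged:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Up(x) = Raw(x) - Prior(x), modulo 256
std::vector<uint8_t> filterUp(const std::vector<uint8_t> &raw, const std::vector<uint8_t> &prior) {
  std::vector<uint8_t> out(raw.size());
  for (size_t i = 0; i < raw.size(); i++) {
    out[i] = uint8_t(raw[i] - prior[i]);
  }
  return out;
}

// Inverse: Raw(x) = Up(x) + Prior(x), where prior is the already unfiltered row above
std::vector<uint8_t> unfilterUpCopy(const std::vector<uint8_t> &up, const std::vector<uint8_t> &prior) {
  std::vector<uint8_t> raw(up.size());
  for (size_t i = 0; i < up.size(); i++) {
    raw[i] = uint8_t(up[i] + prior[i]);
  }
  return raw;
}
```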

Average Filter

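The Average filter uses both the previous scanline and the preceding pixel values in its calculation. The sum inside the average must be computed without 8-bit overflow; only the final difference is taken modulo 256.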

Average(x) = Raw(x) - floor((Raw(x-bpp)+Prior(x))/2)

Again, let’s take the following example:

If we apply the Average filter to the second scanline, we’d get:
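A standalone sketch of Average and its inverse (hypothetical names). The left and above bytes are summed as `int`, so a sum such as 250 + 200 does not wrap before halving:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Average(x) = Raw(x) - floor((Raw(x - bpp) + Prior(x)) / 2)
std::vector<uint8_t> filterAverage(const std::vector<uint8_t> &raw,
                                   const std::vector<uint8_t> &prior, size_t bpp) {
  std::vector<uint8_t> out(raw.size());
  for (size_t i = 0; i < raw.size(); i++) {
    int left = (i >= bpp) ? raw[i - bpp] : 0;
    int avg = (left + prior[i]) / 2; // int sum: no 8-bit overflow
    out[i] = uint8_t(raw[i] - avg);
  }
  return out;
}

// Inverse: add the same average back, using the already reconstructed left byte
std::vector<uint8_t> unfilterAverageCopy(const std::vector<uint8_t> &filtered,
                                         const std::vector<uint8_t> &prior, size_t bpp) {
  std::vector<uint8_t> raw(filtered.size());
  for (size_t i = 0; i < filtered.size(); i++) {
    int left = (i >= bpp) ? raw[i - bpp] : 0;
    raw[i] = uint8_t(filtered[i] + (left + prior[i]) / 2);
  }
  return raw;
}
```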

Paeth Filter

The Paeth filter has more steps than the previous examples, with the filter being defined as:

Paeth(x) = Raw(x) - PaethPredictor(Raw(x-bpp), Prior(x), Prior(x-bpp))

This uses the PaethPredictor function:

function PaethPredictor (a, b, c)
  begin
       ; a = left, b = above, c = upper left
       p := a + b - c        ; initial estimate
       pa := abs(p - a)      ; distances to a, b, c
       pb := abs(p - b)
       pc := abs(p - c)
       ; return nearest of a,b,c,
       ; breaking ties in order a,b,c.
       if pa <= pb AND pa <= pc then return a
       else if pb <= pc then return b
       else return c
  end

The Paeth filter predicts each byte from whichever of its left (a), above (b), and upper-left (c) neighbours is closest to the simple linear estimate a + b - c, and then subtracts that prediction.

Taking two scanlines with the values:

If we apply the Paeth filter, we’d get:

We’ll show some C++ code later which will hopefully make this a bit clearer.
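In the meantime, here is a standalone sketch of the Paeth filter and its inverse (hypothetical names), following the specification’s predictor, including its tie-breaking order of a, then b, then c:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// a = left, b = above, c = upper left; returns whichever is nearest to a + b - c
uint8_t paeth(uint8_t a, uint8_t b, uint8_t c) {
  int p = int(a) + int(b) - int(c);
  int pa = std::abs(p - a);
  int pb = std::abs(p - b);
  int pc = std::abs(p - c);
  if (pa <= pb && pa <= pc) return a;
  if (pb <= pc) return b;
  return c;
}

// Paeth(x) = Raw(x) - PaethPredictor(Raw(x - bpp), Prior(x), Prior(x - bpp))
std::vector<uint8_t> filterPaeth(const std::vector<uint8_t> &raw,
                                 const std::vector<uint8_t> &prior, size_t bpp) {
  std::vector<uint8_t> out(raw.size());
  for (size_t i = 0; i < raw.size(); i++) {
    uint8_t a = (i >= bpp) ? raw[i - bpp] : 0;
    uint8_t c = (i >= bpp) ? prior[i - bpp] : 0;
    out[i] = uint8_t(raw[i] - paeth(a, prior[i], c));
  }
  return out;
}

// Inverse: add the prediction back, using the already reconstructed left byte
std::vector<uint8_t> unfilterPaethCopy(const std::vector<uint8_t> &filtered,
                                       const std::vector<uint8_t> &prior, size_t bpp) {
  std::vector<uint8_t> raw(filtered.size());
  for (size_t i = 0; i < filtered.size(); i++) {
    uint8_t a = (i >= bpp) ? raw[i - bpp] : 0;
    uint8_t c = (i >= bpp) ? prior[i - bpp] : 0;
    raw[i] = uint8_t(filtered[i] + paeth(a, prior[i], c));
  }
  return raw;
}
```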

IDAT Compression

Once the filters are applied to each scanline, the filtered data is compressed to shrink the size of the image. The only compression method currently defined for PNG is deflate (decompressed with inflate), stored as a zlib stream.

We will be using the zlib library to do this stage for us. One thing to note here is that a PNG can contain several IDAT chunks, which must appear consecutively. Each chunk is a continuation of the last: a single compressed stream is simply split across them, so when decompressing we feed each IDAT into the decompression function as a continuation of the same stream.
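To illustrate that the IDAT chunks together form one zlib stream, here is a sketch (hypothetical `collectIdatStream`, not from the repo) that walks the chunk list and concatenates every IDAT data field. Feeding the result to zlib’s inflate in one go is equivalent to streaming the chunks one by one; the inflate step and CRC verification are deliberately left out:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static uint32_t be32(const uint8_t *p) {
  return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
         (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Appends the data of every IDAT chunk (in file order) to out.
// Returns true once IEND is reached, false on a truncated file.
bool collectIdatStream(const uint8_t *png, size_t len, std::vector<uint8_t> &out) {
  size_t off = 8; // skip the 8-byte signature
  while (off + 12 <= len) { // length + type + CRC = 12 bytes minimum
    uint32_t chunkLen = be32(png + off);
    if (chunkLen > len - off - 12) return false; // chunk runs past end of buffer
    const uint8_t *type = png + off + 4;
    const uint8_t *data = png + off + 8;
    if (std::memcmp(type, "IDAT", 4) == 0) {
      out.insert(out.end(), data, data + chunkLen);
    } else if (std::memcmp(type, "IEND", 4) == 0) {
      return true;
    }
    off += 12 + size_t(chunkLen); // move to the next chunk
  }
  return false;
}
```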

Hiding Our Data

Now we understand the layout of a PNG, and where we want to hide our data, let’s start putting some code together.

The full code that we’ll be referencing in this post can be found here. I recommend cloning the repo and following along with the code to help understand the snippets below. We won’t go through the code line by line; instead, I’ll focus on the areas which I think are worth expanding on.

The encoder we are building will take an image as a surrogate, along with some data to hide.

First up we perform a basic sanity check to make sure the header of the surrogate image is as expected. This is a case of checking the magic signature first:

// The 8 signature bytes 89 50 4E 47 0D 0A 1A 0A read as a little-endian uint64_t
#define PNG_SIGNATURE 0x0a1a0a0d474e5089ULL

char* PNGStego::encode(char *data, int dataLen, int *outLength) {

...

  if (*(uint64_t *)inBuffer != PNG_SIGNATURE) {
    return NULL;
  }
  
...

If everything looks OK, we start to iterate through the chunks of data:

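// Chunk type codes as they appear when the 4 ASCII type bytes are read as a
// little-endian uint32_t (e.g. "IHDR" = 49 48 44 52 -> 0x52444849)
#define IHDR 0x52444849
#define PLTE 0x45544c50
#define IDAT 0x54414449
#define IEND 0x444e4549

  // Iterate through PNG chunks
  while (inBufferOffset < this->fileLength) {

    // Each chunk starts with a 4 byte length and 4 byte type
    // followed by the data (of length bytes) and a 4 byte CRC.
    // The length is big-endian, so LONG_BITSWAP byte-swaps it into host order
    pngChunkLength = LONG_BITSWAP(*(uint32_t *)(inBuffer + inBufferOffset));
    pngChunkType = *(uint32_t *)(inBuffer + inBufferOffset + PNG_LENGTH_FIELD_SIZE);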

    switch(pngChunkType) {
      case IHDR:

        printf("[*] IHDR Chunk\n");
        outBuffer = processIHDR(inBuffer, &inBufferOffset, outBuffer, &outBufferOffset);
        break;

      case IDAT:

        printf("[*] IDAT Chunk\n");
        outBuffer = processIDAT(inBuffer, &inBufferOffset, outBuffer, &outBufferOffset);
        break;
      
      case IEND:

        printf("[*] IEND Chunk\n");
        outBuffer = processIEND(inBuffer, &inBufferOffset, outBuffer, &outBufferOffset);
        break;

      default:

        printf("[*] Unknown Chunk: %c%c%c%c\n", pngChunkType & 0xFF, (pngChunkType >> 8) & 0xFF, (pngChunkType >> 16) & 0xFF, (pngChunkType >> 24) & 0xFF);
        outBuffer = processUnknownChunk(inBuffer, &inBufferOffset, outBuffer, &outBufferOffset);
        break;

    }
  }

The first chunk that we look for is the IHDR chunk, which we will use to perform some sanity checking and calculate the size of scanlines. We’ll also make sure that we can handle the incoming image:

char* PNGStego::processIHDR(char *input, int *inputOffset, char *output, int *outputOffset) {
...
  if (ihdr->compressionMethod != 0) {
    printf("[!] Error: Compression method must be 0\n");
    return NULL;
  }

  if (ihdr->filterMethod != 0) {
    printf("[!] Error: Filter method must be 0\n");
    return NULL;
  }

  if (ihdr->interlaceMethod != 0) {
    // Not currently implemented, because CBA until I see it's needed
    printf("[!] Error: Interlace method must be 0\n");
    return NULL;
  }

We also confirm the colour type used by the PNG, as this is going to feed into the amount of data that we can store and just how we store it:

switch(ihdr->colorType) {
  case 0:
    printf("[*] IHDR Color Type: Grayscale\n");
    // Scanline = ceil(width * channels * bitDepth / 8) bytes + 1 filter-type byte
    this->scanlineLength = (LONG_BITSWAP(ihdr->width) * 1 * ihdr->bitDepth + 7) / 8 + 1;
    // bpp is rounded up to 1 for bit depths below 8
    this->bytesPerPixel = (1 * ihdr->bitDepth + 7) / 8;
    break;

  case 2:
    printf("[*] IHDR Color Type: RGB\n");
    this->scanlineLength = (LONG_BITSWAP(ihdr->width) * 3 * ihdr->bitDepth + 7) / 8 + 1;
    this->bytesPerPixel = (3 * ihdr->bitDepth + 7) / 8;
    break;

  case 3:
    printf("[*] IHDR Color Type: Palette\n");
    printf("[!] Palette not currently supported so results may be screwy!\n");
    this->scanlineLength = (LONG_BITSWAP(ihdr->width) * 1 * ihdr->bitDepth + 7) / 8 + 1;
    this->bytesPerPixel = (1 * ihdr->bitDepth + 7) / 8;
    break;

  case 4:
    printf("[*] IHDR Color Type: Grayscale + Alpha\n");
    this->scanlineLength = (LONG_BITSWAP(ihdr->width) * 2 * ihdr->bitDepth + 7) / 8 + 1;
    this->bytesPerPixel = (2 * ihdr->bitDepth + 7) / 8;
    break;

  case 6:
    printf("[*] IHDR Color Type: RGB + Alpha\n");
    this->scanlineLength = (LONG_BITSWAP(ihdr->width) * 4 * ihdr->bitDepth + 7) / 8 + 1;
    this->bytesPerPixel = (4 * ihdr->bitDepth + 7) / 8;
    break;

  default:
    printf("[!] Error: Unsupported color type %d\n", ihdr->colorType);
    return NULL;
}

And of course, we work out the maximum data size that the image can hold for us:

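// Each byte of pixel data hides one payload bit, so we need 8 bytes per payload byte
if ((uint64_t)this->width * this->height * this->bytesPerPixel < (uint64_t)this->inputDataLength * 8) {
  printf("[!] Error: Payload is too large to fit in the image\n");
  exit(1);
}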

Once everything checks out, we start to parse the IDAT chunks, whose pixel data we’ll hide our payload within. To do this, we decompress the data using zlib. As there may be multiple IDAT chunks, the decompression is streamed, continuing until we have processed the final IDAT chunk.

The code below shows just how we handle incoming IDAT chunks, passing each into a decompression method and storing the result for later. We do this as we want to gather every IDAT chunk in the PNG before we start to process the image:

char* PNGStego::processIDAT(char *input, int *inputOffset, char *output, int *outputOffset) {

...

  // Decompress the data
  this->compression->decompress((unsigned char *)chunkData, chunkLength, (this->width * this->height * this->bpp) + this->height, [this](char *decompressedData, int decompressedLength) {

    // Copy the decompressed data for later processing
    printf("[*] Decompressed Length: %d bytes\n", decompressedLength);
    
    this->uncompressedData = (char *)realloc(this->uncompressedData, this->uncompressedDataLength + decompressedLength);
    memcpy(this->uncompressedData + this->uncompressedDataLength, decompressedData, decompressedLength);
    this->uncompressedDataLength += decompressedLength;

  });
...

Once we encounter the IEND chunk, we know there are no further IDAT chunks to pull out, so we can move on to unfiltering each scanline.

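Each unfilter method is just the reverse of the corresponding filter method discussed above: it adds the predicted value back instead of subtracting it (again modulo 256, which unsigned char arithmetic gives us for free). The first unfilter method we define is for the Sub filter: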

// Unfilters a scanline using the sub filter type
// scanline - the scanline to unfilter
// length - the length of the scanline
void PNGStego::unfilterSub(unsigned char *scanline, int length) {
  int i;
  unsigned char previousPixel;

  for (i = 0; i < length; i++) {
    if (i < bpp) {
      previousPixel = 0;
      scanline[i] += previousPixel;
    } else {
      previousPixel = scanline[i - bpp];
      scanline[i] += previousPixel;
    }
  }
}

The second unfilter method we define is the up filter:

// Unfilters a scanline using the up filter type
// scanline - the scanline to unfilter
// previousScanline - the previous scanline
// length - the length of the scanline
void PNGStego::unfilterUp(unsigned char *scanline, unsigned char *previousScanline, int length) {
  int i;
  unsigned char previousPixel;

  for (i = 0; i < length; i++) {
    previousPixel = previousScanline[i];
    scanline[i] += previousPixel;
  }
}

The third is for the Average filter:

// Unfilters a scanline using the average filter type
// scanline - the scanline to unfilter
// previousScanline - the previous scanline
// length - the length of the scanline
void PNGStego::unfilterAverage(unsigned char *scanline, unsigned char *previousScanline, int length) {
  int i;
  unsigned char previousPixel;
  unsigned char previousPixelUp;

  for (i = 0; i < length; i++) {
    if (i < bpp) {
      previousPixel = 0;
      previousPixelUp = previousScanline[i];
      scanline[i] += (previousPixel + previousPixelUp) / 2;
      
    } else {
      previousPixel = scanline[i - bpp];
      previousPixelUp = previousScanline[i];
      scanline[i] += (previousPixel + previousPixelUp) / 2;
    }
  }
}

And last is the Paeth filter:

// Unfilters a scanline using the paeth filter type
// scanline - the scanline to unfilter
// previousScanline - the previous scanline
// length - the length of the scanline
void PNGStego::unfilterPaeth(unsigned char *scanline, unsigned char *previousScanline, int length) {
  int i;
  unsigned char previousPixel;
  unsigned char previousPixelUp;
  unsigned char previousPixelUpLeft;

  for (i = 0; i < length; i++) {
    if (i < bpp) {
      previousPixel = 0;
      previousPixelUp = previousScanline[i];
      previousPixelUpLeft = 0;
      scanline[i] += this->paethPredictor(previousPixel, previousPixelUp, previousPixelUpLeft);
    } else {
      previousPixel = scanline[i - bpp];
      previousPixelUp = previousScanline[i];
      previousPixelUpLeft = previousScanline[i - bpp];
      scanline[i] += this->paethPredictor(previousPixel, previousPixelUp, previousPixelUpLeft);
    }
  }
}

// Returns the paeth predictor for the given values
unsigned char PNGStego::paethPredictor(unsigned char a, unsigned char b, unsigned char c) {
  int p = a + b - c;
  int pa = abs(p - a);
  int pb = abs(p - b);
  int pc = abs(p - c);

  if (pa <= pb && pa <= pc) {
    return a;
  } else if (pb <= pc) {
    return b;
  } else {
    return c;
  }
}

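With the unfilter methods defined, we iterate through each scanline of the decompressed IDAT data. We determine the filter to undo from the first byte of the scanline, and unfilter the scanline to recover the original raw pixel data (we also reset the filter byte to 0, None, to show that the existing filtering has been removed). Scanlines must be processed from top to bottom, because Up, Average and Paeth read the previous scanline after it has already been unfiltered; for the first scanline, that previous scanline is treated as all zeros: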

...
// Up, Average and Paeth read the previous, already unfiltered, scanline. For the
// first scanline (i == 0) the spec treats that row as all zeros, so point at a
// zeroed buffer instead of reading before the start of uncompressedData
std::vector<unsigned char> zeroScanline(this->scanlineLength - 1, 0);
unsigned char *priorScanline = (i == 0)
    ? zeroScanline.data()
    : (unsigned char *)this->uncompressedData + i + 1 - this->scanlineLength;

switch(this->uncompressedData[i]) {
      case 0:
        printf("Scanline Filter: None\n");
        break;

      case 1:
        printf("Scanline Filter: Sub\n");
        this->unfilterSub((unsigned char *)this->uncompressedData + i + 1, this->scanlineLength - 1);
        this->uncompressedData[i] = 0;
        break;

      case 2:
        printf("Scanline Filter: Up\n");
        this->unfilterUp((unsigned char *)this->uncompressedData + i + 1, priorScanline, this->scanlineLength - 1);
        this->uncompressedData[i] = 0;
        break;

      case 3:
        printf("Scanline Filter: Average\n");
        this->unfilterAverage((unsigned char *)this->uncompressedData + i + 1, priorScanline, this->scanlineLength - 1);
        this->uncompressedData[i] = 0;
        break;

      case 4:
        printf("Scanline Filter: Paeth\n");
        this->unfilterPaeth((unsigned char *)this->uncompressedData + i + 1, priorScanline, this->scanlineLength - 1);
        this->uncompressedData[i] = 0;
        break;

      default:
        printf("Scanline Filter: Unknown (%d)\n", *((unsigned char *)uncompressedData + i));
        break;
...

The encoding method we’re using for our payload is LSB: for each byte of the unfiltered scanline, we set the least significant bit to a 1 or 0 depending on the next bit of our input payload:

// Hides one payload bit in the LSB of each byte of an unfiltered scanline
void PNGStego::encodeDataIntoScanline(unsigned char *scanline, int scanlineLength) {
  int i;
  bool eof = false;

  if (!this->bitIterator->hasNext()) {
    return;
  }

  for (i = 0; i < scanlineLength; i++) {
    char bit = this->bitIterator->getNextBit(1, eof);
    if (eof) {
      return;
    }

    if (bit == 1) {
      scanline[i] |= 1;
    } else {
      scanline[i] &= 0xFE;
    }
  }
}
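For experimenting outside the repo, the same LSB idea can be sketched over a plain byte buffer (hypothetical `embedLsb`). This sketch assumes payload bits are taken most significant bit first, matching the order in which `BitIterator::addBit` rebuilds bytes on the decoding side; the repo’s `getNextBit` ordering isn’t shown in the post:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Writes each payload bit (MSB first) into the LSB of successive carrier bytes.
// Returns false, leaving the carrier untouched, if the payload doesn't fit.
bool embedLsb(std::vector<uint8_t> &carrier, const std::vector<uint8_t> &payload) {
  size_t bitsNeeded = payload.size() * 8;
  if (bitsNeeded > carrier.size()) return false;
  for (size_t i = 0; i < bitsNeeded; i++) {
    uint8_t bit = (payload[i / 8] >> (7 - i % 8)) & 1;
    carrier[i] = uint8_t((carrier[i] & 0xFE) | bit); // clear LSB, then set it to bit
  }
  return true;
}
```

Each carrier byte changes by at most 1, which is why the result is visually indistinguishable from the original.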

So, what happens when we run this encoder against a surrogate image?

On the left we have the original image, and on the right, the same image encoded with our payload. To the unsuspecting viewer, the two images look identical.

It’s also wise at this point to run the output image through a few testing tools to make sure we haven’t corrupted anything while recreating the image. Using pngcheck, we see that everything decodes fine:

Revealing Our Data

With our image built and uploaded to some image hosting service, waiting for its day of reckoning to arrive, we need to be able to extract the data upon delivery.

Unsurprisingly, most of the hard work has already been done in the encoder. As we’re performing the same decoding process to parse the PNG container, we can reuse most of the same code. The big change this time is that, after we decompress and unfilter the IDAT chunks, we read the bits out rather than writing them in.

As each bit of our payload data is extracted from the image pixels, we use the addBit method of BitIterator to pack the recovered bits back into bytes, filling each byte from its most significant bit down:

void BitIterator::addBit(char bit) {
  int byteIndex = this->index / 8;
  int bitIndex = this->index % 8;

  if (bit) {
    this->data[byteIndex] |= (1 << (7 - bitIndex));
  } else {
    this->data[byteIndex] &= ~(1 << (7 - bitIndex));
  }

  this->index++;
}
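A self-contained sketch of the extraction side over a plain byte buffer (hypothetical `extractLsb`), using the same MSB-first packing as `addBit`. The post doesn’t say how the decoder learns the payload length, so this sketch simply assumes the caller knows it (for example from a fixed size or a length prefix):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Rebuilds payloadLen bytes from the LSBs of the carrier, MSB first.
// Returns an empty vector if the carrier is too short.
std::vector<uint8_t> extractLsb(const std::vector<uint8_t> &carrier, size_t payloadLen) {
  std::vector<uint8_t> out;
  if (payloadLen * 8 > carrier.size()) return out;
  out.assign(payloadLen, 0);
  for (size_t i = 0; i < payloadLen * 8; i++) {
    if (carrier[i] & 1) {
      out[i / 8] |= uint8_t(1 << (7 - i % 8)); // same bit order as addBit
    }
  }
  return out;
}
```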

And again, if we run this example against our generated PNG image, we’ll find that our original payload is dumped from the PNG image just fine:

I’ve added the images I used for testing in the git repo under /testdata/ so you can recreate the tests in this post.

Hopefully at this point you know just how steganography in a PNG file works beyond “run this tool”. Have phun and get creative!