Date: 2020jun26
Platform: web
Q. Algorithm: how to receive big multipart/form-data attachments
A. When a browser uploads a file it's sent as multipart/form-data.
A boundary string is defined in the Content-Type header, then it's used
to separate the parts of the body. The size of each part is not
explicitly stated anywhere.
https://www.google.com/search?q=multipart/form-data+example
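To make the "boundary in the header" part concrete, here is a minimal sketch that pulls the boundary out of a Content-Type value. The class and method names are made up for illustration, and the parsing is simplified: it handles an optionally quoted value but would mis-split a quoted boundary that itself contains a semicolon.

```java
final class BoundaryParser {
    // Hypothetical helper: returns the boundary parameter, or null if there is none.
    static String extractBoundary(String contentType) {
        if (contentType == null) return null;
        final String key = "boundary=";
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Parameter names are case-insensitive, so compare ignoring case
            if (p.regionMatches(true, 0, key, 0, key.length())) {
                String value = p.substring(key.length()).trim();
                // Strip optional surrounding double quotes
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }
}
```

For a header value like `multipart/form-data; boundary=abc123` this returns `abc123`. In the body each part is then introduced by a line containing `--abc123`, and the final one is `--abc123--`.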
This means as you are reading a large file (possibly gigabytes)
you need to continually check for the boundary string.
Quite difficult to do efficiently. It would be nice to
avoid keeping the big file in memory.
My solution, when only one file (last field in form) is being uploaded, is to
parse the upload body until the binary file begins. Then write
the rest (using buffered writes) into a file on storage.
This means only a few KB of your large file is in memory at a time.
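The buffered write step could look roughly like this. It assumes the caller has already consumed the part headers, so the stream is positioned at the first byte of the file. The class name, method name and 8 KB buffer size are illustrative choices, not anything required by the format.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class UploadSpooler {
    // Hypothetical helper: copies everything left in the stream to a file,
    // holding at most one 8 KB chunk (plus the output buffer) in memory.
    // Returns the number of bytes written.
    static long spoolToFile(InputStream in, Path target) throws IOException {
        final byte[] buf = new byte[8192];
        long total = 0;
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

Note the file written this way still ends with the closing boundary line, which is what the rest of this note deals with.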
So far so good. But what about the boundary?
If the file is a common type it might not actually matter.
For example a PNG reader will typically just ignore the trailing bytes.
But, of course, that's untidy.
Because the boundary is only going to be in the last 100 bytes (or so)
of the file, I read the file backwards from the end
and truncate just before the boundary. Pretty fast.
Don't forget to prepend "--" to the boundary from the header.
That is how it appears in the body.
In pseudocode:
for (position = lastByte;; position--) {
    if (currentChar == '\n') {
        if (followingCharactersAreBoundary) {
            truncateHere
            Done!
        }
    }
}
In Java:
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Returns true if the bytes at the current file position match the boundary
static boolean isBoundary(RandomAccessFile raf, final byte[] bBoundary) throws IOException {
    try {
        for (int i = 0; i < bBoundary.length; i++) {
            final byte b = raf.readByte();
            if (b != bBoundary[i]) return false;
        }
    }
    catch (EOFException ex) {
        return false; // Hit end of file before the whole boundary matched
    }
    return true;
}

static void truncateAtBoundary(final String filename, final String strBoundaryIn) throws IOException {
    final String strBoundary = "--" + strBoundaryIn;
    // Boundaries are ASCII, so don't depend on the platform default charset
    final byte[] bBoundary = strBoundary.getBytes(StandardCharsets.US_ASCII);
    // try-with-resources closes the file even if an exception is thrown
    try (RandomAccessFile raf = new RandomAccessFile(filename, "rw")) {
        final long origLen = raf.length();
        // Read backward to a newline
        for (long pos = origLen - 1; pos >= 0; pos--) {
            raf.seek(pos);
            final byte b = raf.readByte();
            if (b == '\n') { // There is always a \n before the boundary
                if (isBoundary(raf, bBoundary)) {
                    // Check for a \r before the \n (if present we want to truncate it too)
                    if (pos > 0) { // Guard: seeking to -1 would throw
                        raf.seek(pos - 1);
                        if (raf.readByte() == '\r') {
                            pos--;
                        }
                    }
                    raf.setLength(pos);
                    break;
                }
            }
        }
    }
}
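
Since the boundary sits in the last 100 bytes or so, a possible variant is to read just the tail of the file into memory once and search it there, instead of doing a seek and a read for every byte. This is only a sketch of that idea, not a drop-in replacement: the class name and the tailSize parameter are made up here, and it assumes tailSize is comfortably larger than the boundary line (1024 is plenty).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class TailTruncator {
    // Hypothetical variant: loads only the last tailSize bytes and searches them
    // in memory. Returns true if a boundary was found and the file was truncated.
    static boolean truncateAtBoundary(String filename, String boundary, int tailSize) throws IOException {
        final byte[] delim = ("\n--" + boundary).getBytes(StandardCharsets.US_ASCII);
        try (RandomAccessFile raf = new RandomAccessFile(filename, "rw")) {
            final long len = raf.length();
            final int n = (int) Math.min(len, tailSize);
            final long tailStart = len - n;
            final byte[] tail = new byte[n];
            raf.seek(tailStart);
            raf.readFully(tail);
            // Search backward for the last "\n--boundary" in the tail
            for (int i = n - delim.length; i >= 0; i--) {
                if (matches(tail, i, delim)) {
                    long cut = tailStart + i;
                    // Also drop a \r just before the \n. Assumption: the \r lies
                    // inside the tail, which holds when tailSize is generous.
                    if (i > 0 && tail[i - 1] == '\r') cut--;
                    raf.setLength(cut);
                    return true;
                }
            }
        }
        return false;
    }

    private static boolean matches(byte[] data, int offset, byte[] pattern) {
        for (int j = 0; j < pattern.length; j++) {
            if (data[offset + j] != pattern[j]) return false;
        }
        return true;
    }
}
```

Unlike the version above, this gives up if the boundary is not within the last tailSize bytes, so a file with no boundary costs one small read rather than a scan of the whole file.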