Skip to content

Accept file with no specified Content-Type #167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 22, 2025

Conversation

yanis-fourel
Copy link

@yanis-fourel yanis-fourel commented May 4, 2025

When uploading a file via cURL, if it doesn't know what type the file is then it just does not include the Content-Type field in the parameter.
eg.

echo "This is my file content" |  curl -F 'file=@-' http://localhost:3000/

In this case, zap will receive a Hash_Binfile parameter but its content will be empty.
The behavior of cURL seems to be non standard, as the RFC 1867: Form-based File Upload in HTML states:

The definition of multipart/form-data is included in section 7. A
boundary is selected that does not occur in any of the data. (This
selection is sometimes done probabilisticly.) Each field of the form
is sent, in the order in which it occurs in the form, as a part of
the multipart stream. Each part identifies the INPUT name within the
original HTML form. Each part should be labelled with an appropriate
content-type if the media type is known
(e.g., inferred from the file
extension or operating system typing information) or as
application/octet-stream.

For that reason, I am proposing this patch where in case the client (cURL) did not specify the Content-Type of an uploaded file, then zap will default to "application/octet-stream".
This will allow files uploaded without specified types to still be received by applications using zap.

I tried adding unit test for that by adding a src/tests/test_recvfile.zig and a src/tests/test_recvfile_notype.zig tests, but I eventually gave up after spending most of my day failing to send an HTTP POST request with form field containing a file, using std.http.Client.fetch function.
I'd be very happy to create an automatic test, but I would need help doing so. Maybe it'd be cool if we could use cURL in the test? I didn't really want to introduce new dependencies as a first time contributor though, tell me if that'd be a good idea and I may look into it.

Full program to reproduce

const std = @import("std");
const zap = @import("zap");
const Allocator = std.mem.Allocator;

fn on_request(r: zap.Request) !void {
    r.parseBody() catch @panic("Failed to parse request body");

    if (r.getParamCount() != 1) @panic("Expected exactly one parameter");

    var alloc = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer alloc.deinit();
    const params = r.parametersToOwnedList(alloc.allocator()) catch @panic("Failed to copy parameters");

    const value = params.items[0].value orelse @panic("Parameter has no value");

    const file = switch (value) {
        .Hash_Binfile => |f| f,
        else => @panic("Parameter is not a file"),
    };

    const content = file.data orelse @panic("File has no data");

    std.debug.print("File content:\n----\n{s}\n----\n", .{content});
}

pub fn main() !void {
    var listener = zap.HttpListener.init(.{
        .port = 3000,
        .on_request = on_request,
        .log = true,
    });
    try listener.listen();

    std.debug.print("Listening on 0.0.0.0:3000\n", .{});

    zap.start(.{
        .threads = 2,
        .workers = 2,
    });
}

Build with zap 0.10.1:

    .dependencies = .{
        .zap = .{
            .url = "git+https://github.com/zigzap/zap?ref=v0.10.1#8d187310c7ee4f86faa7ef0ecdac0bf747216dea",
            .hash = "zap-0.9.1-GoeB84M8JACjZKDNq2LA5hB24Z-ZrZ_HUKRXd8qxL2JW",
        },
    },

First request with cURL (works as expected):

$  echo "This is my file" | curl -F 'file=@-;type=text/plain' http://localhost:3000/

Wireshark capture of the relevant HTTP packet:
image
note the Content-Type: text/plain

server logs:

File content:
----
This is my file
----
127.0.0.1 - - [Sun, 04 May 2025 06:46:24 GMT] "POST / HTTP/1.1" 200 -- 351us

Second request with cURL (does not work as expected):

$ echo "This is my file" | curl -F 'file=@-' http://localhost:3000/

Wireshark capture of the relevant HTTP packet:
image
note the absence of Content-Type:text/plain

server logs:

thread 71754 panic: File has no data
/home/yanis/project/src/main.zig:21:38: 0x11056e4 in on_request (project)
    const content = file.data orelse @panic("File has no data");
[...]

@renerocksai
Copy link
Member

Hi, and thank you for this great finding and contribution. IMHO it makes sense to be pragmatic and allow curl in tests, at least until we find a way to use either Zig's std.http.Client or facil.io in client mode. The latter, unfortunately, would need to go into its own process, as facil.io, by my experience, cannot operate as both a server and a client at the same time. The only benefit of that approach would be being able to build the curl-alternative ourselves before running the test.

Curl is commonly available on all platforms that zap / facil.io targets, so I don't see a strong reason not to use it. It's better than NOT having a test.

You peeked my curiosity, though, so I'm going to try posting a file as form-field with std.http.Client.

BTW, I recently started to parse plain HTML forms (text input fields) in a different project and am not entirely happy with the user experience. I had to use zap.fio a lot with the following pattern, to get form field content out of the parameter hash map:

    // after having called try r.parseBody();
    const project = blk: {
        const fio_params = r.h.*.params;
        const key = zap.fio.fiobj_str_new("project", "project".len);
        const fio_project = zap.fio.fiobj_hash_get(fio_params, key);

        const elem = zap.fio.fiobj_ary_index(fio_project, 0);
        const project = zap.util.fio2str(elem) orelse return error.NoString;
        break :blk project;
    };

So parsing form-fields should ideally support at least non-nested hash maps with array values and needs a bit of work. I am only writing this FYI and also as a reminder to myself.

@renerocksai
Copy link
Member

renerocksai commented May 4, 2025

There you go, also available here: https://zigbin.io/cf6721

In the payload_, you can now play with including the Content-Type and leaving it out, in the test(s).

const FILE_CONTENT =
    \\# Test File
    \\
    \\This is my **test** file
    \\
;

const BOUNDARY = "---XcPmntPm3EGd9NaxNUPFFL";

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    const allocator = gpa.allocator();

    // Create a HTTP client
    var client = std.http.Client{ .allocator = gpa.allocator() };
    defer client.deinit();

    // Allocate a buffer for server headers
    var buf: [4096]u8 = undefined;

    // construct payload and header strings
    const filename = "myfile.txt"; // or "-"
    const param_name = "file";

    // note: this payload will have the wrong line endings, which we fix immediately afterwards:
    const payload_ = try std.fmt.allocPrint(allocator,
        \\--{s}
        \\Content-Disposition: form-data; name="{s}"; filename="{s}"
        \\Content-Type: text/plain
        \\
        \\{s}
        \\
        \\--{s}--
        \\
    , .{ BOUNDARY, param_name, filename, FILE_CONTENT, BOUNDARY });
    defer allocator.free(payload_);

    // make payload line endings correct
    const payload_correct_line_endings = try std.mem.replaceOwned(
        u8,
        allocator,
        payload_,
        "\n",
        "\r\n",
    );
    defer allocator.free(payload_correct_line_endings);

    const request_content_type = try std.fmt.allocPrint(
        allocator,
        "multipart/form-data; boundary={s}",
        .{BOUNDARY},
    );
    defer allocator.free(request_content_type);

    log.info("PAYLOAD:\n{s}", .{payload_correct_line_endings});
    log.info("Content-Type: {s}", .{request_content_type});
    log.info("POST-ing...", .{});

    var response_storage = std.ArrayList(u8).init(allocator);
    defer response_storage.deinit();

    const res = try client.fetch(.{
        .method = .POST,
        .payload = payload_correct_line_endings,
        // .location = .{ .url = "https://httpbun.org/payload" },
        .location = .{ .url = "http://localhost:3000" },
        .server_header_buffer = &buf,
        .headers = .{
            .content_type = .{ .override = request_content_type },
        },
        .response_storage = .{ .dynamic = &response_storage },
    });
    log.info("res = {}", .{res});
    log.debug("response body:\n{s}", .{response_storage.items});
}

const std = @import("std");
const log = std.log.scoped(.client);

@yanis-fourel
Copy link
Author

Thanks for the reply.

I'll need to grow a bit more familiar with zap and with zig in general to be able to use curl in the tests. I'm still very new to the language.
I can't make promises that I will come back and do it, I discovered the issue as part of a weekend fun side-project to learn zig. I'll try to, because it sounds like something I both want to learn and am capable of doing.

FYI, here was my attempt at writing a test that would post a file as a form-field with std.http.Client:

const std = @import("std");
const zap = @import("zap");

// set default log level to .info and ZAP log level to .debug
pub const std_options: std.Options = .{
    .log_level = .info,
    .log_scope_levels = &[_]std.log.ScopeLevel{
        .{ .scope = .zap, .level = .debug },
    },
};

const expected = "Hello, this is a test file.";
var buffer: [1024]u8 = undefined;
var read_len: ?usize = null;

fn makeRequest(a: std.mem.Allocator, url: []const u8) !void {
    var http_client: std.http.Client = .{ .allocator = a };
    defer http_client.deinit();
    var buf1: [2048]u8 = undefined;
    const tmp = try std.fmt.bufPrint(&buf1,
        \\--------------------------BFM4g5IDYqruwdsFsblXC7
        \\Content-Disposition: form-data; name="file"; filename="myfile.txt"
        \\Content-Type: text/plain
        \\
        \\{s}
        \\--------------------------BFM4g5IDYqruwdsFsblXC7--
        \\
    , .{expected.*});

    var buf2: [2048]u8 = undefined;
    const len = std.mem.replacementSize(u8, tmp, "\n", "\r\n");
    _ = std.mem.replace(u8, tmp, "\n", "\r\n", &buf2);
    const payload = buf2[0..len];

    _ = try http_client.fetch(.{
        .location = .{ .url = url },
        .method = .POST,
        .headers = .{
            .content_type = .{ .override = "multipart/form-data; boundary=--------------------------BFM4g5IDYqruwdsFsblXC7" },
        },
        .payload = payload,
    });

    zap.stop();
}

pub fn on_request(r: zap.Request) !void {
    std.debug.print("Body:\n{s}", .{r.body orelse "<null>"});

    try r.parseBody();

    if (r.getParamCount() != 1) @panic("1");

    // TODO: Use the iterator instead and don't allocate
    var alloc = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer alloc.deinit();
    const params = try r.parametersToOwnedList(alloc.allocator());

    const value = params.items[0].value orelse @panic("2");

    const file = switch (value) {
        .Hash_Binfile => |f| f,
        else => return,
    };

    const content = file.data orelse @panic("3");

    read_len = content.len;
    @memcpy(buffer[0..read_len.?], content);
}

test "send file" {
    const allocator = std.testing.allocator;

    // setup listener
    var listener = zap.HttpListener.init(
        .{
            .port = 3002,
            .on_request = on_request,
            .log = false,
            .max_clients = 10,
            .max_body_size = 1 * 1024,
        },
    );
    try listener.listen();

    const thread = try std.Thread.spawn(.{}, makeRequest, .{ allocator, "http://127.0.0.1:3002" });
    defer thread.join();
    zap.start(.{
        .threads = 1,
        .workers = 1,
    });

    if (read_len) |rl| {
        try std.testing.expectEqual(expected.len, rl);
        try std.testing.expectEqualSlices(u8, expected, buffer[0..rl]);
    } else {
        return error.Wrong;
    }
}

I does not work though, it panics at @Panic("1").
I inspected in it Wireshark and the content of the HTTP request, it seem to me to be exactly the same as the request I do with cURL except that even wireshark fails to parse the fields of the form correctly. It's possible I don't understand how HTTP work well enough and I'm missing something.

@renerocksai
Copy link
Member

It took me a while, too. In the mime-encoded part, the boundary itself needs to be prefixed with --, and the final boundary at the end needs to be postfixed with --, too, as you already do. Also, the line endings might be important. See my example above.

So, if you like, update your test, I'm confident it will work, and push it to the PR?

@renerocksai
Copy link
Member

If you don't have time/nerves/interest do do it, I will make it work (just not now). Your PR is a really good start already! Thx again!

@yanis-fourel
Copy link
Author

yanis-fourel commented May 4, 2025

I added tests. I tried to find the best way to express them using the other tests as reference, but it still feels a bit messy.
I had another idea where we could zap.start() in another thread, then that background thread would notify the main thread the request has arrived so the main thread can do the testing, and the main thread would notify the background thread once it's done so it can complete.
In pseudo-code, it would look like:

var req: ?zap.Request = null;
var can_test: std.Thread.ResetEvent = std.Thread.ResetEvent{};
var can_finish: std.Thread.ResetEvent = std.Thread.ResetEvent{};

fn on_request(r: zap.Request) !void {
    req = r;
    can_test.set();
    can_finish.wait();
}

fn main() {
    var listener = zap.HttpListener.init(
        .{
            .port = 3003,
            .on_request = Handler.on_request,
            .log = false,
            .max_clients = 10,
            .max_body_size = 1 * 1024,
        },
    );
    try listener.listen();

    const t1 = try std.Thread.spawn(.{}, makeRequest, .{ allocator, "http://127.0.0.1:3003" });
    defer t1.join();

    const t2 = try std.Thread.spawn(.{}, start_zap, .{.{threads=1, workers=1}});
    defer t2.join();

    can_test.wait();
    defer can_finish.wait();

    const r = req.?;

    // After that point, we can simply do testing as if we were in the `on_request` function, but the error will bubble up

    try r.parseBody();
    // ...
}

However that didn't work. I am guessing zap.start needs to be ran from the main thread?
If we could have that pattern to work, we would be able to create some testing utility to just "listen to a single request and return the zap.Request object received as-is".

Anyways, I guess the tests I pushed are good enough for now :)

@yanis-fourel
Copy link
Author

@renerocksai

@renerocksai
Copy link
Member

hey, thanks and sorry for the delay. i'll merge this now. the "messy" tests are good enough i think.

@renerocksai renerocksai merged commit 35634a9 into zigzap:master Jul 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants