Concurrency issue leading to corrupted uploads #8

@rzajac

I know the title is a bit vague, so let me explain.

During my tests I wanted to simulate big file uploads with relatively small files, so on the front end I set:

`simultaneousUploads: 3` (which I believe is the default)
`chunkSize: 2048` (the default is 1024*1024)

This gave me a rough simulation of many concurrent chunk uploads. Using the library on the back end, you have to follow four steps (see the sketch after the list):

  1. validateChunk()
  2. saveChunk()
  3. validateFile()
  4. save()
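
To make the flow concrete, here is a minimal sketch of how these four calls are typically wired into an upload endpoint. The `\Flow\Config`/`\Flow\File` setup and all paths are my assumptions about the surrounding code, not something stated in this issue:

```php
<?php
// Minimal sketch of an endpoint wiring the four steps together. The
// \Flow\Config / \Flow\File setup and all paths are assumptions about
// the surrounding code, not something stated in this issue.
$config = new \Flow\Config();
$config->setTempDir('./chunks_temp_folder');
$file = new \Flow\File($config);

if ($file->validateChunk()) {   // 1. is the incoming chunk request valid?
    $file->saveChunk();         // 2. persist this chunk to the temp dir
} else {
    header('HTTP/1.1 400 Bad Request');
    return;
}

// 3 + 4: once every chunk is present, assemble the final file.
if ($file->validateFile() && $file->save('./uploads/final_file_name')) {
    // Upload complete; e.g. the database record is inserted here.
}
```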

If you have multiple concurrent uploads, this can lead to corrupted files. Let's take the case of the last two chunks being uploaded, where the x axis is time:

chunk1:  |---validateChunk---||---saveChunk---||---validateFile---||---save---|
chunk2:               |---validateChunk---||---saveChunk---||---validateFile---|

When validateFile() is called for chunk1, saveChunk() for chunk2 has already begun, so chunk1's validateFile() returns true and it proceeds to save the file while chunk2 is not yet fully saved.
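
To make the window concrete: chunk2's file already exists on disk while saveChunk() is still writing it, so an existence-only check passes too early. A hedged illustration of such a check (a hypothetical helper mirroring the description in this issue, not the library's actual source):

```php
// Hypothetical existence-only completeness check (an illustration of
// the description above, not the library's source). It can return true
// while the last chunk is still being written, because the chunk file
// exists on disk as soon as saveChunk() starts writing it.
function allChunksPresent($tempDir, $identifier, $totalChunks)
{
    for ($i = 1; $i <= $totalChunks; $i++) {
        if (!file_exists($tempDir . '/' . $identifier . '_' . $i)) {
            return false; // a chunk is missing, upload not finished
        }
        // Missing here: any proof the chunk is complete, e.g. comparing
        // filesize() against the expected chunk size.
    }
    return true; // may hold while a chunk is only partially written
}
```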

I have already seen this error in my logs a few times. The above example also leads to a double save() call (chunk2's validateFile() succeeds as well), which in my case causes a duplicate key error in the database.

To fix the problem, the library would have to implement locking not only in the save() method but also in validateFile(). It is not enough to call file_exists() in validateFile(); we have to know the chunk has been fully uploaded.
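
One possible shape for such a fix, as a sketch under assumptions rather than the library's implementation: complete each chunk atomically (write to a temporary name, then rename() it into place) so that a chunk file's existence implies it is complete, and take an exclusive flock() around the validateFile()/save() pair so only one request performs the final assembly. The flowIdentifier/flowChunkNumber/file parameter names are flow.js defaults; everything else below is illustrative:

```php
<?php
// Sketch of the proposed fix; all paths and names are illustrative.
$identifier  = isset($_REQUEST['flowIdentifier'])  ? basename($_REQUEST['flowIdentifier']) : '';
$chunkNumber = isset($_REQUEST['flowChunkNumber']) ? (int) $_REQUEST['flowChunkNumber'] : 0;
$tempDir     = './chunks_temp_folder';
$chunkPath   = $tempDir . '/' . $identifier . '_' . $chunkNumber;

$config = new \Flow\Config();
$config->setTempDir($tempDir);
$file = new \Flow\File($config);

// 1) Atomic chunk completion (in a real fix this logic would live
//    inside saveChunk()): write to a temp name, then rename() it into
//    place, so "the chunk file exists" implies "the chunk is fully
//    uploaded". rename() is atomic on the same filesystem.
move_uploaded_file($_FILES['file']['tmp_name'], $chunkPath . '.part');
rename($chunkPath . '.part', $chunkPath);

// 2) Serialize finalization: an exclusive lock ensures validateFile()
//    and save() cannot interleave across concurrent requests, which
//    also prevents the double save() / duplicate key problem.
$lock = fopen($tempDir . '/' . $identifier . '.lock', 'c');
if ($lock !== false && flock($lock, LOCK_EX)) {
    try {
        if ($file->validateFile() && $file->save('./uploads/final_file_name')) {
            // Completed exactly once: a later request that acquires the
            // lock finds the chunks already consumed and skips save().
        }
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}
```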
