[WIP] TimeSeries Feature #192

Open: mop9047 wants to merge 13 commits into main

mop9047 commented Aug 8, 2024

This work-in-progress feature makes it possible to train your own neural network on time series data. Along with the new folder in src called TimeSeries, there are also 4 new examples.

shiffman (Member) commented Aug 8, 2024

Wow, very excited to try out these examples and learn more about this work in our meeting!

package.json (outdated)
src/index.js (outdated)
examples/timeSeries-train-quickdraw/index.html (outdated)
examples/timeSeries-hand-gestures/index.html
examples/timeSeries-hand-gestures/index.html (outdated)
examples/timeSeries-stock-prediction/sketch.js (outdated)

// code to visualize how much ink left
function inkBar(){
datapoints = map(frame_count,0,ink_multiplier*num_seq, 0,num_seq)
Member

The ink bar is a very creative, user-facing solution to the problem! 😊

For a more general solution, I am wondering whether the algorithm that Dan mentioned (ideally as a stand-alone helper function that takes an array of data points and returns a modified array) would be a good idea to add?
My less sophisticated approach would be: record only points where the mouse has moved by a certain minimum distance from the previous point. If the number of points exceeds the desired count, I'd sample every nth point as needed. (It could be nice to see the actual data points that get used, similar to how you had the green points with the hand, btw.)
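
The simpler approach described in this comment could be sketched roughly as follows. This is only an illustration: the helper names `filterByMinDistance` and `subsample` are hypothetical and are not part of ml5.js or this PR, and points are assumed to be plain `{x, y}` objects. The second helper picks evenly spaced indices, which is one variant of "every nth" that always returns exactly the requested count.

```javascript
// Hypothetical helpers, not part of ml5.js or this PR.

// Keep a point only if it is at least minDist away from the last kept point.
function filterByMinDistance(points, minDist) {
  const kept = [];
  for (const p of points) {
    const last = kept[kept.length - 1];
    if (!last || Math.hypot(p.x - last.x, p.y - last.y) >= minDist) {
      kept.push(p);
    }
  }
  return kept;
}

// If there are more points than maxCount, pick maxCount evenly spaced ones.
// The first and last points are always kept when maxCount >= 2.
function subsample(points, maxCount) {
  if (maxCount < 1) return [];
  if (points.length <= maxCount) return points.slice();
  if (maxCount === 1) return [points[0]];
  const step = (points.length - 1) / (maxCount - 1);
  const out = [];
  for (let i = 0; i < maxCount; i++) {
    out.push(points[Math.round(i * step)]);
  }
  return out;
}
```

In a sketch, the first helper would run on the recorded mouse positions and the second on its result, for example `subsample(filterByMinDistance(recorded, 5), 20)`.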

Author

Hi professor, I am almost done with the RDP algorithm, and it works well. The code I referenced originally uses vectors. I am continuing with this approach for now, but do you think it might cause issues on the user's side (i.e., they would need to know to use vectors only to make this work)?

Member

Is the vector math only inside the RDP function? Then I think this shouldn't be too confusing.
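
For reference, RDP (Ramer-Douglas-Peucker) can be written so that callers only ever pass and receive plain `{x, y}` objects, with all geometry kept inside the function. The following is a minimal sketch under that assumption, not the implementation in this PR; the function names and the `epsilon` tolerance parameter are chosen here for illustration.

```javascript
// Minimal Ramer-Douglas-Peucker sketch on plain {x, y} objects.
// Not the PR's implementation; names are illustrative.

// Perpendicular distance from point p to the line through a and b.
function perpendicularDistance(p, a, b) {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const len = Math.hypot(dx, dy);
  // If a and b coincide, fall back to the distance from p to a.
  if (len === 0) return Math.hypot(p.x - a.x, p.y - a.y);
  return Math.abs(dy * p.x - dx * p.y + b.x * a.y - b.y * a.x) / len;
}

// Returns a simplified copy of points; larger epsilon removes more points.
function rdp(points, epsilon) {
  if (points.length < 3) return points.slice();
  const first = points[0];
  const last = points[points.length - 1];

  // Find the point farthest from the line between the two endpoints.
  let maxDist = 0;
  let index = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = perpendicularDistance(points[i], first, last);
    if (d > maxDist) {
      maxDist = d;
      index = i;
    }
  }

  // If it is far enough away, keep it and simplify both halves recursively.
  if (maxDist > epsilon) {
    const left = rdp(points.slice(0, index + 1), epsilon);
    const right = rdp(points.slice(index), epsilon);
    // The split point ends left and starts right, so drop one copy.
    return left.slice(0, -1).concat(right);
  }

  // Otherwise the endpoints alone approximate this stretch well enough.
  return [first, last];
}
```

Because the input and output are ordinary arrays of objects, a user would not need to know about vectors to call it.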

Author

Yes, you're right! I just finished implementing RDP, and it's a timeSeries utility called padCoordinates(seq_to_pad, target_seq_size). I specifically chose this name to introduce the idea of padding and its importance in dealing with time series data. The example is in the mouse Gesture RDP folder.

That said, should I remove the ink one entirely? Thanks!
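
As background on the padding idea mentioned here: sequences of different lengths usually have to be brought to one fixed length before they can be stacked into a single batch for training. One simple way to do this is to repeat the last point until the target length is reached. The sketch below shows only that idea; `padSequence` is a hypothetical name and is not the PR's padCoordinates, whose internals are not shown in this thread.

```javascript
// Hypothetical helper illustrating padding; not the PR's padCoordinates.
// Extends a sequence of {x, y} points to targetLen by repeating the last point.
// Sequences that are already long enough are returned as an unchanged copy.
function padSequence(seq, targetLen) {
  if (seq.length === 0) {
    throw new Error("cannot pad an empty sequence");
  }
  const out = seq.slice();
  const last = seq[seq.length - 1];
  while (out.length < targetLen) {
    out.push({ ...last }); // copy so padded entries are independent objects
  }
  return out;
}
```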

examples/timeSeries-train-quickdraw/sketch.js (outdated)
2. Sequence of arrays (array of array, order matters)
[[],[],[],[]]
3. Sequence of values (inputlabels should be provided by user)
[[,,,,,]] e.g. shape = {inputLabels: ['x','y']} will become [{x: , y: },{x: , y: },{x: , y: },{x: , y: }]
Member

Unsure about the third format - how is this different from the first? Is it merely wrapped in another array? (Maybe this one is not needed?)

Author

Hi professor, I included this format mainly to account for as many formats as possible. In this case the outer [] is not seen by the user; from their perspective they simply push all their values, regardless of sequence, into one array. (I think it's the simplest, as the user doesn't have to "group" sequences together.) Anyway, tell me what you think and we can discuss whether we should scrap it or not.

Member

What are examples for the different cases? (Maybe it's easiest to look at how this would function from the user side.)

Author

Hi Gottfried, I decided that this format is not actually useful, as the batch size would need to be specified as well, which leads to some nasty complications when the user input is not right. That said, I will probably just remove this and stick with the first two.
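
To make the two remaining formats concrete from the user side, here is a small sketch of how a sequence of arrays (format 2) could be turned into a sequence of objects, which the quoted notes describe as the final internal format. The function name `toSequenceOfObjects` is hypothetical, and the exact shape of format 1 is assumed from those notes rather than taken from the PR's code.

```javascript
// Hypothetical normalizer; not taken from the PR.
// Converts [[1, 2], [3, 4]] with inputLabels ['x', 'y']
// into [{x: 1, y: 2}, {x: 3, y: 4}]. Steps that are already objects pass through.
function toSequenceOfObjects(sequence, inputLabels) {
  return sequence.map((step) => {
    if (!Array.isArray(step)) return step; // assumed to already be an object
    if (step.length !== inputLabels.length) {
      throw new Error(
        `expected ${inputLabels.length} values per step, got ${step.length}`
      );
    }
    const obj = {};
    inputLabels.forEach((label, i) => {
      obj[label] = step[i];
    });
    return obj;
  });
}
```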


- at the end of the adding data, the data is formatted to a sequence of objects similar to 1 of xinputs

- new parameter dataModality, either spatial or sequential, spatial uses cnn1d and sequential uses lstm
Member

Dan mentioned this already in our meeting: dataModality is a little bit confusing, since both cases are "sequential". We're just using a slightly different layer stack for better performance with 2D data, if I understand correctly?

I wonder whether, instead of this parameter, there could be a heuristic which, based on the number of inputs and outputs (or the naming of the labels?), picks cnn1d over the default lstm. If we go down this route, the library should print an informative message to the console when the heuristic kicks in, with some information on how to force the other mode. (The NN also had a special task, imageClassification. Maybe we could have e.g. classification, classification_cnn1d, classification_lstm under the hood?)
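
One possible shape for such a heuristic is sketched below, purely as an illustration of the suggestion. The function name `pickLayerStack`, the set of coordinate-like labels, and the way the choice is overridden are all assumptions made here and do not come from ml5.js or this PR.

```javascript
// Hypothetical heuristic; names and rules are assumptions for illustration.
const COORDINATE_LABELS = new Set(["x", "y", "z"]);

// Returns "cnn1d" when every input label looks like a coordinate,
// otherwise "lstm". Passing forced = "cnn1d" or "lstm" skips the heuristic.
function pickLayerStack(inputLabels, forced) {
  if (forced === "cnn1d" || forced === "lstm") return forced;

  const looksSpatial =
    inputLabels.length > 0 &&
    inputLabels.every((label) =>
      COORDINATE_LABELS.has(String(label).toLowerCase())
    );

  if (looksSpatial) {
    // Tell the user what happened and how to opt out.
    console.log(
      'Input labels look like coordinates, so the "cnn1d" layer stack was ' +
        'selected. Pass "lstm" as the second argument to force the default.'
    );
    return "cnn1d";
  }
  return "lstm";
}
```

A label-based rule like this is easy to explain in the console message, but it would miss coordinate data stored under other names, so an explicit override remains necessary.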

Author

Hi professor, I agree with this comment. I think dataModality is a little bit confusing, especially when the inputs are strings! As much as I want the data to be categorized automatically, I currently cannot see a practical way to detect this, since other data such as weather also uses numbers exclusively. So instead, I changed dataModality into spatialData, which accepts a boolean. By default, without setting it, the model uses lstm, which works for all data sequences, so that is not a problem. If users want more accuracy with less data for coordinate inputs, they can just set this to true. I think this is the most pragmatic way for now.

Member

I still think that having different tasks might be clearer and more educational than a flag (where we don't quite know what it's doing without looking at the source code).

Author (mop9047), Aug 31, 2024

Hi Gottfried, I reverted to the old way without the flag. I implemented a change for this and renamed dataModality to dataMode to make it shorter. I kept spatial but changed sequential to linear. Tell me what you think about these changes.

4 participants