[WIP] TimeSeries Feature #192
…opying methods in new file)
Wow, very excited to try out these examples and learn more about this work in our meeting!
// code to visualize how much ink left
function inkBar(){
datapoints = map(frame_count,0,ink_multiplier*num_seq, 0,num_seq)
The ink bar is a very creative, user-facing solution to the problem! 😊
For a more general solution, I am wondering whether the algorithm Dan mentioned would be a good addition, ideally as a stand-alone helper function that takes an array of data points and returns a modified array?
My less sophisticated approach would be to record only points where the mouse has moved at least a certain minimum distance from the previous point. If the number of points exceeds the desired number, I'd sample every nth point as needed. (It could be nice to see the actual data points that get used, similar to how you had the green points with the hand, btw.)
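The two-step idea above (drop points that barely moved, then thin out what is left) could be sketched roughly as follows. The function names filterByMinDistance and sampleEvenly are hypothetical and not part of the library, and the second step picks evenly spaced indices rather than a strict "every nth", so that the first and last points are always kept.

```javascript
// Hypothetical helpers for illustration only; not part of ml5.js.

// Keep a point only if it lies at least minDist away from the last kept point.
function filterByMinDistance(points, minDist) {
  if (points.length === 0) return [];
  const kept = [points[0]];
  for (let i = 1; i < points.length; i++) {
    const last = kept[kept.length - 1];
    const d = Math.hypot(points[i].x - last.x, points[i].y - last.y);
    if (d >= minDist) kept.push(points[i]);
  }
  return kept;
}

// Reduce to at most `target` points by picking evenly spaced indices,
// always including the first and last point.
function sampleEvenly(points, target) {
  if (target <= 0) return [];
  if (points.length <= target) return points.slice();
  if (target === 1) return [points[0]];
  const step = (points.length - 1) / (target - 1);
  const out = [];
  for (let i = 0; i < target; i++) {
    out.push(points[Math.round(i * step)]);
  }
  return out;
}
```

Both helpers return new arrays, so the recorded points stay available for drawing alongside the reduced set.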
Hi professor, I am almost done with the RDP algorithm, and it works well. The code I referenced originally uses vectors, and I am currently continuing with this idea. Do you think it might cause issues on the user's side (i.e. would they need to know how to use vectors to make this work)?
The vector math is only inside the RDP function? Then I think this shouldn't be all too confusing.
Yes, you're right! I just finished implementing RDP, and it's a timeSeries utility called padCoordinates(seq_to_pad, target_seq_size). I specifically chose this name to introduce the idea of padding and its importance in dealing with time series data. The example is in the mouse Gesture RDP folder.
That said, should I remove the ink one entirely? Thanks!
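For readers unfamiliar with RDP (Ramer-Douglas-Peucker), here is a minimal sketch of the textbook algorithm on plain {x, y} objects, without a vector class. It is not the padCoordinates implementation from this PR; the names rdp, perpendicularDistance, and epsilon are illustrative. Note that plain RDP simplifies to a distance tolerance, so reaching a fixed target length would need an extra step that this sketch does not cover.

```javascript
// Illustrative sketch of Ramer-Douglas-Peucker; not the code from this PR.

// Perpendicular distance from point p to the line through a and b.
function perpendicularDistance(p, a, b) {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const len = Math.hypot(dx, dy);
  // a and b coincide: fall back to the plain point-to-point distance.
  if (len === 0) return Math.hypot(p.x - a.x, p.y - a.y);
  return Math.abs(dy * p.x - dx * p.y + b.x * a.y - b.y * a.x) / len;
}

// Simplify a polyline: drop points closer than epsilon to the
// line between the endpoints of their segment.
function rdp(points, epsilon) {
  if (points.length < 3) return points.slice();
  const first = points[0];
  const last = points[points.length - 1];

  // Find the interior point farthest from the first-last line.
  let maxDist = 0;
  let index = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = perpendicularDistance(points[i], first, last);
    if (d > maxDist) {
      maxDist = d;
      index = i;
    }
  }

  // Far enough away: keep it and simplify both halves recursively.
  if (maxDist > epsilon) {
    const left = rdp(points.slice(0, index + 1), epsilon);
    const right = rdp(points.slice(index), epsilon);
    return left.slice(0, -1).concat(right); // avoid duplicating the split point
  }
  // Everything is within tolerance: the endpoints are enough.
  return [first, last];
}
```

Because the math only needs x and y, a caller can pass ordinary objects and never has to touch vectors directly.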
2. Sequence of arrays (array of array, order matters)
[[],[],[],[]]
3. Sequence of values (inputlabels should be provided by user)
[[,,,,,]] e.g. shape = {inputLabels: ['x','y']} will become [{x: , y: },{x: , y: },{x: , y: },{x: , y: }]
Unsure about the third format - how is this different from the first? Is it merely wrapped in another array? (Maybe this one is not needed?)
Hi professor, I included this format mainly to account for as many formats as possible. In this case the outer [] is not seen by the user; from their perspective, they simply push all their values, regardless of sequence, into a single array. (I think it's the simplest, as the user doesn't have to "group" sequences together.) Anyway, tell me what you think and we can discuss whether we should scrap it or not.
What are examples for the different cases? (Maybe it's easiest to look at how this would function from the user's side.)
Hi Gottfried, I decided that this format is not actually useful, as the batch size needs to be specified as well, which leads to some nasty complications when the user input is not right. That said, I will probably just remove this and stick with the first two.
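To make the two remaining formats concrete from the user's side, here is a rough sketch of how a sequence of arrays could be normalized into a sequence of objects. The helper name toSequenceOfObjects is hypothetical, and the assumption that inputLabels is supplied for the array format is mine, not something confirmed by the PR.

```javascript
// Hypothetical normalizer for illustration; not the code in src/TimeSeries.
// Format 1: sequence of objects, e.g. [{x: 1, y: 2}, {x: 3, y: 4}]
// Format 2: sequence of arrays,  e.g. [[1, 2], [3, 4]] (order matters)
function toSequenceOfObjects(sequence, inputLabels) {
  return sequence.map((step) => {
    // Already an object (format 1): copy it unchanged.
    if (!Array.isArray(step)) return { ...step };

    // Array step (format 2): pair each value with its label by position.
    if (step.length !== inputLabels.length) {
      throw new Error(
        `Expected ${inputLabels.length} values per step, got ${step.length}`
      );
    }
    const obj = {};
    inputLabels.forEach((label, i) => {
      obj[label] = step[i];
    });
    return obj;
  });
}
```

For example, toSequenceOfObjects([[1, 2], [3, 4]], ['x', 'y']) yields the same structure as passing [{x: 1, y: 2}, {x: 3, y: 4}] directly.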
src/TimeSeries/index.js
Outdated
- at the end of the adding data, the data is formatted to a sequence of objects similar to 1 of xinputs
- new parameter dataModality, either spatial or sequential, spatial uses cnn1d and sequential uses lstm
Dan mentioned this already in our meeting: dataModality is a little bit confusing, since both cases are "sequential" - we're just using a slightly different layer stackup for better performance with 2D data, if I understand correctly?
I'd be wondering if, instead of this parameter, there could be a heuristic which - based on the number of inputs and outputs (or the naming of the labels?) - picks the cnn1d over the default lstm one? If we go down this route, then the library should print an informative text to the console when this heuristic kicks in, with some information on how to force the other mode. (The NN also had a special task imageClassification. Maybe we could have e.g. classification, classification_cnn1d, classification_lstm under the hood?)
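As a purely illustrative sketch of what such a label-based heuristic might look like: the function name pickClassificationTask, the coordinate-label pattern, and the wording of the console message are all assumptions, and the task names are only the ones proposed in this comment, not existing options.

```javascript
// Hypothetical heuristic for illustration; nothing here exists in the library.
// Labels such as "x", "y", "z", "x1", "y2" are treated as coordinate data.
const COORDINATE_LABEL = /^[xyz]\d*$/i;

function pickClassificationTask(inputLabels, log = console.log) {
  const looksSpatial =
    inputLabels.length > 0 &&
    inputLabels.every((label) => COORDINATE_LABEL.test(label));

  if (looksSpatial) {
    // Tell the user that the heuristic kicked in and how to override it.
    log(
      'Input labels look like coordinates, using "classification_cnn1d". ' +
        'Choose "classification_lstm" explicitly to force the LSTM stack.'
    );
    return "classification_cnn1d";
  }
  // Default: the LSTM stack, which works for any sequence data.
  return "classification_lstm";
}
```

A name-based check like this misses coordinate data with other labels (for example "lat" and "lon"), which is one reason an explicit task name may be the clearer option.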
Hi professor, I agree with this comment. I think that dataModality is a little bit confusing, especially when the inputs are strings! As much as I want the data to be automatically categorized, I currently cannot see a practical way to detect this, as other data such as weather also uses numbers exclusively. So instead, I changed dataModality into spatialData, which accepts a boolean. By default, without setting it, the model uses lstm, which works for all data sequences, so it's not a problem. If users want more accuracy with less data for coordinate data, they can just set this to true. I think this is the most pragmatic way for now.
Still thinking that having different tasks might be more clear and educational compared to a flag (where we don't quite know what it's doing without looking at the source code).
Hi Gottfried, I reverted to the old way without the flag. I implemented a change for this and renamed dataModality to dataMode to make it shorter. I kept spatial but changed sequential to linear. Tell me what you think about these changes.
This work-in-progress feature makes it possible to train on time series data in your own neural network. Along with the new folder in src called TimeSeries, there are also 4 new examples