[Question]: Does the TextVectorization method work now? #1261

Open
Kyokanyou opened this issue Jul 1, 2024 · 1 comment

Kyokanyou commented Jul 1, 2024

Description

I used this method to create word vectors:
var text_dataset = tf.constant("quz foo tak");
print(text_dataset);
var vectorizer = KerasApi.keras.preprocessing.TextVectorization(max_tokens: 1000, output_sequence_length: 4);
vectorizer.adapt(text_dataset);
print(vectorizer.Apply(tf.constant("quz")));

The result is
tf.Tensor: shape=(), dtype=string, numpy='quz foo tak'
tf.Tensor: shape=(), dtype=string, numpy='quz'

It seems nothing happens: the input string comes back unchanged. There are also no examples or test code in the docs.
Please show me the right way to use this method, thanks.

I checked the decompiled code and found something that may be a bug. In the class CombinerPreprocessingLayer : Layer, the adapt method is declared with the virtual modifier, but this class inherits from the abstract class Layer. So the call vectorizer.adapt(text_dataset) in my code ran the adapt method of the abstract class Layer, and as a result it did nothing. My guess is that there is no connection between the adapt method in the TextVectorization class and the one exposed through ILayer.
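
For readers following the reasoning above, here is a minimal sketch of the general C# behavior being described. It assumes the derived class hides the base method instead of overriding it. All type names (ILayerLike, BaseLayer, HidingLayer, OverridingLayer, DispatchDemo) are hypothetical stand-ins and are not the TensorFlow.NET source, so this only illustrates the suspected mechanism and does not confirm it.

```csharp
using System;

// Hypothetical stand-ins for illustration; NOT the TensorFlow.NET types.
public interface ILayerLike
{
    string Adapt();
}

public abstract class BaseLayer : ILayerLike
{
    // Base implementation that does nothing useful.
    public virtual string Adapt() => "base: no-op";
}

public class HidingLayer : BaseLayer
{
    // Declared virtual without 'override': this HIDES BaseLayer.Adapt.
    // ('new' only silences compiler warning CS0114; the behavior is the same.)
    public new virtual string Adapt() => "derived: vocabulary built";
}

public class OverridingLayer : BaseLayer
{
    // 'override' replaces the base implementation for every caller.
    public override string Adapt() => "derived: vocabulary built";
}

public static class DispatchDemo
{
    // Calls Adapt through the interface, as code holding an ILayer-like reference would.
    public static string CallThroughInterface(ILayerLike layer) => layer.Adapt();
}
```

With these definitions, DispatchDemo.CallThroughInterface(new HidingLayer()) returns "base: no-op", while DispatchDemo.CallThroughInterface(new OverridingLayer()) returns "derived: vocabulary built". If the library's adapt is wired like the hiding case, a call made through the interface would silently reach the empty base method.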

I'm not sure, though. I'm not good at reading this code and it gave me a headache, but I would appreciate any help, even if only as a learning exercise.

Alternatives

No response

xuse2008 commented Jul 8, 2024

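Tensorflow.Keras.Text.Tokenizer tok = keras.preprocessing.text.Tokenizer(10000, filters: "!"); // create a tokenizer instance
tok.fit_on_texts(allTextArr); // build the vocabulary from the whole corpus. allTextArr holds all of the text data, with words separated by spaces, e.g. {"您 好 啊", "how are you"}
//-----------------
var sequencesX = tok.texts_to_sequences(oneTextArr); // encode each sentence as a sequence of integer word indices. The sentences in oneTextArr must also be space-separated words
var x_train = keras.preprocessing.sequence.pad_sequences(sequencesX, maxlen: 100); // pad the sequences into fixed-length vectors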
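
To make the three steps above easier to follow, here is a plain C# sketch of what they do conceptually, with no TensorFlow.NET dependency. The TokenizerSketch class and its method names are hypothetical. The conventions are assumptions modeled on the Python Keras defaults (index 0 reserved for padding, more frequent words get smaller indices, unknown words dropped, padding and truncation at the front); the alphabetical tie-break is a choice made here only to keep the sketch deterministic. The library's actual behavior may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration; not the TensorFlow.NET implementation.
public static class TokenizerSketch
{
    private static string[] Words(string text) =>
        text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Like fit_on_texts: count words and give the most frequent word index 1.
    // Index 0 is left unused so it can serve as the padding value.
    public static Dictionary<string, int> FitOnTexts(IEnumerable<string> texts)
    {
        var counts = new Dictionary<string, int>();
        foreach (var text in texts)
            foreach (var word in Words(text))
                counts[word] = counts.TryGetValue(word, out var c) ? c + 1 : 1;

        return counts
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal) // assumed tie-break
            .Select((kv, i) => new KeyValuePair<string, int>(kv.Key, i + 1))
            .ToDictionary(p => p.Key, p => p.Value);
    }

    // Like texts_to_sequences for one sentence: map known words to indices
    // and skip words that are not in the vocabulary.
    public static int[] TextToSequence(string text, IReadOnlyDictionary<string, int> vocab) =>
        Words(text).Where(vocab.ContainsKey).Select(w => vocab[w]).ToArray();

    // Like pad_sequences: left-pad with zeros, or keep only the last maxLen items.
    public static int[] PadSequence(IReadOnlyList<int> seq, int maxLen)
    {
        var result = new int[maxLen];
        int take = Math.Min(seq.Count, maxLen);
        for (int i = 0; i < take; i++)
            result[maxLen - take + i] = seq[seq.Count - take + i];
        return result;
    }
}
```

For example, fitting on { "quz foo tak", "foo tak", "tak" } gives tak = 1, foo = 2, quz = 3. Encoding "quz foo" then yields [3, 2], and padding that to length 4 yields [0, 0, 3, 2].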
