Merged
4 changes: 2 additions & 2 deletions docs/source/feature/data.md
@@ -167,11 +167,11 @@ sample_weight_fields: 'col_name'
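- --ODPS_CONFIG_FILE_PATH: this environment variable points to the odpscmd configuration file
- Install pyfg in an exclusive resource group in [DataWorks](https://workbench.data.aliyun.com/): open "Resource Groups", then in the "Actions" column of a scheduling resource group click "O&M Assistant" > "Create Command" (choose manual input) > "Run Command"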
```shell
/home/tops/bin/pip3 install http://tzrec.oss-cn-beijing.aliyuncs.com/third_party/pyfg065-0.6.5-cp37-cp37m-linux_x86_64.whl --index-url=https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.cloud.aliyuncs.com
/home/tops/bin/pip3 install http://tzrec.oss-cn-beijing.aliyuncs.com/third_party/pyfg069-0.6.9-cp37-cp37m-linux_x86_64.whl --index-url=https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.cloud.aliyuncs.com
```
- In DataWorks, create a `PyODPS 3` node to run FG, and set the bizdate parameter in the node's scheduling parameters
```
from pyfg065 import offline_pyfg
from pyfg069 import offline_pyfg
offline_pyfg.run(
o,
input_table="YOU_PROJECT.TABLE_NAME",
107 changes: 70 additions & 37 deletions docs/source/feature/feature.md
@@ -312,37 +312,65 @@ feature_configs: {

- **Built-in functions**:

| Function | Num args | Description |
| ----------- | -------- | -------------------------------------- |
| sin | 1 | sine function |
| cos | 1 | cosine function |
| tan | 1 | tangent function |
| asin | 1 | arc sine function |
| acos | 1 | arc cosine function |
| atan | 1 | arc tangent function |
| sinh | 1 | hyperbolic sine function |
| cosh | 1 | hyperbolic cosine function |
| tanh | 1 | hyperbolic tangent function |
| asinh | 1 | inverse hyperbolic sine function |
| acosh | 1 | inverse hyperbolic cosine function |
| atanh | 1 | inverse hyperbolic tangent function |
| log2 | 1 | logarithm to the base 2 |
| log10 | 1 | logarithm to the base 10 |
| log | 1 | logarithm to base e (2.71828...) |
| ln | 1 | logarithm to base e (2.71828...) |
| exp | 1 | e raised to the power of x |
| sqrt | 1 | square root of a value |
| sign | 1 | sign function -1 if x\<0; 1 if x>0 |
| rint | 1 | round to nearest integer |
| abs | 1 | absolute value |
| sigmoid | 1 | sigmoid function |
| l2_norm | 1 | l2 normalize of a vector |
| dot | 2 | dot product of two vectors |
| euclid_dist | 2 | euclidean distance between two vectors |
| min | var. | min of all arguments |
| max | var. | max of all arguments |
| sum | var. | sum of all arguments |
| avg | var. | mean value of all arguments |
| Function | Num args | Description |
| ----------- | -------- | ----------------------------------------------------------------------- |
| sin | 1 | sine function |
| cos | 1 | cosine function |
| tan | 1 | tangent function |
| asin | 1 | arc sine function |
| acos | 1 | arc cosine function |
| atan | 1 | arc tangent function |
| sinh | 1 | hyperbolic sine function |
| cosh | 1 | hyperbolic cosine function |
| tanh | 1 | hyperbolic tangent function |
| asinh | 1 | inverse hyperbolic sine function |
| acosh | 1 | inverse hyperbolic cosine function |
| atanh | 1 | inverse hyperbolic tangent function |
| log2 | 1 | logarithm to the base 2 |
| log10 | 1 | logarithm to the base 10 |
| log | 1 | logarithm to base e (2.71828...) |
| ln | 1 | logarithm to base e (2.71828...) |
| exp | 1 | e raised to the power of x |
| sqrt | 1 | square root of a value |
| sign | 1 | sign function -1 if x\<0; 1 if x>0 |
| abs | 1 | absolute value |
| rint | 1 | round to nearest integer |
| floor | 1 | round down (toward negative infinity) |
| ceil | 1 | round up (toward positive infinity) |
| trunc | 1 | truncate toward zero (drops the fractional part) |
| round | 1 | round to nearest; ties are always rounded away from zero (round half away from zero) |
| roundp | 2 | round to a given precision, e.g. roundp(3.14159,2)=3.14 |
| sigmoid | 1 | sigmoid function |
| sphere_dist | 4 | spherical distance between two GPS points, args(lng1, lat1, lng2, lat2) |
| haversine | 4 | haversine distance between two GPS points, args(lng1, lat1, lng2, lat2) |
| min | var. | min of all arguments |
| max | var. | max of all arguments |
| sum | var. | sum of all arguments |
| avg | var. | mean value of all arguments |

Note: the built-in functions above support batch computation and broadcasting.
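
The rounding and distance entries in the table can be illustrated with a small Python sketch. This is not pyfg's implementation: the function names `round_half_away` and `haversine_km`, the use of degrees for input, kilometres for output, and the Earth radius of 6371 km are all assumptions made for illustration. The sketch only shows what "round half away from zero", `roundp`, and the haversine formula mean.

```python
import math


def round_half_away(x: float) -> float:
    """Round to the nearest integer; ties go away from zero (2.5 -> 3, -2.5 -> -3)."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


def roundp(x: float, precision: int) -> float:
    """Round x to `precision` decimal places, ties away from zero."""
    scale = 10.0 ** precision
    return round_half_away(x * scale) / scale


def haversine_km(lng1: float, lat1: float, lng2: float, lat2: float,
                 radius_km: float = 6371.0) -> float:
    """Great-circle distance via the haversine formula.

    Assumptions (not confirmed by the source): inputs are in degrees,
    the result is in kilometres, and the Earth is a sphere of radius_km.
    The argument order (lng1, lat1, lng2, lat2) follows the table.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))
```

Python's own `round` uses round-half-to-even, so `round(2.5)` gives `2`, whereas the `round` described in the table would give `3`.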

- **Built-in vector functions**:

| Function | Num args | Description |
| ------------ | -------- | ----------------------------------------------------- |
| len | 1 | the length of a vector |
| l2_norm | 1 | l2 normalize of a vector |
| squared_norm | 1 | squared normalize of a vector |
| dot | 2 | dot product of two vectors |
| euclid_dist | 2 | euclidean distance between two vectors |
| std_dev | 1 | standard deviation of a vector, divides by n |
| pop_std_dev | 1 | population standard deviation of a vector, divides by n-1 |
| variance | 1 | sample variance of a vector, divides by n |
| pop_variance | 1 | population variance of a vector, divides by n-1 |
| reduce_min | 1 | reduce min of a vector |
| reduce_max | 1 | reduce max of a vector |
| reduce_sum | 1 | reduce sum of a vector |
| reduce_mean | 1 | reduce mean of a vector |
| reduce_prod | 1 | reduce product of a vector |

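Note: when an expression contains any of the built-in vector functions above, all variables other than the vector-function arguments must be single-valued (scalar).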
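
As a rough illustration of a few of the vector functions listed above, here is a plain-Python sketch. It does not reproduce pyfg's implementation. The `ddof` parameter is a hypothetical device that makes the divisor explicit: `ddof=0` divides by n and `ddof=1` divides by n-1, which are the two divisors the table distinguishes.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def euclid_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def variance(v: Sequence[float], ddof: int = 0) -> float:
    """Variance with divisor len(v) - ddof (ddof=0 -> n, ddof=1 -> n-1)."""
    mean = sum(v) / len(v)
    return sum((x - mean) ** 2 for x in v) / (len(v) - ddof)


def std_dev(v: Sequence[float], ddof: int = 0) -> float:
    """Standard deviation: square root of the variance with the same divisor."""
    return math.sqrt(variance(v, ddof))
```

For `[1, 2, 3, 4]` the divisor-n variance is 1.25 and the divisor-(n-1) variance is 5/3, so the choice between the plain and `pop_` variants matters for short vectors.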

- **Built-in binary operators**:

@@ -402,12 +430,17 @@ feature_configs: {

- **method**: how the overlap is computed; one of query_common_ratio | title_common_ratio | is_contain | is_equal

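| Method | Description | Notes |
| ------------------ | --------------------------------------------- | ------------------------------ |
| query_common_ratio | Ratio of terms shared by query and title to the number of terms in query | Value in [0,1] |
| title_common_ratio | Ratio of terms shared by query and title to the number of terms in title | Value in [0,1] |
| is_contain | Whether query is entirely contained in title, preserving order | 0 means not contained, 1 means contained |
| is_equal | Whether query is exactly equal to title | 0 means not identical, 1 means identical |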
| Method | Description | Notes |
| ------------------- | ----------------------------------------------------------- | ------------------------------------------------------------- |
| query_common_ratio | Ratio of terms shared by query and title to the number of terms in query | Value in [0,1] |
| title_common_ratio | Ratio of terms shared by query and title to the number of terms in title | Value in [0,1] |
| is_contain | Whether query is entirely contained in title, preserving order | 0 means not contained, 1 means contained |
| is_equal | Whether query is exactly equal to title | 0 means not identical, 1 means identical |
| index_of | Position at which query, taken as a whole, first occurs in title | Returns -1.0 if it does not occur |
| proximity_min_cover | Proximity of the query terms within title | Value in [0, length(title)]; 0 means some term cannot be matched |
| proximity_min_dist | Proximity of the query terms within title (minimum pairwise distance) | Value in [0, length(title)+1]; length(title)+1 means no term matched |
| proximity_max_dist | Proximity of the query terms within title (maximum pairwise distance) | Value in [0, length(title)+1]; length(title)+1 means no term matched |
| proximity_avg_dist | Proximity of the query terms within title (average pairwise distance) | Value in [0, length(title)+1]; length(title)+1 means no term matched |

- Other settings are the same as for RawFeature
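
To make some of the overlap methods in the table concrete, the following Python sketch shows one plausible reading of them. It is not the pyfg implementation and rests on several assumptions: query and title are already tokenized into lists of terms, each term occurrence is counted once per position, empty inputs yield 0.0, and `index_of` returns a 0-based term position.

```python
from typing import List


def query_common_ratio(query: List[str], title: List[str]) -> float:
    """Share of query terms that also appear in title, in [0, 1]."""
    if not query:
        return 0.0
    title_terms = set(title)
    return sum(1 for term in query if term in title_terms) / len(query)


def title_common_ratio(query: List[str], title: List[str]) -> float:
    """Share of title terms that also appear in query, in [0, 1]."""
    if not title:
        return 0.0
    query_terms = set(query)
    return sum(1 for term in title if term in query_terms) / len(title)


def is_equal(query: List[str], title: List[str]) -> float:
    """1.0 if query and title are identical term sequences, else 0.0."""
    return 1.0 if query == title else 0.0


def index_of(query: List[str], title: List[str]) -> float:
    """First position where query occurs in title as a contiguous run, else -1.0."""
    n, m = len(query), len(title)
    if n == 0 or n > m:
        return -1.0
    for start in range(m - n + 1):
        if title[start:start + n] == query:
            return float(start)
    return -1.0
```

With `query = ["red", "shoes"]` and `title = ["nike", "red", "shoes", "sale"]`, every query term is matched, half of the title terms are matched, and the query first occurs at term position 1 under these assumptions.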

2 changes: 1 addition & 1 deletion docs/source/usage/serving.md
@@ -43,7 +43,7 @@ cat << EOF > tzrec_rank.json
}
}
],
"processor":"easyrec-torch-1.0"
"processor":"easyrec-torch-1.4"
}
EOF

4 changes: 2 additions & 2 deletions requirements/runtime.txt
@@ -8,8 +8,8 @@ grpcio-tools<1.63.0
numpy<2
pandas
psutil
pyfg @ https://tzrec.oss-accelerate.aliyuncs.com/third_party/pyfg-0.6.5-cp311-cp311-linux_x86_64.whl ; python_version=="3.11"
pyfg @ https://tzrec.oss-accelerate.aliyuncs.com/third_party/pyfg-0.6.5-cp310-cp310-linux_x86_64.whl ; python_version=="3.10"
pyfg @ https://tzrec.oss-cn-beijing.aliyuncs.com/third_party/pyfg-0.6.9-cp311-cp311-linux_x86_64.whl ; python_version=="3.11"
pyfg @ https://tzrec.oss-cn-beijing.aliyuncs.com/third_party/pyfg-0.6.9-cp310-cp310-linux_x86_64.whl ; python_version=="3.10"
pyodps>=0.12.2.1
scikit-learn
tensorboard
4 changes: 4 additions & 0 deletions tzrec/features/expr_feature.py
@@ -116,6 +116,10 @@ def fg_json(self) -> List[Dict[str, Any]]:
"variables": list(self.config.variables),
"value_type": "float",
}
if self.config.separator != "\x1d":
fg_cfg["separator"] = self.config.separator
if self.config.HasField("fill_missing"):
fg_cfg["fill_missing"] = self.config.fill_missing
if len(self.config.boundaries) > 0:
fg_cfg["boundaries"] = list(self.config.boundaries)
return [fg_cfg]
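
The pattern in this hunk, emitting `separator` only when it differs from the default and `fill_missing` only when it was explicitly set, can be sketched outside of protobuf as follows. `ExprConfigSketch` and `build_fg_cfg` are hypothetical stand-ins for the proto message and `fg_json`; `None` plays the role of "field not set" that `HasField` checks in the real code, and the `feature_name` key is included only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

DEFAULT_SEPARATOR = "\x1d"  # default multi-value separator, as in the proto


@dataclass
class ExprConfigSketch:
    """Hypothetical stand-in for the ExprFeature proto message."""
    feature_name: str
    separator: str = DEFAULT_SEPARATOR
    fill_missing: Optional[float] = None  # None models an unset optional field


def build_fg_cfg(cfg: ExprConfigSketch) -> Dict[str, Any]:
    """Build an FG config dict, adding optional keys only when needed."""
    fg_cfg: Dict[str, Any] = {
        "feature_name": cfg.feature_name,  # illustrative key
        "value_type": "float",
    }
    # Emit the separator only when it is not the default.
    if cfg.separator != DEFAULT_SEPARATOR:
        fg_cfg["separator"] = cfg.separator
    # Emit fill_missing whenever it was set, even if set to 0.0.
    if cfg.fill_missing is not None:
        fg_cfg["fill_missing"] = cfg.fill_missing
    return fg_cfg
```

Testing for presence rather than truthiness matters here: a `fill_missing` of `0.0` is a legitimate explicit value and should still be written to the config.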
17 changes: 15 additions & 2 deletions tzrec/features/expr_feature_test.py
@@ -190,15 +190,25 @@ def test_expr_feature_with_boundaries(
[[0.2, 0.3], [0.1, 0.2], [0.3, 0.4], []],
[[0.2, 0.2], [0.2, 0.2], [0.2, 0.2], [0.2, 0.2]],
"0.1",
[2, 1, 2, 1],
[2, 1, 2, 2],
[1, 1, 1, 1],
None,
],
[
["0.2\x1d0.3", "", "0.3\x1d0.4"],
["0.2\x1d0.2", "", "0.3\x1d0.4"],
["0.2\x1d0.2", "0.2\x1d0.2", "0.2\x1d0.2"],
"",
[1, 2],
[1, 0, 1],
None,
],
[
["0.2,0.2", "", "0.3,0.4"],
["0.2,0.2", "0.2,0.2", "0.2,0.2"],
"",
[1, 2],
[1, 0, 1],
",",
],
]
)
@@ -209,6 +219,7 @@ def test_expr_feature_dot(
default_value,
expected_values,
expected_lengths,
sep,
):
expr_feat_cfg = feature_pb2.FeatureConfig(
expr_feature=feature_pb2.ExprFeature(
@@ -220,6 +231,8 @@ def test_expr_feature_dot(
default_value=default_value,
)
)
if sep is not None:
expr_feat_cfg.expr_feature.separator = sep
expr_feat = expr_feature_lib.ExprFeature(
expr_feat_cfg, fg_mode=FgMode.FG_NORMAL
)
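
The new test case feeds the same vectors with `","` instead of the default `"\x1d"` separator. A minimal sketch of what a configurable multi-value separator means for input parsing is shown below; `parse_vector` is a hypothetical helper written for illustration, not a tzrec or pyfg function, and treating an empty string as an empty vector is an assumption.

```python
from typing import List


def parse_vector(raw: str, separator: str = "\x1d") -> List[float]:
    """Split a multi-value string into floats using the given separator.

    An empty input string is treated as an empty vector (assumption).
    """
    if raw == "":
        return []
    return [float(token) for token in raw.split(separator)]
```

Under this reading, `"0.3\x1d0.4"` parsed with the default separator and `"0.3,0.4"` parsed with `","` produce the same vector, which is why the two parameterized cases can share the same expected results.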
6 changes: 3 additions & 3 deletions tzrec/features/sequence_feature_test.py
@@ -765,7 +765,7 @@ def test_simple_sequence_expr_feature_dense(self):
}
parsed_feat = seq_feat.parse(input_data)
self.assertEqual(parsed_feat.name, "custom_feat")
np.testing.assert_allclose(parsed_feat.values, np.array([[8], [7], [-4], [0]]))
np.testing.assert_allclose(parsed_feat.values, np.array([[8], [7], [0], [0]]))
self.assertTrue(np.allclose(parsed_feat.seq_lengths, np.array([2, 1, 1])))

def test_sequence_expr_feature_sparse(self):
@@ -811,8 +811,8 @@ def test_sequence_expr_feature_sparse(self):
}
parsed_feat = seq_feat.parse(input_data)
self.assertEqual(parsed_feat.name, "click_50_seq__custom_feat")
np.testing.assert_allclose(parsed_feat.values, np.array([1, 3, 2, 3]))
self.assertTrue(np.allclose(parsed_feat.seq_lengths, np.array([2, 1, 1])))
np.testing.assert_allclose(parsed_feat.values, np.array([1, 3, 2]))
self.assertTrue(np.allclose(parsed_feat.seq_lengths, np.array([2, 1, 0])))


if __name__ == "__main__":
4 changes: 4 additions & 0 deletions tzrec/protos/feature.proto
@@ -380,6 +380,10 @@ message ExprFeature {
optional uint32 embedding_dim = 5;
// boundaries for bucktize numeric expr value
repeated float boundaries = 6;
// fg multi-value separator
optional string separator = 7 [default = "\x1d"];
// fill value used when vector lengths mismatch; default is NaN.
optional float fill_missing = 8;
// embedding pooling type, available is {sum | mean}
optional string pooling = 10 [default = "sum"];
// fg default value
80 changes: 40 additions & 40 deletions tzrec/tests/configs/multi_tower_din_fg_mock.config
@@ -348,24 +348,24 @@ feature_configs {
value_dim: 4
}
}
features {
custom_feature {
feature_name: "custom_2"
operator_name: "SeqExpr"
operator_lib_file: "pyfg/lib/libseq_expr.so"
expression: ["user:cur_time", "item:clk_time_seq"]
operator_params: {
fields {
key: "formula"
value {
string_value: "click_50_seq__cur_time-click_50_seq__clk_time_seq"
}
}
}
boundaries: [1, 2, 3, 4]
embedding_dim: 16
}
}
#features {
# custom_feature {
# feature_name: "custom_2"
# operator_name: "SeqExpr"
# operator_lib_file: "pyfg/lib/libseq_expr.so"
# expression: ["user:cur_time", "item:clk_time_seq"]
# operator_params: {
# fields {
# key: "formula"
# value {
# string_value: "click_50_seq__cur_time-click_50_seq__clk_time_seq"
# }
# }
# }
# boundaries: [1, 2, 3, 4]
# embedding_dim: 16
# }
#}
}
}
feature_configs {
@@ -397,26 +397,26 @@ feature_configs {
expression: "item:buy_50_raw_5_seq"
}
}
feature_configs {
sequence_custom_feature {
feature_name: "buy_50_custom_3_seq"
sequence_length: 50
sequence_delim: "|"
operator_name: "SeqExpr"
operator_lib_file: "pyfg/lib/libseq_expr.so"
expression: ["item:buy_50_ilng", "item:buy_50_ilat", "user:ulng", "user:ulat"],
operator_params: {
fields {
key: "formula"
value {
string_value: "spherical_distance"
}
}
}
boundaries: [1, 10, 100, 1000]
embedding_dim: 16
}
}
#feature_configs {
# sequence_custom_feature {
# feature_name: "buy_50_custom_3_seq"
# sequence_length: 50
# sequence_delim: "|"
# operator_name: "SeqExpr"
# operator_lib_file: "pyfg/lib/libseq_expr.so"
# expression: ["item:buy_50_ilng", "item:buy_50_ilat", "user:ulng", "user:ulat"],
# operator_params: {
# fields {
# key: "formula"
# value {
# string_value: "spherical_distance"
# }
# }
# }
# boundaries: [1, 10, 100, 1000]
# embedding_dim: 16
# }
#}
model_config {
feature_groups {
group_name: "deep"
@@ -463,7 +463,7 @@ model_config {
feature_names: "click_50_seq__raw_1"
feature_names: "click_50_seq__raw_2"
feature_names: "click_50_seq__raw_3"
feature_names: "click_50_seq__custom_2"
#feature_names: "click_50_seq__custom_2"
group_type: SEQUENCE
}
feature_groups {
@@ -474,7 +474,7 @@ model_config {
feature_names: "buy_50_user_id_seq"
feature_names: "buy_50_id_6_seq"
feature_names: "buy_50_raw_5_seq"
feature_names: "buy_50_custom_3_seq"
#feature_names: "buy_50_custom_3_seq"
group_type: SEQUENCE
}
multi_tower_din {
2 changes: 1 addition & 1 deletion tzrec/version.py
@@ -9,4 +9,4 @@
# See the License for the specific language governing permissions and
# limitations under the License.

__version__ = "0.8.3"
__version__ = "0.8.4"