by YoungTimes

Converting Variable-Length Features to a Tensor

ISSUE

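While building neural-network inputs from a Dataset, I ran into features whose rows hold different numbers of elements along the same dimension, for example: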

$$
\text{features} = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
$$

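A regular (dense) TensorFlow Tensor must be rectangular, so variable-length feature data like this cannot be converted to one directly (TensorFlow represents ragged data separately, as tf.RaggedTensor). Trying to convert the variable-length list into a Tensor with: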

tf.convert_to_tensor(features)

raises the following error:

ValueError: Can't convert non-rectangular Python sequence to Tensor.
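The error occurs because every row of a dense tensor must have the same length. As a minimal sketch, a hypothetical helper `is_rectangular` (not part of TensorFlow; it assumes a 2-D nested list) can check this condition before calling `tf.convert_to_tensor`:

```python
def is_rectangular(rows):
    """Return True if every row of a 2-D nested list has the same length.

    Hypothetical helper for illustration; an empty list counts as rectangular.
    """
    # Collect the distinct row lengths; a dense tensor allows at most one
    return len({len(row) for row in rows}) <= 1


features = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
print([len(row) for row in features])    # [3, 2, 4]
print(is_rectangular(features))          # False
print(is_rectangular([[1, 2], [3, 4]]))  # True
```

Here the row lengths 3, 2 and 4 differ, so the list cannot become a single 3 x N tensor without padding.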

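The solution is to pad every row to the same length. Which padding to use depends on the purpose; the two most common choices are padding with zeros and repeating the last element.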
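Both strategies fit into one function. The sketch below uses a hypothetical helper `pad_rows` (not a NumPy or TensorFlow API), assumes a 2-D list of rows, and returns plain Python lists that can then be passed to `np.array` or `tf.convert_to_tensor`:

```python
def pad_rows(rows, mode="zero", pad_value=0):
    """Pad every row of a ragged 2-D list to the length of the longest row.

    Hypothetical helper for illustration.
    mode="zero": fill with pad_value (0 by default)
    mode="last": repeat each row's last element
    """
    max_length = max((len(row) for row in rows), default=0)
    padded = []
    for row in rows:
        missing = max_length - len(row)
        if mode == "zero":
            fill = pad_value
        elif mode == "last":
            if not row:
                raise ValueError("cannot repeat the last element of an empty row")
            fill = row[-1]
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        # Build a new list so the caller's rows are not modified
        padded.append(list(row) + [fill] * missing)
    return padded


features = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
print(pad_rows(features, mode="zero"))  # [[1, 2, 3, 0], [4, 5, 0, 0], [1, 4, 6, 7]]
print(pad_rows(features, mode="last"))  # [[1, 2, 3, 3], [4, 5, 5, 5], [1, 4, 6, 7]]
```

Returning lists keeps the helper dependency-free; an empty row is rejected in "last" mode because it has no last element to repeat.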

Padding With Zeros

import numpy as np
import tensorflow as tf

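# Keep the ragged data as a plain Python list: np.array() on ragged rows
# raises a ValueError in recent NumPy versions
x = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
max_length = max(len(row) for row in x)
# Append (max_length - len(row)) zeros to each row
x_padded = np.array([row + [0] * (max_length - len(row)) for row in x])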

print(x_padded)

x_tensor = tf.convert_to_tensor(x_padded)

print(x_tensor)

Output:

[[1 2 3 0]
 [4 5 0 0]
 [1 4 6 7]]

tf.Tensor(
[[1 2 3 0]
 [4 5 0 0]
 [1 4 6 7]], shape=(3, 4), dtype=int64)
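Zero padding has one drawback: a padded 0 looks the same as a real 0 in the data. A vectorized NumPy sketch can also return a boolean mask marking the real elements. The helper name `pad_with_mask` is hypothetical, and the sketch assumes integer features (it fixes `dtype=np.int64`):

```python
import numpy as np


def pad_with_mask(rows, pad_value=0):
    """Pad a ragged 2-D list with pad_value and return (padded, mask).

    Hypothetical helper; assumes integer features and at least one row.
    """
    lengths = np.array([len(row) for row in rows])
    max_length = lengths.max()
    # mask[i, j] is True where column j holds a real element of row i
    mask = np.arange(max_length) < lengths[:, None]
    padded = np.full((len(rows), max_length), pad_value, dtype=np.int64)
    # Boolean assignment fills the True positions in row-major order,
    # which matches the order of the concatenated rows
    padded[mask] = np.concatenate(rows)
    return padded, mask


features = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
padded, mask = pad_with_mask(features)
print(padded.tolist())  # [[1, 2, 3, 0], [4, 5, 0, 0], [1, 4, 6, 7]]
print(mask.sum(axis=1).tolist())  # [3, 2, 4], the original row lengths
```

`padded` can be passed to `tf.convert_to_tensor` exactly as before, and `mask` can be kept alongside it so later steps can ignore the padding positions.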

Repeating the Last Element

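import numpy as np
import tensorflow as tf

# Keep the ragged data as a plain Python list: np.array() on ragged rows
# raises a ValueError in recent NumPy versions
x = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
max_length = max(len(row) for row in x)
# Append copies of each row's last element until the row reaches max_length
x_padded = np.array([row + [row[-1]] * (max_length - len(row)) for row in x])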

print(x_padded)

x_tensor = tf.convert_to_tensor(x_padded)

print(x_tensor)

Output:

[[1 2 3 3]
 [4 5 5 5]
 [1 4 6 7]]

tf.Tensor(
[[1 2 3 3]
 [4 5 5 5]
 [1 4 6 7]], shape=(3, 4), dtype=int64)
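The list comprehension builds each padded row in a Python loop. As an alternative, here is a vectorized NumPy sketch of the same repeat-last padding; the helper name `pad_repeat_last` is hypothetical, and it assumes every row has at least one element. Each output column j of row i reads element min(j, len_i - 1), so positions past the end of a row keep pointing at its last element:

```python
import numpy as np


def pad_repeat_last(rows):
    """Pad a ragged 2-D list by repeating each row's last element.

    Hypothetical helper; every row must be non-empty.
    """
    lengths = np.array([len(row) for row in rows])
    if (lengths == 0).any():
        raise ValueError("every row needs at least one element to repeat")
    flat = np.concatenate(rows)
    # Start offset of each row inside the flattened array
    offsets = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    max_length = lengths.max()
    # Clamp column indices to each row's last valid position
    cols = np.minimum(np.arange(max_length), lengths[:, None] - 1)
    return flat[offsets[:, None] + cols]


features = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
print(pad_repeat_last(features).tolist())  # [[1, 2, 3, 3], [4, 5, 5, 5], [1, 4, 6, 7]]
```

For this input the offsets are [0, 3, 5] and the gathered flat indices are [[0, 1, 2, 2], [3, 4, 4, 4], [5, 6, 7, 8]], which reproduces the output shown above.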

