Introduction

TensorFlow is a library for numerical computation using data flow graphs. Each graph is made up of nodes and edges. TensorFlow has the following characteristics:

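Flexibility

TensorFlow is not strictly a neural network library: any computation that can be described as a data flow graph can be implemented with it. At the same time, higher-level functionality can be written in plain Python.

Portability

TensorFlow runs on any device with a CPU or a GPU, without complicated environment configuration.

Efficiency

TensorFlow can improve the training efficiency of neural networks, and its unified code base makes it easy to share work with peers.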

Supported configuration

  • Python
  • C++
  • CUDA (for GPU environments)
  • cuDNN (for GPU environments)

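Structure of TensorFlow

Data flow graph (Graph)

A data flow graph describes a numerical computation as a directed graph. Nodes usually represent mathematical operations, but they can also represent operations such as data input, output, reading and writing. Edges represent the relationships between nodes and carry data between them. The basic unit of data in TensorFlow is the Tensor, and tensors flowing along the edges are what give TensorFlow the "flow" in its name.

Nodes can be assigned to multiple devices and executed asynchronously and in parallel. Because the graph is directed, a node can execute only after the nodes it depends on have finished.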
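
The ordering rule for a directed graph can be illustrated with a small sketch in plain Python. It is not how TensorFlow schedules work internally and it ignores devices and parallelism; the function `execution_order` and the node names are hypothetical and only show that a node becomes ready once all of its inputs have finished.

```python
from collections import deque

def execution_order(inputs_of):
    """Return node names in an order where every node follows its inputs.

    inputs_of maps each node name to the list of node names it consumes.
    """
    # Count how many unfinished inputs each node is still waiting for.
    pending = {node: len(deps) for node, deps in inputs_of.items()}
    # Reverse map: which nodes consume the output of a given node.
    consumers = {node: [] for node in inputs_of}
    for node, deps in inputs_of.items():
        for dep in deps:
            consumers[dep].append(node)
    # Nodes without inputs (such as constants) are ready immediately.
    ready = deque(node for node, count in pending.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for consumer in consumers[node]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)
    if len(order) != len(inputs_of):
        raise ValueError("graph contains a cycle")
    return order

# Hypothetical three-node graph: two constants feeding one matmul.
graph = {
    "matrix1": [],
    "matrix2": [],
    "product": ["matrix1", "matrix2"],
}
order = execution_order(graph)
print(order)  # ['matrix1', 'matrix2', 'product']
```

The two constants have no inputs, so they could run in either order or at the same time; only `product` has to wait.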

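Nodes (Ops)

A node in TensorFlow is also called an Operation (op). An op takes zero or more Tensors as input, performs a specific computation, and produces new Tensors. A Tensor is a multi-dimensional array, for example with shape [batch, height, width, channels], and its elements are usually floating-point numbers.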
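
To make the idea of a shape such as [batch, height, width, channels] concrete, here is a minimal sketch that treats a nested Python list as a multi-dimensional array and reads off its shape. The helper `shape_of` is hypothetical and assumes the nesting is rectangular; it is not part of TensorFlow.

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list as a list of ints."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return shape

# Hypothetical batch: 2 images, each 3 pixels high, 4 pixels wide, 1 channel.
batch = [[[[0.0] for _ in range(4)] for _ in range(3)] for _ in range(2)]
print(shape_of(batch))            # [2, 3, 4, 1]
print(shape_of([[3.0, 3.0]]))     # [1, 2]
print(shape_of(3.0))              # []
```

The length of the returned list is the number of dimensions, so a plain number has an empty shape.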

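Edges (Edge)

The channels between TensorFlow nodes are called edges and can be thought of as the flow: they carry Tensors from one node's computation to the next. Because the graph is directed, each edge has a fixed direction, so the relationships between nodes and edges have to be laid out properly when building a TensorFlow computation.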

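Common steps when using TensorFlow

  • Represent the computation as a graph
  • Execute the graph in a Session
  • Represent data as Tensors
  • Maintain model state with Variables
  • Use feeds and fetches to supply data and retrieve results

At run time, TensorFlow uses a Session to execute the nodes of the graph. The Session places the ops on a CPU or GPU, executes them, and returns the results (Tensors); in Python these are returned as numpy ndarray objects.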

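Building the data flow graph

A TensorFlow program is usually split into two phases: a construction phase and an execution phase. In the construction phase we define the structure and function of a neural network; in the execution phase we use a Session to run the network we built, over and over.

Like constants in most programming languages, a TensorFlow Constant is an op that takes no input, but its output can be used as input to other ops.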

import tensorflow as tf
# Two constant ops: a 1 x 2 matrix and a 2 x 1 matrix
matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.], [2.]])
# A matmul op that takes the two constants as input
product = tf.matmul(matrix1, matrix2)

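At this point the default Graph contains three nodes: two Constant ops and one matmul op. To actually obtain the result of the matrix multiplication, we have to launch the graph in a Session.

Executing the data flow graph in a Session

The construction phase is done; the execution phase is what actually produces the result we want.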

sess = tf.Session()
# Running the matmul op also runs the two constant ops it depends on
result = sess.run(product)
print(result)  # [[12.]]
sess.close()

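A Session created explicitly has to be closed explicitly once we are done with it. Alternatively, a with block closes the Session automatically when the block ends: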

with tf.Session() as sess:
    result = sess.run(product)
    print(result)

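TensorFlow nodes can be assigned to different devices. When GPUs are available, ops run on the first GPU (id = 0) by default; to run them on another GPU, the device has to be specified manually: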

with tf.Session() as sess:
    # "/cpu:0" can also be used here
    with tf.device("/gpu:1"):
        matrix1 = tf.constant([[3., 3.]])
        matrix2 = tf.constant([[2.], [2.]])
        product = tf.matmul(matrix1, matrix2)
    print(sess.run(product))

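In interactive environments (such as IPython or a command prompt), we often want to evaluate parts of the graph step by step instead of building the whole graph first and then running it. For this we can use InteractiveSession together with eval() on Tensors and run() on ops: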

import tensorflow as tf
sess = tf.InteractiveSession()
a = tf.Variable([1.0, 2.0])
# Variables must be initialized before use; run() uses the interactive session
a.initializer.run()
b = tf.constant([3.0, 3.0])
sub = tf.subtract(a, b)
print(sub.eval())  # [-2. -1.]
sess.close()

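Tensors: the data structure used in computation

TensorFlow uses its own data structure, the Tensor. A Tensor is a representation of a multi-dimensional data set and is used to pass information between the nodes of the data flow graph. Each Tensor has a fixed (static) type and shape.

Variables

A Variable keeps its own state while the graph is being executed, so it can hold values that change as the model runs: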

import tensorflow as tf
state = tf.Variable(0, name = "counter")
one = tf.constant(1)
new_value = tf.add(state, one)
# Assignment op: writes new_value into state each time it is run
update = tf.assign(state, new_value)
# Op that initializes all variables; it must be run before they are used
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(state))      # 0
    for _ in range(3):
        sess.run(update)
        print(sess.run(state))  # 1, then 2, then 3

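Typically the parameters of a neural network are created as Variables and then updated through the Session during training.

Fetches and Feeds

When we work with a neural network, the graph is usually not closed: it has to take data in and give results out. To fetch the output of an op, we pass it to the Session's run function, which returns its value, and we can then print it: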

import tensorflow as tf
input1 = tf.constant(3.0)
input2 = tf.constant(2.0)
input3 = tf.constant(5.0)
intermed = tf.add(input2, input3)
mul = tf.multiply(input1, intermed)
with tf.Session() as sess:
    # Fetching mul also evaluates intermed, which it depends on
    result = sess.run(mul)
    print(result)  # 21.0
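  • Although computing mul requires the value of intermed, there is no need to call sess.run(intermed) separately when computing result. Because the graph is directed, fetching a node makes TensorFlow trace back through all the nodes it depends on and evaluate them automatically.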
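
The tracing behaviour can be mimicked with a short plain-Python sketch. It is a toy model under simplifying assumptions, not TensorFlow's implementation: the function `fetch`, the dictionary layout, and the extra node `unused` are all hypothetical. It shows that fetching one node evaluates exactly the nodes it depends on and nothing else.

```python
def fetch(name, graph, cache=None, trace=None):
    """Evaluate node `name`, evaluating the nodes it depends on first.

    graph maps a node name to (function, list of input node names).
    trace records the order in which nodes were actually evaluated.
    """
    cache = {} if cache is None else cache
    trace = [] if trace is None else trace
    if name not in cache:
        func, input_names = graph[name]
        args = [fetch(dep, graph, cache, trace)[0] for dep in input_names]
        cache[name] = func(*args)
        trace.append(name)
    return cache[name], trace

# Same structure as the example above, plus one unrelated node.
graph = {
    "input1": (lambda: 3.0, []),
    "input2": (lambda: 2.0, []),
    "input3": (lambda: 5.0, []),
    "intermed": (lambda a, b: a + b, ["input2", "input3"]),
    "mul": (lambda a, b: a * b, ["input1", "intermed"]),
    "unused": (lambda a: -a, ["input1"]),
}
result, trace = fetch("mul", graph)
print(result)  # 21.0
print(trace)   # ['input1', 'input2', 'input3', 'intermed', 'mul']
```

The node `unused` never appears in the trace, because `mul` does not depend on it.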

Some values only become available later, during the execution phase. During construction we can reserve their place in the graph with a placeholder:

import tensorflow as tf
input1 = tf.placeholder(tf.float32)
input2 = tf.placeholder(tf.float32)
output = tf.multiply(input1, input2)
with tf.Session() as sess:
    # feed_dict supplies a value for each placeholder at run time
    print(sess.run(output, feed_dict = {input1: [7.], input2: [2.]}))  # [14.]

Or feed in a numpy array:

import tensorflow as tf
import numpy as np
n = 5
a = tf.placeholder(dtype = tf.float32)
# None leaves the first dimension open, so any number of rows can be fed
b = tf.placeholder("float", [None, n])
output = tf.multiply(a, b)
with tf.Session() as sess:
    temp = np.asarray([[1., 2., 3., 4., 5.]])
    print(sess.run(output, feed_dict = {a: [2.], b: temp}))
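
The shape [None, n] in the placeholder above can be read as "any number of rows, exactly n columns". The sketch below spells out that rule in plain Python; `is_compatible` is a hypothetical helper written for illustration, not a TensorFlow function, and it only models the shape check, not the dtype check.

```python
def is_compatible(declared, actual):
    """Check a fed array's shape against a declared placeholder shape.

    declared is a list whose entries are ints or None (any size);
    declared=None means the placeholder was created without a shape.
    """
    if declared is None:
        return True
    if len(declared) != len(actual):
        return False
    return all(d is None or d == a for d, a in zip(declared, actual))

n = 5
print(is_compatible([None, n], (1, 5)))   # True
print(is_compatible([None, n], (32, 5)))  # True
print(is_compatible([None, n], (1, 4)))   # False
print(is_compatible(None, (1,)))          # True
```

Under this reading, the array temp with shape (1, 5) fits the placeholder b, and so would a batch of 32 rows.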

TensorFlow examples

Curve fitting

import tensorflow as tf
import numpy as np
# Synthetic data on the line y = 0.1 * x + 0.3
x_data = np.random.randn(100).astype("float32")
y_data = x_data * 0.1 + 0.3
# Parameters to fit: W should approach 0.1 and b should approach 0.3
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b
# Mean squared error, minimized by gradient descent with learning rate 0.1
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for step in range(101):
        sess.run(optimizer)
        if step % 20 == 0:
            print(step, sess.run(W), sess.run(b))
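
To see what the optimizer does on each step, the same fit can be written out by hand in plain Python. This is a sketch under stated assumptions rather than a reproduction of TensorFlow's optimizer: the x values are a fixed grid instead of random samples, both parameters start at 0, and `fit_line` is a hypothetical name. The gradients are those of the mean squared error with respect to w and b.

```python
# Deterministic data on the line y = 0.1 * x + 0.3 (the original uses random x).
xs = [i / 10.0 for i in range(-20, 21)]
ys = [0.1 * x + 0.3 for x in xs]

def fit_line(xs, ys, learning_rate=0.1, steps=101):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of mean(error^2) with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))  # 0.1 0.3
```

With the same learning rate and number of steps as the TensorFlow version, w and b settle at the slope and intercept used to generate the data.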

MNIST handwritten digit recognition

Using a linear classifier:

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
# Provides mnist.train, mnist.test and mnist.validation
mnist = input_data.read_data_sets("MNIST_data/", one_hot = True)
batch_size = 64
train_iter = 1000
# Input: flattened 28 x 28 images, [batch_size, 784]; labels: one-hot, [batch_size, 10]
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
pred = tf.nn.softmax(tf.matmul(x, W) + b)
# Cross-entropy between the one-hot labels and the predicted probabilities
loss = -tf.reduce_sum(y * tf.log(pred))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for step in range(train_iter):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        sess.run(optimizer, feed_dict = {x: batch_xs, y: batch_ys})
        if step % 100 == 0:
            # Accuracy on the full test set
            print(sess.run(accuracy, feed_dict = {x: mnist.test.images, y: mnist.test.labels}))
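
The softmax and cross-entropy used in the linear classifier can be worked through for a single example in plain Python. This is an illustrative sketch, not the TensorFlow implementation; `softmax` and `cross_entropy` are hypothetical helper names, and subtracting the maximum logit is a common stabilization trick that the original code does not apply.

```python
import math

def softmax(logits):
    """Convert a list of scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow and does not change the result.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot, probs):
    """Return -sum(y * log(p)) for one example, skipping terms where y is 0."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

# With W and b initialized to zeros, every logit is 0 for any input.
probs = softmax([0.0] * 10)
label = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
loss = cross_entropy(label, probs)
print(probs[0])        # 0.1
print(round(loss, 4))  # 2.3026
```

Because the weights start at zero, the untrained model assigns probability 0.1 to every digit, so the per-example loss starts at ln(10) and should fall from there as training proceeds.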

Using an RNN (GRU):

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
mnist = input_data.read_data_sets("MNIST/", one_hot = True)
# Each 28 x 28 image is read as a sequence of 28 rows with 28 pixels each
n_input = 28
time_step = 28
n_hidden = 128
n_output = 10
learning_rate = 0.001
train_iters = 1000
batch_size = 64
x = tf.placeholder(tf.float32, [None, time_step, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
W = {
    'hidden' : tf.Variable(tf.random_normal([n_input, n_hidden])),
    'output' : tf.Variable(tf.random_normal([n_hidden, n_output]))
}
b = {
    'hidden' : tf.Variable(tf.random_normal([n_hidden])),
    'output' : tf.Variable(tf.random_normal([n_output]))
}
def RNN(x, W, b):
    # [batch, time_step, n_input] -> [time_step, batch, n_input]
    x = tf.transpose(x, [1, 0, 2])
    # -> [time_step * batch, n_input]
    x = tf.reshape(x, [-1, n_input])
    # Linear input layer: -> [time_step * batch, n_hidden]
    x = tf.matmul(x, W['hidden']) + b['hidden']
    # -> list of time_step tensors, each [batch, n_hidden]
    x = tf.split(x, time_step, 0)
    gru_cell = tf.nn.rnn_cell.GRUCell(n_hidden)
    outputs, _ = tf.contrib.rnn.static_rnn(gru_cell, x, dtype = tf.float32)
    # Classify from the output of the last time step
    return tf.matmul(outputs[-1], W['output']) + b['output']
pred = RNN(x, W, b)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = pred, labels = y))
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(loss)
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
init = tf.global_variables_initializer()
with tf.Session() as sess:
    # tf.device only affects ops created inside the block; the graph above is already built
    with tf.device('/cpu:0'):
        sess.run(init)
        for step in range(train_iters):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            batch_x = batch_x.reshape(batch_size, time_step, n_input)
            sess.run(optimizer, feed_dict = {x: batch_x, y: batch_y})
            if step % 100 == 0:
                acc = sess.run(accuracy, feed_dict = {x: batch_x, y: batch_y})
                cost = sess.run(loss, feed_dict = {x: batch_x, y: batch_y})
                print("MSG : Epoch {}, Training_accuracy = {:.6f}, Training_loss = {:.5f}".format((step // 100) + 1, acc, cost))
        test_data = mnist.test.images.reshape(-1, time_step, n_input)
        test_labels = mnist.test.labels
        print("MSG : Testing_accuracy = {:.6f}".format(sess.run(accuracy, feed_dict = {x: test_data, y: test_labels})))
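
The transpose, reshape and split steps inside the RNN function are easy to get wrong, so here is a small plain-Python sketch of the same rearrangement on nested lists. It leaves out the matrix multiplication that sits between the reshape and the split in the original, which acts on each row independently and does not change the row order. The function `to_time_steps` and the string labels are hypothetical and exist only to make the ordering visible.

```python
def to_time_steps(batch):
    """Turn [batch][time][features] into one [batch][features] list per time step."""
    time_steps = len(batch[0])
    # Transpose the first two axes: [batch, time, ...] -> [time, batch, ...].
    time_major = [[sample[t] for sample in batch] for t in range(time_steps)]
    # Flatten to [time * batch, features], as tf.reshape(x, [-1, n_input]) does.
    flat = [row for step in time_major for row in step]
    # Cut into time_steps equal chunks along axis 0, as tf.split does.
    size = len(batch)
    return [flat[i * size:(i + 1) * size] for i in range(time_steps)]

# Hypothetical tiny input: 2 samples, 3 time steps, 1 feature labelled "s<sample>t<time>".
batch = [[["s0t0"], ["s0t1"], ["s0t2"]],
         [["s1t0"], ["s1t1"], ["s1t2"]]]
steps = to_time_steps(batch)
print(steps[0])   # [['s0t0'], ['s1t0']]
print(steps[-1])  # [['s0t2'], ['s1t2']]
```

Each chunk holds one time step for every sample in the batch, which is the input format static_rnn expects. Without the transpose, a chunk would instead mix consecutive time steps of the same sample.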

Using a CNN:

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
mnist = input_data.read_data_sets('MNIST/', one_hot = True)
time_step = 28
n_input = 28
n_output = 10
n_hidden = 1024
learning_rate = 0.001
train_iters = 20000
batch_size = 64
dropout = 0.5       # keep probability fed during training
strides_size = 1    # convolution stride
kernel_size = 2     # not used below; pooling uses window_size with a stride of 2
window_size = 5     # side length of the convolution and pooling windows
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev = 0.1)
    return tf.Variable(initial)
def bias_variable(shape):
    initial = tf.constant(0.1, shape = shape)
    return tf.Variable(initial)
def conv(x, W):
    return tf.nn.conv2d(x, W, strides = [strides_size] * 4, padding = 'SAME')
def max_pooling(x):
    return tf.nn.max_pool(x, ksize = [1, window_size, window_size, 1], strides = [1, 2, 2, 1], padding = 'SAME')
x = tf.placeholder(tf.float32, [None, time_step, n_input])
x_image = tf.reshape(x, [-1, time_step, n_input, 1])
y = tf.placeholder(tf.float32, [None, n_output])
keep_prob = tf.placeholder(tf.float32)
# conv1 layer
W_conv1 = weight_variable([window_size, window_size, 1, 32])
b_conv1 = bias_variable([32])
# conv2 layer
W_conv2 = weight_variable([window_size, window_size, 32, 64])
b_conv2 = bias_variable([64])
# fully connected layer on the flattened feature maps
W_fc1 = weight_variable([7*7*64, n_hidden])
b_fc1 = bias_variable([n_hidden])
# output (softmax) layer
W_fc2 = weight_variable([n_hidden, n_output])
b_fc2 = bias_variable([n_output])
def CNN(x, W_conv1, b_conv1, W_conv2, b_conv2, W_fc1, b_fc1, W_fc2, b_fc2, keep_prob):
    # [-1, 28, 28, 1]
    h_conv1 = tf.nn.relu(conv(x, W_conv1) + b_conv1)
    h_pool1 = max_pooling(h_conv1)
    h_pool1_drop = tf.nn.dropout(h_pool1, keep_prob)
    # [-1, 14, 14, 32]
    h_conv2 = tf.nn.relu(conv(h_pool1_drop, W_conv2) + b_conv2)
    h_pool2 = max_pooling(h_conv2)
    h_pool2_drop = tf.nn.dropout(h_pool2, keep_prob)
    # [-1, 7, 7, 64]
    h_fc1 = tf.reshape(h_pool2_drop, [-1, 7*7*64])
    h_fc1 = tf.nn.relu(tf.matmul(h_fc1, W_fc1) + b_fc1)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
    # [-1, n_hidden] -> [-1, n_output]
    # Return raw logits; the softmax is applied inside the loss function
    return tf.matmul(h_fc1_drop, W_fc2) + b_fc2
pred = CNN(x_image, W_conv1, b_conv1, W_conv2, b_conv2, W_fc1, b_fc1, W_fc2, b_fc2, keep_prob)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = pred, labels = y))
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(loss)
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
init = tf.global_variables_initializer()
with tf.Session() as sess:
    # tf.device only affects ops created inside the block; the graph above is already built
    with tf.device('/gpu:0'):
        sess.run(init)
        for step in range(train_iters):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            batch_x = batch_x.reshape(batch_size, time_step, n_input)
            sess.run(optimizer, feed_dict = {x: batch_x, y: batch_y, keep_prob: dropout})
            if step % 100 == 0:
                # Dropout is switched off (keep_prob = 1.0) when evaluating
                acc = sess.run(accuracy, feed_dict = {x: batch_x, y: batch_y, keep_prob: 1.0})
                cost = sess.run(loss, feed_dict = {x: batch_x, y: batch_y, keep_prob: 1.0})
                print("MSG : Epoch {}, Training accuracy = {:.6f}, Training loss = {:.5f}".format((step // 100) + 1, acc, cost))
        test_data = mnist.test.images.reshape(-1, time_step, n_input)
        test_labels = mnist.test.labels
        print(sess.run(accuracy, feed_dict = {x: test_data, y: test_labels, keep_prob: 1.0}))
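
The size 7*7*64 of the flattened layer follows from how 'SAME' padding works: the spatial output size is the input size divided by the stride, rounded up, regardless of the window size. The short sketch below traces the image size through the layers; `same_output_size` is a hypothetical helper name used only for this illustration.

```python
import math

def same_output_size(size, stride):
    """Spatial output size of a convolution or pooling op with 'SAME' padding."""
    return math.ceil(size / stride)

size = 28                         # MNIST images are 28 x 28
size = same_output_size(size, 1)  # conv1, stride 1 -> 28
size = same_output_size(size, 2)  # pool1, stride 2 -> 14
size = same_output_size(size, 1)  # conv2, stride 1 -> 14
size = same_output_size(size, 2)  # pool2, stride 2 -> 7
flattened = size * size * 64      # 64 channels after conv2
print(size, flattened)  # 7 3136
```

If the input size, the strides, or the number of pooling layers change, the first dimension of W_fc1 has to be recomputed in the same way.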

With these examples, the basic workflow of building neural networks with TensorFlow should now be clear.