醉后不知天在水
满船清梦压星河
前言
TensorFlow是一个用于数值计算的强大开源库,非常适合大型机器学习项目。其背后原理很简单,先定义一个用来计算的图,然后执行计算。
基本操作
安装
pip3 install --upgrade tensorflow
检验是否安装成功:
python3 -c 'import tensorflow; print(tensorflow.__version__)'
创建一个会话图并执行
import tensorflow as tf
x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x*x*y + y + 2
sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)
sess.close()
每次需要打开一个会话才能执行一个图的计算,每次都重复sess.run()有点笨拙,因而:
with tf.Session() as sess:
x.initializer.run()
y.initializer.run()
result = f.eval()
这样在with模块中会有一个默认会话
除了手工为每个变量调用初始化器之外,还可以使用global_variables_initializer()来完成此工作。这会在图中创建一个节点,这个节点会在会话执行时初始化所有变量:
init = tf.global_variables_initializer()
with tf.Session() as sess:
init.run()
result = f.eval()
也可以使用InteractiveSession,它和常规会话不同之处在于,创建时会将自己设置为默认会话,因此无需使用with模块,但需要手动结束会话
sess = tf.InteractiveSession()
init.run()
result = f.eval()
sess.close()
管理图
我们所创建的所有节点都会被自动添加到图上,大部分情况下,这不是问题,不过有时候想要管理多个不依赖的图,则可以创建新的图,利用with模块将其临时设置为默认图:
x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()#True
graph = tf.Graph()
with graph.as_default():
x2 = tf.Variable(2)
x2.graph is graph#True
x2.graph is tf.get_default_graph()#False
节点值的生命周期
当求值一个节点时,TensorFlow会自动检测该节点的依赖节点,并先对这些节点进行求值。但是图每次执行时,所有节点值会被丢弃重新计算,但是变量值不一样,变量值从初始化开始会一直持续到会话结束。如果不希望重复求值,那么需要特殊指定:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3
with tf.Session() as sess:
y_val, z_val = sess.run([y,z])
TensorFlow中的线性回归
TensorFlow中可以接受任意数量的输入,也可以产生任意数量的输出。比如加法和乘法接受两个输入,产生一个输出。常量和变量(源操作)没有输入。输入和输出都是张量。以加州房价数据为例,展示一次线性回归操作(利用正规方程求解):
housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m,1)),housing.data]
X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name='y')
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)
with tf.Session() as sess:
theta_value = theta.eval()
与利用Numpy直接求解方程相比,上述代码如果有GPU,会自动分发计算到GPU上去
梯度下降
手工计算梯度
n_epochs = 1000
learning_rate = 0.01
X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name='theta')
y_pred = tf.matmul(X, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
gradients = 2/m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - learning_rate * gradients)
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
for epoch in range(n_epochs):
if epoch % 100 == 0:
print('Epoch ', epoch, 'MSE = ', mse.eval())
sess.run(training_op)
best_theta = theta.eval()
自动微分
gradients = tf.gradients(mse, [theta])[0]
优化器
n_epochs = 1000
learning_rate = 0.01
X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name='theta')
y_pred = tf.matmul(X, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
#gradients = 2/m * tf.matmul(tf.transpose(X), error)
gradients = tf.gradients(mse, [theta])[0]
#training_op = tf.assign(theta, theta - learning_rate * gradients)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
for epoch in range(n_epochs):
if epoch % 100 == 0:
print('Epoch ', epoch, 'MSE = ', mse.eval())
sess.run(training_op)
best_theta = theta.eval()
提供数据
小批量梯度下降
每次迭代时用小批量X、y替换的方法,最简单是创建占位符节点。占位符节点不进行实际的计算,只负责输出我们想要输出的值。
A = tf.placeholder(tf.float32, shape=[None, 3])
B = A + 5
with tf.Session() as sess:
B_val_1 = B.eval(feed_dict={A: [[1,2,3]]})
B_val_2 = B.eval(feed_dict={A: [[4,5,6],[7,8,9]]})
模版:
#define placeholder
X = tf.placeholder(tf.float32, shape=(None,n + 1), name='X')
y = rf.placeholder(tf.float32, shape=(None,1), name='y')
#define batches
batche_size = 100
n_batches = int(np.ceil(n / batche_size))
#define graph
def fetch_batch(epoch, batch_index, batche_size):
[...]#load data from disk
return X_batche, y_batche
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
for epoch in range(n_batches):
X_batche, y_batche = fetch_batch(epoch, batch_index, batche_size)
sess.run(training_op, feed_dict={X: X_batche, y: y_batche})
best_theta = theta.eval()
保存和恢复模型
训练过程可能需要定期将检查点保存起来,这样在崩溃时,可以从最近的一个检查点恢复,而不是从头再来。
saver = tf.train.Saver()
...
#save model
saver.save(sess, 'path')
#restore model
saver.restore(sess, 'path')
命名作用域
with tf.name_scope('loss') as scope:
error = y_pred - y
tf.reduce_mean(tf.square(error), name='mse')
模块化
def relu(X):
w_shape = (int(X.get_shape()[1]), 1)
w = tf.Variable(tf.random_normal(w_shape), name='weights')
b = tf.Variable(0.0, name='bias')
z = tf.add(tf.matmul(X, w), b, name='z')
return tf.maximum(z, 0, name='relu')
n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name='output')