Tensorflow and Keras

If you are looking at this post, it means you are also pretty much a newbie to TensorFlow, like me, as of 2020-07-29.


Keras is already part of TensorFlow, so, use from tensorflow.keras import ***, not from keras import ***.

TensorFlow backend

Early stopping


model.fit(..., callbacks=[EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='min', restore_best_weights=True)], ...)

Reproducibility of results

Set all random seeds
Use tensorflow.keras instead standalone keras
Use model.predict_on_batch(x).numpy() for predicting speed.

Put this at the very beginning should work.

import os, random
import numpy as np
import tensorflow as tf
random.seed(42) # python random seed
np.random.seed(42) # numpy random seed
tf.random.set_seed(42) # tensorflow randome seed
os.environ['TF_DETERMINISTIC_OPS'] = '1' # ensure GPU reproducibility

Update all codes to tf.keras SEEMS solved the reproducibility problem.

BUT, the speed is 10x slower than using keras directly. After some digging, I find a workaround:

  • Use model.predict_on_batch(x) to do sequential predictions.
    • Because model.predict() will trigger the same calculation path as in model.fit(), including gradient computation or something I don’t understand. See here for details.
    • Also, use model(x) for predicting seems speed up a lot.
    • Using model.compile(..., experimental_run_tf_function=False) seems also speed up a lot.
  • This will cause another problem, the returned value should be a ndarray, but somehow I got a tftensor. So, I need to use model.predict_on_batch(x).numpy() to get the ndarray from the tftensor explicitly.
    • I guess this is a bug and would be fixed in the future, because the docs say predict_on_batch() always returns a numpy.

predict() v.s. predict_on_batch():

  • predict() is used for training
  • predict_on_batch() is used for pure predicting
  • They have a huge speed difference on small testing data. Guess I would never understand the background causes.

About Pure TensorFlow

GradientTape 是新版的自动微分器


General Optimization

(Not read yet) An overview of gradient descent optimization algorithms