I'm trying to train srresnet-mse on my own data set, and training sometimes crashes with the error below. The first time it happened somewhere between epochs 0 and 100, the next time between 100 and 200, and then once after epoch 600. The data set contains about 100,000 images, and I suspect it is the culprit. Can you help me understand what the problem is?
/opt/ds/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Logging results for this session in folder "results/srresnet-mse".
2018-09-03 12:22:04.150374: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-03 12:22:07.392768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:c1:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2018-09-03 12:22:07.392867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-09-03 12:22:07.811512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10415 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c1:00.0, compute capability: 6.1)
[0] Test: 0.4038988, Train: 0.5311046 [Set5] PSNR: 11.46, SSIM: 0.1051 [Set14] PSNR: 12.50, SSIM: 0.0841 [BSD100] PSNR: 13.13, SSIM: 0.1036
[100] Test: 0.2326521, Train: 0.3028869 [Set5] PSNR: 13.47, SSIM: 0.4380 [Set14] PSNR: 14.62, SSIM: 0.4203 [BSD100] PSNR: 15.39, SSIM: 0.4153
2018-09-03 12:23:46.013589: W tensorflow/core/kernels/queue_base.cc:277] _0_input_producer: Skipping cancelled enqueue attempt with queue not closed
2018-09-03 12:23:46.026015: W tensorflow/core/kernels/queue_base.cc:277] _2_input_producer_1: Skipping cancelled enqueue attempt with queue not closed
2018-09-03 12:23:46.026804: W tensorflow/core/kernels/queue_base.cc:277] _5_batch_2/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-09-03 12:23:46.027426: W tensorflow/core/kernels/queue_base.cc:277] _3_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-09-03 12:23:46.027743: W tensorflow/core/kernels/queue_base.cc:277] _4_input_producer_2: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
return fn(*args)
File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 14, current size 0)
[[Node: batch = QueueDequeueManyV2[component_types=[DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/fifo_queue, batch/n)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 134, in <module>
main()
File "train.py", line 121, in main
batch_hr = sess.run(get_train_batch)
File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
options, run_metadata)
File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 14, current size 0)
[[Node: batch = QueueDequeueManyV2[component_types=[DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/fifo_queue, batch/n)]]
Caused by op 'batch', defined at:
File "train.py", line 134, in <module>
main()
File "train.py", line 68, in main
get_train_batch, get_val_batch, get_eval_batch = build_inputs(args, sess)
File "/home/ds/ykochnev/SRGAN-orig/utilities.py", line 55, in build_inputs
get_train_batch = build_input_pipeline(train_filenames, batch_size=args.batch_size, img_size=args.image_size, random_crop=True)
File "/home/ds/ykochnev/SRGAN-orig/utilities.py", line 36, in build_input_pipeline
image_batch = tf.train.batch([image], batch_size=batch_size, num_threads=num_threads, capacity=10 * batch_size)
File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 989, in batch
name=name)
File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 763, in _batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/data_flow_ops.py", line 483, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2430, in _queue_dequeue_many_v2
component_types=component_types, timeout_ms=timeout_ms, name=name)
File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/opt/ds/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
OutOfRangeError (see above for traceback): FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 14, current size 0)
[[Node: batch = QueueDequeueManyV2[component_types=[DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/fifo_queue, batch/n)]]
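Update: since I suspect the data set, I wrote a quick scan for images that fail to decode. My (unconfirmed) guess is that a corrupt or truncated file makes a queue-runner thread die while decoding; once all the runner threads are gone the queue gets closed, and the next dequeue_many raises exactly this OutOfRangeError. This is only a rough sketch: the data/train directory and the *.png pattern are placeholders for my own layout, and Pillow may not reject every file that TensorFlow's decoder rejects.

import glob

from PIL import Image

# Placeholder path/pattern -- adjust to wherever train_filenames
# actually points in build_inputs().
paths = sorted(glob.glob("data/train/*.png"))

bad_files = []
for path in paths:
    try:
        with Image.open(path) as img:
            img.verify()  # cheap header/integrity check
        # verify() invalidates the image object, so reopen for a
        # full decode, which also catches truncated pixel data.
        with Image.open(path) as img:
            img.load()
    except Exception as exc:  # broad on purpose: we only want the list
        bad_files.append(path)
        print("unreadable: %s (%s)" % (path, exc))

print("%d unreadable file(s) out of %d" % (len(bad_files), len(paths)))

If the scan turns up unreadable files, I'll remove them and retrain; if it comes back clean, is there another explanation for the intermittent timing of the crash?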