Releases: tensorflow/data-validation
TensorFlow Data Validation 1.0.0
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Increased the threshold beyond which a string feature value is considered
  "large" by the experimental sketch-based top-k/unique generator to 1024.
- Added normalized AMI to sklearn mutual information generator.
- Depends on apache-beam[gcp]>=2.29,<3.
- Depends on
  tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<3.
- Depends on tensorflow-metadata>=1.0,<1.1.
- Depends on tfx-bsl>=1.0,<1.1.
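The dependency ranges in these notes combine lower bounds, upper bounds, and wildcard exclusions such as !=2.4.*. As a rough reading aid, the sketch below checks a plain numeric version against such a range. It is a simplified illustration, not pip's resolver: the helper names are hypothetical, and pre-release or suffixed versions are not handled.

```python
import re


def parse_version(text):
    """Turn '2.4.1' into (2, 4, 1). Only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause.

    Simplified sketch: supports >=, <=, >, <, == and !=, plus a trailing
    '.*' wildcard on == and != clauses.
    """
    v = parse_version(version)
    for clause in specifier.split(","):
        match = re.fullmatch(r"(>=|<=|!=|==|<|>)(.+)", clause.strip())
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, target = match.groups()

        if target.endswith(".*"):
            # Wildcard clause: compare only the leading components.
            prefix = parse_version(target[:-2])
            prefix_matches = v[:len(prefix)] == prefix
            if op == "!=":
                ok = not prefix_matches
            elif op == "==":
                ok = prefix_matches
            else:
                raise ValueError(f"wildcard not allowed with {op!r}")
        else:
            t = parse_version(target)
            # Pad with zeros so that '3' compares like '3.0.0'.
            width = max(len(v), len(t))
            a = v + (0,) * (width - len(v))
            b = t + (0,) * (width - len(t))
            ok = {
                ">=": a >= b, "<=": a <= b, ">": a > b,
                "<": a < b, "==": a == b, "!=": a != b,
            }[op]

        if not ok:
            return False
    return True


# The tensorflow range listed for TFDV 1.0.0.
TF_RANGE = ">=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<3"
```

Under this reading, satisfies("2.5.0", TF_RANGE) and satisfies("1.15.2", TF_RANGE) hold, while any 2.0 through 2.4 release and anything at or above 3 is rejected.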
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- Removed the following deprecated symbols. Their deprecation was announced
  in 0.30.0.
  - tfdv.validate_instance
  - tfdv.lift_stats_generator
  - tfdv.partitioned_stats_generator
  - tfdv.get_feature_value_slicer
- Removed parameter compression_type in
  tfdv.generate_statistics_from_tfrecord.
TensorFlow Data Validation 0.26.1
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Depends on apache-beam[gcp]>=2.25,!=2.26.*,<2.29.
Known Issues
- N/A
Breaking changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 0.30.0
Major Features and Improvements
- This version is the last version before TFDV 1.0. From 1.0 onward, all
  TFDV public APIs (i.e. symbols in the root __init__.py) will be subject to
  semantic versioning. We are deprecating some public APIs in this version
  and they will be removed in 1.0.
- The sketch-based top-k/unique stats generator can now detect invalid
  utf-8 sequences and large texts and replace them with a placeholder. It
  no longer suffers from the memory issues usually caused by image or large
  text features in the data. Note that this generator is not yet used by
  default.
- Added StatsOptions.experimental_use_sketch_based_topk_uniques, which
  enables the sketch-based top-k/unique stats generator.
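The placeholder behaviour described above can be pictured with a small standard-library sketch. This is not TFDV's implementation: it uses an exact Counter instead of a sketch data structure, the placeholder strings are hypothetical, and the 1024 threshold is borrowed from the 1.0.0 notes on the assumption that it is measured in bytes.

```python
from collections import Counter

# Assumption: threshold in bytes, taken from the 1.0.0 release notes.
LARGE_VALUE_THRESHOLD = 1024
# Hypothetical placeholder strings, chosen only for this illustration.
INVALID_UTF8_PLACEHOLDER = "__INVALID_UTF8__"
LARGE_VALUE_PLACEHOLDER = "__LARGE_VALUE__"


def normalize(value: bytes) -> str:
    """Map a raw feature value to the string that gets counted."""
    if len(value) > LARGE_VALUE_THRESHOLD:
        # Large blobs (e.g. encoded images) collapse into one bucket,
        # so they never occupy memory as distinct keys.
        return LARGE_VALUE_PLACEHOLDER
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        return INVALID_UTF8_PLACEHOLDER


def top_k(values, k):
    """Return the k most common normalized values with their counts."""
    counts = Counter(normalize(v) for v in values)
    return counts.most_common(k)


sample = [b"cat", b"cat", b"dog", b"\xff\xfe", b"x" * 2000]
most_common = top_k(sample, 1)
```

Because every oversized or undecodable value lands in a single placeholder bucket, the number of distinct keys stays bounded by the number of well-formed short strings.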
Bug Fixes and Other Changes
- Fixed bug in display_schema that caused domains not to be displayed.
- Modified how get_schema_dataframe outputs numeric domains.
- Anomalies previously (un)classified as UNKNOWN_TYPE now trigger more
  specific anomaly types: INVALID_DOMAIN_SPECIFICATION and MULTIPLE_REASONS.
- Depends on tensorflow-metadata>=0.30,<0.31.
- Depends on tfx-bsl>=0.30,<0.31.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- tfdv.LiftStatsGenerator is going to be removed in the next version from
  the public API. To enable that generator, supply
  StatsOptions.label_feature.
- tfdv.NonStreamingCustomStatsGenerator is going to be removed in the next
  version from the public API. You may continue to import it from TFDV
  but it will not be subject to compatibility guarantees.
- tfdv.validate_instance is going to be removed in the next
  version from the public API. You may continue to import it from TFDV
  but it will not be subject to compatibility guarantees.
- Removed tfdv.DecodeCSV, tfdv.DecodeTFExample (deprecated in 0.27).
- Removed feature_whitelist in tfdv.StatsOptions (deprecated in 0.28).
  Use feature_allowlist instead.
- tfdv.get_feature_value_slicer is deprecated.
  tfdv.experimental_get_feature_value_slicer is introduced as a replacement.
  TFDV is likely to have a different slicing functionality post 1.0, which
  may not be compatible with the current slicers.
- StatsOptions.slicing_functions is deprecated.
  StatsOptions.experimental_slicing_functions is introduced as a
  replacement.
- tfdv.WriteStatisticsToText is removed (deprecated in 0.25.0).
- Parameter compression_type in tfdv.generate_statistics_from_tfrecord
  is deprecated. The compression type is currently automatically determined.
TensorFlow Data Validation 0.29.0
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Added check for invalid min and max values for values_counts for nested
  features.
- Bumped the minimum bazel version required to build TFDV to 3.7.2.
- Depends on absl-py>=0.9,<0.13.
- Depends on tensorflow-metadata>=0.29,<0.30.
- Depends on tfx-bsl>=0.29,<0.30.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 0.28.0
Major Features and Improvements
- Add anomaly detection for max bytes size for images.
Bug Fixes and Other Changes
- Depends on numpy>=1.16,<1.20.
- Fixed a bug that affected all CombinerFeatureStatsGenerators.
- Allow for bytes type in get_feature_value_slicer in addition to Text
  and int.
- Fixed a bug that caused TFDV to improperly infer a fixed shape when
  tfdv.infer_schema and tfdv.update_schema were called with
  infer_feature_shape=True.
- Deprecated parameter infer_feature_shape of function tfdv.update_schema.
  If a schema feature has a pre-defined shape, tfdv.update_schema will
  always validate it. Otherwise, it will not try to add a shape.
- Deprecated tfdv.StatsOptions.feature_whitelist and added
  feature_allowlist as a replacement. The former will be removed in the next
  release.
- Added get_schema_dataframe and get_anomalies_dataframe utility
  functions.
- Depends on apache-beam[gcp]>=2.28,<3.
- Depends on tensorflow-metadata>=0.28,<0.29.
- Depends on tfx-bsl>=0.28.1,<0.29.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 0.27.0
Major Features and Improvements
- Performance improvement to BasicStatsGenerator.
Bug Fixes and Other Changes
- Added a compact() and setup() interface to CombinerStatsGenerator,
  CombinerFeatureStatsWrapperGenerator, BasicStatsGenerator,
  CompositeStatsGenerator, and ConstituentStatsGenerator.
- Stopped depending on tensorflow-transform.
- Depends on apache-beam[gcp]>=2.27,<3.
- Depends on pyarrow>=1,<3.
- Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,<3.
- Depends on tensorflow-metadata>=0.27,<0.28.
- Depends on tfx-bsl>=0.27,<0.28.
Known Issues
- N/A
Breaking changes
- N/A
Deprecations
- tfdv.DecodeCSV and tfdv.DecodeTFExample are deprecated. Use
  tfx_bsl.public.tfxio.CsvTFXIO and tfx_bsl.public.tfxio.TFExampleRecord
  instead.
TensorFlow Data Validation 0.26.0
Major Features and Improvements
- Added support for per-feature example weights, which allows associating
  each column with its own weight column. See the
  per_feature_weight_override parameter in StatsOptions.__init__.
Bug Fixes and Other Changes
- Newly added LifecycleStage.DISABLED is now exempt from validation (similar
  to LifecycleStage.DEPRECATED, etc).
- Fixed a bug where TFDV blindly trusted the type claimed in the provided
  schema. TFDV now computes the stats according to the actual type of the
  data, and only when the actual type matches the claim in the schema will
  it compute type-specific stats (e.g. categorical ints).
- Added an option to control whether to add default stats generators when
  using tfdv.GenerateStatistics().
- Started using a new quantiles computation routine that does not depend on
  TF. This could potentially increase the performance of TFDV under certain
  workloads.
- Extended schema_util to support semantic domains.
- Moved natural_language_stats_generator to
  natural_language_domain_inferring_stats_generator.
- Provided vocab_utils to assist in opening / loading vocabulary files.
- A SchemaDiff will be reported upon J-S skew/drift.
- Fixed a bug in FLOAT_TYPE_SMALL_FLOAT anomaly message.
- Depends on apache-beam[gcp]>=2.25,!=2.26.*,<3.
- Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.4.*,<3.
- Depends on tensorflow-metadata>=0.26,<0.27.
- Depends on tensorflow-transform>=0.26,<0.27.
- Depends on tfx-bsl>=0.26,<0.27.
Known Issues
- N/A
Breaking changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 0.25.0
Major Features and Improvements
- Add support for detecting drift and distribution skew in numeric features.
- tfdv.validate_statistics now also reports the raw measurements of
  distribution skew/drift (if any is done), regardless of whether skew/drift
  is detected. The report is in the drift_skew_info field of the Anomalies
  proto (the return value of validate_statistics).
- From this release TFDV will also be hosting nightly packages on
  https://pypi-nightly.tensorflow.org. To install the nightly package use the
  following command:

    pip install -i https://pypi-nightly.tensorflow.org/simple tensorflow-data-validation

  Note: These nightly packages are unstable and breakages are likely to
  happen. The fix could often take a week or more depending on the complexity
  involved for the wheels to be available on the PyPI cloud service. You can
  always use the stable version of TFDV available on PyPI by running the
  command pip install tensorflow-data-validation.
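To give a feel for what a raw skew/drift measurement on a numeric feature can look like, here is a minimal standard-library sketch of the Jensen-Shannon divergence between two histograms. The 0.26.0 notes mention "J-S skew/drift", so the choice of measure is an assumption based on that. The function names are hypothetical, the two histograms are assumed to share the same buckets, and this is not TFDV's implementation.

```python
import math


def to_distribution(counts):
    """Normalize bucket counts into probabilities that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one example")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(counts_a, counts_b):
    """Symmetric divergence between two histograms over the same buckets.

    With base-2 logarithms the result lies between 0 (identical
    distributions) and 1 (no overlap).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must use the same buckets")
    p = to_distribution(counts_a)
    q = to_distribution(counts_b)
    # The mixture m is nonzero wherever p or q is nonzero, so the
    # KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical bucket counts for one numeric feature on two days.
yesterday = [10, 20, 30, 40]
today = [40, 30, 20, 10]
drift = jensen_shannon_divergence(yesterday, today)
```

A validator built along these lines would compare the resulting value against a configured threshold, while still reporting the raw number when the threshold is not exceeded.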
Bug Fixes and Other Changes
- Added tfdv.load_stats_binary to load stats that were written using
  tfdv.WriteStatisticsToText (now tfdv.WriteStatisticsToBinaryFile).
- Anomalies previously (un)classified as UNKNOWN_TYPE now trigger more
  specific anomaly types: DOMAIN_INVALID_FOR_TYPE, UNEXPECTED_DATA_TYPE,
  FEATURE_MISSING_NAME, FEATURE_MISSING_TYPE, INVALID_SCHEMA_SPECIFICATION.
- Fixed a bug where import tensorflow_data_validation would fail if IPython
  is not installed. IPython is an optional dependency of TFDV.
- Depends on apache-beam[gcp]>=2.25,<3.
- Depends on tensorflow-metadata>=0.25,<0.26.
- Depends on tensorflow-transform>=0.25,<0.26.
- Depends on tfx-bsl>=0.25,<0.26.
Known Issues
- N/A
Breaking Changes
- tfdv.WriteStatisticsToText is renamed as
  tfdv.WriteStatisticsToBinaryFile. The former is still available but will
  be removed in a future release.
Deprecations
- N/A
TensorFlow Data Validation 0.24.1
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Depends on apache-beam[gcp]>=2.24,<3.
- Depends on tensorflow-transform>=0.24.1,<0.25.
- Depends on tfx-bsl>=0.24.1,<0.25.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 0.23.1
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Depends on apache-beam[gcp]>=2.24,<3.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- Deprecated Python 3.5 support.