Data Annotation
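Data annotation is the process of attaching labels to raw data so that it can be used to train supervised machine learning models. Examples include a class name for each image, a bounding box around each object, or a sentiment tag for each sentence. It is a crucial step in data preparation: the labels supply the ground-truth targets that give raw data its meaning, and the model learns by trying to reproduce them.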
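A minimal sketch of what annotated data can look like, assuming a hypothetical CSV file with one human-assigned label per image. The filenames, the `cat`/`dog` classes, and the helper `load_annotations` are illustrative, not part of any library. The helper parses the rows and maps class names to the integer ids a training loop expects:

```python
import csv
import io

# Hypothetical annotation file: one row per image with a human-assigned label
ANNOTATIONS_CSV = """filename,label
img_001.jpg,cat
img_002.jpg,dog
img_003.jpg,cat
"""

def load_annotations(text):
    """Parse (filename, label) rows and encode class names as integer ids."""
    rows = list(csv.DictReader(io.StringIO(text)))
    classes = sorted({row["label"] for row in rows})  # stable, reproducible class order
    class_to_id = {name: i for i, name in enumerate(classes)}
    filenames = [row["filename"] for row in rows]
    labels = [class_to_id[row["label"]] for row in rows]
    return filenames, labels, class_to_id

filenames, labels, class_to_id = load_annotations(ANNOTATIONS_CSV)
print(class_to_id)  # {'cat': 0, 'dog': 1}
print(labels)       # [0, 1, 0]
```

Sorting the class names keeps the name-to-id mapping identical across runs, so a saved model's outputs can always be decoded back to the same labels.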
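```python
import numpy as np
import tensorflow as tf

# Placeholder data (hypothetical): substitute your own annotated images.
# The labels are the annotations: one 0/1 class label per image.
rng = np.random.default_rng(0)
train_images = rng.random((64, 256, 256, 3), dtype=np.float32) * 255.0
train_labels = rng.integers(0, 2, size=64).astype("float32")
val_images = rng.random((16, 256, 256, 3), dtype=np.float32) * 255.0
val_labels = rng.integers(0, 2, size=16).astype("float32")

AUTOTUNE = tf.data.AUTOTUNE

# Training-only augmentation with Keras preprocessing layers, which replace
# the deprecated ImageDataGenerator (its shear option is dropped here)
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomZoom(0.2),
])

def preprocess(image, label):
    # Resize to the model's input size and rescale pixels to [0, 1]
    image = tf.image.resize(image, (224, 224)) / 255.0
    return image, label

def make_dataset(images, labels, training=False):
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    if training:
        ds = ds.shuffle(len(images))
    ds = ds.map(preprocess, num_parallel_calls=AUTOTUNE).batch(32)
    if training:
        ds = ds.map(lambda x, y: (augment(x, training=True), y),
                    num_parallel_calls=AUTOTUNE)
    return ds.prefetch(AUTOTUNE)

# Build separate datasets so validation never reuses training images
train_dataset = make_dataset(train_images, train_labels, training=True)
validation_dataset = make_dataset(val_images, val_labels)

# Define the CNN model architecture (single sigmoid output for 2 classes)
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Train on the training set while monitoring the held-out validation set
model.fit(train_dataset, validation_data=validation_dataset, epochs=10)
```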
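In this example, we start from labeled images and set up rescaling plus random augmentation (flips and zooms) for training, with rescaling only for evaluation data. A tf.data.Dataset pipeline then resizes every image to 224x224 pixels and batches the examples for training and validation. Finally, a small CNN is defined with Keras, compiled with binary cross-entropy, and trained with model.fit. The labels paired with the images (train_labels) are the annotations: they are the targets that binary cross-entropy compares each prediction against, so without annotated data there is nothing for the model to learn from.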
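Validation data should be held out from the annotated examples rather than reused from the training set; otherwise validation accuracy simply repeats training accuracy and hides overfitting. A minimal sketch of such a split using only the standard library, where `train_val_split`, the 20% fraction, the seed, and the example filenames are illustrative assumptions:

```python
import random

def train_val_split(examples, val_fraction=0.2, seed=42):
    """Shuffle (filename, label) pairs and hold out a fraction for validation."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)  # reproducible shuffle
    n_val = max(1, int(len(examples) * val_fraction))
    return examples[n_val:], examples[:n_val]  # (train, validation)

# Hypothetical annotated examples: (filename, 0/1 label)
examples = [(f"img_{i:03d}.jpg", i % 2) for i in range(10)]
train, val = train_val_split(examples)
print(len(train), len(val))  # 8 2
```

A fixed seed makes the split reproducible, so validation scores from different training runs are measured on the same held-out images.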
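Data annotation is an essential step in machine learning development. Because a supervised model learns to reproduce the labels it is given, annotation quality largely limits model quality: inconsistent or mislabeled examples teach the model the wrong targets, while accurate, consistently applied labels improve both model performance and the reliability of any conclusions drawn from the data.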
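One common way to check annotation accuracy, offered as an illustration rather than as part of the pipeline above, is to have two annotators label the same items and measure their agreement with Cohen's kappa. Kappa corrects the raw agreement rate for the agreement expected by chance. A minimal sketch, where the function name and the annotator labels are hypothetical:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same class at random
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used the same single class throughout
    return (observed - expected) / (1 - expected)

annotator_a = ["cat", "dog", "cat", "cat"]
annotator_b = ["cat", "dog", "dog", "cat"]
print(cohens_kappa(annotator_a, annotator_b))  # 0.5
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance; a low value is a hint that the labeling guidelines are ambiguous and worth revising before training.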