---
name: tensorflow-neural-networks
description: Build and train neural networks with TensorFlow
allowed-tools: [Bash, Read]
---
# TensorFlow Neural Networks
Build and train neural networks using TensorFlow's high-level Keras API and low-level custom implementations. This skill covers everything from simple sequential models to complex custom architectures with multiple outputs, custom layers, and advanced training techniques.
## Sequential Models with Keras
The Sequential API provides the simplest way to build neural networks by stacking layers linearly.
### Basic Image Classification
```python
import tensorflow as tf
from tensorflow import keras
import numpy as np
# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Preprocess data
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 28 * 28)
x_test = x_test.reshape(-1, 28 * 28)
# Build Sequential model
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation='softmax')
])
# Compile model
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
# Display model architecture
model.summary()
# Train model
history = model.fit(
    x_train, y_train,
    batch_size=32,
    epochs=5,
    validation_split=0.2,
    verbose=1
)
# Evaluate model
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_accuracy:.4f}")
# Make predictions
predictions = model.predict(x_test[:5])
predicted_classes = np.argmax(predictions, axis=1)
print(f"Predicted classes: {predicted_classes}")
print(f"True classes: {y_test[:5]}")
# Save model
model.save('mnist_model.h5')
# Load model
loaded_model = keras.models.load_model('mnist_model.h5')
```
### Convolutional Neural Network
```python
def create_cnn_model(input_shape=(224, 224, 3), num_classes=1000):
    """Create CNN model for image classification."""
    model = tf.keras.Sequential([
        # Block 1
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same',
                               input_shape=input_shape),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.BatchNormalization(),
        # Block 2
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.BatchNormalization(),
        # Block 3
        tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.BatchNormalization(),
        # Classification head
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation='softmax')
    ])
    return model
```
### CIFAR-10 CNN Architecture
```python
def generate_model(input_shape=(32, 32, 3)):
    """CNN for CIFAR-10. The default input shape is one 32x32 RGB image;
    pass x_train.shape[1:] instead if the CIFAR-10 arrays are already loaded."""
    return tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), padding='same', input_shape=input_shape),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Conv2D(32, (3, 3)),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Conv2D(64, (3, 3), padding='same'),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Conv2D(64, (3, 3)),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10),
        tf.keras.layers.Activation('softmax')
    ])

model = generate_model()
```
## Custom Layers
Create reusable custom layers by subclassing `tf.keras.layers.Layer`.
### Custom Dense Layer
```python
import tensorflow as tf


class CustomDense(tf.keras.layers.Layer):
    def __init__(self, units=32, activation=None, **kwargs):
        # Forward standard Layer arguments (name, dtype, ...) to the base class.
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        """Create layer weights."""
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer='glorot_uniform',
            trainable=True,
            name='kernel'
        )
        self.b = self.add_weight(
            shape=(self.units,),
            initializer='zeros',
            trainable=True,
            name='bias'
        )

    def call(self, inputs):
        """Forward pass."""
        output = tf.matmul(inputs, self.w) + self.b
        if self.activation is not None:
            output = self.activation(output)
        return output

    def get_config(self):
        """Enable serialization."""
        config = super().get_config()
        config.update({
            'units': self.units,
            'activation': tf.keras.activations.serialize(self.activation)
        })
        return config


# Use custom components; an explicit Input declares the feature dimension.
custom_model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    CustomDense(64, activation='relu'),
    CustomDense(32, activation='relu'),
    CustomDense(1, activation='sigmoid')
])
custom_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```
### Residual Block
```python
import tensorflow as tf


class ResidualBlock(tf.keras.layers.Layer):
    """Two conv + batch-norm stages with an identity shortcut.

    The shortcut adds the block input to the conv output, so the input
    must already have `filters` channels.
    """

    def __init__(self, filters, kernel_size=3):
        super(ResidualBlock, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(filters, kernel_size, padding='same')
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.conv2 = tf.keras.layers.Conv2D(filters, kernel_size, padding='same')
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.activation = tf.keras.layers.Activation('relu')
        self.add = tf.keras.layers.Add()

    def call(self, inputs, training=False):
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = self.activation(x)
        x = self.conv2(x)
        x = self.bn2(x, training=training)
        x = self.add([x, inputs])  # Residual connection
        x = self.activation(x)
        return x
```
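Because the block above adds its input directly to the convolution output, it only works when the input already has `filters` channels. One common way around this is to project the shortcut with a 1x1 convolution when the channel counts differ. The following is a minimal sketch of that idea using the functional API; the helper name `residual_block` and the input size are assumptions for illustration, not part of the original skill.

```python
import tensorflow as tf


def residual_block(x, filters, kernel_size=3):
    """Functional residual block (hypothetical helper).

    Projects the shortcut with a 1x1 convolution when the number of
    input channels differs from `filters`; otherwise uses the identity.
    """
    shortcut = x
    if x.shape[-1] != filters:
        shortcut = tf.keras.layers.Conv2D(filters, 1, padding='same')(x)
    y = tf.keras.layers.Conv2D(filters, kernel_size, padding='same')(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Activation('relu')(y)
    y = tf.keras.layers.Conv2D(filters, kernel_size, padding='same')(y)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Add()([y, shortcut])
    return tf.keras.layers.Activation('relu')(y)


inputs = tf.keras.Input(shape=(32, 32, 3))
x = residual_block(inputs, 16)  # 3 -> 16 channels: shortcut is projected
x = residual_block(x, 16)       # 16 -> 16 channels: identity shortcut
model = tf.keras.Model(inputs, x)
print(model.output_shape)
```

With `padding='same'` and no striding, the spatial dimensions are unchanged and only the channel count follows `filters`.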
### Custom Projection Layer with TF NumPy
```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

# Enable NumPy-style methods such as `.astype` on TF tensors.
tnp.experimental_enable_numpy_behavior()


class ProjectionLayer(tf.keras.layers.Layer):
    """Linear projection layer using TF NumPy."""

    def __init__(self, units):
        super(ProjectionLayer, self).__init__()
        self._units = units

    def build(self, input_shape):
        stddev = tnp.sqrt(self._units).astype(tnp.float32)
        initial_value = tnp.random.randn(input_shape[1], self._units).astype(
            tnp.float32) / stddev
        # Note that TF NumPy can interoperate with tf.Variable.
        self.w = tf.Variable(initial_value, trainable=True)

    def call(self, inputs):
        return tnp.matmul(inputs, self.w)


# Call with ndarray inputs
layer = ProjectionLayer(2)
tnp_inputs = tnp.random.randn(2, 4).astype(tnp.float32)
print("output:", layer(tnp_inputs))
# Call with tf.Tensor inputs
tf_inputs = tf.random.uniform([2, 4])
print("\noutput: ", layer(tf_inputs))
```
## Custom Models
Build complex architectures by subclassing `tf.keras.Model`.
### Multi-Task Model
```python
import tensorflow as tf


class MultiTaskModel(tf.keras.Model):
    def __init__(self, num_classes_task1=10, num_classes_task2=5):
        super(MultiTaskModel, self).__init__()
        # Shared layers
        self.conv1 = tf.keras.layers.Conv2D(32, 3, activation='relu')
        self.pool = tf.keras.layers.MaxPooling2D()
        self.flatten = tf.keras.layers.Flatten()
        self.shared_dense = tf.keras.layers.Dense(128, activation='relu')
        # Task-specific layers
        self.task1_dense = tf.keras.layers.Dense(64, activation='relu')
        self.task1_output = tf.keras.layers.Dense(num_classes_task1,
                                                  activation='softmax', name='task1')
        self.task2_dense = tf.keras.layers.Dense(64, activation='relu')
        self.task2_output = tf.keras.layers.Dense(num_classes_task2,
                                                  activation='softmax', name='task2')

    def call(self, inputs, training=False):
        # Shared feature extraction
        x = self.conv1(inputs)
        x = self.pool(x)
        x = self.flatten(x)
        x = self.shared_dense(x)
        # Task 1 branch
        task1 = self.task1_dense(x)
        task1_output = self.task1_output(task1)
        # Task 2 branch
        task2 = self.task2_dense(x)
        task2_output = self.task2_output(task2)
        return task1_output, task2_output
```
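A model with two outputs needs a loss that combines both tasks. The sketch below shows one possible way to do this with a custom training step and `tf.GradientTape`. It uses a small two-headed stand-in model on random data rather than `MultiTaskModel` itself, and the layer sizes, the `task2_weight` parameter, and the learning rate are all assumptions chosen for illustration.

```python
import numpy as np
import tensorflow as tf

# Small two-headed stand-in model (hypothetical sizes) with a shared trunk.
inputs = tf.keras.Input(shape=(16,))
shared = tf.keras.layers.Dense(32, activation='relu')(inputs)
task1 = tf.keras.layers.Dense(10, activation='softmax', name='task1')(shared)
task2 = tf.keras.layers.Dense(5, activation='softmax', name='task2')(shared)
model = tf.keras.Model(inputs, [task1, task2])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam(1e-2)


def train_step(x, y1, y2, task2_weight=0.5):
    """Run one gradient step on the weighted sum of both task losses."""
    with tf.GradientTape() as tape:
        p1, p2 = model(x, training=True)
        loss = loss_fn(y1, p1) + task2_weight * loss_fn(y2, p2)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss


# Random data standing in for a real two-label dataset.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16)).astype('float32')
y1 = rng.integers(0, 10, size=(64,))
y2 = rng.integers(0, 5, size=(64,))

losses = [float(train_step(x, y1, y2)) for _ in range(20)]
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Weighting the second loss is a design choice: it lets one task dominate the shared representation less or more, and the right value depends on the tasks.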
### Three-Layer Neural Network Module
```python
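class Model(tf.Module):
    """A three layer neural network."""

    def __init__(self):
        super().__init__()
        # Assumes a `Dense` helper that accepts `use_relu` and exposes `params`
        # (different from the `Dense` class defined later in this document),
        # and NUM_CLASSES from "Training Constants for MNIST".
        self.layer1 = Dense(128)
        self.layer2 = Dense(32)
        self.layer3 = Dense(NUM_CLASSES, use_relu=False)

    def __call__(self, inputs):
        x = self.layer1(inputs)
        x = self.layer2(x)
        return self.layer3(x)

    @property
    def params(self):
        return self.layer1.params + self.layer2.params + self.layer3.params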
```
## Recurrent Neural Networks
### Custom GRU Cell
```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

# Enable NumPy-style methods such as `.astype` on TF tensors.
tnp.experimental_enable_numpy_behavior()


class GRUCell:
    """Builds a traditional GRU cell with dense internal transformations.

    Gated Recurrent Unit paper: https://arxiv.org/abs/1412.3555
    """

    def __init__(self, n_units, forget_bias=0.0):
        self._n_units = n_units
        self._forget_bias = forget_bias
        self._built = False

    def __call__(self, inputs):
        if not self._built:
            self.build(inputs)
        x, gru_state = inputs
        # Dense layer on the concatenation of x and h.
        y = tnp.dot(tnp.concatenate([x, gru_state], axis=-1), self.w1) + self.b1
        # Update and reset gates.
        u, r = tnp.split(tf.sigmoid(y), 2, axis=-1)
        # Candidate.
        c = tnp.dot(tnp.concatenate([x, r * gru_state], axis=-1), self.w2) + self.b2
        new_gru_state = u * gru_state + (1 - u) * tnp.tanh(c)
        return new_gru_state

    def build(self, inputs):
        # State last dimension must be n_units.
        assert inputs[1].shape[-1] == self._n_units
        # The dense layers take the input concatenated with the GRU state.
        dense_shape = inputs[0].shape[-1] + self._n_units
        self.w1 = tf.Variable(tnp.random.uniform(
            -0.01, 0.01, (dense_shape, 2 * self._n_units)).astype(tnp.float32))
        self.b1 = tf.Variable((tnp.random.randn(2 * self._n_units) * 1e-6 + self._forget_bias
                               ).astype(tnp.float32))
        self.w2 = tf.Variable(tnp.random.uniform(
            -0.01, 0.01, (dense_shape, self._n_units)).astype(tnp.float32))
        self.b2 = tf.Variable((tnp.random.randn(self._n_units) * 1e-6).astype(tnp.float32))
        self._built = True

    @property
    def weights(self):
        return (self.w1, self.b1, self.w2, self.b2)
```
### Custom Dense Layer Implementation
```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

# Enable NumPy-style methods such as `.astype` on TF tensors.
tnp.experimental_enable_numpy_behavior()


class Dense:
    def __init__(self, n_units, activation=None):
        self._n_units = n_units
        self._activation = activation
        self._built = False

    def __call__(self, inputs):
        if not self._built:
            self.build(inputs)
        y = tnp.dot(inputs, self.w) + self.b
        if self._activation is not None:
            y = self._activation(y)
        return y

    def build(self, inputs):
        # Glorot-uniform limit computed from fan-in and fan-out.
        shape_w = (inputs.shape[-1], self._n_units)
        lim = tnp.sqrt(6.0 / (shape_w[0] + shape_w[1]))
        self.w = tf.Variable(tnp.random.uniform(-lim, lim, shape_w).astype(tnp.float32))
        self.b = tf.Variable((tnp.random.randn(self._n_units) * 1e-6).astype(tnp.float32))
        self._built = True

    @property
    def weights(self):
        return (self.w, self.b)
```
### Sequential RNN Model
```python
class Model:
    # `Embedding` and `GRU` are assumed to be custom layers that follow the same
    # build/__call__ pattern as `Dense` above; they are not shown here.
    def __init__(self, vocab_size, embedding_dim, rnn_units, forget_bias=0.0, stateful=False, activation=None):
        self._embedding = Embedding(vocab_size, embedding_dim)
        self._gru = GRU(rnn_units, forget_bias=forget_bias, stateful=stateful)
        self._dense = Dense(vocab_size, activation=activation)
        self._layers = [self._embedding, self._gru, self._dense]
        self._built = False

    def __call__(self, inputs):
        if not self._built:
            self.build(inputs)
        xs = inputs
        for layer in self._layers:
            xs = layer(xs)
        return xs

    def build(self, inputs):
        self._embedding.build(inputs)
        self._gru.build(tf.TensorSpec(inputs.shape + (self._embedding._embedding_dim,), tf.float32))
        self._dense.build(tf.TensorSpec(inputs.shape + (self._gru._cell._n_units,), tf.float32))
        self._built = True

    @property
    def weights(self):
        return [layer.weights for layer in self._layers]

    @property
    def state(self):
        return self._gru.state

    def create_state(self, *args):
        self._gru.create_state(*args)

    def reset_state(self, *args):
        self._gru.reset_state(*args)
```
## Training Configuration
### Model Parameters
```python
# Length of the vocabulary in chars
vocab_size = len(vocab)
# The embedding dimension
embedding_dim = 256
# Number of RNN units
rnn_units = 1024
# Batch size
BATCH_SIZE = 64
# Buffer size to shuffle the dataset
BUFFER_SIZE = 10000
```
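The parameters above reference a `vocab` that is not defined in this section. As a rough sketch of where such values might come from, the example below builds a character vocabulary from a tiny hypothetical corpus and uses `BATCH_SIZE` and `BUFFER_SIZE` in a `tf.data` pipeline. The text, `SEQ_LENGTH`, the small batch size, and the `split_input_target` helper are all assumptions made for illustration.

```python
import numpy as np
import tensorflow as tf

text = "hello tensorflow"  # hypothetical corpus
vocab = sorted(set(text))  # unique characters
vocab_size = len(vocab)
char2idx = {ch: i for i, ch in enumerate(vocab)}
ids = np.array([char2idx[ch] for ch in text], dtype=np.int64)

SEQ_LENGTH = 4       # hypothetical sequence length
BATCH_SIZE = 2       # kept small for the demo; the section above uses 64
BUFFER_SIZE = 10000  # shuffle buffer size, as above


def split_input_target(chunk):
    """Turn a chunk of SEQ_LENGTH + 1 ids into (input, next-character target)."""
    return chunk[:-1], chunk[1:]


dataset = (tf.data.Dataset.from_tensor_slices(ids)
           .batch(SEQ_LENGTH + 1, drop_remainder=True)  # fixed-length chunks
           .map(split_input_target)
           .shuffle(BUFFER_SIZE)
           .batch(BATCH_SIZE, drop_remainder=True))

for inputs, targets in dataset.take(1):
    print(vocab_size, inputs.shape, targets.shape)
```

A shuffle buffer larger than the dataset, as here, amounts to a full shuffle; on large corpora the buffer size trades memory for shuffle quality.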
### Training Constants for MNIST
```python
# Size of each input image, 28 x 28 pixels
IMAGE_SIZE = 28 * 28
# Number of distinct number labels, [0..9]
NUM_CLASSES = 10
# Number of examples in each training batch (step)
TRAIN_BATCH_SIZE = 100
# Number of training steps to run
TRAIN_STEPS = 1000
# Loads MNIST dataset.
train, test = tf.keras.datasets.mnist.load_data()
train_ds = tf.data.Dataset.from_tensor_slices(train).batch(TRAIN_BATCH_SIZE).repeat()
# Casting from raw data to the required datatypes.
def cast(images, labels):
    images = tf.cast(
        tf.reshape(images, [-1, IMAGE_SIZE]), tf.float32)
    labels = tf.cast(labels, tf.int64)
    return (images, labels)
```
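The `cast` function above is defined but never applied to the dataset. One plausible way to wire it in is `Dataset.map`, sketched below on synthetic `uint8` arrays so it runs without downloading MNIST. The use of `map` here and the fake data are assumptions; the original snippet does not show this step.

```python
import numpy as np
import tensorflow as tf

IMAGE_SIZE = 28 * 28
TRAIN_BATCH_SIZE = 100


def cast(images, labels):
    """Flatten each image batch and convert to the dtypes used for training."""
    images = tf.cast(tf.reshape(images, [-1, IMAGE_SIZE]), tf.float32)
    labels = tf.cast(labels, tf.int64)
    return images, labels


# Synthetic stand-in for the MNIST arrays (same dtype and image shape).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(250, 28, 28), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=(250,), dtype=np.uint8)

train_ds = (tf.data.Dataset.from_tensor_slices((fake_images, fake_labels))
            .batch(TRAIN_BATCH_SIZE)
            .repeat()
            .map(cast))  # applied per batch, after batching

for images, labels in train_ds.take(1):
    print(images.shape, images.dtype, labels.dtype)
```

Because `cast` reshapes with a leading `-1`, it is applied after `batch` so that each element is a whole batch of images.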
### Post-Training Quantization
```python
# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images = test_images / 255.0
# Define the model architecture
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10)
])
# Train the digit classification model
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(
    train_images,
    train_labels,
    epochs=1,
    validation_data=(test_images, test_labels)
)
```
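The code above trains the model but stops before the quantization step itself. A minimal sketch of that step, assuming the TensorFlow Lite converter with its default optimizations (dynamic range quantization), could look like the following. The model here is an untrained copy of the same architecture so the example runs quickly; in practice the trained `model` would be passed to the converter instead.

```python
import tensorflow as tf

# Untrained stand-in with the same architecture as the digit classifier above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

# Convert to TensorFlow Lite; Optimize.DEFAULT enables post-training
# quantization of the weights.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The result is a serialized flatbuffer that can be written to disk.
print(f"Quantized model size: {len(tflite_model)} bytes")
```

Accuracy should be re-checked after conversion, since quantization can change predictions slightly.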
## When to Use This Skill
Use the tensorflow-neural-networks skill when you need to:
- Build image classification models with CNNs
- Create text processing models with RNNs or transformers
- Implement custom layer architectures for specific use cases
- Design multi-task learning models with shared representations
- Train sequential models for tabular data
- Implement residual connections or skip connections
- Create embedding layers for discrete inputs
- Build autoencoders or generative models
- Fine-tune pre-trained models with custom heads
- Implement attention mechanisms in custom architectures
- Create time-series prediction models
- Design reinforcement learning policy networks
- Build siamese networks for similarity learning
- Implement custom gradient computation in layers
- Create models with dynamic architectures based on input
## Best Practices
1. **Use Keras Sequential for simple architectures** - Start with Sequential API for linear layer stacks before moving to functional or subclassing APIs
2. **Leverage pre-built layers** - Use tf.keras.layers built-in implementations before creating custom layers
3. **Initialize weights properly** - Use appropriate initializers (glorot_uniform, he_normal) based on activation functions
4. **Add batch normalization** - Place BatchNormalization layers after Conv2D/Dense layers for training stability
5. **Use dropout for regularization** - Apply Dropout layers (0.2-0.5) to prevent overfitting in fully connected layers
6. **Compile before training** - Always call model.compile() with optimizer, loss, and metrics before fit()
7. **Monitor validation metrics** - Use validation_split or validation_data to track overfitting during training
8. **Save model checkpoints** - Implement ModelCheckpoint callback to save best models during training
9. **Use model.summary()** - Verify architecture and parameter counts before training
10. **Implement early stopping** - Add EarlyStopping callback to prevent unnecessary training iterations
11. **Normalize input data** - Scale pixel values to [0,1] or standardize features to mean=0, std=1
12. **Use appropriate activation functions** - ReLU for hidden layers, softmax for multi-class, sigmoid for binary
13. **Set proper loss functions** - sparse_categorical_crossentropy for integer labels, categorical_crossentropy for one-hot
14. **Implement custom get_config()** - Override get_config() in custom layers for model serialization
15. **Use training parameter in call()** - Pass training flag to enable/disable dropout and batch norm behavior
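Several of these practices (validation monitoring, checkpoints, early stopping) come together in the `callbacks` argument of `fit()`. The following is a small sketch on synthetic data; the dataset, layer sizes, `patience` value, and checkpoint file name are assumptions chosen only to keep the example fast.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Synthetic data standing in for a real dataset.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 20)).astype('float32')
y = rng.integers(0, 3, size=(256,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # integer labels
              metrics=['accuracy'])

checkpoint_path = os.path.join(tempfile.mkdtemp(), 'best_model.keras')
callbacks = [
    # Stop when validation loss has not improved for 2 epochs.
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2,
                                     restore_best_weights=True),
    # Keep only the model with the lowest validation loss seen so far.
    tf.keras.callbacks.ModelCheckpoint(checkpoint_path, monitor='val_loss',
                                       save_best_only=True),
]

history = model.fit(x, y, epochs=10, batch_size=32, validation_split=0.2,
                    callbacks=callbacks, verbose=0)
print(f"Epochs run: {len(history.history['loss'])}")
```

On random labels like these, validation loss tends to stop improving early, so training may end before the tenth epoch.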
## Common Pitfalls
1. **Forgetting to normalize data** - Unnormalized inputs cause slow convergence and poor performance
2. **Wrong loss function for labels** - Using categorical_crossentropy with integer labels causes errors
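3. **Missing input_shape** - Without an input shape on the first layer (or a leading Input), the model is not built until its first call, so model.summary() fails and no weights exist yet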
4. **Overfitting on small datasets** - Add dropout, augmentation, or reduce model capacity
5. **Learning rate too high** - Causes unstable training and loss divergence
6. **Not shuffling training data** - Leads to biased batch statistics and poor generalization
7. **Batch size too small** - Causes noisy gradients and slow training on large datasets
8. **Too many parameters** - Large models overfit and train slowly on limited data
9. **Vanishing gradients in deep networks** - Use residual connections or batch normalization
10. **Not using validation data** - Cannot detect overfitting or tune hyperparameters properly
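11. **Forgetting to set training=False** - Dropout and BatchNormalization behave differently in training and inference; pass the correct training flag when calling a model directly, as predict() and evaluate() only handle it for you inside Keras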
12. **Incompatible layer dimensions** - Output shape of one layer must match input of next
13. **Accessing weights before build()** - A custom layer creates its weights in build(), which runs on the first call; until then the layer has no weights to read
14. **Using wrong optimizer** - Adam is a good default, but SGD with momentum is the better choice for some tasks
15. **Ignoring class imbalance** - Implement class weights or resampling for imbalanced datasets
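Two of these pitfalls are easy to observe directly. The short sketch below, using built-in Keras layers with arbitrary sizes, shows how Dropout changes with the `training` flag (pitfall 11) and that a layer has no weights until it has been called once (pitfall 13).

```python
import numpy as np
import tensorflow as tf

# Pitfall 11: Dropout only drops units when training=True.
dropout = tf.keras.layers.Dropout(0.5)
ones = tf.ones((1, 8))
inference_out = dropout(ones, training=False)  # passes inputs through unchanged
training_out = dropout(ones, training=True)    # zeros some units, scales the rest by 1 / (1 - 0.5)
print("inference:", inference_out.numpy())
print("training: ", training_out.numpy())

# Pitfall 13: weights are created when the layer is built on its first call.
dense = tf.keras.layers.Dense(4)
weights_before = len(dense.weights)  # no kernel or bias yet
dense(tf.zeros((1, 3)))              # first call triggers build()
weights_after = len(dense.weights)   # kernel and bias now exist
print("weights before/after first call:", weights_before, weights_after)
```

The same reasoning applies to custom layers: anything created in `build()` is unavailable until the layer has seen an input shape.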
## Resources
- [TensorFlow Keras Guide](https://www.tensorflow.org/guide/keras)
- [Building Custom Layers](https://www.tensorflow.org/guide/keras/custom_layers_and_models)
- [Training and Evaluation](https://www.tensorflow.org/guide/keras/train_and_evaluate)
- [Sequential Model API](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential)
- [Model Subclassing Guide](https://www.tensorflow.org/guide/keras/custom_layers_and_models#the_model_class)
- [RNN Guide](https://www.tensorflow.org/guide/keras/rnn)
- [Custom Training Loops](https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch)
- [Transfer Learning Guide](https://www.tensorflow.org/tutorials/images/transfer_learning)