neural networks

Layers of weighted connections that learn to approximate functions from data. The computational substrate of modern artificial intelligence.

architecture

  • neuron: weighted sum of inputs passed through an activation function (ReLU, sigmoid, tanh)
  • layer: collection of neurons. Input, hidden, and output layers.
  • feedforward: signals flow in one direction from input to output; by the universal approximation theorem, even a single hidden layer with enough units can approximate any continuous function (sketch after this list)
  • recurrent (RNN): connections loop back, modeling sequences, LSTM and GRU variants
  • convolutional (CNN): local receptive fields, weight sharing, dominant in vision
  • transformer: self-attention mechanism, parallelizable across sequence positions, dominant in language (GPT, BERT)
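
A minimal numpy sketch of a neuron/layer and a two-layer feedforward pass. Names like dense_layer and the sizes are illustrative, not from any particular library:

```python
import numpy as np

def relu(x):
    # activation function: elementwise max(0, x)
    return np.maximum(0.0, x)

def dense_layer(x, W, b, activation=relu):
    # a neuron is a weighted sum of inputs plus a bias, passed through an activation;
    # a layer stacks many neurons, so the weights form a matrix
    return activation(x @ W + b)

# feedforward: signals flow input -> hidden -> output, with no loops
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))                       # one input with 4 features
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)     # hidden layer (8 neurons)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)     # output layer (2 neurons)
h = dense_layer(x, W1, b1)
y = dense_layer(h, W2, b2, activation=lambda z: z)  # linear output
print(y.shape)  # (1, 2)
```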

learning

  • backpropagation: computing gradients of the loss function via chain rule, flowing error backward through layers
  • gradient descent: adjusting weights in the direction that reduces the loss (SGD, Adam); sketch after this list
  • loss function: measures the gap between prediction and ground truth
  • overfitting and regularization: dropout, weight decay, early stopping
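
A minimal sketch of a gradient-descent training loop on a one-layer linear model with mean-squared-error loss; the gradient is derived by hand via the chain rule rather than by a full multi-layer backpropagation, and the data and learning rate are toy values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 3))                        # toy inputs
true_w = np.array([[2.0], [-1.0], [0.5]])
y = X @ true_w + 0.01 * rng.normal(size=(32, 1))    # toy targets (ground truth)

W = np.zeros((3, 1))                                # weights to learn
lr = 0.1                                            # learning rate

for step in range(200):
    pred = X @ W                                    # forward pass
    err = pred - y
    loss = np.mean(err ** 2)                        # loss: gap between prediction and truth
    grad = 2 * X.T @ err / len(X)                   # chain rule: dLoss/dW, error flowing backward
    W -= lr * grad                                  # gradient descent step
print(W.ravel(), loss)                              # W approaches true_w, loss shrinks
```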

deep learning

Stacking many layers enables hierarchical feature extraction. Deeper networks learn increasingly abstract representations. Scale of data, compute, and parameters drives capability.
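
A sketch of that stacking as function composition, again in numpy; mlp, the layer sizes, and the He-style initialization are illustrative assumptions, not a prescribed recipe:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp(x, params):
    # each hidden layer re-represents the output of the previous one,
    # so deeper layers can express increasingly abstract features
    *hidden, output = params
    for W, b in hidden:
        x = relu(x @ W + b)
    W_out, b_out = output
    return x @ W_out + b_out

rng = np.random.default_rng(2)
sizes = [16, 64, 64, 64, 10]                  # depth = number of stacked layers
params = [(rng.normal(size=(m, n)) * np.sqrt(2 / m), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
out = mlp(rng.normal(size=(1, 16)), params)
print(out.shape)  # (1, 10)
```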

connections

algorithms underpin training optimization. data structures (tensors, computation graphs) organize network computation. type theory informs tensor shape checking. consensus algorithms (see cyber) enable decentralized training and inference.