Layers of weighted connections that learn to approximate functions from data. The computational substrate of modern artificial intelligence.

architecture

neuron: weighted sum of inputs passed through an activation function (ReLU, sigmoid, tanh)
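A minimal sketch of a single neuron in Python (function and variable names are illustrative):

```python
import math

def neuron(inputs, weights, bias, activation):
    # Weighted sum of inputs plus bias, then a nonlinearity.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

neuron([1.0, 2.0], [0.5, -0.25], bias=0.1, activation=relu)  # -> 0.1
```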

layer: collection of neurons. Input, hidden, and output layers.

feedforward: signals flow in one direction from input to output; by the universal approximation theorem, even a single hidden layer can approximate any continuous function given enough neurons
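A feedforward pass can be sketched as layers applied in sequence, each feeding the next (weights here are arbitrary, for illustration):

```python
def relu(z):
    return max(0.0, z)

def feedforward(x, layers):
    # Each layer is a list of (weights, bias) pairs, one per neuron.
    # Signals flow strictly forward: each layer's output feeds the next.
    for layer in layers:
        x = [relu(sum(w * xi for w, xi in zip(weights, x)) + bias)
             for weights, bias in layer]
    return x

# Two inputs -> hidden layer of two neurons -> one output neuron.
layers = [
    [([1.0, -1.0], 0.0), ([0.5, 0.5], 0.0)],  # hidden layer
    [([1.0, 1.0], 0.0)],                      # output layer
]
feedforward([2.0, 1.0], layers)  # -> [2.5]
```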

recurrent (RNN): connections loop back, modeling sequences, LSTM and GRU variants
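A scalar sketch of one recurrent step (the weights are made up); LSTM and GRU replace this plain update with gated ones:

```python
import math

def rnn_step(x, h, w_x, w_h, b):
    # New hidden state mixes the current input with the previous state.
    return math.tanh(w_x * x + w_h * h + b)

# Unrolling over a sequence: the loop feeds each hidden state back in.
h = 0.0
for x in [1.0, 0.5, -1.0]:
    h = rnn_step(x, h, w_x=0.8, w_h=0.5, b=0.0)
```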

convolutional (CNN): local receptive fields, weight sharing, dominant in vision
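Weight sharing and local receptive fields in one dimension (a toy kernel; real CNNs slide 2-D kernels over image patches):

```python
def conv1d(signal, kernel):
    # Slide one shared kernel along the signal: every output position
    # reuses the same weights over a small local window.
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A difference kernel responds where adjacent values change.
conv1d([0, 0, 1, 1, 0], [-1, 1])  # -> [0, 1, 0, -1]
```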

transformer: self-attention mechanism, parallelizable, dominant in language (GPT, BERT)
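Scaled dot-product self-attention, the core of the transformer, sketched without batching or learned projections (all names illustrative):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    # For each query: score it against every key, normalize the scores
    # to weights, and return the weighted average of the values.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

In self-attention, queries, keys, and values are all projections of the same input sequence, which is what lets every position attend to every other in parallel.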

learning

backpropagation: computing gradients of the loss function via chain rule, flowing error backward through layers
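The chain rule worked by hand on a two-weight network (architecture and values made up for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny network: y = sigmoid(w2 * relu(w1 * x)); loss = (y - t)^2.
x, t = 1.0, 0.0
w1, w2 = 0.5, 0.8

h = max(0.0, w1 * x)   # forward: hidden activation (relu)
y = sigmoid(w2 * h)    # forward: output
loss = (y - t) ** 2

# Backward: error flows from the loss toward the inputs via the chain rule.
dloss_dy = 2 * (y - t)             # d(loss)/dy
dy_dz = y * (1 - y)                # sigmoid'(z) in terms of its output
dloss_dw2 = dloss_dy * dy_dz * h   # chain rule down to w2
dloss_dh = dloss_dy * dy_dz * w2   # keep flowing backward through w2
dloss_dw1 = dloss_dh * (1.0 if w1 * x > 0 else 0.0) * x  # relu gate, then x
```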

gradient descent: adjusting weights to minimize loss (SGD, Adam)
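Gradient descent on a one-parameter loss, loss(w) = (w - 3)^2, whose minimum is known to sit at w = 3:

```python
# Step repeatedly against the gradient; the learning rate scales each step.
w, lr = 0.0, 0.1
for _ in range(100):
    grad = 2 * (w - 3)   # d(loss)/dw
    w -= lr * grad       # gradient descent update
# w converges toward the minimum at 3
```

SGD estimates this gradient from a small batch of data instead of the full loss; Adam additionally adapts the step size per parameter.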

loss function: measures the gap between prediction and ground truth
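Mean squared error is the simplest example, shown here as a sketch:

```python
def mse(predictions, targets):
    # Average squared gap between prediction and ground truth.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

mse([1.0, 2.0], [0.0, 2.0])  # -> 0.5
```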

overfitting and regularization: dropout, weight decay, early stopping
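Inverted dropout as a sketch: during training each activation is zeroed with probability p, and survivors are rescaled so the expected magnitude is unchanged; at inference dropout is disabled:

```python
import random

def dropout(activations, p, training=True):
    # Zero each activation with probability p; scale survivors by 1/(1-p)
    # so the expected sum matches the no-dropout case.
    if not training:
        return list(activations)
    return [0.0 if random.random() < p else a / (1 - p) for a in activations]
```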

deep learning

Stacking many layers enables hierarchical feature extraction. Deeper networks learn increasingly abstract representations. Scale of data, compute, and parameters drives capability.

connections

algorithms underpin training optimization. data structures (tensors, computation graphs) organize network computation. type theory informs tensor shape checking. consensus algorithms from distributed systems enable decentralized training and inference.
