KlongPy: PyTorch Backend and Autograd

KlongPy now supports a PyTorch backend that enables GPU acceleration and automatic differentiation for gradient-based computations. The torch backend outperforms NumPy by up to 8x on large arrays and provides exact gradients via the `:>` autograd operator, while the `∇` operator always uses numeric differentiation regardless of backend. Users can enable the PyTorch backend through the `--backend torch` flag at the command line or by setting `backend="torch"` when creating a KlongInterpreter.

# PyTorch Backend and Autograd

KlongPy supports multiple array backends. The PyTorch backend enables GPU acceleration and automatic differentiation (autograd) for gradient-based computations.

## Enabling the PyTorch Backend

### Command Line

```bash
# Use --backend flag
kgpy --backend torch

# With GPU device selection
kgpy --backend torch --device cuda
```

### Programmatically

```python
from klongpy import KlongInterpreter

# Create interpreter with torch backend
klong = KlongInterpreter(backend="torch")
print(klong._backend.name)  # 'torch'

# With specific device
klong = KlongInterpreter(backend="torch", device="cuda")
```

## Backend Comparison

| Feature | NumPy Backend | PyTorch Backend |
|---|---|---|
| Default | Yes | No (use `--backend torch`) |
| Object dtype | Yes | No |
| String operations | Yes | Not supported |
| GPU acceleration | No | Yes (CUDA/MPS) |
| Autograd | Numeric only | Native autograd |
| Small array performance | Faster | Slightly slower |
| Large array performance | Good | Better (especially on GPU) |

## Performance

The torch backend excels with large arrays:

```
Benchmark              NumPy      Torch    Winner
---------------------------------------------------------
vector add 100K       0.04ms     0.08ms    NumPy (2x)
vector add 1M         0.36ms     0.07ms    Torch (5x)
compound expr 1M      0.61ms     0.07ms    Torch (8x)
grade up 100K         0.59ms     0.19ms    Torch (3x)
```

For small arrays (<100K elements), NumPy is slightly faster due to lower dispatch overhead. For larger arrays, torch wins significantly.

## Automatic Differentiation

KlongPy provides several gradient and differentiation operators.

### Typing Special Characters

| Symbol | Name | Mac | Windows |
|---|---|---|---|
| `∇` | Nabla | Character Viewer (Ctrl+Cmd+Space) | Alt+8711 |
| `∂` | Partial | Option + d | Alt+8706 |

On Mac, `∂` can be typed directly with **Option + d**. For `∇`, use the Character Viewer or copy-paste.
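The Windows Alt codes listed above are the decimal Unicode code points of the two symbols, so a script that generates Klong source can build them with `chr` instead of typing them. The following is a small Python sketch of that idea; the variable names are illustrative only, and the resulting strings would still have to be handed to an interpreter, which is not shown here.

```python
# Decimal code points taken from the table above (Alt+8711, Alt+8706).
NABLA = chr(8711)    # U+2207, the numeric gradient symbol
PARTIAL = chr(8706)  # U+2202, the Jacobian symbol

# Klong source strings assembled without typing the symbols directly.
# Both operators take the point on the left and the function on the right.
numeric_gradient_expr = f"3{NABLA}f"
jacobian_expr = f"[1 2]{PARTIAL}f"

print(numeric_gradient_expr)
print(jacobian_expr)
```

The same characters can also be written as the string escapes `"\u2207"` and `"\u2202"`.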
### `:>` Autograd Operator (Recommended)

The `:>` operator uses PyTorch autograd for exact gradients:

```klong
f::{x^2}         :" Define f(x) = x^2
f:>3             :" Compute f'(3) = 6.0
```

The syntax is `function:>point` where:

- `function` is a scalar-valued function (must return a single number)
- `point` is the input at which to compute the gradient

### `∇` Numeric Gradient Operator

The `∇` operator always uses numeric differentiation (finite differences), regardless of backend:

```klong
f::{x^2}         :" Define f(x) = x^2
3∇f              :" Compute f'(3) ≈ 6.0
```

The syntax is `point∇function` (note: reversed order from `:>`).

### How They Work

| Operator | Method | Precision | Speed |
|---|---|---|---|
| `:>` with torch | PyTorch autograd | Exact | Fast |
| `:>` without torch | Numeric | ~1e-6 error | Slower |
| `∇` (any backend) | Always numeric | ~1e-6 error | Slower |

With the torch backend (`--backend torch` or `backend='torch'`), prefer `:>` for:

- Exact gradients (no finite-difference approximation error)
- Complex computational graphs
- Better performance on large arrays

### Examples

Scalar function:

```klong
f::{x^3}         :" f(x) = x^3
f:>2             :" f'(2) = 3*4 = 12.0
```

Polynomial (Klong evaluates right to left, so the subtraction is grouped explicitly):

```klong
p::{((3*x^4)-(2*x^2))+x}   :" p(x) = 3x^4 - 2x^2 + x
p:>1                       :" p'(1) = 12 - 4 + 1 = 9.0
```

Vector function (sum of squares):

```klong
g::{+/x^2}            :" g(x) = sum(x_i^2)
g:>[1.0 2.0 3.0]      :" [2 4 6] = 2*x
```

Gradient descent:

```klong
f::{x^2}
x::5.0
lr::0.1

:" Update rule: x = x - lr * grad
x::x-(lr*f:>x)
```

### Multi-Parameter Gradients

Compute gradients for multiple parameters simultaneously using a list of symbols:

```klong
w::2.0
b::3.0
loss::{(w^2)+(b^2)}

:" Compute gradients for both w and b
grads::loss:>[w b]    :" [4.0 6.0] = [2w, 2b]
```

This is especially useful for neural network training:

```klong
w::1.0
b::0.0
X::[1 2 3]
Y::[3 5 7]

:" MSE loss
loss::{(+/(((w*X)+b)-Y)^2)%3}

:" Compute both gradients in one call
grads::loss:>[w b]
```

### Jacobian Computation

Compute the Jacobian matrix (matrix of partial derivatives) using the `∂` operator or the `.jacobian()` function:

```klong
f::{x^2}              :" Element-wise square

:" Using ∂ operator (point∂function)
[1 2]∂f               :" [[2 0] [0 4]] (diagonal matrix)

:" Using .jacobian() function
.jacobian(f;[1 2])    :" Same result
```

For vector-valued functions f: R^n --> R^m, the Jacobian is an m x n matrix where J[i,j] = df_i/dx_j.

### Multi-Parameter Jacobians

Just like gradients, you can compute Jacobians with respect to multiple parameters using a list of symbols:

```klong
w::[1.0 2.0]
b::[3.0 4.0]
f::{w^2}              :" Returns [w0^2, w1^2]

:" Compute Jacobians for both w and b
jacobians::[w b]∂f    :" Returns [J_w, J_b]
```

This returns a list of Jacobian matrices, one per parameter. It is useful for analyzing how vector-valued functions depend on multiple parameter sets.

### Custom Optimizers

KlongPy provides the gradient primitives (`:>`, `∂`, `.jacobian()`). For optimizers, use the example classes in `examples/autograd/optimizers.py`, which you can copy to your project and customize.

Manual gradient descent (no optimizer needed):

```klong
w::10.0
loss::{w^2}
lr::0.1

:" Update rule: w = w - lr * gradient
{w::w-(lr*loss:>w)}'!50
w                     :" Close to 0
```

Using a custom optimizer class:

- Copy `examples/autograd/optimizers.py` to your project directory
- Import with `.pyf()`:

```klong
:" Import the optimizer class
.pyf("optimizers";"SGDOptimizer")

:" Setup parameters and loss
w::10.0
loss::{w^2}

:" Create optimizer with learning rate 0.1
opt::SGDOptimizer(klong;["w"];:{["lr" 0.1]})

:" Run optimization steps
{opt(loss)}'!50
w                     :" Close to 0
```

Available example optimizers:

- `SGDOptimizer` - Stochastic Gradient Descent with optional momentum
- `AdamOptimizer` - Adam optimizer with adaptive learning rates

SGD with momentum:

```klong
.pyf("optimizers";"SGDOptimizer")
opt::SGDOptimizer(klong;["w"];:{["lr" 0.01] ["momentum" 0.9]})
```

Adam optimizer:

```klong
.pyf("optimizers";"AdamOptimizer")
opt::AdamOptimizer(klong;["w" "b"];:{["lr" 0.001]})
```

Training loop example:

```klong
.pyf("optimizers";"AdamOptimizer")

w::1.0;b::0.0
X::[1 2 3];Y::[3 5 7]
loss::{(+/(((w*X)+b)-Y)^2)%3}

opt::AdamOptimizer(klong;["w" "b"];:{["lr" 0.1]})

:" Train for 500 steps
{opt(loss)}'!500
```

Creating your own optimizer: the example optimizers use `multi_grad_of_fn` from `klongpy.autograd` to compute gradients for multiple parameters. Copy and modify the optimizer classes to implement custom update rules (RMSprop, AdaGrad, learning rate schedules, etc.).

## GPU Acceleration

When CUDA or Apple MPS is available, tensors automatically use the GPU:

```python
from klongpy import KlongInterpreter

klong = KlongInterpreter(backend='torch')
print(klong._backend.device)  # 'cuda:0', 'mps:0', or 'cpu'
```

### Device Selection

The backend automatically selects the best available device:

1. CUDA (NVIDIA GPU) - if available
2. MPS (Apple Silicon) - if available
3. CPU - fallback

### MPS Limitations

Apple's MPS backend has some limitations:

- No float64 support (uses float32)
- Some operations fall back to CPU

## Mixing with Python

Access torch tensors directly:

```python
from klongpy import KlongInterpreter

klong = KlongInterpreter(backend='torch')

# KlongPy operations return torch tensors
result = klong('2*1+!1000000')
print(type(result))   # <class 'torch.Tensor'>
print(result.device)  # cuda:0, mps:0, or cpu

# Convert to numpy when needed
import numpy as np
np_result = result.cpu().numpy()
```

## Best Practices

- **Use torch for large computations**: Switch to the torch backend for arrays >100K elements
- **Keep data as tensors**: Avoid unnecessary conversions between numpy and torch
- **Batch operations**: Combine operations to minimize dispatch overhead
- **Use autograd for gradients**: Native autograd is faster and more accurate than numeric differentiation

## Function Compilation

The torch backend supports compiling Klong functions for optimized execution using `torch.compile`.

### `.compile(fn;input)` - Compile Function

Compiles a function for faster execution:

```klong
f::{x^2}
cf::.compile(f;3.0)   :" Returns compiled function
cf(5.0)               :" 25.0 (optimized)
```

The compiled function runs significantly faster
for complex computations.

### `.export(fn;input;path)` - Export Computation Graph

Exports the function's computation graph to a file for inspection:

```klong
f::{(x^3)+(2*x^2)+x}
info::.export(f;2.0;"model.pt2")
.p(info@"graph")      :" Print computation graph
```

Returns a dictionary with:

- `"compiled_fn"` - The compiled function
- `"export_path"` - Path where the graph was saved
- `"graph"` - String representation of the computation graph

The exported `.pt2` file can be loaded with `torch.export.load()` in Python.

### `.compilex(fn;input;options)` - Extended Compilation

Compile with advanced options for mode and backend:

```klong
f::{x^2}

:" Fast compilation for development
cf::.compilex(f;3.0;:{["mode" "reduce-overhead"]})

:" Maximum optimization for production
cf::.compilex(f;3.0;:{["mode" "max-autotune"]})

:" Debug mode (no compilation)
cf::.compilex(f;3.0;:{["backend" "eager"]})
```

Options dictionary:

- `"mode"` - Compilation mode (see table below)
- `"backend"` - Compilation backend (see table below)
- `"fullgraph"` - Set to 1 to require full graph compilation
- `"dynamic"` - Set to 1 for dynamic shapes, 0 for static

### `.cmodes()` - Query Compilation Modes

Get information about available modes and backends:

```klong
info::.cmodes()
.p(info@"modes")             :" Available compilation modes
.p(info@"backends")          :" Available backends
.p(info@"recommendations")   :" Suggested settings
```

### Compilation Mode Comparison

| Mode | Compile Time | Runtime Speed | Best For |
|---|---|---|---|
| `default` | Medium | Good | General use |
| `reduce-overhead` | Fast | Moderate | Development/testing |
| `max-autotune` | Slow | Best | Production |

### Backend Comparison

| Backend | Description |
|---|---|
| `inductor` | Default - C++/Triton code generation (fastest) |
| `eager` | No compilation - runs original Python (debugging) |
| `aot_eager` | Ahead-of-time eager (debugging + autograd) |
| `cudagraphs` | CUDA graphs - reduces GPU kernel launch overhead |
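When KlongPy is driven from Python, the option names and values from the list and tables above have to end up in a Klong dictionary literal of the form `:{[key value] ...}`. The sketch below shows one way to render such a literal from a Python dict. `klong_dict` is a hypothetical helper written for this illustration, not part of KlongPy; it handles only flat dictionaries of strings and numbers, and it does not escape quotes inside strings.

```python
def klong_dict(options):
    """Render a flat Python dict as a Klong dictionary literal string."""
    def render(value):
        # Klong uses 1/0 for flags such as "fullgraph" and "dynamic".
        if isinstance(value, bool):
            return "1" if value else "0"
        if isinstance(value, str):
            return f'"{value}"'
        return str(value)

    pairs = " ".join(f"[{render(k)} {render(v)}]" for k, v in options.items())
    return ":{" + pairs + "}"


# Build the options for a .compilex call from values listed in the tables.
opts = klong_dict({"mode": "max-autotune", "backend": "inductor", "fullgraph": True})
call = f"cf::.compilex(f;3.0;{opts})"
print(call)
```

Whether a given combination of options is accepted is decided by `.compilex` itself; the helper only formats the text.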
**Note:** Compilation requires a C++ compiler on your system. Use `["backend" "eager"]` to bypass compilation for debugging. If compilation fails, an error message will indicate the issue.

## Gradient Verification

Use `.gradcheck()` to verify that autograd gradients are correct.

### `.gradcheck(fn;inputs)` - Verify Gradients

Verifies autograd gradients against numeric gradients:

```klong
f::{x^2}
.gradcheck(f;3.0)               :" Returns 1 if correct

g::{+/x^2}
.gradcheck(g;[1.0 2.0 3.0])     :" Returns 1
```

This uses `torch.autograd.gradcheck` internally for rigorous verification.

Use cases:

- Verifying custom gradient implementations
- Debugging gradient computation issues
- Ensuring numerical stability

## Troubleshooting

### "PyTorch backend does not support object dtype"

The torch backend cannot handle mixed-type arrays or nested structures with varying shapes. Use the numpy backend for these cases.

### MPS float64 errors

MPS doesn't support float64. The backend automatically converts to float32, but some precision-sensitive operations may behave differently.

### Slow small array operations

For arrays <10K elements, numpy may be faster. Consider using the numpy backend for small array workloads, or batch small operations together.

### torch.compile errors

If `.compile()` fails with C++ errors, ensure you have:

- A C++ compiler installed (`clang++` or `g++`)
- The required header files (may need Xcode Command Line Tools on macOS)
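A quick way to check the first item is to look for a compiler on `PATH`. This is a minimal sketch using only the Python standard library. `find_cxx_compiler` is a hypothetical helper, and finding a binary does not guarantee that PyTorch will pick that toolchain or that the required headers are present.

```python
import shutil


def find_cxx_compiler(candidates=("clang++", "g++")):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


compiler = find_cxx_compiler()
if compiler is None:
    print("No C++ compiler found on PATH; the eager backend avoids compilation.")
else:
    print(f"Found C++ compiler: {compiler}")
```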