{"slug": "a-brief-introduction-to-q-and-kdb-x", "title": "A Brief Introduction to q and KDB-X", "summary": "KDB-X, a high-performance ecosystem built on the q language, offers a concise, dynamically typed programming environment optimized for streaming, real-time, and historical data. The platform eliminates multi-layer architecture overhead by integrating application logic and data, and provides a minimalist syntax derived from APL for efficient data manipulation.", "body_md": "# A Brief Introduction to q and KDB-X\n\n*Welcome! This page introduces the basics of q and KDB‑X. Through practical examples, you'll learn how to create data, run queries, and understand the core principles of q's concise syntax and high‑performance design. No prior knowledge is required. You'll pick up the essentials as you work through the exercises.*\n\nKDB-X is a powerful ecosystem built on top of **q**. The q language is a concise, expressive, dynamically typed, interpreted programming language with a built-in database engine optimized for streaming, real-time, and historical data. By bringing the application logic and data together, KDB-X eliminates the overhead associated with complex multi-layer architectures.\n\nIf you don't have KDB-X installed yet, follow this quick [install KDB-X guide](../get_started/kdb-x-install.html).\n\n## Launch q\n\nIn your terminal, type `q` to start an interactive session. When the `q)` prompt appears, the interpreter is ready.\n\n```\nq\nKDB-X 5.0.20251113 2025.11.13 Copyright (C) 1993-2025 Kx Systems\n...\n\nq)\n```\n\nNote\n\nThe code examples below are cumulative. 
Each section assumes the variables and state defined in all preceding sections are still in your active q session.\n\n## Standard constructs\n\nLike most languages, q allows you to create scalars, [lists](../how_to/basics/data_structures/lists.html), and [dictionaries](../how_to/basics/data_structures/dictionaries.html), and assign them to variables (using a colon, `:`). Below are some common q commands and their Python equivalents:\n\n**q**\n\n```\nq)n:8       / Assign an integer\nq)n\n8\nq)show PI: 3.14\n3.14\n\nq)b:0b      / A boolean (0b for false, 1b for true)\n/ Create a list 0-4 and reverse it\nq)show l:reverse til 5\n4 3 2 1 0\n\nq)(8; 3.14; (\"Alice\"; \"Bob\"; \"Mike\")) / A nested list\n8\n3.14\n(\"Alice\";\"Bob\";\"Mike\")\n/ Assign several variables at once\nq)(n; friends; (ONE; ; THREE)): (8; (\"Alice\"; \"Bob\"; \"Mike\"); 1 2 3) / pattern matching\n\nq)show contacts:([Alice: \"555-0101\"; Bob: \"555-0723\"; Mike: \"555-6666\"])\nAlice| \"555-0101\"\nBob  | \"555-0723\"\nMike | \"555-6666\"\n```\n\n**Python**\n\n```\n>>> n = 8\n>>> n\n8\n>>> PI = 3.14\n>>> PI\n3.14\n>>> b = False\n\n>>> l = list(reversed(range(5)))\n>>> l\n[4, 3, 2, 1, 0]\n>>> [8, 3.14, [\"Alice\", \"Bob\", \"Mike\"]]\n[8, 3.14, ['Alice', 'Bob', 'Mike']]\n\n>>> n, friends, (ONE, _, THREE) = [8, [\"Alice\", \"Bob\", \"Mike\"], [1,2,3]] # unpacking\n\n>>> contacts = {\"Alice\": \"555-0101\", \"Bob\": \"555-0723\", \"Mike\": \"555-6666\"}\n>>> contacts\n{'Alice': '555-0101', 'Bob': '555-0723', 'Mike': '555-6666'}\n```\n\nYou can define functions, use execution controls (like if-then-else), and call built-in operators and functions.\n\n**q**\n\n```\nq)callRandomFriend:{f: rand key contacts; \"Calling \", string[f], \" at \", contacts f}\n\nq)callRandomFriend[]\n\"Calling Bob at 555-0723\"\nq)area:{[r] PI*r*r}\nq)if[n<14; show \"I'm a child!\"] / if statement (if returns nothing, so show prints)\n\"I'm a child!\"\n```\n\n**Python**\n\n``` python\n>>> import random\n>>> def callRandomFriend():\n...     
key, value = random.choice(list(contacts.items()))\n...     return f\"Calling {key} at {value}\"\n...\n>>> callRandomFriend()\n'Calling Mike at 555-6666'\n>>> area = lambda r: PI * r * r\n>>> if n < 14:\n...     \"I'm a child!\"\n...\n\"I'm a child!\"\n```\n\nUse `exit 0`, `\\\\`, or Ctrl-D (EOF) to exit a q session.\n\nYou can also put your q commands into a script file and run it from the shell:\n\n```\nq myscript.q\n```\n\nor load it into a running q session:\n\n```\nq)\\l myscript.q\n```\n\n## The beauty of q\n\nThe following sections highlight what makes q distinctive.\n\n### Minimalist syntax (no noise)\n\nq descends from APL (A Programming Language), a language rooted in mathematical notation. In q, lists, dictionaries, and functions are all **mappings**: a unified concept that lets the same square-bracket notation work for all three.\n\n**q**\n\n```\nq)l[2]              / Indexing a list\n2\nq)contacts[`Alice]  / Looking up a dictionary by a key\n\"555-0101\"\nq)area[5]           / Applying a function\n78.5\n```\n\n**Python**\n\n```\n>>> l[2]\n2\n>>> contacts[\"Alice\"]\n'555-0101'\n>>> area(5)\n78.5\n```\n\nThis is polymorphism at its most fundamental level. To further reduce \"noise\", q lets you omit the brackets when indexing or applying a function to a single argument, and uses whitespace to separate list items.\n\n```\nq)l 2    / Equivalent to l[2]\n2\nq)4 1 7  / A list of integers (no commas or parentheses needed)\n4 1 7\n```\n\nIn mathematics, function parameters are conventionally named `x`, `y`, and `z`. If you omit the parameter declaration, q implicitly treats `x`, `y`, and `z` as the first, second, and third parameters, **in that order**:\n\n```\nq)manhattan:{sum abs x-y}\nq)manhattan[1 2 3; 3 2 1]\n4\n```\n\nReducing boilerplate code is a basic principle in q.\n\n### Right-to-left evaluation\n\nUnlike most languages, q has **no operator precedence**. 
Expressions are evaluated strictly from **right to left**.\n\n```\nq)2*1+3 / 1+3 is 4, then 2*4\n8\nq)3+2>1 / 2>1 is 1b (true), which adds as 1\n4\n```\n\nYou can use parentheses to override this order, but to keep the code clean, q developers often simply rearrange the expression:\n\n```\nq)3+2*1     / Instead of (2*1)+3\n5\nq)1<3+2     / Instead of (3+2)>1\n1b\n```\n\nThis encourages **linear thinking**: you chain operations together, much like a Linux pipe, except that data is processed from right to left.\n\n### Vector operations\n\nq is a **vector programming language**. Most operators work on entire lists automatically, without explicit loops (such as `for` loops or list comprehensions in Python).\n\n**q**\n\n```\nq)show l:reverse til 5\n4 3 2 1 0\n\nq)2*l     / Scalar multiplication across a list\n8 6 4 2 0\nq)l 3 0   / Indexing by a list\n1 4\n/ Adding two lists element-wise, recursively\nq)(1; 2; 3 4) + (10; 20; 30 40)\n11\n22\n33 44\n```\n\n**Python**\n\n```\n>>> l = list(reversed(range(5)))\n>>> l\n[4, 3, 2, 1, 0]\n\n>>> [2 * x for x in l]\n[8, 6, 4, 2, 0]\n>>> [l[i] for i in [3, 0]]\n[1, 4]\n\n>>> def add_lists_recursive(list1, list2):\n...    result = []\n...    for a, b in zip(list1, list2):\n...        if isinstance(a, list) and isinstance(b, list):\n...            result.append(add_lists_recursive(a, b))\n...        else:\n...            result.append(a + b)\n...    return result\n...\n>>> add_lists_recursive([1, 2, [3, 4]], [10, 20, [30, 40]])\n[11, 22, [33, 44]]\n```\n\n**NumPy**\n\n``` python\n>>> import numpy as np\n>>> l = np.arange(5)[::-1]\n>>> l\narray([4, 3, 2, 1, 0])\n>>> 2 * l\narray([8, 6, 4, 2, 0])\n>>> l[[3, 0]]\narray([1, 4])\n\n>>> np.array([1, 2, np.array([3, 4])], dtype=object) + np.array([10, 20, np.array([30, 40])], dtype=object)\narray([11, 22, array([33, 44])], dtype=object)\n```\n\n### Functional programming\n\nThe q language also treats functions as first-class citizens. 
You can pass and return functions like any other data type.\n\n**q**\n\n```\nq)manhattan:{sum abs x-y}\nq)euclidean:{sqrt sum (x-y)*x-y}\n\nq)logDistance:{[x;y;distance] \"The distance is: \", string distance[x;y]}\nq)logDistance[1 2 3; 4 2 -1; euclidean]\n\"The distance is: 5\"\n```\n\n**Python**\n\n``` python\n>>> import math\n>>> manhattan = lambda x, y: sum(abs(xi - yi) for xi, yi in zip(x, y))\n>>> euclidean = lambda x, y: math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))\n>>> log_distance = lambda x, y, distance: f\"The distance is: {distance(x, y)}\"\n>>> log_distance([1, 2, 3], [4, 2, -1], euclidean)\n'The distance is: 5.0'\n```\n\nHigher-order functions, which q calls [iterators](../ref/iterators/index.html), make complex data manipulation extremely concise.\n\n**q**\n\n```\nq)count each (1 2; 5 4 3; til 20)   / Apply 'count' to each sub-list\n2 3 20\nq)add: {x+y}\nq)/ Cumulative sum, `(+) scan 1 2 3` also works:\nq)add scan 1 2 3            / or simply use `sums 1 2 3`\n1 3 6\n```\n\n**Python**\n\n```\n>>> list(map(len, [[1, 2], [5, 4, 3], range(20)]))\n[2, 3, 20]\n>>> add = lambda x, y: x+y\n>>> from itertools import accumulate\n>>> list(accumulate([1, 2, 3], add))\n[1, 3, 6]\n```\n\n### Interned strings: symbols\n\nSymbols are atomic entities written with a leading backtick (for example, `` `AAPL ``). Internally, q stores symbols as indices into a lookup table (a process called **interning**). This makes comparing two symbols very fast, for example when checking whether a ticker in a billion-row table matches `` `AAPL ``: the computer only has to compare two integers rather than checking every letter in a word.\n\n```\nq)friends:`Alice`Bob`Mike   / List of symbols\nq)friends?`Mike             / Reverse lookup: find the index of Mike\n2\n```\n\nNote\n\nSymbols work best for low-cardinality data (tickers, exchange codes, status flags). For high-cardinality data with values that rarely repeat, use [strings](../ref/datatypes.html#strings) instead. 
Each unique symbol is permanently added to the intern table for the lifetime of the q process.\n\n### Extreme terseness\n\nThe flip side of q's brevity is heavy **overloading** of symbols: q developers value minimal keystrokes, so a single symbol often has several meanings. For example, the `?` symbol can perform ten different operations depending on its arguments. In the previous section, you saw it perform a reverse lookup. The examples below show three other uses related to random number generation, called [roll, deal, and permute](../ref/deal.html):\n\n**q**\n\n```\nq)rand 10\n9\n\nq)4?10          / Roll: four random integers below 10 (repeats allowed)\n4 5 4 2\nq)show l:-4?10  / Deal: four random integers below 10, without duplicates\n6 0 8 5\n\nq)0N?l          / Permute: shuffle the items of l\n8 6 0 5\n```\n\n**Python**\n\n``` python\n>>> import random\n>>> random.randint(0,9)\n9\n>>> [random.randint(0, 9) for _ in range(4)]\n[4, 5, 4, 2]\n>>> l=random.sample(range(0, 10), 4)\n>>> l\n[6, 0, 8, 5]\n>>> random.sample(l, len(l))\n[8, 6, 0, 5]\n```\n\n## Tables\n\nTables are first-class citizens in q, which means they are a primary data type, just like integers or lists. You can think of a table from two different perspectives:\n\n- **A list of rows**: all rows are dictionaries with the same keys.\n- **A list of columns**: each column is a named list, and all columns have the same length.\n\nWhile you can interact with a table as a list of rows, q stores tables internally as a **list of columns**. This columnar structure is the secret to q's performance advantage in data analysis.\n\nWhere helpful, pandas equivalents are shown alongside the q snippets.\n\n### Creating tables\n\nIn q, a dictionary is a mapping formed by two equal-length lists. 
A list of dictionaries forms a table when all dictionaries share the same keys:\n\n**q**\n\n```\nq)(([name: `Alice; phone: \"555-0101\"; age: 23]); ([name: `Bob; phone: \"555-0723\"; age: 32]); ([name: `Mike; phone: \"555-6666\"; age: 22]))\nname  phone      age\n--------------------\nAlice \"555-0101\" 23\nBob   \"555-0723\" 32\nMike  \"555-6666\" 22\n```\n\n**Pandas**\n\n```\n>>> import pandas as pd\n>>> pd.DataFrame(data = [\n...     {\"name\": \"Alice\", \"phone\": \"555-0101\", \"age\": 23},\n...     {\"name\": \"Bob\",   \"phone\": \"555-0723\", \"age\": 32},\n...     {\"name\": \"Mike\",  \"phone\": \"555-6666\", \"age\": 22}\n...     ])\n    name     phone  age\n0  Alice  555-0101   23\n1    Bob  555-0723   32\n2   Mike  555-6666   22\n```\n\nYou can create a simple table by defining its columns directly using the `([] ...)` syntax:\n\n**q**\n\n```\nq)show t:([] name: `Alice`Bob`Mike; phone: (\"555-0101\"; \"555-0723\"; \"555-6666\"); age: 23 32 22)\nname  phone      age\n--------------------\nAlice \"555-0101\" 23\nBob   \"555-0723\" 32\nMike  \"555-6666\" 22\n```\n\n**Pandas**\n\n```\n>>> t = pd.DataFrame({\n...     \"name\": [\"Alice\", \"Bob\", \"Mike\"],\n...     \"phone\": [\"555-0101\", \"555-0723\", \"555-6666\"],\n...     \"age\": [23, 32, 22]\n... })\n>>> t\n    name     phone  age\n0  Alice  555-0101   23\n1    Bob  555-0723   32\n2   Mike  555-6666   22\n```\n\nBecause tables are integrated into the language, you can manipulate them with standard list and dictionary syntax:\n\n**q**\n\n```\nq)t 1           / Get the second row\nname | `Bob\nphone| \"555-0723\"\nage  | 32\n\nq)avg t`age     / Get the average of the age column\n25.66667\n```\n\n**Pandas**\n\n```\n>>> t.iloc[1]\nname          Bob\nphone    555-0723\nage            32\nName: 1, dtype: object\n>>> t[\"age\"].mean()\nnp.float64(25.666666666666668)\n```\n\nTables can be keyed by one or more columns. A [keyed table](../ref/table.html#keyed-tables) is a dictionary that maps key records to value records. 
You can look up rows by key values:\n\n```\nq)kt: `name xkey t\nq)kt `Bob\nphone| \"555-0723\"\nage  | 32\n```\n\n## q-sql\n\nAlongside its functional programming model, q includes a built-in query language called q-sql. It looks similar to SQL but is more expressive and follows q's right-to-left evaluation rules.\n\nThe following examples use synthetic capital markets data generated by the [KDB-X datagen](../modules/datagen/overview.html) module:\n\n**Modules**\n\nKDB-X also supports [modules](../modules/module-index.html), a new feature that provides a native packaging and encapsulation mechanism for q code. You load modules directly into your q session using the `use` keyword.\n\n```\nq)([getInMemoryTables]): use `kx.datagen.capmkts    / Load the module\nq)(trade; quote; ; master; exnames): getInMemoryTables[]\nq)trade\nsym  time                 price size stop cond ex\n-------------------------------------------------\nSOFI 0D09:30:01.180477706 214   36   0    K\nAMZN 0D09:30:01.490170061 92.11 90   1    T    A\nSNAP 0D09:30:02.534750053 9     74   0    T\nSNAP 0D09:30:05.617603533 9     84   0    L\nTSLA 0D09:30:06.389750220 62.97 62   0    Z\nPEP  0D09:30:08.910057414 22    23   0    U    Y\n..\nq)count quote / number of rows\n13497\n```\n\nNote\n\nFor brevity, this document displays only a limited number of rows. You can set the console size with [`\\c`](../ref/syscmds.html#c-console-size).\n\nIn q-sql, you don't need `SELECT *`. 
If you don't specify columns, q assumes you want all of them.\n\n```\nq)select from trade where size > 90\nsym  time                 price size stop cond ex\n-------------------------------------------------\nTXN  0D09:30:18.828937844 18.02 99   0    9\nGOOG 0D09:30:22.425490937 72.02 92   0    P    M\nT    0D09:30:40.218699347 18.01 97   0\nXPEV 0D09:33:31.365513849 6.01  99   0\nT    0D09:33:37.277742547 18.03 93   0    X\nXPEV 0D09:35:00.264738568 6.01  92   0    9\nSBUX 0D09:36:32.798154308 5.03  98   0    M\nHPQ  0D09:36:37.699847666 36.17 98   0    I    N\n..\n```\n\nThe real power of q-sql appears when you combine it with q's vector capabilities. For example, you can calculate total volume by exchange:\n\n**q**\n\n```\nq)select sum size by ex from trade\nex| size\n--| -----\n  | 21579\nA | 2512\nB | 2191\nC | 2482\nD | 3227\nI | 2811\n..\n```\n\n**Pandas**\n\n```\n>>> trade.groupby('ex')[['size']].sum()\n     size\nex\n    21579\nA    2512\nB    2191\nC    2482\nD    3227\nI    2811\n..\n```\n\nBecause q handles dictionaries and vectors natively, you can perform joins inline without complex syntax. 
In this example, the `exnames` dictionary maps exchange IDs directly to their full names:\n\n**q**\n\n```\nq)exnames `A`B  / Indexing a dictionary by a list\n\"NYSE American\"\n\"NASDAQ OMX BX\"\nq)select sum size by exnames ex from trade\nex                              | size\n--------------------------------| -----\n\"\"                              | 21579\n\"Cboe BYX Exchange\"             | 1796\n\"Cboe BZX Exchange\"             | 2320\n\"Cboe EDGA Exchange\"            | 2368\n\"Cboe EDGX Exchange\"            | 3097\n\"Chicago Broad Options Exchange\"| 2551\n\"Chicago Stock Exchange\"        | 2203\n..\n```\n\n**Pandas**\n\n```\n>>> [exnames[k] for k in [\"A\", \"B\"]]\n[b'NYSE American', b'NASDAQ OMX BX']\n\n>>> trade.groupby(trade['ex'].map(exnames))[['size']].sum()\n                                       size\nex\nb'Cboe BYX Exchange'                   1796\nb'Cboe BZX Exchange'                   2320\nb'Cboe EDGA Exchange'                  2368\nb'Cboe EDGX Exchange'                  3097\nb'Chicago Broad Options Exchange'      2551\nb'Chicago Stock Exchange'              2203\n```\n\nThis demonstrates q's \"zero noise\" principle. In SQL, this would require a formal JOIN statement; in q, it is a simple dictionary lookup applied across a vector.\n\nIn practice, business logic can be highly complex. q-sql lets you leverage the full expressiveness of q to implement sophisticated analyses concisely. The following statement creates a new column, `pricegroup`, that assigns price‑group identifiers within each symbol. 
Consecutive trades in the same symbol at the same price belong to the same price group.\n\n```\nq)update pricegroup: sums differ price by sym from select from trade where sym in `SNAP`SOFI\nsym  time                 price  size stop cond ex pricegroup\n-------------------------------------------------------------\nSOFI 0D09:30:01.180477706 214    36   0    K       1\nSNAP 0D09:30:02.534750053 9      74   0    T       1\nSNAP 0D09:30:05.617603533 9      84   0    L       1\nSOFI 0D09:31:10.843041058 214.26 46   0    9       2\nSNAP 0D09:32:11.259991414 9.01   36   0    4       2\nSOFI 0D09:33:35.131385974 214.46 68   0    5       3\n```\n\n**Price group query explanation**\n\n[`differ`](../ref/differ.html) returns a boolean list flagging each position where the value differs from its predecessor (the first item is always flagged):\n\n```\nq)differ 9 9 9.01 9.01 9 9.02\n101011b\n```\n\n[`sums`](../ref/sum.html#sums) accumulates these flags (booleans count as 0 and 1 in arithmetic), yielding a running group counter that increments at each transition:\n\n```\nq)sums differ 9 9 9.01 9.01 9 9.02\n1 1 2 2 3 4i\n```\n\nThe `update ... 
by sym` clause processes each symbol's rows independently, so the counter restarts at 1 for every symbol.\n\nThe expressiveness of q-sql makes complex calculations both readable and manageable.\n\n### Interfaces\n\nFor anyone coming from a traditional database background, KDB-X also provides a [standard SQL](../modules/sql/quickstart.html) interface:\n\n```\nq).s.init[]         / initialize SQL interface\nq)s)SELECT * FROM trade WHERE size > 90     / use 's)' prefix for SQL\nsym  time                 price size stop cond ex\n-------------------------------------------------\nTXN  0D09:30:18.828937844 18.02 99   0    9\nGOOG 0D09:30:22.425490937 72.02 92   0    P    M\nT    0D09:30:40.218699347 18.01 97   0\nXPEV 0D09:33:31.365513849 6.01  99   0\nT    0D09:33:37.277742547 18.03 93   0    X\nXPEV 0D09:35:00.264738568 6.01  92   0    9\nSBUX 0D09:36:32.798154308 5.03  98   0    M\nHPQ  0D09:36:37.699847666 36.17 98   0    I    N\n..\n```\n\nIf you're familiar with Python, [KDB-X Python](kdb-x-python-overview.html) is a great place to start. It runs q inside your Python process, so you can work with q data using familiar Python syntax.\n\n``` python\n>>> import pykx as kx\n>>> # load the data\n>>> trade.select(columns=kx.Column(\"size\").sum(), by=\"ex\")\npykx.KeyedTable(pykx.q('\nex| size\n--| -----\n  | 21579\nA | 2512\nB | 2191\nC | 2482\nD | 3227\nI | 2811\n..\n```\n\n### Time-series support\n\nq was built for time-series data. It treats temporal types (dates, times, timestamps, timespans) natively. You can cast temporal values on the fly, either explicitly or with dot notation. 
For instance, the query below uses `time.minute` to group data by minute and [`within`](../ref/within.html) to restrict it to a time interval:\n\n**q**\n\n```\n/ Average mid-price for TSLA between 1 PM and 2 PM, grouped by minute\nq)select avgMid: avg (bid + ask)%2 by time.minute from quote where sym=`TSLA, time within 13:00 14:00\ntime | avgMid\n-----| --------\n13:00| 64.4125\n13:03| 64.66\n13:04| 64.4875\n13:07| 64.3425\n13:08| 64.64833\n13:09| 64.32\n..\n```\n\n**Pandas**\n\n```\n>>> quote.loc[\n...     (quote['sym'] == 'TSLA') &\n...     (quote['time'].between(pd.to_timedelta('13:00:00'), pd.to_timedelta('14:00:00')))] \\\n...     .assign(avgMid=(quote['bid'] + quote['ask']) / 2) \\\n...     .groupby(quote['time'].dt.floor('min'))[['avgMid']].mean()\n                    avgMid\ntime\n0 days 13:00:00  64.412500\n0 days 13:03:00  64.660000\n0 days 13:04:00  64.487500\n0 days 13:07:00  64.342500\n0 days 13:08:00  64.648333\n0 days 13:09:00  64.320000\n```\n\n### Joins\n\nq supports standard relational [joins](../ref/joins.html) such as **left join** (`lj`) and **inner join** (`ij`), but it is best known for its specialized temporal joins.\n\nTo join metadata (such as company descriptions) from `master` to `trade`, the following example uses `sym`, the key column of the `master` table:\n\n**q**\n\n```\nq)trade lj master\nsym  time                 price size stop cond ex description                 issueprice\n----------------------------------------------------------------------------------------\nSOFI 0D09:30:01.180477706 214   36   0    K       SoFi Technologies, Inc.     214\nAMZN 0D09:30:01.490170061 92.11 90   1    T    A  Amazon.com, Inc.            92\nSNAP 0D09:30:02.534750053 9     74   0    T       Snap Inc.                   9\nSNAP 0D09:30:05.617603533 9     84   0    L       Snap Inc.                   9\nTSLA 0D09:30:06.389750220 62.97 62   0    Z       Tesla, Inc.                 
63\n..\n```\n\n**Pandas**\n\n```\n>>> trade.join(master, on=\"sym\")\n       sym                      time  ...              description  issueprice\n0     SOFI 0 days 09:30:01.180477706  ...  SoFi Technologies, Inc.         214\n1     AMZN 0 days 09:30:01.490170061  ...         Amazon.com, Inc.          92\n2     SNAP 0 days 09:30:02.534750053  ...                Snap Inc.           9\n...    ...                       ...  ...                      ...         ...\n1288   AIG 0 days 15:59:59.316044754  ...  AMERICAN INTL GROUP INC          27\n1289  TSLA 0 days 15:59:59.652057702  ...              Tesla, Inc.          63\n1290  AAPL 0 days 15:59:59.808157553  ...        APPLE INC COM STK          84\n\n[1291 rows x 9 columns]\n```\n\nYou can run queries on the joined table:\n\n**q**\n\n```\nq)select open: first price, close: last price by description from trade lj master\ndescription                | open  close\n---------------------------| -----------\nADVANCED MICRO DEVICES     | 33.05 34.62\nAMERICAN INTL GROUP INC    | 27.03 28.9\nAPPLE INC COM STK          | 84.1  86.92\nAT&T Inc.                  | 18.01 19.06\n..\n```\n\n**Pandas**\n\n```\n>>> trade.join(master, on=\"sym\") \\\n... .groupby(\"description\")[\"price\"].agg(open='first', close='last')\n                               open   close\ndescription\nADVANCED MICRO DEVICES        33.05   34.62\nAMERICAN INTL GROUP INC       27.03   28.90\nAPPLE INC COM STK             84.10   86.92\nAT&T Inc.                     18.01   19.06\n..\n```\n\nIn financial data, trades and quotes rarely happen at the exact same time (q supports nanosecond precision). An **as-of join** aligns two tables by finding the \"prevailing\" value. 
For every trade, `aj` finds the most recent quote for the same symbol that occurred **at or before** that trade's time:\n\n**q**\n\n```\n/ Matches each trade with the symbol's quote valid at that moment\nq)aj[`sym`time; trade; quote]\nsym  time                 price size stop cond ex bid    ask    bsize asize mode\n--------------------------------------------------------------------------------\nSOFI 0D09:30:01.180477706 214   36   0    K       213.37 214.45 13    39    Q\nAMZN 0D09:30:01.490170061 92.11 90   1    T    A  91.56  92.14  17    32    E\nSNAP 0D09:30:02.534750053 9     74   0    T       8.44   9.04   18    91    M\nSNAP 0D09:30:05.617603533 9     84   0    L       8.17   9.66   80    68    4\n..\n```\n\n**Pandas**\n\n```\n>>> pd.merge_asof(trade, quote, on='time', by='sym')\n       sym                      time   price  size   stop  cond ex_x     bid     ask  bsize  asize  mode ex_y\n0     SOFI 0 days 09:30:01.180477706  214.00    36  False  b'K'       213.37  214.45     13     39  b'Q'\n1     AMZN 0 days 09:30:01.490170061   92.11    90   True  b'T'    A   91.56   92.14     17     32  b'E'    A\n2     SNAP 0 days 09:30:02.534750053    9.00    74  False  b'T'         8.44    9.04     18     91  b'M'\n...    ...                       ...     ...   ...    ...   ...  ...     ...     ...    ...    ...   ...  ...\n1288   AIG 0 days 15:59:59.316044754   28.90    69  False  b'T'    C   28.63   29.28     60     81  b'M'    C\n1289  TSLA 0 days 15:59:59.652057702   65.83    44  False  b'Q'        65.24   66.22     30     26  b'4'\n1290  AAPL 0 days 15:59:59.808157553   86.92    64  False  b'R'    D   86.82   87.90     86     76  b'B'    D\n\n[1291 rows x 13 columns]\n```\n\nA [window join](../ref/wj.html) is a powerful generalization of the as-of join. 
Instead of taking just the last value, it looks at a **window of time** around each record and performs an aggregation (like an average or max).\n\nExample: for each trade, calculate the size-weighted average ask and bid prices (a VWAP-style measure) of the quotes in a window starting 1 minute before and ending 5 seconds after the trade:\n\n```\nq)wj[-00:01 00:00:05+\\:trade.time; `sym`time; trade; (quote; (wavg;`asize;`ask); (wavg;`bsize;`bid))]\nsym  time                 price size stop cond ex ask      bid\n-------------------------------------------------------------------\nSOFI 0D09:30:01.180477706 214   36   0    K       66.43636 65.54799\nAMZN 0D09:30:01.490170061 92.11 90   1    T    A  65.21634 51.99918\nSNAP 0D09:30:02.534750053 9     74   0    T       57.21473 52.7337\nSNAP 0D09:30:05.617603533 9     84   0    L       50.89472 52.93455\n..\n```\n\n`wj` expects the quote table to be sorted by `sym` and then by `time` within each symbol. If `quote` is sorted by time only, sort it first (for example, `` `sym`time xasc quote ``). Otherwise the window aggregates can be wrong.\n\n**Each Left**\n\n`\\:` is the [Each Left](../ref/maps.html#each-left-and-each-right) iterator. It applies the function to each element on the left, holding the right argument fixed. When the right argument is a list, the result is a nested list, with one row per left element:\n\n```\nq)1 2 +\\: 100 200 300\n101 201 301\n102 202 302\n```\n\nThe counterpart of Each Left is Each Right, `/:`. A handy mnemonic: the backslash in `\\:` leans left, and the slash in `/:` leans right.\n\n### Foreign keys\n\nYou can link tables dynamically in a query using join operators, or define the relationship statically. In q, this static relationship is called a [foreign key](../how_to/interact_with_databases/foreign-keys.html), which functions similarly to foreign keys in traditional relational databases.\n\nIn the previous left join example, you linked the `trade` and `master` tables on the fly using the `sym` column. 
You can make this relationship permanent by \"casting\" the `sym` column in the `trade` table to the `master` table (which is keyed on `sym`):\n\n```\nq)update `master$sym from `trade\n`trade\n```\n\nNote\n\nThe backtick before the table name (`` `trade ``) means the update happens in place, modifying the actual table rather than returning a new copy.\n\nOnce a foreign key is established, you no longer need to perform explicit joins to access information from the parent table. You can use dot notation to \"reach through\" the link.\n\nIn the query below, notice how you access the `description` column of the `master` table through `sym.description`:\n\n**q**\n\n```\nq)select o: first price, c: last price by sym.description from trade\ndescription                | o      c\n---------------------------| -------------\nADVANCED MICRO DEVICES     | 33.01  34.35\nAMERICAN INTL GROUP INC    | 27.02  27.5\nAPPLE INC COM STK          | 83.99  88.1\nAT&T Inc.                  | 18.02  18.72\n```\n\n**SQL**\n\n```\nq)s)SELECT description, FIRST(trade.price) AS o, LAST(trade.price) AS c FROM trade JOIN master ON trade.sym = master.sym GROUP BY description;\ndescription                 o      c\n-----------------------------------------\nADVANCED MICRO DEVICES      33.05  34.62\nAMERICAN INTL GROUP INC     27.03  28.9\nAPPLE INC COM STK           84.1   86.92\nAT&T Inc.                   18.01  19.06\n```\n\nBeyond **cleaner syntax**, foreign keys offer two further advantages. Queries are **faster**, because q does not need to recompute the mapping between the tables at query time. For the same reason, queries have a **smaller memory footprint**.\n\n## Persistence\n\nSo far, this tutorial has worked exclusively with in-memory objects. If you close your q session, these objects vanish. 
To keep your data, use the `set` function to persist it to disk.\n\n### Simple persistence\n\nIn q, file paths are represented as symbols that begin with a colon (for example, `` `:kdbdata ``). You can save any q object, such as a dictionary, a table, or even a function, directly to a file.\n\n```\nq)contacts:([Alice: \"555-0101\"; Bob: \"555-0723\"; Mike: \"555-6666\"])\nq)`:kdbdata/contacts set contacts   / Save a dictionary\n`:kdbdata/contacts\nq)`:kdbdata/callRandomFriend set {f: rand key contacts; \"Calling \", string[f], \" at \", contacts f}\n`:kdbdata/callRandomFriend\nq)t: ([] name: `Alice`Bob`Mike; phone: (\"555-0101\"; \"555-0723\"; \"555-6666\"); age: 23 32 22)\nq)`:kdbdata/t set t                 / Save a table\n`:kdbdata/t\n```\n\nThese objects are saved in a high-performance binary format. From a new q session, you can bring them back using `get`:\n\n```\nq)get `:kdbdata/contacts\nAlice| \"555-0101\"\nBob  | \"555-0723\"\nMike | \"555-6666\"\nq)get `:kdbdata/t\nname  phone      age\n--------------------\nAlice \"555-0101\" 23\nBob   \"555-0723\" 32\nMike  \"555-6666\" 22\n```\n\nIf a directory contains multiple kdb+ files, you can load the entire directory at once using the `\\l` command. This automatically assigns the file names as variable names in your session:\n\n```\nq)\\l kdbdata    / Load everything in the 'kdbdata' folder\nq)contacts      / 'contacts' is now available in the workspace\nAlice| \"555-0101\"\nBob  | \"555-0723\"\nMike | \"555-6666\"\nq)callRandomFriend[]\n\"Calling Alice at 555-0101\"\nq)t\nname  phone      age\n--------------------\nAlice \"555-0101\" 23\nBob   \"555-0723\" 32\nMike  \"555-6666\" 22\n```\n\n### Scaling up: splaying and partitioning\n\nWhile the approach above is fine for small objects, it has an important limitation: it copies the entire file into your RAM (except for files that contain a simple, uniform-type list). 
For analysts working with gigabytes or terabytes of data, this isn't feasible.\n\nFor better performance, you can \"splay\" a table, meaning q saves each column as a separate file. This enables columnar I/O: if you only want to calculate the average price, q reads only the `price` file and ignores `size`, `time`, and `ex`.\n\nTo handle massive datasets, tables are also divided into **partitions**, typically by `date`.\n\nThis example uses the `datagen` module to build a multi-day, partitioned database on disk:\n\n```\nq)([getInMemoryTables; buildPersistedDB]): use `kx.datagen.capmkts  / Load the module\nq)buildPersistedDB[\"/tmp/kdbdb\"; 10000; ([start: 2026.02.01; end: 2026.02.02])]\n```\n\nIf you look at the file system, you should see a clean, hierarchical structure:\n\n``` bash\n$ tree /tmp/kdbdb\n/tmp/kdbdb\n├── 2026.02.01\n│   ├── quote\n│   │   ├── asize\n│   │   ├── ask\n│   │   ├── bid\n│   │   ├── bsize\n│   │   ├── ex\n│   │   ├── mode\n│   │   ├── sym\n│   │   └── time\n│   └── trade\n│       ├── cond\n│       ├── ex\n│       ├── price\n│       ├── size\n│       ├── stop\n│       ├── sym\n│       └── time\n├── 2026.02.02\n│   ├── quote\n│   │   ├── asize\n│   │   ├── ask\n│   │   ├── bid\n│   │   ├── bsize\n│   │   ├── ex\n│   │   ├── mode\n│   │   ├── sym\n│   │   └── time\n│   └── trade\n│       ├── cond\n│       ├── ex\n│       ├── price\n│       ├── size\n│       ├── stop\n│       ├── sym\n│       └── time\n├── daily\n├── exnames\n├── master\n└── sym\n\n4 directories, 25 files\n```\n\nWhen you load a partitioned database with `\\l`, KDB-X does not read the data into memory; instead, it memory-maps it.\n\n**Memory-mapping**\n\n[Memory mapping](https://en.wikipedia.org/wiki/Memory-mapped_file) is a technique that maps on-disk files directly into a process's address space, bypassing the usual copy from disk into process memory buffers. 
This eliminates copying overhead and lets the OS manage data access efficiently through its virtual memory subsystem. For a deeper dive, see the [KX blog on memory mapping](https://kx.com/blog/memory-mapping-in-kdb/).\n\n```\nq)\\l /tmp/kdbdb\n```\n\nYou can run q-sql and SQL queries on the mapped KDB-X database. KDB-X reads only the data your query needs.\n\n```\nq)select sum size by 0D00:10 xbar time from trade where date=last date\ntime                | size\n--------------------| ------\n0D09:30:00.000000000| 105690\n0D09:40:00.000000000| 53574\n0D09:50:00.000000000| 48170\n0D10:00:00.000000000| 41788\n0D10:10:00.000000000| 36279\n..\n```\n\nIn the above query:\n\n- q only looks inside the `2026.02.02` folder (ignoring all other days)\n- q only reads the `size` and `time` files (ignoring `price`, `ex`, etc.)\n\nThis technique lets you analyze datasets much larger than your physical memory. For example, you can query a 10 TB database on a laptop with 16 GB of RAM if you aggregate the data or request only a subset of columns or dates at a time.\n\n### Open formats[¶](#open-formats)\n\nKDB-X is not limited to its native, highly optimized binary format. It supports a range of open and industry-standard data formats to enable interoperability with the broader data ecosystem.\n\n**KDB-X natively supports Parquet**, one of the most widely adopted open columnar formats. You can run q-sql queries directly against Parquet files without any conversion step. Furthermore, **virtual tables** allow you to mix Parquet and kdb+ data in the same query, providing a unified q-sql interface regardless of the underlying storage format.\n\nFor broader ecosystem integration, the **KX Fusion libraries** provide connectors to other open formats, including Apache Arrow, Avro, and HDF5, among others.\n\n## Performance[¶](#performance)\n\nKDB-X isn't just a database; it is fundamentally a **vector processing engine**. 
Its performance comes from its ability to treat data as contiguous blocks of memory, allowing it to leverage modern CPU features and massive parallelization.\n\n### Hardware acceleration (SIMD)[¶](#hardware-acceleration-simd)\n\nAt its core, q is optimized for SIMD (Single Instruction, Multiple Data). This allows the CPU to apply the same operation (like addition or multiplication) to multiple data points with a single instruction. When you add two columns in q, the work is not an element-by-element interpreted loop; it runs through the hardware's vector lanes.\n\n### Parallel processing[¶](#parallel-processing)\n\nKDB-X can distribute workloads across multiple CPU cores. By starting your q process with the `-s` flag, you enable secondary threads:\n\n```\nq /tmp/kdbdb -s 4     # Enable 4 secondary threads for parallel execution\n```\n\nWhen you run an aggregation like `sum` or `avg` on a long vector, q automatically [splits the vector into chunks](../ref/mt-primitives.html#peach-vs-implicit-parallelism), processes them in parallel across your cores, and combines the result (a \"map-reduce\" pattern). This also applies to **partitioned data**: KDB-X can scan multiple days of data simultaneously.\n\nFor even larger scales, you can use [segmented databases](../how_to/interact_with_databases/segment.html) to spread data across multiple physical disks. This enables parallel I/O, allowing you to read terabytes of data at the combined throughput of your disks, all without changing a single line of your q-sql code.\n\n### Attributes: the \"secret sauce\"[¶](#attributes-the-secret-sauce)\n\nIn traditional databases, you create indexes. In q, you apply **attributes**. These are metadata labels that tell the q engine about the structure of your data, allowing it to choose the fastest possible algorithm for a query, as these two examples show:\n\n- **Sorted (`s#`)**: Applied to an ordered column such as `time`. 
It enables binary search (\\(O(\\log n)\\)), so lookups avoid scanning the whole column.\n- **Parted (`p#`)**: Typically applied to the main identifier column, such as `sym`, in on-disk databases. It tells q that all identical symbols are stored in contiguous blocks, so q can jump straight to the start of a symbol's data and read it in one burst.\n\nYou can check the attributes of a table using `meta`. The `a` column below shows the parted attribute (`p`) on `sym`:\n\n```\nq)meta trade\nc    | t f a\n-----| -----\ndate | d\nsym  | s   p\ntime | n\nprice| f\nsize | j\nstop | b\ncond | c\nex   | s\n```\n\nBecause `sym` has the parted attribute, a query for specific tickers such as `` select from trade where sym in `AAPL`GOOG `` doesn't need to scan the whole `sym` vector. q knows exactly where the `AAPL` and `GOOG` data start and end on disk. Less I/O means faster queries.\n\n## Acting as a database[¶](#acting-as-a-database)\n\nWhile this guide so far has used q primarily as a standalone analysis tool, its true power lies in its ability to act as a high-performance database server. By specifying a port with the `-p` parameter, you can enable network connectivity:\n\n```\nq /tmp/kdbdb -s 4 -p 5100\n```\n\nOnce the process is listening, anyone with network access to that port can connect to your session and query your data. 
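\n\nThe port does not have to be fixed at startup. The `\\p` system command sets it from inside a running session, which is handy if you started q without `-p`. This has the same effect as the `-p` command-line option. A minimal sketch, assuming port 5100 is free on your machine:\n\n```\nq)\\p 5100    / Start listening on port 5100 from inside the session\nq)\\p         / Display the current listening port\n```\n\n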
Common ways to connect include:\n\n- Another q process\n- A web browser (via built-in HTTP support)\n- VS Code (using the [KDB-X extension](https://code.kx.com/vscode/))\n- Jupyter Notebooks (using [PyKX](kdb-x-python-overview.html))\n- [KX Developer](https://code.kx.com/developer/) or [KX Analyst](https://code.kx.com/analyst/)\n- [KX Dashboard](https://code.kx.com/dashboards/)\n- Third-party IDEs (like [Kdb Studio](https://github.com/finos/kdb-studio))\n\nThe following sections explore the first two options, although they only scratch the surface of what is possible.\n\n### Connect from another q process[¶](#connect-from-another-q-process)\n\nIn a separate terminal, start a second q session. Use `hopen` to open a connection handle to the server:\n\n```\nq)h: hopen 5100     / Opens a connection to localhost:5100. 'h' is our \"handle\".\n```\n\nNow you can send commands through that handle. The simplest way is to pass a query as a string:\n\n```\nq)h \"select nr: count i by sym from trade\"\nsym | nr\n----| ----\nAAPL| 1940\nAIG | 1906\nAMD | 1973\nAMZN| 1934\n..\nq)\n```\n\nSending strings is easy but becomes inconvenient when you need to pass parameters. A cleaner approach is to define a function on the server and have the client call it by sending a list: the function name followed by its arguments.\n\nOn the **Server**:\n\n```\n/ Define a \"Stored Procedure\" to get basic stats for a specific symbol\nq)getTradeStatOf: {[x] select nr: count i, sum size, avgprice: avg price from trade where sym=x}\n```\n\nOn the **Client**:\n\n```\nq)h (`getTradeStatOf; `TSLA)    / Simpler and safer than string manipulation\nnr   size   avgprice\n--------------------\n1914 103341 65.97574\n```\n\n### Connect from a web browser[¶](#connect-from-a-web-browser)\n\nEvery q process started with `-p` is also a lightweight web server. This is useful for quick inspections. 
If you navigate to `http://localhost:5100` in your browser, you can see all the variables currently in memory. Click on a variable to see its content.\n\nYou can even execute queries directly from the URL bar by appending `?` followed by your q code, for example `http://localhost:5100/?select count i by sym from trade`.\n\n### The advantage of a unified architecture[¶](#the-advantage-of-a-unified-architecture)\n\nTraditional enterprise architectures suffer from \"impedance mismatch.\" In these systems, data is stored in a relational database while business logic is written in a separate application layer using languages like Java, Rust, or Python. This separation creates significant friction. Substantial engineering effort goes into Object-Relational Mapping (ORM) and data serialization, which do nothing more than translate data from database rows into programming objects. Furthermore, to improve performance, developers often split logic between the two layers using brittle stored procedures, creating a fragile environment that is difficult to synchronize, test, and maintain.\n\nKDB-X eliminates this overhead by providing a unified framework. In the q ecosystem, there is no distinction between the database and the programming language; the table is a native data structure. Business logic lives directly alongside the data, so complex calculations execute where the data resides rather than moving massive datasets across a network to an application server. This proximity minimizes data movement, yielding performance gains that are difficult to achieve in a multi-tier architecture.\n\nThis architectural simplicity translates into a significantly lower Total Cost of Ownership (TCO) for organizations. By collapsing the stack into a single layer, organizations reduce their hardware footprint and simplify their deployment pipelines. Maintenance becomes more straightforward because there is a single environment for both data and logic. 
Ultimately, this allows smaller teams of \"dev-analysts\" to build and support systems that would typically require large, specialized departments in a traditional software stack.\n\n## From language to architecture: kdb+ tick[¶](#from-language-to-architecture-kdb-tick)\n\nEverything described so far (the vector engine, columnar tables, q-sql, memory-mapped partitions, and the database server) forms the q programming language and its runtime. But q is not just a tool for analysts; it is a platform for building production-grade systems. The canonical example is [kdb+ tick](../how_to/manage_streaming_data/architecture.html), the most widely used architecture ever implemented in q.\n\n[Released](https://github.com/KxSystems/kdb-tick) in the early 2000s, kdb+ tick is a complete, production-grade streaming data architecture for capturing, storing, and querying high-frequency time-series data in real time. Its most remarkable feature is its size: the entire system is implemented in just **34 lines of q code**. There is no boilerplate and no scaffolding, only the essential logic required to ingest and publish real-time data.\n\nDespite its brevity, kdb+ tick has been deployed at the majority of the world's leading investment banks and financial institutions for over two decades. It processes **billions of financial events (trades, quotes, order book updates) every trading day**, making it one of the most battle-tested real-time data systems ever built for electronic trading.\n\n### Three processes, one architecture[¶](#three-processes-one-architecture)\n\nkdb+ tick separates responsibility across three specialized q processes, each optimized for a specific function:\n\n- **The Tickerplant (TP)** is a low-latency, high-volume publish-subscribe hub that decouples data publishers from their subscribers.\n- **The Real-Time Database (RDB)** subscribes to the tickerplant and collects today's data entirely in memory. 
New records become queryable within milliseconds, and the columnar in-memory layout lets complex analytical queries over millions of intraday rows run in microseconds.\n- **The Historical Database (HDB)** stores all previous days' data using the splayed, partitioned layout described earlier in the [persistence](#persistence) section. It memory-maps this data rather than loading it into RAM, allowing the system to address **petabytes of historical time-series** while reading only the columns and partitions required for a query.\n\nThe architecture also handles **failure scenarios**. For example, if the RDB process exits unexpectedly (because of hardware faults, operating-system signals, or unbounded queries), it recovers automatically on restart by replaying the day's messages from the tickerplant's log file.\n\nThe HDB scales horizontally to support hundreds of concurrent users through **TCP socket sharding**, a technique built into the q runtime. Because the memory-mapped data is read-only and shared through the operating system's page cache, the system requires no locking and performs no data copying. Increasing capacity is purely a configuration change, not a code change.\n\nkdb+ tick is the clearest demonstration of what the q ecosystem was designed to enable. The language is not just a query tool bolted onto a database; it is a substrate from which entire systems can be composed. The 34 lines of code that implement kdb+ tick have processed more financial data than almost any other software system ever written, precisely because q eliminates everything that does not directly contribute to solving the problem at hand.\n\n## Next steps[¶](#next-steps)\n\n- Read [Q for Mortals](q4m/index.html) if you prefer a book-style introduction with more detail. 
- Explore other [tutorials](tutorials_and_examples.html) to continue your learning journey.", "url": "https://wpnews.pro/news/a-brief-introduction-to-q-and-kdb-x", "canonical_source": "https://code.kx.com/kdb-x/learn/brief-introduction.html", "published_at": "2026-06-18 12:41:09+00:00", "updated_at": "2026-06-18 12:52:38.634378+00:00", "lang": "en", "topics": ["developer-tools", "ai-infrastructure"], "entities": ["KDB-X", "Kx Systems", "q"], "alternates": {"html": "https://wpnews.pro/news/a-brief-introduction-to-q-and-kdb-x", "markdown": "https://wpnews.pro/news/a-brief-introduction-to-q-and-kdb-x.md", "text": "https://wpnews.pro/news/a-brief-introduction-to-q-and-kdb-x.txt", "jsonld": "https://wpnews.pro/news/a-brief-introduction-to-q-and-kdb-x.jsonld"}}