A Brief Introduction to q and KDB-X

KDB-X, a high-performance ecosystem built on the q language, offers a concise, dynamically typed programming environment optimized for streaming, real-time, and historical data. The platform eliminates multi-layer architecture overhead by integrating application logic and data, and provides a minimalist syntax derived from APL for efficient data manipulation.

Welcome

This page introduces the basics of q and KDB-X. Through practical examples, you'll learn how to create data, run queries, and understand the core principles of q's concise syntax and high-performance design. No prior knowledge is required: you'll pick up the essentials as you work through the exercises.

KDB-X is a powerful ecosystem built on top of q. The q language is a concise, expressive, dynamically typed, interpreted programming language with a built-in database engine optimized for streaming, real-time, and historical data. By bringing the application logic and data together, KDB-X eliminates the overhead associated with complex multi-layer architectures.

If you don't have KDB-X installed yet, follow the quick install guide (../get_started/kdb-x-install.html).

Launch q

In your terminal, type q to start an interactive session. When the q) prompt appears, the interpreter is ready.

    $ q
    KDB-X 5.0.20251113 2025.11.13 Copyright (C) 1993-2025 Kx Systems
    ...
    q)

Note: The code examples below are cumulative. Each section assumes the variables and state defined in all preceding sections are still in your active q session.

Standard constructs

Like most languages, q allows you to create scalars, lists (../how_to/basics/data_structures/lists.html), and dictionaries (../how_to/basics/data_structures/dictionaries.html), and assign them to variables using the colon (:).
Below are some common q commands and their Python equivalents:

q:

    q)n:8                     / Assign an integer
    q)n
    8
    q)show PI: 3.14
    3.14
    q)b:0b                    / A boolean (0b for false, 1b for true)
    / Create a list 0-4 and reverse it
    q)show l:reverse til 5
    4 3 2 1 0
    q)(8; 3.14; ("Alice"; "Bob"; "Mike"))   / A nested list
    8
    3.14
    ("Alice";"Bob";"Mike")
    / Assign to multiple values
    q)(n; friends; (ONE; ; THREE)): (8; ("Alice"; "Bob"; "Mike"); 1 2 3)   / pattern matching
    q)show contacts: ([Alice: "555-0101"; Bob: "555-0723"; Mike: "555-6666"])
    Alice| "555-0101"
    Bob  | "555-0723"
    Mike | "555-6666"

Python:

    >>> n = 8
    >>> n
    8
    >>> PI = 3.14
    >>> PI
    3.14
    >>> b = False
    >>> l = list(reversed(range(5)))
    >>> l
    [4, 3, 2, 1, 0]
    >>> [8, 3.14, ["Alice", "Bob", "Mike"]]
    [8, 3.14, ['Alice', 'Bob', 'Mike']]
    >>> n, friends, (ONE, _, THREE) = 8, ["Alice", "Bob", "Mike"], (1,2,3)  # unpacking
    >>> contacts = {"Alice": "555-0101", "Bob": "555-0723", "Mike": "555-6666"}
    >>> contacts
    {'Alice': '555-0101', 'Bob': '555-0723', 'Mike': '555-6666'}

You can define functions, use execution controls like if-then-else, and call built-in operators and functions.

q:

    q)callRandomFriend:{f: rand key contacts; "Calling ", (string f), " at ", contacts f}
    q)callRandomFriend[]
    "Calling Bob at 555-0723"
    q)area:{[r] PI*r*r}
    q)if[n<14; show "I'm a child"]   / if statement; show prints the string
    "I'm a child"

Python:

    >>> import random
    >>> def callRandomFriend():
    ...     key, value = random.choice(list(contacts.items()))
    ...     return f"Calling {key} at {value}"
    ...
    >>> callRandomFriend()
    'Calling Mike at 555-6666'
    >>> area = lambda r: PI * r * r
    >>> if n < 14:
    ...     "I'm a child"
    ...
    "I'm a child"

Use exit 0, \\ or Ctrl-D (that is, EOF) to exit a q session. You can put your q commands into a text file and run it:

    $ q myscript.q

or load it into your q session:

    q)\l myscript.q

The beauty of q

The following sections highlight what makes q distinctive.

Minimalist syntax (no noise)

q descends from APL (A Programming Language), a language rooted in mathematical notation.
In q, lists, dictionaries, and functions are all mappings, a unified concept that means the same square-bracket notation works for all three.

q:

    q)l[2]               / Indexing a list
    2
    q)contacts[`Alice]   / Looking up a dictionary by a key
    "555-0101"
    q)area[5]            / Applying a function
    78.5

Python:

    >>> l[2]
    2
    >>> contacts["Alice"]
    '555-0101'
    >>> area(5)
    78.5

This is polymorphism at its most fundamental level. To further reduce "noise", q allows you to omit brackets and use whitespace to separate list items.

    q)l 2      / Equivalent to l[2]
    2
    q)4 1 7    / A list of integers (no commas or parentheses needed)
    4 1 7

It is common in mathematics to use function parameters x, y, or z. You can omit the parameter declaration and q will understand that you mean x, y, and z (in that order):

    q)manhattan:{sum abs x-y}
    q)manhattan[1 2 3; 3 2 1]
    4

Reducing boilerplate code is a basic principle in q.

Right-to-left evaluation

Unlike most languages, q has no operator precedence. Expressions are evaluated strictly from right to left.

    q)2*1+3    / 1+3 is 4, then 2*4
    8
    q)3+2>1    / True is converted to 1
    4

You can use parentheses to override this order, but to keep the code clean, q developers often simply rearrange the expression:

    q)3+2*1    / Instead of (2*1)+3
    5
    q)1<3+2    / Instead of (3+2)>1
    1b

This encourages linear thinking: you chain operations together, much like a Linux pipe, except that data is processed from right to left.

Vector operations

q is a vector programming language. Most operators work on entire lists automatically, without the need for explicit loops (like for or list comprehension in Python).

q:

    q)show l:reverse til 5
    4 3 2 1 0
    q)2*l       / Scalar multiplication across a list
    8 6 4 2 0
    q)l[3 0]    / Indexing by a list
    1 4
    / Adding two lists element-wise, recursively
    q)(1; 2; 3 4) + (10; 20; 30 40)
    11
    22
    33 44

Python:

    >>> l = list(reversed(range(5)))
    >>> l
    [4, 3, 2, 1, 0]
    >>> [2 * x for x in l]
    [8, 6, 4, 2, 0]
    >>> [l[i] for i in [3, 0]]
    [1, 4]
    >>> def add_lists_recursive(list1, list2):
    ...     result = []
    ...     for a, b in zip(list1, list2):
    ...         if isinstance(a, list) and isinstance(b, list):
    ...             result.append(add_lists_recursive(a, b))
    ...         else:
    ...             result.append(a + b)
    ...     return result
    ...
    >>> add_lists_recursive([1, 2, [3, 4]], [10, 20, [30, 40]])
    [11, 22, [33, 44]]

Numpy:

    >>> import numpy as np
    >>> l = np.arange(5)[::-1]
    >>> l
    array([4, 3, 2, 1, 0])
    >>> 2 * l
    array([8, 6, 4, 2, 0])
    >>> l[[3, 0]]
    array([1, 4])
    >>> np.array([1, 2, np.array([3, 4])], dtype=object) + np.array([10, 20, np.array([30, 40])], dtype=object)
    array([11, 22, array([33, 44])], dtype=object)

Functional programming

The q language also treats functions as first-class citizens. You can pass and return functions like any other data type.

q:

    q)manhattan:{sum abs x-y}
    q)euclidean:{sqrt sum (x-y)*x-y}
    q)logDistance:{[x;y;distance] "The distance is: ", string distance[x;y]}
    q)logDistance[1 2 3; 4 2 -1; euclidean]
    "The distance is: 5"

Python:

    >>> import math
    >>> manhattan = lambda x, y: sum(abs(xi - yi) for xi, yi in zip(x, y))
    >>> euclidean = lambda x, y: math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    >>> log_distance = lambda x, y, distance: f"The distance is: {distance(x, y)}"
    >>> log_distance([1, 2, 3], [4, 2, -1], euclidean)
    'The distance is: 5.0'

Higher-order functions called Iterators (../ref/iterators/index.html) make complex data manipulation extremely concise.

q:

    q)count each (1 2; 5 4 3; til 20)   / Apply 'count' to each sub-list
    2 3 20
    q)add: {x+y}
    q)/ Cumulative sum, (+) scan 1 2 3 also works:
    q)add scan 1 2 3                    / or simply use sums 1 2 3
    1 3 6

Python:

    >>> list(map(len, [[1, 2], [5, 4, 3], range(20)]))
    [2, 3, 20]
    >>> add = lambda x, y: x+y
    >>> from itertools import accumulate
    >>> list(accumulate([1, 2, 3], add))
    [1, 3, 6]

Interned strings: symbols

Symbols are atomic entities preceded by a backtick (for example, `AAPL). Internally, q stores indices into a lookup table (a process called interning).
This makes comparing two symbols (for example, checking whether a ticker in a billion-row table matches `AAPL) incredibly fast, because the computer only has to compare two integers rather than checking every letter in a word.

    q)friends: `Alice`Bob`Mike   / List of symbols
    q)friends?`Mike              / Reverse lookup: find the index of Mike
    2

Note: Symbols work best for low-cardinality data (tickers, exchange codes, status flags). For high-cardinality data with values that rarely repeat, use strings (../ref/datatypes.html#strings) instead. Each unique symbol is permanently added to the intern table for the lifetime of the q process.

Extreme terseness

The trade-off for q's power is brevity. q developers value minimal keystrokes, which does lead to heavy overloading of symbols. For example, the ? symbol can perform ten different operations depending on its arguments. In the previous section, you saw that it can denote reverse lookup; below we show three other usages called roll, deal and permute (../ref/deal.html), all related to random number generation:

q:

    q)rand 10
    9
    q)4?10            / Four random integers
    4 5 4 2
    q)show l:-4?10    / Four random integers without duplicates
    6 0 8 5
    q)0N?l            / Permutation
    8 6 0 5

Python:

    >>> import random
    >>> random.randint(0,9)
    9
    >>> [random.randint(0, 9) for _ in range(4)]
    [4, 5, 4, 2]
    >>> l=random.sample(range(0, 10), 4)
    >>> l
    [6, 0, 8, 5]
    >>> random.sample(l, len(l))
    [8, 6, 0, 5]

Tables

Tables are treated as first-class citizens in q, which means they are a primary data type just like integers or lists. You can think of a table from two different perspectives:

- A list of rows: where all rows are dictionaries of the same keys.
- A list of columns: where each column is a named list with values of the same length.

While you can interact with a table as a list of rows, q stores them internally as a list of columns. This columnar structure is the secret to q's performance advantage in data analysis. Pandas equivalents are shown alongside each q snippet.
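The two perspectives on a table can be sketched in plain Python. This is only an illustration of the idea under simple assumptions (a table as a list of same-keyed dicts versus a dict of equal-length lists); it is not how q implements tables, and the helper names `rows_to_columns` and `columns_to_rows` are hypothetical.

```python
# Illustrative sketch only: a table as a list of rows vs. a dict of columns.
# rows_to_columns and columns_to_rows are hypothetical helper names.

def rows_to_columns(rows):
    """Turn a list of same-keyed dicts into a dict of equal-length lists."""
    if not rows:
        return {}
    names = list(rows[0])
    return {name: [row[name] for row in rows] for name in names}


def columns_to_rows(columns):
    """Turn a dict of equal-length lists back into a list of dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


rows = [
    {"name": "Alice", "age": 23},
    {"name": "Bob", "age": 32},
    {"name": "Mike", "age": 22},
]
columns = rows_to_columns(rows)

# A column aggregate reads one list instead of visiting every row dict.
average_age = sum(columns["age"]) / len(columns["age"])
```

In the columnar form, an aggregate such as the average age only touches the `age` list, which is the intuition behind the performance claim above.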
Creating tables

In q, a dictionary is a mapping formed by two equal-length lists. A list of dictionaries forms a table when all dictionaries share the same keys:

q:

    q)(([name: `Alice; phone: "555-0101"; age: 23]); ([name: `Bob; phone: "555-0723"; age: 32]); ([name: `Mike; phone: "555-6666"; age: 22]))
    name  phone      age
    --------------------
    Alice "555-0101" 23
    Bob   "555-0723" 32
    Mike  "555-6666" 22

Pandas:

    >>> import pandas as pd
    >>> pd.DataFrame(data = [
    ...     {"name": "Alice", "phone": "555-0101", "age": 23},
    ...     {"name": "Bob", "phone": "555-0723", "age": 32},
    ...     {"name": "Mike", "phone": "555-6666", "age": 22}
    ... ])
        name     phone  age
    0  Alice  555-0101   23
    1    Bob  555-0723   32
    2   Mike  555-6666   22

You can create a simple table by defining its columns directly using the ([] ...) syntax:

q:

    q)show t: ([] name: `Alice`Bob`Mike; phone: ("555-0101"; "555-0723"; "555-6666"); age: 23 32 22)
    name  phone      age
    --------------------
    Alice "555-0101" 23
    Bob   "555-0723" 32
    Mike  "555-6666" 22

Pandas:

    >>> t = pd.DataFrame({
    ...     "name": ["Alice", "Bob", "Mike"],
    ...     "phone": ["555-0101", "555-0723", "555-6666"],
    ...     "age": [23, 32, 22]
    ... })
    >>> t
        name     phone  age
    0  Alice  555-0101   23
    1    Bob  555-0723   32
    2   Mike  555-6666   22

Because tables are integrated into the language, you can manipulate them with standard list and dictionary syntax:

q:

    q)t 1          / Get the second row
    name | `Bob
    phone| "555-0723"
    age  | 32
    q)avg t `age   / Get the average of the age column
    25.66667

Pandas:

    >>> t.iloc[1]
    name          Bob
    phone    555-0723
    age            32
    Name: 1, dtype: object
    >>> t["age"].mean()
    np.float64(25.666666666666668)

Tables can be keyed by one or more columns. A keyed table (../ref/table.html#keyed-tables) is a dictionary mapping key records to value records. You can look up rows by key values:

    q)kt: `name xkey t
    q)kt `Bob
    phone| "555-0723"
    age  | 32

q-sql

Alongside its functional programming model, q includes a built-in query language called q-sql. It looks similar to SQL but is more expressive and follows q's right-to-left evaluation rules.
The following examples use synthetic capital markets data generated by the KDB-X datagen (../modules/datagen/overview.html) module:

Modules: KDB-X also supports modules (../modules/module-index.html), a new feature that provides a native packaging and encapsulation mechanism for q code. You load modules directly into your q session using the use keyword.

    q)([getInMemoryTables]): use `kx.datagen.capmkts   / Load the module
    q)(trade; quote; ; master; exnames): getInMemoryTables[]
    q)trade
    sym  time                 price size stop cond ex
    -------------------------------------------------
    SOFI 0D09:30:01.180477706 214   36   0    K
    AMZN 0D09:30:01.490170061 92.11 90   1    T    A
    SNAP 0D09:30:02.534750053 9     74   0    T
    SNAP 0D09:30:05.617603533 9     84   0    L
    TSLA 0D09:30:06.389750220 62.97 62   0    Z
    PEP  0D09:30:08.910057414 22    23   0    U    Y
    ..
    q)count quote   / number of rows
    13497

Note: For brevity, we display only a limited number of rows in this document. You can set the console size by using \c (../ref/syscmds.html#c-console-size).

In q-sql, you don't need SELECT *. If you don't specify columns, q assumes you want all of them.

    q)select from trade where size>90
    sym  time                 price size stop cond ex
    -------------------------------------------------
    TXN  0D09:30:18.828937844 18.02 99   0    9
    GOOG 0D09:30:22.425490937 72.02 92   0    P    M
    T    0D09:30:40.218699347 18.01 97   0
    XPEV 0D09:33:31.365513849 6.01  99   0
    T    0D09:33:37.277742547 18.03 93   0    X
    XPEV 0D09:35:00.264738568 6.01  92   0    9
    SBUX 0D09:36:32.798154308 5.03  98   0    M
    HPQ  0D09:36:37.699847666 36.17 98   0    I    N
    ..

The real power of q-sql appears when you combine it with q's vector capabilities. For example, you can calculate total volume by exchange:

q:

    q)select sum size by ex from trade
    ex| size
    --| -----
      | 21579
    A | 2512
    B | 2191
    C | 2482
    D | 3227
    I | 2811
    ..

Pandas:

    >>> trade.groupby('ex')[['size']].sum()
         size
    ex
        21579
    A    2512
    B    2191
    C    2482
    D    3227
    I    2811
    ..

Because q handles dictionaries and vectors natively, you can perform joins inline without complex syntax.
In this example, the exnames dictionary maps exchange IDs to their full names directly:

q:

    q)exnames `A`B   / Indexing a dictionary by a list
    "NYSE American"
    "NASDAQ OMX BX"
    q)select sum size by exnames ex from trade
    ex                              | size
    --------------------------------| -----
    ""                              | 21579
    "Cboe BYX Exchange"             | 1796
    "Cboe BZX Exchange"             | 2320
    "Cboe EDGA Exchange"            | 2368
    "Cboe EDGX Exchange"            | 3097
    "Chicago Broad Options Exchange"| 2551
    "Chicago Stock Exchange"        | 2203
    ..

Pandas:

    >>> [exnames[k] for k in ["A", "B"]]
    [b'NYSE American', b'NASDAQ OMX BX']
    >>> trade.groupby(trade['ex'].map(exnames))[['size']].sum()
                                       size
    ex
    b'Cboe BYX Exchange'               1796
    b'Cboe BZX Exchange'               2320
    b'Cboe EDGA Exchange'              2368
    b'Cboe EDGX Exchange'              3097
    b'Chicago Broad Options Exchange'  2551
    b'Chicago Stock Exchange'          2203

This demonstrates q's "zero noise" principle. In SQL, this would require a formal JOIN statement; in q, it is a simple dictionary lookup applied across a vector.

In practice, business logic can be highly complex. q-sql lets you leverage the full expressiveness of q to implement sophisticated analyses concisely. The following statement creates a new column, pricegroup, that assigns price-group identifiers within each symbol. Consecutive rows with the same price belong to the same price group.
    q)update pricegroup: sums differ price by sym from select from trade where sym in `SNAP`SOFI
    sym  time                 price  size stop cond ex pricegroup
    -------------------------------------------------------------
    SOFI 0D09:30:01.180477706 214    36   0    K       1
    SNAP 0D09:30:02.534750053 9      74   0    T       1
    SNAP 0D09:30:05.617603533 9      84   0    L       1
    SOFI 0D09:31:10.843041058 214.26 46   0    9       2
    SNAP 0D09:32:11.259991414 9.01   36   0    4       2
    SOFI 0D09:33:35.131385974 214.46 68   0    5       3

Price group query explanation

differ (../ref/differ.html) returns a boolean list flagging each position where the value changes from its predecessor:

    q)differ 9 9 9.01 9.01 9 9.02
    101011b

sums (../ref/sum.html#sums) accumulates these flags (booleans are treated as 0/1 in arithmetic operations), yielding a running group counter that increments at each transition:

    q)sums differ 9 9 9.01 9.01 9 9.02
    1 1 2 2 3 4i

The update ... by sym clause ensures each symbol is grouped and processed independently.

The expressiveness of q-sql makes complex calculations both readable and manageable.

Interfaces

For anyone coming from a traditional database background, KDB-X also provides a standard SQL (../modules/sql/quickstart.html) interface:

    q).s.init[]   / initialize SQL interface
    q)s)SELECT * FROM trade WHERE size>90   / use 's)' prefix for SQL
    sym  time                 price size stop cond ex
    -------------------------------------------------
    TXN  0D09:30:18.828937844 18.02 99   0    9
    GOOG 0D09:30:22.425490937 72.02 92   0    P    M
    T    0D09:30:40.218699347 18.01 97   0
    XPEV 0D09:33:31.365513849 6.01  99   0
    T    0D09:33:37.277742547 18.03 93   0    X
    XPEV 0D09:35:00.264738568 6.01  92   0    9
    SBUX 0D09:36:32.798154308 5.03  98   0    M
    HPQ  0D09:36:37.699847666 36.17 98   0    I    N
    ..

If you're familiar with Python, KDB-X Python (kdb-x-python-overview.html) is a great place to start. You can run a q process inside a Python process and use familiar syntax.
    >>> import pykx as kx
    >>> # load the data
    >>> trade.select(columns=kx.Column("size").sum(), by="ex")
    pykx.KeyedTable(pykx.q('
    ex| size
    --| -----
      | 21579
    A | 2512
    B | 2191
    C | 2482
    D | 3227
    I | 2811
    ..
    '))

Time-series support

q was built for time-series data. It treats temporal types (times, dates, timestamps, timedeltas) natively. You can cast data types on the fly, or use dot notation. For instance, using time.minute to group data by the minute and using within (../ref/within.html) to restrict to a time interval:

q:

    q)/ Average mid-price for TSLA between 1 PM and 2 PM, grouped by minute
    q)select avgMid: avg (bid + ask)%2 by time.minute from quote where sym=`TSLA, time within 13:00 14:00
    time | avgMid
    -----| --------
    13:00| 64.4125
    13:03| 64.66
    13:04| 64.4875
    13:07| 64.3425
    13:08| 64.64833
    13:09| 64.32
    ..

Pandas:

    >>> quote.loc[
    ...     (quote['sym'] == 'TSLA') &
    ...     (quote['time'].between(pd.to_timedelta('13:00:00'), pd.to_timedelta('14:00:00')))] \
    ...     .assign(avgMid=(quote['bid'] + quote['ask']) / 2) \
    ...     .groupby(quote['time'].dt.floor('min'))[['avgMid']].mean()
                        avgMid
    time
    0 days 13:00:00  64.412500
    0 days 13:03:00  64.660000
    0 days 13:04:00  64.487500
    0 days 13:07:00  64.342500
    0 days 13:08:00  64.648333
    0 days 13:09:00  64.320000

Joins

q supports standard relational joins (../ref/joins.html) like left join (lj) and inner join (ij), but is most famous for its specialized temporal joins. To join metadata (like company descriptions) from master to trade, the following example uses sym, the key column of the master table:

q:

    q)trade lj master
    sym  time                 price size stop cond ex description                 issueprice
    ----------------------------------------------------------------------------------------
    SOFI 0D09:30:01.180477706 214   36   0    K       SoFi Technologies, Inc.     214
    AMZN 0D09:30:01.490170061 92.11 90   1    T    A  Amazon.com, Inc.            92
    SNAP 0D09:30:02.534750053 9     74   0    T       Snap Inc.                   9
    SNAP 0D09:30:05.617603533 9     84   0    L       Snap Inc.                   9
    TSLA 0D09:30:06.389750220 62.97 62   0    Z       Tesla, Inc.                 63
    ..

Pandas:

    >>> trade.join(master, on="sym")
           sym                      time  ...              description issueprice
    0     SOFI 0 days 09:30:01.180477706  ...  SoFi Technologies, Inc.        214
    1     AMZN 0 days 09:30:01.490170061  ...         Amazon.com, Inc.         92
    2     SNAP 0 days 09:30:02.534750053  ...                Snap Inc.          9
    ...    ...                       ...  ...                      ...        ...
    1288   AIG 0 days 15:59:59.316044754  ...  AMERICAN INTL GROUP INC         27
    1289  TSLA 0 days 15:59:59.652057702  ...              Tesla, Inc.         63
    1290  AAPL 0 days 15:59:59.808157553  ...        APPLE INC COM STK         84

    [1291 rows x 9 columns]

You can run queries on the joined table:

q:

    q)select open: first price, close: last price by description from trade lj master
    description                | open  close
    ---------------------------| -----------
    ADVANCED MICRO DEVICES     | 33.05 34.62
    AMERICAN INTL GROUP INC    | 27.03 28.9
    APPLE INC COM STK          | 84.1  86.92
    AT&T Inc.                  | 18.01 19.06
    ..

Pandas:

    >>> trade.join(master, on="sym") \
    ...     .groupby("description")["price"].agg(open='first', close='last')
                              open  close
    description
    ADVANCED MICRO DEVICES   33.05  34.62
    AMERICAN INTL GROUP INC  27.03  28.90
    APPLE INC COM STK        84.10  86.92
    AT&T Inc.                18.01  19.06
    ..

In financial data, trades and quotes rarely happen at the exact same time (q supports nanosecond precision). An as-of join aligns two tables by finding the "prevailing" value. For every trade, aj finds the most recent quote that occurred at or before that trade's time:

q:

    q)/ Matches each trade with the symbol's quote valid at that moment
    q)aj[`sym`time; trade; quote]
    sym  time                 price size stop cond ex bid    ask    bsize asize mode
    --------------------------------------------------------------------------------
    SOFI 0D09:30:01.180477706 214   36   0    K       213.37 214.45 13    39    Q
    AMZN 0D09:30:01.490170061 92.11 90   1    T    A  91.56  92.14  17    32    E
    SNAP 0D09:30:02.534750053 9     74   0    T       8.44   9.04   18    91    M
    SNAP 0D09:30:05.617603533 9     84   0    L       8.17   9.66   80    68    4
    ..

Pandas:

    >>> pd.merge_asof(trade, quote, on='time', by='sym')
           sym                      time   price  size   stop  cond ex_x     bid     ask  bsize  asize  mode ex_y
    0     SOFI 0 days 09:30:01.180477706  214.00    36  False  b'K'       213.37  214.45     13     39  b'Q'
    1     AMZN 0 days 09:30:01.490170061   92.11    90   True  b'T'    A   91.56   92.14     17     32  b'E'    A
    2     SNAP 0 days 09:30:02.534750053    9.00    74  False  b'T'         8.44    9.04     18     91  b'M'
    ...    ...                       ...     ...   ...    ...   ...  ...     ...     ...    ...    ...   ...  ...
    1288   AIG 0 days 15:59:59.316044754   28.90    69  False  b'T'    C   28.63   29.28     60     81  b'M'    C
    1289  TSLA 0 days 15:59:59.652057702   65.83    44  False  b'Q'        65.24   66.22     30     26  b'4'
    1290  AAPL 0 days 15:59:59.808157553   86.92    64  False  b'R'    D   86.82   87.90     86     76  b'B'    D

    [1291 rows x 13 columns]

A window join (../ref/wj.html) is a powerful generalization of the as-of join. Instead of taking just the last value, it looks at a window of time around each record and performs an aggregation (like an average or max).

Example: calculate the volume-weighted average price (VWAP) for quotes in a window starting 1 minute before and ending 5 seconds after each trade:

    q)wj[-00:01 00:00:05+\:trade.time; `sym`time; trade; (quote; (wavg; `asize; `ask); (wavg; `bsize; `bid))]
    sym  time                 price size stop cond ex ask      bid
    -------------------------------------------------------------------
    SOFI 0D09:30:01.180477706 214   36   0    K       66.43636 65.54799
    AMZN 0D09:30:01.490170061 92.11 90   1    T    A  65.21634 51.99918
    SNAP 0D09:30:02.534750053 9     74   0    T       57.21473 52.7337
    SNAP 0D09:30:05.617603533 9     84   0    L       50.89472 52.93455
    ..

Each Left

\: is the Each Left (../ref/maps.html#each-left-and-each-right) iterator. It applies the function to each element on the left, holding the right argument fixed. When the right argument is a list, the result is a nested list, one row per left element:

    q)1 2 +\: 100 200 300
    101 201 301
    102 202 302

Each Left's counterpart, /:, is Each Right. A handy mnemonic: the slash in \: leans left, and the slash in /: leans right.
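For readers more comfortable with Python, the behaviour of Each Left and Each Right described above can be approximated with two small helpers. This is a sketch under stated assumptions: `each_left`, `each_right`, and the broadcasting `add` function are hypothetical names written for this illustration, and they only mimic the q iterators for simple lists.

```python
# Illustrative sketch of q's Each Left (\:) and Each Right (/:) iterators.
# each_left, each_right and add are hypothetical names for this example.

def add(x, y):
    # Atomic-style addition: a scalar is broadcast across a list,
    # and two lists are added item by item.
    if isinstance(x, list) and isinstance(y, list):
        return [add(a, b) for a, b in zip(x, y)]
    if isinstance(x, list):
        return [add(a, y) for a in x]
    if isinstance(y, list):
        return [add(x, b) for b in y]
    return x + y


def each_left(f, left, right):
    # Apply f to each item of left, holding the whole right argument fixed.
    return [f(item, right) for item in left]


def each_right(f, left, right):
    # Apply f to each item of right, holding the whole left argument fixed.
    return [f(left, item) for item in right]


left_result = each_left(add, [1, 2], [100, 200, 300])
right_result = each_right(add, [1, 2], [100, 200, 300])
```

Here `left_result` has one row per left element, matching the q output above, while `right_result` has one row per right element.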
Foreign keys

You can link tables dynamically in a query using join operators, or define the relationship statically. In q, this static relationship is called a foreign key (../how_to/interact_with_databases/foreign-keys.html), which functions similarly to foreign keys in traditional relational databases.

In the previous left join example, you linked the trade and master tables on the fly using the sym column. You can make this relationship permanent by "casting" the sym column in the trade table to the master table (which is keyed on sym):

    q)update `master$sym from `trade
    `trade

Note: The backtick before the table name trade indicates that the update happens in-place, modifying the actual table rather than returning a new copy.

Once a foreign key is established, you no longer need to perform explicit joins to access information from the parent table. You can use dot notation to "reach through" the link. In the query below, notice how you access description of the master table through sym.description:

q:

    q)select o: first price, c: last price by sym.description from trade
    description                | o     c
    ---------------------------| -------------
    ADVANCED MICRO DEVICES     | 33.01 34.35
    AMERICAN INTL GROUP INC    | 27.02 27.5
    APPLE INC COM STK          | 83.99 88.1
    AT&T Inc.                  | 18.02 18.72

SQL:

    q)s)SELECT description, FIRST(trade.price) AS o, LAST(trade.price) AS c FROM trade JOIN master ON trade.sym = master.sym GROUP BY description;
    description                 o      c
    -----------------------------------------
    ADVANCED MICRO DEVICES      33.05  34.62
    AMERICAN INTL GROUP INC     27.03  28.9
    APPLE INC COM STK           84.1   86.92
    AT&T Inc.                   18.01  19.06

Beyond cleaner syntax, foreign keys offer two major advantages. Queries are faster because q does not need to recalculate the entire mapping, which also means you get the second benefit of a smaller query memory footprint.

Persistence

This tutorial has worked exclusively with in-memory objects so far. If you close your q session, these objects vanish.
To keep your data, use the set function to persist it to disk.

Simple persistence

In q, file paths are represented as symbols prefixed with a colon (e.g. `:kdbdata). You can save any q object (variables, dictionaries, or even functions) directly to a file.

    q)contacts: ([Alice: "555-0101"; Bob: "555-0723"; Mike: "555-6666"])
    q)`:kdbdata/contacts set contacts   / Save a dictionary
    q)`:kdbdata/callRandomFriend set {f: rand key contacts; "Calling ", (string f), " at ", contacts f}
    q)t: ([] name: `Alice`Bob`Mike; phone: ("555-0101"; "555-0723"; "555-6666"); age: 23 32 22)
    q)`:kdbdata/t set t                 / Save a table

These objects are saved in a high-performance binary format. From a new q session, you can bring them back using get:

    q)get `:kdbdata/contacts
    Alice| "555-0101"
    Bob  | "555-0723"
    Mike | "555-6666"
    q)get `:kdbdata/t
    name  phone      age
    --------------------
    Alice "555-0101" 23
    Bob   "555-0723" 32
    Mike  "555-6666" 22

If a directory contains multiple kdb+ files, you can load the entire directory at once using the \l command. This automatically assigns the file names as variable names in your session:

    q)\l kdbdata   / Load everything in the 'kdbdata' folder
    q)contacts     / 'contacts' is now available in the workspace
    Alice| "555-0101"
    Bob  | "555-0723"
    Mike | "555-6666"
    q)callRandomFriend[]
    "Calling Alice at 555-0101"
    q)t
    name  phone      age
    --------------------
    Alice "555-0101" 23
    Bob   "555-0723" 32
    Mike  "555-6666" 22

Scaling up: splaying and partitioning

While the approach above is fine for small objects, it has an important limitation: it copies the entire file into your RAM (with the exception of homogeneous list files). For analysts working with gigabytes or terabytes of data, this isn't feasible.

For better performance, you "splay" a table, meaning q saves each column as its own individual file. This allows q to perform columnar I/O: if you only want to calculate the average price, q only reads the price file and ignores size, time, and ex.
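The idea behind splaying can be sketched in Python with one file per column. This is a toy illustration only: the JSON files, the directory layout, and the helper names `splay` and `read_column` are assumptions made for this example and do not reflect the kdb+ on-disk format.

```python
# Illustrative sketch of a "splayed" layout: one file per column.
# This is NOT the kdb+ on-disk format; JSON files and helper names are assumptions.
import json
import tempfile
from pathlib import Path


def splay(table, directory):
    """Write each column of a dict-of-lists table to its own file."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    for name, values in table.items():
        (directory / name).write_text(json.dumps(values))


def read_column(directory, name):
    """Read a single column file, leaving the other columns untouched."""
    return json.loads((Path(directory) / name).read_text())


trade = {
    "sym": ["SOFI", "AMZN", "SNAP"],
    "price": [214.0, 92.11, 9.0],
    "size": [36, 90, 74],
}

with tempfile.TemporaryDirectory() as root:
    table_dir = Path(root) / "trade"
    splay(trade, table_dir)
    files_on_disk = sorted(p.name for p in table_dir.iterdir())
    # Only the price file is opened to compute the average price.
    prices = read_column(table_dir, "price")
    average_price = sum(prices) / len(prices)
```

Computing the average price opens a single file, which is the columnar I/O saving the paragraph above describes.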
To handle massive datasets, tables are divided into partitions, typically by date. This example uses the datagen module to build a multi-day, partitioned database on disk:

    q)([getInMemoryTables; buildPersistedDB]): use `kx.datagen.capmkts   / Load the module
    q)buildPersistedDB["/tmp/kdbdb"; 10000; ([start: 2026.02.01; end: 2026.02.02])]

If you look at the file system, you should see a clean, hierarchical structure:

    $ tree /tmp/kdbdb
    /tmp/kdbdb
    ├── 2026.02.01
    │   ├── quote
    │   │   ├── asize
    │   │   ├── ask
    │   │   ├── bid
    │   │   ├── bsize
    │   │   ├── ex
    │   │   ├── mode
    │   │   ├── sym
    │   │   └── time
    │   └── trade
    │       ├── cond
    │       ├── ex
    │       ├── price
    │       ├── size
    │       ├── stop
    │       ├── sym
    │       └── time
    ├── 2026.02.02
    │   ├── quote
    │   │   ├── asize
    │   │   ├── ask
    │   │   ├── bid
    │   │   ├── bsize
    │   │   ├── ex
    │   │   ├── mode
    │   │   ├── sym
    │   │   └── time
    │   └── trade
    │       ├── cond
    │       ├── ex
    │       ├── price
    │       ├── size
    │       ├── stop
    │       ├── sym
    │       └── time
    ├── daily
    ├── exnames
    ├── master
    └── sym

    4 directories, 25 files

When you load a partitioned database with \l, KDB-X does not "load" the data; instead, it memory-maps it.

Memory-mapping: Memory mapping (https://en.wikipedia.org/wiki/Memory-mapped_file) is a technique that maps on-disk files directly into a process's address space, bypassing the usual copy from disk into process memory buffers. This eliminates copying overhead and lets the OS manage data access efficiently through its virtual memory subsystem. For a deeper dive, see the KX blog on memory mapping (https://kx.com/blog/memory-mapping-in-kdb/).

    q)\l /tmp/kdbdb

You can run q-sql and SQL queries on the mapped KDB-X database. KDB-X reads only the data your query needs.

    q)select sum size by 0D00:10 xbar time from trade where date=last date
    time                | size
    --------------------| ------
    0D09:30:00.000000000| 105690
    0D09:40:00.000000000| 53574
    0D09:50:00.000000000| 48170
    0D10:00:00.000000000| 41788
    0D10:10:00.000000000| 36279
    ..
In the above query:

- q only looks inside the 2026.02.02 folder (ignoring all other days)
- q only reads the size and time files (ignoring price, ex, etc.)

This technique lets you analyze datasets much larger than your physical memory. For example, you can query a 10 TB database on a laptop with 16 GB of RAM if you aggregate the data or request only a subset of columns or dates at a time.

Open formats

KDB-X is not limited to its native, highly optimized binary format. It supports a range of open and industry-standard data formats to enable interoperability with the broader data ecosystem.

KDB-X natively supports Parquet, one of the most widely adopted open columnar formats. You can run q-sql queries directly against Parquet files without any conversion step. Furthermore, virtual tables allow you to mix Parquet and kdb+ data in the same query, providing a unified q-sql interface regardless of the underlying storage format.

For broader ecosystem integration, the KX Fusion libraries provide connectors to other open formats, including Apache Arrow, Avro, and HDF5, among others.

Performance

KDB-X isn't just a database; it is fundamentally a vector processing engine. Its performance comes from its ability to treat data as contiguous blocks of memory, allowing it to leverage modern CPU features and massive parallelization.

Hardware acceleration (SIMD)

At its core, q is optimized for SIMD (Single Instruction, Multiple Data). This allows the CPU to perform the same operation (like addition or multiplication) on multiple data points in a single clock cycle. When you add two columns in q, you aren't just looping; you are engaging the hardware's vector lanes.

Parallel processing

KDB-X can distribute workloads across multiple CPU cores.
By starting your q process with the -s flag, you enable secondary threads:

    $ q /tmp/kdbdb -s 4   # Enable 4 secondary threads for parallel execution

When you run an aggregation like sum or avg on a long vector, q automatically splits the vector into chunks (../ref/mt-primitives.html#peach-vs-implicit-parallelism), processes them in parallel across your cores, and combines the result (a "map-reduce" pattern). This also applies to partitioned data: KDB-X can scan multiple days of data simultaneously.

For even larger scales, you can use segmented databases (../how_to/interact_with_databases/segment.html) to spread data across multiple physical disks. This enables parallel I/O, allowing you to read terabytes of data at the speed of your hardware's combined throughput, all without changing a single line of your q-sql code.

Attributes: the "secret sauce"

In traditional databases, you create indexes. In q, you apply attributes. These are metadata labels that tell the q engine about the structure of your data, allowing it to choose the fastest possible algorithm for a query, as these two examples show:

- Sorted (`s): Applied to an ordered column like time. It enables binary search, which is O(log n), making lookups nearly instantaneous.
- Parted (`p): Typically used for the main identifier column (like sym) in on-disk databases. It tells q that all identical symbols are stored in contiguous blocks. This allows q to jump straight to the start of a symbol's data and read it in one burst.

You can check the attributes of a table using the meta command. The a column below shows the parted attribute for sym:

    q)meta trade
    c    | t f a
    -----| -----
    date | d
    sym  | s   p
    time | n
    price| f
    size | j
    stop | b
    cond | c
    ex   | s

By using the parted (`p) attribute on sym, a query for specific tickers, like select from trade where sym in `AAPL`GOOG, doesn't need to scan the whole sym vector; it knows exactly where AAPL and GOOG data start and end on the disk. Less I/O means faster queries.
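To make the two attributes more concrete, here is a small Python sketch of the lookups they enable. It assumes two independent toy columns and uses hypothetical helper names (`sorted_range`, `build_parted_index`); it illustrates the general techniques (binary search on sorted data, a start/end index over contiguous blocks) rather than q's actual implementation.

```python
# Illustrative sketch of why sorted and parted data speed up lookups.
# Helper names are hypothetical; q's real attribute machinery is not shown.
from bisect import bisect_left, bisect_right


def sorted_range(sorted_values, low, high):
    """Binary-search a sorted column for the slice with low <= value <= high."""
    return bisect_left(sorted_values, low), bisect_right(sorted_values, high)


def build_parted_index(parted_values):
    """Map each distinct value to the (start, end) of its contiguous block."""
    index = {}
    for position, value in enumerate(parted_values):
        start, _ = index.get(value, (position, position))
        index[value] = (start, position + 1)
    return index


# A sorted column: the matching range is found in O(log n) steps.
times = [930, 931, 931, 945, 1000, 1015]
time_slice = sorted_range(times, 931, 1000)

# A parted column: identical values sit in one contiguous block.
syms = ["AAPL", "AAPL", "GOOG", "GOOG", "GOOG", "TSLA"]
prices = [84.1, 84.2, 72.0, 72.1, 72.3, 62.9]
parts = build_parted_index(syms)
start, end = parts["GOOG"]
goog_prices = prices[start:end]
```

With the block boundaries known, reading GOOG's prices is a single contiguous slice rather than a scan of the whole sym column.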
Acting as a database

While this guide so far has used q primarily as a standalone analysis tool, its true power lies in its ability to act as a high-performance database server. By specifying a port with the -p parameter, you can enable network connectivity:

    $ q /tmp/kdbdb -s 4 -p 5100

Once the process is listening, anyone with network access can connect to your session and query your data. Common ways to connect include:

- Another q process
- A web browser via built-in HTTP support
- VS Code using the KDB-X extension (https://code.kx.com/vscode/)
- Jupyter Notebooks using pykx (kdb-x-python-overview.html)
- KX Developer (https://code.kx.com/developer/) or KX Analyst (https://code.kx.com/analyst/)
- KX Dashboard (https://code.kx.com/dashboards/)
- Third-party IDEs like Kdb Studio (https://github.com/finos/kdb-studio)

The following sections begin exploring the first two options (but only scratch the surface of what is possible).

Connect from another q process

In a separate terminal, start a second q session. Use the hopen command to create a connection handle to the server:

    q)h: hopen 5100   / Opens a connection to localhost:5100. 'h' is our "handle".

Now you can send commands through that handle. The simplest way is to pass a query as a string:

    q)h "select nr: count i by sym from trade"
    sym | nr
    ----| ----
    AAPL| 1940
    AIG | 1906
    AMD | 1973
    AMZN| 1934
    ..

Sending strings is easy, but can be inconvenient, especially when you pass parameters. q also supports functional form. You define a function on the server, and the client calls it by passing the function name and arguments in a list.
On the server:

    / Define a "Stored Procedure" to get basic stats for a specific symbol
    q)getTradeStatOf: {[x] select nr: count i, sum size, avgprice: avg price from trade where sym=x}

On the client:

    q)h (`getTradeStatOf; `TSLA)    / Simpler and safer than string manipulation
    nr   size   avgprice
    --------------------
    1914 103341 65.97574

Connect from a web browser ¶ connect-from-a-web-browser

Every q process started with -p is also a lightweight web server. This is useful for quick inspections. If you navigate to http://localhost:5100 in your browser, you can see all the variables currently in memory. Click on a variable to see its content. You can even execute queries directly from the URL bar by appending a ? followed by your q code.

The advantage of a unified architecture ¶ the-advantage-of-a-unified-architecture

Traditional enterprise architectures suffer from "impedance mismatch." In these systems, data is stored in a relational database while business logic is written in a separate application layer using languages like Java, Rust, or Python. This separation creates significant friction: a substantial amount of engineering effort goes into Object-Relational Mapping (ORM) and data serialization, which simply translate data from database rows into programming objects. Furthermore, to improve performance, developers often split logic between the two layers using brittle stored procedures, creating a fragile environment that is difficult to synchronize, test, and maintain.

KDB-X eliminates this overhead by providing a unified framework. In the q ecosystem, there is no distinction between the database and the programming language; the table is a native data structure. Business logic lives directly alongside the data, so complex calculations run where the data resides rather than moving massive datasets across a network to an application server.
This proximity ensures that data movement is minimized, resulting in performance gains that would be impossible in a multi-tier architecture.

This architectural simplicity translates into a significantly lower Total Cost of Ownership (TCO) for organizations. By collapsing the stack into a single layer, organizations reduce their hardware footprint and simplify their deployment pipelines. Maintenance becomes more straightforward because there is a single environment for both data and logic. Ultimately, this allows smaller teams of "dev-analysts" to build and support systems that would typically require large, specialized departments in a traditional software stack.

From language to architecture: kdb+ tick ¶ from-language-to-architecture-kdb-tick

Everything described so far (the vector engine, columnar tables, q-sql, memory-mapped partitions, and the database server) forms the q programming language and its runtime. But q is not just a tool for analysts; it is a platform for building production-grade systems. The canonical example is kdb+ tick (../how to/manage streaming data/architecture.html), the most widely used architecture ever implemented in q.

Released (https://github.com/KxSystems/kdb-tick) in the early 2000s, kdb+ tick is a complete, production-grade streaming data architecture for capturing, storing, and querying high-frequency time-series data in real time. Its most remarkable feature is its size: the entire system is implemented in just 34 lines of q code. There is no boilerplate and no scaffolding, only the essential logic required to ingest and publish real-time data.

Despite its brevity, kdb+ tick has been deployed at the majority of the world's leading investment banks and financial institutions for over two decades. It processes billions of financial events (trades, quotes, order book updates) every single trading day, making it one of the most battle-tested real-time data systems ever built for electronic trading.
Three processes, one architecture ¶ three-processes-one-architecture

kdb+ tick separates responsibility across three specialized q processes, each optimized for a specific function:

- The Tickerplant (TP) is a low-latency, high-volume publish-subscribe hub that decouples data publishers from their subscribers.
- The Real-Time Database (RDB) subscribes to the tickerplant and collects today's data entirely in memory. New records become queryable within milliseconds, and the columnar in-memory layout enables complex analytical queries over millions of intraday rows to execute in microseconds.
- The Historical Database (HDB) stores all previous days' data using the splayed, partitioned layout described earlier in the persistence section. It memory-maps this data rather than loading it into RAM, allowing the system to address petabytes of historical time-series while reading only the columns and partitions required for a query.

The architecture addresses failure scenarios. For example, if the RDB process exits unexpectedly (because of hardware faults, operating-system signals, or unbounded queries), it automatically recovers on restart.

The HDB scales horizontally to support hundreds of concurrent users through TCP socket sharding, a technique built into the q runtime. Because q's memory-mapped data is inherently read-only and shared across threads, the system requires no locking and performs no data copying. Increasing capacity is purely a configuration change, not a code change.

kdb+ tick is the clearest demonstration of what the q ecosystem was designed to enable. The language is not just a query tool bolted onto a database; it is a substrate from which entire systems can be composed. The 34 lines of code that implement kdb+ tick have processed more financial data than almost any other software system ever written, precisely because q eliminates everything that does not directly contribute to solving the problem at hand.
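The division of labor between the three processes can be modeled in a few lines of Python. This is a toy sketch of the data flow only, not the kdb+ tick code: the class names, the in-memory `log` list standing in for a tickerplant log, the `hdb` dictionary standing in for date-partitioned storage on disk, and the sample date are all assumptions made for illustration.

```python
from collections import defaultdict


class Tickerplant:
    """Toy publish-subscribe hub: records every message, then fans it out."""

    def __init__(self):
        self.log = []            # stand-in for a persistent message log
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, table, row):
        self.log.append((table, row))
        for callback in self.subscribers:
            callback(table, row)


class RealTimeDB:
    """Toy RDB: keeps today's rows in memory, one list per column."""

    def __init__(self):
        self.tables = defaultdict(lambda: defaultdict(list))

    def upd(self, table, row):
        for column, value in row.items():
            self.tables[table][column].append(value)

    def replay(self, log):
        # Rebuild in-memory state after a restart from the recorded messages
        for table, row in log:
            self.upd(table, row)

    def end_of_day(self, hdb, date):
        # Hand today's columns to the historical store, then start empty
        hdb[date] = {name: dict(cols) for name, cols in self.tables.items()}
        self.tables.clear()


tp = Tickerplant()
rdb = RealTimeDB()
tp.subscribe(rdb.upd)
tp.publish("trade", {"sym": "AAPL", "price": 101.5, "size": 200})
tp.publish("trade", {"sym": "GOOG", "price": 88.0, "size": 50})

# Simulate an RDB crash: a fresh instance rebuilds its state from the log
recovered = RealTimeDB()
recovered.replay(tp.log)

hdb = {}  # stand-in for date-partitioned storage on disk
recovered.end_of_day(hdb, "2025.01.02")
```

The publisher never talks to the RDB directly, which is the decoupling the tickerplant provides, and the recovered instance ends up with the same columns as the original because both consumed the same message sequence.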
Next steps ¶ next-steps

- Read Q for Mortals (q4m/index.html) if you prefer a book-style introduction with more detail.
- Explore other tutorials (tutorials and examples.html) to continue your learning journey.