Functions starting with m_ allocate a new object.
Functions starting with q_ are queries/predicates, and return a boolean.
Functions ending with R are either supposed to be called rarely, or the caller expects that a part of it happens rarely.
Functions ending with N are non-inlined versions of another one.
Functions ending with F are rarely invoked fallback parts of a function.
Functions ending with P take a pointer argument.
Functions ending with U return (or take) a non-owned object (U = "unincremented").
Functions ending with _c1 are monadic implementations, and those ending with _c2 are dyadic (see builtin implementations).
Variables starting with bi_ are builtins (primitives or special values).
Which arguments are consumed is usually described in a comment after the function or its prototype. Otherwise, check the source.
src/
  builtins/
    sfns.c       structural functions
    fns.c        other functions
    arithd.c     dyadic arithmetic functions
    arithm.c     monadic arithmetic functions (incl •math stuff)
    cmp.c        dyadic comparison functions
    sort.c       sort/grade/bins
    md1.c        1-modifiers
    md2.c        2-modifiers
    sysfn.c      •-definitions
    internal.c   •internal
  utils/         utilities included as needed
    file.h       file system operations
    hash.h       hashing things
    mut.h        copying multiple arrays into a single array
    talloc.h     temporary allocations (described more below)
    utf.h        UTF-8 things
  opt/           files which aren't needed for every build configuration
  gen/           generated files
  jit/           simple JIT compiler for x86-64
  core/          things included everywhere
  h.h            core CBQN definitions
  builtins.h     definitions of all built-in functions (excluding things defined by means of nfns.c)
  core.h         file imported everywhere that defines the base BQN model
  nfns.c         native functions for things that need to keep some state (e.g. •FLines needs to also hold the path it's relative to)
  load.c         loads the self-hosted compiler, runtime and formatter, initializes CBQN globals
  main.c         main function & commandline stuff
  ns.c           namespaces
  vm.c           virtual machine interpreter
c1, c2 in h.h - correspondingly monadically or dyadically invoke a function
evalBC in vm.c - VM bytecode interpreter
slash_c2 in builtins/sfns.c - implementation of 𝕨/𝕩
GC2i & GC2f invocations in builtins/arithd.c - dyadic pervasive builtins
See the fruntime items in load.c for more builtins (remove the leading bi_ & append _c1/_c2 to get the implementation function)
load_init in load.c - loads the BQN runtime & compiler
bqn_comp in load.c - execute BQN code from a string
BN(allocL) in opt/mm_buddyTemplate.h - fast path of buddy memory allocator; invoked from opt/mm_buddy.h
B represents any BQN object. It's a 64-bit NaN-boxed value; some of the NaN-boxed values, determined by the top 16 bits, are heap-allocated (i.e. the low 48 bits are a Value*), some aren't:
Type checks (all are safe to execute on any B object):
test | tag | description | heap-allocated |
---|---|---|---|
isF64(x) | F64_TAG | a number | no |
isChr(x) | C32_TAG | a character | no |
isAtm(x) | [many] | !isArr(x) | depends |
isVal(x) | [many] | heap-allocated | yes |
isFun(x) | FUN_TAG | a function | yes |
isMd1(x) | MD1_TAG | a 1-modifier | yes |
isMd2(x) | MD2_TAG | a 2-modifier | yes |
isMd(x) | [many] | any modifier | yes |
isCallable(x) | [many] | isFun or isMd | yes |
isArr(x) | ARR_TAG | an array type | yes |
isNsp(x) | NSP_TAG | a namespace | yes |
isObj(x) | OBJ_TAG | internal | yes |

There are also some extra types for variable slot references for the VM & whatever; see the *_TAG definitions in h.h.
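The NaN-boxing idea can be pictured with a small standalone sketch. Everything in it (the Box union, the sk_* helpers and the specific tag values) is hypothetical and only illustrates the scheme; CBQN's real tags are the *_TAG definitions in h.h. It assumes user-space pointers fit in the low 48 bits, and a real implementation also has to keep ordinary NaN results from colliding with the tag patterns.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical NaN-boxing sketch; these tag values are NOT CBQN's real ones.
typedef union { uint64_t u; double f; } Box;

// Tags occupy the top 16 bits: a quiet-NaN prefix (0x7FF8) plus a nonzero tag number.
#define SK_TAG(n)    ((uint64_t)(0x7FF8u | (n)) << 48)
#define SK_C32_TAG   SK_TAG(1)
#define SK_ARR_TAG   SK_TAG(2)
#define SK_TAG_MASK  0xFFFF000000000000ull
#define SK_PTR_MASK  0x0000FFFFFFFFFFFFull

static Box sk_m_f64(double d)   { Box b; b.f = d; return b; }
static Box sk_m_c32(uint32_t c) { Box b; b.u = SK_C32_TAG | c; return b; }
// Assumes the pointer fits in the low 48 bits (typical for x86-64 user space).
static Box sk_tag(void* p, uint64_t tag) {
  Box b; b.u = tag | ((uint64_t)(uintptr_t)p & SK_PTR_MASK); return b;
}

// Top 16 bits in 0x7FF9..0x7FFF means "tagged"; anything else is a plain double.
static int sk_isF64(Box b) { uint64_t top = b.u >> 48; return !(top >= 0x7FF9 && top <= 0x7FFF); }
static int sk_hasTag(Box b, uint64_t tag) { return (b.u & SK_TAG_MASK) == tag; }
static void* sk_untag(Box b) { return (void*)(uintptr_t)(b.u & SK_PTR_MASK); }
static uint32_t sk_o2cu(Box b) { return (uint32_t)b.u; }  // low 32 bits hold the codepoint
```

Numbers need no allocation at all, while heap objects only cost a mask to recover their pointer.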
Functions for converting/using basic types:
m_f64(x) // f64 → B
m_c32(x) // codepoint → B
m_i32(x) // i32 → B
m_usz(x) // usz → B
o2i(x) // B → i32, throw if impossible
o2i64(x) // B → i64, throw if impossible
o2u64(x) // B → u64, throw if impossible
o2s(x) // B → usz, throw if impossible
o2c(x) // B → c32, throw if impossible
o2f(x) // B → f64, throw if impossible
o2iu(x) // B → i32, assumes is valid (u ≡ unchecked)
o2i64u(x) // B → i64, assumes is valid
o2su(x) // B → usz, assumes is valid
o2cu(x) // B → c32, assumes is valid
o2fu(x) // B → f64, assumes is valid
o2b(x) // B → bool, throw if impossible
q_TYPE(x) // query if x is convertible to TYPE (see definitions in h.h)
q_N(x) // query if x is · (≡ bi_N)
noFill(x)      // query if x represents undefined fill (returned by getFill*; ≡ bi_noFill)
tag(x,*_TAG) // pointer → B
See src/h.h for more basics
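The difference between a checked conversion like o2i and an unchecked one like o2iu can be sketched on plain doubles. sk_o2i and sk_o2iu are hypothetical stand-ins: the checked one reports failure through its return value instead of throwing a BQN error, and neither works on B objects.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical checked conversion: succeeds only for integral values in i32 range.
// (CBQN's o2i throws an error instead of returning false.)
static bool sk_o2i(double d, int32_t* out) {
  if (!(d >= INT32_MIN && d <= INT32_MAX)) return false; // out of range, or NaN
  int32_t i = (int32_t)d;                                 // safe: d is in range
  if ((double)i != d) return false;                       // has a fractional part
  *out = i;
  return true;
}

// Hypothetical unchecked variant ("u"): the caller guarantees d is valid;
// an out-of-range d would be undefined behavior here.
static int32_t sk_o2iu(double d) { return (int32_t)d; }
```

The unchecked form is only appropriate once something else (e.g. a q_TYPE query) has already established validity.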
All heap-allocated objects have a type - t_i32arr, t_f64slice, t_funBl, t_temp, etc. The full list is at #define FOR_TYPE in src/h.h.
An object can be allocated with mm_alloc(sizeInBytes, t_something). The returned object starts with the structure of Value, so custom data must be after that. mm_free can be used to force-free an object regardless of its reference count.
A heap-allocated object of type B can be cast to a Value* with v(x), to an Arr* with a(x), or to a specific pointer type with c(Type,x).
TY(x) / PTY(x) gives you the type of an object (one of t_whatever), which is used to dynamically determine how to interpret an object. Note that the type is separate from the tag used for NaN-boxing.
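The "header first, custom data after" layout and dispatch on the dynamic type can be sketched as follows. SkValue, SkI32Arr, sk_alloc and sk_sum are hypothetical: CBQN's real Value and I32Arr carry more fields, plain malloc stands in for mm_alloc, and the switch stands in for whatever the real code does with TY(x).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical common header; CBQN's real Value has more fields.
typedef struct { int32_t refc; uint8_t type; } SkValue;
enum { sk_t_i32arr, sk_t_f64arr };

// Custom data comes after the header, so an SkI32Arr* can be viewed as an SkValue*.
typedef struct { SkValue v; size_t ia; int32_t data[]; } SkI32Arr;

// Stand-in for mm_alloc(sizeInBytes, type): allocate, then fill in the header.
static SkValue* sk_alloc(size_t bytes, uint8_t type) {
  SkValue* v = malloc(bytes);
  if (!v) abort();
  v->refc = 1;
  v->type = type;
  return v;
}

// Dispatch on the dynamic type to decide how to interpret the object.
static int64_t sk_sum(SkValue* x) {
  switch (x->type) {
    case sk_t_i32arr: {
      SkI32Arr* a = (SkI32Arr*)x;   // analogous to c(I32Arr, x)
      int64_t s = 0;
      for (size_t i = 0; i < a->ia; i++) s += a->data[i];
      return s;
    }
    default: abort();               // other types aren't handled in this sketch
  }
}
```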
The reference count of any B object can be incremented/decremented with inc(x)/dec(x), and any subtype of Value* can use ptr_inc(x)/ptr_dec(x). inc(x) and ptr_inc(x) will return the argument, so you can use it inline. dec(x) and ptr_dec(x) will return the object to the memory manager if the refcount as a result goes to zero.
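The inc/dec contract can be modelled on a bare counter. RcObj, rc_new, rc_inc and rc_dec are hypothetical names; free stands in for returning memory to the memory manager, and rc_freed only exists so the behaviour can be observed.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int refc; /* payload would follow */ } RcObj;

static int rc_freed = 0; // counts frees so the behaviour can be observed

static RcObj* rc_new(void) {
  RcObj* o = malloc(sizeof *o);
  if (!o) abort();
  o->refc = 1;           // the creator owns one reference
  return o;
}
// Like inc(x): bump the count and return the argument, so it can be used inline.
static RcObj* rc_inc(RcObj* o) { o->refc++; return o; }
// Like dec(x): drop one reference; free the object when it was the last one.
static void rc_dec(RcObj* o) {
  if (--o->refc == 0) { free(o); rc_freed++; }
}
```

Every rc_inc must eventually be balanced by an rc_dec; a missing one leaks, an extra one frees too early, which is the class of bug heapverify is meant to catch.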
Since reference counting is hard, there's make heapverify that verifies that any code executed does it right (and screams unreadable messages when it doesn't). After any changes, I'd suggest running test/mainCfgs.sh path/to/mlochbaum/BQN, which'll run a couple primary configurations, including said heapverify.
Temporary allocations can be made with utils/talloc.h:
#include "utils/talloc.h"
TALLOC(char, buf, 123); // allocate char* buf with 123 elements
// buf is now a regular char* and can be stored/passed around as needed
TREALLOC(buf, 456); // extend buf
TFREE(buf); // free buf
// if the size is guaranteed to be small enough, using VLAs is fine i guess
TSALLOC(i32, stack, 10); // allocate an i32 stack with 10 initially reserved items (initial reserve must be positive!)
TSADD(stack, 15); // add a single item
TSADD(stack, (i32[]){1,2,3}, 3); // add many items
usz sz = TSSIZE(stack); // get the current height of the stack
i32 item = stack[1]; // get the 2nd item
TSFREE(stack); // free the stack
// note that TSALLOC creates multiple local variables, and as such cannot be passed around to other functions
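For intuition about what a TSALLOC-style stack does, here is a standalone sketch of a growable i32 stack. The I32Stack type and st_* functions are hypothetical, use plain malloc/realloc rather than talloc.h, and grow by doubling; the real macros may work differently. Note that doubling from a capacity of zero would never grow, so this sketch also needs a positive initial reserve.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int32_t* p; size_t size, cap; } I32Stack;

static void st_init(I32Stack* s, size_t reserve) {
  assert(reserve > 0);                  // doubling below needs a nonzero start
  s->p = malloc(reserve * sizeof *s->p);
  if (!s->p) abort();
  s->size = 0;
  s->cap = reserve;
}
// Make room for `extra` more items, doubling the capacity as needed.
static void st_reserve(I32Stack* s, size_t extra) {
  if (s->size + extra <= s->cap) return;
  size_t cap = s->cap;
  while (cap < s->size + extra) cap *= 2;
  int32_t* np = realloc(s->p, cap * sizeof *np);
  if (!np) abort();
  s->p = np;
  s->cap = cap;
}
static void st_add(I32Stack* s, int32_t v) { st_reserve(s, 1); s->p[s->size++] = v; }
static void st_addn(I32Stack* s, const int32_t* v, size_t n) {
  st_reserve(s, n);
  memcpy(s->p + s->size, v, n * sizeof *v);
  s->size += n;
}
static void st_free(I32Stack* s) { free(s->p); s->p = NULL; s->size = s->cap = 0; }
```

Usage mirrors the macros above: st_init(&s, 10), st_add(&s, 15), st_addn(&s, (int32_t[]){1,2,3}, 3), s.size, s.p[1], st_free(&s).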
All virtual method accesses require that the argument is heap-allocated.
You can get a virtual function of a B object with TI(x, something). There's also TIv(x, something) for a pointer x instead. See #define FOR_TI in src/h.h for available functions.
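Conceptually, a virtual method lookup reads a per-type table of functions using the object's type field. The sketch below shows that idea with a hypothetical SkTypeInfo table indexed by a small type enum and a hypothetical SK_TIv macro; the real entries are the ones listed by FOR_TI, and the real TI/TIv may be implemented differently. It also shows why the argument must be heap-allocated: the lookup reads a type field that immediate values like numbers don't have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { sk_t_vec, sk_t_pair, SK_T_COUNT };   // hypothetical types
typedef struct { uint8_t type; } SkObj;     // every heap object carries its type
typedef size_t (*SkIaFn)(SkObj*);
typedef struct { SkIaFn ia; } SkTypeInfo;   // one "virtual" slot, for brevity

static size_t vec_ia(SkObj* x)  { (void)x; return 10; }
static size_t pair_ia(SkObj* x) { (void)x; return 2; }

// Per-type method table, indexed by the object's type.
static const SkTypeInfo sk_ti[SK_T_COUNT] = {
  [sk_t_vec]  = { vec_ia },
  [sk_t_pair] = { pair_ia },
};

// Analogue of TIv(x, field) for a pointer x.
#define SK_TIv(x, field) (sk_ti[(x)->type].field)
```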
Call a BQN function object with c1(f, x) or c2(f, w, x). A specific builtin can be called by looking up the appropriate name in src/builtins.h, adding the bi_ prefix, and invoking it with c1/c2. Note that these functions consume w and x, but leave the refcount of f untouched. (Which arguments are consumed is usually specified in a comment after either the function definition or prototype.)
Calling a modifier involves deriving it with m1_d/m2_d, using a regular c1/c2, and managing the refcounts of everything while at that.
The list of builtin functions is specified in the initial macros of src/builtins.h, where A/M/D are used for ambivalent/monadic/dyadic. Once added, bi_yourName will be available, and whichever of the following functions are required must be defined somewhere in the source:
// functions:
B yourName_c1(B t, B x);
B yourName_c2(B t, B w, B x);
// 1-modifiers:
B yourName_c1(Md1D* d, B x);
B yourName_c2(Md1D* d, B w, B x);
// 2-modifiers:
B yourName_c1(Md2D* d, B x);
B yourName_c2(Md2D* d, B w, B x);
For functions, in most cases, the t parameter (representing 𝕊/"this") is unused (it must be ignored for functions managed by builtins.h), but can be used for objects from nfns.h to store state with a function.
For modifiers, the d parameter stores the operands and the modifier itself. Use d->f for 𝔽, d->g for 𝔾, d->m1 for _𝕣, d->m2 for _𝕣_, and tag(d,FUN_TAG) for 𝕊.
The implementation should consume the w/x arguments, but not t/d.
// im - monadic inverse
// ix - 𝕩-inverse - w⊸𝔽⁼ 𝕩 aka 𝕨 F⁼ 𝕩
// iw - 𝕨-inverse - 𝔽⟜x⁼ w
// the calls for these must be in some `whatever_init()` function, and apply only to builtins specified in builtins.h
c(BFn,bi_someFunction)->im = someFunction_im; // set the monadic inverse; someFunction_im has the signature of a regular monadic call implementation
c(BFn,bi_someFunction)->ix = someFunction_ix; // etc
c(BFn,bi_someFunction)->iw = someFunction_iw;
c(BMd1,bi_some1mod)->ix = some1mod_ix;
c(BMd2,bi_some2mod)->im = some2mod_im; // you get the idea
// for new types, the appropriate virtual functions (fn_im/fn_is/fn_iw/fn_ix/m1_im/m1_iw/m1_ix/m2_im/m2_iw/m2_ix) can be set
There exist various macros to view the main metadata of an array:
operation | B x; | Value* x / Arr* x / etc | result type |
---|---|---|---|
get shape | SH(x) | PSH(x) | usz* |
get item amount (product of shape) | IA(x) | PIA(x) | usz |
get rank | RNK(x) | PRNK(x) | ur |
set rank | SRNK(x) | SPRNK(x) | N/A |
The shape pointer of a rank 0 or 1 array will point to the object's own ia field (the one read by IA(x)). Otherwise, it'll point inside a t_shape object (the a field of ShArr*).
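The rank 0/1 shape trick can be sketched with a hypothetical SkArr struct and sk_setShape helper: for ranks 0 and 1 the shape pointer simply aliases the ia field, so no shape object is needed, while higher ranks get a separately allocated shape (plain malloc standing in for a t_shape object). This is an illustration, not CBQN's actual layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical array header: item count, rank, and a shape pointer.
typedef struct { size_t ia; unsigned rank; size_t* sh; } SkArr;

static void sk_setShape(SkArr* a, unsigned rank, const size_t* dims) {
  size_t ia = 1;
  for (unsigned i = 0; i < rank; i++) ia *= dims[i];  // ia = product of shape
  a->ia = ia;
  a->rank = rank;
  if (rank <= 1) { a->sh = &a->ia; return; }  // rank 0/1: reuse the ia field
  a->sh = malloc(rank * sizeof *a->sh);        // stands in for a t_shape object
  if (!a->sh) abort();
  for (unsigned i = 0; i < rank; i++) a->sh[i] = dims[i];
}

static void sk_freeShape(SkArr* a) { if (a->rank > 1) free(a->sh); }
```

For a vector, sh[0] and ia are the same memory, so a length is only ever stored once; a rank 0 array has ia = 1 and never reads sh at all.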
Allocating an array:
i32* rp; B r = m_i32arrv(&rp, 123); // allocate a 123-element i32 vector
i32* rp; B r = m_i32arrc(&rp, x); // allocate an array with the same shape as x (x must be an array; x isn't consumed)
i32* rp; Arr* r = m_i32arrp(&rp, 123); // allocate a 123-element i32-array without allocating shape
// then do one of these:
arr_shVec(r); // set shape of r to a vector
usz* sh = arr_shAlloc(r, 4); // allocate a rank 4 shape; write to sh the individual items; sh will be NULL for ranks 0 and 1
arr_shCopy(r, x); // copy the shape object of x (doesn't consume x)
B result = taga(r);
// see stuff.h for m_shArr/arr_shSetI/arr_shSetU for ways to batch-assign a single shape object to multiple objects
u32* rp; B r = m_c32arrv(&rp, 10); // 10-char string
// etc for m_(i8|i16|i32|c8|c16|c32|f64)arr[vcp]
// arbitrary object arrays:
// initialized with all elements being 0.0s, which you can replace with `r.a[i]=val`, and get the result with `r.b`; simple, but may not be optimal
HArr_p r = m_harr0v(10); // new 10-item vector
HArr_p r = m_harr0c(10, x); // new 10-item array with the same shape as x
HArr_p r = m_harr0p(10); // new 10-item array without any set shape. Use the arr_shWhatever(r.c, …)
// safe known size array creation without preinitialization:
M_HARR(r, 123) // allocate a 123-item arbitrary object array
HARR_ADD(r, i, val); // write val to the next position in the array. The 'i' variable is just a hint, all calls must be consecutive either way
HARR_ADDA(r, val); // the above but without needing the useless 'i' parameter
// then do one of these to get the finished object:
B result = HARR_FV(r); // sets shape to a vector
B result = HARR_FC(r, x); // copies the shape of x, doesn't consume x
B result = HARR_FCD(r, x); // copies the shape of x and consumes it
usz* sh = HARR_FA(r, 4); // allocate shape for a rank 4 array. To get the result `B` object, do HARR_O(r).b later
// If at any point you want to free the object before finishing it, do HARR_ABANDON(r)
// If you're sure GC cannot happen (that includes no allocating) before all items in the array are set, you can use:
HArr_p r = m_harrUv(10); // 10-item vector
HArr_p r = m_harrUc(10, x); // 10-item array with the same shape as x
HArr_p r = m_harrUp(10); // 10-item array without any set shape. Use the arr_shWhatever(r.c, …)
// you can use withFill to add a fill element to a created array (or manually create a fillarr, see src/core/fillarr.h)
B r = m_c32vec(U"⟨1⋄2⋄3⟩", 7); // a constant string with unicode chars
B r = m_c32vec_0(U"⟨1⋄2⋄3⟩"); // ..or with implicit length
B r = m_c8vec("hello", 5); // a constant ASCII string
B r = m_c8vec_0("hello"); // ..or with implicit length
B r = utf8Decode("⟨1⋄2⋄3⟩", 15); // decode UTF-8 from a char* (the length is in bytes: ⟨, ⋄, ⋄ and ⟩ take 3 bytes each)
B r = utf8Decode_0("⟨1⋄2⋄3⟩"); // ..or with implicit length
#include "utils/utf.h"
u64 sz = utf8lenB(x); TALLOC(char, buf, sz+1); toUTF8(x, buf); buf[sz]=0; /*use buf as a C-string*/ TFREE(buf);
// src/utils/mut.h provides a way to build an array by copying parts of other arrays
// some functions for specific cases:
B r = m_unit(x); // equivalent to <𝕩
B r = m_hunit(x); // like the above, except no fill is set
B r = m_atomUnit(x); // if x is likely to be an atom, this is a better alternative to m_unit
B r = m_hVec1(a); // ⟨a⟩
B r = m_hVec2(a,b); // ⟨a,b⟩
B r = m_hVec3(a,b,c); // ⟨a,b,c⟩
B r = m_hVec4(a,b,c,d); // ⟨a,b,c,d⟩
B r = emptyHVec(); // an empty vector with no fill
B r = emptyIVec(); // an empty integer vector
B r = emptyCVec(); // an empty character vector
B r = emptySVec(); // an empty string vector
// rank can be manually set for an existing object with sprnk or srnk, but be careful to keep the shape object in sync!
Retrieving data from arrays:
// generic methods:
SGet(x) // initializes the getter for fast reads; the argument must be a variable name
B c = Get(x,n); // in a loop, reading the n-th item
SGetU(x)
B c = GetU(x,n); // alternatively, GetU can be used to not increment the result. Useful for temporary usage of the item
B c = IGet(x,n); // skip the initialize/call separation; don't use in loops
B c = IGetU(x,n);
// for specific array types:
if (TI(x,elType)==el_i32) { i32* xp = i32any_ptr(x); /* use xp */ } // for either t_i32arr or t_i32slice; for t_i32arr only, there's i32arr_ptr(x)
if (TI(x,elType)==el_c32) { u32* xp = c32any_ptr(x); /* use xp */ } // ↑
if (TI(x,elType)==el_f64) { f64* xp = f64any_ptr(x); /* use xp */ } // ↑
if (TY(x)==t_harr) { B* xp = harr_ptr(x); /* use xp */ }
if (TY(x)==t_harr || TY(x)==t_hslice) { B* xp = hany_ptr(x); /* use xp */ } // note that elType==el_B doesn't imply hany_ptr is safe!
if (TY(x)==t_fillarr) { B* xp = fillarr_ptr(x); /* use xp */ }
B* xp = arr_bptr(x); // will return NULL if the array isn't backed by contiguous B*-s
// functions to convert arrays to a specific type array: (all consume their argument)
I8Arr* a = toI8Arr(x); // convert x to an I8Arr instance (returns the argument if it already is)
I8Arr* a = cpyI8Arr(x); // get an I8Arr with reference count 1 with the same items
B a = toI8Any(x); // get an object which is a valid argument to i8any_ptr
// same logic applies for:
// toBitArr/toI8Arr/toI16Arr/toI32Arr/toF64Arr/toC8Arr/toC16Arr/toC32Arr/toHArr
// cpyBitArr/cpyI8Arr/cpyI16Arr/cpyI32Arr/cpyF64Arr/cpyC8Arr/cpyC16Arr/cpyC32Arr/cpyHArr
// toI8Any/toI16Any/toI32Any/toF64Any/toC8Any/toC16Any/toC32Any
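The point of splitting SGet from Get is to hoist the per-array type dispatch out of the loop. A hypothetical sketch of that pattern on plain C buffers: sk_getter picks an element reader once, and the loop then calls it without re-checking the element type. SkArrView, sk_getter and sk_sum are illustrative names, not CBQN's implementation of SGet/Get.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { sk_el_i8, sk_el_f64 } SkElType;
typedef struct { SkElType el; const void* data; size_t ia; } SkArrView;
typedef double (*SkGetFn)(const void* data, size_t i);

static double get_i8(const void* d, size_t i)  { return ((const int8_t*)d)[i]; }
static double get_f64(const void* d, size_t i) { return ((const double*)d)[i]; }

// Resolve the element reader once (the SGet step).
static SkGetFn sk_getter(const SkArrView* x) { return x->el == sk_el_i8 ? get_i8 : get_f64; }

static double sk_sum(const SkArrView* x) {
  SkGetFn get = sk_getter(x);               // one dispatch per array
  double s = 0;
  for (size_t i = 0; i < x->ia; i++) s += get(x->data, i); // no per-item type check
  return s;
}
```

For one-off reads, the equivalent of IGet(x,n) would just be sk_getter(x)(x->data, n).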
Throw an error with thrM("some message") or thr(some B object) or thrOOM(). What to do with active references at the time of the error is TBD when GC is actually completed, but you can just not worry about that for now.
A fancier message can be created with thrF(message, …) with printf-like (but different!!) varargs (source in do_fmt):
%i decimal i32 (also for i8/i16/ur)
%l decimal i64
%ui decimal u32 (also for u8/u16)
%ul decimal u64
%xi hex u32
%xl hex u64
%s decimal usz
%f f64
%p pointer
%c char
%S char* C-string consisting of ASCII
%U char* of UTF-8 data
%R a B object of a number or string (string is printed without quotes or escaping)
%H the shape of a B object
%B a B object, formatted by •Repr (be very very careful to not give a potentially large object, which'd lead to unreadably long messages!)
%% "%"
See #define CATCH in src/h.h for how to catch errors.
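Throwing and catching via non-local exits is commonly built on setjmp/longjmp. The following standalone sketch shows that general technique with hypothetical sk_thrM/sk_tryDiv functions and a single global handler pointer; it is not CBQN's CATCH macro, and it doesn't release any references that are live when the error is thrown.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

static jmp_buf* sk_handler = NULL;  // innermost active handler, if any
static const char* sk_msg = NULL;   // message of the most recent error

// Hypothetical analogue of thrM: record the message and jump to the handler.
static void sk_thrM(const char* msg) {
  sk_msg = msg;
  if (!sk_handler) abort();         // uncaught error: give up
  longjmp(*sk_handler, 1);
}

static int sk_checkedDiv(int a, int b) {
  if (b == 0) sk_thrM("division by zero");
  return a / b;
}

// Returns 1 and stores a/b in *out on success, or 0 if an error was thrown.
static int sk_tryDiv(int a, int b, int* out) {
  jmp_buf buf;
  jmp_buf* prev = sk_handler;       // not modified after setjmp, so still valid after longjmp
  if (setjmp(buf)) {                // second return: an error was thrown
    sk_handler = prev;              // restore the outer handler
    return 0;
  }
  sk_handler = &buf;
  *out = sk_checkedDiv(a, b);
  sk_handler = prev;
  return 1;
}
```

Saving and restoring the previous handler is what lets such catches nest.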
Use assert(predicate) for checks (in optimized builds they're replaced with if (!predicate) invoke_undefined_behavior(); so the predicate is still evaluated!!). UD; can be used to explicitly invoke undefined behavior (equivalent in behavior to assert(false);), which is useful for things like untaken default branches in switch statements.
There's also err("message") that (at least currently) is kept in optimized builds as-is, and always kills the process on being called.
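The assert/UD behaviour described above can be imitated as follows. SK_ASSERT, SK_UD and the SKETCH_DEBUG switch are hypothetical, and __builtin_unreachable is a GCC/Clang extension. The sketch shows that in the optimized variant the predicate still runs (the call counter goes up), while a false predicate would be undefined behavior rather than a reported failure.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifdef SKETCH_DEBUG
// debug builds: report and stop
#define SK_ASSERT(p) do { if (!(p)) { fprintf(stderr, "assertion failed: %s\n", #p); abort(); } } while (0)
#else
// optimized builds: the predicate is still evaluated; a false result is undefined behavior
#define SK_ASSERT(p) do { if (!(p)) __builtin_unreachable(); } while (0)
#endif
#define SK_UD __builtin_unreachable()

static int sk_calls = 0;
static int sk_positive(int x) { sk_calls++; return x > 0; }

static const char* sk_signName(int s) {
  switch (s) {
    case -1: return "negative";
    case 0:  return "zero";
    case 1:  return "positive";
    default: SK_UD;                 // caller promises s is -1, 0 or 1
  }
}
```

Since the compiler may assume the predicate holds, such checks can also help optimization; they're no substitute for err() when a condition must be checked at runtime.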
A couple functions for usage in GDB are defined:
void g_pst() // print a CBQN stacktrace; might not work if paused in the middle of stackframe manipulation, but it tries
void g_p(B x) // print x
void g_i(B x) // print •internal.Info x
void g_pv(Value* x) // g_p but for an untagged value
void g_iv(Value* x) // g_i but for an untagged value
Value* g_v(B x) // untag a value
Arr* g_a(B x) // untag a value to Arr*
B g_t (void* x) // tag pointer with OBJ_TAG
B g_ta(void* x) // tag pointer with ARR_TAG
B g_tf(void* x) // tag pointer with FUN_TAG
// invoke with "p g_p(whatever)"; for g_pst, you may need to do "p (void)g_pst()" in non-debug builds