There are only nine types in the language: a 32-bit integer, a 32-bit float (without denormals), a resizable array, a string (just a different variant of an array), a shared array (an array of fixed length and element size), a hash table, function references, weak references and native handles.
The array type is automatically compressed to use only unsigned bytes or shorts when the values don't exceed the ranges of these smaller types. This allows efficient storage of byte buffers and Unicode strings.
The float type reuses the mark used for array references by limiting valid array references to 23 bits (the size of the stored significand). If the integer bitwise representation of a (positive) float value falls in this range, it denotes a denormalized number. Since the utility of such numbers is often negative (many CPUs have a slow fallback or don't implement them at all) and the upsides are very limited, they're flushed to zero, which makes it possible to share floats with array references without collisions.
The language lacks direct support for 64-bit integers and floats. However, intrinsic functions are provided to support arithmetic with bigger numbers and 64-bit floats (doubles).
The boolean type uses zero for the false value and any non-zero value (including array references, as these have an integer value bigger than zero) for the true value. This is used in several statements when testing conditions.
Objects are defined and used by convention and are stored in arrays. Auto-incrementing constants are used to define the object fields as well as the size of the object. For example:
const { @OBJ_field1, @OBJ_field2, OBJ_foo, OBJ_bar, OBJ_SIZE };
The @ prefix is used to mark private fields. Extending of objects is also possible (given that the SIZE constant is not private):
const { SUBCLASS_some = OBJ_SIZE, SUBCLASS_more_fields, SUBCLASS_SIZE };
To create and extend instances of objects you can use the following functions:
function obj_create(foo, bar)
{
    var obj = object_create(OBJ_SIZE);
    obj->OBJ_foo = foo;
    obj->OBJ_bar = bar;
    return obj;
}

function subclass_create(foo, bar, some)
{
    var subclass = object_extend(obj_create(foo, bar), SUBCLASS_SIZE);
    subclass->OBJ_foo = null;
    subclass->SUBCLASS_some = some;
    return subclass;
}
Alternative syntax (using token processing):
class Object
{
    var @foo: Foo;
    var @bar: Bar;

    constructor create(foo: Foo, bar: Bar)
    {
        this.foo = foo;
        this.bar = bar;
    }
}

class Subclass: Object
{
    var @some;

    constructor create(foo: Foo, bar: Bar, some)
    {
        super::create(foo, bar);
        this.foo = null;
        this.some = some;
    }
}
To access fields you can use the -> operator, which is just nicer syntax for array access using a named constant:
obj->OBJ_foo = 5;
obj[OBJ_foo] = 5; // the same
There are standard operators for addition (+), subtraction (-), multiplication (*), division (/), remainder (%), bitwise AND (&), bitwise OR (|), bitwise XOR (^), bitwise negation (~), boolean negation (!), signed comparison (<, <=, >=, >), equivalency comparison (==, !=), signed shifts (<<, >>), unsigned shift (>>>), logical AND (&&) and logical OR (||).
You can also combine these with an assignment using the combined operators: +=, -=, *=, /=, %=, &=, |=, ^=, <<=, >>= and >>>=.
The logical AND and OR operators short-circuit the evaluation of their arguments.
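For example (a sketch using the object convention from above), short-circuiting makes a null guard safe:

```
if (obj != null && obj->OBJ_foo > 0) {
    // obj->OBJ_foo is evaluated only when obj is non-null
}
```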
Integer literals can be specified as decimal or hexadecimal numbers. Hexadecimal numbers are prefixed with 0x. You can also use a character literal, which simply yields the corresponding Unicode value as an integer. A character literal is a single character enclosed in single quotes ('); you can use several escape sequences to get all kinds of characters. You can also specify up to 4 characters (in LATIN1, having values 0-255); these are combined as the individual bytes of the resulting 32-bit integer, stored in little endian format.
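For example (a sketch; the multi-character form is convenient for binary format tags, with the first character ending up in the lowest byte):

```
var ch = 'A';      // 65
var tag = 'RIFF';  // same as 0x46464952 (bytes combined in little endian order)
```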
String literals are enclosed in double quotes ("). String literals are read-only; to make a writable string use string concatenation (for example: {"mutable string"}), which creates a new string instance.
This is the list of escape sequences for string and character literals:
Sequence | Meaning |
---|---|
\r | CR (13, 0x0D) |
\n | LF (10, 0x0A) |
\t | TAB (9, 0x09) |
\\ | backslash (92, 0x5C) |
\' | apostrophe (39, 0x27), not needed in string literals |
\" | quotes (34, 0x22), not needed in char literals |
\XX | 8-bit value (each X is a hex digit) |
\uXXXX | 16-bit Unicode code point, excluding surrogate pairs 0xD800..0xDFFF (each X is a hex digit) |
\UXXXXXX | 21-bit Unicode code point with a maximum of 0x10FFFF, excluding surrogate pairs 0xD800..0xDFFF (each X is a hex digit) |
The extended operator is wrapped in the { and } symbols. It has five forms:

- creation of an empty hash table: {}
- string concatenation, for example: {"value of PI: ", 3.141592654}
- float arithmetic (using the +, -, *, / operators), for example: {1.0 * 2.0}
- a hash table literal, for example: {"key": "value", 123: 456}
- a statement expression, for example: { var a = func(); =a*a }
The string concatenation takes the string representation of each element and concatenates them together.
The single-element form is often used to create a mutable instance of a constant string. For example:
{"this is now a mutable string"}
Precedence | Operators |
---|---|
unary | ~ ! + - ++ -- |
multiplicative | * / % |
additive | + - |
bitwise | << >> >>> & \| ^ |
comparison | < <= > >= == != === !== |
logical | && \|\| |
ternary | ?: |
assignment | = += -= *= /= %= &= \|= ^= <<= >>= >>>= |
The operands are processed from left to right for the whole expression (including assignments). The exception is the conditional operators (the ternary operator and the short-circuiting logical operators). The operators themselves are applied according to the precedence table.
This has an effect on expressions containing pre/post increments/decrements, inner assignments, statement expressions, function calls, etc.
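A small sketch of what the left-to-right rule implies for inner increments:

```
var i = 1;
var a = i++ + i; // left operand yields 1, then i becomes 2, right operand yields 2
                 // so a is 3 under the left-to-right evaluation described above
```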
There are three predefined constants, null (0), false (0) and true (1), to make the intent of the code clearer.
These functions are specially handled by the interpreter, using direct bytecodes for better performance. The float functions can also work with doubles (and 64-bit integers in the case of the float and int functions), with each such parameter taking two slots, low-order half first.
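As an illustration, a sketch of double arithmetic built from these intrinsics (assuming the lo/hi halves are returned as a pair in the two return slots, as with other two-value returns):

```
var (a_lo, a_hi) = fconv(1.5);   // widen a 32-bit float to a double
var (b_lo, b_hi) = fconv(2.25);
var (r_lo, r_hi) = fadd(a_lo, a_hi, b_lo, b_hi);
log(string_from_double(r_lo, r_hi)); // should log "3.75"
```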
length(value)
min(a, b)
max(a, b)
clamp(x, min, max)
abs(value)
add32(a, b)
add32(a, b, carry)
sub32(a, b)
sub32(a, b, borrow)
mul32(a, b)
add64(a_lo, a_hi, b_lo, b_hi)
sub64(a_lo, a_hi, b_lo, b_hi)
mul64(v1, v2)
umul64(v1, v2)
Unsigned variant of the mul64 function.
mul64(v1_lo, v1_hi, v2_lo, v2_hi)
div64(v1_lo, v1_hi, v2_lo, v2_hi)
udiv64(v1_lo, v1_hi, v2_lo, v2_hi)
Unsigned variant of the div64 function.
rem64(v1_lo, v1_hi, v2_lo, v2_hi)
urem64(v1_lo, v1_hi, v2_lo, v2_hi)
Unsigned variant of the rem64 function.
float(a)
float(lo, hi)
int(a)
int(lo, hi)
fconv(a)
fconv(lo, hi)
fadd(a_lo, a_hi, b)
fadd(a_lo, a_hi, b_lo, b_hi)
fsub(a_lo, a_hi, b)
fsub(a_lo, a_hi, b_lo, b_hi)
fmul(a_lo, a_hi, b)
fmul(a_lo, a_hi, b_lo, b_hi)
fdiv(a_lo, a_hi, b)
fdiv(a_lo, a_hi, b_lo, b_hi)
fcmp_lt(a_lo, a_hi, b)
fcmp_lt(a_lo, a_hi, b_lo, b_hi)
fcmp_le(a_lo, a_hi, b)
fcmp_le(a_lo, a_hi, b_lo, b_hi)
fcmp_gt(a_lo, a_hi, b)
fcmp_gt(a_lo, a_hi, b_lo, b_hi)
fcmp_ge(a_lo, a_hi, b)
fcmp_ge(a_lo, a_hi, b_lo, b_hi)
fcmp_eq(a_lo, a_hi, b)
fcmp_eq(a_lo, a_hi, b_lo, b_hi)
fcmp_ne(a_lo, a_hi, b)
fcmp_ne(a_lo, a_hi, b_lo, b_hi)
fabs(a)
fabs(lo, hi)
fmin(a, b)
fmin(a_lo, a_hi, b)
fmin(a_lo, a_hi, b_lo, b_hi)
fmax(a, b)
fmax(a_lo, a_hi, b)
fmax(a_lo, a_hi, b_lo, b_hi)
fclamp(x, min, max)
fclamp(x_lo, x_hi, min, max)
fclamp(x_lo, x_hi, min_lo, min_hi, max_lo, max_hi)
floor(a)
floor(lo, hi)
ifloor(a)
ifloor(lo, hi)
ceil(a)
ceil(lo, hi)
iceil(a)
iceil(lo, hi)
round(a)
round(lo, hi)
iround(a)
iround(lo, hi)
pow(a, b)
pow(a_lo, a_hi, b)
pow(a_lo, a_hi, b_lo, b_hi)
Returns a raised to the power of b.
sqrt(a)
sqrt(lo, hi)
cbrt(a)
cbrt(lo, hi)
exp(a)
exp(lo, hi)
Returns e raised to the power of a.
ln(a)
ln(lo, hi)
Returns the natural logarithm of a.
log2(a)
log2(lo, hi)
Returns the base-2 logarithm of a.
log10(a)
log10(lo, hi)
Returns the base-10 logarithm of a.
sin(a)
sin(lo, hi)
Returns the sine of a.
cos(a)
cos(lo, hi)
Returns the cosine of a.
asin(a)
asin(lo, hi)
Returns the arc sine of a.
acos(a)
acos(lo, hi)
Returns the arc cosine of a.
tan(a)
tan(lo, hi)
Returns the tangent of a.
atan(a)
atan(lo, hi)
Returns the arc tangent of a.
atan2(y, x)
atan2(y_lo, y_hi, x_lo, x_hi)
Returns the arc tangent of the y and x coordinates, using their signs to determine the quadrant of the result.
is_int(value)
is_float(value)
is_array(value)
is_string(value)
is_hash(value)
is_shared(value)
is_const(value)
is_funcref(value)
is_weakref(value)
is_handle(value)
clone(value)
clone_deep(value)
array_create(length)
array_create(length, element_size)
array_create_shared(length, element_size)
array_get_shared_count(array)
array_get_element_size(array)
array_set_length(array, length)
array_copy(dest, dest_off, src, src_off, count)
array_fill(array, value)
array_fill(array, off, count, value)
array_extract(array, off, count)
array_insert(array, off, value)
array_insert_array(dest, idx, src)
array_insert_array(dest, idx, src, off, len)
array_append(array, other)
array_append(array, other, off, count)
array_replace_range(dest, start, end, src)
array_replace_range(dest, start, end, src, off, len)
Replaces the range between start and end (exclusive) with the content of the given array.
array_remove(array, off)
array_remove(array, off, count)
array_clear(array)
string_const(s)
string_const(s, off, len)
string_parse_int(s)
string_parse_int(s, default_value)
string_parse_int(s, off, len)
string_parse_int(s, off, len, default_value)
string_parse_float(s)
string_parse_float(s, default_value)
string_parse_float(s, off, len)
string_parse_float(s, off, len, default_value)
string_parse_long(s)
string_parse_long(s, off, len)
string_parse_long(s, off, len, default_lo, default_hi)
Use the is_int function to distinguish between a valid value and an error.
string_parse_double(s)
string_parse_double(s, off, len)
string_parse_double(s, off, len, default_lo, default_hi)
Use the is_int function to distinguish between a valid value and an error.
string_from_long(lo, hi)
string_from_long(s, lo, hi)
string_from_double(lo, hi)
string_from_double(s, lo, hi)
string_from_utf8(arr)
string_from_utf8(arr, off, len)
string_from_utf8(s, arr)
string_from_utf8(s, arr, off, len)
string_to_utf8(s)
string_to_utf8(s, off, len)
string_to_utf8(arr, s)
string_to_utf8(arr, s, off, len)
object_create(size)
Same as the array_create function, except that its intent is the creation of objects.
object_extend(obj, size)
Same as the array_set_length function, except that its intent is the extension of objects and the array is returned.
weakref_create(obj)
weakref_create(obj, container)
weakref_create(obj, container, key)
weakref_get(ref)
funcref_call(func, params)
hash_get(hash, key, default_value)
hash_entry(hash, idx)
hash_contains(hash, key)
hash_remove(hash, key)
hash_keys(hash)
hash_values(hash)
hash_pairs(hash)
hash_clear(hash)
error(message)
(the stack trace entries have the form "func#1 (file.fix:123)").
It is permitted to pass another error as a message.
log(value)
dump(value)
Uses the log function for output.
to_string(value)
to_string(value, newlines)
heap_collect()
heap_size()
perf_reset()
perf_log(msg)
Logs the message (using the log function) with information about the elapsed time since the previous perf_log call and also since the performance debugging timer was reset.
serialize(value)
serialize(buf, value)
unserialize(buf)
unserialize(buf, off, len)
unserialize(buf, off_ref)
These functions are available only during token processing. They always return an error if called otherwise.
script_query(name, file_name, constants, locals, functions)
Each constant is reported with its name (with @ at the beginning if private) and a value that is either an integer, a float or a string; in case the constant references some other constant, the value is an array whose elements are: the value, the referenced script name and the constant name. The locals and functions are just arrays of the names. All of the output parameters are optional and the order of the values reflects the order in the source file (after being processed by the token processors).
script_line(line)
script_line(fname, tokens, src, line)
Returns the location as a string in the form "script.fix(123)".
This function is used for error reporting in token processors. It also correctly adjusts the file name and line based on the @stack_trace_lines constant in the tokens. The tokens of the currently processed script are used when not provided. Providing the file name is optional.
script_postprocess(func, data)
postprocess(data, fname, tokens, src)
script_compile(src)
script_compile(tokens, src)
tokens_parse(tokens, src, s, line)
tokens_parse(tokens, src, s, off, len, line)
token_parse_string(s)
token_parse_string(s, off, len)
token_escape_string(s)
Arrays have the ability to compress their element size to just 8-bit or 16-bit unsigned integers; this allows working with binary data and Unicode strings efficiently. As a side effect many arrays get compressed anyway, as their numbers generally tend to be near zero.
The compiler directly emits bytecode, skipping any intermediate forms such as an AST. The forward jump bytecodes use a fixed-length encoding to simplify the compiler. This allows jumps of up to 2048 bytecodes (which should suffice for most cases); when a bigger jump is encountered, the whole script is recompiled with long jumps.
The integer arithmetic operators work on the raw 32-bit integer value, even when it's a reference or a float. The result is always an integer. Similarly, float operators (used within the extended operator) interpret this 32-bit value as a float and always return a float. This also allows mixing integer operations with floats to work directly with the float bitwise representation.
The garbage collector is non-compacting, meaning the integer values of references stay the same. This can be used to create hash keys that are compared as references and not by value, simply by using an arithmetic operation that doesn't modify the integer value to get the raw reference index (for example by ORing with zero). However, you have to make sure that the original reference is still referenced somewhere. You can use weak references if you can't provide this guarantee, at the expense of allocating an extra object per key.
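A sketch of the ORing-with-zero trick (hypothetical names; keeping the original reference alive elsewhere is assumed):

```
var identity_map = {};
var key = some_obj | 0;       // an integer with the same raw value as the reference
identity_map{key} = "info";   // the key is now compared as an integer, i.e. by reference
```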
The general approach is to use multiple smaller heaps (one or more per thread) at natural sections of the application. This approach makes GC pauses quite localized and very quick, making them a non-issue. It's best to clone data between threads/heaps to communicate. You can use shared memory to avoid copying the actual data, with some minor adjustments needed in the code, e.g. for storing structured data.
Basically, you can use a shared array to store an array of objects simply by adjusting the syntax a bit: use shared_array[obj+OBJ_field] instead of the usual obj->OBJ_field. You also need to do your own pointer arithmetic, using obj as an integer offset and adding OBJ_SIZE when you want to go to the next element. You can store pointers to other objects as simple offsets into the shared array.
Having smaller heaps also allows collecting the garbage as needed without many downsides. For example, you can call the GC after processing a network request or after a spike of work; this also lets you avoid the obnoxious try/finally blocks otherwise needed to reclaim native handles in case of an error.
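A sketch of collecting at a natural boundary (handle_request and process are hypothetical):

```
function handle_request(request)
{
    var response = process(request); // do the spike of work
    heap_collect();                  // cheap here: the heap is small and local
    return response;
}
```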
The language supports arbitrary modification of the tokens before they're fed into the compiler. This allows adding new syntax, adjusting the existing one, or even changing it completely.
This is achieved by using the use keyword, which runs the specified script with the tokens to modify. Usage example:
use "foreach"; // at the top of the file, before any imports

foreach (var k, v in hash_table) {
    ...
}
The implementation script has a single function process_tokens(fname, tokens, src) that accepts the file name, a packed array of tokens (every entry taking multiple slots) and the original UTF-8 encoded source (the whole file). The tokens start right after the string literal in the use statement; this allows potentially passing parameters to the processing script.
The token types are these (these are final, no changes will be made to them):
const {
    @TOK_IDENT, @TOK_FUNC_REF, @TOK_NUMBER, @TOK_HEX_NUMBER, @TOK_FLOAT_NUMBER,
    @TOK_CHAR, @TOK_STRING, @TOK_UNKNOWN,

    @KW_DO, @KW_IF, @KW_FOR, @KW_USE, @KW_VAR, @KW_CASE, @KW_ELSE, @KW_BREAK,
    @KW_CONST, @KW_WHILE, @KW_IMPORT, @KW_RETURN, @KW_SWITCH, @KW_DEFAULT,
    @KW_CONTINUE, @KW_FUNCTION
};
The tokens structure has these members:
const { @TOK_type, @TOK_off, @TOK_len, @TOK_line, @TOK_SIZE };
These are the type, the offset into the source and the length of the token. It also contains the line number in the file for error reporting (used both at compile time and at runtime). The length of the tokens array is always divisible by 4 (the size of the structure). To add new tokens, the corresponding source code fragment can simply be appended to the source and referenced by the indices.
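A minimal sketch of walking the packed token array using the members defined above (logging every identifier):

```
function process_tokens(fname, tokens, src)
{
    for (var i=0; i<length(tokens); i+=TOK_SIZE) {
        if (tokens[i+TOK_type] == TOK_IDENT) {
            // recover the identifier text from the UTF-8 encoded source
            var name = string_from_utf8(src, tokens[i+TOK_off], tokens[i+TOK_len]);
            log({"identifier at line ", tokens[i+TOK_line], ": ", name});
        }
    }
}
```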
Supported symbols are directly represented by their ASCII value in the type. Multi-character symbols are represented as individual bytes in little endian format (e.g. >= is 0x3D3E) and map directly to multi-character literals, e.g. by simply using '>=' in the source code.
Invalid tokens are passed to the token processors with the TOK_UNKNOWN type, using the fewest characters that produce the same error. For unknown characters, the maximum run of consecutive such characters is put into a single TOK_UNKNOWN token. This allows processing arbitrary syntaxes even when they collide with the built-in syntax.
When manipulating the tokens, special care must be taken not to desynchronize symbols stored in the type from the source representation defined by TOK_off and TOK_len. Also, each token must have a unique offset into the source code; some token processors use this offset to uniquely identify the token even when it's moved within the tokens array.
Potential usages are very broad. However, remember that with great power comes great responsibility.
Good luck & have fun on your token processing voyage! :-)
Length | Symbols |
---|---|
1 character | ( ) { } [ ] , ; ~ : @ ? + - * / % & \| ^ = ! < > # $ . \ ` |
2 characters | += ++ -= -- -> *= /= %= &= && \|= \|\| ^= <= << >= >> == != .. |
3 characters | === !== <<= >>= >>> |
4 characters | >>>= |
When you convert tokens back to valid source code (e.g. for dumping the tokens as readable code) you may want to omit unneeded whitespace. You can omit the whitespace between any symbols whose concatenation doesn't produce another symbol; these are the combinations that require an extra whitespace:
First token | Second token |
---|---|
+ | + |
- | -> |
/ | = |
& | === |
= | == |
< | <<= |
<< | == |
== | = |
Token processors should use only built-in functions. They may use optional native functions provided by customized compilers, but should generally work without them.
The running environment can differ: it can be either the same heap as the other code (in the case of the interpreter), or a separate heap with possible incremental compilation (various tools and compilers). When incremental compilation is used, the heap used for token processing is serialized to disk so it can be resumed later. Not much is needed to support this, other than being prepared to process source files repeatedly.
Sometimes you need to provide extra metadata (e.g. class descriptions) so that different token processors (or just different versions) can work together. Usually you would use local variables to track such data (which has the benefit of not storing it in the final result in the case of static compilation), but these are limited to the particular version of the token processor.
The official way to do this is to use private constants with descriptive names that don't clash with normal constants. These can use various kinds of values, though string constants are the most useful as you can put a custom micro-syntax in them. Or, in case the data is complicated and generated anyway, you can just serialize the data into a string; it can then be directly unserialized because of how strings are implemented.
Remember that this metadata is part of the API and should therefore be designed for backward and forward compatibility.
It is preferred that you provide human readable syntax for your metadata. For example:
const @class_SomeClass = "prefix=some_class,static=create#2:create_other#3"; const @method_SomeClass_create_2 = "(Integer, Float): SomeClass";
The language implementation already implements two such definable metadata entries, used to adjust the stack traces of errors.
You can set custom function name in error stack traces. Example:
const @function_some_func_2 = "SomeFunc(int,float)";
const @function_hidden_0 = ""; // removes the function from the stack trace (use with caution!)

function some_func(a, b)
{
    ...
}

function hidden()
{
    // best used for some generated wrapper functions
    // that are unlikely to cause an error on their own
    // and would unpleasantly obscure the stack trace
}
You can use the @stack_trace_lines constant to set different file names for ranges of lines. You can also optionally insert virtual functions into the stack trace (useful for macros). The value is an array serialized into a string constant. The array contains multiple entries, each taking 5 slots. The slots are: start line, end line (inclusive), file name, line number and an optional name of an inserted virtual function.
The array is processed from beginning to end. The virtual functions are inserted at the top (so they appear in reverse order in the stack trace). The file name of the last matching range without a virtual function is used. This allows putting broader ranges before more concrete ranges.
const { @LINE_start, @LINE_end, @LINE_file_name, @LINE_line_num, @LINE_func_name, @LINE_SIZE };

function process_tokens(fname, tokens, src)
{
    var lines = [];
    ...
    lines[off+LINE_start] = 1000;
    lines[off+LINE_end] = 2000;
    lines[off+LINE_file_name] = "some_name.fix";
    lines[off+LINE_line_num] = 123;
    lines[off+LINE_func_name] = null;
    ...
    tokens_parse(tokens, src, {"const @stack_trace_lines = ", token_escape_string(serialize(lines)), ";"});
    ...
}
Use errors with this format for reporting any syntax errors: "script.fix(123): syntax error".
This is easily achieved using the built-in script_line function like this:
return 0, error({script_line(line), ": syntax error"});
This makes sure the error contains the proper file name (even when changed using the @stack_trace_lines private constant).
The serialization format has a fixed structure and won't be changed in the future. It is therefore suited for both temporary and long-term storage, as well as for exchanging data between different systems.
The format is binary; all numbers are stored in little endian byte order. There is no header, but application-specific usages may add one. A single value is encoded; in the case of arrays and hashes more values recursively follow. Arrays are encoded on their first use and assigned an internal ID starting from zero; whenever a reference to the same array is encountered again, only the ID is used. This allows serializing complete graphs of objects. An empty value is encoded simply as an integer with the value of zero or as an empty array (depending on the application).
Each value starts with a type and a length in a single byte. The type is in the low 4 bits and the length is in the high 4 bits. The length is present only for arrays, strings and hashes (it is an error for it to be present on other types). This allows putting lengths of up to 12 elements directly into the first byte. Otherwise the length is read as an additional unsigned byte (when the length field is 13), unsigned short (14) or 32-bit integer (15). The shortest representation must be used and it's an error to accept a bigger representation of a smaller length.
The floats are stored with denormals flushed to zero, and it's an error to accept values that have them present. Flushing is done on the bitwise integer representation of the float, setting bits 0-22 (inclusive, 23 bits in total) to zero in case all of the bits 23-30 (the rest, excluding the sign bit) are zero. Similarly, when deserializing, if the bitwise representation (with the sign bit masked away) is between 1 and 0x7FFFFF (inclusive), an error must be emitted as that would be a float in denormalized form.
Floats must have NaN (not-a-number) values normalized to a quiet NaN without any payload: if the exponent bits are all set (meaning it's an infinity or a NaN) and the low 23 bits are non-zero (making it a NaN), the low 23 bits are changed to have only the most significant bit set. It's an error to accept non-normalized NaNs.
The smallest array/string/ref/int/float form must always be used for storage (in the case of an empty array/string, the ARRAY_BYTE/STRING_BYTE type must be used). It's an error to accept bigger array/string/ref/int/float forms for values not requiring them. Hash tables must not contain duplicate keys, and it's an error to accept duplicate keys.
These restrictions are to ensure having a canonical format suitable for hash keys and exchange between different systems and implementations as well as to minimize covert channels for data leakage.
Here is a table of all types:
Type | ID | Description |
---|---|---|
ZERO | 0 | zero integer value |
BYTE | 1 | 8-bit unsigned integer |
SHORT | 2 | 16-bit unsigned integer |
INT | 3 | 32-bit signed integer |
FLOAT | 4 | 32-bit float (stored with the denormals flushed to zero and having normalized NaNs) |
FLOAT_ZERO | 5 | positive zero float value |
REF | 6 | reference to previously encountered array/string/hash (as 32-bit index) |
REF_SHORT | 7 | reference to previously encountered array/string/hash (as 16-bit index) |
ARRAY | 8 | array of values |
ARRAY_BYTE | 9 | array of unsigned 8-bit integers |
ARRAY_SHORT | 10 | array of unsigned 16-bit integers |
ARRAY_INT | 11 | array of signed 32-bit integers |
STRING_BYTE | 12 | string with each character stored as an unsigned 8-bit integer |
STRING_SHORT | 13 | string with each character stored as an unsigned 16-bit integer |
STRING_INT | 14 | string with each character stored as a signed 32-bit integer |
HASH | 15 | hash table with the insertion order preserved, contains given number of pairs of key and value |
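As a sketch of what the canonical rules above imply (byte values shown in comments are derived from the rules, not verified output):

```
var buf = serialize([1, 2, 300]);
// 300 doesn't fit in a byte, so the smallest valid form is ARRAY_SHORT (type 10)
// first byte: 10 | (3 << 4) = 0x3A  (type in the low 4 bits, length 3 in the high 4 bits)
// then three little endian unsigned shorts: 01 00 02 00 2C 01
```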
Sometimes you need to identify object types at runtime. Since there are no user types in the language, you have to (directly, or indirectly using a token processor) use a little trick. It uses the property of function references that they are uniquely identified even across different heaps. This allows using them as markers in data structures, even when the structures are deep cloned into another heap.
However, for ordinary cases it's better to use a type member in the base class. Runtime detection is more suitable for cases where you pass around unknown kinds of objects and still want to identify certain types beyond doubt.
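A sketch of the trick (hypothetical names; the function reference serves purely as a unique marker):

```
const { @SHAPE_marker, @SHAPE_size, SHAPE_SIZE };

function @shape_marker() {}

function shape_create(size)
{
    var shape = object_create(SHAPE_SIZE);
    shape->SHAPE_marker = shape_marker#0;
    shape->SHAPE_size = size;
    return shape;
}

function shape_is_instance(value)
{
    return is_array(value) && length(value) >= SHAPE_SIZE && value->SHAPE_marker === shape_marker#0;
}
```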
While you would usually use native handles to determine the liveness of objects between different threads (heaps), you can also use pure language constructs to achieve that.
Say you want to make some object available to a different heap as a reference instead of a copy. Shared arrays have the property that when cloned to another heap they retain the same reference within that heap if they were already cloned there before. Additionally, you can check how many heaps are referencing a shared array.
From these two features you can easily construct handles that you can pass around while determining global liveness. To create such a handle, just create a zero-sized shared array and associate the internal data with it using a hash table, then pass the shared array reference around. To obtain the internal data, just get it from the hash table. To determine that the reference is not used anymore by other heaps, traverse the hash table and check whether the number of active heaps is just one; then you know you can release the internal data and invalidate the handle. To make it less processor-intensive, iterate the hash table in small batches.
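A sketch of such handles (hypothetical helper names; assumes the owning heap keeps the hash table, and that hash_entry returns the key/value pair):

```
function handle_create(handles, data)
{
    var handle = array_create_shared(0, 4); // zero-sized shared array used as the reference
    handles{handle} = data;
    return handle;
}

function handle_get(handles, handle)
{
    return hash_get(handles, handle, null);
}

function handles_cleanup(handles, start, count)
{
    // iterate a small batch; release data for handles no other heap references anymore
    for (var i = start; i < start+count && i < length(handles); i++) {
        var (handle, data) = hash_entry(handles, i);
        if (array_get_shared_count(handle) == 1) {
            hash_remove(handles, handle);
            i--;
        }
    }
}
```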
Weak references are useful in various scenarios. In the basic form, a weak reference simply allows the target object to be garbage collected when no normal references point to it. When this occurs, the reference to the target is cleared from the weak reference and you can't obtain the original object anymore.
This is useful in cases where you register an object to receive some events (change listeners, timers, etc.) but the original object has become unreferenced in the meantime. Ordinarily you would need to manually deregister it from receiving the events, which would be painful to implement when it's usually not needed for anything else.
Instead, your listener uses a weak reference to the object, and once you detect the object is not around anymore you just deregister from the event. To avoid accumulation of many such listener proxies, the event source object can directly implement weak listeners.
In the more advanced form, you can set up an automatic action for when the target object disappears. You can provide a container (hash table or array) from which an entry is removed (for a hash) or to which an entry is added (for an array). With this you can create mappings to objects (to externally attach additional data to existing objects), implement self-clearing caches, or even generally detect object destruction to run arbitrary code (however, you need to periodically check for newly collected objects).
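A self-clearing cache can then be sketched like this (the weak reference removes its own entry from the hash when the target is collected):

```
function cache_put(cache, key, obj)
{
    cache{key} = weakref_create(obj, cache, key);
}

function cache_get(cache, key)
{
    var ref = hash_get(cache, key, null);
    return ref? weakref_get(ref) : null;
}
```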
In some rare cases you may need to jump out of nested functions and continue on another execution path. You can use the exception mechanism for that, but instead of creating an error you can pass a function reference as a marker. You can then uniquely detect it and choose a different execution path. This operation is fast, as no stack trace needs to be gathered.
The only caveat is that you must make sure that no code between the throwing and the catching wraps the function reference into an error.
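A sketch of the technique (hypothetical helpers; the marker function exists only to provide a unique reference):

```
function @stop_marker() {}

function @visit(node)
{
    if (node_matches(node)) {
        return 0, stop_marker#0; // "throw" the marker instead of an error
    }
    // ... recurse into children ...
    return 0;
}

function search(root)
{
    var (r, e) = visit(root);
    if (e) {
        if (e === stop_marker#0) {
            return true; // our marker: take the other execution path (no stack trace cost)
        }
        return 0, e; // a real error: propagate it
    }
    return false;
}
```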
Sometimes you end up in a situation where two scripts depend on each other. This is currently not supported in FixScript because all the possible solutions quickly lead to an overcomplicated and fragile mess, especially when used with token processors. Maybe a solution will be found in the future, but it seems unlikely. There are also some overcomplicated hacks possible using token processors, but these are also limited (for example, combining multiple complex token processors would be next to impossible).
The solution is to simply merge such scripts into one; after all, if they depend on each other so much, they should be one unit. Still, there are cases where putting it all together would create an unmaintainable mess. This can be solved using a token processor that simply includes the content of the other script, making it a single unit yet separated into different files.
Other times there is some dependency, but it is minimal. For such cases it's better to work around the problem by calling functions dynamically or by creating a third script file with the common parts.
An important property of objects is encapsulation of internal state from the outside. Often you need to store strings or other mutable objects, but simply storing the reference would expose the internal state to outside manipulation. Therefore you need to guard it by making a copy and storing that internally. And when returning the data back to the user you need to create another copy, so the internal state stays fully guarded.
While this works, it is clearly quite inefficient. The issue is mostly with strings; with other kinds of objects, storing by reference is usually the intended usage.
It is therefore recommended to use the string_const function to store a constant variant of the provided string. The function returns the same string when it's already a constant, and it maintains a set of existing constant strings so there is never more than one instance of the same constant string in the heap.
This way there is just one copy (or none, when the string was already constant) when storing, and there is no need to do anything when returning it to the user.
However, for objects intended to work with serialization there is another complication: once deserialized, the strings are no longer constant. You would either need to make all strings constant by recursively traversing the whole data structure, or, more efficiently, convert them in the getter just before returning them to the user.
The getter needs to call the string_const function and store the result into the object before returning it. At first it will convert the string, but only for those strings that are actually obtained, and when obtained again it will just return the same constant string.
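The getter can be sketched like this (OBJ_name is a hypothetical field):

```
function obj_get_name(obj)
{
    var name = obj->OBJ_name;
    if (!is_const(name)) {
        name = string_const(name);
        obj->OBJ_name = name; // store the constant form for subsequent calls
    }
    return name;
}
```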