An Overview of Lua’s Design and Implementation

The Implementation of Lua

An Overview of Lua’s Design and Implementation

As mentioned in the introduction, the goals in our implementation of Lua are:
Simplicity : We seek the simplest language we can a ord and the simplest C code that implements this language. This implies a simple syntax with a small number of language constructs, not far from the tradition.
E ciency : We seek fast compilation and fast execution of Lua programs. This implies a fast, smart, one-pass compiler and a fast virtual machine.
Portability : We want Lua to run on as many platforms as possible. We want to be able to compile the Lua core unmodi ed everywhere and to run Lua pro-grams unmodi ed on every platform that has a suitable Lua interpreter. This implies a clean ANSI C implementation with special attention to portability issues, such as avoiding dark corners of C and its libraries, and ensuring that it also compiles cleanly as C++. We seek warning-free compilations.
Embeddability : Lua is an extension language; it is designed to provide scripting facilities to larger programs. This and the other goals imply the existence of a C API that is simple and powerful, but which relies mostly on built-in C types.
Low embedding cost : We want it to be easy to add Lua to an application without bloating it. This implies tight C code and a small Lua core, with extensions being added as user libraries.
These goals are somewhat con icting. For instance, Lua is frequently used as a data-description language, for storing and loading con guration les and some-times quite large databases (Lua programs with a few megabytes are not uncom-mon). This implies that we need a fast Lua compiler. On the other hand, we want Lua programs to execute fast. This implies a smart compiler, one that generates good code for the virtual machine. So, the implementation of the Lua compiler has to balance between these two requirements. However, the compiler cannot be too large; otherwise it would bloat the whole package. Currently the compiler accounts for approximately 30% of the size of the Lua core. For memory-limited applications, such as embedded systems, it is possible to embed Lua without the compiler. Lua programs are then precompiled o -line and loaded at run time by a tiny module (which is also fast because it loads binary les).
Lua uses a hand-written scanner and a hand-written recursive descent parser. Until version 3.0, Lua used a parser produced by yacc [13], which proved a valu-able tool when the language’s syntax was less stable. However, the hand-written parser is smaller, more e cient, more portable, and fully reentrant. It also pro-vides better error messages.
The Lua compiler uses no intermediate representation. It emits instructions for the virtual machine \on the y » as it parses a program. Nevertheless, it does perform some optimizations. For instance, it delays the generation of code for base expressions like variables and constants. When it parses such expressions, it generates no code; instead, it uses a simple structure to represent them. There-fore, it is very easy to check whether an operand for a given instruction is a constant or a local variable and use those values directly in the instruction, thus avoiding unnecessary and costly moves (see Section 3).
To be portable across many di erent C compilers and platforms, Lua can-not use several tricks commonly used by interpreters, such as direct threaded code [8, 16]. Instead, it uses a standard while{switch dispatch loop. Also, at places the C code seems unduly complicated, but the complication is there to en-sure portability. The portability of Lua’s implementation has increased steadily throughout the years, as Lua got compiled under many di erent C compilers in many di erent platforms (including several 64-bit platforms and some 16-bit platforms).
We consider that we have achieved our design and implementation goals. Lua is a very portable language: it runs on any machine with an ANSI C compiler, from embedded systems to mainframes. Lua is really lightweight: for instance, on Linux its stand-alone interpreter, complete with all standard libraries, takes less than 150 Kbytes; the core is less than 100 Kbytes. Lua is e cient: independent benchmarks [2, 4] show Lua as one of the fastest language implementations in the realm of scripting languages (i.e., interpreted and dynamically-typed languages). We also consider Lua a simple language, being syntactically similar to Pascal and semantically similar to Scheme, but this is subjective.

The Representation of Values

Lua is a dynamically-typed language: types are attached to values rather than to variables. Lua has eight basic types: nil, boolean, number, string, table, function, userdata, and thread. Nil is a marker type having only one value, also called nil. Boolean values are the usual true and false. Numbers are double-precision oating-point numbers, corresponding to the type double in C, but it is easy to compile Lua using float or long instead. (Several games consoles and smaller machines lack hardware support for double.) Strings are arrays of bytes with an explicit size, and so can contain arbitrary binary data, including embedded zeros. Tables are associative arrays, which can be indexed by any value (except nil) and can hold any value. Functions are either Lua functions or C functions written according to a protocol for interfacing with the Lua virtual machine. Userdata are essentially pointers to user memory blocks, and come in two avors: heavy, whose blocks are allocated by Lua and are subject to garbage collection, and light, whose blocks are allocated and freed by the user. Finally, threads represent coroutines. Values of all types are rst-class values: we can store them in global variables, local variables and table elds, pass them as arguments to functions, return them from functions, etc.
typedef struct {
int t;
Value v;
} TObject;
typedef union {
GCObject *gc;
void *p;
lua_Number n;
int b;
} Value;
Figure 1: Lua values are represented as tagged unions
Lua represents values as tagged unions, that is, as pairs (t; v), where t is an integer tag identifying the type of the value v, which is a union of C types implementing Lua types. Nil has a single value. Booleans and numbers are im-plemented as ‘unboxed’ values: v represents values of those types directly in the union. This implies that the union must have enough space for a double. Strings, tables, functions, threads, and userdata values are implemented by reference: v contains pointers to structures that implement those values. Those structures share a common head, which keeps information needed for garbage collection. The rest of the structure is speci c to each type.
Figure 1 shows a glimpse of the actual implementation of Lua values. TObject is the main structure in this implementation: it represents the tagged unions (t; v) described above. Value is the union that implements the values. Values of type nil are not explicitly represented in the union because the tag is enough to identify them. The eld n is used for numbers (by default, lua_Number is double). The eld b is used for booleans. The eld p is used for light userdata. The eld gc is used for the other values (strings, tables, functions, heavy userdata, and threads), which are those subject to garbage collection.
One consequence of using tagged unions to represent Lua values is that copying values is a little expensive: on a 32-bit machine with 64-bit doubles, the size of a TObject is 12 bytes (or 16 bytes, if doubles are aligned on 8-byte boundaries) and so copying a value requires copying 3 (or 4) machine words. However, it is di cult to implement a better representation for values in ANSI C. Several dynamically-typed languages (e.g., the original implementa-tion of Smalltalk80 [9]) use spare bits in each pointer to store the value’s type tag. This trick works in most machines because, due to alignment, the last two or three bits of a pointer are always zero, and therefore can be used for other pur-poses. However, this technique is neither portable nor implementable in ANSI C. The C standard does not even ensures that a pointer ts in any integral type and so there is no standard way to perform bit manipulation over pointers.
Another option to reduce the size of a value would be to keep the explicit tag, but to avoid putting a double in the union. For instance, all numbers could be represented as heap-allocated objects, just like strings. (Python uses this technique, except that it preallocates some small integer values.) However, that representation would make the language quite slow. Alternatively, integer values could be represented as unboxed values, directly inside the union, while oating-point values would go to the heap. That solution would greatly increase the complexity of the implementation of all arithmetic operations.
Like earlier interpreted languages, such as Snobol [11] and Icon [10], Lua in-ternalizes strings using a hash table: it keeps a single copy of each string with no duplications. Moreover, strings are immutable: once internalized, a string cannot be changed. Hash values for strings are computed by a simple expression that mixes bitwise and arithmetic operations, thus shu ing all bits involved. Hash values are saved when the string is internalized to allow fast string comparison and table indexing. The hash function does not look at all bytes of the string if the string is too long. This allows fast hashing of long strings. Avoiding loss of performance when handling long strings is important because they are com-mon in Lua. For instance, it is usual to process les in Lua by reading them completely into memory into a single long string.

Tables

Tables are the main | in fact, the only | data-structuring mechanism in Lua. Tables play a key role not only in the language but also in its implementation. E ort spent on a good implementation of tables is rewarded in the language because tables are used for several internal tasks, with no qualms about perfor-mance. This helps to keep the implementation small. Conversely, the absence of any other data-structuring mechanism places a pressure on an e cient imple-mentation of tables.
Tables in Lua are associative arrays, that is, they can be indexed by any value (except nil) and can hold values of any type. Moreover, they are dynamic in the sense that they may grow when data is added to them (by assigning a value to a hitherto non-existent eld) and shrink when data is removed from them (by assigning nil to a eld).
Unlike many other scripting languages, Lua does not have an array type. Arrays are represented as tables with integer keys. The use of tables for ar-rays bring bene ts to the language. The main one is simplicity: Lua does not need two di erent sets of operators to manipulate tables and arrays. Moreover, programmers do not have to choose between the two representations. The im-plementation of sparse arrays is trivial in Lua. In Perl, for instance, you can run out of memory if you try to run the program $a[1000000000]=1;, as it triggers the creation of an array with one billion elements. The equivalent Lua program, a={[1000000000]=1}, creates a table with a single entry.
Until Lua 4.0, tables were implemented strictly as hash tables: all pairs were explicitly stored. Lua 5.0 brought a new algorithm to optimize the use of tables as arrays: it optimizes pairs with integer keys by not storing the keys and storing the values in an actual array. More precisely, in Lua 5.0, tables are implemented as hybrid data structures: they contain a hash part and an array part. Figure 2 shows a possible con guration for a table with the pairs « x » ! 9:3, 1 ! 100, 2 ! 200, 3 ! 300. Note the array part on the right: it does not store the integer keys. This division is made only at a low implementation level; access to table elds is transparent, even to the virtual machine. Tables automatically and dy-namically adapt their two parts according to their contents: the array part tries to store the values corresponding to integer keys from 1 to some limit n. Values corresponding to non-integer keys or to integer keys outside the array range are stored in the hash part.
When a table needs to grow, Lua recomputes the sizes for its hash part and its array part. Either part may be empty. The computed size of the array part is the largest n such that at least half the slots between 1 and n are in use (to avoid wasting space with sparse arrays) and there is at least one used slot between n=2 + 1 and n (to avoid a size n when n=2 would do). After computing the new sizes, Lua creates the new parts and re-inserts the elements from the old parts into the new ones. As an example, suppose that a is an empty table; both its array part and hash part have size zero. If we execute a[1]=v, the table needs to grow to accommodate the new key. Lua will choose n = 1 for the size of the new array part (with a single entry 1 ! v). The hash part will remain empty.
This hybrid scheme has two advantages. First, access to values with integer keys is faster because no hashing is needed. Second, and more important, the array part takes roughly half the memory it would take if it were stored in the hash part, because the keys are implicit in the array part but explicit in the hash part. As a consequence, if a table is being used as an array, it performs as an array, as long as its integer keys are dense. Moreover, no memory or time penalty is paid for the hash part, because it does not even exist. The converse holds: if the table is being used as an associative array, and not as an array, then the array part is likely to be empty. These memory savings are important because it is common for a Lua program to create many small tables, for instance when tables are used to implement objects.
The hash part uses a mix of chained scatter table with Brent’s variation [3]. A main invariant of these tables is that if an element is not in its main position (i.e., the original position given by its hash value), then the colliding element is in its own main position. In other words, there are collisions only when two elements have the same main position (i.e., the same hash values for that table size). There are no secondary collisions. Because of that, the load factor of these tables can be 100% without performance penalties.

Functions and Closures

When Lua compiles a function it generates a prototype containing the vir-tual machine instructions for the function, its constant values (numbers, literal strings, etc.), and some debug information. At run time, whenever Lua executes a function…end expression, it creates a new closure. Each closure has a ref-erence to its corresponding prototype, a reference to its environment (a table wherein it looks for global variables), and an array of references to upvalues, which are used to access outer local variables.
The combination of lexical scoping with rst-class functions creates a well-known di culty for accessing outer local variables. Consider the example in Figure 3. When add2 is called, its body accesses the outer local variable x (func-tion parameters in Lua are local variables). However, by the time add2 is called, the function add that created add2 has already returned. If x was created in the stack, its stack slot would no longer exist.

Cours gratuitTélécharger le cours complet

Télécharger aussi :

Laisser un commentaire

Votre adresse e-mail ne sera pas publiée. Les champs obligatoires sont indiqués avec *