The Ruby C API
Before You Start
For the greatest chance of success with this guide, I recommend being fairly comfortable with C and very comfortable with Ruby.
Using Ruby’s C API does not require any advanced C concepts; however, the API is huge and largely undocumented. After you start using it, you will likely find yourself delving through the Ruby source code at some point to figure out the behavior of some obscure function or macro. The Ruby source uses some fairly sophisticated C, so you should at least feel comfortable reading it.
You can think of the C API as being a big, clunky alternative to writing normal Ruby code. However, the simple, elegant patterns of Ruby can be pretty unintuitive once translated into the language of the API. Having a strong intuition for Ruby’s internal logic and the ideas behind its design will go a long way toward steering you to the correct API functions.
The Two Paths
The official Ruby interpreter is written in C. That means that everything you can do in Ruby, you can also do using function calls to Ruby’s C API. Why in the world would you do this? There are two good reasons:
- You’re writing some fancy application in C or C++ and you want some parts of your code to leverage the dynamic flexibility of Ruby. You can run the Ruby interpreter inside of your application and use the API to retrieve the results of Ruby code.
- You’re writing some fancy application in Ruby and you want some parts of your code to leverage the speed and power of C (or an existing C library). You can expose C code to Ruby using the API and compile a special library that Ruby can require.
You’ll need to structure your C code differently depending on your goal. If you want to embed the Ruby interpreter in C, read Running Ruby in C. If you want to require a compiled C library, read Running C in Ruby.
After you finish that, come back here to learn about the API.
Eval
The quick ‘n’ dirty way to run some Ruby code from C is to eval it. This is a good fallback if you can’t find an API function for something that you want to do.[1] rb_eval_string_protect() takes a C string of Ruby code and a pointer to an int state. It returns the result of the Ruby code and sets state to some nonzero value if any exception is raised. VALUE is the C data type for all Ruby objects, as explained in the next section.
If state is nonzero, the returned VALUE will represent nil and you should handle the exception. Alternatively, you can use rb_eval_string(), which doesn’t take a state argument and instead raises any exceptions normally. See Exceptions for how to handle both of these cases.
Unlike eval in Ruby, these functions evaluate the string in an isolated binding, like when you require something. So local variables in the string will not be accessible from elsewhere and vice versa.
However, like using eval in Ruby, using these functions is not a good practice. It’s inefficient since the parser is invoked, and it somewhat defeats the point of writing in C. If you just want to call some Ruby method, we’ll go over a better way to do that later on.
VALUE
Before we go any further, we need to understand VALUEs. Due to the danger of monkeying around inside the VM, the API never lets you directly access Ruby’s objects.[2] Instead, your C code will store and pass around pointers to Ruby objects (like how variables in Ruby contain pointers to objects). These pointers can be passed to various API functions and macros that will safely access and manipulate the Ruby objects. VALUE is the API-defined C type for these pointers.
Probably the most frequent question you’ll have is: “is this VALUE the right type?” There are a couple of macros for performing this test. RB_TYPE_P() returns whether the VALUE has the given type, and Check_Type() raises a TypeError if it doesn’t. Both take a T_ constant corresponding to the Ruby class you’re testing for, e.g. T_STRING, T_ARRAY, etc.
These tests work for subclasses too: if you’re testing for a subclass of Array use T_ARRAY, and if you’re testing for a subclass of Object use T_OBJECT.[3] That being said, these tests do not work like is_a?. Even though everything in Ruby is_a? Object, testing against T_OBJECT will only return true for objects for which there is no better fitting constant.
For certain classes, there are specialized macros that are a little more efficient than the previous ones, such as NIL_P(), FIXNUM_P(), SYMBOL_P(), and RB_FLOAT_TYPE_P().
If you want to handle a VALUE that could be one of a variety of types, the previous macros can be a little clumsy. In that case you can use the TYPE() macro to get the T_ constant and handle your logic in a switch statement.
Constants
Most of the standard Ruby constants have global VALUEs defined for them in the API, so you don’t need an API call to access them. Modules are prefixed with rb_m, e.g. rb_mKernel; classes are prefixed with rb_c, e.g. rb_cObject; subclasses of Exception are prefixed with rb_e, e.g. rb_eRuntimeError; and the standard IO streams are prefixed with rb_, e.g. rb_stderr. nil, false, and true are prefixed with Q, e.g. Qnil.[4] As a convenience, Qfalse is also false in C (0).
Translation
A few Ruby classes are analogous to C types. These classes will be your primary means of transferring data between C and Ruby.
Fixnum
Ruby’s Fixnum corresponds to C’s long. The FIX2LONG() macro gives you the long for a Fixnum. For smaller C types there’s FIX2UINT(), FIX2INT(), and FIX2SHORT(), but these will raise a RangeError if the number wouldn’t fit.
In the other direction, LONG2FIX() works for long and every smaller integer C type.[5]
Bignum
Ruby’s Bignum is for anything bigger than a Fixnum, so it works if you need to work with long long, for example. rb_big2ll() and rb_big2ull() will get you long long and unsigned long long from a Bignum (or raise a RangeError if appropriate).
See Numeric for the reverse direction.
Float
Ruby’s Float corresponds to C’s double. The RFLOAT_VALUE() macro gives you the double for a Float.
See Numeric for the reverse direction.
Numeric
There are a host of “NUM” macros that try to be more duck-typish about things. These will convert their C types to whatever Ruby Numeric subclass seems appropriate:
- INT2NUM() for int
- UINT2NUM() for unsigned int
- LONG2NUM() for long
- ULONG2NUM() for unsigned long
- LL2NUM() for long long
- ULL2NUM() for unsigned long long
- DBL2NUM() for double
And there are macros for the opposite direction, which will try to convert whatever Numeric to the desired C type. These will raise a RangeError if the value wouldn’t fit, or a TypeError if there is no implicit numeric conversion (so you can safely pass non-Numeric objects):
- NUM2CHR() for char (works for unsigned char too)
- NUM2SHORT() for short
- NUM2USHORT() for unsigned short
- NUM2INT() for int
- NUM2UINT() for unsigned int
- NUM2LONG() for long
- NUM2ULONG() for unsigned long
- NUM2LL() for long long
- NUM2ULL() for unsigned long long
- NUM2DBL() for double
A major gotcha with these is that none of the macros for converting to unsigned types raise an exception if you pass a negative value (surprisingly, this isn’t a bug). NUM2CHR() also has a couple of quirks: it will only raise a RangeError if the value is too big for an int, and when passed a string it returns the numeric value of the first character rather than raising a TypeError.
If you know that the conversion is safe, you should prefer the macros from the previous sections, as they skip some or all of these checks (FIX2LONG() and RFLOAT_VALUE(), for example, don’t check the type at all).
String
Ruby’s String kinda corresponds to C’s char*. The simplest macro is StringValueCStr(), which returns a null-terminated char* for a String. The problem here is that a Ruby String might contain nulls, in which case StringValueCStr() will raise an ArgumentError! Instead you can use the macros StringValuePtr() and RSTRING_LEN() to get a (possibly unterminated) char* and the string’s length as a long.
Conversely, if you have a null-terminated char*, you can use rb_str_new_cstr() to create a Ruby String. And if you want your String to contain nulls, use rb_str_new(), which takes a char* and the string’s length (as a long). The encodings of these strings will be ASCII-8BIT, which is often undesirable in Ruby. You can pass the string VALUE to rb_str_export_locale() to get a new VALUE with your locale’s encoding.[6]
If you want to build more complex strings, you can do so using the printf-like function rb_sprintf(). This accepts all of the usual conversion specifiers, but also accepts an API-defined specifier PRIsVALUE, which takes a corresponding VALUE argument. PRIsVALUE is a string macro, so you splice it into the format as "%"PRIsVALUE. This conversion specifier substitutes a string by sending the object to_s. You can substitute the result of inspect instead by adding the + flag.
This custom specifier should work for any printf-like function in the API. PRIsVALUE works by hijacking the i conversion specifier, so when printing an int you should use d to ensure that Ruby doesn’t think it’s actually a VALUE.
Symbol
The API defines a C type ID which corresponds to Ruby’s Symbol. Just like how Ruby passes around Symbols as method or variable names, many API calls that need a method or variable name use an ID. To convert between a Symbol and an ID use the SYM2ID() and ID2SYM() macros. Instead of a Symbol you may want to convert to/from a char* C string. To get an ID from a char* use rb_intern(), and for the reverse use rb_id2name().
Since many API functions require an ID and in many cases you will not have the appropriate ID at hand, the API also defines a slew of functions that instead take a char* and which do the rb_intern() call for you. Since these functions are often more readable and the overhead of the rb_intern() call is negligible, I have opted to use the char* versions of the API functions wherever possible in this guide. If you find yourself frequently using a certain C string in API calls, you may see some performance benefit by storing the ID and using the ID versions of the functions (though you’ll have to look these up yourself in the Ruby headers).
Send
This section contains API functions for directly calling Ruby methods. You should prefer these functions to rb_eval_string() and the like whenever possible. They are faster since they skip the parser, and they allow for some compile-time checks.
The easiest way to send a method to an object is rb_funcall(). For example, rb_funcall(str, rb_intern("upcase"), 0) is roughly equivalent to the Ruby code str.upcase. The first argument is the receiver. The next is the ID for the method name. The third argument is the number of method arguments, which is needed since rb_funcall() is a varargs function. Then come the actual method arguments.
Alternatively, you can use rb_funcallv(), where the fourth argument is a VALUE* pointing to a C array of arguments. This also has the variant rb_funcallv_public(), which is like public_send in Ruby.
Passing Blocks
If you want to pass a Proc as the block to a method, that’s easy. The function rb_funcall_with_block() is just like rb_funcallv() but with the proc on the end.
If you don’t have a proc for the block, you’ll need to define a certain kind of C function to represent the block. Then there’s a different variant of rb_funcallv(), called rb_block_call(), with a couple of extra arguments for the block: the block function and a VALUE that is passed along to it.
The last argument to rb_block_call() is helpful for passing in values from outside the block function’s scope; if you don’t need it, just pass Qnil. I also recommend against using the first argument to your block function unless you’re sure that only one value was yielded. You can always get all the arguments from argv, so why not play it safe?[7]
Builtins
Many of Ruby’s built-in classes have API functions defined for their most useful methods. Using them can save you from the verbosity of always using rb_funcall() and can provide more compile-time checks. There are far too many functions to list here, so I recommend checking them out in the header ruby/intern.h.
Functions are generally named like rb_(class)_(method) and take at least one VALUE argument (the receiver), e.g. rb_ary_pop() for Array#pop, rb_obj_dup() for Object#dup, etc.
Require
The API can also load some Ruby code from a script. The equivalent of require is rb_require(), which takes the feature name as a C string.
As with require, it could raise exceptions. Read the next section for how to handle them.
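There are also functions for load if you want to load a script multiple times: rb_load(), and rb_load_protect(), which reports exceptions through a state argument like rb_eval_string_protect(). Both take the path as a String VALUE.
Just like load in Ruby, these functions can wrap the loaded code in an anonymous module to protect the global namespace. Just pass a nonzero value for the second argument.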
Exceptions
Raise
To raise an exception, use rb_raise().
The first and second arguments are the exception class and message, like raise in Ruby. The big difference is that the message is a format string just like in rb_sprintf(), letting you more easily build a useful message.
You can also construct exception objects directly using rb_exc_new_cstr(), rb_exc_new(), and rb_exc_new_str(). All of these accept an exception class as their first argument, and then they work just like their string counterparts, constructing an exception using a null-terminated string, a non-null-terminated string (plus its length), and a String object, respectively. Then you can raise your exception object with rb_exc_raise().
Rescue
There are several ways to rescue exceptions using the API. All of them require the code you’re protecting to be in a function that takes and returns a single VALUE, i.e. one declared like VALUE func(VALUE arg).
Unless you wanted to rescue a function of exactly this type, you will probably need to make a wrapper function in this format that runs the desired code. The way to access a rescued exception is also independent of the way it is rescued: rb_errinfo() essentially gives you the VALUE of Ruby’s $! (which will be Qnil if no exception occurred). Unlike in Ruby, you must manually clear the exception after reading it, by calling rb_set_errinfo(Qnil).[8] Otherwise later API calls might read the old value and think another exception has occurred.
Next we will go over several ways of rescuing. You can use whichever you like, but I think the right choice is generally determined by how you’re using the API.
rb_rescue2
If you’re compiling a library to be loaded by Ruby, you have it easy. Any exceptions raised in the API can be rescued as usual in your Ruby code. If you want to rescue an exception in the API, you can use rb_rescue2(), which is similar to Ruby’s rescue.
The first two arguments are the function to protect and its argument; the next two are the function to call if an exception is raised and its argument. The rescue function receives two VALUEs: its own argument and the exception that was raised. rb_rescue2() is a varargs function, so after that comes a list of the exception classes you want to rescue. The last argument should always be 0 to indicate the end of the class list. Like rescue in Ruby, any exceptions not in this list will not be rescued. If you just want to rescue StandardError (like a blank rescue in Ruby), you can use rb_rescue(), which takes just the first four arguments of rb_rescue2().
The API does not provide an easy way to run different rescue code for different exception classes as Ruby does. You’ll need to rescue all the classes you want at once and use some kind of switch to handle them separately.
The API also does not directly provide an equivalent to Ruby’s else, i.e. code to run when no exception was raised. One way to do this is using the return value of rb_rescue2(). If no exception is raised, it returns the return value of the first (dangerous) function; otherwise it returns the return value of the second (rescue) function. By having these return, say, Qtrue and Qfalse, you can detect which case you are in.
rb_protect
If you’re embedding the Ruby interpreter in C, you need to be extremely careful when calling API functions that could raise exceptions: an uncaught exception will segfault the VM and kill your program. You could call rb_rescue2() with rb_eException, but there’s another approach for rescuing all exceptions: rb_protect().
Like rb_rescue2(), the first two arguments are for calling the function to protect; the third is a pointer to an int state. However, like rb_eval_string_protect(), if an exception is raised it returns Qnil and sets state to some nonzero value. If you want to re-raise the exception, pass state to rb_jump_tag() (this also works for the state from the other *_protect() functions).
Ensure
rb_ensure() is similar to rb_rescue() except that it doesn’t do anything about exceptions, and the second function is always called after the first. That may sound simple enough, but it means that if you want the usual begin; rescue; ensure; end structure as in Ruby, you’ll need another layer of wrapping.
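Say begin_func() calls rb_rescue() on dangerous_func() with rescue_func() as the rescue function, and begin_func() is itself passed to rb_ensure() along with ensure_func(). Like ensure in Ruby, the return value of ensure_func() is never used. If no exception occurs, rb_ensure() returns the value of begin_func(), which is the value of dangerous_func(). If an exception does occur, begin_func() returns the value of rescue_func(), so that is what rb_ensure() returns.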
Definitions, Declarations
So far we’ve been creating and modifying objects directly in the VM’s memory, but none of our API calls have had a visible effect within the Ruby code: a String made with rb_str_new_cstr() can only be accessed from C by default.
There are a few ways to make things visible to Ruby, but they all work the same general way: by defining some name that Ruby can access, e.g. a variable name, a method name, etc. A general warning, though: unlike Ruby, the API lets you give things invalid names. Ruby will raise a SyntaxError or NameError if you try to name a class foo (not a constant name) or an instance variable bar (no @), but the API will happily create them. The API handles this by not exposing invalid names to Ruby. Since that’s probably not what you want, double-check the names you choose!
Most of the API functions in this section correspond closely to metaprogramming methods in Ruby. When you’re trying to do something using the API, it can be helpful to think about how you would do it in Ruby using only metaprogramming method calls. For example, rather than class Foo; def bar; end; end, think Foo = Class.new; Foo.define_method(:bar) {}.
Global Variables
The simplest way to deal with globals is rb_gv_get() and rb_gv_set(), which take the global’s name as a C string.
If you’re frequently accessing Ruby’s globals, you can set up a VALUE which will be automatically synchronized with one by passing its address to rb_define_variable().
The VALUE should be initialized before you create the global in Ruby, and it should be global in C as well: you don’t want it to go out of scope while Ruby is using it! rb_define_hooked_variable() additionally takes getter and setter functions, and you can pass NULL for the getter/setter if you want to synchronize normally for that operation. Or you can throw out the C global entirely with rb_define_virtual_variable(), though of course the getter and setter must be defined in that case.
If you ever create a global VALUE in C which is not exposed to Ruby, you must tell the garbage collector about it with rb_gc_register_address() (or the equivalent rb_global_variable()) to prevent it from being prematurely cleaned up.
Class and Instance Variables
Getting/setting instance variables is similar to the simple way of accessing globals, using rb_iv_get() and rb_iv_set() (the name includes the @), but of course you need an object to get the variable from.
There isn’t an automatic way to synchronize instance variables like you can with globals.
To iterate over all instance variables, use rb_ivar_foreach().
For class variables, the functions are rb_cv_get() and rb_cv_set(), and of course the first argument should be a class object.
Constants
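Constants are defined similarly with rb_define_const(), but with the module to define them under. You remove a constant with rb_const_remove(), which takes the module and the constant’s ID. Getting a constant’s VALUE is a little nuanced. The API function you call depends on where you want Ruby to look if the constant is not defined in the module you specify. rb_const_get_at() only looks in that module. rb_const_get_from() also searches its ancestors (except Object). rb_const_get() searches Object as well, like a plain constant reference in Ruby. All three raise a NameError if the constant isn’t found, and rb_const_defined_at() and rb_const_defined() let you check first.
All of these API calls will get private constants too.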
Modules and Classes
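Defining modules is super easy: rb_define_module() takes the module’s name, and rb_define_module_under() also takes the enclosing module.
Classes work the same way (rb_define_class() and rb_define_class_under()), but they also need a superclass.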
Methods
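Here’s where it gets interesting. There are many kinds of API calls for defining methods, but before you use any of them you’ll need a C function that the method calls. The function must return a VALUE and have one VALUE argument for the receiver of the method. There are three ways you can define its other arguments:
- as a fixed number of VALUE parameters following the receiver;
- as an int argc and a C array VALUE *argv, both preceding the receiver;
- as a single Ruby Array holding all of the arguments, following the receiver.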
So really the API only lets you define two types of methods: ones that take a fixed number of arguments, and ones that slurp up all of their arguments. What about all of Ruby’s fancy argument features? Where are optional arguments, options hashes, blocks, and all the mixtures of those?
Parsing Arguments
Well, if you accept a variable number of arguments you could code all of that logic yourself in the method, and make it behave like it has a fancier method definition in Ruby. Thankfully, the API has a shortcut for doing exactly that. To use it, you should use the C array function definition; then you can pass argc and argv along to rb_scan_args(argc, argv, fmt, ...).
Here fmt is a format string describing how the method arguments would look in Ruby. The string can have at most 6 characters, where each character describes a different section of the arguments. The six sections and their corresponding characters are (in order):
- The number of leading mandatory arguments: a digit
- The number of optional arguments: a digit
- A splatted argument: *
- The number of trailing mandatory arguments: a digit
- Keyword arguments: :
- A block argument: &
Each section is optional, so you can leave out the characters for things you don’t need. Be aware that the parsing of the format string is greedy: 1* describes a method with one mandatory argument and a splat. If you want one optional argument and a splat you must specify 01*. Following the format string, you must pass a VALUE* for each Ruby argument. The number of pointers passed should equal the “total” of the six sections, though you can pass NULL for an argument you don’t care about. For example, the format string 21*& should have 5 VALUE*s passed (2 mandatory, 1 optional, 1 splatted, 1 block).
rb_scan_args() unpacks argv using the VALUE*s you pass it and will raise a fitting exception if the wrong number of arguments were passed. Optional arguments that weren’t passed are set to Qnil.
You can also use the return value of rb_scan_args() to determine how the function was called: it returns the number of arguments that were passed in Ruby.
Handling Blocks
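There are two ways to check if your C method has been called with a block: rb_block_given_p() returns nonzero if there is one, and rb_need_block() raises a LocalJumpError if there isn’t.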
There are two ways to capture the block as a proc. If you’re using rb_scan_args() for your method arguments, just include & in your format string to get it. If you aren’t using rb_scan_args(), there’s an API call equivalent to Proc.new which converts the method’s block to a proc:
VALUE block;
block = rb_block_proc();
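If you don’t want to capture the block, there are a few ways to yield to it. rb_yield() yields a single value, rb_yield_values() takes a count followed by the values as varargs, and rb_yield_splat() yields the elements of a Ruby Array as separate values. There’s also rb_yield_values2(), which is like rb_yield_values() but instead of varargs the second argument is a VALUE*.[9]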
Super
You might want to call super in your method; the API equivalent is rb_call_super(), which takes an argc and an argv.
Unlike super in Ruby, rb_call_super() will not implicitly pass along the method arguments if you give it no arguments. You must explicitly pass the correct argc and argv (it does automatically pass self). For that reason I recommend using the C array style of method definition if you want to use rb_call_super().
Definition
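Setting up the C function is the hard part; now it’s easy to define the method in Ruby. Every API call to create a method takes at least the method name (char*), a pointer to your C function, and an argc describing its arguments.
argc should be:
- For a fixed number of arguments, the number of arguments (not counting the receiver)
- For a variable number of arguments in a C array, -1
- For a variable number of arguments in a Ruby Array, -2
Everything is pretty self-explanatory from there. rb_define_method(), rb_define_private_method(), and rb_define_protected_method() take the class as their first argument. rb_define_singleton_method() takes the object whose singleton class gets the method. rb_define_global_function() defines a module function of Kernel, so it can be called from anywhere, like puts.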
There’s also a shortcut, rb_define_module_function(), for defining a method in a module and its singleton class. This is used a lot in Math, for example, letting you include Math to avoid typing Math. before every method call.
Other Stuff
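Some simple API functions for class/method definitions are rb_include_module(), rb_extend_object(), rb_define_alias(), rb_undef_method(), and rb_define_attr(). They roughly correspond to include, extend, alias, undef_method, and attr_reader/attr_writer/attr_accessor.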
Data
By now you should be able to create and manipulate Ruby classes using the API, but how can you create a Ruby class that encapsulates data from the C world? If your data can be naturally translated into VALUEs it’s easy: convert and assign to instance variables as usual. But what if your data have no Ruby analog (e.g. data structures defined by some C library)?
The API lets you encapsulate C data by creating a VALUE of the desired class and then storing a void* pointing to the C data inside the Ruby object. Then whenever you need access to the C data, you can unpack the pointer and cast it back to the correct type. But where does this encapsulation occur? Let’s answer that question with a question: what happens when you tell Ruby to create an object using new?
Before calling the instance method initialize that we know so well, new first calls the class method allocate to actually create the object. That is the method you’ll need to define if you want your objects to wrap C data; in C you register it with rb_define_alloc_func(). Consider a class Foo which wraps an int that can be set by initialize.
In most cases you’ll probably be wrapping something more complicated (like a struct), but the principles will be the same. After allocating the C data, we use the TypedData_Wrap_Struct()[10] macro to wrap the pointer in a VALUE. This wrapping takes three arguments: the class of the object (self, because we’re in a class method), a pointer to a struct, and the data pointer to be wrapped. The tricky part is the struct pointer. It points to an rb_data_type_t that provides additional information for internal use by Ruby:
- wrap_struct_name is a string used by Ruby to identify your type. It doesn’t really matter what it is as long as it’s sensible and unique.
- function is a struct containing several function pointers for use by the garbage collector:
  - dmark will be described later, but as long as your C data doesn’t point to any Ruby objects you don’t need it.
  - dfree will be called when your object is destroyed and should free all memory allocated by the object.
  - dsize is called by Ruby to check how much memory your object is taking up. It can be omitted, but it’s polite to include it.
- data can point to arbitrary data. Think of it as wrapping C data at a class level. Also not mandatory.
- flags lets you enable additional optimizations when your objects are garbage collected. As long as your dfree function doesn’t unlock the GVL (why would you do that???) you can safely set it to RUBY_TYPED_FREE_IMMEDIATELY for a slight performance improvement.
If you don’t set some of these members, you should zero them out so that Ruby doesn’t accidentally read garbage data. That’s why I used C99’s designated initializer syntax in the example above: any members you omit will be safely cleared by the compiler.
VALUEs that wrap C data will have type T_DATA with respect to the TYPE() macro. This helps ensure a clear separation between native Ruby objects and those wrapping C data.
Once you’ve done all of that work to wrap up the C data, getting it back out is easy: TypedData_Get_Struct() takes the object to unwrap, the C type of the underlying data, the same struct pointer as before, and the pointer to assign the data to. It raises a TypeError if the object doesn’t wrap data of that type.
This separation of allocation and initialization doesn’t jibe with RAII, so if you’re using C++ you will probably want to use placement new when wrapping data. If you’re having trouble splitting up allocation and initialization, you can just wrap your data in a struct and do the actual allocation in initialize.
In simple cases (like the previous example) you can make your code a little less verbose. If the function to free your data just calls free() as in the example, you can pass RUBY_DEFAULT_FREE for dfree and Ruby will free it for you (don’t use NULL unless you like memory leaks, since NULL means the data is never freed). Similarly, if your allocation is just a malloc() as in the example, the macro TypedData_Make_Struct() does the allocation for you and wraps it. We could shorten the previous example accordingly.
Marking
That dmark pointer in the type structure above is the pointer to your object’s “mark function”. This is so named because of the garbage collector’s “mark and sweep” algorithm. The basic idea behind mark and sweep is that when the garbage collector needs to free up memory, it performs two passes. The first (mark) pass iterates through every referenced Ruby object and marks it as active. The second (sweep) pass then iterates through every allocated Ruby object and frees the ones that haven’t been marked active.
This is relevant to wrapping C data because you might wrap a C struct which contains a Ruby VALUE, and the garbage collector is responsible for cleaning up that VALUE. Since the garbage collector is only aware of VALUEs referenced by Ruby (not by C pointers), it won’t be able to mark the referenced VALUE as active. The result is that as soon as the garbage collector needs to free up some memory, your C data is going to end up with a reference to a nonexistent Ruby object. Note that this kind of wrapping of Ruby data inside C data is a really bad idea, precisely because of this kind of issue. But if you really must…
Suppose we wrap a C struct which contains a VALUE. The mark function has the same signature as the free function, and all it has to do is call rb_gc_mark() on any VALUEs in the struct.
If your struct contains a pointer to a C array of VALUEs, you can instead use rb_gc_mark_locations(), which takes two arguments: the pointers to the start and end of the array (the end being equal to the starting pointer plus the array length).[11]
Threading
Ruby in C Threads
If you’re making a lot of API calls and running a lot of Ruby code from C, at some point you might catch yourself thinking, “I’m running all of these slow Ruby methods using the API. Maybe I can thread things to keep my code fast!” That’s a reasonable thought, but when you act on it keep in mind that the Ruby VM is not at all thread safe. Ideally, all of your API code should run in a single thread. If not, you’ll probably need to wrap every API call with a locked mutex to make sure that you never ever have multiple threads interacting with the API at the same time.
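If you just want to create normal Ruby Threads using the API (and don’t mind the GVL, as described in the next section), there’s an easy way to do that: rb_thread_create() takes a function returning a VALUE plus a void* argument to pass to it, and returns the new Thread.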
Other Thread functions are in ruby/intern.h (but there’s always rb_funcall() for everything else).
C in Ruby Threads
On the other hand, if you expose some heavy C code to Ruby with the API (if you’re writing an extension that wraps a C library, for example), you should spend some time thinking about a nasty thing called the global VM lock (GVL). Because most of the API is not thread-safe, the GVL locks down almost all Ruby code so that only a single Thread can run at a time. This is the reason why you’ll often hear people say that Thread does not allow true parallelism.
The VM also applies the GVL to any C code you expose to Ruby. That’s why you can use the API without worrying about it exploding when someone calls your C code from inside a Thread. The downside of this is that if your C code takes a while to run, you won’t see any performance benefit from calling it in a Thread, because it will block all other threads while it runs. But the GVL is only needed to protect API calls. If you have some C code that doesn’t use the API, you can tell the VM to release the GVL before running your code in a thread and to reacquire it when it completes, allowing for true parallelism. Locking and unlocking the GVL does carry a performance hit, so only resort to this if you notice that you’re having significant problems due to blocked threads.
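The code to do this is considered so fancy by the Ruby developers that you actually need to include another header, ruby/thread.h, to use it. The slightly simpler way to release the GVL is rb_thread_call_without_gvl(), whose first two arguments are a function taking and returning a void* and the void* to pass to it.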
Since the function that is run without the GVL gets and returns data using void*, you may want to define a struct for passing data via pointers.
If you unlock the GVL without an unblocking function (passing NULL for the last two arguments of rb_thread_call_without_gvl()), you will find that while your code does run in parallel, it can’t be interrupted (by signals, Thread.kill, etc.)! To allow for that, you must pass an unblocking function and its argument as those last two arguments.
The unblocking function is called in the event of an interrupt. To make it work, you will probably need to pass a pointer to both functions that can be used to communicate an interrupt from one to the other. The interrupted function should perform any necessary cleanup before returning early.
Alternatively, if the interrupted function doesn’t need to perform any special cleanup, you can use the built-in unblocking function RUBY_UBF_IO[12] (which ignores the unblocking argument). That simply forwards the interrupt to the running thread.[13]
If you go through all of that effort to release the GVL only to find that you need to make an API call from the code running without it, there’s a function, rb_thread_call_with_gvl(), that temporarily reacquires the GVL:
See Also
extension.rdoc
Ruby does have official API documentation. It’s a bit spotty and (in my opinion) makes some poor recommendations, but it covers certain topics more exhaustively than this guide. In many cases this is because I intentionally skipped something that I found either not useful or better documented elsewhere.
Headers
I think some of the handiest resources are the Ruby headers themselves. The full API (i.e. everything you get by including ruby.h) easily consists of a thousand functions, macros, constants, and globals, most of which have never been documented. However, most things are reasonably named, and you should be able to figure out what they do from the header. Almost everything you need should be in the headers ruby/ruby.h and ruby/intern.h. The former has all of the VM and metaprogramming functions; the latter has all of the functions for interacting with Ruby’s built-in classes.
There are also some headers not pulled in by ruby.h which you can include to get additional API functionality. Maybe one day I’ll write another section of this guide going over them:
- ruby/debug.h: (experimental) functions for profiling and tracing code
- ruby/encoding.h: functions for working with string encodings
- ruby/io.h: additional functions for Ruby’s IO class
- ruby/re.h: additional functions for Ruby’s Regexp class
- ruby/thread.h: functions for working with the GVL
- ruby/version.h: functions for version introspection. Do not use this as feature-detection code!
- ruby/vm.h: (experimental) functions for VM control
Source
If you find some function in the header that isn’t documented anywhere, your next stop should be the Ruby source code.
When reading through the source code, always keep the headers at hand: there are lots of really useful functions in the source that look like they should be part of the API but actually aren’t. In most cases there should be an API function elsewhere that wraps the call to the useful function.
Examples
Head over to the Examples page for short, compilable examples of the API in action.
Contribute
Now that you’ve finished reading my guide, did you notice something significant that I left out? Did I make some stupid mistake? Check out the source for this site on GitHub, where you can report issues, submit pull requests, and download all of the code examples.
Footnotes
1. There’s also rb_eval_string_wrap(), which should be useful, but is actually the same as rb_eval_string_protect() due to a bug.
2. That’s a blatant lie. The API definitely lets you mess around with the internal data structures of objects (look for things with names starting with capital R). But it’s generally not a good idea and not necessary.
3. Or use T_DATA if the object wraps a C pointer.
4. There’s also Qundef, representing an undefined value, but this has no Ruby equivalent and is rarely used. In fact, outside of those rare occasions, Qundef can segfault the VM if Ruby was expecting a normal VALUE.
5. There is a CHR2FIX() macro, but in my tests this sometimes gave unexpected results. LONG2FIX() should work.
6. I don’t know what the best way is to handle wchar_t. In my tests I had some success just treating them as chars, but I think that may have been a happy accident, and it could certainly fail on different platforms.
7. The documentation mentions rb_iter_break() and rb_iter_break_value() for breaking out of a block, but can’t you just return early? I can’t think of a use case for these.
8. The documentation states that “You have to clear the error info… when ignoring the caught exception” during rb_protect. But I can’t find any documentation of when it would be cleared for you; it seems like you always have to clear it.
9. And there’s rb_yield_block(), which takes two unused arguments and is never called by anything in Ruby. Odd.
10. The TypedData* macros are the preferred way to wrap data since Ruby 1.9.2. If you’re using an older version of Ruby, you can check out an older version of this guide on GitHub to see how it used to be done.
11. There’s also the enigmatically named rb_gc_mark_maybe(), but I’m not sure when it is needed.
12. You can also use RUBY_UBF_PROCESS, but this seems to be a leftover from deprecated code and has the exact same effect.
13. There is also the function rb_thread_call_without_gvl2(). The documentation in thread.c says that if it “detects interrupt, it returns immediately,” but I’m not sure what this means. If the unblocking function doesn’t kill the thread, it still waits for the thread to finish on its own before returning.