Before You Start

For the greatest chance of success with this guide, I recommend being fairly comfortable with C and very comfortable with Ruby.

Using Ruby’s C API does not require any advanced C concepts; however, the API is huge and largely undocumented. After you start using it, you will likely find yourself delving through the Ruby source code at some point to figure out the behavior of some obscure function or macro. The Ruby source uses some fairly sophisticated C, so you should at least feel comfortable reading it.

You can think of the C API as being a big, clunky alternative to writing normal Ruby code. However, the simple, elegant patterns of Ruby can be pretty unintuitive once translated into the language of the API. Having a strong intuition for Ruby’s internal logic and the ideas behind its design will go a long way toward steering you to the correct API functions.

The Two Paths

The official Ruby interpreter is written in C. That means that everything you can do in Ruby, you can also do using function calls to Ruby’s C API. Why in the world would you do this? There are two good reasons:

  1. You’re writing some fancy application in C or C++ and you want some parts of your code to leverage the dynamic flexibility of Ruby. You can run the Ruby interpreter inside of your application and use the API to retrieve the results of Ruby code.
  2. You’re writing some fancy application in Ruby and you want some parts of your code to leverage the speed and power of C (or an existing C library). You can expose C code to Ruby using the API and compile a special library that Ruby can require.

You’ll need to structure your C code differently depending on your goal. If you want to embed the Ruby interpreter in C, read Running Ruby in C. If you want to require a compiled C library, read Running C in Ruby. After you finish that, come back here to learn about the API.

Eval

The quick ‘n’ dirty way to run some Ruby code from C is to eval it:

int state;
VALUE result;
result = rb_eval_string_protect("puts 'Hello, world!'", &state);

if (state)
{
	/* handle exception */
}

This is a good fallback if you can’t find an API function for something that you want to do1. rb_eval_string_protect() returns the result of the Ruby code and sets state to some nonzero value if any exception is raised. VALUE is the C data type for all Ruby objects, as explained in the next section.

If state is nonzero, result will be a VALUE representing nil and you should handle the exception. Alternatively, you can use rb_eval_string() which doesn’t take a state argument and instead raises any exceptions normally. See Exceptions for how to handle both of these cases.

Unlike eval in Ruby, these functions evaluate the string in an isolated binding—like when you require something. So local variables in the string will not be accessible from elsewhere and vice versa.
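The contrast can be seen from the Ruby side. The sketch below is only an analogy: it compares Kernel#eval, which shares the caller's local variables, with loading a file, which does not. The method names and the temporary script are made up for illustration, and loading a file is assumed here to be a fair stand-in for the isolation described above.

```ruby
require "tempfile"

# Kernel#eval runs in the caller's binding, so the local is visible.
def eval_sees_local
  secret = 41
  eval("secret + 1")
end

# A loaded file gets its own scope, so the caller's local is not visible
# and the reference to secret raises NameError.
def loaded_file_sees_local?
  secret = 41
  Tempfile.create(["isolated", ".rb"]) do |file|
    file.write("secret + 1")
    file.flush
    load(file.path)
  end
  true
rescue NameError
  false
end
```

Here eval_sees_local returns 42, while loaded_file_sees_local? returns false.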

However, like using eval in Ruby, using these functions is not a good practice. It’s inefficient since the parser is invoked and it somewhat defeats the point of writing in C. If you just want to call some Ruby method, we’ll go over a better way to do that later on.

VALUE

Before we go any further, we need to understand VALUEs. Due to the danger of monkeying around inside the VM, the API never lets you directly access Ruby’s objects2. Instead, your C code will store and pass around pointers to Ruby objects (like how variables in Ruby contain pointers to objects). These pointers can be passed to various API functions and macros that will safely access and manipulate the Ruby objects. VALUE is the API-defined C type for these pointers.

Probably the most frequent question you’ll have is: “is this VALUE the right type?”. There are a couple macros for performing this test, and both take a T_ constant corresponding to the Ruby class you’re testing for e.g. T_STRING, T_ARRAY, etc.

RB_TYPE_P(obj, T_STRING);  /* return true if obj is a String */
Check_Type(obj, T_STRING); /* or, raise a TypeError unless obj is a String */

These tests work for subclasses too: if you’re testing for a subclass of Array use T_ARRAY, if you’re testing for a subclass of Object use T_OBJECT3. That being said, these tests do not work like is_a?; even though everything in Ruby is_a? Object, testing against T_OBJECT will only return true for objects for which there is no better fitting constant.

For certain classes, there are specialized macros that are a little more efficient than the previous:

FIXNUM_P(obj);        /* like RB_TYPE_P(obj, T_FIXNUM) */
RB_FLOAT_TYPE_P(obj); /* like RB_TYPE_P(obj, T_FLOAT) */
SYMBOL_P(obj);        /* like RB_TYPE_P(obj, T_SYMBOL) */
NIL_P(obj);           /* like RB_TYPE_P(obj, T_NIL) */
RTEST(obj);           /* return true if obj is "truthy" i.e. not nil or false */

If you want to handle a VALUE that could be one of a variety of types, the previous macros can be a little clumsy. In that case you can use the TYPE() macro to get the T_ constant and handle your logic in a switch:

VALUE obj;

switch (TYPE(obj))
{
	case T_NIL:
		/* handle NilClass */
		break;
	case T_FIXNUM:
		/* handle Fixnum */
		break;
	case T_STRING:
		/* handle String */
		break;
	/* ... */
}

Constants

Most of the standard Ruby constants have global VALUEs defined for them in the API so you don’t need an API call to access them. Modules are prefixed with rb_m e.g. rb_mKernel; classes are prefixed with rb_c e.g. rb_cObject; subclasses of Exception are prefixed with rb_e e.g. rb_eRuntimeError; and the standard IO streams are prefixed with rb_ e.g. rb_stderr. nil, false, and true are prefixed with Q e.g. Qnil.4 As a convenience, Qfalse is also false in C (0).

Translation

A few Ruby classes are analogous to C types. These classes will be your primary means of transferring data between C and Ruby.

Fixnum

Ruby’s Fixnum corresponds to C’s long. The FIX2LONG() macro gives you the long for a Fixnum. For smaller C types there’s FIX2UINT(), FIX2INT(), and FIX2SHORT(), but these will raise a RangeError if the number wouldn’t fit.

In the other direction, LONG2FIX() works for long and every smaller integer C type5.

Bignum

Ruby’s Bignum is for anything bigger than a Fixnum, so it works if you need to work with long long, for example. rb_big2ll() and rb_big2ull() will get you long long and unsigned long long from a Bignum (or raise a RangeError if appropriate).

See Numeric for the reverse direction.

Float

Ruby’s Float corresponds to C’s double. The RFLOAT_VALUE() macro gives you the double for a Float.

See Numeric for the reverse direction.

Numeric

There are a host of “NUM” macros that try to be more duck-typish about things. These will convert their C types to whatever Ruby Numeric subclass seems appropriate:

  • INT2NUM() for int
  • UINT2NUM() for unsigned int
  • LONG2NUM() for long
  • ULONG2NUM() for unsigned long
  • LL2NUM() for long long
  • ULL2NUM() for unsigned long long
  • DBL2NUM() for double

And there are macros for the opposite direction, which will try to convert whatever Numeric to the desired C type. These will raise a RangeError if the value wouldn’t fit or TypeError if there is no implicit numeric conversion (so you can safely pass non-Numeric objects).

  • NUM2CHR() for char (works for unsigned char too)
  • NUM2SHORT() for short
  • NUM2USHORT() for unsigned short
  • NUM2INT() for int
  • NUM2UINT() for unsigned int
  • NUM2LONG() for long
  • NUM2ULONG() for unsigned long
  • NUM2LL() for long long
  • NUM2ULL() for unsigned long long
  • NUM2DBL() for double

A major gotcha with these is that none of the macros for converting to unsigned types raise an exception if you pass a negative value (surprisingly this isn’t a bug). NUM2CHR() also has a couple quirks: it will only raise a RangeError if the value is too big for an int and when passed a string it returns the numeric value of the first character rather than raising a TypeError.

If you know that the conversion is safe, you should prefer the macros from the previous sections as they skip the range checks.

String

Ruby’s String kinda corresponds to C’s char*. The simplest macro is StringValueCStr() which returns a null-terminated char* for a String. The problem here is that a Ruby String might contain nulls - in which case StringValueCStr() will raise an ArgumentError! Instead you can use the macros StringValuePtr() and RSTRING_LEN() to get a (possibly unterminated) char* and the string’s length as a long.

Conversely, if you have a null-terminated char*, you can use rb_str_new_cstr() to create a Ruby String. And if you want your String to contain nulls, use rb_str_new() which takes a char* and the string’s length (as a long). The encodings of these strings will be ASCII-8BIT, which is often undesirable in Ruby. You can pass the string VALUE to rb_str_export_locale() to get a new VALUE with your locale’s encoding6.

If you want to build more complex strings, you can do so using the printf-like function rb_sprintf(). This accepts all of the usual conversion specifiers, but also accepts an API-defined specifier PRIsVALUE which takes a corresponding VALUE argument. This conversion specifier substitutes a string by sending the object to_s. You can substitute the result of inspect instead by adding the + flag.

VALUE x;
x = rb_str_new_cstr("Hello, world!");

VALUE str;
str = rb_sprintf("pi = %f. %"PRIsVALUE" inspected: %+"PRIsVALUE, M_PI, x, x);

/* pi = 3.141593. Hello, world! inspected: "Hello, world!" */

This custom specifier should work for any printf-like function in the API. PRIsVALUE works by hijacking the i conversion specifier, so when printing an int you should use d to ensure that Ruby doesn’t think it’s actually a VALUE.

Symbol

The API defines a C type ID which corresponds to Ruby’s Symbol. Just like how Ruby passes around Symbols as method or variable names, many API calls that need a method or variable name use an ID. To convert between a Symbol and an ID use the SYM2ID() and ID2SYM() macros. Instead of a Symbol you may want to convert to/from a char* C string. To get an ID from a char* use rb_intern() and for the reverse use rb_id2name().

Since many API functions require an ID and in many cases you will not have the appropriate ID at hand, the API also defines a slew of functions that instead take a char* and which do the rb_intern() call for you. Since these functions are often more readable and the overhead of the rb_intern() call is negligible, I have opted to use the char* versions of the API functions wherever possible in this guide. If you find yourself frequently using a certain C string in API calls, you may see some performance benefit by storing the ID and using the ID versions of the functions (though you’ll have to look these up yourself in the Ruby headers).

Send

This section contains API functions for directly calling Ruby methods. You should prefer these functions to rb_eval_string() and the like whenever possible. They are faster since they skip the parser and allow for some compile-time checks.

The easiest way to send an object a method looks like this:

VALUE obj;
VALUE result;

result = rb_funcall(obj, rb_intern("=="), 1, Qnil);

This is roughly equivalent to the Ruby code

result = obj.send(:==, nil)

The first argument is the receiver. The next is the ID for the method name. The third argument is the number of method arguments, which is needed since rb_funcall() is a varargs function. Then come the actual method arguments.

Alternatively, you can use rb_funcallv() where the fourth argument is a VALUE* pointing to a C array of arguments. This also has the variant rb_funcallv_public() which is like public_send in Ruby.
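In Ruby terms, the difference between the two variants can be sketched as follows. The class Vault and the helpers call_any and call_public are hypothetical names; they only mirror the send / public_send comparison made above and are not part of the API.

```ruby
# Hypothetical class with one public and one private method.
class Vault
  def open_door
    :opened
  end

  private

  def secret
    :hidden
  end
end

# Ruby-side analogue of rb_funcall()/rb_funcallv(): like send, visibility is ignored.
def call_any(obj, name, *args)
  obj.send(name, *args)
end

# Ruby-side analogue of rb_funcallv_public(): like public_send, visibility is respected.
def call_public(obj, name, *args)
  obj.public_send(name, *args)
end
```

With these definitions, call_any(Vault.new, :secret) returns :hidden, whereas call_public(Vault.new, :secret) raises NoMethodError.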

Passing Blocks

If you want to pass a Proc as the block to a method, that’s easy. The function is just like rb_funcallv() but with the proc on the end.

VALUE proc;

/* assuming proc is assigned a Proc from somewhere */
result = rb_funcall_with_block(obj, rb_intern("each"), 0, NULL, proc);

If you don’t have a proc for the block, you’ll need to define a certain kind of C function to represent the block. You then pass it to rb_block_call(), which works like rb_funcallv() but takes a couple of extra arguments for the block:

VALUE my_block(VALUE block_arg, VALUE data, int argc, VALUE* argv)
{
	/* block_arg will be the first yielded value */
	/* data will be the last argument you passed to rb_block_call */
	/* if multiple values are yielded, use argc/argv to access them */
}

void some_func()
{
	/* ... */

	VALUE obj;
	VALUE result;

	result = rb_block_call(obj, rb_intern("each"), 0, NULL, my_block, Qnil);

	/* ... */
}

The last argument to rb_block_call() is helpful for passing in values outside the block function’s scope, but in this example we don’t need it (thus nil). I also recommend against using the first argument to your block function unless you’re sure that only one value was yielded. You can always get all the arguments from argv, so why not play it safe?7
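The reason for that advice is easier to see in plain Ruby. In this sketch (method names are made up), a one-parameter block plays the role of block_arg and a splat block plays the role of argc/argv; it illustrates Ruby's block semantics and does not reproduce the C function itself.

```ruby
# Hypothetical method that yields two values at once.
def yield_pair
  yield :key, :value
end

# A block with a single parameter only sees the first yielded value,
# comparable to block_arg in the C block function.
def first_only
  yield_pair { |a| a }
end

# A block with a splat sees every yielded value,
# comparable to reading argc/argv in the C block function.
def all_args
  yield_pair { |*args| args }
end
```

first_only returns :key and silently drops :value, while all_args returns [:key, :value].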

Builtins

Many of Ruby’s built-in classes have API functions defined for their most useful methods. Using them can save you from the verbosity of always using rb_funcall() and can provide more compile-time checks. There are far too many functions to list here, so I recommend checking them out in the header ruby/intern.h.

Functions are generally named like rb_(class)_(method) and take at least one VALUE argument (the receiver). E.g. rb_ary_pop() for Array#pop, rb_obj_dup() for Object#dup, etc.

Require

The API can also load some Ruby code from a script. There’s an equivalent to require:

rb_require("foo");

/* or, using a Ruby String (first argument is ignored) */
rb_f_require(Qnil, rb_str_new_cstr("foo"));

As with require, these could raise exceptions. Read the next section for how to handle them.

There are also functions for load if you want to load a script multiple times:

VALUE script = rb_str_new_cstr("./foo.rb");

rb_load(script, 0);

/* or, handle exceptions like rb_eval_string_protect() does */
int state;
rb_load_protect(script, 0, &state);

if (state)
{
	/* got exception */
}

Just like load in Ruby, these functions can wrap the loaded code in an anonymous module to protect the global namespace. Just pass a nonzero value for the second argument.
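For reference, this is what the wrapping looks like with Ruby's own load, which takes the same kind of second argument. The helper load_snippet and the constant names are hypothetical; the scripts are written to temporary files only to keep the sketch self-contained.

```ruby
require "tempfile"

# Write source to a temporary script and load it, optionally wrapped
# in an anonymous module (the second argument to Kernel#load).
def load_snippet(source, wrap)
  Tempfile.create(["snippet", ".rb"]) do |file|
    file.write(source)
    file.flush
    load(file.path, wrap)
  end
end

load_snippet("PLAIN_CONST = 1", false)   # ends up in the global namespace
load_snippet("WRAPPED_CONST = 2", true)  # stays inside the anonymous module
```

After running this, PLAIN_CONST is defined at toplevel but WRAPPED_CONST is not.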

Exceptions

Raise

To raise an exception, use:

rb_raise(rb_eRuntimeError, "Error code %d", 404);

The first and second arguments are the exception class and message—like raise in Ruby. The big difference is that the message is a format string just like in rb_sprintf(), letting you more easily build a useful message.

You can also construct exception objects directly using rb_exc_new_cstr(), rb_exc_new(), and rb_exc_new_str(). All of these accept an exception class as their first argument and then work just like their string counterparts, constructing an exception from a null-terminated string, a non-null-terminated string plus its length, and a String object, respectively. Then you can raise your exception object with rb_exc_raise().

Rescue

There are several ways to rescue exceptions using the API. All of them require the code you’re protecting to be in a function that takes and returns a single VALUE.

VALUE dangerous_func(VALUE obj1)
{
	/* code that could raise an exception */
	return obj2;
}

Unless you wanted to rescue a function of exactly this type, you will probably need to make a wrapper function in this format that runs the desired code. The way to access a rescued exception is also independent of the way it is rescued:

VALUE exception = rb_errinfo(); /* get last exception */
rb_set_errinfo(Qnil);           /* clear last exception */

rb_errinfo() essentially gives you the VALUE of Ruby’s $! (which will be Qnil if no exception occurred). Unlike in Ruby, you must manually clear the exception after reading it8. Otherwise later API calls might read the old value and think another exception has occurred.

Next we will go over several methods of rescuing; you can use whichever you like, but I think that generally the right choice is determined by your use-case of the API.

rb_rescue2

If you’re compiling a library to be loaded by Ruby, you have it easy. Any exceptions raised in the API can be rescued as usual in your Ruby code. If you want to rescue an exception in the API, you can use rb_rescue2() which is similar to Ruby’s rescue.

VALUE rescue_func(VALUE obj1)
{
	/* handle exception */
	return obj2;
}

void some_function()
{
	/* ... */

	VALUE result;

	/* rescue TypeError and RangeError */
	result = rb_rescue2(dangerous_func, dangerous_arg, rescue_func, rescue_arg, rb_eTypeError, rb_eRangeError, 0);

	/* ... */
}

The first two arguments are the function to protect and its argument, the next two are the function to call if an exception is raised and its argument. rb_rescue2() is a varargs function, so after that comes a list of the exception classes you want to rescue. The last argument should always be 0 to indicate the end of the class list. Like rescue in Ruby, any exceptions not in this list will not be rescued. If you just want to rescue StandardError (like a blank rescue in Ruby), you can use rb_rescue() which takes just the first four arguments of rb_rescue2().

The API does not provide an easy way to run different rescue code for different exception classes as Ruby does. You’ll need to rescue all the classes you want at once and use some kind of switch to handle them separately.

The API also does not directly provide an equivalent to Ruby’s else i.e. code to run when no exception was raised. One way to do this is using the return value of rb_rescue2(). If no exception is raised, it returns the return value of the first (dangerous) function, otherwise the return value of the second (rescue) function. By having these return, say, Qtrue and Qfalse you can detect which case you are in.
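The return-value trick reads like this when written in Ruby. It is a sketch of the pattern only: guarded_parse is a hypothetical method, and Integer() merely serves as a convenient operation that raises different exception classes.

```ruby
# Integer() raises TypeError for nil and ArgumentError for malformed strings.
def guarded_parse(input)
  Integer(input)
  true                          # like the dangerous function returning Qtrue
rescue TypeError, RangeError    # like the class list given to rb_rescue2()
  false                         # like the rescue function returning Qfalse
end
```

guarded_parse("12") returns true and guarded_parse(nil) returns false, so the caller can tell the two cases apart. guarded_parse("abc") raises ArgumentError, which is not in the list and therefore is not rescued.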

rb_protect

If you’re embedding the Ruby interpreter in C, you need to be extremely careful when calling API functions that could raise exceptions: an uncaught exception will segfault the VM and kill your program. You could call rb_rescue2() with rb_eException, but there’s another approach for rescuing all exceptions:

void some_function()
{
	/* ... */

	VALUE result;

	int state;
	result = rb_protect(dangerous_func, dangerous_arg, &state);

	if (state)
	{
		/* handle exception */
	}
	else
	{
		/* no exception occurred */
	}

	/* ... */
}

Like rb_rescue2(), the first two arguments are for calling the function to protect. However, like rb_eval_string_protect(), if an exception is raised it returns Qnil and sets state to some nonzero value. If you want to re-raise the exception, pass state to rb_jump_tag() (this also works for the state from the other *_protect() functions).

Ensure

rb_ensure() is similar to rb_rescue() except that it doesn’t do anything about exceptions and the second function is always called after the first. That may sound simple enough, but that means that if you want the usual begin; rescue; ensure; end structure as in Ruby, you’ll need another layer of wrapping:

VALUE ensure_func(VALUE obj1)
{
	/* stuff to always run after dangerous_func */
	return obj2;
}

/* wrap rb_ensure so we can rescue an exception */
VALUE begin_func(VALUE dangerous_arg)
{
	return rb_ensure(dangerous_func, dangerous_arg, ensure_func, Qnil);
}

VALUE rescue_func(VALUE obj1)
{
	/* handle exception */
	return obj2;
}

void some_function()
{
	/* ... */

	VALUE result = rb_rescue(begin_func, dangerous_arg, rescue_func, rescue_arg);

	/* ... */
}

Like ensure in Ruby, the return value of ensure_func() is never used. If no exception occurs, rb_rescue() will return the value of begin_func() which returns the value of dangerous_func(). If an exception does occur, rb_rescue() returns the value of rescue_func().
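For comparison, here is the same nesting written in Ruby, with one method per C function. The names dangerous, begin_func and protected_call, as well as the EVENTS array, are stand-ins chosen for this sketch.

```ruby
EVENTS = [] # records each time the ensure step runs

# Stand-in for dangerous_func().
def dangerous(arg)
  raise ArgumentError, "bad input" if arg == :bad
  :ok
end

# Mirrors begin_func(): the ensure body always runs and its value is discarded.
def begin_func(arg)
  dangerous(arg)
ensure
  EVENTS << :cleanup
end

# Mirrors some_function(): the rescue wraps the ensure-protected call.
def protected_call(arg)
  begin_func(arg)
rescue StandardError
  :rescued
end
```

protected_call(:fine) returns :ok and protected_call(:bad) returns :rescued; in both cases one :cleanup entry is appended to EVENTS.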

Definitions, Declarations

So far we’ve been creating and modifying objects directly in the VM’s memory, but none of our API calls have had a visible effect within the Ruby code: a String made with rb_str_new_cstr() can only be accessed from C by default.

There are a few ways to make things visible to Ruby but they all work the same general way: by defining some name that Ruby can access e.g. a variable name, a method name, etc. A general warning though: unlike Ruby, the API lets you give things invalid names. Ruby will raise a SyntaxError or NameError if you try to name a class foo (not constant) or an instance variable bar (no @), but the API will happily create them. The API handles this by not exposing invalid names to Ruby. Since that’s probably not what you want, double check the names you choose!

Most of the API functions in this section correspond closely to metaprogramming methods in Ruby. When you’re trying to do something using the API, it can be helpful to think about how you would do it in Ruby using only metaprogramming method calls. For example, rather than class Foo; def bar; end; end, think Foo = Class.new; Foo.define_method(:bar) {}.
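As a slightly larger warm-up in that style, the following sketch builds a small class using method calls only. Greeter and its members are made-up names, and the API functions named in the comments are loose counterparts rather than exact equivalents. Calling define_method with an explicit receiver assumes Ruby 2.5 or later.

```ruby
Greeter = Class.new(Object)                       # cf. rb_define_class()
Greeter.const_set(:DEFAULT_NAME, "world")         # cf. rb_define_const()

# cf. rb_define_method()
Greeter.define_method(:greet) do |name = Greeter::DEFAULT_NAME|
  "Hello, #{name}!"
end

# cf. rb_define_singleton_method()
Greeter.define_singleton_method(:build) { new }
```

Greeter.build.greet returns "Hello, world!" and Greeter.new.greet("Ruby") returns "Hello, Ruby!".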

Global Variables

The simplest way to deal with globals is:

VALUE gv;

rb_gv_set("$x", gv);
gv = rb_gv_get("$x");

If you’re frequently accessing Ruby’s globals, you can set up a VALUE which will be automatically synchronized with one.

VALUE global;

VALUE global_getter(ID id)
{
	/* return some VALUE, probably based on global */
}

void global_setter(VALUE val, ID id)
{
	/* set global, probably based on val */
}

void some_func()
{
	/* ... */

	/* initialize global first! */

	/* $w can be changed freely in Ruby */
	rb_define_variable("$w", &global);

	/* assigning a new value to $x in Ruby will raise a NameError */
	rb_define_readonly_variable("$x", &global);

	/* $y can be changed freely in Ruby, but through the specified functions */
	rb_define_hooked_variable("$y", &global, global_getter, global_setter);

	/* same as previous, but there's no corresponding VALUE! */
	rb_define_virtual_variable("$z", global_getter, global_setter);

	/* ... */
}

The VALUE should be initialized before you create the global in Ruby and it should be global in C as well—you don’t want it to go out of scope while Ruby is using it! For rb_define_hooked_variable(), you can pass NULL for the getter/setter if you want to synchronize normally for that operation. Or you can throw out global entirely with rb_define_virtual_variable() though of course the getter and setter must be defined in that case.

If you ever create a global VALUE in C which is not exposed to Ruby, you must tell the garbage collector about it to prevent it from being prematurely cleaned up:

rb_global_variable(&global);

Class and Instance Variables

Getting/setting instance variables is similar to the simple way of accessing globals, but of course you need an object to get the variable from.

VALUE obj;
VALUE iv;

iv = rb_iv_get(obj, "@x");
rb_iv_set(obj, "@x", iv);

There isn’t an automatic way to synchronize instance variables like you can with globals.

To iterate over all instance variables, use rb_ivar_foreach.

For class variables, the methods are rb_cv_get() and rb_cv_set() and of course the first argument should be a class object.

Constants

Constants are defined similarly, but with the module to define them under:

rb_define_const(rb_mMath, "PI_ISH", DBL2NUM(3.14));

/* shortcut for defining under rb_cObject */
rb_define_global_const("PI_ISH", DBL2NUM(3.14));

You undefine a constant by setting it to Qundef. Getting a constant’s VALUE is a little nuanced. The API function you call depends on what you want to happen if the constant is not defined in the module you specify:

VALUE constant;

/* if not defined, call const_missing hook */
constant = rb_const_get_at(rb_mMath, rb_intern("PI_ISH"));

/* if not defined, look for it up the inheritance chain, call const_missing if still not found */
constant = rb_const_get(rb_mMath, rb_intern("PI_ISH"));

/* same as previous, but print a warning if the constant ends up coming from Object (i.e. toplevel) */
constant = rb_const_get_from(rb_mMath, rb_intern("PI_ISH"));

All of these API calls will get private constants too.
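The first two lookup behaviors have a close Ruby-side relative in Module#const_get and its optional inherit argument. The sketch below uses that as an analogy; the module, class and helper names are hypothetical, and treating const_get(name, false) as the counterpart of rb_const_get_at() is an assumption made for illustration.

```ruby
# LIMIT is defined in the module only.
module Defaults
  LIMIT = 10
end

class Settings
  include Defaults
end

# Search the ancestors too (comparable to rb_const_get()).
def lookup_inherited(mod, name)
  mod.const_get(name)
end

# Look only in mod itself (comparable to rb_const_get_at()).
def lookup_direct(mod, name)
  mod.const_get(name, false)
end
```

lookup_inherited(Settings, :LIMIT) returns 10, whereas lookup_direct(Settings, :LIMIT) ends in a NameError from the default const_missing hook.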

Modules and Classes

Defining modules is super easy.

VALUE mFoo;
VALUE mBar;

/* toplevel module Foo */
mFoo = rb_define_module("Foo");
/* nested module Foo::Bar */
mBar = rb_define_module_under(mFoo, "Bar");

Classes work the same way but they also need a superclass.

VALUE cFoo;
VALUE cBar;

/* toplevel class Foo < Object */
cFoo = rb_define_class("Foo", rb_cObject);
/* nested class Foo::Bar < Array */
cBar = rb_define_class_under(cFoo, "Bar", rb_cArray);

Methods

Here’s where it gets interesting. There are many kinds of API calls for defining methods, but before you use any of them you’ll need a C function that the method calls. The function must return a VALUE and have one VALUE argument for the receiver of the method. There are three ways you can define its other arguments:

/* normal mandatory args (can have up to 16 args not counting self) */
VALUE my_method(VALUE self, VALUE arg1, VALUE arg2)
{
	/* ... */
}

/* or, slurp all args into a Ruby Array */
VALUE my_method(VALUE self, VALUE args)
{
	/* ... */
}

/* or, pass all args as a C array */
VALUE my_method(int argc, VALUE* argv, VALUE self)
{
	/* ... */
}

So really the API only lets you define two types of methods: ones that take a fixed number of arguments, and ones that slurp up all of their arguments. What about all of Ruby’s fancy argument features? Where are optional arguments, options hashes, blocks, and all the mixtures of those?

Parsing Arguments

Well, if you accept a variable number of arguments you could code all of that logic yourself in the method, and make it behave like it has a fancier method definition in Ruby. Thankfully, the API has a shortcut for doing exactly that. To use it, you should use the C array function definition, then you can pass argc and argv along to:

int rb_scan_args(int argc, const VALUE* argv, const char* fmt, ...);

Here fmt is a format string describing how the method arguments would look in Ruby. The string can have at most 6 characters, where each character describes a different section of the arguments. The six sections and their corresponding characters are (in order):

  1. The number of leading mandatory arguments: a digit
  2. The number of optional arguments: a digit
  3. A splatted argument: *
  4. The number of trailing mandatory arguments: a digit
  5. Keyword arguments: :
  6. A block argument: &

Each section is optional, so you can leave out the characters for things you don’t need. Be aware that the parsing of the format string is greedy: 1* describes a method with one mandatory argument and a splat. If you want one optional argument and a splat you must specify 01*. Following the format string, you must pass a VALUE* for each Ruby argument. The number of pointers passed should equal the “total” of the six sections, though you can pass NULL for an argument you don’t care about. For example the format string 21*& should have 5 VALUE*s passed (2 mandatory, 1 optional, 1 splatted, 1 block).

rb_scan_args() unpacks argv using the VALUE*s you pass it and will raise a fitting exception if the wrong number of arguments were passed.

VALUE my_method(int argc, VALUE* argv, VALUE self)
{
	/*
	 * We want to define a method like
	 *
	 *     def my_method man1, opt1 = true, opt2 = false, *splat, man2, **opts, &blk
	 */

	VALUE man1, man2;
	VALUE opt1, opt2;
	VALUE splat;
	VALUE opts;
	VALUE blk;

	rb_scan_args(argc, argv, "12*1:&", &man1, &opt1, &opt2, &splat, &man2, &opts, &blk);

	/* you must manually set the default values for optional arguments */
	if (NIL_P(opt1)) opt1 = Qtrue;
	if (NIL_P(opt2)) opt2 = Qfalse;
	/* opts will be nil (rather than {}) if no keyword arguments were passed */
	if (NIL_P(opts)) opts = rb_hash_new();

	/* ... */
}

You can also use the return value of rb_scan_args() to determine how the function was called. It returns the number of arguments that were passed in Ruby.
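To check your expectations about which variable receives which argument, it can help to define the same signature in Ruby and experiment. The sketch below does that for the signature described by "12*1:&"; the returned Hash and its keys exist only for this illustration. Unlike the C version, Ruby fills in the defaults for the optional arguments itself.

```ruby
# The Ruby signature that the format string "12*1:&" describes.
def my_method(man1, opt1 = true, opt2 = false, *splat, man2, **opts, &blk)
  {
    man1: man1, opt1: opt1, opt2: opt2,
    splat: splat, man2: man2, opts: opts,
    block_given: !blk.nil?
  }
end
```

With two arguments, my_method(:a, :b) fills man1 and man2 and leaves the optional arguments at their defaults. With three, my_method(:a, :b, :c) assigns :b to opt1 and :c to man2. Only once both optional arguments are filled does the splat receive anything: my_method(:a, :b, :c, :d, :e) gives splat == [:d] and man2 == :e.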

Handling Blocks

There are two ways to check if your C method has been called with a block:

/* raise a LocalJumpError if we don't have a block */
rb_need_block();

/* or the softer approach */
if (rb_block_given_p())
{
	/* code to run when we have a block */
}

There are two ways to capture the block as a proc. If you’re using rb_scan_args() for your method arguments, just include & in your format string to get it. If you aren’t using rb_scan_args(), there’s an API call which converts the method’s block to a proc, just as calling Proc.new without a block did in Ruby before 3.0:

VALUE block;
block = rb_block_proc();

If you don’t want to capture the block, there are a few ways to yield to it:

VALUE result;

/* yield a value. To yield nothing, use Qundef */
result = rb_yield(Qundef);
/* yield several values */
result = rb_yield_values(3, Qtrue, Qfalse, Qnil);
/* splat a Ruby array and yield it */
result = rb_yield_splat(ary);

There’s also rb_yield_values2() which is like rb_yield_values() but instead of varargs the second argument is a VALUE*9.

Super

You might want to call super in your method.

VALUE rb_call_super(int argc, const VALUE *argv);

Unlike in Ruby, rb_call_super() will not implicitly pass along the method arguments to super if you give it no arguments. You must explicitly pass the correct argc and argv (it does automatically pass self). For that reason I recommend using the C array style of method definition if you want to use rb_call_super().
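In Ruby terms, rb_call_super() behaves like super written with explicit parentheses rather than bare super. The classes below are hypothetical and only demonstrate that difference on the Ruby side.

```ruby
class Base
  def greet(name = "nobody")
    "hi #{name}"
  end
end

class ImplicitSuper < Base
  def greet(name)
    super      # bare super forwards name automatically
  end
end

class ExplicitSuper < Base
  def greet(name)
    super()    # forwards nothing, comparable to rb_call_super(0, NULL)
  end
end
```

ImplicitSuper.new.greet("Ann") returns "hi Ann", while ExplicitSuper.new.greet("Ann") returns "hi nobody" because no argument reaches the superclass method.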

Definition

Setting up the C function is the hard part; now it’s easy to define the method in Ruby. Every API call to create a method takes at least the method name (char*), a pointer to your C function, and an argc describing its arguments. argc should be:

  1. For a fixed number of arguments, the number of arguments (not counting the receiver)
  2. For a variable number of arguments in a C array, -1
  3. For a variable number of arguments in a Ruby Array, -2

Everything is pretty self-explanatory from there:

/* the usual */
rb_define_method(klass, "my_method", my_method, argc);
/* or, like a toplevel def (by defining a method in Kernel) */
rb_define_global_function("my_method", my_method, argc);

/* or, with access control */
rb_define_private_method(klass, "my_method", my_method, argc);
rb_define_protected_method(klass, "my_method", my_method, argc);

/* or, in the singleton class */
rb_define_singleton_method(object, "my_method", my_method, argc);

There’s also a shortcut for defining a method in a module and its singleton class. This is used a lot in Math, for example, letting you include Math to avoid typing Math. before every method call.

rb_define_module_function(module, "my_method", my_method, argc);
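The Ruby counterpart of this shortcut is module_function. A small sketch with made-up names (Geometry, Shape) shows both ways of calling the resulting method:

```ruby
# module_function makes each following method available both as
# Geometry.double and as a private instance method for includers.
module Geometry
  module_function

  def double(x)
    x * 2
  end
end

class Shape
  include Geometry

  def doubled_width(width)
    double(width) # no "Geometry." prefix needed
  end
end
```

Geometry.double(2) returns 4 and Shape.new.doubled_width(3) returns 6. The included copy is private, so Shape.new.double(1) raises NoMethodError.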

Other Stuff

Some simple API functions for class/method definitions:

/* klass.include module */
rb_include_module(klass, module);

/* klass.prepend module */
rb_prepend_module(klass, module);

/* obj.extend module */
rb_extend_object(obj, module);

/* klass.class_eval { undef :method } */
rb_undef_method(klass, "method");

/* klass.class_eval { alias :meth2 :meth1 } */
rb_define_alias(klass, "meth2", "meth1");

/* klass.attr_reader :x */
rb_define_attr(klass, "x", 1, 0);
/* klass.attr_writer :x */
rb_define_attr(klass, "x", 0, 1);
/* klass.attr_accessor :x */
rb_define_attr(klass, "x", 1, 1);

/* obj.singleton_class # handy in combination with the other functions */
VALUE singleton;
singleton = rb_singleton_class(obj);

Data

By now you should be able to create and manipulate Ruby classes using the API, but how can you create a Ruby class that encapsulates data from the C world? If your data can be naturally translated into VALUEs it’s easy: convert and assign to instance variables as usual. But what if your data have no Ruby analog (e.g. data structures defined by some C library)?

The API lets you encapsulate C data by creating a VALUE of the desired class and then storing a void* pointing to the C data inside the Ruby object. Then whenever you need access to the C data, you can unpack the pointer and cast it back to the correct type. But where does this encapsulation occur? Let’s answer that question with a question: what happens when you tell Ruby to create an object using new? Basically this:

class Class
  def new *args, &blk
    obj = allocate
    obj.send(:initialize, *args, &blk) # initialize is private, hence send
    obj
  end

  def allocate
    # create and return an empty instance
  end
end

Before calling the instance method initialize that we know so well, new first calls the class method allocate to actually create the object. That is the method you’ll need to define if you want your objects to wrap C data. The following example creates a class Foo which wraps an int that can be set by initialize:

#include <stdlib.h>

void foo_free(void* data)
{
	free(data);
}

size_t foo_size(const void* data)
{
	return sizeof(int);
}

static const rb_data_type_t foo_type = {
	.wrap_struct_name = "foo",
	.function = {
		.dmark = NULL,
		.dfree = foo_free,
		.dsize = foo_size,
	},
	.data = NULL,
	.flags = RUBY_TYPED_FREE_IMMEDIATELY,
};

VALUE foo_alloc(VALUE self)
{
	/* allocate */
	int* data = malloc(sizeof(int));

	/* wrap */
	return TypedData_Wrap_Struct(self, &foo_type, data);
}

VALUE foo_m_initialize(VALUE self, VALUE val)
{
	int* data;
	/* unwrap */
	TypedData_Get_Struct(self, int, &foo_type, data);

	*data = NUM2INT(val);

	return self;
}

void some_func()
{
	/* ... */

	VALUE cFoo = rb_define_class("Foo", rb_cObject);

	rb_define_alloc_func(cFoo, foo_alloc);
	rb_define_method(cFoo, "initialize", foo_m_initialize, 1);

	/* ... */
}

In most cases you’ll probably be wrapping something more complicated (like a struct), but the principles will be the same. After allocating the C data, we use the TypedData_Wrap_Struct() macro [10] to wrap the pointer in a VALUE. The macro takes three arguments: the class of the object (self because we’re in a class method), a pointer to an rb_data_type_t struct, and the data pointer to be wrapped. The tricky part is the struct pointer; it provides additional information for internal use by Ruby:

  • wrap_struct_name is a string used by Ruby to identify your type. It doesn’t really matter what it is as long as it’s sensible and unique
  • function is a struct containing several function pointers for use by the garbage collector:
      • dmark will be described later, but as long as your C data doesn’t point to any Ruby objects you don’t need it
      • dfree will be called when your object is destroyed and should free all memory allocated by the object
      • dsize is called by Ruby to check how much memory your object is taking up. It can be omitted, but it’s polite to include it
  • data can point to arbitrary data. Think of it as wrapping C data at a class level. Also not mandatory
  • flags lets you enable additional optimizations when your objects are garbage collected. As long as your dfree function doesn’t unlock the GVL (why would you do that???) you can safely set it to RUBY_TYPED_FREE_IMMEDIATELY for a slight performance improvement

If you don’t set some of these members, you should zero them out so that Ruby doesn’t accidentally read garbage data. That’s why I used C99’s designated initializer syntax in the example above: any members you omit will be safely cleared by the compiler.
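To see that zeroing in action without pulling in Ruby at all, here is a small sketch. The struct demo_type below is a hypothetical stand-in that only mimics the members discussed above (it is not the real rb_data_type_t), and demo_free and check_omitted_members are names I made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with the same shape as the members used above.
   This is NOT Ruby's real rb_data_type_t. */
struct demo_type {
	const char* wrap_struct_name;
	struct {
		void (*dmark)(void*);
		void (*dfree)(void*);
		size_t (*dsize)(const void*);
	} function;
	void* data;
	int flags;
};

static void demo_free(void* data)
{
	(void)data; /* nothing to release in this sketch */
}

/* only two members are named; everything else is zero-initialized */
static const struct demo_type demo = {
	.wrap_struct_name = "demo",
	.function = {
		.dfree = demo_free,
	},
};

/* every member we left out reads back as NULL or 0 */
void check_omitted_members(void)
{
	assert(demo.function.dmark == NULL);
	assert(demo.function.dsize == NULL);
	assert(demo.data == NULL);
	assert(demo.flags == 0);
}
```

The same holds for nested members: naming .dfree inside .function is enough to clear dmark and dsize next to it.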

VALUEs that wrap C data will have type T_DATA with respect to the TYPE() macro. This helps ensure a clear separation between native Ruby objects and those wrapping C data.

Once you’ve done all of that work to wrap up the C data, getting it back out is easy: TypedData_Get_Struct() takes the object to unwrap, the C type of the underlying data, the same struct pointer as before, and the pointer to assign the data to.

This separation of allocation and initialization doesn’t jibe with RAII, so if you’re using C++ you will probably want to use placement new when wrapping data. If you’re having trouble splitting up allocation and initialization, you can instead wrap a small struct that holds a pointer to your data and do the actual allocation in initialize.
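Here is a rough sketch of that last idea in plain C, with the Ruby glue left out so it compiles on its own. The struct foo_wrapper and the three functions are hypothetical names: think of foo_wrapper_alloc as the body of your allocate function, foo_wrapper_init as the body of initialize, and foo_wrapper_free as your dfree.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper: this is what the VALUE would wrap.
   The real resource lives behind the pointer. */
struct foo_wrapper {
	int* payload; /* NULL until initialize has run */
};

/* what allocate would do: create an empty, zeroed wrapper */
struct foo_wrapper* foo_wrapper_alloc(void)
{
	return calloc(1, sizeof(struct foo_wrapper));
}

/* what initialize would do: the actual allocation. returns 0 on success */
int foo_wrapper_init(struct foo_wrapper* w, int val)
{
	assert(w != NULL);
	free(w->payload); /* no-op the first time, avoids a leak if called twice */
	w->payload = malloc(sizeof(int));
	if (w->payload == NULL) return -1;
	*w->payload = val;
	return 0;
}

/* what dfree would do: must cope with a wrapper that was never initialized */
void foo_wrapper_free(void* data)
{
	struct foo_wrapper* w = data;
	if (w == NULL) return;
	free(w->payload); /* free(NULL) is a no-op */
	free(w);
}
```

The thing to watch with this layout is that Ruby can create an object with allocate and never call initialize, so every function that touches the wrapper has to tolerate a NULL payload.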

In simple cases (like the previous example) you can make your code a little less verbose. If the function to free your data just calls free() as in the example, you can pass RUBY_DEFAULT_FREE for dfree and Ruby will free it for you (don’t use NULL unless you like memory leaks). Similarly, if your allocation is just a malloc() as in the example, the macro TypedData_Make_Struct() does the allocation for you (zeroing the memory) and wraps it. We could shorten the previous example like so:

static const rb_data_type_t foo_type = {
	.wrap_struct_name = "foo",
	.function = {
		.dfree = RUBY_DEFAULT_FREE,
		.dsize = foo_size,
	},
	.flags = RUBY_TYPED_FREE_IMMEDIATELY,
};

VALUE foo_alloc(VALUE self)
{
	int* data;
	/* allocate and wrap. note that it needs the type to allocate */
	return TypedData_Make_Struct(self, int, &foo_type, data);
}

Marking

That dmark pointer in the type structure above is the pointer to your object’s “mark function”. This is so named because of the garbage collector’s “mark and sweep” algorithm. The basic idea behind mark and sweep is that when the garbage collector needs to free up memory, it performs two passes: the first (mark) pass iterates through every referenced Ruby object and marks it as active, then the second (sweep) pass iterates through every allocated Ruby object and frees the ones that haven’t been marked active.
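If the two passes sound abstract, here is a toy model of them in plain C. To be clear, this is only a sketch of the general algorithm: struct toy_obj, the fixed-size heap array, and the toy_* functions are all invented for illustration and have nothing to do with how Ruby actually lays out objects.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFS 2

/* Hypothetical toy object: NOT how Ruby represents objects */
struct toy_obj {
	int allocated;                  /* slot is in use */
	int marked;                     /* set during the mark pass */
	struct toy_obj* refs[MAX_REFS]; /* objects this one references */
};

/* mark pass: flag obj and everything reachable from it */
static void toy_mark(struct toy_obj* obj)
{
	if (obj == NULL || !obj->allocated || obj->marked) return;
	obj->marked = 1;
	for (size_t i = 0; i < MAX_REFS; i++)
		toy_mark(obj->refs[i]);
}

/* sweep pass: free every unmarked slot, then reset the marks.
   returns the number of objects freed */
static size_t toy_sweep(struct toy_obj* heap, size_t len)
{
	size_t freed = 0;
	for (size_t i = 0; i < len; i++) {
		if (heap[i].allocated && !heap[i].marked) {
			heap[i].allocated = 0;
			freed++;
		}
		heap[i].marked = 0;
	}
	return freed;
}

/* one full collection: mark from the roots, then sweep the heap */
size_t toy_collect(struct toy_obj* heap, size_t len,
                   struct toy_obj** roots, size_t nroots)
{
	assert(heap != NULL);
	for (size_t i = 0; i < nroots; i++)
		toy_mark(roots[i]);
	return toy_sweep(heap, len);
}
```

Anything the mark pass can’t reach from a root gets freed, no matter who else is still holding its address.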

This is relevant to wrapping C data because it’s possible that you might wrap a C struct which contains a Ruby VALUE—which the garbage collector is responsible for cleaning up. Since the garbage collector is only aware of VALUEs referenced by Ruby (not by C pointers), it won’t be able to mark the referenced VALUE as active. The result is that as soon as the garbage collector needs to free up some memory, your C data is going to end up with a reference to a nonexistent Ruby object. Note that this kind of wrapping of Ruby data inside C data is a really bad idea, precisely because of this kind of issue. But if you really must…

In the following example, we’ll wrap a C struct which contains a VALUE. The mark function has the same signature as the free function and all it has to do is mark any VALUEs in the struct:

/* ... */

struct foo_data
{
	VALUE x;
};

void foo_mark(void* data)
{
	rb_gc_mark(((struct foo_data*)data)->x);
}

static const rb_data_type_t foo_type = {
	/* ... */
	.function = {
		.dmark = foo_mark,
		/* ... */
	},
	/* ... */
};


VALUE foo_alloc(VALUE self)
{
	struct foo_data* data;
	/* wrap */
	return TypedData_Make_Struct(self, struct foo_data, &foo_type, data);
}

/* ... */

If your struct contains a pointer to a C array of VALUEs, you can instead use rb_gc_mark_locations() which takes two arguments: the pointers to the start and end of the array (the end being equal to the starting pointer plus the array length). [11]

Threading

Ruby in C Threads

If you’re making a lot of API calls and running a lot of Ruby code from C, at some point you might catch yourself thinking, “I’m running all of these slow Ruby methods using the API. Maybe I can thread things to keep my code fast!” That’s a reasonable thought, but when you act on it keep in mind that the Ruby VM is not at all thread safe. Ideally, all of your API code should run in a single thread. If not, you’ll probably need to wrap every API call with a locked mutex to make sure that you never ever have multiple threads interacting with the API at the same time.

If you just want to create normal Ruby Threads using the API (and don’t mind the GVL, as described in the next section), there’s an easy way to do that:

VALUE my_thread(void* arg)
{
	/* ... */
}

void some_func()
{
	/* ... */

	VALUE thread;

	thread = rb_thread_create(my_thread, arg);

	/* ... */
}

Other Thread functions are in ruby/intern.h (but there’s always rb_funcall() for everything else).

C in Ruby Threads

On the other hand, if you expose some heavy C code to Ruby with the API (if you’re writing an extension that wraps a C library, for example), you should spend some time thinking about a nasty thing called the global VM lock (GVL). Because most of the API is not thread-safe, the GVL locks down almost all Ruby code so that only a single Thread can run at a time. This is the reason why you’ll often hear people say that Thread does not allow true parallelism.

The VM also applies the GVL to any C code you expose to Ruby. That’s why you can use the API without worrying about it exploding when someone calls your C code from inside a Thread. The downside of this is that if your C code takes a while to run, you won’t see any performance benefit from calling it in a Thread because it will block all other threads while it runs. But the GVL is only needed to protect API calls. If you have some C code that doesn’t use the API, you can tell the VM to release the GVL before running your code in a thread and to reacquire it when it completes, allowing for true parallelism. Locking and unlocking the GVL does carry a performance hit, so only resort to this if you notice that you’re having significant problems due to blocked threads.

The code to do this is considered so fancy by the Ruby developers that you actually need to include another header to use it. First we’ll look at the slightly simpler way to release the GVL:

#include <ruby/thread.h>

void* slow_func(void* slow_arg)
{
	/* slow code that DOES NOT USE THE API */
}

VALUE my_method(VALUE self)
{
	/* arg parsing, API stuff, etc. */

	rb_thread_call_without_gvl(slow_func, slow_arg, NULL, NULL);

	/* more API stuff. probably turn the result of slow_func into a VALUE */
}

Since the function that is run without the GVL gets and returns data using void*, you may want to define a struct for passing data via pointers.
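As a sketch of what such a struct might look like, here is a function with the right shape that sums an array. The names struct sum_args and slow_sum are hypothetical; the point is just that inputs and outputs ride along in one struct behind the void*:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument bundle: inputs and the result travel together */
struct sum_args {
	const int* values; /* input */
	size_t len;        /* input */
	long result;       /* output, filled in by slow_sum */
};

/* Has the void* (*)(void*) shape that rb_thread_call_without_gvl() expects.
   It touches only plain C data, never the Ruby API. */
void* slow_sum(void* ptr)
{
	struct sum_args* args = ptr;
	assert(args != NULL);
	args->result = 0;
	for (size_t i = 0; i < args->len; i++)
		args->result += args->values[i];
	return NULL; /* the result is in the struct instead */
}
```

In an extension you would fill in a struct sum_args inside my_method, pass its address as the second argument of rb_thread_call_without_gvl(), and convert args.result to a VALUE (with LONG2NUM(), say) once the call returns and you hold the GVL again.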

If you unlock the GVL as above you will find that while your code does run in parallel, it can’t be interrupted (by signals, Thread.kill, etc.)! To allow for that you must pass an unblocking function using the last two arguments:

#include <ruby/thread.h>

/* ... */

void unblocking_func(void* arg)
{
	/* somehow tell slow_func to return early */
}

VALUE my_method(VALUE self)
{
	/* ... */

	rb_thread_call_without_gvl(slow_func, slow_arg, unblocking_func, unblocking_arg);

	/* ... */
}

The unblocking function is called in the event of an interrupt. To make it work, you will probably need to pass a pointer to both functions that can be used to communicate an interrupt from one to the other. The interrupted function should perform any necessary cleanup before returning early.
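One way to set up that communication, sketched in plain C11, is a shared struct with an atomic flag. Everything here (struct job, job_cancel, job_run) is a made-up name, and I’m assuming your slow code can be broken into steps so it has somewhere to check the flag:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state: both functions receive a pointer to this */
struct job {
	atomic_int cancelled; /* set by the unblocking function */
	long steps_wanted;    /* input */
	long steps_done;      /* output */
};

/* unblocking function: just asks the job to stop */
void job_cancel(void* ptr)
{
	struct job* job = ptr;
	atomic_store(&job->cancelled, 1);
}

/* slow function: checks the flag between units of work */
void* job_run(void* ptr)
{
	struct job* job = ptr;
	assert(job != NULL);
	for (job->steps_done = 0; job->steps_done < job->steps_wanted; job->steps_done++) {
		if (atomic_load(&job->cancelled))
			break; /* clean up here, then return early */
		/* ... one unit of slow work ... */
	}
	return NULL;
}
```

With this in place the call would look like rb_thread_call_without_gvl(job_run, &job, job_cancel, &job), passing the same pointer to both functions. If your slow code is a single blocking call into some library, a flag won’t help and you’ll need whatever cancellation mechanism that library offers.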

Alternatively, if the interrupted function doesn’t need to perform any special cleanup, you can use the built-in unblocking function RUBY_UBF_IO [12] (which ignores the unblocking argument). That simply forwards the interrupt to the running thread. [13]

If you go through all of that effort to release the GVL only to find that you need to make an API call in your unlocked thread, there’s a function to temporarily reacquire the GVL:

/* ... */

void* api_func(void* api_arg)
{
	/* call API functions */
}

void* slow_func(void* slow_arg)
{
	/* ... */

	rb_thread_call_with_gvl(api_func, api_arg);

	/* ... */
}

/* ... */

See Also

extension.rdoc

Ruby does have official API documentation. It’s a bit spotty and has some poor recommendations (in my opinion), but it is also a little more exhaustive on certain topics. In many cases this is because I intentionally skipped something that I either found not useful or better documented elsewhere.

Headers

I think some of the handiest resources are the Ruby headers themselves. The full API (i.e. everything you get by including ruby.h) easily runs to a thousand functions, macros, constants, and globals, most of which have never been documented. However, most things are reasonably named and you should be able to figure out what they do from the header. Almost everything you need should be in the headers ruby/ruby.h and ruby/intern.h. The former has all of the VM and metaprogramming functions; the latter has all of the functions for interacting with Ruby’s built-in classes.

There are also some headers not pulled in by ruby.h which you can include to get additional API functionality. Maybe one day I’ll write another section to this guide going over them:

  • ruby/debug.h (experimental) functions for profiling and tracing code
  • ruby/encoding.h functions for working with string encodings
  • ruby/io.h additional functions for Ruby’s IO class
  • ruby/re.h additional functions for Ruby’s Regexp class
  • ruby/thread.h functions for working with the GVL
  • ruby/version.h functions for version introspection. Do not use this as feature-detection code!
  • ruby/vm.h (experimental) functions for VM control

Source

If you find some function in the header that isn’t documented anywhere, your next stop should be the Ruby source code.

$ git clone https://github.com/ruby/ruby.git

When reading through the source code, always keep the headers at hand: there are lots of really useful functions in there that look like they should be in the API, but actually aren’t. In most cases there should be an API function elsewhere that wraps the call to the useful function.

Examples

Head over to the Examples page for short, compilable examples of the API in action.

Contribute

Now that you’ve finished reading my guide, did you notice something significant that I left out? Did I make some stupid mistake? Check out the source for this site on GitHub, where you can report issues, submit pull requests, and download all of the code examples.

Footnotes

  1. There’s also rb_eval_string_wrap() which should be useful, but is actually the same as rb_eval_string_protect() due to a bug

  2. That’s a blatant lie. The API definitely lets you mess around with the internal data structures of objects (look for things with names starting with capital R). But it’s generally not a good idea and not necessary. 

  3. Or use T_DATA if the object wraps a C pointer

  4. There’s also Qundef representing an undefined value, but this has no Ruby equivalent and is rarely used. In fact, outside of those rare occasions, Qundef can segfault the VM if Ruby was expecting a normal VALUE

  5. There is a CHR2FIX() macro, but in my tests this sometimes gave unexpected results. LONG2FIX() should work. 

  6. I don’t know what the best way is to handle wchar_t. In my tests I had some success just treating them as chars, but I think that may have been a happy accident, and could certainly fail on different platforms. 

  7. The documentation mentions rb_iter_break() and rb_iter_break_value() for breaking out of a block, but can’t you just return early? I can’t think of a use-case for these. 

  8. The documentation states that “You have to clear the error info… when ignoring the caught exception” during rb_protect. But I can’t find any documentation of when it would be cleared for you—it seems like you always have to clear it. 

  9. And there’s rb_yield_block() which takes two unused arguments and is never called by anything in Ruby. Odd. 

  10. The TypedData* macros are the preferred way to wrap data since Ruby 1.9.2. If you’re using an older version of Ruby you can check out an older version of this guide on GitHub to see how it used to be done.

  11. There’s also the enigmatically named rb_gc_mark_maybe(), but I’m not sure when it is needed. 

  12. You can also use RUBY_UBF_PROCESS, but this seems to be a leftover from deprecated code and has the exact same effect. 

  13. There is also the function rb_thread_call_without_gvl2(). The documentation in thread.c says that if it “detects interrupt, it returns immediately,” but I’m not sure what this means. If the unblocking function doesn’t kill the thread, it still waits for the thread to finish on its own before returning. 
