C implementation
The C implementation is very similar to the pure Python implementation except that there
are memory management issues to address. These do not occur in Python because there is
automatic memory allocation and freeing of unused memory (called garbage collection).
Further, if you want to be able to call the C functionality from Python then you need to
write additional code to accomplish this, to convert from Python’s own data structures to
more normal C types on the way in, and convert the result C types to Python’s own data
types on the way out. This approach assumes that you have an existing C library that you
want to use without having to rewrite it. If instead you were writing a C implementation
from scratch then an alternative strategy would be to use Python’s or NumPy’s own types
throughout your C code. Whether or not this is a good idea depends on the details of the
code, and what else you might want to do with it, though the C types generally give
quicker execution. Here we will write the core part of the C implementation to use
standard C types rather than Python’s or NumPy’s own types. We will not explain all the
vagaries of the C syntax but just the important differences with Python.
In C it is standard practice to have two files go together: the first (suffix ‘.h’) gives a
specification of the interface to the functionality, in particular stating what functions you
want to make publicly accessible, and the other file (suffix ‘.c’) is the actual
implementation of the functionality. Here we will call the former file somap.h and the
latter somap.c. In general there might be lots of functions that you want to specify as being
publicly accessible. Here we will only have one function, so somap.h has:
extern double ***selfOrganisingMap(double **inputs,
int ninputs, int depth, int nrows, int ncols,
double **spread, int width, int height, int nsteps);
In C we have to specify types explicitly. Here we have ‘int’ for integers and ‘double’ for
double-precision floating point numbers.
6
The two-asterisk ** and three-asterisk ***
syntaxes are effectively the C way of specifying that the corresponding variable is a list of
lists or a list of lists of lists, respectively. The function selfOrganisingMap is expecting
nine arguments. The extra arguments compared with the Python implementation are to
specify the sizes of the lists, because in C the list variables themselves do not contain this
information (one of the common sources of bugs in C programs). The result returned has
to be specified and here it is a list of lists of lists of double-precision floats, which
represents the constructed map. The ‘extern’ just says that the function is implemented
elsewhere (i.e. in this case in the file somap.c).
The implementation of selfOrganisingMap is very similar to the Python version. In
somap.c we declare the function, just repeating what is given in somap.h but without the
‘extern’:
double ***selfOrganisingMap(double **inputs,
int ninputs, int depth, int nrows, int ncols,
double **spread, int width, int height, int nsteps)
In C, blocks of code are delineated not by whitespace but by curly brackets, ‘{’ and ‘}’.
In C we also need to explicitly declare the data type of any variable that is being used.
And in C statements have to end with a semicolon ‘;’.
{
int step, i;
double decay;
double ***somap;
We then set up the initial value for somap. This is similar to what happens in Python
except that here we also require explicit memory allocation, and to be tidy we put it in a
separate function, randomMap. In C the equivalent (as such) of None is called NULL. For
the memory allocation a result of NULL means the allocation failed. This can be checked
with the short expression !somap. If that happens there is not much we can do except to
give up.
somap = randomMap(nrows, ncols, depth);
if (!somap)
return NULL;
We then repeat the same loops as in the Python version:
for (step = 0; step < nsteps; step++)
{
decay = exp(-step / (double) nsteps);
for (i = 0; i < ninputs; i++)
updateSomap(somap, inputs[i], depth , nrows, ncols,
spread, width, height, decay);
}
return somap;
}
The randomMap function does the memory allocation and setting the initial values to
random numbers. The ‘static’ means that this function is not visible outside this file, so is
private. The function which allocates memory is called malloc and you have to tell it how
many bytes of memory you want allocated, and sizeof provides the relevant multiplier to
go from the number of objects to the number of bytes.
static double ***randomMap(int nrows, int ncols, int depth)
{
int i, j, k;
double ***x;
x = (double ***) malloc(nrows * sizeof(double **));
if (!x)
return NULL;
for (i = 0; i < nrows; i++)
{
x[i] = (double **) malloc(ncols * sizeof(double *));
if (!x[i])
return NULL;
for (j = 0; j < ncols; j++)
{
x[i][j] = (double *) malloc(depth * sizeof(double));
if (!x[i][j])
return NULL;
for (k = 0; k < depth; k++)
x[i][j][k] = randomNumber();
}
}
return x;
}
The randomNumber() function should return a uniformly sampled random number
between 0 and 1. On most computers there is a function provided for this; for example, on
many Unix systems there is one called drand48().
The updateSomap() function is very similar in C to the pure Python implementation,
except here we have to declare the type of variables.
static void updateSomap(double ***somap, double *input,
int depth, int nrows, int ncols,
double **spread, int width, int height, double decay)
{
int halfWidth, halfHeight, i, j, k, l, m, imin, jmin;
double diff, diff2, diff2min, lambda;
imin = jmin = -1;
diff2min = 0; // will change in first pass
for (i = 0; i < nrows; i++)
{
for (j = 0; j < ncols; j++)
{
diff2 = 0.0;
for (k = 0; k < depth; k++)
{
diff = somap[i][j][k] - input[k];
diff2 += diff * diff;
}
if ((imin == -1) || (diff2 < diff2min))
{
imin = i;
jmin = j;
diff2min = diff2;
}
}
}
halfWidth = (width-1) / 2;
halfHeight = (height-1) / 2;
for (k = 0; k < width; k++)
{
i = (imin + k - halfWidth + nrows) % nrows;
for (l = 0; l < height; l++)
{
j = (jmin + l - halfHeight + ncols) % ncols;
lambda = decay * spread[k][l];
for (m = 0; m < depth; m++)
somap[i][j][m] = (1.0-lambda) * somap[i][j][m] + lambda * input[m];
}
}
}
The wrapper around this C code that converts to and from Python data types is the
boilerplate code. We will name the file py_somap.c, but it could be called anything. The
way that Python interfaces to C code changed between Python 2 and Python 3, and if you
want to write this wrapper code to be compliant with both then you can check whether the
defined constant PY_MAJOR_VERSION is >= 3, and if it is then use the Python 3 form
of the code, and otherwise the Python 2 form. C has convenient macros #ifdef / #else /
#endif which can be used to selectively include code in compilation.
In order to access the module in Python there must be a publicly available function. In
Python 2 it is named as ‘init’ concatenated with the module name. Here we will name the
module somap and so the function must be named initsomap. This function returns
nothing, which in C means void. We first need to call the internal Python function (albeit
via C) Py_InitModule in order to specify the module name and functions that will be made
available to Python.
In Python 3 the publicly available function instead has ‘PyInit_’ concatenated with the
module name, so here PyInit_somap. It returns the module object, which is of type
PyObject *. And to initialise the module we call the function PyModule_Create(), which
expects an argument of type PyModuleDef *, which in turn contains the name of the
module and the functions that will be made available to Python.
Then we need to call import_array in order to initialise NumPy and also create an
exception object, ErrorObject, so that we can report errors that occur when using this
module. In Python 2 we return nothing, but in Python 3 we return the module object.
#if PY_MAJOR_VERSION >= 3
static struct PyModuleDef module_def = {
PyModuleDef_HEAD_INIT,
"somap", /* m_name */
NULL, /* m_doc */
-1, /* m_size */
Somap_type_methods, /* m_methods */
NULL, /* m_reload */
NULL, /* m_traverse */
NULL, /* m_clear */
NULL, /* m_free */
};
#endif
#if PY_MAJOR_VERSION >= 3
PyObject *PyInit_somap(void)
#else
void initsomap(void)
#endif
{
PyObject *module;
/* create the module and add the functions */
#if PY_MAJOR_VERSION >= 3
module = PyModule_Create(&module_def);
#else
module = Py_InitModule("somap", Somap_type_methods);
#endif
import_array(); /* needed for numpy, otherwise it crashes */
/* create exception object and add to module */
ErrorObject = PyErr_NewException("somap.error", NULL, NULL);
Py_INCREF(ErrorObject);
PyModule_AddObject(module, "error", ErrorObject);
/* check for errors */
if (PyErr_Occurred())
Py_FatalError("can't initialize module somap");
#if PY_MAJOR_VERSION >= 3
return module;
#endif
}
Everything else in the module will be ‘static’, i.e. private in the C world (although the
specified functions will be available in the Python world), and should be placed above this
function in the file.
The exception object is just a simple variable. All Python objects in C are of type either
PyObject* or extensions thereof.
static PyObject *ErrorObject;
The functions available to the module are passed as the second argument to
Py_InitModule and here this is called Somap_type_methods. This is a list with each entry
containing four items: the name of the function (method) as it will be called in Python, the
function itself, the calling convention and a documentation string, here called somap_doc.
The list should be terminated with an entry that contains a NULL value for the function
name. Here we only have one function, which we also call somap.
static char somap_doc[] = "Creates a self-organising map";
static struct PyMethodDef Somap_type_methods[] =
{
{ "somap", (PyCFunction) somap, METH_VARARGS, somap_doc },
{ NULL, NULL, 0, NULL }
};
The most common convention is METH_VARARGS, where the functions expect two
arguments, the first being the object itself (by convention called self, as in Python code)
and the second being a tuple containing the values passed to the function in Python (by
convention called args). For other calling conventions see the Python documentation (see
links at
http://www.cambridge.org/pythonforbiology
).
The function somap is what does the work converting from the Python world to the C
world, calling the C function and then converting back from the C world to the Python
world. It first unpacks the arguments, and checks that they are of the correct data type. The
function PyArg_ParseTuple unpacks the args tuple. Here the tuple is expected to be of
length five, because there are five arguments that are passed in from the Python code. The
first element of the tuple is the inputs object, the second the spread object, the third the
number of rows, the fourth the number of columns and the fifth the number of steps for
the algorithm. The ‘O!’ in the call to PyArg_ParseTuple says that the corresponding
element in the tuple must be Python objects of the specified type, here PyArray_Type,
which is the NumPy array type. The ‘i’ in the call to PyArg_ParseTuple says that the last
three elements must be integers. Next it is checked that inputs_obj and spread_obj are both
of the required type and shape as NumPy arrays. The RETURN_ERROR C macro is a
shorthand for creating an error report (exception) that will be passed back to Python.
#define RETURN_ERROR(message) \
{ PyErr_SetString(ErrorObject, message); return NULL; }
static PyObject *somap(PyObject *self, PyObject *args)
{
int ninputs, nsteps, nrows, ncols, depth, width, height, i, j, k;
PyArrayObject *inputs_obj, *spread_obj;
PyObject *somap_obj;
double **inputs, **spread, ***somap;
npy_intp dims[3];
if (!PyArg_ParseTuple(args, "O!O!iii", &PyArray_Type, &inputs_obj,
&PyArray_Type, &spread_obj, &nrows, &cols, &nsteps))
RETURN_ERROR("need 5 args: inputs, spread, nrows, ncols, nsteps");
if (!PyArray_Check(inputs_obj))
RETURN_ERROR("inputs needs to be NumPy array");
if (PyArray_NDIM(inputs_obj) != 2)
RETURN_ERROR("inputs needs to be NumPy array with ndim 2");
if (!PyArray_Check(spread_obj))
RETURN_ERROR("spread needs to be NumPy array");
if (PyArray_NDIM(spread_obj) != 2)
RETURN_ERROR("spread needs to be NumPy array with ndim 2");
if (PyArray_TYPE(inputs_obj) != NPY_DOUBLE)
RETURN_ERROR("inputs needs to be array of doubles");
if (PyArray_TYPE(spread_obj) != NPY_DOUBLE)
RETURN_ERROR("spread needs to be array of doubles");
Next the function determines the size of inputs_obj and spread_obj and then copies
these two NumPy arrays into standard C arrays.
ninputs = PyArray_DIM(inputs_obj, 0);
depth = PyArray_DIM(inputs_obj, 1);
width = PyArray_DIM(spread_obj, 0);
height = PyArray_DIM(spread_obj, 1);
if (!(inputs = copyArray2(inputs_obj)))
RETURN_ERROR("getting inputs as C array");
if (!(spread = copyArray2(spread_obj)))
{
freeArray2(inputs, nrows, ncols);
RETURN_ERROR("getting spread as C array");
}
If there are any problems then the function returns an error message. The copyArray2
function includes memory allocation, and there are of course corresponding memory-
freeing functions. We also need a freeArray3 function for later use.
static void freeArray2(double **array, int m)
{
int i;
for (i = 0; i < m; i++)
free(array[i]);
free(array);
}
static void freeArray3(double ***array, int m, int n)
{
int i;
for (i = 0; i < m; i++)
freeArray2(array[i], n);
free(array);
}
static double **allocArray2(int m, int n)
{
int i;
double **array;
array = (double **) malloc(m*sizeof(double *));
if (array)
{
for (i = 0; i < m; i++)
array[i] = (double *) malloc(n*sizeof(double));
}
return array;
}
static double **copyArray2(PyArrayObject *array_obj)
{
int i, j, m, n;
double **array;
m = PyArray_DIM(array_obj, 0);
n = PyArray_DIM(array_obj, 1);
array = allocArray2(m, n);
if (array)
{
for (i = 0; i < m; i++)
{
for (j = 0; j < n; j++)
array[i][j] = *((double *)
PyArray_GETPTR2(array_obj, i, j));
}
}
return array;
}
Getting back to the somap function, up to this point we have just been converting the
Python data types into C data types. Next the function calls the straight C function that we
wrote above, which is the whole point of the exercise and is the one line of code that
actually does anything directly useful.
somap = selfOrganisingMap(inputs, ninputs, depth, nrows, ncols,
spread, width, height, nsteps);
Finally the function copies the result back into a Python data type and frees memory
and returns the result to Python.
dims[0] = nrows;
dims[1] = ncols;
dims[2] = 3;
somap_obj = PyArray_SimpleNew(3, dims, NPY_DOUBLE);
for (i = 0; i < nrows; i++)
for (j = 0; j < ncols; j++)
for (k = 0; k < depth; k++)
*((double *) PyArray_GETPTR3(somap_obj,i,j,k)) = somap[i][j][k];
freeArray3(somap, nrows, ncols);
freeArray2(inputs, ninputs);
freeArray2(spread, width);
return somap_obj;
}
Once written, the C code needs to be compiled before it can be used, to convert the
textual source code into binary code that can be executed on a computer platform. Further
information on how to compile and run the C code examples is given at
http://www.cambridge.org/pythonforbiology
.
Do'stlaringiz bilan baham: |