.. _writing_matchers:
Writing Matchers
================
This page explains what a matcher is, and then walks through an example of developing a matcher iteratively.
Types of Matchers
-----------------
There are three types of matchers: Node, Narrowing, and Traversal. There isn't always a clear separation or distinction between them, so treat this explanation as illustrative rather than definitive. Here is the documentation on matchers: `https://clang.llvm.org/docs/LibASTMatchersReference.html <https://clang.llvm.org/docs/LibASTMatchersReference.html>`_
It is not obvious on that page, so note that **clicking on the name of a matcher expands help about that matcher.** Example:
.. image:: documentation-expanded.png
Node Matchers
~~~~~~~~~~~~~
Node matchers can be thought of as 'Nouns'. They specify a **type** of node you want to match, that is, a particular *thing*: a function, a binary operation, a variable, a type.
The `full list of node matchers is in the documentation <https://clang.llvm.org/docs/LibASTMatchersReference.html#node-matchers>`_. Some common ones are ``functionDecl()``, ``binaryOperator()``, and ``stmt()``.
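As a small illustration (the function ``add`` is a made-up example, not taken from the reference), here is roughly what those three node matchers would select:

::

    Given:
      int add(int a, int b) { return a + b; }

    functionDecl()
      matches the declaration of 'add'.
    binaryOperator()
      matches the expression 'a + b'.
    stmt()
      matches every statement, including the function body, the
      'return' statement, and 'a + b' itself, because expressions
      are also statements in the Clang AST.

Note how broad ``stmt()`` is on its own; node matchers usually need narrowing or traversal matchers to be useful.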
Narrowing Matchers
~~~~~~~~~~~~~~~~~~
Narrowing matchers can be thought of as 'Adjectives'. They narrow, or describe, a node, and therefore must be applied to a Node Matcher. For instance a node matcher may be a ``functionDecl``, and the narrowing matcher applied to it may be ``parameterCountIs``.
The `table in the documentation <https://clang.llvm.org/docs/LibASTMatchersReference.html#narrowing-matchers>`_ lists all the narrowing matchers, which node types they apply to, and how to use them. Here is how to read the table:
.. image:: narrowing-matcher.png
And some examples:
::

    m functionDecl(parameterCountIs(1))
    m functionDecl(anyOf(isDefinition(), isVariadic()))

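As you can see, each example passes **a single Narrowing Matcher**, and it goes inside the parens of the Node Matcher. In the first example the matcher is ``parameterCountIs``; in the second it is ``anyOf``. (The matcher reference notes that node matchers accept any number of arguments and implicitly combine them as ``allOf``, but sticking to one top-level narrowing matcher and combining explicitly keeps the logic easy to read.)
In the second example, the single ``anyOf`` matcher combines multiple other Narrowing Matchers, ``isDefinition`` and ``isVariadic``, and matches if either of them does. The other two common combining narrowing matchers are ``allOf()`` and ``unless()``.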
If you *need* to specify a narrowing matcher (because it's a required argument to some other matcher), you can use the ``anything()`` narrowing matcher to have a no-op narrowing matcher.
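As a sketch of the other combining matchers (the function names below are invented for illustration, and the results are what the matcher semantics imply rather than recorded output), ``allOf()`` and ``unless()`` nest the same way as ``anyOf()``, and ``anything()`` can fill an argument slot that may not be left empty. ``hasAnyParameter`` is a traversal matcher, which the next section introduces:

::

    Given:
      void defined(int a) { }
      void declaredOnly(int a);
      void variadic(int a, ...) { }
      void noParams() { }

    m functionDecl(allOf(isDefinition(), unless(isVariadic())))
      should match 'defined' and 'noParams', but not 'declaredOnly'
      (not a definition) or 'variadic'.
    m functionDecl(hasAnyParameter(anything()))
      should match 'defined', 'declaredOnly', and 'variadic', but not
      'noParams', which has no parameter to match.
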
Traversal Matchers
~~~~~~~~~~~~~~~~~~
Traversal Matchers *also* can be thought of as adjectives - at least most of them. They also describe a specific node, but the difference from a narrowing matcher is that the scope of the description is broader than the individual node. A narrowing matcher says something about the node in isolation (e.g. the number of arguments it has) while a traversal matcher says something about the node's contents or place in the program.
Again, `the documentation <https://clang.llvm.org/docs/LibASTMatchersReference.html#traversal-matchers>`_ is the best place to explore and understand these, but here is a simple example for the traversal matcher ``hasArraySize()``:
::

    Given:
      class MyClass { };
      MyClass *p1 = new MyClass[10];

    cxxNewExpr()
      matches the expression 'new MyClass[10]'.
    cxxNewExpr(hasArraySize(integerLiteral(equals(9))))
      does not match anything
    cxxNewExpr(hasArraySize(integerLiteral(equals(10))))
      matches the expression 'new MyClass[10]'.

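To make the contrast with narrowing matchers concrete, here is a sketch adapted from the ``isInteger`` example in the matcher reference (the declaration ``d`` is an added assumption). The first matcher looks only at the function node itself, while the second looks into the function's parameters:

::

    Given:
      void a(int);
      void b(long);
      void c(double);
      void d(int, double);

    functionDecl(parameterCountIs(1))
      should match 'a', 'b', and 'c'.
    functionDecl(hasAnyParameter(hasType(isInteger())))
      should match 'a', 'b', and 'd'.
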
Example of Iterative Matcher Development
----------------------------------------
When developing matchers, it will be much easier if you do the following:
1. Write out the code you want to match. Write it out in as many different ways as you can. Examples: For some value in the code use a variable, a constant and a function that returns a value. Put the code you want to match inside of a function, inside of a conditional, inside of a function call, and inside of an inline function definition.
2. Write out the code you *don't* want to match, but that looks like code you do. Write out benign function calls, benign assignments, etc.
3. Iterate on your matcher and treat it as *code* you're writing. Indent it, copy it somewhere in case your browser crashes, even stick it in a tiny temporary version-controlled file.
As an example of the above, below is a sample iterative development process of a more complicated matcher.
**Goal**: Match function calls where one of the arguments is an assignment expression with an integer literal on the right-hand side, and the corresponding function parameter has a default value in the function definition.
::

    int add1(int a, int b) { return a + b; }
    int add2(int c, int d = 8) { return c + d; }

    int main() {
        int x, y, z;

        add1(x, y);     // <- No match, no assignment
        add1(3 + 4, y); // <- No match, no assignment
        add1(z = x, y); // <- No match, assignment, but not an integer literal
        add1(z = 2, y); // <- No match, assignment, integer literal, but function parameter lacks default value
        add2(3, z = 2); // <- Match
    }

Here is the iterative development process:
::

    //-------------------------------------
    // Step 1: Find all the function calls
    m callExpr()
    // Matches all calls, as expected.

    //-------------------------------------
    // Step 2: Start refining based on the arguments to the call
    m callExpr(forEachArgumentWithParam())
    // Error: forEachArgumentWithParam expects two parameters

    //-------------------------------------
    // Step 3: Figure out the syntax for matching all the calls with this new matcher
    m callExpr(
        forEachArgumentWithParam(
            anything(),
            anything()
        )
    )
    // Matches all calls, as expected

    //-------------------------------------
    // Step 4: Find the calls with a binary operator of any kind
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(),
            anything()
        )
    )
    // Does not match the first call, but matches the others

    //-------------------------------------
    // Step 5: Limit the binary operator to assignments
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(isAssignmentOperator()),
            anything()
        )
    )
    // Now matches the final three calls

    //-------------------------------------
    // Step 6: Starting to refine matching the right-hand side of the assignment
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(
                allOf(
                    isAssignmentOperator(),
                    hasRHS()
                )),
            anything()
        )
    )
    // Error, hasRHS expects a parameter

    //-------------------------------------
    // Step 7: Give hasRHS a placeholder argument
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(
                allOf(
                    isAssignmentOperator(),
                    hasRHS(anything())
                )),
            anything()
        )
    )
    // Okay, back to matching the final three calls

    //-------------------------------------
    // Step 8: Refine to just integer literals
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(
                allOf(
                    isAssignmentOperator(),
                    hasRHS(integerLiteral())
                )),
            anything()
        )
    )
    // Now we match the final two calls

    //-------------------------------------
    // Step 9: Apply a restriction to the parameter definition
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(
                allOf(
                    isAssignmentOperator(),
                    hasRHS(integerLiteral())
                )),
            hasDefaultArgument()
        )
    )
    // Now we match the final call

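Following the advice at the top of this section about writing the target code in many different ways, the finished matcher can then be tried against more variations. The file below is only a sketch: ``five()`` and ``variations()`` are hypothetical helpers added for illustration, and the match expectations in the comments come from reading the matcher rather than from recorded output, so confirm them in your own session.

::

    // Hypothetical extra input for the Step 9 matcher.
    int add1(int a, int b) { return a + b; }
    int add2(int c, int d = 8) { return c + d; }
    int five() { return 5; }  // assumption: stands in for "a function that returns a value"

    int variations(bool flag) {
        int x = 1, z = 0, total = 0;

        // Expected to match: integer-literal assignment passed to the defaulted parameter 'd'.
        if (flag) {
            total += add2(3, z = 2);          // inside a conditional
        }
        total += add1(add2(4, z = 6), x);     // inner add2 call, nested in another call

        // Expected NOT to match.
        total += add2(z = 2);                 // assignment is bound to 'c', which has no default
        total += add2(3, z = x);              // right-hand side is a variable
        total += add2(3, z = five());         // right-hand side is a function call
        total += add2(3, 2);                  // no assignment at all
        return total;
    }

If a line in the second group does match, or a line in the first group does not, that points at the part of the matcher to revisit.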