Coincidence Analysis for Immunization Decision-Making
A configurational causal training portal

PART 1 · What is Coincidence Analysis?

Module 1 · Why does CNA exist?

The mystery of the 100 facilities

By the end of this module you will be able to

State why single-factor thinking fails in immunization programs.
Explain, with an example, how the same input can produce different outcomes.
Recognise the kind of problem CNA is designed to solve.

You supervise 100 health facilities. A handful keep missing children, quarter after quarter, no matter what you try. The question that keeps you up at night is simple: why does coverage stay low in these places?

Most of us answer that question by rounding up the usual suspect. "It's the training." Or "it's the fridge." We pick one lever, pull it, and wait for coverage to climb. Then it doesn't, and we're baffled, because on paper that facility had the training. The trouble is that low coverage rarely has one cause sitting alone in a room. It usually shows up when a few things go wrong at the same time, and a facility can survive one weak spot but not a particular combination of them.

Explain it like I'm ten

A cake flops for a reason, and "not enough flour" is only sometimes the reason. Maybe the flour was fine but the oven was cold. Maybe the oven was hot but someone forgot the sugar. If you ask "does flour cause a flopped cake?" the honest answer is "it depends what else went wrong in the bowl." Low coverage is a flopped cake. You have to look at the whole bowl.

Here are two facilities that sat through the exact same training. One does fine. One keeps falling short. Watch what's actually different.

Condition	Facility A	Facility B
Trained vaccinators	Yes	Yes
Functional cold chain	Yes	No
Monthly supportive supervision	Yes	Yes
Community mobilisation	Yes	Yes
Coverage stays low?	No	Yes

Same training, very different lives. The broken cold chain is what tipped Facility B into low coverage, but only because of the company it was keeping.

Both facilities had trained staff, so "more training" was never going to rescue Facility B. The training was fine. It just isn't worth much when the vaccines spoil before they reach an arm. Diagnosing that correctly, instead of ordering another training, is exactly the kind of problem Coincidence Analysis was built to solve.

Reflect

Picture one facility in your area that keeps missing children.

Have you ever 'fixed' the obvious factor and seen no change? What else might have been going wrong at the same time?
If you listed everything that was in place there, would the real gap stand out?

Module 2 · What does CNA do?

A method for combinations, not lone culprits

By the end of this module you will be able to

Define Coincidence Analysis in one plain sentence.
Describe what a 'configuration' is.
List the program situations where CNA fits best.

Coincidence Analysis (CNA) is a configurational comparative method. In plain terms, it looks for configurations, which are just bundles of conditions that travel together with an outcome, instead of putting each factor on trial one at a time. It was developed by Michael Baumgartner, and it is custom-built for the messy, multi-outcome structures that real programs are made of. It is happiest when causes work in teams.

What makes CNA different

It can find more than one storyline

Most methods assume there is a single best predictor and go looking for it, like a detective convinced there is one culprit. CNA is comfortable with the idea that several different combinations might each lead to the outcome, and it can spot all of them in the same dataset. It can even follow causal chains, where one factor quietly hands off to another.

CNA earns its keep precisely when programs behave the way immunization programs actually behave:

Several things have to go wrong together before coverage really sinks.
Different combinations can drag coverage down in different settings.
Causal chains exist. Weak supervision leads to sloppy planning, which leads to missed children.
Implementation varies from facility to facility and district to district, so one-size answers rarely fit.

A careful word about "cause"

CNA finds structures consistent with your data

CNA does not, by itself, prove that one thing causes another. Strictly speaking, it identifies causal models or structures that are consistent with the data and the assumptions you bring to it. Throughout this course, when you see "pathway," read it as a pathway consistent with the data, not a proven mechanism. That habit of language keeps you honest, and keeps reviewers on your side.

Reflect

Think about how success is usually explained in your program.

Are explanations usually about one factor, or several working together?
Could two facilities be succeeding for completely different reasons?

Module 3 · The three core ideas

Conjunctivity, equifinality, and causal chains

By the end of this module you will be able to

Explain conjunctivity (AND) in plain language.
Recognise equifinality (multiple roads) in program data.
Identify a causal chain, where one condition acts through another.

Everything CNA does rests on three ideas. Learn these three words and you have the heart of the method. Bonus: they make you sound very impressive in meetings.

1 · Conjunctivity, the "AND" idea

A condition might only matter when it is joined to others. A broken cold chain on its own may be survivable, but a broken cold chain and no outreach and no supervision, all in the same place, is how coverage quietly collapses. The conditions gang up.

Explain it like I'm ten

A door stays shut when the lock, the handle, and the hinge all fail together. Fix any one of them and you might still get through. The door isn't broken because of a single villain. It's broken because three small problems decided to show up on the same day.

2 · Equifinality, the "many roads" idea

Different combinations can lead to the same sad outcome. One facility ends up with low coverage because of a broken cold chain plus no supervision. Another, with a perfectly good fridge, still ends up low because of no community mobilisation plus no outreach funding. Same destination, different roads. That is called equifinality, and CNA is one of the few methods polite enough to report every road instead of insisting there's only one.

Equifinality: two different combinations, each enough to produce low coverage. CNA reports both roads instead of forcing you to crown one "main cause."
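If it helps to see the AND and OR ideas as something R can evaluate, here is a minimal sketch. The four facilities are made up for illustration, and the column name NoOutreach is an assumption (it is not in the course dataset).

```r
# Hypothetical facilities; 1 = gap present, 0 = gap absent
gaps <- data.frame(
  NoCold     = c(1, 0, 1, 0),
  NoSuper    = c(1, 0, 0, 0),
  NoComm     = c(0, 1, 0, 0),
  NoOutreach = c(0, 1, 0, 1)
)

# Conjunctivity: a road is travelled only when ALL of its conditions are present
road_1 <- gaps$NoCold == 1 & gaps$NoSuper == 1
road_2 <- gaps$NoComm == 1 & gaps$NoOutreach == 1

# Equifinality: EITHER road is enough for low coverage
low_cov <- road_1 | road_2

# Side-by-side view of the gaps, the two roads, and the result
data.frame(gaps, road_1, road_2, low_cov)
```

In this toy table the third and fourth facilities each have one gap, but no complete road, so neither ends up flagged. That is conjunctivity at work.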

3 · Causal chains, the "knock-on" idea

One factor can act through another. Weak supervision doesn't reach out and lower coverage by itself. It leads to weaker session planning, and weaker planning is what loses the children. It works down a chain, like a rumour:

Weak supervision → weaker session planning → missed children

A causal chain. CNA can detect that supervision acts through planning, which is something single-outcome methods simply cannot show you.

Try it in your head

Think of a stubborn problem in your program, say a quarter of repeated stockouts. Name two different combinations of conditions that could each cause it. If you managed two, congratulations, you just spotted equifinality, and it means CNA is a good fit for your question.

Reflect

Think about a district you know well.

What combination of factors might explain its performance, rather than any single one?
Could a different district be reaching the same result through a different combination?
Can you spot a chain, where one weak link quietly causes the next?

PART 2 · Why not just use regression?

Module 4 · Why we need something different

"Can't I just run a regression?"

By the end of this module you will be able to

Explain, in plain language, the question regression is built to answer.
Explain why that question misses how immunization programs actually work.
Say when CNA is the better tool, and when it is not.

It is a fair question, and one every analyst should ask. Logistic regression is an excellent, well-tested tool, and if your question is "which single factor moves the outcome most, holding the others constant," you should reach for it, not CNA. The two methods are not rivals so much as instruments tuned to different questions.

The catch is that regression, by design, estimates the average independent contribution of each factor. It answers "how much does supervision matter on its own?" But immunization coverage rarely fails because of one factor acting on its own. It fails when a broken cold chain meets missing supervision in the same place, at the same time. That is a question about combinations, and it is a different question than the one regression is built to answer.

The question you're asking	Regression	CNA
Which single factor matters most?	✓	–
Which combinations of conditions matter?	–	✓
Are there multiple pathways to the same result?	Limited	✓
Do conditions form causal chains?	Limited	✓

Different tools for different questions. CNA does not replace regression; it answers a question regression was never designed to ask.
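To make the two questions concrete, here is a small sketch in base R. It is not a regression and not a CNA run; it only contrasts an "on average" view with a "which combination" view. The conditions A and B and the rule linking them to Y are assumptions invented for the example.

```r
# Every combination of two hypothetical gaps, A and B
cases <- expand.grid(A = c(0, 1), B = c(0, 1))

# Assumed rule for this sketch: the outcome appears only when A AND B are both present
cases$Y <- as.integer(cases$A == 1 & cases$B == 1)

# The "average" view: share of cases with the outcome, without A and with A
avg_by_A <- tapply(cases$Y, cases$A, mean)
avg_by_A

# The "combination" view: look at the rows where A is present
cases[cases$A == 1, ]
```

On average, A looks like it matters, because the outcome is more common when A is present. The rows tell the fuller story: A without B never shows the outcome, so it is the pair that makes the difference.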

The one-line version

Regression ranks ingredients. CNA reads recipes.

Regression is brilliant at telling you which ingredient, on average, has the biggest effect. CNA is built to tell you which combinations of ingredients produce the dish, including the fact that there may be more than one recipe that works. When a program fails because several things went wrong together, you need a method that thinks in recipes.

This is also why CNA does not report coefficients, p-values, or "the effect of supervision." Those numbers answer the ranking question. CNA answers the recipe question, and it speaks in a different language: combinations, pathways, and the conditions an outcome depends on. Keeping the two questions straight now will save you a great deal of confusion later.

Reflect

Think about a district you know that struggles with coverage.

If you had to name the one factor behind its struggle, could you? Or does it really take two or three things together to explain it?
Might a neighbouring district be struggling for entirely different reasons, a different recipe for the same poor result?

PART 3 · Where does CNA apply?

Module 5 · Which research questions fit CNA?

Asking a question CNA can actually answer

By the end of this module you will be able to

Write a research question in the shape CNA can answer.
Tell a CNA question apart from a regression question.
Spot when a question is about combinations rather than single effects.

CNA does not answer "by how much does X move coverage." It answers a sharper, more useful question: "which conditions, alone or in combination, make the difference to the outcome." In the words of the method's own literature, CNA hunts for difference-making conditions, the necessary and/or sufficient combinations that an outcome actually depends on. Good CNA questions are always shaped that way, and the best ones point straight at a problem you can fix.

Here are five questions immunization teams genuinely ask, written the way CNA likes them:

Topic	A CNA-shaped question
Coverage	Which combinations of health-system gaps make the difference to low DTP3 coverage?
Zero-dose	Which conditions, alone or together, are linked to a high share of zero-dose children?
Stockouts	Which combinations of conditions are necessary or sufficient for repeated vaccine stockouts?
Supervision	Which weak supervisory practices, in combination, make the difference to poor session quality?
Pathways	Through which chain of conditions does weak supervision lead, step by step, to missed children?

The first four ask which difference-making conditions sit behind an outcome. The fifth is special: it asks about a pathway, a chain where one condition feeds the next. CNA is one of the few methods that can model these multi-step, multi-outcome structures, which is exactly why it suits messy program systems.

The litmus test

Can you phrase it as "which conditions make the difference?"

If your question can be rewritten to ask which conditions, alone or in combination, make the difference to an outcome, it is a CNA question. If it asks for a single number, a ranking, or an average effect, it belongs to a different method, and that is perfectly fine.

The phrase "difference-making conditions" runs throughout the applied CNA literature. See, for example, studies framed explicitly around difference-making roles and pathways in the references at the end of this course.

Reflect

Take a question your team is currently asking.

Is it really 'which factor matters most', or 'which combinations make the difference'?
How would you rewrite it so CNA could answer it?

Module 6 · What counts as a "case"?

Every row is a "case"

By the end of this module you will be able to

Define what a 'case' is in a CNA study.
Choose an appropriate unit of analysis for a question.
Explain why mixing levels breaks the comparison.

In CNA, the thing you compare is called a case, and every case becomes one row in your data table. In immunization work, a case can sit at almost any level of the system:

A health facility, which is the most common choice for supervision studies.
An LGA, district, or state.
An outreach site, or even a whole country.

Keep the level consistent

Please don't mix facilities and districts in the same table. It's like comparing a single market stall to an entire shopping mall and wondering why the numbers look strange. Every case must sit at the same level, so pick one unit and stay loyal to it for the whole analysis.

Reflect

Think about the data your program already collects.

What would each row, each case, represent: a facility, an LGA, a district?
Are you ever tempted to mix levels in one table? What confusion could that cause?

PART 4 · How do you design a study?

Module 7 · How do you plan a CNA study?

Three decisions before you touch the data

By the end of this module you will be able to

Define a clear, measurable outcome with a threshold.
Draw up a disciplined list of candidate conditions.
Write down a causal theory before analysing.

A CNA study is mostly won or lost at the planning table, long before R enters the picture. Three decisions matter most.

Define the outcome

State exactly what you are trying to explain, and how you'll recognise it when you see it. For example: "Low routine immunization coverage" means DTP3 below 50%. The threshold is part of the definition, not an afterthought.
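Once the threshold is written down, turning it into a yes-or-no outcome is a one-liner. A minimal sketch, assuming made-up DTP3 figures and a hypothetical variable name low_cov_threshold:

```r
# Hypothetical DTP3 coverage (%) for five facilities
dtp3 <- c(HF1 = 38, HF2 = 72, HF3 = 49.5, HF4 = 50, HF5 = 91)

# The cut-off is part of the outcome definition, so name it once and reuse it
low_cov_threshold <- 50

# LowCov = 1 when coverage is strictly below the threshold, otherwise 0
LowCov <- as.integer(dtp3 < low_cov_threshold)
names(LowCov) <- names(dtp3)
LowCov
```

Notice the facility sitting exactly at 50%. "Below 50%" leaves it out, "50% or less" would put it in, so decide which you mean before the data arrives.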

Identify plausible conditions

List the factors that program experience says could matter: no trained staff, broken refrigerator, no supervision, no community mobilisation, no outreach funding, thin staffing. Keep the list disciplined. Every condition should have a real reason to be there, not just a vibe.

Write down a causal theory

Before you analyse anything, commit to what you expect. For example: low coverage may come from a broken cold chain plus no supervision, OR from no community engagement plus no outreach. Writing it down ahead of time keeps you honest when the results land.

Reflect

Sketch a quick study for a problem you care about.

What is your outcome, and exactly where is the cut-off between high and low?
Which five or six conditions genuinely belong on your list, and why each one?

Module 8 · How do you build a conceptual framework?

Draw your assumptions before the data argues back

By the end of this module you will be able to

Explain what a conceptual framework is and why to draw one first.
Translate a program belief into a simple causal diagram.
Use the framework as a yardstick for later results.

A conceptual framework is just a picture of what you think causes what. Drawing it forces you to be honest about your assumptions, and it gives you something to hold your CNA results up against later. Here are two chains an immunization team might sketch for low coverage:

Two candidate chains for low coverage. CNA will later tell you which links the data actually support, and it may surprise you, which is half the fun.

Reflect

Draw your own causal picture for one outcome.

Which arrows are you most confident about, and which are guesses?
If the data later contradicts an arrow, would you trust the data or your prior belief?

PART 5 · How do you prepare the data?

Module 9 · What must the data look like?

Three kinds of data CNA accepts

By the end of this module you will be able to

Tell crisp-set, multi-value, and fuzzy-set data apart.
Decide which calibration a condition needs.
Explain why beginners should usually start crisp.

CNA reads simple tables. The only real craft is "calibrating" each condition, which is a fancy word for turning messy reality into clean, comparable values. There are three flavours, and the cna package happily handles all of them.

Crisp-set, the easiest, plain yes or no

Every condition is either present (1) or absent (0). "Was the cold chain broken?" becomes 1 or 0. This is where beginners should start, and it is what we use throughout this course, because life is hard enough already.

Facility	Broken cold chain	Meaning
HF1	1	Cold chain was broken
HF2	0	Cold chain was working

Multi-value, for more than two levels

When a condition has natural steps, say supervision quality runs 0 = poor, 1 = moderate, 2 = good, you can use multi-value coding. CNA reads it fine, as long as the levels are clearly defined and you don't make them up on the spot.

Fuzzy-set, for degrees of membership

Sometimes "present" is a matter of degree. A value of 0.85 means "strongly, but not perfectly, in the set of low-coverage facilities." Fuzzy values run between 0 and 1. It's powerful, but save it for the day crisp-set feels boringly easy.
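Here is a rough sketch of the three flavours side by side, in plain R. All the raw values, the cut points (4 and 8), and the fuzzy anchors (40% and 80%) are assumptions chosen for illustration. The fuzzy step uses simple straight-line scaling, which is a simplification and not necessarily the calibration rule your study should adopt.

```r
# Hypothetical raw measurements for four facilities
fridge_in_range   <- c(TRUE, FALSE, TRUE, TRUE)  # was the fridge temperature in range?
supervision_score <- c(3, 9, 6, 10)              # checklist score out of 10
dtp3              <- c(35, 80, 55, 62)           # coverage in percent

# Crisp-set: yes/no becomes 1/0 (here 1 = cold chain broken)
NoCold <- as.integer(!fridge_in_range)

# Multi-value: cut the score into defined steps (0 = poor, 1 = moderate, 2 = good)
Super <- cut(supervision_score, breaks = c(-Inf, 4, 8, Inf), labels = FALSE) - 1L

# Fuzzy-set: degree of membership in "low coverage", kept between 0 and 1
# 80% or more = fully out (0), 40% or less = fully in (1), straight line in between
LowCovFuzzy <- pmin(1, pmax(0, (80 - dtp3) / (80 - 40)))

data.frame(NoCold, Super, LowCovFuzzy)
```

Whichever flavour you pick, the cut points belong in your study plan, written down before you look at the results.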

Rule of thumb

Start crisp. Graduate later.

Almost every immunization supervision question can be answered well with clean yes or no coding. Reach for multi-value or fuzzy only when a condition genuinely loses its meaning by being squashed into two boxes.

Reflect

Look at one indicator you collect.

Is it naturally yes/no, stepped, or a matter of degree?
What would be lost or gained by squashing it into a simple 0/1?

Module 10 · The supportive-supervision dataset

The dataset we'll carry through the course

By the end of this module you will be able to

Read a crisp-set supervision dataset row by row.
Spot, by eye, which conditions separate the outcomes.
Describe how each facility becomes one case.

Picture your supervision teams visiting six facilities and recording, for each one, whether four things were missing or broken, plus whether coverage was low. Everything is crisp-set: 1 means yes, 0 means no. Because we're in problem-solving mode, our conditions are framed as gaps, and the outcome is LowCov, where 1 means coverage fell below target.

Facility	NoTrain	NoCold	NoSuper	NoComm	LowCov
HF1	0	1	0	0	1
HF2	0	0	1	0	1
HF3	1	1	0	1	1
HF4	0	0	0	1	0
HF5	1	0	0	0	0
HF6	0	1	1	1	1

NoTrain = vaccinators untrained · NoCold = cold chain broken · NoSuper = no monthly supervision · NoComm = no community mobilisation. We build the real R model on this exact table in Part 8.

Read the table like a detective

Look at every facility where coverage was low. In each one, either the cold chain was broken (NoCold) or supervision was missing (NoSuper), and sometimes both. Now look at the two facilities that were fine: neither problem was present. Notice that NoTrain and NoComm show up on both sides, so they're red herrings. Hold that thought, because CNA is about to back up your detective work.
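You can double-check the detective work with a few lines of base R, before CNA is involved at all. This is only a by-hand check of the six rows above; the object names supervision and suspect are made up for the sketch.

```r
# The six-facility table from this module
supervision <- data.frame(
  Facility = paste0("HF", 1:6),
  NoTrain  = c(0, 0, 1, 0, 1, 0),
  NoCold   = c(1, 0, 1, 0, 0, 1),
  NoSuper  = c(0, 1, 0, 0, 0, 1),
  NoComm   = c(0, 0, 1, 1, 0, 1),
  LowCov   = c(1, 1, 1, 0, 0, 1)
)

# Does "broken cold chain OR no supervision" line up with low coverage in every row?
suspect <- as.integer(supervision$NoCold == 1 | supervision$NoSuper == 1)
all(suspect == supervision$LowCov)

# Red herrings: cross-tabulate each one against the outcome
# and look for the condition (value 1) on both sides of LowCov
table(NoTrain = supervision$NoTrain, LowCov = supervision$LowCov)
table(NoComm  = supervision$NoComm,  LowCov = supervision$LowCov)
```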

Reflect

Look only at the facilities where coverage was low.

What do they share that the others don't?
Which conditions appear on both sides, and so probably don't matter?

Module 11 · How do you check data quality?

Clean before you compute

By the end of this module you will be able to

List the data-quality checks to run before any analysis.
Recognise a logical contradiction in coded data.
Explain why clean data is part of the analysis, not a chore before it.

CNA is unforgiving of dirty data, because it reads every row as a real case and takes each one seriously. Before you analyse anything, walk this checklist:

Missing values. CNA needs complete rows, so decide how to handle gaps before you start, not after.
Duplicates. The same facility entered twice quietly bends the patterns without telling you.
Coding errors. A stray 2 in a yes-or-no column will be read literally and believed without question.
Logical contradictions. "Cold chain broken = 1" alongside "Refrigerator functional = Yes" cannot both be true, so go investigate.
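The checklist above can be sketched in base R. The little entry sheet below is invented, with one of each problem planted in it, and the column name FridgeFunctional is an assumption. Treat it as a starting point to adapt, not a complete cleaning routine.

```r
# Hypothetical raw entry sheet with one of each problem planted in it
raw <- data.frame(
  Facility         = c("HF1", "HF2", "HF2", "HF3", "HF4"),
  NoCold           = c(1, 0, 0, 2, 1),
  NoSuper          = c(0, NA, NA, 1, 0),
  FridgeFunctional = c("No", "Yes", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# 1. Missing values: rows that are not complete
rows_missing <- which(!complete.cases(raw))

# 2. Duplicates: the same facility entered more than once
rows_duplicated <- which(duplicated(raw$Facility))

# 3. Coding errors: anything other than 0 or 1 in a yes/no column
rows_miscoded <- which(!is.na(raw$NoCold) & !(raw$NoCold %in% c(0, 1)))

# 4. Logical contradictions: cold chain coded as broken while the fridge is recorded as functional
rows_contradict <- which(raw$NoCold == 1 & raw$FridgeFunctional == "Yes")

list(missing = rows_missing, duplicated = rows_duplicated,
     miscoded = rows_miscoded, contradictory = rows_contradict)
```

Each element of the list holds row numbers to go and investigate. The code finds the suspects; deciding what the true value was is still a job for someone who knows the facility.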

Garbage in, confident garbage out

CNA will cheerfully find a crisp, beautiful pattern in mislabelled data and present it with total confidence, like a witness who is very sure and completely wrong. The method cannot tell a typo from the truth. Cleaning is not optional housekeeping. It is part of the analysis.

Reflect

Think about your last messy dataset.

Where do errors usually creep in: missing rows, duplicates, miscoding?
What single check would have caught the most problems?

PART 6 · How do you read a CNA result?

Module 12 · Reading CNA notation

Don't panic at the algebra

By the end of this module you will be able to

Read a CNA formula out loud as an ordinary sentence.
Translate the symbols * + and the arrow into plain words.
Turn any solution formula into something a colleague would understand.

Most people meet their first CNA result and feel their stomach drop. The screen shows something like this, and it looks like algebra homework you thought you had escaped:

a CNA solution

A*B + C*D ↔ Y

Take a breath. There is nothing to fear here, because a CNA formula is just a sentence wearing a lab coat. Once you know the three words it is built from, you can read any result in the course, and any result your own data ever produces.

The whole vocabulary, in three words

Every formula in this module is built from three little symbols. Learn these and you can read all of them.

Symbol	Say it as	It means
*	"and"	these conditions occur together, in the same case
+	"or"	this is a separate route to the same outcome
↔	"goes with"	the left side tracks the outcome on the right (R prints it as <->)

So that scary formula A*B + C*D ↔ Y simply says: "When you have A and B together, or C and D together, you tend to find Y." That's it. A sentence about two recipes for the same result.

A note for the precise: the double arrow ↔ is technically an equivalence. It says the left side is both sufficient for the outcome (where you see the combination, you see Y) and, taken as a whole, necessary for it (where you see Y, you find one of these combinations). "Goes with" is a faithful shorthand for that, and you will meet the full story in the model-evaluation modules.
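For the precise reader, the two halves of that equivalence can be checked by hand in base R. The six cases below are invented so that the formula A*B + C*D ↔ Y holds; the names lhs, sufficient, and necessary are just labels for this sketch.

```r
# Hypothetical crisp-set cases for the formula A*B + C*D <-> Y
cases <- data.frame(
  A = c(1, 1, 0, 0, 1, 0),
  B = c(1, 1, 0, 1, 0, 0),
  C = c(0, 1, 1, 0, 0, 1),
  D = c(0, 0, 1, 1, 0, 0),
  Y = c(1, 1, 1, 0, 0, 0)
)

# Left side of the formula: * is AND (&), + is OR (|)
lhs <- (cases$A == 1 & cases$B == 1) | (cases$C == 1 & cases$D == 1)
y   <- cases$Y == 1

# Sufficient: every case showing the left side also shows Y
sufficient <- all(y[lhs])

# Necessary: every case showing Y also shows the left side
necessary <- all(lhs[y])

c(sufficient = sufficient, necessary = necessary)
```

When both checks come back TRUE, the left side and the outcome match row for row, which is what the double arrow claims.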

Read it like a sentence

Here is the trick that makes everything click. Swap the letters for the real conditions, read left to right, and say the symbols out loud as their plain words. Watch a real immunization formula turn into an ordinary observation:

read this out loud

Train * Cold * Super ↔ Coverage

The same formula, in plain English

Trained staff AND a working cold chain AND supervision, goes with coverage

Read fully: facilities with trained staff, a functioning cold chain, and supportive supervision consistently achieved high coverage. Notice you didn't lose anything by saying it in words. You gained a sentence your director can actually act on. The formula and the sentence are the same statement, just dressed differently.

Explain it like I'm ten

A recipe card reads "flour, butter, sugar." Nobody panics at that. CNA notation is a recipe card for outcomes: the * joins the ingredients that have to go in together, the + separates one recipe from a different recipe that also works, and the arrow points at the cake. You have been reading recipes your whole life. This is the same skill.

One more, with two pathways

When a formula has a + in it, that is equifinality showing up in the algebra: two different roads to the same place. Read each side as its own little sentence, joined by "or":

two pathways

NoCold*NoSuper + NoComm*NoOutreach ↔ LowCov

Out loud: "Low coverage goes with a broken cold chain and missing supervision, or with no community mobilisation and no outreach." Two recipes for the same disappointing result. A district could land in low coverage by either route, which is exactly why you need to know which route a given place is on before you spend a naira fixing it.

A small habit that prevents big mistakes

Lower-case letters mean absence. Cold means the cold chain is working; cold (or a name like NoCold) means it is broken. When you read a formula aloud, always check the case, because "supervision is present" and "supervision is absent" are opposite stories with opposite fixes.

Reflect

Take the very first formula in this module, A*B + C*D ↔ Y.

Can you now say it out loud as a sentence, without looking at the table?
Swap in conditions from your own program. Does the sentence describe something you have actually seen in the field?

PART 7 · How do you learn R from zero?

Module 13 · How do you install R and RStudio?

Two free programs, fifteen minutes

By the end of this module you will be able to

Install R and RStudio with confidence.
Tell the difference between the two and why you need both.
Open the console and know where to type.

R is the calculator that runs CNA. RStudio is the comfortable room you run it in. You install both once, they cost nothing, and nobody will quiz you on how they work under the hood.

Install R, the engine

Go to the official R project download page, cran.r-project.org, and download the version for your operating system. Run the installer and accept the defaults, which are sensible.

Install RStudio, the workspace

Then get RStudio Desktop from posit.co. It's the friendly window where you'll type everything. Install R first, because RStudio goes looking for it and gets sad if it isn't there.

Open RStudio

The big panel on the left is the Console. That's where you type commands and press Enter. That's it. You're ready.

Explain it like I'm ten

R is the car engine. RStudio is the dashboard, the steering wheel, and the comfy seat. You could drive with just an engine bolted to a metal frame, but nobody actually wants to. Install both.

Reflect

Before moving on, get set up.

Did R install before RStudio? Does RStudio find it?
Can you find the console panel where commands go?

Module 14 · What are your first R commands?

Talk to R like a calculator

By the end of this module you will be able to

Use R as a simple calculator.
Store a value in a variable with the arrow.
View what a variable contains.

Type this into the Console and press Enter. R answers right away, no small talk.

Console

2 + 2

Output

[1] 4

That [1] simply means "this is the first, and in this case only, answer." Now let's store a number in a variable using the arrow <-, which is R's slightly dramatic way of saying "put this into."

Console

# put the number 42 into a box named "coverage"
coverage <- 42

# now look inside the box
coverage

Output

[1] 42

Explain it like I'm ten

coverage <- 42 means "label a box coverage and drop 42 inside." Whenever you write coverage later, R opens the box and shows you what's in it. That's all a variable is, a labelled box. No magic, no exam.

Reflect

Try a couple of commands of your own.

What happens if you store your district's coverage in a variable and call it back?
Does the idea of a 'labelled box' make variables feel less mysterious?

Module 15 · How do you install the CNA package?

Adding CNA's tools to R

By the end of this module you will be able to

Install the cna package once.
Load it at the start of every session.
Explain the install-once, library-every-time pattern.

R doesn't know how to do Coincidence Analysis straight out of the box. You teach it by installing the cna package, a free add-on published on CRAN, the official repository for R packages. You install it once, then load it at the start of every session.

Console, run once, ever

# download and install the cna package (needs internet, do this once)
install.packages("cna")

Console, run every new session

# switch the tools on for this session
library(cna)

The pattern to memorise

Install once. Library every time.

install.packages() is like buying a tool and putting it in the cupboard. You do it once. library() is taking the tool back out of the cupboard, which you do at the start of every session. Forgetting library(cna) is the single most common beginner stumble, and now it won't be yours.
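If you would rather not remember whether the tool is already in the cupboard, one common pattern is to check first and install only when needed. The helper below, ensure_package, is a hypothetical convenience function written for this sketch; it is not part of R or of the cna package.

```r
# Hypothetical helper: install a package only if it is missing, then load it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # the "buy the tool" step; needs internet
  }
  # the "take it out of the cupboard" step, needed in every session
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Demonstrated with "stats", which ships with R, so nothing is downloaded here.
# In your own script you would call ensure_package("cna") instead.
ensure_package("stats")
```

The two plain lines from this module work just as well. The helper only saves you from re-running the install by accident.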

Reflect

Lock in the habit now.

Why does install.packages() happen once but library(cna) every time?
What is the most common reason a CNA script fails on the first line?

PART 8 · How do you build your first CNA model?

Module 16 · How do you create the dataset?

Type the supervision table into R

By the end of this module you will be able to

Build a data.frame from typed values.
Map each column to one condition.
Print a dataset to check it.

Let's recreate the Module 10 table as a data.frame, which is just R's name for a spreadsheet. Each c(...) is one column, listing the six facilities' values from top to bottom. Type carefully; R is a stickler for commas.

first_model.R

library(cna)

# build the supportive-supervision dataset (6 facilities)
# conditions are GAPS; outcome LowCov = 1 means coverage fell below target
immunization <- data.frame(
  NoTrain = c(0, 0, 1, 0, 1, 0),
  NoCold  = c(1, 0, 1, 0, 0, 1),
  NoSuper = c(0, 1, 0, 0, 0, 1),
  NoComm  = c(0, 0, 1, 1, 0, 1),
  LowCov  = c(1, 1, 1, 0, 0, 1)
)

# look at it
immunization

Output

  NoTrain NoCold NoSuper NoComm LowCov
1       0      1       0      0      1
2       0      0       1      0      1
3       1      1       0      1      1
4       0      0       0      1      0
5       1      0       0      0      0
6       0      1       1      1      1

Six rows, five columns. R now holds your supervision data exactly as you designed it, missing children and all.
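Counting values matters more than it looks. The sketch below uses two tiny made-up tables to show what data.frame() does when a column comes up short. One case stops with an error, and the other slips through quietly, which is the one to watch for.

```r
# A column with five values instead of six makes data.frame() stop with an error
short_column <- tryCatch(
  data.frame(
    NoCold = c(1, 0, 1, 0, 0, 1),
    LowCov = c(1, 1, 1, 0, 0)        # one value missing
  ),
  error = function(e) conditionMessage(e)
)
short_column  # the error message, captured as text

# Caution: a column whose length divides evenly into the others is recycled
recycled <- data.frame(
  NoCold = c(1, 0, 1, 0, 0, 1),
  LowCov = c(1, 0, 1)                # three values, repeated to fill six rows
)
recycled
```

A quick nrow() and a glance at the printed table after typing is cheap insurance against both problems.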

Reflect

Recreate the table yourself.

Did every column get the right number of values, one per facility?
What would happen if one c(...) had a value too few?

Module 17 · How do you run CNA?

One function does the heavy lifting

By the end of this module you will be able to

Run cna() on a dataset.
Set the outcome argument correctly.
Explain what 'capital letter means present' is about.

The whole method is wrapped in a single function: cna(). You hand it your data and tell it which column is the outcome. It quietly searches for the minimally sufficient and necessary combinations, then assembles them into solution formulas while you sip your tea.

first_model.R

# ask: which combinations of gaps go together with LowCov = 1 ?
model <- cna(immunization, outcome = "LowCov")

One small but important detail

The `outcome` argument names a factor value

Writing outcome = "LowCov" tells CNA to model the presence of low coverage, the value 1. In crisp-set data, a capitalised name means "this factor = 1." That convention runs through the whole package: a capital letter means present, and lower-case means absent.

Running this stores the result in model but prints nothing yet, which feels anticlimactic. Don't worry. Viewing it is the very next module.

Reflect

Run the model on your own version of the data.

Did you name the outcome exactly as the column is spelled?
What is CNA quietly searching for while it runs?

Module 18 · How do you read the result?

Your first solution formula

By the end of this module you will be able to

Print and read a CNA solution formula.
Translate a formula into a plain-language sentence.
Connect the formula back to the raw data rows.

Type the model's name to print it. CNA reports its solution as a formula. For our supervision data, the pattern that perfectly separates the low-coverage facilities from the healthy ones is this:

first_model.R

model

Atomic solution formula (read the condition row)

condition: NoCold + NoSuper <-> LowCov
consistency: 1.000 coverage: 1.000

Read it out loud

NoCold + NoSuper <-> LowCov

"Coverage is low wherever the cold chain was broken OR supervision was missing, and nowhere else." The + means OR, so this is two roads to the same sad place. The <-> means the combination tracks the outcome exactly.

Look back at Module 9. Every low-coverage facility had a broken cold chain or no supervision, and the two healthy facilities had neither. Notice what's missing from the formula: NoTrain and NoComm never made the cut, because they showed up on both the good and the bad side. CNA quietly ignored the red herrings and kept only what actually makes the difference, and it confirmed the pattern holds for every single case, with consistency and coverage both at 1.000. We unpack those two numbers in Part 10.
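To connect a formula back to the raw rows, you can rebuild its left-hand side with ordinary R logic and compare it with the outcome column. This is a minimal sketch on a made-up data frame called toy (hypothetical values, not the course table), using base R only.

```r
# Hypothetical stand-in data: 8 facilities, every column 0/1
toy <- data.frame(
  NoTrain = c(1, 0, 1, 0, 1, 0, 1, 0),
  NoCold  = c(1, 1, 0, 0, 1, 0, 0, 0),
  NoSuper = c(0, 1, 1, 0, 0, 1, 0, 0),
  NoComm  = c(1, 0, 0, 1, 0, 1, 1, 0),
  LowCov  = c(1, 1, 1, 0, 1, 1, 0, 0)
)

# Left-hand side of "NoCold + NoSuper": the + is a logical OR
formula_says_low <- toy$NoCold == 1 | toy$NoSuper == 1
observed_low     <- toy$LowCov == 1

# Cross-tabulate what the formula claims against what was observed
table(formula_says_low, observed_low)

# TRUE only if the formula and the outcome agree in every single row
agrees_everywhere <- all(formula_says_low == observed_low)
agrees_everywhere
```

In this toy table the off-diagonal cells are empty, which is what a consistency and coverage of 1.000 look like row by row.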

Reflect

Read your result out loud.

Can you say the formula as one sentence a colleague would understand?
Do the facilities in the data actually match what the formula claims?

Module 19 · What do the logic symbols mean?

The five symbols CNA speaks in

By the end of this module you will be able to

Read the five core CNA symbols.
Tell AND (*) from OR (+) in a formula.
Recognise that lower-case means absence.

Every CNA formula is built from a tiny vocabulary. Learn these five and you can read any solution the package ever hands you.

Symbol	Means	In plain words
*	AND	both conditions must be present together
+	OR	either road will do on its own
<->	equivalence	the left side tracks the outcome exactly
->	sufficiency	the left side is enough for the right
a	absence	lower-case means that condition is absent (0)

So a formula like NoCold*NoComm + NoSuper would read: "(broken cold chain AND no mobilisation) OR (no supervision)." That single line captures both conjunctivity, the *, and equifinality, the +, at the same time. Two ideas, one tidy line.
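If the symbols still feel abstract, it can help to write the same formula as plain R logic, where * becomes & and + becomes |. The four facilities below are invented for illustration, and the second expression shows what changes when NoComm is read as absent, which is what a lower-case term would mean.

```r
# Four hypothetical facilities, every column 0/1
fac <- data.frame(
  NoCold  = c(1, 1, 0, 0),
  NoComm  = c(1, 0, 1, 0),
  NoSuper = c(0, 0, 0, 1)
)

# NoCold*NoComm + NoSuper: (broken cold chain AND no mobilisation) OR no supervision
as_written <- with(fac, (NoCold == 1 & NoComm == 1) | NoSuper == 1)

# Same formula with NoComm in lower-case: the gap must be ABSENT (0)
nocomm_absent <- with(fac, (NoCold == 1 & NoComm == 0) | NoSuper == 1)

data.frame(fac, as_written, nocomm_absent)
```

Facilities 1 and 2 swap places between the two readings, which is why the case of a single letter matters so much.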

Reflect

Practise on a formula of your own.

Can you point to the conjunctivity (*) and the equifinality (+) in it?
What would the same formula mean if one term were lower-case?

PART 9 · How does CNA work on real problems?

Module 20 · What drives low DTP3 coverage?

A conjunctural road to low DTP3

By the end of this module you will be able to

Frame a low-DTP3 question for CNA.
Read a single-pathway (conjunctural) solution.
Explain why no one condition in the bundle is enough alone.

Suppose you study low DTP3 coverage with four gap conditions: untrained staff, recent stockout, broken cold chain, and no supervision. A common CNA result is a single bundle, where low coverage needs several gaps to gang up at once.

dtp3.R

library(cna)

# columns: NoTrain, Stockout, NoCold, NoSuper, LowDTP3 (all 0/1)
m_dtp3 <- cna(dtp3_data, outcome = "LowDTP3")
m_dtp3

Illustrative solution
condition: Stockout*NoCold*NoSuper <-> LowDTP3

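Interpretation: DTP3 collapses where a stockout, a broken cold chain, and absent supervision all land in the same place. No single gap is the villain, and no single gap is enough to cause the collapse alone. This is conjunctivity, the gangs-up idea, showing its face in a real coverage indicator. It also tells you that, on this model, low DTP3 needs all three gaps at once, so closing any one of them should break the bundle. Which one to close first is a program decision, and the prediction is worth checking against follow-up data.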
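One way to see why no single gap is enough is to ask, for each gap alone and then for the whole bundle, what share of the facilities that have it also have low DTP3. The sketch below does that in base R on eight invented facilities. The data frame dtp3_toy and the helper share_low are hypothetical names, not part of the cna package.

```r
# Eight hypothetical facilities; LowDTP3 is 1 only where all three gaps meet
dtp3_toy <- data.frame(
  Stockout = c(1, 1, 1, 0, 1, 0, 1, 0),
  NoCold   = c(1, 1, 0, 1, 1, 0, 0, 1),
  NoSuper  = c(1, 0, 1, 1, 1, 1, 0, 0),
  LowDTP3  = c(1, 0, 0, 0, 1, 0, 0, 0)
)

low <- dtp3_toy$LowDTP3 == 1

# Share of facilities with a condition that also have the outcome
share_low <- function(has_condition, has_outcome) {
  sum(has_condition & has_outcome) / sum(has_condition)
}

gaps <- c("Stockout", "NoCold", "NoSuper")

# Each gap on its own: the outcome follows less than half the time
single_gaps <- sapply(dtp3_toy[gaps], function(col) share_low(col == 1, low))

# The full bundle: the outcome follows every time
bundle <- share_low(rowSums(dtp3_toy[gaps]) == 3, low)

single_gaps
bundle
```

In these made-up numbers each gap alone goes with low DTP3 in 2 of 5 facilities, while the three together go with it in 2 of 2.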

Reflect

Apply it to your own DTP3 numbers.

Which gaps would you put on the list of candidate conditions?
If the result is a bundle, what does that imply for what you fund?

Module 21 · What drives zero-dose children?

Two roads to too many zero-dose children

By the end of this module you will be able to

Frame a zero-dose question for CNA.
Read a two-pathway (equifinal) solution joined by OR.
Explain what multiple pathways mean for program design.

Now an equifinality example. Conditions, all framed as gaps: no outreach, no community volunteers, no settlement mapping, no outreach funding. Outcome: HighZeroDose, meaning too many children never got a single dose. CNA can report two separate roads joined by OR.

zerodose.R

m_zd <- cna(zerodose_data, outcome = "HighZeroDose")
m_zd

Illustrative solution
condition: NoOutreach*NoFunding + NoVolunteers*NoMapping <-> HighZeroDose

Interpretation: a district can end up with too many zero-dose children either because it had no outreach and no funding for outreach, or because it lacked active volunteers and had no settlement map to work from. The + is the gift here. It tells you there are two different problems wearing the same uniform, so the fix in one district may not be the fix in the next.
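For program design, the practical next step is to sort districts by which road they are on. Here is a minimal base-R sketch with six invented districts. The data frame zd_toy and the road labels are hypothetical.

```r
# Six hypothetical districts, every column 0/1
zd_toy <- data.frame(
  NoOutreach   = c(1, 1, 0, 0, 1, 0),
  NoFunding    = c(1, 1, 0, 1, 1, 0),
  NoVolunteers = c(0, 1, 1, 1, 0, 0),
  NoMapping    = c(0, 1, 1, 0, 0, 1)
)

# Each road is a conjunction (the * in the formula)
road_funding    <- with(zd_toy, NoOutreach == 1 & NoFunding == 1)
road_volunteers <- with(zd_toy, NoVolunteers == 1 & NoMapping == 1)

# Label every district by the road(s) it is on
route <- ifelse(road_funding & road_volunteers, "both roads",
         ifelse(road_funding, "outreach-funding road",
         ifelse(road_volunteers, "volunteers-mapping road", "neither")))

table(route)
```

Districts on different roads call for different fixes, and a district on both roads needs both.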

Reflect

Think about zero-dose children in two different areas.

Could each area be 'zero-dose' for different reasons?
How would two pathways change what you'd recommend in each place?

Module 22 · What weakens supervision quality?

Which gaps go with poor-quality sessions?

By the end of this module you will be able to

Design a supervision-quality study with CNA.
Choose supervisory practices as conditions.
Interpret which combination of practices matters.

This is the study supervision teams care about most, because it's about them. Research question: which combinations of weak supervisory practices go together with poor-quality immunization sessions? Conditions, as gaps: no checklist used, no action plan developed, no feedback given, no follow-up visit. Outcome: PoorQuality.

supervision_quality.R

library(cna)

# NoChecklist, NoActionPlan, NoFeedback, NoFollowUp, PoorQuality (0/1)
m_sup <- cna(supervision_data,
             outcome = "PoorQuality")

m_sup

Illustrative solution
condition: NoChecklist*NoFeedback + NoFollowUp <-> PoorQuality

Interpretation: sessions turn out poor either when the supervisor skips both the checklist and the on-the-spot feedback, or whenever there's simply no follow-up visit at all. A visit that ticks a checklist but never comes back is still a visit that loses quality. CNA tells you which combination of missing practices to fix, not just which single box to scold someone about.

Make it yours

Swap in the four practices your own supervision checklist tracks. The columns change, but the cna() call stays exactly the same. That reusability is the whole point, and your future self will thank you.

Reflect

Look at your own supervision checklist.

Which four practices would you test as conditions?
Do you suspect a single practice matters, or a combination?

PART 10 · How do you evaluate a model?

Module 23 · What is consistency?

Consistency: does the pattern keep its promise?

By the end of this module you will be able to

Define consistency in plain language.
Read a consistency score.
Set the con threshold when data is noisy.

CNA gives every solution two scores. The first is consistency, and it answers a simple question: when this combination of gaps is present, does low coverage actually show up too?

Definition

Consistency is "how often the pattern holds"

A consistency of 1.00 means every facility with the combination also had the outcome, so the pattern never let you down. A consistency of 0.80 means it held 80% of the time. Higher is better. CNA's default threshold is 1, and you lower it when real-world data gets noisy, which it always does.

Explain it like I'm ten

Consistency is "every time I leave the milk out of the fridge, does it go off?" If it goes off every single time, that's perfect consistency. If it sometimes survives, the consistency drops, and you start to wonder what else is going on.
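The arithmetic behind the score is short enough to do by hand. This sketch uses invented numbers (15 facilities, 10 of which have the combination) and base R only, to show the standard consistency calculation: cases with both the combination and the outcome, divided by cases with the combination.

```r
# Hypothetical data: 10 facilities have the combination, 5 do not
has_combo <- c(rep(TRUE, 10), rep(FALSE, 5))

# Of the 10 with the combination, 8 have low coverage and 2 do not;
# of the 5 without it, 1 has low coverage anyway
low_cov <- c(rep(TRUE, 8), rep(FALSE, 2), TRUE, rep(FALSE, 4))

# Consistency: how often the outcome shows up when the combination is present
consistency <- sum(has_combo & low_cov) / sum(has_combo)
consistency   # 8 / 10
```

Notice that the facility with low coverage but no combination does not affect consistency at all. That case matters for the second score, coverage.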

One honest caveat the experts insist on

Consistency can read deceptively high when the outcome itself is very common in your data. If, say, 90% of your facilities already have low coverage, almost any combination will look "consistent" with it, because the outcome is everywhere. So always read a consistency score next to how common the outcome is, and lean on the robustness checks in the next part rather than trusting a single high number. Recent work by De Souter and Baumgartner (2025) adds prevalence-adjusted measures precisely to handle this.
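A small made-up example shows the trap. In the sketch below, 90 of 100 invented facilities have low coverage, and the "condition" is deliberately unrelated to the outcome (it is simply true for the first 50 facilities). Its consistency still comes out at 0.9, exactly the same as the outcome's prevalence.

```r
# Hypothetical data: 100 facilities, every tenth one is doing fine
low_cov <- rep(c(rep(TRUE, 9), FALSE), times = 10)

# A condition with no real link to the outcome: TRUE for the first 50 facilities
unrelated <- rep(c(TRUE, FALSE), each = 50)

prevalence  <- mean(low_cov)                              # share with the outcome
consistency <- sum(unrelated & low_cov) / sum(unrelated)  # score for the condition

c(prevalence = prevalence, consistency = consistency)
```

When a consistency score is no higher than the prevalence of the outcome, the pattern is telling you nothing the base rate did not already say.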

In code, you set the bar with the con argument:

thresholds.R

# accept patterns that hold at least 90% of the time
cna(immunization, outcome = "LowCov", con = 0.9)

Reflect

Think about reliability.

If a recipe works 8 times in 10, is that good enough to act on in your setting?
Where would you set the bar, and what would you risk by lowering it?

Module 24 · What is coverage (the score)?

Coverage: how much of the outcome does the pattern explain?

By the end of this module you will be able to

Define coverage as a model-fit measure.
Keep CNA-coverage and vaccination-coverage straight.
Set the cov threshold sensibly.

The second score is coverage, an evaluation measure that you should not confuse with vaccination coverage. It answers: of all the facilities that had the outcome, what share does this pattern account for?

Definition

Coverage is "how much of the problem this pattern captures"

If 90 of 100 low-coverage facilities fit your pattern, the coverage score is 0.90. A lower score isn't automatically bad. It often just means other roads to the same problem exist, which is equifinality saying hello. You set the threshold with the cov argument, and its default is also 1.
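The 90-out-of-100 example can be reproduced by hand. The sketch below uses invented data (100 low-coverage facilities plus 20 healthy ones) and base R only. Coverage divides by the number of cases with the outcome, where consistency divides by the number of cases with the pattern.

```r
# Hypothetical data: 100 low-coverage facilities, then 20 healthy ones
low_cov <- c(rep(TRUE, 100), rep(FALSE, 20))

# The pattern fits 90 of the low-coverage facilities and none of the healthy ones
fits_pattern <- c(rep(TRUE, 90), rep(FALSE, 10), rep(FALSE, 20))

# Coverage score: share of outcome cases the pattern accounts for
coverage_score <- sum(fits_pattern & low_cov) / sum(low_cov)

# Consistency, for contrast: share of pattern cases that have the outcome
consistency <- sum(fits_pattern & low_cov) / sum(fits_pattern)

c(coverage_score = coverage_score, consistency = consistency)
```

Here the pattern never misfires (consistency 1) but leaves 10 low-coverage facilities unexplained (coverage score 0.9), which hints at another road still to be found.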

Two meanings of one word

"Coverage" in CNA output is a model-fit statistic. "Coverage" in your program is the share of children vaccinated. Same word, two completely different jobs, which is mildly unfair of the universe. In reports, say "the pattern's coverage score" versus "DTP3 coverage" so nobody gets lost.

thresholds.R

# require patterns that are both reliable AND broadly explanatory
cna(immunization, outcome = "LowCov",
    con = 0.9, cov = 0.9)

Reflect

Mind the two meanings of 'coverage'.

In your own writing, how will you avoid confusing the two?
If a pathway has a low coverage score, what might that tell you about other pathways?

Module 25 · How do you compare and inspect models?

Choosing well, and looking under the hood

By the end of this module you will be able to

Weigh models on simplicity, consistency, coverage, and plausibility.
Use condTbl() and condition() to inspect a formula.
Resist choosing a model on numbers alone.

When CNA hands you several candidate models, weigh them on four criteria, three statistical and one stubbornly human:

Simplicity. Fewer conditions are easier to explain and easier to act on.
Consistency. How reliably the pattern holds.
Coverage. How much of the outcome it accounts for.
Program plausibility. Does it actually make public-health sense?
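Simplicity is the easiest of the four criteria to put a number on: count how many condition names appear on the left of the arrow. The sketch below does that with base R string tools for two formulas. The second formula is invented purely for contrast, and this counting rule is a rough illustration, not a measure defined by this course.

```r
# Two candidate formulas; the second one is hypothetical
models <- c(
  "NoCold + NoSuper <-> LowCov",
  "NoCold*NoComm + NoSuper*NoTrain + NoCold*NoTrain <-> LowCov"
)

# Keep only the left-hand side (everything before the arrow)
lhs <- sub("\\s*<->.*$", "", models)

# Count the condition names on each left-hand side
n_conditions <- lengths(regmatches(lhs, gregexpr("[A-Za-z][A-Za-z0-9_.]*", lhs)))

data.frame(model = models, n_conditions = n_conditions)
```

If two models score about the same on consistency and coverage, the one with fewer conditions is usually easier to explain and to act on, provided it still makes program sense.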

To inspect any formula's scores directly, use condTbl(), which builds a tidy summary table of consistency and coverage for whatever conditions you pass it. To probe a formula facility by facility, use condition().

inspect.R

library(cna)

# score a specific formula against the data
condTbl("NoCold + NoSuper <-> LowCov", immunization)

# see exactly which facilities each road covers
condition("NoCold + NoSuper", immunization)

condTbl output (summary)
outcome	consistency	coverage
LowCov	1.000	1.000

The golden rule of model choice

Never pick a model just because the numbers are the prettiest. A statistically flawless pattern that no program officer can explain or act on is, in practical terms, useless. The model has to make public-health sense first, and impress the statistician second.

Worth knowing as you grow

Consistency and coverage are the defaults, not the only options

This course uses standard consistency and coverage throughout, because they are the defaults and the easiest to learn. From version 4.0.0 onward, the cna package also offers several additional sufficiency and necessity measures (including prevalence-adjusted ones) through a measures argument in cna(). You do not need them to complete this course, but once you are comfortable, they are the natural next step for handling tricky or imbalanced data.

Reflect

Imagine two models with similar scores.

Which would you trust: the tidier one, or the one that makes more program sense?
What would make a statistically strong model still unusable in practice?

PART 11 · How robust is your finding?

Module 26 · How do you assess robustness?

Would the finding survive a small nudge?

By the end of this module you will be able to

Explain what robustness means for a CNA finding.
Run frscored_cna() across a range of thresholds.
Judge whether a finding is steady enough to brief.

A result you plan to act on should not fall apart the moment the data or the thresholds wobble a little. Robustness asks exactly that. The CNA ecosystem has a dedicated companion package for it called frscore, which scores how "fit-robust" your models are across a whole range of consistency and coverage thresholds, so you don't have to do it by hand and lose an afternoon.

robustness.R

# install once
install.packages("frscore")

library(frscore)

# re-analyse across many thresholds and score robustness in one call
fr <- frscored_cna(immunization,
                   fit.range = c(1, 0.7),
                   granularity = 0.1,
                   outcome = "LowCov")  # passed on to cna(), as in the earlier modules
fr

What the score means

Higher fit-robustness means a steadier finding

frscored_cna() runs CNA again and again across a grid of thresholds, here from 1.0 down to 0.7 in steps of 0.1, and rewards the models that keep showing up. A pattern that reappears no matter how you turn the dials is one you can trust enough to put in front of a director without sweating.
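To get a feel for how much work that grid represents, you can lay it out yourself in base R. The sketch below assumes every consistency value is paired with every coverage value. That pairing is an assumption made for illustration, and frscore's internals may organise the runs differently.

```r
# Threshold values from 1.0 down to 0.7 in steps of 0.1
thresholds <- seq(1, 0.7, by = -0.1)

# Assumed pairing: every con value with every cov value
grid <- expand.grid(con = thresholds, cov = thresholds)

nrow(grid)   # number of threshold settings under this assumption
head(grid)
```

Under that assumption, a range of 1 to 0.7 with granularity 0.1 means 16 separate analyses, which is exactly the afternoon the package saves you.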

Reflect

Think about confidence.

Would your finding survive a small change in the data or thresholds?
How robust does a result need to be before you'd put it in front of a director?

Module 27 · How do you run a sensitivity analysis?

Turn the dials and watch what happens

By the end of this module you will be able to

Run a quick sensitivity check by hand.
Compare findings across two thresholds.
Report fragility honestly.

Even without a special package, you can run a quick sensitivity check by hand. Run CNA at a couple of different thresholds and see whether the story stays the same or suddenly changes its tune.

sensitivity.R

# strict
cna(immunization, outcome = "LowCov", con = 0.9, cov = 0.9)

# relaxed
cna(immunization, outcome = "LowCov", con = 0.85, cov = 0.85)

If the same pattern survives both runs, your confidence quietly goes up. If a tiny change rewrites the whole solution, treat the finding as fragile and say so plainly in your report. Honesty here saves embarrassment later.
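Comparing two runs can be as simple as comparing the formulas they return. In the sketch below, the two character vectors are typed in by hand and are hypothetical, standing in for whatever your strict and relaxed runs actually print. Exact string matching is a crude test, since the same solution can be written with its terms in a different order.

```r
# Hypothetical formulas copied from a strict run and a relaxed run
strict_run  <- c("NoCold + NoSuper <-> LowCov")
relaxed_run <- c("NoCold + NoSuper <-> LowCov",
                 "NoCold + NoComm*NoTrain <-> LowCov")

# Formulas that appear in both runs
stable <- intersect(strict_run, relaxed_run)

# Formulas that appear in only one run
fragile <- setdiff(union(strict_run, relaxed_run), stable)

stable
fragile
```

Report the stable formula as your finding, and mention the fragile one as something that only appears when the bar is lowered.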

Reflect

Try turning the dials.

Does your pathway survive both a strict and a relaxed threshold?
If it changes, how would you describe that honestly in a report?

PART 12 · How do you interpret the results?

Module 28 · From Formula to Program Language

Translate the symbols back into the field

By the end of this module you will be able to

Translate a formula into program language.
Write a finding a non-analyst can act on.
Avoid leaving results as code fragments.

A formula is the start of the work, not the end. Nobody ever approved a budget because a slide said NoCold + NoSuper. Decision-makers need plain language, so compare these two ways of saying the exact same thing:

Please don't write this	Write this instead
"NoCold + NoSuper explains LowCov."	"Coverage fell below target wherever the cold chain was broken or monthly supervision was missing. It held only where both were in place, so both gaps need closing."

Same finding, two very different audiences. The first is a code fragment only you understand. The second is something a state immunization officer can read once, repeat in a meeting, and act on by Friday. Always carry the result that last mile into program language.
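A first draft of that translation can even be produced mechanically, as long as you supply the plain-language phrase for each condition. The sketch below is a rough illustration in base R. The helper to_plain and the phrase list are hypothetical, and the output is a starting point for a human editor, not a finished sentence.

```r
# Hypothetical lookup: one plain-language phrase per condition name
phrases <- c(
  NoCold  = "the cold chain was broken",
  NoSuper = "monthly supervision was missing",
  NoComm  = "there was no community mobilisation"
)

# Hypothetical helper: turn the left-hand side of a formula into words
to_plain <- function(lhs, phrases) {
  roads <- trimws(strsplit(lhs, "+", fixed = TRUE)[[1]])       # + separates roads
  described <- vapply(roads, function(road) {
    parts <- trimws(strsplit(road, "*", fixed = TRUE)[[1]])    # * joins conditions
    paste(phrases[parts], collapse = " and ")
  }, character(1), USE.NAMES = FALSE)
  paste(described, collapse = ", or ")
}

draft <- to_plain("NoCold + NoSuper", phrases)
paste0("Coverage was low wherever ", draft, ".")
```

A condition name that is missing from the lookup comes out as NA, which is a useful reminder to define every condition in words before writing the brief.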

Reflect

Rewrite one finding for a real audience.

Would a state immunization officer understand your sentence on first read?
Have you said what to do, not just what the symbols are?

Module 29 · Avoiding Common Mistakes

Say what CNA actually found, and not one inch more

By the end of this module you will be able to

Avoid over-claiming causation from one study.
Use careful language like 'consistent with the data'.
Report the whole bundle, not one cherry-picked condition.

The most common interpretation error is over-claiming causation from a single configurational study. CNA identifies dependency structures consistent with a causal reading. It does not, on its own, prove that one thing causes another. Keep your language honest:

Over-claiming	Honest phrasing
"A broken cold chain causes low coverage."	"A broken cold chain appears as a difference-making condition consistently associated with low coverage in this data."

One factor is never the whole story

Because conjunctivity is the whole premise of CNA, yanking one condition out of a bundle and crowning it "the cause" quietly contradicts the method you just spent ten modules learning. Report the bundle, not the brick.

Reflect

Check your own phrasing.

Are you saying 'X causes Y', when 'X is part of a configuration linked to Y' is more honest?
Have you pulled one brick out of a bundle and called it the cause?

PART 13 · How do you write it up?

Module 30 · The Results Section Template

A reusable skeleton for the results section

By the end of this module you will be able to

Follow the nine-part results template.
Make sure each section answers a reviewer's question.
State limitations plainly.

Every CNA write-up moves through the same nine beats. Copy this order and you'll never leave out something a reviewer asks for.

Research question

Stated as a combinations question.

Dataset description

Cases, level, number of facilities, time period.

Conditions & outcome

Each one named, defined, and calibrated (the threshold for "1").

CNA procedure

Package version, con/cov thresholds, any orderings.

Identified pathways

The solution formula(s), in symbols and in plain words.

Consistency & coverage

The fit scores for each pathway.

Robustness

What survived threshold changes / frscore results.

Interpretation

Program-language meaning of each pathway.

Limitations

Case count, data quality, what CNA can and can't claim.
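For the "CNA procedure" beat, it helps to capture the software versions and thresholds in one small table you can paste into the write-up. This is a minimal sketch in base R. The helper pkg_version is hypothetical, and the threshold values shown are placeholders to replace with the ones you actually used.

```r
# Hypothetical helper: report a package version, or say it is not installed
pkg_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    "not installed"
  }
}

# One row per item a reviewer is likely to ask about
procedure <- data.frame(
  item  = c("R", "cna", "frscore", "con threshold", "cov threshold"),
  value = c(R.version.string, pkg_version("cna"), pkg_version("frscore"),
            "0.9", "0.9")   # placeholder thresholds
)

procedure
```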

Reflect

Audit a write-up against the template.

Which of the nine sections do people most often skip?
Where would a reviewer push back hardest on your draft?

Module 31 · The Policy Brief

Three boxes a decision-maker reads in two minutes

By the end of this module you will be able to

Structure a policy brief: problem, finding, recommendation.
Strip an analysis to what a director needs.
Tie the recommendation directly to the finding.

A policy brief strips the analysis down to what a busy director actually needs: the problem, the finding, and the recommendation. No methods section, no R output, no suspense. Here is our supervision study written up as a brief.

Problem

Routine immunization coverage keeps falling below target across the LGA, and one-off investments in single interventions have not closed the gap.

Finding

Coverage stayed low in exactly the facilities where the cold chain was broken or monthly supportive supervision was missing. Where both were in place, coverage held. Training and community mobilisation, notably, did not separate the strugglers from the rest.

Recommendation

Target the two difference-makers directly. Repair and maintain cold chain equipment, and guarantee monthly supervision at every facility. Spreading the same budget thinly across interventions that do not move the outcome predicts more of the same low coverage.

Reflect

Draft a three-box brief for a real problem.

Can a busy director grasp it in two minutes?
Does your recommendation follow logically from the finding, with nothing extra?

Module 32 · The Executive Summary

One finding, one instruction

By the end of this module you will be able to

Compress a study to one finding and one action.
Write an executive summary that stands alone.
Lead with what matters most.

The executive summary is the whole brief boiled down to one sentence of finding and one sentence of action. If a director reads nothing else, they read this.

Finding

Low coverage traced back to one of two fixable gaps every time: a broken cold chain or missing supervision.

Recommendation

Close both gaps, facility by facility. Resist the urge to spread the budget across activities that this data shows do not move the needle.

Reflect

Boil your study down.

If a director read only your summary, would they know what to do?
Is the single most important point truly at the top?

PART 14 · Can you run a full study?

Capstone · End-to-End Project

Run a full study, start to finish

You supervise 50 facilities, and a stubborn share of them keep posting low coverage. Outcome: LowCov. Conditions: untrained staff, no supervision, broken cold chain, no outreach, no community mobilisation. Here is the entire workflow as one connected script, every step you have learned, in order.

capstone.R, the whole study

library(cna)
library(frscore)

# 1 . IMPORT: read your facility data from a CSV file
data <- read.csv("facility_data.csv")

# 2 . INSPECT: sanity-check structure and values before analysing
str(data)       # column types and a peek at values
summary(data)   # min/max catches stray codes (e.g. a 2 in a 0/1 column)

# 3 . ANALYSE: run CNA for the LowCov outcome
fit <- cna(data, outcome = "LowCov", con = 0.9, cov = 0.9)

# 4 . REVIEW: print solutions, ordered by fit
fit

# 5 . INTERPRET: score the solution formula(s); condition() gives the case-by-case view
condTbl(csf(fit)$condition, data)

# 6 . ROBUSTNESS: does the finding survive threshold changes?
fr <- frscored_cna(data, fit.range = c(1, 0.8), granularity = 0.1,
                   outcome = "LowCov")  # passed on to cna(), matching step 3
fr

# 7 . RECOMMEND: translate the surviving pathway into program language
# 8 . BRIEF: write Problem / Finding / Recommendation (Modules 31 and 32)
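Step 2 of the script relies on your eye to spot stray codes. If you would rather have R point at the problem columns, a small base-R check like the one below can run right after the import. The data frame facilities and the helper is_binary are hypothetical, and the sample values include two deliberate mistakes.

```r
# Hypothetical import with two problems: a stray 2 and a missing value
facilities <- data.frame(
  NoCold  = c(1, 0, 2, 1),
  NoSuper = c(0, 1, 1, NA),
  LowCov  = c(1, 1, 0, 1)
)

# Hypothetical helper: TRUE only if every value is 0 or 1 and none is missing
is_binary <- function(col) all(!is.na(col) & col %in% c(0, 1))

# Names of the columns that need cleaning before analysis
bad_cols <- names(facilities)[!vapply(facilities, is_binary, logical(1))]
bad_cols
```

An empty result means every column is clean crisp-set data. Anything listed should be fixed at the source before the analysis step.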

Interpret the pathways

Read each formula aloud in plain words (Modules 18 and 28). Which gaps separate the low-coverage facilities from the rest?

Assess robustness

Keep only what survives the threshold grid. Note fragile pathways honestly, rather than quietly hoping nobody checks.

Prepare recommendations

Turn the difference-making gaps into targeted fixes, not a scattershot of isolated activities.

Write the policy brief

Problem, Finding, Recommendation. Two minutes for a director to read.

Note on csf()

Complex solution formulas

csf(fit) extracts the model's complex solution formulas, and $condition pulls out the formula strings so condTbl() can score them. For single-outcome studies you'll often just read fit directly. The csf() function matters most when you model chains with more than one outcome.

You've done the whole loop

Plan → prepare → model → evaluate → check robustness → interpret → brief. That circle is a CNA study. Everything else is practice and judgement.

Course Complete · Final Competencies

What you can now do

By finishing this curriculum you should be able to:

Explain CNA, along with conjunctivity, equifinality, and causal chains, in plain language.
Formulate CNA-shaped research questions for immunization programs.
Design a study: define outcomes, choose conditions, draw a framework.
Prepare and clean crisp-set supervision datasets.
Install and use R, RStudio, and the cna package.
Run cna() models and read solution formulas.
Interpret consistency and coverage, and compare candidate models.
Run robustness checks with frscore and sensitivity analyses.
Translate configurational findings into policy briefs and executive summaries.
Use CNA to push immunization program performance forward.
