The mystery of the 100 facilities
- State why single-factor thinking fails in immunization programs.
- Explain, with an example, how the same input can produce different outcomes.
- Recognise the kind of problem CNA is designed to solve.
You supervise 100 health facilities. A handful keep missing children, quarter after quarter, no matter what you try. The question that keeps you up at night is simple: why does coverage stay low in these places?
Most of us answer that question by rounding up the usual suspect. "It's the training." Or "it's the fridge." We pick one lever, pull it, and wait for coverage to climb. Then it doesn't, and we're baffled, because on paper that facility had the training. The trouble is that low coverage rarely has one cause sitting alone in a room. It usually shows up when a few things go wrong at the same time, and a facility can survive one weak spot but not a particular combination of them.
A cake flops for a reason, and "not enough flour" is only sometimes the reason. Maybe the flour was fine but the oven was cold. Maybe the oven was hot but someone forgot the sugar. If you ask "does flour cause a flopped cake?" the honest answer is "it depends what else went wrong in the bowl." Low coverage is a flopped cake. You have to look at the whole bowl.
Here are two facilities that sat through the exact same training. One does fine. One keeps falling short. Watch what's actually different.
| Condition | Facility A | Facility B |
|---|---|---|
| Trained vaccinators | Yes | Yes |
| Functional cold chain | Yes | No |
| Monthly supportive supervision | Yes | Yes |
| Community mobilisation | Yes | Yes |
| Coverage stays low? | No | Yes |
Same training, very different lives. The broken cold chain is what tipped Facility B into low coverage, but only because of the company it was keeping.
Both facilities had trained staff, so "more training" was never going to rescue Facility B. The training was fine. It just isn't worth much when the vaccines spoil before they reach an arm. Diagnosing that correctly, instead of ordering another training, is exactly the kind of problem Coincidence Analysis was built to solve.
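For readers who already use R, the comparison above can be sketched in a few lines of base R. The column names are illustrative labels for the four rows of the table, not names used elsewhere in the course.

```r
# The Facility A / Facility B table, typed as a small data.frame.
# Column names are illustrative; TRUE means the condition was in place.
facilities <- data.frame(
  Trained     = c(TRUE, TRUE),
  ColdChain   = c(TRUE, FALSE),
  Supervision = c(TRUE, TRUE),
  Mobilised   = c(TRUE, TRUE),
  row.names   = c("A", "B")
)

# Keep only the conditions on which the two facilities differ.
a <- unlist(facilities["A", ])
b <- unlist(facilities["B", ])
differs <- names(facilities)[a != b]
differs
```

Training never appears in `differs`, which is why another round of training could not separate the two facilities.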
Picture one facility in your area that keeps missing children.
- Have you ever 'fixed' the obvious factor and seen no change? What else might have been going wrong at the same time?
- If you listed everything that was in place there, would the real gap stand out?
A method for combinations, not lone culprits
- Define Coincidence Analysis in one plain sentence.
- Describe what a 'configuration' is.
- List the program situations where CNA fits best.
Coincidence Analysis (CNA) is a configurational comparative method. In plain terms, it looks for configurations, which are just bundles of conditions that travel together with an outcome, instead of putting each factor on trial one at a time. It was developed by Michael Baumgartner, and it is custom-built for the messy, multi-outcome structures that real programs are made of. It is happiest when causes work in teams.
It can find more than one storyline
Most methods assume there is a single best predictor and go looking for it, like a detective convinced there is one culprit. CNA is comfortable with the idea that several different combinations might each lead to the outcome, and it can spot all of them in the same dataset. It can even follow causal chains, where one factor quietly hands off to another.
CNA earns its keep precisely when programs behave the way immunization programs actually behave:
- Several things have to go wrong together before coverage really sinks.
- Different combinations can drag coverage down in different settings.
- Causal chains exist. Weak supervision leads to sloppy planning, which leads to missed children.
- Implementation varies from facility to facility and district to district, so one-size answers rarely fit.
CNA finds structures consistent with your data
CNA does not, by itself, prove that one thing causes another. Strictly speaking, it identifies causal models or structures that are consistent with the data and the assumptions you bring to it. Throughout this course, when you see "pathway," read it as a pathway consistent with the data, not a proven mechanism. That habit of language keeps you honest, and keeps reviewers on your side.
Think about how success is usually explained in your program.
- Are explanations usually about one factor, or several working together?
- Could two facilities be succeeding for completely different reasons?
Conjunctivity, equifinality, and causal chains
- Explain conjunctivity (AND) in plain language.
- Recognise equifinality (multiple roads) in program data.
- Identify a causal chain, where one condition acts through another.
Everything CNA does rests on three ideas. Learn these three words and you have the heart of the method. Bonus: they make you sound very impressive in meetings.
1 · Conjunctivity, the "AND" idea
A condition might only matter when it is joined to others. A broken cold chain on its own may be survivable, but a broken cold chain and no outreach and no supervision, all in the same place, is how coverage quietly collapses. The conditions gang up.
A door stays shut when the lock, the handle, and the hinge all fail together. Fix any one of them and you might still get through. The door isn't broken because of a single villain. It's broken because three small problems decided to show up on the same day.
2 · Equifinality, the "many roads" idea
Different combinations can lead to the same sad outcome. One facility ends up with low coverage because of a broken cold chain plus no supervision. Another, with a perfectly good fridge, still ends up low because of no community mobilisation plus no outreach funding. Same destination, different roads. That is called equifinality, and CNA is one of the few methods polite enough to report every road instead of insisting there's only one.
Equifinality: two different combinations, each enough to produce low coverage. CNA reports both roads instead of forcing you to crown one "main cause."
3 · Causal chains, the "knock-on" idea
One factor can act through another. Weak supervision doesn't reach out and lower coverage by itself. It leads to weaker session planning, and weaker planning is what loses the children. It works down a chain, like a rumour: weak supervision, then weak session planning, then missed children.
A causal chain. CNA can detect that supervision acts through planning, which is something single-outcome methods simply cannot show you.
Think of a stubborn problem in your program, say a quarter of repeated stockouts. Name two different combinations of conditions that could each cause it. If you managed two, congratulations, you just spotted equifinality, and it means CNA is a good fit for your question.
Think about a district you know well.
- What combination of factors might explain its performance, rather than any single one?
- Could a different district be reaching the same result through a different combination?
- Can you spot a chain, where one weak link quietly causes the next?
"Can't I just run a regression?"
- Explain, in plain language, the question regression is built to answer.
- Explain why that question misses how immunization programs actually work.
- Say when CNA is the better tool, and when it is not.
It is a fair question, and one every analyst should ask. Logistic regression is an excellent, well-tested tool, and if your question is "which single factor moves the outcome most, holding the others constant," you should reach for it, not CNA. The two methods are not rivals so much as instruments tuned to different questions.
The catch is that regression, by design, estimates the average independent contribution of each factor. It answers "how much does supervision matter on its own?" But immunization coverage rarely fails because of one factor acting on its own. It fails when a broken cold chain meets missing supervision in the same place, at the same time. That is a question about combinations, and it is a different question from the one regression is built to answer.
| The question you're asking | Regression | CNA |
|---|---|---|
| Which single factor matters most? | ✓ | – |
| Which combinations of conditions matter? | – | ✓ |
| Are there multiple pathways to the same result? | Limited | ✓ |
| Do conditions form causal chains? | Limited | ✓ |
Different tools for different questions. CNA does not replace regression; it answers a question regression was never designed to ask.
Regression ranks ingredients. CNA reads recipes.
Regression is brilliant at telling you which ingredient, on average, has the biggest effect. CNA is built to tell you which combinations of ingredients produce the dish, including the fact that there may be more than one recipe that works. When a program fails because several things went wrong together, you need a method that thinks in recipes.
This is also why CNA does not report coefficients, p-values, or "the effect of supervision." Those numbers answer the ranking question. CNA answers the recipe question, and it speaks in a different language: combinations, pathways, and the conditions an outcome depends on. Keeping the two questions straight now will save you a great deal of confusion later.
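A toy illustration of the two questions, in base R. The data are invented so that low coverage occurs only when NoCold and NoSuper occur together, and the single-factor shares are just the simplest possible "ranking" summary, not a regression.

```r
# Eight made-up facilities covering every mix of three gaps.
toy <- expand.grid(NoCold = c(0, 1), NoSuper = c(0, 1), NoComm = c(0, 1))

# Assumption built into the toy data: coverage is low only when
# the cold chain is broken AND supervision is missing.
toy$LowCov <- as.integer(toy$NoCold == 1 & toy$NoSuper == 1)

# "Ranking" view: share of low coverage among facilities with each single gap.
share_low <- sapply(c("NoCold", "NoSuper", "NoComm"),
                    function(gap) mean(toy$LowCov[toy[[gap]] == 1]))

# "Recipe" view: share of low coverage where both gaps occur together.
share_low_both <- mean(toy$LowCov[toy$NoCold == 1 & toy$NoSuper == 1])

share_low
share_low_both
```

Taken one at a time, each gap looks like a coin toss at best. Taken together, the pair accounts for every low-coverage facility in the toy table.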
Think about a district you know that struggles with coverage.
- If you had to name the one factor behind its struggle, could you? Or does it really take two or three things together to explain it?
- Might a neighbouring district be struggling for entirely different reasons, a different recipe for the same poor result?
Asking a question CNA can actually answer
- Write a research question in the shape CNA can answer.
- Tell a CNA question apart from a regression question.
- Spot when a question is about combinations rather than single effects.
CNA does not answer "by how much does X move coverage." It answers a sharper, more useful question: "which conditions, alone or in combination, make the difference to the outcome." In the words of the method's own literature, CNA hunts for difference-making conditions, the necessary and/or sufficient combinations that an outcome actually depends on. Good CNA questions are always shaped that way, and the best ones point straight at a problem you can fix.
Here are five questions immunization teams genuinely ask, written the way CNA likes them:
| Topic | A CNA-shaped question |
|---|---|
| Coverage | Which combinations of health-system gaps make the difference to low DTP3 coverage? |
| Zero-dose | Which conditions, alone or together, are linked to a high share of zero-dose children? |
| Stockouts | Which combinations of conditions are necessary or sufficient for repeated vaccine stockouts? |
| Supervision | Which weak supervisory practices, in combination, make the difference to poor session quality? |
| Pathways | Through which chain of conditions does weak supervision lead, step by step, to missed children? |
The first four ask which difference-making conditions sit behind an outcome. The fifth is special: it asks about a pathway, a chain where one condition feeds the next. CNA is one of the few methods that can model these multi-step, multi-outcome structures, which is exactly why it suits messy program systems.
Can you phrase it as "which conditions make the difference?"
If your question can be rewritten to ask which conditions, alone or in combination, make the difference to an outcome, it is a CNA question. If it asks for a single number, a ranking, or an average effect, it belongs to a different method, and that is perfectly fine.
The phrase "difference-making conditions" runs throughout the applied CNA literature. See, for example, studies framed explicitly around difference-making roles and pathways in the references at the end of this course.
Take a question your team is currently asking.
- Is it really 'which factor matters most', or 'which combinations make the difference'?
- How would you rewrite it so CNA could answer it?
Every row is a "case"
- Define what a 'case' is in a CNA study.
- Choose an appropriate unit of analysis for a question.
- Explain why mixing levels breaks the comparison.
In CNA, the thing you compare is called a case, and every case becomes one row in your data table. In immunization work, a case can sit at almost any level of the system:
- A health facility, which is the most common choice for supervision studies.
- An LGA, district, or state.
- An outreach site, or even a whole country.
Please don't mix facilities and districts in the same table. It's like comparing a single market stall to an entire shopping mall and wondering why the numbers look strange. Every case must sit at the same level, so pick one unit and stay loyal to it for the whole analysis.
Think about the data your program already collects.
- What would each row, each case, represent: a facility, an LGA, a district?
- Are you ever tempted to mix levels in one table? What confusion could that cause?
Three decisions before you touch the data
- Define a clear, measurable outcome with a threshold.
- Draw up a disciplined list of candidate conditions.
- Write down a causal theory before analysing.
A CNA study is mostly won or lost at the planning table, long before R enters the picture. Three decisions matter most.
Define the outcome
State exactly what you are trying to explain, and how you'll recognise it when you see it. For example: "Low routine immunization coverage" means DTP3 below 50%. The threshold is part of the definition, not an afterthought.
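As a minimal sketch, here is how that definition could be turned into a 0/1 outcome in base R. The coverage figures are made up for illustration.

```r
# Hypothetical DTP3 coverage (percent) for five facilities.
dtp3 <- c(HF1 = 42, HF2 = 50, HF3 = 78, HF4 = 35, HF5 = 61)

# The cut-off stated in the outcome definition.
threshold <- 50

# LowCov = 1 when coverage is strictly below the threshold, else 0.
LowCov <- as.integer(dtp3 < threshold)
names(LowCov) <- names(dtp3)
LowCov
```

HF2 sits exactly at 50% and is coded 0, because the definition says "below 50%". Writing the rule as code forces that borderline decision to be made once, in one place.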
Identify plausible conditions
List the factors that program experience says could matter: no trained staff, broken refrigerator, no supervision, no community mobilisation, no outreach funding, thin staffing. Keep the list disciplined. Every condition should have a real reason to be there, not just a vibe.
Write down a causal theory
Before you analyse anything, commit to what you expect. For example: low coverage may come from a broken cold chain plus no supervision, OR from no community engagement plus no outreach. Writing it down ahead of time keeps you honest when the results land.
Sketch a quick study for a problem you care about.
- What is your outcome, and exactly where is the cut-off between high and low?
- Which five or six conditions genuinely belong on your list, and why each one?
Draw your assumptions before the data argues back
- Explain what a conceptual framework is and why to draw one first.
- Translate a program belief into a simple causal diagram.
- Use the framework as a yardstick for later results.
A conceptual framework is just a picture of what you think causes what. Drawing it forces you to be honest about your assumptions, and it gives you something to hold your CNA results up against later. Here are two chains an immunization team might sketch for low coverage:
Two candidate chains for low coverage. CNA will later tell you which links the data actually support, and it may surprise you, which is half the fun.
Draw your own causal picture for one outcome.
- Which arrows are you most confident about, and which are guesses?
- If the data later contradicts an arrow, would you trust the data or your prior belief?
Three kinds of data CNA accepts
- Tell crisp-set, multi-value, and fuzzy-set data apart.
- Decide which calibration a condition needs.
- Explain why beginners should usually start crisp.
CNA reads simple tables. The only real craft is "calibrating" each condition, which is a fancy word for turning messy reality into clean, comparable values. There are three flavours, and the cna package happily handles all of them.
Crisp-set, the easiest, plain yes or no
Every condition is either present (1) or absent (0). "Was the cold chain broken?" becomes 1 or 0. This is where beginners should start, and it is what we use throughout this course, because life is hard enough already.
| Facility | Broken cold chain | Meaning |
|---|---|---|
| HF1 | 1 | Cold chain was broken |
| HF2 | 0 | Cold chain was working |
Multi-value, for more than two levels
When a condition has natural steps, say supervision quality runs 0 = poor, 1 = moderate, 2 = good, you can use multi-value coding. CNA reads it fine, as long as the levels are clearly defined and you don't make them up on the spot.
Fuzzy-set, for degrees of membership
Sometimes "present" is a matter of degree. A value of 0.85 means "strongly, but not perfectly, in the set of low-coverage facilities." Fuzzy values run between 0 and 1. It's powerful, but save it for the day crisp-set feels boringly easy.
Start crisp. Graduate later.
Almost every immunization supervision question can be answered well with clean yes or no coding. Reach for multi-value or fuzzy only when a condition genuinely loses its meaning by being squashed into two boxes.
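To make the three flavours concrete, here is one made-up supervision score coded three ways in base R. The score scale, the cut-offs, and the linear rescale are all assumptions chosen for illustration, not recommended calibration rules.

```r
# Hypothetical supervision scores (0 to 10) for five facilities.
score <- c(2, 5, 9, 7, 0)

# Crisp-set: 1 = supervision gap present (score below an assumed cut-off of 5).
NoSuper_cs <- as.integer(score < 5)

# Multi-value: 0 = poor, 1 = moderate, 2 = good, using assumed bands
# (up to 3, above 3 up to 6, above 6).
Super_mv <- cut(score, breaks = c(-Inf, 3, 6, Inf), labels = FALSE) - 1L

# Fuzzy-set: degree of membership in "well supervised", here a plain
# linear rescale to the 0..1 range (one simple choice among many).
Super_fs <- score / 10

data.frame(score, NoSuper_cs, Super_mv, Super_fs)
```

The crisp column is the easiest to explain to a colleague, which is the practical argument for starting there.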
Look at one indicator you collect.
- Is it naturally yes/no, stepped, or a matter of degree?
- What would be lost or gained by squashing it into a simple 0/1?
The dataset we'll carry through the course
- Read a crisp-set supervision dataset row by row.
- Spot, by eye, which conditions separate the outcomes.
- Describe how each facility becomes one case.
Picture your supervision teams visiting six facilities and recording, for each one, whether four things were missing or broken, plus whether coverage was low. Everything is crisp-set: 1 means yes, 0 means no. Because we're in problem-solving mode, our conditions are framed as gaps, and the outcome is LowCov, where 1 means coverage fell below target.
| Facility | NoTrain | NoCold | NoSuper | NoComm | LowCov |
|---|---|---|---|---|---|
| HF1 | 0 | 1 | 0 | 0 | 1 |
| HF2 | 0 | 0 | 1 | 0 | 1 |
| HF3 | 1 | 1 | 0 | 1 | 1 |
| HF4 | 0 | 0 | 0 | 1 | 0 |
| HF5 | 1 | 0 | 0 | 0 | 0 |
| HF6 | 0 | 1 | 1 | 1 | 1 |
NoTrain = vaccinators untrained · NoCold = cold chain broken · NoSuper = no monthly supervision · NoComm = no community mobilisation. We build the real R model on this exact table in Part 6.
Look at every facility where coverage was low. In each one, either the cold chain was broken (NoCold) or supervision was missing (NoSuper), and sometimes both. Now look at the two facilities that were fine: neither problem was present. Notice that NoTrain and NoComm show up on both sides, so they're red herrings. Hold that thought, because CNA is about to back up your detective work.
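For readers who already have R to hand, the by-eye check above can be repeated in base R on the same six rows. This is only a sanity check of the pattern, not a CNA run.

```r
# The six-facility table from this module.
immunization <- data.frame(
  NoTrain = c(0, 0, 1, 0, 1, 0),
  NoCold  = c(1, 0, 1, 0, 0, 1),
  NoSuper = c(0, 1, 0, 0, 0, 1),
  NoComm  = c(0, 0, 1, 1, 0, 1),
  LowCov  = c(1, 1, 1, 0, 0, 1),
  row.names = paste0("HF", 1:6)
)

# Does "broken cold chain OR missing supervision" match LowCov in every row?
either_gap <- immunization$NoCold == 1 | immunization$NoSuper == 1
pattern_holds <- all(either_gap == (immunization$LowCov == 1))
pattern_holds

# Cross-tabulate each suspected red herring against the outcome.
table(NoTrain = immunization$NoTrain, LowCov = immunization$LowCov)
table(NoComm  = immunization$NoComm,  LowCov = immunization$LowCov)
```

In both cross-tabulations the gap appears alongside low coverage and alongside adequate coverage, which is what "shows up on both sides" means in table form.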
Look only at the facilities where coverage was low.
- What do they share that the others don't?
- Which conditions appear on both sides, and so probably don't matter?
Clean before you compute
- List the data-quality checks to run before any analysis.
- Recognise a logical contradiction in coded data.
- Explain why clean data is part of the analysis, not a chore before it.
CNA is unforgiving of dirty data, because it reads every row as a real case and takes each one seriously. Before you analyse anything, walk this checklist:
- Missing values. CNA needs complete rows, so decide how to handle gaps before you start, not after.
- Duplicates. The same facility entered twice quietly bends the patterns without telling you.
- Coding errors. A stray 2 in a yes-or-no column will be read literally and believed without question.
- Logical contradictions. "Cold chain broken = 1" alongside "Refrigerator functional = Yes" cannot both be true, so go investigate.
CNA will cheerfully find a crisp, beautiful pattern in mislabelled data and present it with total confidence, like a witness who is very sure and completely wrong. The method cannot tell a typo from the truth. Cleaning is not optional housekeeping. It is part of the analysis.
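The four checks can be sketched in base R as follows. The table is deliberately messy and entirely made up, and the column names (including FridgeWorks) are hypothetical.

```r
# A deliberately messy, made-up table to exercise the checklist.
raw <- data.frame(
  Facility    = c("HF1", "HF2", "HF2", "HF3", "HF4", "HF5"),
  NoCold      = c(1, 0, 0, 2, 1, 0),
  NoSuper     = c(0, 1, 1, 1, 0, NA),
  FridgeWorks = c("No", "Yes", "Yes", "No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# Missing values: rows that are not complete.
incomplete <- raw$Facility[!complete.cases(raw)]

# Duplicates: rows identical to an earlier row.
duplicates <- raw$Facility[duplicated(raw)]

# Coding errors: anything other than 0 or 1 in a yes-or-no column.
bad_codes <- raw$Facility[!is.na(raw$NoCold) & !(raw$NoCold %in% c(0, 1))]

# Logical contradictions: cold chain coded broken, fridge recorded as working.
contradicts <- raw$Facility[raw$NoCold %in% 1 & raw$FridgeWorks == "Yes"]

list(incomplete = incomplete, duplicates = duplicates,
     bad_codes = bad_codes, contradicts = contradicts)
```

Each check returns facility names rather than a silent fix, so someone still has to go and investigate the flagged rows.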
Think about your last messy dataset.
- Where do errors usually creep in: missing rows, duplicates, miscoding?
- What single check would have caught the most problems?
Don't panic at the algebra
- Read a CNA formula out loud as an ordinary sentence.
- Translate the symbols * + and the arrow into plain words.
- Turn any solution formula into something a colleague would understand.
Most people meet their first CNA result and feel their stomach drop. The screen shows something like this, and it looks like algebra homework you thought you had escaped:
A*B + C*D ↔ Y
Take a breath. There is nothing to fear here, because a CNA formula is just a sentence wearing a lab coat. Once you know the three words it is built from, you can read any result in the course, and any result your own data ever produces.
The whole vocabulary, in three words
Every CNA formula is made of three little symbols. That is the entire language. Learn these and you are fluent.
| Symbol | Say it as | It means |
|---|---|---|
| * | "and" | these conditions occur together, in the same case |
| + | "or" | this is a separate route to the same outcome |
| ↔ | "goes with" | the left side tracks the outcome on the right |
So that scary formula A*B + C*D ↔ Y simply says: "When you have A and B together, or C and D together, you tend to find Y." That's it. A sentence about two recipes for the same result.
A note for the precise: the double arrow ↔ is technically an equivalence. It says the left side is both sufficient for the outcome (where you see the combination, you see Y) and, taken as a whole, necessary for it (where you see Y, you find one of these combinations). "Goes with" is a faithful shorthand for that, and you will meet the full story in the model-evaluation modules.
Read it like a sentence
Here is the trick that makes everything click. Swap the letters for the real conditions, read left to right, and say the symbols out loud as their plain words. Watch a real immunization formula turn into an ordinary observation:
Train * Cold * Super ↔ Coverage
Trained staff AND a working cold chain AND supervision, goes with coverage
Read fully: facilities with trained staff, a functioning cold chain, and supportive supervision consistently achieved high coverage. Notice you didn't lose anything by saying it in words. You gained a sentence your director can actually act on. The formula and the sentence are the same statement, just dressed differently.
A recipe card lists flour, butter, and sugar. Nobody panics at that. CNA notation is a recipe card for outcomes: the * joins the ingredients that have to go in together, the + separates one recipe from a different recipe that also works, and the arrow points at the cake. You have been reading recipes your whole life. This is the same skill.
One more, with two pathways
When a formula has a + in it, that is equifinality showing up in the algebra: two different roads to the same place. Read each side as its own little sentence, joined by "or":
NoCold*NoSuper + NoComm*NoOutreach ↔ LowCov
Out loud: "Low coverage goes with a broken cold chain and missing supervision, or with no community mobilisation and no outreach." Two recipes for the same disappointing result. A district could land in low coverage by either route, which is exactly why you need to know which route a given place is on before you spend a naira fixing it.
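If it helps to see the formula as something a computer can check, the left-hand side translates directly into R's logical operators: * becomes & and + becomes |. The four districts below are invented for illustration.

```r
# Four made-up districts; 1 means the gap is present.
districts <- data.frame(
  NoCold     = c(1, 0, 0, 1),
  NoSuper    = c(1, 0, 1, 0),
  NoComm     = c(0, 1, 0, 0),
  NoOutreach = c(0, 1, 0, 1),
  row.names  = c("D1", "D2", "D3", "D4")
)

# NoCold*NoSuper: broken cold chain AND missing supervision.
road_1 <- districts$NoCold == 1 & districts$NoSuper == 1

# NoComm*NoOutreach: no mobilisation AND no outreach.
road_2 <- districts$NoComm == 1 & districts$NoOutreach == 1

# The + joins the two roads with OR.
formula_true <- road_1 | road_2
names(formula_true) <- rownames(districts)
formula_true
```

D3 and D4 each have gaps, but neither has a complete pair, so the left-hand side is false for both.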
Lower-case letters mean absence. Cold means the cold chain is working; cold (or a name like NoCold) means it is broken. When you read a formula aloud, always check the case, because "supervision is present" and "supervision is absent" are opposite stories with opposite fixes.
Take the very first formula in this module, A*B + C*D ↔ Y.
- Can you now say it out loud as a sentence, without looking at the table?
- Swap in conditions from your own program. Does the sentence describe something you have actually seen in the field?
Two free programs, fifteen minutes
- Install R and RStudio with confidence.
- Tell the difference between the two and why you need both.
- Open the console and know where to type.
R is the calculator that runs CNA. RStudio is the comfortable room you run it in. You install both once, they cost nothing, and nobody will quiz you on how they work under the hood.
Install R, the engine
Go to the official R project download page, cran.r-project.org, and download the version for your operating system. Run the installer and accept the defaults, which are sensible.
Install RStudio, the workspace
Then get RStudio Desktop from posit.co. It's the friendly window where you'll type everything. Install R first, because RStudio goes looking for it and gets sad if it isn't there.
Open RStudio
The big panel on the left is the Console. That's where you type commands and press Enter. That's it. You're ready.
R is the car engine. RStudio is the dashboard, the steering wheel, and the comfy seat. You could drive with just an engine bolted to a metal frame, but nobody actually wants to. Install both.
Before moving on, get set up.
- Did R install before RStudio? Does RStudio find it?
- Can you find the console panel where commands go?
Talk to R like a calculator
- Use R as a simple calculator.
- Store a value in a variable with the arrow.
- View what a variable contains.
Type this into the Console and press Enter. R answers right away, no small talk.
2 + 2
[1] 4
That [1] simply means "this is the first, and in this case only, answer." Now let's store a number in a variable using the arrow <-, which is R's slightly dramatic way of saying "put this into."
# put the number 42 into a box named "coverage"
coverage <- 42
# now look inside the box
coverage
coverage <- 42 means "label a box coverage and drop 42 inside." Whenever you write coverage later, R opens the box and shows you what's in it. That's all a variable is, a labelled box. No magic, no exam.
Try a couple of commands of your own.
- What happens if you store your district's coverage in a variable and call it back?
- Does the idea of a 'labelled box' make variables feel less mysterious?
Adding CNA's tools to R
- Install the cna package once.
- Load it at the start of every session.
- Explain the install-once, library-every-time pattern.
R doesn't know how to do Coincidence Analysis straight out of the box. You teach it by installing the cna package, a free add-on published on CRAN, R's official package repository. You install it once, then load it at the start of every session.
# download and install the cna package (needs internet, do this once)
install.packages("cna")
# switch the tools on for this session
library(cna)
Install once. Library every time.
install.packages() is like buying a tool and putting it in the cupboard. You do it once. library() is taking the tool back out of the cupboard, which you do at the start of every session. Forgetting library(cna) is the single most common beginner stumble, and now it won't be yours.
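One optional way to make the habit visible at the top of a script is a small guard like the one below. It is a sketch, not part of the cna package: it only checks whether the package is in the cupboard and reminds you what to do if it is not.

```r
# Is the cna package already installed on this computer?
cna_available <- requireNamespace("cna", quietly = TRUE)

if (cna_available) {
  # Installed: take the tool out of the cupboard for this session.
  library(cna)
} else {
  # Not installed: a reminder rather than an automatic download.
  message('cna is not installed yet: run install.packages("cna") once, then library(cna).')
}
```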
Lock in the habit now.
- Why does install.packages() happen once but library(cna) every time?
- What is the most common reason a CNA script fails on the first line?
Type the supervision table into R
- Build a data.frame from typed values.
- Map each column to one condition.
- Print a dataset to check it.
Let's recreate the Module 9 table as a data.frame, which is just R's name for a spreadsheet. Each c(...) is one column, listing the six facilities' values from top to bottom. Type carefully; R is a stickler for commas.
library(cna)
# build the supportive-supervision dataset (6 facilities)
# conditions are GAPS; outcome LowCov = 1 means coverage fell below target
immunization <- data.frame(
  NoTrain = c(0, 0, 1, 0, 1, 0),
  NoCold  = c(1, 0, 1, 0, 0, 1),
  NoSuper = c(0, 1, 0, 0, 0, 1),
  NoComm  = c(0, 0, 1, 1, 0, 1),
  LowCov  = c(1, 1, 1, 0, 0, 1)
)
# look at it
immunization
  NoTrain NoCold NoSuper NoComm LowCov
1       0      1       0      0      1
2       0      0       1      0      1
3       1      1       0      1      1
4       0      0       0      1      0
5       1      0       0      0      0
6       0      1       1      1      1
Six rows, five columns. R now holds your supervision data exactly as you designed it, missing children and all.
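It is worth seeing what R does when a column is short, because the behaviour depends on how short. The sketch below uses two throwaway columns: with five values against six, data.frame() stops with an error, but with three values against six it quietly repeats them.

```r
# One value short (5 instead of 6): R refuses to build the table.
result <- tryCatch(
  data.frame(NoCold = c(1, 0, 1, 0, 0, 1),
             LowCov = c(1, 1, 1, 0, 0)),
  error = function(e) conditionMessage(e)
)
result

# Three values instead of six: R silently repeats them to fill six rows.
recycled <- data.frame(NoCold = c(1, 0, 1, 0, 0, 1),
                       LowCov = c(1, 1, 0))
nrow(recycled)
recycled$LowCov
```

The second case is the dangerous one, since nothing warns you. Counting the values in each c(...) before running the model is a cheap safeguard.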
Recreate the table yourself.
- Did every column get the right number of values, one per facility?
- What would happen if one c(...) had one value too few?
One function does the heavy lifting
- Run cna() on a dataset.
- Set the outcome argument correctly.
- Explain what 'capital letter means present' is about.
The whole method is wrapped in a single function: cna(). You hand it your data and tell it which column is the outcome. It quietly searches for the minimally sufficient and necessary combinations, then assembles them into solution formulas while you sip your tea.
# ask: which combinations of gaps go together with LowCov = 1 ?
model <- cna(immunization, outcome = "LowCov")
The outcome argument names a factor value
Writing outcome = "LowCov" tells CNA to model the presence of low coverage, the value 1. In crisp-set data, a capitalised name means "this factor = 1." That convention runs through the whole package: a capital letter means present, and lower-case means absent.
Running this stores the result in model but prints nothing yet, which feels anticlimactic. Don't worry. Viewing it is the very next module.
Run the model on your own version of the data.
- Did you name the outcome exactly as the column is spelled?
- What is CNA quietly searching for while it runs?
Your first solution formula
- Print and read a CNA solution formula.
- Translate a formula into a plain-language sentence.
- Connect the formula back to the raw data rows.
Type the model's name to print it. CNA reports its solution as a formula. For our supervision data, the pattern that perfectly separates the low-coverage facilities from the healthy ones is this:
model
consistency: 1.000 coverage: 1.000
NoCold + NoSuper <-> LowCov
"Coverage is low wherever the cold chain was broken OR supervision was missing, and nowhere else." The + means OR, so this is two roads to the same sad place. The <-> means the combination tracks the outcome exactly.
Look back at Module 9. Every low-coverage facility had a broken cold chain or no supervision, and the two healthy facilities had neither. Notice what's missing from the formula: NoTrain and NoComm never made the cut, because they showed up on both the good and the bad side. CNA quietly ignored the red herrings and kept only what actually makes the difference, and it confirmed the pattern holds for every single case, with consistency and coverage both at 1.000. We unpack those two numbers in Part 8.
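As a preview, the two numbers can be reproduced by hand in base R using the usual crisp-set definitions. This is a sketch for checking the reported formula against the six rows, not a substitute for the package's own calculation.

```r
# The six-facility supervision table.
immunization <- data.frame(
  NoTrain = c(0, 0, 1, 0, 1, 0),
  NoCold  = c(1, 0, 1, 0, 0, 1),
  NoSuper = c(0, 1, 0, 0, 0, 1),
  NoComm  = c(0, 0, 1, 1, 0, 1),
  LowCov  = c(1, 1, 1, 0, 0, 1)
)

# Left side of the formula: NoCold + NoSuper (broken cold chain OR no supervision).
condition <- immunization$NoCold == 1 | immunization$NoSuper == 1
outcome   <- immunization$LowCov == 1

# Consistency: of the cases showing the condition, what share show the outcome?
consistency <- sum(condition & outcome) / sum(condition)

# Coverage: of the cases showing the outcome, what share show the condition?
coverage <- sum(condition & outcome) / sum(outcome)

c(consistency = consistency, coverage = coverage)
```

Both come out at 1 because the four facilities with either gap are exactly the four facilities with low coverage.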
Read your result out loud.
- Can you say the formula as one sentence a colleague would understand?
- Do the facilities in the data actually match what the formula claims?
The five symbols CNA speaks in
- Read the five core CNA symbols.
- Tell AND (*) from OR (+) in a formula.
- Recognise that lower-case means absence.
Every CNA formula is built from a tiny vocabulary. Learn these five and you can read any solution the package ever hands you.
| Symbol | Means | In plain words |
|---|---|---|
| * | AND | both conditions must be present together |
| + | OR | either road will do on its own |
| <-> | equivalence | the left side tracks the outcome exactly |
| -> | sufficiency | the left side is enough for the right |
| a | absence | lower-case means that condition is absent (0) |
So a formula like NoCold*NoComm + NoSuper would read: "(broken cold chain AND no mobilisation) OR (no supervision)." That single line captures both conjunctivity, the *, and equifinality, the +, at the same time. Two ideas, one tidy line.
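The reading rule is mechanical enough to write as a tiny helper. The function below, read_condition(), is a hypothetical illustration written for this module, not a function from the cna package, and it only handles the * and + symbols.

```r
# Hypothetical helper: spell out the left side of a formula in plain words.
read_condition <- function(condition) {
  # Drop spaces, then split into separate roads at each "+".
  roads <- strsplit(gsub(" ", "", condition), "+", fixed = TRUE)[[1]]
  # Inside each road, "*" joins conditions with AND.
  roads <- gsub("*", " AND ", roads, fixed = TRUE)
  # Wrap each road in brackets and join the roads with OR.
  paste0("(", roads, ")", collapse = " OR ")
}

read_condition("NoCold*NoComm + NoSuper")
# returns "(NoCold AND NoComm) OR (NoSuper)"
```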
Practise on a formula of your own.
- Can you point to the conjunctivity (*) and the equifinality (+) in it?
- What would the same formula mean if one term were lower-case?
A conjunctural road to low DTP3
- Frame a low-DTP3 question for CNA.
- Read a single-pathway (conjunctural) solution.
- Explain why no one condition in the bundle is enough alone.
Suppose you study low DTP3 coverage with four gap conditions: untrained staff, recent stockout, broken cold chain, and no supervision. A common CNA result is a single bundle, where low coverage needs several gaps to gang up at once.
library(cna)
# dtp3_data is assumed to be a data.frame you have already built,
# with columns NoTrain, Stockout, NoCold, NoSuper, LowDTP3 (all 0/1)
m_dtp3 <- cna(dtp3_data, outcome = "LowDTP3")
m_dtp3
Interpretation: DTP3 collapses where a stockout, a broken cold chain, and absent supervision all land in the same place. No single gap is the villain. This is conjunctivity, the gangs-up idea, showing its face in a real coverage indicator. It also tells you that fixing only one of the three may not be enough on its own.
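The claim that no single gap is enough can be checked directly on a table. The sketch below uses an invented six-facility dtp3_data, built so that the three-gap bundle holds, and a hypothetical helper is_sufficient() that asks whether every facility showing all the named gaps also shows low DTP3.

```r
# Made-up data for illustration only; 1 means the gap (or low DTP3) is present.
dtp3_data <- data.frame(
  NoTrain  = c(0, 1, 0, 1, 0, 0),
  Stockout = c(1, 1, 0, 1, 1, 0),
  NoCold   = c(1, 1, 1, 0, 1, 0),
  NoSuper  = c(1, 0, 1, 1, 1, 0),
  LowDTP3  = c(1, 0, 0, 0, 1, 0)
)

# Hypothetical helper: TRUE when at least one case has all the gaps
# and every such case also has the outcome.
is_sufficient <- function(gaps, data, outcome = "LowDTP3") {
  has_all <- rowSums(data[, gaps, drop = FALSE] == 1) == length(gaps)
  any(has_all) && all(data[[outcome]][has_all] == 1)
}

# The full bundle versus each gap on its own.
bundle  <- is_sufficient(c("Stockout", "NoCold", "NoSuper"), dtp3_data)
singles <- sapply(c("Stockout", "NoCold", "NoSuper"),
                  function(g) is_sufficient(g, dtp3_data))

bundle
singles
```

In this toy table the bundle is sufficient while each gap alone is not, because every single gap also turns up in at least one facility with adequate DTP3.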
Apply it to your own DTP3 numbers.
- Which gaps would you put on the list of candidate conditions?
- If the result is a bundle, what does that imply for what you fund?
Two roads to too many zero-dose children
- Frame a zero-dose question for CNA.
- Read a two-pathway (equifinal) solution joined by OR.
- Explain what multiple pathways mean for program design.
Now an equifinality example. Conditions, all framed as gaps: no outreach, no community volunteers, no settlement mapping, no outreach funding. Outcome: HighZeroDose, meaning too many children never got a single dose. CNA can report two separate roads joined by OR.
```r
m_zd <- cna(zerodose_data, outcome = "HighZeroDose")
m_zd
```
Interpretation: a district can end up with too many zero-dose children either because outreach was never funded, or because it lacked active volunteers and had no settlement map to work from. The + is the gift here. It tells you there are two different problems wearing the same uniform, so the fix in one district may not be the fix in the next.
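A quick way to make the two roads concrete is to label each district by the road it is on. This is a base-R sketch on six invented districts. The column names (NoFunding, NoVolunteers, NoMap) are assumptions for illustration, since the course does not name them.

```r
# Hypothetical toy data: six districts, 1 = gap present / high zero-dose.
toy <- data.frame(
  district     = c("D1", "D2", "D3", "D4", "D5", "D6"),
  NoFunding    = c(1, 1, 0, 0, 0, 0),
  NoVolunteers = c(0, 1, 1, 1, 1, 0),
  NoMap        = c(0, 0, 1, 1, 0, 0),
  HighZeroDose = c(1, 1, 1, 1, 0, 0)
)

# Road 1: outreach was never funded.
road_funding <- toy$NoFunding == 1
# Road 2: no active volunteers AND no settlement map.
road_people <- toy$NoVolunteers == 1 & toy$NoMap == 1

# Label each district by the road it is on (road 1 checked first).
toy$road <- ifelse(road_funding, "no outreach funding",
             ifelse(road_people, "no volunteers and no map", "neither"))

print(table(road = toy$road, HighZeroDose = toy$HighZeroDose))
```

D1 and D2 reach the outcome by the funding road, D3 and D4 by the volunteers-and-map road, and D5, which has only one half of the second bundle, does not reach it at all.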
Think about zero-dose children in two different areas.
- Could each area be 'zero-dose' for different reasons?
- How would two pathways change what you'd recommend in each place?
Which gaps go with poor-quality sessions?
- Design a supervision-quality study with CNA.
- Choose supervisory practices as conditions.
- Interpret which combination of practices matters.
This is the study supervision teams care about most, because it's about them. Research question: which combinations of weak supervisory practices go together with poor-quality immunization sessions? Conditions, as gaps: no checklist used, no action plan developed, no feedback given, no follow-up visit. Outcome: PoorQuality.
```r
library(cna)

# NoChecklist, NoActionPlan, NoFeedback, NoFollowUp, PoorQuality (0/1)
m_sup <- cna(supervision_data, outcome = "PoorQuality")
m_sup
```
Interpretation: sessions turn out poor either when the supervisor skips both the checklist and the on-the-spot feedback, or whenever there's simply no follow-up visit at all. A visit that ticks a checklist but never comes back is still a visit that loses quality. CNA tells you which combination of missing practices to fix, not just which single box to scold someone about.
Swap in the four practices your own supervision checklist tracks. The columns change, but the cna() call stays exactly the same. That reusability is the whole point, and your future self will thank you.
Look at your own supervision checklist.
- Which four practices would you test as conditions?
- Do you suspect a single practice matters, or a combination?
Consistency: does the pattern keep its promise?
- Define consistency in plain language.
- Read a consistency score.
- Set the con threshold when data is noisy.
CNA gives every solution two scores. The first is consistency, and it answers a simple question: when this combination of gaps is present, does low coverage actually show up too?
Consistency is "how often the pattern holds"
A consistency of 1.00 means every facility with the combination also had the outcome, so the pattern never let you down. A consistency of 0.80 means it held 80% of the time. Higher is better. CNA's default threshold is 1, and you lower it when real-world data gets noisy, which it always does.
Consistency is "every time I leave the milk out of the fridge, does it go off?" If it goes off every single time, that's perfect consistency. If it sometimes survives, the consistency drops, and you start to wonder what else is going on.
Consistency can read deceptively high when the outcome itself is very common in your data. If, say, 90% of your facilities already have low coverage, almost any combination will look "consistent" with it, because the outcome is everywhere. So always read a consistency score next to how common the outcome is, and lean on the robustness checks in the next part rather than trusting a single high number. Recent work by De Souter and Baumgartner (2025) adds prevalence-adjusted measures precisely to handle this.
In code, you set the bar with the con argument:
```r
# accept patterns that hold at least 90% of the time
cna(immunization, outcome = "LowCov", con = 0.9)
```
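If you want to see where a consistency score comes from, it can be computed by hand for crisp-set (0/1) data. The sketch below is base R only, on invented facilities, and the function name consistency is an assumption for this example rather than something the cna package exports. The second half illustrates the warning about very common outcomes.

```r
# Consistency of "pattern -> outcome": of the cases showing the pattern,
# the share that also show the outcome.
consistency <- function(pattern, outcome) {
  sum(pattern & outcome) / sum(pattern)
}

# Hypothetical data set 1: 10 facilities, the pattern is present in 5.
pattern <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0) == 1
outcome <- c(1, 1, 1, 1, 0, 1, 0, 0, 0, 0) == 1
con_1 <- consistency(pattern, outcome)   # 4 of the 5 pattern cases

# Hypothetical data set 2: the outcome is in 9 of 10 facilities,
# and the pattern is just every other facility.
common_outcome <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0) == 1
any_pattern    <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0) == 1
con_2      <- consistency(any_pattern, common_outcome)
prevalence <- mean(common_outcome)

print(c(con_1 = con_1, con_2 = con_2, prevalence = prevalence))
```

The arbitrary pattern in the second data set scores a perfect 1.0 simply because the outcome is nearly everywhere, which is why the score should always be read next to the outcome's prevalence.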
Think about reliability.
- If a recipe works 8 times in 10, is that good enough to act on in your setting?
- Where would you set the bar, and what would you risk by lowering it?
Coverage: how much of the outcome does the pattern explain?
- Define coverage as a model-fit measure.
- Keep CNA-coverage and vaccination-coverage straight.
- Set the cov threshold sensibly.
The second score is coverage, an evaluation measure that you should not confuse with vaccination coverage. It answers: of all the facilities that had the outcome, what share does this pattern account for?
Coverage is "how much of the problem this pattern captures"
If 90 of 100 low-coverage facilities fit your pattern, the coverage score is 0.90. A lower score isn't automatically bad. It often just means other roads to the same problem exist, which is equifinality saying hello. You set the threshold with the cov argument, and its default is also 1.
"Coverage" in CNA output is a model-fit statistic. "Coverage" in your program is the share of children vaccinated. Same word, two completely different jobs, which is mildly unfair of the universe. In reports, say "the pattern's coverage score" versus "DTP3 coverage" so nobody gets lost.
```r
# require patterns that are both reliable AND broadly explanatory
cna(immunization, outcome = "LowCov", con = 0.9, cov = 0.9)
```
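The 90-of-100 example can be reproduced by hand. This is a minimal base-R sketch on invented data. The function name coverage_score and the second road are assumptions added for illustration.

```r
# Coverage score of a pattern: of the cases with the outcome,
# the share that the pattern accounts for.
coverage_score <- function(pattern, outcome) {
  sum(pattern & outcome) / sum(outcome)
}

# Hypothetical: 150 facilities, the first 100 have low coverage.
outcome <- rep(c(TRUE, FALSE), times = c(100, 50))
# The pattern fits 90 of those 100 low-coverage facilities.
pattern <- rep(c(TRUE, FALSE), times = c(90, 60))
cov_one_road <- coverage_score(pattern, outcome)

# A second, different road that accounts for the remaining 10.
second_road <- rep(c(FALSE, TRUE, FALSE), times = c(90, 10, 50))
cov_both_roads <- coverage_score(pattern | second_road, outcome)

print(c(one_road = cov_one_road, both_roads = cov_both_roads))
```

The first road alone scores 0.90. Adding the second road with OR lifts the coverage score to 1, which is what "other roads to the same problem exist" looks like in numbers.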
Mind the two meanings of 'coverage'.
- In your own writing, how will you avoid confusing the two?
- If a pathway has a low coverage score, what might that tell you about other pathways?
Choosing well, and looking under the hood
- Weigh models on simplicity, consistency, coverage, and plausibility.
- Use condTbl() and condition() to inspect a formula.
- Resist choosing a model on numbers alone.
When CNA hands you several candidate models, weigh them on four criteria, three statistical and one stubbornly human:
- Simplicity. Fewer conditions are easier to explain and easier to act on.
- Consistency. How reliably the pattern holds.
- Coverage. How much of the outcome it accounts for.
- Program plausibility. Does it actually make public-health sense?
To inspect any formula's scores directly, use condTbl(), which builds a tidy summary table of consistency and coverage for whatever conditions you pass it. To probe a formula facility by facility, use condition().
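```r
library(cna)

# score a specific formula against the data
condTbl("NoCold + NoSuper <-> LowCov", immunization)

# see exactly which facilities each road covers
condition("NoCold + NoSuper", immunization)
```
For this formula, the summary table reports the outcome LowCov with a consistency of 1.000 and a coverage score of 1.000.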
Never pick a model just because the numbers are the prettiest. A statistically flawless pattern that no program officer can explain or act on is, in practical terms, useless. The model has to make public-health sense first, and impress the statistician second.
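One way to keep plausibility in the driver's seat is to record it next to the scores and filter on it first. The sketch below is base R on an invented shortlist. The three formulas, their scores, and the plausible column (a judgment you fill in yourself after talking to program staff) are all hypothetical.

```r
# Hypothetical candidate models with their scores.
candidates <- data.frame(
  formula     = c("NoCold + NoSuper",
                  "NoCold*NoTrain + NoSuper*NoComm + NoOutreach",
                  "NoSuper"),
  complexity  = c(2, 5, 1),            # number of conditions in the formula
  consistency = c(0.95, 0.97, 0.80),
  coverage    = c(0.92, 0.93, 0.60),
  plausible   = c(TRUE, FALSE, TRUE),  # human judgment, not computed
  stringsAsFactors = FALSE
)

# Plausibility first, then the score thresholds, then prefer the simplest.
shortlist <- candidates[candidates$plausible &
                          candidates$consistency >= 0.9 &
                          candidates$coverage >= 0.9, ]
shortlist <- shortlist[order(shortlist$complexity), ]

print(shortlist$formula)
```

The five-condition model has the prettiest numbers in this toy table, yet it is dropped because nobody could explain it, and the one-condition model is dropped because its scores are too weak.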
Consistency and coverage are the defaults, not the only options
This course uses standard consistency and coverage throughout, because they are the defaults and the easiest to learn. From version 4.0.0 onward, the cna package also offers several additional sufficiency and necessity measures (including prevalence-adjusted ones) through a measures argument in cna(). You do not need them to complete this course, but once you are comfortable, they are the natural next step for handling tricky or imbalanced data.
Imagine two models with similar scores.
- Which would you trust: the tidier one, or the one that makes more program sense?
- What would make a statistically strong model still unusable in practice?
Would the finding survive a small nudge?
- Explain what robustness means for a CNA finding.
- Run frscored_cna() across a range of thresholds.
- Judge whether a finding is steady enough to brief.
A result you plan to act on should not fall apart the moment the data or the thresholds wobble a little. Robustness asks exactly that. The CNA ecosystem has a dedicated companion package for it called frscore, which scores how "fit-robust" your models are across a whole range of consistency and coverage thresholds, so you don't have to do it by hand and lose an afternoon.
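```r
# install once
install.packages("frscore")

library(frscore)

# re-analyse across many thresholds and score robustness in one call;
# outcome = "LowCov" is passed through to cna() so every run targets the same outcome
fr <- frscored_cna(immunization, outcome = "LowCov",
                   fit.range = c(1, 0.7), granularity = 0.1)
fr
```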
Higher fit-robustness means a steadier finding
frscored_cna() runs CNA again and again across a grid of thresholds, here from 1.0 down to 0.7 in steps of 0.1, and rewards the models that keep showing up. A pattern that reappears no matter how you turn the dials is one you can trust enough to put in front of a director without sweating.
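To picture the grid, here is a small base-R sketch that lists the threshold settings implied by a range of 1.0 to 0.7 in steps of 0.1. It assumes consistency and coverage are each varied over the same values and then crossed, which is how the description above reads. It does not call frscore itself.

```r
# Threshold values from 1.0 down to 0.7 in steps of 0.1.
fit_values <- seq(1, 0.7, by = -0.1)

# Every pairing of a consistency threshold with a coverage threshold.
grid <- expand.grid(con = fit_values, cov = fit_values)

print(fit_values)
print(nrow(grid))
```

Four values per dial gives sixteen settings, which is sixteen separate CNA runs you would otherwise have to launch and compare by hand.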
Think about confidence.
- Would your finding survive a small change in the data or thresholds?
- How robust does a result need to be before you'd put it in front of a director?
Turn the dials and watch what happens
- Run a quick sensitivity check by hand.
- Compare findings across two thresholds.
- Report fragility honestly.
Even without a special package, you can run a quick sensitivity check by hand. Run CNA at a couple of different thresholds and see whether the story stays the same or suddenly changes its tune.
```r
# strict
cna(immunization, outcome = "LowCov", con = 0.9, cov = 0.9)

# relaxed
cna(immunization, outcome = "LowCov", con = 0.85, cov = 0.85)
```
If the same pattern survives both runs, your confidence quietly goes up. If a tiny change rewrites the whole solution, treat the finding as fragile and say so plainly in your report. Honesty here saves embarrassment later.
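Comparing two runs is easier if you line up the solution formulas as plain strings. The sketch below uses base R and two invented sets of formulas. With real results you could take the strings from csf(fit)$condition for each run, but the values here are hypothetical.

```r
# Hypothetical solution formulas returned by each run.
strict  <- c("NoCold + NoSuper <-> LowCov")
relaxed <- c("NoCold + NoSuper <-> LowCov",
             "NoCold + NoSuper*NoComm <-> LowCov")

# Formulas that appear in both runs are the steady ones.
stable <- intersect(strict, relaxed)

# Formulas that appear in only one run deserve a caveat in the report.
fragile <- setdiff(union(strict, relaxed), stable)

print(stable)
print(fragile)
```

Here one formula survives both settings, and a second shows up only when the bar is lowered. The first is the one to brief; the second is the one to flag as fragile.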
Try turning the dials.
- Does your pathway survive both a strict and a relaxed threshold?
- If it changes, how would you describe that honestly in a report?
Translate the symbols back into the field
- Translate a formula into program language.
- Write a finding a non-analyst can act on.
- Avoid leaving results as code fragments.
A formula is the start of the work, not the end. Nobody ever approved a budget because a slide said NoCold + NoSuper. Decision-makers need plain language, so compare these two ways of saying the exact same thing:
| Please don't write this | Write this instead |
|---|---|
| "NoCold + NoSuper explains LowCov." | "Coverage fell below target wherever the cold chain was broken or monthly supervision was missing. Either gap alone was enough, so a facility needs both closed before coverage can recover." |
Same finding, two very different audiences. The first is a code fragment only you understand. The second is something a state immunization officer can read once, repeat in a meeting, and act on by Friday. Always carry the result that last mile into program language.
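A first draft of the plain-language sentence can even be generated, so that the wording stays consistent from one report to the next. This is a rough base-R sketch. The function to_plain and the phrase dictionary are invented for illustration, and the sketch only handles simple formulas built from * and +.

```r
# Hypothetical dictionary: condition name -> plain-language phrase.
labels <- c(
  NoCold  = "the cold chain was broken",
  NoSuper = "monthly supervision was missing",
  NoComm  = "there was no community mobilisation"
)

# Turn "A*B + C" into "<A> and <B>, or <C>".
to_plain <- function(formula, labels) {
  roads <- trimws(strsplit(formula, "+", fixed = TRUE)[[1]])
  phrases <- vapply(roads, function(road) {
    parts <- trimws(strsplit(road, "*", fixed = TRUE)[[1]])
    paste(labels[parts], collapse = " and ")
  }, character(1), USE.NAMES = FALSE)
  paste(phrases, collapse = ", or ")
}

print(to_plain("NoCold + NoSuper", labels))
print(to_plain("NoCold*NoComm + NoSuper", labels))
```

The output is raw material, not a finished finding. You still have to add the outcome, the setting, and what the reader should do about it.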
Rewrite one finding for a real audience.
- Would a state immunization officer understand your sentence on first read?
- Have you said what to do, not just what the symbols are?
Say what CNA actually found, and not one inch more
- Avoid over-claiming causation from one study.
- Use careful language like 'consistent with the data'.
- Report the whole bundle, not one cherry-picked condition.
The most common interpretation error is over-claiming causation from a single configurational study. CNA identifies dependency structures consistent with a causal reading. It does not, on its own, prove that one thing causes another. Keep your language honest:
| Over-claiming | Honest phrasing |
|---|---|
| "A broken cold chain causes low coverage." | "A broken cold chain appears as a difference-making condition consistently associated with low coverage in this data." |
Because conjunctivity is the whole premise of CNA, yanking one condition out of a bundle and crowning it "the cause" quietly contradicts the method you just spent ten modules learning. Report the bundle, not the brick.
Check your own phrasing.
- Are you saying 'X causes Y', when 'X is part of a configuration linked to Y' is more honest?
- Have you pulled one brick out of a bundle and called it the cause?
A reusable skeleton for the results section
- Follow the nine-part results template.
- Make sure each section answers a reviewer's question.
- State limitations plainly.
Every CNA write-up moves through the same nine beats. Copy this order and you'll never leave out something a reviewer asks for.
Research question
Stated as a combinations question.
Dataset description
Cases, level, number of facilities, time period.
Conditions & outcome
Each one named, defined, and calibrated (the threshold for "1").
CNA procedure
Package version, con/cov thresholds, any orderings.
Identified pathways
The solution formula(s), in symbols and in plain words.
Consistency & coverage
The fit scores for each pathway.
Robustness
What survived threshold changes / frscore results.
Interpretation
Program-language meaning of each pathway.
Limitations
Case count, data quality, what CNA can and can't claim.
Audit a write-up against the template.
- Which of the nine sections do people most often skip?
- Where would a reviewer push back hardest on your draft?
Three boxes a decision-maker reads in two minutes
- Structure a policy brief: problem, finding, recommendation.
- Strip an analysis to what a director needs.
- Tie the recommendation directly to the finding.
A policy brief strips the analysis down to what a busy director actually needs: the problem, the finding, and the recommendation. No methods section, no R output, no suspense. Here is our supervision study written up as a brief.
Problem
Routine immunization coverage keeps falling below target across the LGA, and one-off investments in single interventions have not closed the gap.
Finding
Coverage stayed low in exactly the facilities where the cold chain was broken or monthly supportive supervision was missing. Where both were in place, coverage held. Training and community mobilisation, notably, did not separate the strugglers from the rest.
Recommendation
Target the two difference-makers directly. Repair and maintain cold chain equipment, and guarantee monthly supervision at every facility. Spreading the same budget thinly across interventions that do not move the outcome predicts more of the same low coverage.
Draft a three-box brief for a real problem.
- Can a busy director grasp it in two minutes?
- Does your recommendation follow logically from the finding, with nothing extra?
One finding, one instruction
- Compress a study to one finding and one action.
- Write an executive summary that stands alone.
- Lead with what matters most.
The executive summary is the whole brief boiled down to one sentence of finding and one sentence of action. If a director reads nothing else, they read this.
Finding
Low coverage traced back to one of two fixable gaps every time: a broken cold chain or missing supervision.
Recommendation
Close both gaps, facility by facility. Resist the urge to spread the budget across activities that this data shows do not move the needle.
Boil your study down.
- If a director read only your summary, would they know what to do?
- Is the single most important point truly at the top?
Run a full study, start to finish
You supervise 50 facilities, and a stubborn share of them keep posting low coverage. Outcome: LowCov. Conditions: untrained staff, no supervision, broken cold chain, no outreach, no community mobilisation. Here is the entire workflow as one connected script, every step you have learned, in order.
```r
library(cna)
library(frscore)

# 1. IMPORT: read your facility data from a CSV file
data <- read.csv("facility_data.csv")

# 2. INSPECT: sanity-check structure and values before analysing
str(data)      # column types and a peek at values
summary(data)  # min/max catches stray codes (e.g. a 2 in a 0/1 column)

# 3. ANALYSE: run CNA for the LowCov outcome
fit <- cna(data, outcome = "LowCov", con = 0.9, cov = 0.9)

# 4. REVIEW: print solutions, ordered by fit
fit

# 5. INTERPRET: score and inspect a chosen pathway case-by-case
condTbl(csf(fit)$condition, data)

# 6. ROBUSTNESS: does the finding survive threshold changes?
#    outcome = "LowCov" is passed through to cna() so the runs match step 3
fr <- frscored_cna(data, outcome = "LowCov",
                   fit.range = c(1, 0.8), granularity = 0.1)
fr

# 7. RECOMMEND: translate the surviving pathway into program language
# 8. BRIEF: write Problem / Finding / Recommendation (Modules 29 and 30)
```
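Step 2 relies on your eye to catch a stray code in summary(). A stricter check can be sketched in base R, flagging any column that is not purely 0/1. The helper is_binary and the three-facility data frame are invented for illustration.

```r
# TRUE only if every value in the column is 0 or 1 (no NAs, no stray codes).
is_binary <- function(x) all(!is.na(x) & x %in% c(0, 1))

# Hypothetical data with two problems planted in it.
toy <- data.frame(
  NoTrain = c(0, 1, 1),
  NoSuper = c(1, 0, 2),    # a stray 2
  LowCov  = c(1, NA, 0)    # a missing value
)

# Names of the columns that need cleaning before analysis.
bad_columns <- names(toy)[!vapply(toy, is_binary, logical(1))]

print(bad_columns)
```

Run on your own data frame in place of toy, an empty result means every column is clean crisp-set data. Anything listed needs fixing, or an ID column needs dropping, before the analysis step.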
Interpret the pathways
Read each formula aloud in plain words (Module 26). Which gaps separate the low-coverage facilities from the rest?
Assess robustness
Keep only what survives the threshold grid. Note fragile pathways honestly, rather than quietly hoping nobody checks.
Prepare recommendations
Turn the difference-making gaps into targeted fixes, not a scattershot of isolated activities.
Write the policy brief
Problem, Finding, Recommendation. Two minutes for a director to read.
csf(): Complex solution formulas
csf(fit) extracts the model's complex solution formulas, and $condition pulls out the formula strings so condTbl() can score them. For single-outcome studies you'll often just read fit directly. The csf() function matters most when you model chains with more than one outcome.
Plan → prepare → model → evaluate → check robustness → interpret → brief. That circle is a CNA study. Everything else is practice and judgement.
What you can now do
By finishing this curriculum you should be able to:
- Explain CNA, along with conjunctivity, equifinality, and causal chains, in plain language.
- Formulate CNA-shaped research questions for immunization programs.
- Design a study: define outcomes, choose conditions, draw a framework.
- Prepare and clean crisp-set supervision datasets.
- Install and use R, RStudio, and the cna package.
- Run cna() models and read solution formulas.
- Interpret consistency and coverage, and compare candidate models.
- Run robustness checks with frscore and sensitivity analyses.
- Translate configurational findings into policy briefs and executive summaries.
- Use CNA to push immunization program performance forward.