Everybody Lies | Danny Zuckerman

Key Insights

Google search as the world’s most honest database. People lie on surveys, to interviewers, and on social media, but they tell Google what they actually think, fear, want, and wonder. At scale, search data exposes the gap between stated preferences and revealed preferences, with far less social desirability bias than surveys or interviews.
Racism, abuse, and other hidden phenomena are more prevalent than surveys suggest. Stephens-Davidowitz found that searches for racist content, searches for domestic abuse resources, and queries related to child abuse point to far higher prevalence than any survey-based estimate. The searches don’t cause the behavior; they reveal what was already present but unmeasured.
Small samples, big signals: zooming in is often more useful than zooming out. One of the book’s counterintuitive findings is that big data is often most useful not for population-level statistics but for isolating highly specific subgroups. Slicing the data finely enough to find “people who search for X and also Y in state Z” reveals patterns that are invisible in the aggregate.
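A minimal Python sketch of zooming in, using invented toy records. The search terms `x` and `y`, the states, and the helper names `share_with_both` and `shares_by_state` are all hypothetical, not taken from the book. The same “searched X and also Y” query is computed once over everyone and once per state, so a subgroup signal that the overall average dilutes becomes visible.

```python
# Hypothetical toy records: (user_id, state, set of search terms).
# Invented for illustration; this is not real search data.
records = [
    (1, "OH", {"x", "y"}),
    (2, "OH", {"x"}),
    (3, "OH", {"x", "y"}),
    (4, "TX", {"y"}),
    (5, "TX", {"x"}),
    (6, "CA", {"z"}),
    (7, "CA", {"x", "y"}),
    (8, "TX", {"z"}),
]


def share_with_both(rows, a, b):
    """Fraction of rows whose searches include both term a and term b."""
    if not rows:
        return 0.0
    hits = sum(1 for _, _, terms in rows if a in terms and b in terms)
    return hits / len(rows)


def shares_by_state(rows, a, b):
    """Apply the same query separately within each state (the 'zoom in')."""
    states = sorted({state for _, state, _ in rows})
    return {s: share_with_both([r for r in rows if r[1] == s], a, b) for s in states}


overall = share_with_both(records, "x", "y")
by_state = shares_by_state(records, "x", "y")
print(f"overall: {overall:.3f}")
for state, share in by_state.items():
    print(f"{state}: {share:.3f}")
```

The overall share sits between the per-state values, hiding the fact that one slice is well above it and another is at zero.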
The Oregon divorce study: aggregate data hides variation. The famous finding: states that introduced no-fault divorce saw decreases in female suicide and domestic violence, but aggregate statistics had masked this because effects on different subpopulations cancelled out. Big data’s ability to preserve subgroup variation is one of its genuine advantages over aggregate statistics.
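To see how an aggregate can hide subgroup effects, here is a toy Python sketch with invented numbers (not data from the divorce research or any other study). One subgroup’s rate falls by 5 per 100,000 while another’s rises by 5, and the population-weighted aggregate does not move at all. The `Group` type and `weighted_rate` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Group:
    population: int
    rate_before: float  # events per 100,000 before the policy (invented)
    rate_after: float   # events per 100,000 after the policy (invented)


def weighted_rate(groups, attr):
    """Population-weighted average of a rate attribute across subgroups."""
    total = sum(g.population for g in groups.values())
    return sum(g.population * getattr(g, attr) for g in groups.values()) / total


# Two equal-sized hypothetical subgroups whose changes point in opposite directions.
groups = {
    "subgroup_A": Group(50_000, 20.0, 15.0),
    "subgroup_B": Group(50_000, 10.0, 15.0),
}

aggregate_change = weighted_rate(groups, "rate_after") - weighted_rate(groups, "rate_before")
subgroup_changes = {name: g.rate_after - g.rate_before for name, g in groups.items()}

print(f"aggregate change: {aggregate_change}")
for name, change in subgroup_changes.items():
    print(f"{name}: {change:+}")
```

Looking only at the aggregate line would suggest the policy did nothing; the subgroup lines show two real effects that offset each other.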
Causal inference remains hard even with big data. The book is honest about what Google data can and can’t tell you. Correlation at scale is still correlation. The examples where causation is most credibly established rely on natural experiments or randomized A/B tests; data volume alone doesn’t solve the identification problem.
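One common way to analyze a natural experiment is difference-in-differences, sketched below in Python with invented outcome means. This is not claimed to be the estimator used in any of the book’s examples; it only illustrates why a credible comparison group matters more than data volume. The naive before/after change attributes the whole shift to the treatment, while difference-in-differences subtracts the change the control group experienced anyway. That subtraction is a valid causal estimate only under the parallel-trends assumption.

```python
def naive_change(before: float, after: float) -> float:
    """Before/after change in the treated group alone (ignores background trends)."""
    return after - before


def diff_in_diff(treated_before: float, treated_after: float,
                 control_before: float, control_after: float) -> float:
    """Treated group's change minus the control group's change.

    Interpretable as a causal effect only if both groups would have
    followed parallel trends without the treatment (an assumption).
    """
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical outcome means, invented for illustration.
treated_before, treated_after = 10.0, 14.0
control_before, control_after = 10.0, 13.0

naive = naive_change(treated_before, treated_after)
did = diff_in_diff(treated_before, treated_after, control_before, control_after)
print(f"naive before/after: {naive}, difference-in-differences: {did}")
```

Adding more rows of the same observational data would not change the gap between these two numbers; only the comparison design does.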

— Drafted from external sources; review and edit to make your own.