Opportunity Data focuses on what unfolds over time: earnings trajectories, variation across students, and the data gaps that matter most for understanding real program value. One of the most consequential of those gaps is data suppression.

The data problem and why it is changing

For many years, post-completion outcomes were difficult to observe. Institutions had limited visibility into where students worked or how their earnings evolved over time.

That is changing. Administrative data systems and linked datasets now provide direct measures of labor market outcomes.

The Post-Secondary Employment Outcomes (PSEO) dataset represents a significant advance. By linking institutional records to federal wage data through the Longitudinal Employer-Household Dynamics (LEHD) system, PSEO provides earnings at 1, 5, and 10 years after completion, along with percentile distributions (P25, P50, P75). This enables direct observation of both earnings trajectory and within-program variation.
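Here is a minimal Python sketch of what "trajectory" and "within-program variation" mean in practice. It assumes a hypothetical record layout in which each percentile (P25, P50, P75) is keyed by years after completion, with `None` standing in for a suppressed cell. The field names and earnings figures are invented for illustration and do not come from PSEO. Growth is measured as the proportional change in the median from Year 1 to Year 10. Spread is the P75 minus P25 gap at a single horizon.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ProgramEarnings:
    """Hypothetical program record: earnings by years after completion (1, 5, 10).

    A value of None stands in for a suppressed cell.
    """
    p25: Dict[int, Optional[float]]
    p50: Dict[int, Optional[float]]
    p75: Dict[int, Optional[float]]


def median_growth(rec: ProgramEarnings, start: int = 1, end: int = 10) -> Optional[float]:
    """Proportional change in median (P50) earnings from `start` to `end`.

    Returns None when either horizon is suppressed, since no trajectory can be computed.
    """
    first, last = rec.p50.get(start), rec.p50.get(end)
    if first is None or last is None or first == 0:
        return None
    return (last - first) / first


def spread(rec: ProgramEarnings, year: int) -> Optional[float]:
    """Within-program variation at one horizon, measured as P75 minus P25."""
    low, high = rec.p25.get(year), rec.p75.get(year)
    if low is None or high is None:
        return None
    return high - low


# Invented figures for illustration only; not taken from PSEO.
example = ProgramEarnings(
    p25={1: 24000.0, 5: 30000.0, 10: 36000.0},
    p50={1: 32000.0, 5: 40000.0, 10: 48000.0},
    p75={1: 40000.0, 5: 52000.0, 10: 62000.0},
)
print(median_growth(example))  # 0.5 (median rises 50% from Year 1 to Year 10)
print(spread(example, 10))      # 26000.0
```

If any of the required cells is suppressed, both functions return `None` rather than a partial answer. This is why trajectory views can only include programs with complete data.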

However, like all administrative datasets, PSEO is subject to structural limitations that affect coverage and completeness.

Limitations in the underlying wage records

The PSEO technical documentation identifies several constraints inherent to the unemployment insurance (UI) wage records on which it is built:

Self-employment is excluded. Unincorporated self-employed individuals do not appear in UI wage records.

Certain employment categories are missing. This includes some federal civilian employees, railroad workers, and informal or family-based employment arrangements.

Coverage is bounded by the UI system's purpose. UI wage records were designed for unemployment insurance administration, not comprehensive earnings measurement. They capture W-2 employment but not all forms of labor market participation.

These gaps are not uniform across fields. They are more consequential in occupations where self-employment and independent contracting are prevalent—including several skilled trades and technical fields common in short-term certificate programs.

Despite these constraints, PSEO remains one of the most complete longitudinal earnings datasets available for postsecondary graduates, with the ability to track workers across state lines.

Small cell suppression

A more structural limitation is small cell suppression.

PSEO protects confidentiality using methods based on differential privacy, and it suppresses earnings estimates when the underlying cohort size falls below the minimum threshold required to protect individual identity. Suppressed cells are not rounded or approximated; their earnings values are withheld from the published data.

This is a necessary privacy safeguard. But it produces systematic analytical consequences: programs with small numbers of graduates do not produce reportable earnings at any percentile or time horizon.

The suppression is not random. It is directly related to cohort size and therefore correlated with institutional and program characteristics.
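The core of the mechanism can be sketched as a simple threshold rule. The Python below is a simplified illustration, not the Census Bureau's procedure. It ignores differential privacy noise entirely, and `MIN_COHORT = 30` is a hypothetical threshold chosen only for the example. The sketch shows that publication depends on cohort size alone, not on how well a program's graduates do.

```python
from typing import Optional

# Hypothetical minimum cohort size; the actual PSEO threshold is not given here.
MIN_COHORT = 30


def publish_median(cohort_size: int, median: float, min_cohort: int = MIN_COHORT) -> Optional[float]:
    """Return the median if the cohort meets the threshold; otherwise None (suppressed).

    Simplified: real PSEO releases also apply differential privacy noise,
    which this sketch does not model.
    """
    return median if cohort_size >= min_cohort else None


# Invented programs: (label, number of graduates, median earnings).
programs = [
    ("large urban program", 240, 41000.0),
    ("rural certificate", 12, 38000.0),
    ("specialized certificate", 29, 45000.0),
]
for label, n, med in programs:
    print(label, publish_median(n, med))
```

In this example, the two small programs are suppressed even though one has the highest median, and only the large program is published. Cohort size tracks institution size and location. A size-based rule therefore produces missingness that is systematic with respect to those characteristics.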

Impact on short-term certificate programs

Short-term certificate programs (less than one year) are often small by design. They are closely tied to local labor markets, specific occupations, and targeted training pipelines. As a result, they are disproportionately affected by small cell suppression.

The result is a form of structured missingness:

Programs with smaller cohorts are less likely to have observable outcomes at any time horizon.

Rural and specialized institutions are systematically underrepresented in the reportable data.

Entire segments of workforce training provision—including programs aligned with regional economic needs in areas like wildland fire, agricultural technology, or mine safety—may be excluded from analysis entirely.

Across the national PSEO data, a substantial share of short-term certificate program-institution combinations have no reportable earnings. In some states, more than half of all short-term certificate programs are fully suppressed.

Missing data is not neutral. Suppression follows predictable patterns and falls hardest on the programs that are already the most under-resourced.

Explore the data: The interactive earnings explorer displays the full picture for each institution, showing programs with complete P25–P75 ranges at Years 1, 5, and 10 alongside programs with partial or fully suppressed data. The whisker chart, career ladder, and earnings growth scatter filter to complete trajectories; the data table shows all records, including suppressed cells.
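It takes only a few lines to sort records into complete, partial, and fully suppressed programs and to compute the fully suppressed share by state. This Python sketch assumes a hypothetical layout in which each program is a state code plus a dictionary of nine cells keyed by (percentile, year). Missing or `None` cells are treated as suppressed. The state codes and values are placeholders, not PSEO records.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

HORIZONS = (1, 5, 10)
PERCENTILES = ("p25", "p50", "p75")
Cells = Dict[Tuple[str, int], Optional[float]]


def classify(cells: Cells) -> str:
    """Label a program from its nine percentile-by-year cells; missing or None means suppressed."""
    values = [cells.get((p, y)) for p in PERCENTILES for y in HORIZONS]
    present = sum(v is not None for v in values)
    if present == len(values):
        return "complete"
    if present == 0:
        return "suppressed"
    return "partial"


def suppressed_share_by_state(records: List[Tuple[str, Cells]]) -> Dict[str, float]:
    """Fraction of each state's programs with no reportable earnings cell at all."""
    totals: Dict[str, int] = defaultdict(int)
    suppressed: Dict[str, int] = defaultdict(int)
    for state, cells in records:
        totals[state] += 1
        if classify(cells) == "suppressed":
            suppressed[state] += 1
    return {state: suppressed[state] / totals[state] for state in totals}


# Placeholder records: "AA" and "BB" are not real state codes.
full: Cells = {(p, y): 30000.0 for p in PERCENTILES for y in HORIZONS}
partial: Cells = {("p50", 1): 28000.0}
empty: Cells = {}
records = [("AA", full), ("AA", empty), ("BB", partial), ("BB", empty), ("BB", empty)]
print(suppressed_share_by_state(records))  # {'AA': 0.5, 'BB': 0.6666666666666666}
```

Under this definition, a partially reported program does not count as fully suppressed. That program would still be excluded from any view that requires a complete trajectory.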

Comparison to other data sources

Other commonly used datasets for postsecondary outcomes introduce different coverage and methodological constraints:

State longitudinal data systems can provide detailed program-level outcomes but are bounded by state borders and subject to methodological choices that affect interpretation. Some state systems exclude lower-earning individuals from reported distributions, which truncates the observed variance and can bias estimates of program value upward.
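A small worked example shows why excluding lower earners shifts reported figures. The earnings below are invented for ten hypothetical completers. The $15,000 floor is an assumed reporting rule, not any specific state's method.

```python
import statistics

# Invented annual earnings for ten completers of one hypothetical program.
earnings = [8000, 14000, 19000, 24000, 28000, 31000, 35000, 39000, 46000, 58000]

# Assumed reporting rule: exclude anyone earning below a fixed floor.
FLOOR = 15000
reported = [e for e in earnings if e >= FLOOR]

print(statistics.median(earnings), statistics.median(reported))  # 29500.0 33000.0
# The spread also narrows once the bottom of the distribution is removed.
print(statistics.pstdev(earnings) > statistics.pstdev(reported))  # True
```

Dropping the two lowest earners raises the median from 29,500 to 33,000 and narrows the spread, even though no individual's earnings changed.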

Alumni-based datasets (e.g., Lightcast) provide occupational and career pathway context, but coverage depends on observable online activity—leading to uneven representation across fields and demographics.

The College Scorecard provides national coverage but is limited to federal financial aid recipients. This is a significant constraint for short-term programs, where many participants are not Title IV eligible and therefore absent from the data.

No single dataset provides a complete view of postsecondary labor market outcomes. Each introduces its own pattern of missingness.

Analytical implications

PSEO represents a substantial improvement in the available evidence base for postsecondary earnings analysis. It captures longitudinal earnings, enables analysis of both trajectory and distributional spread, and tracks graduates across state lines.

However, three constraints should inform any analysis built on PSEO data:

First, not all forms of employment are captured. Estimates for fields with high self-employment rates should be interpreted with this in mind.

Second, small cell suppression removes a non-random subset of programs from the observable data. The remaining programs skew toward larger institutions, higher-enrollment programs, and urban areas.

Third, PSEO should be interpreted alongside complementary sources rather than treated as a standalone measure of program value.

When a substantial share of programs in a given state or sector are suppressed, policy conclusions drawn from the visible programs alone carry an implicit selection bias: they reflect the outcomes of scale, not the outcomes of the full workforce training system.
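A toy calculation can illustrate the selection problem. All cohort sizes and medians below are invented, and `MIN_COHORT` is again a hypothetical threshold. In real data, the medians of suppressed programs cannot be observed at all. The example assumes we know them only to show what the visible subset leaves out.

```python
from statistics import mean

# Invented (cohort size, median earnings) pairs for six hypothetical programs.
programs = [(220, 39000), (180, 41000), (150, 37000), (14, 52000), (9, 47000), (6, 33000)]
MIN_COHORT = 30  # hypothetical suppression threshold

visible = [(n, m) for n, m in programs if n >= MIN_COHORT]

program_coverage = len(visible) / len(programs)
graduate_coverage = sum(n for n, _ in visible) / sum(n for n, _ in programs)

print(f"programs visible: {program_coverage:.0%}")    # programs visible: 50%
print(f"graduates visible: {graduate_coverage:.0%}")  # graduates visible: 95%
# Average median across visible programs vs. across all programs (knowable only in a toy example).
print(mean(m for _, m in visible), mean(m for _, m in programs))
```

In this invented case, the visible programs account for about 95% of graduates but only half of programs. Their average median (39,000) also differs from the average across all six programs (41,500). The direction and size of that gap come entirely from the made-up numbers. The structural point is that the visible set describes large programs, not the full system.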

Conclusion

The current data environment represents meaningful progress in the observation of post-completion labor market outcomes. Longitudinal, linked administrative data provides far greater analytical clarity than was previously possible.

But important gaps remain, and those gaps are not distributed randomly. They are concentrated in smaller programs, in less-populated areas, and in parts of the labor market that are structurally less visible in administrative systems.

Understanding these gaps is a prerequisite for interpreting the data correctly and for making informed decisions about program evaluation, funding allocation, and workforce policy.

In workforce analysis, what is not observed is often as important as what is.


Data: U.S. Census Bureau, Post-Secondary Employment Outcomes (PSEO), 2025Q4 release. Earnings in 2023 dollars (CPI-U adjusted). Short-term certificate programs (<1 year), all graduation cohorts, national geographic scope. Explore the data in the interactive earnings explorer.