Episode 63 — Sizing: Collective Estimation and Relative Measures

Sizing in agile work is less about precision and more about building shared understanding. Its purpose is to provide collaborative, relative measures that aid forecasting, trade-off discussions, and risk awareness. What it does not do is act as a performance score or an individual productivity metric. When teams confuse sizing with measurement of output, they create distortions that harm trust and invite gaming of the system. Instead, sizing is best viewed as a tool for keeping plans honest in the face of variability. Knowledge work is unpredictable: learning curves, dependencies, and unknowns all affect how long something will take. By sizing relatively, teams acknowledge uncertainty but still gain enough shared perspective to compare work and make informed choices. This honest, collective approach allows plans to adapt without pretending that every detail is controllable or that estimates equal guarantees.
Relative sizing differs from absolute sizing in a fundamental way. Absolute sizing attempts to predict hours or days, assuming that time is a fixed measure that can be accurately assigned in advance. In complex, variable work, these guesses often prove unreliable because knowledge discovery, hidden dependencies, and unexpected risks change the trajectory. Relative sizing, on the other hand, sidesteps false precision by comparing items to one another. Instead of asking, “How many hours will this take?” the team asks, “Is this item larger or smaller than that one?” This shift acknowledges that while absolute accuracy is elusive, relative relationships are easier to gauge. It is like estimating weight at a farmers’ market—you may not know exactly how heavy a melon is in pounds, but you can confidently say whether it is heavier or lighter than another melon. This comparative framing reduces anxiety and improves consistency across uncertain domains.
Story points are the most common relative sizing method used in agile teams. They represent a unitless number, meaning they are not hours or days, but instead a scale of relative effort, risk, and complexity. One team might decide that a simple login screen is a three-point story, while a more complex checkout flow is an eight. What matters is not the number itself but how it relates to other items on the same team’s scale. Because points are unitless, they avoid the trap of false precision and instead encourage teams to think in terms of patterns. T-shirt sizes serve a similar purpose but with even coarser granularity: items are grouped into extra-small, small, medium, large, or extra-large. This method is useful when precision adds no value and the goal is simply to sort work into buckets that can guide planning. Both methods help anchor conversations without devolving into time-guessing games.
Collective estimation is a core principle in agile because it brings together cross-functional perspectives. If only a developer sizes work, they may miss hidden dimensions such as testing complexity, data preparation, or operational deployment. By including testers, designers, operators, and other specialists, estimation becomes a holistic view of what the slice of work entails. This diversity surfaces hidden work that might otherwise derail plans later. For example, a developer might think implementing a feature is straightforward, but a tester might raise concerns about edge cases, or an operations engineer might note the need for new monitoring scripts. When all voices contribute, the size reflects a fuller picture, reducing the chance of unpleasant surprises. Collective estimation also strengthens team cohesion, as each member feels ownership in shaping the forecast. It embodies the agile value of collaboration, turning sizing into a dialogue rather than a solitary guess.
Planning Poker is one of the most recognizable techniques for facilitating collective estimation. Each participant privately selects a card representing their estimate, then all cards are revealed simultaneously. This practice reduces anchoring—the tendency to be influenced by the first number spoken aloud. If estimates differ widely, the team discusses the reasons behind the divergence. Often, the highest and lowest estimators share their rationale, surfacing risks or perspectives others had not considered. Through conversation, the team converges on a shared size, but the important outcome is not the number itself—it is the shared understanding that emerges. For instance, disagreement might reveal that one person assumed integration with a third-party system was trivial, while another anticipated major obstacles. The act of reconciling those views produces alignment. Planning Poker, then, is less about producing a numeric output and more about ensuring that estimation is transparent, inclusive, and grounded in real dialogue.
Affinity estimation offers a faster, lighter-weight method compared to Planning Poker. Instead of debating one item at a time, the team quickly groups items based on perceived similarity. Cards or sticky notes are placed together on a wall, clustered into piles that represent similar sizes. Once the items are grouped, the team reviews them, adjusting any that feel out of place. This process emphasizes speed and relative sorting rather than exact calibration. The Bucket System extends this idea to larger backlogs, using pre-labeled bins such as “1, 2, 3, 5, 8, 13” points or T-shirt sizes. Items are distributed into the buckets quickly, providing an initial spread that can later be refined. These techniques shine when teams face a high volume of items and need to gain a sense of scale without bogging down in detail. They prove that estimation can be efficient as well as collaborative.
Reference stories anchor the sizing process by providing stable examples that the team agrees on. Instead of treating every estimation session as a blank slate, teams designate a few items as benchmarks. For example, a login feature might be the reference for a three-point story, while a password reset might anchor the five-point level. When new items arise, they are compared to these anchors: “Is this more like the login feature or the password reset?” This method prevents drift over time, ensuring that the scale remains consistent and explainable. Reference stories also help onboard new team members, as they provide concrete illustrations of what different sizes mean in practice. Without anchors, sizing can fluctuate as memories fade or team composition changes. With them, sizing becomes more stable, comparable, and communicable across sprints. Anchors turn abstract numbers into shared, memorable touchstones for the team.
Handling uncertainty is central to sizing because knowledge work often involves unknowns. When uncertainty is high, teams can use spikes—time-boxed investigations designed to answer a specific question—rather than assigning a misleading size. Alternatively, they may use size bands such as “medium–large” to indicate variability when risk is significant. The purpose of these techniques is to prompt decomposition rather than allow wishful thinking to carry forward. If something is too uncertain to size, that itself is a signal that more learning is required before planning. For example, if a feature requires integrating with an unfamiliar vendor API, a spike may be scheduled to explore feasibility. Only after that exploration can the team responsibly assign a size. By handling uncertainty explicitly, teams preserve honesty in planning and resist the temptation to smooth over unknowns with false precision. Uncertainty becomes visible and actionable, not hidden.
Split-before-size is a guiding discipline that prevents oversized items from distorting estimation and planning. When a story or epic feels too large to fit within normal flow, the team should decompose it before attempting to assign a size. Oversized items tend to create volatility in forecasts and delay feedback loops. By splitting them first, teams improve predictability and keep cycle times short. For example, instead of sizing an epic like “build user management system,” the team might split it into smaller stories: create users, edit profiles, manage roles, reset passwords. Each of these can then be sized with more confidence and flow naturally through the delivery pipeline. This practice ensures that estimation is applied to items that actually fit the team’s cadence. It reinforces the idea that decomposition and sizing are interdependent disciplines, both necessary to maintain steady, predictable delivery and frequent validation.
Non-functional and enabler work must be sized explicitly, even though it may not produce visible features for users. Tasks such as improving performance, adding security checks, or building tooling infrastructure often carry significant effort and risk. If left unsized, they create hidden load that undermines planning accuracy. For example, adding monitoring dashboards may not deliver new functionality to end-users, but it improves operability and reduces risk. Similarly, enhancing encryption might appear invisible but is critical to compliance. By sizing such work, teams make it visible and ensure it competes fairly with feature development in prioritization decisions. Explicit sizing also reinforces the notion that value is multi-dimensional: stability, performance, and safety matter as much as user-facing features. Recognizing non-functional and enabler work in estimation helps teams sustain quality and avoid the trap of underinvesting in foundational capabilities that make future delivery reliable.
Calibration to reality ensures that relative sizes remain grounded in actual outcomes. Over time, teams compare their story point or T-shirt size distributions to real cycle times and throughput. If the data shows that five-point stories are consistently completing faster than expected, or that large T-shirt items are unpredictably dragging on, the team adjusts its reference stories or interpretation. Calibration prevents the size scale from drifting into abstraction detached from reality. Importantly, this does not mean converting points into hours. Rather, it is about checking whether relative comparisons still correlate with delivery patterns. Teams who recalibrate regularly maintain honest planning and avoid false confidence. Calibration is not about perfection but about alignment, keeping estimation a tool for learning and decision-making. It reinforces the agile principle that data should inform adaptation, making sizing an evolving practice that reflects the team’s lived experience rather than rigid rules.
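To make this concrete, here is a minimal sketch in Python of the kind of check a team might run, assuming it keeps a record of completed items with their point value and actual cycle time; the numbers below are invented purely for illustration. It groups cycle times by size and compares medians, without ever converting points into hours.

```python
from statistics import median

# Hypothetical completed items: (story points, actual cycle time in days).
completed = [(3, 2), (3, 4), (3, 3), (5, 3), (5, 5), (5, 4), (8, 12), (8, 6), (8, 15)]

# Group cycle times by point value and compare medians. If the ordering of
# medians no longer follows the ordering of sizes, the scale has drifted and
# the reference stories deserve a fresh look.
by_points = {}
for points, days in completed:
    by_points.setdefault(points, []).append(days)

for points in sorted(by_points):
    print(f"{points}-point items: median cycle time {median(by_points[points])} days "
          f"(n={len(by_points[points])})")
```

The point of a check like this is not to derive a conversion factor but to confirm that bigger sizes still tend to mean longer, more variable delivery.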
Wideband Delphi is a structured estimation method designed for complex or contentious items. The process involves multiple anonymous rounds of estimation, with a facilitator summarizing feedback between rounds. Because estimates are anonymous, participants are less influenced by authority bias or the loudest voices. Each round converges the group closer to a shared perspective while retaining diversity of thought. For example, if one participant believes a feature is a small effort and another sees it as very large, the anonymous format allows both to explain their reasoning without fear of pressure. Through iterative refinement, the team arrives at a size that reflects collective intelligence. Wideband Delphi is particularly valuable when stakes are high or when authority gradients could distort open discussion. It demonstrates that estimation is as much about surfacing different perspectives as it is about producing a number.
Bias mitigation is an ongoing concern in estimation because human judgment is vulnerable to psychological influences. Anchoring occurs when early numbers bias subsequent discussion. The halo effect allows perceptions of one feature to spill over into another. Senior voices may dominate, creating authority bias. To counter these, teams adopt practices such as independent first votes, facilitator neutrality, and explicit evidence checks. Planning Poker’s simultaneous reveal is one such mechanism. Another is asking the highest and lowest estimators to explain their rationale, forcing assumptions into the open. By systematically addressing bias, teams preserve the integrity of collective estimation. This protects against inflated sizes, distorted forecasts, or unhealthy team dynamics. Bias mitigation demonstrates that agile practices are designed not only to manage work but also to manage human psychology, ensuring that decisions emerge from evidence and dialogue rather than subconscious influence or hierarchy.
Capacity realism is the practice of reminding stakeholders that size is not a promise. Points or T-shirt sizes indicate relative effort, not guaranteed delivery dates. Commitments should instead derive from historical throughput and limits on work in progress. For example, if a team historically completes twenty points per iteration, committing to sixty is unrealistic regardless of the estimation. Capacity realism protects both the team and stakeholders from disappointment caused by overcommitment. It also shifts the conversation toward empirical planning rather than wishful thinking. By tying commitments to actual data, teams avoid the trap of using estimation as a proxy for capacity. Realism fosters trust because stakeholders learn that plans are grounded in evidence, not inflated numbers. It reinforces the agile principle that predictability comes from stable flow, not from forcing estimates into promises. In this way, capacity realism connects sizing back to honesty and sustainability.
Anti-patterns in sizing reveal what happens when practices are misused. A frequent mistake is converting points into hours, defeating the purpose of relative measures and reintroducing false precision. Another is using story points as a metric of individual performance, which incentivizes inflation and erodes collaboration. Teams may also fall into the trap of inflating sizes to “win” planning games, treating estimation as negotiation rather than as shared understanding. These anti-patterns distort behavior, turning estimation into a political tool rather than a collaborative learning device. The result is mistrust, inaccurate forecasts, and wasted energy. Recognizing these pitfalls helps teams stay true to the original intent of sizing: to facilitate planning, trade-offs, and risk conversations, not to score productivity. Anti-patterns serve as warnings that remind teams why discipline matters and why the agile community emphasizes values like transparency, respect, and collaboration over mechanical metrics.
Probabilistic forecasting is an advanced approach that replaces single-point estimates with ranges and confidence levels. Instead of saying, “We will finish in exactly ten days,” teams use historical throughput and cycle-time data to model distributions. Monte Carlo simulation is often applied, where thousands of random samples are drawn from past data to project possible futures. The result is a set of probability curves: for example, an eighty-five percent chance of delivering within twelve days, or a fifty percent chance within eight. This shifts the planning conversation from false certainty to informed probability. Stakeholders can then make trade-offs with eyes open, knowing the range of outcomes. Probabilistic forecasting acknowledges that software delivery is not deterministic but stochastic, shaped by variability. By grounding forecasts in evidence, teams build trust and improve decision-making, moving away from promises that collapse when reality intrudes.
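As an illustration only, the following Python sketch shows one simple way such a simulation can work, assuming the team has a record of how many items it finished each day; the throughput figures and the helper function are hypothetical, not a prescribed tool.

```python
import random

# Hypothetical historical data: items completed per day over recent weeks.
daily_throughput = [0, 2, 1, 3, 0, 1, 2, 2, 1, 0, 3, 1, 2, 1, 1]

def simulate_days_to_finish(remaining_items, history, trials=10000, seed=42):
    """Estimate how many days it takes to finish the remaining items by
    resampling past daily throughput (a simple Monte Carlo simulation)."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        done, days = 0, 0
        while done < remaining_items:
            done += rng.choice(history)  # draw a random day from history
            days += 1
        results.append(days)
    results.sort()
    # Report the 50th and 85th percentile outcomes.
    return {
        "p50_days": results[int(0.50 * trials)],
        "p85_days": results[int(0.85 * trials)],
    }

print(simulate_days_to_finish(remaining_items=12, history=daily_throughput))
```

Resampling whole historical days, rather than assuming a smooth distribution, keeps the forecast tied to the team's actual variability.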
Service-level expectations are best framed in percentile terms rather than averages. An average cycle time might be eight days, but averages hide the tails—the unusually short or unusually long cases. Percentiles, by contrast, provide a more realistic view: for example, “Eighty-five percent of items finish within ten days.” This way, stakeholders understand not only typical performance but also the outliers that may affect delivery. Service-level expectations communicate reliability as a probability distribution, aligning with how systems truly behave. For example, a team can set a service-level goal that ninety percent of defects are resolved within five days. This framing is clearer than averages and aligns with probabilistic thinking. It helps leaders and teams manage expectations responsibly, avoiding the trap of committing to dates that only hold under ideal conditions. Percentile-based commitments honor the variability inherent in knowledge work while still offering predictable service boundaries.
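A percentile is straightforward to compute from the same cycle-time records. The sketch below uses invented cycle-time data and contrasts the average with an eighty-fifth percentile service-level expectation; the nearest-rank method shown here is just one of several reasonable conventions.

```python
# Hypothetical cycle times (in days) for recently completed items.
cycle_times = [2, 3, 3, 4, 5, 5, 6, 7, 8, 9, 10, 12, 15]

def percentile(values, pct):
    """Return the value at or below which pct percent of samples fall
    (nearest-rank method)."""
    ordered = sorted(values)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

avg = sum(cycle_times) / len(cycle_times)
p85 = percentile(cycle_times, 85)

print(f"Average cycle time: {avg:.1f} days")      # hides the long tail
print(f"85% of items finish within {p85} days")   # a usable service-level expectation
```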
Iteration capacity planning uses sizing to inform what can realistically be delivered in the near term. Instead of filling an iteration with as many points as possible, teams select a small set of the highest-value items that fit within empirical limits. Historical throughput data provides the boundary, while slack is intentionally reserved for emergent or unplanned work. This slack is not wasted space—it protects stability by absorbing inevitable surprises. For example, a team that typically completes twenty points per iteration may plan for fifteen to seventeen, leaving room for urgent fixes or scope adjustments. This practice avoids overloading the team and keeps flow steady. Capacity planning is therefore about balance: delivering value while respecting variability. It demonstrates that sustainable delivery is not achieved by squeezing every drop of capacity but by preserving flexibility, protecting quality, and aligning scope with reality.
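One way to express that arithmetic, assuming a short history of points completed per iteration and an arbitrary twenty percent slack reserve, is sketched below; the figures are illustrative, not a formula to adopt wholesale.

```python
# Hypothetical points completed in recent iterations.
recent_throughput = [18, 22, 20, 17, 21, 19]

def plan_capacity(history, slack_fraction=0.2):
    """Plan the next iteration from empirical throughput, holding back
    a slack reserve for emergent or unplanned work."""
    baseline = sorted(history)[len(history) // 2]   # median as a conservative anchor
    return round(baseline * (1 - slack_fraction))

print(plan_capacity(recent_throughput))  # plans roughly 16 points, not the 22-point best case
```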
Risk-adjusted sizing acknowledges that not all items of equal apparent size carry equal uncertainty. A feature involving unfamiliar technology, heavy coupling to external systems, or reliance on vendor deliverables carries more variability than a straightforward change. Risk-adjusted sizing builds this into the plan by shaping buffers or sequencing differently. For instance, higher-risk items might be tackled earlier to surface problems while there is still time to adapt. Alternatively, the team may allocate contingency to absorb variability. By adjusting for risk, the plan becomes more resilient and less likely to collapse under surprise. This approach shifts thinking from treating all work as equal to treating it as heterogeneous, where novelty and uncertainty matter as much as nominal size. Risk-adjusted sizing integrates foresight into estimation, turning potential hazards into manageable elements of planning rather than hidden landmines.
Date-driven slicing provides a disciplined response when fixed deadlines loom. Instead of cutting quality to meet the calendar, teams decompose features into the smallest outcome-true items that can ship within the available window. For example, if a product must demonstrate compliance by a regulatory deadline, the team may deliver minimal but auditable functionality that proves adherence, leaving enhancements for later. This approach preserves quality while meeting the external constraint. It reinforces that scope is the lever, not quality or integrity. Date-driven slicing reframes deadlines from being threats to being boundaries that focus prioritization. It demands creativity in finding the smallest slice that still delivers genuine value or compliance evidence. By taking this route, teams avoid the trap of last-minute shortcuts that undermine long-term stability. The discipline of slicing rather than compressing ensures that deadlines are met responsibly and predictably.
Multi-team estimation becomes necessary when large initiatives require coordination across several groups. Without shared references, each team may size work differently, making aggregated forecasts incoherent. Multi-team estimation introduces a shared reference scale and acknowledges integration costs explicitly. For instance, two teams building complementary components might each estimate five points of effort, but integration and testing add another three points. This “integration tax” is made visible and included in planning. Shared reference stories across teams also prevent drift, ensuring that a “medium” in one group is not misaligned with a “medium” in another. Multi-team estimation is not about forcing uniformity but about creating comparability. It acknowledges that delivery at scale depends not only on local flow but also on cross-team synchronization. By accounting for integration and harmonizing references, organizations improve the reliability of program-level forecasts.
Spike estimation provides structure to exploratory work. Instead of treating spikes as open-ended research, teams define a firm timebox and expected artifacts. These artifacts may include findings, documented options, and a clear go/no-go recommendation. For example, a team exploring whether a cloud service can support required encryption might allocate two days to the spike, with the deliverable being a recommendation supported by test results. This discipline ensures spikes contribute value without becoming unbounded distractions. It also keeps planning honest, since spikes are acknowledged as time investments with tangible outputs rather than hidden effort. Spike estimation reminds teams that exploration itself is work, and it deserves the same rigor as feature development. By sizing and bounding spikes, teams transform uncertainty into knowledge systematically, reducing the risk of surprises and keeping the delivery pipeline flowing predictably.
Size-distribution analysis is a reflective practice where teams examine the spread of their work across size categories. If too many items fall into large buckets, it signals a need for better decomposition or skill development. For example, if most backlog items remain at “large” or “extra-large,” the team is likely struggling to maintain predictability and feedback tempo. Analysis might prompt splitting stories further, improving refinement practices, or investing in skills to reduce complexity. Over time, a healthy distribution shows most work clustered in small and medium sizes, with large items reserved for exceptional cases. This distribution aligns with steady flow and manageable cycle times. Size-distribution analysis turns estimation into a diagnostic tool, highlighting where refinement and decomposition need improvement. It prevents backlog inflation and ensures that the work pipeline remains sustainable.
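A distribution check can be as simple as counting items per bucket. The sketch below uses invented T-shirt-size data and an arbitrary thirty percent threshold for flagging an oversized tail; the threshold is an assumption, not a standard.

```python
from collections import Counter

# Hypothetical backlog: each item tagged with a T-shirt size.
backlog_sizes = ["S", "M", "M", "L", "S", "XL", "M", "L", "L", "L", "M", "S", "L", "XL"]

counts = Counter(backlog_sizes)
total = len(backlog_sizes)

for size in ["XS", "S", "M", "L", "XL"]:
    share = counts.get(size, 0) / total
    print(f"{size:>2}: {counts.get(size, 0):2d} items ({share:.0%})")

# A healthy profile clusters in S/M; a heavy L/XL tail signals that more
# decomposition or refinement is needed before planning.
large_share = (counts.get("L", 0) + counts.get("XL", 0)) / total
if large_share > 0.3:
    print("Warning: more than 30% of items are L/XL -- consider splitting them.")
```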
Re-estimation policy sets boundaries to prevent churn. While it may be tempting to re-size items frequently as understanding evolves, constant re-estimation creates noise without improving flow. Instead, teams adopt the policy that items are only re-sized when scope or understanding changes materially. If priorities shift or new risks emerge, re-sizing may be appropriate. Otherwise, the team adapts the plan by re-ordering items or adjusting throughput expectations, not by repeatedly re-sizing. For example, if a story expands significantly because of regulatory changes, it is re-estimated; but if minor clarifications arise, it remains as-is. This discipline avoids wasting time while still allowing responsiveness to meaningful change. Re-estimation policy reflects agile pragmatism: accept that estimates are provisional, but resist the urge to tinker endlessly with numbers that add no real value.
Stakeholder communication is where estimation earns its real payoff. Teams must translate sizes, ranges, and probabilities into plain language that stakeholders can act on. This means sharing assumptions, caveats, and confidence levels rather than presenting numbers as guarantees. For example, a team might tell stakeholders, “We have an eighty percent confidence of completing this release by mid-June, assuming vendor integration arrives on schedule.” This kind of transparency builds trust because it acknowledges uncertainty while still providing useful guidance. It also prevents the dangerous misinterpretation of estimates as commitments. Good communication aligns expectations to reality, reducing conflict later. It frames estimation not as a promise but as a tool for decision-making, negotiation, and prioritization. When stakeholders are educated on how to interpret sizes, the entire organization benefits from more honest planning and fewer unpleasant surprises.
Flow-first alternatives highlight that estimation is not always necessary. In environments with highly stable flow and predictable throughput, teams may rely on pull-based planning: simply taking the next ready slice without upfront sizing. This method works when cycle-time distributions are narrow and consistent, making probabilistic forecasting easier without per-item estimates. For example, a mature Kanban team may focus on managing work-in-process limits and tracking throughput rather than assigning story points. This approach emphasizes actual delivery data over speculative estimation. Flow-first planning reduces overhead and shifts attention from debating numbers to improving system health. It demonstrates that estimation is a means, not an end. Where flow is stable, it may be more efficient to bypass sizing altogether, relying on throughput metrics to guide forecasting. This alternative reinforces agility by focusing on real performance rather than ritualized estimation.
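For teams that skip per-item sizing, the same resampling idea can run on item counts alone. The sketch below, with hypothetical weekly throughput figures, asks how many items the team can confidently expect to finish over the next four weeks; no story points are involved.

```python
import random

# Hypothetical weekly throughput (items finished per week) for a team that
# relies on flow metrics instead of per-item sizing.
weekly_throughput = [4, 6, 5, 3, 5, 7, 4, 5]

def forecast_items(weeks_ahead, history, trials=10000, seed=7):
    """Forecast how many items are likely to finish in the coming weeks by
    resampling past weekly throughput -- no per-item estimates required."""
    rng = random.Random(seed)
    outcomes = sorted(sum(rng.choice(history) for _ in range(weeks_ahead))
                      for _ in range(trials))
    # A conservative figure: the count reached or exceeded in 85% of simulated futures.
    return outcomes[int(0.15 * trials)]

print(forecast_items(weeks_ahead=4, history=weekly_throughput))
```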
Tooling hygiene ensures that estimation practices do not devolve into heavy documentation. The goal is to capture just enough rationale to provide context and traceability. For example, recording why a story was sized as an eight-point effort and linking it to its reference story provides transparency for later reflection. Lightweight charts showing size distribution or throughput trends can be useful, but they should not consume disproportionate effort. The danger is turning estimation into a bureaucratic exercise where maintaining spreadsheets overshadows the purpose of informed planning. Tooling hygiene strikes the balance: enough structure to support learning, not so much that it becomes overhead. It keeps estimation practices lean, transparent, and aligned with value. By curating only what is necessary, teams preserve agility while still enabling analysis and reflection when needed.
Remote estimation practices adapt collaborative methods for distributed teams. Techniques such as asynchronous voting, lightweight digital boards, and recorded summaries ensure inclusivity without exhausting calendars. For example, a distributed team might use an online tool where members submit estimates independently, with the results discussed in a short follow-up call. Recorded summaries allow absent members to catch up. The emphasis is on clarity and participation rather than prolonged debate. Remote practices recognize that collaboration must fit the realities of modern, globally distributed teams. They also leverage digital tools to preserve anonymity and reduce bias in ways that physical meetings sometimes cannot. By tailoring estimation to remote environments, teams maintain collective ownership while avoiding the pitfalls of time zone conflicts or meeting fatigue. This ensures that estimation remains collaborative, efficient, and aligned to team health regardless of geography.
Success measures determine whether sizing practices are actually improving delivery. Metrics such as forecast reliability, cycle-time stability, and reduced rework provide evidence that sizing is serving its purpose. If forecasts consistently align with outcomes, if cycle times show healthy distributions, and if rework decreases, then estimation is adding value. If not, it may be time to adjust methods, calibrate references, or shift toward flow-first planning. Success is not measured by the precision of numbers but by the quality of decisions and the health of delivery. Sizing should ultimately help teams and stakeholders make better choices, not create bureaucracy. Success measures provide the feedback loop that keeps estimation honest and adaptive. They reinforce the agile principle that practices are tools to serve outcomes, not rituals to be preserved for their own sake. When sizing improves clarity and predictability, it has succeeded.
In conclusion, sizing in agile practice is about collective understanding, relative comparisons, and informed forecasting, not about producing exact predictions. The second half of this episode has shown how probabilistic forecasting, percentile-based commitments, risk adjustments, and multi-team estimation strengthen planning under variability. Supporting practices like spike estimation, size-distribution analysis, and re-estimation policies keep sizing practical and adaptive. Stakeholder communication, flow-first alternatives, and remote practices ensure sizing is transparent and inclusive. Success measures tie it all back to outcomes, confirming that sizing improves reliability and decision-making. Together, these techniques demonstrate that estimation is not a promise but a dialogue—a way of illuminating uncertainty, framing choices, and aligning expectations. By treating sizing as a collaborative, relative measure rather than a score, teams retain honesty and agility in their planning, ensuring that work flows predictably without the illusion of false precision.
