Four Seasons Gear

Methodology

How we test + score outdoor gear.

Every review on this site is built from a four-dimension scoring rubric — Overall, Value, Durability, and Comfort — pulled together from a fixed set of public, attributable data sources. Here's exactly what each score means, how it's calculated, and what we deliberately won't do.

The four scores

What Overall, Value, Durability, and Comfort actually measure.

Overall

1.0 – 10.0

Weighted average of the three sub-scores plus a category-specific 'critical-flaw' modifier. A single deal-breaker (a rain shell that wets out, a stove that won't simmer) caps Overall at 6.5 no matter how the sub-scores look.

Inputs

  • All three sub-scores below
  • Whether the product fails its primary job in any common-but-not-extreme condition
  • Long-term reputation signals from public trail reports and editorial outlets — never Amazon customer reviews
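The cap-and-blend rule above can be sketched as a small function. The weights here are invented for illustration (the actual blend and the per-category modifier aren't published); only the 6.5 critical-flaw cap comes from the rubric itself:

```python
def overall_score(value, durability, comfort, has_critical_flaw,
                  weights=(0.3, 0.35, 0.35)):
    """Weighted average of the three sub-scores; a single critical
    flaw caps Overall at 6.5 no matter how good the sub-scores are.
    The weights are hypothetical -- the real ones vary by category."""
    w_val, w_dur, w_com = weights
    score = w_val * value + w_dur * durability + w_com * comfort
    if has_critical_flaw:
        score = min(score, 6.5)   # deal-breaker cap from the rubric
    return round(score, 1)
```

A shell that wets out would score `overall_score(8.0, 9.0, 9.0, has_critical_flaw=True)` and land at 6.5 even though its sub-scores average near 8.7.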

Value

1.0 – 10.0

Price-vs-performance against the three closest peers in the same category and price tier. A 9.5 means category-best for the dollar; a 5.0 means you're paying for a brand more than the gear.

Inputs

  • Current Amazon price (last checked timestamp printed on every review)
  • Spec parity vs. similarly-priced peers (weight, fabric, capacity, warranty)
  • Long-term cost: replaceable parts, reparability, expected lifespan
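One hypothetical way to express "price-vs-performance against the three closest peers" in code. The performance numbers, the linear mapping onto the 1.0–10.0 scale, and the name `value_score` are all assumptions made for illustration, not the site's actual formula:

```python
def value_score(price, performance, peers):
    """Sketch: compare a product's performance-per-dollar against its
    closest peers and map the ratio onto 1.0-10.0. `peers` is a list
    of (price, performance) tuples; the mapping is invented here."""
    ppd = performance / price
    peer_ppd = sum(perf / cost for cost, perf in peers) / len(peers)
    ratio = ppd / peer_ppd                # 1.0 == exactly average for the tier
    score = 5.0 + 5.0 * (ratio - 1.0)     # tier-average lands at 5.0
    return max(1.0, min(10.0, round(score, 1)))
```

A product priced and specced exactly like its peers scores 5.0 under this sketch, which matches the rubric's reading of 5.0 as "paying for a brand more than the gear."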

Durability

1.0 – 10.0

Materials and construction quality, plus how the gear is reported to age over multiple seasons. Heavily category-dependent — a 7.5 on a 14 oz running shoe means something different from a 7.5 on a 4 lb hardshell tent.

Inputs

  • Manufacturer-published fabric weights, denier, plate counts, etc.
  • Stitching, seams, zippers, closures, and other failure-prone components
  • Real-world wear reports from public trail journals (no Amazon customer reviews)
  • Warranty terms — a meaningful warranty is a durability signal

Comfort & fit

1.0 – 10.0

How the gear feels in use over a representative window for its category — a few hours for a stove, a full multi-day trip for a backpack or sleeping bag. We also score how forgiving the fit is to body type and edge cases.

Inputs

  • Manufacturer fit specs (torso range, last shape, cut)
  • Cross-section of public field reports flagging hot spots, fit mismatches, sleep-temperature issues
  • Documented adjustability range (straps, vents, cinches)

Awards on roundup pages

How "Best Overall," "Best Value," and the rest are assigned.

On every "Best of" roundup, the at-a-glance matrix awards four badges — one per scoring dimension. The leader in each dimension wins that badge, full stop. There is no manual override, no editor's-choice tiebreaker, no "Best Overall" award shuffled because of brand familiarity. If two products tie within 0.1, we award both and leave the call to the reader.

The point of the matrix is to make the scoring legible. If our "Best Value" pick only scores 6.2 — that means the whole category is mid right now, and the matrix will say so. We won't manufacture a 9.0 to make the page feel like an endorsement.
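The badge rule is simple enough to state as code. This is a sketch of the logic described above; the function name and data shape are assumptions, not the site's production code:

```python
def assign_badge(products, dimension, tie_margin=0.1):
    """The leader in a dimension wins its badge; any product within
    tie_margin of the leader shares it. No overrides, no tiebreakers."""
    top = max(p[dimension] for p in products)
    # round before comparing so one-decimal scores don't lose a tie
    # to floating-point noise
    return [p["name"] for p in products
            if round(top - p[dimension], 2) <= tie_margin]
```

With Value scores of 8.4, 8.3, and 7.1, the first two tie within 0.1 and both take the "Best Value" badge.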


Data sources

Where the numbers come from.

  • Manufacturer specs: primary source for weight, dimensions, fabrics, ratings, capacity, warranty.
  • Independent editorial reviews: OutdoorGearLab, Switchback Travel, Section Hiker, Wirecutter, Backpacker, Outside, GearJunkie.
  • Public trail reports: long-distance trail journals (Trailjournals.com, blogs, YouTube field reviews) where we can attribute the trip and conditions.
  • Subreddit field intelligence: r/Ultralight, r/CampingGear, r/wildernessbackpacking and category-specific subs — we read for patterns, not for individual quotes.
  • Hands-on testing: where we have direct experience with the product or a generation-adjacent version of it. Always called out explicitly when present.
  • Public repair / failure logs: manufacturer recall pages, REI customer-feedback pages on durability, retailer Q&A — anywhere the failure modes for a product are documented in public.

What we won't do

The hard rules.

  1. Use Amazon customer reviews or Amazon star ratings as a data source. The Associates Operating Agreement prohibits it, and crowd ratings are heavily skewed by sample bias and review-gating.
  2. Inflate a score because a brand sponsors a podcast, advertises elsewhere, or sends review samples. We don't accept review samples in exchange for guaranteed coverage.
  3. Bury a critical flaw in the 'cons' list instead of the prose. If a product fails its core job, the verdict says so first, before anything else.
  4. Recommend a product just because it's the cheapest when the cheapest is wrong. Value is performance per dollar, not price alone.
  5. Use stock photography of a product and imply we have one in hand. Hero images are clearly editorial; manufacturer product photos are clearly labeled.