How SubChoice Rates AI Plans

The exact scoring methodology behind every SubChoice rating: 5 criteria, 8 dimensions, and calibrated anchors.

Scoring Criteria

Each dimension score is derived from five weighted criteria:

SubChoice scoring criteria and weights

| Criterion | Weight | What It Measures |
| --- | --- | --- |
| Feature coverage | 30% | How many relevant features for this use case the plan includes |
| Model quality | 25% | Quality/capability of the AI models available in the plan |
| Usage limits | 20% | How generous the plan's usage allowances are for this use case |
| Value | 15% | Price relative to what you get for this specific use case |
| Bundled tools | 10% | Relevant bundled tools/integrations included in the plan |
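
To make the aggregation concrete, here is a minimal sketch in Python of how the five weighted criteria could roll up into a single dimension score. The function name, input shape, and round-then-clamp rule are illustrative assumptions; the published scoring algorithm is the authoritative specification.

```python
# Minimal sketch: weighted roll-up of the five criteria into one dimension
# score. Weights come from the table above; the function name, input shape,
# and round-then-clamp rule are assumptions for illustration.

WEIGHTS = {
    "feature_coverage": 0.30,
    "model_quality": 0.25,
    "usage_limits": 0.20,
    "value": 0.15,
    "bundled_tools": 0.10,
}

def dimension_score(criteria: dict[str, float]) -> int:
    """Combine per-criterion scores (each 1-10) into a 1-10 integer score."""
    weighted = sum(WEIGHTS[name] * criteria[name] for name in WEIGHTS)
    return max(1, min(10, round(weighted)))

# Example: strong features and models, modest usage limits.
print(dimension_score({
    "feature_coverage": 9,
    "model_quality": 8,
    "usage_limits": 6,
    "value": 7,
    "bundled_tools": 5,
}))  # -> 7 (weighted sum is 7.45)
```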

Dimension Scale Definitions

Plans are scored 1–10 per dimension. Each tier has concrete, observable criteria — not subjective impressions.

Score scale labels and meanings

| Score | Label | Meaning |
| --- | --- | --- |
| 9–10 | Excellent | Purpose-built for this use case |
| 7–8 | Strong | Very capable, minor gaps |
| 5–6 | Adequate | Usable but not optimized |
| 3–4 | Limited | Can technically do it, significant limitations |
| 1–2 | Not designed | Not intended for this use case |
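
A short sketch of how these bands could be looked up in code, assuming integer scores; the helper name is hypothetical.

```python
# Hypothetical helper: map a 1-10 integer score to its tier label,
# using the bands in the table above.
def score_label(score: int) -> str:
    if score >= 9:
        return "Excellent"
    if score >= 7:
        return "Strong"
    if score >= 5:
        return "Adequate"
    if score >= 3:
        return "Limited"
    return "Not designed"

print(score_label(7))  # -> Strong
```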

Coding

Software development, debugging, code review, and code generation — evaluated on how well the plan supports a developer's day-to-day workflow.

Coding dimension scoring tiers

| Score | Tier | Criteria |
| --- | --- | --- |
| 9–10 | Excellent | Dedicated AI IDE or deep IDE extension with inline completions, multi-file context editing, code agents with autonomous execution, terminal integration, and support for all major languages and frameworks |
| 7–8 | Strong | Excellent code generation via chat, large context window enabling multi-file review, some agentic coding capability or IDE integration, supports all major languages |
| 5–6 | Adequate | Can write and explain code in chat but lacks IDE integration; limited context window constrains multi-file work; no autonomous code execution |
| 3–4 | Limited | Basic code output via general-purpose chat, no specialized coding tools, no IDE integration, struggles with complex multi-file codebases |
| 1–2 | Not designed | Writing, SEO, or creative tool with no dedicated code features; incidental code output only |

Writing

Blog posts, marketing copy, long-form articles, creative writing, and content optimization — evaluated on language model quality for text generation and any writing-specific tooling.

Writing dimension scoring tiers

| Score | Tier | Criteria |
| --- | --- | --- |
| 9–10 | Excellent | Purpose-built writing platform with brand voice customization, SEO integration, template library (50+), bulk content generation, plagiarism detection, tone analysis, and top-tier language models |
| 7–8 | Strong | Top-tier language model with exceptional prose quality, long-form document support, revision workflow, voice consistency; possibly missing specialized writing features, but output quality is excellent |
| 5–6 | Adequate | Good language model capable of writing assistance, but limited document context, no brand voice, no writing-specific templates or workflows |
| 3–4 | Limited | Can produce written content but not optimized for it; designed for another purpose (coding, image gen), limited text context, basic prose |
| 1–2 | Not designed | Coding IDE or specialized tool where writing is a side effect, not a feature |

Research

Deep research, information synthesis, multi-source analysis, and knowledge retrieval — evaluated on web search quality, context capacity for document analysis, and ability to synthesize complex information.

Research dimension scoring tiers

| Score | Tier | Criteria |
| --- | --- | --- |
| 9–10 | Excellent | Purpose-built research platform with multi-source web search, automatic citation, academic database access, multi-document synthesis in a single session, and structured report generation |
| 7–8 | Strong | High-quality web search with citation, large context window enabling full-document analysis, strong synthesis capability, can handle multi-step research questions accurately |
| 5–6 | Adequate | Has web search but limited depth; moderate context window; can answer factual questions but struggles with complex multi-source synthesis or long document analysis |
| 3–4 | Limited | Primarily uses training data, limited or no web search, cannot analyze uploaded documents thoroughly, struggles with questions requiring current information |
| 1–2 | Not designed | Coding IDE or creative tool where research is not a designed capability; provides no web search or document analysis |

Creative

Image generation, video creation, graphic design, and visual creative projects — evaluated on native image/video generation capability, quality of creative output, and breadth of visual tools.

Creative dimension scoring tiers

| Score | Tier | Criteria |
| --- | --- | --- |
| 9–10 | Excellent | Purpose-built creative platform with native high-quality image or video generation, multiple style options, editing/inpainting tools, commercial license, high output volume |
| 7–8 | Strong | Native image generation with good quality and style diversity, sufficient for most creative tasks; may lack video or advanced editing features |
| 5–6 | Adequate | Has image generation but limited quality, style range, or generation quota; not the primary use case of the platform |
| 3–4 | Limited | Basic image generation as a side feature, very limited quota, lower quality relative to dedicated creative tools |
| 1–2 | Not designed | No native image or video generation; tool is built for text, code, or research, so creative output is not a designed capability |

Business

Project management, business documentation, team productivity, meeting notes, and organizational workflows — evaluated on team collaboration features, document management, workflow automation, and integrations with business tools.

Business dimension scoring tiers

| Score | Tier | Criteria |
| --- | --- | --- |
| 9–10 | Excellent | Purpose-built business platform with project management, databases, shared knowledge base, workflow automation, SSO/SCIM, admin controls, deep Slack/Jira/Google Workspace integration |
| 7–8 | Strong | Strong document creation and summarization, good business writing, some workflow automation; team features present but not the primary focus; works well in business settings |
| 5–6 | Adequate | Useful for business writing and document review, but minimal native collaboration, no project management, limited integrations |
| 3–4 | Limited | Can generate business documents in chat but lacks any native business tooling: no PM features, no integrations, no team collaboration |
| 1–2 | Not designed | Coding IDE or creative tool; business productivity is incidental, with no collaboration or document management features |

Learning

Education, tutoring, skill development, and structured knowledge acquisition — evaluated on ability to explain concepts at varying levels, generate quizzes/exercises, provide Socratic dialogue, and support a learner's comprehension arc.

Learning dimension scoring tiers

| Score | Tier | Criteria |
| --- | --- | --- |
| 9–10 | Excellent | Purpose-built tutoring platform with structured curriculum, adaptive difficulty, spaced repetition, quiz generation, progress tracking, and expert tutors for specific domains |
| 7–8 | Strong | Excellent at explaining complex concepts at any level, generates practice problems and quizzes on demand, engages in Socratic dialogue, large context for extended learning sessions |
| 5–6 | Adequate | Can explain concepts and answer follow-up questions, but limited session context, no structured curriculum, does not adapt to learner level proactively |
| 3–4 | Limited | Can answer factual questions about a topic but not optimized for teaching: no quiz generation, no adaptive explanation depth, no structured pedagogy |
| 1–2 | Not designed | Coding IDE or narrow-domain tool with no teaching capability; explanations are incidental to primary function |

General

Daily assistant tasks — Q&A, casual chat, task management, scheduling assistance, general productivity, and anything that doesn't fit a specialized category — evaluated on versatility, response quality, and breadth of handled task types.

General dimension scoring tiers

| Score | Tier | Criteria |
| --- | --- | --- |
| 9–10 | Excellent | Highly versatile assistant that handles any daily task well; fast, accurate, multi-modal, persistent memory, proactive suggestions, handles ambiguous or casual requests gracefully |
| 7–8 | Strong | Very capable general assistant; handles most daily tasks reliably; good conversation quality; may lack memory or be slightly slower; minimal refusals on non-sensitive topics |
| 5–6 | Adequate | Useful for general questions and casual chat but has noticeable gaps: limited memory, topic restrictions, slower, or less accurate on off-the-wall requests |
| 3–4 | Limited | Can answer basic questions but designed for a specific context; feels awkward for casual or unrelated daily tasks; limited instruction-following for varied requests |
| 1–2 | Not designed | Purpose-built tool where general chat is actively off-scope (coding IDE, SEO tool); general queries are tolerated but not supported |

Automation

Workflow automation, AI agents, multi-step automated tasks, pipelines, and autonomous task execution — evaluated on native agent capabilities, API/integration access, multi-step reasoning reliability, and ability to complete tasks with minimal human supervision.

Automation dimension scoring tiers

| Score | Tier | Criteria |
| --- | --- | --- |
| 9–10 | Excellent | Purpose-built automation platform or agent framework with visual workflow builder, 100+ integrations, reliable multi-step autonomous execution, error handling, and scheduling |
| 7–8 | Strong | Native agent mode with multi-step task execution, tool use (web browsing, code execution, file management), API access, and reliable task completion on complex workflows |
| 5–6 | Adequate | Some agentic capability but limited reliability on complex multi-step tasks, limited integrations, requires more human oversight than a dedicated automation tool |
| 3–4 | Limited | Basic task chaining in chat, no true autonomous execution, API available but not automation-native; can describe workflows but cannot reliably execute them |
| 1–2 | Not designed | No agent or automation features; tool is designed for synchronous interactive use only |

Calibration Anchor Table

All scores are calibrated against these five anchor vendors (Pro tier). Non-anchor plans derive scores relative to their anchor using documented tier delta rules. Source: Board Round 4 (2026-03-23) — 7-member consensus.

Calibration anchor scores by plan and dimension

| Plan | Coding | Writing | Research | Creative | Business | Learning | General | Automation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ChatGPT Plus | 7 | 8 | 8 | 7 | 7 | 8 | 9 | 4 |
| Claude Pro | 8 | 9 | 8 | 5 | 7 | 8 | 8 | 5 |
| Gemini Pro | 6 | 7 | 7 | 6 | 6 | 7 | 8 | 3 |
| Cursor Pro | 9 | 2 | 2 | 1 | 2 | 3 | 3 | 5 |
| Windsurf Pro | 8 | 2 | 2 | 1 | 2 | 3 | 3 | 4 |
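
To illustrate the anchor-plus-delta idea, the sketch below offsets an anchor row by per-dimension deltas and clamps to the 1–10 scale. The anchor values are the ChatGPT Plus row above; the delta values, function name, and clamping rule are placeholders, not SubChoice's documented tier delta rules.

```python
# Sketch of anchor-plus-delta derivation. The anchor row is ChatGPT Plus
# from the table above; the deltas below are invented placeholders, not
# SubChoice's documented tier delta rules.

CHATGPT_PLUS = {
    "coding": 7, "writing": 8, "research": 8, "creative": 7,
    "business": 7, "learning": 8, "general": 9, "automation": 4,
}

def apply_deltas(anchor: dict[str, int], deltas: dict[str, int]) -> dict[str, int]:
    """Offset each anchor score by its delta, clamped to the 1-10 scale."""
    return {dim: max(1, min(10, score + deltas.get(dim, 0)))
            for dim, score in anchor.items()}

# Example: a hypothetical lower tier that mainly loses usage headroom.
lower_tier = apply_deltas(CHATGPT_PLUS, {"coding": -2, "general": -1, "automation": -1})
print(lower_tier["coding"], lower_tier["general"])  # -> 5 8
```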

See these scores in action

The AI Stack Optimizer uses this scoring methodology to recommend the best AI tool combination for your workflow — with real savings calculations.

Try the Stack Optimizer →

Frequently Asked Questions

How does SubChoice score AI tools?

SubChoice rates each AI tool plan across 8 use-case dimensions on a 1–10 integer scale. Each score is derived from 5 weighted criteria: Feature coverage (30%), Model quality (25%), Usage limits (20%), Value (15%), and Bundled tools (10%). Scores are calibrated against anchor vendors to ensure consistency.

What does a score of 9 or 10 mean?

A score of 9–10 means best-in-class for that use case. The plan offers comprehensive, specialized features with minimal limitations. For example, a coding score of 10 indicates a dedicated code-first platform with advanced IDE integration, agent capabilities, and generous usage limits.

How often are scores updated?

Scores are reviewed whenever a vendor updates their pricing, features, or model lineup. Each vendor file includes a last_verified date showing when the data was last confirmed against the vendor's live pricing page.
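
As a sketch of how that freshness metadata might be used, the snippet below flags a vendor record whose last_verified date has passed a cutoff. The record shape and the 90-day window are assumptions; only the last_verified field name comes from this page.

```python
from datetime import date

# Hypothetical staleness check: the record shape and 90-day window are
# assumptions; only the last_verified field is documented on this page.
def is_stale(vendor: dict, today: date, max_age_days: int = 90) -> bool:
    verified = date.fromisoformat(vendor["last_verified"])
    return (today - verified).days > max_age_days

print(is_stale({"plan": "Example Pro", "last_verified": "2025-01-15"},
               today=date(2025, 6, 1)))  # -> True (137 days old)
```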

Who decides the scores?

Scores are determined by a documented algorithm based on observable criteria, not subjective opinion. The algorithm was reviewed and approved by SubChoice's advisory board (CTO, CDO, and Skeptic roles). Every score has a written rationale in the scores-rationale document.

Can I see the full scoring algorithm?

Yes. The complete scoring algorithm, including per-dimension field mappings, tier definitions, and delta rules for pricing tiers, is published in our documentation. This page summarizes the key elements; the full technical specification is available in our open-source repository.

Ready to compare? Compare AI plans side by side.