tool-evaluation

Name: tool-evaluation
Author: elophanto/EloPhanto

$npx mdskill add elophanto/EloPhanto/tool-evaluation

Evaluate tools with weighted scoring and TCO analysis for strategic adoption.

Identifies hidden costs and compares vendor stability against business needs.
Integrates web_search for market research and shell_execute for testing.
Calculates ROI scenarios using quantitative metrics and qualitative feedback.
Delivers structured recommendations with clear success metrics and timelines.

SKILL.md

.github/skills/tool-evaluationView on GitHub ↗

---
name: tool-evaluation
description: Evaluate, test, and recommend tools and platforms with weighted scoring, TCO analysis, and strategic adoption planning. Adapted from msitarzewski/agency-agents.
---

## Triggers

- tool evaluation
- software comparison
- platform assessment
- tool recommendation
- vendor evaluation
- technology assessment
- tool selection
- software review
- buy vs build
- tool adoption
- platform comparison
- TCO analysis
- tool ROI
- vendor selection
- technology stack

## Instructions

### Requirements Gathering and Tool Discovery
- Conduct stakeholder interviews to understand requirements and pain points
- Research market landscape and identify potential tool candidates using `web_search`
- Define evaluation criteria with weighted importance based on business priorities
- Establish success metrics and evaluation timeline

### Comprehensive Tool Testing
- Set up structured testing environment with realistic data and scenarios
- Test functionality, usability, performance, security, and integration capabilities
- Conduct user acceptance testing with representative user groups
- Document findings with quantitative metrics and qualitative feedback using `shell_execute`

### Financial and Risk Analysis
- Calculate total cost of ownership (TCO) including hidden costs and scaling fees
- Assess vendor stability and strategic alignment
- Evaluate implementation risk and change management requirements
- Analyze ROI scenarios with different adoption rates and usage patterns
- Use `knowledge_write` to store evaluation frameworks and past results

### Implementation Planning and Vendor Selection
- Create detailed implementation roadmap with phases and milestones
- Negotiate contract terms and service level agreements
- Develop training and change management strategy
- Establish success metrics and monitoring systems

### Evaluation Standards
- Always test tools with real-world scenarios and actual user data
- Use quantitative metrics and statistical analysis for comparisons
- Validate vendor claims through independent testing and user references
- Document evaluation methodology for reproducible decisions
- Consider long-term strategic impact beyond immediate feature requirements
- Factor in training, migration, and change management costs

### Weighted Evaluation Criteria
- Functionality: 25% -- Core feature completeness
- Usability: 20% -- User experience and ease of use
- Performance: 15% -- Speed, reliability, scalability
- Security: 15% -- Data protection and compliance
- Integration: 10% -- API quality and system compatibility
- Support: 8% -- Vendor support quality and documentation
- Cost: 7% -- Total cost of ownership and value

## Deliverables

### Tool Evaluation Report Template

```markdown
# [Tool Category] Evaluation and Recommendation Report

## Executive Summary
**Recommended Solution**: [Top-ranked tool with key differentiators]
**Investment Required**: [Total cost with ROI timeline and break-even analysis]
**Implementation Timeline**: [Phases with milestones and resource requirements]
**Business Impact**: [Quantified productivity gains and efficiency improvements]

## Evaluation Results
**Tool Comparison Matrix**: [Weighted scoring across all criteria]
**Category Leaders**: [Best-in-class tools for specific capabilities]
**Performance Benchmarks**: [Quantitative performance testing results]
**User Experience Ratings**: [Usability testing results across user roles]

## Financial Analysis
**Total Cost of Ownership**: [3-year TCO breakdown with sensitivity analysis]
**ROI Calculation**: [Projected returns with different adoption scenarios]
**Cost Comparison**: [Per-user costs and scaling implications]
**Budget Impact**: [Annual budget requirements and payment options]

## Risk Assessment
**Implementation Risks**: [Technical, organizational, and vendor risks]
**Security Evaluation**: [Compliance, data protection, vulnerability assessment]
**Vendor Assessment**: [Stability, roadmap alignment, partnership potential]
**Mitigation Strategies**: [Risk reduction and contingency planning]

## Implementation Strategy
**Rollout Plan**: [Phased implementation with pilot and full deployment]
**Change Management**: [Training strategy, communication plan, adoption support]
**Integration Requirements**: [Technical integration and data migration planning]
**Success Metrics**: [KPIs for measuring implementation success and ROI]

---
**Confidence Level**: [High/Medium/Low with supporting methodology]
**Next Review**: [Scheduled re-evaluation timeline]
```

## Success Metrics

- 90% of tool recommendations meet or exceed expected performance after implementation
- 85% successful adoption rate for recommended tools within 6 months
- 20% average reduction in tool costs through optimization and negotiation
- 25% average ROI achievement for recommended tool investments
- 4.5/5 stakeholder satisfaction rating for evaluation process and outcomes

## Verify

- The intended other agent / tool / channel actually received the message; an ack, message ID, or response payload is captured
- Identity, scopes, and permissions used by the call were the minimum required; over-permissioned tokens are called out
- Failure handling was exercised: at least one retry/timeout/permission-denied path is shown to behave as designed
- Hand-off context passed to the next actor is complete enough that the receiver could act without a follow-up question
- Any state mutated (config, memory, queue, file) is listed with before/after values, not just 'updated'
- Sensitive material (keys, tokens, PII) was redacted from logs/transcripts shared in the verification evidence