Code Smells and Detection
When to Use
When reviewing code, performing maintenance, or evaluating technical debt related to duplication.
Duplication Code Smells
| Code Smell | What to Look For | Severity |
|---|---|---|
| Exact duplication | Copy-pasted code blocks | High -- refactor immediately |
| Semantic duplication | Same business logic, different implementations | High -- consolidate |
| Structural duplication | Similar patterns serving different purposes | Low -- may be acceptable |
| Magic numbers repeated | Same constants hardcoded in multiple places | Medium -- extract to config |
| Parallel inheritance hierarchies | Adding subclass requires adding another elsewhere | High -- redesign abstraction |
| Shotgun surgery | Single change requires editing many files | High -- indicates missing abstraction |
Detection Techniques
1. Manual Code Review
# Checklist during code review:
# - Are there repeated code blocks? (exact duplication)
# - Are business rules expressed in multiple places? (knowledge duplication)
# - Are constants hardcoded repeatedly? (magic numbers)
# - Do multiple functions have similar structure? (structural duplication)
# - Would a change require editing multiple files? (shotgun surgery)
2. Automated Tools
| Tool | Language Support | Strengths | Use Case |
|---|---|---|---|
| jscpd | 150+ languages | Fast, supports many formats | Multi-language projects |
| PMD CPD | 31+ languages (Java, C++, Python, Go) | Comprehensive, XML/HTML reports | Java-focused shops |
| SonarQube | 30+ languages | Integrates with CI/CD, tracks over time | Enterprise codebases |
| Simian | Java, C#, C++, Ruby, etc. | Cross-language detection | Polyglot codebases |
| Clone Digger | Python | Deep semantic analysis | Python projects |
# Example: jscpd for detecting duplication
npx jscpd src/ --min-lines 5 --min-tokens 50 --format json
# Example: PMD CPD for Java
pmd cpd --minimum-tokens 50 --files src/ --language java --format text
# Example: SonarQube scanner
sonar-scanner \
-Dsonar.projectKey=my-project \
-Dsonar.sources=src \
-Dsonar.host.url=http://localhost:9000
3. Metrics to Track
| Metric | What It Measures | Healthy Range |
|---|---|---|
| Duplication % | Percentage of duplicated code | < 5% |
| Duplication blocks | Number of duplicated code blocks | Trend down over time |
| Shotgun surgery instances | Files changed per feature | < 5 files per change |
| Cyclomatic complexity | Branching/conditionals in functions | < 10 per function |
Identifying Wrong Abstractions
// RED FLAGS for wrong abstractions:
// 1. Lots of boolean parameters (configuration smell)
function sendNotification(
user: User,
message: string,
isEmail: boolean,
isSms: boolean,
isPush: boolean,
isUrgent: boolean
) {
// Too many conditionals based on booleans
if (isEmail) { /* ... */ }
if (isSms) { /* ... */ }
// This should be separate functions or strategy pattern
}
// 2. Null/undefined passed frequently (abstraction doesn't fit all callers)
processOrder(order, null, undefined, false); // What do these mean?
// 3. Constantly modifying abstraction for new cases
class Exporter {
export(data, format) {
if (format === 'csv') { /* ... */ }
else if (format === 'json') { /* ... */ }
else if (format === 'xml') { /* ... */ }
else if (format === 'pdf') { /* ... */ } // Keep adding...
// This should be strategy pattern or separate classes
}
}
// 4. Abstraction name is vague
class DataManager { /* What does this even do? */ }
class Helper { /* Helper for what? */ }
class Utility { /* Too generic */ }
Code Review Checklist
When reviewing for duplication:
- [ ] Is there exact code duplication (copy-paste)?
- [ ] Is the same business logic expressed in multiple places?
- [ ] Are there magic numbers or strings that should be constants?
- [ ] If I change a business rule, how many files do I need to edit?
- [ ] Are there similar functions that could be parameterized or generalized?
- [ ] Is this abstraction helping or hiding complexity?
- [ ] Could a developer understand this code in 6 months?
When creating abstractions:
- [ ] Have I seen this pattern at least 3 times?
- [ ] Do all instances represent the same knowledge?
- [ ] Is the abstraction simpler than the duplication?
- [ ] Are requirements stable enough to abstract?
- [ ] Can I give the abstraction a clear, specific name?
- [ ] Does the abstraction have fewer than 5 parameters?
- [ ] Will this abstraction make future changes easier?
Refactoring Strategy
1. Identify duplication (tools + manual review)
|
2. Classify type:
- Exact -> Extract function/class immediately
- Knowledge -> Consolidate to single source
- Incidental -> Leave separate
- Imposed -> Code generation or accept
|
3. Apply Rule of Three:
- 1-2 instances -> Wait
- 3+ instances -> Consider refactoring
|
4. Choose abstraction strategy:
- Simple logic -> Extract function
- State + behavior -> Extract class
- Cross-cutting -> Decorator/middleware
- Variation -> Strategy pattern
|
5. Validate:
- Tests pass
- Code is simpler
- Future changes are easier
- No hidden coupling introduced
Common Mistakes
- Relying only on tools — Tools catch syntactic duplication, miss semantic/knowledge duplication
- Ignoring structural duplication — Sometimes similar code serves different purposes (OK to keep)
- Not tracking duplication over time — Need baseline and trend analysis
- Refactoring without tests — Risky; write tests first, then refactor
- Treating all duplication as urgent — Prioritize knowledge duplication and security-critical code