Performance¶
Optimizing bash scripts for speed and efficiency. Know when to optimize and when to use a different tool.
The First Rule¶
Premature optimization is the root of all evil. - Donald Knuth
Before optimizing:
- Does the script run fast enough?
- Is the bottleneck in bash or external commands?
- Would a different language be more appropriate?
Profiling Scripts¶
Simple Timing¶
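The simplest approach is the time keyword on the whole script. A minimal sketch, assuming a script named ./myscript.sh (hypothetical):
time ./myscript.sh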
Output:
real 0m5.032s # Wall clock time
user 0m1.234s # CPU time in user mode
sys 0m0.567s # CPU time in kernel mode
Timing Sections¶
#!/usr/bin/env bash
start=$(date +%s.%N)
# ... code section ...
end=$(date +%s.%N)
echo "Section took: $(echo "$end - $start" | bc) seconds"
Xtrace Profiling with PS4¶
#!/usr/bin/env bash
# Send trace output to a timestamped log on fd 3, leaving stderr alone
exec 3>/tmp/bashprofile.$$
BASH_XTRACEFD=3
PS4='+ $(date +%s.%N) ${FUNCNAME[0]:+${FUNCNAME[0]}(): }line $LINENO: '
set -x
# ... your script ...
set +x
exec 3>&-
Using time for Loops¶
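Because time is a shell keyword, it can wrap an entire loop or command group. A minimal sketch (the glob and pattern are illustrative):
time {
    for file in *.txt; do
        grep -c "pattern" "$file" > /dev/null
    done
}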
Common Performance Issues¶
External Commands in Loops¶
Problem: Each external command forks a new process.
Solution: Use bash built-ins.
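For example, replacing a per-file basename call with parameter expansion saves one fork per iteration (a minimal sketch; the *.txt glob is illustrative):
# Slow - forks basename once per file
for path in *.txt; do
    echo "$(basename "$path")"
done
# Fast - parameter expansion runs inside the shell
for path in *.txt; do
    echo "${path##*/}"
done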
Unnecessary Subshells¶
Problem: $() creates a subshell.
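A common case is command substitution around a plain variable (a minimal illustration):
# Slow - forks a subshell just to echo the variable
copy=$(echo "$var")
# Fast - plain assignment, no subshell
copy="$var"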
cat Abuse¶
Problem: Useless use of cat.
# Slow
cat file.txt | grep "pattern"
# Fast
grep "pattern" file.txt
# Also slow
content=$(cat file.txt)
# Fast
content=$(<file.txt)
Reading Files¶
Problem: Loading entire file into memory.
Solution: Stream processing.
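A sketch of the difference, assuming a hypothetical process_line function and input file:
# Loads the whole file into memory before any work starts
content=$(<big.log)
# Streams the file one line at a time with constant memory
while IFS= read -r line; do
    process_line "$line"
done < big.log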
String Operations¶
Problem: External tools for simple strings.
# Slow
filename=$(basename "$path")
extension=$(echo "$path" | sed 's/.*\.//')
# Fast - parameter expansion
filename="${path##*/}"
extension="${path##*.}"
Built-in vs External¶
Use Built-ins When Possible¶
| Task | External (Slow) | Built-in (Fast) |
|---|---|---|
| Arithmetic | expr 5 + 3 | $((5 + 3)) |
| Substring | echo "$s" \| cut -c1-5 | ${s:0:5} |
| Replace | echo "$s" \| sed 's/a/b/' | ${s/a/b} |
| Basename | basename "$p" | ${p##*/} |
| Dirname | dirname "$p" | ${p%/*} |
| Length | echo "$s" \| wc -c | ${#s} |
| Uppercase | echo "$s" \| tr a-z A-Z | ${s^^} |
| Test file | test -f "$f" | [[ -f "$f" ]] |
When External is Faster¶
For large-scale text processing, specialized tools beat bash:
# For processing large files, awk is faster
awk '{sum += $1} END {print sum}' hugefile.txt
# Faster than
sum=0
while read num; do
((sum += num))
done < hugefile.txt
Array Performance¶
Appending to Arrays¶
# Slow - recreates array each time
for i in {1..1000}; do
arr=("${arr[@]}" "$i")
done
# Fast - append operator
for i in {1..1000}; do
arr+=("$i")
done
Array vs String¶
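Appending to a string copies the growing string on every iteration, so large collections are usually faster as arrays (a minimal sketch):
# Slow for large collections - each append copies the whole string
list=""
for f in *.txt; do
    list="$list $f"
done
# Fast - array append does not copy existing elements
files=()
for f in *.txt; do
    files+=("$f")
done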
Loop Optimizations¶
Move Invariants Out¶
# Slow - calculates every iteration
for file in *.txt; do
base_dir=$(pwd)
process "$base_dir/$file"
done
# Fast - calculate once
base_dir=$(pwd)
for file in *.txt; do
process "$base_dir/$file"
done
Batch External Commands¶
# Slow - one process per file
for file in *.txt; do
wc -l "$file"
done
# Fast - one process for all
wc -l *.txt
Use find -exec + or xargs¶
# Slow - one process per file
find . -name "*.txt" -exec wc -l {} \;
# Fast - batch processing
find . -name "*.txt" -exec wc -l {} +
# Or with xargs (null-delimited to handle spaces in names)
find . -name "*.txt" -print0 | xargs -0 wc -l
Conditional Optimizations¶
Short-Circuit Evaluation¶
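Order conditions so cheap tests run first and expensive work runs only when needed (a sketch; full_validation and is_remote_reachable are hypothetical helpers):
# The expensive check runs only if the cheap file test succeeds
[[ -f "$config" ]] && full_validation "$config"
# Put the cheap string test before the slow network check
if [[ "$mode" == "strict" ]] && is_remote_reachable "$host"; then
    echo "running strict checks"
fi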
Case vs If-Elif¶
For many conditions, case can be faster:
# Many elif branches
if [[ "$cmd" == "start" ]]; then
start
elif [[ "$cmd" == "stop" ]]; then
stop
elif [[ "$cmd" == "restart" ]]; then
restart
fi
# case - often faster for string matching
case "$cmd" in
start) start ;;
stop) stop ;;
restart) restart ;;
esac
I/O Optimizations¶
Batch Output¶
# Slow - many small writes
for i in {1..1000}; do
echo "$i" >> output.txt
done
# Fast - single write
{
for i in {1..1000}; do
echo "$i"
done
} > output.txt
Avoid Repeated File Opens¶
# Slow - opens file 1000 times
for i in {1..1000}; do
echo "$i" >> output.txt
done
# Fast - opens once
exec 3>>output.txt
for i in {1..1000}; do
echo "$i" >&3
done
exec 3>&-
Process Substitution vs Temp Files¶
# With temp file
cmd1 > /tmp/temp.txt
cmd2 < /tmp/temp.txt
rm /tmp/temp.txt
# With process substitution - no temp file
cmd2 < <(cmd1)
Parallel Execution¶
Simple Parallelism¶
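Independent tasks can run as background jobs and be collected with wait (a minimal sketch; process is a hypothetical command):
for item in "${items[@]}"; do
    process "$item" &
done
wait   # blocks until every background job has finished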
Controlled Parallelism¶
max_jobs=4
for item in "${items[@]}"; do
while (( $(jobs -rp | wc -l) >= max_jobs )); do
sleep 0.1
done
process "$item" &
done
wait
Using GNU Parallel¶
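If GNU parallel is installed, it manages the job pool and argument passing for you (a sketch; the globs and job counts are illustrative):
# Compress files with up to 4 jobs at a time
find . -name "*.log" | parallel -j 4 gzip
# Or pass arguments directly; {} is replaced by each item
parallel -j "$(nproc)" wc -l {} ::: *.txt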
When Not to Use Bash¶
Consider other languages when:
- Processing large data sets
- Complex data structures needed
- Floating-point math required
- Performance is critical
- Cross-platform compatibility needed
Better Alternatives¶
| Task | Better Tool |
|---|---|
| JSON processing | jq, Python |
| Large text processing | awk, sed, Python |
| Complex logic | Python, Ruby |
| Numerical computing | Python, R |
| Web requests | Python, curl |
Benchmarking¶
Compare Approaches¶
#!/usr/bin/env bash
echo "Testing external command:"
time for i in {1..1000}; do
result=$(expr $i + 1)
done
echo "Testing arithmetic expansion:"
time for i in {1..1000}; do
((result = i + 1))
done
Iterations Matter¶
Run enough iterations to get meaningful results:
# Too few - noise dominates
time for i in {1..10}; do operation; done
# Better
time for i in {1..10000}; do operation; done
Summary¶
Quick Wins¶
| Instead of | Use |
|---|---|
| $(echo "$var") | "$var" |
| cat file \| cmd | cmd < file |
| $(cat file) | $(<file) |
| expr $a + $b | $((a + b)) |
| basename "$p" | ${p##*/} |
| echo "$s" \| wc -c | ${#s} |
Performance Checklist¶
- Avoid external commands in loops
- Use parameter expansion for strings
- Use (( )) for arithmetic
- Batch I/O operations
- Use [[ ]] instead of [ ]
- Consider parallel execution
- Use appropriate tools for large data
- Profile before optimizing
Rules of Thumb¶
- External command: ~10ms overhead per call
- Subshell: ~1-5ms overhead
- Built-in operations: microseconds
- File operations: depends on I/O
Remember: Clarity often matters more than micro-optimizations. Optimize only when necessary and profile to find real bottlenecks.