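Problem description
I am trying to point iconv at a directory and have every file in it converted to UTF-8, regardless of its current encoding.
I am using the script below, but it requires you to specify the source encoding. How can I make it detect the current encoding automatically?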
#!/bin/bash
ICONVBIN='/usr/bin/iconv' # path to iconv binary
if [ $# -lt 3 ]
then
    echo "$0 dir from_charset to_charset"
    exit 1
fi
# Quote every expansion so file names with spaces survive
for f in "$1"/*
do
    if test -f "$f"
    then
        echo -e "\nConverting $f"
        /bin/mv "$f" "$f.old"
        "$ICONVBIN" -f "$2" -t "$3" "$f.old" > "$f"
    else
        echo -e "\nSkipping $f - not a regular file"
    fi
done
Terminal command
sudo convert/dir_iconv.sh convert/books CURRENT_ENCODING utf8
Best approach
Perhaps you are looking for enca:
Enca is an Extremely Naive Charset Analyser. It detects character set and encoding of text files and can also convert them to other encodings using either a built-in converter or external libraries and tools like libiconv, librecode, or cstocs.
Currently it supports Belarusian, Bulgarian, Croatian, Czech, Estonian, Hungarian, Latvian, Lithuanian, Polish, Russian, Slovak, Slovene, Ukrainian, Chinese, and some multibyte encodings independently on language.
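Note that, in general, auto-detecting the current encoding is a hard problem: the same byte sequence can be valid text in several encodings. enca uses heuristics based on the language you tell it to expect, which limits the number of candidate encodings. You can use enconv to convert text files to a single encoding.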
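As a rough sketch of that workflow, assuming enca and enconv are installed: the helper name detect_with_enca, the file name book.txt and the choice of Russian as the language hint are all illustrative assumptions, not part of the answer above.

```shell
#!/bin/bash
# Sketch: ask enca for a file's charset, given a language hint.
# detect_with_enca is a hypothetical helper name.
detect_with_enca() {
    local lang=$1 file=$2
    # Fail early if enca is not installed.
    command -v enca >/dev/null 2>&1 || { echo "enca not found" >&2; return 127; }
    # -L sets the language hint; -i prints the charset under the name iconv uses.
    enca -L "$lang" -i "$file"
}

# Example (assumes book.txt contains Russian text):
#   detect_with_enca russian book.txt
#   enconv -L russian -x UTF-8 book.txt   # converts the file in place
```

If the language is not known, enca also accepts "none" as the language, which restricts detection to the language-independent multibyte encodings.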
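Second approach
You can get what you need with the standard utilities file and awk. For example:
file -bi .xsession-errors
gives me: "text/plain; charset=us-ascii"
so file -bi .xsession-errors | awk -F "=" '{print $2}'
gives me "us-ascii"
I use it in a script like this: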
CHARSET="$(file -bi "$i"|awk -F "=" '{print $2}')"
if [ "$CHARSET" != utf-8 ]; then
iconv -f "$CHARSET" -t utf8 "$i" -o outfile
fi
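A possible refinement, offered as a sketch only: the charset can be pulled out of the MIME string with shell parameter expansion instead of awk, and values that iconv cannot use can be filtered out first. The function names are hypothetical, and treating us-ascii, binary and unknown-8bit as "skip" is an assumption about the desired behaviour.

```shell
#!/bin/bash
# Extract the charset from a MIME string such as "text/plain; charset=us-ascii".
charset_from_mime() {
    local mime=$1
    case $mime in
        *charset=*) printf '%s\n' "${mime##*charset=}" ;;
        *) return 1 ;;   # no charset field present
    esac
}

# Succeed only when the detected charset is worth passing to iconv.
needs_conversion() {
    case $1 in
        utf-8|us-ascii) return 1 ;;          # ASCII is already valid UTF-8
        binary|unknown-8bit|'') return 1 ;;  # file could not name a real charset
        *) return 0 ;;
    esac
}

# Example:
#   CHARSET=$(charset_from_mime "$(file -bi "$i")")
#   needs_conversion "$CHARSET" && iconv -f "$CHARSET" -t UTF-8 "$i" -o outfile
```

file can also print the charset on its own with file -b --mime-encoding "$i", which removes the need for any parsing.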
Third approach
Putting it all together. Go to the directory and create dir2utf8.sh:
#!/bin/bash
# Convert all regular files in the current directory to UTF-8
for f in *
do
    if test -f "$f"; then
        echo -e "\nConverting $f"
        CHARSET="$(file -bi "$f" | awk -F "=" '{print $2}')"
        if [ "$CHARSET" != utf-8 ]; then
            # Write to a temporary file first; reading and writing the same file can truncate it
            iconv -f "$CHARSET" -t utf8 "$f" -o "$f.tmp" && mv "$f.tmp" "$f"
        fi
    else
        echo -e "\nSkipping $f - not a regular file"
    fi
done
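Converting a file onto itself is the risky step in a loop like this, because iconv may start writing the output before it has finished reading the input. A minimal sketch of a safer in-place conversion follows; convert_in_place is a hypothetical helper name, and using mktemp next to the original file is an assumption about where temporary files may be created.

```shell
#!/bin/bash
# Convert one file to UTF-8 in place by going through a temporary file.
# convert_in_place is a hypothetical helper name.
convert_in_place() {
    local file=$1 from=$2 tmp
    # Create the temporary file beside the original so mv stays on one filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if iconv -f "$from" -t UTF-8 "$file" > "$tmp"; then
        mv "$tmp" "$file"      # replace the original only after a clean conversion
    else
        rm -f "$tmp"           # leave the original untouched on failure
        return 1
    fi
}

# Example:
#   convert_in_place notes.txt ISO-8859-1
```

One trade-off of this design: the replaced file takes on the temporary file's permissions, so restore the mode afterwards if it matters.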
Fourth approach
Here is my solution, which converts all files in place:
#!/bin/bash
apt-get -y install recode uchardet > /dev/null
# The directory to convert is taken from the first argument
find "$1" -type f | while IFS= read -r FFN
do
    encoding=$(uchardet "$FFN")
    echo "$FFN: $encoding"
    enc=$(echo "$encoding" | sed 's#^x-mac-#mac#')
    set +x
    recode "$enc..UTF-8" "$FFN"
done
https://gist.github.com/demofly/25f856a96c29b89baa32
Put it in convert-dir-to-utf8.sh and run:
bash convert-dir-to-utf8.sh /pat/to/my/trash/dir
Note that the sed call is a workaround for Mac encodings. Many uncommon encodings need a workaround like this.
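When more of these workarounds pile up, one option is to collect them in a single function. The sketch below is only an illustration: normalize_charset is a hypothetical helper, the x-mac- rule mirrors the sed expression in the script, and the strings used for "nothing detected" are assumptions, since what uchardet prints in that case may differ between versions.

```shell
#!/bin/bash
# Map a charset name reported by uchardet to the name passed on to recode.
# normalize_charset is a hypothetical helper; extend the case list as needed.
normalize_charset() {
    local name=$1
    case $name in
        x-mac-*) printf '%s\n' "mac${name#x-mac-}" ;;  # same rewrite as the sed call
        unknown|ascii/unknown|'') return 1 ;;           # assumed "not detected" values
        *) printf '%s\n' "$name" ;;                     # pass everything else through
    esac
}

# Example inside the loop:
#   enc=$(normalize_charset "$(uchardet "$FFN")") || continue
#   recode "$enc..UTF-8" "$FFN"
```

Skipping files whose encoding was not detected also keeps recode from being called with an invalid charset name.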