How to change the encoding during CSV parsing in Rails

I would like to know how I can change the encoding of my CSV file when I import and parse it. I have this code:

csv = CSV.parse(output, :headers => true, :col_sep => ";")
csv.each do |row|
  row = row.to_hash.with_indifferent_access
  insert_data_method(row)
end

When I read my file, I get this error:

Encoding::CompatibilityError in FileImportingController#load_file
incompatible character encodings: ASCII-8BIT and UTF-8

I read about row.force_encoding('utf-8'), but it does not work:

NoMethodError in FileImportingController#load_file
undefined method `force_encoding' for #<ActiveSupport::HashWithIndifferentAccess:0x2905ad0>

Thanks.

3 Answers

#1


15  

I had to read CSV files encoded in ISO-8859-1. Doing the documented

CSV.foreach(filename, encoding:'iso-8859-1:utf-8', col_sep: ';', headers: true) do |row|

threw the exception

ArgumentError: invalid byte sequence in UTF-8
    from csv.rb:2027:in '=~' 
    from csv.rb:2027:in 'init_separators' 
    from csv.rb:1570:in 'initialize' 
    from csv.rb:1335:in 'new' 
    from csv.rb:1335:in 'open' 
    from csv.rb:1201:in 'foreach'

so I ended up reading the file and converting it to UTF-8 while reading, then parsing the string:

CSV.parse(File.open(filename, 'r:iso-8859-1:utf-8'){|f| f.read}, col_sep: ';', headers: true, header_converters: :symbol) do |row|
    pp row
end
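
For anyone who wants to try this without a real file, here is a minimal self-contained sketch of the same idea. The sample data, the column names and the Tempfile are made up for illustration; only the 'r:iso-8859-1:utf-8' mode string and the CSV.parse options come from the answer above.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data: a semicolon-separated CSV stored as ISO-8859-1 bytes.
sample = "name;city\nJosé;Málaga\n".encode('ISO-8859-1')

file = Tempfile.new(['sample', '.csv'])
file.binmode            # write the bytes as-is, without any transcoding
file.write(sample)
file.close

# 'r:iso-8859-1:utf-8' means: read as ISO-8859-1 (external encoding)
# and transcode to UTF-8 (internal encoding) while reading.
text = File.open(file.path, 'r:iso-8859-1:utf-8') { |f| f.read }

rows = CSV.parse(text, col_sep: ';', headers: true, header_converters: :symbol)
first = rows.first
puts first[:name]
puts first[:name].encoding

file.unlink
```

The string handed to CSV.parse is already UTF-8 here, so the parser never sees the ISO-8859-1 bytes.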

#2


3  

force_encoding is meant to be run on a string, but it looks like you're calling it on a hash. You could say:

output.force_encoding('utf-8')
csv = CSV.parse(output, :headers => true, :col_sep => ";")
...
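
A slightly fuller sketch of that suggestion, assuming the uploaded bytes really are UTF-8 and are merely tagged as ASCII-8BIT (the sample string below is invented). force_encoding only relabels the string and does not convert any bytes, so if the file is actually in another encoding such as ISO-8859-1, the transcoding approach from answer #1 is probably the better fit.

```ruby
require 'csv'

# Hypothetical upload: the bytes are valid UTF-8 but arrive tagged as binary (ASCII-8BIT).
output = "name;city\nJosé;Málaga\n".b

# force_encoding changes only the encoding label, not the bytes,
# and it modifies the string in place.
output.force_encoding('utf-8')

# If the bytes are not really UTF-8, relabeling does not help, so check before parsing.
raise ArgumentError, 'input is not valid UTF-8' unless output.valid_encoding?

csv = CSV.parse(output, headers: true, col_sep: ';')
csv.each do |row|
  puts row.to_hash.inspect
end
```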

#3


0  

Hey I wrote a little blog post about what I did, but it's slightly more verbose than what's already been posted. For whatever reason, I couldn't get those solutions to work and this did.

The gist is that I simply replace (or in my case, remove) the invalid/undefined characters in my file, then rewrite it. I used this method to convert the files:

require 'tempfile'

def convert_to_utf8_encoding(original_file)
  original_string = original_file.read
  # Invalid and undefined characters are removed; change `replace:` to substitute something else.
  final_string = original_string.encode(invalid: :replace, undef: :replace, replace: '')
  # A Tempfile is enough; there is no need to save a real File.
  final_file = Tempfile.new('import')
  final_file.write(final_string)
  # Close the file so the content is flushed before it is read again.
  final_file.close
  final_file
end

Hope this helps.

Edit: No destination encoding is specified here because encode, when called without one, transcodes to the default internal encoding (Encoding.default_internal), which for most Rails applications is UTF-8 (I believe).
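
If relying on the default feels fragile, the destination (and source) encoding can be passed explicitly. A minimal sketch with made-up sample bytes; it shows two variants, and neither is necessarily what was run for the blog post:

```ruby
# Hypothetical sample: "café" stored as ISO-8859-1 bytes (\xE9 is é in that encoding).
latin1_bytes = "caf\xE9".b

# Variant 1: the source encoding is known, so transcode explicitly.
# The arguments are (destination, source); nothing depends on Encoding.default_internal.
converted = latin1_bytes.encode('UTF-8', 'ISO-8859-1')

# Variant 2: the source encoding is unknown. Treating the bytes as binary and
# replacing with '' drops every non-ASCII byte, so some data is lost.
stripped = latin1_bytes.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')

puts converted   # café
puts stripped    # caf
```

Variant 1 keeps the accented character, so it is worth finding out the real source encoding before falling back to stripping.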
