
Specify the columns to read from file #154

@daschw


I often have to deal with CSV files that were edited by someone in Excel, which results in something like this
test.csv:

datacol1, datacol2, ,
0, 1, , [some annoying comment]

When I try to read such a file with CSV.read("test.csv"), I get:

MethodError: Cannot `convert` an object of type WeakRefString{UInt8} to an object of type Missings.Missing
This may have arisen from a call to the constructor Missings.Missing(...),
since type constructors fall back to convert methods.
setindex!(::Array{Missings.Missing,1}, ::WeakRefString{UInt8}, ::Int64) at array.jl:583
streamto! at io.jl:303 [inlined]
macro expansion at DataStreams.jl:547 [inlined]
stream!(::CSV.Source{Base.AbstractIOBuffer{Array{UInt8,1}},Missings.Missing}, ::Type{DataStreams.Data.Field}, ::DataFrames.DataFrameStream{Tuple{Array{Int64,1},Array{Int64,1},Array{Missings.Missing,1},CategoricalArrays.CategoricalArray{String,1,UInt32,String,CategoricalArrays.CategoricalString{UInt32},Union{}}}}, ::DataStreams.Data.Schema{true,Tuple{Int64,Int64,Missings.Missing,CategoricalArrays.CategoricalValue{String,UInt32}}}, ::Int64, ::NTuple{4,Base.#identity}, ::DataStreams.Data.##15#16, ::Array{Any,1}, ::Type{Ref{(:datacol1, :datacol2, Symbol(""), Symbol(""))}}) at DataStreams.jl:614
#stream!#17(::Bool, ::Dict{Int64,Function}, ::Function, ::Array{Any,1}, ::Array{Any,1}, ::Function, ::CSV.Source{Base.AbstractIOBuffer{Array{UInt8,1}},Missings.Missing}, ::Type{DataFrames.DataFrame}) at DataStreams.jl:490
(::DataStreams.Data.#kw##stream!)(::Array{Any,1}, ::DataStreams.Data.#stream!, ::CSV.Source{Base.AbstractIOBuffer{Array{UInt8,1}},Missings.Missing}, ::Type{DataFrames.DataFrame}) at <missing>:0
#read#43(::Bool, ::Dict{Int64,Function}, ::Bool, ::Array{Any,1}, ::Function, ::String, ::Type{T} where T) at Source.jl:312
read(::String) at Source.jl:311
include_string(::String, ::String) at loading.jl:522
eval(::Module, ::Any) at boot.jl:235
(::Atom.##63#66)() at eval.jl:104
withpath(::Atom.##63#66, ::Void) at utils.jl:30
withpath(::Function, ::Void) at eval.jl:38
macro expansion at eval.jl:103 [inlined]
(::Atom.##62#65{Dict{String,Any}})() at task.jl:80

It would be nice to be able to specify which columns to read with a keyword argument, like pandas' usecols, so that such issues are easy to avoid.
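Until such an option exists, one possible workaround is to drop the unwanted columns before the text reaches the parser. The following is a minimal sketch using only Julia Base; `keep_columns` is a hypothetical helper (not part of CSV.jl), and it assumes a plain comma delimiter with no quoted fields that themselves contain commas.

```julia
# Hypothetical helper, not part of CSV.jl.
# Keeps only the fields at the given 1-based column indices from each line.
# Assumes a plain ',' delimiter and no quoted fields containing commas.
function keep_columns(text::AbstractString, cols::AbstractVector{<:Integer})
    out = IOBuffer()
    for line in eachline(IOBuffer(text))
        fields = split(line, ',')
        # Strip surrounding spaces; use "" when a row is shorter than requested
        kept = [i <= length(fields) ? strip(fields[i]) : "" for i in cols]
        println(out, join(kept, ','))
    end
    return String(take!(out))
end

raw = """
datacol1, datacol2, ,
0, 1, , [some annoying comment]
"""

cleaned = keep_columns(raw, [1, 2])
print(cleaned)  # prints "datacol1,datacol2" and "0,1" on separate lines
```

The cleaned text could then be written to a file and read as usual; whether a particular CSV.jl version also accepts an in-memory `IOBuffer` as its source should be checked against that version's documentation.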
