From: Junhui Liao on



> File.open("../original_data/test_2lines.tsv") do |file|
>   first_line = file.readline
>   first_line_times = first_line.chomp.split("\t").each_slice(2).map {|time,signal| time}
>   write_line_to_file first_line
>   file.each_line do |record|
>     line_data = record.chomp.split("\t")
>     write_line_to_file line_data, first_line_times
>   end
> end
>
> def write_line_to_file line, base_time = Hash.new(0)
>   line_data.each_slice(2).with_index do |(time,signal), index|
>     File.open("#{index}_debug_split"+".tsv" , "w") do |f|
>       f << "#{time.to_i - base_time[index]}\t#{signal}\n"
>     end
>   end
> end
>
> Hope this gives you an idea to explore,
>
> Jesus.


Dear Jesús Gabriel y Galán,

Thanks a lot for your second comment.

However, I ran into a few problems when running the
script.

1. For the line "write_line_to_file first_line" and
the line "write_line_to_file line_data, first_line_times",
I get the following error message:

in `block (2 levels) in <class:File_spliter>': undefined method
`write_line_to_file' for File_spliter:Class (NoMethodError)

It seems that "write_line_to_file" is not defined before it is called.
I tried wrapping your code in a class, but the same error
occurred.


2. I don't understand these two lines.
2.1. "write_line_to_file first_line": the definition of
write_line_to_file takes two arguments, but only one is passed here,
and I don't know the purpose of this line.

2.2. "line_data.each_slice(2).with_index do |(time,signal), index|":
why is it not "line.each_slice(2).with_index do |(time,signal), index|"?

Thanks again !
Best regards,
Junhui


--
Posted via http://www.ruby-forum.com/.

From: Jesús Gabriel y Galán on
On Fri, Jul 30, 2010 at 2:18 PM, Junhui Liao <junhui.liao(a)uclouvain.be> wrote:
>
>
>
>> File.open("../original_data/test_2lines.tsv") do |file|
>>   first_line = file.readline
>>   first_line_times = first_line.chomp.split("\t").each_slice(2).map {|time,signal| time}
>>   write_line_to_file first_line
>>   file.each_line do |record|
>>     line_data = record.chomp.split("\t")
>>     write_line_to_file line_data, first_line_times
>>   end
>> end
>>
>> def write_line_to_file line, base_time = Hash.new(0)
>>   line_data.each_slice(2).with_index do |(time,signal), index|
>>     File.open("#{index}_debug_split"+".tsv" , "w") do |f|
>>       f << "#{time.to_i - base_time[index]}\t#{signal}\n"
>>     end
>>   end
>> end
>>
>> Hope this gives you an idea to explore,
>>
>> Jesus.
>
>
> Dear Jesús Gabriel y Galán,
>
> Thanks a lot for your second comment.
>
> However, I ran into a few problems when running the
> script.
>
> 1. For the line "write_line_to_file first_line" and
> the line "write_line_to_file line_data, first_line_times",
> I get the following error message:
>
> in `block (2 levels) in <class:File_spliter>': undefined method
> `write_line_to_file' for File_spliter:Class (NoMethodError)
>
> It seems that "write_line_to_file" is not defined before it is called.
> I tried wrapping your code in a class, but the same error
> occurred.

Sorry, the method write_line_to_file should be defined before the
other block of code, so that it exists when it's called.
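
A minimal sketch of why the order matters, assuming a plain top-level script. The method name greet is made up for illustration and is not part of the script above. A def only takes effect once Ruby has executed it, so a call that runs earlier fails:

```ruby
# Calling a top-level method before its `def` has been executed fails.
# `greet` is a hypothetical name used only for this illustration.
early_error = begin
  greet("too early")
  nil
rescue NoMethodError => e
  e
end

def greet(name)
  "hello, #{name}"
end

# After the def has run, the same call works.
late_result = greet("now defined")

puts early_error.class
puts late_result
```

Moving the def above the File.open block has the same effect as the second call here.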

>
>
> 2. I don't understand these two lines.
> 2.1. "write_line_to_file first_line": the definition of
> write_line_to_file takes two arguments, but only one is passed here,
> and I don't know the purpose of this line.

It has two arguments, but the second is optional: if it is not passed,
it defaults to Hash.new(0), a hash that returns 0 for any missing key.
See the def of the method.
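
As a rough illustration of the optional argument (offsets_for is a hypothetical helper written for this example, not the method from the script), the default Hash.new(0) makes the subtraction a no-op when no base times are given:

```ruby
# Sketch of an optional second argument whose default is Hash.new(0).
# `offsets_for` is a hypothetical helper, not part of the script above.
def offsets_for(times, base_time = Hash.new(0))
  times.each_with_index.map { |time, index| time.to_i - base_time[index].to_i }
end

# One argument: base_time is an empty hash with default 0, so nothing is subtracted.
without_base = offsets_for(["10", "20"])

# Two arguments: an array of base times is indexed by position instead.
with_base = offsets_for(["10", "20"], ["3", "5"])

p without_base
p with_base
```

Both a Hash and an Array respond to [] with an integer index, which is why the same method body works for either kind of second argument.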

>
> 2.2. "line_data.each_slice(2).with_index do |(time,signal), index|":
> why is it not "line.each_slice(2).with_index do |(time,signal), index|"?

It's a mistake on my part. Copy/paste error.

Jesus.

From: Junhui Liao on
Dear Jesús Gabriel y Galán,

It's quite strange. I now get the following error message
for this line:

every_line.each_slice(2).with_index do |(time,signal), index|

split_time_subtract.rb:5:in `write_line_to_file': undefined method
`each_slice' for #<String:0x9ebe608> (NoMethodError)


My Ruby version is 1.9.1, and I checked that 'each_slice' works
on an array like this:

irb(main):001:0> [1,2,3].each_slice(2){|s| p s}
[1, 2]
[3]
=> nil


Any suggestions ?
Thanks a lot in advance!

Junhui

From: Jesús Gabriel y Galán on
On Fri, Jul 30, 2010 at 4:00 PM, Junhui Liao <junhui.liao(a)uclouvain.be> wrote:
> Dear Jesús Gabriel y Galán,
>
> It's quite strange. I now get the following error message
> for this line:
>
> every_line.each_slice(2).with_index do |(time,signal), index|
>
> split_time_subtract.rb:5:in `write_line_to_file': undefined method
> `each_slice' for #<String:0x9ebe608> (NoMethodError)

What I'm typing is untested; it's just to give you some ideas. In any
case, this error occurs because I meant the first_line variable to contain
an array, but I misplaced the call to split:

Replace this:

>> first_line = file.readline
>> first_line_times = first_line.chomp.split("\t").each_slice(2).map {|time,signal| time}

with this:

first_line = file.readline.chomp.split("\t")
first_line_times = first_line.each_slice(2).map {|time,signal| time}
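
A small sketch of the difference, using a made-up sample line of tab-separated time/signal pairs (the values are invented for illustration): the raw line is a String, which has no each_slice, while the result of split is an Array, which does.

```ruby
# A line read from a file is a String; String does not provide each_slice.
line = "0.0\t1.5\t0.1\t2.5\n" # made-up sample: time/signal pairs separated by tabs
string_has_each_slice = line.respond_to?(:each_slice)

# After chomp.split("\t") it is an Array, which gets each_slice from Enumerable.
fields = line.chomp.split("\t")
pairs  = fields.each_slice(2).to_a
times  = fields.each_slice(2).map { |time, signal| time }

p string_has_each_slice
p pairs
p times
```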

Jesus.

From: Junhui Liao on
Hi, Jesus.

Thanks a lot for your help!
I modified the script a little and got it running as expected.

Here is the code:

def write_line_to_file every_line, base_time = Hash.new(0)
  every_line.each_slice(2).with_index do |(time,signal), index|
    File.open("header_split_#{index}"+".tsv" , "a") do |f|
      f << "#{time.to_f - base_time[index].to_f}\t#{signal}\n"
    end
  end
end

  #count = 0
  first_line = file.readline.chomp.split("\t")
  # counter +=1
  # if counter >= 2
  # puts "here!"
  first_line_times = first_line.each_slice(2).map{|time,signal| time}
  file.each_line do |record|
    line_data = record.chomp.split("\t")
    write_line_to_file line_data, first_line_times
  end
end


However, at least two things still need to be improved.
Item 1: this code took about 2 hours to write its output into 4096 files.
For reference, the original TSV file is around 250 MB. Are there
any tricks to speed it up?

Item 2: the original data has a 21-line header. I could delete it
by hand before running the script, but I would like to update the script
so that it skips the first 21 header lines itself.

I tried two ways to do this, but both failed.

The first was based on how a CSV file's header is read:

require 'csv'
reader = File.open("../data/test_7_10lines.tsv") do |file|
header = reader.shift
reader.each {|row| process(header,2)} # Suppose the first two lines are header.

The error message was:
undefined method `shift' for nil:NilClass (NoMethodError)

The other attempt was to insert the two lines that are commented out
in the code above:

counter +=1
if counter >= 2

But it seems that counter += 1 did not work at all, since
counter was always <= 1!

What's wrong, and do you have any suggestions?

Best regards,
Junhui