From: Junhui Liao on 29 Jul 2010 07:43

Dear all,

My script reads one original TSV file and distributes its contents into multiple new TSV files.

Each line of the original file looks like this: time_1, signal_1, time_2, signal_2, ... time_4096, signal_4096.
I would like to write them into file_1, file_2, ... file_4096 accordingly, so that these files contain time_1, signal_1; time_2, signal_2; ... time_4096, signal_4096 respectively.

My script works well only if the original file contains ONE line.
If the original file has two or more lines, I get an error message like the following:

new_split.rb:17: undefined method `+' for nil:NilClass (NoMethodError)

By tracing the output, I found that the script seems to read only ONE line, since the puts output looks like this:

........ (a lot of lines omitted here)
8182
"4.08963252844486E+00"
"-2.3E-03"
8184
"4.09063219413236E+00"
"-3.1E-03"
8186
"4.09163185987611E+00"
"-7E-04"
8188
"4.09263152560423E+00"
"-3.7E-03"
8190
"4.09363119136048E+00"
"3.6E-03"
8192
nil
nil

And my script looks like this:

@a = []
@itemnum = 4096
@counter = 0
@linenum = 10
File.open("../original_data/test_2lines.tsv").each_line do |record| # "^M"
#File.open("../original_data/one_line.tsv").each_line do |record|
  @a = record.chomp.split("\t")
  @itemnum.times do |n|

    File.open("#{n}_debug_split"+".tsv" , "w") do |f|
      puts @counter
      puts @a[@counter].inspect + "\n"
      puts @a[@counter+1].inspect + "\n"
      f << @a[@counter] + "\t" + @a[@counter+1] + "\n"
      @counter += 2
    end
  end
end

Thanks a lot for your comments in advance!

Junhui

BTW, at the end of each line in the original TSV file there is a "^M" appended. I don't know where it comes from or whether it causes any problem.

--
Posted via http://www.ruby-forum.com/.
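The "^M" that some editors display is a carriage return (\r), which usually means the file was saved with Windows-style \r\n line endings. A small check with a hypothetical record suggests it is harmless here, because String#chomp with no argument strips a trailing \r\n as well as a plain \n:

```ruby
# Hypothetical record with a Windows-style (CRLF) line ending, i.e. the "^M".
record = "1.0\t-2.3E-03\t2.0\t3.6E-03\r\n"

# With no argument, String#chomp removes a trailing "\r\n", "\n" or "\r".
fields = record.chomp.split("\t")
p fields                   # → ["1.0", "-2.3E-03", "2.0", "3.6E-03"]

# Without chomp, the carriage return stays attached to the last field.
p record.split("\t").last  # → "3.6E-03\r\n"
```

So as long as each record goes through chomp before split, the "^M" should not end up in the output files.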
From: Jesús Gabriel y Galán on 29 Jul 2010 08:29

On Thu, Jul 29, 2010 at 1:43 PM, Junhui Liao <junhui.liao(a)uclouvain.be> wrote:
> Dear all,
>
> My script reads one original TSV file and distributes its contents into
> multiple new TSV files.
>
> Each line of the original file looks like this: time_1, signal_1, time_2,
> signal_2, ... time_4096, signal_4096.
> I would like to write them into file_1, file_2, ... file_4096
> accordingly, so that these files contain time_1, signal_1; time_2,
> signal_2; ... time_4096, signal_4096 respectively.
>
> My script works well only if the original file contains ONE line.
> If the original file has two or more lines, I get an error message like
> the following:
>
> new_split.rb:17: undefined method `+' for nil:NilClass (NoMethodError)

> And my script looks like this:
>
> @a = []

You don't need to declare this, because you assign to @a again later.

> @itemnum = 4096
> @counter = 0
> @linenum = 10

And, by the way, you probably don't need instance variables; local variables should suffice. itemnum looks like a constant and linenum is not used, so:

ITEM_NUM = 4096
counter = 0

> File.open("../original_data/test_2lines.tsv").each_line do |record| # "^M"
> #File.open("../original_data/one_line.tsv").each_line do |record|
>   @a = record.chomp.split("\t")

a = record.chomp.split("\t") # although fields or line_fields might be better names than a

>   @itemnum.times do |n|
>
>     File.open("#{n}_debug_split"+".tsv" , "w") do |f|
>       puts @counter
>       puts @a[@counter].inspect + "\n"
>       puts @a[@counter+1].inspect + "\n"
>       f << @a[@counter] + "\t" + @a[@counter+1] + "\n"
>       @counter += 2
>     end
>   end
> end

You are adding 2 to the counter on every iteration, but not resetting it after each line. So, when the second line starts, the counter is already 8192 (4096 iterations * 2), past the end of the array, so you try to get an element that is out of bounds. That returns nil, and calling the + method on nil raises the NoMethodError.
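A minimal sketch reproducing the failure with hypothetical data (two pairs per line instead of 4096), plus one way to keep the counter-based loop working by resetting the counter for each line. The helper name pairs_with_counter is made up for illustration:

```ruby
# Hypothetical data: each line holds 2 (time, signal) pairs instead of 4096.
fields = ["t1", "s1", "t2", "s2"]
p fields[4]   # → nil: indexing past the end of an Array returns nil

begin
  fields[4] + "\t" + fields[5]
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"   # same kind of error as in the thread
end

# Hypothetical fix: reset the counter at the start of every input line.
def pairs_with_counter(lines, items_per_line)
  pairs = []
  lines.each do |record|
    fields = record.chomp.split("\t")
    counter = 0                        # reset for each line
    items_per_line.times do
      pairs << [fields[counter], fields[counter + 1]]
      counter += 2
    end
  end
  pairs
end

p pairs_with_counter(["t1\ts1\tt2\ts2\n", "u1\tv1\tu2\tv2\n"], 2)
# → [["t1", "s1"], ["t2", "s2"], ["u1", "v1"], ["u2", "v2"]]
```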
I think you are complicating the issue with the counting and so on; Ruby iterators are usually a cleaner way to traverse lists of things. You can remove itemnum, counter and so on like this (untested):

File.open("../original_data/test_2lines.tsv").each_line do |record|
  a = record.chomp.split("\t")
  a.each_slice(2).with_index do |(time,signal), index|
    # "a" (append) so each input line adds a row instead of truncating the file
    File.open("#{index}_debug_split"+".tsv" , "a") do |f|
      f << "#{time}\t#{signal}\n"
    end
  end
end

Although this will open and close the 4096 files for every line. Are there many lines? If not, you can read the whole file and build a structure in memory (a hash of arrays) to store the lines that belong to each file, and then write each file in one go.

Jesus.
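The "hash of arrays" idea can be sketched as follows. This assumes Ruby 1.9 or later, that the whole input fits in memory, and a hypothetical helper name split_in_memory; output files follow the N_debug_split.tsv naming used in this thread:

```ruby
require 'tmpdir'

# Hypothetical helper: collects every (time, signal) pair per output file in a
# hash of arrays, then writes each output file exactly once.
# Assumes Ruby 1.9+ and that the whole input fits comfortably in memory.
def split_in_memory(input_path, out_dir)
  rows = Hash.new { |hash, key| hash[key] = [] }  # index => ["time\tsignal", ...]
  File.foreach(input_path) do |record|
    record.chomp.split("\t").each_slice(2).with_index do |(time, signal), index|
      rows[index] << "#{time}\t#{signal}"
    end
  end
  rows.each do |index, lines|
    File.open(File.join(out_dir, "#{index}_debug_split.tsv"), "w") do |f|
      f.puts lines  # IO#puts writes each array element on its own line
    end
  end
  rows.size         # number of output files written
end

# Usage with a small hypothetical two-line input in a temporary directory.
Dir.mktmpdir do |dir|
  input = File.join(dir, "input.tsv")
  File.open(input, "w") { |f| f.write("1.0\ta\t2.0\tb\r\n1.5\tc\t2.5\td\r\n") }
  split_in_memory(input, dir)
  print File.read(File.join(dir, "0_debug_split.tsv"))  # rows "1.0\ta" and "1.5\tc"
end
```

Each output file is opened once instead of once per input line; the trade-off is that every pair stays in memory until the end, which may be too much for a very large input.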
From: Junhui Liao on 29 Jul 2010 10:04

> You are adding 2 to the counter on every iteration, but not resetting it
> after each line. So, when the second line starts, the counter is already
> 8192 (4096 iterations * 2), past the end of the array, so you try to get
> an element that is out of bounds. That returns nil, and calling the +
> method on nil raises the NoMethodError.
>
> I think you are complicating the issue with the counting and so on; Ruby
> iterators are usually a cleaner way to traverse lists of things. You can
> remove itemnum, counter and so on like this (untested):

Many thanks for your comment!

> File.open("../original_data/test_2lines.tsv").each_line do |record|
>   a = record.chomp.split("\t")
>   a.each_slice(2).with_index do |(time,signal), index|
>     # "a" (append) so each input line adds a row instead of truncating the file
>     File.open("#{index}_debug_split"+".tsv" , "a") do |f|
>       f << "#{time}\t#{signal}\n"
>     end
>   end
> end

I tried the script, adding "require 'enumerator'", but I still get this error:

new_split_Jesus.rb:1:in `each_slice': no block given (LocalJumpError)

After searching this forum, I found that this is because the Ruby on my Mac is 1.8.6, while your code needs 1.9+. I don't quite understand the explanation given there ("requires a block to be passed to it"), though. Please refer to this link:
http://www.ruby-forum.com/topic/201095#new

> Although this will open and close the 4096 files for every line. Are
> there many lines? If not, you can read the whole file and build a
> structure in memory (a hash of arrays) to store the lines that belong
> to each file, and then write each file in one go.

Yes, my file has 2048 lines in total, about 260 MB, so reading the whole file into memory may not be very efficient.

Thanks again for your help!

Best,
Junhui
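If upgrading Ruby is not an option, the pairs can be walked by index instead of through each_slice. The sketch below uses a hypothetical helper each_pair_with_index built only on Integer#times, Array#[] and yield, so it should also run on 1.8.6 (an assumption; it has not been tried on that version here):

```ruby
# Hypothetical helper: walks an Array two elements at a time using only
# Integer#times and Array#[], so it does not rely on each_slice returning
# an Enumerator when called without a block.
def each_pair_with_index(fields)
  (fields.size / 2).times do |index|
    yield fields[2 * index], fields[2 * index + 1], index
  end
end

# Hypothetical split line with two (time, signal) pairs.
each_pair_with_index(["4.09E+00", "-2.3E-03", "4.10E+00", "3.6E-03"]) do |time, signal, index|
  puts "#{index}_debug_split.tsv <- #{time}\t#{signal}"
end
```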
From: Junhui Liao on 29 Jul 2010 19:58

Dear Jesús Gabriel y Galán and all,

> File.open("../original_data/test_2lines.tsv").each_line do |record|
>   a = record.chomp.split("\t")
>   a.each_slice(2).with_index do |(time,signal), index|
>     # "a" (append) so each input line adds a row instead of truncating the file
>     File.open("#{index}_debug_split"+".tsv" , "a") do |f|
>       f << "#{time}\t#{signal}\n"
>     end
>   end
> end

This code runs well on Ruby 1.9.1, which is the version on our server, where I tried it.

Actually, I also need to do the following: subtract the first line's time values from the corresponding time values of the other lines.

First line: time_1.1, signal_1.1, time_1.2, signal_1.2, ... time_1.4096, signal_1.4096.
Second line: time_2.1, signal_2.1, time_2.2, signal_2.2, ... time_2.4096, signal_2.4096.
.......

I would like to compute time_2.1 = time_2.1 - time_1.1, time_2.2 = time_2.2 - time_1.2, ..., time_2.4096 = time_2.4096 - time_1.4096, and similarly for the time values of the other lines.

I tried to use a counter to pick out the first line (a clumsy way, I know), save its times in an array, and subtract that array from the other lines' time values, but it failed. It seems that with the enumerator I cannot access individual values: "puts a[index]" prints both items (time and signal) fine, but I cannot print just the time or just the signal value.

Thanks a lot in advance!

Best,
Junhui
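One way to express the subtraction, as a sketch assuming the lines are already available as an array of strings (for a 260 MB file you would stream them instead) and using a hypothetical helper relative_times. The times look like "4.08963252844486E+00", so they are converted with String#to_f rather than to_i:

```ruby
# Hypothetical helper: for every line, subtract the first line's time at the
# same pair position; signals are passed through unchanged.
def relative_times(lines)
  rows = lines.map { |line| line.chomp.split("\t") }
  # Times sit at even positions (0, 2, 4, ...); keep them as Floats.
  base = rows.first.each_slice(2).map { |time, _signal| time.to_f }
  rows.map do |fields|
    fields.each_slice(2).each_with_index.map do |(time, signal), index|
      [time.to_f - base[index], signal]
    end
  end
end

lines = ["1.0E+00\ta\t2.0E+00\tb\n", "1.5E+00\tc\t2.5E+00\td\n"]
p relative_times(lines)
# → [[[0.0, "a"], [0.0, "b"]], [[0.5, "c"], [0.5, "d"]]]
```

In this sketch the first line's times become 0.0; to keep its absolute times, write the first line out before subtracting.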
From: Jesús Gabriel y Galán on 30 Jul 2010 03:33
On Fri, Jul 30, 2010 at 1:58 AM, Junhui Liao <junhui.liao(a)uclouvain.be> wrote:
> Dear Jesús Gabriel y Galán and all,
>
>> File.open("../original_data/test_2lines.tsv").each_line do |record|
>>   a = record.chomp.split("\t")
>>   a.each_slice(2).with_index do |(time,signal), index|
>>     # "a" (append) so each input line adds a row instead of truncating the file
>>     File.open("#{index}_debug_split"+".tsv" , "a") do |f|
>>       f << "#{time}\t#{signal}\n"
>>     end
>>   end
>> end
>
> This code runs well on Ruby 1.9.1, which is the version on our server,
> where I tried it.

BTW, I'm using 1.8.7. Also, File.open().each_line doesn't properly close the file, so we should be using File.foreach() instead.

> Actually, I also need to do the following: subtract the first line's time
> values from the corresponding time values of the other lines.
>
> First line: time_1.1, signal_1.1, time_1.2, signal_1.2, ... time_1.4096,
> signal_1.4096.
> Second line: time_2.1, signal_2.1, time_2.2, signal_2.2, ... time_2.4096,
> signal_2.4096.
> .......
>
> I would like to compute time_2.1 = time_2.1 - time_1.1,
> time_2.2 = time_2.2 - time_1.2, ..., time_2.4096 = time_2.4096 - time_1.4096,
> and similarly for the time values of the other lines.
>
> I tried to use a counter to pick out the first line (a clumsy way, I know),
> save its times in an array, and subtract that array from the other lines'
> time values, but it failed. It seems that with the enumerator I cannot
> access individual values: "puts a[index]" prints both items (time and
> signal) fine, but I cannot print just the time or just the signal value.

What I'd do is create an array with the times of the first line, and use it afterwards to subtract.
I've refactored a little bit to simplify (this is completely untested):

# Define the helper before calling it: a top-level script runs top to bottom.
def write_line_to_file line_data, base_time = Hash.new(0)
  line_data.each_slice(2).with_index do |(time,signal), index|
    # "a" (append) so each input line adds a row instead of truncating the file
    File.open("#{index}_debug_split"+".tsv" , "a") do |f|
      # times look like "4.08963252844486E+00", so use to_f rather than to_i
      f << "#{time.to_f - base_time[index]}\t#{signal}\n"
    end
  end
end

File.open("../original_data/test_2lines.tsv") do |file|
  first_line = file.readline.chomp.split("\t")
  first_line_times = first_line.each_slice(2).map {|time,signal| time.to_f}
  write_line_to_file first_line
  file.each_line do |record|
    line_data = record.chomp.split("\t")
    write_line_to_file line_data, first_line_times
  end
end

Hope this gives you an idea to explore,

Jesus.
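The File.foreach() suggestion can be sketched with a hypothetical helper count_fields_per_line and a temporary sample file. File.foreach opens the file, yields each line, and closes it when the loop finishes, whereas File.open(path).each_line leaves the handle open until it is garbage-collected or the process exits:

```ruby
require 'tempfile'

# Hypothetical helper: File.foreach closes the file itself once the loop ends.
def count_fields_per_line(path)
  counts = []
  File.foreach(path) do |record|
    counts << record.chomp.split("\t").size
  end
  counts
end

# Usage with a small temporary sample file.
sample = Tempfile.new("sample")
sample.write("1.0\ta\t2.0\tb\r\n1.5\tc\r\n")
sample.close
p count_fields_per_line(sample.path)  # → [4, 2]
sample.unlink
```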