Tuesday, 27 August 2013

Categorizing data and computing the average and standard deviation of each category


I'm writing code to categorize my data and compute the average and
standard deviation of each category. Here is an example of my data:
3917 1 -0.662261 25.148 22.9354 68.8076
3918 1 12.7649 18.7451 7.68473 69.0063
3919 1 -9.56836 -23.3265 -61.953 68.8357
3920 1 11.6292 31.6525 -29.3697 69.1372
3921 2 26.4837 -66.7897 12.0257 69.2282
3922 1 -9.81652 14.3788 9.38343 69.1217
3923 2 39.931 -88.1879 109.498 69.1604
3924 1 4.5502 3.53887 -6.59604 69.486
3925 2 13.6801 -24.6628 -5.7568 69.9398
3926 1 -10.5635 7.05517 -8.82785 70.2263
As you can see, there are 6 columns. I'm thinking of a 3-step calculation here.
First, categorize these numbers based on the 6th column, which consists of
floating-point numbers from 0 to n. I want to generate n sections (sub-matrices,
or whatever): 0~1, 1~2, 2~3, ..., n-1~n. Since the sections must contain all
of the data, n should be the last (largest) value rounded up. For example, if
the last number is 121.2513, the last section should be 121~122 so that it
contains that value.
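
The first step reduces to two integer operations. n is the largest value in the 6th column rounded up (`math.ceil`), and a value v belongs to the section that starts at floor(v). Here is a minimal sketch, assuming column 6 is never negative (the question says it runs from 0 to n). `bin_index` is a hypothetical helper name. A value exactly equal to n is placed in the last section, n-1~n, because floor(n) would otherwise point one past the end.

```python
import math

def bin_index(value, n_bins):
    """Return k such that value falls in section k~k+1 (assumes value >= 0).

    A value exactly equal to n_bins goes into the last section, so the
    final section n-1~n is treated as closed on both ends.
    """
    return min(math.floor(value), n_bins - 1)

# 6th column of the sample rows
col6 = [68.8076, 69.0063, 68.8357, 69.1372, 69.2282,
        69.1217, 69.1604, 69.486, 69.9398, 70.2263]

# n = largest value rounded up, e.g. 121.2513 -> 122 sections (0~1 ... 121~122)
n_bins = max(1, math.ceil(max(col6)))
indices = [bin_index(v, n_bins) for v in col6]
print(n_bins)   # 71
print(indices)  # [68, 69, 68, 69, 69, 69, 69, 69, 69, 70]
```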
Next, reallocate all the other numbers in columns 1-5 to their corresponding
subsections based on the 6th column. If a section contains no numbers, just
print it as 0. There will be n subsections, and the number of elements in each
subsection will vary.
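
The second step can be sketched as a list of n lists, one per section, filled in a single pass over the rows. Whole rows are stored, so columns 1-5 stay together. A section with no rows simply stays an empty list, and its length of 0 is the value that gets printed. `group_by_section` is a hypothetical name, and each row is assumed to be a list with the 6th column at index 5.

```python
import math

ROWS = [
    [3917, 1, -0.662261, 25.148, 22.9354, 68.8076],
    [3918, 1, 12.7649, 18.7451, 7.68473, 69.0063],
    [3919, 1, -9.56836, -23.3265, -61.953, 68.8357],
    [3920, 1, 11.6292, 31.6525, -29.3697, 69.1372],
    [3921, 2, 26.4837, -66.7897, 12.0257, 69.2282],
    [3922, 1, -9.81652, 14.3788, 9.38343, 69.1217],
    [3923, 2, 39.931, -88.1879, 109.498, 69.1604],
    [3924, 1, 4.5502, 3.53887, -6.59604, 69.486],
    [3925, 2, 13.6801, -24.6628, -5.7568, 69.9398],
    [3926, 1, -10.5635, 7.05517, -8.82785, 70.2263],
]

def group_by_section(rows, key_col=5):
    """Distribute whole rows into sections 0~1, ..., n-1~n by row[key_col]."""
    n_bins = max(1, math.ceil(max(r[key_col] for r in rows)))
    # One independent list per section ([[]] * n_bins would share one list).
    sections = [[] for _ in range(n_bins)]
    for r in rows:
        sections[min(math.floor(r[key_col]), n_bins - 1)].append(r)
    return sections

sections = group_by_section(ROWS)
for k in (67, 68, 69, 70):
    print(k, len(sections[k]))  # 67 0 / 68 2 / 69 7 / 70 1
```

The list comprehension matters here. `[[]] * n_bins` would create n references to a single shared list, so every row would appear in every section.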
Finally, get the average and standard deviation of the 3rd, 4th, and 5th
columns for each subsection, and write to the output file the number of
elements in the subsection, the starting value of the subsection, and the
average and standard deviation of the 3rd, 4th, and 5th columns.
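
For the third step, each section's statistics come down to a mean and a standard deviation over columns 3-5 (list indices 2-4). This sketch assumes the population standard deviation (`statistics.pstdev`, which follows the same convention as NumPy's default `np.std`), because the question does not say which one is wanted. `summarize_section`, the filename `output.txt`, and the space-separated, four-decimal layout are all hypothetical choices.

```python
import statistics

def summarize_section(start, rows, stat_cols=(2, 3, 4)):
    """Format one output line: count, section start, then mean and std per column.

    An empty section prints 0 for the count and for every statistic.
    Uses the population std (pstdev); statistics.stdev gives the sample
    version but needs at least two values.
    """
    fields = [str(len(rows)), str(start)]
    for c in stat_cols:
        values = [r[c] for r in rows]
        mean = statistics.mean(values) if values else 0.0
        std = statistics.pstdev(values) if values else 0.0
        fields += ["%.4f" % mean, "%.4f" % std]
    return " ".join(fields)

# The two sample rows that fall in section 68~69
section_68 = [
    [3917, 1, -0.662261, 25.148, 22.9354, 68.8076],
    [3919, 1, -9.56836, -23.3265, -61.953, 68.8357],
]

with open("output.txt", "w") as out:  # hypothetical output filename
    out.write(summarize_section(67, []) + "\n")          # empty section
    out.write(summarize_section(68, section_68) + "\n")
```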
I was trying this with multiple for loops, but it became too complex and
produced errors; in fact, my for loops are not working at all. Is there an
easier way in Python to categorize the data, work with each subsection, and
print the results? Could anyone suggest a simple example using this data?
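
Here is one possible end-to-end version using only the standard library. It follows the three steps with a single loop over the rows and a single loop over the sections. It is a sketch, not the only approach. NumPy (boolean masks) or pandas (`DataFrame.groupby`) could shorten it further. `categorize` is a hypothetical name, and the sample is read from a string so the snippet runs on its own. With a real file you would pass `open("data.txt")` instead (that filename is hypothetical too). Column 6 is assumed non-negative, and the standard deviation is the population version.

```python
import io
import math
import statistics
from collections import defaultdict

SAMPLE = """\
3917 1 -0.662261 25.148 22.9354 68.8076
3918 1 12.7649 18.7451 7.68473 69.0063
3919 1 -9.56836 -23.3265 -61.953 68.8357
3920 1 11.6292 31.6525 -29.3697 69.1372
3921 2 26.4837 -66.7897 12.0257 69.2282
3922 1 -9.81652 14.3788 9.38343 69.1217
3923 2 39.931 -88.1879 109.498 69.1604
3924 1 4.5502 3.53887 -6.59604 69.486
3925 2 13.6801 -24.6628 -5.7568 69.9398
3926 1 -10.5635 7.05517 -8.82785 70.2263
"""

def categorize(infile, outfile, key_col=5, stat_cols=(2, 3, 4)):
    """Bin rows by floor(row[key_col]); write count, start, mean/std per bin."""
    rows = [[float(x) for x in line.split()] for line in infile if line.strip()]
    n_bins = max(1, math.ceil(max(r[key_col] for r in rows)))

    # Steps 1 and 2: map each section start k to the rows in k~k+1
    sections = defaultdict(list)
    for r in rows:
        sections[min(math.floor(r[key_col]), n_bins - 1)].append(r)

    # Step 3: one line per section, including empty ones (all zeros)
    for k in range(n_bins):
        sec = sections.get(k, [])
        fields = [len(sec), k]
        for c in stat_cols:
            vals = [r[c] for r in sec]
            if vals:
                fields += [statistics.mean(vals), statistics.pstdev(vals)]
            else:
                fields += [0, 0]
        outfile.write(" ".join(str(f) for f in fields) + "\n")

out = io.StringIO()
categorize(io.StringIO(SAMPLE), out)
print(out.getvalue().splitlines()[69])  # 7 rows fall in section 69~70
```

For this sample the output has 71 lines (sections 0~1 through 70~71), and all but the last three are zero lines. That matches the request to print empty sections as 0.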
