C# - Generate Two Arrays of Non-Duplicate Numbers
I have the following code, which generates two arrays of non-duplicate integers based on a ratio. The code works, but for a 4000-line file it takes a long time.
// train & test numbers
int train = (int)(((double)Settings.Default.TrainingRatio / 100) * inputLines.Count());
int test = inputLines.Count() - train;

// train & test lists
Random rnd = new Random();
var trainList = Enumerable.Range(1, inputLines.Count()).OrderBy(x => rnd.Next()).Take(train).ToList();
var testList = new List<int>();
for (int i = 1; i <= inputLines.Count(); i++)
{
    if (!trainList.Contains(i))
        testList.Add(i);
}
Even worse, this is how I read the lines:
foreach (var n in trainList)
{
    objDataIntilizer.GenerateMasterLableFile(inputLines.Skip(n - 1).Take(1).First().ToString());
}
Could anyone advise a way to get better performance?
Each time this code calls inputLines.Count(), you're re-reading the entire file, since File.ReadLines uses deferred execution and you aren't materializing it. Since you need the entire list in memory anyway, use File.ReadAllLines instead. It returns a string[], which has a Length property, an O(1) operation instead of O(n).
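As a rough sketch of that suggestion, the snippet below reads the file once and then fetches lines by index. The names LineLoader, LoadLines and GetLine are hypothetical and used only for illustration. It also assumes the line numbers are 1-based, as in the question's Enumerable.Range(1, ...) call.

```csharp
using System;
using System.IO;

public static class LineLoader
{
    // Reads the whole file into memory once. The returned string[]
    // exposes Length, so no further passes over the file are needed.
    public static string[] LoadLines(string path)
    {
        return File.ReadAllLines(path);
    }

    // Fetches a line by its 1-based number with direct array indexing,
    // rather than Skip(n - 1).Take(1).First(), which walks the sequence.
    public static string GetLine(string[] lines, int oneBasedNumber)
    {
        return lines[oneBasedNumber - 1];
    }
}
```

With an array in hand, the reading loop from the question could presumably pass inputLines[n - 1] straight to GenerateMasterLableFile, assuming inputLines is now the string[] returned by File.ReadAllLines.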
Then, instead of using a List<int> for trainList, use a HashSet<int>, which gives a much faster lookup with Contains:
// Not needed on frameworks that already provide Enumerable.ToHashSet
// (.NET Framework 4.7.2+, .NET Core 2.0+).
public static class EnumerableExtensions
{
    public static HashSet<T> ToHashSet<T>(this IEnumerable<T> enumerable)
    {
        return new HashSet<T>(enumerable);
    }
}

Random rnd = new Random();
var trainList = Enumerable.Range(1, inputLines.Length)
                          .OrderBy(x => rnd.Next())
                          .Take(train)
                          .ToHashSet();
var testList = new List<int>();
for (int i = 1; i <= inputLines.Length; i++)
{
    if (!trainList.Contains(i))
        testList.Add(i);
}
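A possible alternative, not part of the answer above, is to shuffle the line numbers once and cut the shuffled array in two, which avoids the Contains check altogether. The sketch below swaps in a Fisher-Yates shuffle for the OrderBy(x => rnd.Next()) call. SplitSketch and Split are hypothetical names, and the tuple return type assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitSketch
{
    // Shuffles the numbers 1..count in place (Fisher-Yates), then takes the
    // first trainCount entries as the train list and the rest as the test list.
    public static (List<int> Train, List<int> Test) Split(int count, int trainCount, Random rnd)
    {
        int[] numbers = Enumerable.Range(1, count).ToArray();
        for (int i = numbers.Length - 1; i > 0; i--)
        {
            int j = rnd.Next(i + 1); // random index with 0 <= j <= i
            int tmp = numbers[i];
            numbers[i] = numbers[j];
            numbers[j] = tmp;
        }

        var train = numbers.Take(trainCount).ToList();
        var test = numbers.Skip(trainCount).ToList();
        return (train, test);
    }
}
```

Unlike the loop above, this leaves the test list in shuffled rather than ascending order. If the order matters, calling Sort() on the returned test list should restore it.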