Time consumption for larger datasets

Discussion:

Jonas Kaufmann

2007-04-27 10:26:21 UTC

Hello,

I am thinking of using FANN for computing weather data. The goal is to take
the forecast values for a weather station as inputs and train the net with
the actual weather readings at the weather station. This should give me
modified weather forecasts which are adapted to the location of the weather
station.

I don't know how long it would take to train and run such a network. I also
tried finding larger example data for FANN, but didn't find anything. So my
question is:

Does anyone have experience with larger data sets? I think I will have about
10000 weather readings and forecasts available. There should be about 10-15
input neurons and 3-5 output neurons. How long do you think it will take to
train the net? And once trained, how long will it take to run the net? And:
do I have to buy a new computer to do this? :)

Thanks for your help!

Regards,
Jonas

Josh Menke

2007-04-27 10:34:15 UTC

Hiya Jonas,

Just FYI, I have used fann to train on millions of examples with 500-1000 input
neurons, 32-1600 hidden units and 1-411 outputs. At the high end of those sizes,
you may be looking at one week to train.

As for a new computer, it depends on how you manage your memory, but to me,
10,000 is pretty small, and you won't have problems.

One thing I think FANN could benefit from is streamed training with a simple
way to randomize presentation order. There should be an option to read only
one example from the fann training data file at a time instead of trying to
load the whole thing in memory.
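
A minimal sketch of the streamed reading described above, written in plain Python rather than against FANN itself. It assumes the usual FANN training-file layout: a header line with the number of examples, inputs and outputs, followed by alternating lines of inputs and outputs. The name `stream_examples` is hypothetical and not part of FANN.

```python
import io
from typing import Iterator, List, TextIO, Tuple

Example = Tuple[List[float], List[float]]


def stream_examples(f: TextIO) -> Iterator[Example]:
    """Yield (inputs, outputs) pairs one at a time from a FANN-style training file."""
    # Header line: number of examples, inputs per example, outputs per example.
    num_data, num_input, num_output = (int(tok) for tok in f.readline().split())
    for _ in range(num_data):
        # Only the current example is held in memory.
        inputs = [float(tok) for tok in f.readline().split()]
        outputs = [float(tok) for tok in f.readline().split()]
        if len(inputs) != num_input or len(outputs) != num_output:
            raise ValueError("example does not match the sizes in the header")
        yield inputs, outputs


# Tiny in-memory stand-in for a training file (XOR: 2 inputs, 1 output).
sample = io.StringIO("4 2 1\n-1 -1\n-1\n-1 1\n1\n1 -1\n1\n1 1\n-1\n")
examples = list(stream_examples(sample))
```

In real use each yielded pair would go straight to a per-example training step (FANN's C API has `fann_train` for a single pattern) instead of being collected into a list. This reader still walks the file in order, so it does not randomize presentation by itself.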

--Josh

_______________________________________________
Fann-general mailing list
https://lists.sourceforge.net/lists/listinfo/fann-general

--
Joshua Menke, Ph.D.
Machine Learning Scientist
Dev Group
Hi-Rez Studios
josh-***@public.gmane.org

John O'Hare

2007-04-27 11:31:27 UTC

Definitely don't worry about performance with those numbers. I'm training on
100,000 entries, 10 input nodes and 1 output node in 30 seconds to 8 minutes
(depending on configuration) on a 4-year-old PC. You'll be fine.
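
A rough way to see why these numbers are small is to count weight visits per epoch, which is approximately what one pass of backpropagation scales with. The sketch below assumes a fully connected net with one hidden layer and a bias unit feeding each non-input layer. The hidden size of 20 for the weather net and the figure of 1,000,000 examples for the large case are assumptions, since the thread gives neither exactly.

```python
from typing import Sequence


def connection_count(layers: Sequence[int]) -> int:
    """Count weights in a fully connected feed-forward net,
    assuming one bias unit feeding every non-input layer."""
    return sum((a + 1) * b for a, b in zip(layers, layers[1:]))


# Weather net from the question: 15 inputs, 5 outputs, 20 hidden (assumed).
small = connection_count([15, 20, 5])
# Largest configuration mentioned in the thread: 1000 inputs, 1600 hidden, 411 outputs.
large = connection_count([1000, 1600, 411])

# Weight visits for one forward pass over the whole data set.
per_epoch_small = small * 10_000
per_epoch_large = large * 1_000_000  # "millions" taken as one million (assumed)

print(small, large)
print(per_epoch_small, per_epoch_large)
```

Under these assumptions the large case does hundreds of thousands of times more work per epoch than the weather net. The counts say nothing about wall-clock time, which also depends on the training algorithm, the number of epochs and the hardware.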

I'm actually doing the randomization of presentation order as Josh mentioned
as well. It would be nice if this were included in fann. What I did was
write a separate layer that retrieves a random datum from a Berkeley DB and
encapsulates it in a struct fann_train_data* object, then hands it off to
fann.
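
The layer described above is not shown in the thread. As a stand-in, here is a sketch that gets the same effect without a database: scan a FANN-style training file once to record where each example starts, then seek to a randomly chosen offset. This swaps Berkeley DB for plain byte offsets, and all function names are hypothetical. It assumes each example occupies one line of inputs followed by one line of outputs.

```python
import io
import random
from typing import BinaryIO, List, Tuple

Example = Tuple[List[float], List[float]]


def index_offsets(f: BinaryIO) -> List[int]:
    """Scan the file once and record the byte offset where each example starts."""
    f.seek(0)
    num_data = int(f.readline().decode("ascii").split()[0])
    offsets = []
    for _ in range(num_data):
        offsets.append(f.tell())
        f.readline()  # skip the input line
        f.readline()  # skip the output line
    return offsets


def read_example(f: BinaryIO, offset: int) -> Example:
    """Read the single example that starts at the given byte offset."""
    f.seek(offset)
    inputs = [float(t) for t in f.readline().decode("ascii").split()]
    outputs = [float(t) for t in f.readline().decode("ascii").split()]
    return inputs, outputs


def random_example(f: BinaryIO, offsets: List[int], rng: random.Random) -> Example:
    """Pick one example uniformly at random (sampling with replacement)."""
    return read_example(f, rng.choice(offsets))


# In-memory stand-in for a training file opened in binary mode.
data = io.BytesIO(b"4 2 1\n-1 -1\n-1\n-1 1\n1\n1 -1\n1\n1 1\n-1\n")
offsets = index_offsets(data)
picked = random_example(data, offsets, random.Random(0))
```

Only the offset list stays in memory, one integer per example. The file is opened in binary mode so that `tell` and `seek` work on exact byte positions.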

-Jack

Josh Menke

2007-04-27 13:09:16 UTC

Well, FANN does have a shuffle function, but that still requires loading the
entire data set first. It'd be better to just pick examples at random; even
sampling with replacement is OK. Although it'd be nice to do something like
create a list of indexes, shuffle the indexes, and then march through them.
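
Both orderings mentioned above can be sketched in a few lines of plain Python. The function names are hypothetical, and the indexes are meant to be fed to whatever routine fetches example number `i` from disk. The sketch assumes only the indexes, not the examples, are kept in memory.

```python
import random
from typing import Iterator


def shuffled_indices(n: int, rng: random.Random) -> Iterator[int]:
    """Visit every index in 0..n-1 exactly once, in random order."""
    order = list(range(n))
    rng.shuffle(order)  # in-place Fisher-Yates shuffle
    yield from order


def sampled_indices(n: int, draws: int, rng: random.Random) -> Iterator[int]:
    """Draw indexes uniformly with replacement; repeats and gaps are possible."""
    for _ in range(draws):
        yield rng.randrange(n)


rng = random.Random(42)
epoch_order = list(shuffled_indices(10, rng))
with_replacement = list(sampled_indices(10, 10, rng))
```

Calling `shuffled_indices` again for each epoch gives a fresh order while still presenting every example once per epoch. For 10,000 examples the index list is negligible compared with the data itself.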

--Josh

Jonas Kaufmann

2007-04-27 18:05:12 UTC

Thank you for all your responses! As this would be my first use of FANN, I
did not know anything about performance. Seems like I won't have to worry
about it.

Regards,
Jonas
