Very large datasets

Hi,
I have a 64-bit Gentoo machine, but I'm probably not enough
of a programmer to do this on my own. (I haven't actually used FANN
at all yet.) If you have some code and a large data set, I'd be
happy to try building it and seeing if it runs.

Contact me off-line if that's of interest.

Cheers,
Mark

Post by Poul-Erik Andreasen
Hi
There has been some earlier discussion about datasets larger than 4 GB,
because of the addressing issues.
This should be solved by running FANN on a 64-bit machine (as far as I
can tell; correct me if I am wrong). But I would like to know if anyone
has tried that with success. Otherwise I will have to try it :-)
Poul-Erik Andreasen.
_______________________________________________
Fann-general mailing list
https://lists.sourceforge.net/lists/listinfo/fann-general
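
To put a rough number on the 4 GB addressing concern raised above, here is a back-of-the-envelope sketch. It assumes every input and output value is held in memory as a 4-byte float and ignores all other overhead; the function names and the example dimensions are made up for illustration and are not part of FANN.

```python
# Rough estimate of the memory needed to hold a whole training set at once.
# Assumption: each input/output value is stored as a 4-byte float.

BYTES_PER_VALUE = 4      # assumed size of one stored value
LIMIT_32BIT = 2 ** 32    # bytes addressable with a 32-bit pointer (4 GiB)


def training_set_bytes(num_pairs, num_inputs, num_outputs,
                       bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed for the raw values of num_pairs input/output pairs."""
    return num_pairs * (num_inputs + num_outputs) * bytes_per_value


def fits_in_32bit(num_pairs, num_inputs, num_outputs):
    """True if the raw values alone stay below the 4 GiB address limit."""
    return training_set_bytes(num_pairs, num_inputs, num_outputs) < LIMIT_32BIT


# Hypothetical data set: 20 million pairs, 100 inputs, 1 output.
size_gib = training_set_bytes(20_000_000, 100, 1) / 2 ** 30
print(f"{size_gib:.2f} GiB, fits in 32-bit: {fits_in_32bit(20_000_000, 100, 1)}")
```

In practice a 32-bit process can usually use noticeably less than the full 4 GiB, so the real limit is hit earlier than this estimate suggests.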


Josh Menke

2007-01-31 18:30:55 UTC

As a quick workaround, I wrote a batch job in SAS that split an 8 GB data set
into manageable chunks, one at a time, and then ran my FANN code on each
split.

--Josh

--
Joshua Menke
josh-***@public.gmane.org
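
For anyone without SAS, the chunking workaround described above could be sketched roughly as follows. This is not Josh's script. It assumes the data is already in FANN's plain-text training format (a header line with the number of pairs, inputs and outputs, followed by alternating input and output lines); the function name `split_fann_file` and the chunk file naming are invented for this example.

```python
import itertools


def split_fann_file(path, pairs_per_chunk, prefix="chunk"):
    """Split a FANN training file into smaller files, streaming line by line.

    Returns the list of chunk file names that were written.
    """
    written = []
    with open(path) as src:
        # Skip blank lines so a trailing empty line does not create a bad pair.
        lines = (line for line in src if line.strip())
        header = next(lines, None)
        if header is None:
            return written
        # Header: "<number of pairs> <number of inputs> <number of outputs>"
        _total, num_in, num_out = header.split()
        for index in itertools.count():
            # Each pair occupies two lines: inputs, then outputs.
            block = list(itertools.islice(lines, 2 * pairs_per_chunk))
            if not block:
                break
            name = f"{prefix}_{index:04d}.data"
            with open(name, "w") as dst:
                # Every chunk gets its own header with its own pair count.
                dst.write(f"{len(block) // 2} {num_in} {num_out}\n")
                dst.writelines(block)
            written.append(name)
    return written
```

Each chunk file can then be loaded on its own (for example with `fann_read_train_from_file`) and trained on in turn. Whether training on chunks one after another gives the same result as training on the full set depends on the training algorithm, so treat that as something to check rather than a given.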

Conor Robinson

2007-01-31 19:21:48 UTC

If you're running an Intel chip: I used icc and recompiled, and it worked fine.
The real question is, do you need to run a data set that big? If
your inputs are very wide and/or sparse you might run into real
problems with the 'curse of dimensionality'. It all depends on your data
set, but a random sample may be more practical. You could get better
results with k-fold cross-validation. I'm curious as to what kind of data
you're looking at. I don't think recompiling FANN for 64-bit should
cause you any trouble. Good luck.
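
The random-sample and cross-validation suggestions above could look something like the sketch below. It assumes the same plain-text FANN training format (header line, then alternating input and output lines) and uses reservoir sampling so the big file is read once without being loaded into memory. All function names are invented for this example.

```python
import random


def read_pairs(path):
    """Yield (input_line, output_line) pairs from a FANN training file."""
    with open(path) as src:
        lines = (line for line in src if line.strip())
        next(lines, None)                 # skip the header line
        for inputs in lines:
            outputs = next(lines, None)
            if outputs is None:           # dangling input line: stop cleanly
                break
            yield inputs, outputs


def reservoir_sample(pairs, k, seed=0):
    """Return k items chosen uniformly at random from an iterable (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for seen, pair in enumerate(pairs):
        if seen < k:
            sample.append(pair)           # fill the reservoir first
        else:
            j = rng.randint(0, seen)      # inclusive on both ends
            if j < k:
                sample[j] = pair          # replace with probability k/(seen+1)
    return sample


def k_folds(items, k):
    """Yield (train, validation) lists for k-fold cross-validation."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]
```

A typical use would be `sample = reservoir_sample(read_pairs("big.data"), 100000)` followed by `for train, valid in k_folds(sample, 10): ...`, where the file name and sizes are placeholders.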

Josh Menke

2007-01-31 19:32:18 UTC

There are a couple of reasons in general for wanting a very large data set
if you can have one instead of sampling:

1. Like you mentioned, sparse data. But I also mean sparse as in the target
classes may have very few members compared to the whole population. For
example, a concept learning (2-class) problem where one class represents
99.99% of the population. In this case, if you want to both have enough data
to learn the concept AND automatically infer the prior distributions
correctly, then having a lot of data is an easy way to go.

2. If you have a very difficult problem, then research has shown that neural
networks have an uncanny ability to continue to improve accuracy by using
more and more data and larger and larger networks. A group out of ICSI at
Berkeley showed this a few years back for large-scale speaker-independent
phoneme recognition. They were using MASSIVE speech corpora and showing the
accuracy kept increasing at a rate worth the cost.

--Josh

--
Joshua Menke
Statistician, Machine Learning Scientist
TnS Detection Platforms
ebay, Inc
josh-***@public.gmane.org
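
To make point 1 above concrete, here is some quick arithmetic for the 99.99% / 0.01% case. It is only a sketch: the function names are invented, and the prior-correction step is the usual odds adjustment for an undersampled majority class, which is an addition for illustration and not something stated in the thread.

```python
def expected_minority(n, minority_prior):
    """Expected number of minority-class rows in a uniform sample of size n."""
    return n * minority_prior


def prob_no_minority(n, minority_prior):
    """Chance that a uniform sample of size n has no minority rows at all."""
    return (1.0 - minority_prior) ** n


def prior_after_undersampling(minority_prior, keep_rate):
    """Minority share after keeping only keep_rate of the majority class."""
    kept_majority = (1.0 - minority_prior) * keep_rate
    return minority_prior / (minority_prior + kept_majority)


def correct_posterior(q, keep_rate):
    """Map a probability q learned on undersampled data back to the original priors."""
    return keep_rate * q / (keep_rate * q + (1.0 - q))


p = 0.0001  # minority class is 0.01% of the population
print(expected_minority(10_000, p))        # about one minority row per 10,000 sampled
print(prob_no_minority(10_000, p))         # chance such a sample misses the class entirely
print(prior_after_undersampling(p, 0.01))  # class balance seen by the network after undersampling
```

A plain random sample of 10,000 rows is expected to contain about one minority example, and undersampling the majority class to fix that shifts the priors the network sees, so its outputs need a correction like `correct_posterior`. Training on the full data set avoids both steps, which is the trade-off described above.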

Conor Robinson

2007-01-31 21:22:00 UTC