Scaling: yes, it's recommended to scale your data to below 1. And in
line with what I said before, it has to be over the whole I/O space.
integers of small range.
(limited number of) buckets, go with sigmoid.
better balanced around 0.
-1. Outputs will be binary, if you have 3 outputs, only one should be on
1 at any time, the other two should be -1 (Note. Those are ideal values,
month. And I would rather introduce another input "hour" so that the
minutes range is reduced. I'd rather favour more inputs than significant
difference in ranges.
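A minimal sketch of the "extra hour input" idea, assuming the 0 - 720 minute range from the original post and a target range of -1 to 1 (the function names are made up for illustration):

```python
def split_minutes(minute_of_day):
    """Split a minute count into (hour, minute within that hour)."""
    hour, minute = divmod(minute_of_day, 60)
    return hour, minute


def scale(value, lo, hi):
    """Map value linearly from [lo, hi] onto [-1, 1]."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0


# An arrival at minute 495 becomes two inputs with small, similar ranges.
hour, minute = split_minutes(495)
inputs = [scale(hour, 0, 12), scale(minute, 0, 59)]
```

Both inputs now span a comparable range, instead of one input running from 0 to 720 next to others that run from 0 to 6.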
Volume is not an input for your ANN. Training data should have inputs
(time) and outputs (volume bucket at that time - minute if you want).
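As a rough sketch of what such training data could look like, the snippet below writes time inputs and a volume bucket (one output at 1, the others at -1) in the plain-text layout used by FANN training files: a header line with the number of pairs, inputs and outputs, then alternating input and output lines. The helper names and sample values are invented for illustration.

```python
def one_hot_pm1(index, n_classes):
    """Return a vector with 1 at position index and -1 everywhere else."""
    return [1.0 if i == index else -1.0 for i in range(n_classes)]


def fann_training_text(samples, n_classes):
    """Build training-file text from (inputs, bucket_index) pairs."""
    n_inputs = len(samples[0][0])
    lines = [f"{len(samples)} {n_inputs} {n_classes}"]
    for inputs, bucket in samples:
        lines.append(" ".join(f"{x:g}" for x in inputs))
        lines.append(" ".join(f"{y:g}" for y in one_hot_pm1(bucket, n_classes)))
    return "\n".join(lines) + "\n"


# Two already-scaled time inputs per sample; bucket 0 and bucket 2 of 3.
samples = [([-0.5, 0.25], 0), ([0.0, -0.75], 2)]
text = fann_training_text(samples, 3)
```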
We are talking simple Feed-Forward (Multilayered Perceptron). Although
input, a snapshot with no succession/flow defined. If you are not happy
NN)... I'm a bit rusty with FANN, but I don't think there is any
recursive network support, never mind TDNN.
Outputs: As above, no point having some of the inputs as outputs. It
outputs corresponding to that input).
doesn't really fit my proposed model. I assumed your volume follows some
your history. It's not exactly a "predictor" as you try to do, this is
probably why "Day Expected Flag" doesn't really make sense for me...
Post by Matt Shockley
Thanks Adrian!
You really cleared up a lot of my confusion.
I took your advice and narrowed the scope a bit: I am starting with
one of my smaller historical data sets as a toy model to see how
things perform.
Anyway, the problem now is scaling my inputs and outputs, as you
stated. As far as I'm aware, I cannot do this with FANN's functions
since they scale the entire input/output set to the same range, while
my data requires many different ranges. Or am I mistaken?
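One way around this is to scale each column before the data ever reaches the library. The sketch below is plain Python and makes no FANN calls; the function names and the -1 to 1 target range are assumptions for illustration.

```python
def fit_ranges(rows):
    """Return a (min, max) pair for every column of the data set."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_rows(rows, ranges, new_lo=-1.0, new_hi=1.0):
    """Scale each column to [new_lo, new_hi] using its own range."""
    scaled = []
    for row in rows:
        out = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                # A constant column carries no information; park it mid-range.
                out.append((new_lo + new_hi) / 2.0)
            else:
                out.append(new_lo + (value - lo) * (new_hi - new_lo) / (hi - lo))
        scaled.append(out)
    return scaled


# Columns: time of day (minutes), day of month, day of week.
rows = [[0, 1, 0], [720, 31, 6], [360, 16, 3]]
ranges = fit_ranges(rows)
scaled = scale_rows(rows, ranges)
```

The ranges have to be kept and reused: new inputs must be scaled with the same per-column ranges, and scaled outputs mapped back with the inverse transform.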
If I am scaling my data for both the input and output sets is it
necessary to use a linear activation function for the outputs? Could I
scale everything to 0-1 and stick with a sigmoid for both hidden and
output?
I don't really want to break my volume into buckets of amounts since
it does not change that much, usually just a little up or down. Maybe
it would be good enough to have 3 cases - lower, higher or the same as
the previous day.
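A small sketch of that three-case labelling, assuming a relative tolerance band decides what counts as "the same" (the 5% default is an arbitrary placeholder, not a recommendation):

```python
def trend_label(previous, current, tolerance=0.05):
    """Label current volume as 'lower', 'same' or 'higher' than previous.

    Changes smaller than tolerance * |previous| count as 'same'.
    """
    change = current - previous
    band = tolerance * abs(previous)
    if change > band:
        return "higher"
    if change < -band:
        return "lower"
    return "same"


# Daily volumes turned into one label per consecutive pair of days.
volumes = [100, 103, 120, 90]
labels = [trend_label(p, c) for p, c in zip(volumes, volumes[1:])]
```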
Just a little more information about my solution for my problem. Right
now I have 5 inputs and 3 outputs.
Inputs:
Time of Day-
When data arrives on the server (measured in minutes 0 - 720)
Day Expected Flag-
This is to simplify the problem. It will be 0 if data arrived on the
current day and 1 if it arrived on a following day.
Day of Month-
0-31
Day of Week-
0-6
Volume-
Amount of data. Continuous but usually not hugely variable, I am
really including this to try and track / predict trends.
Outputs:
Time of Day-
Same as input
Next Day Flag-
Same as input
Volume-
Same as input
You didn't disappoint me at all, I figured this would be a bit of a
tricky problem to tackle =)
But if I can get this ANN producing any kind of halfway decent results
for any of my outputs it would be greatly beneficial.
Thanks again,
Matt Shockley
Hi Matt,
I will try to answer some of your questions, probably not so much
FANN advice as general ANN advice.
------------------------------------------------------------------------
*Sent:* Friday, 10 February 2012, 21:48
*Subject:* [Fann-general] inputs, outputs advice
I need some advice on handling the inputs and outputs for my NN.
What is the best way to handle continuous inputs and outputs? Use
the linear activation function for the inputs and outputs? What
about the hidden layer?
adsp: yes, linear activation for output layer, there is no
activation function for inputs. Hidden could be sigmoid type.
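To make the layer roles concrete, here is a toy forward pass in plain Python (not FANN code; weights and names are invented): inputs are passed through untouched, the hidden layer applies a sigmoid, and the output layer is linear, so its value is not confined to 0..1.

```python
import math


def sigmoid(x):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_biases):
    """One hidden sigmoid layer followed by a linear output layer."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    # Linear output: a plain weighted sum, no squashing applied.
    return [
        sum(w * h for w, h in zip(ws, hidden)) + b
        for ws, b in zip(output_weights, output_biases)
    ]


outputs = forward(
    [1.0, 2.0],
    hidden_weights=[[0.0, 0.0], [0.0, 0.0]],
    hidden_biases=[0.0, 0.0],
    output_weights=[[2.0, 4.0]],
    output_biases=[10.0],
)
```

With all hidden weights at zero, both hidden units output 0.5, and the linear output is 2 * 0.5 + 4 * 0.5 + 10 = 13, a value a sigmoid output could never reach.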
I would like to use datetime data for both an input and my output.
What's a good format for this?
Will the network be okay with a mix of numerical and categorical inputs?
What about scaling, is FANN's scaling function reliable?
adsp: Just keep in mind that the network will not distinguish
between the types of your inputs, they are just numbers. As a
consequence, if you have lots of numbers of one category and just
a few of the other, don't expect the ANN to see that difference.
In other words, if your input space is a mix with different
metrics, the network will try to find some commonality but
depending on how different your inputs are, it might be close to
impossible. In principle, during training you have to cover pretty
much the whole of your input space fairly uniformly.
Is there an effective way to weight certain inputs to have more
impact on the outputs or will the nature of the NN figure that out
through training?
adsp: well, weights, you may try to initialise as you see fit, the
recommended way is random... see above, space, metric, number of
inputs on each "category" (subspace)...
I want to predict the arrival datetime and volume of a chunk of
data that hits my servers on a roughly daily basis.
I have extensive historical records. I will use these to train the
network, but I would like to add in some other inputs including
some statistics that I've calculated and various values out of the
chunk of data itself. Many of these values are continuous. I'd
also like to factor in the date, day of month, day of year, day of
week and holidays as inputs.
adsp: a simple way I would go, might not yield satisfactory
results, depending on your expectations... People try to use ANN
to predict stock market :-) Your case is not far, since there is
human behaviour involved...
Anyway, you can get a mapping between volume and time that would
otherwise require extensive analysis. I would just take volume
(output) as a function of time (pick your resolution, but I
wouldn't expect wonders if your volume jumps a lot from one minute
to another). Inputs are integers: year (0...n), month (0...11), day,
hour... or maybe you prefer week and day of the week (0...6);
there is this problem of Feb 29, I don't know what to do with it,
just ignore it and accept you can't predict that day using this
method?
I wouldn't bother with the year at all if you only have historic
data going back 2 or 3 years.
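A short sketch of deriving such integer inputs from a timestamp with Python's standard library, leaving the year out as suggested. The function name and the zero-based day of month are illustrative choices.

```python
from datetime import datetime


def time_inputs(ts):
    """Turn a timestamp into small-range integer inputs (year left out)."""
    return {
        "month": ts.month - 1,    # 0..11
        "day": ts.day - 1,        # 0..30
        "weekday": ts.weekday(),  # 0..6, Monday = 0
        "hour": ts.hour,          # 0..23
        "minute": ts.minute,      # 0..59
    }


features = time_inputs(datetime(2012, 2, 10, 21, 48))
```

Each of these still needs scaling to a common range before it is fed to the network.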
Rather than a continuous output (volume) I would try a discrete
(fuzzy value) like very high, (extreme even?), high, medium... You
may split it in how many buckets you like, depending on your
expected precision, but the finer you go, the higher your error
will be, don't expect miracles.
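A minimal sketch of the bucketing step, assuming fixed thresholds chosen by hand (the edge values and labels below are placeholders, not taken from any real data):

```python
from bisect import bisect_right

# Hypothetical bucket boundaries, sorted ascending; tune to your own data.
EDGES = [100, 500, 1000]
LABELS = ["low", "medium", "high", "very high"]


def bucket_index(volume, edges=EDGES):
    """Return the index of the bucket a volume falls into.

    A volume equal to an edge goes into the bucket above that edge.
    """
    return bisect_right(edges, volume)


buckets = [LABELS[bucket_index(v)] for v in (50, 100, 750, 5000)]
```

More edges give finer buckets, at the cost of fewer training examples per bucket.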
You will have to re-work your input data to create the appropriate
training data, but that's usually the big task in ANN.
Sorry if I disappointed you :-) it wasn't my intention.
Nice problem, by the way.
Regards,
Adrian
Thanks!
------------------------------------------------------------------------------
Virtualization & Cloud Management Using Capacity Planning
Cloud computing makes use of virtualization - but cloud computing
also focuses on allowing computing to be delivered as a service.
http://www.accelacomm.com/jaw/sfnl/114/51521223/
_______________________________________________
Fann-general mailing list
https://lists.sourceforge.net/lists/listinfo/fann-general