This section describes Artificial Neural Networks (ANN) and their background in more detail.

Hecht-Nielsen proposed the formal definition of an Artificial Neural Network in [70]: “An Artificial Neural Network is a parallel, distributed information processing structure consisting of processing units (which can possess a local memory and can carry out localized information processing operations) interconnected via unidirectional signal channels called connections. Each processing unit has a single output connection that branches (“fans out”) into as many collateral connections as desired; each carries the same signal – the processing unit output signal. The processing unit output signal can be of any mathematical type desired. The information processing that goes on within each processing unit can be defined arbitrarily with the restriction that it must be completely local; that is, it must depend only on the current values of the input signals arriving at the processing element via impinging connections and on values stored in the processing unit’s local memory.”

The Department of Aeronautics (ITDA, Sao Paulo) notes in [71] that various ANN models exist, but each model can be precisely specified by the following eight major aspects, as also stated in [72]:

o A set of processing units
o A state of activation for each unit
o An output function for each unit
o A pattern of connectivity among units, or topology of the network
o A propagation rule, or combining function, to propagate the activities of the units through the network
o An activation rule to update the activities of each unit by using the current activation value and the inputs received from other units
o An external environment that provides information to the network and/or interacts with it
o A learning rule to modify the pattern of connectivity by using information provided by the external environment

Louis Francis in [73] states that neural networks originated in the artificial intelligence discipline, where they are often portrayed as a brain in a computer.

They are designed to incorporate key features of neurons in the brain and to process data in a manner analogous to the human brain. Much of the terminology used to describe and explain neural networks is borrowed from biology. Data mining tools can be trained to identify complex relationships in data. Typically, the data sets are large, with the number of records at least in the tens of thousands and the number of independent variables often in the hundreds. Their advantage over classical statistical models used to analyse data, such as regression and ANOVA, is that they can fit data where the relationship between independent and dependent variables is nonlinear and where the specific form of the nonlinear relationship is unknown [73].

Artificial neural networks share the same advantages as many other data mining tools, but also offer advantages of their own [73]. For instance, decision trees, a method of splitting data into homogeneous clusters with similar expected values for the dependent variable, are often less effective when the predictor variables are continuous than when they are categorical [73]. Neural networks work well with both categorical and continuous variables.

Several data mining techniques, such as regression splines, were developed by statisticians [73]. Francis further states in [73] that the data mining techniques are computationally intensive generalizations of classical linear models. Classical linear models assume that the functional relationship between the independent variables and the dependent variable is linear. Classical modelling also allows linear relationships that result from a transformation of dependent or independent variables, so some nonlinear relationships can be approximated. Neural networks and other data mining techniques do not require that the relationships between predictor and dependent variables be linear (whether the variables are transformed or not).

The various data mining tools differ in their approaches to approximating nonlinear functions and complex data structures [73]. Neural networks use a series of neurons in what is known as the hidden layer, which apply nonlinear activation functions to approximate complex functions in the data.

Despite their advantages, Francis [73] states that many statisticians and actuaries are reluctant to embrace neural networks. One reason is that they are considered a “black box”: data goes in and a prediction comes out, but the nature of the relationship between independent and dependent variables is usually not revealed. Because of the complexity of the functions used in the neural network approximations, neural network software typically does not supply the user with information about the nature of the relationship between predictor and target variables. The output of a neural network is a predicted value and some goodness-of-fit statistics. However, the functional form of the relationship between independent and dependent variables is not made explicit. In addition, the strength of the relationship between dependent and independent variables, i.e., the importance of each variable, is also often not revealed [73]. Classical models, as well as other popular data mining techniques such as decision trees, supply the user with a functional description or map of the relationships.

There exist two main types of training process: supervised and unsupervised training [74]. In supervised training (e.g. multi-layer feed-forward (MLF) neural network), the neural network knows the desired output, and the weight coefficients are adjusted in such a way that the calculated and desired outputs are as close as possible to each other [74]. Unsupervised training (e.g. Kohonen network [4]) means that the desired output is not known; the system is provided with a group of facts (patterns) and then left to itself to train and settle down (or not) to a stable state in some number of iterations [74].

The neural network type most commonly used is the feedforward network, or multilayer perceptron. It is also called the backpropagation neural network, as it uses the backpropagation algorithm [73,74]. A neural network model contains three types of layers: an input layer, hidden layer(s), and an output layer. A feedforward neural network is a network where the signal is passed from an input layer of neurons through a hidden layer to an output layer of neurons [73].
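As a minimal sketch of the signal flow just described, the Python below passes an input vector through one hidden layer to an output layer. The layer sizes, the weight values, the additive threshold term and the logistic activation are illustrative assumptions, not values taken from [73] or [74].

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, thresholds):
    """Compute the outputs of one fully connected layer.

    weights[j][i] is the weight from input i to node j (assumed layout);
    thresholds[j] is added to node j's weighted sum.
    """
    outputs = []
    for node_weights, threshold in zip(weights, thresholds):
        net = sum(w * x for w, x in zip(node_weights, inputs)) + threshold
        outputs.append(sigmoid(net))
    return outputs


def feedforward(inputs, hidden_w, hidden_t, output_w, output_t):
    """Pass the signal from the input layer through the hidden layer to the output layer."""
    hidden = layer_forward(inputs, hidden_w, hidden_t)
    return layer_forward(hidden, output_w, output_t)


# Hypothetical network: 2 input nodes, 3 hidden nodes, 1 output node.
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_t = [0.0, 0.1, -0.1]
output_w = [[0.7, -0.5, 0.2]]
output_t = [0.05]

prediction = feedforward([1.0, 2.0], hidden_w, hidden_t, output_w, output_t)
print(prediction)
```

Each node in one layer feeds every node in the next, so a layer is fully described by one weight list per receiving node.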

The input layer is the first layer of a neural network model and contains a list of influencers, or input parameters. These input parameters, occupying the input nodes, represent the actual data used to fit a model to the dependent variable, and each node is a separate independent variable. These are connected to another layer of neurons called the hidden layer or hidden nodes, which modifies the data while attempting to solve the fitting equation. The connection between the ith and jth neuron (Figure) is characterised by the weight coefficient (W) and a threshold coefficient (T). The weight coefficient reflects the degree of importance of the given connection in the neural network [73].

The nodes in the hidden layer connect to the output layer. The output layer represents the target or dependent variable(s). It is common for networks to have only one target variable, or output node, but there can be more [73]. Generally, each node in the input layer connects to each node in the hidden layer and each node in the hidden layer connects to each node in the output layer. The artificial intelligence literature views this structure as analogous to biological neurons. The arrows leading to a node are like the dendrites leading to a neuron. Like the dendrites, they carry a signal to the neuron or node. The arrows leading away from a node are like the axons of a neuron, and they carry a signal away from a neuron or node. The neurons of a brain have far more complex interactions than those displayed in the diagram, but the developers of neural networks view them as abstracting the most relevant features of neurons in the human brain [73].

A node in a neural network “learns” by adjusting the strength of the signals coming from the nodes in the previous layer that connect to it. As the neural network better learns how to predict the target value from the input pattern, each of the connections between the input neurons and the hidden or intermediate neurons, and between the intermediate neurons and the output neurons, increases or decreases in strength [73]. A function called a threshold or activation function modifies the signal coming into the hidden layer nodes. In the early days of neural networks, this function produced a value of 1 or 0 (Equation 2.19), depending on whether the signal from the prior layer exceeded a threshold value.

Thus, the node or neuron would only fire if the signal exceeded the threshold, a process thought to be similar to that of a neuron (Equation 2.18) [73].

(2.18)

(2.19)

The activation functions currently used are typically sigmoid in shape and can take on any value between 0 and 1, as stated above, or between -1 and 1, depending on the particular function chosen. Sigmoid functions are often used in artificial neural networks to introduce nonlinearity in the model. A neural network element computes a linear combination of its input signals, and applies a sigmoid function to the result.
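To make the contrast between the early binary threshold and a sigmoid activation concrete, the following sketch applies both to the same weighted sum. The function names, the example weights and the choice of the logistic and tanh functions are assumptions made for illustration only.

```python
import math


def step_activation(signal, threshold=0.0):
    """Early binary activation: fire (1) only if the signal exceeds the threshold."""
    return 1 if signal > threshold else 0


def logistic(signal):
    """Sigmoid-shaped activation with outputs strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-signal))


def neuron_output(inputs, weights, bias, activation):
    """Linear combination of the input signals followed by an activation function."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)


inputs = [0.5, -1.0, 2.0]   # hypothetical input signals
weights = [0.8, 0.2, 0.1]   # hypothetical connection weights
bias = -0.1                 # hypothetical threshold term

print(neuron_output(inputs, weights, bias, step_activation))  # binary output
print(neuron_output(inputs, weights, bias, logistic))         # value in (0, 1)
print(neuron_output(inputs, weights, bias, math.tanh))        # value in (-1, 1)
```

The step function discards how far the signal lies from the threshold, whereas the sigmoid-shaped functions vary smoothly with it.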

One reason for its popularity in neural networks is that the derivative of the sigmoid function can be expressed in terms of the function itself, which makes it computationally cheap to evaluate. Derivatives of the sigmoid function are usually employed in learning algorithms. Equation 2.20 gives the mathematical expression of the sigmoid function.

$$f(x) = \frac{1}{1 + e^{-x}} \tag{2.20}$$

The modified signal is then output to the output layer nodes, which also apply activation functions. Thus, the information about the pattern being learned is encoded in the signals carried to and from the nodes.
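The relationship between the sigmoid function and its derivative mentioned above can be written out explicitly. Assuming the logistic form of the sigmoid, with $f$ and $x$ as the symbols chosen here for the function and its input, differentiating gives:

```latex
f(x) = \frac{1}{1 + e^{-x}}
\quad\Longrightarrow\quad
f'(x) = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}
      = f(x)\,\bigl(1 - f(x)\bigr)
```

A learning algorithm that has already computed the output $f(x)$ of a node can therefore obtain the derivative with one subtraction and one multiplication.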

These signals map a relationship between the input nodes (the data) and the output nodes (the dependent variable(s)).

The Multi-layered Feedforward (MLF) neural network operates in two modes: training and prediction. Training the MLF neural network and predicting with it require two data sets: the training set and the set that we want to predict (test set).

The training mode begins with arbitrary values of the weights (they might be random numbers) and proceeds iteratively. Each iteration of the complete training set is called an epoch.

In each epoch the network adjusts the weights in the direction that reduces the error (see back-propagation algorithm). As the iterative process of incremental adjustment continues, the weights gradually converge to a locally optimal set of values. Many epochs are usually required before training is completed.

For a given training set, back-propagation learning may proceed in one of two basic ways: pattern mode and batch mode. In the pattern mode of back-propagation learning, weight updating is performed after the presentation of each training pattern. In the batch mode of back-propagation learning, weight updating is performed after the presentation of all the training examples (i.e. after the whole epoch). From an ‘on-line’ point of view, the pattern mode is preferred over the batch mode, because it requires less local storage for each synaptic connection. Moreover, given that the patterns are presented to the network in a random manner, the use of pattern-by-pattern updating of weights makes the search in weight space stochastic, which makes it less likely for the back-propagation algorithm to be trapped in a local minimum.

On the other hand, the use of batch mode of training provides a more accurate estimate of the gradient vector. Pattern mode is necessary, for example, in on-line process control, because not all of the training patterns are available at a given time. In the final analysis, the relative effectiveness of the two training modes depends on the problem being solved [75,76].

In prediction mode, information flows forward through the network, from inputs to outputs. The network processes one example at a time, producing an estimate of the output value(s) based on the input values. The resulting error is used as an estimate of the quality of prediction of the trained network.
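The difference between pattern mode and batch mode lies only in when the weights are updated. The sketch below illustrates this on a single linear unit trained with a squared-error gradient step, which is a deliberate simplification of a full back-propagation network; the toy data, learning rate and epoch counts are hypothetical.

```python
import random


def predict(weights, x):
    """Output of a single linear unit (no hidden layer, for brevity)."""
    return sum(w * xi for w, xi in zip(weights, x))


def train_pattern_mode(data, weights, rate, epochs, seed=0):
    """Pattern mode: update the weights after every training pattern."""
    rng = random.Random(seed)
    weights = list(weights)
    for _ in range(epochs):
        patterns = list(data)
        rng.shuffle(patterns)  # present the patterns in random order
        for x, target in patterns:
            error = target - predict(weights, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
    return weights


def train_batch_mode(data, weights, rate, epochs):
    """Batch mode: accumulate the gradient over the whole epoch, then update once."""
    weights = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, target in data:
            error = target - predict(weights, x)
            grad = [g + error * xi for g, xi in zip(grad, x)]
        weights = [w + rate * g / len(data) for w, g in zip(weights, grad)]
    return weights


# Toy training set generated by target = 2*x1 - 1*x2 (hypothetical).
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)]

w_pattern = train_pattern_mode(data, [0.0, 0.0], rate=0.1, epochs=200)
w_batch = train_batch_mode(data, [0.0, 0.0], rate=0.1, epochs=500)
print(w_pattern, w_batch)
```

Pattern mode needs no gradient accumulator, which reflects the lower storage requirement noted above, while batch mode averages the gradient over all patterns before each step.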

In back-propagation learning, one usually starts with a training set and uses the back-propagation algorithm to compute the synaptic weights of the network, with the aim that the resulting network generalises well. A network is said to generalise well when the input-output relationship computed by the network is correct (or nearly correct) for input/output patterns never used in training the network. Generalisation is not a mystical property of neural networks, but it can be compared to the effect of a good non-linear interpolation of the input data S. When the learning process is repeated for too many iterations (i.e. the neural network is over-trained or over-fitted; there is no difference between overtraining and overfitting), the network may memorise the training data and therefore be less able to generalise between similar input-output patterns. The network gives nearly perfect results for examples from the training set, but fails for examples from the test set.

Overfitting can be compared to an improper choice of the degree of the polynomial in polynomial regression. Severe overfitting can occur with noisy data, even when there are many more training cases than weights.

The basic condition for good generalisation is a sufficiently large set of training cases. This training set must at the same time be a representative subset of the set of all cases that you want to generalise to. The importance of this condition is related to the fact that there are two different types of generalisation: interpolation and extrapolation.

Interpolation applies to cases that are more or less surrounded by nearby training cases; everything else is extrapolation. In particular, cases that are outside the range of the training data require extrapolation. Interpolation can often be done reliably, but extrapolation is notoriously unreliable. Hence it is important to have sufficient training data to avoid the need for extrapolation. Methods for selecting good training sets arise from experimental design [9].

Data splitting for ANN development is essentially a sampling problem where, given a database $D$ comprising $N$ data, the goal is to sample the data into disjoint subsets $S_T$, $S_{test}$ and $S_{val}$ of size $N_T$, $N_{test}$ and $N_{val}$, for training, testing and validating, respectively. Within ANN literature, this task has been performed using many different approaches, each with their advantages and disadvantages. Simple Random Sampling (SRS) is the most common method for data splitting in ANN development, where data are selected with uniform probability, which is determined as

$$p(x \in S_T) = \frac{N_T}{N} \tag{2.21}$$

and similarly for $x \in S_{test}$ and $x \in S_{val}$. Simple random sampling is easy to perform, and can be efficiently implemented in just a single pass over the data using algorithms such as Knuth’s algorithm (Knuth, 1997).
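A minimal sketch of simple random sampling for data splitting is given below. The function names and subset sizes are hypothetical. The second function assumes that the single-pass method referred to above is selection sampling (Knuth's Algorithm S), which is an interpretation rather than something the text confirms.

```python
import random


def split_srs(data, n_train, n_test, n_val, seed=None):
    """Split data into disjoint training, test and validation subsets.

    Shuffling uniformly and slicing gives every record the same
    probability n_train / len(data) of landing in the training subset.
    """
    if n_train + n_test + n_val != len(data):
        raise ValueError("subset sizes must sum to len(data)")
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]
    return train, test, val


def selection_sample(data, n, seed=None):
    """Draw exactly n records in one pass over the data (selection sampling)."""
    if n > len(data):
        raise ValueError("cannot sample more records than are available")
    rng = random.Random(seed)
    chosen = []
    total = len(data)
    for seen, item in enumerate(data):
        still_needed = n - len(chosen)
        # Select with probability (records still needed) / (records not yet seen).
        if rng.random() * (total - seen) < still_needed:
            chosen.append(item)
    return chosen


data = list(range(100))  # stand-in for a database of N = 100 records
train, test, val = split_srs(data, 60, 20, 20, seed=42)
print(len(train), len(test), len(val))
```

Fixing the seed makes a split reproducible, which helps when comparing models trained on the same partition.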

However, the problem with this approach is that there is a chance that the splitting of data suffers from variance, or bias, especially when the data are non-uniformly distributed (Tourassi, Frederick, Markey, & Floyd Jr., 2001).

2.7.1. Artificial Intelligence Applications in buildings: Why needed?

Driven by the pressure to cut building energy consumption, the management of these special-purpose buildings seeks potential measures to induce energy savings from all aspects of building design [8]. This is by no means an easy task. Firstly, designing and implementing an energy saving intervention measure is complex in nature [10]. Secondly, most special-purpose historical buildings impose restrictions forbidding any retrofit solution that may alter the original appearance and character of the building [11]. Thirdly, the strategy for using the building Air Handling Unit (AHU) needs careful planning, balancing proper microclimatic control against energy savings [12]. Hence, it is imperative for the building management to monitor, predict and analyse the indoor environment and energy use in order to target adequate future energy saving and optimisation programs.

It is known that the first step for optimising energy use in buildings is to have a means of adequate energy usage prediction [1], not only for the building owners but also for urban planners and energy suppliers. With the potential of buildings to contribute towards the reduction in CO2 emissions well recognised [2], urban planners seek predictions of building energy systems to assess the impact of energy conservation measures [13]. It is also known that the building energy and indoor environmental prediction model forms the core of a building’s energy control and operation strategy design to induce energy savings, including peak demand shaving [14,15]. However, due to cost constraints, building energy systems are typically not well measured or monitored. Sensors are only installed when they are necessary for certain control actions. Sub-metering for a building’s energy sub-systems is also not commonly available [15]. As a result, much of the information needed to better understand the existing building system is not available.

A number of data model analyses, developed in recent years, cater to the need for building energy prediction and optimisation strategies while tackling the associated problems of system uncertainty and data availability. Building thermal and energy performance modelling is very complicated. It requires substantial, high-quality data input. Gaps between design-predicted and actual performance are common and are mainly due to discrepancies between the two sets of data [16,17]. For old buildings, a great amount of data is not available, and the study information is usually based on the best assumption, which further enlarges the gap [16,17].

Some of the modelling approaches followed White-Box Modelling, involving detailed physics-based dynamic equations to model the building components [18–20]. A number of mature white-box software tools, such as EnergyPlus, ESP-r, IES, TAS, and TRNSYS, also exist, and they simplify the manual modelling process using this technique [21]. However, even though the tools are effective and accurate, these approaches bear the drawback of requiring detailed information and parameters of the buildings, energy systems, and outside weather conditions, which are difficult to obtain or even unavailable [15]. Also, creating these models demands considerable calculation time and expertise [10]. Some other approaches follow the Grey-Box Modelling strategy, such as the Resistance and Capacitance model, or lumped capacitance model, representing the building elements as an analogue circuit [22,23]. These approaches reduce the requisite amount of training data and calculation time.
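To illustrate what a lumped capacitance model looks like in practice, the sketch below simulates a single-zone network with one thermal resistance and one thermal capacitance. This 1R1C structure, the explicit Euler time stepping and all parameter values are assumptions chosen for illustration; the models in [22,23] may use different network structures.

```python
def simulate_1r1c(t_in0, t_out_series, q_series, r, c, dt):
    """Explicit-Euler simulation of a single-zone 1R1C thermal model.

    Governing equation (assumed form):
        C * dT_in/dt = (T_out - T_in) / R + Q

    t_in0         initial indoor temperature [degC]
    t_out_series  outdoor temperature at each time step [degC]
    q_series      heating (+) or cooling (-) power at each time step [W]
    r             thermal resistance of the envelope [K/W]
    c             thermal capacitance of the zone [J/K]
    dt            time step [s]
    """
    temps = [t_in0]
    for t_out, q in zip(t_out_series, q_series):
        t_in = temps[-1]
        d_temp = ((t_out - t_in) / r + q) / c
        temps.append(t_in + dt * d_temp)
    return temps


# Hypothetical parameters: time constant R*C = 50,000 s (about 14 hours).
R = 0.005      # K/W
C = 1.0e7      # J/K
HOUR = 3600.0  # s

# Free-floating zone cooling from 20 degC towards a constant 0 degC outdoors.
cooling = simulate_1r1c(20.0, [0.0] * 24, [0.0] * 24, r=R, c=C, dt=HOUR)
print(round(cooling[-1], 2))
```

In a grey-box workflow the values of R and C would not be set by hand as here but estimated from measured indoor temperature data.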

In grey-box models, the model coefficients are identified from operational data using statistics and parameter identification [24–26]. However, the parameter computation process is often computationally demanding and time consuming, and developing the structure of the grey model requires expert knowledge [10,15].

This is where Black-Box Models, or purely data-driven models, are beneficial, as they are easy to build and computationally efficient [15,27–29], especially when a large amount of historical data is available to train the models. Multiple linear regression and self-regression methods were combined to predict building monthly energy consumption [30]. Fuzzy inference systems are also extensively used [31,32]. An autoregressive with exogenous inputs (ARX) model was developed to predict building load in [33]. An optimal trade-off between comfort and energy using a meta-model based on regression techniques was developed in [34]. Another simple and easy-to-implement building energy tool is the Degree Day model [35]. However, linear models are obtained around a specific working condition and hence cannot guarantee a satisfactory approximation performance under varying working environments [36]. Artificial Neural Networks (ANN) have also been extensively used in the past ten years for their outstanding ability to approximate non-linear mappings, along with online learning. The application of ANN models in the building modelling sector has mostly been towards prediction and optimisation of building energy consumption [25,37,38], cooling loads [35,39–41], temperature [10,36,42] and system identification [43–45].

System identification (SID), which is the process of developing or improving a mathematical representation of a physical system using collected data, is widely used in engineering problems, but has seen limited use in building system modelling [46,47]. Owing to the inherently different building type and function of buildings like art galleries and museums compared with the ones already studied, it would be interesting to use this successful SID approach to obtain not only energy prediction but also a prediction of future indoor conditions based on the study of historical patterns.

2.8. Conclusion: What the existing literature tells us?

References