Title: | Classification of RNA Sequences using Complex Network and Information Theory |
---|---|
Description: | Creates networks from RNA sequences and abstracts the characteristics of these networks with a maximum entropy methodology, in order to classify the sequences into their classes. Two data sets are included in the 'BASiNET' package, "mRNA" and "ncRNA", with 10 sequences. These sequences were taken from the data set used in the article (LI, Aimin; ZHANG, Junying; ZHOU, Zhongyin, 2014) <doi:10.1186/1471-2105-15-311> and are used to run the examples. |
Authors: | Murilo Montanini Breve [aut] , Matheus Henrique Pimenta-Zanon [aut] , Fabricio Martins Lopes [aut, cre] |
Maintainer: | Fabricio Martins Lopes <[email protected]> |
License: | GPL-3 |
Version: | 0.99.6 |
Built: | 2025-01-07 04:14:49 UTC |
Source: | https://github.com/cran/BASiNETEntropy |
Given two or three distinct data sets (one of mRNA, one of lncRNA and, optionally, one of sncRNA), the data is classified from the structure of the networks formed by the sequences. The networks are first filtered by an entropy methodology, and then the classification starts.
classify( mRNA, lncRNA, sncRNA = NULL, trainingResult, save_dataframe = NULL, save_model = NULL, predict_with_model = NULL )
mRNA |
Path to the .FASTA file containing the mRNA sequences |
lncRNA |
Path to the .FASTA file containing the lncRNA sequences |
sncRNA |
Path to the .FASTA file containing the sncRNA sequences (optional) |
trainingResult |
The result of the training (two or three matrices) |
save_dataframe |
When set, this parameter saves a .csv file with the features in the current directory. No file is created by default. |
save_model |
When set, this parameter saves a .rds file with the model in the current directory. No file is created by default. |
predict_with_model |
Predicts the input sequences with the previously generated model. |
Results
Murilo Montanini Breve
library(BASiNETEntropy)
# Locate the example FASTA files shipped with the package
arqSeqMRNA <- system.file("extdata", "mRNA.fasta", package = "BASiNETEntropy")
arqSeqLNCRNA <- system.file("extdata", "ncRNA.fasta", package = "BASiNETEntropy")
# Load the pre-computed training result shipped with the package
load(system.file("extdata", "trainingResult.RData", package = "BASiNETEntropy"))
r_classify <- classify(mRNA = arqSeqMRNA, lncRNA = arqSeqLNCRNA, trainingResult = trainingResult)
A function that generates an undirected graph from a biological sequence, with words as vertices. The size of each word is set by the 'word' parameter. The connections between words depend on the 'step' parameter, which indicates the next connection to be formed.
createedges(sequence, word = 3, step = 1)
sequence |
It is a vector that represents the sequence |
word |
This integer parameter decides the size of the word that will be formed |
step |
It is the integer parameter that decides the step that will be taken to make a new connection |
Returns the array used to create the edge list
Murilo Montanini Breve
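The word-and-step idea described for createedges can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper name `sketch_edges` is hypothetical, and it assumes that each word is linked to the word starting `step` positions later in the sequence.

```r
# Hypothetical helper, NOT the package's createedges():
# builds an edge list from overlapping words of a sequence.
sketch_edges <- function(sequence, word = 3, step = 1) {
  n <- length(sequence)
  # Start position of every word that fits inside the sequence
  starts <- seq_len(max(n - word + 1, 0))
  words <- vapply(
    starts,
    function(i) paste(sequence[i:(i + word - 1)], collapse = ""),
    character(1)
  )
  # Not enough words to form a single connection
  if (length(words) <= step) {
    return(matrix(character(0), ncol = 2,
                  dimnames = list(NULL, c("from", "to"))))
  }
  idx <- seq_len(length(words) - step)
  # Assumption: word i is connected to word i + step
  cbind(from = words[idx], to = words[idx + step])
}

s <- c("a", "c", "g", "u", "a", "c")
edges <- sketch_edges(s, word = 3, step = 1)
print(edges)
```

A two-column character matrix of this kind can be passed to a graph library as an edge list. Whether the package stores its edges in this layout is not stated in this documentation.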
A function that creates the feature matrix from the complex network topological measures.
creatingDataframe(measures, tamM, tamLNC, tamSNC)
measures |
The complex network topological measures |
tamM |
mRNA sequence size |
tamLNC |
lncRNA sequence size |
tamSNC |
sncRNA sequence size |
Returns the feature matrix scaled to the range 0-1
Murilo Montanini Breve
A function that from the entropy measures and threshold creates an entropy curve.
curveofentropy(H, threshold)
H |
The 'training' return for the entropy measures |
threshold |
The 'training' return for the threshold |
Returns an entropy curve
Murilo Montanini Breve
A function that calculates the entropy
entropy(x)
x |
The probabilities P0 and P1 |
Returns the entropy
Murilo Montanini Breve
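As an illustration of an entropy computed from two probabilities P0 and P1, here is a minimal sketch of the Shannon entropy. The helper name `sketch_entropy` is hypothetical, and the base-2 logarithm is an assumption; the documentation does not state which base the package uses.

```r
# Hypothetical helper, NOT the package's entropy():
# Shannon entropy in bits (log base 2 is an assumption).
sketch_entropy <- function(p) {
  p <- p[p > 0]  # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

h_balanced <- sketch_entropy(c(0.5, 0.5))  # two equally likely outcomes
h_certain  <- sketch_entropy(c(1, 0))      # one certain outcome
```

The entropy is largest when P0 and P1 are equal and drops to zero when one of them is 1.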
A function that filters the edges after the maximum entropy is obtained
filtering(edgestoselect, edgestofilter)
edgestoselect |
The selected edges |
edgestofilter |
The edges used to filter |
Returns the filtered edges
Murilo Montanini Breve
A function that compares the 'trainingResult' matrices with the adjacency matrix to produce a filtered adjacency matrix.
matrixmultiplication(data, histodata)
data |
Adjacency matrix |
histodata |
'trainingResult' data |
Returns the filtered adjacency matrix
Murilo Montanini Breve
A function that calculates the maximum entropy
maxentropy(histogram)
histogram |
The histogram (used in 'training' function) |
Returns the maximum entropy
Murilo Montanini Breve
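One common way to obtain a maximum entropy from a histogram is to try every cut point, split the mass into P0 (at or below the cut) and P1 (above it), and keep the cut with the highest binary entropy. The sketch below follows that generic scheme. It is an assumption about how such a search could look, not a description of the package's procedure, and `sketch_max_entropy` is a hypothetical name.

```r
# Hypothetical helper, NOT the package's maxentropy():
# scans all cut points of a histogram and returns the one
# whose two-part split (P0, P1) has the highest entropy.
sketch_max_entropy <- function(histogram) {
  stopifnot(length(histogram) >= 2)
  p <- histogram / sum(histogram)   # normalise counts to probabilities
  cuts <- seq_len(length(p) - 1)    # candidate thresholds
  h <- vapply(cuts, function(t) {
    p0 <- sum(p[seq_len(t)])        # mass at or below the cut
    q <- c(p0, 1 - p0)
    q <- q[q > 0]                   # treat 0 * log2(0) as 0
    -sum(q * log2(q))
  }, numeric(1))
  list(threshold = cuts[which.max(h)], entropy = max(h), curve = h)
}

res <- sketch_max_entropy(c(4, 3, 2, 1))
```

The `curve` element holds one entropy value per candidate threshold, which is the kind of data an entropy curve plot would display.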
Given the results, the data is rescaled to values between 0 and 1, so that the length of the sequences does not influence the results. The rescaling of the sequences is done separately.
preprocessing(datah, tamM, tamLNC, tamSNC)
datah |
Array with the numeric results |
tamM |
Integer number of mRNA sequences |
tamLNC |
Integer number of lncRNA sequences |
tamSNC |
Integer number of sncRNA sequences |
Returns the array with the rescaled values
Murilo Montanini Breve
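Rescaling to the range 0 to 1 is commonly done with min-max normalisation. The following is a minimal sketch under that assumption. The helper `rescale01` and the example measure names are hypothetical, and how the package groups rows by tamM, tamLNC and tamSNC is not shown here.

```r
# Hypothetical helper, NOT the package's preprocessing():
# min-max rescaling of a numeric vector to the range [0, 1].
rescale01 <- function(x) {
  rng <- range(x)
  # A constant vector has no spread; map it to 0 to avoid dividing by zero
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Illustrative measures for three sequences (made-up values)
m <- cbind(degree = c(2, 4, 6), betweenness = c(10, 10, 30))
scaled <- apply(m, 2, rescale01)  # rescale each column on its own
print(scaled)
```

Rescaling each column independently keeps measures with large raw values from dominating measures with small ones.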
A function that selects the edges of the adjacency matrix
selectingEdges(MAX, data)
MAX |
The maximum entropy |
data |
The adjacency matrix |
Returns the selected edges of the adjacency matrix
Murilo Montanini Breve
A function that trains the algorithm to select the edges that maximize the entropy
training(mRNA, lncRNA, sncRNA = NULL)
mRNA |
Path to the .FASTA file containing the mRNA sequences |
lncRNA |
Path to the .FASTA file containing the lncRNA sequences |
sncRNA |
Path to the .FASTA file containing the sncRNA sequences (optional) |
Returns the edge lists and the 'curveofentropy' function inputs
Murilo Montanini Breve