, Chi of E. coli can be considered over-represented from 99 occurrences onward for a significance degree s of 0.0001. Because the Chen-Stein bound equals 0.067726, which exceeds both thresholds, the Chen-Stein method does not allow any conclusion for significance degrees of 0.01 and 0.001. Moreover, Chi is well known to be a highly relevant word in this bacterium. We therefore expect a very small
