A Bridge Between Natural Language- And Ontology-based Biomedical Resources: The Examples of Biomedical Knowledge Expressed by SLHMR

Hanfei Bao, BMKI Lab, Toronto, Canada. Mail: hanfeib@gmail.com

I. The Structured Language Readable To Human And Machine (SLHMR)

The Structured Language Readable To Human And Machine (SLHMR) is a formatted language that the author intends for expressing biomedical knowledge in a way that is both formatted (and therefore machine-readable) and still close to natural text. Resources expressed in SLHMR (SLHMRRs) serve as a bridge between traditional natural-text biomedical resources (NTBMRs) and ontology-based formatted biomedical resources (OFBMRs). The philosophical basis for SLHMR is the view that NTBMRs are usually only human-readable (rather than readable by machines or computer programs), whereas OFBMRs are generally only machine-readable. Because they retain features of natural language, resources expressed in SLHMR should help human readers understand the corresponding OFBMRs. They are also expected to be more expressive than ontology techniques, especially for knowledge about super-complex systems such as molecular biomedicine, for example in the HIV/AIDS area. Additionally, SLHMR is intended to be readable by machines through the development of powerful applications (see Fig. 19-1).

Fig. 19-1

II. The Examples of Knowledge About HIV/AIDS Expressed By SLHMR

1. (LTR–or–U3–or–R–or–U5)—isA—SequenceOfViralDNA
2. (U3–or–R–or–U5)—isA—RegionOfViralDNA
3. LTR—physicallyContains—(U3–and–R–and–U5)
4. LTR—isSubdividedInto—(U3–and–R–and–U5)
5. LTR—physicallyContains—(Enhancer–and–Promoter–and–Cleavage/Poly(A))
6. U3—isEncodedBy—ViralRNA-SequencesUniquelyAt3′EndOfGenome
7. U5—isEncodedBy—ViralRNA-SequencesUniquelyAt5′EndOfGenome
8. R—isEncodedBy—RepeatedSequenceOfViralRNAAtEitherEndOfGenome
9. SiteOf(CleavageAlsoCalledPolyadenylation)—isLocatedAt—R
10. (Promoter–or–Enhancer)—isLocatedAt—U3
11. (Enhancer–or–Promoter–or–SiteOf(CleavageAlsoCalledPolyadenylation))—isA—TranscriptionalSignal
12. Transcription—isInitiatedAt—BoundaryBetweenU3AndR
13. CleavageAlsoCalledPolyadenylation—takesPlaceAt—BoundaryBetweenU5AndR
14. BoundaryBetweenU3AndR—isLocatedAt—UpstreamLTR
15. BoundaryBetweenU5AndR—isLocatedAt—DownstreamLTR
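
The statements above follow a regular pattern: a long dash separates subject, relation and object, and a parenthesized group joins several concepts with –or– or –and–. As a rough illustration of how a program might read them, the following Python sketch splits a statement at its top-level long dashes and expands a group into one plain triple per member. The function names (split_top_level, parse_term, expand) are hypothetical, and reading a group as distributing over the relation is an assumption of this sketch, not a rule stated by SLHMR.

```python
EM_DASH = "\u2014"  # separates subject, relation and object
EN_DASH = "\u2013"  # wraps the logical operators, e.g. –or–, –and–


def split_top_level(text, separator):
    """Split text on separator, ignoring separators nested inside parentheses."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and text.startswith(separator, i):
            parts.append(text[start:i].strip())
            i += len(separator)
            start = i
            continue
        i += 1
    parts.append(text[start:].strip())
    return parts


def parse_term(term):
    """Return (operator, members); operator is None for a single concept."""
    if term.startswith("(") and term.endswith(")"):
        inner = term[1:-1]
        for op in ("or", "and"):
            members = split_top_level(inner, f"{EN_DASH}{op}{EN_DASH}")
            if len(members) > 1:
                return op, members
    return None, [term]


def expand(statement):
    """Expand one statement into plain (subject, relation, object) triples.

    Assumption of this sketch: a group distributes over the relation, so the
    or/and distinction is dropped. Raises ValueError if the statement does
    not have exactly three top-level parts.
    """
    subject, relation, obj = split_top_level(statement, EM_DASH)
    _, subjects = parse_term(subject)
    _, objects = parse_term(obj)
    return [(s, relation, o) for s in subjects for o in objects]


if __name__ == "__main__":
    # Statement 2 from the list above.
    for triple in expand("(U3–or–R–or–U5)—isA—RegionOfViralDNA"):
        print(triple)
```

Names that carry their own parentheses, such as SiteOf(CleavageAlsoCalledPolyadenylation), are left intact because splitting only happens outside parentheses.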

 

III. The Examples of Relations Drawn from Particular References, Their Composition, and the Granularity Evolution of Those Relations

From refs. 2, 3 and 4 we can draw the following set of physical relations:

16. RNA-PolymeraseII—regulatesPositively—(TranscriptionOfDNA-ToSynthesizePrecursorsOf-mRNA–or–TranscriptionOfDNA-ToSynthesizePrecursorsOf-snRNA–or–TranscriptionOfDNA-ToSynthesizePrecursorsOf-microRNA)

17. PhosphorylationOfCTD-OfLargerSubUnitOfRNA-PolymeraseII—regulatesPositively—ActivityOfRNA-PolymeraseII

18. The compound form of relations 16 and 17 would be:
(RNA-PolymeraseII—regulatesPositively—(TranscriptionOfDNA-ToSynthesizePrecursorsOf-mRNA–or–TranscriptionOfDNA-ToSynthesizePrecursorsOf-snRNA–or–TranscriptionOfDNA-ToSynthesizePrecursorsOf-microRNA))—byMeansOf—PhosphorylationOfCTD-OfLargerSubUnitOfRNA-PolymeraseII

19. A more detailed form of 18 would be:
(RNA-PolymeraseII—regulatesPositively—(TranscriptionOfDNA-ToSynthesizePrecursorsOf-mRNA–or–(TranscriptionOfDNA-ToSynthesizePrecursorsOf-snRNA—hasDescrPanWeightOfProportionOfPopulation—MostOfPopulation)–or–(TranscriptionOfDNA-ToSynthesizePrecursorsOf-microRNA—hasDescrPanWeightOfProportionOfPopulation—MostOfPopulation)))—byMeansOf—PhosphorylationOfCTD-OfLargerSubUnitOfRNA-PolymeraseII

20. Coarseness or granularity evolution here means the process of turning a detailed relation into a less detailed, coarser-grained form, or even into a black-box-like form. For 19, the steps are as follows:
(RNA-PolymeraseII—regulatesPositively—(TranscriptionOfDNA-ToSynthesizePrecursorsOf-mRNA–or–(TranscriptionOfDNA-ToSynthesizePrecursorsOf-snRNA—hasDescrPanWeightOfProportionOfPopulation—MostOfPopulation)(#st1)–or–(TranscriptionOfDNA-ToSynthesizePrecursorsOf-microRNA—hasDescrPanWeightOfProportionOfPopulation—MostOfPopulation)(#st2)))—byMeansOf—PhosphorylationOfCTD-OfLargerSubUnitOfRNA-PolymeraseII
—>
(RNA-PolymeraseII—regulatesPositively—(TranscriptionOfDNA-ToSynthesizePrecursorsOf-mRNA–or–(#st1)–or–(#st2))(#st3))—byMeansOf—PhosphorylationOfCTD-OfLargerSubUnitOfRNA-PolymeraseII
—>
(RNA-PolymeraseII—regulatesPositively—(#st3))(#st4)—byMeansOf—PhosphorylationOfCTD-OfLargerSubUnitOfRNA-PolymeraseII
—>
(#st4)—byMeansOf—PhosphorylationOfCTD-OfLargerSubUnitOfRNA-PolymeraseII

Thus the final result of the granularity evolution is a “simple” binary relation.
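
The collapsing steps in item 20 can be imitated mechanically. The following Python sketch is only an illustration under stated assumptions: it treats every innermost pair of parentheses as one sub-relation, replaces it with a fresh label, and repeats until no parentheses remain. The function names (coarsen_step, coarsen) are hypothetical, the labels are written as #st1 rather than (#st1), and concept names that contain their own parentheses, such as Poly(A), are not handled.

```python
import re

# An innermost group: parentheses that contain no further parentheses.
INNERMOST = re.compile(r"\(([^()]*)\)")


def coarsen_step(expression, labels):
    """Replace every innermost parenthesized group with a fresh #stN label."""
    def substitute(match):
        label = f"#st{len(labels) + 1}"
        labels[label] = match.group(1)  # remember what the label stands for
        return label
    return INNERMOST.sub(substitute, expression)


def coarsen(expression):
    """Return the successively coarser forms of expression and the label table."""
    labels, steps = {}, [expression]
    while INNERMOST.search(steps[-1]):
        steps.append(coarsen_step(steps[-1], labels))
    return steps, labels


if __name__ == "__main__":
    # Relation 19 with abbreviated concept names, for readability only.
    steps, labels = coarsen(
        "(RNA-PolymeraseII—regulatesPositively—"
        "(mRNA–or–(snRNA—hasWeight—Most)–or–(microRNA—hasWeight—Most)))"
        "—byMeansOf—PhosphorylationOfCTD"
    )
    for step in steps:
        print(step)
    for label, content in labels.items():
        print(label, "=", content)
```

With this abbreviated input the sketch collapses the relation in three passes and ends with a binary relation whose subject is the single label #st4, in the same order as the steps listed for 19.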

IV. The Network-like Diagram For The Physical Relations Above And Some Necessary Logic Relations

The data file RNAPolII.sif, in the format required by Cytoscape, contains the complete set of physical and logical relations for the network-like diagram, as follows:
Column1 Column2 Column3
st14 byMeansOf PhosphorylationOfCTD-OfLargerSubUnitOfRNA-PolymeraseII
RNA-PolymeraseII regulatesPositively TranscriptionOfDNA-ToSynthesizePrecursorsOf-mRNA
RNA-PolymeraseII regulatesPositively st11
RNA-PolymeraseII regulatesPositively st12
st14 represents RNA-PolymeraseII—regulatesPositively—st13
st13 represents RNA-PolymeraseII—regulatesPositively—TranscriptionOfDNA-ToSynthesizePrecursorsOf-mRNA
st13 represents RNA-PolymeraseII—regulatesPositively—st11
st13 represents RNA-PolymeraseII—regulatesPositively—st12
st12 represents TranscriptionOfDNA-ToSynthesizePrecursorsOf-microRNA—hasDescrPanWeightOfProportionOfPopulation—MostOfPopulation
st11 represents TranscriptionOfDNA-ToSynthesizePrecursorsOf-snRNA—hasDescrPanWeightOfProportionOfPopulation—MostOfPopulation
st12 hasSubjectOfRelation TranscriptionOfDNA-ToSynthesizePrecursorsOf-microRNA
st11 hasSubjectOfRelation TranscriptionOfDNA-ToSynthesizePrecursorsOf-snRNA
st13 hasObjectOfRelation st11
st13 hasObjectOfRelation st12
st13 hasObjectOfRelation TranscriptionOfDNA-ToSynthesizePrecursorsOf-mRNA
st14 hasObjectOfRelation st13

The network diagram produced by Cytoscape can be seen in Fig. 19-2.

Fig. 19-2. The diagram made by Cytoscape.
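
The three-column listing in this section can also be inspected outside Cytoscape. The following Python sketch is a minimal illustration, assuming whitespace-separated lines of the form "source interaction target" as in the listing and treating the Column1 Column2 Column3 row as a header to skip. The function names (parse_sif, edges_by_interaction) are hypothetical and are not part of Cytoscape.

```python
from collections import defaultdict


def parse_sif(text, header=("Column1", "Column2", "Column3")):
    """Parse 'source interaction target' lines into a list of edge tuples."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or tuple(fields) == header:
            continue  # skip blank lines, malformed lines and the header row
        edges.append(tuple(fields))
    return edges


def edges_by_interaction(edges):
    """Group (source, target) pairs under their interaction name."""
    grouped = defaultdict(list)
    for source, interaction, target in edges:
        grouped[interaction].append((source, target))
    return dict(grouped)


if __name__ == "__main__":
    # A few lines copied from the listing above.
    sample = """Column1 Column2 Column3
st14 byMeansOf PhosphorylationOfCTD-OfLargerSubUnitOfRNA-PolymeraseII
RNA-PolymeraseII regulatesPositively st11
RNA-PolymeraseII regulatesPositively st12
st13 hasObjectOfRelation st11
"""
    for interaction, pairs in edges_by_interaction(parse_sif(sample)).items():
        print(interaction, pairs)
```

Grouping by interaction makes it easy to separate the physical relations (regulatesPositively, byMeansOf) from the logical ones (represents, hasSubjectOfRelation, hasObjectOfRelation).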

V. Realization Of Ontology Based On the Resources Expressed By SLHMR

From references 5 and 6 we obtain the following SLHMR-expressed resources, on the basis of which the author has built the corresponding ontology HIVGenetics00*.owl, which will be published as an open medical machine-reading resource:

(1) LTR-FullNmLongTerminalRepeatOfHIV1ProViralDNA—isA—SequenceOfDNA
(2) LTR-FullNmLongTerminalRepeatOfHIV1ProViralDNA—hasNumberOfBasePair—some 634 (see Fig. 19-3)

Fig. 19-3

(3) LTR-FullNmLongTerminalRepeatOfHIV1ProViralDNA—isLocatedInOrAt—EitherEndRegionOfHIV1ProviralDNA
(4) LTR-FullNmLongTerminalRepeatOfHIV1ProViralDNA—hasPhysicalComponentOrPhysicallyContains—(U3RegionFullNmUnique3’Sequence–And–R-RegionFullNmRepeatedSequence–And–U5FullNmUnique5’Sequence) (see Fig. 19-4)

Fig. 19-4

(5) U3RegionFullNmUnique3’Sequence—hasNumberOfBasePair—some 450
(6) U3RegionFullNmUnique3’Sequence—physicallyAndFunctionallyContains—CisActingDNA-Element
(7) CisActingDNA-Element—isA—SiteBindingCellularTranscriptionFactor (see Fig. 19-5)

Fig. 19-5

(8) R-RegionFullNmRepeatedSequence—hasNumberOfBasePair—some 100
(9) Transcription—startsAtLocation—FirstBaseOfR-RegionFullNmRepeatedSequence (see Fig. 19-6)

Fig. 19-6

(10) Polyadenylation—startsAtLocation—ImmediatelyAfterLastBaseOfR-RegionFullNmRepeatedSequence (see Fig. 19-7)

Fig. 19-7

(11) U5RegionFullNmUnique5’Sequence—hasNumberOfBasePair—some 180
(12) U5RegionFullNmUnique5’Sequence—binds—Tat (see Fig. 19-8)
(13) U5RegionFullNmUnique5’Sequence—binds—PackagingSequenceOfHIV (see Fig. 19-8)

Fig. 19-8

References

1. Warner C. Greene and B. Matija Peterlin: Molecular Insights Into HIV Biology. http://hivinsite.ucsf.edu/InSite?page=kb-00&doc=kb-02-01-01

2. Nguyen V.T., Kiss T., Michels A.A., Bensaude O.: 7SK small nuclear RNA binds to and inhibits the activity of CDK9/cyclin T complexes. http://www.uniprot.org/citations/11713533

3. https://en.wikipedia.org/wiki/RNA_polymerase_II

4. Steven Hahn: Structure and mechanism of the RNA Polymerase II transcription machinery. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1189732/

5. Hope, PhD, Didier Trono, MD: Structure, Expression, and Regulation of the HIV Genome. http://hivinsite.ucsf.edu/InSite?page=kb-02-01-02#S3.1X

6. https://en.wikipedia.org/wiki/Long_terminal_repeat

(Related sites for ontologies and discussions:

https://bioportal.bioontology.org/ontologies/HIVO004
http://bioportal.bioontology.org/ontologies/HCODONONT/
http://bioportal.bioontology.org/ontologies/AIDSCLIN
http://blog.51.ca/u-345129/
http://wp.miforum.net/baohanfei/)

(All rights reserved) (Updates: 2017-01-18, 2017-01-19, 2017-01-24, 2017-01-26, 2017-02-10, 2017-02-11)

To Make Biomedical Knowledge Storehouses Into Medical Knowledge Systems: The Goals of The Ontology Sciences In HIV/AIDS Areas

The Preface Of A Pilot Project

Hanfei Bao, BMKI Lab, Toronto, Canada

Mail: hanfeib@gmail.com

 

One of the big challenges that humanity faces in the development of biomedicine is how to better understand, integrate and use the countless and ever-growing biomedical data, information and knowledge, especially in a precise, biomedical-engineering way.

We would prefer to regard the mass of biomedical knowledge resources (BMKRs) as complete systems that are precisely organized and tightly connected. Unfortunately, that is not the case. The BMKRs of today are essentially fragmentary in terms of their cognitive and operative features. They are well-classified knowledge storehouses rather than well-connected systems. Apart from biomedical sciences such as anatomy or praxiology, whose knowledge is obtained through ordinary sense organs like the eyes (without macro- or micro-instruments), most of our BMKRs, especially in molecular biomedicine, are made up of countless pieces of knowledge that are, in most cases, isolated from each other and poorly connected.

Secondly, our BMKRs are usually only partly known. That is why, in the language of systems theory, they are described figuratively as grey boxes rather than white boxes. This is particularly true of the biomedical sciences at the level of molecular biology.

Thirdly, BMKRs are predominantly phenomenological in their cognitive nature. That is, the knowledge or information is usually observable or measurable but hardly understandable. We generally know what has happened, but we don’t know why it happens or how the molecular mechanisms make it happen, and therefore such knowledge usually lacks the capacity for precise reasoning and calculation. In other words, if you intend to use this knowledge in your health practice, you had better retest it against your target before any real clinical use. This is, in fact, the approach frequently taken in current clinical practice.

That is why we cannot limit our efforts to mining more and more new biomedical information and knowledge and expressing or accumulating it only in the traditional way, i.e. as free text or natural language (NL).

One of the goals of Biomedical Informatics (BMI) and Biomedical Ontology Sciences (BMOS) is to help improve the above situation through the formatted organization and expression of biomedical knowledge and the development of powerful applications that operate on it.

(All rights reserved)

(Updates: 2017-01-11)