Components

The basic unit is a document, which makes sense by itself but is often part of a corpus. Documents or document components can have uni- or bi-directional links to other documents or document components. Links can be typed. Documents are given meaning by term-based dictionaries, which also organize the semantics and ontology.

Documents

These contain: **nested sections**, paragraphs/sentences, lists, tables, boxes/floats, footnotes, figures, images. (We do not currently support audio, video, or interactive code.)

Semantic hierarchy of documents

This is a hierarchy of semanticity from low to high.

Images

We only work with born-digital images of formatted text (e.g. dumps of pages). Photographs and handwriting are too complex and unreliable. We use AMI: has used Optical Character Recognition (OCR) successfully (see

PDF has only 3 components: characters, curves, and bitmaps. There are no words, sentences, paragraphs, lists, footnotes, floating boxes or tables. These can often be reconstructed by heuristics (or ML), but this is not guaranteed. Spaces are sometimes included, and the character stream is often in reading order, but sometimes we need geometry: inter-character distances are used to compute spaces and words. Similarly, paragraphs and lists depend on line spacing. AMI: uses
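As an illustration of the geometric step described for PDF (using inter-character distances to compute spaces and words), here is a minimal sketch. It is not AMI's actual code: the `Glyph` type, the `glyphs_to_words` function and the `gap_factor` threshold are all hypothetical names and values chosen for this example, and it assumes the glyphs of a single text line have already been extracted with their positions and widths.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Glyph:
    """One positioned character (hypothetical type, not a real PDF library class)."""
    char: str
    x: float      # left edge of the glyph
    width: float  # advance width of the glyph


def glyphs_to_words(glyphs: List[Glyph], gap_factor: float = 0.3) -> List[str]:
    """Split one line of glyphs into words wherever the horizontal gap is large.

    gap_factor is an assumed tuning value: a gap wider than gap_factor times
    the mean glyph width is treated as a word boundary.
    """
    if not glyphs:
        return []
    # Sort by position, because the character stream is not always in reading order.
    ordered = sorted(glyphs, key=lambda g: g.x)
    mean_width = sum(g.width for g in ordered) / len(ordered)
    threshold = gap_factor * mean_width

    words = [ordered[0].char]
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.x - (prev.x + prev.width)
        if gap > threshold:
            words.append(cur.char)   # gap is wide enough: start a new word
        else:
            words[-1] += cur.char    # otherwise continue the current word
    return words


line = [
    Glyph("P", 0.0, 6.0), Glyph("D", 6.1, 6.0), Glyph("F", 12.2, 6.0),
    Glyph("h", 22.0, 6.0), Glyph("a", 28.1, 6.0), Glyph("s", 34.2, 6.0),
]
print(glyphs_to_words(line))
```

The same idea could plausibly be applied vertically, comparing line spacing against a threshold to separate paragraphs and list items. In practice the threshold would need tuning per font and per document, which is one reason the reconstruction is not guaranteed.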
Catalogue: The Things We Have

This list, in no particular order, covers things we have partly or completely developed that are Open for sharing. They may or may not be actively supported.
The Tech We Want

A list of the tech we now need. We may have made starts on some of this, and there may be tools that we don't know about. Documents and software must be Open. Software should ideally be in Python.
People and Activities we Want and Need

We need to share experience and are keen to share our own. We need a community which understands the needs of newcomers, and we use synchronous and asynchronous tools to manage communication.

We avoid email except for formal management (e.g. induction, internship details) and contacting the alumni community. Similarly, we try to use Markdown on GitHub rather than Google Docs.

Induction and Learning by Doing

We have no formal entry requirements and have successfully inducted school students and undergraduates with no IT experience. We have a number of set-up tasks which are essential. We give help when we can, but some tasks are very platform- or configuration-dependent.
A high-level overview of everything we want to set out for the TTWW presentation.
OUR CATALOGUE of everything we have and have done