Author Topic: AbiWord - document (template) structure (schema, description)  (Read 975 times)

Offline Pureproft

  • Long-time member
  • **
  • Posts: 329
    • Email
    In my opinion, this is an underrated tool on every platform for templating reports (much like rtf, html, ...), in the broad sense of the word.
    I have long planned to remedy that, but here's the snag: I can't find a description of the abw structure anywhere. I'd rather not have to research this myself. Maybe I'm just searching badly (ineptly)?

The dawn will inevitably come!

Offline Skull

  • Global Moderator
  • *****
  • Posts: 19,926
    • Home page
    • Email
Андрей Черепанов (cas@)

Offline Pureproft

  • Long-time member
  • **
  • Posts: 329
    • Email
Thanks.
Spoiler
It is my intention to keep this document in sync with the current state of the code.  Thus, this document does not yet describe the full feature set planned for version 1.0.  This document is current with version 0.7.14-2 of the code.


AbiWord Document Format

Version 1.0

Copyright (C) 1999-2000 AbiSource, Inc., All Rights Reserved.

Jeff Hostetler

jeff@abisource.com

AbiSource, Inc.


1. Introduction


This document describes the AbiWord file format used to represent AbiWord native documents. This document describes file format version 1.0.


AbiWord uses XML[1] to represent a document. This does not imply that AbiWord is an XML editor; rather, AbiWord is a word processor that just happens to use XML as a convenient syntax for representing documents. AbiWord contains a very strict and unforgiving importer; it requires well-formed XML in strict adherence to the format specified by the code.  This code is primarily located in ie_exp_AbiWord_1.cpp and ie_imp_AbiWord_1.cpp. AbiWord has a DTD[2], but it should not be taken as definitive.  Our primary goal is to support documents written by AbiWord rather than hand-written XML.


AbiWord also uses some of the syntax and conventions from CSS2[3] to represent certain concepts, such as character formatting. CSS2 was designed as a style mechanism for WWW documents and not as a style mechanism for page-oriented documents. We used CSS2 as a guideline, taking parts that were of use and inventing our own mechanism as necessary.


2. Document Structure


The AbiWord file format is a 7-bit-clean ASCII XML file. Non-US-ASCII characters are represented using standard XML numeric character references (e.g., "&#255;" or "&#xFF;" for ÿ).
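
The 7-bit convention above can be sketched in Python. This is only an illustration, not AbiWord's own code: the helper name to_abw_text is hypothetical, and it relies on Python's standard "xmlcharrefreplace" error handler, which emits decimal references.

```python
from xml.sax.saxutils import escape

def to_abw_text(text: str) -> str:
    """Return 7-bit ASCII text: XML markup characters are escaped and
    non-ASCII characters become decimal numeric references."""
    # Escape first, so the '&' of the generated references is not re-escaped.
    escaped = escape(text)  # &, <, > become &amp;, &lt;, &gt;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_abw_text("na\u00efve <caf\u00e9>"))
```

Escaping markup before converting to ASCII is deliberate: doing it the other way round would turn each "&#...;" into "&amp;#...;".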


The following illustrates the basic form of an AbiWord file:


<?xml version="1.0"?>
<abiword version="0.7.8">
 <section>
  <p props="text-align:center">Hello World.</p>
  <p>This is a test paragraph.</p>
  <p>This word is
   <c props="font-weight:bold">bold</c>.</p>
 </section>
 <section props="column-gap:0.25in; columns:2">
  <p>This section <image dataid="foo"/>has two columns.</p>
 </section>
 <data>
  <d name="foo">
   XXXXXXXXXX...
  </d>
 </data>
</abiword>


In general, a tag has a set of attributes; in the above example, the d tag has the attribute name with value foo.  One special attribute is props, which contains a semicolon-separated list of properties for its tag.
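
A props value can be split into individual properties with a few lines of Python. This is a minimal sketch under the assumption that property names and values never contain ";" or ":" themselves; the function names parse_props and format_props are hypothetical and not part of AbiWord.

```python
def parse_props(props: str) -> dict[str, str]:
    """Split a props value such as 'column-gap:0.25in; columns:2'
    into a {name: value} dictionary."""
    result = {}
    for item in props.split(";"):
        if not item.strip():
            continue  # tolerate a trailing semicolon or an empty string
        name, _, value = item.partition(":")
        result[name.strip()] = value.strip()
    return result

def format_props(props: dict[str, str]) -> str:
    """Inverse of parse_props: join name:value pairs with '; '."""
    return "; ".join(f"{name}:{value}" for name, value in props.items())

print(parse_props("column-gap:0.25in; columns:2"))
```

Keeping the values as strings avoids guessing at units; a caller that needs numbers can convert "0.25in" itself.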


2.1.  <abiword>...</abiword>


The entire content of the document is contained within this pair of tags. Within these tags are a series of sections and an optional data block.


The <abiword> tag has two possible attributes, both of which should always be present.


 version = number | "unnumbered" | ""


This specifies the version of AbiWord that created the document.  The empty string and "unnumbered" are equivalent.


 fileformat = number


This represents the file format version.  This must be the string "1.0".
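
For report generation, a document skeleton with both attributes can be produced with Python's standard ElementTree module. This is a sketch, not a verified recipe: the function name build_document is hypothetical, and whether AbiWord's strict importer accepts this exact output has not been checked here.

```python
import xml.etree.ElementTree as ET

def build_document(paragraphs):
    """Build a minimal <abiword> tree: one section, one <p> per string."""
    root = ET.Element("abiword", {"version": "unnumbered", "fileformat": "1.0"})
    section = ET.SubElement(root, "section")
    for text in paragraphs:
        ET.SubElement(section, "p").text = text
    return root

root = build_document(["Hello World.", "This is a test paragraph."])
# With encoding="us-ascii", ElementTree writes non-ASCII characters
# as numeric character references, which keeps the output 7-bit clean.
xml_bytes = ET.tostring(root, encoding="us-ascii")
print(xml_bytes.decode("ascii"))
```

The value "unnumbered" is used for version because the generating program is not a numbered AbiWord release.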

 


2.2.  <section>...</section>


These tags delimit a section; see src/text/fmt/xp/fl_SectionLayout.cpp. A section is a portion of a document that has some common characteristic, such as its column layout. A section does not necessarily correspond to anything in the actual content, such as a chapter.  Every document must contain at least one section. 


Section-level attributes are:


id: unique-section-id

 type: footer | header | doc

header: section-id for current section’s header.

footer: section-id


The following are section-level properties and may appear in the value of the props attribute of the <section> tag:


columns: integer-number-of-columns ;

column-gap: dimensioned-distance ;

column-line: on | off;

section-space-after: dimensioned-distance;

page-margin-top: distance from top of page to text;

page-margin-left: ...;

page-margin-right: ...;

page-margin-bottom: distance from last line of text to bottom margin;

page-margin-header: distance from top of page to header;

page-margin-footer: distance from bottom margin to footer;

background-color: RRGGBB colour | transparent;


2.2.1.  <p>...</p>


These tags represent a block (or paragraph); see src/text/fmt/xp/fl_BlockLayout.cpp. Also notice that all but the last 7 of the paragraph properties are listed in src/wp/ap/xp/ap_Dialog_Styles.cpp. All document content, including text, must appear within a block. Blocks may not be nested (at the current time). All paragraph formatting options appear as attributes of this tag.  A section must contain at least one paragraph.


The following attributes may be attached to the p tag:


id: unique-p-id;

level: integer;

style: style-name;

listid: makes this para an element of the list with given id;

parentid: listid of this para’s parent list;

props: list of properties


The following are block-level properties and may appear in the value of the props attribute:


text-align: left | center | right | justify ;

line-height: dimensioned-distance;

margin-top: dimensioned-distance;

margin-left: ...;

margin-right: ...;

margin-bottom: ...;

text-indent: dimensioned-distance;

tabstops: tab specification;

field-font: font-name;

field-color: RRGGBB | transparent;

start-value: integer;

list-delim: string, where %L stands for the list counter;

list-decimal: string;

orphans: integer (# of lines that constitute an orphan);

widows: integer;

keep-together: yes | no;

keep-with-next: yes | no;

default-tab-interval: dimensioned-distance;

lang: ...;

dom-dir: rtl | ltr;


A tab specification is a comma-delimited list of tab stops.  A tab stop is specified with a position, followed by a slash (/), an alignment value (LRCDB), and a leader value; for instance, 0.95in/C1 is a centered tab at 0.95 inches, with type 1 leader (dots).
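
The tab specification described above could be parsed as follows. This is a sketch based only on the description in this section; the names TabStop and parse_tabstops are hypothetical, and treating the leader value as optional is an assumption.

```python
from typing import NamedTuple

class TabStop(NamedTuple):
    position: str   # dimensioned distance, e.g. "0.95in"
    alignment: str  # one of L, R, C, D, B
    leader: str     # leader code, e.g. "1" for dots; "" if absent (assumption)

def parse_tabstops(spec: str) -> list[TabStop]:
    """Parse a comma-delimited tabstops value into TabStop tuples."""
    stops = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # skip empty entries such as a trailing comma
        position, sep, kind = item.partition("/")
        if not sep or not kind or kind[0] not in "LRCDB":
            raise ValueError(f"malformed tab stop: {item!r}")
        stops.append(TabStop(position, kind[0], kind[1:]))
    return stops

print(parse_tabstops("0.95in/C1,2in/R0"))
```

For the example from the text, "0.95in/C1" yields position "0.95in", alignment "C" and leader "1".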


2.2.2.  <c>...</c>


These tags delimit an in-line span format; see src/text/fmt/xp/fp_Run.cpp. All but the last 3 attributes are also listed in ap_Dialog_Styles.cpp. These tags are used to apply a span-level format change within a block. For example, we use a c tag to make a word italic within a paragraph.


Spans may be nested, but this should be thought of as a convenience for document translators; AbiWord will flatten these during import and will write them out flattened on export.
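
A translator that has to cope with nested spans can flatten them before further processing. The sketch below is not AbiWord's import code; it assumes that an inner span's properties override those of the outer span, and the names parse_props and flatten are hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_props(props):
    """Turn 'a:b; c:d' into {'a': 'b', 'c': 'd'}."""
    pairs = (item.partition(":") for item in props.split(";") if item.strip())
    return {name.strip(): value.strip() for name, _, value in pairs}

def flatten(elem, inherited=None):
    """Yield (text, props) runs for a <p> or <c> element; nested <c> spans
    are merged so that every run carries one flat props dictionary."""
    props = dict(inherited or {})
    if elem.tag == "c":
        props.update(parse_props(elem.get("props", "")))  # inner wins (assumption)
    if elem.text:
        yield elem.text, props
    for child in elem:
        if child.tag == "c":
            yield from flatten(child, props)
        # Other children (image, field, br, ...) are skipped in this sketch.
        if child.tail:
            yield child.tail, props

p = ET.fromstring(
    '<p>A <c props="font-weight:bold">bold <c props="font-style:italic">'
    'and italic</c> word</c>.</p>'
)
for text, props in flatten(p):
    print(repr(text), props)
```

Each run can then be written back as a single, non-nested c element, or as bare text when its props dictionary is empty.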


A set of start and end c tags delimit a span of document text. These are only necessary to change a style from the settings inherited from the block. Text need not be enclosed in c tags.


The following attributes belong to the c tag:


style: style-name

props: list of properties

type: "list-label"


The following are span-level properties and may appear in the value of the props attribute:


color: RRGGBB | transparent;

bgcolor: RRGGBB | transparent;

font-family: font-name;

font-style: ...;

font-variant: ...;

text-decoration: [underline | line-through | overline | topline | bottomline]*;

font-weight: bold | normal;

font-stretch: ...;

font-size: floating-point;

text-position: [] | superscript | subscript;

dir: ltr | rtl | ntrl;

dir-override: ltr | rtl | ntrl;


2.2.3.  <image/>


This tag defines an in-line image reference.


The <image/> tag has the following attributes:


dataid = data-item-name

props = formatting specification


The dataid is a reference to a named Data Item in the Data section of the file.
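
Since the importer is strict, a generator may want to check that every dataid actually resolves before writing the file. A possible check, assuming the element names carry no XML namespace (as in the example above); the function name missing_data_items is hypothetical.

```python
import xml.etree.ElementTree as ET

def missing_data_items(root: ET.Element) -> set[str]:
    """Return dataid values used by <image/> tags that have no
    matching <d name="..."> element anywhere in the document."""
    defined = {d.get("name") for d in root.iter("d")}
    used = {img.get("dataid") for img in root.iter("image")}
    return {dataid for dataid in used - defined if dataid is not None}

doc = ET.fromstring(
    '<abiword version="0.7.8"><section>'
    '<p><image dataid="foo"/><image dataid="bar"/></p>'
    '</section><data><d name="foo">AAAA</d></data></abiword>'
)
print(missing_data_items(doc))
```

In this sample only "foo" is defined, so "bar" is reported as missing.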


The <image/> tag has the following properties that may appear in the value of the props attribute:


width: ...;

height: ...;


2.2.4.  <br/>


This tag defines a forced line-break.  This tag has no attributes or properties.


2.2.5.  <cbr/>


This tag defines a forced column-break.  This tag has no attributes or properties.


2.2.6.  <pbr/>


This tag defines a forced page-break.  This tag has no attributes or properties.


2.2.7.  <field/>


This tag defines an in-line computed field reference.


This tag has the following attribute:


 type: list-label | time | page_number | page_count |

  date | date_mmddyy | date_ddmmyy | date_mdy |

  date_mthdy | date_dfl | date_ntdfl | date_wkday |

  date_doy | time_miltime | time_ampm | time_zone |

  time_epoch | word_count | char_count |

  line_count | nbsp_count | file_name | app_ver |

  app_id | app_options | app_target |

  app_compiledate | app_compiletime


2.3.  <data>...</data>


These tags delimit a series of one or more data items.


2.3.1.  <d>...</d>


These tags define a Data Item -- an opaque blob of Base64-Encoded data.  Encoded content may be broken up on multiple lines as in MIME.  For images, these are Base64-encoded PNG objects.  Other types of objects may be defined in the future.


The <d> tag has the following attributes:


name=unique-data-item-name


The name attribute provides a target for the dataid attribute of the <image/> tag and other tags.
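
A Data Item as described in this section can be produced and read back with Python's base64 module. This is a sketch: the function names are hypothetical, the payload is a stand-in rather than a real PNG image, and the 76-character line width is simply what base64.encodebytes produces.

```python
import base64
import xml.etree.ElementTree as ET

def make_data_item(name: str, payload: bytes) -> ET.Element:
    """Create a <d name="..."> element holding Base64 text that is
    wrapped at 76 characters per line, the way MIME does it."""
    item = ET.Element("d", {"name": name})
    item.text = "\n" + base64.encodebytes(payload).decode("ascii")
    return item

def read_data_item(item: ET.Element) -> bytes:
    """Decode a <d> element; the decoder discards the line breaks."""
    return base64.b64decode(item.text or "")

png_signature = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of any PNG file
item = make_data_item("foo", png_signature * 20)  # stand-in payload, not a valid image
assert read_data_item(item) == png_signature * 20
```

The element returned by make_data_item can be appended to the data element of a document tree, with its name matching the dataid of an image tag.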


2.4. <styles>...</styles>


These tags delimit a series of one or more styles. 


2.4.1. <s>...</s>


These tags define a style -- a set of formatting commands that can be applied to any p or c element in any <section>.  When applied, these specify formatting for that span. 


The <s> tag has the following attributes:


 basedon = style-name

name = unique-style-name

type = ...

props = any properties of a p or a c tag

followedby = style-name

listid = list-id

parentid = ...

level = ...

style = ...

 

[What do the last 4 of these do for us?]


2.5. <lists>...</lists>


These tags delimit a series of one or more list declarations.  The <lists> block must appear at the top level of the document.


2.5.1. <l>...</l>


These tags describe a list which occurs in the body of the document; lists are referred to by their id in p elements.


The <l> tag has the following attributes:


 id = unique identifier for list

parentid = parent of this list

type = LARGE-ENUMERATION (see src/text/fmt/xp/fl_AutoLists.h)

start-value = number

list-delim = formatting specification

list-decimal = ...


2.6. <ignoredwords>...</ignoredwords>


These tags delimit a series of zero or more words to be ignored by the spellchecker.  This tag has no attributes or properties.


2.6.1. <iw>...</iw>


Every ignored word appears in an <iw>...</iw> tag.  No attributes or properties are allowed.


3.  Example Document


The source for this document can be used as an example to study from. Additionally, the source tree contains numerous simple example documents (in abi/docs and abi/src/wp/samples). The XML source for these documents can be viewed with any text editor.


4.  References


[1] XML - http://www.w3.org/TR/REC-xml

[2] AbiWord DTD - http://www.abisource.com/awml.dtd

[3] CSS2 - http://www.w3.org/TR/REC-CSS2

[4] The AbiWord source. (See http://www.abisource.com/lxr/ for a fully cross-referenced view of the AbiWord source code.)


5.  TODO


[] Discuss white-space handling within content and around tags.


[] Discuss mapping of CR, LF, VT, HT, FF into various tags and vice versa.

The dawn will inevitably come!

Offline YYY

  • Master
  • ***
  • Posts: 5,972
    In my opinion, this is an underrated tool on every platform for templating reports (much like rtf, html, ...)

And how is it better than RTF?

RTF with pictures and tables is easy to generate... even from bash, I'd think :)

Offline Pureproft

  • Long-time member
  • **
  • Posts: 329
    • Email
I never claimed it was better....
Everything described in the documentation could be generated in assembler too ;-D whatever each person finds most convenient.
I just don't know of another widely known lightweight editor for all desktop platforms besides AbiWord. And of course it will display rtf, but I'd like to understand the native format.

P.S. I didn't consider Qt or KDE at all; somehow they're not my thing, it never worked out.
« Last edit: 14.04.2015 17:49:56 by jobless »
The dawn will inevitably come!

Offline Pureproft

  • Long-time member
  • **
  • Posts: 329
    • Email
A news item on opennet, http://www.opennet.ru/opennews/art.shtml?num=42037, led me to https://wiki.documentfoundation.org/DLP/Libraries/libabw

Curious: did they classify AbiWord as obsolete or as proprietary? ;-)
The dawn will inevitably come!