---
title: "BibTeX and CFF"
subtitle: A deterministic crosswalk between two non-equivalent citation schemas
bibliography: REFERENCES.bib
author:
  - name: Diego Hernangómez
    orcid: 0000-0001-8457-4658
    affiliations:
      - Independent Researcher
tbl-cap-location: bottom
description: >
  This article presents a crosswalk between BibTeX and the Citation File Format,
  as implemented by the cffr package.
abstract: >
  This article introduces a crosswalk between BibTeX and the Citation File
  Format (CFF) [@druskat_citation_2021], as implemented by the **cffr** package
  [@hernangomez2021]. Because the two formats differ in structure and
  expressiveness, the mapping is not bijective. Nevertheless, it provides a
  deterministic and reproducible strategy for practical interoperability across
  citation workflows.
link-citations: true
documentclass: article
toc: true
toc-depth: 3
vignette: >
  %\VignetteIndexEntry{BibTeX and CFF}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
knitr:
  opts_chunk:
    collapse: true
    comment: "#>"
---

```{r}
#| include: false
library(cffr)

# Load the table of tables
p2file <- system.file("extdata/crosswalk_tables.csv", package = "cffr")
table_master <- read.csv(p2file)
```

::: callout-important
Generative AI tools were used to assist in producing some of the material in this article.
:::

## Citation

Please cite this article using this BibTeX entry:

``` bib
@article{hernangomez2022,
    title        = {{BibTeX} and {CFF}, A deterministic crosswalk between two non-equivalent citation schemas},
    author       = {Diego Hernangómez},
    year         = 2022,
    journal      = {The {cffr} package},
    volume       = {Vignettes},
    doi          = {10.21105/joss.03900},
    url          = {https://docs.ropensci.org/cffr/articles/bibtex-cff.html}
}
```

## BibTeX and R

[BibTeX](https://en.wikipedia.org/wiki/BibTeX) is a widely used format for storing bibliographic references, originally designed in 1985 for document‑centric workflows.
It represents citations as relatively flat records with loosely constrained fields, relying on convention rather than an explicit schema.

The Citation File Format (CFF) [@druskat_citation_2021] provides a structured and extensible alternative for representing citation metadata, particularly for software and research outputs. It supports explicit typing, nested objects, and richer semantics, including contributor roles and identifiers.

In modern research workflows, both formats frequently coexist. BibTeX remains dominant in academic publishing pipelines, while CFF is increasingly adopted by infrastructure platforms such as GitHub and Zenodo.

The **cffr** package [@hernangomez2021] provides a deterministic mapping (crosswalk) between BibTeX and CFF, enabling practical interoperability. Due to fundamental differences in their data models, this mapping is not bijective and may involve structural transformations, heuristic parsing, and controlled information loss.

## Conceptual differences between BibTeX and CFF

At a conceptual level, BibTeX and CFF embody different design philosophies.

BibTeX represents bibliographic information as flat records with loosely defined fields, optimized for citation rendering in documents. Semantics are largely implicit and contextual, depending on entry types and bibliography styles.

CFF, in contrast, defines a structured schema with explicit typing, nested objects, and support for richer metadata. It is designed not only for producing citations, but also for machine‑actionable interoperability across platforms.

These conceptual differences explain why a direct one‑to‑one correspondence between BibTeX and CFF is not possible. Any crosswalk must therefore rely on explicit design decisions that balance fidelity, usability, and schema constraints.

## BibTeX Definitions

@patashnik1988 provides the canonical description of the BibTeX format. Two key concepts are central: **entries** and **fields**.
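As a minimal sketch of these two concepts in R, the `bibentry()` function from the **utils** package stores the entry type as an attribute and the fields as named elements. The reference below is invented purely for illustration; it is not taken from this article's bibliography.

```r
# A made-up reference, used only to illustrate entries and fields
entry <- utils::bibentry(
  bibtype = "Article", # the BibTeX entry type
  key = "doe2020",
  title = "An Example Title", # the remaining arguments are fields
  author = "Jane Doe",
  journal = "Journal of Examples",
  year = 2020
)

# The entry type and the names of the stored fields
entry$bibtype
names(unclass(entry)[[1]])

# Entry types constrain fields: omitting required fields raises an error
res <- tryCatch(
  utils::bibentry(bibtype = "Article", title = "An Example Title"),
  error = function(e) e
)
inherits(res, "error")
```

The failing call hints at the coupling between entry types and fields that the following subsections describe.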
### Entries {#entries}

The original BibTeX specification defines fourteen canonical entry types, each corresponding to a class of cited work:

1.  **\@article**: An article from a journal or magazine.
2.  **\@book**: A book with an explicit publisher.
3.  **\@booklet**: A printed and bound work without a named publisher.
4.  **\@conference**: Equivalent to **\@inproceedings**, included for [Scribe](https://en.wikipedia.org/wiki/Scribe_(markup_language)) compatibility.
5.  **\@inbook**: A part of a book, such as a chapter or page range.
6.  **\@incollection**: A part of a book with its own title.
7.  **\@inproceedings**: An article in conference proceedings.
8.  **\@manual**: Technical documentation.
9.  **\@mastersthesis**: A Master’s thesis.
10. **\@misc**: A fallback type when no other entry fits.
11. **\@phdthesis**: A PhD thesis.
12. **\@proceedings**: The proceedings of a conference.
13. **\@techreport**: A numbered technical report.
14. **\@unpublished**: A work not formally published.

Other implementations, notably BibLaTeX [@biblatexpack], extend this set with additional entry types such as online resources, software, and datasets. In standard BibTeX, such entries must typically be represented as **\@misc**.

In **R** [@R_2021], the `bibentry()` function (shipped with the **utils** package) does not implement **\@conference**. Instead, **\@inproceedings** is used, as both share the same conceptual definition.

### Fields {#fields}

BibTeX entries consist of fields whose relevance and requirement depend on the entry type. Some fields are mandatory, others optional, and some are ignored by standard bibliography styles but may still carry useful information.

The following table summarizes the relationship between BibTeX **entries** and their **required or optional fields**, following @patashnik1988. Required fields are marked with **x**, and optional fields with **o**.

```{r}
#| label: tbl-entry_fields1
#| echo: false
#| tbl-cap: BibTeX entries
#| tbl-subcap:
#|   - "BibTeX: required fields by entry"
#|   - "BibTeX: required fields by entry (cont.)"

df_table <- table_master[table_master$table == "entry_fields", -1]

# Column headers: the field name plus one column per entry type
nms <- c(
  "field", "\\@article", "\\@book", "\\@booklet", "\\@inbook",
  "\\@incollection", "\\@conference, \\@inproceedings", "\\@manual",
  "\\@mastersthesis, \\@phdthesis", "\\@misc", "\\@proceedings",
  "\\@techreport", "\\@unpublished"
)

df_table[is.na(df_table)] <- ""
row.names(df_table) <- NULL

# First half of the entry types
t1 <- df_table[, c(1:7)]
nm1 <- nms[1:7]

knitr::kable(
  t1,
  col.names = nm1,
  row.names = NA,
  align = c("l", rep("c", 6))
)

# Second half of the entry types
t2 <- df_table[, c(1, 8:13)]
nm2 <- nms[c(1, 8:13)]

knitr::kable(
  t2,
  col.names = nm2,
  row.names = NA,
  align = c("l", rep("c", 6))
)
```

Only a subset of fields is required for any given entry. Fields such as **title**, **author**, and **year** appear across most entry types, whereas others are optional or never mandatory.

This strict coupling between entry types and fields is a defining feature of BibTeX and a key source of friction when interoperating with schema‑driven formats such as CFF.

## Citation File Format

Citation File Format (CFF) consists of plain‑text, YAML‑based files that encode human‑ and machine‑readable citation metadata for software and datasets.

Two keys play a central role in CFF reference modeling:

-   `preferred-citation`: Identifies a work that should be cited instead of the software or dataset itself.
-   `references`: Lists related creative works, analogous to references in a scholarly article.

Both keys expect `definition-reference` objects, as defined in the [CFF schema guide](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#preferred-citation). These objects support explicit typing, metadata nesting, and structured identifiers, in contrast to BibTeX’s flat field model.
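
To make the nesting concrete, the sketch below builds a plain R list shaped like a small CFF document. It only illustrates the structure: it does not use the **cffr** API, it is not validated against the CFF schema, and every title and name in it is invented.

```r
# A hypothetical reference object with a nested list of authors
paper <- list(
  type = "article",
  title = "An Example Paper",
  authors = list(
    list(`family-names` = "Doe", `given-names` = "Jane")
  ),
  year = 2021
)

# A hypothetical CFF-like document: `preferred-citation` holds one
# reference object, while `references` holds a list of them
cff_sketch <- list(
  `cff-version` = "1.2.0",
  message = "If you use this software, please cite it as below.",
  title = "examplepkg",
  authors = list(list(`family-names` = "Doe", `given-names` = "Jane")),
  `preferred-citation` = paper,
  references = list(
    list(
      type = "book",
      title = "An Example Book",
      authors = list(list(name = "Example Consortium"))
    )
  )
)

str(cff_sketch, max.level = 2)
```

A BibTeX record, by comparison, would have to flatten the author objects into a single **author** string.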

The following table summarizes the valid keys for CFF `definition-reference`:

```{r}
#| label: tbl-refkeys
#| echo: false
#| message: false
#| warning: false
#| results: asis
#| tbl-cap: "Valid keys in CFF `definition-reference` objects"

library(cffr)

# Pad with empty cells so that the keys fill a 5-column matrix
init <- paste0("`", cff_schema_definitions_refs(), "`")
l <- c(init, rep("", 4))

refkeys <- matrix(l, ncol = 5, byrow = TRUE)

knitr::kable(
  refkeys,
  row.names = NA
)
```

Conceptually, most of these keys correspond to BibTeX [**fields**](#fields), with one key playing a special role: `type`. In CFF, `type` explicitly identifies the kind of referenced work, making it conceptually analogous to a BibTeX [**entry**](#entries) type rather than a **field**[^1].

[^1]: See a complete list of possible values of CFF `type` in [Appendix B](#appendix_cff_type).

## Mapping strategy

The mapping implemented in **cffr** follows a three‑step pipeline:

1.  Parse BibTeX entries into an intermediate representation.
2.  Apply deterministic mapping rules to transform entry types and fields.
3.  Serialize the result into CFF (and vice versa).

Whenever possible, mappings are deterministic. However, some fields require heuristic parsing, particularly when normalizing names, dates, or page ranges.

As a result, semantic round‑trip reversibility is not guaranteed, even when the conversion process itself is reproducible.

```{r}
#| label: cffbibread
#| comment: '#>'

string <- "@book{einstein1921,
    title        = {Relativity: The Special and the General Theory},
    author       = {Einstein, A.},
    year         = 1920,
    publisher    = {Henry Holt and Company},
    address      = {London, United Kingdom},
    isbn         = 9781587340925}"

# To cff
library(cffr)
cff_format <- cff_read_bib_text(string)

cff_format

# To BibTeX with S3 method
toBibtex(cff_format)
```

## Mapping tables

The following tables summarize the mapping rules implemented in **cffr**. They are intended as implementation documentation rather than as a normative specification.

Mapping types include:

-   **direct**: preserved without modification.
-   **transform**: renamed or structurally reorganized.
-   **split**: one field mapped to multiple fields.
-   **heuristic**: requires parsing or inference.
-   **unsupported**: omitted due to lack of correspondence.
-   **enrichment**: exists only in CFF.

These mappings are not bijective and may introduce normalization effects.

::: callout-note
For clarity throughout this document, **bold** formatting (e.g., **\@book**, **edition**) is used for BibTeX entries and fields, whereas inline code formatting (e.g., `book`, `edition`) is used for CFF keys.
:::

::: {#tbl-mapentry}
| BibTeX entry | CFF key `type` | Mapping type | Notes |
|:-------------------|:-------------------|:-------------------|:-------------------|
| **\@article** | `type: article` | direct | \- |
| **\@book** | `type: book` | direct | \- |
| **\@inbook** | `type: book` | transform | Treated as a book with `section` and `start`/`end`; see [Entry Models](#entrymodels) |
| **\@incollection** | `type: generic` | transform | Combined with `collection-title`; see [Entry Models](#entrymodels) |
| **\@manual** | `type: manual` | direct | \- |
| **\@misc** | `type: generic` | heuristic | `generic` is used as a fallback when no more specific CFF type applies. |
| **\@mastersthesis**, **\@phdthesis** | `type: thesis` | transform | The thesis subtype (master’s vs PhD) is recorded in `thesis-type` |
| **\@techreport** | `type: report` | transform | \- |
| **other** | `type: generic` | heuristic | Default fallback |

: BibTeX entries and CFF key `type` mapping
:::

::: {#tbl-mapcore}
| BibTeX field | CFF key | Mapping type | Notes |
|:-------------------|:-------------------|:-------------------|:--------------------|
| **title** | `title` | direct | \- |
| **author** | `authors` | transform | Parsed into structured name objects |
| **editor** | `editors` | transform | Same parsing logic as authors |
| **year** | `year` | direct | \- |
| **month** | `month` | heuristic | Format normalization required |
| **journal** | `journal` | direct | \- |
| **booktitle** | `collection-title` | transform | Container title |
| **volume** | `volume` | direct | \- |
| **number** | `issue` | transform | Naming normalization |
| **pages** | `start`/`end` | split | Parsed from page range |
| **doi** | `doi` | direct | \- |
| **url** | `url` | direct | \- |

: Core fields
:::

::: {#tbl-mapstruct}
| BibTeX field | CFF key | Mapping type | Notes |
|:-------------------|:-------------------|:-------------------|:--------------------|
| **publisher** | `publisher.name` | transform | Converted to structured object |
| **address** | `publisher.location` | transform | Combined with `publisher`; the target depends on the entry type |
| **institution** | `institution` | direct | Used mainly in reports/thesis |
| **school** | `institution` | transform | Thesis normalization |

: Structural fields
:::

::: {#tbl-mapparsed}
| BibTeX field | CFF key | Mapping type | Notes |
|:-------------|:--------------|:-------------|:--------------------------------------|
| **pages** | `start`/`end` | heuristic | Requires parsing, may fail |
| **author** | `authors` | heuristic | Name splitting is not always reliable |
| **month** | `month` | heuristic | Multiple formats possible |
| **note** | `notes` | heuristic | Free text |

: Parsed fields
:::

::: {#tbl-mapnobib}
| CFF key | BibTeX equivalent | Mapping type | Notes |
|:---------------------|:------------------|:-------------|:---------------------|
| `preferred-citation` | \- | enrichment | CFF-specific concept |
| `references` | \- | enrichment | Not representable |
| `type: software` | \- | unsupported | No BibTeX equivalent |
| `identifiers` | partial | transform | Only DOI/URL mapped |

: CFF keys with no BibTeX equivalence
:::

## Proposed Crosswalk

The **cffr** package provides utilities for mapping BibTeX entries (via `bibentry()`) to CFF and back. This section describes how the crosswalk is implemented, partially informed by @Haines_Ruby_CFF_Library_2021[^2].

[^2]: Note that this software performs only the mapping from CFF to BibTeX, whereas **cffr** can perform the mapping in both directions.

The crosswalk is primarily defined for BibTeX to CFF conversion; the reverse mapping is a best-effort approximation and may be lossy.

After presenting the general mapping between BibTeX **entries** and **fields** and CFF keys, the next section introduces [**Entry Models**](#entrymodels) that refine these rules for specific BibTeX entry types.

The mapping is structured according to the transformation semantics defined below.

## Transformation Semantics

The crosswalk applies different transformation strategies:

-   **Direct mapping**: one-to-one correspondence between a BibTeX field and a CFF key.
-   **Structural transformation**: a single field is expanded into multiple keys, such as **pages** to `start` and `end`.
-   **Heuristic mapping**: equivalence is inferred based on entry type or context.
-   **Lossy mapping**: information is discarded due to lack of an equivalent representation.

Unless stated otherwise, mappings are lossy in at least one direction.

All mappings and examples in the following sections are instances of these transformation classes.
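
A structural transformation can be sketched in a few lines of base R. The helper below, `split_pages()`, is a hypothetical name introduced only for illustration and is not the code used by **cffr**; it assumes that a range is written with one or more hyphens and that anything else is kept as the `start` value.

```r
# Hypothetical helper: split a BibTeX `pages` value into CFF-like
# `start` and `end` keys (an illustration, not the cffr implementation)
split_pages <- function(pages) {
  # Split on one or more hyphens, e.g. "3--5" or "3-5"
  parts <- trimws(strsplit(pages, "-+")[[1]])
  parts <- parts[nzchar(parts)]

  out <- list(start = parts[1])
  # Only add `end` when a second element exists
  if (length(parts) > 1) {
    out$end <- parts[2]
  }
  out
}

split_pages("3--5")
split_pages("73+")
```

The second call shows why this class of mapping is hard to reverse: a value such as `73+` is not a range, so only `start` can be populated.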

### Entry/Key `type` Crosswalk

For mapping general BibTeX entries to CFF key `type`, the following equivalence is proposed:

```{r}
#| label: tbl-entry_bib2cff
#| echo: false
#| results: asis
#| tbl-cap: "Entry/Type crosswalk: From BibTeX to CFF"

df_table <- table_master[table_master$table == "entry_bib2cff", c(2:4)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f3 <- gsub("link_to_entry_models", "#entrymodels", df_table$f3)
row.names(df_table) <- NULL

knitr::kable(
  df_table,
  col.names = c("BibTeX Entry", "CFF key: `type`", "Notes"),
  row.names = NA
)
```

The previous mapping has the following specifications:

-   **\@book**, **\@inbook**, and **\@incollection** are closely related in BibTeX[^3]. While **\@inbook** and **\@incollection** both reference parts of a **\@book**, the former is used for citing sections, chapters, pages, or other specific parts, whereas the latter is used for citing parts with a specific title. Since CFF allows keys `type: book` and `collection-type: book`, we may utilize a combination of these fields to tag each entry type in CFF accordingly.
-   **\@mastersthesis** and **\@phdthesis** are tagged using a combination of `type: thesis` and `thesis-type` keys.

[^3]: Note that BibLaTeX [@biblatexpack] handles **\@inbook** differently, see [Appendix A](#appendix_inbook).

Additionally, considering that CFF allows for a wide range of values[^4] for the key `type`, the following mapping is applied from CFF to BibTeX:

[^4]: See [Appendix B](#appendix_cff_type) for all possible values. Information extracted from @druskat2019.

The reverse mapping prioritizes BibTeX compatibility and normalization over exact round-trip preservation.

```{r}
#| label: tbl-entry_cff2bib
#| echo: false
#| results: asis
#| tbl-cap: "Entry/Key `type` crosswalk: From CFF to BibTeX"

df_table <- table_master[table_master$table == "entry_cff2bib", c(2:4)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f3 <- gsub("link_to_entry_models", "#entrymodels", df_table$f3)
row.names(df_table) <- NULL

knitr::kable(
  df_table,
  col.names = c("CFF key `type`", "BibTeX Entry", "Notes")
)
```

### Fields/Key Crosswalk

There is a significant similarity between the definitions and names of certain BibTeX fields and CFF keys. While the equivalence is straightforward in some cases, there are instances where certain keys need to be processed depending on the **entry** type.

```{r}
#| label: tbl-fields_bib2cff
#| echo: false
#| results: asis
#| tbl-cap: "BibTeX - CFF key crosswalk"

df_table <- table_master[table_master$table == "fields_bib2cff", c(2:4)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f3 <- gsub("link_to_entry_models", "#entrymodels", df_table$f3)
row.names(df_table) <- NULL

knitr::kable(
  df_table,
  col.names = c("BibTeX **Field**", "CFF key", "Notes")
)
```

We provide more detail on some of the mappings presented in the table above:

-   Some fields are not mapped because there is no clear equivalence with CFF keys (such as **annote**, **crossref**, and **key**). Regarding the **type** field, the CFF key `type` identifies the kind of work (similar to an entry in BibTeX), so the BibTeX **type** field is not mapped. These fields are always optional in BibTeX.
-   For the **address** field, its intended use in BibTeX varies depending on the entry type (e.g., for **\@inproceedings**, it denotes the **address** of the **conference**, while for **\@mastersthesis/\@phdthesis**, it is the **address** of the **school**, etc.). Mapping between BibTeX and CFF becomes more complex when institutional metadata is involved. This results in varying final mappings in CFF.
    When mapping from CFF to BibTeX, we propose to follow the same entry-based logic, using the key `location` as a fallback value when mapping to **address**.
-   Related to the complexity mentioned above, **institution**, **organization**, and **school** are all mapped to `institution`.
-   **series** is mapped to `collection-title` only on those entries that do not require **booktitle**. In practice, this means that `collection-title` corresponds to **booktitle** (for **\@incollection** and **\@inproceedings**), and in the other cases it corresponds to **series**. As a consequence, **series** information is lost for **\@incollection** and **\@inproceedings**, but in those cases it is an optional field.
-   When mapping from CFF to BibTeX, we propose to use `date-published` as a fallback for extracting **month** and **year** fields.
-   When **pages** is provided as a range separated by `--`, i.e. **pages = {3--5}**, it is parsed as `start: 3`, `end: 5` in CFF.

#### BibLaTeX

Additionally, there are other CFF keys that correspond to BibLaTeX fields. We propose to include these fields in the crosswalk[^5], even though they are not part of the core BibTeX fields definition.

[^5]: See @biblatexcheatsheet for a preview of the accepted BibLaTeX fields.

```{r}
#| label: tbl-fields_biblatex2cff
#| echo: false
#| results: asis
#| tbl-cap: "BibLaTeX - CFF Field/Key crosswalk"

df_table <- table_master[table_master$table == "fields_biblatex2cff", c(2:3)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f2 <- gsub("link_to_entry_models", "#entrymodels", df_table$f2)
row.names(df_table) <- NULL

knitr::kable(
  df_table,
  col.names = c("**BibLaTeX Field**", "CFF key")
)
```

## Limitations

The proposed crosswalk is subject to several structural limitations arising from differences between BibTeX and CFF schemas:

-   **Lossy mappings**: Some BibTeX fields (e.g., **crossref**, **annote**, **key**) have no equivalent in CFF and are omitted during conversion.
-   **Ambiguous semantics**: Certain fields (e.g., **address**, **series**) have context-dependent meanings in BibTeX, requiring heuristic mapping to CFF keys such as `institution`, `location`, or `collection-title`.
-   **Type collapsing**: Multiple BibTeX entry types (e.g., **\@misc**, **\@incollection**) are mapped to a single CFF type (`generic`), which acts as a fallback when no more specific type is available.
-   **Structural transformations**: Some fields require transformation rather than direct mapping (e.g., **pages** → `start`/`end`), altering the representation of the original data.
-   **Non-reversibility**: Due to the above factors, round-trip conversion (BibTeX → CFF → BibTeX) is not guaranteed to preserve the original structure or semantics.

These limitations reflect fundamental differences between the two formats rather than implementation-specific constraints.

## Design decisions

The mapping implemented in **cffr** is guided by the following principles:

-   Prefer deterministic mappings whenever possible
-   Use heuristics only when unavoidable
-   Normalize metadata to align with CFF conventions
-   Prioritize interoperability over exact round‑trip fidelity

## Entry Models {#entrymodels}

This section documents entry‑specific mapping behavior, expanding the general crosswalk into concrete and testable models.

Examples are adapted from the [xampl.bib](https://tug.org/texmf-docs/bibtex/xampl.bib) distributed with BibTeX [@patashnik].

### \@article {#article}

The crosswalk of **\@article** does not require any special treatment.

```{r}
#| label: tbl-model_article
#| echo: false
#| results: asis

df_table <- table_master[table_master$table == "model_article", c(2:4)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f3 <- gsub("link_to_entry_models", "#entrymodels", df_table$f3)
df_table$f3 <- gsub("link_to_article", "#article", df_table$f3)
df_table$f3 <- gsub("link_to_booklet", "#booklet", df_table$f3)
df_table$f3 <- gsub("link_to_book", "#book-inbook", df_table$f3)
row.names(df_table) <- NULL

knitr::kable(
  df_table,
  col.names = c("BibTeX", "CFF", "Notes"),
  caption = "**\\@article** Model"
)
```

**Round-trip**

```{r}
bib <- "@article{article-full,
    title        = {The Gnats and Gnus Document Preparation System},
    author       = {Leslie A. Aamport},
    year         = 1986,
    month        = jul,
    journal      = {{G-Animal's} Journal},
    volume       = 41,
    number       = 7,
    pages        = {73+},
    note         = {This is a full ARTICLE entry}}"

cff_read_bib_text(bib)

toBibtex(cff_read_bib_text(bib))
```

### \@book / \@inbook {#book-inbook}

In terms of the fields required in BibTeX, the primary difference between **\@book** and **\@inbook** is that **\@inbook** requires a **chapter** or **pages** field, while **\@book** does not even allow these fields as optional. Therefore, we propose that an **\@inbook** entry in CFF be treated as a **\@book** with the following supplementary fields:

1.  `section`: To denote the specific **chapter** within the book.
2.  `start`/`end`: To indicate the range of **pages** covered by the section.

Additionally, note that in CFF, the **series** field corresponds to `collection-title`, and the **address** field represents the `publisher`'s `address`. Finally, the key `collection-type` is populated with `book-series`.

```{r}
#| label: tbl-model_book
#| echo: false
#| results: asis

df_table <- table_master[table_master$table == "model_book", c(2:4)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f3 <- gsub("link_to_entry_models", "#entrymodels", df_table$f3)
df_table$f3 <- gsub("link_to_article", "#article", df_table$f3)
df_table$f3 <- gsub("link_to_booklet", "#booklet", df_table$f3)
df_table$f3 <- gsub("link_to_book", "#book-inbook", df_table$f3)
row.names(df_table) <- NULL

knitr::kable(
  df_table,
  col.names = c("BibTeX", "CFF", "Notes"),
  caption = "**\\@book / \\@inbook** Model"
)
```

There are notable differences in how BibTeX and **BibLaTeX** handle the **\@inbook** entry (further discussed in [Appendix A](#appendix_inbook)). We propose to treat a BibLaTeX **\@inbook** as a BibTeX **\@incollection**.

**Round-trip: \@book**

```{r}
bib <- "@book{book-full,
    title        = {Seminumerical Algorithms},
    author       = {Donald E. Knuth},
    year         = 1981,
    month        = 10,
    publisher    = {Addison-Wesley},
    address      = {Reading, Massachusetts},
    series       = {The Art of Computer Programming},
    volume       = 2,
    note         = {This is a full BOOK entry},
    edition      = {Second}
}"

cff_read_bib_text(bib)

toBibtex(cff_read_bib_text(bib))
```

**Round-trip: \@inbook**

```{r}
bib <- "@inbook{inbook-full,
    title        = {Fundamental Algorithms},
    author       = {Donald E. Knuth},
    year         = 1973,
    month        = 10,
    publisher    = {Addison-Wesley},
    address      = {Reading, Massachusetts},
    series       = {The Art of Computer Programming},
    volume       = 1,
    pages        = {10--119},
    note         = {This is a full INBOOK entry},
    edition      = {Second},
    type         = {Section},
    chapter      = {1.2}
}"

cff_read_bib_text(bib)

toBibtex(cff_read_bib_text(bib))
```

### \@booklet {#booklet}

In **\@booklet**, **address** is mapped to `location`.

```{r}
#| label: tbl-model_booklet
#| echo: false
#| results: asis

df_table <- table_master[table_master$table == "model_booklet", c(2:4)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f3 <- gsub("link_to_entry_models", "#entrymodels", df_table$f3)
df_table$f3 <- gsub("link_to_article", "#article", df_table$f3)
df_table$f3 <- gsub("link_to_booklet", "#booklet", df_table$f3)
df_table$f3 <- gsub("link_to_book", "#book-inbook", df_table$f3)
row.names(df_table) <- NULL

knitr::kable(
  df_table,
  col.names = c("BibTeX", "CFF", "Notes"),
  caption = "**\\@booklet** Model"
)
```

**Round-trip**

```{r}
bib <- "@booklet{booklet-full,
    title        = {The Programming of Computer Art},
    author       = {Jill C. Knvth},
    date         = {1988-03-14},
    month        = feb,
    address      = {Stanford, California},
    note         = {This is a full BOOKLET entry},
    howpublished = {Vernier Art Center}
}"

cff_read_bib_text(bib)

toBibtex(cff_read_bib_text(bib))
```

### \@conference / \@inproceedings {#conf_inproc}

Note that in this case, **organization** is mapped to `institution`. Additionally, **series** is ignored because there is no clear mapping in CFF for this field.

```{r}
#| label: tbl-model_inproceedings
#| echo: false
#| results: asis

df_table <- table_master[table_master$table == "model_inproceedings", c(2:4)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f3 <- gsub("link_to_entry_models", "#entrymodels", df_table$f3)
df_table$f3 <- gsub("link_to_article", "#article", df_table$f3)
df_table$f3 <- gsub("link_to_booklet", "#booklet", df_table$f3)
df_table$f3 <- gsub("link_to_book", "#book-inbook", df_table$f3)
row.names(df_table) <- NULL

knitr::kable(
  df_table,
  col.names = c("BibTeX", "CFF", "Notes"),
  caption = "**\\@conference / \\@inproceedings** Model"
)
```

**Round-trip**

```{r}
bib <- "@inproceedings{inproceedings-full,
    title        = {On Notions of Information Transfer in {VLSI} Circuits},
    author       = {Alfred V. Oaho and Jeffrey D. Ullman and Mihalis Yannakakis},
    year         = 1983,
    month        = mar,
    booktitle    = {Proc. Fifteenth Annual ACM Symposium on the Theory of Computing},
    publisher    = {Academic Press},
    address      = {Boston},
    series       = {All ACM Conferences},
    number       = 17,
    pages        = {133--139},
    editor       = {Wizard V. Oz and Mihalis Yannakakis},
    organization = {The OX Association for Computing Machinery}
}"

cff_read_bib_text(bib)

toBibtex(cff_read_bib_text(bib))
```

### \@incollection {#incol}

As **booktitle** is a required field, we propose to map that field to `collection-title` and the `type` to `generic`. Therefore, an **\@incollection** is a `type: generic` in CFF with a `collection-title` key. The `generic` type is used as a fallback when no semantically equivalent CFF type exists. Additionally, **series** and **type** are ignored because there is no clear mapping in CFF for these fields.

```{r}
#| label: tbl-model_incollection
#| echo: false
#| results: asis

df_table <- table_master[table_master$table == "model_incollection", c(2:4)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f3 <- gsub("link_to_entry_models", "#entrymodels", df_table$f3)
df_table$f3 <- gsub("link_to_article", "#article", df_table$f3)
df_table$f3 <- gsub("link_to_booklet", "#booklet", df_table$f3)
df_table$f3 <- gsub("link_to_book", "#book-inbook", df_table$f3)
row.names(df_table) <- NULL

knitr::kable(
  df_table,
  col.names = c("BibTeX", "CFF", "Notes"),
  caption = "**\\@incollection** Model"
)
```

**Round-trip**

```{r}
bib <- "@incollection{incollection-full,
    title        = {Semigroups of Recurrences},
    author       = {Daniel D. Lincoll},
    year         = 1977,
    month        = sep,
    booktitle    = {High Speed Computer and Algorithm Organization},
    publisher    = {Academic Press},
    address      = {New York},
    series       = {Fast Computers},
    number       = 23,
    pages        = {179--183},
    note         = {This is a full INCOLLECTION entry},
    editor       = {David J. Lipcoll and D. H. Lawrie and A. H. Sameh},
    chapter      = 3,
    type         = {Part},
    edition      = {Third}
}"

cff_read_bib_text(bib)

toBibtex(cff_read_bib_text(bib))
```

### \@manual

As in the case of [**\@conference** / **\@inproceedings**](#conf_inproc), **organization** is mapped to `institution`.

```{r}
#| label: tbl-model_manual
#| echo: false
#| results: asis

df_table <- table_master[table_master$table == "model_manual", c(2:4)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f3 <- gsub("link_to_entry_models", "#entrymodels", df_table$f3)
df_table$f3 <- gsub("link_to_article", "#article", df_table$f3)
df_table$f3 <- gsub("link_to_booklet", "#booklet", df_table$f3)
df_table$f3 <- gsub("link_to_book", "#book-inbook", df_table$f3)
row.names(df_table) <- NULL

knitr::kable(
  df_table,
  col.names = c("BibTeX", "CFF", "Notes"),
  caption = "**\\@manual** Model"
)
```

**Round-trip**

Note that in this example **month** (`apr-may`) cannot be coerced to a single integer in the range 1 to 12, as required in CFF, so it is ignored to avoid validation errors.

```{r}
bib <- "@manual{manual-full,
    title        = {The Definitive Computer Manual},
    author       = {Larry Manmaker},
    year         = 1986,
    month        = {apr-may},
    address      = {Silicon Valley},
    note         = {This is a full MANUAL entry},
    organization = {Chips-R-Us},
    edition      = {Silver}
}"

cff_read_bib_text(bib)

toBibtex(cff_read_bib_text(bib))
```

### \@mastersthesis / \@phdthesis

The fields required by BibTeX are identical for **\@mastersthesis** and **\@phdthesis**. We propose here to identify each type of thesis using the key `thesis-type`. If `thesis-type` matches the [regex pattern](https://regex101.com/r/mBWfbs/1) `(?i)(phd)`, the entry is recognized as **\@phdthesis**.

Additionally, **school** is mapped to `institution`.
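
The pattern check can be sketched in base R as follows. The helper name `thesis_entry()` is hypothetical, and falling back to **\@mastersthesis** whenever the pattern does not match is an assumption made for this sketch rather than documented **cffr** behavior.

```r
# Hypothetical helper: choose a BibTeX thesis entry from `thesis-type`.
# Assumption: anything that does not match the pattern is a master's thesis.
thesis_entry <- function(thesis_type) {
  # perl = TRUE enables the inline case-insensitive flag (?i)
  is_phd <- grepl("(?i)(phd)", thesis_type, perl = TRUE)
  ifelse(is_phd, "phdthesis", "mastersthesis")
}

thesis_entry(c("PhD Dissertation", "Master's project", "phd thesis"))
```

Because the match is case-insensitive, both `PhD Dissertation` and `phd thesis` are classified in the same way.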

```{r}
#| label: tbl-model_thesis
#| echo: false
#| results: asis

df_table <- table_master[table_master$table == "model_thesis", c(2:4)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f3 <- gsub("link_to_entry_models", "#entrymodels", df_table$f3)
df_table$f3 <- gsub("link_to_article", "#article", df_table$f3)
df_table$f3 <- gsub("link_to_booklet", "#booklet", df_table$f3)
df_table$f3 <- gsub("link_to_book", "#book-inbook", df_table$f3)
row.names(df_table) <- NULL

knitr::kable(
  df_table,
  col.names = c("BibTeX", "CFF", "Notes"),
  caption = "**\\@mastersthesis / \\@phdthesis** Model"
)
```

**Round-trip: \@mastersthesis**

```{r}
bib <- "@mastersthesis{mastersthesis-full,
    title        = {Mastering Thesis Writing},
    author       = {Edouard Masterly},
    year         = 1988,
    month        = jun,
    address      = {English Department},
    note         = {This is a full MASTERSTHESIS entry},
    school       = {Stanford University},
    type         = {Master's project}
}"

cff_read_bib_text(bib)

toBibtex(cff_read_bib_text(bib))
```

**Round-trip: \@phdthesis**

```{r}
bib <- "@phdthesis{phdthesis-full,
    title        = {Fighting Fire with Fire: Festooning {F}rench Phrases},
    author       = {F. Phidias Phony-Baloney},
    year         = 1988,
    month        = jun,
    address      = {Department of French},
    note         = {This is a full PHDTHESIS entry},
    school       = {Fanstord University},
    type         = {{PhD} Dissertation}
}"

cff_read_bib_text(bib)

toBibtex(cff_read_bib_text(bib))
```

### \@misc

The crosswalk of **\@misc** does not require any special treatment. This **entry** does not require any **field**.

Note also that it is mapped to `type: generic`, like [**\@incollection**](#incol). Because **booktitle** is not even an optional field for **\@misc**, the proposed definition covers both **\@misc** and **\@incollection** without conflict.

```{r}
#| label: tbl-model_misc
#| echo: false
#| results: asis

df_table <- table_master[table_master$table == "model_misc", c(2:4)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f3 <- gsub("link_to_entry_models", "#entrymodels", df_table$f3)
df_table$f3 <- gsub("link_to_article", "#article", df_table$f3)
df_table$f3 <- gsub("link_to_booklet", "#booklet", df_table$f3)
df_table$f3 <- gsub("link_to_book", "#book-inbook", df_table$f3)

row.names(df_table) <- NULL
knitr::kable(
  df_table,
  col.names = c("BibTeX", "CFF", "Notes"),
  caption = "**\\@misc** Model"
)
```

**Round-trip**

```{r}
bib <- "@misc{misc-full,
  title = {Handing out random pamphlets in airports},
  author = {Joe-Bob Missilany},
  year = 1984,
  month = oct,
  note = {This is a full MISC entry},
  howpublished = {Handed out at O'Hare}
}"

cff_read_bib_text(bib)

toBibtex(cff_read_bib_text(bib))
```

### \@proceedings

The proposed model is consistent with
[**\@conference** / **\@inproceedings**](#conf_inproc). Note that
**\@proceedings** does not prescribe an **author** field. Because `authors` is
required in CFF, we use *anonymous*[^6] as the author when mapping to CFF and
omit it when mapping from CFF back to BibTeX.
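
The placeholder logic can be sketched with plain R lists. This is a simplified illustration under stated assumptions, not the **cffr** implementation: `authors_to_cff()` and `authors_to_bib()` are hypothetical helpers, and each author is reduced to a single `name` value rather than a full CFF person or entity object.

```{r}
# Hypothetical helpers (not part of cffr). Plain lists stand in for
# CFF `authors` entries, each reduced to a single `name` value.

# BibTeX -> CFF: `authors` is required, so fall back to an anonymous entity.
authors_to_cff <- function(bib_authors) {
  if (length(bib_authors) == 0) {
    return(list(list(name = "anonymous")))
  }
  lapply(bib_authors, function(a) list(name = a))
}

# CFF -> BibTeX: drop the placeholder so that no `author` field is written.
authors_to_bib <- function(cff_authors) {
  nms <- vapply(cff_authors, function(a) a$name, character(1))
  nms[tolower(nms) != "anonymous"]
}

# An entry without authors, such as @proceedings
cff_authors <- authors_to_cff(character(0))
cff_authors[[1]]$name

# Mapping back leaves no author to write
authors_to_bib(cff_authors)
```
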

[^6]: As proposed in [*How to deal with unknown individual
    authors?*](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#how-to-deal-with-unknown-individual-authors),
    **(Guide to Citation File Format schema version 1.2.0)**

```{r}
#| label: tbl-model_proceedings
#| echo: false
#| results: asis

df_table <- table_master[table_master$table == "model_proceedings", c(2:4)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f3 <- gsub("link_to_entry_models", "#entrymodels", df_table$f3)
df_table$f3 <- gsub("link_to_article", "#article", df_table$f3)
df_table$f3 <- gsub("link_to_booklet", "#booklet", df_table$f3)
df_table$f3 <- gsub("link_to_book", "#book-inbook", df_table$f3)

row.names(df_table) <- NULL
knitr::kable(
  df_table,
  col.names = c("BibTeX", "CFF", "Notes"),
  caption = "**\\@proceedings** Model"
)
```

**Round-trip**

```{r}
bib <- "@proceedings{proceedings-full,
  title = {Proc. Fifteenth Annual ACM Symposium on the Theory of Computing},
  year = 1983,
  month = mar,
  publisher = {Academic Press},
  address = {Boston},
  series = {All ACM Conferences},
  number = 17,
  note = {This is a full PROCEEDINGS entry},
  editor = {Wizard V. Oz and Mihalis Yannakakis},
  organization = {The OX Association for Computing Machinery}
}"

cff_read_bib_text(bib)

toBibtex(cff_read_bib_text(bib))
```

### \@techreport

The crosswalk of **\@techreport** does not require any special treatment.

```{r}
#| label: tbl-model_techreport
#| echo: false
#| results: asis

df_table <- table_master[table_master$table == "model_techreport", c(2:4)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f3 <- gsub("link_to_entry_models", "#entrymodels", df_table$f3)
df_table$f3 <- gsub("link_to_article", "#article", df_table$f3)
df_table$f3 <- gsub("link_to_booklet", "#booklet", df_table$f3)
df_table$f3 <- gsub("link_to_book", "#book-inbook", df_table$f3)

row.names(df_table) <- NULL
knitr::kable(
  df_table,
  col.names = c("BibTeX", "CFF", "Notes"),
  caption = "**\\@techreport** Model"
)
```

**Round-trip**

```{r}
bib <- "@techreport{techreport-full,
  title = {A Sorting Algorithm},
  author = {Tom Terrific},
  year = 1988,
  month = oct,
  address = {Computer Science Department, Fanstord, California},
  number = 7,
  note = {This is a full TECHREPORT entry},
  institution = {Fanstord University},
  type = {Wishful Research Result}
}"

cff_read_bib_text(bib)

toBibtex(cff_read_bib_text(bib))
```

### \@unpublished

The crosswalk of **\@unpublished** does not require any special treatment.

```{r}
#| label: tbl-model_unpublished
#| echo: false
#| results: asis

df_table <- table_master[table_master$table == "model_unpublished", c(2:4)]
df_table[is.na(df_table)] <- ""

# fix links
df_table$f3 <- gsub("link_to_entry_models", "#entrymodels", df_table$f3)
df_table$f3 <- gsub("link_to_article", "#article", df_table$f3)
df_table$f3 <- gsub("link_to_booklet", "#booklet", df_table$f3)
df_table$f3 <- gsub("link_to_book", "#book-inbook", df_table$f3)

row.names(df_table) <- NULL
knitr::kable(
  df_table,
  col.names = c("BibTeX", "CFF", "Notes"),
  caption = "**\\@unpublished** Model"
)
```

**Round-trip**

```{r}
bib <- "@unpublished{unpublished-minimal,
  title = {Lower Bounds for Wishful Research Results},
  author = {Ulrich Underwood and Ned Net and Paul Pot},
  note = {Talk at Fanstord University (this is a minimal UNPUBLISHED entry)}
}"

cff_read_bib_text(bib)

toBibtex(cff_read_bib_text(bib))
```

## Conclusion

This article presents a practical and reproducible crosswalk between BibTeX and
CFF. Although the formats are not fully equivalent, the deterministic mapping
implemented in **cffr** enables consistent transformations across heterogeneous
citation ecosystems.

By making design decisions explicit and documenting limitations, this work
supports interoperable citation workflows bridging legacy bibliographic systems
and modern software‑centric practices.

## Appendix A: **\@inbook** in BibTeX and BibLaTeX {#appendix_inbook .appendix}

The definition of **\@inbook** and **\@incollection** in BibTeX [@patashnik1988]
is as follows:

> - **\@inbook**: A part of a book, which may be a chapter (or section) and/or a
>   range of pages. Required fields: author or editor, title, chapter and/or
>   pages, publisher, year (...)
>
> - **\@incollection**: A part of a book having its own title. Required fields:
>   author, title, booktitle, publisher, year (...)

Whereas BibLaTeX [@biblatexpack] specifies:

> - **\@inbook:** A part of a book which forms a self-contained unit with its
>   own title.
>   Note that the [profile]{.underline} of this entry type is
>   [different from standard BibTeX]{.underline}, see § 2.3.1. Required fields:
>   author, title, booktitle, year/date (...).

Regarding required fields, an important difference is that BibLaTeX
**\@inbook** requires **booktitle**, whereas BibTeX **\@inbook** does not.
Notably, BibTeX **\@incollection** also requires this field. Moreover, both
BibTeX **\@incollection** and BibLaTeX **\@inbook** describe *"a part of a book
(...) with its own title"*.

In this document, the proposed crosswalk ensures full compatibility with BibTeX.
Hence, we propose to treat a BibLaTeX **\@inbook** entry as equivalent to a
BibTeX **\@incollection**, given the congruence in their definitions and field
requirements.

**Round-trip**

```{r}
bib <- "@inbook{inbook-biblatex,
  author = {Yihui Xie and Christophe Dervieux and Emily Riederer},
  title = {Bibliographies and citations},
  booktitle = {{R} Markdown Cookbook},
  date = {2023-12-30},
  publisher = {Chapman and Hall/CRC},
  address = {Boca Raton, Florida},
  series = {The {R} Series},
  isbn = 9780367563837,
  url = {https://yihui.org/rmarkdown-cookbook/},
  chapter = {4.5}
}"

cff_read_bib_text(bib)

toBibtex(cff_read_bib_text(bib))
```

## Appendix B: CFF key: `type` values {#appendix_cff_type .appendix}

From @druskat2019, Table 4: Complete list of CFF reference types for key `type`.
Only a subset of these types is actively used in the proposed crosswalk.

```{r}
#| label: tbl-cff_types
#| echo: false
#| results: asis
#| tbl-cap: "Complete list of CFF reference types"

df_table <- table_master[table_master$table == "cff_types", c(2:3)]
df_table[is.na(df_table)] <- ""

row.names(df_table) <- NULL
knitr::kable(
  df_table,
  col.names = c("Reference type string", "Description"),
  row.names = NA
)
```