The CharacterChunker class splits large text documents into smaller chunks whose size is measured in characters. The chunk size, the overlap between chunks, the batch size, and the separator are all configurable.
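To make "based on the number of characters" concrete, the sketch below slides a fixed-size window over a string, so that consecutive chunks share a configurable number of characters. It is a minimal illustration under stated assumptions, not the neumai implementation: the function `split_by_characters` is hypothetical, it borrows the `chunk_size` and `chunk_overlap` names from the properties listed below, and it ignores the separator entirely.

```python
def split_by_characters(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical helper: cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. This is NOT neumai's
    CharacterChunker, only an illustration of character-based chunking.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_by_characters("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each window advances by `chunk_size - chunk_overlap` characters, which is why the overlap must be smaller than the chunk size.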


Required properties:

  • None

Optional properties:

  • chunk_size: The target size of each chunk, in characters.
  • chunk_overlap: The number of characters shared between consecutive chunks.
  • batch_size: The number of chunks to process in one batch.
  • separator: The string at which the text is split (for example, "\n" splits on line breaks).

Example:

from neumai.Chunkers import CharacterChunker

character_chunker = CharacterChunker(
    chunk_size=500,
    chunk_overlap=50,
    batch_size=1000,
    separator="\n",
)
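The configuration above combines a separator with a size limit and a batch size. One common way such settings interact, assumed here and modeled loosely on separator-based text splitters rather than taken from neumai's source, is to split on the separator, greedily join pieces until the next one would push a chunk past `chunk_size`, and then hand the chunks downstream in groups of `batch_size`. Both `chunk_on_separator` and `batched` are hypothetical helpers, and overlap is left out to keep the sketch short.

```python
from typing import Iterator


def chunk_on_separator(text: str, separator: str, chunk_size: int) -> list[str]:
    """Hypothetical helper: greedily join separator-delimited pieces up to chunk_size.

    A single piece longer than chunk_size is kept whole rather than cut.
    Overlap is not handled in this sketch.
    """
    chunks: list[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces, e.g. from consecutive separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate  # piece fits, or current chunk is still empty
        else:
            chunks.append(current)  # close the full chunk and start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


def batched(chunks: list[str], batch_size: int) -> Iterator[list[str]]:
    """Hypothetical helper: yield successive lists of at most batch_size chunks."""
    for start in range(0, len(chunks), batch_size):
        yield chunks[start:start + batch_size]


chunks = chunk_on_separator("alpha\nbeta\ngamma\ndelta", "\n", chunk_size=11)
print(chunks)  # ['alpha\nbeta', 'gamma\ndelta']
print(list(batched(chunks, batch_size=1)))  # [['alpha\nbeta'], ['gamma\ndelta']]
```

Keeping an oversized piece whole, instead of cutting it mid-line, is a design choice of this sketch; a real chunker may handle that case differently.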